Combining SVM Classifiers Using Genetic Fuzzy Systems Based on AUC for Gene Expression Data Analysis


Xiujuan Chen 1, Yichuan Zhao 2, Yan-Qing Zhang 1, and Robert Harrison 1
1 Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
2 Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30302, USA
{xchen8@,matyiz@langate,zhang@taichi.cs, cscrwh@asterix.cs}gsu.edu

Abstract. Recently, the use of the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) has received much attention as a measure of the performance of machine learning algorithms. In this paper, we propose an SVM classifier fusion model using a genetic fuzzy system. Genetic algorithms are applied to tune the optimal fuzzy membership functions. The performance of each SVM classifier is evaluated by its AUC. Our experiments show that the AUC-based genetic fuzzy SVM fusion model produces not only better AUC but also better accuracy than individual SVM classifiers.

Keywords: Receiver Operating Characteristic (ROC), Support Vector Machines (SVMs), Gene Expression, Genetic Fuzzy System (GFS), Classifier fusion.

1 Introduction

With key technologies developed in the biomedical area, such as DNA sequencing, micro-arrays, and structural genomics, large-scale biological and biomedical data have accumulated, including DNA sequences, protein sequences and structures, gene expression data, protein profiling data, and genomic sequence data. Accordingly, the bulk of research effort has shifted to biomedical data analysis: extracting patterns and discovering useful information from the data, and thereby providing valuable support for biomedical and evolutionary research. Machine learning and classification techniques have been widely used to assist the interpretation and analysis of biomedical data.
As recently developed pattern recognition tools, Support Vector Machines (SVMs) [1] have become popular because of their outstanding learning performance when applied to real-world classification applications. SVMs can classify not only linearly separable but also nonlinearly separable problems. They aim to find an optimal hyperplane separating the positive and negative classes with the maximum margin in a high-dimensional feature space, which is obtained from the original input space by applying a kernel function, for instance a polynomial or an RBF kernel. However, selecting an appropriate SVM kernel to achieve the best possible performance on a real classification application remains a practical difficulty.

I. Mandoiu and A. Zelikovsky (Eds.): ISBRA 2007, LNBI 4463, pp. 496-505, 2007. Springer-Verlag Berlin Heidelberg 2007
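As a minimal sketch of the kernel-selection problem described above (not the authors' SVM-Light setup), the following trains SVMs with several polynomial and RBF kernels and scores each by AUC. scikit-learn and the synthetic dataset are stand-ins introduced here for illustration:

```python
# Illustrative only: scikit-learn SVC stands in for SVM-Light, and
# make_classification stands in for the gene expression data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# One candidate classifier per kernel/parameter choice, all with C = 1.
candidates = [SVC(kernel="poly", degree=d, C=1.0) for d in (1, 2, 3)] \
           + [SVC(kernel="rbf", gamma=g, C=1.0) for g in (1e-4, 1e-2, 1.0)]

aucs = []
for clf in candidates:
    clf.fit(X_tr, y_tr)
    # The signed distance to the hyperplane doubles as a ranking score.
    aucs.append(roc_auc_score(y_te, clf.decision_function(X_te)))
```

Exhaustively sweeping kernels and parameters this way is exactly the cost that the fusion approach of this paper tries to avoid.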

Instead of finding the best kernel for an application by exhaustively trying all possible kernel functions with all possible parameters, classifier fusion offers a more efficient yet still high-performing way of addressing this practical problem in SVM classification. Classifier fusion combines a set of classifiers so that the combined classifier achieves better performance than its composing individual classifiers. The combined classifier can outperform the best individual classifier because the data examples misclassified by different classifiers do not necessarily overlap, which leaves room for the classifiers to complement one another [2]. One sufficient condition for a combined classifier to be more accurate than any of its individual members is that the individual classifiers be accurate and diverse [3]. Diversity is crucial when combining classifiers; diverse classifiers can be obtained by using different data feature sets, different training sets, or different classification algorithms [2], [3], [4], [5], [6].

In this study, we propose a classifier fusion model specifically for SVM classifiers, aiming to boost their performance. A fuzzy logic system (FLS) is constructed to combine multiple SVM classifiers in light of the performance of each individual classifier. The membership functions of the fuzzy logic system are tuned by genetic algorithms (GAs) to generate the optimal fuzzy logic system. One question here is how to evaluate classifier performance in the fusion model. Typically, accuracy is the standard criterion for evaluating classifier performance [7], [8]. In many scenarios, however, accuracy is insufficient or not very meaningful: researchers are often interested in a ranking of data examples rather than mere positive/negative classification results.
Moreover, if the class distribution is skewed or unbalanced, a classifier can still achieve high accuracy by simply assigning all data examples to the dominant class [7], [9]. Recently, the Receiver Operating Characteristic (ROC) curve and the area under an ROC curve (AUC) have been shown, both empirically and theoretically, to be statistically consistent with and more discriminating than accuracy [7], [8], [10]. This paper uses AUC as the evaluation of classifier performance to build a genetic fuzzy fusion model that enhances the performance of SVM classifiers. It has also been shown that classifiers optimized for AUC produce not only better AUC but also better accuracy [11].

In this paper, we first introduce the concepts associated with ROC analysis in Section 2. We then discuss genetic fuzzy algorithms in Section 3. The SVM classifier fusion model is proposed in Section 4, and experiments on gene expression data are reported in Section 5. Finally, conclusions are drawn in Section 6.

2 ROC Analysis for Binary Classification

ROC analysis has recently received much attention as a means of analyzing classifier performance; it has attractive properties that make it especially useful for domains with skewed class distributions and unequal classification error costs. An ROC curve of a classifier plots the true positive rate (TPR) on the Y axis versus the false positive rate (FPR) on the X axis, as shown in Fig. 1. The TPR and FPR are defined as follows [9]:

TPR = TP / N+,  FPR = FP / N-  (1)
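As a quick illustration of Eq. (1), assuming labels in {0, 1} and a hypothetical set of predictions:

```python
def tpr_fpr(y_true, y_pred):
    """Eq. (1): TPR = TP / N+, FPR = FP / N-, for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg

# Hypothetical example: 3 positives, 2 negatives.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)  # -> (2/3, 1/2)
```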

where TP denotes true positives, FP denotes false positives, and N+ and N- denote the numbers of positive and negative examples, respectively.

Fig. 1. An ROC curve

For a discrete classifier, which produces only a positive/negative class label for each example, only a single point can be drawn in the ROC graph. However, for a probabilistic classifier, which yields a numeric value for each example representing the degree to which the example belongs to a class, applying various decision thresholds yields a series of points in the ROC plane with pairs {FPR, TPR} as their coordinates. Each threshold produces one point on the ROC curve, representing the classifier generated by using that threshold as the cutoff. An ROC curve of a probabilistic classifier can therefore be viewed as an aggregation of the classifiers obtained from all possible decision thresholds [7].

The quality of an ROC curve can be summarized in a single value by calculating the area under the curve (AUC). The AUC represents the probability that a classifier ranks a randomly chosen positive example higher than a randomly chosen negative example [9]. Following Hand [12], the AUC can be calculated by the formula

AUC = ( Σ_{i=1}^{N+} r_i − N+ (N+ + 1) / 2 ) / (N+ N-)  (2)

where r_i denotes the rank of the ith positive example when the classification scores of all data examples are arranged in ascending order. AUC has been shown to be a better measure than accuracy for assessing classifier performance [8], [10].

3 Genetic Fuzzy Systems

Fuzzy logic has demonstrated a powerful ability to handle the imprecision and uncertainty of real-world applications. It captures uncertainty by defining linguistic fuzzy sets with fuzzy membership functions (MFs) and by reasoning over fuzzy rules in a
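Eq. (2) can be sketched directly in code. The version below assumes no tied scores and uses 1-based ranks over scores sorted in ascending order:

```python
def auc_from_ranks(scores, labels):
    """Eq. (2): AUC from the ranks of positive examples (no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = {idx: r + 1 for r, idx in enumerate(order)}  # 1-based ranks
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    rank_sum = sum(ranks[i] for i in pos)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical scores for 2 negatives and 2 positives.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc = auc_from_ranks(scores, labels)  # -> 0.75
```

This matches the probabilistic reading of AUC: here 3 of the 4 positive/negative pairs are ranked correctly, giving 0.75.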

rigorous mathematical discipline. However, the success of a fuzzy logic system (FLS) design largely relies on high-performance fuzzy MFs and fuzzy rules that interpret expert knowledge. When human expertise is lacking, rather than choosing fuzzy MFs or defining fuzzy rules in a manual trial-and-error manner, we may seek assistance from a learning process. A genetic fuzzy system (GFS) can learn and search fuzzy MFs or fuzzy rules efficiently: it is a fuzzy system augmented by a learning process based on a genetic algorithm [13]. GAs are optimization algorithms inspired by natural evolution that provide robust search and learning capabilities in complex spaces. GAs can learn, train, or tune different components of fuzzy logic systems [13]. For instance, some genetic fuzzy rule-based systems learn and determine the set of IF-THEN fuzzy rules from all possible rules [14], [15]. Other genetic fuzzy systems tune the MFs of a given fuzzy rule set, for example the positions or shapes of the MFs [16], [17], [18]. To tune fuzzy MFs, there are two general techniques: the Pittsburgh approach and the Michigan approach [13]. The Pittsburgh approach represents an entire fuzzy rule set as a chromosome and maintains a population of candidate rule sets, using genetic operations to produce new generations of rule sets [19]. The Michigan approach represents an individual rule as a chromosome, so the whole rule set is represented by the entire population [20].

4 Genetic Fuzzy SVM Fusion Based on AUC

The genetic fuzzy fusion model for combining SVM classifiers is constructed as shown in Fig. 2. The system has three phases.
Fig. 2. Genetic fuzzy fusion system for SVM classifiers. (Phase I: training data are used to build models SVM 1 ... SVM n, and validation data yield AUC_i and the distances of the validation examples for each model. Phase II: the genetic fuzzy SVM classifier fusion system is tuned until the best AUC or the maximum generation is reached, giving the optimal fuzzy MFs. Phase III: testing data are fed into the optimal fuzzy fusion system.)

In phase I, training data are trained on

different SVMs. Validation data are classified to obtain each individual SVM's AUC and the distances of the validation examples to the SVM hyperplanes. In phase II, a GFS is constructed and the fuzzy MFs are tuned by GAs in a cross-validation manner. Finally, in phase III, testing data are fed into the optimal fuzzy fusion system to make the final decision. We have implemented the proposed fusion system for combining three SVM classifiers and give a detailed explanation in the rest of this section; the process extends easily to an arbitrary number of SVMs.

4.1 Fuzzy System Inputs and Output

The fuzzy fusion system applies the Mamdani model [21], in which the consequences of fuzzy rules are fuzzy sets. In a fusion system combining three SVM classifiers, there are three AUC inputs describing the three SVM classifiers' performances, three distance inputs representing the classification results of a data example from the three individual SVM classifiers, and one output indicating the fusion system's final decision for the example. All MFs of the inputs and output are defined as simple triangles, as shown in Fig. 3.

Fig. 3. MFs for the inputs and output. (The AUC inputs on [0, 1] use fuzzy sets Low (L) and High (H); the distance inputs on [-5, 5] use Negative (N) and Positive (P); the output on [-1, 1] uses 64 fuzzy sets O1 ... O64.)

Each AUC input is described by two fuzzy sets: low and high, and each

distance input is also represented by two fuzzy sets: negative and positive. The output is composed of 64 fuzzy sets corresponding to the consequences of the 64 fuzzy rules. The MFs are not fixed: each has control parameters governing its position and shape. Each AUC MF or distance MF has two control points, and each output MF has one control point. We discuss how the MFs are tuned below.

4.2 Fuzzy Rule Base

There are 64 rules in total, each corresponding to one of the 64 combinations of the six inputs (2^6 = 64). The ith (i = 1...64) fuzzy rule is defined as follows:

IF auc_1 is A_i1 and auc_2 is A_i2 and auc_3 is A_i3 and dis_1 is D_i1 and dis_2 is D_i2 and dis_3 is D_i3, THEN g_i is O_i (i = 1...64)

where auc_j denotes the jth AUC input and dis_j the jth distance input (j = 1...3). A_ij (j = 1...3) denotes an AUC fuzzy set in {Low, High}, D_ij (j = 1...3) denotes a distance fuzzy set in {Negative, Positive}, and O_i denotes an output fuzzy set in {O_1 ... O_64} for the ith rule.

4.3 Fuzzy System Output and Defuzzification

The system output is calculated by aggregating the individual rule contributions:

y = Σ_{i=1}^{64} β_i g_i / Σ_{i=1}^{64} β_i  (3)

where g_i is the output value of the ith rule and β_i is the firing strength of the ith rule, defined by the product t-norm:

β_i = Π_{j=1}^{3} μ_{A_ij}(auc_j) · μ_{D_ij}(dis_j)  (4)

where μ_{A_ij}(auc_j) and μ_{D_ij}(dis_j) are the membership grades of the inputs auc_j and dis_j (j = 1...3) in the fuzzy sets A_ij and D_ij. If the output value is greater than or equal to 0, the data example is defuzzified into the positive class; otherwise it is in the negative class. This information can be used to calculate the accuracy of the model.

4.4 Tuning the Fuzzy System Using GAs

We use a real-coded GA and apply the Pittsburgh approach [19] to tune the input and output MFs.
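The inference of Section 4.3 (Eqs. (3)-(4)) for three SVMs can be sketched as follows. The triangular MF shapes and the rule consequents here are hypothetical placeholders (the paper tunes both with a GA), and each rule outputs a crisp value rather than a tuned fuzzy set O_i:

```python
from itertools import product

def tri(x, left, peak, right):
    """Triangular MF; degenerate edges (left == peak or peak == right) act as shoulders."""
    if x < left or x > right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical set shapes: Low/High over AUC in [0, 1], Negative/Positive over [-5, 5].
AUC_SETS = {"L": (0.0, 0.0, 1.0), "H": (0.0, 1.0, 1.0)}
DIS_SETS = {"N": (-5.0, -5.0, 5.0), "P": (-5.0, 5.0, 5.0)}

def fuse(aucs, distances, rule_outputs):
    """Eqs. (3)-(4): product t-norm firing strengths, weighted-average output."""
    num = den = 0.0
    for rule_id, combo in enumerate(product("LH", "LH", "LH", "NP", "NP", "NP")):
        beta = 1.0
        for j in range(3):
            beta *= tri(aucs[j], *AUC_SETS[combo[j]])       # mu_Aij(auc_j)
            beta *= tri(distances[j], *DIS_SETS[combo[3 + j]])  # mu_Dij(dis_j)
        num += beta * rule_outputs[rule_id]
        den += beta
    return num / den  # >= 0 -> positive class

# Placeholder consequents: +1 when a majority of the distance sets are Positive.
rules = [1.0 if "".join(c[3:]).count("P") >= 2 else -1.0
         for c in product("LH", "LH", "LH", "NP", "NP", "NP")]
y = fuse([0.9, 0.8, 0.7], [1.5, 2.0, -0.5], rules)
```

With these placeholder consequents the fusion reduces to a soft majority vote over the three distance inputs; the GA-tuned output MFs of the paper let the system weight the vote by each SVM's AUC instead.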
Each chromosome is composed of 72 genes representing all 72 membership control parameters: 4 control points for the AUC MFs, 4 for the distance MFs, and the remaining 64 for the output MFs. The fuzzy MFs are tuned in a cross-validation manner. The GA fitness is defined to maximize the average AUC over the folds of data examples under the same MFs encoded in a chromosome. The selection scheme is roulette-wheel selection, so the higher the AUC, the greater the chance that a chromosome is selected into the next generation. We also apply an elitism strategy to ensure the best fuzzy MFs are retained. Uniform crossover is used here since it is believed to outperform one-point or multi-point crossover in

many applications. We apply Gaussian mutation, modifying genes by adding a Gaussian-distributed random number with mean zero [22].

5 Experiments on Gene Expression Data

In the experiments, we tested the proposed SVM fusion model using the colon tumor dataset from the Kent Ridge Biomedical Data Set Repository [23]. The colon tumor dataset is a set of gene expression data collected from colon cancer patients. There are 62 data examples in total, of which 40 are tumor tissues from diseased parts of the patients and 22 are normal tissues from healthy parts of the colons of the same patients. Each example is composed of 2000 genes as 2000 features.

The data in Phase I in Fig. 2 are classified using the SVM-Light software [24]. The generalization parameter C of the SVMs is set to 1. Two types of kernels are used: polynomial kernels and RBF kernels. The degree of the polynomial kernels is set to 1, 2, ..., 10, and the gamma of the RBF kernels is set to 10^-4, 10^-3, ..., 10^1. To avoid selection bias, we apply a cross-validation strategy to assess the accuracies and AUCs of the individual SVM classifiers. Table 1 shows the SVM testing AUCs and testing accuracies for the colon tumor data under 4-fold cross-validation.

The genetic fuzzy system is constructed and tuned in a cross-validation manner as well. Each training dataset in Phase I is further divided into second-level training and testing data. The second-level training data are again trained with SVM-Light to obtain the AUCs and the distances of the second-level testing data (the validation data in Fig. 2), which are used as the inputs of the genetic fuzzy fusion system to tune the optimal fuzzy MFs based on the AUC measure. After the optimal MFs are adapted, the first-level testing data are applied to the tuned optimal fuzzy fusion model to make the final decision. The genetic fuzzy fusion system combining three SVM classifiers has been implemented in the C language.
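A minimal real-coded GA loop of the kind described in Section 4.4 might look like the following sketch. The fitness function is a placeholder stand-in for the cross-validated AUC of the fuzzy fusion system, and the parameter values are illustrative, not the paper's settings:

```python
import random

GENES, POP, GENERATIONS, P_CROSS, SIGMA = 72, 50, 30, 0.9, 0.05

def fitness(chrom):
    # Placeholder: in the paper this is the average cross-validated AUC
    # of the fuzzy fusion system built from the chromosome's 72 MF parameters.
    return 1.0 / (1.0 + sum(g * g for g in chrom))

def roulette(pop, fits):
    """Roulette-wheel selection: pick proportionally to fitness."""
    pick, acc = random.uniform(0, sum(fits)), 0.0
    for chrom, f in zip(pop, fits):
        acc += f
        if acc >= pick:
            return chrom
    return pop[-1]

random.seed(0)
pop = [[random.uniform(-1, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    fits = [fitness(c) for c in pop]
    elite = max(zip(fits, pop), key=lambda t: t[0])[1]
    nxt = [elite[:]]                       # elitism: keep the best chromosome
    while len(nxt) < POP:
        a, b = roulette(pop, fits), roulette(pop, fits)
        # Uniform crossover: each gene taken from either parent.
        child = [x if random.random() < 0.5 else y for x, y in zip(a, b)] \
                if random.random() < P_CROSS else a[:]
        # Gaussian mutation: add zero-mean noise to every gene.
        child = [g + random.gauss(0, SIGMA) for g in child]
        nxt.append(child)
    pop = nxt
best = max(pop, key=fitness)
```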
The parameter settings for the GA are as follows: crossover probability 90%, 200 generations, and a population size of 3000.

Table 1. Testing AUC and accuracy using individual SVMs (4-fold cross-validation)

Kernel      Parameter  Testing AUC                      Testing Accuracy (%)
                       1     2     3     4     Avg.     1      2      3      4      Avg.
Polynomial  degree
poly_1      1          0.87  0.94  0.84  0.91  0.89     75.00  93.75  86.67  80.00  83.86
poly_2      2          0.87  0.90  0.86  0.91  0.88     81.25  87.50  86.67  86.67  85.52
poly_3      3          0.87  0.84  0.82  0.77  0.82     75.00  75.00  86.67  80.00  79.17
poly_4      4          0.88  0.83  0.76  0.77  0.81     87.50  81.25  66.67  66.67  75.52
poly_5      5          0.88  0.84  0.70  0.73  0.79     87.50  81.25  66.67  66.67  75.52
poly_6      6          0.90  0.83  0.70  0.70  0.78     87.50  75.00  66.67  66.67  73.96
poly_8      8          0.85  0.79  0.64  0.68  0.74     87.50  75.00  66.67  66.67  73.96
poly_10     10         0.85  0.73  0.56  0.68  0.71     87.50  75.00  66.67  66.67  73.96
RBF         gamma
rbf_0.0001  0.0001     0.83  0.95  0.84  0.91  0.88     62.50  56.25  66.67  73.33  64.69
rbf_0.001   0.001      0.85  0.95  0.82  0.91  0.88     62.50  56.25  66.67  73.33  64.69
rbf_0.01    0.01       0.85  0.95  0.82  0.91  0.88     62.50  56.25  66.67  73.33  64.69
rbf_0.1     0.1        0.87  0.94  0.82  0.91  0.88     62.50  56.25  66.67  73.33  64.69
rbf_1       1          0.83  0.90  0.78  0.93  0.86     75.00  93.75  86.67  73.33  82.19
rbf_10      10         0.85  0.90  0.74  0.82  0.83     62.50  56.25  66.67  80.00  66.36

We chose six groups of three SVM classifiers and combined each group using the proposed genetic fuzzy SVM fusion model. Table 2 shows the experimental results.

Table 2. The three selected SVM classifiers of each test, the maximum and average AUC and accuracy (%) of the three individual SVM classifiers, and the AUC and accuracy (%) of the fusion model combining the three SVMs

Test  SVM1        SVM2       SVM3      Max AUC  Avg. AUC  Max Acc.  Avg. Acc.  Fusion AUC  Fusion Acc.
1     poly_1      poly_3     rbf_0.01  0.89     0.86      83.86     75.91      0.92        88.75
2     poly_4      poly_8     rbf_1     0.86     0.80      82.19     77.22      0.87        83.75
3     poly_4      rbf_1      rbf_0.01  0.88     0.85      82.19     74.13      0.91        85.11
4     poly_1      poly_5     poly_10   0.89     0.80      83.86     77.78      0.89        86.94
5     rbf_0.0001  rbf_0.01   rbf_1     0.88     0.87      82.19     70.52      0.88        88.81
6     poly_3      rbf_0.001  rbf_0.1   0.88     0.86      79.17     69.52      0.89        88.65
Avg.                                   0.880    0.841     82.243    74.181     0.893       87.002

From Table 2 we can see that the proposed SVM fusion model demonstrates stable and robust classification capability. It not only performs far better than the average of the three individual SVM classifiers in terms of both AUC and accuracy, but also outperforms the best of the three individual SVMs in terms of accuracy, and it achieves at least as much performance as the best in terms of AUC. For all six tests, the fusion model's accuracy is better than the best individual accuracy. For Tests 1, 5, and 6, the model achieves over 88% accuracy, while the best individual accuracy is below 84%. For four of the six tests (Tests 1, 2, 3, 6), the model achieves a better AUC than the best AUC of the three individual SVM classifiers; the remaining two (Tests 4, 5) match the best AUC. These two tests combine three RBF or three polynomial SVM classifiers: RBF classifiers, or polynomial classifiers, behave similarly to one another, which may leave little complementary room for the combined classifier.
We can also see that a classifier fusion model optimizing the AUC measure achieves not only good AUC performance but also excellent accuracy [11]. The genetic fuzzy SVM fusion model based on AUC naturally produces a combined classifier with the best AUC because of the properties of AUC. This means that an accurate ranking of the data examples is maintained, which offers researchers more interpretation of the data than mere positive or negative classification results.

6 Conclusion

In this paper, we propose a genetic fuzzy SVM classifier fusion model to combine multiple SVM classifiers. Individual SVMs are combined in a genetic fuzzy system, and GAs are applied to tune the fuzzy MFs based on the AUC measure. The experimental results show that the proposed genetic fuzzy system is more stable and more robust than the individual SVMs. Moreover, the combined SVM classifier from the genetic fuzzy fusion model achieves a more accurate ranking of data examples, which provides valuable interpretation of real-world data and may help medical diagnosis.

Acknowledgments. This work was supported in part by NIH under P20 GM065762, NIGMS 065762, the Georgia Cancer Coalition, and the Georgia Research Alliance. Dr. Harrison is a GCC distinguished cancer scholar.

References

1. Vapnik, V. N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
2. Kittler, J., Hatef, M., Duin, R., Matas, J.: On Combining Classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3. (1998) 226-239
3. Hansen, L., Salamon, P.: Neural Network Ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12. (1990) 993-1001
4. Ho, T.K., Hull, J.J., Srihari, S.N.: Decision Combination in Multiple Classifier Systems, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 16, No. 1. (1994) 66-75
5. Ho, T.K.: Random Decision Forests, Third Int'l Conf. Document Analysis and Recognition, Montreal. (1995) 278-282
6. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition, IEEE Trans. Systems, Man, and Cybernetics, Vol. 22, No. 3. (1992) 418-435
7. Qin, Z.-C.: ROC Analysis for Predictions Made by Probabilistic Classifiers, Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Vol. 5. (2005) 3119-3124
8. Ling, C.X., Huang, J., Zhang, H.: AUC: A Statistically Consistent and More Discriminating Measure than Accuracy, Proc. 18th Int'l Joint Conf. Artificial Intelligence (IJCAI '03). (2003) 329-341
9. Fawcett, T.: ROC Graphs: Notes and Practical Considerations for Researchers, Tech Report HPL-2003-4, HP Laboratories. (2003)
10. Huang, J., Ling, C.X.: Using AUC and Accuracy in Evaluating Learning Algorithms, IEEE Trans. Knowl. Data Eng., Vol. 17, No. 3. (2005) 299-310
11. Ling, C.X., Zhang, H.: Toward Bayesian Classifiers with Accurate Probabilities, Proceedings of the Sixth Pacific-Asia Conference on KDD, Springer. (2002)
12. Hand, D. J., Till, R.
J.: A Simple Generalization of the Area under the ROC Curve for Multiple Class Classification Problems, Machine Learning, Vol. 45. (2001) 171-186
13. Magdalena, L., Cordon, O., Gomide, F., Herrera, F., Hoffmann, F.: Ten Years of Genetic Fuzzy Systems: Current Framework and New Trends, Fuzzy Sets & Systems, Vol. 141, No. 1. (2004) 5-31
14. Herrera, F., Lozano, M., Verdegay, J.L.: Generating Fuzzy Rules from Examples Using Genetic Algorithms, Fuzzy Logic and Soft Computing. (1995)
15. Karr, C.: Applying Genetics to Fuzzy Logic, AI Expert, Vol. 6. (1991) 26-33
16. Homaifar, A., McCormick, E.: Simultaneous Design of Membership Functions and Rule Sets for Fuzzy Controllers Using Genetic Algorithms, IEEE Transactions on Fuzzy Systems, Vol. 3, No. 2. (1995) 129-139
17. Park, D., Kandel, A.: Genetic-Based New Fuzzy Reasoning Models with Application to Fuzzy Control, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 24, No. 1. (1994) 39-47
18. Cordon, O., Herrera, F.: A Three-Stage Evolutionary Process for Learning Descriptive and Approximate Fuzzy Logic Controller Knowledge Bases from Examples, International Journal of Approximate Reasoning, Vol. 17, No. 4. (1997) 369-407

19. Smith, S.: A Learning System Based on Genetic Adaptive Algorithms, Doctoral Dissertation, Department of Computer Science, University of Pittsburgh. (1980)
20. Holland, J., Reitman, J.: Cognitive Systems Based on Adaptive Algorithms, Pattern-Directed Inference Systems, Academic Press. (1978)
21. Mamdani, E.H.: Application of Fuzzy Algorithms for Control of Simple Dynamic Plant, IEE Proceedings, Vol. 121, No. 12. (1974) 1585-1588
22. Bäck, T., Hoffmeister, F., Schwefel, H.: A Survey of Evolution Strategies, Proceedings of the Fourth International Conference on Genetic Algorithms. (1991) 2-9
23. Li, J., Liu, H.: Kent Ridge Biomedical Data Set Repository, http://sdmc.i2r.astar.edu.sg/rp/. (2003)
24. Joachims, T.: Making Large-Scale SVM Learning Practical, Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, A. Smola (eds.), MIT Press. (1999)