Combining SVM Classifiers Using Genetic Fuzzy Systems Based on AUC for Gene Expression Data Analysis


Xiujuan Chen¹, Yichuan Zhao², Yan-Qing Zhang¹, and Robert Harrison¹

¹ Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
² Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30302, USA
{xchen8@,matyiz@langate,zhang@taichi.cs, cscrwh@asterix.cs}gsu.edu

Abstract. Recently, the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) have received much attention as measures of the performance of machine learning algorithms. In this paper, we propose an SVM classifier fusion model using a genetic fuzzy system. Genetic algorithms are applied to tune the fuzzy membership functions. The performance of the SVM classifiers is evaluated by their AUCs. Our experiments show that the AUC-based genetic fuzzy SVM fusion model produces not only better AUC but also better accuracy than individual SVM classifiers.

Keywords: Receiver Operating Characteristic (ROC), Support Vector Machines (SVMs), Gene Expression, Genetic Fuzzy System (GFS), Classifier fusion.

1 Introduction

With the key technologies developed in the biomedical area, such as DNA sequencing, microarrays, and structural genomics, large-scale biological and biomedical data have accumulated, including DNA sequences, protein sequences and structures, gene expression data, protein profiling data, and genomic sequence data. Accordingly, the bulk of research effort has shifted to biomedical data analysis, which extracts patterns and discovers useful information from the data and thereby provides valuable support for biomedical and evolutionary research. Machine learning and classification techniques have been widely used to assist the interpretation and analysis of biomedical data.
As recently developed pattern recognition tools, Support Vector Machines (SVMs) [1] have become popular because of their outstanding learning performance in real-world classification applications. SVMs can solve not only linearly separable but also nonlinearly separable problems. They aim to find an optimal hyperplane that separates the positive and negative classes with the maximum margin in a high-dimensional feature space, which is obtained from the original input space by applying a kernel function, for instance a polynomial or an RBF kernel. However, selecting an appropriate SVM kernel to achieve the best possible performance in a real classification application is one of the practical difficulties.

I. Mandoiu and A. Zelikovsky (Eds.): ISBRA 2007, LNBI 4463, pp , Springer-Verlag Berlin Heidelberg 2007

Instead of finding the best kernel for an application by exhaustively trying all possible kernel functions with all possible parameters, classifier fusion methods provide a more efficient yet still high-performance way of addressing this practical problem in SVM classification. Classifier fusion combines a set of classifiers so that the combined classifier achieves better performance than its individual members. The combined classifier can outperform the best individual classifier because the data examples misclassified by different classifiers do not necessarily overlap, which leaves room for the classifiers to complement one another [2]. One sufficient condition for a combined classifier to be more accurate than any of its individual members is that the individual classifiers are accurate and diverse [3]. Diversity is crucial when combining classifiers. Diverse classifiers can be obtained by using different data feature sets, different training sets, or different classification algorithms [2], [3], [4], [5], [6].

In this study, we propose a classifier fusion model specifically for SVM classifiers, aiming to boost their performance. A fuzzy logic system (FLS) is constructed to combine multiple SVM classifiers in light of the performance of each individual classifier. The membership functions of the fuzzy logic system are tuned by genetic algorithms (GAs) to generate the optimal fuzzy logic system.

One question is how to evaluate classifier performance in the fusion model. Typically, accuracy is the standard criterion for evaluating a classifier [7], [8]. In many scenarios, however, accuracy is insufficient or not very meaningful. Researchers are often interested in the ranking of data examples rather than mere positive/negative classification results.
Moreover, if the class distribution is skewed or unbalanced, a classifier can still achieve high accuracy by simply assigning all data examples to the dominant class [7], [9]. Recently, the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) have been shown, both empirically and theoretically, to be statistically consistent with and more discriminating than accuracy [7], [8], [10]. This paper uses AUC as the measure of classifier performance to build the genetic fuzzy fusion model that enhances the performance of SVM classifiers. It has also been shown that classifiers based on AUC produce not only better AUC but also better accuracy [11].

We first introduce the concepts associated with ROC analysis in Section 2. We then discuss genetic fuzzy systems in Section 3. The SVM classifier fusion model is proposed in Section 4, and experiments on gene expression data are reported in Section 5. Finally, conclusions are drawn in Section 6.

2 ROC Analysis for Binary Classification

ROC analysis has been receiving much attention recently as a way to analyze classifier performance, and it has attractive properties that make it especially useful for domains with skewed class distributions and unequal classification error costs. An ROC curve of a classifier is a plot of the true positive rate (TPR) on the Y axis versus the false positive rate (FPR) on the X axis, as shown in Fig. 1. The TPR and FPR are defined as follows [9]:

$$\mathrm{TPR} = \frac{TP}{N^{+}}, \qquad \mathrm{FPR} = \frac{FP}{N^{-}} \tag{1}$$

where $TP$ denotes the number of true positives, $FP$ denotes the number of false positives, and $N^{+}$ and $N^{-}$ denote the numbers of positive and negative examples, respectively.

Fig. 1. An ROC curve

For a discrete classifier, which produces only a positive/negative class label for each example, only a single point can be drawn in the ROC graph. For a probabilistic classifier, however, which yields a numeric value for each example representing the degree to which the example belongs to a class, applying various decision thresholds produces a series of points in the ROC plane with (FPR, TPR) pairs as their coordinates. Each threshold results in one point on the ROC curve, representing the classifier obtained by using this threshold as the cutoff. Therefore, the ROC curve of a probabilistic classifier can be viewed as an aggregation of the classifiers from all possible decision thresholds [7].

The quality of an ROC curve can be summarized in one value by calculating the area under the ROC curve (AUC). AUC represents the probability that a classifier ranks a randomly chosen positive example higher than a randomly chosen negative example [9]. According to Hand and Till [12], AUC can be calculated with the following formula:

$$\mathrm{AUC} = \frac{\sum_{i=1}^{N^{+}} r_i - N^{+}(N^{+}+1)/2}{N^{+} N^{-}} \tag{2}$$

where $r_i$ denotes the rank of the ith positive example in the ranking list when the classification results of the data examples are arranged in ascending order. AUC has been shown to be a better measure than accuracy for assessing classifier performance [8], [10].

3 Genetic Fuzzy Systems

Fuzzy logic has demonstrated powerful abilities to handle the imprecision and uncertainty of real-world applications. It captures uncertainty by defining linguistic fuzzy sets with fuzzy membership functions (MFs) and by reasoning with fuzzy rules in a rigorous mathematical discipline. However, the success of a fuzzy logic system (FLS) design relies largely on high-performance fuzzy MFs and fuzzy rules that capture expert knowledge. When no human expert is available, rather than choosing fuzzy MFs or defining fuzzy rules by manual trial and error, we may seek assistance from a learning process. A genetic fuzzy system (GFS) is able to learn and search for fuzzy MFs or fuzzy rules efficiently. It is basically a fuzzy system augmented by a learning process based on a genetic algorithm [13]. GAs are optimization algorithms inspired by natural evolution that provide robust search and learning capabilities in complex spaces. GAs can learn or tune different components of fuzzy logic systems [13]. For instance, some genetic fuzzy rule-based systems learn and determine the number of IF-THEN fuzzy rules from all possible rules [14], [15]. Other genetic fuzzy systems tune the MFs of a given fuzzy rule set, for example the positions or shapes of the MFs [16], [17], [18]. There are two general techniques for tuning fuzzy MFs: the Pittsburgh approach and the Michigan approach [13]. The Pittsburgh approach represents an entire fuzzy rule set as a chromosome and maintains a population of candidate rule sets, using genetic operations to produce new generations of rule sets [19]. The Michigan approach represents an individual rule as a chromosome, and the whole rule set is represented by the entire population [20].

4 Genetic Fuzzy SVM Fusion Based on AUC

The genetic fuzzy fusion model for combining SVM classifiers is constructed as shown in Fig. 2. The system has three phases.
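Because every phase of the fusion model relies on AUC values, it may help to see Eq. (2) from Section 2 in executable form. The following is a minimal Python sketch, not the authors' C implementation. The function name `auc_from_ranks`, the 0/1 label encoding, and the rule of averaging ranks over tied scores are assumptions added for illustration.

```python
def auc_from_ranks(scores, labels):
    """Rank-based AUC in the form of Eq. (2); labels are 1 (positive) or 0 (negative)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Arrange the examples in ascending order of score and assign 1-based ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Assumption: tied examples share the mean of ranks i+1 .. j+1.
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum of the ranks of the positive examples, then apply Eq. (2).
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

For an SVM, `scores` would typically be the signed distances of the examples to the hyperplane, so that a larger value means "more positive".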
In phase I, training data are used to train different SVMs. Validation data are then classified to obtain the AUC of each individual SVM and the distances of the validation examples to the SVM hyperplanes. In phase II, a GFS is constructed and its fuzzy MFs are tuned by GAs in a cross-validation manner. Finally, in phase III, testing data are fed into the optimal fuzzy fusion system to make the final decision.

Fig. 2. Genetic fuzzy fusion system for SVM classifiers. [Flowchart. Phase I: the training data train SVM 1 to SVM n, and the resulting Models 1 to n are applied to the validation data to give AUC 1 to AUC n and Distances 1 to Distances n. Phase II: these feed the genetic fuzzy SVM classifier fusion system, which iterates until the best AUC or the maximum generation is reached. Phase III: the AUCs and distances for the testing data are fed into the optimal fuzzy SVM classifier fusion system (optimal fuzzy MFs).]

We have implemented the proposed fusion system for combining three SVM classifiers and give a detailed explanation in the rest of this section. The process can easily be extended to combine an arbitrary number of SVMs.

4.1 Fuzzy System Inputs and Output

The fuzzy fusion system is designed using the Mamdani model [21], in which the consequents of the fuzzy rules are fuzzy sets. In the fusion system combining three SVM classifiers, there are three AUC inputs describing the performance of the three SVM classifiers, three distance inputs representing the classification results of a data example from the three individual SVM classifiers, and one output indicating the final decision of the fusion system for the example. All the MFs of the inputs and output are defined as simple triangles, as shown in Fig. 3. Each AUC input is described by two fuzzy sets, low and high, and each distance input is likewise represented by two fuzzy sets, negative and positive. The output is composed of 64 fuzzy sets corresponding to the consequents of the 64 fuzzy rules. None of the MFs is fixed; each has control parameters that determine its position and shape. Each AUC MF or distance MF has two control points, and each output MF has one control point. We discuss how to tune the MFs later.

Fig. 3. MFs for the inputs and output. [Three panels of membership grades. AUC: sets Low (L) and High (H) with control points L_l, H_l, L_r, H_r. Distance: sets Negative (N) and Positive (P) with control points N_l, P_l, N_r, P_r. Output: sets O_1 to O_64.]

4.2 Fuzzy Rule Base

There are 64 rules in total, each corresponding to one of the 64 combinations of the six inputs (2^6 = 64). The ith (i = 1, ..., 64) fuzzy rule is defined as follows:

IF auc_1 is A_i1 and auc_2 is A_i2 and auc_3 is A_i3 and dis_1 is D_i1 and dis_2 is D_i2 and dis_3 is D_i3, THEN g_i is O_i (i = 1, ..., 64),

where auc_j denotes the jth AUC input and dis_j denotes the jth distance input (j = 1, ..., 3). A_ij (j = 1, ..., 3) denotes an AUC fuzzy set in {Low, High}, D_ij (j = 1, ..., 3) denotes a distance fuzzy set in {Negative, Positive}, and O_i denotes an output fuzzy set in {O_1, ..., O_64} for the ith rule.

4.3 Fuzzy System Output and Defuzzification

The system output is calculated by aggregating the individual rule contributions:

$$y = \frac{\sum_{i=1}^{64} \beta_i g_i}{\sum_{i=1}^{64} \beta_i} \tag{3}$$

where $g_i$ is the output value of the ith rule and $\beta_i$ is the firing strength of the ith rule, defined by the product t-norm:

$$\beta_i = \prod_{j=1}^{3} \mu_{A_{ij}}(auc_j) \, \mu_{D_{ij}}(dis_j) \tag{4}$$

where $\mu_{A_{ij}}(auc_j)$ and $\mu_{D_{ij}}(dis_j)$ are the membership grades of the inputs auc_j and dis_j (j = 1, ..., 3) in the fuzzy sets A_ij and D_ij. If the output value is greater than or equal to 0, the data example is assigned to the positive class; otherwise, it is assigned to the negative class. This decision can be used to calculate the accuracy of the model.

4.4 Tuning Fuzzy System Using GAs

We use a real-coded GA and apply the Pittsburgh approach [19] to tune the input and output MFs.
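Before turning to the chromosome layout, the inference step of Eqs. (3) and (4) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the triangle parameters in `AUC_MFS` and `DIST_MFS` are made-up placeholders (in the paper they are the control points tuned by the GA), each rule output `g[i]` is taken to be a single number standing in for the output set O_i, and returning 0.0 when no rule fires is an arbitrary choice that the paper does not specify.

```python
from itertools import product

# Hypothetical triangles (left, peak, right); placeholders, not the paper's values.
AUC_MFS = [(-0.2, 0.3, 0.9), (0.4, 1.0, 1.6)]      # index 0 = Low, 1 = High
DIST_MFS = [(-3.0, -1.0, 1.0), (-1.0, 1.0, 3.0)]   # index 0 = Negative, 1 = Positive

# One rule per combination of the six two-valued inputs: 2**6 = 64 rules.
# rule[0:3] pick the AUC sets A_i1..A_i3, rule[3:6] pick the distance sets D_i1..D_i3.
RULES = list(product((0, 1), repeat=6))


def tri(x, left, peak, right):
    """Membership grade of x in a triangular MF (assumes left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def firing_strength(rule, aucs, dists):
    """Eq. (4): product t-norm over the six membership grades of one rule."""
    beta = 1.0
    for j in range(3):
        beta *= tri(aucs[j], *AUC_MFS[rule[j]])
        beta *= tri(dists[j], *DIST_MFS[rule[3 + j]])
    return beta


def fuse(aucs, dists, g):
    """Eq. (3): firing-strength-weighted average of the 64 rule outputs g."""
    betas = [firing_strength(rule, aucs, dists) for rule in RULES]
    total = sum(betas)
    if total == 0.0:
        return 0.0  # assumption: neutral output when no rule fires
    return sum(b * gi for b, gi in zip(betas, g)) / total


def classify(y):
    """Decision rule of Section 4.3: output >= 0 means the positive class."""
    return 1 if y >= 0 else -1
```

Given three validation AUCs, the three SVM distances of one example, and a list `g` of 64 rule outputs, `classify(fuse(aucs, dists, g))` yields the fused label, while `fuse` alone gives the score used for ranking.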
Each chromosome is composed of 72 genes representing all 72 membership control parameters: 4 control points for the AUC MFs, 4 for the distance MFs, and the remaining 64 for the output MFs. The fuzzy MFs are tuned in a cross-validation manner. The fitness of a chromosome is the average AUC over the folds of data examples obtained by applying the MFs encoded in that chromosome, and the GA maximizes this fitness. The selection scheme is roulette-wheel selection, which implies that the higher the AUC, the greater the chance that a chromosome will be selected into the next generation. We also apply an elitism strategy to ensure that the best fuzzy MFs are retained. Uniform crossover is used because it is believed to outperform one-point or multi-point crossover in many applications. We apply Gaussian mutation, which modifies genes by adding to them a Gaussian-distributed random number with a mean of zero [22].

5 Experiments on Gene Expression Data

We tested the proposed SVM fusion model on the colon tumor dataset from the Kent Ridge Biomedical Data Set Repository [23]. The colon tumor dataset is a set of gene expression data collected from colon cancer patients. There are 62 data examples in total, of which 40 are tumor tissues from diseased parts of the colon and 22 are normal tissues from healthy parts of the colons of the same patients. Each example has 2000 genes as its 2000 features.

The data in Phase I in Fig. 2 are classified using the SVM Light software [24]. The generalization parameter C of the SVMs is set to 1. Two types of kernels are used: polynomial kernels and RBF kernels. The degree of the polynomial kernels is set to 1, 2, ..., 10, and gamma of the RBF kernels is set to 10^-4, 10^-3, ..., 10^1. To avoid selection bias, we apply a cross-validation strategy to assess the accuracies and AUCs of the individual SVM classifiers. Table 1 shows the SVM testing AUCs and testing accuracies for the colon tumor data in 4-fold cross-validation.

The genetic fuzzy system is constructed and tuned in a cross-validation manner as well. Each training dataset in Phase I is further divided into second-level training and testing data. The second-level training data are again trained with SVM Light to obtain the AUCs and distances of the second-level testing data (the validation data in Fig. 2), which are used as the inputs of the genetic fuzzy fusion system to tune the fuzzy MFs based on the AUC measure. After the optimal MFs are obtained, the first-level testing data are applied to the tuned fuzzy fusion model to make the final decision. The genetic fuzzy fusion system combining three SVM classifiers has been implemented in the C language.
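The GA operators named in Section 4.4 (roulette-wheel selection, elitism, uniform crossover, and Gaussian mutation) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' C code. The 90% crossover probability comes from the paper, whereas the per-gene mutation rate, the mutation standard deviation `sigma`, and all function names are assumptions. Fitness values are assumed to be non-negative, as AUCs are.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness (AUC)."""
    pick = rng.uniform(0.0, sum(fitness))
    running = 0.0
    for chrom, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]  # guard against floating-point round-off


def uniform_crossover(a, b, rng):
    """Swap each gene between the two parents independently with probability 0.5."""
    c1, c2 = list(a), list(b)
    for k in range(len(c1)):
        if rng.random() < 0.5:
            c1[k], c2[k] = c2[k], c1[k]
    return c1, c2


def gaussian_mutation(chrom, rng, rate=0.05, sigma=0.1):
    """Add zero-mean Gaussian noise to each gene with probability `rate` (assumed values)."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in chrom]


def next_generation(population, fitness, rng, p_cross=0.9):
    """One GA step: elitism, roulette-wheel selection, crossover, mutation."""
    best = max(range(len(population)), key=lambda i: fitness[i])
    new_pop = [list(population[best])]  # elitism: the best chromosome survives unchanged
    while len(new_pop) < len(population):
        p1 = roulette_select(population, fitness, rng)
        p2 = roulette_select(population, fitness, rng)
        if rng.random() < p_cross:
            c1, c2 = uniform_crossover(p1, p2, rng)
        else:
            c1, c2 = list(p1), list(p2)
        new_pop.append(gaussian_mutation(c1, rng))
        if len(new_pop) < len(population):
            new_pop.append(gaussian_mutation(c2, rng))
    return new_pop
```

In the setting of the paper, each chromosome would hold the 72 control parameters and `fitness[i]` would be the average validation AUC obtained with chromosome `i`; passing a seeded `random.Random` instance as `rng` keeps runs reproducible.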
The parameter setting for the GA is as follows: a crossover probability of 90%, 200 generations, and a population size of [value missing in the source].

Table 1. Testing AUC and accuracy using individual SVMs (4-fold cross-validation)

[Table body not recoverable. Columns: Kernels, Testing AUC (per fold and Avg.), Testing Accuracy (%) (per fold and Avg.). Rows: polynomial kernels by degree (poly_...) and RBF kernels by gamma (rbf_...).]
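The two-level cross-validation described in this section (4 first-level folds, each training part split again into second-level training and validation data) can be sketched as index bookkeeping in Python. This is a hedged illustration only: the number of second-level folds (`inner_k=3`) is an assumption because the paper does not state it, the helper names are hypothetical, and no shuffling or class stratification is applied here although a real experiment would probably want one of them.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def two_level_splits(n, outer_k=4, inner_k=3):
    """Yield (second-level train, validation, first-level test) index lists."""
    for test in kfold_indices(n, outer_k):
        held_out = set(test)
        outer_train = [i for i in range(n) if i not in held_out]
        # Split the first-level training data again; inner_k is an assumed value.
        for inner in kfold_indices(len(outer_train), inner_k):
            val = [outer_train[p] for p in inner]
            val_set = set(val)
            train = [i for i in outer_train if i not in val_set]
            yield train, val, test
```

With the 62 colon tumor examples, `two_level_splits(62)` produces 12 triples: the SVMs are trained on `train`, their AUCs and distances on `val` drive the GA, and `test` is touched only by the tuned fusion model.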

We chose six groups of three SVM classifiers and combined each group using the proposed genetic fuzzy SVM fusion model. Table 2 shows the experimental results.

Table 2. Three selected SVM classifiers, maximum and average AUC and accuracy (%) of the three individual SVM classifiers, and AUC and accuracy (%) of the fusion model combining the three SVMs

Test  SVM1    SVM2       SVM3
1     poly_1  poly_3     rbf_
2     poly_4  poly_8     rbf_
3     poly_4  rbf_1      rbf_
4     poly_1  poly_5     poly_
5     rbf_    rbf_0.01   rbf_
6     poly_3  rbf_0.001  rbf_

[The Max AUC, Avg. AUC, Max Accuracy, Avg. Accuracy, Fusion AUC, and Fusion Accuracy columns and the Avg. row are not recoverable from the source.]

Table 2 shows that the proposed SVM fusion model has stable and robust classification capabilities. It not only performs far better than the average of the three individual SVM classifiers in terms of both AUC and accuracy, but also outperforms the best of the three individual SVMs in terms of accuracy and performs at least as well as the best in terms of AUC. In all six tests, the accuracy of the model is better than the best individual accuracy. For Tests 1, 5, and 6, the model achieves 88% accuracy, whereas the best individual accuracy is no more than 83%. In four of the six tests (Tests 1, 2, 3, and 6), the model achieves a better AUC than the best AUC of the three individual SVM classifiers. The remaining two (Tests 4 and 5) obtain the same AUC as the best individual classifier. These two tests combine three polynomial or three RBF SVM classifiers. RBF classifiers, like polynomial classifiers, behave similarly to one another, which may leave little room for the combined classifiers to complement each other. We can also see that a classifier fusion model that optimizes the AUC measure achieves not only good AUC performance but also excellent accuracy [11]. The genetic fuzzy SVM fusion model based on AUC naturally produces a combined classifier with the best AUC because of the properties of AUC.
This means that an accurate ranking of the data examples is maintained, which gives researchers more to interpret than mere positive or negative classification results.

6 Conclusion

In this paper, we propose a genetic fuzzy SVM classifier fusion model to combine multiple SVM classifiers. Individual SVMs are combined in a genetic fuzzy system, and GAs are applied to tune the fuzzy MFs based on the AUC measure. The experimental results show that the proposed genetic fuzzy system is more stable and more robust than individual SVMs. Moreover, the combined SVM classifier from the genetic fuzzy fusion model achieves a more accurate ranking of data examples, which provides valuable interpretation of real-world data and may help medical diagnosis.

Acknowledgments. This work was supported in part by NIH under P20 GM065762, NIGMS , Georgia Cancer Coalition, and Georgia Research Alliance. Dr. Harrison is a GCC distinguished cancer scholar.

References

1. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
2. Kittler, J., Hatef, M., Duin, R., Matas, J.: On Combining Classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3. (1998)
3. Hansen, L., Salamon, P.: Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12. (1990)
4. Ho, T.K., Hull, J.J., Srihari, S.N.: Decision Combination in Multiple Classifier Systems, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 16, No. 1. (1994)
5. Ho, T.K.: Random Decision Forests, Third Int'l Conf. Document Analysis and Recognition, Montreal. (1995)
6. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition, IEEE Trans. Systems, Man, and Cybernetics, Vol. 22, No. 3. (1992)
7. Qin, Z.-C.: ROC Analysis for Predictions Made by Probabilistic Classifiers, Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Vol. 5. (2005)
8. Ling, C.X., Huang, J., Zhang, H.: AUC: A Statistically Consistent and More Discriminating Measure than Accuracy, Proc. 18th Int'l Conf. Artificial Intelligence (IJCAI '03). (2003)
9. Fawcett, T.: ROC graphs: Notes and practical considerations for researchers, Tech Report HPL , HP Laboratories. (2003)
10. Huang, J., Ling, C.X.: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng, Vol. 17, No. 3. (2005)
11. Ling, C.X., Zhang, H.: Toward Bayesian Classifiers with Accurate Probabilities, Proceedings of the Sixth Pacific-Asia Conference on KDD, Springer. (2002)
12. Hand, D.J., Till, R.J.: A Simple Generalization of the Area under the ROC Curve for Multiple Class Classification Problems, Machine Learning, Vol. 45. (2001)
13. Magdalena, L., Cordon, O., Gomide, F., Herrera, F., Hoffmann, F.: Ten Years of Genetic Fuzzy Systems: Current Framework and New Trends, Fuzzy Sets & Systems, Vol. 141, No. 1. (2004)
14. Herrera, F., Lozano, M., Verdegay, J.L.: Generating Fuzzy Rules from Examples Using Genetic Algorithms, Fuzzy Logic and Soft Computing. (1995c)
15. Karr, C.: Applying Genetic to Fuzzy Logic, AI Expert, Vol. 6. (1991)
16. Homaifar, and McCormick, E.: Simultaneous Design of Membership Functions and Rule Sets for Fuzzy Controllers Using Genetic Algorithms, IEEE Transactions on Fuzzy Systems, Vol. 3, No. 2. (1995)
17. Park, D., Kandel, A.: Genetic-based New Fuzzy Reasoning Models with Application to Fuzzy Control, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 24, No. 1. (1994)
18. Cordon and Herrera, F.: A Three-Stage Evolutionary Process for Learning Descriptive and Approximate Fuzzy Logic Controller Knowledge Bases from Examples, International Journal of Approximate Reasoning, Vol. 17, No. 4. (1997)
19. Smith, S.: A Learning System Based on Genetic Adaptive Algorithms, Doctoral dissertation, Department of Computer Science, University of Pittsburgh. (1980)
20. Holland, J., Reitman, J.: Cognitive Systems Based on Adaptive Algorithms, Pattern-Directed Inference Systems, Academic Press. (1978)
21. Mamdani, E.H.: Application of Fuzzy Algorithms for Control of Simple Dynamic Plant, IEEE Proceedings, Vol. 121, No. 12. (1974)
22. Bäck, T., Hoffmeister, F., Schwefel, H.: A Survey of Evolution Strategies, Proceedings of the Fourth International Conference on Genetic Algorithms. (1991)
23. Li, J., Liu, H.: Kent Ridge Biomedical Data Set Repository, (2003)
24. Joachims, T.: Making large-scale SVM Learning Practical, Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT-Press. (1999)


More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Machine Learning in FX Carry Basket Prediction

Machine Learning in FX Carry Basket Prediction Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines

More information

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

More information

Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,

More information

Neural Networks and Support Vector Machines

Neural Networks and Support Vector Machines INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical

More information

A Review And Evaluations Of Shortest Path Algorithms

A Review And Evaluations Of Shortest Path Algorithms A Review And Evaluations Of Shortest Path Algorithms Kairanbay Magzhan, Hajar Mat Jani Abstract: Nowadays, in computer networks, the routing is based on the shortest path problem. This will help in minimizing

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

A FUZZY LOGIC APPROACH FOR SALES FORECASTING

A FUZZY LOGIC APPROACH FOR SALES FORECASTING A FUZZY LOGIC APPROACH FOR SALES FORECASTING ABSTRACT Sales forecasting proved to be very important in marketing where managers need to learn from historical data. Many methods have become available for

More information

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

International Journal of Software and Web Sciences (IJSWS) www.iasir.net International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University

More information

Shafzon@yahool.com. Keywords - Algorithm, Artificial immune system, E-mail Classification, Non-Spam, Spam

Shafzon@yahool.com. Keywords - Algorithm, Artificial immune system, E-mail Classification, Non-Spam, Spam An Improved AIS Based E-mail Classification Technique for Spam Detection Ismaila Idris Dept of Cyber Security Science, Fed. Uni. Of Tech. Minna, Niger State Idris.ismaila95@gmail.com Abdulhamid Shafi i

More information

Immune Support Vector Machine Approach for Credit Card Fraud Detection System. Isha Rajak 1, Dr. K. James Mathai 2

Immune Support Vector Machine Approach for Credit Card Fraud Detection System. Isha Rajak 1, Dr. K. James Mathai 2 Immune Support Vector Machine Approach for Credit Card Fraud Detection System. Isha Rajak 1, Dr. K. James Mathai 2 1Department of Computer Engineering & Application, NITTTR, Shyamla Hills, Bhopal M.P.,

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS

ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS Michael Affenzeller (a), Stephan M. Winkler (b), Stefan Forstenlechner (c), Gabriel Kronberger (d), Michael Kommenda (e), Stefan

More information

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

More information

Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

More information

A Stock Pattern Recognition Algorithm Based on Neural Networks

A Stock Pattern Recognition Algorithm Based on Neural Networks A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent

More information

Mining Life Insurance Data for Customer Attrition Analysis

Mining Life Insurance Data for Customer Attrition Analysis Mining Life Insurance Data for Customer Attrition Analysis T. L. Oshini Goonetilleke Informatics Institute of Technology/Department of Computing, Colombo, Sri Lanka Email: oshini.g@iit.ac.lk H. A. Caldera

More information

Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

More information

Enhancing Quality of Data using Data Mining Method

Enhancing Quality of Data using Data Mining Method JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad

More information

MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS

MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

More information

A CLASSIFIER FUSION-BASED APPROACH TO IMPROVE BIOLOGICAL THREAT DETECTION. Palaiseau cedex, France; 2 FFI, P.O. Box 25, N-2027 Kjeller, Norway.

A CLASSIFIER FUSION-BASED APPROACH TO IMPROVE BIOLOGICAL THREAT DETECTION. Palaiseau cedex, France; 2 FFI, P.O. Box 25, N-2027 Kjeller, Norway. A CLASSIFIER FUSION-BASED APPROACH TO IMPROVE BIOLOGICAL THREAT DETECTION Frédéric Pichon 1, Florence Aligne 1, Gilles Feugnet 1 and Janet Martha Blatny 2 1 Thales Research & Technology, Campus Polytechnique,

More information

Knowledge Based Descriptive Neural Networks

Knowledge Based Descriptive Neural Networks Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University or Regina Regina, Saskachewan, CANADA S4S 0A2 Email: jtyao@cs.uregina.ca Abstract This paper presents a

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Numerical Research on Distributed Genetic Algorithm with Redundant

Numerical Research on Distributed Genetic Algorithm with Redundant Numerical Research on Distributed Genetic Algorithm with Redundant Binary Number 1 Sayori Seto, 2 Akinori Kanasugi 1,2 Graduate School of Engineering, Tokyo Denki University, Japan 10kme41@ms.dendai.ac.jp,

More information

Sample subset optimization for classifying imbalanced biological data

Sample subset optimization for classifying imbalanced biological data Sample subset optimization for classifying imbalanced biological data Pengyi Yang 1,2,3, Zili Zhang 4,5, Bing B. Zhou 1,3 and Albert Y. Zomaya 1,3 1 School of Information Technologies, University of Sydney,

More information

ClusterOSS: a new undersampling method for imbalanced learning

ClusterOSS: a new undersampling method for imbalanced learning 1 ClusterOSS: a new undersampling method for imbalanced learning Victor H Barella, Eduardo P Costa, and André C P L F Carvalho, Abstract A dataset is said to be imbalanced when its classes are disproportionately

More information

Background knowledge-enrichment for bottom clauses improving.

Background knowledge-enrichment for bottom clauses improving. Background knowledge-enrichment for bottom clauses improving. Orlando Muñoz Texzocotetla and René MacKinney-Romero Departamento de Ingeniería Eléctrica Universidad Autónoma Metropolitana México D.F. 09340,

More information

Fuzzy Logic Based Revised Defect Rating for Software Lifecycle Performance. Prediction Using GMR

Fuzzy Logic Based Revised Defect Rating for Software Lifecycle Performance. Prediction Using GMR BIJIT - BVICAM s International Journal of Information Technology Bharati Vidyapeeth s Institute of Computer Applications and Management (BVICAM), New Delhi Fuzzy Logic Based Revised Defect Rating for Software

More information

Support Vector Machine (SVM)

Support Vector Machine (SVM) Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Predictive Dynamix Inc

Predictive Dynamix Inc Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

More information

Roulette Sampling for Cost-Sensitive Learning

Roulette Sampling for Cost-Sensitive Learning Roulette Sampling for Cost-Sensitive Learning Victor S. Sheng and Charles X. Ling Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7 {ssheng,cling}@csd.uwo.ca

More information

Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the

More information

Ensemble Approach for the Classification of Imbalanced Data

Ensemble Approach for the Classification of Imbalanced Data Ensemble Approach for the Classification of Imbalanced Data Vladimir Nikulin 1, Geoffrey J. McLachlan 1, and Shu Kay Ng 2 1 Department of Mathematics, University of Queensland v.nikulin@uq.edu.au, gjm@maths.uq.edu.au

More information

An Approach to Detect Spam Emails by Using Majority Voting

An Approach to Detect Spam Emails by Using Majority Voting An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan Usman Qamar Faculty,

More information

Support Vector Pruning with SortedVotes for Large-Scale Datasets

Support Vector Pruning with SortedVotes for Large-Scale Datasets Support Vector Pruning with SortedVotes for Large-Scale Datasets Frerk Saxen, Konrad Doll and Ulrich Brunsmann University of Applied Sciences Aschaffenburg, Germany Email: {Frerk.Saxen, Konrad.Doll, Ulrich.Brunsmann}@h-ab.de

More information

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

Programming Risk Assessment Models for Online Security Evaluation Systems

Programming Risk Assessment Models for Online Security Evaluation Systems Programming Risk Assessment Models for Online Security Evaluation Systems Ajith Abraham 1, Crina Grosan 12, Vaclav Snasel 13 1 Machine Intelligence Research Labs, MIR Labs, http://www.mirlabs.org 2 Babes-Bolyai

More information

Tweaking Naïve Bayes classifier for intelligent spam detection

Tweaking Naïve Bayes classifier for intelligent spam detection 682 Tweaking Naïve Bayes classifier for intelligent spam detection Ankita Raturi 1 and Sunil Pranit Lal 2 1 University of California, Irvine, CA 92697, USA. araturi@uci.edu 2 School of Computing, Information

More information

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next

More information

A fast multi-class SVM learning method for huge databases

A fast multi-class SVM learning method for huge databases www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

USING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY

USING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY USING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY Sérgio Moro and Raul M. S. Laureano Instituto Universitário de Lisboa (ISCTE IUL) Av.ª das Forças Armadas 1649-026

More information

Addressing the Class Imbalance Problem in Medical Datasets

Addressing the Class Imbalance Problem in Medical Datasets Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,

More information

Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances

Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances Sheila Garfield and Stefan Wermter University of Sunderland, School of Computing and

More information

and Hung-Wen Chang 1 Department of Human Resource Development, Hsiuping University of Science and Technology, Taichung City 412, Taiwan 3

and Hung-Wen Chang 1 Department of Human Resource Development, Hsiuping University of Science and Technology, Taichung City 412, Taiwan 3 A study using Genetic Algorithm and Support Vector Machine to find out how the attitude of training personnel affects the performance of the introduction of Taiwan TrainQuali System in an enterprise Tung-Shou

More information

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,

More information

Evaluation of Crossover Operator Performance in Genetic Algorithms With Binary Representation

Evaluation of Crossover Operator Performance in Genetic Algorithms With Binary Representation Evaluation of Crossover Operator Performance in Genetic Algorithms with Binary Representation Stjepan Picek, Marin Golub, and Domagoj Jakobovic Faculty of Electrical Engineering and Computing, Unska 3,

More information

Image Normalization for Illumination Compensation in Facial Images

Image Normalization for Illumination Compensation in Facial Images Image Normalization for Illumination Compensation in Facial Images by Martin D. Levine, Maulin R. Gandhi, Jisnu Bhattacharyya Department of Electrical & Computer Engineering & Center for Intelligent Machines

More information

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

International Journal of Software and Web Sciences (IJSWS) www.iasir.net International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation

An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation Shanofer. S Master of Engineering, Department of Computer Science and Engineering, Veerammal Engineering College,

More information

ADVANCED MACHINE LEARNING. Introduction

ADVANCED MACHINE LEARNING. Introduction 1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures

More information