# Feature Selection with Decision Tree Criterion


Krzysztof Grąbczewski and Norbert Jankowski
Department of Computer Methods, Nicolaus Copernicus University, Toruń, Poland

**Abstract.** Classification techniques applicable to real-life data are more and more often complex hybrid systems comprising feature selection. To augment their efficiency we propose two feature selection algorithms which take advantage of a decision tree criterion. Large computational experiments have been performed to test the capabilities of the two methods and to compare their results with other techniques of similar purpose.

## 1. Introduction

Effective and versatile classification cannot be achieved by single classification algorithms. They must be hybrid complex models comprising a feature selection stage. As in other data analysis tasks, in feature selection it is also very beneficial to take advantage of different kinds of algorithms. One of the great features of decision tree algorithms is that they inherently estimate the suitability of features for separating objects representing different classes. This facility can be directly exploited for the purpose of feature selection.

After presenting the Separability of Split Value criterion, we derive two algorithms for ranking features. The methods are applied to a number of publicly available datasets and their results are compared to those obtained with some of the most commonly used feature selection techniques. At the end we present a number of conclusions and future perspectives.

## 2. SSV based algorithms

One of the most efficient heuristics used for decision tree construction is the Separability of Split Value (SSV) criterion [3]. Its basic advantage is that it can be applied to both continuous and discrete features in such a manner that the estimates of separability can be compared regardless of the substantial difference in types. The split value is defined differently for continuous and symbolic features.
For continuous features it is a real number and for symbolic ones it is a subset of the set of alternative values of the feature. The left side (LS) and right side (RS) of a split value $s$ of feature $f$ for a given dataset $D$ are defined as:

$$
LS_{s,f}(D) = \begin{cases} \{x \in D : f(x) < s\} & \text{if } f \text{ is continuous} \\ \{x \in D : f(x) \in s\} & \text{otherwise} \end{cases}
$$

$$
RS_{s,f}(D) = D \setminus LS_{s,f}(D)
$$

where $f(x)$ is the value of feature $f$ for the data vector $x$. The definition of the separability of split value $s$ is:

$$
SSV(s, f, D) = 2 \sum_{c \in C} |LS_{s,f}(D_c)| \cdot |RS_{s,f}(D \setminus D_c)| \; - \; \sum_{c \in C} \min\big(|LS_{s,f}(D_c)|, |RS_{s,f}(D_c)|\big)
$$

where $C$ is the set of classes and $D_c$ is the set of data vectors from $D$ assigned to class $c \in C$.

Originally the criterion was used to construct decision trees with recursive splits. In such an approach, the current split is selected to correspond to the best split value among all possible splits of all the features (to avoid combinatorial explosion in the case of discrete features with numerous symbols, the analysis is confined to a relatively small number of subsets). Subsequent splits are performed as long as any of the tree nodes can be sensibly split. Then the trees are pruned to provide the highest possible generalization (an inner cross-validation is used to estimate the best pruning parameters).

### 2.1. SSV criterion for feature selection

The SSV criterion has been successfully used not only for building classification trees. It proved efficient in data type conversion (from continuous to discrete [2] and in the

opposite direction [4]) and as the discretization part of feature selection methods which finally rank the features according to indices like Mutual Information [1]. Decision tree algorithms are known for their ability to detect the features that are important for classification. Feature selection is inherent there, so it is not so necessary at the data preparation phase. Conversely, the trees' capabilities can be used for feature selection.

Feature selection based on the SSV criterion can be designed in different ways. From the computational point of view, the most efficient one is to create a feature ranking on the basis of the maximum SSV criterion values calculated for each of the features, for the whole training dataset. The cost is the same as when creating decision stumps (single-split decision trees). However, when the classification task is a multiclass one, separabilities of single splits cannot reflect the actual virtue of the features. Sometimes two or three splits may be necessary to prove that a feature can accurately separate different classes. Thus it is reasonable to reward such features, but some penalty must be introduced when multiple splits are analyzed. These ideas brought the following algorithm:

**Algorithm 1 (Separate feature analysis)**
Input: Training data (a sample X of input patterns and their labels Y).
Output: Features in decreasing order of importance.

1. For each feature f:
   (a) T ← the SSV decision tree built for one-dimensional data (feature f only).
   (b) n ← the number of leaves in T.
   (c) For i = n downto 1:
       i. s_i ← SSV(T) / log(2 + i), where SSV(T) is the sum of the SSV values for all the splits in T.
       ii. Prune T by deleting a node N of minimum error reduction.
   (d) Define the rank R(f) of feature f as the maximum of the s_i values.
2. Return the list of features in decreasing order of R(f).

Another way of using SSV for feature ranking is to create a single decision tree and read feature importance from it.
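The cheapest variant described above ranks features by their best single-split SSV values (the decision-stump case). Below is a minimal sketch of that simplified variant for continuous features only; it is not the authors' full Algorithm 1, which also grows and prunes one-dimensional trees, and all function names here are our own:

```python
import numpy as np

def ssv(split, feature_values, labels):
    """Separability of Split Value for one candidate split of a continuous feature.

    Implements SSV(s, f, D) = 2 * sum_c |LS(D_c)| * |RS(D \\ D_c)|
                              - sum_c min(|LS(D_c)|, |RS(D_c)|).
    """
    left = feature_values < split
    right = ~left
    pairs, penalty = 0, 0
    for c in np.unique(labels):
        ls_c = np.sum(left & (labels == c))        # |LS(D_c)|
        rs_c = np.sum(right & (labels == c))       # |RS(D_c)|
        rs_not_c = np.sum(right & (labels != c))   # |RS(D \ D_c)|
        pairs += ls_c * rs_not_c
        penalty += min(ls_c, rs_c)
    return 2 * pairs - penalty

def rank_features(X, y):
    """Rank features by their best single-split SSV value (decision-stump
    simplification: no tree growing or pruning)."""
    scores = []
    for f in range(X.shape[1]):
        vals = np.sort(np.unique(X[:, f]))
        # candidate splits: midpoints between consecutive distinct values
        splits = (vals[:-1] + vals[1:]) / 2
        scores.append(max(ssv(s, X[:, f], y) for s in splits))
    order = np.argsort(scores)[::-1]  # decreasing importance
    return order, scores
```

A feature that cleanly separates the classes collects many separated pairs (the first sum) and no misclassified minimum (the second sum), so it dominates the ranking.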
The filter we have used here is Algorithm 2.

**Algorithm 2 (Feature selection from a single SSV tree)**
Input: Training data X, Y.
Output: Features in decreasing order of importance.

1. T ← the SSV decision tree built for X, Y.
2. For each non-leaf node N of T, G(N) ← the classification error reduction of node N.
3. F ← the set of all the features of the input space.
4. i ← 0.
5. While F ≠ ∅ do:
   (a) For each feature f ∈ F not used by T, define its rank R(f) ← i. Remove these features from F.
   (b) Prune T by deleting all the final splits of nodes N for which G(N) is minimal.
   (c) Prune T by deleting all the final splits of nodes N for which G(N) = 0.
   (d) i ← i + 1.
6. Return the list of features in decreasing order of R(f).

This implements a "full-featured" filter: the decision tree building algorithm selects the splits locally, i.e. with respect to the splits selected in earlier stages, so the features occurring in the tree are complementary. It is important to see that, despite this, the computational complexity of the algorithm is very attractive (contrary to wrapper methods of feature selection).

The ranking generated by Algorithm 2 can be used for feature selection either by dropping all the features of rank 0 or by picking a given number of top-ranked features. In some cases full classification trees use only a small part of the features. This does not allow one to select an arbitrary number of features: the maximum is the number of features used by the tree, because the algorithm gives no information about the ranking of the remaining features.

## 3. Algorithms used for the comparison

Numerous entropy-based methods and many other feature relevance indices are designed to deal with discrete data. Their results strongly depend on the discretization method applied before the calculation of feature ranks [1]. This makes reliable comparisons of ranking algorithms very difficult, so here we compare the results obtained with methods which do not suffer from such limitations.
We have chosen two simple but very successful methods based on Pearson's correlation coefficient and a Fisher-like criterion. Another two methods included in the comparisons belong to the Relief family.

Pearson's correlation coefficient detects linear correlation, so when used for feature selection it may provide poor results in the case of nonlinear dependencies. Nevertheless, in many applications it seems optimal because of its very good results and computational simplicity.

The Fisher-like criterion is the F-score, defined as

$$
F(f) = \frac{|m_0 - m_1|}{s_0 + s_1}, \qquad (1)
$$

where $m_i$ is the mean value of the feature calculated for the elements of the $i$-th class, and $s_i$ is the corresponding standard deviation. Such a criterion can be used only in the case of binary classification. For multiclass data our estimate of feature relevance was the maximum of the "one class vs. the rest" ranks.

The two representatives of the Relief family of algorithms are the basic Relief method [6] and its extension by Kononenko known as ReliefF [7]. The basic algorithm randomly selects m vectors and for each of them (vector V) finds the nearest hit H and the nearest miss M (the nearest neighbors belonging to the same and to a different class, respectively); then the weight of each feature f is changed according to

$$
W_f \leftarrow W_f - \frac{\mathrm{diff}_f(V, H)}{m} + \frac{\mathrm{diff}_f(V, M)}{m}, \qquad (2)
$$

where $\mathrm{diff}_f$, for a numerical feature $f$, is the normalized difference between the values of $f$, and for a nominal $f$ it is the truth value of the equality of the given values. The ReliefF algorithm differs from the basic version in that it takes the k nearest hits and the k nearest misses of each class in a generalization of formula (2). All our calculations used the same fixed k. The m parameter was bounded from above; for smaller datasets we used each training vector once instead of m randomly selected vectors.

Most of the calculation time was spent on running the Relief methods, which are the most complex of the six methods analyzed; all the complexities are given in Table 1.

| Method  | Complexity      |
|---------|-----------------|
| CC      | fn              |
| F-score | fn              |
| Relief  | m(n + f)        |
| ReliefF | m(n log k + fk) |
| SSV     | fn log n        |
| SSV2    | fn log n        |

Table 1. Feature selection methods' complexities (n: vector count, f: feature count).

## 4. Testing methodology

The algorithms resulting in a feature ranking can be compared only in the context of particular classifiers. Different kinds of classification learning methods may broaden the scope of the analysis, so we have applied the Naive Bayesian Classifier, the k Nearest Neighbors (kNN) method with Euclidean and Manhattan distance metrics, and Support Vector Machines (SVM) with Gaussian and linear kernels (all our SVMs were trained with the same fixed C and bias parameters).
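A rough sketch of the three baseline rankers described in Section 3 (Pearson's correlation coefficient, the F-score of formula (1), and basic Relief following formula (2)), for numeric features and binary labels. The function names and the [0, 1] feature normalization inside Relief are our assumptions, not the original implementations:

```python
import numpy as np

def pearson_rank(X, y):
    """|Pearson correlation| of each feature with the class labels."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)

def f_score_rank(X, y):
    """F-score of formula (1): |m0 - m1| / (s0 + s1), binary labels only."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    s0, s1 = X[y == 0].std(axis=0), X[y == 1].std(axis=0)
    return np.abs(m0 - m1) / (s0 + s1)

def relief_rank(X, y, m=None, rng=None):
    """Basic Relief (formula (2)) for numeric features; diff_f is the
    absolute difference after normalizing each feature to [0, 1]."""
    rng = rng or np.random.default_rng(0)
    span = X.max(axis=0) - X.min(axis=0)
    Xn = (X - X.min(axis=0)) / np.where(span == 0, 1, span)
    n = len(y)
    # use every vector once when m is unset, as the paper does for small data
    idx = np.arange(n) if m is None else rng.integers(0, n, m)
    W = np.zeros(X.shape[1])
    for i in idx:
        d = np.abs(Xn - Xn[i]).sum(axis=1)  # distances to all vectors
        d[i] = np.inf                       # exclude the vector itself
        same, other = (y == y[i]), (y != y[i])
        hit = np.where(same)[0][np.argmin(d[same])]     # nearest hit H
        miss = np.where(other)[0][np.argmin(d[other])]  # nearest miss M
        W += (np.abs(Xn[miss] - Xn[i]) - np.abs(Xn[hit] - Xn[i])) / len(idx)
    return W
```

Each function returns one relevance score per feature; sorting the scores in decreasing order yields the ranking used for selection.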
All datasets were standardized before the tests. However, it must be pointed out that none of the feature ranking methods tested here is sensitive to data scaling; the standardization affects only the kNN and SVM classification methods. Moreover, Naive Bayes and the SSV-based selections deal directly with symbolic data, so in such cases the standardization does not concern them at all.

Drawing reliable conclusions requires an appropriate testing strategy and an appropriate analysis of the results. The major group of our tests were cross-validation (CV) tests, repeated several times to obtain a good approximation of the average result. In each fold of the CV tests we ran six feature ranking algorithms: CC-based ranking, F-score-based ranking, basic Relief, ReliefF, and our two methods presented as Algorithms 1 (SSV) and 2 (SSV2). The six rankings were analyzed to determine all the feature subsets generated by selecting a number of top-ranked features. Then, for each such subset, the classification algorithm was applied and the results collected (the reported results are averaged over the repetitions of the CV).

Such a strategy may require an enormous amount of calculation. For example, when we tested SVM models on features selected from a 60-feature dataset, there are 6 * 60 = 360 possible selections (some feature subsets are always repeated), so about 320 distinct classification tasks had to be solved at each CV pass. For a 3-class task we need to build 3 SVM models for each of them ("one class vs. the rest" technique), which implies 3 * 320 = 960 SVM runs per CV pass. With all the repetitions of the CV, the total number of SVM models to build is enormous; such calculations took more than 10 hours of (one half of) a 3.6 GHz Pentium 4 HT.

To determine the statistical significance of the differences in the results we used the t-test (or paired t-test) methodology with a fixed significance level p.
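The evaluation loop described above (a ranking recomputed inside every fold, then accuracy measured as a function of the number of top-ranked features) can be sketched in miniature as follows. The 1-NN classifier and the `toy_ranker` are stand-ins chosen for brevity and are our assumptions, not the classifiers or rankers used in the paper; any ranking function with the same signature could be plugged in:

```python
import numpy as np

def one_nn_accuracy(Xtr, ytr, Xte, yte):
    """1-NN with Euclidean metric (a stand-in classifier)."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(ytr[np.argmin(d, axis=1)] == yte))

def cv_selection_curve(X, y, ranker, folds=5, seed=0):
    """acc[k-1] = mean CV accuracy after selecting the k top-ranked features.
    The ranking is recomputed inside every fold, as in the paper's protocol."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    splits = np.array_split(order, folds)
    n_feat = X.shape[1]
    acc = np.zeros(n_feat)
    for te in splits:
        tr = np.setdiff1d(order, te)
        rank = np.argsort(ranker(X[tr], y[tr]))[::-1]  # per-fold ranking
        for k in range(1, n_feat + 1):
            sel = rank[:k]
            acc[k - 1] += one_nn_accuracy(X[tr][:, sel], y[tr],
                                          X[te][:, sel], y[te]) / folds
    return acc

def toy_ranker(X, y):
    """Hypothetical ranker: spread of the per-class feature means."""
    means = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
    return means.std(axis=0)
```

Plotting `acc` against the number of selected features reproduces, in spirit, the per-dataset curves discussed in Section 5.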
## 5. Results

To make a thorough analysis possible, feature selection methods must be applied to datasets defined in a space of not too large dimensionality, but on the other hand too small a dimensionality would make feature selection tests baseless. Thus we decided to use the higher-dimensional classification task data available in the UCI repository [8]. Table 2 presents the datasets we have used for the tests; however, because of space limits we discuss just a representative part of the results (the figures for all the datasets can be seen at kgrabcze/papers/0-sel-figures.pdf). The datasets distributed in separate training and test files were merged and the CV tests performed on the whole data.

Table 2. Datasets summary (feature, vector and class counts for: chess, glass, image segmentation, ionosphere, mushroom, promoters, satellite image, splice, thyroid).

The figures present average classification accuracies (averaged over the CV repetitions) obtained with different classifiers trained on data after a given number of top-ranked features were selected (the rankings were generated for each of the CV folds independently).

**Ionosphere data (Figure 1).** The results obtained with the Naive Bayesian classifier for the ionosphere data are unique. The SSV2 algorithm, selecting all the features used by SSV decision trees, leads to much higher accuracy (89.87%) than obtainable with the other methods. Please notice that the SSV2 algorithm can sensibly rank only the features which occur in the tree, thus we cannot reliably select more features than there are in the tree; the figures reflect this fact by constant values of the SSV2 series for numbers of features exceeding the sensible maximum. The NN classifier with Euclidean metric gives maximum accuracy (89.9%) when 4 features are selected with CC or F-Score, but a very high result (88.32%) can be obtained for just 2 features selected with SSV2 (the accuracy is significantly better than for the whole dataset: 84.6%). NN with Manhattan distance yields its maximum accuracy for features selected with SSV2. It is interesting that the SSV selection with the SVM classifier works poorly for small numbers of features, but is one of the fastest to reach the accuracy level of SVM trained on the whole data. Although feature selection cannot significantly improve the accuracy here, it may be very precious because of the dimensionality reduction it offers.

**Promoters data (Figure 2).** This is an example of a dataset for which almost every selection method improves the accuracy of every classifier, and the improvement is very significant. The exceptions are the combinations of the Naive Bayes classifier with the Relief and SSV2 feature selection methods.
Please notice that the maximum accuracy is usually obtained for small numbers of features, so the selection is crucial here.

Figure 1. Results for the ionosphere data.

**Splice data (Figure 3).** The conclusions for this dataset are very similar to those for the promoters data. Feature selection provides a very significant improvement of the NN and SVM classifiers. For Naive Bayes we get a slight but still statistically significant improvement: the best results are obtained within a narrow range of feature counts with ReliefF (96.2% accuracy with a standard deviation of 0.4%) and SSV (about 96% ± 0.4%), while with no selection the accuracy is significantly lower (with a standard deviation of only 0.08%). The results show less variance than in the case of promoters (this is due to the larger training data samples), hence even small differences can be statistically significant here.

Figure 2. Results for the promoters data.

Figure 3. Results for the splice data.

**Other datasets (Figure 4).** Image segmentation and thyroid data are examples of classification tasks where decision trees show the best adaptation abilities. It is consistent with the fact that for these datasets the SSV2 selector is also the most useful for the other classification algorithms. NN applied to image segmentation achieves its best result (96.43% ± 0.22%) after selecting features with SSV2, which is significantly better than with any other feature selection method we have tested; 3 features are enough to get 96.28% ± 0.3%. The plot showing the results of NN with the Euclidean measure applied to the thyroid data clearly shows the advantage of the SSV2 selector (its ability to detect dependencies between features): selecting 3 features yields a high accuracy of very high stability (98.4% ± 0.04%), specific for decision trees in the case of this dataset.

The remaining two plots of Figure 4 demonstrate some interesting negative results. For the satellite image data, CC-based selection loses all the comparisons, though it proves very valuable in many other tests. It is also an exceptional dataset in that ReliefF performs worse than Relief. For the chess data, the SSV selection methods are not in the same league as the others; also the F-Score and CC methods cannot improve the accuracy of SVM, while SSV2, Relief and ReliefF offer significant improvements.

## 6. Conclusions

We have presented two feature selection methods based on the SSV criterion and experimentally confirmed that they are a very good alternative to the most popular methods. Along the way we have shown several examples of spectacular improvement obtained by means of feature selection.

Feature selection algorithms cannot be evaluated without the context of the final models. We must realize that different classifiers may require different feature selectors to do their best; however, in general we observe that, for a particular dataset, a feature selector performing well (or poorly)

for one classifier usually performs well (or poorly, respectively) with the other classifiers as well. In accordance with the no-free-lunch theorem, there is no single feature selection method performing very well for every dataset. Thus effective feature selection requires a validation stage to confirm a method's usability in a given context. Such validation can be expensive, so we must be careful. To reduce the complexity of the search for optimal feature sets we may find some regularities and use them as heuristics (for example, we see that SSV2 is usually a very good feature selector for datasets where decision tree algorithms provide high accuracies).

Figure 4. Other interesting results (image segmentation, thyroid, satellite image and chess data).

**Further research.** The comparison presented here treats each of the feature selection methods separately. It is a natural step forward to think about efficient algorithms to combine the results of different feature selection methods. Some simple methods have already been used [5], but deeper meta-level analysis should bring many interesting conclusions and more versatile feature selection methods.

**Acknowledgements.** The research is supported by a grant from the Polish Ministry of Science.

## References

[1] W. Duch, J. Biesiada, T. Winiarski, K. Grudziński, and K. Grąbczewski. Feature selection based on information theory filters and feature elimination wrapper methods. In Proceedings of the Int. Conf. on Neural Networks and Soft Computing (ICNNSC 2002), Advances in Soft Computing, pages 173-176, Zakopane, 2002. Physica-Verlag (Springer).

[2] K. Grąbczewski. SSV criterion based discretization for Naive Bayes classifiers. In Proceedings of the 7th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, June 2004.

[3] K. Grąbczewski and W. Duch. A general purpose separability criterion for classification systems. In Proceedings of the 4th Conference on Neural Networks and Their Applications, Zakopane, Poland, June 1999.

[4] K. Grąbczewski and N. Jankowski. Transformations of symbolic data for continuous data oriented models. In Artificial Neural Networks and Neural Information Processing, ICANN/ICONIP 2003. Springer, 2003.

[5] K. Grąbczewski and N. Jankowski. Mining for complex models comprising feature selection and classification. In I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, editors, Feature Extraction, Foundations and Applications. Springer, 2005.

[6] K. Kira and L. A. Rendell. A practical approach to feature selection. In ML92: Proceedings of the Ninth International Workshop on Machine Learning, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc.

[7] I. Kononenko. Estimating attributes: Analysis and extensions of RELIEF. In European Conference on Machine Learning, pages 171-182, 1994.

[8] C. J. Merz and P. M. Murphy. UCI repository of machine learning databases, mlearn/mlrepository.html.

Software Defect Prediction Modeling Burak Turhan Department of Computer Engineering, Bogazici University turhanb@boun.edu.tr Abstract Defect predictors are helpful tools for project managers and developers.

### An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### Drug Store Sales Prediction

Drug Store Sales Prediction Chenghao Wang, Yang Li Abstract - In this paper we tried to apply machine learning algorithm into a real world problem drug store sales forecasting. Given store information,

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### Triangular Visualization

Triangular Visualization Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland tmaszczyk@is.umk.pl;google:w.duch http://www.is.umk.pl Abstract.

### Classifiers & Classification

Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin

### Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

### Cross-Validation. Synonyms Rotation estimation

Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical

### Meta-learning. Synonyms. Definition. Characteristics

Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore wduch@is.umk.pl (or search

### A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY

### Employer Health Insurance Premium Prediction Elliott Lui

Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### 6. If there is no improvement of the categories after several steps, then choose new seeds using another criterion (e.g. the objects near the edge of

Clustering Clustering is an unsupervised learning method: there is no target value (class label) to be predicted, the goal is finding common patterns or grouping similar examples. Differences between models/algorithms

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### Sub-class Error-Correcting Output Codes

Sub-class Error-Correcting Output Codes Sergio Escalera, Oriol Pujol and Petia Radeva Computer Vision Center, Campus UAB, Edifici O, 08193, Bellaterra, Spain. Dept. Matemàtica Aplicada i Anàlisi, Universitat

### A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

### Combining Classifiers with Meta Decision Trees

Machine Learning, 50, 223 249, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. Combining Classifiers with Meta Decision Trees LJUPČO TODOROVSKI Ljupco.Todorovski@ijs.si SAŠO DŽEROSKI

### Predicting Flight Delays

Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

### International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

### Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

### Final Exam, Spring 2007

10-701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:

### Decision Trees (Sections )

Decision Trees (Sections 8.1-8.4) Non-metric methods CART (Classification & regression Trees) Number of splits Query selection & node impurity Multiway splits When to stop splitting? Pruning Assignment

### Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,

### Searching for Interacting Features

Searching for Interacting Features Zheng Zhao and Huan Liu Department of Computer Science and Engineering Arizona State University {zheng.zhao, huan.liu}@asu.edu Abstract Feature interaction presents a

### Ensembles of Similarity-Based Models.

Ensembles of Similarity-Based Models. Włodzisław Duch and Karol Grudziński Department of Computer Methods, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. E-mails: {duch,kagru}@phys.uni.torun.pl

### Data Mining for Knowledge Management. Classification

1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

### Entropy and Information Gain

Entropy and Information Gain The entropy (very common in Information Theory) characterizes the (im)purity of an arbitrary collection of examples Information Gain is the expected reduction in entropy caused

### COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS

COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS AMJAD HARB and RASHID JAYOUSI Faculty of Computer Science, Al-Quds University, Jerusalem, Palestine Abstract This study exploits

### Chapter 7. Feature Selection. 7.1 Introduction

Chapter 7 Feature Selection Feature selection is not used in the system classification experiments, which will be discussed in Chapter 8 and 9. However, as an autonomous system, OMEGA includes feature

### BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

### Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### C19 Machine Learning

C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

### Mining a Corpus of Job Ads

Mining a Corpus of Job Ads Workshop Strings and Structures Computational Biology & Linguistics Jürgen Jürgen Hermes Hermes Sprachliche Linguistic Data Informationsverarbeitung Processing Institut Department

### Summary Data Mining & Process Mining (1BM46) Content. Made by S.P.T. Ariesen

Summary Data Mining & Process Mining (1BM46) Made by S.P.T. Ariesen Content Data Mining part... 2 Lecture 1... 2 Lecture 2:... 4 Lecture 3... 7 Lecture 4... 9 Process mining part... 13 Lecture 5... 13

### Constructing New and Better Evaluation Measures for Machine Learning

Constructing New and Better Evaluation Measures for Machine Learning Jin Huang School of Information Technology and Engineering University of Ottawa Ottawa, Ontario, Canada K1N 6N5 jhuang33@site.uottawa.ca

### Improving spam mail filtering using classification algorithms with discretization Filter

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

### NEW METHOD OF VARIABLE SELECTION FOR BINARY DATA CLUSTER ANALYSIS

STATISTICS IN TRANSITION new series, June 2016 295 STATISTICS IN TRANSITION new series, June 2016 Vol. 17, No. 2, pp. 295 304 NEW METHOD OF VARIABLE SELECTION FOR BINARY DATA CLUSTER ANALYSIS Jerzy Korzeniewski

### A fast multi-class SVM learning method for huge databases

www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,

### Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

### Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

### Beating the MLB Moneyline

Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Microsoft Azure Machine learning Algorithms

Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation

### T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or