Learning Predictive Clustering Rules
Bernard Ženko (1), Sašo Džeroski (1), and Jan Struyf (2)

(1) Department of Knowledge Technologies, Jožef Stefan Institute, Slovenia
[email protected], [email protected]
(2) Department of Computer Science, Katholieke Universiteit Leuven, Belgium
[email protected]

Abstract. The two most commonly addressed data mining tasks are predictive modelling and clustering. Here we address the task of predictive clustering, which contains elements of both and generalizes them to some extent. We propose a novel approach to predictive clustering called predictive clustering rules, present an initial implementation, and report its preliminary experimental evaluation.

1 Introduction

Predictive modelling, or supervised learning, aims at constructing models that can predict the value of a target (dependent) variable from the known values of attributes (independent variables). A wide array of predictive modelling methods exists, producing models that are more or less (or not at all) interpretable. Typical representatives of the group of methods that produce understandable and interpretable models are decision tree learning [13] and rule learning [7]. (They also happen to partition the space of examples into subgroups.)

Clustering [9], on the other hand, is an unsupervised learning method. It tries to find subgroups of examples, or clusters, with homogeneous values of all variables, not just the target variable. In fact, the target variable is usually not even defined in clustering tasks. The result is a set of clusters and not necessarily their descriptions or models, but we can usually link new examples to the constructed clusters based on, e.g., proximity in the variable space.

Predictive modelling and clustering are therefore regarded as quite different techniques. Nevertheless, different viewpoints also exist [10] which stress the many similarities shared by clustering and those predictive modelling methods that partition the space of examples, such as tree and rule induction. Predictive modelling methods (trees and rules) partition the space of examples into subspaces with homogeneous values of the target variable, while (distance-based) clustering methods seek subspaces with homogeneous values of the attributes.

In this paper, we consider the task of predictive clustering, which contains elements of both predictive modelling and clustering and generalizes them to some extent. In predictive clustering, one can simultaneously consider homogeneity along the target variables and the attributes. Also, symbolic descriptions of the learned clusters can be generated in the form of trees and rules (which connect
closely to the predictive modelling variants thereof). We propose a novel approach to predictive clustering, namely predictive clustering rules (PCRs), and present a learning method for them.

The task of learning PCRs generalizes the task of rule induction, on one hand, and clustering, and in particular itemset constrained clustering, on the other. It is thus important for constraint-based data mining and inductive databases (IDBs). As IDBs are the most promising approach to finding a general framework for data mining, bringing the two most common data mining tasks a step closer is a step in this direction. Also, constraint-based clustering is certainly an under-researched topic in constraint-based data mining.

In the next section, we discuss prediction and clustering in more detail, then define the task of predictive clustering and in particular the task of learning PCRs. We then describe an algorithm for learning PCRs, followed by a discussion of related work. Before concluding and outlining directions for further work, we present a preliminary experimental evaluation.

2 Prediction, clustering, and predictive clustering

The tasks of predictive modelling and clustering are two of the oldest and most commonly addressed tasks of data analysis (and data mining). Here we briefly describe each of them and introduce predictive clustering, a task that combines elements of both prediction and clustering. We also discuss in some more detail the more specific task of learning predictive clustering rules, which is the focus of this paper.

2.1 Predictive modelling

Predictive modelling aims at constructing models that can predict a target property of an object from a description of the object. Predictive models are learned from sets of examples, where each example has the form (A, T), with A being an object description and T a target property value. While a variety of languages ranging from propositional to first order logic have been used for A, T is almost always considered to consist of a single dependent variable called the class: if this variable is discrete we are dealing with a classification problem, and if it is continuous, with a regression problem.

In practice, A is most commonly a vector of independent variables, called attributes (attribute-value representation). In the remainder of the paper, we will consider both A and T to be vectors of attributes (discrete or real-valued). When T is a vector with several target variables, we call the task of predictive modelling multi-objective prediction. If T only contains discrete variables we speak of multi-objective classification; if T only contains continuous variables we speak of multi-objective regression.

Predictive models (that are learned from data) can take many different forms, ranging from linear equations to logic programs. The two most commonly used
types of models are decision trees [13] and rules [7]. Unlike (regression) equations, which provide a single predictive model for the entire example space, trees and rules partition the space of examples into subspaces and provide a simple prediction or predictive model for each of these. While the subspaces defined by rules can overlap, those defined by trees (i.e., their leaves) do not.

2.2 Clustering, clustering trees and clustering rules

Clustering in general is concerned with grouping objects into classes of similar objects. Given a set of examples (object descriptions), the task of clustering is to partition these examples into subsets, called clusters. Note that the examples do not contain a target property to be predicted, but only an object description (which is typically a vector of attributes A). The goal of clustering is to achieve high similarity between objects within individual clusters and low similarity between objects that belong to different clusters.

Conventional clustering focuses on distance-based cluster analysis. The notion of a distance (or, conversely, similarity) is crucial here: examples are considered to be points in a metric space (a space with a distance measure). Note that a prototype (prototypical example) may be used as a representative for a cluster. This may be, e.g., the mean or the medoid of the examples in the cluster (the medoid being the example with the lowest average distance to all the examples in the cluster).

In conceptual clustering [12], a symbolic representation of the resulting clusters is produced in addition to the partition into clusters: we can thus consider each cluster to be a concept (much like a class in classification). In this context, a decision tree structure may be used to represent a hierarchical clustering: such a tree is called a clustering tree [1]. In a clustering tree each node represents a cluster. The conjunction of conditions on the path from the root to that node gives a symbolic representation of the cluster. Essentially, each cluster has a symbolic description in the form of a rule (IF conjunction of conditions THEN cluster), while the tree structure represents the hierarchy of clusters. Clusters that are not on the same branch of a tree do not overlap.

If we are interested in generating a flat set of clusters (no hierarchy), we can represent a clustering as a set of clustering rules. Each cluster is then represented by a rule (IF conjunction of conditions THEN cluster), as in clustering trees. However, the rules need not be disjoint and the clusters can overlap.

Given the above, predictive modelling approaches which partition the space of examples, such as decision tree and rule induction, are in a sense very similar to clustering. A major difference is that they partition the space of examples into subspaces with homogeneous values of the target variable, while (distance-based) clustering methods seek subspaces with homogeneous values of the attributes. On the other hand, the predictive modelling approaches mentioned seek conceptual clusters that have symbolic descriptions, which is typically not the case for distance-based clustering.
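As a concrete illustration of the medoid mentioned above, the following minimal sketch (our own, in Python, with hypothetical names; not code from the paper) picks the cluster member with the lowest average distance to all members:

    from typing import Callable, Sequence, TypeVar

    E = TypeVar("E")

    def medoid(cluster: Sequence[E], dist: Callable[[E, E], float]) -> E:
        """Return the cluster member with the lowest average distance
        to all examples in the cluster (including itself)."""
        def avg_dist(x: E) -> float:
            return sum(dist(x, y) for y in cluster) / len(cluster)
        return min(cluster, key=avg_dist)

    # Example: points in the plane under Euclidean distance.
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    print(medoid(points, euclid))  # (1.0, 0.0)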
2.3 Predictive clustering

The task of predictive clustering combines elements from both prediction and clustering. As is common in clustering, we seek clusters of examples that are similar to each other (and dissimilar to examples in other clusters). In addition, we seek a predictive model to be associated with each cluster; the model gives a prediction of the target variables T in terms of the attributes A for all examples that are established to belong to that cluster.

In the simplest and most common case, the predictive model associated with a cluster would be the projection on T of the prototype of the examples that belong to that cluster. This would be a simple average when T is a single continuous variable. In the discrete case, it would be a probability distribution across the discrete values, or the mode thereof. When T is a vector, the prototype would be a vector of averages and distributions/modes.

We will be interested in the case where each cluster has both a symbolic description (in terms of a language bias over A) and a predictive model (a prototype in T) associated with it. The corresponding tree-based and rule-based representations are called predictive clustering trees [2] and predictive clustering rules (this paper). Unlike clustering, which only considers homogeneity for A, and predictive modelling, which only considers homogeneity for T, predictive clustering in general takes both A and T into account. The task of predictive clustering is thus defined as follows.

Given:
- a set of examples E, where each example has the form O = (A, T),
- a declarative bias B over the language for describing examples,
- a distance measure d that computes the distance between two examples,
- a prototype function p that computes the prototype of a set of examples,

find a set of clusters such that:
- each cluster is associated with a description expressed in B,
- each cluster has an associated prediction expressed as a prototype,
- within-cluster distance is low (similarity is high) and between-cluster distance is high (similarity is low).

2.4 Learning predictive clustering rules (PCRs) for multi-objective classification (MOC)

In this paper, we address the task of predictive clustering and focus on learning rules for multi-objective classification. Extending the proposed approach to multi-objective regression, and to multi-objective prediction in general, will be the subject of immediate further work.

The examples will be vectors, divided into attributes and target variables, where attributes can be either discrete or continuous. The declarative bias will restrict our hypothesis language to rules consisting of conjunctions of attribute-value conditions over the attributes A. (Additional language constraints are planned for consideration in the near future.)
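To make the prototype-as-prediction idea of Section 2.3 concrete, here is a minimal sketch (our own illustration in Python, not the authors' implementation; all names are hypothetical) of a prototype function that returns the mean for numeric dimensions and the value distribution, with its mode, for discrete ones:

    from collections import Counter

    def prototype(examples, is_numeric):
        """Per-dimension prototype of a set of examples (sequences of
        equal length): the mean for numeric dimensions, the relative
        frequency distribution for discrete ones."""
        n = len(examples)
        proto = []
        for j, numeric in enumerate(is_numeric):
            column = [e[j] for e in examples]
            if numeric:
                proto.append(sum(column) / n)
            else:
                proto.append({v: c / n for v, c in Counter(column).items()})
        return proto

    def mode(distribution):
        """Most probable value of a discrete prototype dimension."""
        return max(distribution, key=distribution.get)

    # Two numeric attributes and one discrete target:
    data = [(1.0, 2.0, "yes"), (3.0, 4.0, "yes"), (5.0, 0.0, "no")]
    p = prototype(data, is_numeric=[True, True, False])
    print(p)           # [3.0, 2.0, {'yes': 0.667, 'no': 0.333}] (approx.)
    print(mode(p[2]))  # yes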
The distance measure d would typically only consider T, since we are dealing with prediction. However, we will be considering a distance measure that consists of two components, one over the attributes and the other over the targets: d = (1 - τ) d_A + τ d_T. This measure will be used as a heuristic in the search for rules. The distance d (including d_A and d_T) is computed as a weighted sum of distances along each (normalized) dimension. For continuous variables, the absolute difference is taken. Values of discrete variables are treated as probability distributions thereof, the distance between two probability distributions being the sum of the absolute differences over the probabilities for all possible values.

Given a (partial) rule that covers a set of examples S, its quality is estimated as the average distance of an example in S to the prototype of S. We will call this the compactness of the rule. For evaluating the predictive performance (accuracy), only the d_T part is relevant. The d_A part additionally takes into account the compactness of (the average distance between) the examples covered by the rule in the attribute space.

The prototype function takes a set of examples as input and returns a prototype that is computed per dimension. For continuous attributes the average is taken; for discrete variables, the probability distribution across the possible values.

3 Learning predictive clustering rules

This section describes our system for learning PCRs. The majority of rule induction methods are based on the covering algorithm, and among these the CN2 algorithm [4] is well known. Our system is based on this algorithm, but several important parts are modified. In this section we first briefly describe the original CN2 algorithm, and then we present our modifications.

3.1 Rule induction with CN2

The CN2 algorithm iteratively constructs rules that cover examples with homogeneous target variable values. The heuristic used to guide the search is simply the accuracy of the rule under construction. After a rule has been constructed, the examples covered by this rule are removed from the training set, and the procedure is repeated on the new data set until the data set is empty or no new rules are found. The rules constructed in this way are ordered, meaning that they can be used for prediction as a decision list; we test the rules on a new example one by one, and the first rule that fires is used to predict the target value of this example. Alternatively, CN2 can also construct unordered rules, if only correctly classified examples are removed from the training set after finding each rule and if rules are built for each possible class value. When using unordered rules for prediction, several rules can fire on an example, in which case weighted voting is used to get the final prediction.
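As an illustration of the covering loop just described, here is a minimal sketch (our own simplification, not CN2's actual code); find_best_rule stands in for the beam search over candidate conditions and is passed in as a parameter rather than implemented here:

    def covering(examples, find_best_rule, min_coverage=1):
        """CN2-style covering for ordered rules: repeatedly learn the
        best rule on the remaining examples, remove the examples it
        covers, and stop when no acceptable rule is found or no
        examples remain.  find_best_rule(examples) is assumed to return
        an object with a covers(example) predicate, or None on failure."""
        rules = []
        remaining = list(examples)
        while remaining:
            rule = find_best_rule(remaining)
            if rule is None:
                break
            covered = [e for e in remaining if rule.covers(e)]
            if len(covered) < min_coverage:
                break
            rules.append(rule)
            remaining = [e for e in remaining if not rule.covers(e)]
        return rules  # an ordered decision list

For the unordered variant, a loop of this kind would be run once per class value, removing only the correctly classified covered examples.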
3.2 The search heuristic: compactness

The main difference between standard rule induction with the CN2 covering algorithm and the approach presented in this paper is the heuristic used for guiding the search for rules. The purpose of the heuristic is the evaluation of different rules, so it should measure a property that is important for the quality of each rule and/or the whole rule set. One of the most important properties of rules (and other models) is their accuracy, and standard CN2 simply uses this as its heuristic. Accuracy, however, is only connected to the target attribute. Our goal when developing predictive clustering rules was (besides accuracy) that the induced rules should cover compact subsets of examples, just as clustering does. For this purpose we need a heuristic which takes into account the target variables as well as the attributes.

As explained above, we use the compactness: the average distance of an example covered by a rule to the prototype of this set of examples. The compactness takes into account both the attribute and the target variable dimensions and is a weighted sum of the compactness along each of the dimensions (the latter are normalized to be between 0 and 1). At present, only a general weight τ is applied for putting the emphasis on the targets (τ = 1) or the attributes (τ = 0); target attributes should in general have higher weights in order to guide the search towards accurate rules.

Because the attributes can in general be nominal or numeric, different measures are needed for each type, which are then combined (added) into a single measure. The compactness of a set of examples along one nominal attribute is calculated as the average distance of all examples in the set from its prototype. The prototype is simply a tuple with the relative frequencies of each of the possible attribute values in the given set of examples. For an attribute with K possible values (v_1 to v_K), the prototype is of the form (f_1, f_2, ..., f_K). The distance of an example with value v_k from this prototype is equal to 1 - f_k. The compactness along one numeric attribute is equal to the mean absolute deviation of the attribute's values from their mean. The values of numeric attributes are normalized in advance. Note that our compactness heuristic is actually an incompactness heuristic, since smaller values mean more compact sets of examples.

3.3 Weighted covering

The standard covering algorithm removes the examples covered by a rule from the training set in each iteration. As a consequence, subsequent rules are constructed on smaller example subsets, which can be improperly biased and can have small coverage. To overcome these shortcomings we employ the weighted covering algorithm [11]. The difference is that once an example is covered by a new rule, it is not removed from the training set; instead, its weight is decreased. As a result, an already covered example is less likely to be covered in subsequent iterations. We use the additive weighting scheme, which means that the weight of an example after being covered m times is equal to 1/(1 + m). Finally, when an example has been covered more than a predefined number of times (in our experiments five times), it is completely removed from the training set.
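The compactness heuristic and the additive weighting scheme above can be sketched as follows (again our own reading of the description, in Python, with hypothetical names; per-dimension compactnesses are simply averaged here, whereas the paper describes a weighted sum over normalized dimensions):

    from collections import Counter

    def dimension_compactness(column, numeric):
        """Average distance of the values in one dimension to that
        dimension's prototype; assumes numeric values are already
        normalized to [0, 1]."""
        n = len(column)
        if numeric:
            mean = sum(column) / n
            return sum(abs(v - mean) for v in column) / n   # mean abs. deviation
        freq = Counter(column)                              # prototype (f_1, ..., f_K)
        return sum(1 - freq[v] / n for v in column) / n     # distance 1 - f_k

    def compactness(examples, attr_dims, target_dims, numeric, tau=0.9):
        """Heuristic d = (1 - tau) * d_A + tau * d_T; smaller values mean
        more compact (hence better) rules.  tau = 0.9 matches the
        paper's experiments."""
        def part(dims):
            return sum(dimension_compactness([e[j] for e in examples],
                                             numeric[j])
                       for j in dims) / len(dims)
        return (1 - tau) * part(attr_dims) + tau * part(target_dims)

    def covering_weight(times_covered):
        """Additive weighting: an example covered m times weighs
        1/(1 + m); the paper drops examples after five coverings."""
        return 1.0 / (1 + times_covered)

    # One numeric attribute, one nominal attribute, one nominal target:
    rows = [(0.2, "a", "yes"), (0.4, "a", "yes"), (0.9, "b", "no")]
    print(compactness(rows, attr_dims=[0, 1], target_dims=[2],
                      numeric=[True, False, False], tau=0.9))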
3.4 Probabilistic classification

As already mentioned, the original CN2 algorithm can induce ordered or unordered rules. In the case of ordered rules (a decision list), classification is straightforward: we scan the rules one by one, and whichever rule fires first on a given example is used for prediction. If no rule fires, the default rule is used. When classifying with unordered rules, CN2 collects the class distributions of all rules that fire on an example and uses them for weighted voting. We use the same probabilistic classification scheme, even though our unordered rules are not induced for each possible class value separately.

4 Related work

Predictive modelling and clustering are regarded as quite different tasks. While there are many approaches addressing each of predictive modelling and clustering, few approaches look at both or try to relate them. A different viewpoint, taken by Langley [10], is that predictive modelling and clustering have many similarities; this has motivated some recent research on combining prediction and clustering.

The approach presented in this paper is closely related to clustering trees [2], which also address the task of predictive clustering. The systems TILDE [2] and CLUS [3] use a modified top-down decision tree induction algorithm to construct clustering trees (which can predict the values of more than one target variable simultaneously). So far, however, the distances used in the TILDE and CLUS systems have considered attributes or classes separately, but not both together, even though the idea was presented in [1].

Our approach uses a rule-based representation for predictive clustering. As such, it is closely related to approaches for rule induction, and among these in particular CN2 [4]. However, it extends rule induction to the more general task of multi-objective prediction. While some work exists on multi-objective classification with decision trees (e.g., [16]), the authors are not aware of any work on rule induction for multi-objective classification. Also, little work exists on rule-based regression (some recent examples come from the area of ILP, e.g., FORS [8]), let alone rule-based multi-objective regression (or multi-objective prediction in general, with mixed continuous and discrete targets).

Related to rule induction is subgroup discovery [11], which tries to find and describe interesting groups of examples. While subgroup discovery algorithms are similar to rule induction ones, they have introduced interesting innovations, including the weighted covering approach used in our system.

Another related approach to combining clustering and classification is itemset constrained clustering [14, 15]. Here the attributes describing each example are separated into two groups, called feature items and objective attributes. Clustering is done on the objective attributes, but only clusters which can be described in terms of frequent itemsets (using the feature items attributes) are constructed. As a result, each cluster can be classified by a corresponding frequent itemset. As in our approach, itemset classified clustering tries to find groups of examples with small variance of the objective attributes. As compared to itemset
classified clustering, our approach allows both discrete (and not only binary) attributes/items and continuous variables on the feature/attribute side, as well as on the objective/target side. Itemset constrained clustering is also related to subgroup discovery, as it tries to find interesting groups of examples, rather than a set of (overlapping) clusters that cover all examples.

Table 1. The attributes of the river water quality data set.

Independent attributes (physical & chemical properties, numeric):
water temperature, alkalinity (pH), electrical conductivity, dissolved O2, O2 saturation, CO2 concentration, total hardness, NO2 concentration, NO3 concentration, NH4 concentration, PO4 concentration, Cl concentration, SiO2 concentration, chemical oxygen demand (KMnO4), chemical oxygen demand (K2Cr2O7), biological oxygen demand (BOD).

Target attributes (taxa presences/absences, nominal (0,1)):
Cladophora sp., Gongrosira incrustans, Oedogonium sp., Stigeoclonium tenue, Melosira varians, Nitzschia palea, Audouinella (Chantransia) chalybea, Erpobdella octoculata, Gammarus fossarum, Baetis rhodani, Hydropsyche sp., Rhyacophila sp., Simulium sp., Tubifex sp.

5 Experiments

The current implementation of predictive clustering rules has been tested on the water quality domain with multiple target attributes. Two sets of experiments have been performed. First, we tested the performance of our method when predicting multiple target attributes at once, in comparison to the single target prediction task. In the second set of experiments we investigated the influence of the target weighting parameter (τ) on the accuracy and compactness of the induced rules.

5.1 Data set

The data set used in our experiments comprises biological and chemical data that were collected through regular monitoring of rivers in Slovenia. The data come from the Environmental Agency of the Republic of Slovenia, which performs water quality monitoring for most Slovenian rivers and maintains a database of water quality samples. The data cover a six-year period, from 1990 to 1995, and have been previously used in [5].
Biological samples are taken twice a year, once in summer and once in winter, while physical and chemical analyses are performed several times a year for each sampling site. The physical and chemical samples include the measured values of 15 different parameters. The biological samples include a list of all taxa (plant and animal species) present at the sampling site. All the attributes of the data set are listed in Table 1. In total, 1060 water samples are available in the data set. In our experiments we have considered the physical and chemical properties as independent attributes, and the presences/absences of taxa as target attributes.

Table 2. The default accuracy and the accuracies of predictive clustering rules (PCR) used for multiple target prediction of all taxa together, for predicting plant and animal taxa separately, and for single target prediction of each taxon separately. The last two columns present the accuracies of predictive clustering trees (PCT) as taken from [6]; the experimental setting for these is not the same as in the case of the other experiments. (Columns: Target attribute; Default; PCR: All, Plants/Animals, Indiv.; PCT: Plants/Animals, Indiv. Rows: the 14 taxa of Table 1 and the average accuracy. The numeric entries were not preserved in this transcription.)

5.2 Results

The first set of experiments was performed in order to test the appropriateness of predictive clustering rules for multiple target prediction. In all experiments the minimum number of examples covered by a rule was 20, and the weight of the target attributes (τ) was set to 0.9. The results of 10-fold cross-validation can be seen in Table 2. In the first column the default accuracies for each target attribute (taxon) are given; in the second, the accuracies of the multiple target model predicting all 14 class values together; and in the third column, the accuracies of two multiple target models, one predicting the plant taxa together (first seven classes) and another predicting the animal taxa together
(last seven classes). The fourth column presents the accuracies of single target attribute models; one model was built for each taxon separately. The last two columns present the accuracies of predictive clustering trees (PCT) for multiple and single target prediction, as taken from [6]. The last row in the table gives the average accuracies across all 14 target attributes. Looking at these average accuracies, we can see that the performance of the model predicting all classes together is only slightly worse than the performance of the single target models, while the performance of the models predicting plants and animals together is somewhere in between. When comparing predictive clustering rules to predictive clustering trees, the performance of the latter is somewhat better, but the results are not directly comparable because of the different experimental setup used when evaluating PCTs.

The task of the second set of experiments was to evaluate the influence of the target weighting parameter (τ) on the accuracy and compactness of the induced rules (Table 3). The rules were induced for predicting all taxa together with six different values of the τ parameter.

Table 3. The accuracy of predictive clustering rules used for multiple target prediction of all taxa together with different target attribute weightings (τ). (Columns: six values of τ. Rows: the 14 taxa of Table 1, the average accuracy, the average compactness, and the average coverage. The numeric entries were not preserved in this transcription.)

At the bottom of the table we give the average accuracies of 10-fold cross-validation, the average compactness of the subsets of examples covered by the rules in each model, and the average coverage of the rules in each model. The rules induced with a larger weighting of the non-target attributes (smaller τ) are on average more compact (a smaller number for compactness means more compact subsets) and on average cover more examples. This trend is quite clear for values of τ from 1.0 to 0.8 or 0.7. Small weights for the non-targets slightly
increase both overall compactness and accuracy, while large weights increase overall compactness at the expense of accuracy.

6 Conclusions and further work

In this paper, we have considered the data mining task of predictive clustering. This is a very general task that contains many features of (and thus to a large extent generalizes over) the tasks of predictive modelling and clustering. While this task has been considered before, we have defined it both more precisely and in a more general form (i.e., considering distances on both target and attribute variables, and considering clustering rules in addition to trees).

We have introduced the notion of clustering rules and focused on the task of learning predictive clustering rules for multi-objective prediction. The task of inducing PCRs generalizes the task of rule induction, extending it to multi-objective classification, regression, and prediction in general. It also generalizes some forms of distance-based clustering, in particular itemset constrained clustering. Given this, we believe this task is important for constraint-based data mining and inductive databases. Namely, IDBs are the most promising approach to finding a general framework for data mining: bringing the two most common data mining tasks a step closer is a step in this direction. Also, constraint-based clustering is certainly an under-researched topic in constraint-based data mining.

We have implemented a preliminary version of a system for learning PCRs for multi-objective classification. We have also performed some preliminary experiments on a real-world dataset. The results show that a single rule set for MOC can be as accurate as a collection of rule sets for the individual prediction of each target. The accuracies are also comparable to those of predictive clustering trees. Experiments varying the weight of target vs. non-target attributes in the compactness heuristic used in the search for rules show that small weights for the non-targets slightly increase both overall compactness and accuracy, while large weights increase overall compactness at the expense of accuracy.

Note, however, that many more experiments are necessary to evaluate the proposed paradigm and implementation. These would include experiments on additional datasets, covering both single-objective and multi-objective prediction. In the latter case, classification, regression, and a mixture thereof should be considered. Also, a comparison to other approaches to constrained clustering would be in order.

Other directions for further work concern the further development of the PCR paradigm and its implementation. At present, our implementation only considers multi-objective classification, but it can easily be extended to regression problems, and also to mixed classification/regression problems. Currently, the heuristic guiding the search for rules does not take the number of covered examples into consideration. Consequently, the construction of overly specific rules can only be prevented by setting the minimum number of examples covered by a rule. Adding a coverage-dependent part to the heuristic would enable the
induction of compact rules with sufficient coverage. Another possibility is the use of some sort of significance testing, analogous to the significance testing of the target variable distribution employed by CN2. Finally, the selection of the weights for calculating the distance measure (and the compactness heuristic) is an open issue. One side of this is the weighting of target vs. non-target variables. Another is the assignment of relevance-based weights to the attributes: while this has been considered for single-objective classification, we need to extend it to multi-objective prediction.

References

1. Blockeel, H. (1998): Top-down induction of first order logical decision trees. PhD thesis, Department of Computer Science, Katholieke Universiteit Leuven.
2. Blockeel, H., De Raedt, L., and Ramon, J. (1998): Top-down induction of clustering trees. Proceedings of the 15th International Conference on Machine Learning, pages 55-63, Morgan Kaufmann.
3. Blockeel, H. and Struyf, J. (2002): Efficient algorithms for decision tree cross-validation. Journal of Machine Learning Research, 3(Dec):621-650, Microtome Publishing.
4. Clark, P. and Niblett, T. (1989): The CN2 induction algorithm. Machine Learning, 3(4):261-283, Kluwer.
5. Džeroski, S., Demšar, D., and Grbović, J. (2000): Predicting chemical parameters of river water quality from bioindicator data. Applied Intelligence, 13(1):7-17.
6. Džeroski, S., Blockeel, H., and Grbović, J. (2001): Predicting river water communities with logical decision trees. Presented at the Third European Ecological Modelling Conference, Zagreb, Croatia.
7. Flach, P. and Lavrač, N. (1999): Rule induction. In Intelligent Data Analysis, eds. Berthold, M. and Hand, D. J., Springer.
8. Karalič, A. and Bratko, I. (1997): First order regression. Machine Learning, 26:147-176, Kluwer.
9. Kaufman, L. and Rousseeuw, P. J. (1990): Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
10. Langley, P. (1996): Elements of Machine Learning. Morgan Kaufmann.
11. Lavrač, N., Kavšek, B., Flach, P., and Todorovski, L. (2004): Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5(Feb):153-188, Microtome Publishing.
12. Michalski, R. S. (1980): Knowledge acquisition through conceptual clustering: A theoretical framework and algorithm for partitioning data into conjunctive concepts. International Journal of Policy Analysis and Information Systems, 4:219-244.
13. Quinlan, J. R. (1993): C4.5: Programs for Machine Learning. Morgan Kaufmann.
14. Sese, J. and Morishita, S. (2004): Itemset classified clustering. Proceedings of the Eighth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 04), Springer.
15. Sese, J., Kurokawa, Y., Kato, K., Monden, M., and Morishita, S. (2004): Constrained clusters of gene expression profiles with pathological features. Bioinformatics.
16. Suzuki, E., Gotoh, M., and Choki, Y. (2001): Bloomy decision tree for multi-objective classification. Proceedings of the Fifth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 01), Springer.