Incremental Learning

Abdelhamid Bouchachia
Department of Informatics, University of Klagenfurt
Universitaetsstr, Klagenfurt, 9020 Austria

INTRODUCTION

Data mining and knowledge discovery are about creating a comprehensible model of the data. Such a model may take different forms, ranging from simple association rules to complex reasoning systems. One of the fundamental properties such a model has to fulfill is adaptivity: the process of knowledge extraction should remain continually maintainable and open to future updates as new data become available. We refer to this process as knowledge learning.

Knowledge learning systems are traditionally built from data samples in an off-line, one-shot experiment. Once the learning phase is completed, the learning system can neither learn further knowledge from new data nor update itself in the future.

In this chapter, we consider the problem of incremental learning (IL). We show how, in contrast to off-line or batch learning, IL learns knowledge, be it symbolic (e.g., rules) or sub-symbolic (e.g., numerical values), from data that evolves over time. The basic idea motivating IL is that, as new data points arrive, new knowledge elements may be created and existing ones may be modified, allowing the knowledge base (respectively, the system) to evolve over time. Thus, the acquired knowledge becomes self-corrective in light of new evidence. This update is of paramount importance to ensure the adaptivity of the system. However, it should be meaningful (capturing only the interesting events brought by the arriving data) and sensitive (safely ignoring unimportant events).

Perceptually, IL is a fundamental problem of cognitive development. Indeed, the perceiver usually learns how to make sense of its sensory inputs in an incremental

manner via a filtering procedure. In this chapter, we outline the background of IL from two perspectives, machine learning and data mining, before highlighting our own IL research, the challenges, and the future trends of IL.

BACKGROUND

IL is a key issue in applications where data arrives over long periods of time and/or where storage capacity is very limited. Most of the knowledge learning literature reports on learning models that are built in a one-shot experiment. Once the learning stage is completed, the induced knowledge is no longer updated. Thus, the performance of the system depends heavily on the data used during the learning (knowledge extraction) phase, and shifts of trends in the arriving data cannot be accounted for.

Algorithms with an IL ability are of increasing importance in many innovative applications, e.g., video streams, stock market indexes, intelligent agents, and user profile learning. Hence, there is a need to devise learning mechanisms that are capable of accommodating new data incrementally while the system remains in use. This problem has been studied in the framework of adaptive resonance theory (Carpenter et al., 1991), which was proposed to deal efficiently with the stability-plasticity dilemma. Formally, a learning algorithm is totally stable if it keeps the acquired knowledge in memory without any catastrophic forgetting; however, it is not required to accommodate new knowledge. Conversely, a learning algorithm is completely plastic if it is able to continually learn new knowledge without any requirement to preserve the knowledge previously learned. Resolving the dilemma means accommodating new data (plasticity) without forgetting (stability) by generating knowledge

elements over time whenever the new data conveys new knowledge elements worth considering.

Basically, there are two schemes to accommodate new data. Retraining the algorithm from scratch using both old and new data is known as the revolutionary strategy. In contrast, an evolutionary strategy continues to train the algorithm using only the new data (Michalski, 1985). The first scheme fulfills only the stability requirement, whereas the second is a typical IL scheme that is able to fulfill both stability and plasticity. The goal is to make a tradeoff between the stability and plasticity ends of the learning spectrum, as shown in Fig. 1.

[Figure 1: Learning spectrum. Incremental learning lies between the end favoring stability and the end favoring plasticity.]

As noted in (Polikar et al., 2000), there are many approaches referring to some aspects of IL. They exist under different names, such as on-line learning, constructive learning, lifelong learning, and evolutionary learning. Therefore, a definition of IL turns out to be vital:

- IL should be able to accommodate plasticity by learning knowledge from new data. This data can refer either to the already known structure or to a new structure of the system.
- IL can use only new data and should not have access at any time to the previously used data to update the existing system.
- IL should be able to observe the stability of the system by avoiding forgetting.

It is worth noting that IL research flows in three directions: clustering, classification, and association rule mining. In the context of classification and clustering, many IL approaches have been introduced. A typical incremental approach is discussed in (Parikh & Polikar, 2007), which consists of combining an ensemble of multilayer perceptron (MLP) networks to

accommodate new data. Similar work was done in (Chakraborty & Pal, 2003), also using MLPs. Note that stand-alone MLPs, like many other neural networks, need retraining in order to learn from new data. Other IL algorithms were proposed in (Fritzke, 1994) and in (Domeniconi & Gunopulos, 2001). The former is based on radial basis function (RBF) networks, while the latter aims at constructing incremental support vector machine classifiers.

Actually, there exist four neural models that are inherently incremental: (i) adaptive resonance theory (ART) (Carpenter et al., 1991), (ii) min-max neural networks (Simpson, 1992), (iii) nearest generalized exemplar (Salzberg, 1991), and (iv) the growing neural gas model (Fritzke, 1995). The first three models aim at learning hyper-rectangle categories, while the last one aims at building point-prototyped categories.

It is important to mention that there exist many classification approaches that are referred to as IL approaches and that rely on neural networks. These range from retraining on misclassified samples to various weighting schemes (Freeman & Saad, 1997; Grippo, 2000). All of them are about sequential learning, where input samples are presented to the algorithm sequentially but iteratively. However, sequential learning works only in closed-ended environments, where the classes to be learned have to be reflected by the readily available training data; more importantly, prior knowledge can also be forgotten if the classes are unbalanced.

In contrast to sub-symbolic learning, few authors have studied incremental symbolic learning, where the problem is to incrementally learn simple classification rules (Maloof & Michalski, 2004; Reinke & Michalski, 1988; Utgoff, 1988).

In addition, the concept of incrementality has been discussed in the context of association rule mining (ARM).
The goal of ARM is to generate all association rules of the form X → Y whose support and confidence are greater than a user-specified minimum support and minimum confidence, respectively. The motivation underlying incremental ARM stems from the fact that databases grow over time. The association rules mined need to be updated as new items are inserted in the database. Incremental ARM aims at using only the incremental part to infer new rules. However, this is usually done by processing the incremental part separately and scanning the older database if necessary. Some of the algorithms proposed are FUP (Cheung et al., 1996), temporal windowing (Rainsford et al., 1997), and DELI (Lee & Cheung, 1997).

In contrast to static databases, IL is more visible in data stream ARM, where the nature of the data imposes an incremental treatment. Data usually arrives continually in the form of high-speed streams. IL is particularly relevant for online streams, since data is discarded as soon as it has been processed. Many algorithms have been introduced to maintain association rules (Charikar et al., 2004; Chang & Lee, 2004; Domingos & Hulten, 2000; Giannella et al., 2003; Lin et al., 2005; Yu et al., 2004). Furthermore, many classification and clustering algorithms, which are not fully incremental, have been developed in the context of stream data (Aggarwal et al., 2004; Guha et al., 2000; Last, 2002).

FOCUS

IL has a large spectrum of investigation facets. In the following, we focus on classification and clustering, which are key issues in many domains such as data mining, pattern recognition, knowledge discovery, and machine learning. In particular, we focus on two research avenues which we have investigated: (i) incremental fuzzy classifiers (IFC) (Bouchachia & Mittermeir, 2006) and (ii) incremental learning by function decomposition (ILFD) (Bouchachia, 2006a).

The motivation behind IFC is to infer knowledge in the form of fuzzy rules from data that

evolves over time. To accommodate IL, appropriate mechanisms are applied in all steps of the fuzzy system construction:

(1) Incremental supervised clustering: Given a labeled data set, the first step is to cluster this data with the aim of achieving high purity and separability of clusters. To do that, we have introduced a clustering algorithm that is incremental and supervised. These two characteristics are vital for the whole process. The resulting labeled cluster prototypes are projected onto each feature axis to generate fuzzy partitions.

(2) Fuzzy partitioning and accommodation of change: Fuzzy partitions are generated in two steps. Initially, each cluster is mapped onto a triangular partition. Then, in order to optimize the shape of the partitions and the number and complexity of the rules, these triangular partitions are aggregated. As new data arrives, the partitions are systematically updated without referring to the previously used data. The consequents of the rules are then updated accordingly.

(3) Incremental feature selection: To find the most relevant features (which results in compact and transparent rules), an incremental version of Fisher's interclass separability criterion is devised. As new data arrives, some features may be substituted for new ones in the rules. Hence, the rule premises are dynamically updated. At any time in the life of a classifier, the rule base should reflect the semantic contents of the already used data. To the best of our knowledge, there is no previous work on feature selection algorithms that observe the notion of incrementality.

In another research axis, IL has been thoroughly investigated in the context of neural networks. In (Bouchachia, 2006a; Bouchachia, 2006b) we have proposed a novel IL algorithm

based on function decomposition (ILFD) that is realized by a neural network. ILFD uses clustering and vector quantization techniques to deal with classification tasks. The main motivation behind ILFD is to enable on-line classification of data lying in different regions of the space, which makes it possible to generate non-convex partitions and, more generally, disconnected partitions (not lying in the same contiguous space). Hence, each class can be approximated by a sufficient number of categories centered around their prototypes. Furthermore, ILFD differs from the aforementioned learning techniques (Sec. Background) with respect to the following aspects:

- Most of those techniques rely on geometric shapes to represent the categories, such as hyper-rectangles and hyper-ellipses, whereas the ILFD approach is not explicitly based on a particular shape, since one can use different types of distances to obtain different shapes.
- Usually, there is no explicit mechanism (except in the neural gas model) to deal with redundant and dead categories. The ILFD approach uses two procedures to get rid of such categories. The first, called the dispersion test, aims at eliminating redundant category nodes. The second, called the staleness test, aims at pruning categories that become stale.
- While all of those techniques modify the position of the winner when presenting the network with a data vector, the learning mechanism in ILFD consists of reinforcing the winning category from the class of the data vector and pushing away the second winner from a neighboring class to reduce the overlap between categories.
- While the other approaches are either self-supervised or need to match the input with all existing categories, ILFD compares the input only with categories having the same label

as the input in the first place and then with categories from other labels distinctively.
- ILFD can also deal with the problem of partially labeled data. Indeed, even unlabeled data can be used during the training stage.

Moreover, the characteristics of ILFD can be compared to those of other models such as fuzzy ARTMAP (FAM), min-max neural networks (MMNN), nearest generalized exemplar (NGE), and growing neural gas (GNG), as shown in Tab. I (Bouchachia et al., 2007).

TABLE I: Characteristics of some IL algorithms

Characteristics             FAM       MMNN      NGE       GNG         ILFD
Online learning             Y         Y         Y         Y           Y
Type of prototypes          Hyperbox  Hyperbox  Hyperbox  Graph node  Cluster center
Generation control          Y         Y         Y         Y           Y
Shrinking of prototypes     N         Y         Y         U           U
Deletion of prototypes      N         N         N         Y           Y
Overlap of prototypes       Y         N         N         U           U
Growing of prototypes       Y         Y         Y         U           U
Noise resistance            U         Y         U         U           U
Sensitivity to data order   Y         Y         Y         Y           Y
Normalization               Y         Y         Y/N       N           Y/N

Legend: Y: yes, N: no, U: unknown/undefined

In our research, we have tried to stick to the spirit of IL. To put it clearly, an IL algorithm, in our view, should fulfill the following characteristics:

- Ability to learn over a lifetime and to deal with plasticity and stability
- Old data is never used in subsequent stages
- No prior knowledge about the (topological) structure of the system is needed

- Ability to incrementally tune the structure of the system
- No prior knowledge about the statistical properties of the data is needed
- No prior knowledge about the number of existing classes or the number of categories per class is needed, and no prototype initialization is required

FUTURE TRENDS

The problem of incrementality remains a key aspect in learning systems. The goal is to achieve adaptive systems that are equipped with self-correction and evolution mechanisms. However, many issues, which can be seen as shortcomings of existing IL algorithms, remain open and are therefore worth investigating:

Order of data presentation: All of the proposed IL algorithms suffer from sensitivity to the order of data presentation. Usually, the inferred classifiers are biased by this order. Indeed, different presentation orders result in different classifier structures and therefore in different accuracy levels. Since independence from the presentation order is usually a desired property, it is very relevant to look closely at developing algorithms whose behavior does not depend on it.

Category proliferation: In the context of clustering and classification, category proliferation refers to the problem of generating a large number of categories. This number is in general related to the granularity of the categories: a fine category size implies a large number of categories, and a larger size implies fewer categories. Usually, each IL algorithm has a parameter that controls the process of category generation. The problem here is to determine the appropriate value of such a parameter. This is clearly related to the problem of plasticity, which plays a central role in IL algorithms.

Hence, the question: how can we distinguish between rare events and outliers? What value of the controlling parameter allows such a distinction to be made? This remains a difficult issue.

Number of parameters: One of the most important shortcomings of the majority of IL algorithms is the large number of user-specified parameters involved. It is usually hard to find the optimal values of these parameters. Furthermore, they are very sensitive to the data: in general, to obtain high accuracy, the setting has to change from one data set to another. In this context, there is a real need to develop algorithms that do not depend heavily on many parameters or that can optimize such parameters themselves.

Self-consciousness and self-correction: The problem of distinguishing between noisy input data and rare events is crucial not only for category generation but also for correction. In current approaches, IL systems cannot correct wrong decisions made previously, because each sample is treated once and any decision about it has to be taken at that time. Now assume that, at processing time, a sample x was considered noise while in reality it was a rare event, and that at a later stage the same rare event is discovered by the system. In the ideal case, the system should recall that the sample x has to be reconsidered. Current algorithms are not able to adjust the system by re-examining old decisions. Thus, IL systems have to be equipped with some memory in order to become smart enough.

Data drift: One of the most difficult questions worth looking at is related to drift. Little, if any, attention has been paid to the application and evaluation of the aforementioned IL algorithms in the context of drifting data, although the change of

environment is one of the crucial assumptions of all these algorithms. Furthermore, there are many publicly available data sets for testing systems in a static setting, but there are very few benchmark data sets for dynamically changing problems, and those that exist are usually artificial. It is very important for the IL community to have a repository, similar to the UCI repository at Irvine, in order to evaluate the proposed algorithms in evolving environments.

As a final aim, research in the IL framework has to focus on incremental but stable algorithms that are transparent, self-corrective, less sensitive to the order of data arrival, and whose parameters are less sensitive to the data itself.

CONCLUSION

Building adaptive systems that are able to deal with nonstandard settings of learning is one of the key research avenues in machine learning, data mining, and knowledge discovery. Adaptivity can take different forms, but the most important one is certainly incrementality. Such systems are continuously updated as more data becomes available over time. The appealing features of IL, if taken into account, will help integrate intelligence into knowledge learning systems. In this chapter, we have tried to outline the current state of the art in this research area and to show the main problems that remain unsolved and require further investigation.

REFERENCES

Aggarwal, C., Han, J., Wang, J., & Yu, P. (2004). On demand classification of data streams. International Conference on Knowledge Discovery and Data Mining, pages:

Bouchachia, A. & Mittermeir, R. (2006). Towards fuzzy incremental classifiers. Soft Computing, 11(2): , January
Bouchachia, A., Gabrys, B. & Sahel, Z. (2007). Overview of some incremental learning algorithms. To appear in Proc. of the 16th IEEE International Conference on Fuzzy Systems, IEEE Computer Society, 2007.
Bouchachia, A. (2006a). Learning with incrementality. The 13th International Conference on Neural Information Processing, LNCS 4232, pages:
Bouchachia, A. (2006b). Incremental learning via function decomposition. The 5th International Conference on Machine Learning and Applications, pages: 63-68, IEEE Computer Society,
Carpenter, G., Grossberg, S., & Rosen, D. (1991). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks, 4(6):
Chakraborty, D., & Pal, N. (2003). A novel training scheme for multilayered perceptrons to realize proper generalization and incremental learning. IEEE Transactions on Neural Networks, 14(1):1-14.
Chang, J., & Lee, W. (2004). A sliding window method for finding recently frequent itemsets over online data streams. Journal of Information Science and Engineering, 20(4):
Charikar, M., Chen, K., & Farach-Colton, M. (2004). Finding frequent items in data streams. International Colloquium on Automata, Languages and Programming, pages:
Cheung, D., Han, J., Ng, V., & Wong, C. (1996). Maintenance of discovered association rules in large databases: An incremental updating technique. IEEE International Conference on Data Mining,
Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. The 6th ACM International Conference on Knowledge Discovery and Data Mining, pages:

Domeniconi, C. & Gunopulos, D. (2001). Incremental support vector machine construction. International Conference on Data Mining, pages:
Freeman, J. & Saad, D. (1997). On-line learning in radial basis function networks. Neural Computation, 9:
Fritzke, B. (1994). Fast learning with incremental RBF networks. Neural Processing Letters, 1(1):25.
Fritzke, B. (1995). A growing neural gas network learns topologies. Advances in Neural Information Processing Systems, pages
Giannella, C., Han, J., Pei, J., Yan, X., & Yu, P. (2003). Mining frequent patterns in data streams at multiple time granularities. Workshop on Data Mining: Next Generation Challenges and Future Directions, AAAI.
Grippo, L. (2000). Convergent on-line algorithms for supervised learning in neural networks. IEEE Trans. on Neural Networks, 11:
Guha, S., Mishra, N., Motwani, R., & O'Callaghan, L. (2000). Clustering data streams. IEEE Symposium on Foundations of Computer Science, pages:
Last, M. (2002). Online classification of non-stationary data streams. Intelligent Data Analysis, 6(2):
Lee, S., & Cheung, D. (1997). Maintenance of discovered association rules: when to update? SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.
Lin, C., Chiu, D., Wu, Y., & Chen, A. (2005). Mining frequent itemsets from data streams with a time-sensitive sliding window. SIAM International Conference on Data Mining.
Maloof, M., & Michalski, R. (2004). Incremental learning with partial instance memory. Artificial Intelligence, 154:
Michalski, R. (1985). Knowledge repair mechanisms: evolution vs. revolution. International

Machine Learning Workshop, pages
Parikh, D. & Polikar, R. (2007). An ensemble-based incremental learning approach to data fusion. IEEE Transactions on Systems, Man and Cybernetics, 37(2):
Polikar, R., Udpa, L., Udpa, S. & Honavar, V. (2000). Learn++: An incremental learning algorithm for supervised neural networks. IEEE Trans. on Systems, Man, and Cybernetics, 31(4):
Rainsford, C., Mohania, M., & Roddick, J. (1997). A temporal windowing approach to the incremental maintenance of association rules. International Database Workshop, Data Mining, Data Warehousing and Client/Server Databases, pages:
Reinke, R., & Michalski, R. (1988). Machine Intelligence, chapter: Incremental learning of concept descriptions: a method and experimental results, pages
Salzberg, S. (1991). A nearest hyperrectangle learning method. Machine Learning, 6:
Simpson, P. (1992). Fuzzy min-max neural networks. Part 1: Classification. IEEE Trans. Neural Networks, 3(5):
Utgoff, P. (1988). ID5: An incremental ID3. International Conference on Machine Learning, pages
Yu, J., Chong, Z., Lu, H., & Zhou, A. (2004). False positive or false negative: mining frequent itemsets from high speed transactional data streams. International Conference on Very Large Databases, pages:

KEY TERMS AND THEIR DEFINITIONS

Knowledge learning: The process of automatically extracting knowledge from data.

Incrementality: The characteristic of an algorithm that is capable of processing data which arrives sequentially over time, in a stepwise manner, without referring to the previously seen data.

Stability: A learning algorithm is totally stable if it keeps the acquired knowledge in memory without any catastrophic forgetting.

Plasticity: A learning algorithm is completely plastic if it is able to continually learn new knowledge without any requirement on preserving previously seen data.

Data drift: Unexpected change over time of the data values (according to one or more dimensions).

Keywords: online learning, incrementality, adaptivity, model evolution, stability-plasticity
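
As a rough illustration of the terms incrementality, stability, and plasticity, and of the category proliferation issue raised under FUTURE TRENDS, the following Python sketch processes a stream one sample at a time and never revisits old samples. It is not ILFD or any of the cited algorithms; the class name PrototypeLearner, the radius parameter, and the toy stream are assumptions made purely for illustration (math.dist requires Python 3.8 or later).

```python
import math


class PrototypeLearner:
    """Hypothetical incremental nearest-prototype classifier (illustration only)."""

    def __init__(self, radius):
        self.radius = radius      # assumed category-generation threshold
        self.prototypes = []      # each entry: [center, label, sample_count]

    def learn_one(self, x, label):
        """Update the model from a single labeled sample, then forget the sample."""
        best, best_dist = None, math.inf
        for proto in self.prototypes:
            if proto[1] != label:
                continue          # only compare with categories of the same label
            d = math.dist(proto[0], x)
            if d < best_dist:
                best, best_dist = proto, d
        if best is None or best_dist > self.radius:
            # Plasticity: the sample is too far from existing knowledge,
            # so a new category is created for it.
            self.prototypes.append([list(x), label, 1])
        else:
            # Stability: the nearest category is only nudged (running mean),
            # so earlier samples keep their influence.
            best[2] += 1
            rate = 1.0 / best[2]
            best[0] = [c + rate * (xi - c) for c, xi in zip(best[0], x)]

    def predict(self, x):
        """Return the label of the nearest prototype, or None if nothing was learned."""
        if not self.prototypes:
            return None
        return min(self.prototypes, key=lambda p: math.dist(p[0], x))[1]


# Toy stream (made-up data): each sample is seen exactly once.
stream = [((0.0, 0.0), "a"), ((0.2, 0.0), "a"), ((5.0, 5.0), "b"),
          ((5.2, 5.0), "b"), ((0.0, 4.0), "a")]


def train(radius):
    learner = PrototypeLearner(radius)
    for x, label in stream:
        learner.learn_one(x, label)
    return learner


coarse = train(radius=1.0)   # larger categories, fewer of them
fine = train(radius=0.1)     # finer categories, more of them
print(len(coarse.prototypes), len(fine.prototypes))
```

With this toy stream, the coarse setting ends up with three categories and the fine setting with five, one per sample. A single threshold therefore trades compactness against plasticity, and in this sketch a rare event and an outlier are treated identically: both simply create a new category.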


More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp

More information

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Selection of Optimal Discount of Retail Assortments with Data Mining Approach Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
