Data Quality Mining: Employing Classifiers for Assuring consistent Datasets


Fabian Grüning
Carl von Ossietzky Universität Oldenburg, Germany

Abstract: Independent of the concrete definition of the term data quality, consistency always plays a major role. There are two main tasks when dealing with the data quality of a database: first, the data quality has to be measured, and second, if necessary, it has to be improved. A classifier can serve both purposes with regard to consistency: the distance between the classified value and the stored value is used for measuring, and the classified value itself is used for correction.

Keywords: data mining, data quality, classifiers, ontology, utilities

1 Introduction

A good introduction to the main topics of the field of data quality can be found in (Scannapieco et al. 2005), where a motivation is given and relevant data quality dimensions are highlighted. Having discussed, in Grüning (2006), an ontology that describes such a definition and the semantic integration of data quality aspects into given data schemas using an ontological approach, we now turn to the application of data quality mining algorithms to estimate the consistency of a given data set and to suggest correct values where necessary. This is one of the four identified algorithms needed for the holistic data quality management approach that is being developed in a project funded by a major German utility. One of the project's goals is to provide an ICT infrastructure for managing the upcoming power plant mix, which consists of more decentralized, probably regenerative, and sustainable power plants (e.g. wind power and biogas plants) and combined heat and power generation together with the conventional power plants. As many decisions for controlling relevant parts of the system are made automatically, good data quality is vital for the health of the overall system, because false data leads to wrong decisions that may worsen the system's overall performance.

The system is used for both day-to-day business and strategic decisions. An example of the former is the regulation of conventional power plants with the wind forecast in mind, in order to integrate sustainable power plants such as wind power plants optimally into the distribution grid. A strategic decision might be where to build another wind park, taking statistical series of wind measurements into account.

The system contains customer data as well as technical data about the distribution grid and the power plants. The data is critical for the company as well as for the state, as it contains information about vital distribution systems, so concrete information about the data cannot be given in this paper. The example given later in this paper therefore contains only a simple list of dates, and the paper focuses on the concepts of the approach.

The paper is structured as follows: First, we give a short summary of the term data quality mining and of the dimensions belonging to it, with a focus on consistency. We then justify the choice of a concrete classification algorithm that fits our needs in the examined field. The following section describes the process of using a classifier for checking consistency in data sets and gives an example of the algorithm's performance. Finally, we touch on the use of domain experts' knowledge through ontologies and close with conclusions and further work.

2 Excursus: About Data Quality Mining

The definition of data quality by Redman (1996) comprises four data quality dimensions: accuracy, consistency, currency (as a specialization of timeliness constraints), and correctness. Having discussed the semantics of those dimensions in the previous paper, we now concentrate on the realization of the algorithms for data quality mining, namely for checking and improving consistency.
The term data quality mining means that algorithms from the data mining domain are utilized for the purpose of data quality management (see Hipp et al. 2002). In this paper we discuss the consistency aspect of data quality and explain why classification algorithms are reasonably applicable for this purpose.

2.1 Consistency as a Data Quality Dimension

Whenever there is redundancy in a data set, inconsistencies might occur. A good example is the topology of a distribution grid, which consists of power supply lines and connections. An inconsistency in a relationally oriented data store leads to non-realizable topologies in which, e.g., a power supply line has only one connection or is connected more than twice. Such a data-centric problem leads to real-world problems: power flow algorithms cannot be applied to the data, so that management systems and the power grid become unstable, or short circuits cannot be detected or are reported all the time.

This example also shows that a consistency check can only be done by considering a real-world entity, here the distribution grid, as a whole, and that the verification of consistency works better the more closely the semantic correlation between real-world entities and data schemas is realized, so that relationships between the individual data properties can be utilized (see Noy and McGuinness 2001 and Grüning 2006). A particularly good approach for assuring this correlation is to use ontologies for modeling an extract of the real world, as they explicitly keep the relationships within and between the examined real-world concepts, in contrast to, for example, normalized relational data schemas.

2.2 Employing Classifiers for Consistency Checking

Classification algorithms are used to classify one data item of a data record by using the information of the remaining data items. For example, a triple of two power supply lines and one connection implies that those two lines are connected by that very connector. This is only true if the value of the connector is in fact a valid identifier for a connector. If the connector's name deviates from the pattern that identifies such a connector, a different resource is addressed and an invalid topology is represented.
Such a dependence can be learned by a classifier. If the classified value and the stored value differ, an inconsistency in the data set might have been identified, which can even be corrected by using the classified value as a clue. Classifiers are therefore usable in principle for finding and correcting inconsistencies in data sets, and the prototype described in the following sections confirms this assumption. To check every possible consistency violation, every data item has to be classified from the remaining items of its data record. It is therefore necessary to train n classifiers for a data record consisting of n data items.
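The idea of training n models for n data items can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the toy data is invented, and a nearest-neighbour predictor is used as a stand-in learner (not an SVM) so that the example runs with the standard library alone.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]
Learner = Callable[[List[List[float]], List[float]], Predictor]


def nearest_neighbour_learner(inputs: List[List[float]], targets: List[float]) -> Predictor:
    """Stand-in learner (NOT an SVM): return the target of the closest training input."""
    def predict(x: Sequence[float]) -> float:
        best = min(
            range(len(inputs)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(inputs[i], x)),
        )
        return targets[best]
    return predict


def train_per_attribute(records: List[List[float]], learner: Learner) -> List[Predictor]:
    """Train one model per attribute; each predicts that attribute from the others."""
    n = len(records[0])
    models = []
    for target in range(n):
        inputs = [[v for j, v in enumerate(r) if j != target] for r in records]
        targets = [r[target] for r in records]
        models.append(learner(inputs, targets))
    return models


def discrepancies(record: Sequence[float], models: List[Predictor]) -> List[float]:
    """Distance between the stored and the predicted value for every attribute."""
    result = []
    for target, model in enumerate(models):
        rest = [v for j, v in enumerate(record) if j != target]
        result.append(abs(model(rest) - record[target]))
    return result


# Invented, error-free toy data: the third attribute is the mean of the first two.
clean = [[a / 4, b / 4, (a + b) / 8] for a in range(5) for b in range(5)]
models = train_per_attribute(clean, nearest_neighbour_learner)

consistent = [0.5, 0.25, 0.375]
suspicious = [0.5, 0.25, 0.9]  # third attribute deliberately falsified

print(discrepancies(consistent, models))
print(discrepancies(suspicious, models))
```

Any learner with the same shape, for instance an SVM, could be passed in place of the stand-in. Note that a falsified item also serves as an input to the models for the other items, so one error may raise several distances at once.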

3 Using Support Vector Machines as a concrete Classification Algorithm

There are several different algorithms for classification tasks, such as decision trees (C4.5), rule sets (CN2), neural networks, Bayes classifiers, evolutionary algorithms, and support vector machines. A decision has to be made as to which algorithm fits the needs of the classification task in the field of checking consistency in data sets. The classifiers have in common that their use consists of two consecutive phases: In the first phase, the training phase, the algorithm learns the characteristics of the data from a representative data set. In the second phase the algorithm classifies unknown data records utilizing the knowledge gained in the first phase (see Witten and Frank 2005).

There are two main requirements a classification algorithm has to fulfill in this kind of application. First, the data set for the learning task, in which the algorithm adapts to the characteristics of the data, is relatively small in comparison to the whole data set. This is because the data set for the learning task has to be constructed out of error-free data, so that the classifier detects and reports data that differs from it. The labeling, i.e. deciding whether a data record is correct or not, has to be done by domain experts and is therefore a complex and expensive task. Second, the classification approach has to be quite general, because not much is known beforehand about the data to be classified. A well-qualified classification algorithm therefore needs only a few parameters to be configured in order to be adjusted to the classification task.
Both demands are fulfilled by support vector machines (SVMs): they cope well with even small data sets, and the configuration effort is restricted to the choice and configuration of the kernel function, which is used to map the training set's samples into the high-dimensional feature space, and to the adaptation of the coefficient that weights the costs of misclassification (see Russell and Norvig 2003 for an introduction to SVMs). A classifier's output can also be understood as a recommendation in the case where the classified value differs from the stored value. An SVM can be used as either a classification or a regression algorithm, which makes it possible to give recommendations not only for discrete but also for continuous values. Because the learning phase of the regression version does not differ much from that of the classification version, the two can be used nearly interchangeably.
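For orientation, the configuration choices named above can be written down in the standard textbook formulation. The symbols below (C for the cost coefficient, \gamma for the kernel width, \varepsilon for the tolerance of the regression variant, \phi for the feature map) are common notation and an assumption here; the paper itself does not give formulas.

```latex
% Soft-margin SVM for classification; C weights the cost of misclassification:
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\quad \xi_i \ge 0 .

% Radial basis function kernel with width parameter \gamma:
K(x, x') = \exp\!\bigl(-\gamma \lVert x - x'\rVert^{2}\bigr)

% \varepsilon-insensitive loss used by the regression variant:
L_{\varepsilon}\bigl(y, f(x)\bigr) = \max\bigl(0,\ \lvert y - f(x)\rvert - \varepsilon\bigr)
```

In this formulation the regression variant keeps the same regularized objective and kernel and only exchanges the margin constraint for the \varepsilon-insensitive loss, which is one way to see why the two learning phases are so similar.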

4 Prototype for Consistency Checking

The considerations made so far have to be verified by applying them to real-world data. For this reason a prototype was developed employing YALE (see Mierswa 2006), a learning environment that allows the orchestration of the processes that are necessary in the field of learning algorithms. As the whole approach for data quality mining is supported by a German utility, real data was available for testing purposes. We show promising results from a prototype that utilizes SVMs as the classification algorithm for checking consistency in a given data set.

4.1 Phase I: Selection

To compile the training set for the first phase of the learning algorithm, a selection from the existing data records has to be made (see figure 4.1). On the one hand, all relevant equivalence classes for the classification task have to be covered, which is addressed by stratified sampling; on the other hand, the cardinality of the training set has to be restricted because of the expensive labeling task for the training set (see section 3). The absolute sampling therefore ensures that the training set contains at most a certain number of data records.

Fig. 4.1: Selection phase

The data itself is converted to interval scale (see Bortz 2005) by one of the following algorithms: If the data is originally in nominal scale, it is mapped to [0, 1] equidistantly. Ordinal data is normalized and thereby also mapped to [0, 1], with the order of the data preserved. Strings are treated separately: they are mapped to interval scale under a given string distance function in such a way that similar strings have a smaller distance to one another than less similar strings. The results are clusters of similar strings that are normalized to [0, 1] and have thereby obtained a certain amount of semantics.

This preprocessing phase produces data sets that consist only of interval-scaled values and are therefore suitable for processing by the regression version of the SVM algorithm. We can now use the distance between the outcome of the regression algorithm and the mapped value as a degree of quality. The outcome of the regression algorithm can be used directly as a recommendation for the correct value. As a side note, we do not lose any practicability through the preprocessing of the data, as it is still possible to establish arbitrary bounds in order to use the classification version of the algorithm.

4.2 Phase II: Learning

In the learning phase the classifier adapts to the characteristics of the data set. This mainly means adjusting the SVM parameter set so that it adapts optimally to the training set. As Russell and Norvig (2003) describe, this means choosing the kernel function that adapts best to the training set and choosing the correct values for the kernel's parameters for optimal results. The learning phase consists of several steps (see figure 4.2):

1. In the preprocessing step the data sets are completed where necessary, because the classification algorithm cannot handle empty data items. This is not a problem: the values filled in are uniform, so they are not taken into account for classification because they are not characteristic of any data set.

2. The next steps are executed repeatedly to find the optimal parameter setting for the SVM: The training set is split into a learning subset and a validation subset, as the procedure of cross-validation prescribes. The first subset is used for training the classifier and the second for validating the trained classifier. Cross-validation avoids too strict an adaptation to the training set, so that the classifier adapts only to the characteristics of the training set and does not mimic it. Once this has been done for a defined number of combinations, the overall performance of the classifier is evaluated and associated with the parameter configuration. The more parameter combinations of the classification algorithm are tested, the better the resulting classifier. This is one of the strengths of SVMs, as only three variables are needed to configure an SVM when the radial basis function is used as the kernel function. The parameter space can therefore be searched in great detail for the optimal parameter configuration, so that the resulting classifier is of high quality.

3. Finally, the optimal parameter configuration is used to train a classifier on the whole training set. This classifier is stored for the last step of the process of finding inconsistent data, namely its application to unknown data records.

Fig. 4.2: Learning phase

4.3 Phase III: Application

In the last phase (see figure 4.3) the classifier is applied to the data records of the whole data set, searching for discrepancies between classified and stored values. The more discrepancies are found, the lower the data quality is with regard to consistency. As SVMs can also be used for regression, a concrete recommendation for a correct value can be made in the cases where inconsistencies occur. Such a recommendation is not merely a range but a concrete value, in contrast to algorithms that are only capable of classification, such as decision trees, which again supports the choice of the classification algorithm.

Fig. 4.3: Application phase

4.4 Results

A typical result is shown in Table 1. It was generated from a training set consisting of 128 examples that had been verified as valid. The classifier was then used to find inconsistencies between the classified and the stored values. In the examples given there are two major discrepancies between the stored and the classified values (marked by italics). The first one is the result of a deliberate falsification to demonstrate the approach's functionality. The correct value had been 1995, so the distance, relative to the remaining distances between stored and classified values, is large and implies an error in the specific data set. The classified value can be used as a correction and matches the non-falsified value quite well.
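The learning and application phases can be sketched with scikit-learn, which is an assumption made for illustration: the prototype itself was built with YALE. The synthetic data, the parameter grid, the choice of C, gamma and epsilon as the three RBF parameters, and the threshold of ten times the median distance are likewise illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for a validated training set: the target attribute depends
# on the two remaining attributes, all assumed to be mapped to [0, 1] already.
X_train = rng.uniform(0.0, 1.0, size=(128, 2))
y_train = 0.5 * (X_train[:, 0] + X_train[:, 1])

# Learning phase: cross-validated search over three RBF-SVR parameters.
param_grid = {
    "C": [1.0, 10.0, 100.0],
    "gamma": [0.1, 1.0, 10.0],
    "epsilon": [0.01, 0.05],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
# By default the best configuration is refitted on the whole training set.
search.fit(X_train, y_train)
model = search.best_estimator_

# Application phase: predict every stored value and measure the distance.
X_all = rng.uniform(0.0, 1.0, size=(20, 2))
stored = 0.5 * (X_all[:, 0] + X_all[:, 1])
stored[7] = (stored[7] + 0.5) % 1.0  # deliberately falsify one stored value
predicted = model.predict(X_all)
discrepancy = np.abs(predicted - stored)

# Flag records whose distance is large relative to the remaining distances.
threshold = 10.0 * np.median(discrepancy)
flagged = np.flatnonzero(discrepancy > threshold)

print("best parameters:", search.best_params_)
print("flagged records:", flagged)
print("recommended values:", predicted[flagged])
```

The predicted value of a flagged record doubles as the recommended correction, which mirrors the use of the regression output described above.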

The second one also shows a large distance between the classified and the stored value, although no falsification has taken place. This example shows that the training set missed a relevant equivalence class, so that the algorithm wrongly detects an inconsistency. The user has to mark this wrong classification. Those data sets are then included in the training set, so that in the next learning phase the classifier adapts better to the data's characteristics. This procedure may be executed until the classifier has adapted well enough to the relevant data set, or regularly in order to adapt to changes in the underlying structure of the data.

Table 1: Sample of the prototype's results (columns: Classified Value, Stored Value)

5 Using Ontologies for further Improvements

As already pointed out in section 2.1, using ontologies for modeling the examined real-world extract is beneficial for building a classifier that discovers inconsistencies in data sets. Not only is the semantic coherence of the modeled concepts useful, but so is the further information that the modeling domain expert can annotate to the identified concepts. This information is made explicit and can therefore be considered directly usable knowledge. We gave examples in section 4.1, where the information about the values' scale was given by domain experts and annotated to the data schema. These annotations can be used to configure the data quality mining algorithms and thereby improve the achieved results, by adjusting the algorithms to the needs induced by the underlying data schema and to the domain experts' knowledge, which would otherwise not be available or would be difficult to utilize.
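How scale annotations might drive the preprocessing of section 4.1 can be sketched as follows. The attribute names, the annotation dictionary, and the function names are hypothetical. The alphabetical ordering of nominal categories is an arbitrary choice made for the example, and the string-distance mapping is omitted.

```python
from typing import Callable, Dict, List, Sequence


def map_nominal(values: Sequence[str]) -> Dict[str, float]:
    """Spread the distinct nominal categories equidistantly over [0, 1]."""
    categories = sorted(set(values))  # arbitrary but reproducible order
    if len(categories) == 1:
        return {categories[0]: 0.0}
    step = 1.0 / (len(categories) - 1)
    return {c: i * step for i, c in enumerate(categories)}


def map_ordinal(values: Sequence[float]) -> Dict[float, float]:
    """Min-max normalise ordinal values to [0, 1]; the order is preserved."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return {v: (v - lo) / span for v in values}


# Hypothetical annotations, as a domain expert might attach them to a schema.
SCALE_ANNOTATIONS: Dict[str, str] = {
    "line_type": "nominal",
    "commissioning_year": "ordinal",
}
MAPPERS: Dict[str, Callable] = {"nominal": map_nominal, "ordinal": map_ordinal}


def preprocess(columns: Dict[str, list]) -> Dict[str, List[float]]:
    """Convert every column to [0, 1] using the mapper named by its annotation."""
    scaled = {}
    for name, values in columns.items():
        mapping = MAPPERS[SCALE_ANNOTATIONS[name]](values)
        scaled[name] = [mapping[v] for v in values]
    return scaled


columns = {
    "line_type": ["cable", "overhead", "cable", "busbar"],
    "commissioning_year": [1995, 2005, 1985, 1995],
}
scaled = preprocess(columns)
print(scaled)
```

Keeping the annotation separate from the mapping functions means that a change in the expert's annotation changes the preprocessing without touching the code.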

6 Conclusions and further Work

In this paper it was shown that classifiers can be employed to find inconsistencies in data sets and to give concrete recommendations for correct values. This approach was first made plausible through a discussion, together with the decision to employ support vector machines as the classification algorithm, and later through the results of a prototype. For a holistic approach to data quality mining, the data quality dimensions accuracy, correctness, and currency remain open for further research. The solutions for these dimensions will be discussed in upcoming papers. The positive influence of ontologies on the data quality mining approach in particular, and on checking for consistency problems in general, through the additional semantic knowledge they provide in contrast to other modeling techniques, was highlighted.

The results presented in this paper were achieved in a project funded by EWE AG, which is a major German utility.

Bibliography

Bortz J (2005) Statistik. Springer Medizin Verlag, Heidelberg.

Grüning F (2006) Data Quality Mining in Ontologies for Utilities. In: Managing Environmental Knowledge, 20th International Conference of Informatics in Environmental Protection.

Hipp J, Güntzer U, Nakhaeizadeh G (2002) Data Mining of Association Rules and the Process of Knowledge Discovery in Databases. In: Lecture Notes in Computer Science: Advances in Data Mining: Applications in E-Commerce, Medicine, and Knowledge Management, Springer Berlin/Heidelberg, Volume 2394/2002.

Mierswa I (2007) YALE: Yet Another Learning Environment.

Noy N F, McGuinness D L (2001) Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL and Stanford Medical Informatics Technical Report SMI.

Redman T C (1996) Data Quality for the Information Age. Artech House, Inc.

Russell S, Norvig P (2003) Artificial Intelligence: A Modern Approach. Prentice Hall.

Scannapieco M, Missier P, Batini C (2005) Data Quality at a Glance. In: Datenbank-Spektrum, Volume 14, Pages 6-14.

Witten I H, Frank E (2005) Data Mining: Practical Machine Learning Tools and Techniques. 2nd Edition, Morgan Kaufmann, San Francisco.


More information

KEITH LEHNERT AND ERIC FRIEDRICH

KEITH LEHNERT AND ERIC FRIEDRICH MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Knowledge Discovery from Data Bases Proposal for a MAP-I UC

Knowledge Discovery from Data Bases Proposal for a MAP-I UC Knowledge Discovery from Data Bases Proposal for a MAP-I UC João Gama (jgama@fep.up.pt) Universidade do Porto 1 Knowledge Discovery from Data Bases We are deluged by data: scientific data, medical data,

More information

ISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS

ISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS Advances and Applications in Statistical Sciences Proceedings of The IV Meeting on Dynamics of Social and Economic Systems Volume 2, Issue 2, 2010, Pages 303-314 2010 Mili Publications ISSUES IN RULE BASED

More information

International Journal of Electronics and Computer Science Engineering 1449

International Journal of Electronics and Computer Science Engineering 1449 International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

II. RELATED WORK. Sentiment Mining

II. RELATED WORK. Sentiment Mining Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

Visualizing class probability estimators

Visualizing class probability estimators Visualizing class probability estimators Eibe Frank and Mark Hall Department of Computer Science University of Waikato Hamilton, New Zealand {eibe, mhall}@cs.waikato.ac.nz Abstract. Inducing classifiers

More information

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity

More information

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

C19 Machine Learning

C19 Machine Learning C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

More information

Data Mining for Knowledge Management in Technology Enhanced Learning

Data Mining for Knowledge Management in Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

Prediction Models for a Smart Home based Health Care System

Prediction Models for a Smart Home based Health Care System Prediction Models for a Smart Home based Health Care System Vikramaditya R. Jakkula 1, Diane J. Cook 2, Gaurav Jain 3. Washington State University, School of Electrical Engineering and Computer Science,

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

Subject Description Form

Subject Description Form Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

WEB PAGE CATEGORISATION BASED ON NEURONS

WEB PAGE CATEGORISATION BASED ON NEURONS WEB PAGE CATEGORISATION BASED ON NEURONS Shikha Batra Abstract: Contemporary web is comprised of trillions of pages and everyday tremendous amount of requests are made to put more web pages on the WWW.

More information

Big Data Analytics for SCADA

Big Data Analytics for SCADA ENERGY Big Data Analytics for SCADA Machine Learning Models for Fault Detection and Turbine Performance Elizabeth Traiger, Ph.D., M.Sc. 14 April 2016 1 SAFER, SMARTER, GREENER Points to Convey Big Data

More information

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo 71251911@mackenzie.br,nizam.omar@mackenzie.br

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Application of Data Mining Methods in Health Care Databases

Application of Data Mining Methods in Health Care Databases 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Application of Data Mining Methods in Health Care Databases Ágnes Vathy-Fogarassy Department of Mathematics and

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil brunorocha_33@hotmail.com 2 Network Engineering

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

The Result Analysis of the Cluster Methods by the Classification of Municipalities

The Result Analysis of the Cluster Methods by the Classification of Municipalities The Result Analysis of the Cluster Methods by the Classification of Municipalities PAVEL PETR, KAŠPAROVÁ MILOSLAVA System Engineering and Informatics Institute Faculty of Economics and Administration University

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Study and Analysis of Data Mining Concepts

Study and Analysis of Data Mining Concepts Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

Spam detection with data mining method:

Spam detection with data mining method: Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

More information

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing

More information

Random forest algorithm in big data environment

Random forest algorithm in big data environment Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

More information

Web Site Visit Forecasting Using Data Mining Techniques

Web Site Visit Forecasting Using Data Mining Techniques Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many

More information

A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr.

INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr. INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr. Meisenbach M. Hable G. Winkler P. Meier Technology, Laboratory

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

More information

Performance Analysis of Data Mining Techniques for Improving the Accuracy of Wind Power Forecast Combination

Performance Analysis of Data Mining Techniques for Improving the Accuracy of Wind Power Forecast Combination Performance Analysis of Data Mining Techniques for Improving the Accuracy of Wind Power Forecast Combination Ceyda Er Koksoy 1, Mehmet Baris Ozkan 1, Dilek Küçük 1 Abdullah Bestil 1, Sena Sonmez 1, Serkan

More information

Big Data Analytics. Tools and Techniques

Big Data Analytics. Tools and Techniques Big Data Analytics Basic concepts of analyzing very large amounts of data Dr. Ing. Morris Riedel Adjunct Associated Professor School of Engineering and Natural Sciences, University of Iceland Research

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Optimizing content delivery through machine learning. James Schneider Anton DeFrancesco

Optimizing content delivery through machine learning. James Schneider Anton DeFrancesco Optimizing content delivery through machine learning James Schneider Anton DeFrancesco Obligatory company slide Our Research Areas Machine learning The problem Prioritize import information in low bandwidth

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

First Steps towards a Frequent Pattern Mining with Nephrology Data in the Medical Domain. - Extended Abstract -

First Steps towards a Frequent Pattern Mining with Nephrology Data in the Medical Domain. - Extended Abstract - First Steps towards a Frequent Pattern Mining with Nephrology Data in the Medical Domain - Extended Abstract - Matthias Niemann 1, Danilo Schmidt 2, Gabriela Lindemann von Trzebiatowski 3, Carl Hinrichs

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Role of Component Based Systems in Data Mining & Cloud Computing

Role of Component Based Systems in Data Mining & Cloud Computing Role of Component Based Systems in Data Mining & Cloud Computing Rimmy Chuchra 1, Mahak Jindal 2, Bharti Mehta 3 1 Asst.Proff (CSE) & Sri Sai University (SSU),Palampur(HP) 2,3 M.Tech(CSE) & Yadwindra college

More information

Intrusion Detection System using Log Files and Reinforcement Learning

Intrusion Detection System using Log Files and Reinforcement Learning Intrusion Detection System using Log Files and Reinforcement Learning Bhagyashree Deokar, Ambarish Hazarnis Department of Computer Engineering K. J. Somaiya College of Engineering, Mumbai, India ABSTRACT

More information

Methodology Framework for Analysis and Design of Business Intelligence Systems

Methodology Framework for Analysis and Design of Business Intelligence Systems Applied Mathematical Sciences, Vol. 7, 2013, no. 31, 1523-1528 HIKARI Ltd, www.m-hikari.com Methodology Framework for Analysis and Design of Business Intelligence Systems Martin Závodný Department of Information

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information