Knowledge Reuse in Data Mining Projects and Its Practical Applications

Rodrigo Cunha, Paulo Adeodato and Silvio Meira
Center of Informatics, Federal University of Pernambuco, Caixa Postal, Cidade Universitária, Recife-PE, Brazil
{rclvc,pjla,srlm}@cin.ufpe.br

Abstract. The objective of this paper is to provide an integrated environment for knowledge reuse in KDD that prevents the recurrence of known errors and reinforces project successes, based on previous experience. It combines methodologies from project management, data warehousing, data mining and knowledge representation. Unlike purely algorithmic papers, this one focuses on managerial performance metrics, such as the time taken for solution development and the number of files not automatically managed, while preserving equivalent performance on the technical solution quality metrics. The environment has been validated with metadata collected from previous KDD projects developed and deployed for real-world applications by the development team members. The case study, carried out on actual contracted projects, has shown that this environment assesses the risk of failure of new projects, controls and documents the entire KDD project development process, and helps in understanding the conditions that lead KDD projects to success or failure.

Keywords: Data mining project, Knowledge reuse in KDD projects, risk assessment of KDD projects.

1 Introduction

Early research on artificial intelligence (AI) focused on the implementation and optimization of algorithms. These algorithms, however, only produced reliable results in very specific applications in limited domains. The general application of AI to data generated by real-world activities, that is, data mining, was far from satisfactory, mainly due to the difficult integration of databases, the low quality of the available data and the poor understanding of the business operation (application domain). In 1996, Fayyad et al.
[1] broadened the scope by placing data mining within a more global process named Knowledge Discovery in Databases (KDD). Also in 1996, potential consumers and suppliers of data mining solutions formed a consortium to create a methodology for systematically developing data mining solutions for real problems. They came up with CRISP-DM (Cross-Industry Standard Process for Data Mining) [2], a non-proprietary methodology for identifying and decomposing a data mining project into several stages shared by all application domains. Those initiatives aimed at standardizing the development process of data mining solutions

which involves the use of several tools for modeling, data visualization, analysis and transformation, performance evaluation and even specific programming tasks. Once this standard had been created, providing interoperability among the different platforms in a single environment, where all the processes are centralized and documented, became one of the most important issues in KDD applications to real-world problems. According to Bartlmae and Riemenschneider [3], another important issue in KDD projects nowadays, mainly due to their complexity and strong user dependence, is the inadequate documentation, management and control of the experience gained in solution development, which leads to the recurrence in new projects of errors already known from previous ones. The lack of a platform capable of reusing the knowledge and lessons learned in previous projects is a practical problem worsened by the inadequate interoperability among the data mining tools available in KDD platforms [4]. In summary, the lack of proper interoperability together with the lack of knowledge reuse capability in KDD solution development platforms are deficiencies that may lead projects to failure or delays and cause client dissatisfaction and cost increases.

This paper presents an environment capable of reusing knowledge from previous data mining projects. The environment provides a better understanding of the conditions that turn KDD projects into failures or successes, and a simpler and more precise parameter specification for producing high-quality KDD projects that match the clients' expectations within the planned schedule and budget.

This paper is organized as follows. Section 2 presents the literature survey on approaches related to the proposed environment. Section 3 describes the architecture and functionality of the Knowledge Reuse Environment. Section 4 shows the relevant results for knowledge reuse in data mining projects.
Finally, Section 5 summarizes the research carried out, emphasizes its main results along with their interpretation, states its limitations, and proposes future work for improving the Knowledge Reuse Environment.

2 Literature Review

IMACS (Interactive Market Analysis and Classification System) [5] was one of the first initiatives to consider involving the user in the KDD process. When that system was first proposed, data mining tools provided very limited functionality, and the development of IMACS has not followed the evolution of those tools. Thus, IMACS only supports the creation of semantic definitions for the data and the formal representation of knowledge. In 1997, CITRUS [6] was proposed based on the CRISP-DM methodology. Two years later, UGM (User Guidance Module) was presented [7] as an improvement to CITRUS that draws on the experience of past projects for knowledge reuse. In 2002, IDAs [8] was proposed based on Fayyad et al.'s methodology. After the Clementine release, the vast majority of tools, both commercial and academic, adopted an interface focused on the data mining workflow.

Other relevant work is the application of Case Based Reasoning for Knowledge Management in KDD Projects [3], which proposes an environment aimed at reusing

knowledge in data mining projects. The idea is based on the concept of the Experience Factory, where Case Based Reasoning (CBR) helps in storing and retrieving knowledge packages in a data repository. The Statlog project [9] proposes a methodology to evaluate the performance of different algorithms for machine learning, neural networks and statistics. In spite of using the knowledge reuse concept, the scope of reuse in Statlog is limited to data mining algorithms. Finally, the Distributed Knowledge Discovery Systems project [10] introduces an environment aimed at integrating different data mining tools and platforms related to organizational modeling and integration of solutions. Despite considering integration an important issue, this approach only integrates the data mining tools. It deals neither with metadata acquisition nor with meta-data mining on stored knowledge. In summary, the literature offers isolated initiatives for knowledge reuse, but no contribution that considers both of these features with a focus on the process, as presented in this paper.

3 Knowledge Reuse Environment

In this environment, the knowledge databases are stored in three different structures, according to the types of their contents.

1) Metadata Database: This database stores information from previous projects. Here, metadata are all types of information produced along the KDD project development process, such as the data transformations needed, the algorithms used, the number of components used, the project manager, the project duration, the overall project cost and the client's level of satisfaction, among others. That is, the metadata database stores information ranging from project management to specific algorithms with their corresponding performances. This module consists of two sub-modules: the transactional metadata database and the managerial metadata database. The transactional metadata database stores all of the project's metadata in a relational logical model.
Managerial metadata databases are constructed via an ETL (Extract, Transform, and Load) process [11] carried out on the transactional database. These managerial metadata databases, also called Data Marts, are represented in a star model [11]. The objective of these metadata databases is to support both the project manager and the KDD experts throughout the KDD solution development process.

2) CBR Projects: This module stores the knowledge of past projects through the technique of Case Based Reasoning (CBR) [3]. The purpose of this module is to reuse cases similar to the current project (under development or yet to be developed) so that the data mining expert has adequate conditions for making the best decisions in the new project. Currently, the environment offers three milestones for decision support. In the first milestone, it helps the project manager estimate the risk of the project being a success or a failure, even before it has started, and retrieves the cases most similar to the new project. The second milestone occurs at the project's planning stage, where

the goal is to define the most appropriate data mining tasks (classification, forecasting, etc.) based on the most similar past projects. Finally, at the third milestone, available at the preprocessing stage, the environment analyzes and extracts the most similar past transformations of the data.

3) Learned Lessons Database: This module stores the lessons of previous projects through the technique of Case Based Reasoning. Learned lessons consist of problems, solutions, suggestions and observations that the experts have catalogued in previous projects with the objective of sharing them in future projects or training. In short, a learned lesson is an entry in the environment's module that makes the experience lived and catalogued by users available for future use. An example of an actual catalogued lesson refers to importing data in text file format into SPSS (Statistical Package for the Social Sciences). This tool is likely to modify the formatting of numeric variables and truncate those of the string type. Now, the environment gives a warning about this problem in projects involving text file inputs to SPSS.

4 Knowledge Reuse and Experimental Platform

The stored knowledge can be reused in several ways, from supporting the decision to start a new project to defining the most appropriate data transformation technique. This section presents how it has been used and the results achieved in actual projects.

4.1 Problem Characterization

Here, the decision support system helps decide whether or not a new data mining project should be developed, based on the experience of previous projects. Even before a new project starts, the system estimates its risk of failure (the higher the score, the higher the risk). If the risk is acceptable, the project starts; otherwise, the system presents the conditions that make the project risky in order to support project renegotiation or, in extreme cases, even halting the project.
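The decision rule just described can be illustrated with a minimal sketch. It assumes a logistic model over binary project attributes, with the probability scaled to a 0-100 score, and it reports the attributes that raise the score as the "conditions that make the project risky". The coefficient values, attribute names and function names are hypothetical, chosen only for illustration; the paper does not publish its fitted model. The threshold of 75 is the value the paper reports adopting for committee review.

```python
import math

# Hypothetical coefficients, for illustration only (not the paper's fitted model).
INTERCEPT = -1.0
WEIGHTS = {
    "client_has_no_prior_kdd_project": 1.4,
    "needs_behavioral_data": 0.9,
    "small_client": 0.6,
}
THRESHOLD = 75  # scores above this value go to committee review in the paper


def risk_score(project):
    """Return a 0-100 failure score from a logistic model over 0/1 attributes."""
    z = INTERCEPT + sum(w * project.get(name, 0) for name, w in WEIGHTS.items())
    return 100.0 / (1.0 + math.exp(-z))


def risk_conditions(project):
    """List the attributes that push the score up, largest contribution first."""
    contributions = [(name, w * project.get(name, 0)) for name, w in WEIGHTS.items()]
    risky = [(name, c) for name, c in contributions if c > 0]
    return sorted(risky, key=lambda item: item[1], reverse=True)


def assess(project):
    """Return (score, decision, conditions) for a new project description."""
    score = risk_score(project)
    if score > THRESHOLD:
        return score, "review", risk_conditions(project)
    return score, "start", []
```

Calling `assess` with a dictionary of 0/1 attributes returns the score, the decision, and, for risky projects, the ranked attributes that a manager could use as a starting point for renegotiation.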
This system helps save much of the money and time spent on rework in ill-specified projects. A database collected by NeuroTech over recent years of data mining project development has been used to assess the performance of the environment on an actual meta-data mining problem. The metadata database has been imported from 69 data mining projects executed in the past: 27 labeled as successes and 42 labeled as failures. The following three criteria were used for labeling the project target classes: 1) the contracting client's evaluation (satisfied or dissatisfied); 2) the evaluation by NeuroTech's technical team (success or failure); and 3) the cost/benefit ratio resulting from the project (success or failure). In this binary classification modeling, a project with a negative evaluation in any of the three criteria was labeled a failure; otherwise, it was labeled a success. Each row of the metadata database represents a developed project, whose metadata attributes are stored in its columns. For all projects, there are 19 input attributes (explanatory variables) and an output attribute (dependent variable) which represents the target class label (success or failure). Some of the explanatory variables were:

the client company's size (based on revenue), the client company's experience with previous DW or KDD projects (number of projects developed) and whether the present project needs behavioral data as input, among other variables. Logistic regression from Weka was the statistical inference technique used for project risk estimation. Due to the small number of labeled examples (69) available for modeling, the leave-one-out method was applied as the experimental data sampling strategy, using MATLAB code. The technical performance of the system was assessed using the R-Project software in two distinct ways: 1) separability between the distributions of successes and failures, measured by the KS statistical test; 2) simulation of several decision threshold scenarios on the project scores produced.

4.2 Experimental Results on Risk Assessment

The quality of the meta-data mining is assessed by the usual data mining performance metrics. The performance achieved via leave-one-out reached a maximum value of 0.65 on the KS statistical test [12], which represents a statistically significant difference at α=0.05. This shows that, technically, the system can be used for decision support. For finer decisions, Table 1 presents the scenario for several different score thresholds. For each threshold, it presents the rate of detection of failure in the projects, showing that higher score bands contain a higher percentage of failures. New projects that produce scores above 75, for instance, have a very high risk of failure and should be renegotiated for risk reduction before the project starts.

Table 1. Decision scenario for several score thresholds.
Score band    Failures     Successes    Total
[…]           9 (30%)      21 (70%)     30
[…]           7 (64%)      4 (36%)      11
[…]           3 (60%)      2 (40%)      5
above 75      23 (100%)    0 (0%)       23
Total         42 (61%)     27 (39%)     69

Had such a system been available for assessing the risk of these 69 projects in the past, just flagging those with scores above 75 would have prevented 23 out of the 42 failed projects without drawing extra attention to any successful project. That would have represented a detection of 55% of the failures (23/42) from the start. As previously stated, this is an important managerial metric for this paper.

4.3 CBR Measurements on Project Similarity

The same 69 projects used in the metadata database application were imported into the CBR Project database. In practice, the CBR implementation complements the logistic regression model by returning the cases most similar to the new project. In the end, the

project manager has a score for project risk assessment and a collection of the most similar previous projects for decision support. For case representation, Case Based Reasoning (CBR) with attribute-value representation [3] was the technique used. The similarity is divided into global similarity and local similarity. The global similarity is the weighted and normalized nearest neighbour measure [13]. The local similarity is related to the attributes that describe the case; in other words, it depends on the nature of the attribute (string, binary, numeric or ordinal). For each attribute of the "string" type, a similarity matrix was constructed by interviewing three of NeuroTech's project managers and averaging their opinions. For the ordinal and binary attributes, local similarity was defined through the absolute difference of the attribute's values. For the numeric attributes, local similarity was defined by a linear function, so that the similarity grows as the weighted distance decreases.

Once the structure of the cases and the similarity measures are defined, the CBR problem becomes the retrieval of cases from the knowledge database. The retrieval process consists of a group of sub-tasks. The first sub-task is the assessment of the situation via a query over a group of relevant attributes. The second sub-task is the matching and selection strategy, whose objective is to identify the k cases most similar to query Q. In this work, the threshold was defined empirically as 0.5; therefore, only cases with a similarity greater than or equal to 50% with respect to query Q are returned.

From the results achieved, NeuroTech decided to adopt the environment to estimate the failure probability of its projects using logistic regression and to find the most similar previously developed projects using CBR.
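The retrieval step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not NeuroTech's implementation: the attribute schema, the weights, the sector similarity matrix and the function names are all hypothetical, and local similarity for binary, ordinal and numeric attributes is assumed to be linear in the absolute difference, normalized by the attribute's value range. The 0.5 threshold and the weighted, normalized aggregation follow the text.

```python
# Hypothetical case schema: attribute -> (kind, weight, value range).
SCHEMA = {
    "sector": ("string", 2.0, None),
    "prior_projects": ("numeric", 1.0, 10.0),
    "needs_behavioral_data": ("binary", 1.0, 1.0),
}

# Hypothetical expert-elicited similarity matrix for the string attribute.
SECTOR_SIMILARITY = {
    frozenset(("retail", "bank")): 0.6,
    frozenset(("retail", "telecom")): 0.2,
    frozenset(("bank", "telecom")): 0.3,
}


def local_similarity(kind, a, b, value_range):
    """Similarity in [0, 1] between two values of one attribute."""
    if kind == "string":
        return 1.0 if a == b else SECTOR_SIMILARITY.get(frozenset((a, b)), 0.0)
    # Binary, ordinal and numeric: similarity grows as the distance shrinks.
    return max(0.0, 1.0 - abs(a - b) / value_range)


def global_similarity(query, case):
    """Weighted average of local similarities, normalized by the total weight."""
    total_weight = sum(weight for _, weight, _ in SCHEMA.values())
    weighted = sum(
        weight * local_similarity(kind, query[name], case[name], value_range)
        for name, (kind, weight, value_range) in SCHEMA.items()
    )
    return weighted / total_weight


def retrieve(query, cases, k=3, threshold=0.5):
    """Return up to k (similarity, case) pairs at or above the threshold."""
    scored = [(global_similarity(query, case), case) for case in cases]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:k]
```

With a query describing a retailer that has no prior projects and needs behavioral data, a bank case with the same profile scores well above the threshold under these hypothetical weights, while a dissimilar telecom case is filtered out.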
Now, new projects go through the model assessment in order to estimate their chance of success before development. The score threshold was defined as 75, i.e., only projects with a score below 75 are automatically approved. Every project with a score higher than 75 should be evaluated by the company's committee, formed by the managers in charge of the business area and the customer area and by the company's chief scientist. The project starts only after the committee's approval; otherwise, some contractual conditions and/or project parameters should be altered based on similar cases and the project submitted again to the model for risk assessment.

Some subjective results have been achieved at NeuroTech with the use of the environment. For instance, a new project contracted by a retail company for a credit scoring solution was evaluated with an 89% chance of failure. When the NeuroTech operations manager used the environment to search for similar cases, the most similar project returned by the CBR system was one developed for a regional bank. In principle, there was no apparent correlation between a big nationwide retailer and a regional bank. When analyzed in more detail, the bank project had failed due to characteristics that matched the retailer's current situation, particularly the inexperience of its information technology staff and their lack of commitment to the project. Furthermore, neither the retailer nor the bank had ever developed a data mining project before. As the project had already been negotiated and there was no possibility of aborting it, the manager made two decisions. Firstly, he demanded the full-time dedication of a member of the retailer's technology team and, secondly, he defined as the first project activity a quick basic training course on data mining for the retailer's team. Thus, it was possible to

reduce the risks of the new project, with the support of the experience from a similar project previously developed.

4.4 Learned Lessons Database Load and Application

Aiming at the practical application of this module to actual problems and at assessing the benefit of its use, a learned lessons database has been collected at NeuroTech and imported into the environment. Interviews and forms collected experience from 10 data mining experts at several levels of the company, ranging from technical staff working in modeling to chief officers on the board of directors. A wide spectrum of 61 learned lessons was documented in 6 variables, namely: stage of CRISP-DM, task within the CRISP-DM stage, date of learning the lesson, expert who learned the lesson, lesson category and lesson description. These 61 lessons were divided into categories in the following proportions: 35.5% in project risk, 24.2% in best practices, 22.6% in technology and 17.7% distributed among other less frequent categories. Furthermore, the 61 learned lessons were also classified into the following types, with their respective proportions: 58.1% guidelines, 22.6% problems, 16.1% problem solutions and 3.2% general spectrum. The application of this learned lessons database follows the same Case-Based Reasoning methodology and metrics described in the CBR section above. The only differences are the database used and the objective.

Up to now, the learned lessons module has been used at NeuroTech by the operations manager, mainly at the beginning of a project, as a complement to the risk estimation module. The data mining specialists are also using the module in two situations: corrective and proactive actions. The corrective situation occurs when a new problem is found, for instance, an error when importing a file into SPSS. In this case, after the mistake happens, the specialist consults the Learned Lessons database to identify the best solution to the problem.
The proactive situation occurs when a new phase of the project begins, for instance, when the environment signals the risk of format disruption when importing a file into SPSS. Another proactive action can be taken after concluding the pre-processing phase and before beginning the phase in which the algorithm is applied. The specialists query the lessons database to verify whether any lesson suggests how to avoid repeating known mistakes in the new phase.

Although the evaluations are subjective, some practical actions have been taken by NeuroTech. For instance, according to the operations manager, a learned lesson has helped reduce the risk of a new project for developing a fraud detection solution in telecommunications. The lesson stated that the first meeting for solution requirements specification should not be held with the client's business and information technology teams separately; it should involve both teams at the same time, since otherwise the lack of understanding and alignment would end up increasing the effort and stress for the entire project. In this scenario, the operations manager postponed the meeting to a controlled occasion where both teams would be together.
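The proactive use of the lessons database can be sketched in a few lines. The paper retrieves lessons with Case Based Reasoning; the sketch below deliberately swaps in a much simpler filter by CRISP-DM stage and keyword, only to show the shape of the six-variable record and of a query issued at the start of a phase. The record fields follow the six variables listed in this section, while the stage assignments, task names, dates, expert identifiers and the function name `lessons_for` are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Lesson:
    stage: str         # CRISP-DM stage
    task: str          # task within the stage
    learned_on: date   # date the lesson was learned (placeholder values below)
    expert: str        # expert who recorded it (placeholder identifiers below)
    category: str      # e.g. "project risk", "best practices", "technology"
    description: str


# Two lessons paraphrased from the paper; every other field is a placeholder.
LESSONS = [
    Lesson("Data Preparation", "Import data", date(2008, 1, 1), "expert-a",
           "technology",
           "Importing text files into SPSS may reformat numeric variables "
           "and truncate string variables."),
    Lesson("Business Understanding", "Requirements specification",
           date(2008, 1, 1), "expert-b", "project risk",
           "Hold the first requirements meeting with the client's business "
           "and IT teams together."),
]


def lessons_for(stage, keywords=()):
    """Return lessons for a stage, optionally narrowed by description keywords."""
    hits = [lesson for lesson in LESSONS if lesson.stage == stage]
    if keywords:
        hits = [
            lesson for lesson in hits
            if any(k.lower() in lesson.description.lower() for k in keywords)
        ]
    return hits
```

A specialist entering the data preparation phase would call `lessons_for("Data Preparation")` and see the SPSS import warning before touching the files; a corrective query would add keywords describing the observed error.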

5 Conclusions

This paper has presented an environment for the KDD project development process endowed with knowledge reuse at a high level. This environment offers three ways of reusing knowledge: 1) project risk assessment and risk explanation based on the metadata database; 2) reuse of project procedures and settings via Case Based Reasoning on the metadata database; and 3) guidelines, recommendations and warnings from the learned lessons database. Several examples of the application of knowledge reuse to real-world problems have been presented in this paper, ranging from supporting the decision of whether or not to start a new risky data mining project to finding the most appropriate data transformation and parameter settings along the development of its data mining solution. The experiments have shown that risk assessment at the beginning of a project, along with the risk conditions, helps in developing a high-quality project, leading to solutions with high chances of matching the clients' expectations within the planned schedule and budget. The recent success of NeuroTech in the PAKDD 2007 Competition (First Runner-up) [14] and the publication in IJCNN09 [15] already corroborate these ideas.

Despite its breadth in terms of managing KDD knowledge, this work has been constrained to the boundaries of binary classification problems. Extensions to other types of problems that were kept out of its scope (e.g. time series forecasting) are already under investigation and will demand a lot of effort. At the moment, the environment is in full application to real-world problems and, soon, there will be enough metadata for presenting results with statistical significance.

References

1. Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: an overview. In: Advances in Knowledge Discovery and Data Mining (1996)
2. Shearer, C.: The CRISP-DM model: the new blueprint for data mining. Journal of Data Warehousing 5 (2000)
3.
Bartlmae, K., Riemenschneider, M.: Case based reasoning for knowledge management in KDD projects. In: Proceedings of the 3rd International Conference on Practical Aspects of Knowledge Management, PAKM 2000, Basel, Switzerland (2000)
4. Rodrigues, M.d.F., Ramos, C., Henriques, P.R.: How to Make KDD Process More Accessible to Users. In: ICEIS (2000)
5. Brachman, R.J., et al.: Integrated Support for Data Archaeology. International Journal of Intelligent and Cooperative Information Systems 2(2) (1993)
6. Wirth, R., et al.: Towards Process-Oriented Tool Support for Knowledge Discovery in Databases. In: Principles of Data Mining and Knowledge Discovery, Trondheim, Norway (1997)
7. Engels, R.: Component-based User Guidance for Knowledge Discovery and Data Mining Processes. University of Karlsruhe, 234 p. (1999)
8. Bernstein, A., Hill, S., Provost, F.: Intelligent assistance for the data mining process: An ontology-based approach (2002)
9. King, R.D.: The STATLOG Project 2007, Department of Statistics and Modelling Science

10. Neaga, I.: Framework for Distributed Knowledge Discovery Systems Embedded in Extended Enterprise. Manufacturing Engineering, Loughborough University, Loughborough, United Kingdom (2003)
11. Kimball, R.: The Data Warehouse Lifecycle Toolkit. John Wiley & Sons, New York (1998)
12. Conover, W.J.: Practical Nonparametric Statistics, 3rd edn. John Wiley & Sons, New York (1999)
13. Aamodt, A., Plaza, E.: Case-Based Reasoning: Foundational Issues, Methodological Variations and Systems Approaches. AICOM 7(1) (1994)
14. Adeodato, P.J.L., et al.: The Power of Sampling and Stacking for the PAKDD-2007 Cross-Selling Problem. International Journal of Data Warehousing & Mining 4(2) (2008)
15. Adeodato, P., et al.: The Role of Temporal Feature Extraction and Bagging of MLP Neural Networks for Solving the WCCI 2008 Ford Classification Challenge. In: International Joint Conference on Neural Networks, IJCNN 2009 (accepted) (2009)


More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Data Mining System, Functionalities and Applications: A Radical Review

Data Mining System, Functionalities and Applications: A Radical Review Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially

More information

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information
