Knowledge Discovery from Databases
Encyclopedia of Database Technologies and Applications

Jose Hernandez-Orallo
Dep. of Information Systems and Computation
Technical University of Valencia, Spain

INTRODUCTION

As databases pervade every area of reality, they record what happens in and around organizations all over the world. Databases store the detailed history of organizations, institutions, governments and individuals. An efficient and agile analysis of the data recorded in a database can no longer be done manually. Knowledge Discovery from Databases (KDD) is a collection of technologies which aim at extracting nontrivial, implicit, previously unknown, and potentially useful information (Fayyad, Piatetsky-Shapiro, Smyth & Uthurusamy, 1996) from raw data stored in databases. The extracted patterns, models or trends can be used to better understand the data, and hence the context of an organization, and to predict future behaviors in this context that could improve decision making. KDD can be used to answer questions such as:

- Is there a group of customers buying a special kind of product?
- Which sequence of financial products improves the chance of contracting a mortgage?
- Which telephone call patterns suggest a future churn?
- Are there relevant associations between risk factors in coronary diseases?
- How can I assess whether my messages are more or less likely to be spam (junk mail)?

These questions cannot be answered by other tools usually associated with database technology, such as OLAP tools, decision support systems or executive information systems. The key difference is that KDD does not convert information into (more aggregated or interwoven) information, but generates inductive models (in the form of rules, equations or other kinds of knowledge) that are sufficiently consistent with the data. In other words, KDD is not a deductive process but an inductive one.
BACKGROUND

The increasingly rapid evolution of the context in which organizations operate forces them to revise their knowledge relentlessly (what used to work no longer works). This provisional character of knowledge helps explain why knowledge discovery from databases, still in its early days, is proving so successful. The models and patterns that can be obtained by data mining methods are useful for virtually any area dealing with information: finance, insurance, banking, commerce, marketing, industry, private and public healthcare, medicine, bioengineering, telecommunications and many other areas (Berry & Linoff, 2004).

The name KDD dates back to the early nineties, but KDD is not a new technology in itself. It is a heterogeneous area that integrates techniques from several different fields without prejudice, incorporating tools from statistics, machine learning, databases, decision support systems, data visualization and WWW research, among others, in order to obtain novel, valid and intelligible patterns from data (Han & Kamber, 2001; Hand, Mannila & Smyth, 2000; Berthold & Hand, 2002; Dunham, 2003).

THE PROCESS OF KNOWLEDGE DISCOVERY FROM DATABASES

The KDD process is a complex, elaborate process that comprises several stages (Han & Kamber, 2001; Dunham, 2003): data preparation (including data integration, selection, cleansing and transformation), data mining, model evaluation and deployment (including model interpretation, use, dissemination and monitoring). CRISP-DM (CRoss-Industry Standard Process for Data Mining) is a standard reference that serves as a guide for carrying out a knowledge discovery project and comprises all the stages mentioned above. Figure 1 shows these stages.

According to this process, data mining is just one stage of the knowledge discovery process. The data mining stage is also called the modeling stage and converts the prepared data (usually in the form of a minable view with an assigned task) into one or more models. This stage is the most characteristic of the whole process, which is why data mining is frequently used as a synonym for the whole process. The process is cyclic: after a complete cycle, the goals can be revised or extended in order to start the whole process again.
Figure 1: Stages of the Knowledge Discovery Process (Business Understanding, Data Understanding, Data Preparation, Modeling (Data Mining), Evaluation, Deployment)

Many references on KDD choose the data preparation stage as the start of the knowledge discovery process (Han & Kamber, 2001; Dunham, 2003; Fayyad, 1996).
The CRISP-DM standard, however, emphasizes some issues that must be tackled beforehand: business understanding and data understanding, which includes data integration. This is reasonable: the first steps in a KDD project are to know what the business needs are, to establish the business goals according to the business context, to see whether there is (or we can get) enough data to address them and, if so, to specify the data mining objectives.

For example, consider a distribution company that has a problem of inadequate stocks, which are generating higher costs and frequently cause some perishable products to expire. From this problem, one business objective could be to reduce the stock level of perishables. If there is an internal database with enough information about the orders placed by each customer in the last three years, and we can complement this with external information about the reference market sale prices of these product categories, this could be sufficient to start looking for models to address the business objective. This business objective could be translated into one or more data mining objectives, e.g., predict how many perishable products per category each customer will buy week by week, from the orders placed during the last three years and the current market sale price.

Table 1: Keys to Success in a Knowledge Discovery Project

- Business needs must drive the knowledge discovery project. The business problems and, from these, the business objectives must be clearly stated. These business objectives will help understand the data that will be needed and the scope of the project.
- Business objectives must be translated into specific data mining objectives, which must be relevant for the organization. The data mining objectives must be accompanied by a specification of the required quality of the models to be extracted, in terms of a set of metrics and features: expected error, reliability, costs, relevance, comprehensibility, etc.
- The knowledge discovery project must be integrated with other plans in the organization and must have the unconditional support of the organization's executives.
- Data quality is crucial. The integration of data from several sources (internal or external), its cleanliness, and its adequate organization and availability (using a data warehouse, if necessary) are a sine qua non in knowledge discovery.
- The use of integrated and friendly tools is also decisive. Such tools help not only in the specific stages of the process but also in other respects: documentation, communication tools, workflow tools, etc.
- A heterogeneous team is needed, comprising not only professionals with specific data mining training but also professionals from statistics, databases and business. Strong leadership within the group is also essential.
- In order to translate the positive results of knowledge discovery into positive results for the organization, it is necessary to use a holistic evaluation and an ambitious deployment of the models into the know-how and daily operation of the organization.

Before starting the discussion about each specific KDD stage, it is important to highlight that many key issues for the success of a KDD project have to do with a good understanding of the goals and their feasibility, as well as a precise assessment of the resources needed (data, human, software, organizational). Table 1 shows the most important issues for success in a KDD project. Additionally, a major reason for failure in KDD projects is an excessive focus on technology: implementing data mining because others do, without recognizing what the needs of the organization are and without understanding the resources and data necessary to cover them.

Data Integration and Preparation

Once the data mining goals are clear and the data required to achieve them have been located, it is necessary to gather all the data and integrate them. Usually, the data can come from many sources, either internal or external. The typical internal data, and generally the main source of information, include one or more transactional databases, possibly complemented with additional internal data in the form of spreadsheets or other kinds of formatted or unformatted data. External data include all other data which can be useful to achieve the data mining objectives, such as demographic, geographic, climate, industrial, calendar or institutional information.

Some small-scale data mining projects can be done on a small set of tables or even on a couple of data files. Generally, however, the size, different sources and formats of all this information require its integration into one single repository. Repositories of this kind are usually called data warehouses. Data warehouses and related technologies such as OLAP are covered by another entry in this encyclopedia. The reader is referred to this entry for more information. It is important to note, however, that the data requirements of a data warehouse for OLAP analysis are not exactly the same as those of a data warehouse whose purpose is data mining.
Data mining usually requires more fine-grained data (less aggregation) and must gather the data needed for the data mining objectives, not for producing complex reports or complex analytical queries. Even so, OLAP tools and data mining tools can work well together on the same data warehouse when it is designed with both needs in mind.

During or after the task of determining the required data and integrating them, there is an even thornier undertaking: data cleansing, selection and transformation. The quality of data is even more important for data mining than it is for transactional databases. Nevertheless, since the data used for data mining are generally historical, we cannot act on the sources of the data. In many situations the only possible solution is to repair the data afterwards, using a heterogeneous set of techniques grouped under the term data cleansing, which might include exploratory data analysis, data visualization and other techniques.

Data cleansing mostly focuses on inconsistent and missing data. First, inconsistent data can only be detected by comparing the data with database constraints or context knowledge or, more interestingly, by using some kind of statistical analysis to detect outliers. Outliers might or might not be errors, but they at least draw the attention of an expert or tool to check whether they are indeed errors. If they are, there are many possibilities: to remove the entire row or the entire column, or to substitute the value with a null, with the mean or with an estimated value, among others. Secondly, missing data may seem easier to detect, at least when the data come from databases which use null values, but detection can be more difficult if the data repository comes from several sources that do not use null values to represent unknown or non-applicable data. The ways to handle missing values are analogous to those for handling wrong data. Data cleansing is essential since many data mining algorithms are highly sensitive to missing or erroneous values.

Having an integrated and clean data repository is not sufficient to perform data mining on it. Data mining tools may not be able to handle the overall data repository, or doing so may simply be prohibitive. Consequently, it is important to select the relevant data from the data repository, and to de-normalize and transform them, in order to set up a view, which is called the minable view. Data selection can be done horizontally (data sampling or instance selection) or vertically (feature selection). Data transformation includes a more varied collection of operations: creation of derived attributes by aggregation, combination, numerization or discretization, principal component analysis, de-normalization, pivoting, etc.

Finally, data integration and preparation must be done with a certain degree of understanding of the data itself, and frequently require more than half of the time and resources of the whole knowledge discovery process. Furthermore, working on the integration and preparation of data, especially through data visualization (Fayyad, Grinstein & Wierse, 2001), generates a better understanding of the data, and this might suggest new data mining objectives or show that some of them are unfeasible.

Data Mining

The input of the data mining, modeling or learning stage is usually a minable view, together with a task.
A data mining task specifies the kind of problem to be solved and the kind of model to be obtained. Table 2 shows the most important data mining tasks. Data mining tasks can be divided into two main groups, predictive and descriptive, depending on the use of the patterns obtained. Predictive models aim at predicting future or unseen values, such as a model that predicts the sales for next year. Descriptive models aim at describing or understanding the data, such as the finding that there is a strong sequential association between contracting the "free-mum-calls" service and churn.

In order to solve a data mining task, we need to use a certain data mining technique, such as decision trees, neural networks, linear models, support vector machines, etc. A list of the most popular data mining techniques is shown in Table 3 (Han & Kamber, 2001; Hand, Mannila & Smyth, 2000; Berthold & Hand, 2002; Mitchell, 1997; Witten & Frank, 1999). Some techniques can be used for several tasks. For instance, decision trees are generally used for classification, but can also be used for regression and for clustering. Other techniques are specific to one single task. For instance, the Apriori algorithm is specific to association rules.
Table 2: Data Mining Tasks

- Predictive:
  o Classification: the model predicts a nominal output value (one from two or more categories) with one or more input variables. Related tasks are categorization, ranking, preference learning, class probability estimation,...
  o Regression: the model predicts a numerical output value with one or more input variables. Related tasks are sequential prediction and interpolation.
- Descriptive:
  o Clustering: the model detects natural groups in the data. A related task is summarization.
  o Correlation and factorial analysis: the relation between two (bivariate) or more (multivariate) numerical variables is ascertained.
  o Association discovery (frequent itemsets): the relation between two or more nominal variables is determined. Associations can be undirected, directed, sequential,...

Table 3: Data Mining Techniques

- Exploratory data analysis and other descriptive statistical techniques.
- Parametrical and non-parametrical statistical modeling: linear models, generalized linear models, discriminant analysis, Fisher linear discriminant functions, logistic regression,...
- Frequent itemset techniques: Apriori algorithm and extensions, GRI,...
- Bayesian techniques: Naïve Bayes, Bayesian networks, EM,...
- Decision trees and rules: CART, ID3/C4.5/See5, CN2,...
- Relational and structural methods: inductive logic programming, graph learning,...
- Artificial neural networks: perceptron, multilayer perceptron with backpropagation, Radial Basis Functions,...
- Support vector machines and other kernel-based methods: margin classifiers, soft margin classifiers,...
- Evolutionary techniques, fuzzy logic approaches and other soft computing methods: genetic algorithms, evolutionary programming, genetic programming, simulated annealing, Wang & Mendel algorithm,...
- Distance-based methods and case-based methods: nearest neighbors, hierarchical clustering (minimum spanning trees), k-means, self-organizing maps, LVQ.
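
To make the pairing of a task (Table 2) with a technique (Table 3) more concrete, the following is a minimal sketch of a classification task solved with a distance-based method, the nearest neighbor rule. It is only an illustration under stated assumptions: the function name `nearest_neighbor`, the two attributes (age and yearly income in thousands) and the class labels are hypothetical and do not come from the article.

```python
import math


def nearest_neighbor(train, query):
    """Return the class label of the training instance closest to `query`.

    `train` is a list of (attributes, label) pairs, where `attributes`
    is a tuple of numbers; distance is plain Euclidean distance.
    """
    closest = min(train, key=lambda row: math.dist(row[0], query))
    return closest[1]


# Hypothetical minable view: (age, yearly income in thousands) -> class.
train = [
    ((25, 20), "unlikely"),
    ((30, 28), "unlikely"),
    ((45, 60), "likely"),
    ((50, 75), "likely"),
]

# Classify two new (unseen) customers.
print(nearest_neighbor(train, (48, 70)))
print(nearest_neighbor(train, (27, 22)))
```

The same minable view could be handed to a different technique, such as a decision tree or a logistic model, without changing the task; only the kind of model produced would differ.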
The existence of so many techniques for solving a few tasks is due to the fact that no technique can be better than the rest for all possible problems. They also differ in other features: some of them are numerical, others can be expressed in the form of rules, some are quick, others quite slow,... Hence, it is useful to try an assortment of techniques for a particular problem, and to retain the one that gives the best model.

Model Evaluation, Interpretation and Deployment

Any data mining technique generates a tentative model, a hypothesis, which must be assessed before even thinking about using it. Furthermore, if we use several samples of the same data or use several algorithms for solving the same task, we will have several models, and we have to choose among them.

There are many evaluation techniques and metrics. Metrics usually give a value for the validity, predictive power or reliability of the model, in terms of the expectation of some defined error or cost. The appropriate metrics depend on the data mining task. For instance, a classification model can be evaluated with several metrics: accuracy, cost, precision/recall, area under the ROC curve, logloss, and many others. A regression model can be evaluated by other metrics: squared error, absolute error and others. Association rules are usually evaluated by confidence and support.

In order to estimate the appropriate metric reliably, it is well known that the same data that were used for training should not be used for evaluation. Hence, there are several techniques to do this better: train and test evaluation, cross-validation, bootstrapping,... Both the evaluation metric and the evaluation technique are necessary to avoid one typical problem of learned models: overfitting, i.e., lack of generality.

Once the model is evaluated as feasible, it is important to interpret it and analyze its consequences. Is it comprehensible? Is it novel? Is it consistent with our previous knowledge? Is it useful? Can it be put into practice? All these questions are related to the original goal of KDD: to obtain nontrivial, previously unknown, comprehensible and potentially useful knowledge. Next, if the model is novel and useful, we would like to apply or disseminate it inside the organization.
This deployment can be done manually or by embedding the obtained model into software applications. For this purpose, standards for exporting and importing data mining models, such as the Predictive Model Markup Language (PMML) standard, defined by the Data Mining Group, can be of great help.

Last, but not least, we must recall that our knowledge is always provisional. Therefore, we must monitor our models against new incoming data and knowledge, and revise or re-generate them when they are no longer valid.

FUTURE TRENDS

Current and future areas of research are manifold. First, KDD can be specialized depending on the kind of data handled. Data mining has been applied not only to structured data in the form of (usually relational) databases, but also to other heterogeneous kinds of data: semi-structured data, text, web, multimedia, geographical,... This has given names to specific areas of KDD: web mining (Kosala & Blockeel, 2000), text mining (Berry, 2003), multimedia mining (Simoff, Djeraba & Zaïane, 2002),... Intensive research is being done in these areas.

Other more general areas of research include: dealing with specific, heterogeneous or multi-relational data, the latter known as relational data mining (Dzeroski & Lavrac, 2001), data mining query languages, data mining standards, comprehensibility, scalability, tool automation and user friendliness, integration with data warehouses, data preparation,... (Smyth, 2001; Domingos & Hulten, 2001; Srikant, 2002; Roddick, 1998). Progress in all these areas will make KDD an even more ubiquitous and indispensable database technology than it is today.

CONCLUSION

This article has shown the main features of the process of Knowledge Discovery from Databases (KDD): goals, stages and techniques. KDD can be placed within the group of analytical, decision support and business intelligence tools that use the information contained in databases to support strategic decisions, such as OLAP, EIS, DSS or more classical reporting tools. However, we have shown that the most distinctive trait of KDD is its inductive character, which converts information into models, data into knowledge.

There are many other important issues around KDD not discussed so far. The appearance of data mining software constituted the final boost for KDD. During the nineties many companies and organizations released specific data mining algorithms and general purpose tools, known as data mining suites. While the tools initially focused on integrating an assortment of data mining techniques such as decision trees, neural networks, logistic regression, etc., nowadays the suites incorporate techniques for all the stages of the KDD process, including techniques for connecting to external data sources, data preparation tools, and evaluation and interpretation tools.
Additionally, data mining suites, either vendor-specific or general, are being more and more integrated with DBMS and other decision support tools, and this trend will be reinforced in the future.

REFERENCES

Berry, M.W. (2003). Survey of Text Mining: Clustering, Classification, and Retrieval. Springer Verlag.

Berry, M.J.A., & Linoff, G.S. (2004). Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 2nd Edition. Wiley Computing Publishers.

Berthold, M., & Hand, D.J. (Eds.) (2002). Intelligent Data Analysis: An Introduction, Second Edition. Springer.

Domingos, P., & Hulten, G. (2001). Catching Up with the Data: Research Issues in Mining Data Streams. Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), 2001.

Dunham, M.H. (2003). Data Mining: Introductory and Advanced Topics. Prentice Hall.

Dzeroski, S., & Lavrac, N. (2001). Relational Data Mining. Springer.

Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in Knowledge Discovery and Data Mining. Cambridge, MA: MIT Press.

Fayyad, U.M., Grinstein, G., & Wierse, A. (2001). Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, Harcourt Intl.

Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.

Hand, D.J., Mannila, H., & Smyth, P. (2000). Principles of Data Mining. The MIT Press.

Kosala, R., & Blockeel, H. (2000). Web Mining Research: A Survey. ACM SIGKDD Explorations, Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, vol. 2, pp. 1-15.

Mitchell, T.M. (1997). Machine Learning. McGraw-Hill.

Roddick, H. (1998). Data Warehousing and Data Mining: Are we working on the right things? In Kambayashi, Y., Lee, D.K., Lim, E.-P., Masunaga, Y., & Mohania, M. (Eds.), Advances in Database Technologies. Lecture Notes in Computer Science. Berlin: Springer-Verlag.

Simoff, S.J., Djeraba, C., & Zaïane, O.R. (2002). MDM/KDD 2002: Multimedia Data Mining between Promises and Problems. SIGKDD Explorations 4(2).

Smyth, P. (2001). Breaking Out of the Black-Box: Research Challenges in Data Mining. Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), 2001.

Srikant, R. (2002). New Directions in Data Mining. Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), 2002.

Witten, I.H., & Frank, E. (1999). Tools for Data Mining. Morgan Kaufmann.

Terms and Definitions

Data Mining: A central stage in the process of knowledge discovery from databases. This stage transforms data, usually in the form of a minable view, into some kind of model (decision tree, neural network, linear or non-linear equation, etc.). Because it is the central stage in the KDD process and has a catchy name, it is frequently used as a synonym for the whole KDD process.
Machine Learning: A discipline in computer science, generally considered a subfield of artificial intelligence, which develops paradigms and techniques for making computers learn autonomously. There are several types of learning: inductive, abductive, by analogy. Data mining integrates many techniques from inductive learning, which is devoted to learning general models from data.

Overfitting: A frequent phenomenon associated with learning, in which models do not generalize sufficiently. A model can be overfitted to the training data, performing badly on fresh data. This means that the model has internalized not only the regularities (patterns) but also the irregularities of the training data (e.g., noise), which are useless for future data.

Minable View: The view constructed from the data repository which is passed to the data mining algorithm. This minable view must include all the relevant features for constructing the model.
Classification Model: A pattern or set of patterns that allows a new instance to be mapped to one or more classes. Classification models (also known as classifiers) are learned from data where a special attribute is selected as the class. For instance, a model that classifies customers into those likely to sign a mortgage and those unlikely to do so is a classification model. Classification models can be learned by many different techniques: decision trees, neural networks, support vector machines, linear and non-linear discriminants, nearest neighbors, logistic models, Bayesian, fuzzy, genetic techniques, etc.

Regression Model: A pattern or set of patterns that allows a new instance to be mapped to one numerical value. Regression models are learned from data where a special attribute is selected as the output or dependent value. For instance, a model that predicts the sales of the forthcoming year from the sales of the preceding years is a regression model. Regression models can be learned by many different techniques: linear regression, local linear regression, parametric and non-parametric regression, neural networks, etc.

Clustering Model: A pattern or set of patterns that allows examples to be separated into groups. All attributes are treated equally and no attribute is selected as output. The goal is to find clusters such that elements in the same cluster are similar to one another but different from elements of other clusters. For instance, a model that groups employees according to several features is a clustering model. Clustering models can be learned by many different techniques: k-means, minimum spanning trees (dendrograms), neural networks, Bayesian, fuzzy, genetic techniques, etc.

Association Rule: A rule showing the association between two or more nominal attributes. Associations can be directed or undirected. For instance, a rule of the form "if the customer buys French fries and hamburgers, s/he buys ketchup" is a directed association rule. The techniques for learning association rules are specific, and many of them, such as the Apriori algorithm, are based on the idea of finding frequent itemsets in the data.
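
The support and confidence metrics mentioned earlier for evaluating association rules can be sketched for the ketchup rule as follows. This is a minimal illustration, not the Apriori algorithm itself: the five baskets are invented for the example, and the helper names `support` and `confidence` are assumptions rather than part of any particular tool.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / antecedent_support


# Hypothetical market baskets (invented data, for illustration only).
baskets = [
    {"fries", "hamburger", "ketchup"},
    {"fries", "hamburger", "ketchup", "soda"},
    {"fries", "hamburger"},
    {"fries", "soda"},
    {"hamburger", "ketchup"},
]

# Rule: if the customer buys fries and hamburgers, s/he buys ketchup.
rule_support = support(baskets, {"fries", "hamburger", "ketchup"})
rule_confidence = confidence(baskets, {"fries", "hamburger"}, {"ketchup"})
print(rule_support, rule_confidence)
```

Here two of the five baskets contain all three items, and two of the three baskets containing both fries and hamburgers also contain ketchup. Frequent itemset techniques keep only itemsets whose support exceeds a chosen threshold before deriving rules from them.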
Introduction to Data Mining
Introduction to Data Mining José Hernández ndez-orallo Dpto.. de Systems Informáticos y Computación Universidad Politécnica de Valencia, Spain [email protected] Horsens, Denmark, 26th September 2005
Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction
Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration
Principles of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
Subject Description Form
Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Dynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
Chapter 2 Literature Review
Chapter 2 Literature Review 2.1 Data Mining The amount of data continues to grow at an enormous rate even though the data stores are already vast. The primary challenge is how to make the database a competitive
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Adding New Level in KDD to Make the Web Usage Mining More Efficient
131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala'a AL_Hamami PHD Student, Lecturer m_ah_1@yahoo.com Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoo.com
Data Mining: An Introduction
Data Mining: An Introduction Michael J. A. Berry and Gordon A. Linoff. Data Mining Techniques for Marketing, Sales and Customer Support, 2nd Edition, 2004 Data mining What promotions should be targeted
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques Chapter 1 Introduction SURESH BABU M ASST PROF IT DEPT VJIT 1 Chapter 1. Introduction Motivation: Why data mining? What is data mining? Data Mining: On what kind of
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM'S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Overview Main principles of data mining Definition
Data Mining and Soft Computing. Francisco Herrera
Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: [email protected] http://sci2s.ugr.es
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing'oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Perspectives on Data Mining
Perspectives on Data Mining Niall Adams Department of Mathematics, Imperial College London [email protected] April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
How To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www.bioinfo.in/contents.php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE
www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Chapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) © (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
Social Media Mining. Data Mining Essentials
Introduction Data production rate has increased dramatically (Big Data) and we are able to store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. [email protected]
IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR Bharti S. Takey 1, Ankita N. Nandurkar 2,Ashwini A. Khobragade 3,Pooja G. Jaiswal 4,Swapnil R.
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can't understand
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
Knowledge Discovery from Data Bases Proposal for a MAP-I UC
Knowledge Discovery from Data Bases Proposal for a MAP-I UC P. Brazdil 1, João Gama 1, P. Azevedo 2 1 Universidade do Porto; 2 Universidade do Minho; 1 Knowledge Discovery from Data Bases We are deluged
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress) Leo Pipino University of Massachusetts Lowell [email protected] David Kopcso Babson College [email protected] Abstract: A series of simulations
DATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375–381, 2003 Copyright © 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Data Mining: Motivations and Concepts
POLYTECHNIC UNIVERSITY Department of Computer Science / Finance and Risk Engineering Data Mining: Motivations and Concepts K. Ming Leung Abstract: We discuss here the need, the goals, and the primary tasks
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
Data Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Y.Y. Yao, Y. Zhao, R.B. Maguire Department of Computer Science, University of Regina Regina,
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
Three Perspectives of Data Mining
Three Perspectives of Data Mining Zhi-Hua Zhou * National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China Abstract This paper reviews three recent books on data mining
Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms
Data Mining Techniques for CRM Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge
Chapter ML:XI. XI. Cluster Analysis
Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS Susan P. Imberman Ph.D. College of Staten Island, City University of New York [email protected] Abstract
An Overview of Database management System, Data warehousing and Data Mining
An Overview of Database management System, Data warehousing and Data Mining Ramandeep Kaur 1, Amanpreet Kaur 2, Sarabjeet Kaur 3, Amandeep Kaur 4, Ranbir Kaur 5 Assistant Prof., Deptt. Of Computer Science,
Computational Intelligence in Data Mining and Prospects in Telecommunication Industry
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 2 (4): 601-605 Scholarlink Research Institute Journals, 2011 (ISSN: 2141-7016) jeteas.scholarlinkresearch.org Journal of Emerging
Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
2.1. Data Mining for Biomedical and DNA data analysis
Applications of Data Mining Simmi Bagga Assistant Professor Sant Hira Dass Kanya Maha Vidyalaya, Kala Sanghian, Distt Kpt, India (Email: [email protected]) Dr. G.N. Singh Department of Physics and
Master of Science in Health Information Technology Degree Curriculum
Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525
Keywords Data Mining, Knowledge Discovery, Direct Marketing, Classification Techniques, Customer Relationship Management
Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Simplified Data
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content 1. Project Definition 2. Literature Survey 3. Methods 3.1. K-means algorithm...
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
AMIS 7640 Data Mining for Business Intelligence
The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems AMIS 7640 Data Mining for Business Intelligence Autumn Semester 2013, Session
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
Data Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology [email protected] Abstract This paper is
Knowledge Discovery Process and Data Mining - Final remarks
Knowledge Discovery Process and Data Mining - Final remarks Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 14 SE Master Course 2008/2009
How To Use Data Mining For Loyalty Based Management
Data Mining for Loyalty Based Management Petra Hunziker, Andreas Maier, Alex Nippe, Markus Tresch, Douglas Weers, Peter Zemp Credit Suisse P.O. Box 100, CH - 8070 Zurich, Switzerland [email protected],
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept
Statistics 215b 11/20/03 D.R. Brillinger Data mining A field in search of a definition a vague concept D. Hand, H. Mannila and P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge. Some definitions/descriptions
Data Mining for Successful Healthcare Organizations
Data Mining for Successful Healthcare Organizations For successful healthcare organizations, it is important to empower the management and staff with data warehousing-based critical thinking and knowledge
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining
Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa [email protected] skype, gtalk: avellido Tels.:
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
