Chapter-4. Forms & Steps in Data Mining Operations. Introduction
- Constance Preston
- 8 years ago
In today's marketplace, business managers must take timely advantage of high-return opportunities. Doing so requires that they be able to exploit the mountains of data their organizations generate and collect during daily operations. Yet the difficulty of discerning the value in that information, of separating the wheat from the chaff, prevents many companies from fully capitalizing on the wealth of data at their disposal.

For example, a bank account manager might want to identify a group of married, two-income, affluent customers and send them information about the bank's growth mutual funds before a competing discount broker can lure them away. The information surely resides in the bank's computer system and has probably been there in some form for years. The trick, of course, is to find an efficient way to extract and apply it.

Data mining, the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions, currently performs this task for a growing range of businesses. After presenting an overview of current data mining techniques, this chapter explores two particularly noteworthy applications of those techniques: market basket analysis and customer segmentation.

4.1 FORMS OF DATA MINING

Data mining takes two forms. Verification-driven data mining extracts information in the process of validating a hypothesis postulated by a user; it involves techniques such as statistical and multidimensional analysis. Discovery-driven data mining uses tools such as symbolic and neural clustering, association discovery, and supervised induction to automatically extract information. The extracted information from both approaches takes one of several forms: regression or classification models, relations between database records, and deviations from norms, among others.

To be effective, a data mining application must do three things. First, it must have access to organization-wide views of data, instead of department-specific ones. Frequently the organization's data is supplemented with open-source or purchased data. The resulting database is called the data warehouse. During data integration, the application often cleans the data, for example by removing duplicates and deriving missing values (when possible), and establishes new, derived attributes. Second, the data mining application must mine the information in the warehouse. Finally, it must organize and present the mined information in a way that enables decision making. Systems that can satisfy one or more of these requirements range from commercial decision-support systems to customized decision-support systems and executive information systems.

The overall objective of each decision-making operation determines the type of information to be mined and the ways of organizing the mined information. For example, by establishing the objective of identifying good prospective customers for mutual funds, the bank account manager mentioned earlier implicitly indicates that she wants to segment the database of bank customers into groups of related customers (such as urban, married, two-income, mid-thirties, low-risk, high-net-worth individuals) and establish the vulnerability of each group to various types of promotional campaigns.

4.2 BASIC STEPS IN DATA MINING

Once a data warehouse has been developed, the data mining process falls into four basic steps: data selection, data transformation, data mining and result interpretation.
4.2.1 DATA SELECTION

A data warehouse contains a variety of data, not all of which is needed to achieve each data-mining goal. The first step in the data-mining process is to select the target data. For example, marketing databases contain data describing customer purchases, demographics and lifestyle preferences. To identify which items and quantities to purchase for a particular store, as well as how to organize the items on the store's shelves, a marketing executive might need only to combine customer purchase data with demographic data. The selected data types may be organized across multiple tables, so during data selection the user might need to perform table joins.

Furthermore, even after selecting the desired database tables, mining the contents of the entire table is not always necessary for identifying useful information. Under certain conditions and for certain types of data-mining operations (such as creating a classification or regression model), it is usually less expensive to sample the appropriate table, which might have been created by joining other tables, and then mine only the sample.

4.2.2 DATA TRANSFORMATION

After selecting the desired database tables and identifying the data to be mined, the user typically needs to perform certain transformations on the data. Three considerations dictate which transformation to use: the task (mailing-list creation, for example), the data-mining operation (such as predictive modeling), and the data-mining technique (such as neural networks) involved. Transformation methods include organizing data in desired ways (organizing individual consumer data by household, for example) and converting one type of data to another (changing nominal values into numeric ones so that they can be processed by a neural network).
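The selection and transformation steps described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the two tables, their column names (`customer_id`, `household`, `marital_status`) and the helper functions are hypothetical and do not come from the chapter. It joins purchase data with demographic data, samples the joined table, converts a nominal attribute into numeric 0/1 columns, and groups individual records by household.

```python
import random

# Hypothetical tables; the column names are illustrative, not from the source.
purchases = [
    {"customer_id": 1, "item": "bread", "quantity": 2},
    {"customer_id": 2, "item": "milk", "quantity": 1},
    {"customer_id": 1, "item": "milk", "quantity": 3},
    {"customer_id": 3, "item": "eggs", "quantity": 12},
]
demographics = {
    1: {"household": "H1", "marital_status": "married"},
    2: {"household": "H1", "marital_status": "married"},
    3: {"household": "H2", "marital_status": "single"},
}


def join_tables(purchase_rows, demographic_rows):
    """Data selection: inner-join purchases with demographics on customer_id."""
    return [
        {**row, **demographic_rows[row["customer_id"]]}
        for row in purchase_rows
        if row["customer_id"] in demographic_rows
    ]


def sample_rows(rows, k, seed=0):
    """Data selection: draw a random sample instead of mining the full table."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def one_hot(rows, column):
    """Transformation: replace a nominal column with numeric 0/1 columns."""
    values = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: val for key, val in row.items() if key != column}
        for value in values:
            new_row[f"{column}={value}"] = 1 if row[column] == value else 0
        encoded.append(new_row)
    return encoded


def group_by_household(rows):
    """Transformation: organize individual consumer data by household."""
    households = {}
    for row in rows:
        households.setdefault(row["household"], []).append(row)
    return households


joined = join_tables(purchases, demographics)
sample = sample_rows(joined, k=3)
encoded = one_hot(joined, "marital_status")
by_household = group_by_household(joined)
```

One-hot encoding is only one of several ways to make nominal values numeric; it is used here because it does not impose an artificial ordering on the categories.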
Another transformation type, the definition of new attributes (derived attributes), involves applying mathematical or logical operations to one or more database attributes, for example by defining the ratio of two attributes.

4.2.3 DATA MINING

The user subsequently mines the transformed data using one or more techniques to extract the desired type of information. For example, to develop an accurate, symbolic classification model that predicts whether magazine subscribers will renew their subscriptions, a circulation manager might need to first use clustering to segment the subscriber database, then apply rule induction to automatically create a classification model for each desired cluster.

4.2.4 RESULT INTERPRETATION

The user must finally analyze the mined information according to the decision-support task and goals. Such analysis identifies the best of the information. For example, if a classification model has been developed, during result interpretation the data-mining application will test the model's robustness using established error-estimation methods such as cross-validation. During this step, the user must also determine how best to present the selected mining-operation results to the decision maker, who will apply them in taking specific actions. (In certain domains, the user of the data-mining application, usually a business analyst, is not the decision maker. The latter may make business decisions by capitalizing on the data-mining results through a simple query and reporting tool.) For example, the user might decide that the best way to present the classification model is logically, in the form of if-then rules.

Three observations emerge from this four-step process:

1. Mining is only one step in the overall process. The quality of the mined information is a function of both the effectiveness of the data-mining technique used and the quality, and often size, of the data being mined. If users select the wrong data, choose inappropriate attributes, or transform the selected data inappropriately, the results will likely suffer.
2. The process is not linear but involves a variety of feedback loops. After selecting a particular data-mining technique, a user might determine that the selected data must be preprocessed in particular ways or that the applied technique did not produce results of the expected quality. The user then must repeat earlier steps, which might mean restarting the entire process from the beginning.
3. Visualization plays an important role in the various steps. In particular, during the selection and transformation steps, a user could use statistical visualizations, such as scatter plots or histograms, to display the results of exploratory data analysis. Such exploratory analyses often provide a preliminary understanding of the data, which helps the user select certain data subsets. During the mining step, the user employs domain-specific visualizations. Finally, visualizations, either special landscapes or business graphics, can present the results of a mining operation.

4.3 VERIFICATION-DRIVEN DATA MINING OPERATIONS

Seven operations are associated with data mining: three with verification-driven data mining and four with discovery-driven data mining. The verification-driven operations are query and reporting, multidimensional analysis, and statistical analysis.

4.3.1 QUERY AND REPORTING

This operation constitutes the most basic form of decision support and data mining. Its goal is to validate a hypothesis expressed by the user, such as "sales of four-wheel-drive vehicles increase during the winter".
Validating a hypothesis through a query and reporting operation entails creating a query, or set of queries, that best expresses the stated hypothesis, posing the query to the database, and analyzing the returned data to establish whether it supports or refutes the hypothesis. Each data interpretation or analysis step might lead to additional queries, either new ones or refinements of the initial one. Reports subsequently compiled for distribution throughout an organization contain selected analysis results, presented in graphical, tabular and textual form, and include a subset of the queries. Because the reports include the queries, the analysis can be automatically repeated at predefined times, such as once a month.

4.3.2 MULTIDIMENSIONAL ANALYSIS

While traditional query and reporting suffices for several types of verification-driven data mining, effective data mining in certain domains requires the creation of very complex queries. These often contain an embedded temporal dimension and may also express change between two stated events. For example, the regional manager of a department store chain might say, "Show me weekly sales during the first quarter of 1994 and 1995, for Midwestern stores, broken down by department".

Multidimensional databases, often implemented as multidimensional arrays, organize data along predefined dimensions (time or department, for example), have facilities for taking advantage of sparsely populated portions of the multidimensional structure, and provide specialized languages that facilitate querying along dimensions while expediting query-processing performance. These databases also allow hierarchical organization of the data along each dimension, with summaries on the higher levels of the hierarchy and the actual data at the lower levels. Quarterly sales might take one level of summarization and monthly sales a second level, with the actual daily sales taking the lowest level of the hierarchy.
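The time hierarchy described above (daily sales at the bottom, monthly and quarterly summaries above it) can be sketched as a simple roll-up. This is a minimal illustration, not how a multidimensional database is actually implemented: the sales figures, the `roll_up` helper and the key functions are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (day, department, amount).
daily_sales = [
    (date(1995, 1, 3), "toys", 120.0),
    (date(1995, 1, 17), "toys", 80.0),
    (date(1995, 2, 9), "toys", 50.0),
    (date(1995, 2, 9), "shoes", 200.0),
    (date(1995, 4, 1), "shoes", 75.0),
]


def month_key(day):
    """Map a day to its (year, month) bucket."""
    return (day.year, day.month)


def quarter_key(day):
    """Map a day to its (year, quarter) bucket, with quarters numbered 1 to 4."""
    return (day.year, (day.month - 1) // 3 + 1)


def roll_up(facts, time_key):
    """Sum amounts per (time bucket, department) along the time dimension."""
    totals = defaultdict(float)
    for day, department, amount in facts:
        totals[(time_key(day), department)] += amount
    return dict(totals)


monthly = roll_up(daily_sales, month_key)      # second level of summarization
quarterly = roll_up(daily_sales, quarter_key)  # highest level of summarization
```

A query such as the regional manager's example would then amount to reading the relevant buckets from a summary level rather than scanning the daily records.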
4.3.3 STATISTICAL ANALYSIS

Simple statistical analysis operations (such as first-order statistics) usually execute during both query and reporting, as well as during multidimensional analysis. Verifying more complex hypotheses, however, requires statistical operations (such as principal-component analysis and regression modeling) coupled with data visualization tools. Several statistical tools (SAS, SPSS, S+) incorporate components that can be used for discovery-driven modeling (such as CHAID in SPSS and S+). To be effective, statistical analysis must rest on a methodology, such as exploratory data analysis. A methodology might need to be business- or domain-dependent, so statistics tools such as SAS and SPSS are open-ended, providing function libraries that can be organized into larger analysis software systems.

4.4 Evaluation Measures

Since multi-label classification has been investigated mostly in text categorisation, very little work has been conducted on developing evaluation measures for its classifiers. There are no standard evaluation techniques applicable to multi-label classification problems. Moreover, choosing the right measure is often problematic and depends heavily on the features of the problem at hand, such as those used in [3]. In this section, we introduce three evaluation measures suitable for the majority of binary, multi-class and multi-label classification problems.

Top-label

This evaluation measure takes into consideration only the top-ranked class label and ignores any other labels associated with an instance. For a traditional classification task, where there is only one class label to assign to the test object, and given an instance and its associated class label <d, y>, a classifier H predicts a list of ranked class labels $Y_j = \langle y_j^1, y_j^2, y_j^3, \ldots, y_j^k \rangle$. If the predicted first class label matches the true class label y of the instance, i.e. $y_j^1 = y$, then the classification is correct.

The top-label method estimates how many times the top-ranked class label is the correct class label. So, for a set of single-class instances $I = \langle (x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m) \rangle$, the top-label is $\frac{1}{m} \sum_{j=1}^{m} \delta(y_j^1 = y_j)$, where m represents the number of instances and $\delta(\cdot)$ is 1 when its argument holds and 0 otherwise.

4.5 Entropy-based Associative Classifier

We denote as class association rules (CARs) [18] those association rules of the form $X \rightarrow c$, where the antecedent (X) is composed of feature variables and the consequent (c) is just a class. CARs may be generated by a slightly modified association rule mining algorithm. Each itemset must contain a class, and the rule generation also follows a template in which the consequent is just a class. CARs are essentially decision rules and, as in the case of decision trees, CARs are ranked in decreasing order of information gain. Finally, during the testing phase, the associative classifier simply checks whether each CAR matches the test instance; the class associated with the first match is chosen.

Note that, seen in the light of CARs, a decision tree is simply a greedy search for CARs, using a level-wise search algorithm that only expands the current best rule with other features. On the other hand, an eager associative classifier mines all possible CARs with a given minsup. It is also interesting to note that sorting the final rule-set on information gain, and using the best CAR for classification, is also a greedy strategy. While the greedy approach has its limitations, eager associative classifiers are not limited by the prefix problem of decision rules, that is, once the best feature is chosen at each node, all nodes under that subtree must contain it.

Let D be the set of all n training instances.
Let T be the set of all m test instances.
1. Let $C^e$ be the set of all rules $\{X \rightarrow c\}$ mined from D
2. Sort $C^e$ according to information gain
3. for each $t_i \in T$ do
4. Pick the first rule $\{X \rightarrow c\} \in C^e$ such that $X \subseteq t_i$
5. Predict class c
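Steps 2 to 5 of the listing can be sketched as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: rule mining (step 1) is skipped, the three matching rules are the ones from the weather example discussed in this section, the fourth rule is a hypothetical non-matching rule, the gain values are hypothetical placeholders chosen only to reproduce the ranking stated in the text, and the test instance is assumed to carry the four feature values that those three rules require.

```python
# Each rule is (antecedent, predicted_class, gain). The gain values are
# hypothetical placeholders; the chapter does not report them.
rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no", 0.30),
    ({"windy": "false", "temperature": "cool"}, "yes", 0.40),
    ({"outlook": "sunny", "temperature": "cool"}, "yes", 0.20),
    ({"outlook": "rainy", "windy": "true"}, "no", 0.50),  # hypothetical rule
]


def matches(antecedent, instance):
    """A rule matches when every feature=value pair of X appears in the instance."""
    return all(instance.get(feature) == value for feature, value in antecedent.items())


def rank_rules(rule_set):
    """Step 2: sort the rules in descending order of information gain."""
    return sorted(rule_set, key=lambda rule: rule[2], reverse=True)


def predict(ranked_rules, instance, default=None):
    """Steps 4 and 5: return the class of the first matching rule."""
    for antecedent, label, _gain in ranked_rules:
        if matches(antecedent, instance):
            return label
    return default


# Assumed test instance: the feature values needed by the three matching rules.
test_instance = {
    "outlook": "sunny",
    "temperature": "cool",
    "humidity": "high",
    "windy": "false",
}

ranked = rank_rules(rules)
prediction = predict(ranked, test_instance)
```

The `default` argument covers the case where no rule matches; the chapter does not say how that case is handled, so returning a default class is an assumption of this sketch.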
These steps make up the basic eager associative classifier. In the initial step, the algorithm mines all frequent CARs and sorts them in descending order of information gain. Then, for each test instance $t_i$, the first CAR matching $t_i$ is used to predict the class. An associative classifier can be built from our example set of training instances using the above algorithm. Three CARs match the test instance of our example (last row of Table 1):

1. {windy=false and temperature=cool $\rightarrow$ play=yes}
2. {outlook=sunny and humidity=high $\rightarrow$ play=no}
3. {outlook=sunny and temperature=cool $\rightarrow$ play=yes}

Rule {windy=false and temperature=cool $\rightarrow$ play=yes} would be selected, since it is the best ranked CAR. By applying this CAR, the test instance will be correctly classified.

Intuitively, associative classifiers perform better than decision trees because associative classifiers allow several CARs to cover the same partition of the training data. In our example, the test case is recognized by only one rule in the decision tree, while the same test case is recognized by three CARs in the associative classifier. Selecting the proper CAR to be applied is an issue in associative classification. Next we present a theoretical discussion about the performance of decision trees and eager associative classifiers.

Theorem 1. The rules derived from a decision tree are a subset of the CARs mined using an eager associative classifier based on information gain.

Proof 1. Let maxe be the maximum entropy of all decision tree rules. Select a set $C^e$ from all CARs such that their entropy is at most maxe. It is clear that the decision tree rules are a subset of $C^e$.

Theorem 1 states that, for a given minsup, CARs contain (at least) all information of the corresponding decision tree. Since each decision tree rule may be seen as a CAR, and since all possible CARs were enumerated, the decision tree can be built by choosing the proper CARs.
Theorem 2. CARs perform no worse than decision tree rules, according to the information gain principle.
Proof 2. Given an instance to be classified and, without loss of generality, a decision tree with just pure leaves, the decision tree predicts class c for that instance. We analyze two scenarios: first, just one CAR matches the instance; and second, more than one CAR matches. When just one CAR matches, it is the same as the decision tree rule, since the set of CARs subsumes the set of decision rules. In this case, the associative classifier and the decision tree make the same prediction. When more than one CAR matches an instance, the prediction may be either the same class (say c) as the matching decision rule or another class. If the associative classifier predicts c, then the two approaches are equivalent. If a class other than c is predicted then, by definition, the best matching CAR provides a better information gain than the decision rule, and thus, according to the information gain principle, the CAR will make a better prediction.

Theorem 2 states that the additional CARs of the associative classifier that are not in the decision tree cannot degrade the classification accuracy. This is because an additional CAR is only used if it is better than all decision rules (according to the information gain principle). However, eager associative classifiers generate a large number of CARs, most of which are useless during classification. For instance, from the set of 13 CARs shown in Figure 4, only 3 match the test instance (the remaining 10 CARs are useless). Next, we present a lazy classifier and compare it to the eager version described in this section.

4.6 Lazy Associative Classifier

Unlike the eager associative classifier, which extracts a set of ranked CARs from the training data, the lazy associative classifier induces CARs specific to each test instance. The lazy approach projects the training data, D, only on those features in the test instance, A. From this projected training data, $D_A$, the CARs are induced and ranked, and the best CAR is used.
From the set of all training instances, D, only the instances sharing at least one feature with the test instance A are used to form $D_A$.
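The projection step just described can be sketched as below. It is an illustration under assumptions: instances are represented as feature-to-value dictionaries, "sharing a feature" is read as sharing the same feature=value pair, and the small training set is hypothetical.

```python
def project(training, test_instance):
    """Build the projected training data for one test instance.

    Keeps only training instances that share at least one feature=value
    pair with the test instance, restricted to the shared pairs.
    """
    projected = []
    for features, label in training:
        shared = {
            feature: value
            for feature, value in features.items()
            if test_instance.get(feature) == value
        }
        if shared:  # drop instances with nothing in common with the test case
            projected.append((shared, label))
    return projected


# Hypothetical training instances: (features, class label).
training = [
    ({"outlook": "sunny", "temperature": "hot", "windy": "false"}, "no"),
    ({"outlook": "rainy", "temperature": "mild", "windy": "true"}, "yes"),
    ({"outlook": "overcast", "temperature": "cool", "windy": "false"}, "yes"),
]
test_instance = {
    "outlook": "sunny",
    "temperature": "cool",
    "humidity": "high",
    "windy": "false",
}

projected = project(training, test_instance)
```

Because every antecedent mined from the projected data uses only pairs present in the test instance, any rule induced from it matches that instance by construction.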
Then, a rule-set $C^l_A$ is generated from $D_A$. Since $D_A$ contains only features in A, all CARs generated from $D_A$ must match A. The lazy associative classifier is presented in Figure 5.

Let D be the set of all n training instances.
Let T be the set of all m test instances.
1. for each $t_i \in T$ do
2. let $D_{t_i}$ be the projection of D on features only from $t_i$
3. let $C^l_{t_i}$ be the set of all rules $\{X \rightarrow c\}$ mined from $D_{t_i}$
4. sort $C^l_{t_i}$ according to information gain
5. pick the first rule $\{X \rightarrow c\} \in C^l_{t_i}$, and predict class c

Figure 5. Lazy Associative Classifier

Now we demonstrate that the lazy associative classifier produces better results than its eager counterpart. Given a test instance A and a set of CARs C, we denote by $C_A$ those CARs $\{X \rightarrow c\}$ in C where $X \subseteq A$.

Any-label

This evaluation technique measures how many times any of the predicted labels of an instance matches the actual class label in all cases of that instance in the test data. If any of the predicted class labels of an instance d matches the true class label y, the classification is correct.

Label-weight

This technique enables each predicted label for an instance to play a role in classifying a test case, based on its ranking, and therefore it could be considered a multi-label evaluation measure. An instance may belong to several class labels, each one associated with it by a number of occurrences in the training data. Each class label can be assigned a weight according to how many times that label has been associated with the instance. Let rule $r_j$ be associated with a list of ranked labels.
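The top-label and any-label measures described in this chapter can be sketched as simple accuracy-style ratios. This is a minimal sketch under assumptions: each prediction is a ranked list of labels, each test case has a single true label, and any-label is normalized by the number of test cases in the same way as top-label (the chapter gives a formula only for top-label). The example predictions are hypothetical.

```python
def top_label(predictions, truths):
    """Fraction of cases whose top-ranked predicted label equals the true label."""
    hits = sum(
        1 for ranked, truth in zip(predictions, truths)
        if ranked and ranked[0] == truth
    )
    return hits / len(truths)


def any_label(predictions, truths):
    """Fraction of cases where any predicted label equals the true label."""
    hits = sum(1 for ranked, truth in zip(predictions, truths) if truth in ranked)
    return hits / len(truths)


# Hypothetical ranked predictions for four test cases and their true labels.
predictions = [["a", "b"], ["b", "a"], ["c"], ["a", "c"]]
truths = ["a", "a", "b", "c"]

top_score = top_label(predictions, truths)
any_score = any_label(predictions, truths)
```

By construction any-label is never lower than top-label on the same predictions, since a top-ranked hit is also a hit among all predicted labels.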
We have conducted an extensive performance study to evaluate the accuracy and efficiency of CPAR and compare it with that of C4.5 [8], RIPPER [3], CBA [7] and CMAR [6]. As in [7] and [6], 26 datasets from the UCI Machine Learning Repository are used. All the experiments are performed on a 1.7GHz Pentium-4 PC with 1GB main memory. All the approaches are implemented by their authors. The parameters of CPAR are set as follows. In the rule generation algorithm, $\delta$ is set to 0.05, min_gain to 0.7, and $\alpha$ to 2/3. The best 5 rules are used in prediction.

Table 1 shows the accuracy of the five approaches on 26 datasets from the UCI ML Repository. 10-fold cross-validation is used for every dataset. Table 2 compares the running (training) time of RIPPER, CMAR (which is claimed to be more efficient than CBA) and CPAR on the 26 datasets. Notice that Table 2 uses both the arithmetic and the geometric average. This is because the running times of different datasets differ a lot, and the arithmetic average is dominated by the most time-consuming datasets. Using the geometric average, equal weight is put on every dataset. Thus we consider the geometric average a more reasonable measure. Table 3 shows the average number of rules used in RIPPER, CMAR and CPAR.
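The difference between the two averages can be seen in a short sketch. The running times below are hypothetical values chosen to show one slow dataset dominating the arithmetic average; they are not taken from Table 2.

```python
import math


def arithmetic_mean(values):
    """Plain average: large values dominate the result."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms for stability."""
    return math.exp(sum(math.log(value) for value in values) / len(values))


# Hypothetical running times in seconds: three fast datasets and one slow one.
running_times = [0.1, 0.1, 0.1, 100.0]

arithmetic = arithmetic_mean(running_times)
geometric = geometric_mean(running_times)
```

Here the arithmetic average is about 25 seconds even though three of the four datasets finish in a tenth of a second, while the geometric average stays well below one second.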
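Table 1: Accuracy: C4.5, RIPPER, CBA, CMAR and CPAR
Columns: Dataset, C4.5, RIPPER, CBA, CMAR, CPAR
Rows: anneal, austral, auto, breast, cleve, crx, diabetes, german, glass, heart, hepatic, horse, hypo, iono, iris, labor, led, lymph, pima, sick, sonar, tic-tac, vehicle, waveform, wine, zoo, Average
(The numeric entries are not preserved in this transcription.)

Table 2: Running time (in sec.): RIPPER, CMAR and CPAR
Columns: RIPPER, CMAR, CPAR
Rows: Arithmetic average, Geometric average
(The numeric entries are not preserved in this transcription.)

Table 3: Number of rules: RIPPER, CMAR and CPAR
Columns: RIPPER, CMAR, CPAR
Rows: Arithmetic average, Geometric average
(The numeric entries are not preserved in this transcription.)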
More informationData quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationVisualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
More informationCRISP - DM. Data Mining Process. Process Standardization. Why Should There be a Standard Process? Cross-Industry Standard Process for Data Mining
Mining Process CRISP - DM Cross-Industry Standard Process for Mining (CRISP-DM) European Community funded effort to develop framework for data mining tasks Goals: Cross-Industry Standard Process for Mining
More informationImplementation of Data Mining Techniques to Perform Market Analysis
Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationData Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
More informationExtend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationA New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
More informationRecognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28
Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationScoring the Data Using Association Rules
Scoring the Data Using Association Rules Bing Liu, Yiming Ma, and Ching Kian Wong School of Computing National University of Singapore 3 Science Drive 2, Singapore 117543 {liub, maym, wongck}@comp.nus.edu.sg
More information1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining
1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining techniques are most likely to be successful, and Identify
More informationBig Data: The Science of Patterns. Dr. Lutz Hamel Dept. of Computer Science and Statistics hamel@cs.uri.edu
Big Data: The Science of Patterns Dr. Lutz Hamel Dept. of Computer Science and Statistics hamel@cs.uri.edu The Blessing and the Curse: Lots of Data Outlook Temp Humidity Wind Play Sunny Hot High Weak No
More informationBig Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
More informationIntroduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
More informationStatistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationIntroducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
More informationData Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
More informationData Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it
More informationDatabase Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
More informationDATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
More informationD-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationData Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationData Mining Individual Assignment report
Björn Þór Jónsson bjrr@itu.dk Data Mining Individual Assignment report This report outlines the implementation and results gained from the Data Mining methods of preprocessing, supervised learning, frequent
More informationLearning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
More informationPredicting earning potential on Adult Dataset
MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:
More informationON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
More informationData Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
More informationHow To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationBIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
More informationFluency With Information Technology CSE100/IMT100
Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationDecision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010
Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
More informationData Mining and Visualization
Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationUnderstanding Characteristics of Caravan Insurance Policy Buyer
Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended
More informationDATA WAREHOUSE E KNOWLEDGE DISCOVERY
DATA WAREHOUSE E KNOWLEDGE DISCOVERY Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano DATA WAREHOUSE (DW) A TECHNIQUE FOR CORRECTLY ASSEMBLING AND MANAGING DATA
More informationAnalysis Tools and Libraries for BigData
+ Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationAMIS 7640 Data Mining for Business Intelligence
The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems AMIS 7640 Data Mining for Business Intelligence Autumn Semester 2013, Session
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
More informationIntroduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition
More informationWebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
More informationWhat is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO
What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,
More informationCOM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3
COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping
More informationfrom Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
More informationData Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
More informationThe University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
More informationInsurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationMeta-learning. Synonyms. Definition. Characteristics
Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore wduch@is.umk.pl (or search
More informationBusiness Intelligence. Data Mining and Optimization for Decision Making
Brochure More information from http://www.researchandmarkets.com/reports/2325743/ Business Intelligence. Data Mining and Optimization for Decision Making Description: Business intelligence is a broad category
More informationORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
More information