Data Mining: Concepts and Techniques. Jiawei Han and Micheline Kamber, Simon Fraser University. Morgan Kaufmann Publishers, an imprint of Elsevier.


Contents

Foreword
Preface

Chapter 1 Introduction
1.1 What Motivated Data Mining? Why Is It Important?
1.2 So, What Is Data Mining?
Data Mining On What Kind of Data?
Relational Databases
Data Warehouses
Transactional Databases
Advanced Database Systems and Advanced Database Applications
Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
Concept/Class Description: Characterization and Discrimination
Association Analysis
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis
Are All of the Patterns Interesting?
Classification of Data Mining Systems
Major Issues in Data Mining
Summary
Exercises
Bibliographic Notes

Chapter 2 Data Warehouse and OLAP Technology for Data Mining
What Is a Data Warehouse?
Differences between Operational Database Systems and Data Warehouses
But, Why Have a Separate Data Warehouse?
2.2 A Multidimensional Data Model
From Tables and Spreadsheets to Data Cubes
Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
Examples for Defining Star, Snowflake, and Fact Constellation Schemas
Measures: Their Categorization and Computation
Introducing Concept Hierarchies
OLAP Operations in the Multidimensional Data Model
A Starnet Query Model for Querying Multidimensional Databases
Data Warehouse Architecture
Steps for the Design and Construction of Data Warehouses
A Three-Tier Data Warehouse Architecture
Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP
Data Warehouse Implementation
Efficient Computation of Data Cubes
Indexing OLAP Data
Efficient Processing of OLAP Queries
Metadata Repository
Data Warehouse Back-End Tools and Utilities
Further Development of Data Cube Technology
Discovery-Driven Exploration of Data Cubes
Complex Aggregation at Multiple Granularities: Multifeature Cubes
Other Developments
From Data Warehousing to Data Mining
Data Warehouse Usage
From On-Line Analytical Processing to On-Line Analytical Mining
Summary
Exercises
Bibliographic Notes

Chapter 3 Data Preprocessing
Why Preprocess the Data?
Data Cleaning
Missing Values
Noisy Data
Inconsistent Data
Data Integration and Transformation
Data Integration
Data Transformation
3.4 Data Reduction
Data Cube Aggregation
Dimensionality Reduction
Data Compression
Numerosity Reduction
Discretization and Concept Hierarchy Generation
Discretization and Concept Hierarchy Generation for Numeric Data
Concept Hierarchy Generation for Categorical Data
Summary
Exercises
Bibliographic Notes

Chapter 4 Data Mining Primitives, Languages, and System Architectures
Data Mining Primitives: What Defines a Data Mining Task?
Task-Relevant Data
The Kind of Knowledge to be Mined
Background Knowledge: Concept Hierarchies
Interestingness Measures
Presentation and Visualization of Discovered Patterns
A Data Mining Query Language
Syntax for Task-Relevant Data Specification
Syntax for Specifying the Kind of Knowledge to be Mined
Syntax for Concept Hierarchy Specification
Syntax for Interestingness Measure Specification
Syntax for Pattern Presentation and Visualization Specification
Putting It All Together: An Example of a DMQL Query
Other Data Mining Languages and the Standardization of Data Mining Primitives
Designing Graphical User Interfaces Based on a Data Mining Query Language
Architectures of Data Mining Systems
Summary
Exercises
Bibliographic Notes

Chapter 5 Concept Description: Characterization and Comparison
What Is Concept Description?
Data Generalization and Summarization-Based Characterization
Attribute-Oriented Induction
Efficient Implementation of Attribute-Oriented Induction
Presentation of the Derived Generalization
Analytical Characterization: Analysis of Attribute Relevance
Why Perform Attribute Relevance Analysis?
Methods of Attribute Relevance Analysis
Analytical Characterization: An Example
Mining Class Comparisons: Discriminating between Different Classes
Class Comparison Methods and Implementations
Presentation of Class Comparison Descriptions
Class Description: Presentation of Both Characterization and Comparison
Mining Descriptive Statistical Measures in Large Databases
Measuring the Central Tendency
Measuring the Dispersion of Data
Graph Displays of Basic Statistical Class Descriptions
Discussion
Concept Description: A Comparison with Typical Machine Learning Methods
Incremental and Parallel Mining of Concept Description
Summary
Exercises
Bibliographic Notes

Chapter 6 Mining Association Rules in Large Databases
Association Rule Mining
Market Basket Analysis: A Motivating Example for Association Rule Mining
Basic Concepts
Association Rule Mining: A Road Map
Mining Single-Dimensional Boolean Association Rules from Transactional Databases
The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
Mining Frequent Itemsets without Candidate Generation
Iceberg Queries
Mining Multilevel Association Rules from Transaction Databases
Multilevel Association Rules
Approaches to Mining Multilevel Association Rules
Checking for Redundant Multilevel Association Rules
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
Multidimensional Association Rules
Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes
Mining Quantitative Association Rules
Mining Distance-Based Association Rules
From Association Mining to Correlation Analysis
Strong Rules Are Not Necessarily Interesting: An Example
From Association Analysis to Correlation Analysis
Constraint-Based Association Mining
Metarule-Guided Mining of Association Rules
Mining Guided by Additional Rule Constraints
Summary
Exercises
Bibliographic Notes

Chapter 7 Classification and Prediction
What Is Classification? What Is Prediction?
Issues Regarding Classification and Prediction
Preparing the Data for Classification and Prediction
Comparing Classification Methods
Classification by Decision Tree Induction
Decision Tree Induction
Tree Pruning
Extracting Classification Rules from Decision Trees
Enhancements to Basic Decision Tree Induction
Scalability and Decision Tree Induction
Integrating Data Warehousing Techniques and Decision Tree Induction
Bayesian Classification
Bayes Theorem
Naive Bayesian Classification
Bayesian Belief Networks
Training Bayesian Belief Networks
Classification by Backpropagation
A Multilayer Feed-Forward Neural Network
Defining a Network Topology
Backpropagation
Backpropagation and Interpretability
Classification Based on Concepts from Association Rule Mining
Other Classification Methods
k-Nearest Neighbor Classifiers
Case-Based Reasoning
Genetic Algorithms
Rough Set Approach
Fuzzy Set Approaches
Prediction
Linear and Multiple Regression
Nonlinear Regression
Other Regression Models
Classifier Accuracy
Estimating Classifier Accuracy
Increasing Classifier Accuracy
Is Accuracy Enough to Judge a Classifier?
Summary
Exercises
Bibliographic Notes

Chapter 8 Cluster Analysis
What Is Cluster Analysis?
Types of Data in Cluster Analysis
Interval-Scaled Variables
Binary Variables
Nominal, Ordinal, and Ratio-Scaled Variables
Variables of Mixed Types
A Categorization of Major Clustering Methods
Partitioning Methods
Classical Partitioning Methods: k-means and k-medoids
Partitioning Methods in Large Databases: From k-medoids to CLARANS
Hierarchical Methods
Agglomerative and Divisive Hierarchical Clustering
BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
CURE: Clustering Using REpresentatives
Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling
8.6 Density-Based Methods
DBSCAN: A Density-Based Clustering Method Based on Connected Regions with Sufficiently High Density
OPTICS: Ordering Points To Identify the Clustering Structure
DENCLUE: Clustering Based on Density Distribution Functions
Grid-Based Methods
STING: STatistical INformation Grid
WaveCluster: Clustering Using Wavelet Transformation
CLIQUE: Clustering High-Dimensional Space
Model-Based Clustering Methods
Statistical Approach
Neural Network Approach
Outlier Analysis
Statistical-Based Outlier Detection
Distance-Based Outlier Detection
Deviation-Based Outlier Detection
Summary
Exercises
Bibliographic Notes

Chapter 9 Mining Complex Types of Data
Multidimensional Analysis and Descriptive Mining of Complex Data Objects
Generalization of Structured Data
Aggregation and Approximation in Spatial and Multimedia Data Generalization
Generalization of Object Identifiers and Class/Subclass Hierarchies
Generalization of Class Composition Hierarchies
Construction and Mining of Object Cubes
Generalization-Based Mining of Plan Databases by Divide-and-Conquer
Mining Spatial Databases
Spatial Data Cube Construction and Spatial OLAP
Spatial Association Analysis
Spatial Clustering Methods
Spatial Classification and Spatial Trend Analysis
Mining Raster Databases
Mining Multimedia Databases
Similarity Search in Multimedia Data
Multidimensional Analysis of Multimedia Data
Classification and Prediction Analysis of Multimedia Data
Mining Associations in Multimedia Data
Mining Time-Series and Sequence Data
Trend Analysis
Similarity Search in Time-Series Analysis
Sequential Pattern Mining
Periodicity Analysis
Mining Text Databases
Text Data Analysis and Information Retrieval
Text Mining: Keyword-Based Association and Document Classification
Mining the World Wide Web
Mining the Web's Link Structures to Identify Authoritative Web Pages
Automatic Classification of Web Documents
Construction of a Multilayered Web Information Base
Web Usage Mining
Summary
Exercises
Bibliographic Notes

Chapter 10 Applications and Trends in Data Mining
Data Mining Applications
Data Mining for Biomedical and DNA Data Analysis
Data Mining for Financial Data Analysis
Data Mining for the Retail Industry
Data Mining for the Telecommunication Industry
Data Mining System Products and Research Prototypes
How to Choose a Data Mining System
Examples of Commercial Data Mining Systems
Additional Themes on Data Mining
Visual and Audio Data Mining
Scientific and Statistical Data Mining
Theoretical Foundations of Data Mining
Data Mining and Intelligent Query Answering
Social Impacts of Data Mining
Is Data Mining a Hype or a Persistent, Steadily Growing Business?
Is Data Mining Merely Managers' Business or Everyone's Business?
Is Data Mining a Threat to Privacy and Data Security?
Trends in Data Mining
10.6 Summary
Exercises
Bibliographic Notes

Appendix A An Introduction to Microsoft's OLE DB for Data Mining
A.1 Creating a DMM object
A.2 Inserting Training Data into the Model and Training the Model
A.3 Using the Model

Appendix B An Introduction to DBMiner
B.1 System Architecture
B.2 Input and Output
B.3 Data Mining Tasks Supported by the System
B.4 Support for Task and Method Selection
B.5 Support of the KDD Process
B.6 Main Applications
B.7 Current Status

Bibliography
Index

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques Data Mining: Concepts and Techniques Second Edition Jiawei Han University of Illinois at Urbana-Champaign Micheline Karnber AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

KINGS COLLEGE OF ENGINEERING

KINGS COLLEGE OF ENGINEERING KINGS COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ACADEMIC YEAR 2011-2012 / ODD SEMESTER SUBJECT CODE\NAME: CS1011-DATA WAREHOUSE AND DATA MINING YEAR / SEM: IV / VII UNIT I BASICS

More information

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining 1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining techniques are most likely to be successful, and Identify

More information

COMPUTER SCIENCE AND ENGINEERING COURSE DESCRIPTION FORM

COMPUTER SCIENCE AND ENGINEERING COURSE DESCRIPTION FORM Course Title Course Code Regulation Course Structure COMPUTER SCIENCE AND ENGINEERING COURSE DESCRIPTION FORM DATA MINING AND DATA WAREHOUSING A70520 R13 - JNTUH Lectures Tutorials Practicals 4 - - Course

More information

City University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015

City University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Data Warehousing and Data Mining Course Code:

More information

Data Mining Curriculum: A Proposal (Version 1.0)

Data Mining Curriculum: A Proposal (Version 1.0) Data Mining Curriculum: A Proposal (Version 1.0) Intensive Working Group of ACM SIGKDD Curriculum Committee: Soumen Chakrabarti, Martin Ester, Usama Fayyad, Johannes Gehrke, Jiawei Han, Shinichi Morishita,

More information

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction

More information

AT78 DATA MINING & WAREHOUSING JUNE 2013

AT78 DATA MINING & WAREHOUSING JUNE 2013 Q2 (a) What is the difference between discrimination and classification? Discrimination differs from classification in that the former refers to a comparison of the general features of target class data

More information

CS1011: DATA WAREHOUSING AND MINING TWO MARKS QUESTIONS AND ANSWERS

CS1011: DATA WAREHOUSING AND MINING TWO MARKS QUESTIONS AND ANSWERS CS1011: DATA WAREHOUSING AND MINING TWO MARKS QUESTIONS AND ANSWERS 1.Define Data mining. It refers to extracting or mining knowledge from large amount of data. Data mining is a process of discovering

More information

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

More information

Lectures for the course: Data Warehousing and Data Mining (406035)

Lectures for the course: Data Warehousing and Data Mining (406035) Lectures for the course: Data Warehousing and Data Mining (406035) Week 1 Lecture 1 Discussions on the need for data warehousing How DW is different from OLTP databases Week 2 Lecture 2 Evaluation norms

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Subject Description Form

Subject Description Form Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Winter Semester 2010/2011 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain.

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain. Q2. (a) List and describe the five primitives for specifying a data mining task. Data Mining Task Primitives (b) How data mining is different from knowledge discovery in databases (KDD)? Explain. IETE

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

4-06-35. John R. Vacca INSIDE

4-06-35. John R. Vacca INSIDE 4-06-35 INFORMATION MANAGEMENT: STRATEGY, SYSTEMS, AND TECHNOLOGIES ONLINE DATA MINING John R. Vacca INSIDE Online Analytical Modeling (OLAM); OLAM Architecture and Features; Implementation Mechanisms;

More information

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days or 2008 Five Days Prerequisites Students should have experience with any relational database management system as well as experience with data warehouses and star schemas. It would be helpful if students

More information

Chapter 7. Cluster Analysis

Chapter 7. Cluster Analysis Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Winter Semester 2012/2013 Free University of Bozen, Bolzano DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html Organization

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Data Mining Introduction

Data Mining Introduction Data Mining Introduction Organization Lectures Mondays and Thursdays from 10:30 to 12:30 Lecturer: Mouna Kacimi Office hours: appointment by email Labs Thursdays from 14:00 to 16:00 Teaching Assistant:

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

CHAPTER-29 Data Mining, System Products and Research Prototypes

CHAPTER-29 Data Mining, System Products and Research Prototypes CHAPTER-29 Data Mining, System Products and Research Prototypes 29.1 How to Choose a Data Mining System 29.2 Data, mining functions and methodologies: 29.3 Coupling data mining with database anti/or data

More information

Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition

Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition Brochure More information from http://www.researchandmarkets.com/reports/2171322/ Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition Description: This book reviews state-of-the-art methodologies

More information

PROPOSAL TO INTRODUCE A NEW COURSE

PROPOSAL TO INTRODUCE A NEW COURSE PROPOSAL TO INTRODUCE A NEW COURSE (formerly known as subject) 1. COURSE DETAILS 1.1 Course ID COMP9318 1.2 Course name - Long Data Warehousing and Data Mining 1.3 Course name - Abbreviated Data Warehousing

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Study and Analysis of Data Mining Concepts

Study and Analysis of Data Mining Concepts Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College

More information

Chapter 3: Cluster Analysis

Chapter 3: Cluster Analysis Chapter 3: Cluster Analysis 3.1 Basic Concepts of Clustering 3.2 Partitioning Methods 3.3 Hierarchical Methods 3.4 Density-Based Methods 3.5 Model-Based Methods 3.6 Clustering High-Dimensional Data 3.7

More information

Contents. Dedication List of Figures List of Tables. Acknowledgments

Contents. Dedication List of Figures List of Tables. Acknowledgments Contents Dedication List of Figures List of Tables Foreword Preface Acknowledgments v xiii xvii xix xxi xxv Part I Concepts and Techniques 1. INTRODUCTION 3 1 The Quest for Knowledge 3 2 Problem Description

More information

CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Lecture Clustering Methods CITS CITS Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement: The

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 2. What is a Data warehouse a. A database application

More information

Fluency With Information Technology CSE100/IMT100

Fluency With Information Technology CSE100/IMT100 Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999

More information

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011 DATA MINING CONCEPTS AND TECHNIQUES Marek Maurizio E-commerce, winter 2011 INTRODUCTION Overview of data mining Emphasis is placed on basic data mining concepts Techniques for uncovering interesting data

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

City University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015

City University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Fundamentals of Data Science Course Code:

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Data Warehouse Design

Data Warehouse Design Data Warehouse Design Modern Principles and Methodologies Matteo Golfarelli Stefano Rizzi Translated by Claudio Pagliarani Mc Grauu Hill New York Chicago San Francisco Lisbon London Madrid Mexico City

More information

CHAPTER 4 Data Warehouse Architecture

CHAPTER 4 Data Warehouse Architecture CHAPTER 4 Data Warehouse Architecture 4.1 Data Warehouse Architecture 4.2 Three-tier data warehouse architecture 4.3 Types of OLAP servers: ROLAP versus MOLAP versus HOLAP 4.4 Further development of Data

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,

More information

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation ^ Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial

More information

Part 22. Data Warehousing

Part 22. Data Warehousing Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Data Mining. Vera Goebel. Department of Informatics, University of Oslo Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD

Knowledge Discovery in Data with FIT-Miner

Knowledge Discovery in Data with FIT-Miner Knowledge Discovery in Data with FIT-Miner Michal Šebek, Martin Hlosta and Jaroslav Zendulka Faculty of Information Technology, Brno University of Technology, Božetěchova 2, Brno {isebek,ihlosta,zendulka}@fit.vutbr.cz

Data Warehouse: Introduction

Database and Data Mining Group

Terminology and Definitions. Data Warehousing and OLAP. Data Warehouse characteristics. Data Warehouse Types. Typical DW Implementation

Data Warehousing and OLAP. Topics: Introduction; Data modelling in data warehouses; Building data warehouses; View Maintenance; OLAP and data mining. Reading: Lecture Notes; Elmasri and Navathe, Chapter 26; Ozsu

Chapter 5. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Decision Support Systems and Intelligent Systems, Seventh Edition. Learning Objectives

Data Mining as Part of Knowledge Discovery in Databases (KDD)

Presented by Naci Akkøk as part of INF4180/3180, Advanced Database Systems, fall 2003 (based on slightly modified foils of Dr. Denise Ecklund from 6 November

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

Contents RELATIONAL DATABASES

Preface. Chapter 1 Introduction: 1.1 Database-System Applications. 1.2 Purpose of Database Systems. 1.3 View of Data. 1.4 Database Languages. 1.5 Relational Databases. 1.6 Database Design. 1.7

Chapter 20: Data Analysis

Database System Concepts, 6th Ed. See www.db-book.com for conditions on re-use. Topics: Decision Support Systems; Data Warehousing; Data Mining; Classification

A Comparative Study of clustering algorithms Using weka tools

Bharat Chaudhari, Manan Parikh. MECSE, KITRC KALOL. Abstract: Data clustering is a process of putting similar data into groups. A clustering

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support. Chid Apte, PhD, Manager, Abstraction Research Group, IBM TJ Watson Research Center, apte@us.ibm.com, http://www.research.ibm.com/dar. Overview

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: Jerzy.Stefanowski@cs.put.poznan.pl Data Mining a step in A KDD Process Data mining:

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Chapter 29: Overview of Data Warehousing and OLAP. Chapter 29 Outline: Purpose of Data Warehousing; Introduction, Definitions, and Terminology; Comparison with Traditional Databases; Characteristics

Distance Learning and Examining Systems

Distance Learning and Examining Systems Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed

COURSE NAME: DATA WAREHOUSING & DATA MINING

LECTURE 5. Topics to be covered: OLTP vs OLAP; ROLAP vs MOLAP; types of OLAP servers. OLAP SERVER: An OLAP Server is a high-capacity, multi-user data manipulation

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining MIT-652 Data Mining Applications Thimaporn Phetkaew School of Informatics, Walailak University MIT-652: DM 1: Introduction to Data Mining 1 Introduction Motivation: Why data

Federico Rajola. Customer Relationship. Management in the. Financial Industry. Organizational Processes and. Technology Innovation.

Federico Rajola. Customer Relationship. Management in the. Financial Industry. Organizational Processes and. Technology Innovation. Federico Rajola Customer Relationship Management in the Financial Industry Organizational Processes and Technology Innovation Second edition ^ Springer Contents 1 Introduction 1 1.1 Identification and

Network Anomaly Detection: A Machine Learning Perspective. Dhruba Kumar Bhattacharyya, Jugal Kumar Kalita. CRC Press, Taylor & Francis Group

Boca Raton, London, New York. CRC Press is an imprint of the Taylor

DATA WAREHOUSE E KNOWLEDGE DISCOVERY

DATA WAREHOUSE E KNOWLEDGE DISCOVERY DATA WAREHOUSE E KNOWLEDGE DISCOVERY Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano DATA WAREHOUSE (DW) A TECHNIQUE FOR CORRECTLY ASSEMBLING AND MANAGING DATA

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the

Q1 Define the following: Data Mining, ETL, Transaction coordinator, Local Autonomy, Workload distribution

Q1 Define the following: Data Mining, ETL, Transaction coordinator, Local Autonomy, Workload distribution. Q2 What are Data Mining Activities? Q3 What are the basic ideas that guide the creation of a data warehouse?

Proposed Application of Data Mining Techniques for Clustering Software Projects

Proposed Application of Data Mining Techniques for Clustering Software Projects Proposed Application of Data Mining Techniques for Clustering Software Projects HENRIQUE RIBEIRO REZENDE 1 AHMED ALI ABDALLA ESMIN 2 UFLA - Federal University of Lavras DCC - Department of Computer Science

Visual Data Mining in Indian Election System

Visual Data Mining in Indian Election System Visual Data Mining in Indian Election System Prof. T. M. Kodinariya Asst. Professor, Department of Computer Engineering, Atmiya Institute of Technology & Science, Rajkot Gujarat, India trupti.kodinariya@gmail.com

Data Mining - Introduction

Data Mining - Introduction Data Mining - Introduction Peter Brezany Institut für Scientific Computing Universität Wien Tel. 4277 39425 Sprechstunde: Di, 13.00-14.00 Outline Business Intelligence and its components Knowledge discovery

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Online Analytic Processing (OLAP). OLAP queries are complex queries that: touch large amounts of data; discover

II. OLAP(ONLINE ANALYTICAL PROCESSING)

Association Rule Mining Method On OLAP Cube. Jigna J. Jadav*, Mahesh Panchal**. *(PG-CSE Student, Department of Computer Engineering, Kalol Institute of Technology & Research Centre, Gujarat, India) **

Chapter ML:XI. XI. Cluster Analysis

Chapter ML:XI. XI. Cluster Analysis Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster

Master s Program in Information Systems

Master s Program in Information Systems The University of Jordan King Abdullah II School for Information Technology Department of Information Systems Master s Program in Information Systems 2006/2007 Study Plan Master Degree in Information Systems

INFORMATION FILTERS SUPPLYING DATA WAREHOUSES WITH BENCHMARKING INFORMATION. Witold Abramowicz,

Contents: PREFACE. FOREWORD. LIST OF CONTRIBUTORS. Chapter 1: INFORMATION FILTERS SUPPLYING DATA WAREHOUSES WITH BENCHMARKING INFORMATION. Witold Abramowicz, 1 Data Warehouses. 2 The HyperSDI System

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART A: Architecture Chapter 1: Motivation and Definitions Motivation Goal: to build an operational general view on a company to support decisions in

Introduction to Data Mining

Introduction to Data Mining Bioinformatics Ying Liu, Ph.D. Laboratory for Bioinformatics University of Texas at Dallas Spring 2008 Introduction to Data Mining 1 Motivation: Why data mining? What is data mining? Data Mining: On what

Knowledge Discovery from Data Bases Proposal for a MAP-I UC

Knowledge Discovery from Data Bases Proposal for a MAP-I UC Knowledge Discovery from Data Bases Proposal for a MAP-I UC João Gama (jgama@fep.up.pt) Universidade do Porto 1 Knowledge Discovery from Data Bases We are deluged by data: scientific data, medical data,

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 Cleveland State University Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 SS Chung 14 Build a Data Mining Model using Data

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

Development of a Data Mining Course for Undergraduate Students

Development of a Data Mining Course for Undergraduate Students Development of a Data Mining Course for Undergraduate Students Terri L. Lenox and Carolyn Cuff Department of Mathematics and Computer Science, Westminster College New Wilmington, PA 16172 USA ABSTRACT

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Contents: List of Figures. Foreword. Preface. Acknowledgments. Chapter 1 Fraud: Detection, Prevention, and Analytics! Introduction. Fraud! Fraud Detection and Prevention. Big Data for

Clustering Model for Evaluating SaaS on the Cloud

Clustering Model for Evaluating SaaS on the Cloud Clustering Model for Evaluating SaaS on the Cloud 1 Mrs. Dhanamma Jagli, 2 Mrs. Akanksha Gupta 1 Assistant Professor, V.E.S Institute of Technology, Mumbai, India 2 Student, M.E (IT) 2 nd year, V.E.S Institute
