Chapter ML:XI. Cluster Analysis




Chapter ML:XI

XI. Cluster Analysis
- Data Mining Overview
- Cluster Analysis Basics
- Hierarchical Cluster Analysis
- Iterative Cluster Analysis
- Density-Based Cluster Analysis
- Cluster Evaluation
- Constrained Cluster Analysis

ML:XI-1 Cluster Analysis STEIN 2002-2013

Definition 2 (Data Mining)
Data mining is the systematic, usually automated or semi-automated, discovery and extraction of previously unknown relations from huge data sets.

Data mining involves the following steps:
1. specification of the task
2. selection of the data
3. data preprocessing and data transformation
4. pattern recognition
5. presentation
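The five steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the purchase records, the function names, and the choice of "pattern" (a plain product frequency count) are all hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, product); some entries are incomplete.
RAW_RECORDS = [
    (1, "bread"), (1, "butter"), (2, "bread"), (2, None),
    (3, "bread"), (3, "butter"), (4, "milk"), (None, "bread"),
]

# Step 1 (specification of the task) is the decision made here:
# count in how many customer baskets each product occurs.

def select_data(records):
    """Step 2: keep only records relevant to the task (those with a customer id)."""
    return [r for r in records if r[0] is not None]

def preprocess(records):
    """Step 3: drop incomplete records and transform the rest into one basket per customer."""
    baskets = {}
    for customer, product in records:
        if product is None:
            continue
        baskets.setdefault(customer, set()).add(product)
    return baskets

def recognize_patterns(baskets):
    """Step 4: a deliberately simple pattern, the number of baskets containing each product."""
    return Counter(p for basket in baskets.values() for p in basket)

def present(patterns):
    """Step 5: render the patterns as text lines, sorted by product name."""
    return [f"{product}: {count}" for product, count in sorted(patterns.items())]

def run_pipeline(records):
    return present(recognize_patterns(preprocess(select_data(records))))

if __name__ == "__main__":
    for line in run_pipeline(RAW_RECORDS):
        print(line)
```

In a realistic setting, step 4 would be replaced by an actual analysis method such as cluster analysis or association rule learning; the surrounding steps keep the same shape.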

[Figure: the KDD process. Data source -> data consolidation -> data warehouse (consolidated data) -> selection and preprocessing -> prepared data -> data mining -> patterns and models -> interpretation and evaluation -> knowledge]

Definition 3 (Knowledge Discovery in Databases, KDD)
Knowledge Discovery in Databases is the process of identifying valid, new, relevant, and interpretable patterns in huge data sets. [Fayyad 1996, Wrobel 1998]

Remarks:
- Data mining technology belongs to the field of explorative data analysis. Explorative data analysis deals with both data presentation and the search for structures, peculiarities, and anomalies. It is employed if the research question is fuzzy or if the choice of the statistical model is unclear.
- The data mining definition does not use the notion of information: from the viewpoint of semiotics, data mining operates on the sigmatic layer only. The interpretation of discovered patterns, i.e., the examination of information with regard to new findings and a subjective knowledge gain, happens on the pragmatic layer and belongs to the field of KDD.
- In the business world, the terms data mining and knowledge discovery in databases (KDD) are used synonymously. Note, however, that data mining designates only a single step within a KDD process, namely the analysis step for pattern recognition.
- Web data mining is the transfer and usage of data mining technology for information extraction on the Internet, especially the World Wide Web.
- Text mining is the identification of relevant information in text.

[Figure, built up over slides ML:XI-5 to ML:XI-8: approaches to retrieval and analysis, arranged along the semiotic layers. Recoverable labels:]

Semiotic layers: syntax, sigmatics (data), semantics, pragmatics (knowledge)

Retrieval (driven by an information need):
- Information retrieval, information extraction
- Structured query processing

Analysis:
- Descriptive data analysis
- Explorative data analysis
- Information visualization
- Data aggregation ...
- OLAP, Online Analytical Processing
- KDD, Knowledge Discovery in Databases
- Data mining, Web mining, text mining. Scenario: gigabytes, databases, on the (semantic) Web, in unstructured text
- Machine learning. Scenario: in main memory, specific deduction model
- Statistical analysis. Scenario: clean data, hypothesis evaluation

Remarks:
- A clear separation between machine learning and data mining is not always possible. A key difference, however, results from the sizes of the analyzed data sets: machine learning applications are usually executed in main memory, whereas the field of data mining arose from the necessity to apply analysis methods to large databases.
- Machine learning focuses on the processes and theories of learning and deduction, such as analogical reasoning, learning from examples, or reinforcement learning.
- The major driving force behind data mining is the business world with its large databases. Relevant data mining problems include:
  - undirected association analysis to identify dependencies between consumer products (market basket analysis)
  - cluster analysis and categorization
  - filtering of process data
  - forecasting and prediction
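As an illustration of market basket analysis, the following sketch counts how often two products occur together and keeps the pairs whose relative frequency (support) reaches a threshold. The baskets, the function name `pair_support`, and the threshold of 0.5 are assumptions made for this example, not part of the slides.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return, for each unordered product pair, the fraction of baskets containing both."""
    n = len(baskets)
    if n == 0:
        return {}
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair a canonical (alphabetically ordered) tuple.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: count / n for pair, count in counts.items()}

# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

support = pair_support(BASKETS)
# Keep only pairs that occur in at least half of all baskets.
frequent = {pair: s for pair, s in support.items() if s >= 0.5}

if __name__ == "__main__":
    print(frequent)
```

The analysis is undirected in the sense that it reports co-occurrence only; it makes no statement about which product leads to the purchase of the other.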

Methods and Tools
- cluster analysis
- learning of propositional or description-logical rules.
  Example: IF status=married AND house_owner=true THEN creditor=good
- learning of association rules.
  Example: 75% of the buyers of product A will buy the products B, C, and D as well.
- principal component analysis (PCA), factor analysis
- multi-dimensional scaling (MDS)
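The two rule examples above can be made concrete with a short sketch. It applies the propositional rule to a customer record and computes the confidence of the association rule A => {B, C, D}, i.e., the share of transactions containing A that also contain B, C, and D. The record layout, the function names, and the transaction data are hypothetical; the data is chosen so that the confidence matches the 75% of the example.

```python
def creditor_rule(customer):
    """IF status=married AND house_owner=true THEN creditor=good.

    Returns "good" if the rule fires and None otherwise (the rule makes no
    statement about customers who do not satisfy its premise).
    """
    if customer.get("status") == "married" and customer.get("house_owner") is True:
        return "good"
    return None

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return None  # the rule is not applicable to any transaction
    return sum(1 for t in matching if consequent <= t) / len(matching)

# Hypothetical transactions: four of them contain A, three of those also contain B, C, D.
TRANSACTIONS = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C"},
]

if __name__ == "__main__":
    print(creditor_rule({"status": "married", "house_owner": True}))
    print(confidence(TRANSACTIONS, {"A"}, {"B", "C", "D"}))
```

Learning such rules means searching for premises and conclusions with sufficiently high support and confidence; the sketch only evaluates rules that are already given.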