Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1



Similar documents
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Data Mining + Business Intelligence. Integration, Design and Implementation

not possible or was possible at a high cost for collecting the data.

Database Marketing, Business Intelligence and Knowledge Discovery

Social Media Mining. Data Mining Essentials

Fluency With Information Technology CSE100/IMT100

Using Data Mining for Mobile Communication Clustering and Characterization

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Chapter ML:XI. XI. Cluster Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

An Introduction to Data Mining

Essential Components of an Integrated Data Mining Tool for the Oil & Gas Industry, With an Example Application in the DJ Basin.

An Overview of Knowledge Discovery Database and Data mining Techniques

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

Data mining techniques: decision trees

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Chapter 20: Data Analysis

Data Mining Techniques

Data Mining Solutions for the Business Environment

Chapter 12 Discovering New Knowledge Data Mining

Data Mining: A Preprocessing Engine

Importance or the Role of Data Warehousing and Data Mining in Business Applications

Data Mining for Fun and Profit

DATA MINING TECHNIQUES AND APPLICATIONS

Introduction to Data Mining Techniques

Rule based Classification of BSE Stock Data with Data Mining

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Course Syllabus. Purposes of Course:

Data Mining Fundamentals

An Overview of Database management System, Data warehousing and Data Mining

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

A Survey on Intrusion Detection System with Data Mining Techniques

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

A Review of Data Mining Techniques

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Data Warehousing and Data Mining in Business Applications

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System

Federico Rajola. Customer Relationship. Management in the. Financial Industry. Organizational Processes and. Technology Innovation.

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Data Mining System, Functionalities and Applications: A Radical Review

IT services for analyses of various data samples

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Data Mining: Overview. What is Data Mining?

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

The KDD Process: Applying Data Mining

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Data Mining with SQL Server Data Tools

Applying Data Mining Technique to Sales Forecast

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

Data Mining for Retail Website Design and Enhanced Marketing

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Comparison of Data Mining Techniques used for Financial Data Analysis

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES

A Data Mining Tutorial

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Data Mining Algorithms Part 1. Dejan Sarka

Numerical Algorithms Group

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Information Management course

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

The Data Mining Process

Classification and Prediction

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

SPATIAL DATA CLASSIFICATION AND DATA MINING

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms

Data Mining as Part of Knowledge Discovery in Databases (KDD)

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

Sanjeev Kumar. contribute

Introduction. A. Bellaachia Page: 1

Data Mining Classification: Decision Trees

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Research-based Learning (RbL) in Computing Courses for Senior Engineering Students

Introduction to Data Mining

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Part 5. Prediction

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

DHL Data Mining Project. Customer Segmentation with Clustering

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

Machine Learning and Statistics: What s the Connection?

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Transcription:

Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1

1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints Alfred Holl Data Mining 2

1.1 Motivation Which goods are bought at the same time by which customers? What goods should be offered a certain customer? Will a customer pay his invoice? What is the probability that a customer will cancel his contract? What will the trend be next season? Alfred Holl Data Mining 3

1.2 Goals optimize business processes find competitive advantages analyze customer behavior derive theories about future developments Alfred Holl Data Mining 4

1.2 Problems huge amount of data important correlations cannot be found by humans Alfred Holl Data Mining 5

1.3 Definitions Data Mining means different methods, which allow to use computer-aided algorithms which analyze huge amounts of data for internal relations and discover new, unknown correlations in them (Kral, Data Mining, 1998, 11). Alfred Holl Data Mining 6

1.3 Definitions Knowledge discovery in databases is the non-trivial process to identify valid, new, potentially useful and finally understandable patterns in data (Fayyad, U. et al. 1996). Alfred Holl Data Mining 7

1.4 Roots Statistics (analysis of data relationships) Data base research (handling of huge amounts of data) Artificial Intelligence (data hypothesis) Alfred Holl Data Mining 8

1.5 The Data Mining process: overview Data Selection Extracted data Preliminary filtering Filtered data Transformation Transformed data Data mining Data patterns Interpretation / Evaluation Knowledge Alfred Holl Data Mining 9

1.5 The Data Mining process: overview Step 1: pre-processing data selection preliminary filtering transformation Step 2: processing execution of the Data Mining algorithm Step 3: post-processing interpretation, evaluation Alfred Holl Data Mining 10

1.5.1 The Data Mining process Step 1: pre-processing data selection understanding the application area identify goals define which data is relevant Alfred Holl Data Mining 11

1.5.1 The Data Mining process Step 1: pre-processing preliminary filtering complete data make data consistent integrate data Alfred Holl Data Mining 12

1.5.1 The Data Mining process Step 1: pre-processing transformation select attributes replace attributes by discrete attributes Alfred Holl Data Mining 13

1.5.1 The Data Mining process: important constraints of step 1 Pre-processing is 85% of the total work. Data integration can be supported by using Data Warehouses. Def. Data Warehouse: A company-wide enterprise concept with the goal to build a logically central, uniform and consistent database for the different applications supporting the analytical tasks of managers. Alfred Holl Data Mining 14

1.5.2 The Data Mining process Step 2: processing execution of the Data Mining algorithm find clusters find anomalies classify generalize Alfred Holl Data Mining 15

1.5.2 The Data Mining process Step 2: processing - typical tasks Alfred Holl Data Mining 16

1.5.3 The Data Mining process Step 3: post-processing interpretation, evaluation show patterns found evaluate patterns in comparison to goals predict future developments and behavior Alfred Holl Data Mining 17

1.6 Epistemological constraints Data Mining is always based upon reduction and abstraction of reality. Only connections between gathered data can be found in the final result. Not all of the theories derived are necessarily correct. The selection of data influences the result. The goals influence the selection of the data. Alfred Holl Data Mining 18

1.6 Epistemological constraints Alfred Holl Data Mining 19

1.6 Epistemological constraints Alfred Holl Data Mining 20

2 Data Mining Methods 2.1 Non-supervised learning 2.2 Supervised learning Alfred Holl Data Mining 21

2.1 Non-supervised learning no training examples segmentation and association segmentation: search for global partition of segments of data association: search for relations between data methods demographic, k-means, hierarchic clustering neural networks Alfred Holl Data Mining 22

2.1.1 Cluster analysis Alfred Holl Data Mining 23

2.1.2 Neural networks Alfred Holl Data Mining 24

2.2 Supervised learning starts from training examples with known classifications learns the classification via training examples uses the classification learnt methods: decision trees (ID3) neural networks rule induction k-nearest neighbors Alfred Holl Data Mining 25

2.2.1 Decision trees Decision tree: Visualization of a classification rule Each node tests attributes. Decision is made when a leaf is reached. Leaf: node without any children Alfred Holl Data Mining 26

2.2.1 Decision trees The basis of the construction of a decision tree is training data. Training data records consists of examples containing several attributes each. One attribute is the target attribute. Alfred Holl Data Mining 27

2.2.2 ID3: Induction of Decision Trees Example: insurance company Training data CustomerID Contract-term Occupation Cancellation Alfred Holl Data Mining 28

2.2.2 ID3 Entropy (S, Z) = - p(z) log2 p(z) S sample Z target attribute z target attribute value # S z z Z p(z) = probability that Z has value z # S In the example: Entropy (customers1-10, cancellation) = -0,3 * log2 0,3 0,7 * log2 0,7 = 0,881 Alfred Holl Data Mining 29

2.2.2 ID3 InfoGain (S, Z, A) = # Entropy (S, Z) - S v * Entropy (Sv, Z) v A # S A attribute whose InfoGain is calculated v index for all possible values of A Sv subset of S whose elements have the value v in A InfoGain (customers1-10, cancellation, contract-term) = Entropy (customers1-10, cancellation) v Vertragsdauer # S # S v * Entropy (Sv, cancellation) Alfred Holl Data Mining 30

2.2.2 ID3 For contract-term = short: 2/10 * Entropy (Scontracttermshort, cancellation) = 0 For contract-term = medium: 5/10 * Entropy (Scontracttermmedium, cancellation) = 0 For contract-term = long: 3/10 * (- 1/3 * log2 1/3 2/3 * log2 2/3) = 0,275 In the example, the InfoGain is: InfoGain (customers1-10, contract-term) = 0,881 0,275 = 0,606 Alfred Holl Data Mining 31

2.2.2 ID3 For occupation: InfoGain (customers1-10, cancellation, occupation) = 0,881- (#Sunemployed / #S * Entropy (Sunemployed) + #Sofficial / #S * Entropy (Sofficial) + #Semployee / #S * Entropy (Semployee) + #Sfreelance / #S * Entropy (Sfreelance)) = 3/10 * 0 + 2/10 * ( -½ * log2 ½ - ½ * log2 ½) + 3/10 * ( -⅓ * log2 ⅓ 2/3 * log2 2/3) + 2/10 * ( -½ * log2 ½ - ½ * log2 ½) = 2/10 * 1 + 0,275 + 2/10 * 1 = 0,675 InfoGain (customers1-10, cancellation, occupation) = 0,881 0,675 = 0,206 Alfred Holl Data Mining 32

2.2.2 ID3 InfoGain (customers1-10, cancellation, contract-term) > InfoGain (customers1-10, cancellation, occupation) Alfred Holl Data Mining 33

2.2.2 ID3 New customers (no training data) CustomerID Contract-term Occupation Alfred Holl Data Mining 34

2.2.2 ID3 Decision rules: With a short-term contract, the customer is insecure. With a medium-term contract, the customer is not insecure. With a long-term contract and occupation unemployed, the customer is not insecure. With a long-term contract and occupation official, the customer is not insecure. With a long-term contract and occupation freelance, the customer is insecure. Alfred Holl Data Mining 35