Introduction to Data Mining

Similar documents
Machine Learning, Data Mining, and Knowledge Discovery: An Introduction

CRISP-DM: The life cicle of a data mining project. KDD Process

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

CS590D: Data Mining Chris Clifton

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

Introduction to Data Mining

Data Mining and Neural Networks in Stata

Data Mining: Introduction

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Data Mining Applications in Fund Raising

Data Mining System, Functionalities and Applications: A Radical Review

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Class 10. Data Mining and Artificial Intelligence. Data Mining. We are in the 21 st century So where are the robots?

Knowledge Discovery Process and Data Mining - Final remarks

Data Mining Analytics for Business Intelligence and Decision Support

CRISP-DM: Towards a Standard Process Model for Data Mining

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Database Marketing, Business Intelligence and Knowledge Discovery

Data Mining course Master in Information Technologies Enginyeria Informàtica Tomàs Aluja. LIAM EIO. UPC Lluis Belanche LSI. UPC

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Data Mining Applications in Higher Education

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

DATA MINING TECHNIQUES FOR CRM

relevant to the management dilemma or management question.

not possible or was possible at a high cost for collecting the data.

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics

Comparison of K-means and Backpropagation Data Mining Algorithms

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

The University of Jordan

Data Mining: Benefits for business.

Course Syllabus For Operations Management. Management Information Systems

Data Mining Solutions for the Business Environment

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

MBA Data Mining & Knowledge Discovery

from Larson Text By Susan Miertschin

An Overview of Knowledge Discovery Database and Data mining Techniques

Data Mining Techniques in CRM

SPATIAL DATA CLASSIFICATION AND DATA MINING

Data Mining Techniques

Data Mining: Overview. What is Data Mining?

Cost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO)

Foundations of Artificial Intelligence. Introduction to Data Mining

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Data Mining + Business Intelligence. Integration, Design and Implementation

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Big Data. Introducción. Santiago González

Chapter ML:XI. XI. Cluster Analysis

IT and CRM A basic CRM model Data source & gathering system Database system Data warehouse Information delivery system Information users

Chapter Managing Knowledge in the Digital Firm

Federico Rajola. Customer Relationship. Management in the. Financial Industry. Organizational Processes and. Technology Innovation.

Management Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011

DATA ANALYSIS USING BUSINESS INTELLIGENCE TOOL. A Thesis. Presented to the. Faculty of. San Diego State University. In Partial Fulfillment

Data Mining for Fun and Profit

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Mining an Online Auctions Data Warehouse

CAS CS 565, Data Mining

Big Data. Fast Forward. Putting data to productive use

Data Analytics and Business Intelligence (8696/8697)

CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an industry-proven way to guide your data mining efforts.

Data Mining and KDD: A Shifting Mosaic. Joseph M. Firestone, Ph.D. White Paper No. Two. March 12, 1997

Business Analytics Syllabus

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Ethical Issues in Data Mining

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

Bijan Raahemi, Ph.D., P.Eng, SMIEEE Associate Professor Telfer School of Management and School of Electrical Engineering and Computer Science

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Information Management course

MA2823: Foundations of Machine Learning

Semi-Supervised and Unsupervised Machine Learning. Novel Strategies

Subject Description Form

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

A Spatial Decision Support System for Property Valuation

Chapter 6 - Enhancing Business Intelligence Using Information Systems

Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113

Data Mining and Statistics: What is the Connection?

Data Mining and Marketing Intelligence

Statistics for BIG data

Transcription:

Introduction to Data Mining a.j.m.m. (ton) weijters (slides are partially based on an introduction of Gregory Piatetsky-Shapiro)

Overview Why data mining (data cascade) Application examples Data Mining & Knowledge Discovering Data Mining versus Process Mining

Why Data Mining Cascade of data Different growth rates, but about 30% each year is a low growth rate estimation The possibility to use computers to analyze data 1975 computer for the whole university (main frame) with 1MB working memory, now a PC with 512 MB working memory

Cascade of data Business and government systems (transactions system, ERP systems, Workflow systems,...) Scientific data: astronomy, biology, etc Web, text, and e-commerce (new regularities, about data storage to prevent attempts) Hospitals, internal revenue service...

Examples large data bases AT&T handles billions of calls per day so much data, it cannot be all stored -- analysis has to be done on the fly Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session Google

First conclusion Very little data will ever be looked at by a human Data Mining algorithms and computers are NEEDED to make sense and use of data.

Overview Why data mining (data cascade) Application examples Data Mining & Knowledge Discovering Data Mining versus Process Mining

Application examples I Customer Relationship Management (CRM) Based on a data base with client information and behavior try to select other potential consumers of a product. Euro miles. Profiling tax cheaters Based on the profile of the tax payer and some figures from the tax (electronic) form try to product tax cheating.

Application examples II Health care Given the patient profile and the diagnoses try to predict the number of hospital days. Information is used in planning system. Industry Job shop planning. Based on already accepted jobs, try to product the delivery time of a new offered job.

Type of applications Classification (supervised) Credit risk: result of data mining are rules that can be used to classify new clients as: high, normal, low Estimation (supervised) Credit risk: output is not a classification but a number between -1 and 1 to indicate risk (-1.0 very low, 0.0 normal, +1.0 very high) Clustering (unsupervised) Associations: e.g. Bier & Chips & Peanuts occur frequently in a shopping list of one person Visualization: to facilitate human discovery

Supervised verses unsupervised Supervised (Credit risk) Starting point is a historical data base with client information and his/her financial data including credit history (classification). This data base is used to induce credit risk rules. Unsupervised (Clustering) Try to cluster customers into similar groups (how many groups, in which sense similar)

E-commerce Case Study A person buys a book (product) at Amazon.com. Task: Recommend other books (products) this person is likely to buy Amazon does clustering based on books bought: customers who bought Advances in Knowledge Discovery and Data Mining, also bought Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations Recommendation program is quite successful

Hands-on-project I Historical consumer data Age, education, sex, relationship, etc. Income Model to predict income above 50K Use the model to select consumers for direct mailing

Problems Suitable for Data-Mining have sub-optimal current methods have accessible, sufficient, and relevant data provides high payoff for the right decisions! (have a changing environment)

Overview Why data mining (data cascade) Application examples Data Mining & Knowledge Discovering Data Mining versus Process Mining

Knowledge Discovery Definition Knowledge Discovery in Data is the non-trivial process of identifying valid novel potentially useful and ultimately understandable patterns in data. from Advances in Knowledge Discovery and Data Mining, Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, (Chapter 1), AAAI/MIT Press 1996

Related Fields Machine Learning Visualization Data Mining and Knowledge Discovery Statistics Databases

Statistics, Machine Learning and Data Mining Statistics: more theory-based more focused on testing hypotheses Machine Learning more heuristics then theory-based focused on improving performance of a learning algorithms Data Mining and Knowledge Discovery Data Mining one step in the Knowledge Discovery process (applying the Machine Learning algorithm) Knowledge Discovery, the whole process including data cleaning, learning, and integration and visualization of results Distinctions are fuzzy

Knowledge Discovery Process flow, according to CRISP-DM Monitoring Business Understanding + Data Understanding + Data Preparation 80% of the time Modeling (applying mining algorithm) 20%

Phases and Tasks Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment Determine Business Objectives Background Business Objectives Business Success Criteria Situation Assessment Inventory of Resources Requirements, Assumptions, and Constraints Risks and Contingencies Terminology Costs and Benefits Determine Data Mining Goal Data Mining Goals Data Mining Success Criteria Collect Initial Data Initial Data Collection Report Describe Data Data Description Report Explore Data Data Exploration Report Verify Data Quality Data Quality Report Data Set Data Set Description Select Data Rationale for Inclusion / Exclusion Clean Data Data Cleaning Report Construct Data Derived Attributes Generated Records Integrate Data Merged Data Format Data Reformatted Data Select Modeling Technique Modeling Technique Modeling Assumptions Generate Test Design Test Design Build Model Parameter Settings Models Model Description Assess Model Model Assessment Revised Parameter Settings Evaluate Results Assessment of Data Mining Results w.r.t. Business Success Criteria Approved Models Review Process Review of Process Determine Next Steps List of Possible Actions Decision Plan Deployment Deployment Plan Plan Monitoring and Maintenance Monitoring and Maintenance Plan Produce Final Report Final Report Final Presentation Review Project Experience Documentation Produce Project Plan Project Plan Initial Asessment of Tools and Techniques

Other related fields Data warehouse A data warehouse thus not contain simply accumulated data at a central point, but the data is carefully assembled from a variety of information sources around the organization, cleaned u, quality assured, and then released (published). Business Intelligence (BI) The use of data in the data ware house to support the managers with important information

Overview Why data mining (data cascade) Application examples Data Mining & Knowledge Discovering Data Mining versus Process Mining

Data Mining versus Process Mining Process Mining is data mining but with a strong business process view. Some of the more traditional data mining techniques can be used in the context of process mining. Some new techniques are developed to perform process mining (mining of process models).

Why Process Mining Traditional As-Is analysis of business processes strongly based on the opinion of process expert. The basic idea is to assemble an appropriate team and to organize modeling sessions in which the knowledge of the team members is used to build an adequate As-Is process model. The surplus values of process mining in the As-Is analysis are: information based on the real performance of the process (objective) more details