Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health



Similar documents
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Information Management course

Data Mining System, Functionalities and Applications: A Radical Review

Data Mining: Concepts and Techniques

Introduction to Data Mining

Data Mining: Overview. What is Data Mining?

Introduction. A. Bellaachia Page: 1

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining Analytics for Business Intelligence and Decision Support

Introduction to Data Mining

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining Solutions for the Business Environment

Perspectives on Data Mining

Learning from Big Data in

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

A Data Mining Tutorial

Knowledge Discovery Process and Data Mining - Final remarks

Machine Learning, Data Mining, and Knowledge Discovery: An Introduction

MA2823: Foundations of Machine Learning

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

Astrophysics with Terabyte Datasets. Alex Szalay, JHU and Jim Gray, Microsoft Research

Data Mining for Fun and Profit

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

Data Mining Challenges and Opportunities in Astronomy

Database Marketing, Business Intelligence and Knowledge Discovery

Data Mining: Opportunities and Challenges

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Introduction to Data Mining

not possible or was possible at a high cost for collecting the data.

Why do statisticians "hate" us?

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

Healthcare Measurement Analysis Using Data mining Techniques

Data Mining. Introduction to Modern Information Retrieval from Databases and the Web. Administrivia

Data Mining: An Introduction

Principles of Data Mining by Hand&Mannila&Smyth

Data Warehousing and Data Mining for improvement of Customs Administration in India. Lessons learnt overseas for implementation in India

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Data Mining: Introduction

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

Introduction to Data Mining

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

Data Mining Introduction

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

Dynamic Data in terms of Data Mining Streams

The Scientific Data Mining Process

Chapter ML:XI. XI. Cluster Analysis

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Sanjeev Kumar. contribute

Data Warehousing and Data Mining

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR.

The Data Mining Process

International Journal of Advance Research in Computer Science and Management Studies

FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML

Application of Data Mining Methods in Health Care Databases

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

Lecture Slides for INTRODUCTION TO. ETHEM ALPAYDIN The MIT Press, Lab Class and literature. Friday, , Harburger Schloßstr.

Specific Usage of Visual Data Analysis Techniques

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Social Media Mining. Data Mining Essentials

Big Data Hope or Hype?

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

DATA MINING: AN OVERVIEW

Chapter 12 Discovering New Knowledge Data Mining

Data Mining in Telecommunication

Foundations of Artificial Intelligence. Introduction to Data Mining

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

The KDD Process for Extracting Useful Knowledge from Volumes of Data

Three Perspectives of Data Mining

Data Analytics and Business Intelligence (8696/8697)

Making the Most of Missing Values: Object Clustering with Partial Data in Astronomy

Data Mining Applications in Manufacturing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Classification of Household Devices by Electricity Usage Profiles

Big Data Analytics and Healthcare

Introduction to Data Mining Techniques

Quick Introduction of Data Mining Techniques

Introduction to Data Mining. Lijun Zhang

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Statistics for BIG data

Transcription:

Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining resources (conferences, journals, Web resources) Example application (1) Telecommunication Huge amount of data is collected daily Transactional data (about each phone call) (Data on mobile phones, house based phones, Internet, etc.) Other customer data (billing, personal information, etc.) Additional data (network load, faults, etc.) Which customer group is highly profitable, which one is not? To which customers should we advertise what kind of special offers? What kind of call rates would increase profit without loosing good customers? How do customer profiles change over time? Fraud detection (stolen mobile phones or phone cards) Slide 1 of 18 MATH3346 Data Mining, S2/2006 Slide 2 of 18 MATH3346 Data Mining, S2/2006 Example application (2) Health Different aspects of the health system Personal health records (at GPs, specialists, etc.) Hospital data (e.g. admission data, midwives data, surgery data) Billing information (Medicare, PBS) Are doctors following the procedures (e.g. prescription of medication)? Adverse drug reactions (analysis of different data collections to find correlations) Are people committing fraud (e.g. doctor shoppers) Correlations between social and environmental issues and people s health? (temporal and spatial analysis of linked data collections) Example application (3) Astronomy Terabytes of image and other data from telescopes and satellites (large-area sky surveys in optical, infrared, and radio wavelengths) Classification of objects (stars, galaxies, pulsars, quasars, etc.) Detect (large scale) structures in the data Find rare, unusual, or even previously unknown types of astronomical objects and phenomena MACHO (MAssive Compact Halo Objects) (ANU and US) (search for dark matter, objects like brown dwarfs or planets in the milky way) Slide 3 of 18 MATH3346 Data Mining, S2/2006 Slide 4 of 18 MATH3346 Data Mining, S2/2006

Definitions of data mining (1) Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. (Fayyad, Piatetsky-Shapiro and Smyth, 1996) An information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis. (http://www.twocrows.com/glossary.htm) Try also: http://www.google.com, search term: define: data mining Definitions of data mining (2) Data mining is often also called Knowledge discovery in databases (KDD) (some say data mining is only one essential step in the KDD process) Essential in definitions is:... non-trivial extraction...... previously unknown or novel...... potentially useful information...... understandable and interesting...... large amounts of data...... prediction and modelling... Slide 5 of 18 MATH3346 Data Mining, S2/2006 Slide 6 of 18 MATH3346 Data Mining, S2/2006 Data mining is multi disciplinary Data mining methods and techniques (1) Database Technology Numerical Mathematics What they do Detect patterns in data: Rules, patterns, classes, associations and functional dependencies, outliers, data distributions Statistics Data Mining Visualisation How they do it Search through data and pattern space, non-parametric modelling, filtering, aggregation Machine Learning High Performance Computing How well they do it Errors and biases, over-fitting, confounding effects, speed, scalability Slide 7 of 18 MATH3346 Data Mining, S2/2006 Slide 8 of 18 MATH3346 Data Mining, S2/2006

Data mining methods and techniques (2) Cluster analysis (unsupervised learning) Group data to form classes, maximise intra-class similarity and minimise similarity between clusters Association rules Find frequent rules in the data; popular with market basket analysis, e.g. if a customer buys chips he also buys beer (with 50% likelihood) Decision trees Build (binary) tree where each node corresponds to a split of attribute values, e.g. if the weather is sunny play golf else don t play Predictive modelling Build mathematical models (functions) of the data in order to predict some unknown or missing values (or future outcomes) Slide 9 of 18 MATH3346 Data Mining, S2/2006 Data mining methods and techniques (3) Neural networks / genetic algorithms Outlier detection Find unusual, rare events (often regarded as noise, these can be the most interesting objects or events in the data), e.g. fraud detection, network intrusion detection, etc. Sequence / time series mining Find patterns over time (e.g. episodes, clusters) Spatial mining Geographical data analysis Stream mining Where access to the data is limited to once (e.g. network data, telecommunications data, etc.), special algorithms are necessary Slide 10 of 18 MATH3346 Data Mining, S2/2006 Data size Major challenges in data mining Size of data collections grows more than linear, doubling every 18 months (similar to Moore s law of CPU speed) Scalable algorithms are needed Data complexity Different types of data (free text, HTML, XML, multimedia) Dimensionality of the data increases (more attributes) The curse of dimensionality affects many algorithms (for example find nearest neighbours in high dimensions) Privacy and confidentiality Slide 11 of 18 MATH3346 Data Mining, S2/2006 Ten grand challenges in data mining (U. Fayyad) Technical challenges 1. How does the data grow? 2. Complexity/understandability trade-off 3. Interestingness 4. Scalability 5. A theory for what we do Pragmatic challenges 6. Where is the data? 7. Embedding algorithms and solutions within operational systems 8. Integrating domain knowledge 9. Managing and maintaining models 10. Effectiveness measurement (Source: http://www.acm.org/sigs/sigkdd/explorations/, Editorial, volume 5, issue 2, Dec. 2003) Slide 12 of 18 MATH3346 Data Mining, S2/2006

The data mining / KDD process (1) Data mining is an interactive process Understand Customer Take Action Understand Data Evaluate Model(s) Data mining = Build Model(s) Prepare Data Build Model(s) Typically 90% of time and efforts are spent in the first 3 steps The data mining / KDD process (2) An iterative sequence of the following steps 1. Data cleaning 2. Data integration 3. Data selection 4. Data transformation 5. Data mining 6. Pattern evaluation 7. Knowledge presentation (Follows: Data Mining: Concepts and Techniques, Han/Kamber) (Follows: CRoss Industry Standard Process for Data Mining, http://www.crisp-dm.org/) Slide 13 of 18 MATH3346 Data Mining, S2/2006 Slide 14 of 18 MATH3346 Data Mining, S2/2006 History of data mining The term data mining was first mentioned by statisticians several decades ago, but with a different meaning compared to today: data dredging (inappropriate (sometimes deliberately so) search for statistically significant relationships in large quantities of data; from Wikipedia) First workshops on knowledge discovery in databases in early 1990 (part of ACM SIGMOD (management of data) conferences) First data mining conferences in mid 1990 Many more conferences since early 2000 Data mining is around 15 years old Conferences Data mining resources (1) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (since 1995) European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) (since 1997) Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (since 1997) SIAM (Society for Industrial and Applied Mathematics) International Conference on Data Mining (since 2001) IEEE (International Institute of Electrical Engineers) International Conference on Data Mining (since 2001) AusDM (workshop since 2002, conference since 2004) Slide 15 of 18 MATH3346 Data Mining, S2/2006 Slide 16 of 18 MATH3346 Data Mining, S2/2006

Journals Data mining resources (2) http://www.kluweronline.com/issn/1384-5810/ (Kluwer Data Mining and Knowledge Discovery) http://www.computer.org/tkde/ (IEEE Transactions on Knowledge and Data Engineering) http://www.acm.org/sigs/sigkdd/explorations/issue.php?issue=current (ACM SIGKDD Explorations) http://www.cs.uvm.edu/ kais/ (Springer Knowledge and Information Systems) (Some) Web resources Data mining resources (3) http://www.kdnuggets.com/ http://www.dmg.org/ (Data mining group, PMML) http://www.acm.org/sigs/sigkdd/ http://datamining.anu.edu.au/ http://www.togaware.com/datamining/catalogue.html http://www.togaware.com/analytics/ Canberra Analytics Group http://kdd.ics.uci.edu/ (UCI Knowledge Discovery in Databases Archive) Slide 17 of 18 MATH3346 Data Mining, S2/2006 Slide 18 of 18 MATH3346 Data Mining, S2/2006