Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining




Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II

Office 319, Omega, BCN
EET, office 107, TR 2, Terrassa
avellido@lsi.upc.edu
skype, gtalk: avellido
Tels.: 934137796, 937398090
www.lsi.upc.edu/~avellido/teaching/data_mining.html
/~belanche/docencia/aiddm/aiddm.html

Contents of the course (disclaimer: but who knows)
1. Introduction to DM and its methodologies
2. Visual DM: Exploratory DM through visualization
3. Pattern recognition 1
4. Pattern recognition 2
5. Feature extraction
6. Feature selection
7. Error estimation
8. Linear classifiers, kernels and SVMs
9. Probability in Data Mining
10. Nonlinear Dimensionality Reduction (NLDR)
11. Applications of NLDR: biomed & beyond
12. DM Case studies

2012/2013. Alfredo Vellido An Introduction to Mining (1)

What is DATA MINING? (1)
"Data Mining is the process of discovering actionable and meaningful patterns, profiles, and trends by sifting through your data using pattern recognition technologies (...) is a hot new technology about one of the oldest processes of human endeavour: pattern recognition (...) It is an iterative process of extracting knowledge from business transactions (...) DM is the automatic discovery of usable knowledge from your stored data."
Jesús Mena: Data Mining your Website (Digital Press, 1999, available @ books.google)

What is DATA MINING? (2)
"Data Mining, by its simplest definition, automates the detection of relevant patterns in a database (...) For many years, statisticians have manually mined databases (...) DM uses well established statistical and machine learning techniques to build models that predict customer behaviour. Today, technology automates the mining process, integrates it with commercial data warehouses, and presents it in a relevant way for business users (...) the leading DM products address the broader business and technical issues, such as their integration into complex IT environments."
Berson, Smith, & Thearling: Building Data Mining Applications for CRM (McGraw Hill, 2000)

What is DATA MINING? (3)
WIKIPEDIA 2005 DIXIT: Data mining has been defined as "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data" (1) and "The science of extracting useful information from large data sets or databases" (2). Although it is usually used in relation to analysis of data, data mining, like artificial intelligence, is an umbrella term and is used with varied meaning in a wide range of contexts.
(1) W. Frawley, G. Piatetsky-Shapiro, and C. Matheus: Knowledge Discovery in Databases: An Overview. AI Magazine, 1992, 213-228.
(2) D. Hand, H. Mannila, P. Smyth: Principles of Data Mining. MIT Press, 2001.
en.wikipedia.org/wiki/data_mining

What is DATA MINING? (4) WIKIPEDIA 06 DIXIT: Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It is a fairly recent topic in computer science but applies many older computational techniques from statistics, information retrieval, machine learning and pattern recognition.
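The definition above names association rules as a typical pattern. As a rough illustration only, the sketch below computes the two usual measures of a rule, support and confidence, on a tiny made-up set of transactions. The data and the function names `support` and `confidence` are assumptions introduced for this example; they do not come from the course material.

```python
# Hypothetical toy transactions (illustrative data, not from the course).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule under inspection: {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here 2 of the 4 transactions contain both bread and milk, and 3 contain bread, so the rule has support 2/4 and confidence 2/3. Real association-rule miners avoid enumerating every itemset; this sketch only shows what is being measured.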

What is DATA MINING? (5)
In 1996, in the proceedings of the 1st International Conference on KDD, Fayyad gave one of the best known definitions of Knowledge Discovery from Data: "The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." KDD quickly gathered strength as an interdisciplinary research field where a combination of advanced techniques from Statistics, Artificial Intelligence, Information Systems, and Visualization is used to tackle knowledge acquisition from large databases. The term Knowledge Discovery from Data appeared in 1989, referring to the "[...] overall process of finding and interpreting patterns from data, typically interactive and iterative, involving repeated application of specific data mining methods or algorithms and the interpretation of the patterns generated by these algorithms."

What is DATA MINING? (6) WIKIPEDIA 08 DIXIT: Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations, and financial analysts, but is increasingly being used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases." Data mining in relation to enterprise resource planning is the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making.

What is DATA MINING? (7)
WIKIPEDIA 10 gave up.
BOTTOM LINE: The concept of DM, even if somewhat well established, is still quite fluid.

What to expect from a DM conference (good and bad examples, starting with a rather bad one)
15-17 September 2004: Wessex Institute of Technology (W.I.T.), Málaga, Spain

Data Mining 2004: Main Topics
Sessions 1 & 2: Text Mining
Session 3: Web Mining
Session 4: Clustering Techniques
Session 5: Data Preparation Techniques
Sessions 6 & 7: Applications in Business, Industry and Government
Session 8: Customer Relationship Management (CRM)
Sessions 9 & 10: Applications in Science and Engineering

Data Mining 2007: Main Topics
Session 1: Categorisation Methods
Session 2: Data Preparation
Session 3: Enterprise Information Systems
Session 4: Clustering Techniques
Session 5: National Security
Session 6: Data and Text Mining
Session 7: Mining Environmental and Geospatial Data
Session 8: Applications in Business, Industry and Government

IDADM Data Mining 2008: Late years

Data Mining 2009: Late years
Jesús Mena: Investigative Data Mining for Security and Criminal Detection (Butterworth-Heinemann, 2003)

A different (good) conference, a different take
IEEE CIDM 2012, Brussels: 2012 IEEE Symposium on Computational Intelligence and Data Mining
Data mining foundations:
Novel data mining algorithms in traditional areas (such as classification, regression, clustering, probabilistic modeling, and association analysis)
Algorithms for new, structured data types, such as those arising in chemistry, biology, environment, and other scientific domains
Developing a unifying theory of data mining
Mining sequences and sequential data
Mining spatial and temporal datasets
Mining textual and unstructured datasets
High-performance implementations of data mining algorithms

IEEE CIDM 2012 (continued). Mining in targeted application contexts:
Mining high-speed data streams
Mining sensor data
Distributed data mining and mining multi-agent data
Mining in networked settings: web, social and computer networks, and online communities
Data mining in electronic commerce, such as recommendation, sponsored web search, advertising, and marketing tasks

IEEE CIDM 2012 (continued). Methodological aspects and the KDD process:
Data pre-processing, data reduction, feature selection, and feature transformation
Quality assessment, interestingness analysis, and post-processing
Statistical foundations for robust and scalable data mining
Handling imbalanced data
Automating the mining process and other process-related issues
Dealing with cost-sensitive data and loss models
Human-machine interaction and visual data mining
Security, privacy, and data integrity

IEEE CIDM 2012 (continued). Integrated KDD applications and systems:
Bioinformatics, computational chemistry, geoinformatics, and other science & engineering disciplines
Computational finance, online trading, and analysis of markets
Intrusion detection, fraud prevention, and surveillance
Healthcare, epidemic modeling, and clinical research
Customer relationship management
Telecommunications, network and systems management

But let's talk money... Where is the money in DM?

www.darpa.mil

What's DATA MINING?: A procedural viewpoint

What's DATA MINING?: A historicist viewpoint
[Diagram labels: Statistics, DM, Pattern Recognition, KDD, Artificial Intelligence, Expert Systems, Machine Learning, DB Management]

What's DATA MINING?: A historicist viewpoint
[Diagram labels: Statistics, KDD, Advanced Probabilistic Models, Probabilistic Models, Artificial Intelligence, Machine Learning, Others, Algorithm Development, Bio-plausible Models]

DATA MINING as a methodology

CRISP: a DM methodology
CRoss-Industry Standard Process for Data Mining: a methodology that is neutral from the point of view of industry, tool and application (free & non-proprietary).
Pete Chapman, Randy Kerber (NCR); Julian Clinton, Thomas Khabaza, Colin Shearer (SPSS); Thomas Reinartz, Rüdiger Wirth (DaimlerChrysler).
CRISP-DM was conceived in 1996.
DaimlerChrysler: leaders in industrial application; SPSS: leaders in product development (Clementine, 1994); NCR: owners of large (huge!) databases (Teradata).
Financed by the EU. Version 1.0 released officially in 1999.

CRISP: Hierarchic structure of the methodology

CRISP: Description of phases
Problem/Business understanding: study of targets and requirements from the business/problem viewpoint, and definition of the task as a DM problem.
Data understanding: data collection; getting to know the data, trying to detect both quality problems and interesting features.
Data preparation: preparing the data set to be modelled, starting from raw data. This is an iterative and exploratory process: selection of files, tables, variables and record samples, plus data cleaning.
Modelling: data analysis using modelling techniques that are suitable for the problem at hand. Includes fiddling with the models, tuning their parameters, etc.
Evaluation: all previous steps must be evaluated as a whole (as a unitary process), and we must decide whether the deliverables so far meet the DM challenge.
Implementation: all the knowledge acquired to this point must be organized and presented to the client in a usable form. We must define, together with this client, a protocol to reliably deploy the DM findings.
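The phases above can be read as an iterative loop, which the following minimal sketch tries to make concrete. It is only an illustration under stated assumptions: the names `Phase` and `run_crisp` and the `evaluate` callback are hypothetical, and a failed evaluation is simplified to a restart from the first phase, whereas the methodology also allows moving back and forth between individual phases.

```python
from enum import Enum

class Phase(Enum):
    """The six phases, named as in the slide above."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    IMPLEMENTATION = 6

# Phases repeated on every pass of the loop (everything before deployment).
ITERATED = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELLING,
    Phase.EVALUATION,
]

def run_crisp(evaluate, max_iterations=3):
    """Return the sequence of phases visited.

    `evaluate(iteration)` is a hypothetical callback that returns True when
    the deliverables of that pass meet the DM challenge.
    Assumption: a failed evaluation sends the project back to the first phase.
    """
    history = []
    for iteration in range(1, max_iterations + 1):
        history.extend(ITERATED)
        if evaluate(iteration):
            # Only a successful evaluation leads to deployment.
            history.append(Phase.IMPLEMENTATION)
            break
    return history

# Example: the evaluation only succeeds on the second pass.
history = run_crisp(lambda iteration: iteration >= 2)
print([phase.name for phase in history])
```

In this example the first five phases are visited twice before Implementation is reached. If the evaluation never succeeds within `max_iterations`, the returned history contains no Implementation phase at all.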

CRISP: The virtuous loop of methodology phases