Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA



Similar documents
Data Mining Part 5. Prediction

Introduction to Data Mining

Azure Machine Learning, SQL Data Mining and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

A Comparison of Leading Data Mining Tools

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Introduction. A. Bellaachia Page: 1

Knowledge Discovery and Data Mining

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Information Management course

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support

Data Mining. Nonlinear Classification

Database Marketing, Business Intelligence and Knowledge Discovery

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

R s and Predictive Modeling Boot Camp Nov. 8-9, Session #1: Predictive Modeling: An Overview Syed Muzayan Mehmud, ASA, FCA, MAAA

Introduction to Data Mining

A Review of Data Mining Techniques

DATA MINING - SELECTED TOPICS

Table of Contents. June 2010

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

not possible or was possible at a high cost for collecting the data.

An Overview of Knowledge Discovery Database and Data mining Techniques

Knowledge Discovery and Data Mining

Model Combination. 24 Novembre 2009

Subject Description Form

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Data Mining Analytics for Business Intelligence and Decision Support

REVIEW OF ENSEMBLE CLASSIFICATION

Importance or the Role of Data Warehousing and Data Mining in Business Applications

Data Mining Practical Machine Learning Tools and Techniques

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Data Warehousing and Data Mining in Business Applications

An Introduction to WEKA. As presented by PACE

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Chapter 12 Discovering New Knowledge Data Mining

DATA MINING TECHNIQUES AND APPLICATIONS

Fluency With Information Technology CSE100/IMT100

Intrusion Detection. Jeffrey J.P. Tsai. Imperial College Press. A Machine Learning Approach. Zhenwei Yu. University of Illinois, Chicago, USA

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski

Electronic Payment Fraud Detection Techniques

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

An Overview of Database management System, Data warehousing and Data Mining

Data Mining - Evaluation of Classifiers

A Data Mining Tutorial

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain.

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Predictive Modeling and Big Data

Learning is a very general term denoting the way in which agents:

Data Mining: Overview. What is Data Mining?

Introduction Predictive Analytics Tools: Weka

Federico Rajola. Customer Relationship. Management in the. Financial Industry. Organizational Processes and. Technology Innovation.

Data Mining for Fun and Profit

from Larson Text By Susan Miertschin

Course Syllabus. Purposes of Course:

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

Essential Components of an Integrated Data Mining Tool for the Oil & Gas Industry, With an Example Application in the DJ Basin.

Improve Results with High- Performance Data Mining

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

2010 Data Miner Survey Highlights

The University of Jordan

Sunnie Chung. Cleveland State University

Toward Scalable Learning with Non-uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection

How To Use Neural Networks In Data Mining

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Decision Trees from large Databases: SLIQ

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

ADVANCES IN KNOWLEDGE DISCOVERY IN DATABASES

Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition

Session 10 : E-business models, Big Data, Data Mining, Cloud Computing

No BI without Machine Learning

CS590D: Data Mining Chris Clifton

Distributed forests for MapReduce-based machine learning

Data Mining Solutions for the Business Environment

The Data Mining Process

Decision Tree Learning on Very Large Data Sets

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

A New Approach for Evaluation of Data Mining Techniques

An Introduction to Data Mining

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Achieve Better Insight and Prediction with Data Mining

Supervised Learning (Big Data Analytics)

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods

A Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

Data Mining and Machine Learning in Bioinformatics

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Achieve Better Insight and Prediction with Data Mining

ICPSR Summer Program

Transcription:

Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/ 1

Outline Data mining: Techniques and applications How to select the proper techniques for various kinds of projects (supervised, unsupervised, and hybrid) Problems with data - cleaning, transforming, structuring, and organizing data Mining at large: Knowledge management Intelligent learning database systems Incremental data mining Leading data mining tools 2

Data mining: Techniques and applications (with an insurance context) Cluster analysis: Marketing to identify potential clients and to better serve policyholders. Classification of existing products, to determine the most appropriate policy for a new customer Association analysis to prevent customer loss (based on cancellations and unemployment for specific regions) Pattern recognition and deviation detection in claims analysis for detecting areas of potential fraud Simulation of Changes and Potential Policies (interpreting data mining results) 3

How to select the proper techniques for various kinds of projects Data characteristics Intended use of the mined knowledge Typical techniques Classification Rule induction Decision tree construction Association analysis Statistical patterns Neural networks Genetic algorithms Inductive logic programming 4

Problems with data Data cleaning Contradiction and redundancy elimination Missing and stale data Data transforming Feature enhancement and dimension reduction Mapping time-series data Data structuring and organization Common terminology Sampling strategy 5

Mining at large: Knowledge management How to handle massive bodies of knowledge discovered from large, real-world data? When the data comes from different sources (e.g., from different hospitals), how to maintain the consistency of the knowledge that is extracted from each of these data sources? Would bagging and boosting work for different data sources? If the extracted knowledge overfits or underfits the data, how to resolve its incompleteness and contradiction? 6

Knowledge management (2) No match: No knowledge extracted is applicable to a new problem. Multiple match: Knowledge from data mining gives conflicting advice to a new problem. Single match: a test example matches with the description of a specific class 7

Intelligent learning database systems An ILDB system provides mechanisms for Preparation and translation of standard (e.g. relational) database information into a form suitable for use by its induction engines, Using induction techniques to extract knowledge from databases, and Interpreting the extracted knowledge to solve users' problems by deduction (or KBS technology) 8

Incremental data mining Quinlan s windowing technique: costs vs benefits of windowing Incremental learning: training data items one by one Multi-layer incremental induction (Wu and Lo 1998): Subsets (or samples) of a database are processed one by one) Data Partitioning Generalization: A set of rules is learned. Reduction: Behavioral examples are derived. From behavioral examples, generalization can extract new rules, which are expected to correct defects and inconsistencies of previous rules. Successive applications of the above generalizationreduction process allow more accurate and more complex (because of disjunctive) rules to be discovered, by sequentially handling the subsets of examples 9

Incremental data mining (2) Incremental batch learning [Clearwater et al 1989]: Rules are generated on each batch of examples, ``almostgood-rules'' are collected, an additional filter (such as Gen- Spec method and the n-best) is applied, and then a RL technique (derived from the Meta-DENDRAL) is used to refine the remaining rules on the next batch of examples. Meta-learning for scalable inductive learning [Chan 1997]: Partition a large database into subsets, run a learning algorithm on each of the subsets, and combine the results in some principled fashion. Since the learning processes are independent, they can be run in parallel for speed up over a sequential system. Arbiters and combiners 10

Incremental data mining (3) Learning ensembles of classifiers (such as bagging and boosting): Generates multiple versions of a predictor by running an existing learning algorithm many times on a set of re-sampled data. A data item in the original database can be used in many samples for generating different versions of the predictor. Stacked generalization [Wolpert 1992, Ting & Witten 1997] uses a high-level model to combine lower-level models to achieve greater predictive accuracy. These lower-level models are generated also by sampling the original database, and cross-validation is used to generate higher-level data for the high-level model. 11

Leading data mining tools C5.0/See5, RuleQuest Research (Australia) decision tree construction and rule induction based on best-known data mining Intelligent Miner family, IBM (USA) customer relationship marketing; fraud and abuse detection database support Clementine, Integral Solutions (UK) neural networks and rule induction data visualization in tables, histograms, plots and webs Enterprise Miner, SAS Institute (USA) clustering, decision trees, & neural networks GUI designed for business users 12

Leading data mining tools (2) DBMiner, DBMiner Technology (Canada) Data warehousing support A comprehensive set of mining facilities Darwin, Thinking Machines (US, distrbutrs in Japan) parallel, scalable, data mining architecture NN, CART, and GAs Pattern Recognition Workbench, Unica (USA) pattern classification and function estimation with an experiment manager statistical regression, non-parametric, and NN algorithms 13

Leading data mining tools (3) Other data mining tools (http://www.datamininglab.com/resources.html) A Comparison of Leading Data Mining Tools Data Mining Tools for Fraud Detection Fourteen Desktop Data Mining Tools 14