Data Mining Techniques



15.564 Information Technology I
Business Intelligence

Outline
- Operational vs. Decision Support Systems
- What is Data Mining?
- Overview of Data Mining Techniques
- Overview of Data Mining Process
- Data Warehouses
- Web Mining and Text Mining

Operational vs. Decision Support Systems
[Diagram: decision support processes; transactional processes; linkages with suppliers, customers, partners]

Operational vs. Decision Support Systems
- Operational Systems
  - Support day-to-day transactions
  - Contain current, up-to-date data
  - Examples: customer orders, inventory levels, bank account balances
- Decision Support Systems
  - Support strategic decision making
  - Contain historical, summarized data
  - Examples: performance summary, customer profitability, market segmentation

Example of an operational application: Order Entry
(Screenshot from the Microsoft Access software program.)

Example of a DSS application: Annual performance summary
(Screenshot from the Microsoft Access software program.)

What is Data Mining?
- Combination of AI and statistical analysis to discover information that is hidden in the data:
  - associations (e.g. linking purchase of pizza with beer)
  - sequences (e.g. tying events together: marriage and purchase of furniture)
  - classifications (e.g. recognizing patterns such as the attributes of customers that are most likely to quit)
  - forecasting (e.g. predicting buying habits of customers based on past patterns)

Sample Data Mining Applications
- Direct marketing: identify which prospects should be included in a mailing list
- Market segmentation: identify common characteristics of customers who buy the same products
- Customer churn: predict which customers are likely to leave your company for a competitor
- Market basket analysis: identify what products are likely to be bought together
- Insurance claims analysis: discover patterns of fraudulent transactions and compare current transactions against those patterns

Case study: Bank is losing customers
- Attrition rate greater than acquisition rate
- More profitable customers seem to be the ones to go

Case study: Bank of America
- Bank wants to expand its portfolio of home equity loans
- Direct mail campaigns have been disappointing

The Virtuous Circle of Data Mining
- Identify the business problem
- Use data mining to transform the data into actionable information
  - What is the right data and where do we get it from?
  - What are the right techniques?
- Act on the information
- Measure the results

Business uses of data mining
Essentially six tasks:
- Classification
  - Classify credit applicants as low, medium, high risk
  - Classify insurance claims as normal, suspicious
- Estimation
  - Estimate the probability of a direct mailing response
  - Estimate the lifetime value of a customer
- Prediction
  - Predict which customers will leave within six months
  - Predict the size of the balance that will be transferred by a credit card prospect

Business uses of data mining
- Affinity Grouping
  - Find out items customers are likely to buy together
  - Find out what books to recommend to Amazon.com users
- Clustering
  - Difference from classification: classes are unknown!
- Description
  - Help understand large volumes of data by uncovering interesting patterns

Overview of Data Mining Techniques
- Market Basket Analysis
- Memory-Based Processing (Collaborative Filtering)
- Automatic Clustering
- Decision Trees and Rule Induction
- Neural Networks

Market Basket Analysis
- Association and sequence discovery
- Principal concepts
  - Support or Prevalence: frequency that a particular association appears in the database
  - Confidence: conditional predictability of B, given A
- Example:
  - Total daily transactions: 1,000
  - Number which include soda: 500
  - Number which include orange juice: 800
  - Number which include soda and orange juice: 450
  - SUPPORT for soda and orange juice = 45% (450/1,000)
  - CONFIDENCE of soda -> orange juice = 90% (450/500)
  - CONFIDENCE of orange juice -> soda = 56% (450/800)

Applying Market Basket Analysis
- Create co-occurrence matrix
  - What is the right set of items?
- Generate useful rules
  - Weed out the trivial and the inexplicable from the useful
- Figure out how to act on them
- Similar techniques can be applied to time series for mining useful sequences of actions
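The support and confidence figures in the soda and orange juice example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `rule_stats` and the synthetic transaction list are illustrative assumptions, built only so that the counts match the slide (1,000 transactions, 500 with soda, 800 with orange juice, 450 with both).

```python
def rule_stats(transactions, a, b):
    """Return (support, confidence) for the rule a -> b.

    support    = share of all transactions containing both a and b
    confidence = share of transactions containing a that also contain b
    """
    n_total = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_both = sum(1 for t in transactions if a in t and b in t)
    return n_both / n_total, n_both / n_a

# Synthetic baskets (assumption) chosen to match the counts on the slide.
transactions = (
    [{"soda", "orange juice"}] * 450   # both items
    + [{"soda"}] * 50                  # soda only
    + [{"orange juice"}] * 350         # orange juice only
    + [{"bread"}] * 150                # neither item
)

support, conf_soda_oj = rule_stats(transactions, "soda", "orange juice")
_, conf_oj_soda = rule_stats(transactions, "orange juice", "soda")

print(f"support = {support:.0%}")
print(f"confidence soda -> orange juice = {conf_soda_oj:.0%}")
print(f"confidence orange juice -> soda = {conf_oj_soda:.0%}")
```

Support is symmetric in the two items, while confidence depends on the direction of the rule, which is why the two confidence values differ.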

Memory-based reasoning
- Helps predict unknown attributes of customers/situations, based on the attributes of their nearest neighbors
- Principal set of techniques behind Web recommendation engines
- Otherwise known as collaborative filtering

Example: Amazon.com book recommendations
- Example: Identify books to recommend to customers
- Company keeps log of past customer purchases
- Represent each customer as a vector whose components are the past purchases
- Define a distance function for comparing customers
- Based on this distance function, identify the customer's nearest neighbor set (NNS)
- Identify books that have been purchased by a large percentage of the nearest neighbor set but not by the customer
- Recommend these books to the customer as possible next purchases
(Screenshot from Amazon.com showing book recommendations.)
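The recommendation steps above can be sketched as follows. The slides do not say which distance function is used, so this sketch assumes Jaccard distance on sets of purchased titles. The customer names, book labels, `k` and `min_share` are all hypothetical.

```python
def jaccard_distance(a, b):
    """Distance between two purchase sets: 0 = identical, 1 = nothing shared."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def recommend(purchases, customer, k=2, min_share=0.6):
    """Recommend books bought by at least min_share of the k nearest
    neighbors but not yet bought by the customer."""
    own = purchases[customer]
    others = [c for c in purchases if c != customer]
    # Nearest neighbor set: the k customers with the smallest distance.
    neighbors = sorted(others, key=lambda c: jaccard_distance(own, purchases[c]))[:k]
    counts = {}
    for c in neighbors:
        for book in purchases[c] - own:
            counts[book] = counts.get(book, 0) + 1
    return sorted(b for b, n in counts.items() if n / len(neighbors) >= min_share)

# Hypothetical purchase log: customer -> set of purchased books.
purchases = {
    "alice": {"A", "B", "C"},
    "bob": {"A", "B", "C", "D"},
    "carol": {"A", "B", "D", "E"},
    "dave": {"X", "Y"},
}

recs = recommend(purchases, "alice")
print(recs)
```

Here bob and carol are the two nearest neighbors of alice. Book D was bought by both of them and is recommended, while book E was bought by only one of the two and falls below the threshold.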

Another example: Personalized restaurant recommendations
- Alice is asking Zagat.com for a personalized rating of Border Cafe
- Alice has already submitted ratings for 20 other restaurants in the past 12 months
- Zagat.com finds other members whose ratings for those 20 restaurants are similar to Alice's
- Zagat.com calculates the average (or weighted average) rating that these nearest neighbors have given to Border Cafe
- This is Alice's personalized rating of Border Cafe
- Note that this may be quite different from the average rating of Border Cafe based on the entire population of raters!

Clustering
- Divide (segment) a database into groups
- Goal: Find groups that are very different from each other, and whose members are similar to each other
- Number and attributes of these groups are not known in advance
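One common way to segment records into groups is k-means, sketched below. The slides do not name a clustering algorithm, so k-means is an assumption here. Unlike the description above, it needs the number of groups as an input, so in practice several values are usually tried. The data points (income in thousands of dollars, online grocery orders per month) and the starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (income in $1,000s, online grocery orders per month).
points = [(20, 1), (22, 2), (25, 1), (80, 9), (85, 8), (90, 10)]
centroids, clusters = kmeans(points, centroids=[(20, 1), (90, 10)])
print(centroids)
print(clusters)
```

Because the two attributes are on different scales, income dominates the distance in this sketch. Rescaling the attributes first is a common refinement.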

Clustering (example)
[Scatter plot: axes "Income" and "Buys groceries online"; customers plotted as X marks]

Decision Trees
[Tree diagram: decision nodes "Income > $40,000", "Job > 5 Years" and "High Debt"; leaves "Low Risk", "High Risk", "High Risk", "Low Risk"]
- Data mining is used to construct the tree
- Example algorithm: CART (Classification and Regression Trees)

Decision tree construction algorithms
- Start with a training set (i.e. preclassified records of loan customers)
- Each customer record contains
  - Independent variables: income, time with employer, debt
  - Dependent variable: outcome of past loan
- Find the independent variable that best splits the records into groups where one single class (low risk, high risk) predominates
  - Measure used: entropy of information (diversity)
  - Objective: max[ diversity before - (diversity left + diversity right) ]
- Repeat recursively to generate lower levels of tree

Decision Tree pros and cons
- Pros
  - One of the most intuitive techniques; people really like decision trees
  - Really helps get some intuition as to what is going on
  - Can lead to direct actions/decision procedures
- Cons
  - Independent variables are not always the best separators
    - Maybe some of them are correlated/redundant
    - Maybe the best splitter is a linear combination of those variables (remember factor analysis)
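The split-selection step of tree construction can be sketched as follows, using entropy as the diversity measure. One assumption goes beyond the slide: each side's entropy is weighted by its share of the records, a common refinement of the "diversity before minus diversity after" objective. The loan records and field names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(records, target, features):
    """Return (gain, feature, threshold) of the binary split
    'feature <= threshold' that reduces entropy the most."""
    n = len(records)
    before = entropy([r[target] for r in records])
    best = (0.0, None, None)
    for f in features:
        for threshold in sorted({r[f] for r in records}):
            left = [r[target] for r in records if r[f] <= threshold]
            right = [r[target] for r in records if r[f] > threshold]
            if not left or not right:
                continue  # not a real split
            # Assumption: weight each side by its share of the records.
            after = len(left) / n * entropy(left) + len(right) / n * entropy(right)
            gain = before - after
            if gain > best[0]:
                best = (gain, f, threshold)
    return best

# Hypothetical preclassified loan records (income and debt in $1,000s).
records = [
    {"income": 25, "years": 1, "debt": 10, "risk": "high"},
    {"income": 30, "years": 6, "debt": 5, "risk": "high"},
    {"income": 35, "years": 3, "debt": 20, "risk": "high"},
    {"income": 45, "years": 2, "debt": 8, "risk": "low"},
    {"income": 60, "years": 7, "debt": 15, "risk": "low"},
    {"income": 80, "years": 10, "debt": 3, "risk": "low"},
]

gain, feature, threshold = best_split(records, "risk", ["income", "years", "debt"])
print(feature, threshold, round(gain, 3))
```

A full tree builder would apply `best_split` again to the records on each side of the chosen split, stopping when a group is pure or too small to split further.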

Neural Networks
- Powerful method for constructing predictive models
- Each node applies an activation function to its input
- Activation function results are multiplied by w_ij and passed on to output
[Network diagram: input nodes 1 and 2; hidden-layer nodes 3, 4 and 5; output node 6; weights w13, w14, w15, w23, w24, w25, w36, w46, w56]

Neural Networks
- Weights are determined using a training set, i.e. a number of test cases where both the inputs and the outputs are known
[Same network diagram]

Neural Networks
- Example: Build a neural net to calculate credit risk for loan applicants
- Inputs: annual income, loan amount, loan duration
- Outputs: probability of default [0,1]
- Training set: data from past customers with known outcomes
[Network diagram: input nodes 1 and 2; hidden-layer nodes 3, 4 and 5; output node 6; weights w13, w14, w15, w23, w24, w25, w36, w46, w56]

Neural Networks
- Start from an initial estimate for the weights
- Feed the independent variables for the first record into inputs 1 and 2
- Compare with output and calculate error
- Update estimates of weights by backpropagating error
[Same network diagram]
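The training procedure described on these slides can be sketched for the 2-3-1 network in the diagram (two inputs, three hidden nodes, one output). Several choices below are assumptions rather than details from the slides: a sigmoid activation, a squared-error measure, no bias terms (the diagram shows none), the learning rate, and the tiny data set of (scaled income, scaled loan amount) pairs.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return (hidden activations, network output) for one input record."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train(records, n_hidden=3, rate=0.5, epochs=2000, seed=0):
    """Online backpropagation: update the weights after every record."""
    rng = random.Random(seed)
    n_in = len(records[0][0])
    # Initial estimate of the weights: small random values.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    for _ in range(epochs):
        for x, target in records:
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output node (squared error, sigmoid).
            delta_out = (output - target) * output * (1.0 - output)
            # Propagate the error back to each hidden node.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= rate * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= rate * delta_hidden[j] * x[i]
    return w_hidden, w_out

def total_error(records, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in records)

# Hypothetical past customers: ((income, loan amount) scaled to [0,1], defaulted?)
records = [((0.9, 0.1), 0), ((0.8, 0.2), 0), ((0.2, 0.9), 1), ((0.1, 0.8), 1)]

error_before = total_error(records, *train(records, epochs=0))
w_hidden, w_out = train(records)
error_after = total_error(records, w_hidden, w_out)
print(error_before, error_after)
```

Calling `train` with `epochs=0` returns the untrained initial weights for the same seed, which gives a baseline for checking that training actually reduced the error.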

Neural Networks
- Repeat with next training set record until model converges
[Network diagram: input nodes 1 and 2; hidden-layer nodes 3, 4 and 5; output node 6; weights w13, w14, w15, w23, w24, w25, w36, w46, w56]

Neural networks pros and cons
- Pros
  - Versatile, give good results in complicated domains
- Cons
  - NN cannot explain the data
  - All inputs and outputs must be massaged to [0,1]
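The last point, bringing inputs and outputs into [0,1], is often handled with min-max scaling. The sketch below is one possible approach, not necessarily the one the lecture has in mind. The function name and the income figures are illustrative.

```python
def min_max_scale(values):
    """Linearly rescale values so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical annual incomes in dollars.
incomes = [25000, 40000, 100000]
scaled = min_max_scale(incomes)
print(scaled)
```

The minimum and maximum found on the training set need to be kept and reused when scaling new records, so that new inputs land on the same scale the network was trained on.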

Data Mining Process
- Define business problem
- Build data mining database
- Explore data
- Prepare data for modeling
- Build model
- Evaluate model
- Deploy model and results

Selecting the right data mining technique

Evaluating the various techniques

What is a data warehouse?
- Data mining has a hard requirement for clean and consistent data
- Decision support data
  - Are found in many different databases
    - within the company
    - outside the company
  - Are often inconsistent and unclean
- In practical terms, locating and integrating all this information in real time is very difficult
- Solution: create separate repositories of data for decision support: data warehouses

Data Warehousing architecture

Data Warehousing considerations
- What data to include?
- How to reconcile inconsistencies?
- How often to update?

Trends in Business Intelligence
- Text Mining
  - Mining patterns from unstructured text data, e.g. from the Web
- Software Agent Technologies
  - Business intelligence on behalf of the consumer
  - Agents learn the preferences and behavior of their human master in order to
    - Search the Web and recommend products
    - Compare prices and other attributes and select providers
    - Automatically negotiate
  - What does this mean for vendors?

To delve deeper
- Recommended books
  - Data Mining Techniques: Michael J. A. Berry and Gordon Linoff
- Useful collections of links
  - http://databases.about.com/cs/datamining/
- Case studies and industry
  - Datamation magazine website: http://www.datamation.com