MBA 8473 - Data Mining & Knowledge Discovery



Similar documents
Chapter 12 Discovering New Knowledge Data Mining

Introduction. A. Bellaachia Page: 1

Data Mining Techniques

Mining an Online Auctions Data Warehouse

not possible or was possible at a high cost for collecting the data.

Introduction to Data Mining

What is Data Mining? MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling

Data Mining Techniques Chapter 9: Market Basket Analysis and Association Rules

from Larson Text By Susan Miertschin

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Data Mining Algorithms Part 1. Dejan Sarka

Session 10 : E-business models, Big Data, Data Mining, Cloud Computing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

IT and CRM A basic CRM model Data source & gathering system Database system Data warehouse Information delivery system Information users

How to Get More Value from Your Survey Data

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Easily Identify Your Best Customers

ISSN: (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

Azure Machine Learning, SQL Data Mining and R

ANALYTICS CENTER LEARNING PROGRAM

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Simple Predictive Analytics Curtis Seare

Information Management course

Foundations of Artificial Intelligence. Introduction to Data Mining

Data Mining Applications in Higher Education

Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining

White Paper. Data Mining for Business

Data Mining Solutions for the Business Environment

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Data Mining Techniques in CRM

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Data Mining & Knowledge Discovery: Personalization and Profiling Technologies

Introduction to Data Mining

Hexaware E-book on Predictive Analytics

KNOWLEDGE BASE DATA MINING FOR BUSINESS INTELLIGENCE

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

A Review of Data Mining Techniques

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

IBM SPSS Modeler Professional

Potential Value of Data Mining for Customer Relationship Marketing in the Banking Industry

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Data Mining with SAS. Mathias Lanner Copyright 2010 SAS Institute Inc. All rights reserved.

Nagarjuna College Of

Data are everywhere. IBM projects that every day we generate 2.5 quintillion bytes of data. In relative terms, this means 90

The Data Mining Process

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Introduction to Data Mining

Data Mining for Business Analytics

Data Mining: Overview. What is Data Mining?

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

3/17/2009. Knowledge Management BIKM eclassifier Integrated BIKM Tools

Joseph Twagilimana, University of Louisville, Louisville, KY

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

A New Approach for Evaluation of Data Mining Techniques

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

Introduction of Information Visualization and Visual Analytics. Chapter 4. Data Mining

Analytics with Excel and ARQUERY for Oracle OLAP

Pentaho Data Mining Last Modified on January 22, 2007

Three proven methods to achieve a higher ROI from data mining

The New Era of Education Big Data

Gerry Hobbs, Department of Statistics, West Virginia University

Data Mining and Neural Networks in Stata

Subject Description Form

Sunnie Chung. Cleveland State University

2015 Workshops for Professors

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data

Distance Learning and Examining Systems

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Data Mining: An Introduction

Introduction to Pattern Recognition

Role of Social Networking in Marketing using Data Mining

Introduction to Data Mining

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

Foundations of Business Intelligence: Databases and Information Management

International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013

Developing Relevant Dining Visits with Oracle Advanced Analytics Olive Garden s transition toward tailoring guests experiences

Lecture 6 - Data Mining Processes

Data exploration with Microsoft Excel: analysing more than one variable

Use of Data Mining in Banking

Chapter 2 Literature Review

Foundations of Business Intelligence: Databases and Information Management

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

How To Use Neural Networks In Data Mining

Behavioral Segmentation

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Data Mining for Successful Healthcare Organizations

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Data Warehousing and Data Mining. A.A Datawarehousing & Datamining 1

Transcription:

MBA 8473 - Data Mining & Knowledge Discovery MBA 8473 1 Learning Objectives 55. Explain what is data mining? 56. Explain two basic types of applications of data mining. 55.1. Compare and contrast various types of rules. 57. Explain Four Data mining methods and describe how each can use both Visual and Non-visual techniques) 57.1 Association 57.2 Sequence 57.3 Classification 57.4 Clustering 58. Demonstration only- Use of Excel, SPSS (dropped), Backpack a Neural Network technology (dropped). 2

Organization of Concepts Large-Scale Data Management for Organizational Decision-Making Data Warehouse (Data Integration) Multidimension Databases (Enabling Technology) Data Mining (Relationship Discovery) Purpose Basic Features Purpose Basic Features Purpose Applications Definition Data Requirements Transaction Processing vs. Data Warehousing Data Structures Example Architecture Data Mapping Example Architecture of an IBM Data Warehouse Preparing Data for DW Reasons for Failure MDDB vs. Relational Database Rotation Ranging Roll-Up Drill-Down Computations Benefits Tools PowerPlay DM Methods Association Sequencial Pattern Classifying Clustering Hybrid Emergent Applications Analysis Methods Visual Nonvisual 3 What is Data Mining and its purpose? (L.O. 55) Search for relationships and global patterns that exist in large databases but are hidden in the vast amounts of data. Analyst combines knowledge of data and machine learning technologies to discover nuggets of knowledge hidden in the data. Serendipity to science. Easier and more effective when the organization has accumulated as much data as possible, such as with a data warehouse A data warehouse is not a prerequisite to data mining 4

APPLICATIONS - Market Basket Analysis (MBA) (L.O. 56) MBA is form of clustering used for finding groups that tend to occur together in a transaction (or market basket). The models are built to find the likelihood of different products being purchased together and can be expressed as a rule. Example rules found from real data: On Thursdays, grocery store consumers often purchase diapers and beer together. Customers who purchase maintenance agreements are very likely to purchase large appliances. When a new hardware store opens, one of the most commonly sold items is toilet rings. 5 6 6

Taxonomies of items can help decide which items to focus MBA on (O.2). 7 7 All rules are not useful (L.O. 56.1) Three common types of rules that can be produced by by MBA: (1) Useful rule - have some cause and provides actionable information (2) Trivial rule - is one that is already known by anyone at all familiar with the business (3) Inexplicable rule - seems to have no explanation and do not suggest a course of action. Using the above three types, try to rate the rules from previous slide. 8

A special case of trivial rule.. (L.O.56.1) Consider a seemingly interesting result - the people who buy the three-way calling option on their local telephone service almost always buy call waiting A subtle problem could be that this may be the result of marketing promotions and product bundling. Results may simply be measuring the success of previous marketing campaigns. 9 Useful rules lead to action... (L.O. 56.1) How can we incent users to put other items that they are likely to purchase into their carts? - Relocate items on the isle, etc. 10

Other Data Mining Applications (L.O.56) Memory Based Reasoning (MBR) Based on past data (i.e., memory), identify similar cases from experience, then apply the information to the problem at hand. Example Fraud detection - new cases of fraud are likely to be similar to known cases. Customer response prediction - the next customers likely to respond to an offer are probably similar to previous customers that have responded. Medical treatments. MCI mines data from 140 million households, each with as many as 10,000 attributes, including life-style and calling habits. Have identified 22 profiles (secret!) 11 Some popular use of data mining: Customer Relationship Marketing Business-to-Consumer Management Build customer profiles using data collected from web visits Focus on one-to-one marketing Customizing products and services for each consumer Profile warehousing business Track what customers do during each site visit Record time between clicks, links between clicks AOL purchasing profile warehouses (e.g., Junglee) Oracle developing product line for profile warehousing Mine the data for relationships 12

Four data Mining Methods (L.O.57) 1. Looking for association or co-existence, cooccurrence of events (suitable for MBA) 2. Looking for sequence or temporal patterns (MBA, MBR) 3. Looking for classification of data (MBA, MBR) - target groups are known in the beginning. 4. Looking for clustering of data (MBA, MBR) - target groups are NOT known in the beginning 13 Data Mining Method #1 (L.O.57.1) 1. Find Association (can be converted into rules) Identifies affinities existing among the collection of items in a given set of records 80 percent of all records that contain A, B and C also contain D and E; I.e., if A, B and C Then D and E. 85 percent of customers who buy a certain wine brand also buy a certain type of pasta; If buys Wine X then buys Pasta C. On Thursdays, many customers buy a six-pack when they purchase diapers. If Thursday and buys six-pack, then buys diapers. How good is the rule? (We will use grocery data example to clarify the issue of confidence ) 14

Analysis Methods for Discovering Association (L.O.57.1) Visual methods Strategy for visualizing associations Specific association detection Scatter plot Segmented scatter plot Link analysis builds up networks of interconnected objects. Landscape visualization the relative positioning of data elements within the geometric terrain represents information important for analysis 15 Strategy for Visualizing Objects and Their Associations 16

Scatter Plot 17 Scatter Plot Shows out-of-bounds data signifying new findings or corrupt data 18

Network is one popular visualization paradigm 19 Link Analysis for Association Visual Networks for Phone Call Data 20

Landscape Visualization for Association Exploring association between interest variables and their relative Cartesian positioning, such as geography 21 Analysis Methods for Discovering Association Non-visual techniques Correlation analysis (can be done in Excel) Are the variables nominal, ordinal, or continuous? Interpret the strength of the correlation coefficient Contingency tables Cross-tabulate nominal variables (can be done by Pivottable in Excel) Examine the proportion of cases in each cell of the table Use chi-square tests to assess significance 22

Association - Market Basket Analysis 23 23 Two in-class examples by using Excel Grocery Point-of-sale data (very small set, calculation by hand) Discussion on how to know the confidence of the rule. Coffee store data (in coffee.xls) 24

Discovering Association (L.O.57.1 finishes here) Non-visual techniques continues Analysis of variance (ANOVA) Assess if there are mean differences in the dependent variable across two or more predefined groups 25 Data Mining Method #2 (L.O.57.2) 2. Discovering Sequential Pattern Identify frequently occurring sequences from given records 40 percent of female customers buy a gray skirt six months after buying a red jacket 26

Analysis Methods for Discovering Sequential Patterns Visual Methods Link analysis Temporal Patterns (Time based plots) Non-visual methods Time-series analysis 27 Patterns from Link Analysis Diagram - Example 1 U.S. Government s secret data analyzed to find unusual patterns in the network structure (Kicker: data labels not known) 28

Patterns from Link Analysis Diagram - Example 2 Intersection of account type and transaction velocity detects money laundering. 29 Discovering Temporal Patterns 30

Absolute Time Cycle Events 31 Contiguous Time Cycle Events Finds co-occurrence of two or more events within a non-standard time interval 32

L.O.57.2 Finishes here. 33 33 Data Mining Method #3 (L.O.57.3) 3. Classification Identify a priori certain mutually exclusive classes Identify a set of meaningful attributes that discriminate among the classes Illustrations Using a meaningful set of attributes, can we differentiate between frequent, moderate and infrequent customers? Using a meaningful set of attributes, can we differentiate between repeat purchasers and one-time purchasers? 34

Analysis Techniques for Classification Neural networks develops non-linear functions to associate inputs with outputs no assumptions about distribution of data handles missing data well (graceful degradation) Supervised neural networks Estimating and testing the model Construct a training sample and a holdout sample Estimate model parameters using training sample Test the estimated model s classification ability using holdout sample 35 Topographical Map Produced by an Unsupervised Learning Neural Network (L.O.57.3 finishes here) 36

Data Mining Method #4 4. Visual Clustering Objects are assigned a place on the display based on general descriptive values and clustered around shared values. Positioning algorithms for clustering (K Means method - can be done in SPSS) self-organizing network 37 Analysis Methods for Clustering (L.O. 57.4 finishes here) Non-visual methods Cluster Analysis Defineindicator variables to define clusters on income, age, education, etc. Examine differences in clusters on key criterion variables purchase loyalty, purchase behavior, etc Do values of indicator and criterion variables vary systematically across clusters? 38

39 40

41 Self-Organizing Network 42

43 43 Summary and Review What is data mining? What are its two main applications? Do you know how rules are created by Market Basket Analysis (MBA)? Can you compute a rule from a small set of example data? Are all rules useful? If not, why not? We have discussed four different data mining methods. Do you know what they are and what kind of situations they are applicable for? 44