Data Mining for Business Analytics



Similar documents
Business Problems and Data Science Solutions

Data Mining: Overview. What is Data Mining?

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Data Mining for Fun and Profit

Data Mining Part 5. Prediction

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining Applications in Higher Education

Data Mining Algorithms Part 1. Dejan Sarka

Banking Analytics Training Program

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Framing Business Problems as Data Mining Problems

Databases - Data Mining. (GF Royle, N Spadaccini ) Databases - Data Mining 1 / 25

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Data Mining Applications in Manufacturing

Foundations of Business Intelligence: Databases and Information Management

Analyze It use cases in telecom & healthcare

Data, Measurements, Features

Data Mining: An Introduction

not possible or was possible at a high cost for collecting the data.

The Data Mining Process

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Introduction to Data Mining

CRM Analytics for Telecommunications

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

Chapter 2 Literature Review

Database Marketing, Business Intelligence and Knowledge Discovery

Machine Learning using MapReduce

Data Mining with SAS. Mathias Lanner Copyright 2010 SAS Institute Inc. All rights reserved.

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca

Analytics on Big Data

DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING

Class 10. Data Mining and Artificial Intelligence. Data Mining. We are in the 21 st century So where are the robots?

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

A Marketer s Guide to Analytics

How To Make A Credit Risk Model For A Bank Account

DATA MINING TECHNIQUES AND APPLICATIONS

Big Data. Fast Forward. Putting data to productive use

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

DEMYSTIFYING BIG DATA. What it is, what it isn t, and what it can do for you.

1 What is Machine Learning?

Data Warehousing and Data Mining in Business Applications

Introduction to Data Mining

I. Create the base view with the data you want to measure

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Using multiple models: Bagging, Boosting, Ensembles, Forests

TDWI Best Practice BI & DW Predictive Analytics & Data Mining

MBA Data Mining & Knowledge Discovery

Chapter 7: Data Mining

Behavioral Segmentation

Data Mining Techniques and its Applications in Banking Sector

TEXT ANALYTICS INTEGRATION

Learning is a very general term denoting the way in which agents:

Predictive Analytics Certificate Program

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

Business Intelligence and Decision Support Systems

In-Depth Guide Advanced Spreadsheet Techniques

Customer Analytics. Turn Big Data into Big Value

ISSN: (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

Foundations of Artificial Intelligence. Introduction to Data Mining

Predictive Modeling Techniques in Insurance

MS1b Statistical Data Mining

Data Mining + Business Intelligence. Integration, Design and Implementation

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Data Mining Techniques in CRM

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

Data Mining Techniques

CoolaData Predictive Analytics

Data Mining III: Numeric Estimation

An Introduction to Advanced Analytics and Data Mining

NEURAL NETWORKS IN DATA MINING

Data Mining Applications in Fund Raising

Scorecard Element in PMML 4.1 Provides Rich, Accurate Exchange of Predictive Models for Improved Business Decisions

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

BIG DATA What it is and how to use?

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Maximize Revenues on your Customer Loyalty Program using Predictive Analytics

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

from Larson Text By Susan Miertschin

Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES

Chapter 20: Data Analysis

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

Information Management course

DATA MINING FOR BUSINESS ANALYTICS

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Clustering through Decision Tree Construction in Geology

Use of Data Mining in Banking

Customer Classification And Prediction Based On Data Mining Technique

Transcription:

Data Mining for Business Analytics Lecture 2: Introduction to Predictive Modeling Stern School of Business New York University Spring 2014

MegaTelCo: Predicting Customer Churn You just landed a great analytical job with MegaTelCo, one of the largest telecommunication firms in the US They are having a major problem with customer retention in their wireless business In the mid-atlantic region, 20% of cell phone customers leave when their contracts expire. Communications companies are now engaged in battles to attract each other s customers while retaining their own Marketing has already designed a special retention offer Your task is to devise a precise, step-by-step plan for how the data science team should use MegaTelCo s vast data resources to solve the problem

MegaTelCo: Predicting Customer Churn What data you might use? How would they be used? How should MegaTelCo choose a set of customers to receive their offer in order to best reduce churn for a particular incentive budget?

Terminology Model: A simplified representation of reality crated to serve a purpose Predictive Model: A formula for estimating the unknown value of interest: the target Prediction: The formula can be mathematical, logical statement (e.g., rule), etc. Estimate an unknown value (i.e. the target) Instance / example: Represents a fact or a data point Described by a set of attributes (fields, columns, variables, or features)

Terminology Model induction: The creation of models from data Training data: The input data for the induction algorithm

Terminology

What is a model? A simplified * representation of reality created for a specific purpose * based on some assumptions Examples: map, prototype, Black-Scholes model, etc. Data Mining example: formula for predicting probability of customer attrition at contract expiration classification model or class-probability estimation model

Feature Types Numeric: anything that has some order Numbers (that mean numbers) Dates (that look like numbers ) Dimension of 1 Categorical: stuff that does not have an order Binary Text Dimension = number of possible values (-1) Food for thought: Names, Ratings, SIC

Dimensionality of the data? Attributes / Features Dimensionality of a dataset is the sum of the dimensions of the features The sum of the number of numeric features and ~ the number of values of categorical features

Common Data Mining Tasks Classification and class probability estimation How likely is this consumer to respond to our campaign? Regression How much will she use the service? Similarity Matching Can we find consumers similar to my best customers? Clustering Do my customers form natural groups?

Common Data Mining Tasks Co-occurrence Grouping Also known as frequent itemset mining, association rule discovery, and market-basket analysis What items are commonly purchased together? Profiling (behavior description) What does normal behavior look like? (for example, as baseline to detect fraud) Data Reduction Which latent dimensions describe the consumer taste preferences?

Common Data Mining Tasks Link Prediction Since John and Jane share 2 friends, should John become Jane s friend? Causal Modeling Why are my customers leaving?

Supervised versus Unsupervised Methods Do our customers naturally fall into different groups? No guarantee that the results are meaningful or will be useful for any particular purpose Can we find groups of customers who have particularly high likelihoods of canceling their service soon after contracts expire? A specific purpose Much more useful results (usually) Different techniques Requires data on the target The individual s label

Common Data Mining Tasks

Supervised Data Mining & Predictive Modeling Is there a specific, quantifiable target that we are interested in or trying to predict? Think about the decision Do we have data on this target? Do we have enough data on this target? Need a min ~500 of each type of classification Do we have relevant data prior to decision? Think timing of decision and action The result of supervised data mining is a model that predicts some quantity A model can either be used to predict or to understand

Subclasses of Supervised Data Mining Classification Categorical target Often binary Includes class probability estimation Regression Numeric target

Subclasses of Supervised Data Mining Will this customer purchase service S1 if given incentive I1? Classification problem Binary target (the customer either purchases or does not) Which service package (S1, S2, or none) will a customer likely purchase if given incentive I1? Classification problem Three-valued target How much will this customer use the service? Regression problem Numeric target Target variable: amount of usage per customer

Data Mining versus Use of the Model Supervised modeling: Data Data Mining Model Training data have all values specified Model in use: New data item Model prediction New data item has some value unknown (e.g., will she leave?)

Classical Pitfalls in DM setup The training data is NOT consistent with the use Bad sample Bad features

Sample: Looking under the streetlight Target Proxy I do not see if a person after seeing an ad bought the book, so lets model clicks Sample Proxy I want to run a campaign in Spain but only have data on US customers

Sample: Survivorship issues Lending club wants to have a model to take over the screening process that selects applications and deny those that are likely to default Data of past loans and the outcomes are provided Bad: Use the data they currently have to predict default

Sample: Different Sources Things go really bad if the positive and negative are treated differently Looking for drivers of diabetes: how do you assemble the training data? Bad: Go to a specialized hospital and get records from people treated for diabetes Go somewhere else to get records for healthy people

Summary on bad habits You are missing all the applications that were turned down already The sick people came from a very artificial subset Your target is NOT really your target No way of telling how the model will perform No way of testing either The training sample should be as similar as possible to the USE data

Digression on features: It is all about the timing in use! T d Decision has to be made T t Target is known Features Leakage

Thanks!

Questions?