Business Problems and Data Science Solutions

CSCI E-84: A Practical Approach to Data Science
Ramon A. Mata-Toledo, Ph.D., Professor of Computer Science
Harvard Extension School
Unit 1 - Lecture 2
Wednesday, February 3, 2016

Business Problems and Data Science Solutions
This lecture is based primarily on Chapters 2 and 3 of the book Data Science for Business by Foster Provost and Tom Fawcett, 2013. Thanks also to Professor P. Adamopoulos (Stern School of Business of New York University) and Professor Tomer Geva (The Tel Aviv University School of Management). Figures are used with the authors' permission.

Lecture Objectives
At the end of this lecture, the student should be able to identify and define concepts such as:
- Data Mining Process
- Data Mining Models
- Data Mining Tasks
- Supervised Segmentation (purity, information gain, entropy)
In addition, this lecture will introduce you to:
- Supervised versus Unsupervised Methods
- Data Warehousing
- Database Querying
- On-line Analytical Data Processing
- Categorical and Quantitative data

Business Scenario
You just landed a great analytical job with MegaTelCo, one of the largest telecommunication firms in the US. They are having a major problem with customer retention in their wireless business. In the Mid-Atlantic region, 20% of cell phone customers leave when their contracts expire. Communications companies are now engaged in battles to attract each other's customers while retaining their own. Marketing has already designed a special retention offer. Your task is to devise a precise, step-by-step plan for how the data science team should use MegaTelCo's vast data resources to solve the problem.

MegaTelCo: Predicting Customer Churn
What data might you use? How would they be used? How should MegaTelCo choose a set of customers to receive their offer in order to best reduce churn for a particular incentive budget?

Terminology
- Model: A simplified representation of reality created to serve a purpose.
- Predictive model: A formula for estimating the unknown value of interest, the target. The formula can be mathematical, a logical statement (e.g., a rule), etc.
- Prediction: An estimate of an unknown value (i.e., the target).
- Instance / example: Represents a fact or a data point. It is described by a set of attributes (fields, columns, variables, or features).

Terminology (Continuation)
- Model induction: The creation of models from data. The procedure that creates the model is called the induction algorithm or learner.
- Training data: The input data for the induction algorithm.
- Individual: Any entity about which we have data (customer, business, industry).

Terminology (Continuation)
- Informative variables (attributes): Provide information about the characteristics of the entity. It is desirable to choose those variables that correlate well with the target variable.

What is a model?
A simplified representation of reality created for a specific purpose. The simplification is based on some assumptions about what is deemed important for the target.
Examples: a map (abstraction of the physical world), engineering prototypes, the Black-Scholes option pricing model, etc.
Data mining example: a formula for predicting the probability of customer attrition at contract expiration, that is, a classification model or class-probability estimation model.

Data Mining Process
Data mining is a process that breaks up the overall task of finding patterns in data into a set of well-defined subtasks. The solutions to these subtasks can often be put together to solve the overall problem.
Data mining is a process: science + craft + creativity + common sense.

Cross-Industry Standard Process for Data Mining (CRISP-DM)
- Business Understanding: Determine Business Objectives; Assess Situation; Determine Data Mining Goals; Produce Project Plan
- Data Understanding: Collect Initial Data; Describe Data; Explore Data; Verify Data Quality
- Data Preparation: Select Data; Clean Data; Construct Data; Integrate Data; Format Data
- Modeling: Select Modeling Technique; Generate Test Design; Build Model; Assess Model
- Evaluation: Evaluate Results; Review Process; Determine Next Steps
- Deployment: Plan Deployment; Plan Monitoring & Maintenance; Produce Final Report; Review Project

Data Mining versus Data Warehousing / Storage
Data warehouses coalesce data from across an enterprise, often from multiple transaction-processing systems.

Data Mining versus Database Querying and Reporting
Database querying / reporting (SQL, Excel, QBE, other GUI-based querying):
- Queries are requests posed to a database for particular subsets of the data
- Very flexible interface to produce reports and summaries of the data
- No modeling or sophisticated pattern finding
- Most of the cool visualizations

Data Mining versus On-line Analytical Processing (OLAP)
- OLAP provides an easy-to-use GUI to explore large data collections
- Exploration is manual; no modeling
- Dimensions of analysis are preprogrammed into the OLAP system

Data Mining versus Traditional Statistical Analysis
Traditional statistical analysis:
- Mainly based on hypothesis testing or estimation / quantification of uncertainty
- Should be used to follow up on data mining's hypothesis generation
Automated statistical modeling (e.g., advanced regression):
- This is data mining, one type usually based on linear models
- Massive databases allow non-linear alternatives

Answering business questions with these techniques
- Who are the most profitable customers? Database querying.
- Is there really a difference between profitable customers and the average customer? Statistical hypothesis testing.
- But who really are these customers? Can I characterize them? OLAP (manual search), data mining (automated pattern finding).
- Will some particular new customer be profitable? How much revenue should I expect this customer to generate? Data mining (predictive modeling).

Common Data Mining Tasks
- Classification: Given a sample population, predict for each individual the class to which it belongs. The set of classes is generally small and disjoint (non-overlapping).
- Class probability estimation (scoring): A model produces, for each individual, a value or score that is the probability that the individual belongs to the class. Example: How likely is this consumer to respond to our campaign?
- Regression: Given a sample population, regression attempts to estimate the relationship between a selected dependent variable and one or more independent variables. Example: How much will she use the service?
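The difference between classification and class probability estimation can be sketched in a few lines of Python. This is only an illustration: the customer IDs, the scores, the 0.5 threshold, and the helper names `classify` and `select_targets` are all hypothetical and do not come from the lecture. A score can be turned into a class label by thresholding, or used directly to rank individuals, for example to pick who receives an offer under a fixed budget.

```python
def classify(score, threshold=0.5):
    """Turn a class-probability score into a class label (hypothetical threshold)."""
    return "respond" if score >= threshold else "no response"


def select_targets(scores, budget):
    """Return the IDs of the `budget` individuals with the highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:budget]]


# Hypothetical scores, as if produced by some class-probability estimation model.
scores = {"c1": 0.82, "c2": 0.10, "c3": 0.55, "c4": 0.91}

labels = {cid: classify(s) for cid, s in scores.items()}
targets = select_targets(scores, budget=2)
print(labels)
print(targets)
```

Ranking by score keeps information that a hard class label throws away, which matters when only a limited number of customers can be contacted.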

Common Data Mining Tasks (continuation)
- Similarity matching: Process that attempts to identify similar individuals based on particular data about them. Example: Can we find consumers similar to my best customers?
- Clustering: Process that attempts to group individuals in a population based on similarities, without a specific target. Example: Do my customers form natural groups?
- Co-occurrence grouping (also known as frequent itemset mining, association rule discovery, or market-basket analysis): Process that attempts to find associations between entities based on their transactions. Example: Customers that bought this item also bought...

Common Data Mining Tasks (continuation)
- Profiling: Process that attempts to characterize the typical behavior of an individual, group, or population. Often used to establish a baseline, and also to detect out-of-the-ordinary behavior for fraud detection. Example: What items do my best customers buy most frequently?
- Link prediction: Process that attempts to predict connections between nodes of a graph based on the information of the individual nodes. Often used in social networks such as Facebook or LinkedIn. Example: People that you may know.
- Data reduction: Process that replaces a large set of data with a smaller set that contains most of the relevant information. Example: What items are purchased together?

Common Data Mining Tasks (continuation)
- Causal modeling: Process that attempts to help us understand the actions or events that influence other events. Example: Did the advertising campaign increase the volume of purchases? This type of technique may include randomized controlled experiments with two variants (A/B tests). One of the variants serves as the control; the other is the variation of the experiment. Example: Which of the two promotional codes increased the volume of sales? (This assumes an email advertisement sent to two identical populations with variations in the promotional code.)
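One common way to compare the two variants of an A/B test is a two-proportion z-test. The lecture does not name a specific test, so the sketch below is an assumption about how such a comparison might be made; the function name and the purchase counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: purchases out of emails sent for promotional codes A and B.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests the difference between the two codes is unlikely to be due to chance alone, provided the two populations really were assigned at random.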

Supervised versus Unsupervised Data Mining
Unsupervised (no specific target):
- Do our customers naturally fall into different groups?
- No guarantee that the results are meaningful or will be useful for any particular purpose.
Supervised (specific target):
- Can we find groups of customers who have particularly high likelihoods of canceling their service soon after contracts expire?
- Results are generally more useful than those of unsupervised methods.
Supervised data mining requires different techniques than unsupervised data mining. Supervised data mining also requires that the target information be in the data. Example of a failure: attempting to retrieve the historical behavior of customers for the last year when data has only been kept for 6 months.

Common Data Mining Task Supervised vs. Unsupervised

Supervised Data Mining & Predictive Modeling
- Is there a specific, quantifiable target that we are interested in or trying to predict? Think about the decision.
- Do we have data on this target? Do we have enough data on this target? We need a minimum of roughly 500 instances of each class.
- Do we have relevant data prior to the decision? Think about the timing of decision and action.
The result of supervised data mining is a model that predicts some quantity. A model can be used either to predict or to understand.

Subclasses of Supervised Data Mining

Data Mining Model in Use
[Figure: a new data item is fed to a probability estimation model, which outputs a predicted class probability.]

Model in use: Data Mining versus Use of the Model

Supervised Segmentation
How can we divide (segment) the population into groups that differ from each other with respect to some quantity that we would like to predict?
- Which customers will churn after their expiration date?
- Which customers are likely to default?
Informative attributes:
- Find knowable attributes that correlate with the target of interest
- Increase accuracy
- Alleviate computational problems
- E.g., tree induction

Supervised Segmentation (Continuation)
How can we judge whether a variable contains important information about the target variable? How much? Which single variable gives the most information about the target? Based on this attribute, how can we partition the customers into subgroups that are as pure as possible with respect to the target (i.e., such that in each group as many instances as possible belong to the same class)? A subgroup is pure if every member (instance) of the group has the same value for the target; otherwise it is impure.

Selecting Informative Attributes
Example: A binary (Yes, No) classification problem with the following attributes:
- Head shape (square, circular)
- Body shape (rectangular, oval)
- Body color (red, blue, purple)
Target variable: Yes, No
[Figure: example individuals, each labeled Yes or No.]

Purity Measure: Information Gain and Entropy
Difficulties encountered in segmenting the population due to the impurity of the subgroups can be addressed by creating a formula that evaluates how well each attribute separates the entities into segments. The formula is called a purity measure. The most common criterion for segmenting the population is called information gain (IG). Information gain is based on a purity measure called entropy. Entropy is a measure of the degree of disorder that can be applied to a set. It is defined as follows:
$$\text{entropy} = -\sum_i p_i \lg(p_i)$$
Here $p_i$ is the probability (the relative percentage) of property $i$ within the set: $p_i = 0$ when no member of the set satisfies the property, and $p_i = 1$ when all members of the set satisfy it. $\lg(p_i)$ stands for the base-2 logarithm, $\log_2 p_i$.

Calculating the Information Gain
$$IG(\text{parent}, \text{children}) = \text{entropy}(\text{parent}) - \left[ p(c_1)\,\text{entropy}(c_1) + p(c_2)\,\text{entropy}(c_2) + \cdots \right]$$
[Figure: a parent set split into child sets $c_1$, $c_2$, and so on.]
Note: A higher IG indicates a more informative split by the variable.

Selecting Informative Attributes
Example: A set S of 10 people is divided into two classes: will default (write-off) and will not default (non-write-off). Seven people will not default, so p(non-write-off) = 7/10 = 0.7 and p(write-off) = 3/10 = 0.3.
$$\text{entropy}(S) = -\left[0.7 \times \lg(0.7) + 0.3 \times \lg(0.3)\right] \approx 0.88$$
Here lg(x) stands for the base-2 logarithm and log(x) for the base-10 logarithm. The logarithm of x in base 2 equals the logarithm of x in base 10 divided by the logarithm of 2 in base 10. Remember: logarithms of numbers between 0 and 1 are negative, which is why the leading minus sign yields a positive entropy.
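The entropy calculation for this example can be checked with a short Python sketch. The function name `entropy` and its list-of-probabilities interface are choices made here for illustration; terms with probability 0 are skipped, following the usual convention that they contribute nothing.

```python
from math import log2


def entropy(probabilities):
    """Entropy (base 2) of a set described by its class probabilities."""
    # Terms with p == 0 are skipped: they contribute nothing to the sum.
    return -sum(p * log2(p) for p in probabilities if p > 0)


# The write-off example: 7 of 10 people will not default, 3 of 10 will.
entropy_s = entropy([0.7, 0.3])
print(round(entropy_s, 2))  # 0.88

# A perfectly pure set has entropy 0; an even two-class split has entropy 1.
pure = entropy([1.0])
even = entropy([0.5, 0.5])
```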

Information Gain
Information gain measures the change in entropy due to any amount of new information being added.

Information Gain (continuation)
13/30 ≈ 0.43 and 17/30 ≈ 0.57 are the relative percentages of the child subsets with respect to the parent. Each percentage is calculated by dividing the cardinality of the child set by the cardinality of the parent set.
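A minimal sketch of an information gain calculation for a split of 30 instances into children of 13 and 17 follows. Only the sizes 13, 17, and 30 come from the slide; the class counts inside the parent and each child are hypothetical, chosen here purely to make the arithmetic concrete, and the function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Entropy (base 2) of a set given the number of instances in each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum((sum(child) / n) * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical class counts [class A, class B]; only the sizes 13/17/30 are from the slide.
parent = [16, 14]              # 30 instances
children = [[12, 1], [4, 13]]  # 13 and 17 instances

weights = [sum(child) / sum(parent) for child in children]
ig = information_gain(parent, children)
print([round(w, 2) for w in weights])  # [0.43, 0.57]
print(round(ig, 2))
```

With these assumed counts the children are much purer than the parent, so the gain is clearly positive; a split that left the class mix unchanged in each child would give a gain of 0.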

Attribute Selection
Reasons for selecting only a subset of attributes:
- Better insights and business understanding
- Better explanations and more tractable models
- Reduced cost
- Faster predictions
- Better predictions!

Example: Attribute Selection with Information Gain
This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; there is no rule like "leaflets three, let it be" for poisonous oak and ivy.

Multivariate Supervised Segmentation
If we select the single variable that gives the most information gain, we create a very simple segmentation. If we select multiple attributes, each giving some information gain, how do we put them together?

Categorical and Quantitative Variables
A categorical variable places an observation in one (and only one) category chosen from two or more possible categories. If there is no ordering that can be done between the categories, the variable is nominal. If there is some intrinsic order that can be assigned to the categories, the variable is ordinal.
Examples of categorical data:
- Your gender (male, female)
- Your class in school (Freshman, Sophomore, Junior, Senior, Graduate)
- Your performance status (Probation, Regular, Honors)
- Your political party (Democrat, Republican, Independent)
- Your hair color (blonde, brown, red, black, white, other)
- Your type of pet (cat, dog, ferret, rabbit, other)
- Your race (Hispanic, Asian, African American, Caucasian)
- Machine settings (Low, Medium, High)
- Method of payment (Cash, Credit)
- The color of some object (red, orange, yellow, green, blue, purple)

Variables and the Case Format
Before analyzing the data, preparing charts or graphs, or doing any statistical test, it's important to gain a sense of what the data is all about. Why is this necessary? The type of data available dictates what we can do with it.
- Do you want to prepare a scatterplot or perform a linear regression to generate a predictive model? Fine, as long as you have two quantitative variables, preferably both continuous.
- Do you want to do a one-way Analysis of Variance (ANOVA)? No problem, as long as you have multiple collections of quantitative variables and you can split them up into groups using one of the categorical variables you have collected.
- Do you want to construct a bar plot? OK, but you will need categorical data, or else you should be preparing a histogram instead.

Questions?