Click to edit Master title style. Introduction to Predictive Modeling
|
|
- Arron Dickerson
- 1 years ago
- Views:
Transcription
1 Click to edit Master title style Introduction to Predictive Modeling
2 Notices This document is intended for use by attendees of this Lityx training course only. Any unauthorized use or copying is strictly prohibited Lityx, LLC. The information in this document is meant to be delivered in a classroom setting, and is not complete without the associated interactions with the instructor. SERVICES AND SOLUTIONS Lityx offers world-class analytic services and solutions to help our clients become analyticallydriven, successful organizations. Please see for more information. CONTACT INFORMATION By phone 888-LITYX-IQ (phone) (fax) On the web Postal mail 100 West Road, Suite 300 Towson, MD
3 Agenda Basics of Modeling The modeling process 3
4 What are predictive models? Attempt to describe the mathematical relationship between a dependent variable (y) and independent variables (x1,x2,.) Y (also called the response variable) is a customer attribute you would like to predict, like response/no response or yearly revenue. The X s (also called predictors) are attributes that may be useful in making predictions of Y - E.g., if Y is response/no response to a campaign, X s might be Age, Income, Previous Customer?, etc. Ultimately, the mathematical relationship is in the form of a mathematical function like y=f(x1,x2, ) - The function can range from simple to very complex 4
5 Overview of Model Building Use historical customer data to understand the true relationship between Y and X s. The model is a simplification of the true relationship. Yearly Revenue (Y) True Relationship if we could observe all customers for all years forever Yearly Revenue (Y) Estimated from Data but we can only observe some customers for limited time periods Age (X) Circles are actual customer data Dashed line is the model y=f(x) Age (X) 5
6 Overview of Model Building Notice that we still do a pretty good job of estimating the true relationship between revenue and age. Generally, the more data we have, the better we do. The model now allows us to predict yearly revenue for a potential new customer whose age we know. - We realize that there is some prediction error with this prediction. Predicted Yearly Revenue Prediction error range Yearly Revenue Potential customer s age Age 6
7 Overview of Model Building A better understanding of the response variable (and therefore better predictions) typically come from more than one predictor variable - Maybe 10 s or 100 s - Becomes impossible to picture the relationship nicely in a graph - We can now only write the full relationship down using a complex mathematical formula to symbolize the model Finding the model and determining how good it is are what the modeling process is all about - Not easy, often requires many iterations - There are many different techniques for determining a model All give different answers depending on many aspects of the modeling process 7
8 Overview of Model Building Model building is a process - Some aspects can be automated with the help of tools - Overall, requires a lot of attention and iteration from experienced business people and modelers Good models are easy to build; great models are harder to build - Incremental value gained from better models can be large when scaled to the full prospect or customer base Requires understanding of many things - The business problem and the data - Modeling algorithms and parameters - Software - Implementation issues 8
9 Why build models? Want to predict customer behavior so we can make more efficient, profitable business and marketing decisions Two assumptions: - Past behavior can help us predict future behavior - Customers that have similar characteristics behave similarly Customer data can be used to quantify customer attributes and behavior Models can be built using these attributes to learn customer behavior patterns 9
10 Examples of Models Marketing - Predict response to a campaign for customers or prospects - Predict likelihood of a second stay following the first for a hotel chain - Predict cross-sell opportunities based on product or category preferences - Predict number of purchases per year - Predict offer amount most likely to entice response but still be profitable - Predict revenue or profitability over a year Non-marketing - Percent gain for a stock ticker over next year - Likelihood of a credit card transaction being fraudulent - Likelihood of loan default - Number of bears living in a certain habitat 5 years from now 10
11 Typical Attributes Used as Predictors Customer transaction history - Previous stays and stay lengths, previous promotion responses, previous packages purchased Customer demographics - Age, income, zip code, gender Appended data - Customer psychographics, lifestyles, attitudes - Often only available at block or census tract levels Marketing data - Product data, campaign data, offer data Etc. - Many possibilities any transactional, customer, or household level data 11
12 Classifying Models Numeric versus Classification predictions - Numeric prediction model response variable is a numeric quantity E.g., revenue, profit - Classification model response variable is categorical E.g., response/no response, renew/doesn t renew Actually we are predicting the probability of one of the categories (e.g., probability of responding to the campaign) Supervised versus Unsupervised models - Supervised there is a response variable (like all examples so far) - Unsupervised there is no response variable Very different situation - not trying to predict a specific outcome Goal group similar customers together based only on predictor variables Main example is customer segmentation 12
13 Classifying Algorithms For classification or numeric prediction (or both) For supervised or unsupervised modeling Underlying assumptions of the algorithm - Some require data to have certain distributions, others do not Complexity of resulting models Number of algorithm parameters to fine-tune - Relates to ease of building models Length of time to build models Ease of interpreting results 13
14 Typical Model Development Process Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment 14
15 Step 1: Business Understanding What are the goals from a business perspective? - Retain more customers - Grow the customer base - Make profitable customers more loyal - Increase response rates to campaigns/promotions - Reduce churn, increase loyalty - Etc. How will achievement be measured? - Profit/revenue increase - Increased market share - Number of customers - Retention rate - Etc. 15
16 Step 2: Data Understanding Based on the business objectives, what data is available to us for modeling? - Attributes (both response and predictor variables) - Timeframe of available data - Quality of data available - Quantity of data available Different objectives require different views of data, different attributes, and different modeling techniques 16
17 Data Timing Example (Response Model) Modeling Timeframe Time < 1 mo. Modeling data snapshot (predictors) < 1 mo. Prediction data snapshot make predictions Prior promotion period Promotion/Campaign period < 12 mo. Modeling data snapshot (response) Actual response data snapshot Model and promotion assessment Model building 17
18 Step 2: Data Understanding Retrieve the data from data sources - Internal databases/tables, external data, merges, etc. - Row/column flat file format for use in modeling tool - Sampling (same time and computational effort with little accuracy tradeoff) Simple random sample (all records equally likely to be sampled) Stratified sampling (some categories of data oversampled, e.g., higher percentage of responders than non-responders) Analyze data quality - Missing data for some attributes - Data errors, outliers - Data accuracy - Missing attributes 18
19 Step 2: Data Understanding Data exploration - Interactive and iterative data analysis - Look for interesting patterns and correlations - Give initial assessment of important attributes for making predictions 19
20 Step 3: Data Preparation Variable transformations - Reason 1: Business reasons e.g., ProfitPerMonth = Total Profit / Number Months as Customer e.g., Recency = Current Date Date of last stay - Reason 2: Mathematical/modeling reasons (usually automated) e.g., take log ratios of money variables e.g., normalize all attributes to same scale e.g., binary coding of categorical predictor variables Data cleaning - Remove, keep, or impute for missing values - Detect and fix/keep/remove outliers and bad data 20
21 Step 3: Data Preparation Data reduction - In many situations, the number of attributes available is huge. - Modeling effort can grow exponentially with each additional variable. - Use techniques for reducing the number of attributes to be used for modeling. Remove redundant or highly correlated variables. Factor analysis or principal components analysis find a small number of transformations of original variables that contain most of the important information. - Benefits: Performance savings - Drawbacks: May reduce/remove important relationships Note: Some aspects of data prep can be integrated into the model building process (Step 4). Put into final format needed for modeling tool. 21
22 Step 4: Modeling Now it s time to actually build the models Consists of a number of important sub-tasks 1) Select modeling algorithm(s) to use 2) Select parameter settings of the algorithms A single algorithm is often run more than once using different parameter settings 3) Run the models ( fit the models, build the models ) using the tool May build dozens of models depending on number of algorithms chosen and number of parameter settings for each 22
23 Step 4-1: Select Modeling Algorithms May choose one, two, or many different algorithms to try Choice depends on many things: - Availability in your tool - Availability for the particular modeling problem - Analyst capabilities - Time - Ease of deployment - Need for interpretability Some tools automate some of this, some parts are still manual. 23
24 Step 4-2: Select Parameter Settings A single algorithm actually consists of many possible models based on various parameter settings and model forms the analyst can choose. Algorithm parameters are one of two kinds: - Some parameters have to do with the mathematical form of the model itself. - Some parameters have to do with the details of how the model is built. In either case, changing a parameter setting usually results in a different model being built. - Note: for some algorithms, even not changing a parameter setting can result in a different model being built if you run it again! 24
25 Step 4-2: Select Parameter Settings This is usually recommended only for advanced users - Most tools have reasonable default settings for parameters. - Most changes have little effect, but an advanced user can often find a combination of settings that can produce more accurate models. - Novice users can still play around with different settings, but might waste time if they don t really understand what they mean. 25
26 Step 4-3: Run the Models This is always a computer-automated task. - Requires many complex mathematical calculations. Depending on the algorithms chosen, amount of data, number of variables, and other settings, this can take a second or two, or much, much longer. - I ve run models that have taken days to finish, and have heard of some taking months. Result is a listing of completed models (i.e., their mathematical formulation) and their performance. 26
27 Step 5: Evaluation (and Selection) Lots of models now built and finalized. - We now have the mathematical formula that represents each model Need to evaluate each model and select one (or more). Two steps: 1) Estimate performance of the models 2) Select one (or more) winning models 27
28 Step 5-1: Estimate Model Performance Need to know how good each model is - Measure accuracy of predictions, response lift, revenue gain, etc. Usually performed simultaneously with the Model Building step within a software tool. Estimating performance has two parts: - Choice of performance measure: Lift, percent accuracy, profit, mean squared error, etc. - Choice of performance estimation method Holdout set, cross-validation, bootstrapping, etc. 28
29 Model Performance Measures Lift - Computes percent of responses found by model at each decile in the modeling file, and compares to percent of responses that would be returned by a random mailing. - Lift is measured as the total area between curves. High is good. Lift curve for this model Lift curve for random mailing 29
30 Model Performance Measures For numeric prediction models, Root Mean Square Error (RMSE) and coefficient of determination (R 2 ) are also common accuracy measures. - They have to do with the amount of prediction error made by the model. Yearly Revenue Red lines represent prediction errors. Overall, we want these to be as small as possible (in other words, our model should come as close to the data points as possible, except for caveat on next page). Age 30
31 Model Performance - Caveat Seemingly, we would want a model with the best possible performance when measured on the modeling data. - Example: 100% lift in first decile for a response model, or zero error for a numeric prediction But, in reality, such a model is likely to be not very good, or even terrible. Usually, either one of two things can happen to put you in this situation: - Inappropriate modeling data leaves room for model to cheat and find a perfect or near perfect prediction. - Modeling algorithm (or mathematical form of the model) is too complex and can fit any dataset perfectly. This is called overfitting the model. 31
32 Validating Model Performance To guard against overfitting, we typically use some of the modeling data to build the model, and the rest to validate its performance. - Gives us a way to see how well the model generalizes to data it has not seen yet. - If the modeling process came up with an overfit model, we will find out using the data set aside - Performance validated this way is called an unbiased measure of performance or generalization performance. There are different techniques for doing this: - Holdout method (also called train/test method or out-of-sample method) - Cross-validation (more complex) - Bootstrapping 32
33 Performance Validation Methods Holdout Method - Randomly select 70% (or other percentage) of data to build model. - Set aside (i.e., holdout) the other 30% and use it to measure performance of the built model. k-fold Cross-validation (CV) - Maximizes data efficacy of training/testing - Modeling data partitioned into k even groups (folds) Leave one fold out, build model on all the rest Measure performance of that model using the left-out fold Do for each fold, leaving one out at a time Average performance over each iteration - Train final model with all data 33
34 Step 5-2: Select Model Based on different criteria available, pick best model - Performance is easiest criteria since it is numeric and easily ranks the models - Other things may be important Does the model seem sensible? Is it easy to implement? Is it interpretable? May pick two or more good models and test each in a live setting or market test. 34
35 Step 6: Deployment Now we need to implement (deploy) the chosen model - Export model formula to a database or other program - Score prospects or customers according to the model formula - Take appropriate marketing action based on these scores Send/don t send promotion, take anti-churn action, etc. - Measure how well the model performed in the real-world live setting 35
36 Iterative Nature of Entire Process After deploying model, we can measure how well it did Model performance typically degrades over time as customers, products, and markets change Need to continually re-assess live model performance and rebuild models periodically. 36
37 Automation: Modeling as a Service Companies have come to use dozens or even hundreds/thousands of models to make predictions for a variety of business problems and situations. To scale to this level, models cannot be built and scored manually. Complex tools are needed to automate much of the process: - Data extraction and creation for modeling - Model building and testing automation - Model management and refresh - Model scoring This is the concept of analytics as a service to give organizations the capability to do this without investing enormous dollars/time into tools, people, and process. 37
38 The End If you have any questions, we ll be happy to help. Contact Paul Maiste 38
Evaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
ADVANCED MARKETING ANALYTICS:
ADVANCED MARKETING ANALYTICS: MARKOV CHAIN MODELS IN MARKETING a whitepaper presented by: ADVANCED MARKETING ANALYTICS: MARKOV CHAIN MODELS IN MARKETING CONTENTS EXECUTIVE SUMMARY EXECUTIVE SUMMARY...
Mining Marketing Data
1:1 Marketing or CRM Mining Marketing Data The basis of 1:1 marketing is share of customer not just market share. So Instead of selling as many products as possible to who ever will buy them. We try to
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Valuing Analytics & Predictive Modeling in Higher Ed
Valuing Analytics & Predictive Modeling in Higher Ed Michael Laracy CEO & Founder - Rapid Insight Inc.._ Michael Laracy, President, and CEO of Rapid Insight Inc., has had over 20 years of data analysis
THE PREDICTIVE MODELLING PROCESS
THE PREDICTIVE MODELLING PROCESS Models are used extensively in business and have an important role to play in sound decision making. This paper is intended for people who need to understand the process
IBM SPSS Direct Marketing
IBM Software IBM SPSS Statistics 19 IBM SPSS Direct Marketing Understand your customers and improve marketing campaigns Highlights With IBM SPSS Direct Marketing, you can: Understand your customers in
Applying Advance Analytics People, Process and Technology
Applying Advance Analytics People, Process and Technology www.sv-europe.com The CRISP-DM process 1.Business 6.Deployment 2.Data 5.Evaluation 3.Data Preparation 4.Modelling www.crispdm.org The CRISP-DM
Cross Validation. Dr. Thomas Jensen Expedia.com
Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract
Credit Risk Analysis Using Logistic Regression Modeling
Credit Risk Analysis Using Logistic Regression Modeling Introduction A loan officer at a bank wants to be able to identify characteristics that are indicative of people who are likely to default on loans,
Modeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
Easily Identify the Right Customers
PASW Direct Marketing 18 Specifications Easily Identify the Right Customers You want your marketing programs to be as profitable as possible, and gaining insight into the information contained in your
Guido Sciavicco. 11 Novembre 2015
classical and new techniques Università degli Studi di Ferrara 11 Novembre 2015 in collaboration with dr. Enrico Marzano, CIO Gap srl Active Contact System Project 1/27 Contents What is? Embedded Wrapper
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
CoolaData Predictive Analytics
CoolaData Predictive Analytics 9 3 6 About CoolaData CoolaData empowers online companies to become proactive and predictive without having to develop, store, manage or monitor data themselves. It is an
CART 6.0 Feature Matrix
CART 6.0 Feature Matri Enhanced Descriptive Statistics Full summary statistics Brief summary statistics Stratified summary statistics Charts and histograms Improved User Interface New setup activity window
INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER
INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. AGENDA Overview/Introduction to Data Mining
ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Performance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
THE THREE "Rs" OF PREDICTIVE ANALYTICS
THE THREE "Rs" OF PREDICTIVE As companies commit to big data and data-driven decision making, the demand for predictive analytics has never been greater. While each day seems to bring another story of
Easily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
Data Mining for Business Analytics
Data Mining for Business Analytics Lecture 2: Introduction to Predictive Modeling Stern School of Business New York University Spring 2014 MegaTelCo: Predicting Customer Churn You just landed a great analytical
An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics
An Overview of Predictive Analytics for Practitioners Dean Abbott, Abbott Analytics Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor
Five Ways Retailers Can Profit from Customer Intelligence
Five Ways Retailers Can Profit from Customer Intelligence Use predictive analytics to reach your best customers. An Apption Whitepaper Tel: 1-888-655-6875 Email: info@apption.com www.apption.com/customer-intelligence
Predictive Analytics for Database Marketing
Predictive Analytics for Database Marketing Jarlath Quinn Analytics Consultant Rachel Clinton Business Development www.sv-europe.com FAQ s Is this session being recorded? Yes Can I get a copy of the slides?
Lecture 13: Validation
Lecture 3: Validation g Motivation g The Holdout g Re-sampling techniques g Three-way data splits Motivation g Validation techniques are motivated by two fundamental problems in pattern recognition: model
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
A Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
Myth or Fact: The Diminishing Marginal Returns of Variable Creation in Data Mining Solutions
Myth or Fact: The Diminishing Marginal Returns of Variable in Data Mining Solutions Data Mining practitioners will tell you that much of the real value of their work is the ability to derive and create
Building and Deploying Customer Behavior Models
Building and Deploying Customer Behavior Models February 20, 2014 David Smith, VP Marketing and Community, Revolution Analytics Paul Maiste, President and CEO, Lityx In Today s Webinar About Revolution
A Property and Casualty Insurance Predictive Modeling Process in SAS
Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Explode Six Direct Marketing Myths
White Paper Explode Six Direct Marketing Myths Maximize Campaign Effectiveness with Analytic Technology Table of contents Introduction: Achieve high-precision marketing with analytics... 2 Myth #1: Simple
Maschinelles Lernen mit MATLAB
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
Class 10. Data Mining and Artificial Intelligence. Data Mining. We are in the 21 st century So where are the robots?
Class 1 Data Mining Data Mining and Artificial Intelligence We are in the 21 st century So where are the robots? Data mining is the one really successful application of artificial intelligence technology.
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Predictive Analytics for B2B Sales and Marketing
Buyer s Guide to Predictive Analytics for B2B Sales and Marketing Predictive analytics can improve a wide range of marketing and sales activities. Vendors differ significantly in terms of the data they
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
analytics stone Automated Analytics and Predictive Modeling A White Paper by Stone Analytics
stone analytics Automated Analytics and Predictive Modeling A White Paper by Stone Analytics 3665 Ruffin Road, Suite 300 San Diego, CA 92123 (858) 503-7540 www.stoneanalytics.com Page 1 Automated Analytics
Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca
Data Mining in CRM & Direct Marketing Jun Du The University of Western Ontario jdu43@uwo.ca Outline Why CRM & Marketing Goals in CRM & Marketing Models and Methodologies Case Study: Response Model Case
Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM
Paper AA-08-2015 Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, Ohio 0.0 ABSTRACT Traditional
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is
A Basic Guide to Modeling Techniques for All Direct Marketing Challenges
A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview
Why Sample? Why not study everyone? Debate about Census vs. sampling
Sampling Why Sample? Why not study everyone? Debate about Census vs. sampling Problems in Sampling? What problems do you know about? What issues are you aware of? What questions do you have? Key Sampling
730 Yale Avenue Swarthmore, PA 19081 www.raabassociatesinc.com info@raabassociatesinc.com
Lead Scoring: Five Steps to Getting Started 730 Yale Avenue Swarthmore, PA 19081 www.raabassociatesinc.com info@raabassociatesinc.com Introduction Lead scoring applies mathematical formulas to rank potential
THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
Multiple Imputation for Missing Data: A Cautionary Tale
Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust
Market Research Methodology
Market Research Methodology JANUARY 12, 2008 MARKET RESEARCH ANALYST Market Research Basics Market research is the process of systematic gathering, recording and analyzing of data about customers, competitors
The Power of Predictive Analytics
The Power of Predictive Analytics Derive real-time insights with accuracy and ease SOLUTION OVERVIEW www.sybase.com KXEN S INFINITEINSIGHT AND SYBASE IQ FEATURES & BENEFITS AT A GLANCE Ensure greater accuracy
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
Simulation and Risk Analysis
Simulation and Risk Analysis Using Analytic Solver Platform REVIEW BASED ON MANAGEMENT SCIENCE What We ll Cover Today Introduction Frontline Systems Session Ι Beta Training Program Goals Overview of Analytic
A CRM -DATA MINING FRAMEWORK FOR DIRECT BANK MARKETING USING CLASSIFICATION MODEL
A CRM -DATA MINING FRAMEWORK FOR DIRECT BANK MARKETING USING CLASSIFICATION MODEL Miss. Akanksha P. Ingole 1, Prof. P.D.Thakare 2 1 M.E. Department of Computer science & Engineering, Jagadamba college
Churn Modeling for Mobile Telecommunications:
Churn Modeling for Mobile Telecommunications: Winning the Duke/NCR Teradata Center for CRM Competition N. Scott Cardell, Mikhail Golovnya, Dan Steinberg Salford Systems http://www.salford-systems.com June
On Cross-Validation and Stacking: Building seemingly predictive models on random data
On Cross-Validation and Stacking: Building seemingly predictive models on random data ABSTRACT Claudia Perlich Media6 New York, NY 10012 claudia@media6degrees.com A number of times when using cross-validation
Analyzing CRM Results: It s Not Just About Software and Technology
Analyzing CRM Results: It s Not Just About Software and Technology One of the key factors in conducting successful CRM programs is the ability to both track and interpret results. Many of the technological
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Nonparametric statistics and model selection
Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka
CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training
3 Secrets to Adding Names (and Dollars) to Your Email Marketing Program
3 Secrets to Adding Names (and Dollars) to Your Email Marketing Program IT S BECOME IMPERATIVE THAT MARKETERS INVEST TIME AND RESOURCES INTO BUILDING THEIR EMAIL PROGRAMS BOTH ACQUIRING NEW SUBSCRIBERS
WHITE PAPER. The Five Fundamentals of a Successful FCR Program
The Five Fundamentals of a Successful FCR Program April 2012 Executive Summary Industry analysts agree that First Contact Resolution (FCR) is the best way to measure the effectiveness of your contact center.
The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner
Paper 3361-2015 The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Narmada Deve Panneerselvam, Spears School of Business, Oklahoma State University, Stillwater,
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Marketing Strategies for Retail Customers Based on Predictive Behavior Models
Marketing Strategies for Retail Customers Based on Predictive Behavior Models Glenn Hofmann HSBC Salford Systems Data Mining 2005 New York, March 28 30 0 Objectives Inform about effective approach to direct
Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III
www.cognitro.com/training Predicitve DATA EMPOWERING DECISIONS Data Mining & Predicitve Training (DMPA) is a set of multi-level intensive courses and workshops developed by Cognitro team. it is designed
Predictive Analytics for Donor Management
IBM Software Business Analytics IBM SPSS Predictive Analytics Predictive Analytics for Donor Management Predictive Analytics for Donor Management Contents 2 Overview 3 The challenges of donor management
Chapter 7: Data Mining
Chapter 7: Data Mining Overview Topics discussed: The Need for Data Mining and Business Value The Data Mining Process: Define Business Objectives Get Raw Data Identify Relevant Predictive Variables Gain
Section 6: Model Selection, Logistic Regression and more...
Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building
Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms
Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge
Binary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
Customer Life Time Value
Customer Life Time Value Tomer Kalimi, Jacob Zahavi and Ronen Meiri Contents Introduction... 2 So what is the LTV?... 2 LTV in the Gaming Industry... 3 The Modeling Process... 4 Data Modeling... 5 The
Music Mood Classification
Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may
Data Mining Techniques Chapter 4: Data Mining Applications in Marketing and Customer Relationship Management
Data Mining Techniques Chapter 4: Data Mining Applications in Marketing and Customer Relationship Management Prospecting........................................................... 2 DM to choose the right
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
DISCOVER MERCHANT PREDICTOR MODEL
DISCOVER MERCHANT PREDICTOR MODEL A Proactive Approach to Merchant Retention Welcome to Different. A High-Level View of Merchant Attrition It s a well-known axiom of business that it costs a lot more to
Introduction to Learning & Decision Trees
Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing
L13: cross-validation
Resampling methods Cross validation Bootstrap L13: cross-validation Bias and variance estimation with the Bootstrap Three-way data partitioning CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU
W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set
http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer
Monopolistic Competition, Oligopoly, and maybe some Game Theory
Monopolistic Competition, Oligopoly, and maybe some Game Theory Now that we have considered the extremes in market structure in the form of perfect competition and monopoly, we turn to market structures
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
EVALUATION AND MEASUREMENT IN MARKETING: TRENDS AND CHALLENGES
EVALUATION AND MEASUREMENT IN MARKETING: TRENDS AND CHALLENGES Georgine Fogel, Salem International University INTRODUCTION Measurement, evaluation, and effectiveness have become increasingly important
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information
Applying Customer Attitudinal Segmentation to Improve Marketing Campaigns Wenhong Wang, Deluxe Corporation Mark Antiel, Deluxe Corporation
Applying Customer Attitudinal Segmentation to Improve Marketing Campaigns Wenhong Wang, Deluxe Corporation Mark Antiel, Deluxe Corporation ABSTRACT Customer segmentation is fundamental for successful marketing
The Predictive Data Mining Revolution in Scorecards:
January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are