Business Intelligence. Tutorial for Rapid Miner (Advanced Decision Tree and CRISP-DM Model with an example of Market Segmentation*)



Similar documents
1 Choosing the right data mining techniques for the job (8 minutes,

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

Start-up Companies Predictive Models Analysis. Boyan Yankov, Kaloyan Haralampiev, Petko Ruskov

Easily Identify Your Best Customers

Product recommendations and promotions (couponing and discounts) Cross-sell and Upsell strategies

BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites

Oracle Siebel Marketing and Oracle B2B Cross- Channel Marketing Integration Guide ORACLE WHITE PAPER AUGUST 2014

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Intellect Platform - The Workflow Engine Basic HelpDesk Troubleticket System - A102

GeneSys. Unit Six.Two: Administering a 360 Project. genesysonline.net. psytech.com

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Easily Identify the Right Customers

Welcome! Want to find out more? Follow this tutorial, then launch Part 1 to get started.

!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"

KnowledgeSEEKER Marketing Edition

Collaborative Forecasts Implementation Guide

Sitecore E-Commerce DMS Cookbook

2 Decision tree + Cross-validation with R (package rpart)

SPSS: Getting Started. For Windows

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734

Setup Instructions for Secure Hummingbird FTP

Using Excel as a Management Reporting Tool with your Minotaur Data. Exercise 1 Customer Item Profitability Reporting Tool for Management

IBM SPSS Direct Marketing 23

Example 3: Predictive Data Mining and Deployment for a Continuous Output Variable

Notes on Excel Forecasting Tools. Data Table, Scenario Manager, Goal Seek, & Solver

Using Data Mining to Detect Insurance Fraud

Free Trial - BIRT Analytics - IAAs

Infoview XIR3. User Guide. 1 of 20

Using Data Mining to Detect Insurance Fraud

IBM SPSS Direct Marketing 22

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

WEBFOCUS QUICK DATA FOR EXCEL

Tapping into Mobile App Installs. Building a Valuable User Base for Your App

Predictive modelling around the world

Campaign Manager 2.0 for Sitecore CMS 6.6

Deployment of Predictive Models. Sumit Kumar Bardhan

insight for Importers and Exporters An industry-specific guide for your MYOB software

DPL. Portfolio Manual. Syncopation Software, Inc.

ELECTRO-MECHANICAL PROJECT MANAGEMENT

Lecture 10: Regression Trees

Exercise 7.1 What are advertising objectives?

CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an industry-proven way to guide your data mining efforts.

Tutorial #7A: LC Segmentation with Ratings-based Conjoint Data

How to Use the Cash Flow Template

In-Depth Guide Advanced Spreadsheet Techniques

PloneSurvey User Guide (draft 3)

Once you ve signed up, all you ll have to do is sign in. To sign in key in your address and password.

Google AdWords vs Google Analytics: Dissecting Remarketing Lists. Written by Carrie Albright, Senior Account Manager. hanapinmarketing.

Framing Business Problems as Data Mining Problems

Didacticiel Études de cas

Alex Vidras, David Tysinger. Merkle Inc.

Chapter 4 Displaying and Describing Categorical Data

DASYLab Techniques. Saving DASYLab data to an ASCII (text) readable file. Updated to reflect changes in DASYLab 13

WebFOCUS BI Portal: S.I.M.P.L.E. as can be

IBM Unica Leads Version 8 Release 6 May 25, User Guide

Advisor Guide. First.com 1/2/13

Data Mining Applications in Higher Education

The Edge Editions of SAP InfiniteInsight Overview

Fast Start Guide. A Single Access Solution that helps Loan Originators become more efficient and communicate more effectively.

Social Media. Measuring the ROI of. Optimize. Discover. Measure

5 WAYS TO DOUBLE YOUR WEB SITE S SALES IN THE NEXT 12 MONTHS

Understanding Characteristics of Caravan Insurance Policy Buyer

Marketing Content Creation

Implementing a Customer Lifetime Value Predictive Model: Use Case

Accounting Lab- Microsoft Dynamics GP 10.0

Integrated Company Analysis

Instructions for creating a data entry form in Microsoft Excel

ReceivablesVision SM Getting Started Guide

Why Modern B2B Marketers Need Predictive Marketing

Analyzing Data Using Excel

Configuration Manager

Profit Strategies for Small Businesses

Chapter 7. Who Wants My Product? Affinity-Based Marketing. Acronyms. 7.1 Introduction

Tutorial: Get Running with Amos Graphics

Creating Drawings in Pro/ENGINEER

IBM SPSS Direct Marketing 19

Oracle Data Miner (Extension of SQL Developer 4.0)

COC131 Data Mining - Clustering

Oracle Data Mining Hands On Lab

WEB TRADER USER MANUAL

37 Marketing Automation Best Practices David M. Raab Raab Associates Inc.

ZOINED RETAIL ANALYTICS. User Guide

This book serves as a guide for those interested in using IBM

The Google Guide to Search Advertising. How to make search advertising work for your business

Marketing for Website Owners: How to turn Clicks into Customers. Marketing from Constant Contact

CoolaData Predictive Analytics

Process Optimizer Hands-on Exercise

Web Analytics and the Importance of a Multi-Modal Approach to Metrics

Using Adaptive Random Trees (ART) for optimal scorecard segmentation

SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING

Marketing and Query Guide

Quick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model

Microsoft Office 365 Portal

Customer Life Time Value

An Introduction to K12 s Online School (OLS)

QUICK START GUIDE. Cloud based Web Load, Stress and Functional Testing

UDW+ Quick Start Guide to Functionality 2013 Version 1.1

The Five Steps to Unlock $10,000/mo from Google Ad Grants for Nonprofits

Tutorial Customer Lifetime Value

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Transcription:

Business Intelligence Professor Chen NAME: Due Date: Tutorial for Rapid Miner (Advanced Decision Tree and CRISP-DM Model with an example of Market Segmentation*) Tutorial Summary Objective: Richard would like to figure out which customers he could expect to buy the new ereader and on what time schedule, based on the company s last release of a high-profile digital reader. How: The decision tree has enabled him to predict that and to determine how reliable the predictions are. He has also been able to determine which attributes are the most predictive of ereader adoption, and to find greater granularity (by switching Decision Tree s criteria from gain_ratio to gini_index PART I. Decision Trees with Two Datasets 1. Import two csv data set (training and scoring datasets) a. Be sure change from Semi-colon to Comma (, ) 2. Training dat aset with an additional attribute (i.e., ereader_adoption) in comparison with Scoring data set. 3. Add Set Role operator a. Add a Set Role operator to both datasets and change from User_ID to id b. Add another Set role operator to Training dataset and change the attribute of ereader_adoption to label (target attribute) 4. Add Decision Tree operator to Training dataset. Run it and obtain the solution of Graphic View (with three good predictors nodes with gray oval shapes; and four label attributes leaves with multicolored end points). PART II. MARKET SEGMENTATION by Combining Two DataSets 1. Add a new Apply Model operator and combining two datasets 2. Run again. 3. The solution of Graphic View remains the same; however, other results are obtained tree has been applied to the scoring data. It means that confidence attributes have been added. 4. From the Meta Data View: with logistic regression, four (4) confidence attributes have been created by RapidMiner, along with a prediction attribute (i.e., ereader_adoption) (Fig. 10-a on the Tutorial or Figure 10-9 in the North s text) 5. Switch to Data View and examine Row. 14: a. RapidMiner is very (but not 100%) convinced that person 77373 (Row 14, Fig. 10-10) is going to be a member of the early majority (88.9%). b. Despite some uncertainly, RapidMiner is completely sure that this person is not going to be an early adopter (0%). * Information is from Data Mining for the Masses by Matthew North (chapter 10) Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-1

CRISP-DM Methodology CRISP-DM stands for cross-industry process for data mining. The CRISP-DM methodology provides a structured approach to planning a data mining project. It is a robust and well-proven methodology. We are evangelists of its powerful practicality, its flexibility and its usefulness when using analytics to solve thorny business issues. It is the golden thread than runs through almost every client engagement. The CRISP-DM model is shown below. Six Phases of CRISP-DM Model [1] [2] [3] [6] [4] [5] We will explore this data mining example (with market segmentation) using CRIPS-DM methodology. Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-2

Phase 1. Business Scenario (Business Understanding) Richard works for a large online retailer. His company is launching a next-generation ereader soon, and they want to maximize the effectiveness of their marketing. They have many customers, some of whom purchased one of the company s previous generation digital readers. Richard has noticed that certain types of people were the most anxious to get the previous generation device, while other folks seemed to content to wait to buy the electronic gadget later. He s wondering what makes some people motivated to buy something as soon as it comes out, while others are less driven to have the product. Richard s employer helps to drive the sales of its new ereader by offering specific products and services for the ereader through its massive web site - for example, ereader owners can use the company s web site to buy digital magazines, newspapers, books, music, and so forth. The company also sells thousands of other types of media, such as traditional printed books and electronics of every kind. Richard believes that by mining the customers data regarding general consumer behaviors on the web site, he ll be able to figure out which customers will buy the new ereader early, which ones will buy next, and which ones will buy later on. He hopes that by predicting when a customer will be ready to buy the next-gen ereader, he ll be able to time his target marketing (i.e., market segmentation) to the people most ready to respond to advertisements and promotions. Organizational Understanding and Diffusion of Innovation Theory ** Diffusion of Innovation Theory Definition: The process by which an innovation is communicated through certain channels over time among the members of a social system. Four elements: Innovation: an idea, practice, or object that is perceived as new by an individual or other unit of adoption. Communication channels: the means by which messages get from one individual to another. Time: a) innovation-decision process b) relative time with which an innovation is adopted c) innovation s rate of adoption Social system: a set of interrelated units that are engaged in joint problem solving to accomplish a common goal. ** By Everett Rogers in his book Diffusion of Innovations. Join when it is new Join when they perceive a benefit Join when there is a productivity gain Join when there is a plenty of help and support Join when they have to Numbers of adopters by group. Cumulative number of adopters over time. Critical Mass (Late Adopters) The diffusion of innovations according to Rogers. With successive groups of consumers adopting the new technology (shown in blue), its market share (yellow) will eventually reach the saturation level. In mathematics the S curve is known as the logistic function. Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-3

Phase 2. Data Understanding Two Data Sets are listed below: 1. Scoring Data Set Online_Retailer_DataSet_Scoring.csv 2. Training Data Set Online_Retailer_DataSet_Training.csv Data file: Online_Retailer_DataSet_Scoring.csv (473 records) Data file: Online_Retailer_DataSet_Training.csv (661 records) Q: What is the difference (attributes) between these two data sets? Four possible values of ereader_adoption: [1] Innovator: purchase within the 1 st week [2] Early Adopter: purchase within 2 or 3 weeks [3] Early Majority: purchase > 3weeks but <= 2 months [4] Late Majority: purchase after the first two months Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-4

Meta Data on Data Sets Business Scenario Summary and Goal Richard has a list of customers and their probable adoption timings for the next-gen ereader. These customers are identifiable by the User_ID that was retained in the results perspective data but not used as a predictor in the model. Goal: He wants to these customers and begin a process of target marketing that is timely and relevant to each individual. The criteria and suggestions for segmenting customers are as follows: [1] Those who are most likely to purchase immediately (predicted innovators) can be contacted and encouraged to go ahead and buy as soon as the new product comes out. They may even want the option to pre-order the new device. On the other hand, perhaps very little marketing is needed to the predicted innovators, since they are predicted to be the most likely to buy the ereader in the first place. [2] Those who are probably likely to purchase earlier (often are opinion leaders, predicted early adopter) should be [3] Those who are less likely (predicted early majority) might need some persuasion, perhaps a free digital book or two with ereader purchase or a discount on digital music playable on the new ereader. [4] The least likely (predicted late majority), can be marketed to passively, or perhaps not at all if marketing budgets are tight and those dollars need to be spent incentivizing the most likely customers to buy. Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-5

Phase 3. Data Preparation - Steps to Achieve the Goal Step A: Decision Tree Step B: Market Segmentation and HOW? (details please see Phase 4) Step A: Decision Trees Decision trees are excellent predictive models when the target attributes is categorical in nature (e.g., erader_adoption with five possible values), and when the data set is of mixed types. In some cases, decision trees are better than more statistics-based approaches at handling attributes that have missing or inconsistent values that are not handled as decision trees will work around such data and still generate usable results. Decision Trees with Training and Scoring Data Sets Decision trees are made of nodes and leaves (connected by labeled branch arrows), representing the best predictor attributes in a data set. The nodes and leaves lead to confidence percentages in the training data set, and can then be applied to similarly structured scoring data in order to generate predictions for the scoring observation. Decision tress provides us: a) What is predicted, b) How confident we can be in the prediction, and c) How we arrived at the prediction? shown in a graphical view. 1. Import the first data set (i.e., Online_Retailer_DataSet_Training.csv) into RapidMiner repository as shown in Fig. 1-a thru Fig. 1-g. a) Import Data Read CSV (since they both are *.csv files) b) Read CSV operator is created (by double click or Drag and drop). c) Make sure on Step 2 (of 4) change from Semicolon ; to Comma, on Column Separation box since the file is with csv format. d) Change the name of the operator (Read CSV) to Training as shown in Fig 1-f and 1-g. Fig 1-a Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-6

Fig 1-b Fig 1-c Fig 1-d Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-7

Fig 1-e Fig 1-f Fig 1-g Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-8

2. Repeat the step to import the second data set (i.e., Online_Retailer_DataSet_Scoring.csv) into RapidMiner repository and the result is shown in Fig. 2 Fig 2 3. RUN the model to examine the data and familiarize with the attributes. Hint: make sure connect the out port to the res port on the process area; otherwise, the result can t be produced. Fig 3 4. Data View for the Two Data Sets are illustrated in Fig 4-a and 4-b Fig 4-a Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-9

Fig 4-b 5. SAVE the two data sets and process into the Local Repository as shown in Fig 5-a and Fig 5-b. Why? a) Return to Design perspective b) Click on Training operator c) Select File then Save Process As d) Click data under Local Repository on the Repository Browser box d) Enter Online Retailer Training Data Set e) Repeat the same step for entering another data set of Online Retailer Scoring Data Set Note that saving the data or process is under same option of Save Process As you then select one of the choice of data or process Fig 5-a Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-10

Fig 5-b 6. Finally SAVE the process into the Local Repository (Fig 6) a) Return to Design perspective b) Click anywhere on the Process area c) Select File then Save Process As d) Select process under Local Repository on the Repository Browser box e) Enter Online Retailer ereader Adoption Project Fig 6 7. Our first goal (Step A) is to find a solution using Decision Tree as shown in Fig 7-a to Fig 7-c. a) While there are no missing or apparently inconsistent values in the data set, there is still some data preparation yet to do. First of all, the User_ID is an arbitrarily assigned value for each customer. Usesr_ID should not be included in the model as an independent variable. b) Rather than removing the attribute (User_ID) we will try a new way of handling a non-predictive attribute. This is accomplished using the Set Role operator. Using the search field in the Operators tab, find and add Set Role operators to both your training and scoring streams. Be sure to re-connect the ports from Set Role operators to res. c) In the Parameters area on the right hand side of the screen, set the role of the User_ID attribute to id. This will leave the attribute in the data set throughout the model, but it won t consider the attribute as a predictor Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-11

for the label attribute. Be sure to do this for both the training and scoring data sets, since the User_ID attribute is found in both of them. Fig 7-a Fig 7-b Fig 7-c 7. cont. d) One of the nice side-effects of setting an attribute s role to id rather than removing it using a Select Attributes operator is that it makes each record easier to match back to individual people later, when viewing predictions in results perspective. e) Before adding a Decision Tree operator, we still need to do another data preparation step by adding another Set Role operator. The Decision Tree operator, as with other predictive model operators we ve used to this point, expects the training stream to supply a label attribute. For this example, we want to predict which adopter group Richard s next-gen ereader customers are likely to be in. So our label will be ereader_adoption and it should be reset in the Set Role operator (Fig 7-d) f) Next add Decision Tree operator to your training stream as it is in Figure 7-e g) Finally, Run the model and switch to the Tree (Decision Tree) tab in results perspective. You will see our preliminary tree in Figure 8-a. Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-12

Fig 7-d Fig 7-e 8. Interpretation of the Decision Tree view In the Decision Tree view (Figure 8-a) we can see what are referred to as nodes and leaves. The nodes are the gray oval shapes. They are attributes which serve as good predictors for our label attribute. The leaves are the multicolored end points that show us the distribution of categories from the label attribute that follow the branch of the tree to the point of that leaf. We can see in this tree that Website_Activity (on the top of the decision tree) is our best predictor of whether or not a customer is going to adopt (buy) the company s new ereader. If the person s activity is frequent or regular, we see that they are likely to be an Innovator or Early Adopter, respectively. If however, they seldom use the web site, then whether or not they ve bought digital books becomes the next best predictor of their ereader adoption category. If they have not bought digital books through the web site in the past, Age is another predictive attribute which forms a node, with younger folks adopting sooner than older ones. This is seen on the branches for the two leaves coming from the Age node in Fig 8-a. Those who seldom use the company s website, have never bought digital books on the site, and are older than 25 ½ are most likely to land in the Late Majority category, while those with the same profile but are under 25 ½ are bumped to the Early Majority prediction. In this example you can see how you read the nodes, leaves and branch labels as you move down through the tree. Fig 8-a Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-13

8. Cont. What information is still missed in Fig 8-b? This implies that we need to move to Phase 4. Fig 8-b Phase 4. Modeling Step B of achieving the Goal of Market Segmentation 1. With our predictor attributes prepared, we are now ready to move on to Step 4, Modeling. Save the process if needed. 2. Return to design perspective. In the Operators tab search for and add an Apply Model operator, bringing the training and scoring streams together. Ensure that both the lab (labelled attribute) and mod ports are connected to res ports in order to generate our desired outputs (Figure 9). Furthermore, exa (example) port in Set Role (2) operator and unl (unlabelled) port in Apply Model should be reconnected. Fig 9 3. Run the model. You will see familiar results the tree remains the same, for now. Click on the ExampleSet tab next to the Tree tab. Our tree has been applied to our scoring data. As was the case with logistic regression, confidence attributes (shown in Meta Data View) have been created by RapidMiner, along with a prediction attribute (Figure 10-a and 10-b) Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-14

Fig 10-a Fig 10-b Phase 5. Evaluation 1. Switch to Data View using the radio button. We see in Figure 10-b the prediction for each customer s adoption group, along with confidence percentages for each prediction. There are four confidence attributes, corresponding to the four possible values in the label (ereader_adoption). We interpret these the same way that we did with the other models though - the percentages add to 100%, and the prediction is whichever category yielded the highest confidence percentage. RapidMiner is very (but not 100%) convinced that person 77373 (Row 14, Figure 11) is going to be a member of the early majority (88.9%). Despite some uncertainty, RapidMiner is completely sure that this person is not going to be an early adopter (0%). Row no. 14 can be interpreted as: person 77373 is going to be a member of the early majority (88.9%). Despite some uncertainty, RapidMiner is completely sure that this person is not going to be an early adopter (0%). Question: What is the business implication? Answer: However, other persons are difficult to categorize accurately which segmentation they belong to? Why? How to resolve this issue? See further process after Deployment. Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-15

Fig 11 Phase 6. Deployment Richard s original desire was to be able to figure out which customers he could expect to buy the new ereader and on what time schedule, based on the company s last release of a high-profile digital reader. The decision tree has enabled him to predict that and to determine how reliable the predictions are and the likelihood of buying for each group. He s also been able to determine which attributes are the most predictive of ereader adoption. In order to produce better results for market segmentation we need to find greater detail, or greater granularity in the model by using gini_index as the tree s underlying algorithm. We are now re-visiting the following phases: Phase 4 Modeling Phase 5 Evaluation Phase 6 Deployment Revisiting Phase 4. Modeling Revisiting 1. Remember that CRISP-DM is cyclical in nature, and that in some modeling techniques, especially those with less structured data, some back and forth trial-and-error can reveal more interesting patterns in data. 2. Switch back to design perspective, click on the Decision Tree operator, and in the Parameters area, change the criterion parameter from gain_ratio to gini_index, as shown in Figure 12. 3. Re-RUN the model. Fig. 12 Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-16

Phase 5. Evaluation Revisiting 1. We see in this tree (Figure 13-a) that there is much more detail, more granularity in using the Gini algorithm as our parameter for our decision tree. We could further modify the tree by going back to design view and changing the minimum number of items to form a node (size for split) or the minimum size for a leaf. Even accepting the defaults for those parameters though, we can see that the Gini algorithm alone is much more sensitive than is the Gain Ratio algorithm in identifying nodes and leaves. 2. Take a minute to explore around this new tree model. We will find that it is extensive, and that we will to use both the Zoom and Mode tools to see it all. We should find that most of our other independent variables (predictor attributes) are now being used, and the granularity with which Richard can identify each customer s likely adoption category is much greater. 3. How active the person is on Richard s employer s web site is still the single best predictor, but gender, and multiple levels of age have now also come into play. We will also find that a single attribute is sometimes used more than once in a single branch of the tree. Decision trees are a lot of fun to experiment with, and with a sensitive algorithm like Gini generating them, they can be tremendously interesting as well. Fig. 13-a 4. Switch to the ExampleSet tab in Data View. We see here (Figure 13-b) that changing our tree s underlying algorithm has, in some cases, also changed our confidence in the prediction. 5. Remember in the previous discussion that most of persons other than 77373 (Row.14) are difficult to categorize accurately which segmentation they belong to since many of them (e.g., those are in Early Adopter ) are calculated as having at least some percentage chance of landing in any one of the four adopter categories. Fig. 13-b Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-17

Interpretation: Let s take the person on Row 1 (ID 56031) as an example. Under the Gain Ratio algorithm, we were 41% sure he d be an early adopter, but almost 32% sure he might also turn out to be an innovator. In other words, we feel confident he ll buy the ereader early on, but we re not sure how early. Gain_Ratio algorithm Fig. 14-a Maybe that matters to Richard, maybe not. However, he will have to decide during the deployment phase. But perhaps using Gieni_Index, we can help him decide. Gini_Index algorithm Fig. 14-b In Figure 14-b, this same man is now shown to have a 60% chance of being an early adopter and only a 20% chance of being an innovator. The odds of him becoming part of the late majority crowd under the Gini model have dropped to zero. We know he will adopt (or at least we are predicting with 100% confidence that he will adopt), and that he will adopt early. Why? While he may not be at the top of Richard s list when deployment rolls around, he ll probably be higher than he otherwise would have been under gain_ratio. Note that while Gini has changed some of our predictions, it hasn t affected all of them. Re-check person ID 77373 briefly. There is no difference in this person s predictions under either algorithm - RapidMiner is quite certain in its predictions for this young man. Phase 6. Deployment Revisiting Richard now has a tree that shows him which attributes matter most in determining the likelihood of buying for each group. New marketing campaigns can use this information to focus more on increasing web site activity level, or on connecting general electronics that are for sale on the company s web site with the ereaders and digital media more specifically. These types of cross-categorical promotions can be further honed to appeal to buyers of a specific gender or in a given age range. Richard has much that he can use in this rich data mining output as he works to promote the next-gen ereader. Tutorial for RapidMiner Advanced Tree and CRISP-DM Model with Market Segmentation; Page-18