INTRODUCTION TO DATA MINING: SAS ENTERPRISE MINER. Mary-Elizabeth (M-E) Eddlestone, Principal Systems Engineer, Analytics, SAS Customer Loyalty, SAS Institute, Inc.
AGENDA: (1) Overview/Introduction to Data Mining and Predictive Modeling. (2) Building Models Using SAS Enterprise Miner: walk through an example; essential steps (Sample, Explore, Modify, Model, Assess, Score); show a selection of tools, how to change their properties, and how to surface results. (3) Building Automated Models Using Excel or SAS Enterprise Guide (Rapid Predictive Modeler).
INTRODUCTION TO DATA MINING
DATA MINING GOALS INSIGHT AGILE or DYNAMIC PERSONALIZATION SPEED PRECISION IMPROVED PROFITABILITY Better Decisions
ANALYTICS: INFERENTIAL. Inferential statistics uses patterns in the sample data to draw inferences about the population represented, accounting for randomness. Typical tasks: answering yes/no questions about the data (hypothesis testing); describing associations within the data (correlation); modeling relationships within the data (regression). Source: Wikipedia
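To make the correlation and regression items concrete, here is a minimal Python sketch. Python is used only for illustration (the deck itself works in SAS Enterprise Miner), the paired values are made up, and the function names are hypothetical.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation: strength of linear association, between -1 and 1."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Simple linear regression: return (intercept, slope) of the best-fit line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up sample of paired observations (assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_correlation(xs, ys)                # describes the association
intercept, slope = least_squares_line(xs, ys)  # models the relationship
print(f"r = {r:.3f}, y = {intercept:.2f} + {slope:.2f} * x")
```

Correlation summarizes how strongly two variables move together; the regression line goes one step further and gives an equation that can be applied to new values of x.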
ANALYTICS: PREDICTIVE. Predictive analytics encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events. Includes: data mining; forecasting. Source: Wikipedia
ANALYTICS: DATA MINING VERSUS FORECASTING. Both are predictive and both model past behavior. DATA MINING: time independent; causal (relationship) focused; categorical, continuous, or discrete data; seldom weights more recent observations. FORECASTING: time dependent; interval oriented; continuity assumed; frequently weights more recent phenomena.
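One common way a forecasting method gives more weight to recent observations is simple exponential smoothing. The sketch below is illustrative only: the technique is not named in the slides, the sales figures are made up, and the function name is hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series.

    alpha in (0, 1] is the weight placed on the newest observation;
    older observations are discounted geometrically by (1 - alpha).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Made-up monthly sales with a recent jump (assumption for illustration).
monthly_sales = [100.0, 110.0, 120.0, 200.0]

smoothed = exponential_smoothing(monthly_sales, alpha=0.5)
next_forecast = smoothed[-1]                        # one-step-ahead forecast
simple_mean = sum(monthly_sales) / len(monthly_sales)
print(next_forecast, simple_mean)
```

Because the latest month carries the most weight, the smoothed forecast sits closer to the recent jump than the unweighted mean does. A typical data mining model, by contrast, treats each historical case the same regardless of when it occurred.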
DATA MINING Descriptive Data Mining Predictive Data Mining
DATA MINING. Descriptive Data Mining: clustering (segmentation); associations and sequences. Predictive Data Mining: classification (models to predict class membership); regression (models to predict a number).
THE GOAL? SCORING! Scoring is the act of applying what we've learned from data mining to new cases. Keep this goal in mind and use it to help formulate the questions and the data needed for data mining and scoring.
THE ULTIMATE GOAL? BETTER DECISIONS The ultimate goal of data mining is to improve decision making. As you formulate your problem, also keep in mind how and when model scores will be used.
EXAMPLE: DEVELOPING A CLASSIFICATION MODEL. Models are developed using historical data in which the behavior is observed or known. (In the accompanying figure, a marker indicates subjects in whom the behavior was observed.) Information about each subject, in this case an individual (for example, age, gender, and previous behaviors), is used as input to the model to see how well the model can distinguish between the people who exhibit the behavior and those who do not.
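As a rough illustration of developing a classification model from historical data, the sketch below fits a logistic regression by batch gradient descent in plain Python. The model type is chosen for illustration only (the slides do not prescribe one here), the historical records are made up, and all names are hypothetical.

```python
import math

# Made-up historical data: (age, prior_purchases, behavior observed 0/1).
history = [
    (25, 0, 0), (30, 1, 0), (35, 0, 0), (40, 1, 0),
    (45, 3, 1), (50, 4, 1), (55, 3, 1), (60, 5, 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, learning_rate=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent on centered inputs."""
    n = len(rows)
    n_inputs = len(rows[0]) - 1
    means = [sum(row[j] for row in rows) / n for j in range(n_inputs)]
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for row in rows:
            x = [row[j] - means[j] for j in range(n_inputs)]
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = sigmoid(z) - row[-1]  # predicted probability minus outcome
            for j in range(n_inputs):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return means, weights, bias

def predict_probability(model, inputs):
    """Estimated probability that a subject exhibits the behavior."""
    means, weights, bias = model
    z = bias + sum(w * (v - m) for w, v, m in zip(weights, inputs, means))
    return sigmoid(z)

model = fit_logistic(history)

# How well does the model distinguish the two groups in the historical data?
training_accuracy = sum(
    (predict_probability(model, row[:-1]) >= 0.5) == bool(row[-1])
    for row in history
) / len(history)
print(f"training accuracy: {training_accuracy:.2f}")
```

Accuracy on the same data used for fitting is optimistic; in practice the model would be assessed on data held out from training.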
EXAMPLE DATA
WHY? Consider a group of subjects whose relevant behavior is unknown. The same information is available for each of these subjects (age, gender, etc.) as is available for the individuals with known behavior. We would like to know which individuals are most likely to have the relevant behavior.
EXAMPLE NEW DATA?
SCORING: The output of a predictive classification model is typically an equation. Models are applied to new cases to calculate the predicted behavior through a process called scoring. Scoring, using the equation, calculates each subject's likelihood of having the relevant behavior. (It also calculates the likelihood of not having the behavior.)
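A minimal sketch of scoring, assuming the model equation has a logistic form: the linear score is intercept + sum(coefficient * input), and the probability is 1 / (1 + exp(-score)). The intercept, coefficients, input names, and new cases below are all made up for illustration; a real model would supply its own fitted values.

```python
import math

# Hypothetical fitted equation (assumed values, not from the slides).
INTERCEPT = -6.0
COEFFICIENTS = {"age": 0.08, "prior_purchases": 0.9}

def score(subject):
    """Apply the model equation to one subject and return both likelihoods."""
    linear_score = INTERCEPT + sum(
        COEFFICIENTS[name] * subject[name] for name in COEFFICIENTS
    )
    p_event = 1.0 / (1.0 + math.exp(-linear_score))
    return {"p_event": p_event, "p_non_event": 1.0 - p_event}

# New cases: same inputs as the historical data, behavior unknown.
new_cases = [
    {"id": "A", "age": 28, "prior_purchases": 0},
    {"id": "B", "age": 52, "prior_purchases": 4},
]

scored = [{**case, **score(case)} for case in new_cases]

# Rank subjects from most to least likely to exhibit the behavior.
for row in sorted(scored, key=lambda r: r["p_event"], reverse=True):
    print(row["id"], round(row["p_event"], 3), round(row["p_non_event"], 3))
```

Sorting by the predicted probability answers the question of which individuals are most likely to have the relevant behavior.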
EXAMPLE SCORED DATA
THE ANALYTICS LIFECYCLE. Stages (a repeating cycle): IDENTIFY / FORMULATE PROBLEM; DATA PREPARATION; DATA EXPLORATION; TRANSFORM & SELECT; BUILD MODEL; VALIDATE MODEL; DEPLOY MODEL; EVALUATE / MONITOR RESULTS. Roles: BUSINESS MANAGER (domain expert; makes decisions; evaluates processes and ROI). BUSINESS ANALYST (data exploration; data visualization; report creation). IT SYSTEMS / MANAGEMENT (data preparation; model validation; model deployment; model monitoring). DATA MINER / STATISTICIAN (exploratory analysis; descriptive segmentation; predictive modeling).
THE ANALYTICS LIFECYCLE. Stages (a repeating cycle): IDENTIFY / FORMULATE PROBLEM; DATA PREPARATION; DATA EXPLORATION; TRANSFORM & SELECT; BUILD MODEL; VALIDATE MODEL; DEPLOY MODEL; EVALUATE / MONITOR RESULTS.
MAIN TYPES OF DATA MARTS: One-Row-per-Subject Data Mart; Multiple-Row-per-Subject Data Mart; Longitudinal Data Mart.
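The slide only names the data mart types. As a hedged illustration of the first two, the sketch below assumes that a multiple-row-per-subject mart holds one row per event (here, per transaction) and that a one-row-per-subject mart summarizes those events into a single row of inputs per subject. The records, field names, and function name are made up.

```python
from collections import defaultdict

# Made-up multiple-row-per-subject data: one row per transaction.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 2, "amount": 50.0},
]

def to_one_row_per_subject(rows):
    """Aggregate transaction rows into one summary row per customer."""
    amounts_by_customer = defaultdict(list)
    for row in rows:
        amounts_by_customer[row["customer_id"]].append(row["amount"])
    return [
        {
            "customer_id": customer_id,
            "n_transactions": len(amounts),
            "total_amount": sum(amounts),
            "max_amount": max(amounts),
        }
        for customer_id, amounts in sorted(amounts_by_customer.items())
    ]

subject_mart = to_one_row_per_subject(transactions)
for row in subject_mart:
    print(row)
```

A table in this one-row-per-subject shape is the usual input to the classification and regression models discussed earlier, since each subject contributes exactly one set of inputs.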
THE ANALYTICS LIFECYCLE. SAS Enterprise Miner focuses on these aspects of the process. Stages: IDENTIFY / FORMULATE PROBLEM; DATA PREPARATION; DATA EXPLORATION; TRANSFORM & SELECT; BUILD MODEL; VALIDATE MODEL; DEPLOY MODEL; EVALUATE / MONITOR RESULTS. Role: DATA MINER / STATISTICIAN (exploratory analysis; descriptive segmentation; predictive modeling).
SAS ENTERPRISE MINER
SAS ENTERPRISE MINER: Organized and logical GUI for data mining success. Unmatched suite of modeling techniques and methods. Sophisticated set of data preparation, summarization, and exploration tools. Business-based model comparisons, reporting, and management.
SAS ENTERPRISE MINER: Automated scoring process delivers faster results. High-performance, grid-enabled workbench. Modern, distributable data mining system suited for large enterprises. Open, extensible design for ultimate flexibility.
WHAT IS SAS ENTERPRISE MINER? SAS Enterprise Miner is a sophisticated graphical user interface, designed with the specific needs of data miners in mind. SAS Enterprise Miner is a data miner's workbench that manages the process and provides a comprehensive set of tools to aid the data miner throughout the essential steps, known by the acronym SEMMA: Sample, Explore, Modify, Model, Assess. SAS Enterprise Miner streamlines the data mining process to create highly accurate predictive and descriptive models based on analysis of vast amounts of data from across an enterprise.
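The SEMMA steps can be sketched outside the tool as a tiny end-to-end flow. This is plain Python for illustration only, not SAS Enterprise Miner code: the data are made up, the "model" is a deliberately simple one-split rule, and every name is hypothetical.

```python
from statistics import mean

# Made-up rows: (income in thousands, or None when missing; target 0/1).
rows = [
    (20, 0), (None, 0), (30, 0), (35, 0), (40, 0), (25, 0),
    (60, 1), (None, 1), (70, 1), (75, 1), (80, 1), (55, 1),
]

# Sample: hold out every third row as a validation partition.
train = [row for i, row in enumerate(rows) if i % 3 != 2]
valid = [row for i, row in enumerate(rows) if i % 3 == 2]

# Explore: basic summaries of the training partition.
observed = [x for x, _ in train if x is not None]
summary = {
    "n_rows": len(train),
    "n_missing": len(train) - len(observed),
    "event_rate": mean([y for _, y in train]),
}

# Modify: impute missing incomes with the mean of the observed training values.
impute_value = mean(observed)

def modify(data):
    return [(impute_value if x is None else x, y) for x, y in data]

train_m, valid_m = modify(train), modify(valid)

# Model: a one-split rule placed halfway between the two class means.
mean_0 = mean([x for x, y in train_m if y == 0])
mean_1 = mean([x for x, y in train_m if y == 1])
threshold = (mean_0 + mean_1) / 2

def predict(income):
    return 1 if income >= threshold else 0

# Assess: accuracy on the held-out validation partition.
accuracy = mean([1 if predict(x) == y else 0 for x, y in valid_m])
print(summary, round(threshold, 2), accuracy)
```

In SAS Enterprise Miner each of these steps corresponds to a group of tools connected in a process flow diagram, so the order of the code above mirrors the order of the flow.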
DATA MINING WITH SAS ENTERPRISE MINER
SAS ENTERPRISE MINER 7.1 AND 12.1 MODEL DEVELOPMENT PROCESS (SEMMA): Sample, Explore, Modify, Model, Assess, Utility.
SAS ENTERPRISE MINER
SAS ENTERPRISE MINER: Use the desired tools to define a logical process (SEMMA): Sample, Explore, Modify, Model, Assess.
SAS ENTERPRISE MINER: Modify settings (properties) for the tools.
SAS ENTERPRISE MINER: Run the flow and check results. Refine as needed.
DEMONSTRATION
AUTOMATED PREDICTIVE MODELING
SAS RAPID PREDICTIVE MODELER: KEY DRIVERS (BUSINESS USERS). Need to generate numerous models, in a credible manner, to solve a variety of business problems. Need models to be developed quickly using a self-service approach. Do not want to rely on analytic professionals (e.g., statisticians, modelers, or data miners) every time.
SAS RAPID PREDICTIVE MODELER: KEY DRIVERS (ANALYTIC PROFESSIONALS). Solve more complex issues at hand to gain incremental value. Further customize or refine models for better results.
RAPID PREDICTIVE MODELER
1. Open your data in SAS Enterprise Guide or Microsoft Excel. 2. Use the Rapid Predictive Modeler task and modify settings. 3. Review results.
Microsoft Excel
SAS Enterprise Guide
RAPID PREDICTIVE MODELER BASIC
RAPID PREDICTIVE MODELER INTERMEDIATE
RAPID PREDICTIVE MODELER ADVANCED
RAPID PREDICTIVE MODELER: SAMPLE OUTPUT
DEMONSTRATION
IN CONCLUSION
SAS ENTERPRISE MINER BENEFITS: Support the entire data mining process with a broad set of tools. Build more models faster with an easy-to-use graphical user interface. Enhance accuracy of predictions. Surface business information and easily share results through the unique model repository.
RESOURCES: SAS Rapid Predictive Modeler Website (product brief, press release, brief product demo, etc.); SAS Enterprise Miner Web Site; SAS Enterprise Miner Technical Support Web Site; SAS Enterprise Miner Technical Forum (Join Today!); SAS Enterprise Miner Training; "Rapid Predictive Modeling for Customer Intelligence", SAS Global Forum 2010 paper written by Wayne Thompson and David Duling, SAS Institute Inc., Cary, NC.
POTENTIAL NEXT STEPS: Work through the example in Getting Started with SAS Enterprise Miner. Both the data and the documentation are available on support.sas.com: http://support.sas.com/documentation/onlinedoc/miner/ Contact SAS Technical Support if you get stuck. There is no charge for this; it is included in your SAS software license.
THANK YOU FOR USING SAS! www.sas.com