Model Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/


Model Deployment
Creating the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained must be organized and presented in a way the customer can use. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. In many cases it will be the customer, not the data analyst, who carries out the deployment steps. Even when the analyst does not carry out the deployment, it is important for the customer to understand up front what actions are needed to actually make use of the created models.

Model Deployment - Poll, May 2009. http://www.kdnuggets.com/

Model Deployments
- Use the data mining tool
- Programming scripts: Java, C, VB, SAS, SPSS
- SQL scripts: T-SQL, PL/SQL, SQL functions
- PMML (Predictive Model Markup Language)

Using Data Mining Tool (Orange) http://www.ailab.si/orange/

Programming Scripts - Visual Basic

SQL Scripts - SQL Function
select RegressionModel(null, 25000, 'street')
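The scripting route amounts to hand-coding the fitted model as a function that the application can call. A minimal Python sketch of that idea follows. The slide does not say what the three arguments of RegressionModel mean or what the coefficients are, so the argument names, coefficients, and the default used for a missing first argument are all hypothetical.

```python
from typing import Optional

# Hypothetical coefficients: in practice these are copied from the fitted model.
INTERCEPT = 10.0
FIRST_COEF = 0.5
SECOND_COEF = 0.001
CATEGORY_EFFECT = {"street": 2.0, "avenue": 3.0}
FIRST_DEFAULT = 40.0  # assumed replacement when the first input is missing


def regression_model(first: Optional[float], second: float, category: str) -> float:
    """Score one record with a hand-coded linear regression model."""
    if first is None:
        # Mirrors passing null in the SQL call: fall back to a default value.
        first = FIRST_DEFAULT
    return (
        INTERCEPT
        + FIRST_COEF * first
        + SECOND_COEF * second
        + CATEGORY_EFFECT.get(category, 0.0)  # unknown categories add nothing
    )


score = regression_model(None, 25000, "street")
print(score)
```

The drawback of this route is that every retrained model means editing and redeploying the code.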

PMML is an XML-based language used to define statistical and data mining models and to share them between compliant applications. The standard covers not only the representation of data mining models but also data handling and data transformations (pre- and post-processing).

PMML
PMML is developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to make models easier to deploy. It eliminates the need for custom model deployment and allows a clear separation of tasks: model development vs. model deployment.

Predictive Models Supported by PMML
- Regression
- Neural Networks
- Support Vector Machines
- Decision Trees
- Naïve Bayes
- Clustering
- Sequences
- Rule Sets
- Association Rules
- Time Series (as of PMML 4.0)
- Text Models

PMML Processes
1. Pre-Processing
   - Data Dictionary: allows for the explicit specification of valid, invalid and missing values.
   - Mining Schema: defines the treatment to be applied to missing and invalid values.
   - Transformations: allow for variable discretization, normalization, and mapping, with handling of missing and default values.
   - Built-in Functions: arithmetic expressions and handling of dates, times and strings. Also used to implement IF-THEN-ELSE logic and Boolean operations.
2. Models
   PMML allows several predictive modeling techniques to be fully expressed.
3. Post-Processing
   Model outputs can be scaled with the PMML element Targets.
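To illustrate how the built-in functions express IF-THEN-ELSE logic, here is a minimal Python sketch of an evaluator for a PMML-style Apply expression. It handles only a handful of functions, treats every Constant as a number, and evaluates both branches of "if" eagerly. The field name income and the threshold are made up for the example.

```python
import xml.etree.ElementTree as ET

# A small subset of PMML built-in functions (the real list is much longer).
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "greaterThan": lambda a, b: a > b,
    "and": lambda *args: all(args),
    "if": lambda cond, then, otherwise: then if cond else otherwise,
}


def evaluate(node, record):
    """Recursively evaluate a Constant, FieldRef or Apply element."""
    if node.tag == "Constant":
        return float(node.text)  # simplification: numeric constants only
    if node.tag == "FieldRef":
        return record[node.get("field")]
    if node.tag == "Apply":
        args = [evaluate(child, record) for child in node]
        return FUNCTIONS[node.get("function")](*args)
    raise ValueError("unsupported element: " + node.tag)


# Hypothetical expression: 1 if income > 20000, otherwise 0.
EXPRESSION = ET.fromstring(
    '<Apply function="if">'
    '<Apply function="greaterThan">'
    '<FieldRef field="income"/><Constant>20000</Constant>'
    "</Apply>"
    "<Constant>1</Constant>"
    "<Constant>0</Constant>"
    "</Apply>"
)

print(evaluate(EXPRESSION, {"income": 25000}))
```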

PMML Components

PMML Components - Header
Header: contains general information about the PMML document, such as copyright information for the model, its description, and the name and version of the application used to generate the model. It can also carry a timestamp recording when the model was created.
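As an illustration, a Header can be assembled with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the copyright text, description, application name, version and date are placeholders, and the timestamp is written as a child Timestamp element, which is how the PMML schema stores it.

```python
import xml.etree.ElementTree as ET


def build_header(copyright_text, description, app_name, app_version, created):
    """Build a PMML Header element with Application and Timestamp children."""
    header = ET.Element(
        "Header", {"copyright": copyright_text, "description": description}
    )
    ET.SubElement(header, "Application", {"name": app_name, "version": app_version})
    timestamp = ET.SubElement(header, "Timestamp")
    timestamp.text = created
    return header


# Placeholder values, for illustration only.
header = build_header(
    "Example Corp",
    "Hypothetical regression model",
    "ExampleTool",
    "1.0",
    "2010-01-01T00:00:00",
)
print(ET.tostring(header, encoding="unicode"))
```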

PMML Components - Data Dictionary
Data Dictionary: contains definitions for all the possible fields used by the model. It is here that a field is defined as continuous, categorical, or ordinal. Depending on this definition, the appropriate value ranges are then defined, as well as the data type (such as string or double).
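A small Python sketch of reading such definitions follows. The DataDictionary fragment is hand-written for illustration (the field names and values are hypothetical) and omits the PMML namespace that a full document would declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one continuous, one categorical and one ordinal field.
DATA_DICTIONARY = """
<DataDictionary numberOfFields="3">
  <DataField name="income" optype="continuous" dataType="double">
    <Interval closure="closedOpen" leftMargin="0"/>
  </DataField>
  <DataField name="address" optype="categorical" dataType="string">
    <Value value="street"/>
    <Value value="avenue"/>
  </DataField>
  <DataField name="rating" optype="ordinal" dataType="string">
    <Value value="low"/>
    <Value value="medium"/>
    <Value value="high"/>
  </DataField>
</DataDictionary>
"""


def describe_fields(xml_text):
    """Return {field name: optype, dataType and listed values}."""
    root = ET.fromstring(xml_text)
    fields = {}
    for field in root.findall("DataField"):
        fields[field.get("name")] = {
            "optype": field.get("optype"),
            "dataType": field.get("dataType"),
            "values": [v.get("value") for v in field.findall("Value")],
        }
    return fields


for name, info in describe_fields(DATA_DICTIONARY).items():
    print(name, info)
```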

PMML Components - Data Transformations
Data Transformations: map user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations:
- Normalization: map values to numbers; the input can be continuous or discrete.
- Discretization: map continuous values to discrete values.
- Value mapping: map discrete values to discrete values.
- Functions: derive a value by applying a function to one or more parameters.
- Aggregation: summarize or collect groups of values.
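The first three transformations can be sketched as plain Python functions. These illustrate the ideas only and are not the PMML elements themselves: the function names, bin boundaries and mapping tables are assumptions, and the normalization simply raises an error outside its range instead of applying an outlier policy.

```python
def norm_continuous(x, points):
    """Piecewise-linear normalization through (original, normalized) points.

    points must be sorted by the original value.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value outside the normalization range")


def discretize(x, bins, default=None):
    """Map a continuous value to a label; bins are half-open [low, high)."""
    for low, high, label in bins:
        if low <= x < high:
            return label
    return default


def map_value(value, table, default=None):
    """Map one discrete value to another, with a default for unknown values."""
    return table.get(value, default)


print(norm_continuous(50, [(0, 0.0), (100, 1.0)]))
print(discretize(25000, [(0, 20000, "low"), (20000, 50000, "mid")]))
print(map_value("street", {"street": 1, "avenue": 2}, default=0))
```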

Data Transformations

PMML Components - Model
Model: contains the definition of the data mining model. For example, a feed-forward neural network is represented in PMML by a "NeuralNetwork" element, which contains attributes such as:
- Model Name (attribute modelName)
- Function Name (attribute functionName)
- Algorithm Name (attribute algorithmName)
- Activation Function (attribute activationFunction)
- Number of Layers (attribute numberOfLayers)

PMML Components - Mining Schema
Mining Schema: lists all fields used in the model. This can be a subset of the fields defined in the data dictionary. It contains specific information about each field, such as:
- Name (attribute name): must refer to a field in the data dictionary.
- Usage type (attribute usageType): defines the way a field is to be used in the model. Typical values are active, predicted, and supplementary. Predicted fields are those whose values are predicted by the model.
- Outlier Treatment (attribute outliers): defines the outlier treatment to be used. In PMML, outliers can be treated as missing values, as extreme values (based on the definition of high and low values for a particular field), or as is.
- Missing Value Replacement Policy (attribute missingValueReplacement): if this attribute is specified, a missing value is automatically replaced by the given value.
- Missing Value Treatment (attribute missingValueTreatment): indicates how the missing value replacement was derived (e.g. as value, mean or median).
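The outlier and missing-value settings can be pictured as a small pre-processing step applied to each input before scoring. The Python sketch below is an illustration under assumptions: the class and function names are hypothetical, and applying outlier treatment before missing-value replacement is a choice made for this example rather than something stated in the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiningField:
    """Simplified stand-in for one field of a mining schema."""
    name: str
    usage_type: str = "active"
    outliers: str = "asIs"  # asIs, asMissingValues or asExtremeValues
    low_value: Optional[float] = None
    high_value: Optional[float] = None
    missing_value_replacement: Optional[float] = None


def prepare_value(field, value):
    """Apply outlier treatment, then missing-value replacement."""
    if value is not None and field.outliers != "asIs":
        below = field.low_value is not None and value < field.low_value
        above = field.high_value is not None and value > field.high_value
        if field.outliers == "asMissingValues" and (below or above):
            value = None  # the outlier is handled like a missing value
        elif field.outliers == "asExtremeValues":
            if below:
                value = field.low_value  # clip to the low boundary
            elif above:
                value = field.high_value  # clip to the high boundary
    if value is None:
        value = field.missing_value_replacement
    return value


income = MiningField(
    "income",
    outliers="asExtremeValues",
    low_value=0.0,
    high_value=100000.0,
    missing_value_replacement=30000.0,
)
print(prepare_value(income, 250000.0))
print(prepare_value(income, None))
```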

Model and Schema

PMML Components - Targets
Targets: allow for post-processing of the predicted value, in the form of scaling, if the output of the model is continuous. Targets can also be used for classification tasks. In this case, the attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. This can happen, for example, if an input value is missing and there is no other method for treating missing values.
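Both uses of Targets can be sketched in a few lines of Python. The function and parameter names are hypothetical. The scaling step here clamps to optional bounds and then applies a factor and a constant, an order assumed for this sketch, and the fallback simply picks the category with the largest prior probability.

```python
def rescale(prediction, minimum=None, maximum=None,
            rescale_factor=1.0, rescale_constant=0.0):
    """Clamp a continuous prediction, then scale and shift it."""
    if minimum is not None:
        prediction = max(prediction, minimum)
    if maximum is not None:
        prediction = min(prediction, maximum)
    return prediction * rescale_factor + rescale_constant


def fallback_category(prior_probabilities):
    """Pick the most probable category when the model produced no result."""
    return max(prior_probabilities, key=prior_probabilities.get)


print(rescale(2.0, rescale_factor=3.0, rescale_constant=1.0))
print(rescale(150.0, maximum=100.0, rescale_factor=0.5))
print(fallback_category({"yes": 0.3, "no": 0.7}))
```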

Targets

PMML 4.0 New Features
- Improved Pre-Processing Capabilities: additions to built-in functions include a range of Boolean operations and an If-Then-Else function.
- Time Series Models: new Exponential Smoothing models, plus placeholders for ARIMA, Seasonal Trend Decomposition, and Spectral Analysis, which are to be supported in the near future.
- Model Explanation: evaluation and model performance measures can be saved in the PMML file itself.
- Multiple Models: capabilities for model composition, ensembles, and segmentation (e.g., combining regression and decision trees).
- Extensions of Existing Elements: addition of multi-class classification for Support Vector Machines, improved representation for Association Rules, and the addition of Cox Regression Models.

References
- http://www.dmg.org/
- http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language