SPSS Modeler Integration with IBM DB2 Analytics Accelerator




Markus Nentwig
August 31, 2012

Agenda
1 Motivation
2 Basics: IBM SPSS Modeler; IBM DB2 Analytics Accelerator (IDAA)
3 My Work: Task Overview; Fraud Prediction for Banking Scenario
4 Results

New information from old transactions!?
Example: a retail website tells visitors "Customers who bought book A also bought books X and Y." This is Market Basket Analysis.
Questions: How does it work? What are the problems?
A possible solution for Market Basket Analysis: data mining.

IBM SPSS Modeler
A data mining workbench for discovering knowledge in databases.
Market Basket Analysis example: scan all past transactions, find associations between purchased items, and propose them to new customers.
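
To make the Market Basket Analysis idea concrete, here is a minimal Python sketch that scans past transactions and derives rules of the form "customers who bought A also bought X". It only looks at item pairs and uses the standard support and confidence measures. It is not the association algorithm that SPSS Modeler uses, and the function name `pair_rules` and both thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) for item pairs.

    support    = fraction of all baskets containing both items
    confidence = fraction of baskets with lhs that also contain rhs
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):    # a pair yields a rule in each direction
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

With the baskets `[["A","X","Y"], ["A","X"], ["A","Y"], ["B","X"]]`, A and X appear together in half of all baskets, and two of the three baskets containing A also contain X, so the rule A → X has support 0.5 and confidence 2/3.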

IBM DB2 Analytics Accelerator (IDAA)
A data warehouse appliance powered by Netezza technology, connected to an IBM System z196.
It accelerates specific, often analytic, queries.
As an appliance, it is easy to install and operate.
(Figure from the Redbook "Optimizing DB2 Queries with IBM DB2 Analytics Accelerator for z/OS".)

IBM DB2 Analytics Accelerator (IDAA)
Computation with the new approach on IDAA (figure from the Redbook "Optimizing DB2 Queries with IBM DB2 Analytics Accelerator for z/OS"):
OLAP-type access to the data.
The initial data load from DB2 happens only once.
DB2 passes the query to IDAA, where it runs with Massively Parallel Processing (MPP) on the snippet blades.
Data mining runs on IDAA, leaving less work for DB2.

IBM DB2 Analytics Accelerator (IDAA)
Results with the new IDAA approach:
Finding associations means iterating over the whole database, a workload for which the Netezza-based MPP architecture is well suited.
Because IDAA is integrated with DB2, using it is transparent to the customer.
The multi-terabyte transaction table is no longer moved; only the small result table (red in the figure) goes back to DB2.
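
Why only a small table has to move can be illustrated with a sketch of the MPP pattern: each worker (standing in for a snippet blade) counts item pairs on its own slice of the transaction table, and only the compact partial counts are merged. Threads are used purely to show the data flow; they do not model Netezza's actual execution engine, and `parallel_pair_counts` is a hypothetical name.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def count_pairs(partition):
    """Local work on one worker: count co-occurring item pairs in its slice."""
    counts = Counter()
    for basket in partition:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts

def parallel_pair_counts(transactions, workers=4):
    """Distribute rows round-robin, count per slice, then merge partial results."""
    partitions = [transactions[i::workers] for i in range(workers)]
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_pairs, partitions):
            merged.update(partial)  # only the small aggregates are combined
    return merged
```

Because adding counters is associative, the merged result equals a single pass over the whole table, so the raw transactions never need to leave the workers.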

Task Overview
Subjects I worked on:
Describe the model build in IBM SPSS Modeler and a possible new approach with IDAA.
Find real scenarios and map them to both approaches.
Carry out preparation tasks for a performance test.
Propose an optimization of the model build.

Fraud Prediction for Banking Scenario
Real-world business scenario: prediction of possible credit card transaction fraud.
Example indicators: large transactions at unusual times, multiple purchases from different vendors within a short time, and origin in a high-risk country.
1 Model training: check past transactions for fraudulent patterns.
2 Scoring: apply the model to new transactions and block or approve them.
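
The scoring step can be sketched in Python by turning the three example indicators into hand-written rules. This is an assumption for illustration only: in the scenario the patterns are learned from historical transactions during model training, whereas here every threshold, the country codes and the `Txn` record layout are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# All thresholds and country codes below are hypothetical placeholders.
LARGE_AMOUNT = 5_000.0
UNUSUAL_HOURS = range(0, 5)           # 00:00-04:59 counts as an unusual time
VENDOR_WINDOW = timedelta(minutes=10)
MAX_VENDORS_IN_WINDOW = 2             # a third distinct vendor triggers the flag
HIGH_RISK_COUNTRIES = {"XX", "YY"}

@dataclass
class Txn:
    card: str
    vendor: str
    amount: float
    country: str
    time: datetime

def fraud_flags(txn, history):
    """Return the indicators that txn triggers, given earlier transactions."""
    flags = []
    if txn.amount >= LARGE_AMOUNT and txn.time.hour in UNUSUAL_HOURS:
        flags.append("large_at_unusual_time")
    vendors = {txn.vendor}
    for past in history:
        # same card, and at most VENDOR_WINDOW before the new transaction
        if past.card == txn.card and timedelta(0) <= txn.time - past.time <= VENDOR_WINDOW:
            vendors.add(past.vendor)
    if len(vendors) > MAX_VENDORS_IN_WINDOW:
        flags.append("many_vendors_short_time")
    if txn.country in HIGH_RISK_COUNTRIES:
        flags.append("high_risk_country")
    return flags

def score(txn, history):
    """Scoring step: block if any indicator fires, otherwise approve."""
    return "block" if fraud_flags(txn, history) else "approve"
```

Returning the list of triggered indicators, not just a decision, keeps the reason for a block visible, which is useful when comparing this kind of rule set with a trained model.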

Fraud Prediction for Banking Scenario
Example: an algorithm mapped to IDAA.
Algorithm: RFM analysis (recency, frequency, monetary value).
In IBM SPSS Modeler, a single node calculates the values.
There is no equivalent algorithm on the IDAA side.
Mapping RFM analysis to IDAA takes about one page of SQL code.
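
The raw RFM values are simple per-customer aggregates, which is why they map well to SQL (a GROUP BY with MAX, COUNT and SUM). The following Python sketch computes the same aggregates; it is not the author's SQL, and it assumes transactions arrive as (customer, date, amount) tuples and that recency is counted in days before a chosen reference date.

```python
from datetime import date

def rfm(transactions, as_of):
    """Compute (recency_days, frequency, monetary) per customer.

    transactions: iterable of (customer_id, purchase_date, amount)
    as_of: reference date from which recency is measured
    """
    last_purchase, frequency, monetary = {}, {}, {}
    for customer, day, amount in transactions:
        # equivalent of MAX(date), COUNT(*), SUM(amount) grouped by customer
        last_purchase[customer] = max(last_purchase.get(customer, day), day)
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    return {
        c: ((as_of - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }
```

A customer who bought on 2012-08-01 and 2012-08-20 for 10.0 and 5.0 gets (11, 2, 15.0) relative to 2012-08-31.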

Results
The model build was accelerated using Netezza technology.
The business scenarios were mapped to the new architecture.
Performance measurement is in progress.
Related presentation at IOD: IBM Information On Demand 2012, October 21-25, session IDW-1626A "zOLAP - Accelerate SPSS Modeling and Data Mining Using IDAA on z", speakers: Oliver Benke, Oliver Draese, Roland Seiffert.

Thank you for listening! Any questions?
IBM, the IBM logo, ibm.com and DB2 Analytics Accelerator are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

Backup

Backup: Preparation for performance test (1)
Data preparation:
Extract the data from the given complex schema; only some of the tables are needed for model creation.
Adapt the data to the needs of DB2 and IDAA: create the tables, change data types and date formats.
Enlarge the data basis from less than 100 MB to the GB to TB range, using a Java tool that takes care of the primary key indices.
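
When rows are copied to enlarge the data basis, the primary keys must stay unique. The original tool was written in Java; this Python sketch shows one possible approach, assuming non-negative integer keys in a column named `id` (a hypothetical name): the keys of each copy are shifted by a multiple of (max key + 1), so no copy can collide with another.

```python
def enlarge(rows, factor, key="id"):
    """Replicate rows `factor` times with unique primary keys.

    Copy k gets its keys shifted by k * (max_key + 1), so the key ranges
    of the original and of every copy are disjoint. Assumes non-negative
    integer keys; all other columns are copied unchanged.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    stride = max(r[key] for r in rows) + 1
    out = []
    for k in range(factor):
        for r in rows:
            copy = dict(r)               # leave the input rows untouched
            copy[key] = r[key] + k * stride
            out.append(copy)
    return out
```

Foreign keys in related tables would need the same shift per copy so that joins between the enlarged tables still match.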

Backup: Preparation for performance test (2)
Load the data into DB2 and also into IDAA:
The DB2 LOAD utility is run from a JCL script on the host.
The tables are then accelerated (copied) to IDAA with IDAA Studio.

    //LOAD EXEC PGM=DSNUTILB,PARM=DBNI...
    LOAD DATA INDDN INPUTD REPLACE LOG NO ENFORCE NO
      FORMAT DELIMITED
      INTO TABLE NENTWIG.TABLE_NAME
      ( PARAM TYPE, ... )
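
The LOAD statement reads delimited input (FORMAT DELIMITED). A minimal Python sketch of producing such a file, assuming comma as column delimiter and double quote as string delimiter (to our knowledge DB2's defaults, with embedded quotes doubled), plus a date conversion to ISO YYYY-MM-DD from a hypothetical dd.mm.yyyy source format. The function names are illustrative, and the exact delimiter options should be checked against the LOAD documentation.

```python
import csv
import io
from datetime import datetime

def to_iso_date(text, source_format="%d.%m.%Y"):
    """Convert a source date string to ISO YYYY-MM-DD.

    The dd.mm.yyyy default is a hypothetical example of a source format."""
    return datetime.strptime(text, source_format).date().isoformat()

def write_delimited(rows, out):
    """Write rows as comma-delimited lines. Fields containing a comma,
    double quote or newline are enclosed in double quotes, and embedded
    double quotes are doubled."""
    writer = csv.writer(out, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)

def to_delimited_text(rows):
    """Convenience wrapper returning the delimited text as a string."""
    buf = io.StringIO()
    write_delimited(rows, buf)
    return buf.getvalue()
```

For example, the row `[1, "Smith, J.", to_iso_date("31.08.2012"), 12.5]` becomes the line `1,"Smith, J.",2012-08-31,12.5`.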

Backup: Preparation for performance test (3)

Backup: Preparation for performance test (4)
Implement the applied algorithms on Netezza.
IBM SPSS In-Database Analytics provides much predefined functionality, such as discretization and normalization, decision trees and association rules, and several clustering algorithms.
Exploit these functions and adapt them to behave as in IBM SPSS Modeler.
Example (discretization of the RFM values into five equal-frequency bins):

    CALL nza..EFDISC('intable=source, outtable=rfm_bounds, incolumn=recency_days;frequency;monetary, bins=5');
    CALL nza..APPLY_DISC('intable=source, outtable=rfm, btable=rfm_bounds, replace=false');
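
Equal-frequency discretization picks bin boundaries so that each bin holds roughly the same number of rows. The following Python sketch mimics the two-step pattern of the calls above (first compute a boundary table, then apply it), assuming five bins and adding a hypothetical `<column>_bin` field while keeping the original value. Its boundary conventions are an assumption and may differ from those of the Netezza procedures.

```python
from bisect import bisect_right

def ef_bounds(values, bins):
    """Equal-frequency cut points: bins-1 values that split the sorted
    data into bins of (nearly) equal size."""
    ordered = sorted(values)
    if not ordered:
        return []
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]

def bounds_table(rows, columns, bins=5):
    """Step 1: one list of cut points per column (a 'boundary table')."""
    return {c: ef_bounds([r[c] for r in rows], bins) for c in columns}

def apply_bounds(rows, bounds):
    """Step 2: add a 1-based '<column>_bin' field, keeping the original value.

    A value equal to a cut point falls into the upper bin."""
    out = []
    for r in rows:
        binned = dict(r)
        for c, cuts in bounds.items():
            binned[c + "_bin"] = bisect_right(cuts, r[c]) + 1
        out.append(binned)
    return out
```

With the values 1 to 10 and five bins, the cut points are 3, 5, 7 and 9, so each bin receives exactly two values.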