Exploitation of Predictive Analytics on System z




Nordic GSE 2013, S506: Exploitation of Predictive Analytics on System z, an End-to-End Walk Through. Wang Enzhong (wangec@cn.ibm.com), Technical and Technology Enablement, System z Brand, IBM System and Technology Group. Wei Kewei (weikewei@cn.ibm.com), DB2 for z/OS Optimizer Development and Tuning, IBM

Agenda: Overview of business analytics on System z; Cross-sell end-to-end solution; Real-time anti-fraud detection for credit cards (case study and best practices); Q&A. 2

Business Analytics Landscape. Analytics capabilities span a spectrum of increasing complexity and competitive advantage, from descriptive through predictive to prescriptive:
- Standard reporting (descriptive): What happened?
- Ad hoc reporting: How many, how often, where?
- Query/drill down: What exactly is the problem?
- Alerts: What actions are needed?
- Forecasting: What if these trends continue?
- Simulation: What will happen if?
- Predictive modeling: What will happen next?
- Optimization (prescriptive): How can we achieve the best outcome?
- Stochastic optimization: How can we achieve the best outcome including the effects of variability?
Compute- and data-intensive parallel algorithms are increasingly prevalent in commercial workloads, driven by real-time decision-making requirements and industry-wide limits on increasing thread speed. Based on: Competing on Analytics, Davenport and Harris, 2007. 3

Critical Components in Contemporary BA Solutions. Operational source systems (structured/unstructured data) and the transactional system feed, via ELT and batch copy, a data warehouse and data marts (data intensive) fronted by a DW/BA application server; a BA modeler engine (numerically intensive) trains models, and a BA scoring engine computes scores against the transactional system. 6/12/2013 4

DB2 Analytics Accelerator V3.1: Capitalizing on the best of both worlds, System z and Netezza. What is it? The IBM DB2 Analytics Accelerator is a workload-optimized appliance add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries, with unprecedented response times. How is it different? Performance: unprecedented response times enable "train of thought" analyses frequently blocked by poor query performance. Integration: deep integration with DB2 provides transparency to all applications. Self-managed workloads: queries are executed in the most efficient location. Transparency: applications connected to DB2 are entirely unaware of the Accelerator. Simplified administration: hands-free appliance operations eliminate most database tuning tasks. Breakthrough technology enabling new opportunities. 5

DB2 Analytics Accelerator V3.1: Lowering the costs of trusted analytics. What's new? High Performance Storage Saver: store a DB2 table or partition solely on the Accelerator, removing the requirement for the data to be replicated on both DB2 and the Accelerator. Incremental Update: enables tables within the Accelerator to be continually updated throughout the day. zEnterprise EC12 support: Version 3 supports the zEnterprise EC12, z196 and z114 System z platforms. Query Prioritization: brings System z workload management down to the individual query routed to the Accelerator. High Capacity: support has been extended to the entire Netezza 1000 line (1.28 PB). UNLOAD Lite: reduces z/OS MIPS consumption by moving load preparation off System z. 6

Deep DB2 Integration within zEnterprise. Applications (standard SQL dialects) and DBA tools, the z/OS console and operational interfaces (e.g. DB2 commands) all connect to DB2 for z/OS (data manager, buffer manager, IRLM, log manager), with the IBM DB2 Analytics Accelerator attached behind it. The combination pairs the superior availability, reliability, security and workload management of z/OS on System z with superior performance on analytic queries. 7

Bringing Netezza AMPP Architecture to DB2 (AMPP = Asymmetric Massively Parallel Processing). DB2 for z/OS routes accelerated work to the Accelerator's SMP host, which dispatches it over a network fabric across S-Blades, each pairing CPUs, memory and FPGAs, backed by the disk enclosures. The same accelerator serves advanced analytics, BI and legacy reporting workloads. 8

Large Insurance Company Business Reporting: "we had this up and running in days with queries that ran over 1000 times faster."

Query     Rows Reviewed  Rows Returned  DB2 Only          With Accelerator  Times Faster
Query 1   2,813,571      853,320        2:39 (9,540 s)    5 s               1,908
Query 2   2,813,571      585,780        2:16 (8,220 s)    5 s               1,644
Query 3   8,260,214      274            1:16 (4,560 s)    6 s               760
Query 4   2,813,571      601,197        1:08 (4,080 s)    5 s               816
Query 5   3,422,765      508            0:57 (4,080 s)    70 s              58
Query 6   4,290,648      165            0:53 (3,180 s)    6 s               530
Query 7   361,521        58,236         0:51 (3,120 s)    4 s               780
Query 8   3,425.29       724            0:44 (2,640 s)    2 s               1,320
Query 9   4,130,107      137            0:42 (2,520 s)    193 s             13

DB2 Analytics Accelerator (Netezza 1000-12), production ready with 1 person in 2 days: add the Accelerator to DB2, choose a table for acceleration, load the table (DB2 loads data to the Accelerator), knowledge transfer, query comparisons. Table acceleration set up in 2 hours. Initial load performance: 400 GB (570 million rows) loaded in 29 minutes, i.e. 800 GB to 1.3 TB per hour. Extreme query acceleration: 1,908x faster (2 hours 39 minutes down to 5 seconds). CPU utilization reduced to 35%. 9

Critical Components in Contemporary BA Solutions (revisited). Operational source systems (structured/unstructured data) and the transactional system feed, via ELT and batch copy, a data warehouse and data marts (data intensive) fronted by a DW/BA application server; a BA modeler engine (numerically intensive) trains models, and a BA scoring engine computes scores against the transactional system. 10

IBM SPSS Technology. IBM SPSS technology drives the widespread use of data in decision making through statistics-based analysis of data and the deployment of predictive analytics into the decision-making process. What is predictive analytics? Predictive analytics is a business intelligence technology that predicts what is likely to happen in the future by analyzing patterns in past data. Predictions are delivered in the form of scores, and assigning these scores is the job of a predictive model that has been trained on your historical data. With predictive analytics, the enterprise learns from its cumulative experience (data) and takes action to apply what has been learned. 11
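The "train on history, then score new cases" loop described above can be sketched in a few lines. This is a deliberately minimal illustration (a per-segment historical rate, not an SPSS Modeler algorithm); the segment names and data are hypothetical.

```python
# Minimal sketch of predictive scoring: learn from cumulative historical
# experience, then emit a score for new cases. Illustrative only; SPSS
# Modeler builds far richer models. Segments/outcomes are made-up data.
from collections import defaultdict

def train(history):
    """Learn P(outcome = 1) per customer segment from historical records."""
    counts = defaultdict(lambda: [0, 0])          # segment -> [positives, total]
    for segment, outcome in history:
        counts[segment][0] += outcome
        counts[segment][1] += 1
    return {seg: pos / total for seg, (pos, total) in counts.items()}

def score(model, segment, default=0.0):
    """The predictive 'score': the learned likelihood for this segment."""
    return model.get(segment, default)

history = [("gold", 1), ("gold", 1), ("gold", 0), ("basic", 0), ("basic", 1)]
model = train(history)
print(score(model, "gold"))   # 2 of 3 historical gold customers converted
```

The same pattern scales up: the model is trained offline on the historical store, while `score` runs wherever the transaction happens.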

Business Analytics: Real-Time Predictive Analytics. Support for both in-transaction and in-database scoring on the same platform, as an end-to-end solution. Customer interaction data comes into the business system/OLTP (DB2 for z/OS); ETL copies it to the historical store (DB2 for z/OS) on a real-time, minute, hourly, weekly or monthly schedule; SPSS Modeler for Linux on System z trains the scoring algorithm; and the real-time score/decision goes back out to the application with the latest data. Benefits: reduced networking, meets and exceeds SLAs, automated model updates, consolidated resources. 12

The Three Pillars of Predictive Analytics.
Predictive Customer Analytics (Acquire, Grow, Retain). Acquire customers: understand who your best customers are, connect with them in the right ways, and take the best action to maximize what you sell to them. Grow customers: understand the best mix of things needed by your customers and channels, maximize the revenue received from them, and take the best action every time you interact. Retain customers: understand what makes your customers leave and what makes them stay, keep your best customers happy, and take action to prevent them from leaving.
Predictive Operational Analytics (Manage, Maintain, Maximize). Manage operations: maximize the usage of your assets, make sure inventory and resources are in the right place at the right time, and identify the impact of investment. Maintain infrastructure: understand what causes failure in your assets, maximize asset uptime, and reduce the costs of upkeep. Maximize capital efficiency: improve the efficiency and effectiveness of your assets, reduce operational costs, and drive operational excellence in all phases: procurement, development, availability and distribution.
Predictive Threat & Fraud Analytics (Monitor, Detect, Control). Monitor environments: identify leaks, increase compliance, and leverage insights in critical business functions. Detect suspicious activity: identify fraudulent patterns, reduce false positives, identify collusive and fraudulent merchants and employees, and identify unanticipated transaction patterns. Control outcomes: take action in real time to prevent abuse, reduce claims-handling time, and alert clients of transaction fraud. 13

Data Warehousing and Business Analytics with zEnterprise and IBM DB2 Analytics Accelerator. On the zEnterprise EC12, z196 and z114, operational source systems (structured/unstructured data) and the transactional system (DB2 for z/OS) feed the data warehouse (DB2 for z/OS) and data marts via ELT (InfoSphere Data Warehouse 9.7.3, Information Server) and batch copy; the DW/BA application server (Cognos 10.1 BI) serves the data-intensive side, while the BA modeler engine (SPSS Modeler 15) and BA scoring engine (DB2 10 Accessory Suite, via DB2 UDFs) handle the numerically intensive compute of scores. Complex analytical queries requiring extensive table scans of large, historical data sets run on the IBM DB2 Analytics Accelerator. Results returned from the analysis can be joined with current or near-real-time data in the data warehouse on System z to deliver immediate recommendations, creating, in effect, a high-performance operational BI service. 14

Online Transactional and Analytics Processing (OLTAP). Operational systems (OLTP) on DB2 for z/OS perform real-time transaction scoring; ELT/ETL adds dimensions and feeds the enterprise data warehouse (EDWH), a DB2 for z/OS data sharing group in a z/OS LPAR, with the IDAA (using Netezza technology) holding static data and running batch scoring. Model construction and model refresh run in SPSS Statistics and Modeler on Linux on System z, alongside InfoSphere Warehouse on System z (SQW and Cubing Services) and Cognos BI and Reporting, also on Linux on System z. The whole stack spans z/OS and Linux on System z. 15

Agenda: Overview of business analytics on System z; Cross-sell end-to-end solution; Real-time anti-fraud detection for credit cards (case study and best practices); Q&A. 16

User Scenario: Real-Time Product Recommendation. "Predictive Analytics helps connect data to effective action by drawing reliable conclusions about current conditions and future events." (Gareth Herschel, Research Director, Gartner Group.) By learning from shopping-history data, recommend in real time what the customer is likely to buy, to increase cross-sell. 17

Data Mining Methodology: CRISP-DM (CRoss-Industry Standard Process for Data Mining).
Business understanding: determining business objectives, assessing the situation, determining data mining goals, and producing a project plan.
Data understanding: this phase addresses the need to understand what your data resources are and the characteristics of those resources. It includes collecting initial data, describing data, exploring data, and verifying data quality.
Data preparation: selecting, cleaning, constructing, integrating, and formatting data.
Modeling: sophisticated analysis methods are used to extract information from the data. This phase involves selecting modeling techniques, generating test designs, and building and assessing models.
Evaluation: evaluate how the data mining results can help you achieve your business objectives. Elements of this phase include evaluating results, reviewing the data mining process, and determining next steps.
Deployment: this phase focuses on integrating your new knowledge into your everyday business processes to solve your original business problem. It includes planning deployment, monitoring and maintenance, producing a final report, and reviewing the project. 18

Predictive Analytics Process: Real-Time Predictive Analytics. (1) Select a model; (2) train the model using historical data from the data warehouse; (3) deploy the model into OLTP for scoring of input data. Sample of the historical line-item data used for model generation:

orderkey   partkey  suppkey  linenumber  quantity
15711815   27       12636    1           26
15711815   40       16799    2           47
15711815   39       17390    3           41
15711815   45       7496     4           19
15711815   32       21483    5           37
15711815   11       18212    6           48
15711815   28       22274    7           13
19

Cross-Sell Solution Overview (per incoming transaction). Business understanding: recommend related products based on historical data to grow profit. Data understanding: distorted TPC-H data, representing the e-commerce industry, to simulate a real situation. Data preparation: transform order details into vectors grouped by order id. Modeling: the Apriori association algorithm discovers which products are usually purchased together. Evaluation: use another dataset to evaluate whether the discovered patterns are correct. Deployment: deploy the model into the OLTP environment as a UDF in DB2 for z/OS and as a web service on Linux on System z. 20

Data Understanding: the (distorted) TPC-H schema.
PART (PARTKEY, NAME, MFGR, BRAND, TYPE, SIZE, CONTAINER, RETAILPRICE, COMMENT): product information.
SUPPLIER (SUPPKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, COMMENT): supplier information.
PARTSUPP (PARTKEY, SUPPKEY, AVAILQTY, SUPPLYCOST, COMMENT): information on a product at a certain supplier.
CUSTOMER (CUSTKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, MKTSEGMENT, COMMENT): customer information.
ORDERS (ORDERKEY, CUSTKEY, ORDERSTATUS, TOTALPRICE, ORDERDATE, ORDERPRIORITY, CLERK, SHIPPRIORITY, COMMENT): order information; one customer places many orders.
LINEITEM (ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT): the products purchased in one order. 21

Data Preparation. Source: shopping history data (order line items).

orderkey   partkey  suppkey  linenumber  quantity
15711815   27       12636    1           26
15711815   40       16799    2           47
15711815   39       17390    3           41
15711815   45       7496     4           19
15711815   32       21483    5           37
15711815   11       18212    6           48
15711815   28       22274    7           13

Output: one vector per order over the partkey dimension, with 1 at each purchased partkey and 0 elsewhere. For order 15711815, over partkeys 0, 11, 27, 28, 32, 39, 40, 45, ..., 49, the vector is 0 1 1 1 1 1 1 1 ... 0. 22
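The pivot from order line items to per-order basket vectors can be sketched as follows. The partkeys and order are the sample data from the slide; in the real solution this transform runs over the full LINEITEM table.

```python
# Sketch of the data-preparation step: pivot order line items into one
# basket vector per order (1 = part purchased, 0 = not). Sample data
# follows the slide; the helper name is ours, not from the solution.

def to_basket_vectors(lineitems, all_partkeys):
    """lineitems: iterable of (orderkey, partkey) pairs."""
    baskets = {}
    for orderkey, partkey in lineitems:
        baskets.setdefault(orderkey, set()).add(partkey)
    cols = sorted(all_partkeys)                       # fixed column order
    vectors = {ok: [1 if p in parts else 0 for p in cols]
               for ok, parts in baskets.items()}
    return vectors, cols

lines = [(15711815, 27), (15711815, 40), (15711815, 39), (15711815, 45),
         (15711815, 32), (15711815, 11), (15711815, 28)]
vectors, cols = to_basket_vectors(lines, {0, 11, 27, 28, 32, 39, 40, 45, 49})
print(cols)               # [0, 11, 27, 28, 32, 39, 40, 45, 49]
print(vectors[15711815])  # [0, 1, 1, 1, 1, 1, 1, 1, 0]
```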

Modeling. Model input: the basket vectors from data preparation (e.g. order 15711815 over partkeys 0, 11, 27, 28, 32, 39, 40, 45, ..., 49 gives 0 1 1 1 1 1 1 1 ... 0). Model setup: given the antecedent items, the top 3 consequents with the highest confidence (%) are returned as the products the customer is most likely to buy. 23

Deployment. The model trained in SPSS Modeler can be deployed two ways: in-database, through the SPSS in-database adapter inside DB2 for z/OS; or ex-database, through the Collaboration and Deployment Services repository and scoring services running on application servers. 24
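The two deployment paths differ mainly in where the score is computed. The sketch below contrasts them; the UDF name `SPSS_SCORE` and the REST path are hypothetical placeholders, since the actual names come from the SPSS scoring adapter and the C&DS installation.

```python
# Sketch of the two scoring paths. SPSS_SCORE and the /scoring/score URL
# are illustrative placeholders, not the real adapter/service names.

def in_database_scoring_sql(table, feature_cols, udf="SPSS_SCORE"):
    """In-database: the OLTP application calls the scoring UDF inside
    DB2 for z/OS, so feature data never leaves the database."""
    cols = ", ".join(feature_cols)
    return f"SELECT ORDERKEY, {udf}({cols}) AS SCORE FROM {table}"

def ex_database_scoring_request(host, features):
    """Ex-database: the application sends features over the network to
    the C&DS scoring service and receives scores in the response."""
    return {"url": f"https://{host}/scoring/score", "json": {"inputs": features}}

print(in_database_scoring_sql("ORDERS", ["PARTVEC"]))
```

The in-database path avoids a network round trip per transaction, which is why it dominates the performance comparison later in the deck.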

Scoring: in-database scoring versus ex-database scoring. 25

Scoring Application. The top 3 products with the highest scores, i.e. the products the customer is most likely to buy, are recommended. 26

Agenda: Overview of business analytics on System z; Cross-sell end-to-end solution; Real-time anti-fraud detection for credit cards (case study and best practices); Q&A. 27

User Scenario: Real-Time Credit Card Fraud Detection. The customer makes a credit card payment, generating an authorization-request message. The authorization system validates and scores the request, and passes it to the fraud detection engine, which scores it and opens a case when needed. The case management system assigns tasks to the case manager, and the anti-fraud investigator contacts the customer and investigates the case. 28

Real-Time Anti-Fraud Detection Overview (per incoming transaction). Business understanding: identify fraudulent credit card transactions based on historical data to mitigate risk. Data understanding: N years of historical data from a real banking customer. Data preparation: transform transaction details into vectors grouped by account id. Modeling: a neural network algorithm assigns a fraud score to each credit card transaction. Evaluation: use N months of historical data to evaluate whether the patterns are correct. Deployment: deploy the model into the OLTP environment as a UDF in DB2 for z/OS. 29
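The scoring step, turning a feature vector into a fraud score in [0, 1], can be sketched as a tiny neural-network forward pass. The weights and feature names below are made-up placeholders; in the case study the network is trained in SPSS Modeler on the bank's historical transactions.

```python
# Minimal sketch of neural-network fraud scoring: one hidden layer with
# sigmoid activations, sigmoid output in [0, 1]. Weights are illustrative
# placeholders, not a trained model.
import math

def fraud_score(features, w_hidden, w_out):
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    hidden = [sig(sum(w * f for w, f in zip(ws, features))) for ws in w_hidden]
    return sig(sum(w * h for w, h in zip(w_out, hidden)))

# e.g. features = [amount_ratio_3h_over_7d, high_risk_region_txn_count]
score = fraud_score([0.9, 3.0], w_hidden=[[1.2, 0.8], [-0.5, 1.5]], w_out=[2.0, 1.0])
print(0.0 <= score <= 1.0)   # always True: sigmoid output is a probability-like score
```

A threshold on this score (set during evaluation) decides whether the transaction is flagged and a case opened.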

Data Understanding.
Transaction message (format of the incoming transaction): plastic number, transaction number, transaction amount, transaction currency.
High Risk Region Table: numbers of regions marked as high risk. High Risk Business Table: numbers of businesses marked as high risk.
TXN History Table: unique transaction ID, issuer branch zone, account number, card number, card flag, card type, transaction amount, transaction currency, post amount, post currency, transaction region number, transaction type (CR/DR), available balance, control balance.
Customer Table: customer ID, name, gender, and other customer information.
Account Table: account number, account type, expiration date, credit limit, balance, and other account information.
Feature Table (new): account number, timestamp of last transaction, accumulated attributes (36 attributes) and derived attributes (24 attributes). The timestamp of the last transaction on the card is used to calculate the time difference from the current transaction, to omit transactions that happened over 7 days apart. The accumulated/derived attributes are the variables fed into the model for scoring; more detail on the next slides. 30

Feature Table: account number, timestamp of last transaction, accumulation attributes, and derived attributes (ratios or differences between accumulation attributes, e.g. 7D-1D, 3H/7D), kept over 3-hour, 1-day and 7-day windows. Derived attributes are calculated in real time from the accumulated attributes. The accumulations themselves are calculated in batch as a base and increased by the current transaction in real time: for example, for the 7-day window, the accumulation over the last 7 days is built in batch and incremented by the current transaction in the online process. Per window, the table accumulates the amount and number of: transactions in high-risk regions, transactions in the early morning, transactions at high-risk merchants, failed transactions, high-amount transactions, and all transactions. 31
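The batch-base-plus-real-time-increment pattern described above can be sketched as follows. Field names are illustrative, not the actual 36/24 attributes from the case study.

```python
# Sketch of the feature-table update: windowed accumulations are computed
# in batch as a base, incremented by the current transaction in real time,
# and derived attributes (differences/ratios) are computed on the fly.
# Field names are illustrative placeholders.

def update_features(profile, txn_amount):
    """profile holds batch-computed accumulations; fold in the live txn."""
    p = dict(profile)                              # leave the batch base intact
    for window in ("amt_3h", "amt_1d", "amt_7d"):
        p[window] += txn_amount                    # real-time increment
    # derived attributes, e.g. 7D-1D and 3H/7D from the slide
    p["amt_7d_minus_1d"] = p["amt_7d"] - p["amt_1d"]
    p["ratio_3h_over_7d"] = p["amt_3h"] / p["amt_7d"] if p["amt_7d"] else 0.0
    return p

base = {"amt_3h": 50.0, "amt_1d": 200.0, "amt_7d": 950.0}   # from the batch job
features = update_features(base, 50.0)
print(features["ratio_3h_over_7d"])   # -> 0.1
```

The resulting feature vector is what gets passed to the scoring UDF in the online transaction path.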

End-to-End Flow. Online transaction path (TR logic): (1) get the profile, (2) calculate features in real time (for the 3-hour and 1-day windows) on top of the batch base, (3) update the profile, (4) prepare the SPSS adapter parameters and call the SPSS adapter, (5) apply business rules, (6) update the history and sample DB. Batch path: the batch feature-calculation task builds the base for the 7-day statistics from the profile table and history data. The data flow uses SQL SELECT and SQL UPDATE, entirely on the mainframe. 32

Real-Time Scoring Performance Comparison: remote scoring versus in-database (UDF) scoring. DB2 on z/OS: z196 LPAR with 2 CPs; SPSS on Linux on System z: z196 LPAR with 2 IFLs.

Configuration                   Trx/sec   Response time (ms)
Remote DB, remote score         414       20.2
Remote DB, in-db score          1,755     11.6
Local access, remote score      578       6.9
Local access, in-db score       4,050     1

Measurements were optimized for maximum throughput on a fully utilized system; response times include the full transaction with multiple DB accesses. 33

Pure SQL vs. Scoring Adapter (UDFs) for Model Scoring.
Pure SQL: difficult to support some model scoring algorithms; requires a SQL mapping to be constructed for each model type; the resulting SQL will run on many database systems; no database extensions required; performance/reliability harder to predict; harder to generate SQL to score ensemble models.
Scoring Adapter (UDFs): easily supports a large class of scoring algorithms; reuses the existing scoring component to score each model type; needs to be adapted for each database system requiring support; requires database extensions to be installed; performance/reliability easier to predict; easier to score ensemble models. 34
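The contrast can be made concrete with one trivial model (a single decision-tree split): pure SQL translates the model itself into a SQL expression, so the mapping must be regenerated per model type, while the scoring-adapter call keeps the same shape for any model. The generated SQL and the `SCORE_MODEL` name below are illustrative, not the actual SPSS-generated forms.

```python
# Sketch contrasting the two scoring routes for a one-split decision tree.
# tree_to_sql embeds the model in SQL (per-model-type mapping); udf_call
# is one generic invocation shape. Names are illustrative placeholders.

def tree_to_sql(col, threshold, lo_score, hi_score):
    """Pure SQL: the model logic itself becomes a CASE expression."""
    return (f"CASE WHEN {col} <= {threshold} "
            f"THEN {lo_score} ELSE {hi_score} END")

def udf_call(model_name, cols):
    """Scoring adapter: a generic UDF invocation for any model type."""
    return f"SCORE_MODEL('{model_name}', {', '.join(cols)})"

print(tree_to_sql("TXN_AMOUNT", 1000, 0.1, 0.8))
print(udf_call("fraud_nn", ["TXN_AMOUNT", "REGION_RISK"]))
```

For an ensemble of hundreds of trees, the CASE-expression route becomes unwieldy, which is exactly the "harder to score ensemble models" point above.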

Questions? 35

Controlling outcomes with predictive analytics. Demographic data, transaction data and external data feed the analyses: segments, profiles, scoring models, anomaly detection, reports, KPIs and KPPs. Domain expertise defines the indicator list and assigns a weight (points) to each indicator; scoring defines thresholds and determines the level of risk. Capture, predict, act. 37