SPSS Modeler Integration with IBM DB2 Analytics Accelerator
Markus Nentwig
August 31, 2012
Agenda
1 Motivation
2 Basics
   IBM SPSS Modeler
   IBM DB2 Analytics Accelerator (IDAA)
3 My Work
   Task Overview
   Fraud Prediction for Banking Scenario
4 Results
New information out of old transactions!?
- Example: retail business website: "Customers who bought book A also bought books X and Y" -> Market Basket Analysis
- Questions: How does it work? What are the problems?
- Possible solution to Market Basket Analysis: Data Mining
IBM SPSS Modeler
- Tool for Data Mining: IBM SPSS Modeler
- Data Mining workbench to discover knowledge in databases
- Market Basket Analysis example: scan all transactions made in the past, find associations, and propose them to new customers
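The "scan past transactions, find associations, propose them" idea can be illustrated with a minimal sketch. It only counts how often two items appear in the same transaction; it is not the algorithm used by IBM SPSS Modeler, and the function names, the `min_count` threshold, and the sample data are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Count how often each unordered item pair occurs in the same transaction."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) drops duplicates and gives every pair a canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def also_bought(transactions, item, min_count=2):
    """Return items bought together with `item` at least `min_count` times."""
    suggestions = Counter()
    for (a, b), n in pair_support(transactions).items():
        if n < min_count:
            continue
        if a == item:
            suggestions[b] = n
        elif b == item:
            suggestions[a] = n
    # most frequent partner first
    return [other for other, _ in suggestions.most_common()]


# Hypothetical past transactions (made-up data)
past = [
    ["book A", "book X", "book Y"],
    ["book A", "book X"],
    ["book A", "book Y"],
    ["book B", "book X"],
]
print(also_bought(past, "book A"))
```

Counting pairs requires one pass over every transaction, which is why the size of the transaction table matters for the approach discussed on the following slides.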
IBM DB2 Analytics Accelerator (IDAA)
- Data Warehouse appliance powered by Netezza technology
- System z196 connected to IDAA
- Accelerates specific (often analytic) queries
- Appliance makes it easy to install and operate
- Figure from Redbook: Optimizing DB2 Queries with IBM DB2 Analytics Accelerator for z/OS
IBM DB2 Analytics Accelerator (IDAA): Computation with new approach on IDAA
- Figure from Redbook: Optimizing DB2 Queries with IBM DB2 Analytics Accelerator for z/OS
- OLAP-type access to data
- Initial data load once from DB2
- Pass query to IDAA
- Massively Parallel Processing (MPP) on Snippet Blades
- Data Mining on IDAA, less work on DB2
IBM DB2 Analytics Accelerator (IDAA): Results with new IDAA approach
- Iterate over the whole database to find associations: the Netezza-based MPP architecture is well suited for this
- Use of IDAA ensures integration with DB2, transparent for the customer
- The multi-terabyte TransactionTable no longer has to be moved
- Only the small resulting table (red in the figure) goes back to DB2
Task Overview: Subjects I worked on
- Describe the model build in IBM SPSS Modeler and the possible new approach with IDAA
- Find real scenarios and map them to both approaches
- Preparation tasks for the performance test
- Proposal for model build optimization
Fraud Prediction for Banking Scenario: Real-world business scenario
- Prediction of possible credit card transaction fraud
- Examples:
   - Big transactions at abnormal times
   - Multiple purchases from different vendors in a short time
   - Origin in a high-risk country
1 Model Training: Check old transactions for fraudulent patterns
2 Scoring: Apply the model to new transactions, then block or approve
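The three example patterns and the block-or-approve scoring step can be sketched with hand-written rules. This is only an illustration: in the scenario above the patterns are learned from old transactions during model training, whereas here they are hard-coded. All thresholds, the country code, the field names, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# All thresholds below are hypothetical values chosen for illustration.
LARGE_AMOUNT = 1000.0
NIGHT_HOURS = range(0, 5)             # 00:00-04:59 treated as "abnormal time"
HIGH_RISK_COUNTRIES = {"XX"}          # placeholder country code
BURST_WINDOW = timedelta(minutes=10)  # what counts as a "short time"
BURST_VENDORS = 3                     # distinct vendors within the window


def fraud_flags(tx, history):
    """Return the suspicious patterns matched by transaction `tx`.

    `tx` and every entry of `history` are dicts with the keys
    time, amount, vendor, country; `history` holds earlier
    transactions of the same card.
    """
    flags = []
    if tx["amount"] >= LARGE_AMOUNT and tx["time"].hour in NIGHT_HOURS:
        flags.append("large_amount_at_abnormal_time")
    # distinct vendors used within the window that ends at this transaction
    recent_vendors = {
        h["vendor"]
        for h in history
        if timedelta(0) <= tx["time"] - h["time"] <= BURST_WINDOW
    }
    recent_vendors.add(tx["vendor"])
    if len(recent_vendors) >= BURST_VENDORS:
        flags.append("many_vendors_in_short_time")
    if tx["country"] in HIGH_RISK_COUNTRIES:
        flags.append("high_risk_country")
    return flags


def score(tx, history):
    """Scoring step: block the transaction if any pattern matches."""
    return "block" if fraud_flags(tx, history) else "approve"
```

A trained model would replace the fixed thresholds with patterns derived from labeled historical transactions; the scoring interface (new transaction in, block or approve out) stays the same.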
Fraud Prediction for Banking Scenario: Example of an algorithm mapped to IDAA
- RFM-Analysis algorithm in IBM SPSS Modeler: one node calculates the values
- No equivalent algorithm on the IDAA side
- Map RFM-Analysis to IDAA: one page of SQL code
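For readers unfamiliar with RFM analysis: it derives three values per customer, namely recency (time since the last transaction), frequency (number of transactions), and monetary (total amount). The sketch below shows that calculation in plain Python as an aid to understanding; it is not the SQL used on IDAA, and the function name, the input layout, and the reference date are assumptions.

```python
from collections import defaultdict
from datetime import date


def rfm_values(transactions, reference_date):
    """Compute recency, frequency and monetary value per customer.

    `transactions` is an iterable of (customer_id, day, amount) tuples,
    where `day` is a datetime.date. Recency is measured in days
    relative to `reference_date`.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # remember the most recent transaction date per customer
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency_days": (reference_date - last_seen[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_seen
    }


# Hypothetical transactions (made-up data)
sample = [
    ("c1", date(2012, 8, 1), 10.0),
    ("c1", date(2012, 8, 21), 5.5),
    ("c2", date(2012, 7, 31), 100.0),
]
print(rfm_values(sample, date(2012, 8, 31)))
```

All three values are plain per-customer aggregates (maximum date, count, sum), which is why the calculation can be expressed in SQL with a grouped query.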
Results
- Model build accelerated using Netezza technology
- Business scenarios mapped to the new architecture
- Performance measurement in progress
- Related presentation at IOD (IBM Software Information On Demand 2012, October 21-25):
   IDW-1626A zolap - Accelerate SPSS Modeling and Data Mining Using IDAA on z
   Speakers: Oliver Benke, Oliver Draese, Roland Seiffert
Thank you!
Thank you for listening. Any questions?
IBM, the IBM logo, ibm.com and DB2 Analytics Accelerator are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Backup
Backup: Preparation for performance test (1)
- Data preparation
   - Data extraction out of the given complex schema: only some tables are needed for model creation
   - Adaptation to the needs of DB2 / IDAA: table creation, change of data types and date format
   - Enlargement of the data basis (from less than 100 MB to the GB-TB range) with a Java tool, taking care of primary key indices
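One way to enlarge a data set while keeping primary keys unique is to replicate every row and shift the key of each replica by a fixed offset. The sketch below illustrates that idea only; it is not the Java tool mentioned above, and the function name, the dict-based row layout, and the assumption of an integer key are all hypothetical.

```python
def enlarge(rows, key, copies):
    """Yield `copies` replicas of every row with non-colliding primary keys.

    `rows` is a sequence of dicts, `key` names an integer primary key column.
    Replica number i gets its key shifted by i * offset, where offset is
    the width of the original key range, so no two replicas share a key.
    """
    rows = list(rows)
    if not rows:
        return
    keys = [row[key] for row in rows]
    offset = max(keys) - min(keys) + 1
    for i in range(copies):
        for row in rows:
            clone = dict(row)          # copy so the input rows stay untouched
            clone[key] = row[key] + i * offset
            yield clone


# Hypothetical source rows (made-up data)
source = [{"id": 1, "amount": 5.0}, {"id": 3, "amount": 7.0}]
enlarged = list(enlarge(source, "id", 3))
print([row["id"] for row in enlarged])
```

If other tables reference the key as a foreign key, the same shift would have to be applied there so that the replicated rows still join correctly.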
Backup: Preparation for performance test (2)
- Load to DB2 and also to IDAA
   - DB2 LOAD utility used within a JCL script on the host
   - Accelerate (copy) tables to IDAA with IDAA Studio

    //LOAD EXEC PGM=DSNUTILB,PARM=DBNI...
    LOAD DATA INDDN INPUTD REPLACE LOG NO
      ENFORCE NO FORMAT DELIMITED
      INTO TABLE NENTWIG.TABLE_NAME
      ( PARAM TYPE,... )
Backup: Preparation for performance test (3)
Backup: Preparation for performance test (4)
- Implement applied algorithms on Netezza
   - Much pre-defined functionality with IBM SPSS In-Database Analytics, such as:
      - Discretization, normalization
      - Decision trees, association rules
      - Different clustering algorithms and so on
   - Exploit and adapt it to work as in SPSS Modeler
- Example discretization:

    CALL nza..efdisc('outtable=rfm_bounds, intable=source, bins=5,
                      incolumn=recency_days;frequency;monetary');
    CALL nza..apply_disc('outtable=rfm, intable=source,
                          btable=rfm_bounds, replace=false');
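The two-step pattern in the discretization example (first derive bin bounds, then apply them to the data) can be sketched in plain Python. This assumes equal-frequency binning, where each bin receives roughly the same number of values; the exact boundary rules of the nza procedures may differ, and the function names here are hypothetical.

```python
from bisect import bisect_right


def equal_frequency_bounds(values, bins):
    """Return bins-1 cut points so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # the i-th cut point is the value at the i/bins quantile position
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def apply_bounds(value, bounds):
    """Return the 1-based bin number of `value` for the given sorted cut points."""
    # a value equal to a cut point falls into the higher bin
    return bisect_right(bounds, value) + 1


# Hypothetical recency values in days (made-up data)
recency_days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bounds = equal_frequency_bounds(recency_days, bins=5)
binned = [apply_bounds(v, bounds) for v in recency_days]
print(bounds, binned)
```

Keeping the bounds as a separate result mirrors the separate bounds table in the example: the same cut points can later be applied to new data without recomputing them.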