IBM PureData Systems Robert Božič robert.bozic@si.ibm.com
IBM PureData System Meeting Big Data Challenges Fast and Easy! System for Hadoop For Exploratory Analysis & Queryable Archive Hadoop data services optimized for big data analytics and online archive with appliance simplicity System for Analytics For apps like Customer Analysis Data warehouse services optimized for high-speed, peta-scale analytics and simplicity System for Operational Analytics System for Transactions For apps like Real-time Fraud Detection Operational data warehouse services optimized to balance high performance analytics and real-time operational throughput For apps like E-commerce Database cluster services optimized for transactional throughput and scalability
Announcing a New Model! PureData System for Analytics N200X ( Striper ) February 1 Generally Available February 5 Public Announce PureData for Analytics now has 3 models N1001 high performance and scalability N2002-002 Announced October 8th 1,6x faster performance N2001 highest performance appliance to-date 3X faster performance 3
The IBM Netezza Appliance: Revolutionizing Analytics What is Netezza?
Appliances make it simple, completely transforming the user experience. Dedicated device Optimized for purpose Complete solution Fast installation Very easy operation Standard interfaces Low cost 5
The IBM Netezza Appliance: Revolutionizing Analytics Purpose-built analytics engine Integrated database, server & storage Standard interfaces Low total cost of ownership Speed: 10-100x faster than traditional systems Simplicity: Minimal administration Scalability: Peta-scale user data capacity Smart: High-performance advanced analytics
Slovenian customers Zavarovalnica Maribor Zavarovalnca Triglav Telekom Slovenije Tuš Mobil Petrol Informatika NLB MKZ Fabrika Duvana Sarajevo Iskratel
Good prospects for Netezza Large Data volumes, minimally over 1TB ideally 5TB - 20TB of user data Transformational projects in highly competitive industries where best-use of data and analytics separates competition. Company views data as a corporate asset (competitive advantage) Must be doing complex analysis Need to bring an application online fast Hair on fire type of scenario not meeting SLAs
Smart Predicts what shoppers are likely to buy in future visits Coupon redemption rates as high as 25% Because of (Netezza s) in-database technology, we believe we'll be able to do 600 predictive models per year (10X as many as before) with the same staff." Eric Williams, CIO and executive VP
Appliance Simplicity
Managing The Netezza Appliance No software installation No storage administration No database tuning
The Netezza Appliance Loading Data Integration IBM Information Server Ab Initio Business Objects/SAP Composite Software Expressor Software GoldenGate Software (Oracle) Informatica Sunopsis (Oracle) WisdomForce Data In SQL ODBC JDBC OLE-DB
The Netezza Appliance Querying Reporting & Analysis Cognos (IBM) SPSS (IBM) Unica (IBM) Actuate Business Objects/SAP Information Builders Kalido KXEN MicroStrategy Oracle OBIEE QlikTech Quest Software SAS Data Out SQL ODBC JDBC OLE-DB
Simple to Deploy and Operate Operations Simply load and go. it s an appliance Installation to Business Value in ~2 days Ease of Evaluation and Perform As Advertised BI Developers Data model agnostic No configuration or physical modeling No indexes or tuning out of the box performance Focus on business value, not physical design ETL Developers Faster load and transformation times No aggregate tables needed simpler ETL logic In-database transformation ELT Business Analysts On-Stream processing by 100 s of nodes Train of thought analysis 10 to 100x faster True ad hoc queries Lower latency load & query simultaneously 14
Appliance Architecture
IBM Netezza True Appliance Architecture SOLARIS AIX Client TRU64 HP-UX WINDOWS LINUX Database Server Storage DATA SQL ETL Server Source Systems DBA CLI 3rd Party Apps I/O CACHE CACHE I/O CACHE SQL High Performance Loader Data
IBM Netezza True Appliance Architecture SOLARIS AIX Client TRU64 HP-UX Database WINDOWS LINUX Server Storage ODBC 3.X JDBC Type 4 SQL-92 SQL-99 Analytics ETL Server Source Systems DBA CLI CACHE Database, CACHE I/O Server, I/O Storage - in one 3rd Party Apps CACHE High Performance Loader
Information Management IBM Netezza True Appliance Architecture Optimized Hardware+Software Streaming Data Purpose-built for high performance analytics; requires no tuning Hardware-based query acceleration for blistering fast results True MPP Deep Analytics All processors fully utilized for maximum speed and efficiency Complex analytics executed in-database for deeper insights 18
IBM Netezza True Appliance Massively Parallel Processing SOLARIS AIX Client TRU64 HP-UX WINDOWS LINUX ODBC 3.X JDBC Type 4 OLE-DB SQL/92 SQL Compiler Query Plan Execution Engine 1 2 3 S-Blade Processor & streaming DB logic S-Blade Processor & streaming DB logic S-Blade Processor & streaming DB logic Source Systems ETL Server DBA CLI 3rd Party Apps High-Speed Loader/Unloader Optimize Admin Front End DBOS SMP Host Network Fabric 960 High-Performance Database Engine Streaming joins, aggregations, sorts S-Blade Processor & streaming DB logic Massively Parallel Intelligent Storage High Performance Loader
IBM Netezza True Appliance Massively Parallel Processing SOLARIS AIX Client TRU64 HP-UX WINDOWS LINUX SQL SQL Compiler Query Plan Snippets 1 2 3 Execution Engine 1 2 3 S-Blade S-Blade S-Blade 2 3 Processor & streaming DB logic 2 3 Processor & streaming DB logic 2 3 Processor & streaming DB logic Source Systems ETL Server DBA CLI 3rd Party Apps High-Speed Loader/Unloader Optimize Admin SQL Front End DBOS SMP Host Network Fabric 960 High-Performance Database Engine Streaming joins, aggregations, sorts S-Blade 2 3 Processor & streaming DB logic Massively Parallel Intelligent Storage High Performance Loader
IBM Netezza True Appliance Massively Parallel Processing SOLARIS AIX Client TRU64 HP-UX WINDOWS LINUX Consolidate 1 S-Blade 2 3 Processor & streaming DB logic SQL Compiler Query Plan Execution Engine 2 3 S-Blade S-Blade 2 3 Processor & streaming DB logic 2 3 Processor & streaming DB logic Source Systems ETL Server DBA CLI 3rd Party Apps High-Speed Loader/Unloader Optimize Admin Front End DBOS SMP Host Network Fabric 960 High-Performance Database Engine Streaming joins, aggregations, sorts S-Blade 2 3 Processor & streaming DB logic Massively Parallel Intelligent Storage High Performance Loader
Our Secret Sauce select DISTRICT, PRODUCTGRP, sum(nrx) from MTHLY_RX_TERR_DATA where MONTH = '20091201' FPGA Core CPU Core and MARKET = 509123 and SPECIALTY = 'GASTRO' Uncompress Project Restrict, Visibility Complex Joins, Aggs, etc. Slice of table MTHLY_RX_TERR_DATA (compressed) select DISTRICT, where MONTH = '20091201' sum(nrx) PRODUCTGRP, and MARKET = 509123 sum(nrx) and SPECIALTY = 'GASTRO'
How We Did It 325 MB/sec (2.5 drives / core) PureData System for Analytics N1001 N200X 1000 500MB/sec 800 1000 MB/sec + 130 MB/sec 120MB/sec FPGA Core CPU Core 130 MB/sec 1300 480 MB/sec MB/sec 65 MB/sec Decompress Project Restrict Visibility SQL & Advanced Analytics From Select Where Group by
N200X
Performance & Capacity comparison N2002-002 N2002-002
Advanced Analytics
i-class: Analytics Without Constraints Big Data Big Math Analyze wider and deeper data > Additional dimensions > Richer history Increase computational intensity > More complex models > Faster execution for results
Advanced Analytics the Traditional Netezza Way Way SAS, SPSS Data Warehouse Analytics Grid Data ETL Demand Forecastin g SQL ETL R, S+ ETL SQL Fraud Detection SQL C/C++, Java, Python, Fortran,
Advanced Analytics the Netezza Way SAS, SPSS Data Analytics Grid Demand Forecastin g ETL R, S+ SQL C/C++, Java, Python, Fortran, Fraud Detection
Advanced Analytics the Netezza Way SAS, SPSS complex analytics SAS, SPSS, R, Java, etc implicit parallelism petabyte scalability appliance simplicity SQL Demand Forecastin g R, S+ Fraud Detection SQL
Pre-Built In-Database Analytics Statistics Transformations Time Series Mathematical Descriptive Statistics+ Distance Measures* Hypothesis Testing* Chi-Square & Contingency Tables* Univariate & Multivariate Distributions+ Monte Carlo Simulation* Data Profiling / Descriptive Statistics+ General Diagnostics Statistics+ Sampling Data prep Autoregressive+ Forecasting* Basic Math* Permutation and Combination* Greatest Common Divisor and Least Common Multiple* Conversion of Values* Exponential and Logarithm* Gamma and Beta Functions Matrix Algebra+ Area Under Curve* Interpolation Methods* Data Mining Predictive Geospatial Association Rules+ Clustering+ Linear Regression+ Logistic Regression+ Geospatial Data Type Geometric Functions * Fuzzy Logix DB Lytix capabilities Feature Extraction+ Discriminant Analysis* Classification Bayesian Sampling Geometric Analysis + Netezza Analytics and Fuzzy Logix DB Lytix capabilities Model Testing