Introduction to the PureData for Analytics System (PDA) + Details on the N3001 Family
Dan Simchuk, simchuk@us.ibm.com
Legal Disclaimer
© IBM Corporation 2015. All Rights Reserved. The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
PureData for Analytics Basics
IBM PureData System for Analytics: The Simple Appliance for Serious Analytics
Speed, Simplicity, Scalability, Smart
Built-in Expertise
- No indexes or tuning
- Data model agnostic
- Fully parallel, optimized In-Database Analytics
Integration by Design
- Server, Storage, Database in one easy-to-use package
- Automatic parallelization and resource optimization to scale economically
- Enterprise-class security and platform management
Simplified Experience
- Up and running in hours
- Minimal ongoing administration
- Standard interfaces to best-of-breed Analytics, BI, and data integration tools
- Built-in analytics capabilities allow users to derive insight from data quickly
- Easy connectivity to other Big Data Platform components
Evolution of Netezza & PureData System for Analytics
- 2003: NPS 8000 Series, World's First Data Warehouse Appliance
- 2006: NPS 10000 Series, World's First 100 TB Data Warehouse Appliance
- 2009: TwinFin, World's First Petabyte Data Warehouse Appliance
- 2010: TwinFin with i-Class Advanced Analytics, World's First Analytic Data Warehouse Appliance
- 2012: PureData System for Analytics N200x, World's Fastest and Greenest Analytical Appliance
- 2014: PureData System for Analytics N300x, World's First appliance with no cost encryption
PureData System for Analytics Family
- 10-100x faster than custom systems [1]
- 3.3x faster I/O scan rate [2]
- Load and go, no tuning
- Designed to run complex analytics in minutes, not hours
- Rich set of in-database analytics
- ...plus in-the-box capability for real-time analytics, Hadoop data services, data movement and business intelligence
- Advanced security
- Partial rack to 8-rack configurations
plus
- Rack-mountable appliance
- Ideal for small and medium business with up to 16 TB of user data
- The hybrid computing platform integrating Netezza technology with zEnterprise technology
- Supports transaction processing and analytic workloads concurrently, efficiently and cost effectively
- Accelerates complex queries, up to 2000x faster
- Required security compliance with Data-at-Rest Encryption
[1] Based on IBM customers' reported results. "Traditional custom systems" refers to systems that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
[2] Comparing N1001 scan rate of 145 TB/hour to N2002 scan rate of 478 TB/hour
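The 3.3x scan-rate figure follows directly from the two rates quoted in the footnote. A minimal sketch of that arithmetic in Python; the 100 TB table size is a hypothetical value chosen only to make the difference tangible and does not come from the slides.

```python
# Scan rates quoted in the slide footnote (TB per hour).
N1001_SCAN_TB_PER_HR = 145
N2002_SCAN_TB_PER_HR = 478

speedup = N2002_SCAN_TB_PER_HR / N1001_SCAN_TB_PER_HR
print(f"Scan-rate ratio: {speedup:.1f}x")  # prints "Scan-rate ratio: 3.3x"

# Hypothetical table size, used only for illustration.
TABLE_TB = 100

def full_scan_minutes(table_tb, rate_tb_per_hr):
    """Minutes needed to stream the whole table once at the given scan rate."""
    return table_tb / rate_tb_per_hr * 60

old_minutes = full_scan_minutes(TABLE_TB, N1001_SCAN_TB_PER_HR)
new_minutes = full_scan_minutes(TABLE_TB, N2002_SCAN_TB_PER_HR)
print(f"N1001: {old_minutes:.1f} min, N2002: {new_minutes:.1f} min")
```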
Next Generation Architecture for Big Data and Analytics
All Data
- Machine and sensor data
- Image and Video
- Enterprise Content
- Transaction and application data
- Social Media
- Third-party data
Data zones
- Real-time Data Processing & Analytics: Streams, Data Replication
- Operational Data Zone: DB2, Informix, PureData System for Transactions
- Landing, Exploration and Archive Data Zone: BigInsights
- Deep Analytics Modeling: PureData System for Analytics
- Reporting & Interactive Analysis: DB2 BLU, PureData System for Analytics
Actionable Insight
- Decision Management: SPSS Modeler Gold
- Predictive Analytics and Modeling: SPSS Modeler
- Reporting and Analysis: Cognos BI, Cognos TM1
- Discovery and Exploration: Watson Explorer
Information Integration & Governance: Information Server, MDM, Guardium, Optim, Federation Server, Replication
Warehousing and Analytics, The PDA Way
Traditional Data Warehouses are just too complex
They do NOT meet the demands of advanced analytics on big data.
- Too complex an infrastructure
- Too inefficient at analytics
- Too complicated to deploy
- Too many people needed to maintain
- Too much tuning required
- Too costly to operate
- Too long to get answers
Appliances Make It Simple: transforming the user experience
- Dedicated device
- Optimized for purpose
- Complete solution
- Fast installation
- Very easy operation
- Standard interfaces
- Low cost
Simplify: Move Analytics into the Data Warehouse
- Integrate the server, storage and database into one optimized package
- Move complex analytics into the database
- Leverage proven technology that accelerates analytics with no tuning or storage administration
[Diagram: separate Server, Storage, Database and Analytics layers combined into one package]
Data Warehouse Workload: Fewer requests, lots of data manipulation
[Diagram: Transactional System used for BI; requests flow from the CPU to General Purpose Storage]
Data Warehouse Workload: Transaction systems are inefficient for data shuffling
[Diagram: Transactional System used for BI; request and results flow between the CPU and General Purpose Storage]
Data Warehouse Blades: Designed for Tera-scale Business Intelligence
[Diagram: PureData for Analytics Performance Server System; request and results flow between the CPU and Intelligent Storage; Asymmetric Massively Parallel Processing]
Data Warehouse Blades: Highly efficient data movement
- 2% of CPU requirements
- 1% of network traffic
[Diagram: PureData for Analytics Performance Server System; request and results flow between the CPU and Intelligent Storage; Asymmetric Massively Parallel Processing]
The PureData System for Analytics AMPP Architecture
Field Programmable Gate Array (FPGA) = a blank canvas until it's configured
[Diagram: PureData System for Analytics Appliance]
- Applications: Advanced Analytics, BI, ETL, Loaders
- Host (IBM xSeries, Red Hat Linux)
- Network Fabric
- S-Blades, each with CPU, FPGA and Memory
- Disk Enclosures
S-Blade Data Stream Processing
FPGA Core
- Decompress
- Project (Select)
- Restrict, Visibility (Where)
- Stream via Zone Map (From)
CPU Core
- SQL & Advanced Analytics (Group by, Order by)
Example query:
    Select State, Age, Gender, count(*)
    From MultiBillionRowCustomerTable
    Where BirthDate < '01/01/1960'
    And State in ('FL', 'GA', 'SC', 'NC')
    Group by State, Age, Gender
    Order by State, Age, Gender
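The stage split on this slide can be pictured as a small streaming pipeline. The Python sketch below is a toy model only, not the appliance's implementation: the extent layout, the `make_extent`, `scan` and `aggregate` helpers, and the sample rows are all hypothetical. It assumes a zone map records the minimum and maximum `BirthDate` per extent, so extents that cannot match the `Where` clause are skipped before any row is read.

```python
from collections import Counter
from datetime import date

def make_extent(rows):
    """Bundle rows with a zone map: the (min, max) BirthDate seen in the extent."""
    dates = [r["BirthDate"] for r in rows]
    return {"rows": rows, "zone_map": (min(dates), max(dates))}

def scan(extents, cutoff, states):
    """FPGA-like stage: zone-map skip, then restrict (Where) and project (Select)."""
    for extent in extents:
        oldest, _newest = extent["zone_map"]
        if oldest >= cutoff:
            continue  # no row here can satisfy BirthDate < cutoff, skip the extent
        for row in extent["rows"]:
            if row["BirthDate"] < cutoff and row["State"] in states:
                # Only the three needed columns leave this stage.
                yield (row["State"], row["Age"], row["Gender"])

def aggregate(stream):
    """CPU-like stage: Group by with count(*), then Order by the group columns."""
    return sorted(Counter(stream).items())

# Hypothetical sample data.
extents = [
    make_extent([
        {"State": "FL", "Age": 60, "Gender": "F", "BirthDate": date(1955, 3, 1), "Name": "a"},
        {"State": "GA", "Age": 58, "Gender": "M", "BirthDate": date(1957, 7, 9), "Name": "b"},
        {"State": "TX", "Age": 70, "Gender": "M", "BirthDate": date(1945, 1, 2), "Name": "c"},
    ]),
    make_extent([  # every row is too young, so the zone map rules out the extent
        {"State": "FL", "Age": 30, "Gender": "F", "BirthDate": date(1985, 3, 1), "Name": "d"},
        {"State": "NC", "Age": 25, "Gender": "M", "BirthDate": date(1990, 7, 9), "Name": "e"},
    ]),
    make_extent([
        {"State": "FL", "Age": 60, "Gender": "F", "BirthDate": date(1955, 8, 1), "Name": "f"},
        {"State": "SC", "Age": 40, "Gender": "M", "BirthDate": date(1975, 1, 2), "Name": "g"},
    ]),
]

result = aggregate(scan(extents, date(1960, 1, 1), {"FL", "GA", "SC", "NC"}))
print(result)
```

Because filtering and projection happen before rows reach the aggregation stage, only three small tuples out of seven rows travel onward in this toy run, which is the effect the slide attributes to the FPGA stage.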
Asymmetric Massively Parallel Processing
[Diagram, shown in three steps: (1) SQL arrives at the host, (2) the query is split into snippets that run on every S-Blade, (3) the host consolidates the results]
- Clients: Solaris, AIX, System z, HP-UX, Windows, Linux
- Interfaces: ODBC 3.x, JDBC Type 4, OLE-DB, SQL/92
- Source systems and tools: ETL Server, DBA CLI, Source Systems, 3rd Party Apps, High Performance Loader
- SMP Host: Front End, SQL Compiler, Query Plan, Optimize..., Admin, DBOS, Execution Engine, High-Speed Loader/Unloader
- Network Fabric
- S-Blades: Processor & streaming DB logic; High-Performance Database Engine with streaming joins, aggregations, sorts
- Massively Parallel Intelligent Storage
Spend Less Time Managing and More Time Innovating
Simplicity and Ease of Administration
- Easy Administration Portal
- No software installation
- No indexes and tuning
- No storage administration
- No dbspace/tablespace sizing and configuration
- No redo/physical/logical log sizing and configuration
- No page/block sizing and configuration for tables
- No extent sizing and configuration for tables
- No Temp space allocation and monitoring
- No RAID level decisions for dbspaces
- No logical volume creations of files
- No integration of OS kernel recommendations
- No maintenance of OS recommended patch levels
- No JAD sessions to configure host/network/storage
Data Experts, not Database Experts
Data Management in Legacy Databases
Callouts on the DDL: Journaling, Compression, Indexes, Partitions
PLUS: Logs, Tablespaces, Extents, Bit maps, Etc.
    create multiset table wcrm.f_monthly_billing_schedule,
      no fallback,
      no before journal,
      no after journal
    (
      per_key integer not null,
      exposure_detail_key integer not null,
      billing_schedule_char_key integer not null,
      source_system_limit_key char(10) not null,
      charge_type_key smallint not null,
      effective_from_date date format 'yy/mm/dd',
      effective_to_date date format 'yy/mm/dd',
      amount_due decimal(18,2) compress (0.00,10000.00,50000.00,250000.00,100000.00),
      amount_due_ccy decimal(18,2) compress (0.00,200000.00,10000.00,50000.00,250000.00,100000.00,150000.00),
      total_installments integer compress (0,1,36,38,48,51,52,55,56,60,180),
      current_installments integer compress (0,1,2,3,4,5,6,7,8,9,10),
      percent_due decimal(9,6) compress (0.000000,100.000000,10.000000,15.000000),
      as_of_date date format 'yy/mm/dd',
      last_update_event_ts timestamp(6),
      last_update_user_id char(8),
      source_rec_id integer
    )
    primary index pmy_idx (exposure_detail_key)
    partition by range_n(per_key between
      200001 and 200012 each 1,
      200101 and 200112 each 1,
      200201 and 200212 each 1,
      200301 and 200312 each 1,
      200401 and 200412 each 1,
      200501 and 200512 each 1,
      200601 and 200612 each 1,
      200701 and 200712 each 1,
      200801 and 200812 each 1);
Table conversion example: PureData for Analytics
- Logical model only
- No indexes/partitioning
- Compression is automatic
- No physical tuning/space considerations
- Significantly reduced administration
- The only consideration is how you spread your data across all the disks in the system
    create multiset table wcrm.f_monthly_billing_schedule,
      no fallback,
      no before journal,
      no after journal
    (
      per_key integer not null,
      exposure_detail_key integer not null,
      billing_schedule_char_key integer not null,
      source_system_limit_key char(10) not null,
      charge_type_key smallint not null,
      effective_from_date date format 'yy/mm/dd',
      effective_to_date date format 'yy/mm/dd',
      amount_due decimal(18,2) compress (0.00,10000.00,50000.00,250000.00,100000.00),
      amount_due_ccy decimal(18,2) compress (0.00,200000.00,10000.00,50000.00,250000.00,100000.00,150000.00),
      total_installments integer compress (0,1,36,38,48,51,52,55,56,60,180),
      current_installments integer compress (0,1,2,3,4,5,6,7,8,9,10),
      percent_due decimal(9,6) compress (0.000000,100.000000,10.000000,15.000000),
      as_of_date date format 'yy/mm/dd',
      last_update_event_ts timestamp(6),
      last_update_user_id char(8),
      source_rec_id integer
    )
    primary index pmy_idx (exposure_detail_key)
    partition by range_n(per_key between
      200001 and 200012 each 1,
      200101 and 200112 each 1,
      200201 and 200212 each 1,
      200301 and 200312 each 1,
      200401 and 200412 each 1,
      200501 and 200512 each 1,
      200601 and 200612 each 1,
      200701 and 200712 each 1,
      200801 and 200812 each 1)
    DISTRIBUTE ON (exposure_detail_key);
Distribution
Good distribution is a fundamental element of performance!
- A data slice is an individual element of parallelism (1000-12 = 94 data slices)
- If all data slices have the same amount of work to do, a query will be 94 times quicker than if one data slice was asked to do the same work
- Bad distribution is called data skew
- Skew to one data slice is the worst-case scenario
- Skew affects the query in hand and others, because the overloaded data slice has more to do
- Skew also means that the machine will fill up much quicker
Simple rule: good distribution means good performance
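The effect of the distribution key on skew can be simulated in a few lines. The Python sketch below is an illustration under stated assumptions: it hashes key values with SHA-256 as a stand-in (the appliance's real hash function is not described in these slides), uses the 94 data slices quoted above, and compares a high-cardinality key with a two-valued key on hypothetical data.

```python
import hashlib
from collections import Counter

NUM_SLICES = 94  # data slice count quoted on the slide

def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution-key value to a data slice (stand-in hash, not the real one)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_slices

def rows_per_slice(keys, num_slices=NUM_SLICES):
    """Count how many rows land on each data slice."""
    counts = Counter(slice_for(k, num_slices) for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]

def skew_ratio(per_slice):
    """Busiest slice relative to the average; 1.0 means perfectly even."""
    average = sum(per_slice) / len(per_slice)
    return max(per_slice) / average

N_ROWS = 94_000  # hypothetical table size

# A key with one distinct value per row spreads evenly.
good = rows_per_slice(range(N_ROWS))

# A key with only two distinct values can occupy at most two slices.
bad = rows_per_slice("M" if i % 2 else "F" for i in range(N_ROWS))

print(f"unique key skew ratio:     {skew_ratio(good):.2f}")
print(f"two-valued key skew ratio: {skew_ratio(bad):.2f}")
```

Since a query finishes only when the busiest data slice finishes, the skew ratio approximates how much slower the skewed layout is than an even one. In the worst case on the slide, where all rows sit on one slice, the ratio equals the slice count.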
A Good Distribution: 2.2 Trillion Records
Synergy with Data Integration and Reporting & Analysis Tools
Data In (Data Integration), via SQL, ODBC, JDBC, OLE-DB
- Ab Initio, Cloudera, Composite Software, IBM BigInsights, IBM Information Server, IBM InfoSphere Streams, Informatica, Oracle Data Integrator, Oracle GoldenGate, SAP Business Objects
Data Out (Reporting & Analysis), via SQL, ODBC, JDBC, OLE-DB
- IBM Cognos, IBM SPSS, IBM Unica, Actuate, Information Builders, Kalido, KXEN, Microsoft, MicroStrategy, Oracle, SAP Business Objects, SAS, Tableau
PureData System for Analytics: In-Database
In-database function areas: Transformations, Mathematical, Geospatial, Predictive, Statistics, Time Series, Data Mining
- No data movement
- Analyze deep and wide data
- High performance, parallel computation
IBM INTERNAL USE ONLY
IBM Netezza Analytics v3.2
New extensions for ESRI, Spatial and R
- ESRI functions
- Spatial extensions
- Open Source R
Pre-Built In-Database Analytics
Statistics Descriptive Statistics+ Distance Measures* Hypothesis Testing* Chi-Square & Contingency Tables* Univariate & Multivariate Distributions+ Transformations Time Series Data Profiling / Descriptive Statistics+ Autoregressive+ General Diagnostics Forecasting* Mathematical Basic Math* Permutation and Combination* Greatest Common Divisor and Least Common Multiple* Statistics+ Sampling Conversion of Values* Data prep Exponential and Logarithm* Gamma and Beta Functions Monte Carlo Simulation* Matrix Algebra+ Area Under Curve* Interpolation Methods* Data Mining Predictive Association Rules+ Linear Regression+ Geospatial Data Type Clustering+ Logistic Regression+ Geometric Functions Feature Extraction+ Classification Geometric Analysis Discriminant Analysis* Bayesian Sampling Geospatial Model Testing
* Fuzzy Logix DB Lytix capabilities
+ Netezza Analytics and Fuzzy Logix DB Lytix capabilities
PureData System for Analytics Optimization With Other IBM Products
Categories: Big Data Platform; Data Integration; Business Intelligence / Performance Management; System z
- InfoSphere Streams
- InfoSphere BigInsights
- System ML (Machine Learning)
- Information Server v9.1
- InfoSphere Discovery v4.5
- InfoSphere Data Architect v8.1
- InfoSphere CDC Heterogeneous Replication
- InfoSphere Optim Data Archive 9.1
- Industry Models v8.4: Banking, Insurance, Healthcare
- Industry Model Packs: Supply Chain, Customer, Market & Campaign
- Tivoli Storage Manager
- Vivisimo Data Explorer v8.2
- Cognos v10.2
- Cognos TM1 v9.5
- Guardium DB Monitoring v9
- SPSS Modeler v15
- Unica EMM Marketing Analytics 8.6
- Unica NetInsights 8.6
- IBM DB2 Analytics Accelerator (IDAA)
- zLinux ODBC driver
Coming Soon: PureData System for Operational Analytics, Guardium, Informix Data Warehouse Edition, SPSS v16
PureData System for Analytics Delivers
Faster information delivery
"With the IBM PureData System for Analytics, we can reduce the time to analyze complex GIS data from days to minutes, a more than 98 percent improvement."
- Steve Trammell, Strategic Alliances Marketing Manager, Esri
Analytical tools that are easy to use
"We knew that our IBM SPSS Modeler software could scale to meet our needs; the limitation was on the hardware and data warehousing side. Instead of having separate databases and servers for each client, we wanted to build a single, multi-tenant platform that could support a cloud-based service for the entire business. In the IBM PureData System for Analytics, we found the answer."
- Patrick Ritto, CTO, FleetRisk Advisors
Easy access to required data
"Making decisions based on data instead of intuition or gut feeling is better. There is already a greater demand from users for data to support day-to-day operations; solutions such as the InfoSphere Business Glossary empower them with this information so that they can work more autonomously and efficiently."
- Philippe Chartier, BI Team Lead, Information Delivery, Canadian National Railway Company
Mini appliance early beta test results
Avnet beta test using customer workload [1]
- MS SQL Server: 384 seconds
- IBM PureData System for Analytics Mini Appliance (N3001-001): 3 seconds
What could you do if your queries were 127x faster?
To hear more, come to Insight 2014, October 26-30
[1] Avnet beta test performed using customer workload on PureData System for Analytics N3001-001 compared to MS SQL Server 2008
Comparing PureData System for Analytics with Teradata
Teradata has
- 3.8x higher deployment costs [1]
- 2.6x higher personnel costs [1]
- 3.4x more DBAs required [1]
- 33% higher 3-year TCO [1]
than the IBM PureData System for Analytics
[1] ITG: Comparing Costs and Time to Value with Teradata Data Warehouse Appliance, May 2014.
Comparing PureData System for Analytics with Oracle
Oracle has
- 3.5x higher deployment costs [1]
- 3x more DBAs required [1]
- 45% higher 3-year TCO [1]
than the IBM PureData System for Analytics
[1] ITG: Comparing Costs and Time to Value with Oracle Exadata Database Machine X3, June 2014.
The new PureData System for Analytics N3001 Family
The PureData System for Analytics N3001
Changing the game for data warehouse appliances (again)
- Big Data and Business Intelligence ready, with capabilities to unlock data's true potential
- Advanced security in an insecure world, at no extra cost
- An even broader family of appliance models to fit a broad range of data capacity needs
and yes, simple is STILL better!
Big Data and Business Intelligence Ready: Unlocking Data's True Potential
Included with the PureData System for Analytics N3001
Data Warehouse Appliance
- Advanced security
- New rack-mountable appliance for midsize organizations
- New 8-rack system for Petabyte+ capacity
- Built-in, In-Database analytic capability and integration with a variety of 3rd party tools
Exceptional value provided
- Data Integration & Transformation: InfoSphere DataStage 280 PVUs, 2 concurrent Designer Client licenses and InfoSphere Data Click
For additional value
- Business Intelligence: Cognos software, 5 Analytics User licenses, plus 1 Analytics Administrator license
- Industry Process & Data Models: Models for Banking, Financial Markets, Healthcare, Insurance, Retail, Telco
- Hadoop Data Services: InfoSphere BigInsights software licenses to manage ~100 TB of Hadoop data
- Real-time Analytics: InfoSphere Streams Developer Edition, 2 users, non-production licenses
- IBM InfoSphere Data Privacy and Security for Data Warehousing
IBM Netezza Analytics Included
In-database Analytics For Every Role in Your Enterprise
Bring the analytics to the data, not the data to the analytics
Use cases
- Reduce hospital admissions or personalize disease treatments
- Achieve an order of magnitude improvement in manufacturing quality
- Better understand the risk of catastrophic events
- and many more
Features
- Built-in, in-database analytic functions: data mining, prediction, transformations, statistics, geospatial, data preparation
- Full integration with tools for BI & visualization: IBM Cognos, MicroStrategy, Business Objects, SAS, MS Excel, SSRS, Kognitio, QlikView
- Full integration with tools for model building & scoring: IBM SPSS, SAS, Open Source R, Fuzzy Logix
- Full integration for custom analytics: Open Source R, Java, C, C++, Python, Lua
Function areas shown: Data Preparation, Predictive Analytics, Geospatial Analytics, Advanced Statistics
Business Intelligence Included
The Power of IBM Cognos with PureData System for Analytics
Use cases
- Reporting, analysis, scorecards, dashboards
- Rapid deployment of answers to key business questions
- Data visualization
- Mobile business intelligence
- and many others
Features
Leading Business Intelligence
 - Interactive analysis
 - Compelling visualizations
 - Web, mobile or email
 - Enterprise scalability
Optimized for PureData for Analytics
 - Offers high performing OLAP over relational experience
 - Cognos Dynamic Query Mode extends benefits of PureData by adding in-memory & caching on top of already fast appliance performance
 - Exploits Netezza analytic in-database functions
Included with PureData for Analytics: IBM Cognos Business Intelligence 10.2.1, 5 Analytics User licenses, 1 Analytics Administrator license [1]
[1] PureData System for Analytics N3001 must be the data source for Cognos.
Data Integration & Transformation Included
InfoSphere DataStage, Designer Client and Data Click
Use cases
- Integrate, transform and deliver trustworthy information to your data warehouse
- Rich capabilities for data integration
- Analysts, data scientists or even line-of-business users can easily retrieve data and populate the PureData System for Analytics
- Move data from the data warehouse into a subject area data mart
Features
Ease of Use
 - Provides an easy-to-use, top-down, work-as-you-think design interface that enables users to design once and deploy anywhere: batch or real time; extract, transform, load (ETL); or extract, load, transform (ELT)
 - Self-service data integration to enhance business agility
Accelerate time to value
 - Includes a comprehensive library of transformation components for easily defining common integration processes
Included with PureData for Analytics: IBM InfoSphere DataStage 11.3 (280 PVU Information Server Engine Tier) [1], Designer Client (2 concurrent users), InfoSphere Data Click [1]
[1] PureData System for Analytics N3001 must be the source or target database.
Hadoop Data Services Included
Included Capability with IBM InfoSphere BigInsights
Bringing the power of Hadoop to your enterprise
Use cases
- Federated SQL access across Hadoop and your PureData System for Analytics
- Pre-processing and landing zone for all data types prior to loading to data warehouse
- Queryable backup for cold data
Features
Big data analytical platform
 - Best of open source + IBM technologies
 - Big SQL: high performance SQL access of Hadoop
 - Federation across many data sources: combine information from Hadoop and PureData for Analytics
 - BigSheets visualization tool
Built-in analytics
 - Text analytics, Big R
Included with PureData for Analytics: InfoSphere BigInsights 3.0 software licenses for 5 enterprise nodes to manage up to ~100 TB of Hadoop data [1]
[1] Based on 4 data nodes + 1 master node. 12 TB uncompressed per data node with 4 TB drives. 12 TB x 4 nodes = 48 TB uncompressed. Using 2-2.5x compression yields 96-120 TB compressed data. Capacity will depend on hardware configuration selected.
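The "~100 TB" figure is the product of the node count, per-node capacity and compression range given in the footnote. A short Python sketch of that arithmetic, using only the numbers stated on the slide:

```python
# Figures from the slide footnote.
DATA_NODES = 4                   # plus 1 master node, which holds no user data here
TB_PER_DATA_NODE = 12            # uncompressed, with 4 TB drives
COMPRESSION_RANGE = (2.0, 2.5)   # "2-2.5x compression"

uncompressed_tb = DATA_NODES * TB_PER_DATA_NODE
low_tb, high_tb = (uncompressed_tb * ratio for ratio in COMPRESSION_RANGE)

print(f"Uncompressed capacity: {uncompressed_tb} TB")            # 48 TB
print(f"Managed data after compression: {low_tb:.0f}-{high_tb:.0f} TB")  # 96-120 TB
```

As the footnote says, the result depends on the hardware configuration selected, so changing the three constants is the way to estimate a different cluster.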
Real-Time Analytics Included
Included Capability from IBM InfoSphere Streams
Deploy analytic models on data-in-motion to enable real-time decisions, and land data in the warehouse to build the analytic models
Use cases
- Fraud detection
- Predict customer churn
- Telco real-time mediation and analysis
- Real-time monitoring of medical sensors to improve healthcare outcomes
- Defect detection in manufacturing
- Traffic pattern analysis and management
Features
Analyze data in motion
 - Provides sub-millisecond response times, allowing you to view information and events as they unfold
 - Analyze all kinds of data: simple & advanced text, geospatial, acoustics, images, video, sensors
 - Eclipse-based development environment
Included with PureData for Analytics: InfoSphere Streams Developer Edition 3.2.1, 2 developer users, non-production licenses
Inside the IBM PureData System for Analytics N3001 [1]
Optimized Hardware + Software
- Hardware accelerated AMPP
- Purpose-built for high performance analytics
- Requires no tuning
Disk Enclosures
- User data, mirror, swap partitions
- High speed data streaming
SMP Hosts
- SQL Compiler
- Query Plan
- Optimize
- Admin
Snippet Blades
- Hardware-based query acceleration with FPGAs
- Blistering fast results
- Complex analytics executed as the data streams from disk
[1] N3001-001 does not have Hardware Acceleration (FPGA)
Hardware Overview: Model N3001
12 Disk Enclosures
- Total 288 600 GB SAS2 Self Encrypting Drives: 240 for User Data, 14 for S-Blades, 34 Spare
- RAID 1 Mirroring
2 Hosts (Active-Passive)
- 2 Intel Ivy Bridge CPUs
- 5 x 600 GB SAS Self Encrypting Drives
- Red Hat Linux 6 64-bit
7 PureData for Analytics S-Blades
- 2 Intel 10 Core Ivy Bridge CPUs
- 2 8-Engine Xilinx Virtex-6 FPGAs
- 128 GB RAM + 8 GB slice buffer
- Linux 64-bit Kernel
Scales up to 8 full Racks: Terabyte to Petabyte+ Capacity
User Data Capacity: 192 TB [1]
Data Scan Speed: 478 TB/hr*
Load Speed: 10 TB/hr (up to 10 TB/hr load rate in multi-rack configurations)
Power Requirements: 7.5 kW
Cooling Requirements: 27,000 BTU/hr
[1] Assuming 4x compression
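The capacity figures on this slide can be cross-checked against each other. The Python sketch below uses only numbers from the slide; the interpretation in the final comment is an observation, not a statement of how the appliance partitions its drives.

```python
# Figures quoted on the slide.
USER_DATA_DRIVES = 240
DRIVE_SIZE_GB = 600
QUOTED_USER_CAPACITY_TB = 192   # stated with 4x compression assumed
ASSUMED_COMPRESSION = 4

raw_tb = USER_DATA_DRIVES * DRIVE_SIZE_GB / 1000            # decimal TB of raw disk
uncompressed_user_tb = QUOTED_USER_CAPACITY_TB / ASSUMED_COMPRESSION
usable_fraction = uncompressed_user_tb / raw_tb

print(f"Raw capacity of user-data drives: {raw_tb:.0f} TB")
print(f"User data before compression:     {uncompressed_user_tb:.0f} TB")
print(f"Share of raw disk holding user data: {usable_fraction:.0%}")

# Observation: roughly one third of the raw space carries primary user data,
# which is at least consistent with the drives also holding mirror and swap
# partitions, as listed elsewhere in the deck.
```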
Self Encrypting Drive (SED) Feature Overview
Protecting Sensitive Data at Rest
- All data encrypted: user and temp
- Local key management out of the box
- 2-Tier key management
- Uses AEK (Authentication Encryption Key), a 256-bit AES key
- One key for SPUs and one for Hosts
- Keys can be initialized or changed at any time, even after loading data
- No need to reinitialize system for setting keys
- Supports Instant Cryptoerase functionality to re-purpose the drives
- nzkeybackup or nzhostbackup utilities to back up AEKs [1]
- Encrypts/decrypts all user data at full interface speed using a dedicated encryption engine
- AEK locks the drive to protect data at rest
- One-time up-front setup, no overhead to pass the key
Requirements
- Available on all N3001 models
- NPS 7.2.0+
[1] Refer to the Netezza System Administration Guide for details
PureData System for Analytics with NPS 7.2
New database features, improved performance and predictability
Database features
- Enhanced security enables single sign-on and centralized management
- New built-in functions and SQL updates
- Portal enhancements
Performance and Predictability
- WLM throughput and latency optimization
- Faster load rates, up to 10 TB/hr
- Faster restore rates
Resiliency and Serviceability
- Enhanced Health Check capabilities
- Enhanced storage topology and communication fabric
- Call Home via https and SOAP
Netezza Support for GPFS
Mount and leverage the GPFS server cluster!
Netezza supports the following GPFS versions:
- GPFS V3.5 x86_64 on RHEL 5 (N1000 series)
- GPFS V3.5 x86_64 on RHEL 6 (N2000 series)
- GPFS V4.1 x86_64 on RHEL 6 (N2000 series)
GPFS client / server cluster is independent of NPS
Extend the logical warehouse!
- Add a Netezza node to your GPFS cluster
- Set up GPFS client for automated failover
- Use for unload / load ETL operations
- Run nzbackup / nzrestore to GPFS cluster
- Create external table and access
- Join an FPO-configured GPFS cluster
GPFS benefits: seamless capacity, high availability, 3-way mirroring, high performance, policy-driven, simple administration, cost-effective
Kerberos Support
Connect to PDA without requiring a password!
Benefits
- Identity federation to provide user convenience via single sign-on (SSO)
- Reduce security administration and costs through a federated approach
- Better accountability and regulatory compliance
Kerberos and PDA
- Requires Kerberos v1.12.1 for best results
- Currently allows one method of authentication
- Only ADMIN will have LOCAL authentication
- Cross-realm authentication and multi-user are supported
- Supports nzsql, ODBC, JDBC, and OLE-DB
- Working on ability to delegate credentials and support for dual authentication (local and Kerberos)
Workload Management: GRA + PQE + SQB + Job Limits
Minimum Resource Guarantees with Prioritized Execution
[Diagram: prioritized user requests (priorities L, N, H, C) enter request queues]
- Power User: 10 job limit
- Departmental User: 40 job limit
- Admin Tasks: 3 job limit
Mechanisms
- Guaranteed Resource Allocation (GRA)
- Priority Queue Execution (PQE): query priorities managed in the context of GRA allocations and job limits
- Short Query Bias (SQB): short queries prioritized ahead of longer running queries
- Job Limits
Powerful mechanisms for managing workloads, partitioning resources and implementing chargeback in complex multi-user environments
WLM Latency Based Scheduler
The new Latency Based Scheduler can substantially improve latency and throughput. Perfect for busy systems running high concurrency.
❶ Throughput: scheduling conflicts when queuing is heavy
❷ Latency: gives preference to shorter running queries
❸ GRA accuracy: minimizes bursts by predictively avoiding over-serving or under-serving specific resource groups through GRA
Short (< 2s)
- Behaves just like 7.1
- Cost estimate configurable
- SQB applies to Shorts
- Shorter latency on average
Medium (2s to 60s)
- Cost estimate configurable
- SQB is not applicable
- Selected by a blend of arrival and estimate order
- Latency metrics available from schedqueues and logs
Long
- Not configurable
- SQB is not applicable
- Minimizes bursts
- Better average latency
- Higher average throughput
- Some queries will be faster and others the same
PureData System for Analytics N3001-001: The Mini-Appliance
Bringing speed and simplicity to midsize organizations for big outcomes
Simple
- Same user experience as all PureData System for Analytics appliances
- Full function Netezza Platform Software with IBM Netezza Analytics
- Support tools and Netezza Performance Portal
- ODBC/JDBC/OLE-DB/SQL driver integration
- Load and go with no tuning or administration
Speed
- 10-100x faster than traditional custom systems [1]
Smart
- Rich set of in-database analytic functions
- Protection of all data from unauthorized access
- Includes starter kits for Big Data and Business Intelligence
Agile
- Easily incorporated into the data center with simplified installation into an existing rack
Affordable
- Purchase or lease
Solution Highlights
- Rack mountable
- Production ready
- Full function appliance
- User data capacity 16 TB*
- High availability: all redundant hardware, 4 disk spares, hot swap power supply
- Self encrypting drives, Kerberos support, LDAP/Active Directory
* Assumes 4x compression
[1] Based on IBM customers' reported results. "Traditional custom systems" refers to systems that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
PureData System for Analytics N3001-080: 8-rack System
- 1.5 PB of user data capacity [1]
- Hosts: 2x x3750 M4 and 600 GB Self Encrypting Drives
- Blades: 56x HS23 with 20-core Ivy Bridge processors
- Storage: 96 EXP2524 disk enclosures with 24x 600 GB Self Encrypting Drives
[1] Assumes 4x compression
The PureData System for Analytics N3001 Family
Specification      N3001-001     N3001-002     N3001-005     N3001-010  N3001-020  N3001-040  N3001-080
Racks              n/a, 2 x 2U   1 (1/4 full)  1 (1/2 full)  1          2          4          8
Active S-Blades    n/a           2             4             7          14         28         56
CPU cores          40            40            80            140        280        560        1,120
User data (TB) *   16            32            96            192        384        768        1,536
Single rack systems: N3001-002, N3001-005, N3001-010. Multiple rack systems: N3001-020, N3001-040, N3001-080.
Linear Scalability!
* Assuming 4x compression
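The "Linear Scalability" claim can be checked against the table values. The Python sketch below copies the blade, core and capacity figures for the rack-based models from the table above and takes the 20 cores per S-Blade from the hardware overview (two 10-core CPUs); the dictionary layout and variable names are illustrative only.

```python
# (racks, active S-Blades, CPU cores, user data in TB at 4x compression)
FAMILY = {
    "N3001-002": (0.25, 2, 40, 32),
    "N3001-005": (0.5, 4, 80, 96),
    "N3001-010": (1, 7, 140, 192),
    "N3001-020": (2, 14, 280, 384),
    "N3001-040": (4, 28, 560, 768),
    "N3001-080": (8, 56, 1120, 1536),
}
CORES_PER_BLADE = 20  # two 10-core CPUs per S-Blade

# Cores track the blade count exactly across the rack-based models.
cores_match = all(cores == blades * CORES_PER_BLADE
                  for _racks, blades, cores, _tb in FAMILY.values())

# From one full rack upward, capacity per rack stays constant.
tb_per_rack = {model: tb / racks
               for model, (racks, _blades, _cores, tb) in FAMILY.items()
               if racks >= 1}

print("cores == blades x 20 for every model:", cores_match)
for model, value in tb_per_rack.items():
    print(f"{model}: {value:.0f} TB per rack")
```

The partial-rack models are left out of the per-rack comparison because their capacity per rack differs from the full-rack figure in the table; the linear relationship holds for the full and multi-rack systems.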
Business Benefits of Simplicity
- Lower total cost of ownership (4 DBAs -> 1 part time)
- Faster delivery (no physical design)
- More flexible (no need for tuning)
- Lower risk
- Ease of Use
- Fewer mistakes
- Little Downtime
- Redundancy throughout the system
- Maintenance and updates/upgrades included in service contract and can be scheduled to meet workload demands
THINK
Customer Successes
Bon-Ton Optimizes Their Customer's Experience Using IBM PureData System for Analytics
- Understand what customers want when they walk into a Bon-Ton store
- Targeted advertising to promote products that customers want at the price they want them
- Freeing the time of Bon-Ton buyers and planners from the mundane task of gathering & compiling customer data so they can spend their time making informed decisions to drive the business
"I need some way to understand what they're thinking, what they're feeling, without having to have contact with them. PureData for Analytics is what's going to help us understand what the customers want when they walk into my stores."
- Paula Post, Vice President Merchandising Optimization
Video: https://www.youtube.com/watch?v=0gswol6gciw
Carphone Warehouse Increases Profitability Through New Revenue Streams & Reduced Costs
- Up to 1200X faster performance; reports that once took an hour to run now take seconds
- 50% reduction in time to market for new business intelligence services
"The PureData System, powered by Netezza technology, provided huge technical advantages & big business advantages. We can now insure devices on behalf of a bank in the UK, which we couldn't have done before."
- Paul Scullion, Head of Business Intelligence
Case Study: http://www-03.ibm.com/software/businesscasestudies?synkey=m183113u13038j58
eharmony Attracts New Members by Understanding Behavior and Fine-tuning Matching Algorithm
- 100% increase in subscriber base
- 96% decrease in query run times (from 1 hour to 2 minutes)
- Reduced spending on low-return promotional activities
"Through the entire subscription lifecycle, the company tracks everything members do on the website. This process generates an enormous amount of data, which would be completely wasted without the ability to extract hidden insights about how members behave."
- eharmony C-Level executive
Video: https://www.youtube.com/watch?v=_0wffnyhn8s
Canadian National Railway Company leverages the power of predictive analytics to run trains on time
- Enhanced confidence in data-driven decision-making
- Reduction in time spent on running reports: some reports that took 10-20 minutes earlier now run in 5 seconds
- Accelerated analytics for faster insight: the company is moving to near-real-time report generation, compared to monthly reports earlier
"The performance of PureData is very good, most reports we have are running in less than 5 seconds, whereas with other databases we had reports running for 10-20 minutes."
- Philippe Chartier, BI Team Lead, Information Delivery, Canadian National Railway Company
Video: https://www.youtube.com/watch?v=yyzu5sekbli
Seattle Children's Optimizes Business Intelligence & Insight into New Treatment Protocols to Enrich Patient Care
- 98% reduction in time spent on some queries
- Promotes self-service business intelligence & insights throughout the hospital
- More effective diagnosis & treatment by enabling faster, more accurate insights, on-demand
"We're getting deeper into the data in multiple ways... When we see new commonalities in treatments for children, we can design new protocols to provide the best possible care."
- Wendy Soethe, Enterprise Data Warehouse Manager
Video: https://www.youtube.com/watch?v=bjgwiectvki