Exploiting Accelerator Technologies for Online Archiving




Analytics on System z
Exploiting Accelerator Technologies for Online Archiving
Knut Stolze, Architect, IBM DB2 Analytics Accelerator
stolze@de.ibm.com

Agenda
- Introduction
- Architecture in Depth
  - Netezza Backend
  - Integrating Netezza with DB2 z/OS
- Data Synchronization
- High Performance Storage Saver

OLTP vs. Analytics Examples

OLTP - Transactional
- Withdrawal from a bank account using an ATM
- Buying a book at Amazon.com
- Check-in for a flight at the airport
- Hand-over of manufactured printers to an overseas carrier

Transactional Analytics (Operational BA)
- Approve a request to increase a credit line based on credit history and customer profile
- Propose additional books based on similar purchases by other customers
- Offer an upgrade based on the frequent-flyer history of all passengers and available seats
- Optimize shipping by selecting the cheapest and most reliable carrier on demand

Deep Analytics
- Regular reporting to the central bank: sum of transactions by account
- Which books were bestsellers in Europe over the last 2 months?
- Marketing campaign to sell more tickets in off-peak times
- Trend of printers sold in emerging countries versus established markets

IBM DB2 Analytics Accelerator for z/OS Version 3
Diagram: users/applications run a data warehouse application against DB2 for z/OS on zEnterprise, enabled for the IBM DB2 Analytics Accelerator (PureData technology). The client side is Data Studio Foundation with the DB2 Analytics Accelerator Admin Plug-in. Connectivity is an OSA-Express3 10 GbE network with primary and backup 10 Gb links.

Today's Typical Data Life Cycle Architecture
Diagram (cleanse - transform - warehouse - analyze): operational systems (OLTP, on x/p/z servers) continuously feed an ODS (RDBMS) and, via hourly/daily batch processes through a staging area, data mover and transformation server, an enterprise data warehouse (RDBMS). Departmental data marts and a bulk analytics server (data mining, segmentation, prediction, statistical and multi-dimensional analysis, scoring, rules) provide analytical foresight for optimized business processes - customer support, claims processing, underwriting, sales effectiveness, fraud management, marketing, MIS, budgeting, campaign management, financial analysis, selling platforms, customer profit analysis, CRM - plus online queries and reporting with BA tooling.

Short-Term Target Data Life Cycle Architecture
Diagram (cleanse - transform - warehouse - analyze): LPAR 1: z/OS (OLTP) runs CICS and DB2 z/OS with a continuous data feed and real-time risk calculation. LPAR 2: z/OS (ODS/EDW) holds the ODS and EDWH/data marts in DB2 z/OS, fed via ELT hourly/daily under control of DB2, with the Accelerator attached and departmental data marts built on request; DB2/SPSS provides real-time and bulk scoring and rules. LPAR 3: Linux on z hosts the MIS system, budgeting, campaign management, financial analysis, selling platforms, customer profit analysis, CRM, and ad-hoc queries and reporting with Cognos (in-memory).

Database, Data Warehousing & Business Analytics Market Segmentation

Parameter        Traditional Distributed Market     Traditional System z Market / DW & BA Market Growth
User Community   C-level management; analysts       Company management; customer service & support
                 (e.g. marketing, research)         (e.g. call centers, sales personnel); customers
                                                    (e.g. external, Web)
Number of Users  Few                                Many
Trans. Volume    Small                              Very large
Trans. Latency   Less important                     Critical
Availability     Less important                     Critical
Trans. Type      Complex                            Simple

The System z side of the market requires the highest qualities of service.

Why Both? Marrying the best of both worlds
- IBM PureData N1001: focused appliance, very focused workload
- IBM System z: mixed-workload system, very diverse workload
Capitalizing on the strengths of both platforms while driving toward the most cost-effective, centralized solution - destroying the myth that transaction and decision-support systems have to be on separate platforms.

Large Insurance Company - Business Reporting
"We had this up and running in days, with queries that ran over 1000 times faster ... we expect ROI in less than 4 months."

          Total Rows   Total Rows   DB2 Only          DB2 with IDAA   Times
          Reviewed     Returned     Hours    Sec(s)   Sec(s)          Faster
Query 1   2,813,571    853,320      2:39     9,540    5               1,908
Query 2   2,813,571    585,780      2:16     8,220    5               1,644
Query 3   8,260,214    274          1:16     4,560    6               760
Query 4   2,813,571    601,197      1:08     4,080    5               816
Query 5   3,422,765    508          0:57     4,080    70              58
Query 6   4,290,648    165          0:53     3,180    6               530
Query 7   361,521      58,236       0:51     3,120    4               780
Query 8   3,425.29     724          0:44     2,640    2               1,320
Query 9   4,130,107    137          0:42     2,520    193             13

DB2 Analytics Accelerator (PureData 1000-12):
- Production ready with 1 person in 2 days; table acceleration set up in 2 hours: add the accelerator in DB2, choose a table for acceleration, load the table (DB2 loads the data to the accelerator), knowledge transfer, query comparisons
- Initial load performance: 400 GB (570 million rows) loaded in 29 minutes, i.e. 800 GB to 1.3 TB per hour
- Extreme query acceleration: 1,908x faster (2 hours 39 minutes down to 5 seconds)
- CPU utilization reduced to 35%
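The "times faster" column is simply the DB2-only elapsed seconds divided by the accelerated elapsed seconds; a quick sketch (figures copied from the slide) confirms the arithmetic:

```python
# Verify the speedup factors: DB2-only seconds / accelerated seconds.
# Tuples are (query, db2_only_sec, idaa_sec, reported_factor), as on the slide.
results = [
    ("Query 1", 9540, 5, 1908),
    ("Query 2", 8220, 5, 1644),
    ("Query 3", 4560, 6, 760),
    ("Query 4", 4080, 5, 816),
    ("Query 5", 4080, 70, 58),
    ("Query 6", 3180, 6, 530),
    ("Query 7", 3120, 4, 780),
    ("Query 8", 2640, 2, 1320),
    ("Query 9", 2520, 193, 13),
]
for name, db2_sec, idaa_sec, reported in results:
    assert db2_sec // idaa_sec == reported, name

# The headline number: 2 hours 39 minutes down to 5 seconds is 1,908x.
assert (2 * 3600 + 39 * 60) // 5 == 1908
```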

Agenda
- Introduction
- Architecture in Depth
  - Netezza Backend
  - Integrating Netezza with DB2 z/OS
- Data Synchronization
- High Performance Storage Saver

Accelerator Powered by the PureData N1001 Appliance
- SMP hosts (accelerator server): SQL compiler, query plan, optimizer, administration; 2 front-end hosts (IBM 3650M3 or 3850X5) clustered active-passive, 2 Nehalem-EP quad-core 2.4 GHz CPUs per host
- Snippet Blades (S-Blades, SPUs): processor plus streaming DB logic; a high-performance database engine streaming joins, aggregations, sorts, etc.; e.g. TF12: 12 back-end SPUs (more details on following charts)
- Disk enclosures: EXP3000 JBOD enclosures with 12 x 3.5" 1 TB 7200 RPM SAS (3 Gb/s) drives, max 116 MB/s per disk (200-500 MB/s on compressed data); each disk holds a slice of user data plus swap and mirror partitions; e.g. TF12: 8 enclosures, 96 HDDs, 32 TB uncompressed user data (~128 TB with compression)
- High-speed data streaming and a high compression rate

The PureData S-Blade (TM) Components
- IBM BladeCenter server with Intel quad-core CPUs
- PureData DB Accelerator card with dual-core FPGAs

Snippet-Blade (S-Blade) Components
- HX5 blade (IBM BladeCenter server): 128 GB RAM, 16 Intel cores
- BPE4 side car (PureData DB Accelerator): 16 GB RAM, 16 Virtex-6 FPGA cores, SAS controller

PureData System for Analytics Models

                           N1001                            N2001
Blade Type                 HS22                             HX-5
CPU Cores / Blade          2 x 4-core Intel CPUs            2 x 8-core Intel CPUs
FPGA Cores / Blade         8 (2 x 4-engine Xilinx FPGA)     16 (2 x 8-engine Xilinx Virtex-6 FPGA)
# Disks                    96 x 3.5" 1 TB SAS (92 active)   288 x 2.5" 600 GB SAS2 (240 active)
Raw Capacity               96 TB                            172.8 TB
Total Disk Bandwidth       ~11 GB/s                         ~32 GB/s
S-Blades per Rack (cores)  14 (112)                         7 (112)
S-Blade Memory             24 GB                            128 GB
Rack Configurations        1/4, 1/2, 1, 1 1/2, 2 ... 10     1/2, 1, 2, 4
User Data / Rack *         128 TB                           192 TB

* Assuming 4x compression

The Key to the Speed

select DISTRICT, PRODUCTGRP, sum(NRX)
from MTHLY_RX_TERR_DATA
where MONTH = '20091201'
  and MARKET = 509123
  and SPECIALTY = 'GASTRO'

Each S-Blade streams its slice of the compressed table MTHLY_RX_TERR_DATA from disk; a zone map skips extents that cannot match the predicates. The FPGA core then uncompresses the data, projects the requested columns and restricts rows by the where clause (including visibility checks), and the CPU core performs the complex joins, aggregations (sum(NRX)), etc.
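The zone-map idea on this slide can be sketched in a few lines of Python. This is an illustration of the concept only, not Netezza's implementation: each extent keeps per-column min/max values, and the scan skips extents whose range cannot contain the predicate value, before any decompression happens.

```python
# Zone-map sketch: per-extent min/max lets a scan skip extents that
# cannot contain MONTH = '20091201'.  The data layout is hypothetical.
extents = [
    {"min_month": "20090101", "max_month": "20090630", "rows": 1_000_000},
    {"min_month": "20090701", "max_month": "20091130", "rows": 1_000_000},
    {"min_month": "20091201", "max_month": "20100331", "rows": 1_000_000},
]

def extents_to_scan(extents, value):
    """Return only the extents whose [min, max] range can contain value."""
    return [e for e in extents if e["min_month"] <= value <= e["max_month"]]

hit = extents_to_scan(extents, "20091201")
assert len(hit) == 1   # two of the three extents are skipped without any I/O
```

The same pruning works for any sortable column; it is most effective when data arrives roughly in key order, as time-partitioned warehouse data usually does.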

N1001 Systems and Sizes
PureData System for Analytics N1001 (just installed at Banco do Brasil)

Model               002  005  010  015  020  030  040  060  080  100
Cabinets            1/4  1/2  1    1.5  2    3    4    6    8    10
S-Blades            3    6    12   18   24   36   48   72   96   120
Processing Units    24   48   96   144  192  288  384  576  768  960
Capacity (TB)       8    16   32   48   64   96   128  192  256  320
Effective Capacity  32   64   128  192  256  384  512  768  1024 1280

Predictable, linear scalability throughout the entire family.
Capacity = user data space; effective capacity = user data space with compression (4x compression assumed).
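The "effective capacity" row is just the user-data capacity times the assumed 4x compression factor, which is easy to sanity-check:

```python
# N1001 family figures from the slide: user data capacity and effective
# capacity (with the assumed 4x compression), both in TB.
capacity_tb  = [8, 16, 32, 48, 64, 96, 128, 192, 256, 320]
effective_tb = [32, 64, 128, 192, 256, 384, 512, 768, 1024, 1280]

# Effective capacity = capacity x 4 for every model in the family.
assert all(e == 4 * c for c, e in zip(capacity_tb, effective_tb))
```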

N2001 Systems and Sizes
PureData System for Analytics N2001 (watch this space)

Model               005  010  020  040
Cabinets            1/2  1    2    4
S-Blades            4    7    14   28
Processing Units    56   112  224  448
Capacity (TB)       24   48   96   192
Effective Capacity  96   192  384  768

Predictable, linear scalability throughout the entire family.
Capacity = user data space; effective capacity = user data space with compression (4x compression assumed).

Agenda Introduction Architecture in Depth Netezza Backend Integrating Netezza with DB2 z/os Data Synchronization High Performance Storage Saver 18

IBM DB2 Analytics Accelerator for z/OS Version 3 (architecture recap)
Diagram: users/applications run a data warehouse application against DB2 for z/OS on zEnterprise, enabled for the IBM DB2 Analytics Accelerator (PureData technology); the client is Data Studio Foundation with the DB2 Analytics Accelerator Admin Plug-in, connected via an OSA-Express3 10 GbE network with primary and backup 10 Gb links.

Workload-Optimized Query Execution
DB2 for z/OS and the DB2 Analytics Accelerator form a single, unique system for mixed query workloads:
- DB2 native processing handles operational analytics: real-time data ingestion, high concurrency, advanced analytics
- The accelerator provides optimized processing for BI workloads: standard reports, OLAP, complex queries
- The most efficient execution platform is chosen dynamically, under user control and a DB2 heuristic
- Combines the strengths of System z and PureData, merging the operational and data warehouse environments into a single optimized environment
- New special register QUERY ACCELERATION with values NONE, ENABLE, and ENABLE WITH FAILBACK
- New heuristic in the DB2 optimizer
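The effect of the special register can be pictured with a toy model. This is purely illustrative: DB2's real heuristic weighs cost estimates and eligibility rules, and the register has more values than shown here; the point is only that NONE disables routing and that FAILBACK governs what happens when the accelerator fails early on an accelerated query.

```python
def route(eligible_and_advantageous: bool, register: str) -> str:
    """Toy model of query routing under the QUERY ACCELERATION register."""
    if register == "NONE" or not eligible_and_advantageous:
        return "db2-native"
    return "accelerator"

def on_accelerator_failure(register: str) -> str:
    """What happens if the accelerator fails early during an accelerated query."""
    if register == "ENABLE WITH FAILBACK":
        return "db2-native"   # query transparently falls back to DB2
    return "sql-error"        # plain ENABLE surfaces the failure to the app

assert route(True, "ENABLE") == "accelerator"
assert route(True, "NONE") == "db2-native"
assert route(False, "ENABLE") == "db2-native"
assert on_accelerator_failure("ENABLE WITH FAILBACK") == "db2-native"
assert on_accelerator_failure("ENABLE") == "sql-error"
```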

Topology M:N
Diagram: multiple DB2 Analytics Accelerator GUIs administer multiple System z CECs (zEC12, z196 or z114) attached to multiple accelerators - e.g. LPAR T1 (DB2 data sharing group DEV, SSID DBD1), LPAR T2 (group TEST, SSIDs DBT1 and DBT2), LPARs P1/P2 (group PROD, SSID DBP1; plus DB2 SSID DB2A) connected to IBM DB2 Analytics Accelerator 1 and 2.
- Multiple accelerators: the same tables can reside in multiple accelerators for scalability and availability
- Resource management: an accelerator appliance CANNOT be partitioned, but it can be shared; structures/objects owned by one DB2 are not visible to other DB2s; the customer console permits resource-priority assignments for the different attached DB2 subsystems, e.g. a system-test DB2 can be configured with higher guaranteed resources than development

Deep DB2 Integration within zEnterprise
Applications and DBA tools (z/OS console, ...) use the standard application interfaces (SQL dialects) and operational interfaces (e.g. DB2 commands) of DB2 for z/OS (data manager, buffer manager, IRLM, log manager, ...). The IBM DB2 Analytics Accelerator sits behind DB2: z/OS on System z contributes superior availability, reliability, security and workload management, while PureData contributes superior performance on analytic queries.

Query Execution Process Flow
A query enters DB2 for z/OS through the application interface and is examined by the optimizer. Queries that cannot be or should not be off-loaded run in DB2's own query execution run-time on the host CPU. Off-loaded queries are sent through the accelerator's DRDA requestor to the SMP host, which distributes the work across the SPUs (each with CPU, FPGA and memory) of the DB2 Analytics Accelerator.

Integrating PureData Functionality in DB2
The Accelerator GUI is able to receive the PureData plan files for query executions that happened on the accelerator side. These files are parsed and embedded into DB2 Visual Explain. Distribution and organizing keys can be altered on the fly based on the Explain output; the accelerator then redistributes the table data in the background.

Agenda
- Introduction
- Architecture in Depth
  - Netezza Backend
  - Integrating Netezza with DB2 z/OS
- Data Synchronization
- High Performance Storage Saver

Synchronization Options with IBM DB2 Analytics Accelerator

Full Table Refresh
- The entire content of a database table is refreshed for accelerator processing
- Use cases and characteristics: an existing ETL process replaces the entire table; multiple sources or complex transformations; smaller, un-partitioned tables; reporting based on a consistent snapshot

Table Partition Refresh
- For a partitioned database table, selected partitions can be refreshed for accelerator processing
- Use cases and characteristics: optimization for partitioned warehouse tables, typically appending changes at the end; more efficient than a full table refresh for larger tables; reporting based on a consistent snapshot

Incremental Update
- Log-based capturing of changes and propagation to the IBM DB2 Analytics Accelerator with low latency (typically a minute)
- Use cases and characteristics: scattered updates after a bulk load; reporting on continuously updated data (e.g. an ODS), considering the most recent changes; more efficient than a full table refresh for smaller updates
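The three options map naturally onto table characteristics. A small decision helper makes the trade-off explicit; this is an illustrative reading of the slide's use cases, not shipped tooling:

```python
def refresh_strategy(partitioned: bool, changes: str) -> str:
    """Pick a synchronization option per the slide's use cases.

    changes is one of:
      'bulk-replace'  - an ETL process rebuilds the whole table
      'append-recent' - only the latest partitions change
      'scattered'     - trickle-feed updates can land anywhere
    Illustrative mapping only.
    """
    if changes == "scattered":
        return "incremental update"
    if changes == "append-recent" and partitioned:
        return "table partition refresh"
    return "full table refresh"

assert refresh_strategy(False, "bulk-replace") == "full table refresh"
assert refresh_strategy(True, "append-recent") == "table partition refresh"
assert refresh_strategy(True, "scattered") == "incremental update"
```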

Option 1: Full Table Refresh
- Changes in data warehouse tables are typically driven by a scheduled (nightly or more frequent) ETL process
- Data is used for complex reporting based on consistent and validated content (e.g. weekly transaction reporting to the central bank)
- Multiple sources or complex transformations prevent propagation of incremental changes
- A full table refresh is triggered through a DB2 stored procedure (scheduled, integrated into the ETL process, or through the GUI)
- Queries may continue during a full table refresh for the accelerator: while the ETL process replaces the table contents in the DB2 for z/OS database, the DB2 z/OS query optimizer keeps routing between DB2 native processing (operational analytics) and accelerator processing (reports, OLAP, complex queries)

Accelerator Data Load
- Accelerator Studio invokes the accelerator administrative stored procedures in DB2 for z/OS
- Table partitions (tables A, B, C, D; partitions 1 through m) are unloaded in parallel, each through its own USS pipe, and streamed to the accelerator, where a coordinator distributes the data to the SPUs (CPU/FPGA/memory)
- Throughput of ~1 TB/h; can vary depending on CPU resources and table partitioning
- Updates happen at table partition level; concurrent queries are allowed during the load
- V2.1 and V3 unload in DB2 internal format, with a single translation done by the accelerator
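The quoted load numbers are mutually consistent: 400 GB in 29 minutes (from the insurance-company slide) works out to roughly 0.8 TB per hour, at the low end of the stated 800 GB to 1.3 TB/h range.

```python
# Sanity-check the load throughput figures quoted in this deck.
gb_loaded = 400
minutes = 29

gb_per_hour = gb_loaded / (minutes / 60)    # ~828 GB/h
assert 800 <= round(gb_per_hour) <= 1300    # within the stated range

# 570 million rows in the same 29 minutes is ~19.7 million rows per minute.
rows_per_minute = 570_000_000 / minutes
assert 19_000_000 < rows_per_minute < 20_000_000
```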

Option 2: Table Partition Refresh
- Changes in a data warehouse table are typically driven by a delta ETL process (considering only changes in the source tables compared to previous runs) or by more frequent changes to the most recent data
- An optimization of Option 1 for when the target table is partitioned and the most recent updates apply only to the latest partition (e.g. partitions January through May, with only May being refreshed)
- A table partition refresh is triggered through a DB2 stored procedure (scheduled, integrated into the ETL process, or through the GUI)
- Maintains snapshot semantics for consistent reports; queries may continue during a table partition refresh for the accelerator, with the DB2 z/OS query optimizer routing between DB2 native processing and accelerator processing

Option 3: Incremental Update
- Changes in data warehouse tables are typically driven by replication or manual updates: corrections after a bulk ETL load, or continuously changing data (e.g. trickle-feed updates from a transactional system to an ODS)
- Enables reporting and analysis based on the most recent data
- May be combined with Options 1 and 2 (first a table refresh, then continue with incremental updates)
- Incremental update can be configured per database table

Agenda
- Introduction
- Architecture in Depth
  - Netezza Backend
  - Integrating Netezza with DB2 z/OS
- Data Synchronization
- High Performance Storage Saver

IBM DB2 Analytics Accelerator for z/OS Version 3 (architecture recap)
Diagram: users/applications run a data warehouse application against DB2 for z/OS on zEnterprise, enabled for the IBM DB2 Analytics Accelerator (PureData technology); the client is Data Studio Foundation with the DB2 Analytics Accelerator Admin Plug-in, connected via an OSA-Express3 10 GbE network with primary and backup 10 Gb links.

Requirement
- Most of the data in an ODS or EDW is static: the large tables are partitioned by time, older partitions are never changed, and only the most recent partition is frequently changed
- Many DBMS vendors provide multi-temperature data solutions; the level of sophistication varies. The industry-leading solutions are so-called 'near-line storage servers' ('near-line' meaning 'near-online'). Their value proposition is twofold: move less frequently accessed data to cheaper storage, and improve performance for both queries and administrative operations that access more recent data. The drawback is degraded performance of analytical queries that access old data.
- A better solution is needed if the query access pattern includes both transactional access (limited amounts of data, predominantly from the most recent partition) and analytical access (large amounts of data across all partitions)
- IDAA can offer such a solution: an online storage server, as opposed to a near-line storage server
  - Netezza provides very large disk capacity at a fraction of the cost of the System z disk subsystem; e.g. a TF12 typically holds more than 100 TB of user data, and Cruiser even more than that
  - IDAA technology provides the basis for transparent access to data irrespective of where it resides (on DB2 disks or Netezza disks)
  - DB2 with IDAA is a hybrid DBMS that supports both transactional and analytical access patterns

Basic Proposal
Combine old data in IDAA with current data in the DB2 database. The DB2 for z/OS table Sales History keeps only the most recent months (e.g. 08/2011, current, and 07/2011); new data continues to arrive in DB2 and changes are propagated to IDAA. The Netezza table Sales History in IDAA holds all months (08/2011 back through 04/2011 and earlier); older months are moved to IDAA, with a back-up taken on the DB2 side.

Save Over 95% of Host Disk Space for Historical Data
With seven years of data in quarterly partitions (years 0 through -6, quarters 1Q-4Q) and only the current quarter kept on host disk:
- One quarter = 3.57% of 7 years of data
- One month = 1.19% of 7 years of data
- One month = 2.78% of 3 years of data
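The percentages follow directly from the partition counts; note that the original slide prints 1.12% for one month of seven years, which is presumably a typo for 1/84 = 1.19%.

```python
# Fraction of the data kept on host disk when only the newest range stays in DB2.
quarters_in_7_years = 7 * 4    # 28
months_in_7_years = 7 * 12     # 84
months_in_3_years = 3 * 12     # 36

assert round(100 / quarters_in_7_years, 2) == 3.57   # one quarter of 7 years
assert round(100 / months_in_7_years, 2) == 1.19     # one month of 7 years
assert round(100 / months_in_3_years, 2) == 2.78     # one month of 3 years

# Keeping one quarter of seven years saves over 96% of host disk space.
assert 100 - 100 / quarters_in_7_years > 95
```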

High Performance Storage Saver
Reducing the cost of high-speed storage for time-partitioned tables where:
- only the recent partitions are used in a transactional context (frequent data changes, short-running queries), and
- the entire table is used for analytics (data-intensive, complex queries).

High Performance Storage Saver's archive process:
1. Data is loaded into the accelerator, if not already loaded
2. An image copy of each partition to be archived is taken automatically
3. Data is automatically removed from the archived DB2 tablespace partitions
4. The DBA starts the archived partitions as read-only
A query from an application is answered either from DB2 (part #1, the active partition) or from the accelerator (parts #1 through #7, active plus archive); the archived partitions are no longer present on DB2 storage.

Storage Options to Match Data Needs
Optimized in both price and performance for differing workloads - the right mix of cost and functionality:

High Performance Storage Saver (single disk store)
- Stored only on accelerator storage (less cost)
- Optimized performance for deep analytics, multi-faceted reporting and complex queries
- Only full table update or full partition update from backup
- Same high-speed query access, transparently through DB2

Database-Resident Partitions (dual disk store)
- Stored on both DB2 and accelerator storage
- Mixed query workload: transactions, single-record queries and record updates alongside deep analytics, multi-faceted reporting and complex queries
- Full table update, full partition update, or incremental update from DB2 data
- Same high-speed query access, transparently through DB2

Disclaimer
HPSS has specific semantics that are new to DB2, and users need to familiarize themselves with them in order to ensure proper use. Failure to do so can have severe consequences, including loss of data and integrity exposures. The key characteristic is that some data no longer resides in DB2; therefore any operation that is not supported by the IBM DB2 Analytics Accelerator, such as data- or schema-modifying SQL, can have unpredictable and undesired consequences. So please read the documentation (Usage Guide) carefully!

Initial Situation
Table X exists twice: in DB2 (partitions 1 through n) and in IDAA (the same partitions, loaded). For each SELECT ... FROM X, DB2 decides whether to route the query to IDAA or to run it natively. Backups exist for partitions 1 through n.

Supplied Stored Procedure Encapsulates the Data Move
The partitions to be moved are first backed up, the old partitions are then deleted from DB2, and table X is split between DB2 and IDAA.

The application CALLs the stored procedure ACCEL_ARCHIVE_TABLES with a partitions specification. The 'partitions specification' states which tables and which partitions should be moved to IDAA; in this particular example, only the last two partitions, n and n-1, of table X stay in DB2.

Schema information (partition boundaries) for the old partitions is still present in the DB2 catalog, but the partitions are empty and their disk space use is limited to the primary allocation quantity, which can be made very small.

Applications Have Transparent Access to the Table
A SELECT ... FROM X is either answered by DB2 alone (partitions n and n-1) or routed to IDAA, where it runs as the UNION of the current partitions of table X and the archived table-X complement (partitions 1 through n-2). Routing is controlled by:
(1) a zparm, set once with global scope, without application changes, or
(2) the special register, set within the application, which allows changing the scope on a per-statement level.
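Conceptually, an accelerated query against an archived table sees the union of the partitions still in DB2 and the archived complement on the accelerator. A toy model (illustrative only, not DB2 internals; the data is hypothetical):

```python
# Toy model: table X split between DB2 (recent partitions) and the
# accelerator (archived complement).  Values stand in for rows.
db2_partitions = {"part_n": [2023, 2024], "part_n_minus_1": [2021, 2022]}
archived_partitions = {"part_1": [2015], "part_2": [2016, 2017]}

def select_all(include_archive: bool):
    """An accelerated SELECT sees the DB2 data UNION the archived complement."""
    rows = [r for part in db2_partitions.values() for r in part]
    if include_archive:
        rows += [r for part in archived_partitions.values() for r in part]
    return sorted(rows)

assert select_all(include_archive=False) == [2021, 2022, 2023, 2024]
assert select_all(include_archive=True) == [2015, 2016, 2017, 2021, 2022, 2023, 2024]
```

The application's SQL does not change; only the routing setting (zparm or special register) determines whether the archived complement participates.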


Moving Partitions to IDAA via GUI (screenshot)

ACCEL_ARCHIVE_TABLES

<?xml version="1.0" encoding="UTF-8"?>
<dwa:tableSetForArchiving xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0">
  <table name="SALES" schema="BCKE">
    <!-- explicitly specified logical partition numbers -->
    <partitions>1,5:10,20</partitions>
  </table>
  <table name="CUSTOMER" schema="BCKE">
    <partitionSelectionPredicate>LIMITKEY &lt; CURRENT DATE - 3 MONTHS</partitionSelectionPredicate>
  </table>
  <table name="ORDER2009" schema="BCKE">
  </table>
</dwa:tableSetForArchiving>
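Since the archiving specification is plain XML, it is easy to generate programmatically. The sketch below builds a minimal document with Python's standard library; the tag names follow the sample on this slide, but treat the exact schema as the province of the product documentation:

```python
import xml.etree.ElementTree as ET

# Namespace as shown in the slide's sample document.
DWA = "http://www.ibm.com/xmlns/prod/dwa/2011"
ET.register_namespace("dwa", DWA)

# Build: one table, archiving logical partitions 1, 5-10 and 20.
root = ET.Element(f"{{{DWA}}}tableSetForArchiving", version="1.0")
table = ET.SubElement(root, "table", name="SALES", schema="BCKE")
parts = ET.SubElement(table, "partitions")
parts.text = "1,5:10,20"

doc = ET.tostring(root, encoding="unicode")
assert "tableSetForArchiving" in doc
assert "<partitions>1,5:10,20</partitions>" in doc
```

Generating the document this way (rather than string concatenation) gets XML escaping for free, which matters for predicates containing `<`.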

ACCEL_RESTORE_ARCHIVE_TABLES

<?xml version="1.0" encoding="UTF-8"?>
<dwa:tableSetForRestoreArchiving xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0">
  <table name="SALES" schema="BCKE">
    <partitions>1,5:10,20</partitions>
  </table>
  <table name="CUSTOMER" schema="BCKE">
    <partitions>1,2,3,4,5,6,7</partitions>
  </table>
</dwa:tableSetForRestoreArchiving>

ACCEL_GET_TABLES_INFO

<?xml version="1.0" encoding="UTF-8"?>
<dwa:tableInformation xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.1">
  <table schema="BCKE" name="SALES">
    <status loadStatus="Loaded" accelerationStatus="true" integrityStatus="Unimpaired" archiveEnabled="true" />
    <statistics usedDiskSpaceInMB="1" rowCount="2" usedArchiveDiskSpaceInMB="100" archiveRowCount="10000"
                skew="0.3" organizedPercent="95.00" lastLoadTimestamp="2012-01-09T11:53:27.997141" />
  </table>
</dwa:tableInformation>

ACCEL_GET_TABLES_DETAILS

<?xml version="1.0" encoding="UTF-8"?>
<dwa:tableSetChanges xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0">
  <table name="TEST_BY_RANGE" schema="MYDWA" allArchivedPartitionsKept="true">
    <partInformation type="BY_RANGE">
      <column name="COL1"/>
    </partInformation>
    <part dbmsPartNr="2" logicalPartNr="1" endingAt="2011-10-31">
      <archiveInformation timestamp="2012-03-26T17:27:00.000000" dataSizeInMB="13765"
                          specification="DATE(LIMITKEY) &lt;= '2012-03-28'" />
    </part>
    <part dbmsPartNr="3" logicalPartNr="2" endingAt="2011-11-30">
      <changeInformation category="NONE" ... />
    </part>
    <part dbmsPartNr="4" logicalPartNr="3" endingAt="2011-12-31">
      <archiveInformation timestamp="2012-03-26T17:27:00.000000" dataSizeInMB="135" />
      <changeInformation category="UNKNOWN" ... />
    </part>
    <part dbmsPartNr="1" logicalPartNr="4" endingAt="2011-01-31">
      <changeInformation category="RELOAD_REQUIRED" ... />
    </part>
  </table>
  <table name="TEST_UNPARTITIONED" schema="MYDWA">
    <changeInformation category="UNKNOWN" ... />
  </table>
</dwa:tableSetChanges>

Limitations (Enforced by IDAA)
- Only tables partitioned by range can be archived in the accelerator
- A table involved as a parent in a referential integrity constraint managed by DB2 cannot be archived in the accelerator
- A table that includes any of the data types BLOB, CLOB, DBCLOB, or XML cannot be archived in the accelerator
- Restoring individual partitions back to DB2 is not supported
- A table can be archived to one accelerator only
- Some frequently used COPY utility options cannot be specified
- The smallest unit for archiving is a partition
- HPSS inherits all limitations of the IBM DB2 Analytics Accelerator, e.g. no support for static SQL
- Pruning the data of an archived partition places an exclusive lock on the partition as part of running the LOAD REPLACE utility
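The enforced limitations amount to a per-table eligibility check. A condensed sketch, covering only the rules listed above (the function and its parameters are illustrative, not a product API):

```python
# Data types that disqualify a table from archiving, per the slide.
UNSUPPORTED_TYPES = {"BLOB", "CLOB", "DBCLOB", "XML"}

def archive_eligible(partitioned_by_range, is_ri_parent_in_db2, column_types):
    """Sketch of the IDAA-enforced archiving eligibility rules."""
    if not partitioned_by_range:
        return False                          # only range-partitioned tables
    if is_ri_parent_in_db2:
        return False                          # no DB2-managed RI parents
    if UNSUPPORTED_TYPES & set(column_types):
        return False                          # no LOB or XML columns
    return True

assert archive_eligible(True, False, ["INTEGER", "VARCHAR"])
assert not archive_eligible(False, False, ["INTEGER"])
assert not archive_eligible(True, True, ["INTEGER"])
assert not archive_eligible(True, False, ["INTEGER", "XML"])
```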

Limitations (NOT Enforced by IDAA)
- Data-modifying operations (insert, delete, update, merge, load) on a table archived in the accelerator are not prevented; the customer has to guarantee not to insert or modify data in already-archived partitions
- Schema-modifying operations (DDL) are not prevented for a table archived in the accelerator, but the archived data is then no longer available for queries
- Schema-modifying operations (DDL) are not prevented for the tablespace on which a table archived in the accelerator is defined
- A table involved as a parent in a referential integrity constraint managed outside of DB2 (i.e. not known to DB2) can be archived in the accelerator
- A table that includes columns with the data types DECFLOAT, ROWID, TIMESTAMP with a scale other than 6, or user-defined data types can be archived in the accelerator, but the archived data of those columns cannot be queried
- MODIFY RECOVERY can erase catalog records about the image copies taken at table-archiving time
- The high-level qualifier (HLQ) for the image copy data sets created by the ACCEL_ARCHIVE_TABLES stored procedure must identify system-managed storage (SMS) data sets
- It is undefined whether queries running concurrently with partitions being archived include the data currently being archived; this applies to queries running in DB2 as well as queries routed to the accelerator

You want details? Sorry, still too soon.
