Archiving Data with the IBM InfoSphere Optim Data Growth Solution
Heidrun Wietzorek, wietzo@de.ibm.com
July 1, 2011
Agenda
- Reasons for archiving data
- Approach
  - Identify the data
  - Archive
  - Delete
  - Restore
  - Manage
- Access to archived data
- Planning considerations
- Standard product vs. in-house solution
- Summary
Creating More Data Than You Can Handle?
- Just add hardware?
- 50-70% of records are typically kept past retention requirements
- Forrester (*) estimates that, on average, data repositories for large applications grow by 50% annually (structured data) and that 85% of data stored in databases is inactive
* Source: Noel Yuhanna, Forrester Research, Database Archiving Remains An Important Part Of Enterprise DBMS Strategy
Perspective on: Value, Users, Data, Infrastructure & SLA
[Figure: timeline of data age (new data, 90 days old, 1 year old, 4 years old, n years old) with SLAs ranging from <1 sec to <1 week, spanning application access and reporting access]
Data Multiplier Effect
Actual data burden = size of production database + all replicated clones
- 200 GB Production
- 200 GB Test
- 200 GB Backup
- 200 GB Development
- 200 GB Disaster Recovery
- 200 GB Quality Control
- 1200 GB Total
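The arithmetic behind the multiplier can be sketched in a few lines of Python. The function names and the 50% archived fraction are illustrative assumptions, not figures from the slide; the sketch also assumes every clone shrinks with production and ignores the space the archive itself occupies.

```python
# Environments listed on the slide; each is assumed to hold a full-size copy.
copies = ["Production", "Test", "Backup", "Development",
          "Disaster Recovery", "Quality Control"]


def total_burden_gb(production_gb: float, copy_names: list) -> float:
    """Total storage if every listed environment holds a full-size copy."""
    return production_gb * len(copy_names)


def burden_after_archiving(production_gb: float, copy_names: list,
                           archived_fraction: float) -> float:
    """Total storage after a fraction of production data is archived.

    Assumption: every clone is rebuilt from the smaller production database.
    """
    remaining_gb = production_gb * (1.0 - archived_fraction)
    return remaining_gb * len(copy_names)


print(total_burden_gb(200, copies))              # 1200, as on the slide
print(burden_after_archiving(200, copies, 0.5))  # 600.0 with the assumed 50%
```

Every gigabyte removed from production is removed once per clone, which is why archiving reduces far more storage than the size of the archived data alone.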
A Definition of Archiving
Archiving is an intelligent process for placing inactive or infrequently accessed data that still has business value on the right tier of storage, with the right class of service, while maintaining search and retrieval capability during a specified retention period.
How Does Archiving Improve Performance?
- Improved availability
  - No downtime caused by batch process overruns
  - Uptime during crunch time
  - Meet SLAs
- Faster backup and recovery
  - Bring up important/recent data first
  - Bring up older/reference data as conditions permit
- Improved application performance
  - One of the most understated benefits of archiving
  - The longest-lasting benefit
How Can I Save Money by Archiving Data?
- Storage
  - Production-level data is typically stored on the most expensive media
  - Migrate and store data according to its evolving business value (ILM)
  - Use tiered storage strategies to maximize cost efficiencies
  - Utilize the storage you already have (including tape!)
- Administrative costs of data management
  - Software license fees
  - Hardware costs
  - Labor to manage data growth: DBA, system admin, storage admin
- Reduction in processor upgrades
  - More MIPS/processors are required to process large data repositories
IBM Optim Archiving Process
- Discover: identify and define relationships; create business object
- Archive/Manage: archive/restore; apply/remove legal hold; expire/destroy; secure; administer
- Present: simple reporting; enterprise reporting; web mashup; records manager
Identify the Data to Be Archived
Access Definition: defines a subset of relational data
- Start table
- Associated data
- Relationships
- Extraction rules
- Index specifications
[Figure: related tables CUSTOMERS, ORDERS, ITEMS, DETAILS]
Complete Business Object: Discover the Relationships
- Automatically discovers relationships (primary and foreign keys), even when they are enforced in the application layer or exist across heterogeneous databases
- Groups related tables into business objects
- Single click to create a consistent sample set across business objects
- Additional transformation and complex business rules discovery
- Optim unique feature
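To make the idea of discovering relationships that are not declared in the database more concrete, here is a minimal sketch of one common heuristic: a column is a foreign-key candidate if all of its values appear in a unique column of another table. This is an illustration only, not Optim's or InfoSphere Discovery's actual method; the function name and the sample tables are hypothetical.

```python
def candidate_foreign_keys(tables: dict) -> list:
    """Return (child_table, child_col, parent_table, parent_col) candidates.

    `tables` maps table name -> {column name -> list of values}.
    A candidate is reported when every child value occurs in a parent
    column whose values are unique (a possible primary key).
    """
    found = []
    for p_table, p_cols in tables.items():
        for p_col, p_vals in p_cols.items():
            if not p_vals or len(set(p_vals)) != len(p_vals):
                continue  # parent column must be non-empty and unique
            parent_set = set(p_vals)
            for c_table, c_cols in tables.items():
                if c_table == p_table:
                    continue
                for c_col, c_vals in c_cols.items():
                    if c_vals and set(c_vals) <= parent_set:
                        found.append((c_table, c_col, p_table, p_col))
    return found


# Hypothetical sample data: no foreign key is declared anywhere.
sample = {
    "CUSTOMERS": {"CUST_ID": [1, 2, 3]},
    "ORDERS": {"ORDER_ID": [10, 11, 12, 13], "CUST_ID": [1, 1, 3, 2]},
}
print(candidate_foreign_keys(sample))
# [('ORDERS', 'CUST_ID', 'CUSTOMERS', 'CUST_ID')]
```

A value-inclusion check like this can produce false positives on small or coincidentally overlapping data, so its results are candidates to review rather than confirmed relationships.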
Optim Captures the Complete Business Object
The archive:
- Represents application data records: payment, invoice, customer
- Referentially intact subset of data across related tables and applications; includes metadata, DDL, reference and transaction data
- Business view: historical reference snapshot of business activity
- DBA view: referentially intact subset of data; related LUW files or documents
Benefits:
- Get rid of the application and related costs
- Restore completely to any platform
- Referential integrity: leveraged for audits, reports, and purge
Properties: complete data, RI preserved, OS independent, DB independent, ODBC accessible, federated access to data and metadata
[Figure: sources Oracle, DB2, Sybase, Adabas]
Complete Business Object: Self-Describing Object
Immutable archive object, accessible through ANSI SQL or as XML
Content:
- Original DDL and DBMS objects
- Business object definition
- Local metadata
- Compressed raw data in tables
- Compressed documents
- Default storage strategy
- Default retention strategy
- Security signature
Reason/benefit:
- Defines/rebuilds source/new databases
- Retains extract/delete strategy
- Enables/describes data access
- Saves up to 90% of space
- Helps meet compliance requirements
Archive objects are self-describing objects that can be accessed via ODBC/JDBC with ANSI SQL-92 or rendered as XML structures.
Defining the Archive Process Steps
[Figure: PRODDB (Customers, Orders, Details) and the Archive, connected by 1 - ARCHIVE, 2 - DELETE, 3 - RESTORE]
Archiving/delete steps:
- Identify the data to archive
- Define the data to delete
- Select Archive File storage
- Choose a delete method
- Run Archive Request
- Create Delete Request if deferred
- Run Delete Request if deferred
Restoring steps:
- Locate Archive File
- Create a Restore Request
- Run Restore Request
Defining the Archive (continued): The Start Table
- May start the archive with a child or parent table
[Figure: Customers, Orders, Items, Details]
- Archive: all ORDERS older than four years and the related data in the other tables
- Delete: from ORDERS and DETAILS only
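The archive rule on this slide (archive old ORDERS plus related rows, but delete only from ORDERS and DETAILS) can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not how Optim works internally: the schema, column names, sample rows, and the plain dictionary standing in for an Archive File are all assumptions.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with hypothetical sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER,
                             order_date TEXT);
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE details (order_id INTEGER, item_id INTEGER, qty INTEGER);
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, '2005-03-01'), (11, 1, '2010-06-15'),
                                  (12, 2, '2006-01-20');
        INSERT INTO items VALUES (100, 'Widget'), (101, 'Gadget');
        INSERT INTO details VALUES (10, 100, 5), (11, 101, 1),
                                   (12, 100, 2), (12, 101, 3);
    """)
    return con


def archive_old_orders(con: sqlite3.Connection, cutoff: str) -> dict:
    """Copy orders before `cutoff` and related rows, then delete selectively."""
    p = {"cutoff": cutoff}
    old_ids = "SELECT order_id FROM orders WHERE order_date < :cutoff"
    archive = {
        "orders": con.execute(
            "SELECT * FROM orders WHERE order_date < :cutoff", p).fetchall(),
        "details": con.execute(
            f"SELECT * FROM details WHERE order_id IN ({old_ids})", p).fetchall(),
        # Parent and reference rows are archived but stay in production.
        "customers": con.execute(
            "SELECT * FROM customers WHERE cust_id IN "
            "(SELECT cust_id FROM orders WHERE order_date < :cutoff)",
            p).fetchall(),
        "items": con.execute(
            "SELECT * FROM items WHERE item_id IN "
            f"(SELECT item_id FROM details WHERE order_id IN ({old_ids}))",
            p).fetchall(),
    }
    with con:  # one transaction; children are deleted before their parents
        con.execute(f"DELETE FROM details WHERE order_id IN ({old_ids})", p)
        con.execute("DELETE FROM orders WHERE order_date < :cutoff", p)
    return archive


con = build_demo_db()
archive = archive_old_orders(con, "2007-07-01")  # four years before July 2011
print(len(archive["orders"]), len(archive["details"]))  # 2 3
```

Because CUSTOMERS and ITEMS rows are copied but not deleted, the archive stays referentially complete while production keeps the reference data that newer orders still need.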
Defining the Archive (continued): The Table List
- Identify the Start Table
- Populate the list with the related tables
- Include selection criteria
- Indicate which tables will have rows deleted
- Specify archive actions and indexes
Defining the Archive (continued): Relationship Usage
- Select relationship paths
  - Use relationships defined in the RDBMS catalog, or
  - Create new relationships that are stored in the Optim Directory
- Designate relationship traversal
- Limit number of child rows archived
- Specify Access Method / Key Lookup Limit
Defining the Archive (continued): Relationship Traversal
[Figure: Customers, Orders, Items, Details, Parts, Back Orders, with traversal paths labeled Option 1 and Option 2]
- Option 1: only ITEMS that are parents of DETAILS
- Option 2: all other DETAILS for those ITEMS; each of the PARTS for those ITEMS
Defining the Archive (continued): Show the Archive Steps
Shows the following:
- Steps required to perform the archive
- Cycles processed
- Any untraversed tables
Defining the Archive (continued): Archive Parameters
[Figure: Customers, Orders, Details with Point and Shoot; Archive File, Archive Index File, Process Report; Optim Repository]
- Delete records immediately after archive or defer (performance optimization)
- Archive both data and object definitions
- Execute online or batch
Run the Archive Request: Online or Batch
[Figure: PRODDB (Customers, Orders, Details) with Archive and Delete; Optim Repository, Online Archive Library/Database, Offline Archive Library, DBMS load files, CSV/text files]
Archive Process: Delete the Archived Data
[Figure: PRODDB and Archive File, each with Customers, Orders, Details; Control File]
- Delete is automatic after a successful archive, or can be deferred until after archive verification
- Delete specifications define which data to delete
- Control File enables retry/restart of the delete
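The role of a control file in retry/restart can be illustrated with a small checkpointing sketch. It assumes nothing about the real Control File format: the `control` dictionary, the `next_index` key, and the simulated failure are hypothetical stand-ins.

```python
from typing import Optional


def run_delete(keys_to_delete: list, live_rows: set, control: dict,
               fail_at: Optional[int] = None) -> None:
    """Delete keys in order, checkpointing progress in `control`.

    A rerun with the same `control` resumes after the last completed key
    instead of starting over. `fail_at` simulates an interruption.
    """
    start = control.get("next_index", 0)
    for i in range(start, len(keys_to_delete)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at index {i}")
        live_rows.discard(keys_to_delete[i])
        control["next_index"] = i + 1  # checkpoint after each completed delete


live = {1, 2, 3, 4, 5}
to_delete = [1, 2, 3, 4]
control = {}

try:
    run_delete(to_delete, live, control, fail_at=2)  # interrupted run
except RuntimeError:
    pass
print(sorted(live), control)  # [3, 4, 5] {'next_index': 2}

run_delete(to_delete, live, control)  # restart resumes at index 2
print(sorted(live), control)  # [5] {'next_index': 4}
```

In a real system the checkpoint would have to be persisted durably, and in step with the database commit, for the restart to be safe.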
Archive Process (continued): Delete the Archived Data - z/OS Offline Delete Method
[Figure: Archive File and PRODDB (Customers, Orders, Details); Unload, OFFLINE DELETE, New Unload, LOAD into PRODDB]
- Incorporate delete into normal database maintenance procedures
- Delete specifications define which data to delete
- Eliminates impact of logging during delete
Restoring Archived Data
[Figure: Archive Library, data to restore, optional metadata mapping, production/archive database PRODDB (Customers, Orders, Details); Optim Repository; research/browse]
Restoring Archived Data: Table Map
- Map unlike table names and qualifiers
- Exclude individual tables from restore
- Can be saved in the Optim Directory
Restoring Archived Data (continued): Column Map
Column Maps allow:
- Mapping unlike column names
- Datatype conversions
- Populating new destination columns
Column Map options: literals, special registers, expressions, default values, user exits
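What a column map does during restore can be sketched as a mapping from destination columns to rules. This is a conceptual illustration, not Optim's Column Map syntax: the rule representation (a source column name or a Python callable) and all column names are assumptions.

```python
def apply_column_map(row: dict, column_map: dict) -> dict:
    """Build a destination row from an archived row.

    Each rule is either a source column name (rename) or a callable that
    receives the whole source row (conversion, expression, or literal).
    """
    out = {}
    for dest_col, rule in column_map.items():
        out[dest_col] = rule(row) if callable(rule) else row[rule]
    return out


# Hypothetical map covering the three uses named on the slide.
column_map = {
    "CUSTOMER_ID": "CUST_ID",                             # unlike column names
    "ORDER_TOTAL": lambda r: float(r["TOTAL"]),           # datatype conversion
    "FULL_NAME": lambda r: f'{r["FIRST"]} {r["LAST"]}',   # expression
    "SOURCE": lambda r: "ARCHIVE",                        # literal, new column
}

archived_row = {"CUST_ID": 7, "TOTAL": "19.90", "FIRST": "Ada", "LAST": "Lovelace"}
print(apply_column_map(archived_row, column_map))
# {'CUSTOMER_ID': 7, 'ORDER_TOTAL': 19.9, 'FULL_NAME': 'Ada Lovelace', 'SOURCE': 'ARCHIVE'}
```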
Restoring Archived Data (continued): Control File
[Figure: Archive File and Archive Index File; Insert, Load, Update into Customers, Orders, Details; Process Report; Control File]
If errors occur during RESTORE:
- BROWSE the control file for error information
- RETRY/RESTART the RESTORE process
Process Report / Control File: statistical information, error information
The Archive Directory - Managing Your Archived Data
- Search the archive directory by group name, date, table, or column
- Apply search criteria (e.g., specific CUST_ID)
- Create directory information reports
- Delete old, unwanted archive files
- Register archive files after their relocation
The Archive Directory (continued): Archive Indexes
- Enable rapid searches for archived data
- Defined in the Access Definition or can be added later
- Two index types:
  - Sparse: only high/low column values are stored in the Archive Directory. Useful to locate candidate archive files in the directory during a search
  - Dense: all column values are stored in a file pointed to by the Archive Directory. Useful to speed up searching an archive for a particular record
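The difference between the two index types can be shown with a short sketch. The data structures below are illustrative assumptions, not the actual Archive Directory format; the file names and CUST_ID values are made up.

```python
def build_sparse_index(files: dict) -> dict:
    """Per archive file, keep only the (low, high) values of the column."""
    return {name: (min(vals), max(vals)) for name, vals in files.items()}


def build_dense_index(files: dict) -> dict:
    """Per archive file, keep every value of the column."""
    return {name: set(vals) for name, vals in files.items()}


def sparse_candidates(index: dict, value: int) -> list:
    """Files whose low/high range could contain the value."""
    return sorted(n for n, (lo, hi) in index.items() if lo <= value <= hi)


def dense_matches(index: dict, value: int) -> list:
    """Files that definitely contain the value."""
    return sorted(n for n, vals in index.items() if value in vals)


# Hypothetical CUST_ID values stored in two archive files.
archive_files = {
    "arch_2005": [100, 250, 400],
    "arch_2006": [300, 350, 900],
}
sparse = build_sparse_index(archive_files)
dense = build_dense_index(archive_files)

print(sparse_candidates(sparse, 350))  # ['arch_2005', 'arch_2006']
print(dense_matches(dense, 350))       # ['arch_2006']
```

The sparse index is tiny but can only rule files out, so both files remain candidates for CUST_ID 350. The dense index costs more space but pinpoints the one file that holds the record.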
Browsing the Archive Files: Reporting Options
- Browse the Archive File using the built-in browser
- Convert the Archive File to a CSV file for input to other reporting programs or applications
- ODBC/JDBC access via Open Data Manager (ODM)
- Optim includes IBM Mashups for use with archive data
Presentation Layer - Range of Archive Reporting Options
- Application-independent access
- Industry-standard methods: SQL, ODBC/JDBC, XML
- IBM Mashup Center
- Report writers: Crystal Reports, Cognos, Business Objects, Discoverer, Actuate
- Desktop formats: Excel, CSV, MS Access
- Database formats
- Google-like search engine
- Access any record, anytime, anywhere!
IBM InfoSphere Optim Data Find
Requirements:
- Direct search of archived data through a web-based search engine
- Enables business users to search archives ad hoc and to access data quickly
- Allows access using advanced query capabilities
- Makes it possible to serve ad hoc research requests without using expensive IT resources
Benefits:
- Improved customer satisfaction through compliance with SLAs
- Generation of better reports
- Reduced cost and time for report development
Have a Storage Plan for the Entire Lifecycle of Data: Store in Any Environment
- Current Data: 1-2 years
- Active Historical: 3-4 years
- Online Archive: 5-6 years
- Offline Archive: 7+ years
[Figure: Production Database with Archive and Restore; Archive Reporting Database; non-DBMS retention platform (ATA file server, EMC Centera, IBM RS550, HDS); offline retention platform (CD, tape, optical); archive definitions and compressed archives]
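A storage plan like this amounts to a rule that maps data age to a lifecycle stage. The sketch below assumes the age bands on the slide are upper bounds of 2, 4, and 6 years; how to treat ages that fall between the bands is an assumption, as are the names `TIERS` and `tier_for_age`.

```python
# (upper age bound in years, lifecycle stage); bounds are assumed from the slide.
TIERS = [
    (2, "Current Data"),
    (4, "Active Historical"),
    (6, "Online Archive"),
]


def tier_for_age(age_years: float) -> str:
    """Return the lifecycle stage for data of the given age in years."""
    for max_age, stage in TIERS:
        if age_years <= max_age:
            return stage
    return "Offline Archive"  # everything older than the last bound


for age in (0.5, 3, 6, 7):
    print(age, tier_for_age(age))
```

Keeping the boundaries in one table makes the retention policy easy to review and change without touching the logic that applies it.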
Enterprise Architecture
- Non-production environments: subset and mask
- Production environments: archive
- Applications: Siebel, Oracle, PeopleSoft, JDEdwards, SAP, Amdocs, custom, OEM/ISV
- Solutions: Data Growth, Data Privacy, Test Data Management, Application Upgrades, Application Retirement
- Data sources: Oracle, SQL Server, Sybase, Informix, DB2 LUW, XML, IBM IMS, VSAM, Adabas, DB2 for z/OS, Teradata
- Platforms: Windows XP/2000, Solaris, HP/UX, Linux, IBM AIX, IBM OS/390, IBM z/OS, IBM iSeries
- Storage: Network Attached Storage (NAS), Storage Area Network (SAN), Advanced Technology Attachment (ATA), Content Addressable Storage (CAS), tape
IBM OPTIM Data Growth Management - A Total Solution
- Enforce retention policies
- Restore data in original or different DBMS
- Isolate from original DBMS
- Meet compliance requirements
- Express archive and business-oriented archiving
- Access archived data directly via SQL without restore
- Ability to join heterogeneous data
- Compress data to save storage costs
- Ability to move across multiple storage tiers
- Ability to use multiple storage devices
- Ability to automate purge at end of retention
- Ability to hold purge during litigation
IBM Optim: An Integrated Data Management Platform
Integrated data management across test and development databases and production databases
IBM InfoSphere Discovery
Value: Automates analysis of data and data relationships for complete understanding of data assets
- Define the business objects for archiving and subsetting
- Identify all instances of private data so that they can be fully protected
- Discover undocumented business rules used to transform data from existing systems
- Prototype and test new transformations for the target system
Solutions: IBM Optim Test Data Management Solution, IBM Optim Data Privacy Solution, IBM Optim Decommissioning Solution, IBM Optim Data Growth Solution
Value: Speed application delivery
- Create realistic and manageable test environments
- Speed application delivery
- Improve test coverage
- Improve quality
Value: Risk management
- Protect PII data
- Apply a single data masking solution
- Leverage realistic data
Value: Reduce infrastructure cost and improve compliance
- Decommission redundant or obsolete applications
- Retain access to historical data
Value: Improve application performance, reduce infrastructure costs, and improve compliance
- Retain only needed data, move the rest to archives
- Deploy tiered storage strategies
- Retain data according to value
- Simplify infrastructure
Support enterprise initiatives
Value: Understand the enterprise to ensure success
- Master Data Management (MDM)
- Data warehousing
- Data quality
- Application consolidation
- Data profiling
To Buy or To Build?
Points to Consider When Building a Custom Archiving Solution
- Marginal utility: is the software being written in house
  - Flexible enough to cover use cases?
  - Reusable within the same application?
  - Scalable, and does it leverage all DB and OS capabilities?
  - Usable across the enterprise by other departments, applications, databases?
- Cost of personnel to build
  - Writing software not directly related to your business
  - Starting from scratch with no expertise or framework
- Cost of maintaining software
  - Adding features that didn't make the first release
  - Training a person to support custom scripts/code
  - Managing data model changes
- Time frame to put into production is highly variable
  - Scripts or programs written from scratch
  - Must prioritize which features will make each release
  - Can turn into a long-term "black hole" project
Home Grown vs. IBM OPTIM Comparison Chart
Requirements:
- Support archive, purge, and restore operations, including selective restore
- Ensure referential integrity of archived data
- High performance to handle large volumes of data and minimize the batch window
- Allow scope of archive and cascading purge to be controlled
- Provide pre-defined archival configurations for key packaged application objects
- Allow pre-defined archival configurations to be modified to reflect configurations made to applications
- Archive data stored in the database as well as the file system, and maintain linkage
- Maintain schema information in addition to archive data
- Provide access to archived data from within applications
- Allow data to be archived to another database and offline storage, and integrate with hierarchical storage management
- If the archive, purge, or restore process is interrupted, be able to recover from the point of interruption
- Ensure security of archived data
- Report on what data is archived
- Provide administrative tools to manage the archives
Custom: ? X X ? X ? ? ? ?
Optim: X X X X X X X X X X X X X X
A Sampling of IBM Optim Clients by Industry
Financial, Insurance, Retail, Manuf., Utility, Telecom, Pharma, Auto, IT, Refining, Chemical, Transport
Summary
- Databases are growing at an unprecedented rate
- Cost of storing data increases as the data grows
- The larger the database, the slower the performance
- Archive historical data to reclaim space and improve performance
- Prevent data growth from impairing business results
- Automate business object discovery to gain new data insights, ensure accuracy, and speed implementation
- Control database size at the desired level
- Minimize storage footprint, cut costs
- Streamline routine maintenance