IBM Information Archive: Architecture and Internals
Christian Bolik (bolik@de.ibm.com), IBM Research & Development, November 2010
Disclaimer

Copyright IBM Corporation 2010. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

IBM, the IBM logo, ibm.com, DB2, WebSphere, and FileNet P8 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml

IBM Archive Cloud for Financial Services
Agenda
- IBM's Smart Archive Strategy
- What is IBM Information Archive?
- Physical and Software Architecture of the IA Appliance
- Key Concepts and Features in IA
IBM Smart Archive Strategy
http://www.ibm.com/software/data/smart-archive/

[Diagram labels]
- Information sources: Reports; ERP / CRM (SAP, PeopleSoft); Content (Documents, Images); Paper; Collaborative (Quickr, SharePoint); Data; Email (Notes, Exchange)
- Services: Value Added Services; Optimization Services; System Services; Managed Services
- Reference Architecture; Information Governance
- Optimized and Unified Assessment, Collection and Classification
- Flexible and Secure Infrastructure with Unified Retention and Protection
- Delivery: On Premise (Custom Config); Appliance (Pre-Config); As A Service (SaaS, Multiple Options)
- Cloud Ready Archive Storage with Optional ECM
- Integrated Compliance, Records Management, Analytics and eDiscovery
Existing IBM Archiving Solutions

[Diagram: the Smart Archive Strategy diagram with the same sources, services, layers and delivery options, overlaid with existing IBM products]
- Products shown: Capture, CMOD, ECM Repositories, Optim, SAP Archiving, Content Collector and Classification Module, Information Archive, Enterprise Records and eDiscovery Analytics
- CMOD = Content Manager On Demand
Introducing IBM Information Archive
Next Generation Information Retention Solution
- Universal, scalable, and secure storage repository for structured and unstructured information, compliant or non-compliant
- Integrated Archive Appliance combines the best of IBM Software, Hardware & Services
- Protects data by enforcing the industry's most stringent information retention laws
- Highly versatile, highly scalable information retention solution for mid-size and enterprise organizations
What is IBM Information Archive
- The successor of the IBM System Storage DR550
- A universal archiving repository for all types of content that addresses the complete information retention needs of midsize and enterprise clients faced with managing an increasing volume of information
- Combines fast accessible disk with low-cost tape within a single archive pool to enable businesses to deploy an archive strategy that minimizes total cost of ownership over the life of the archived information
- Brings together IBM's General Parallel File System technology, Tivoli Storage Manager and patent-pending Enhanced Tamper Protection to offer a high performance, high scalability, and secure platform
- Designed for archiving a broad range of electronic records, including e-mail, digital images, databases, applications, instant messages, account records, contracts or insurance claim documents, and other types of storage records
Information Archive Announcements
- 24.09.2010: IBM Information Archive for Email, Files, and eDiscovery
  - Bundles IA with servers and licenses for IBM Content Manager, IBM Content Collector for Files and Email, eDiscovery Manager and Analyzer
  - Offered with implementation services
- 22.02.2010: IBM Information Archive R1.2
  - Improved Disaster Recovery capabilities
  - Scales up to 608 TB (raw capacity, 444 TB usable with RAID6)
- 26.10.2009: IBM Smart Archive Strategy
- 06.10.2009: IBM Information Archive R1.1
Information Archive Characteristics
- Universal
  - Common platform for archive of multiple types of data
  - Variety of data interfaces (NAS, TSM)
- Scalable
  - Scale-out of processing and storage
  - Tiered storage, including external tape
- Adaptable
  - Compliant and non-compliant archives
  - Pluggable architecture for future data-specific function
- Secure
  - Fully protected, lockable compliant store
  - No root access in full compliance mode through Enhanced Tamper Protection

Information Archive Architecture
[Diagram labels]
- Clients: NAS Client; TSM API Client; Web-browser
- Interfaces: NAS Interface; SSAM Server; IA Mgmt GUI
- GPFS Filesystem & IA Middleware; TSM Server
- Disk Storage (Collection 1); Disk Storage (Collection 2)
Physical Architecture
Hardware Redundancy
- All servers have redundant power supplies
- Redundant Ethernet switches
- Redundant Fibre Channel switches
- Dual iPDUs
- Bonded Ethernet port configuration
- Dual internal/external Ethernet paths

2231 IA3 File Archive Configuration: Main Rack (FC9910 specified; IA3 3-node, no App Srvrs)

Rack units  Component                                               Status             Array layout
36          RSM Server (FC5601)                                     Mandatory
33-35       D1B Disk Exp #1-6                                       Optional           6+2P; 6+2P
30-32       D1B Disk Exp #1-5                                       Optional           6+2P; 6+2P
27-29       D1B Disk Exp #1-4                                       Optional           6+2P; 6+2P
24-26       D1B Disk Exp #1-3                                       Optional           5+2P; S; 6+2P
22-23       (no component listed)
21          Keybd, Monitor, KVM                                     Mandatory
19-20       Two 24 port Brocade SAN24B4 FC switches                 Optional (paired)
18          Mgmt Server (FC5600)                                    Mandatory
16-17       Two SMC 8126L2 26 port Ethernet 10/100/1G Sw (46M2175)  Mandatory
14-15       S2M Server                                              Mandatory
12-13       S2M Server (opt 1)                                      Optional
10-11       S2M Server (opt 2)                                      Optional
7-9         D1B Disk Exp #1-2                                       Optional           6+2P; 6+2P
4-6         D1B Disk Exp #1-1                                       Optional           6+2P; 6+2P
1-3         D1A Disk Ctrlr #1                                       Mandatory          5+2P; S; 6+2P

Capacity with 1 TB HDDs: 112 TB raw, 96 TB user (RAID5), 82 TB user (RAID6)
R1.2 added support for 2 TB drives: up to 224 TB raw, 164 TB usable (RAID6)
Storage Redundancy

2231 IS3 File Archive Expansion Rack
Storage Expansion Rack for File Archive attachment only (IA3 with FC9910)

Rack units  Component          Status     Array layout
34-36       D1B Disk Exp #2-5  Optional   6+2P; 6+2P
31-33       D1B Disk Exp #1-5  Optional   6+2P; 6+2P
28-30       D1B Disk Exp #2-4  Optional   6+2P; 6+2P
25-27       D1B Disk Exp #1-4  Optional   6+2P; 6+2P
22-24       D1B Disk Exp #2-3  Optional   5+2P; S; 6+2P
19-21       D1B Disk Exp #1-3  Optional   5+2P; S; 6+2P
16-18       D1B Disk Exp #2-2  Optional   6+2P; 6+2P
13-15       D1B Disk Exp #1-2  Optional   6+2P; 6+2P
10-12       D1B Disk Exp #2-1  Optional   6+2P; 6+2P
7-9         D1B Disk Exp #1-1  Optional   6+2P; 6+2P
4-6         D1A Disk Ctrlr #2  Optional   5+2P; S; 6+2P
1-3         D1A Disk Ctrlr #1  Mandatory  5+2P; S; 6+2P

Capacity with 1 TB HDDs: 192 TB raw, 164 TB user (RAID5), 140 TB user (RAID6)
R1.2 added support for 2 TB drives: up to 384 TB raw, 280 TB usable (RAID6)

Storage Hardware Redundancy
- All servers have mirrored internal hard drives
- Each storage controller drawer has two controllers with failover capability
- RAID 6 used on all Fibre Channel attached storage
- Dual paths from each archive node to storage controllers
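The capacity figures on the two rack slides can be checked from the array layouts. The following is a minimal sketch, assuming 16 drives per enclosure (which the notation "6+2P; 6+2P" and "5+2P; S; 6+2P" both add up to) and assuming that the RAID5 figure corresponds to each array giving one of its two parity drives back to data. The names `FULL`, `WITH_SPARE`, and `capacity_tb` are made up for this illustration.

```python
# Data drives per array in each 16-drive enclosure, read off the rack layouts.
# "6+2P; 6+2P"    -> two arrays with 6 data drives each
# "5+2P; S; 6+2P" -> arrays with 5 and 6 data drives, plus one hot spare
FULL = (6, 6)
WITH_SPARE = (5, 6)
DRIVES_PER_ENCLOSURE = 16

# Main rack: controller #1, then expansions #1-1 .. #1-6.
MAIN_RACK = [WITH_SPARE, FULL, FULL, WITH_SPARE, FULL, FULL, FULL]
# Expansion rack: controllers #1 and #2, then expansion pairs #x-1 .. #x-5.
EXPANSION_RACK = [WITH_SPARE] * 2 + [FULL] * 4 + [WITH_SPARE] * 2 + [FULL] * 4


def capacity_tb(enclosures, drive_tb=1):
    """Return (raw, raid5_usable, raid6_usable) in TB for a list of enclosures."""
    raw = len(enclosures) * DRIVES_PER_ENCLOSURE * drive_tb
    raid6 = sum(sum(arrays) for arrays in enclosures) * drive_tb
    # Assumption: under RAID 5 each array returns one parity drive to data.
    raid5 = sum(d + 1 for arrays in enclosures for d in arrays) * drive_tb
    return raw, raid5, raid6


if __name__ == "__main__":
    for name, rack in (("main", MAIN_RACK), ("expansion", EXPANSION_RACK)):
        for drive_tb in (1, 2):
            print(name, f"{drive_tb} TB drives:", capacity_tb(rack, drive_tb))
```

With 2 TB drives the two racks together come to 608 TB raw and 444 TB usable under RAID6, which matches the R1.2 scaling figure in the announcements.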
Software Architecture
Software Failover

NFS Client
- Clustered NFS (CNFS) is integrated with GPFS clustering. When a node fails, the NFS virtual IP is moved to another server in the cluster, and the locks are migrated with it.
- Given the known stateless semantics of NFS, customers should sync data before they consider it committed to the system.

HTTP (read)
- HTTP shares the virtual IP with clustered NFS. When CNFS fails over, HTTP sessions are redirected to the failed-to node.
- The client has to reauthenticate, because the HTTP state is not shared between nodes in the cluster.

TSM/SSAM API and Archive Client
- The TSM/SSAM server IP address is moved to another node; TSM/SSAM is restarted and performs transaction recovery.
- The API/Archive Client retries on the server IP address and either recovers and continues the current transaction, or fails the transaction and rolls it back.
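The advice to sync data before treating it as committed can be followed from an application with standard file calls. A minimal sketch, assuming the archive is mounted over NFS at some local path; the function name `write_committed` is hypothetical and not part of any IA interface.

```python
import os


def write_committed(path: str, data: bytes) -> None:
    """Write data and force it to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush the file to the server


if __name__ == "__main__":
    # The target path is an assumption; use a file on the NFS mount in practice.
    write_committed("file1.doc", b"example document")
```

If `os.fsync` raises, the write should not be treated as committed, and the application can retry after the virtual IP has moved to the surviving node.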
Information Archive Collections
- Collection = set of archived documents managed under the same policy domain
- Types of collections: NAS, SSAM
- Multiple collections per IA appliance (up to 3 in R1) with separation of data

Interface Policy
- Interface protocol (NAS or TSM API)
- Object commit method (NAS only): XML file, timeout, NetApp SnapLock™

Retention Policy
- Controlled internally or by external application
- Time-based (internal) and event-based (external) retention
- Automatic (internal) or manual (external) deletion after expiration

Storage Policy
- Disk replication
- Tape storage tier
- Encryption (tape only)
- Deduplication
- Shredding (SSAM only in R1)

Compliance Policy

Mode          Delete before expire?  Shorten retention period?  Lengthen retention period?
Basic         Yes                    Yes                        Yes
Intermediate  No                     Yes                        Yes
Maximum*      No                     No                         Yes

*SSAM collections are always Maximum compliance
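The compliance policy table reads naturally as a lookup from mode to permitted operations. The following is an illustrative sketch only; the class, dictionary, and operation names are invented here and do not correspond to an IA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplianceMode:
    delete_before_expire: bool
    shorten_retention: bool
    lengthen_retention: bool


# Values copied from the compliance policy table.
MODES = {
    "basic": ComplianceMode(True, True, True),
    "intermediate": ComplianceMode(False, True, True),
    "maximum": ComplianceMode(False, False, True),
}


def is_allowed(mode: str, operation: str) -> bool:
    """Return True if the named operation is permitted in the given mode."""
    return getattr(MODES[mode.lower()], operation)


if __name__ == "__main__":
    for name in MODES:
        print(name, is_allowed(name, "shorten_retention"))
```

Lengthening a retention period is allowed in every mode; only the two operations that could remove data early are restricted.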
Customizable Policy-based Retention Features
(Timelines; X marks disposal)
- Time Based: Day 0, then Minimum Fixed Period, then X. Dispose after fixed period from creation date.
- Event Based with Fixed Protection Periods: Day 0, then Minimum Fixed Period; Event, then Fixed Period, then X. Dispose after fixed period from event date.
- Event Based with no Fixed Protection Period: Day 0, then Event, then X. Dispose after event.
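The three retention timelines can be expressed as date arithmetic. This is a sketch under stated assumptions: the function and parameter names are invented, periods are given in days, and for event-based retention the disposal date is assumed to be no earlier than the minimum fixed period counted from Day 0. The slide shows both periods but does not spell out how they combine.

```python
from datetime import date, timedelta
from typing import Optional


def time_based_disposal(created: date, period_days: int) -> date:
    """Dispose after a fixed period from the creation date."""
    return created + timedelta(days=period_days)


def event_based_disposal(
    created: date,
    event: Optional[date],
    min_period_days: int = 0,
    after_event_days: int = 0,
) -> Optional[date]:
    """Dispose a fixed period after the event, or at the event if both periods are 0.

    Returns None while the event has not been signaled yet.
    Assumption: the minimum period from creation acts as a lower bound.
    """
    if event is None:
        return None
    earliest = created + timedelta(days=min_period_days)
    return max(earliest, event + timedelta(days=after_event_days))


if __name__ == "__main__":
    day0 = date(2010, 1, 1)
    print(time_based_disposal(day0, 365))
    print(event_based_disposal(day0, date(2010, 3, 1), 30, 90))
    print(event_based_disposal(day0, date(2010, 3, 1)))
```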
Information Archive NAS Collection Architecture (R1)
1. Scale-out NAS interface with global namespace
2. XML-based application metadata (optional)
3. Parallel ingest processing, including indexing
4. Seamless tiered storage through TSM HSM
5. Disk-based metro and global mirroring

[Diagram labels: Ingest Processing; Plug-In; Advanced Indexing; Analytics, tagging; Parse/Index; Data + Metadata]
Ingestion Chain of IA NAS Collections
[Diagram: steps 1 to 7 along a time axis, between the Archiving Application and the Information Archive Manager]
- Write document and metafile (files)
- Set retention and explicitly commit (optional)
- Process event log, dispatch change log
- Update Audit Log
- Implicitly commit document (if required)
- Apply policies, set service class and retention
- Pre-migrate to Level 2 storage

Legend: Archiving Application; Event Log Processor; Policy Manager
Meta Files (NAS Collections only)
- Use existing standards (NFS, XML) to extend the user's ability to manage files in the archive via NAS protocols
- NetApp SnapLock™ compatibility only provides time-based retention
- The addition of metafiles enables support of time- and event-based retention, policy-based retention and retention hold/release
- Customer/application can define user fields
  - Binding and non-binding (binding fields cannot be updated after the document is committed)
  - User-defined fields will be indexed in the future
- Event-based retention, holds, etc. can be signaled via EVENT fields in meta files
- User/application can use standard file protocols to update meta files
Meta File Example

/IIA/col1/data
/IIA/col1/data/file1.doc    Now is the time for all good men to come to the aid of their country.
/IIA/col1/meta
/IIA/col1/meta/file1.doc

    <?xml version="1.0" encoding="utf-8"?>
    <fields>
      <_SYSTEM_md5Checksum>da1e100dc9e7bebb810985e37875de38</_SYSTEM_md5Checksum>
      <USER_confidential>Yes</USER_confidential>
      <_EVENT_setRetention_>20101231</_EVENT_setRetention_>
    </fields>
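Because metafiles are plain XML, an application can read them with any standard XML parser. The sketch below parses the example metafile with Python's `xml.etree.ElementTree`. Grouping fields by the `_SYSTEM_`, `USER_`, and `_EVENT_` prefixes and reading `20101231` as a YYYYMMDD date are assumptions drawn from the example, not a documented schema; `parse_metafile` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The example metafile, as bytes so the encoding declaration is honored.
META = b"""<?xml version="1.0" encoding="utf-8"?>
<fields>
  <_SYSTEM_md5Checksum>da1e100dc9e7bebb810985e37875de38</_SYSTEM_md5Checksum>
  <USER_confidential>Yes</USER_confidential>
  <_EVENT_setRetention_>20101231</_EVENT_setRetention_>
</fields>"""

# Assumed prefix convention, inferred from the field names in the example.
PREFIXES = {"_SYSTEM_": "system", "USER_": "user", "_EVENT_": "event"}


def parse_metafile(xml_bytes: bytes) -> dict:
    """Group metafile fields by prefix; unknown prefixes land in 'other'."""
    groups = {"system": {}, "user": {}, "event": {}, "other": {}}
    for element in ET.fromstring(xml_bytes):
        value = (element.text or "").strip()
        for prefix, group in PREFIXES.items():
            if element.tag.startswith(prefix):
                groups[group][element.tag] = value
                break
        else:
            groups["other"][element.tag] = value
    return groups


if __name__ == "__main__":
    fields = parse_metafile(META)
    # Assumption: the retention value is a calendar date in YYYYMMDD form.
    raw = fields["event"]["_EVENT_setRetention_"]
    print(fields["user"], datetime.strptime(raw, "%Y%m%d").date())
```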
Security and Compliance Characteristics
- Role-based security:
  - Security, Systems and Archive Administrator roles
  - Archive User, Service Engineer and Auditor roles
- Audit logs provide a compliant audit trail
- Physical security: locking cabinet
- To achieve maximum compliance protection with IBM IA, customers enable the patent-pending Enhanced Tamper Protection feature
  - Removes root login capability from the IBM IA cluster
  - Neither customer nor IBM has root login authority
  - Once enabled, cannot be disabled
  - Expected admin and support operations are pre-programmed to remove the need for root access
  - Best practice is to enable it during installation
  - For an unforeseen emergency requiring root access, there is a procedure that delivers a signed, time-bounded patch to the customer
Information Archive Integrated, Web-based Interface
- Initial configuration is integrated and guided
  - Start archiving data in 1 day or less
- Consolidated and integrated management in a single administrative interface*
  - Security, administration, monitoring, notifications (Email and SNMP), troubleshooting, serviceability
  - Manage multiple collections from a single interface
  - Role-based administrative security
- Simple, efficient, consistent administrative experience
  - Wizards guide the user through the creation of objects
  - Overviews provide high-level situational awareness of system health
  - Consolidation of information for efficient administration
  - Task-oriented, with drill-down where needed for deeper configuration and troubleshooting
IBM Information Archive Key Feature Summary

Feature                        IBM Information Archive
Ease of use                    Appears as a file server; simply drag & drop files to IA.
Installation & Implementation  Information Archive is physically installed by a CE. Quick configuration via wizards. Dynamically add storage.
User Interface                 Task-oriented GUI to manage the entire solution (+ CLI)
Application Interfaces         NFS, CIFS*, HTTP*, FTP*, SSAM, all within IA
Scalability                    Supports billions of objects: 1 billion per collection, up to 3 collections per IA (more in the future*)
Retention Policies             Accepts retention policies from applications or users (event-based or time-driven)
Compliance / Security          Each collection can be configured with different protection levels (basic, intermediate, maximum). New patent-pending tamper-proof technology increases protection.
Index / Search                 Full text indexing and search (both data and metadata) to be added in a future release*

* Statement of direction for a future release
Links for IBM Information Archive
- Homepage: http://www.ibm.com/systems/storage/disk/archive
  - Also access support information (technical notes etc.) from this page
- Redbook: http://www.redbooks.ibm.com/abstracts/sg247843.html?open
  - Or go to http://www.redbooks.ibm.com and search for Information Archive
- IA Wiki at developerWorks: http://www.ibm.com/developerworks/wikis/display/tivolidoccentral/ibm+information+archive
  - Or go to http://www.ibm.com/developerworks/wikis and search the page for Information Archive
THANK YOU!