Overview of Storage and Data Management Industry Trends in Long Term Information Retention and Preservation


Overview of Storage and Data Management Industry Trends in Long Term Information Retention and Preservation
Raymond A. Clarke, Sr. Enterprise Storage Solutions Specialist
SNIA Data Management Forum, Board of Directors
Sun Microsystems - Archive & Backup Solutions
1

What We'll Cover
- What's Changed in 3 Years?
- Archive Requirements: How have they changed?
- Some Truths About Tape in Archive Environments
- What is Sun Doing to Bring Innovation to Archive?
- What's SNIA Doing to Help?
2

Why is Archive So Important?
... because the history of data growth is exponential!
- 24 words: Pythagorean Theorem
- 67 words: Archimedes' Principle
- 179 words: 10 Commandments
- 286 words: Lincoln's Gettysburg Address
- 1,300 words: US Declaration of Independence
- 26,911 words: EU Regulation on the Sale of Cabbages
3

Slide from 2006 SNIA 100 Year Archive Task Force
Challenge: Manage Data for 75++ Years
- HW is typically only backward compatible to N-1
- Yearly capacity increases
- Every 2-5 years HW becomes obsolete: need to migrate current data to newer HW components
  - Replace compute parcels
  - Replace FC parcel for performance and capacity
  - Replace tape drives and media to current technology
  - Replace SATA parcels for capacity / footprint
- Minimize vulnerability
- HW migration is inevitable; PLAN for it
4
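The planning arithmetic behind this slide can be sketched as follows. This is a minimal illustration, assuming a fixed refresh interval and a first system installed at year 0; the function name is hypothetical and only the 75-year horizon and the 2-5 year obsolescence range come from the slide.

```python
import math


def migrations_needed(retention_years: int, refresh_interval_years: int) -> int:
    """Count hardware migrations over a retention period.

    Assumption (not from the slide): the first system is installed at
    year 0 and every later hardware generation requires one migration.
    """
    generations = math.ceil(retention_years / refresh_interval_years)
    return generations - 1


RETENTION_YEARS = 75  # retention horizon named on the slide

for interval in (2, 5):  # obsolescence range named on the slide
    count = migrations_needed(RETENTION_YEARS, interval)
    print(f"refresh every {interval} years: {count} migrations")
```

Under these assumptions a 75-year archive goes through somewhere between 14 and 37 migrations, which is why the slide treats migration as something to plan for rather than an exception.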

Demands of a New Archive Reality
- Is the ratio for archiving solutions changing? 10 / 90 versus 2 / 18 / 80 (aka Tier 1, Tier 2, Tier 3)
- Next Generation Archives need to address a new dimension of the massive resting data
- How do you search Petabytes of data from the edge?
- The new ratio has evolved into a Write / Read / Search relationship (2 / 18 / 80)
- Business semantics and data classification need to drive data management, not systematic schemas
- Storage abstraction and Search become critical to the presentation of the data; something new is needed...
- Compute, Store and Network resources need to be integrated, yet be independently extensible.
5

Power Consumption is a Big Problem in Storage
[Figure: data center power consumption breakdown. Categories: Cooling/HVAC, Servers, Storage (Disk and Tape), Network (SAN/LAN/WAN), Other. Chart labels: 48%; 37-40%; IT Equipment 50-60%; Other 40-50%; External Storage (All Tiers) 80%; Tape Drives & Tape Automation 20%.]
- Storage is a significant part of data center energy usage, and at 20% CAGR, it is the fastest growing segment. (1)
- Some data centers are being told that they can have no more power! (2)
Source (1): 2007 Jupiter Media Corporation and The StorageIO Group
Source (2): Report to Congress on Server and Data Center Energy Efficiency (08/2007)
6
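To put the 20% CAGR figure in perspective, the sketch below computes the implied doubling time and ten-year growth. It assumes the rate stays constant, which the slide does not claim; the function names are illustrative.

```python
import math


def doubling_time_years(cagr: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + cagr)


def growth_factor(cagr: float, years: float) -> float:
    """Multiplier reached after `years` of constant compound growth."""
    return (1 + cagr) ** years


STORAGE_ENERGY_CAGR = 0.20  # figure quoted on the slide

print(f"doubling time: {doubling_time_years(STORAGE_ENERGY_CAGR):.1f} years")
print(f"growth over 10 years: {growth_factor(STORAGE_ENERGY_CAGR, 10):.1f}x")
```

At a steady 20% per year the quantity doubles in a little under four years and grows roughly sixfold in a decade.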

Additional Advantages of Tape
Form factors (inches): 5.25 FH, 5.25 HH, 3.5

Parameter                          2007     2009         2011         2017       2019
Volumetric density (GB/in3)        100      200          400          2,000      10,000
Cartridge capacity, native         800 GB   1.2-1.6 TB   3.0-4.0 TB   12-16 TB   48-64 TB
Areal density (GB/in2)             1.2      2.0          3.0-3.5      10-14      20-40
Data rate (MB/sec/drive)           120      160-180      200-280      400-800    800-1,600
Tape speed for data (meters/sec)   6-8      8-10         10-12        12-15      12-15

The cost ratio for a terabyte stored long-term on SATA disk versus LTO-4 tape is about 23:1. For energy cost, it is about 290:1. (Clipper Notes, October 2008)
Source: Bi-annual iNEMI Mass Storage Report for 2008
7
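The two ratios quoted from Clipper Notes can be applied as simple multipliers. The sketch below is illustrative only: the tape dollar figures are hypothetical placeholders, and only the 23:1 and 290:1 ratios come from the slide.

```python
# Ratios quoted on the slide (SATA disk versus LTO-4 tape, per terabyte stored long-term).
DISK_TO_TAPE_COST_RATIO = 23
DISK_TO_TAPE_ENERGY_RATIO = 290


def disk_equivalent(tape_value: float, ratio: float) -> float:
    """Scale a per-terabyte tape figure to the corresponding disk figure."""
    return tape_value * ratio


# Hypothetical inputs, chosen only to make the arithmetic visible.
tape_total_cost_per_tb = 100.0   # assumed dollars per TB on tape
tape_energy_cost_per_tb = 1.0    # assumed dollars of energy per TB on tape

print(disk_equivalent(tape_total_cost_per_tb, DISK_TO_TAPE_COST_RATIO))
print(disk_equivalent(tape_energy_cost_per_tb, DISK_TO_TAPE_ENERGY_RATIO))
```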

How do you build an Archive, sensitive to Access and Presentation, when any asset can be requested at any time? 8

Open Storage/Open Archive Anatomy
[Figure: component diagram. Recoverable labels:]
- Protocols and interfaces: NFS, FCP, SAS, iSCSI, CIFS, IB, VTL, OSD, CAS, XAM, WebDAV
- Open Storage Appliances: Sun Storage 7110, Sun Storage 7210, Sun Storage 7310, Sun Storage 7410
- Storage Servers: Sun Fire X4240, Sun Fire X4250, Sun Fire X4540
- Open Storage Arrays: Storage J4200, Storage J4400, Storage J4500
- Hardware components: SAS HBAs, Open Storage Flash SSD, CMT Servers
- File systems: ZFS, Lustre, SAM/QFS, pNFS
- Software and services: Replication, ZFS Mirror/Snap, Search, Encryption, De-duplication, Security, Tape-Related Software, SAM-QFS, Migration, Backup, Compliance
9

Sun Storage 7000 Unified Storage Systems
Converges Compute, Storage and the Network
- Open Architecture
  - Open data formats, open protocols
  - OpenSolaris and Appliance Kit platforms and communities
  - Integrated products, reusable components
- Technical Innovation
  - Integrated ZFS and Flash Hybrid Storage Pool
  - DTrace Analytics
- Game Changing Economics
  - ZFS + Flash + SATA yields best $/GB, $/IOP, $/MB/s, W/IOP, W/MB/s
  - Industry-standard architecture that leads volume price/performance
  - No appliance software fees and license keys
10

Enterprise Archive Alternatives
[Figure: archive architecture diagram. Recoverable labels:]
- Data sources: Unstructured Data (Email, Video, Images); Structured Data (SAP, Oracle)
- Primary Access/Ingest; Primary Database
- SPARC or Sun Fire SSD-Based Servers
- Green Archive; Database Backup; Database Archive; Email Archiver
- Capacity Disk: Unified Storage or JBOD; 7000 Unified Storage/HSP
- Services: Data Deduplication, Encryption, Tiered Storage Management, Assured Delete, Replication, SAM-FS (IAS), NFS
- Open Storage/Open Archive
- Remote Disk, Tape Libraries and Virtual Tape; VTL; Local Tape
- Long-term Preservation & Retention Archive
October 28, 2009, Sun Microsystems, Inc., Page 11

Industry Challenge
Reduce the complexity of managing heterogeneous data and storage environments
Store, Protect, Secure, Access, Archive, Comply and Shred
Cost, Complexity, Risk
12

SNIA Legal Notice
[...] this section is copyrighted [...]
[...] individuals may use this material in presentations and literature under the following [...]
[...] must be reproduced without modification
[...] acknowledged as source of any material used in the body of any document containing material [...]
[...]ect of the SNIA Data Management Forum
13

About SNIA and the DMF
About the Storage Networking Industry Association (SNIA)
- [...] primary goal is to ensure that storage networks become complete trusted solutions across the IT community
- [For] additional information about SNIA see www.snia.org
About the SNIA Data Management Forum (DMF) - www.snia-dmf.org
- DMF is a sub-group of SNIA acting as the worldwide authority on Data Management, Data Protection [...]
- DMF is a collaborative storage industry resource available to anyone responsible for the accessibility and integrity of their organization's information.
DMF initiatives
- Data Protection Initiative (DPI): Developing, teaching and promoting new approaches and best practices for data protection and recovery
- Information Lifecycle Management Initiative (ILMI): Addressing ILM practices, implementation methods, and benefits
- Long-term Archive and Compliance Storage Initiative (LT[...]): [...] challenges in developing, securing, and retaining long[...]
14

What's SNIA Doing About Industry Challenges?
Educating, Defining and Taking Action to Address Industry Challenges
* New DMF Developed or Managed Activities
- The Self-Contained Information Retention Format (SIRF)*
  - Rationale & Objectives
  - Requirements & Use Cases
- Bridging Terminology*
- Green Storage Initiative
- XAM: eXtensible Access Method
15
Sun Proprietary/Confidential: Internal Use Only

What is the Self-Contained Information Retention Format (SIRF)?
- A logical container format for the storage subsystem, appropriate for the long-term storage of digital information
- A logical data format of a mountable unit, e.g. a filesystem, a block device, a stream device, an object store, a tape, etc.
- Includes a cluster of interpretable preservation objects that can be understood in the future
- Self-describing: can be interpreted by different systems
- Self-contained: all data needed for the interpretation of the preservation objects is contained within the preservation objects cluster
- If a mountable unit is damaged or lost, the effect is contained: the information in the other mountable units is still valid!
- Need to define how and when external references are supported
- A work effort by SNIA's Long Term Retention Technical Work Group
16
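The self-describing and self-contained properties can be illustrated with a small data-structure sketch. Everything here is hypothetical: the class names, the catalog fields and the reference model are assumptions made for illustration and are not taken from any SIRF specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PreservationObject:
    """One preservation object (hypothetical model, not a SIRF-defined type)."""
    object_id: str
    payload: bytes
    references: List[str] = field(default_factory=list)  # ids of objects it depends on


@dataclass
class MountableUnit:
    """A container holding a catalog plus a cluster of preservation objects."""
    catalog: Dict[str, str]  # self-describing metadata readable without the writing application
    objects: Dict[str, PreservationObject] = field(default_factory=dict)

    def add(self, obj: PreservationObject) -> None:
        self.objects[obj.object_id] = obj

    def dangling_references(self) -> Set[str]:
        """Ids referenced by some object but not stored in this unit."""
        return {
            ref
            for obj in self.objects.values()
            for ref in obj.references
            if ref not in self.objects
        }

    def is_self_contained(self) -> bool:
        """True when every reference resolves inside this unit."""
        return not self.dangling_references()


unit = MountableUnit(catalog={"format": "example-container", "version": "0"})
unit.add(PreservationObject("schema-1", b"<schema/>"))
unit.add(PreservationObject("record-1", b"data", references=["schema-1"]))
print(unit.is_self_contained())
```

In this sketch, losing one unit leaves every other unit interpretable because no object depends on anything stored elsewhere, which is the containment property the slide describes.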

Problem SIRF is Addressing
[Figure: two stacks compared. Without SIRF: Application A and Application B each use their own interface to a preservation/retention storage subsystem holding their own data types. With SIRF: Application A and Application B share a SIRF interface to a preservation/retention storage subsystem holding SIRF data.]
Without SIRF
- [Cannot move] cluster of preservation objects between systems by itself
- [Only the orig]inal application who wrote the preservation objects can read and interpret them
- [Need for export] and import processes
- [Preservation] Objects cannot be sustained over the long-term
With SIRF
- Can move cluster of preservation objects between systems by itself
- Any SIRF compliant application can read and interpret the preservation objects
- No need for export and import processes
- Preservation Objects can survive longer
17

SIRF Objectives
Facilitate transparent logical and physical migration and movement in order to support long term preservation
- Media, subsystem or bitstream movement: remove the mountable unit from system A and put it at system B.
- Transparent: system A is not involved. All the information needed for system B to understand the mountable unit is self-described and self-contained within the mountable unit.
- Long term: 15 years and above (according to the 100 years archive requirements survey).
- Preservation: sustain the understandability and usability of the data and not just the bits.
Considering multiple implementations of SIRF to utilize:
- the Open Archival Information System (OAIS) ISO standard
- SNIA's eXtensible Access Method (XAM)
18

SIRF Initial Requirements: Format
- Self-describing
  - The amount of required information is small and can be acquired in stages
  - Interpretable by both humans and machines
  - Ability to do offline inspection
- Support self-contained data
  - Include means to represent internal links and cross references
  - Support methodology for verification of completeness and correction
- Interoperability
  - Ability to migrate data between different systems without loss of information
  - Data should be interpretable after migrations and interpretable in the future
19

OAIS AIP Logical Structure (ISO 14721:2002)
- [...] the focus of the preservation.
- [...] required to interpret the raw data to its designated community.
- [...] identifiers for the content information.
- [...] the content information and any changes that may have taken place since it was originated, and who [...]
- [...] the content information and relationship to its environment.
- [...] content information has not been altered in an undocumented manner.
20

SIRF Initial Requirements: Preservation Object Data Model
- Support different data models for preservation objects
  - Support different object data models at one time
  - Support complex data structures like collections of objects
  - Support migrating objects from one data model to an alternative data model
- Can handle any proper data format for the raw data
  - No restrictions on file formats
- Enable retention of multiple versions of the original preservation object with their relations
  - References from new to existing preservation objects of the same version series
- There must be a persistent identifier for each preservation object
  - Include additional external identifiers
21
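The versioning and identifier requirements can be sketched as a small model. This is a hypothetical illustration: the identifier scheme, class names and fields are assumptions, not part of the SIRF requirements themselves.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PreservationObjectVersion:
    """One immutable version within a version series (hypothetical model)."""
    persistent_id: str                   # never reused or changed once assigned
    series_id: str                       # shared by all versions of the same original
    version: int
    predecessor_id: Optional[str]        # reference from the new version to the existing one
    external_ids: Tuple[str, ...] = ()   # additional identifiers assigned elsewhere


class VersionSeries:
    """Keeps every version of a preservation object along with its relations."""

    def __init__(self, series_id: str) -> None:
        self.series_id = series_id
        self._versions: List[PreservationObjectVersion] = []

    def add_version(self, external_ids: Tuple[str, ...] = ()) -> PreservationObjectVersion:
        number = len(self._versions) + 1
        predecessor = self._versions[-1].persistent_id if self._versions else None
        new_version = PreservationObjectVersion(
            persistent_id=f"{self.series_id}/v{number}",  # assumed naming scheme
            series_id=self.series_id,
            version=number,
            predecessor_id=predecessor,
            external_ids=external_ids,
        )
        self._versions.append(new_version)
        return new_version

    def history(self) -> List[str]:
        """Persistent ids from oldest to newest; older versions are retained."""
        return [v.persistent_id for v in self._versions]


series = VersionSeries("urn:example:report-42")
first = series.add_version()
second = series.add_version(external_ids=("catalog:0001",))
print(series.history())
```

Older versions are never overwritten here; each new version only adds a backward reference, so the full series stays readable.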

Building a Terminology Bridge
What is it?
- A framework designed to improve communication among IT professionals, records administrators, security, legal, librarians, archivists, data curators and compliance officers, along with the business groups.
What specific problem does it address?
- Terminology used in the data center can be very confusing. These groups all have their own vernacular, and they all hold a portion of the responsibility for maintaining corporate information assets.
Objectives of the Terminology Bridge
- Stimulate the ILM discussion and its adoption
- Improve communication
- Explain/define terminology and practices
22

Building a Terminology Bridge
Archive: the report advocates that IT practices adopt a usage of the term "archive" that is more consistent with other departments within the organization. To the archival, preservation, and records management communities, an archive is a specialized repository with preservation services and attributes.
Preservation: managing information in today's datacenter, with requirements to safeguard information assets for eDiscovery, litigation evidence, security, and regulatory compliance, requires that many classes of information be preserved from time of creation. Preservation is a set of services that protect information, provide availability, integrity and authenticity controls, include security and confidentiality safeguards, and include an audit log, control of metadata, and other practices for each preservation object. The old IT practice of placing information into an archive when it becomes inactive or expired no longer works for compliance or litigation support, and only adds cost.
Authenticity: defined in a digital retention and preservation context as the practice of verifying that a digital object has not changed. Authenticity attempts to establish that an object is currently the same genuine object that it was originally, and to verify that it has not changed over time unless that change is known and authorized. Authenticity verification requires the use of metadata. The critical change for IT practices is that metadata is now very important and must be safeguarded with the same priority as the data.
23
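One common way to verify that a digital object has not changed is to store a cryptographic digest in its metadata and recompute it later. The sketch below is a minimal illustration of that idea using Python's standard hashlib; the metadata layout is an assumption, and a digest alone does not record whether a change was authorized, which would need an audit log as described above.

```python
import hashlib
import hmac
from typing import Dict, Union

Metadata = Dict[str, Union[str, int]]


def record_fixity(data: bytes) -> Metadata:
    """Build fixity metadata for an object at ingest time (assumed layout)."""
    return {
        "algorithm": "sha256",
        "digest": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def is_unchanged(data: bytes, metadata: Metadata) -> bool:
    """Recompute the digest and compare it with the stored metadata."""
    if len(data) != metadata["size"]:
        return False
    current = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(current, str(metadata["digest"]))


original = b"contract text, signed 2009-10-28"
stored_metadata = record_fixity(original)

print(is_unchanged(original, stored_metadata))
print(is_unchanged(original + b" (edited)", stored_metadata))
```

The check is only as trustworthy as the stored metadata, which is the slide's point: the metadata must be safeguarded with the same priority as the data.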

Building a Terminology Bridge
You can obtain a copy of Building a Terminology Bridge from the DMF's website at http://www.snia.org/forums/dmf/knowledge/term_bridge/ and you can participate in active discussion about it and other Data Management topics at the DMF Community site, http://community.snia-dmf.org.
24

Sun's Infinite Archive System Approach
Incorporates Backup, Archive and Physical/Logical Information Migration
[Figure: Infinite Archive System (IAS) architecture. Recoverable labels:]
- Content types: Email, Video, Transactional, Legal, Images, Web
- Any Communications Protocol
- Intelligent, Policy-Based Automated Archive: Infinite Archive System (IAS), Storage Archive Manager SAM-QFS
- Tier 1 Disk Options: 2500 series SAS, F5100 Solid State Disk, Sun Storage 7410, ST6000 FC Disk
- Tier 2 Disk Options: Sun Fire X4540 Storage Server, Sun Storage 7000, Storage J4200, 2500 series SATA, Storage J4400, Storage J4500
- Scalable, Eco-Efficient Tape Tier Options: Encryption, Libraries, Virtual Tape, Access & Capacity Drives
25


Thank You for Your Time and Attention
Raymond.Clarke@sun.com
(212) 558-9321