A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data centre cost savings exercise




NOTICE
This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change without notice and does not represent a commitment on the part of Quantum. Although using sources deemed to be reliable, Quantum assumes no liability for any inaccuracies that may be contained in this White Paper. Quantum makes no commitment to update or keep current the information in this White Paper, and reserves the right to make changes to or discontinue this White Paper and/or products without notice. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or information storage and retrieval systems, for any person other than the purchaser's personal use, without the express written permission of Quantum.

CONTENTS
Introduction
Data and data types
Value of data
Tiered storage
Operational cost
Data management software
The role of tape
Disaster recovery
Conclusion

Archiving Persistent Data

Today's enterprises are experiencing greater storage growth than ever before. The growth comes both from structured data in enterprise databases and from unstructured data produced by a variety of applications. Wherever it comes from, it must be preserved for business continuity, to satisfy data retention laws, and to meet compliance requirements. The data centre needs to reduce the total cost of ownership (TCO) of its backup/archiving infrastructure: it must contain costs, manage data growth, and make the backup/archive process more efficient. Unstructured data is often the core data asset in an organisation's workflow and is inherent in revenue-generating operations. This white paper discusses how to manage unstructured data growth in the most cost-effective manner. It also discusses the distinct differences between backup and archive, and the best policy for each.

INTRODUCTION

When developing a strategy for managing unstructured data growth, there are many considerations to take into account, including the type of data and its value, the cost of data growth, the correct platform for the data to reside on, and data security. But, to start with, what is archiving, and what is the difference between archive and backup? To (over)simplify:

Archive is a copy of data that is retained in a safe and economical location for long periods of time but is reused from time to time. It is a method of storing data in the most cost-effective location: the performance of the systems the archive is stored on is matched to the requirements of the application using the data.

Backup is a copy of the data that is only recovered when there is a failure or some form of corruption.

[Figure: asset value against cost of storage and frequency of access. Infrequently accessed data left on high-performance primary storage costs too much; it belongs on secondary (Tier 2 or 3) storage.]

DATA TYPES

Not all data is created equal. If you consider the difference between an online trading system processing multiple transactions in milliseconds and the company payroll, which only has to issue funds once per month, you can see that data should be prioritised in terms of its currency, life cycle and value. Later in the document we will discuss value and give some real-world examples of how data is dynamic. In general, data can be categorised as live, persistent or backup, meaning that it is either in use, not in use but recallable at any point, or part of a copy of the primary data to be used only when the system needs to be restored after a failure.

One of the challenges is to understand and manage the data. Many extra copies of data are being made: a snapshot is taken of it, some internal applications back it up, and the enterprise backup application will probably make a copy of its own. That is without even mentioning all the replication going on: applications are replicating, storage is replicating and backup devices are replicating. What is needed is an automated system that can remove duplicate data and archive just the data you need.

VALUE OF DATA

Your data is the life blood of your business and is probably your organisation's most critical asset. Independent IT analysts agree that companies that experience a complete data loss have only a 10% chance of surviving the following two years; 50% never trade again. Whether you're running a web-based retail site, an SQL database or even the staff payroll, you will benefit from the peace of mind that comes with knowing your business's important data is backed up and available for immediate access whenever you need it.
At the bare minimum you need to ensure you have a copy of all your data in a different place from the original. This gives you the ability to recover from data loss only if it is a real-time copy, or a near-real-time copy such as a snapshot. But if you are attacked by a virus, then both copies will be infected. What is needed is a series of point-in-time copies so that, in the event of data corruption, you can roll back to a previous instance. You will need to store this historic data so it is accessible when needed, but in most cases data more than 30 days old will not be needed. It will also be taking up valuable space on storage systems that are kept powered up and ready in case you need them. Surely there is a better and more cost-effective solution?
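The series of point-in-time copies described above is usually managed with a rotation policy, so that recent history is dense and older history sparse. The following is a minimal sketch of such a selection rule; the retention counts (7 dailies, 4 weeklies, 12 monthlies) and function names are chosen purely for illustration and are not drawn from any particular product:

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshots, today, dailies=7, weeklies=4, monthlies=12):
    """Select which point-in-time copies to retain under a simple
    daily/weekly/monthly rotation. `snapshots` is a list of dates,
    one per snapshot; the limits are illustrative assumptions."""
    keep = set()
    for snap in snapshots:
        age = (today - snap).days
        if age < dailies:                                # recent: keep every day
            keep.add(snap)
        elif snap.weekday() == 6 and age < 7 * weeklies:  # keep recent Sundays
            keep.add(snap)
        elif snap.day == 1 and age < 31 * monthlies:      # keep month starts
            keep.add(snap)
    return keep

# Example: 120 days of daily snapshots collapse to a handful of restore points
today = date(2010, 10, 1)
snaps = [today - timedelta(days=n) for n in range(120)]
kept = snapshots_to_keep(snaps, today)
print(len(kept), "of", len(snaps), "snapshots retained")
```

Real backup applications implement far richer variants of this (the classic grandfather-father-son scheme), but the principle is the same: a bounded set of restore points covers a long window at decreasing resolution, so roll-back remains possible without keeping every copy online.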

The solution is a combination of backup and archive. Backup is an inert copy that can be called upon to restore data to the primary system. Archive is a live file that can be accessed when needed but doesn't need to be on the expensive front-line system. Most data is passive, meaning that it is rarely accessed 30 to 90 days after its creation, and even less after that. By taking frequent snapshots of your new data, archiving data older than 30 days, and backing that up, you get the most cost-effective use of resources.

IT projects are fast moving and dynamic, but in most cases they rely on reusable assets, e.g. historic data. For years, the way to handle data growth was simply to throw raw storage capacity at the problem. That approach no longer works, as organisations must deal not only with capacity challenges but also with the performance, management and running cost of the systems. Some examples of fast-moving, dynamic projects with reusable assets:

Media and Entertainment

In the film industry it is common practice to store raw and edited content on high-performance arrays while work is in progress. Once the project is completed, the content is placed in a working archive or a long-term archive, depending on the time it takes to create intermediaries, certain special effects and other content needed to develop the final cut. Preserving the source media containing the original content is extremely common, since it is difficult if not impossible to recreate, but this is often insufficient, because the raw material does not capture the edits or metadata generated while processing it into a finished product. As a result, raw content as well as final cuts must be archived. Further reading: http://bit.ly/bnlsdo

Life Sciences

DNA sequencing and the use of imaging technology are producing new volumes of data that must be analysed, stored and managed.
Research centres need to access, share and manage hundreds of terabytes of DNA sequencing data for analysis at any time. Each new generation of sequencers, mass spectrometers, microscopes and other lab equipment produces a richer, more detailed set of data. While the data is part of a workflow it must be on the highest-performance systems, accessible to researchers for analysis and discovery. The data should then be archived on more cost-effective systems for additional review and retrieval, and backed up off site. Further reading: http://bit.ly/arqap4

Utilities/Oil & Gas

To increase oil and gas exploration, speeding up the processing of seismic data is vital. This involves massively powerful 3D processing software, fast high-capacity Ethernet networks and SAN-based storage. Daqing Oil Field Petroleum Exploration and Development Research Institute (EDRI) performs seismic data archival, retrieval, data protection and vaulting through a high-performance tape library. Based on parameters such as schedules, work areas, users and key processing criteria, its Geophysics Service Centre can migrate data from online RAID systems to tape, thereby releasing disk space for other jobs. When archived files are needed, they can be retrieved automatically from tape back to disk. Additionally, a clone of the final version of processed data can be replicated to the tape library to allow offsite vaulting and data protection. Further reading: http://bit.ly/cj90mm

CERN, a Government Research Project

CERN, the European centre for nuclear research, recently built the Large Hadron Collider to allow scientists to analyse the structure of matter. The system generates approximately one gigabyte of new data per second, a rate that must be sustained day and night for at least one month of an experiment, the equivalent of more than a petabyte of data accumulated during the month. All these billions of bits of data generated every second are acquired by the A Large Ion Collider Experiment (ALICE) data acquisition system before being selected, transferred and stored in the main computer centre three kilometres away. This requires high-speed, shared workflow operations and large-scale, multi-tier archiving. Further reading: http://bit.ly/c2cls1

TIERED STORAGE

In order to be more energy efficient you need to match your various business requirements with the right data storage technology. In most cases this results in a multi-tier storage architecture that includes a mix of disk and tape hardware together with replication, deduplication, data management and archive software. As mentioned in the data types section at the start of the document, data can generally be categorised as live, persistent or backup: it is either in use, not in use but recallable at any point, or part of a copy of the primary data to be used only when the system needs to be restored after a failure. With this in mind you need to prioritise where the data resides, ensuring that your live data is on fast, high-performance systems; that your persistent data is archived but easily accessible; and that backup data is not only on lower-cost systems but ideally powered down unless called upon and, if part of a disaster recovery strategy, copied to a different location. For example, you might set a policy that moves data that has not been accessed for 30 days to a secondary storage array and then archives it after 90 days.
In this arrangement, fast primary storage is used for the live data, clustered SAN or NAS disk arrays for the secondary data, and tape libraries for the archive. The reason for this structure is to maintain the most cost-effective system: you could put all data on primary storage, but the capital expenditure for the hardware, the management time needed and the power usage would be excessive. There is often a misconception that disk-based arrays are always faster than tape. If you want fast access to an individual file then disk is the correct choice, but if you need sustained access to multiple files, or need to restore files from a backup, then tape, used in conjunction with intelligent file management and archive software, will be your best choice.
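The 30/90-day policy above is exactly the kind of rule that HSM and data management products automate. As a rough sketch of the decision logic only (the tier names and thresholds here are hypothetical, not configuration from StorNext or any other product), a file's target tier can be derived from its last-access time:

```python
import os
import time

# Illustrative thresholds matching the example policy in the text.
SECONDARY_AFTER_DAYS = 30   # move to clustered SAN/NAS tier
ARCHIVE_AFTER_DAYS = 90     # move to the tape archive tier

def tier_for(path, now=None):
    """Return the storage tier a file belongs on, based on how long it
    has gone unaccessed. Tier names are hypothetical labels."""
    now = now if now is not None else time.time()
    idle_days = (now - os.path.getatime(path)) / 86400
    if idle_days >= ARCHIVE_AFTER_DAYS:
        return "tape-archive"
    if idle_days >= SECONDARY_AFTER_DAYS:
        return "secondary-disk"
    return "primary-disk"
```

A real data mover evaluates rules like this continuously across millions of files and migrates the data transparently, typically leaving a stub in the file system so the file can be recalled from the lower tier when accessed.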

Quantum uses HP LTO tape drives in its storage libraries. Both companies have a long history of tape- and disk-based storage and can be impartial when advising on which technology suits which data set. A file being streamed from LTO tape using Quantum StorNext technology is much faster and more efficient than a disk-based remote backup being restored over a wide area network. Further reading: Taneja Group Technology Analysis: http://www.quantum.com/pdf/quantum_Goes_Beyond_Backup.pdf

Disk arrays used for archiving typically use SATA hard drives, since they provide high storage capacity for a given price and are reliable when accessed infrequently. Data movement between tiers in an archive can be a manual process, but this is cumbersome and susceptible to error, potentially resulting in data loss. Automation software products can be used to simplify this task. These products should include the ability to protect content by copying files and placing them on archive media. They should also work hand in hand with content asset managers and provide other efficiency features, such as replication and deduplication for storage tiers. These features will greatly reduce storage requirements while enabling data to be retained longer.

Archiving should not be regarded as a static process. Data volumes will always grow, and when an archive load becomes too large, decisions will have to be made about which content to transfer and preserve on new media. The selection of a media format should always consider backwards compatibility, otherwise data transfer could become an almost constant process. LTO tape is considered by many the best choice of archive media because of its speed, capacity, 30-year shelf life and the fact that it is backed by the LTO consortium, guaranteeing easy future access. The LTO consortium's road map shows the intention to provide read/write capability one generation back and read capability two generations back.
The hardware should check the integrity of the data. The software automation tools should provide the ability to stream archive data to tape, as this speeds up the write and recovery processes. Automated policies that refresh the media over time, transparently to the user, also improve efficiency. In practice, a combination of enterprise data management and protection software with a high-performance LTO tape library will give the most cost-effective archive performance. Further reading: Computer Technology Review LTO article: http://bit.ly/d29a14 Quantum LTO: http://www.quantum.com/products/tapedrives/ltoultrium/lto-5/index.aspx

[Figure: relative acquisition cost and power consumption (percentage) for high-performance primary storage, SATA secondary storage, and Tier 2 or 3 secondary storage.]

OPERATIONAL COST

Data centre power, cooling and space requirements are becoming a challenge, and the demands of data protection, improved restore performance, longer data retention times and technology integration such as deduplication are growing at a vast rate. Only the original data needs to be backed up and retained for long periods of time; keeping it on spinning media for years on end will eat away at the energy portion of your IT infrastructure budget. Moving long-term data retention to tape largely removes the electricity cost of storing that data, and lets the enterprise demonstrate sustainability through green initiatives that seek to reduce energy consumption.

The above diagram shows the acquisition and running costs of an LTO tape library compared with primary and secondary disk storage. If you are only accessing data occasionally, it makes sense to ensure it is stored on the most cost-effective and efficient platform. As you can see, power consumption is a vital consideration for a cost-effective system. With primary data continuing to grow, doubling every 12 to 18 months, powering and managing that growth has moved into the top five of CIO concerns. Overall, 15% of office electricity use is attributable to IT, according to the UK-based Carbon Trust, which forecasts this will rise to 30% by 2020.

THE ROLE OF DATA MANAGEMENT SOFTWARE IN THE DATA CENTRE

Good data management software should give you high-speed content sharing combined with cost-effective data archiving. It is all about helping you build an infrastructure that consolidates your resources, so workflow runs faster and operations cost less. Data sharing and retention should be combined in a single solution, so you don't have to piece together multiple products that may not integrate well.
Even in heterogeneous environments, all data should be easily accessible to all hosts. Further reading: Quantum data management: http://bit.ly/azzpk8
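The electricity argument in the operational cost section above can be made concrete with a rough calculation. All figures below are illustrative assumptions, not vendor measurements: suppose an idle SATA array draws around 10 W per terabyte including cooling overhead, while tape cartridges sitting in library slots draw effectively nothing.

```python
# Back-of-the-envelope energy cost of keeping persistent data spinning.
# All figures are illustrative assumptions, not measured values.
CAPACITY_TB = 100          # persistent data kept online
WATTS_PER_TB = 10          # assumed idle SATA draw, incl. cooling
PRICE_PER_KWH = 0.12       # assumed electricity price

hours_per_year = 24 * 365
kwh_per_year = CAPACITY_TB * WATTS_PER_TB * hours_per_year / 1000
annual_cost = kwh_per_year * PRICE_PER_KWH
print(f"{kwh_per_year:.0f} kWh/year -> {annual_cost:.0f} per year")
```

On these assumptions, 100 TB of persistent data left spinning consumes 8,760 kWh a year in electricity for storage alone, a recurring cost that moving the data to powered-down tape almost entirely removes. Any real comparison should of course use measured figures for the specific hardware and local energy prices.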

THE ROLE OF TAPE IN THE DATA CENTRE

Tape has historically been the primary medium for backup and archive in the data centre, and it continues to be pervasive in data centres of all sizes. According to the Clipper Group, 20% of all enterprises use only tape for backup, while another 65% use both tape and disk, with tape usually sitting behind disk. This means that 85% of all enterprises use tape in some capacity for their data protection needs. The primary role of tape is evolving towards long-term archive and data retention, with many enterprises using disk systems for short-term backup and recovery to take advantage of disk's quick access to individual files. Tape continues to be the primary storage medium for most disaster recovery plans. Further reading: Clipper Group, Benefits of tape: http://bit.ly/ae47el

Quantum's StorNext data management software lets you load tapes into a library and have the data set immediately available, which can save many hours, sometimes even months, compared with a conventional recovery. It does this by storing the file directory data, providing full access as soon as the tape is loaded. Quantum's Scalar i6000 storage library with LTO-5 tape drives includes innovative new features, such as ilayer MeDIA for analysing the integrity of media and a bulk-load capability for the mass import and export of tape cartridges. ilayer intelligent software simplifies management and helps contain costs by reducing administrative time. Need a long-term archiving solution?: http://bit.ly/9p6nya

DISASTER RECOVERY

As with backup and recovery, disaster recovery is a vital part of your data protection strategy. A disaster recovery policy basically means that you have a copy of your data, in a non-corruptible form, in a different location from the primary data. What causes problems is that there is usually too much data to deal with. A good archive plan will remove a major portion of the problem.
It will also eliminate some of the complexity built into the backup process when backups are used for long-term retention of data; long-term retention should be the sole domain of the archive. These copies should then be replicated off-site in case something goes wrong at the original site. Ideally this should be accomplished by one process; if not, it should be managed as part of an overall backup workflow.

CONCLUSION

Archiving is a vital part of your corporate IT policy. The key consideration is to ensure that all initial data is backed up in some way but not replicated multiple times. Backup essentially parks the data in case it is needed for restore purposes; archive is a long-term store held on cost-efficient media that can be accessed easily when needed. When a massive amount of data is persistent, the cost savings and speed efficiencies can be equally massive. The combination of intelligent archiving and data preservation software, coupled with the latest high-speed tape libraries, will give you the best value, protection, operational cost savings and disaster recovery plan available.

About Quantum StorNext

With StorNext data management software, you get high-speed content sharing combined with cost-effective data archiving and content protection. It is all about helping you build an infrastructure that consolidates your resources, so your workflow runs faster and operations cost less. StorNext offers data sharing and retention in a single solution, so you don't have to piece together multiple products that may not integrate well. Even in heterogeneous environments, all data is easily accessible to all hosts.

Key Features and Benefits
File System Deduplication optimises the capacity and cost of primary storage.
Distributed Data Movers (DDMs) increase the performance and scalability of storage tiers.
Replication enables powerful data protection and data distribution solutions.
Management Console greatly simplifies data management complexities.
Virtualisation of storage tiers greatly reduces future storage requirements while enabling data to be retained longer.
Self-Protecting Architecture leverages integrated data protection, and integrity checks safeguard data both on-site and off-site.
Further reading: Quantum StorNext: http://www.quantum.com/products/software/index.aspx

About Quantum Scalar tape libraries

Designed to grow with your needs, Scalar tape libraries provide best-in-class management, monitoring and data security capabilities with embedded software called the Quantum ilayer. This software uses detailed information to automatically evaluate the integrity of drives and media within the library, so you can increase backup reliability while decreasing the total cost of ownership. The Scalar family of tape libraries integrates easily into your existing infrastructure and works seamlessly with disk for a complete data protection solution. Further reading: Quantum Scalar tape libraries: http://www.quantum.com/products/tapelibraries/index.aspx

Preserving the World's Most Important Data. Yours. www.quantum.com/stornext, email: softwareinfo@quantum.com

Quantum Corporation Northern & Eastern Europe, Middle East and Africa: Quantum House, 3 Bracknell Beeches, Old Bracknell Lane West, Bracknell, RG12 7BW, United Kingdom. Tel: +44 (0) 1344 353500
Quantum Corporation Central Europe: Willy-Brandt-Allee 4, 81829 München, Germany. Tel: +49 89 94303-0
Quantum Corporation Southern Europe: 8 rue des Graviers, 92200 Neuilly-Sur-Seine, France. Tel: +33 1 41 43 49 00
For contact and product information, visit quantum.com or call 800-677-6268

About Quantum
Quantum Corp. (NYSE:QTM) is the leading global storage company specializing in backup, recovery and archive. Combining focused expertise, customer-driven innovation and platform independence, Quantum provides a comprehensive range of disk, tape, media and software solutions supported by a world-class sales and service organization. This includes the DXi-Series, the first disk backup solutions to extend the power of data deduplication and replication across the distributed enterprise. As a long-standing and trusted partner, the company works closely with a broad network of resellers, OEMs and other suppliers to meet customers' evolving data protection needs.

© 2010 Quantum Corporation. All rights reserved. Quantum, the Quantum logo, and all other logos are registered trademarks of Quantum Corporation or of their respective owners. Protected by pending and issued U.S. and foreign patents, including U.S. Patent No. 5,990,810. WP00148B-v01 Oct 2010