Real-time Compression: Achieving storage efficiency throughout the data lifecycle

By Deni Connor, founding analyst, and Patrick Corrigan, senior analyst
July 2011

For many companies, the volume of data is growing faster than their ability to store and manage it effectively and efficiently. Recent studies indicate that enterprise demand for primary data storage capacity is growing at a rate of 35% to 65% annually.[1] Much of that data, as much as 80% in some organizations, is unstructured data -- files, spreadsheets and multiple other data types (e.g. CAD, engineering data, PDFs, etc.) -- that is traditionally stored on network attached storage (NAS) devices and file servers. And that unstructured data is projected to grow at a rate of over 60% this year alone.

What processes, other than the storing of unstructured data, are fueling this unbounded storage growth? First, the need to improve recovery time objectives (RTO) and recovery point objectives (RPO) contributes massively to storage growth: the mirrors, snapshots, replicas and clones created for migration and protection purposes all greatly increase the amount of data that must be stored. Add to that the data replicated for disaster recovery purposes and the data archived for regulatory and compliance purposes. Then also consider the amount of data that is copied to tape and shuttled offsite for long-term preservation. The growth is cumulative, and the copies of identical data being stored, while necessary, create a storage burden. It affects not only expenditures for more storage; it also impacts storage management, LAN and WAN bandwidth and performance, backup capacity, and backup and recovery time. In a world of increasingly narrow backup windows, with data doubling every 18 months, the ability to back up more data in the same window of time is critical.

Further, while server virtualization has helped organizations control physical server sprawl, it has not materially helped alleviate the storage capacity issue. In fact, the ease of deploying virtual servers is exacerbating the storage capacity problem, as new virtual servers, each requiring storage capacity, are deployed at a moment's notice. According to some studies, virtualizing an environment causes a 4x growth in storage capacity. Virtualization not only has a significant impact on primary storage costs, it also creates a major impact on backup and replication systems as users scramble to protect their data assets. Reducing the amount of storage dedicated to virtual servers by 72% can result in a 3.5x decrease in the recovery time objective (RTO).

Note: The information and recommendations made by Storage Strategies NOW are based upon public information and sources and may also include personal opinions both of Storage Strategies NOW and others, all of which we believe to be accurate and reliable. As market conditions change, however, and are not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. Storage Strategies NOW, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise) caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors which may appear in this document.
This Storage Strategies NOW White Paper was commissioned by IBM and is distributed under license from Storage Strategies NOW.

[1] Source: Wikibon

Traditional compression

Traditional compression for storage optimization is typically performed as a post-processing task: data is first written to disk and then compressed. Depending on the compression software, this is done manually (using tools such as WinZip, for example), immediately after the write (Windows NTFS volume compression) or when CPU cycles are available. Since CPU power is needed for both compression and decompression, and disk space is required to accommodate files before compression and after decompression, these techniques do not typically resolve the storage efficiency issue.

Deduplicating secondary data

Most current solutions for reducing storage capacity requirements focus on compressing and deduplicating secondary backup data and static archives after they are stored on NAS devices. These approaches are fine as far as they go, but they do not address the issue at the point of importance: decreasing the amount of primary storage that at some time in its lifecycle will be mirrored, replicated, cloned and backed up for data protection. Users often look to deduplication appliances, such as IBM ProtecTIER, to reduce their storage capacity requirements. Solutions that focus on secondary backup data only partially address the cost of hardware acquisition, the power consumed by more storage devices and the floor space requirements of increased storage capacity. While they reduce the requirements for power/cooling, staff resources and licensing costs, they don't fully remove them.

Data deduplication, depending on the method used, analyzes data and looks for files or blocks of data that are the same. When two or more files or blocks match, the system sets a pointer to a single stored file or block rather than storing multiple copies of that data. Deduplication provides the greatest benefit where there is a significant amount of redundant data. User home directories, email systems that store copies of messages in each user's mailbox, and multiple virtual servers, for example, all typically contain many instances of duplicate data and are prime candidates for deduplication. Deduplication generally provides less benefit with structured data, such as SQL databases, which are typically normalized to contain minimal redundant data. Deduplication can also have a negative impact on backup and recovery performance, since data must typically be rehydrated, or un-deduplicated, during recovery, which requires additional CPU time. Post-processing deduplication, which is done after data is backed up to disk, postpones processing until CPU cycles are available, making the effect on performance less noticeable. Post-processing deduplication, however, must use some disk space to hold pending transactions, which again does not help with optimizing storage efficiency. Also, backup systems that employ deduplication are not very effective at deduplicating files that are compressed using traditional methods, thus limiting the value of backup deduplication.

Both these approaches overlook a simple point: deduplicating only secondary backup data solves only a small part of the storage capacity issue. They ignore the effect of data reduction on primary storage, before data is even backed up, archived and replicated, where it would have the greatest effect on storage capacity.
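To make the pointer mechanism concrete, here is a minimal sketch of fixed-block deduplication in Python. It is illustrative only: the block size, the use of SHA-256 content hashing and the in-memory data structures are assumptions for this sketch, not the design of any product named above.

```python
import hashlib

class DedupStore:
    """Fixed-block deduplication sketch: identical blocks are stored once
    and referenced by content hash, so duplicate data costs only a pointer."""

    BLOCK_SIZE = 4096

    def __init__(self):
        self.blocks = {}   # content hash -> block bytes (each stored once)
        self.files = {}    # file name -> ordered list of block hashes

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), self.BLOCK_SIZE):
            block = data[i:i + self.BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # "Rehydration": reassemble the file from its shared blocks.
        return b"".join(self.blocks[h] for h in self.files[name])

store = DedupStore()
payload = b"A" * 8192            # two identical 4 KB blocks
store.write("copy1.dat", payload)
store.write("copy2.dat", payload)
assert store.read("copy2.dat") == payload
print(len(store.blocks))         # 1 unique block backs 16 KB of logical data
```

The `read` path also shows why recovery carries extra CPU cost: every restore must chase pointers and reassemble blocks rather than streaming a contiguous file.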

Final thoughts on data compression and deduplication

Data compression and deduplication, which have been very effective at reducing capacity requirements for secondary backup data and at cutting hardware, cooling and floor space costs, have also been deployed for optimizing primary storage, usually at the storage array itself. Using traditional compression and deduplication techniques on primary data can be problematic, however, due to the potential negative performance impact and, especially in the case of deduplication, the effect on backup performance.

IBM Real-time Compression

Unlike traditional compression, where data is written to disk and then compressed, IBM Real-time Compression compresses data in-line, before it is written to disk. The IBM Real-time Compression technology is deployed on an STN6500 (for 1Gb networks) or an STN6800 (for 10Gb networks) appliance that sits between a network switch and a NAS array to compress primary data. By compressing data before it arrives at the array, an IBM Real-time Compression Appliance can provide a primary storage reduction of up to 80%, depending on the types of data being compressed, without impacting performance or other operations. It compresses the data while leaving the metadata (file permissions, Access Control Lists, ownership information, etc.) intact when stored on the storage array. The storage array, not the appliance, then returns the write commit to the application. No changes are required to servers, storage arrays, applications or downstream processes such as backup, archiving, deduplication, snapshots or replication.

Integral to IBM Real-time Compression is the IBM Random Access Compression Engine (RACE). RACE, which is based on 35 patents, allows real-time, random-access compression without performance degradation. RACE uses standard LZ compression algorithms, and compression is performed using random access techniques: read and write operations need to access only the blocks of the compressed file that must be read or written, rather than decompressing and recompressing the entire file. This technique dramatically improves both read and write operations. In addition, since less data is being written to the storage array, there is less I/O, and with less I/O come more CPU cycles to process the given read and write requests.

Further, and perhaps most importantly, by compressing data in front of the storage array, a net increase in effective cache size is achieved. Whatever the compression ratio is for your data, that ratio carries over to your storage cache. If your data is compressible by 3:1, IBM Real-time Compression provides the equivalent of tripling the size of your storage cache. Since cache is one of the most expensive components of a storage array, and since cache tends to have the biggest impact on storage performance, the more you can increase cache, the better the performance users and applications will see.

Polycom: "We deal with the growth of data every day. Polycom has a lot of products, and all of them require multiple versions that we have to store and back up indefinitely," says Amit Bar On, IT manager for Polycom. "IBM's Real-time Compression Appliance helps us to manage the data growth more efficiently. We are now less concerned about storage capacity than we ever were before, and at the same time we are saving on costs."
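RACE itself is proprietary, but the general idea of random-access compression can be sketched as independently compressed fixed-size blocks: because each block is self-contained, a read or an edit touches one block instead of the whole file. The block size and the use of zlib below are assumptions for illustration, not RACE internals.

```python
import zlib

BLOCK = 64 * 1024  # uncompressed block size (an assumption for this sketch)

def compress_blocks(data):
    """Compress each fixed-size block independently, so any block can later
    be read or rewritten without touching the rest of the file."""
    return [zlib.compress(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def read_block(blocks, n):
    # Random access on read: decompress only the one block needed.
    return zlib.decompress(blocks[n])

def write_block(blocks, n, new_data):
    # An in-place edit recompresses only the affected block, not the
    # whole file as traditional whole-file compression would require.
    blocks[n] = zlib.compress(new_data)

data = b"sensor reading 42\n" * 20000     # ~360 KB of compressible data
blocks = compress_blocks(data)
assert read_block(blocks, 1) == data[BLOCK:2 * BLOCK]
write_block(blocks, 0, b"x" * BLOCK)
```

The cache claim is then simple arithmetic: if blocks compress 3:1, a cache slot that held one uncompressed block now holds three blocks' worth of data, so the same physical cache behaves as if it were three times larger.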

IBM Real-time Compression also allows downstream operations, such as backup, deduplication and snapshots, to function optimally without the need to decompress the data prior to processing. Because data can be processed (backed up, deduplicated, mirrored, replicated, etc.) in its compressed state, both processing time and storage requirements are significantly reduced. IBM Real-time Compression is designed to optimize both primary and active secondary storage.

The net effect of IBM Real-time Compression is a reduced data footprint throughout the data lifecycle. Since data is compressed on primary storage, its benefits cascade forward, requiring fewer resources, including storage, network bandwidth, power, cooling, floor space, staffing and backup resources.

Ben-Gurion University: "In the past three years we have continued to see an exponential growth rate in data storage requirements. We've been amazed by the amount of compression that we can achieve by using the IBM Real-time Compression Appliance."

Compression Accelerator

The IBM Real-time Compression technology also includes a Web-based utility called the Compression Accelerator, which non-disruptively compresses data already stored on disk as a background task. The Compression Accelerator is a high-performance, intelligent software application running on the IBM Real-time Compression Appliance that, by policy, allows users to compress data that has already been saved to disk while that data remains online and accessible to applications and end users. To reduce possible impact on throughput, its policy-based management gives users granular control over background compression tasks, letting them throttle how already stored, uncompressed data gets compressed so that existing storage performance is not affected (a throttling sketch follows below). The appliance's ability to transparently compress already stored data significantly accelerates the benefit to end users and increases their ROI by freeing up to 80% of used capacity for new workloads. With the Compression Accelerator running in the background, users can reclaim an average of 20TB of existing storage capacity every 24 hours.

How Real-time Compression differs from traditional compression

With traditional compression, in order to modify a file, the file must be uncompressed, edited, then recompressed into a new file. If data is inserted, all data blocks after the insertion point are either shifted or modified (see Figure 1, Compression Techniques and File Modification). This creates a negative impact on any downstream deduplication process. With IBM Real-time Compression, an edit affects only the block being edited. If new data is inserted, IBM Real-time Compression can add the new data and then use a data map to locate it without rewriting the entire file. This approach creates minimal impact on downstream deduplication and similar processes.
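As referenced above, the throttling idea behind the Compression Accelerator can be sketched as a background task with a rate cap. Everything here -- the directory layout, the ".z" output suffix and the 50 MB/s cap -- is a hypothetical illustration of policy-based throttling, not the appliance's actual policy model.

```python
import time
import zlib
from pathlib import Path

# Hypothetical policy: cap background compression throughput so that
# foreground I/O is not starved by the reclamation task.
MAX_BYTES_PER_SEC = 50 * 1024 * 1024

def background_compress(directory: str) -> None:
    for path in Path(directory).glob("*.dat"):
        raw = path.read_bytes()
        start = time.monotonic()
        path.with_suffix(".z").write_bytes(zlib.compress(raw))
        # Throttle: if this file finished faster than the rate cap allows,
        # sleep off the difference before touching the next file.
        min_duration = len(raw) / MAX_BYTES_PER_SEC
        elapsed = time.monotonic() - start
        if elapsed < min_duration:
            time.sleep(min_duration - elapsed)
```

A real implementation would also need to swap the compressed copy in atomically and keep the original readable throughout, which is what "online and accessible" implies above.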

Figure 1. Compression Techniques and File Modification

IBM Real-time Compression combined with deduplication

Studies have shown that the combination of IBM Real-time Compression and downstream deduplication can provide significant reductions in storage requirements beyond what each approach can achieve on its own. Because the compression appliance is transparent to the network, servers, storage devices and applications, the implementation is non-intrusive and does not require system, application or process modifications. IBM Real-time Compression works transparently with, and optimizes, IBM ProtecTIER, NetApp, EMC Data Domain, Celerra and VNX, and other storage and deduplication environments. With less primary data being written to disk, there is less data to deduplicate.

When IBM Real-time Compression was combined with IBM ProtecTIER, test results showed an 82% savings in initial storage and a 96% overall data reduction. Backup time was reduced by 71%, and lower CPU utilization and lower disk activity were seen on the ProtecTIER deduplication engine. When data was compressed with IBM Real-time Compression and then fed through an EMC Data Domain deduplication appliance, results indicated a 40% improvement in capacity, a 72% reduction in backup time and significant reductions in CPU cycles (72%), disk activity (67%) and network traffic (77%).
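To see how the two reduction stages compose, take the ProtecTIER figures quoted above: if compression alone saves 82% and the combined pipeline saves 96%, then the deduplication stage is removing roughly 78% of what remains after compression. A quick check:

```python
# Illustrative arithmetic using the ProtecTIER test figures quoted above.
compression_savings = 0.82   # initial storage savings from Real-time Compression
overall_savings = 0.96       # combined compression + deduplication savings

remaining_after_compression = 1 - compression_savings   # 0.18 of the original
remaining_overall = 1 - overall_savings                 # 0.04 of the original

dedup_savings = 1 - remaining_overall / remaining_after_compression
print(f"deduplication removes {dedup_savings:.0%} of the compressed data")  # ~78%
```

The stages multiply rather than add, which is why compressing in front of the deduplication engine improves the overall ratio instead of merely shifting work between the two.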

Benefits

To recap, use of IBM Real-time Compression can provide as much as 5x storage efficiency savings and delivers these benefits:

- Reduced storage costs. With compression rates of up to 80%, the costs for storing a given amount of raw data are substantially reduced. With an average compression rate of 65%, 3TB of data can be stored on roughly 1TB of disk. This reduction applies not only to primary storage, but to backups and archives as well.
- Reduced CAPEX/OPEX. Storage hardware requirements are effectively reduced, as are costs for power, cooling, staffing and floor space build-out and leasing.
- Transparent fit into your storage environment, requiring no changes to any of your existing processes.
- Reduced data size, meaning less LAN and WAN traffic and faster disk reads and writes, reducing data bottlenecks.
- Meeting Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs). RPOs and RTOs can more easily be met, since IBM Real-time Compression reduces both the volume of data to be restored, compared to raw, uncompressed data, and the time required to restore that data. Studies show a 3.5x decrease in RTOs.
- Improved backup and restore performance, evidenced by 6.6x faster backups.
- Lowered backup costs. Less data to back up can reduce the requirements for additional tape libraries, backup software licenses (as much as 2x fewer licenses), staffing and backup media.
- Faster replication, as much as 3.3x faster.
- Reduced data footprint throughout the data lifecycle. Since data is compressed on primary storage, its benefits cascade throughout the entire data lifecycle, requiring fewer resources, including storage and network bandwidth, and associated management costs.

Snowball VFX: "We simply couldn't create or maintain the amount of data we need without the IBM Real-time Compression solution," said Yoni Cohen, founder of Snowball VFX. "Without data compression we would have needed twice the amount of disks and twice the amount of storage systems. With IBM Real-time Compression, we can buy a smaller storage system, but maintain the same capacity and performance as a larger, more expensive system. The IBM Real-time Compression Appliance enables us to stay competitive and continue to deliver higher quality animation and effects to our customers at a unique price point in our industry."

SSG-NOW Assessment

The addition of primary data compression capabilities is an important step in an enterprise's storage efficiency strategy. IBM Real-time Compression Appliances provide a significant reduction in primary data, which affects storage capacity requirements, downstream processes such as backup and recovery, and operating expenses. By providing seamless, easy-to-deploy real-time compression appliances, IBM has brought the advantages of real-time compression to a new level of convenience for a broad range of organizations. Transparent real-time compression, particularly when processes are not impacted by additional compute time, should be considered by organizations of all sizes.

To learn more about IBM Real-time Compression Appliances, go to www.ibm.com/storage/rtc.

TSL03060-USEN-00