Cost Effective Backup with Deduplication
Agenda
  Today's Backup Challenges
  Benefits of Deduplication
  Source and Target Deduplication
  Introduction to EMC Backup Solutions: Avamar, Disk Library, and NetWorker
  Question and Answer
BENEFITS OF DEDUPLICATION
Today's Backup Challenges
  Data growth is unavoidable
  Exponential growth in backup
    Typically represents a factor of 4-30x plus production capacity
    Daily, weekly, and monthly full backups kept for months or years
  New requirements to keep more data for longer periods
    Costs for management, media, and offsite storage multiply
  24x7 data center reality
    No good time to run backups
    Bandwidth limitations
  Virtualization drives consolidation
  [Figure: Amount of digital information created and replicated each year, 1996-2011. Information growth of 60% CAGR, from 173 billion gigabytes to 1,773 billion gigabytes (1.773 zettabytes).]
  Source: IDC White Paper, "The Diverse and Exploding Digital Universe," March 2008, sponsored by EMC
EMC's Definition of Deduplication
  The process of detecting and identifying the unique data segments within a given set of information, enabling the elimination of redundancy when stored or moved.
  [Figure: Data Sets 1, 2, and 3 passing through deduplication. Before: total segments = 39. After: unique segments = 6.]
Data Deduplication: How it Works
  First instance (May 2007), segments A B C D
    Only unique data segments are backed up
  Duplicate instance (May 2007), segments A B C D
    Data already backed up, so only a unique ID pointer is stored (20 bytes)
  Modified instance (June 2008), segments E B C D
    New data segment E identified and backed up
  Result: segments A B C D E
    Unique data stored on disk, available for immediate recovery
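The three instances on this slide can be sketched as a small segment store. This is a minimal illustration, not EMC's implementation: the class name `SegmentStore`, the use of SHA-1 (chosen only because its digest happens to be 20 bytes, matching the pointer size on the slide), and the reading of the modified instance as segments E B C D are all assumptions.

```python
import hashlib


class SegmentStore:
    """Hypothetical deduplicating store: keeps one copy of each unique segment."""

    def __init__(self):
        self.segments = {}  # fingerprint (20 bytes) -> segment data

    def backup(self, segments):
        """Store unseen segments; return (list of pointers, count of new segments)."""
        pointers = []
        new_segments = 0
        for seg in segments:
            fp = hashlib.sha1(seg).digest()  # assumed fingerprint function
            if fp not in self.segments:
                self.segments[fp] = seg      # first time seen: store the data
                new_segments += 1
            pointers.append(fp)              # always record the pointer
        return pointers, new_segments

    def restore(self, pointers):
        """Rebuild a backup from its pointers."""
        return [self.segments[fp] for fp in pointers]


store = SegmentStore()
first, new_first = store.backup([b"A", b"B", b"C", b"D"])      # first instance
duplicate, new_dup = store.backup([b"A", b"B", b"C", b"D"])    # duplicate instance
modified, new_mod = store.backup([b"E", b"B", b"C", b"D"])     # modified instance

print(new_first, new_dup, new_mod)   # new segments stored per backup
print(len(store.segments))           # unique segments held on disk
```

Each backup remains fully restorable from its pointer list, which is why the slide describes the stored data as available for immediate recovery.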
SOURCE AND TARGET DEDUPLICATION
Where Can Data Deduplication Occur?
  Source
    Client software agents identify repeated sub-file data segments at the source
    Only new, unique segments are transferred across the network and stored to disk
    Shorter backup window, reduces daily impact on physical/virtual infrastructure
  Target
    Backup application sends native data to a target storage device
    Data is deduplicated once it reaches the target, during or after the backup
    Found in VTLs or LAN B2D appliances
    Transparency to backup application offers users a plug-and-play experience
  [Figure: deduplication at source vs. deduplication at target, each shown relative to the network.]
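The difference in network traffic between the two placements can be sketched as follows. This is an illustrative model only: the function names, the 4096-byte segment size, the SHA-1 fingerprints, and the assumption that the client sends one fingerprint per segment are hypothetical and not taken from any EMC product.

```python
import hashlib


def fingerprint(segment: bytes) -> bytes:
    """Assumed fingerprint function (SHA-1, 20 bytes)."""
    return hashlib.sha1(segment).digest()


def source_side_backup(segments, server_index):
    """Dedupe at the client: only unseen segments cross the network."""
    bytes_sent = 0
    for seg in segments:
        fp = fingerprint(seg)
        bytes_sent += len(fp)             # the fingerprint is always sent
        if fp not in server_index:
            server_index[fp] = seg
            bytes_sent += len(seg)        # data is sent only when it is new
    return bytes_sent


def target_side_backup(segments, server_index):
    """Dedupe at the target: native data crosses the network first."""
    bytes_sent = 0
    for seg in segments:
        bytes_sent += len(seg)            # every segment is sent in full
        server_index.setdefault(fingerprint(seg), seg)
    return bytes_sent


# Day 1: ten distinct 4096-byte segments. Day 2: one segment has changed.
day1 = [bytes([i]) * 4096 for i in range(10)]
day2 = day1[:9] + [bytes([99]) * 4096]

src_index, tgt_index = {}, {}
src_sent = [source_side_backup(d, src_index) for d in (day1, day2)]
tgt_sent = [target_side_backup(d, tgt_index) for d in (day1, day2)]

print("source-side bytes sent per day:", src_sent)
print("target-side bytes sent per day:", tgt_sent)
print("unique segments stored:", len(src_index), len(tgt_index))
```

In this model both approaches end up storing the same unique segments; the difference is how much data crosses the network on the second day.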
When Can Data Deduplication Occur?
  Immediate, at the source, before data is sent across the network
    Data is deduplicated at source (client)
    Ideal for slow, congested infrastructure (e.g. remote offices, VMware)
    Leverages existing network links and infrastructure for fast, daily full backups
  Immediate, while the backup is running
    Content is deduplicated while backup happens
    Ideal for when the backup window is not a limiting design factor, and for optimizing capacity
  Scheduled, after some or all backup is complete
    Content stored in original format, dedupe later
    Well-suited for optimal performance in tight backup windows
  [Figure: immediate deduplication at source vs. immediate or scheduled deduplication, each shown relative to the network.]
Factors that Impact Data Deduplication Ratios
  Type of data
    Duplication in user-generated data is greater than in data from natural sources
    Encrypted and compressed data are not ideal candidates for dedupe
    More user-created content = higher deduplication ratio
  Data change rate
    Small data change rates = more duplicate data in subsequent backups
    Less change = higher deduplication ratio
  Retention policy
    Longer retention increases chances data will be repeatedly backed up
    Longer retention policy = higher deduplication ratio
  Ratio of full backups to incremental backups
    More full backups increase the amount of data being repeatedly backed up
    More full backups = higher deduplication ratio
  Data deduplication performance is tied to a number of factors; even small variations can have a significant impact
  These factors apply to all backup deduplication technologies
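The direction of three of these factors (change rate, retention, and the mix of full and incremental backups) can be illustrated with a deliberately simple model. The function `dedup_ratio` and its assumptions are hypothetical: the first backup is a full, every later backup introduces a fixed fraction of new data relative to the production data size, all changed data is unique, and compression is ignored. Real ratios depend on the data and the product.

```python
def dedup_ratio(fulls: int, incrementals: int, change_rate: float) -> float:
    """Ratio of data protected to data stored, in units of the production data size.

    Assumptions (illustrative only): the first retained backup is a full,
    each subsequent backup adds `change_rate` worth of new unique data,
    and an incremental backup contains only that changed data.
    """
    if fulls < 1:
        raise ValueError("at least one full backup is required")
    backups = fulls + incrementals
    protected = fulls * 1.0 + incrementals * change_rate   # data sent to backup
    stored = 1.0 + (backups - 1) * change_rate             # unique data kept
    return protected / stored


# 30 retained daily fulls, 1% daily change
print(round(dedup_ratio(30, 0, 0.01), 1))   # 23.3
# Same retention, higher change rate: lower ratio
print(round(dedup_ratio(30, 0, 0.05), 1))   # 12.2
# Shorter retention: lower ratio
print(round(dedup_ratio(7, 0, 0.01), 1))    # 6.6
# Same 30 backups, but only 4 are fulls: lower ratio
print(round(dedup_ratio(4, 26, 0.01), 1))   # 3.3
```

Within this model, the ratio rises with longer retention, lower change rates, and a higher share of full backups, which matches the direction of each factor on the slide.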
Data Deduplication Impact: Remote Replication and Bandwidth Requirements
  Without deduplication (offsite replication without deduplication)
    No reduction in local backup storage
    No reduction in replication time or bandwidth
    No reduction in offsite storage
  Leveraging deduplication (replicate after deduplication)
    Reduced local backup storage
    Reduced replication time and bandwidth
    Reduced offsite storage
  [Figure: backup replicated from primary site to remote site, without and with deduplication.]
Value of Data Deduplication for Backup-to-Disk
  Lowers infrastructure costs
    Reduces backup infrastructure requirements
    Reduced power, cooling, and floor space
  Enables longer backup retention periods
    Less data is easier and less costly to manage
    Meets regulatory requirements
  Improves data protection
    Daily full backup now achievable
    Disk-based backup also speeds restore times
  Improved security
    Disk eliminates risks of lost tapes
EMC Data Deduplication Backup Solutions: Avamar, Disk Library, NetWorker
  EMC Avamar
    Complete backup and recovery solution
    Dedupes at the source and globally
    Single-step recovery
    Integrated HA (RAIN)
    Flexible deployment (e.g. Data Store, virtual appliance, SW only)
  Disk Library Family
    Backup-to-disk solution, now with the power of policy-based data deduplication
    Works with existing backup applications and infrastructure
    Flexible solutions from small to large environments
    High performance, direct tape creation, and HA architecture
  EMC NetWorker
    Industry-leading backup and recovery software
    Integration with both Avamar and Disk Library
EMC Avamar: Complete Backup and Recovery Solution
  Full-featured backup solution
    Software and hardware with data deduplication
  Source-based, global data deduplication
    Reduces data at source (client)
    Reduces data globally (at backend disk)
  Fast, daily full backups
    Up to 10x faster daily full backups
    Leverages existing infrastructure
  Integrated high availability and reliability
    RAIN for high availability and fault tolerance
    Avamar server and data recoverability verified daily
  Flexible deployment options
    Avamar software
    Avamar Data Store
    Avamar Virtual Edition for VMware environments
  [Figure: Avamar Data Store. Scalable, turnkey solution for small offices to datacenters.]
EMC Avamar: Real-World Results
Avamar Daily Full Backups vs. Traditional Daily Full Backups
  Data Type | Amount of Primary Data Backed Up | Amount of Data Moved Daily | Daily De-duplication Ratio
  Windows file systems | 3,573 GB | 6.1 GB | 586:1
  Mix of Windows, Linux, and UNIX file systems | 5,097 GB | 11.7 GB | 436:1
  Engineering files on NAS (NDMP backups) | 3,265 GB | 24.2 GB | 135:1
  Mix of 20 percent databases, 80 percent file systems (Windows and UNIX) | 9,583 GB | 80.0 GB | 120:1
  Mix of Linux file systems and databases | 7,831 GB | 104.2 GB | 75:1
  Source: EMC
  While results will vary by data type and mix, Avamar can dramatically improve backup performance and efficiency!
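The daily ratios in this table follow directly from dividing the primary data backed up by the data moved each day. The short check below recomputes them from the slide's figures; the variable names are illustrative.

```python
# (data type, primary data backed up in GB, data moved daily in GB)
results = [
    ("Windows file systems", 3573, 6.1),
    ("Mix of Windows, Linux, and UNIX file systems", 5097, 11.7),
    ("Engineering files on NAS (NDMP backups)", 3265, 24.2),
    ("Mix of 20 percent databases, 80 percent file systems", 9583, 80.0),
    ("Mix of Linux file systems and databases", 7831, 104.2),
]

# Daily de-duplication ratio = primary data / data moved, rounded to a whole number
ratios = [round(primary / moved) for _, primary, moved in results]

for (name, _, _), ratio in zip(results, ratios):
    print(f"{name}: {ratio}:1")
```

The recomputed values agree with the ratios printed on the slide (586, 436, 135, 120, and 75 to 1).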
Avamar Success Story: Corporate Express
Time Shortened, Costs Reduced for Remote Office/Branch Office Backup
  Before Avamar
    Storage demands were rapidly increasing
    Tape library was reaching slot capacity, and upgrading was not ideal due to age and maintenance costs
    Needed to control costs and simplify data management
    Backup and disaster recovery were time-consuming
  With Avamar
    Reduced stored data by more than 50%, from 92 TB to 44 TB
    Achieved significant financial savings
    Enabled disk-based backups to be completed in 30 minutes, compared to 6 hours in the past for tape
    Reduced restoration times for business-critical data from 24 hours to minutes
  "We were blown away by the simplicity of the management interface and the comprehensive capabilities offered by Avamar. After carrying out a proof of concept, we clearly understood the benefits Avamar would bring to our business."
    Mark Jones, Technology Infrastructure Manager
EMC Disk Library Family
Data Deduplication Capabilities for All Platforms
  Virtual tape libraries and LAN backup-to-disk platforms
  Policy-based deduplication
  IP or SAN connectivity
  IP replication of deduplicated content
  Industry-proven CLARiiON back-end
    High performance
    5-9s high reliability
  Disk Library Family
    Up to 8 TB/hr performance
    4-674 TB scalability
    Hardware compression
    Energy-efficiency options
    Consolidated media management
    IP or SAN connectivity
    IP replication
DL3D 1500 and DL3D 3000
New LAN-based backup-to-disk platforms with Data Deduplication
  DL3D 1500
    4-36 TB capacity
    Up to 720 GB/hour performance (SAN)
  DL3D 3000
    8-148 TB capacity
    Up to 1.44 TB/hour performance (SAN)
  Policy-based data deduplication
    Select Immediate or Scheduled deduplication
    Optimize for storage utilization or for backup performance
  Replication of deduplicated content for HA
    Up to 10 sources to one target
  Data encryption
    128-bit AES with ability to turn on/off
  DL3D 1500 hardware
    6 Gigabit Ethernet ports for CIFS/NFS
    2 Fibre Channel SAN ports (VTL)
    4 TB upgrades
    3-year Enhanced warranty
  DL3D 3000 hardware
    8 Gigabit Ethernet ports for CIFS/NFS
    4 Fibre Channel SAN ports (VTL)
    4 TB upgrades
    3-year Enhanced warranty
DL4000 Series
Industry's Most Popular SAN VTL, Now with Deduplication
  Based on proven CLARiiON CX3-80 arrays
    Single or dual engine systems
  Over a PB usable compressed capacity
    1 TB SATA drives; up to 930 drives
  Enhanced system throughput
    Hardware compression
    First and only end-to-end 4 Gb/s solution
  Policy-based data deduplication
    Optimize performance; reduce storage and replication costs
  Energy-efficient
    Automatic drive spin-down and low-power drives
  [Figure: DL4000 Series. Industry's only virtual tape library built from the ground up with 4 Gb/s components.]
Disk Library Family Success Story: Oil & Gas
Time Shortened for Backup and Restore
  Before Disk Library
    Not meeting backup windows
    Needed to speed restores
  With Disk Library
    Provided flexibility and control to increase performance
    Increased overall performance to meet backup windows
    Provided simplicity, reliability, and more efficient management
    Generated significant cost savings
  Oil & Gas: Disk Library 3000 policy-based deduplication provides the flexibility and control to optimize ingest performance and overall performance
EMC NetWorker: Complete Backup and Recovery from EMC
  Centralized control of traditional and next-generation backup
    Combining today's technologies with tomorrow's in a common framework
  Industry-leading global data deduplication
    Reduces backup storage by up to 50x and data moved by up to 500x; ideal for VMware environments
  Broad backup to disk
    Disk library integration, replication, snapshot management, continuous data protection, and NAS backup to disk
  Enterprise performance
    Secure backups and reliable recoveries
  Better recoverability from tape backups
    Future-proofed Open Tape Format with better recoverability from damaged tape media
EMC NetWorker and Deduplication: NetWorker and Avamar Integration
  NetWorker client and Management Console communicate with Avamar
  Avamar appears as a NetWorker dedupe node, enabled via client properties
  NetWorker manages metadata and data sent to the dedupe node
  [Figure: NetWorker clients, the NetWorker Server and Management Console, an Avamar dedupe node, and a storage node writing to disk, VTL, and tape.]
EMC NetWorker and Deduplication: Manage Source and Target Data Deduplication
  Integrated deduplication
    Select source or target based on need
    Optimize dedupe for the greatest benefit
  Source, using NetWorker client integrated with Avamar
    Managed via NetWorker for client configuration, schedules and policies, monitoring and reporting, full indexing, etc.
  Target, using EMC Disk Libraries
    DL 1500/3000 for LAN backup-to-disk or VTL
    DL 4000 and optional policy-based deduplication
  Keep your backup infrastructure running smoothly with EMC Data Protection Advisor
  [Figure: NetWorker clients and NetWorker server connected to an Avamar Data Store, DL3D 1500/3000, DL 4000, disk, and tape.]
NetWorker Success Story: Retail
Centralized Backup Management, Increased Efficiencies, Reduced Costs
  Before NetWorker
    85% of the environment was virtualized on VMware
    Restores unreliable and difficult to manage
  With NetWorker
    Provided integration with Avamar to deduplicate the VMware environment, reducing the size of file system backups
    Offered centralized backup management
    Saved money by increasing FTE efficiency
  Retail: NetWorker integrated with Avamar provides an efficient solution for the centralized management of data deduplication and backup
Which EMC Deduplication is Right for You?
Let Us Help You Determine the Right Solution
  Depends on:
    Application and data type
    Service-level requirements
    Current backup challenges and environment
  EMC tools are available to help you understand the benefits of each solution
    Deduplication analyzer tools
    TCO tools
    Backup, e-mail, and file system assessments
Why EMC for Backup-to-Disk with Deduplication
Deduplication is a Differentiator
  Comprehensive, integrated set of deduplication solutions
    Avamar, Disk Library, NetWorker
    Saves money and drives efficiencies throughout the backup and recovery lifecycle
  Only vendor that can deliver a deduplication solution for any customer need
    From refresh to redesign of the backup and recovery infrastructure
    Tailored to the size of your company, specific need, and budget
  Talk to your EMC or partner representative to share your backup requirements