EMC Disk Library with EMC Data Domain Deployment Scenario
Best Practices Planning

Abstract
This white paper is an overview of the EMC Disk Library with EMC Data Domain deduplication storage system deployment scenario.

January 2010
Copyright 2010 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com All other trademarks used herein are the property of their respective owners. Part Number h6925 Best Practices Planning 2
Table of Contents

Executive summary
Introduction
    Audience
    Terminology
Disk Library and Data Domain deployment overview
    Supported environments
    Supported Data Domain systems
    Supported EMC Disk Library systems
    Physical connectivity
    Sizing the Disk Library with the Data Domain system
Conclusion
References
Executive summary

Today's IT environments face both data growth and shrinking backup windows. Recovery time objectives (RTOs) and recovery point objectives (RPOs) are also becoming more stringent, increasing the importance of a highly reliable, high-performance backup environment. As a complement to tape for long-term, offsite storage, backup-to-disk and the EMC Disk Library products have emerged as powerful solutions.

Customers seeking the advanced virtual tape library (VTL) functionality of the Disk Library as well as the ROI benefits of deduplication can leverage a Disk Library deployment with Data Domain. This enables customers to move data to Data Domain deduplication storage systems for longer-term retention of data and network-efficient replication.

Replication of deduplicated backup data is supported with Data Domain deduplication storage interoperability. The advantage of replicating with Data Domain is that only deduplicated data is replicated, which significantly reduces the bandwidth required. Further, Data Domain Replicator software sends only new and unique data segments to the remote location. In addition, all data is verified independently at the remote site after it is copied.

Data Domain also offers flexible deployment options to meet a broad scope of data protection needs. Replicator software provides a full range of replication options, including collection, tape pool, many-to-one, bi-directional, and cascaded replication. Data Domain network-efficient replication is an efficient way to satisfy vaulting requirements.

There are three Disk Library software options available to enable deployment with Data Domain:
- Automated Tape Caching
- Embedded EMC NetWorker storage node
- Embedded Symantec NetBackup media server

Any one of these software options allows you to copy Disk Library virtual tapes to a Data Domain system.
Introduction

This white paper provides an overview of the best practices involved with the interoperability of the Disk Library and Data Domain deployment scenario.

Audience

This white paper is intended for EMC customers, EMC system engineers, and members of the EMC and partner professional services community who are interested in incorporating EMC Disk Libraries and Data Domain systems into a backup environment.

Terminology

Automated Tape Caching: Licensable option that allows data to be temporarily stored on the Disk Library. That data is eventually written to back-end physical tape, or a Data Domain appliance, allowing space to be freed up on the Disk Library.

Deduplication: Process of detecting and identifying the redundant variable-length blocks (or data segments) within a given set of data to eliminate redundancy.

DD690: Data Domain DD690 system.

DD880: Data Domain DD880 system.

DL4000 Series: EMC Disk Library 4000 series appliances.

DL5000 Series: EMC Disk Library 5000 series appliances.
Embedded media server: A feature available on the Disk Library providing Symantec NetBackup media server functionality embedded within the Disk Library engine(s). This allows for NetBackup environment awareness of duplicate copies of virtual tapes that are exported to a physical library connected to the back end of a Disk Library and controlled by the embedded media server.

Embedded storage node: A feature available on the Disk Library providing NetWorker storage node functionality embedded within the Disk Library engine. This allows for NetWorker environment awareness of clone copies of virtual tapes that are exported to a physical library connected to the back end of a Disk Library and controlled by the embedded storage node.

Engine: A Disk Library or Data Domain deduplication appliance server.

Flex port: Fibre Channel (FC) ports on the Disk Library server that can be configured as either front-end (SAN client) ports or back-end (physical library) ports. Flex ports do not connect to the EMC storage arrays. See also library port.

Library port: Fibre Channel (FC) ports on the Disk Library server(s) used to connect to a back-end physical library, another Disk Library, or a Data Domain appliance. These ports are also referred to as initiator ports.

Remote replication: Backup data residing on a Data Domain appliance is copied over a LAN or WAN to another Data Domain appliance in deduplicated form for disaster recovery protection.

SAN client: A backup server that connects through an FC SAN to a Disk Library.

SAN client port: FC ports on the Disk Library server used to connect backup servers (clients of the Disk Library). These ports are also referred to as target ports.

Server: A Disk Library or Data Domain appliance server. Also known as an engine.

Tape migration: The process of sending data from the Disk Library to the Data Domain system using Automated Tape Caching.

TLU: Tape library unit, sometimes referred to as a physical library unit (PLU).
Virtual tape library (VTL): Software emulation of a physical tape library system.
Disk Library and Data Domain deployment overview

The Disk Library with the Data Domain deployment scenario is a 4000 or 5000 series Disk Library with a Data Domain DD690 or DD880 system, as shown in Figure 1. In this deployment scenario, data in the Disk Library virtual tape cartridges is migrated or copied to the Data Domain system, where it is deduplicated to remove data redundancies. The result is longer data retention capability than a stand-alone Disk Library provides. The Data Domain system does not need to be dedicated to the Disk Library. While operations from the Disk Library to the Data Domain system are in progress, NAS or VTL jobs can run concurrently on the Data Domain system.

Figure 1. EMC Disk Library with the Data Domain deployment data flow

The EMC Disk Library:
- Provides significant performance advantages over tape-based solutions since data is written to disk
- Eliminates all single points of failure through a high-availability (HA) design with redundant components and active engine failover
- Presents itself as one of many standard, open-system tape library and tape drive formats to backup applications

The Data Domain deduplication storage system:
- Eliminates redundant data from backups to reduce storage requirements, enabling longer onsite retention and reduced replication costs
- Performs sub-file, variable block length deduplication as data is ingested into the system
- Includes built-in data compression that is additive to deduplication in the data reduction process
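The idea of storing each unique data segment once, and sending only segments that a replication target does not yet hold, can be illustrated with a toy sketch. This is not Data Domain's implementation: it uses fixed-size segments instead of variable-length segments, and SEGMENT_SIZE, SegmentStore, and replicate are hypothetical names chosen for illustration.

```python
import hashlib

SEGMENT_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split data into fixed-size segments (a simplification of variable-length segmenting)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class SegmentStore:
    """Toy deduplicating store: each unique segment is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}

    def ingest(self, data: bytes) -> list[str]:
        """Store data and return the ordered fingerprints needed to rebuild it."""
        recipe = []
        for seg in split_segments(data):
            fp = hashlib.sha256(seg).hexdigest()
            self.segments.setdefault(fp, seg)  # a repeated segment is not stored again
            recipe.append(fp)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild the original data from its fingerprints."""
        return b"".join(self.segments[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Total bytes physically held by the store."""
        return sum(len(seg) for seg in self.segments.values())


def replicate(source: SegmentStore, target: SegmentStore, recipe: list[str]) -> int:
    """Copy only the segments the target lacks; return the number of bytes sent."""
    sent = 0
    for fp in dict.fromkeys(recipe):  # keep order, skip repeated fingerprints
        if fp not in target.segments:
            target.segments[fp] = source.segments[fp]
            sent += len(source.segments[fp])
    return sent
```

In this sketch, ingesting a second backup that shares most segments with the first adds only the new segments to the store, and replicating it to a target that already holds the first backup transfers only those new segments.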
The Data Domain Replicator software option leverages the system's deduplication and compression capabilities, substantially reducing the amount of backup data that needs to be sent to a remote site. Data Domain Replicator software provides rapid local and remote restore with the following benefits:
- Permits bi-directional replication between Data Domain systems
- Replicates deduplicated virtual tapes to reduce bandwidth requirements
- Further reduces network traffic as only changed data is sent to the target Data Domain system
- Automatically replicates tapes to the target system
- Provides detailed replication reporting through the Data Domain GUI and CLI

Supported environments

The Disk Library with the Data Domain system supports the backup applications and versions listed in the EMC Support Matrix on Powerlink for the Disk Library.

Supported Data Domain systems

The following Data Domain systems are supported:
- DD690 Version 4.7.3.1 or later
- DD880 Version 4.7.3.1 or later

Supported EMC Disk Library systems

- DL4000 Series Version 3.3 SP1 or later
- DL5000 Series Version 4.0 or later

These Disk Library models have been qualified with the above Data Domain systems. All configurations require an official EMC Request for Product Qualification (RPQ).

Physical connectivity

The Disk Library consists of one or two servers (engines) attached to one or two CLARiiON arrays. The Disk Library connects to the Data Domain system through a storage area network (SAN) using one to four FC ports on each Disk Library engine (a maximum of four connections to the Data Domain system is allowed) connected to one to four ports on a Data Domain system. For four-port connectivity, two FC HBA cards must be installed in the Data Domain system. Connecting FC cables directly from a Disk Library to a Data Domain system is not supported.

Each Disk Library engine has four Fibre Channel library ports (4, 5, 8, and 9) for initiator mode SCSI attach.
Any one of these ports can be used for connection to the Data Domain appliance. With the Disk Library system, any unused library ports are available for connecting a physical tape library or another Disk Library for use as a back-end library. Figure 2 shows one of the possible ways the Disk Library can be interconnected with the two Data Domain models.
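The connectivity rules in this section can be written down as a small planning check. The sketch below is only an illustration: Deployment and validate are hypothetical names, not part of any EMC tool, and the check covers only the rules stated in the text (one or two engines, library ports 4, 5, 8, and 9, one to four Data Domain FC ports, two FC HBA cards for four-port connectivity, and no direct cabling).

```python
from dataclasses import dataclass

LIBRARY_PORTS = {4, 5, 8, 9}  # initiator-mode library ports on each engine, per the text


@dataclass
class Deployment:
    """Hypothetical description of a planned Disk Library to Data Domain connection."""
    engine_ports: dict[str, set[int]]  # engine name -> library ports used for Data Domain
    dd_fc_ports: int                   # FC ports used on the Data Domain system
    dd_fc_hba_cards: int               # FC HBA cards installed in the Data Domain system
    via_san: bool = True               # False models a direct cable connection


def validate(d: Deployment) -> list[str]:
    """Return the rule violations found; an empty list means none of these rules was broken."""
    problems = []
    if not 1 <= len(d.engine_ports) <= 2:
        problems.append("a Disk Library has one or two engines")
    for name, ports in d.engine_ports.items():
        if not ports:
            problems.append(f"{name}: at least one library port is required")
        bad = sorted(ports - LIBRARY_PORTS)
        if bad:
            problems.append(f"{name}: ports {bad} are not library ports")
    if not 1 <= d.dd_fc_ports <= 4:
        problems.append("one to four Data Domain FC ports are allowed")
    if d.dd_fc_ports == 4 and d.dd_fc_hba_cards < 2:
        problems.append("four-port connectivity needs two FC HBA cards")
    if not d.via_san:
        problems.append("direct FC cabling is not supported; use a SAN")
    return problems
```

For example, a two-engine plan that uses ports 4 and 5 on one engine, ports 8 and 9 on the other, four Data Domain FC ports, and two HBA cards yields an empty list. A passing check does not replace the required RPQ.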
[Figure 2 diagram labels: Disk Library Server A and Disk Library Server B, each with FC ports available for SAN connections to backup servers; CLARiiON Array; FC SAN; Data Domain System with ports 6B, 6A, 5B, and 5A and Ethernet Ports for Replication, Management and Data]

Figure 2. Disk Library with four Data Domain system interconnections

Sizing the Disk Library with the Data Domain system

To properly size the Disk Library when it is configured with the Data Domain system, retention requirements must be thoroughly reviewed and changed to accommodate the longer retention times possible with deduplication technology. Each system must be sized separately according to the desired retention scheme and the anticipated data access needs, so that the features of each system are used to full advantage. This may require backup policies to be re-evaluated. Storage capacity must be sized to adequately handle the amount of data expected to be retained in both native and deduplicated format. Please contact your EMC representatives to properly size the environment in which this interoperability will be used.
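The distinction between native and deduplicated capacity can be illustrated with simple arithmetic. The sketch below is a rough simplification, not the EMC sizing process: the function names are hypothetical, it assumes a constant daily backup volume, and reduction_ratio is an assumed overall deduplication and compression ratio that in practice varies with the data and backup policy.

```python
def size_native_tb(daily_backup_tb: float, retention_days: int) -> float:
    """Capacity needed to keep backups in native (non-deduplicated) form."""
    return daily_backup_tb * retention_days


def size_deduplicated_tb(daily_backup_tb: float, retention_days: int,
                         reduction_ratio: float) -> float:
    """Capacity needed after data reduction, for an assumed overall ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return daily_backup_tb * retention_days / reduction_ratio


# Hypothetical inputs: 2 TB per day, 14 days on the Disk Library,
# 90 days on the Data Domain system at an assumed 10:1 reduction.
disk_library_tb = size_native_tb(2.0, 14)
data_domain_tb = size_deduplicated_tb(2.0, 90, 10.0)
```

With these made-up inputs the short native retention needs 28 TB while the much longer deduplicated retention needs 18 TB, which is why the two systems are sized separately.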
Moving data from the Disk Library to the Data Domain deduplication storage system

There are three methods for migrating or copying data from the Disk Library to the Data Domain system. These three methods use existing features within the Disk Library software.

Automated Tape Caching

With the Automated Tape Caching feature, the virtual tapes act as a disk-based cache to physical libraries: data is first written to virtual tapes in a VTL and later copied to virtual tapes in the Data Domain system based on user-defined policies. The movement of data over the SAN from the Disk Library to a Data Domain system is done by a process called tape migration. Migrating data causes a copy of the data to exist in two physical locations, one on the Disk Library and one on the Data Domain system. All reads and writes of the data occur against the copy on the Disk Library. The data resides on both systems until a reclamation process is run on the Disk Library. Reclamation removes the data in the virtual tape on the Disk Library and replaces it with a pointer (or tape stub) to the data on the Data Domain system. After reclamation, the data is present only in its compressed and deduplicated form on the Data Domain system.

Automated Tape Caching is the recommended feature when the backup application is neither EMC NetWorker nor Symantec NetBackup. It requires a Disk Library license key to activate. For best practices planning and for more information on how to set up and use Automated Tape Caching on the Disk Library, please see the EMC Disk Library Automated Tape Caching Feature - A Detailed Review white paper on Powerlink.

Embedded storage node (EMC NetWorker)

The embedded storage node software treats the Disk Library emulated libraries and drives as if they were physical tape libraries and drives. From a backup application point of view, the devices are standard backup targets.
Using the EMC NetWorker cloning operation, data is cloned from the Disk Library to the Data Domain system based on user-defined policies. Data is also expired on the Disk Library and the Data Domain system based on user-defined policies.

The embedded storage node is the recommended feature when you are using the EMC NetWorker backup application. It requires a Disk Library license key to activate. For best practices planning and for more information on how to set up and use the embedded storage node on the Disk Library, please see the EMC Disk Library with NetWorker - Best Practices Planning white paper on Powerlink.

Embedded media server (Symantec NetBackup)

The embedded media server software treats the Disk Library emulated libraries and drives as if they were physical tape libraries and drives. From a backup application point of view, the devices are standard backup targets. Using the Symantec NetBackup duplication operation, data is duplicated from the Disk Library to the Data Domain system based on user-defined policies. Data is also expired on the Disk Library and the Data Domain system based on user-defined policies.

The embedded media server is the recommended feature when you are using the Symantec NetBackup backup application. It requires a Disk Library license key to activate. For best practices planning and for more information on how to set up and use the embedded media server on the Disk Library, please see the EMC Disk Library with VERITAS NetBackup - Best Practices Planning white paper on Powerlink.
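The choice among the three methods, and the life cycle of a virtual tape under Automated Tape Caching, can be summarized in a short sketch. It only models the behavior described in this section. The names CopyState, recommended_feature, and CachedVirtualTape are hypothetical and do not correspond to any Disk Library interface.

```python
from enum import Enum


class CopyState(Enum):
    """Where the data of one virtual tape lives under Automated Tape Caching."""
    ON_DISK_LIBRARY = "data only on the Disk Library"
    MIGRATED = "copies on both the Disk Library and the Data Domain system"
    RECLAIMED = "stub on the Disk Library, data only on the Data Domain system"


def recommended_feature(backup_application: str) -> str:
    """Map a backup application name to the feature this section recommends."""
    app = backup_application.strip().lower()
    if "networker" in app:
        return "embedded storage node"
    if "netbackup" in app:
        return "embedded media server"
    return "Automated Tape Caching"


class CachedVirtualTape:
    """Toy model of one virtual tape under Automated Tape Caching."""

    def __init__(self, barcode: str) -> None:
        self.barcode = barcode  # hypothetical identifier
        self.state = CopyState.ON_DISK_LIBRARY

    def migrate(self) -> None:
        """Tape migration: copy the data over the SAN; the local copy remains."""
        if self.state is CopyState.ON_DISK_LIBRARY:
            self.state = CopyState.MIGRATED

    def reclaim(self) -> None:
        """Reclamation: replace the local data with a stub pointing at the Data Domain copy."""
        if self.state is not CopyState.MIGRATED:
            raise RuntimeError("only a migrated tape can be reclaimed")
        self.state = CopyState.RECLAIMED
```

The sketch refuses to reclaim a tape that has not been migrated, reflecting that reclamation leaves the Data Domain copy as the only copy of the data. All three features require a Disk Library license key, which the sketch does not model.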
Using the Disk Library with the Data Domain system

The most common scenarios for using the Disk Library with the Data Domain system are discussed below. These scenarios use the existing Disk Library software options (Automated Tape Caching, embedded storage node, or embedded media server) to copy data to the Data Domain system.

Copying data from the Disk Library to the Data Domain system

In this scenario, either one or two engines are writing data to the Data Domain system. Data is migrated from the Disk Library (using Automated Tape Caching) or copied (using the embedded storage node or embedded media server) to the Data Domain system. Data is sent to the Data Domain system through a Fibre Channel SAN, which can be either a normal SAN or an extended Fibre Channel SAN. With the Automated Tape Caching feature, the backup application sees the local copy of data and data access is through the Disk Library. With the embedded storage node or embedded media server, the backup application is aware of both copies of data and data access is through the backup application.

Copying data from the Disk Library to Data Domain and to a physical tape library

In this scenario, data is copied to the Data Domain system and a physical tape library via the embedded storage node or embedded media server. In this configuration, the data can reside on each of the three units for different retention periods. Each engine must be able to see both the Data Domain system and the physical tape library, since the data is seen by each engine individually. Multiple engines can be used in a dual-engine configuration, with each writing to its own Data Domain system and physical tape unit.
Copying data from the Disk Library to multiple Data Domain systems

Here, the two Disk Library engines write data to two separate Data Domain systems. Data can either be migrated from the Disk Library (using Automated Tape Caching) or copied (using the embedded media server or storage node) to each Data Domain system from its specific Disk Library engine. This is well suited for environments that require the highest performance and wish to fully utilize the performance capabilities of the Disk Library.

Copying data to the Data Domain and replicating to another Data Domain

In this example, data is written to the Data Domain system and then replicated to another Data Domain system. Data can either be migrated from the Disk Library (using Automated Tape Caching) or copied (using the embedded storage node or embedded media server) to the Data Domain system. The data is then automatically replicated to another Data Domain system. Once the data is present at the target site, it can be recovered using Data Domain standard replication commands. A dedicated Disk Library on the target side is not required, although in some tape caching environments, a Disk Library on the target side may be required.

Conclusion

The information presented in this white paper is intended to provide an overview of a Disk Library with the Data Domain deployment scenario in common backup environments. For more in-depth best practices planning and configuration suggestions, please see the associated white papers available on Powerlink.

References

- EMC Disk Library and EMC Data Domain Solution Sizing Process Guide (for EMC employees only)
- EMC Disk Library Automated Tape Cache Feature - A Detailed Review white paper
- EMC Disk Library DL4106, DL4206, and DL4406 Version 3.2 - Best Practices Planning white paper
- EMC Disk Library with NetWorker - Best Practices Planning white paper
- EMC Disk Library with VERITAS NetBackup - Best Practices Planning white paper
- EMC CLARiiON Backup Storage Solutions - The Value of CLARiiON Disk Library with TSM: A Detailed Review white paper
- Data Domain EMC NetWorker V7.4 Application Introduction
- Data Domain VERITAS NetBackup 6.5 Application Introduction
- Data Domain IBM Tivoli Storage Manager Integration Guide