Symantec NetBackup 5000 Appliance
Turnkey Deduplication Solution for the Enterprise
Mayur Dewaikar
Sr. Product Manager, Information Management Group
White Paper: A Deduplication Appliance Solution for the Enterprise
Symantec NetBackup 5000 Appliance

Contents
Executive Summary
Deduplication Technology Overview
Ease of Scale
Multiple Use Cases
Centralized Management of Backups and Hardware
Reliable Hardware Platform
Summary of Key NetBackup 5000 Benefits
Conclusion
Executive Summary

Most IT organizations today are challenged with the problem of rapid data growth. International Data Corporation (IDC) [1] studies predict data growth to be 50-60 percent per year. As enterprises continue to grow via mergers and acquisitions, data is no longer confined to a single data center, but is often spread across multiple data centers, remote offices, and even virtual machine environments. Traditional backup products generally require a rotational schedule of full and incremental backups, which results in a significant amount of data movement on a weekly basis. With businesses relying heavily on IT systems, IT managers can no longer rely on traditional backup approaches to reduce downtime and deliver the stringent recovery point objectives (RPOs) and recovery time objectives (RTOs) needed to ensure business continuity. A thorough reassessment of the current data protection architecture, technologies, and processes must take place to drive the right solution.

As a part of this reassessment of data protection infrastructure, many companies across the globe have begun a transition from tape to disk-based data protection. Tape systems have been notorious for their inadequacy, especially around performance, recoverability, and reliability. Disk-based backups solve many of the challenges faced by tape backups, but they solve only part of the overall data protection challenge, especially when used with traditional backup products.

Over the past few years, data deduplication has transformed the world of data protection. For disk backups, deduplication is now a given because it dramatically improves storage utilization and reduces bandwidth consumption for backups. Every major backup and storage vendor now offers some form of data deduplication. However, choosing a deduplication solution is not easy. Every environment is different, and no one size fits all.
A good deduplication solution should offer the user a choice of where the deduplication is done (source or target, inline or post-process), be easy to implement, and have a palatable total cost of ownership.

The NetBackup 5000 appliance is a scalable deduplication solution that comes prepackaged with Symantec NetBackup PureDisk deduplication software on Telco grade hardware certified by Symantec. It provides customers with the features required to protect critical data from the remote office to the virtual environment to the data center. With built-in global deduplication technology, NetBackup 5000 makes disk backups more cost efficient by eliminating the backup of duplicate data across sites and servers, thus addressing many of the problems associated with traditional backup products.

Due to its turnkey nature, the NetBackup 5000 appliance greatly reduces on-site prep, test, and implementation time while providing a streamlined single-vendor purchasing, implementation, and service experience. It can be deployed as a standalone backup solution for reduction of backup storage and bandwidth, or as an alternative to VTLs behind NetBackup. Data backed up by NetBackup 5000 is encrypted both in flight and at rest, which protects the backed-up data, and the built-in replication feature allows independence from tape for DR purposes. Finally, the front-end capacity-based licensing model used for the software portion of the appliance significantly reduces the total cost of ownership as compared to other hardware-based deduplication solutions.

1-IDC, Worldwide Data Protection and Recovery Software 2010-2014 Forecast: Cloud, Deduplication, and Virtualization Stabilize Market, August 2010, By Robert Amatruda, Research Director, Data Protection and Recovery, IDC #224526, Volume 1
Deduplication Technology Overview

NetBackup 5000 uses NetBackup PureDisk deduplication technology, which offers segment-level global deduplication across sites and clients for maximum backup storage reduction. During the backup process, the backup data set is broken down into smaller segments, and each segment is assigned a hash value that is calculated from the binary content of a file. This is done to uniquely identify data segments by their content, rather than by the file path and name on any given hardware device. Because the hash value can be used to uniquely identify a file by its contents, it is called the fingerprint. The system therefore refers to files in the same manner that the Internet refers to servers. Moreover, files become referable regardless of their position on any given device.

The fingerprint is derived from the total contents of the file. The result is that files with the same content will have the same fingerprint, even when the files have different names, locations, attributes, creation or modification dates, and security attributes. Files with different content will have different fingerprints. Only a comparison of two fingerprints is required to know whether two files with different metadata (filename, path name, etc.) have the same content.

Ease of Scale

NetBackup 5000 can be deployed as a single configured node, which offers the user a dedupe storage capacity of 16 TB, or as a multi-node configuration that can scale up to a maximum usable dedupe capacity of 96 TB. Capacity can be expanded via a web-based GUI without any downtime for the environment. The data across all nodes is tracked by one central catalog, which ensures that once data is backed up on one node, it does not get backed up again on another node.
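The fingerprinting and central catalog described above can be illustrated with a minimal sketch. It is not the PureDisk implementation: the fixed segment size, the use of SHA-256 as the hash, the `GlobalCatalog` class, and the "least-filled node" placement rule are all assumptions made for illustration, since the paper does not specify any of them.

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # assumed segment size; the paper does not state one


def fingerprint(segment: bytes) -> str:
    """Content-derived fingerprint. SHA-256 is an assumed stand-in hash."""
    return hashlib.sha256(segment).hexdigest()


def split_segments(data: bytes, size: int = SEGMENT_SIZE):
    """Yield fixed-size segments of a backup stream (assumed segmentation)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class GlobalCatalog:
    """Hypothetical central catalog mapping fingerprint -> storage node."""

    def __init__(self, node_count: int):
        self.node_count = node_count
        self.location = {}                    # fingerprint -> node index
        self.stored_bytes = [0] * node_count  # bytes held by each node

    def backup(self, data: bytes) -> list:
        """Store only segments not seen before; return the list of fingerprints."""
        recipe = []
        for segment in split_segments(data):
            fp = fingerprint(segment)
            if fp not in self.location:
                # Place the new segment on the least-filled node (assumed policy).
                node = min(range(self.node_count),
                           key=self.stored_bytes.__getitem__)
                self.location[fp] = node
                self.stored_bytes[node] += len(segment)
            recipe.append(fp)
        return recipe


if __name__ == "__main__":
    catalog = GlobalCatalog(node_count=2)
    payload = b"A" * SEGMENT_SIZE + b"B" * SEGMENT_SIZE
    catalog.backup(payload)   # first copy: both segments are stored
    catalog.backup(payload)   # same content under any other name: nothing new
    print(sum(catalog.stored_bytes))
```

Because lookups go through one catalog rather than a per-node index, a segment already held on any node is never written to another node, which is the behavior the Ease of Scale section attributes to the central catalog.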
Multiple Use Cases

The NetBackup 5000 appliance can be used in four primary use cases:
1. Data center Backups
2. Remote Office Backups
3. Virtual Machine Backups
4. Improved Disaster Recovery

Data center Backups

NetBackup can utilize the NetBackup 5000 appliance as an intelligent deduplicated disk target instead of traditional VTLs. The deduplication technology on NetBackup 5000 allows customers to use their disk capacity more efficiently by eliminating redundant data, thus allowing more versions of the data to be retained for longer periods of time. More data available on disk also makes established RTOs and RPOs easier to meet.

In the data center use case, the NetBackup 5000 appliance takes a unique approach to deduplication in that it distributes the load of deduplication across multiple NetBackup media servers, thus reducing the possibility of a bottleneck on the appliance itself. This architecture is set up to scale deduplication performance beyond the boundaries of a single processing head while maintaining deduplication across the whole dataset. If a media server starts to become a bottleneck for data processing, additional media servers can be introduced in the environment to balance the load, thus delivering the best possible performance.

NetBackup 5000 supports both inline and post-process deduplication with NetBackup. Post-process deduplication requires a staging area for the backups. The following table gives an illustration of throughput characteristics of the NetBackup 5000 appliance when used as a target for NetBackup backups.
Lastly, the data can also be sent to tape for longer-term retention via the integration with NetBackup. This option is especially useful for users who wish to implement a hybrid data protection strategy consisting of disk and tape backups.

Remote Office Backups

Remote office backups are often an afterthought for many IT organizations, either because there is not enough bandwidth to run backups across the WAN or because the remote sites lack skilled IT personnel to manage the tapes needed for traditional backup products. Unprotected machines at remote sites expose organizations to the risk of data loss. NetBackup 5000 eliminates these bandwidth and tape-related backup issues by combining sophisticated disk-based backup with data deduplication. NetBackup 5000 works at the source to eliminate data redundancy before it traverses the network and enters the data center.

In the case of smaller remote offices with fewer servers, the built-in NetBackup PureDisk agent is installed on remote machines and data is backed up directly to a NetBackup 5000 appliance in a central data center. If the remote office is large, with several servers and a significant amount of data, a local instance of NetBackup 5000 can be deployed at the remote site, which can then replicate data back to the central data center. A local instance of the NetBackup 5000 appliance allows for faster backups and recovery. When compared with a traditional full backup over a typical retention period, NetBackup 5000 can reduce storage consumption by up to 500x and bandwidth consumption by up to 50x.

Virtual Machine Backups

From a data protection standpoint, protecting virtual machines can be more challenging than protecting physical machines. Not only does the backup application need more storage for protecting the virtual machines, but it is also competing with the virtual machines for the same shared resources on the host.
NetBackup 5000 quickly and effectively protects virtual machines by reducing the size of the backup data across virtual machines. NetBackup 5000 also eliminates the traditional backup bottleneck caused by large amounts of data that must pass through the same set of shared resources on the host, such as the Ethernet adapter, CPU, memory, and disk resources. The product offers flexibility through two backup methods: a) a PureDisk client within the virtual machine, or b) off-host virtual machine backups using the VMware vStorage API.
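The source-side deduplication described for remote offices and for in-guest virtual machine clients can be sketched as a simple exchange: the client fingerprints its segments, asks the target which fingerprints it lacks, and sends only those segments. The sketch below is illustrative only. `DedupTarget`, `source_side_backup`, and the use of SHA-256 are hypothetical names and choices, not the PureDisk agent protocol, and the byte count ignores the small cost of sending the fingerprints themselves.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Content-derived fingerprint. SHA-256 is an assumed stand-in hash."""
    return hashlib.sha256(segment).hexdigest()


class DedupTarget:
    """Hypothetical stand-in for the appliance's segment store."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints the target does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, fp, segment):
        self.segments[fp] = segment


def source_side_backup(segments, target):
    """Send only segments the target lacks; return the payload bytes sent."""
    by_fp = {fingerprint(s): s for s in segments}
    sent = 0
    for fp in target.missing(by_fp):
        target.store(fp, by_fp[fp])
        sent += len(by_fp[fp])
    return sent


if __name__ == "__main__":
    target = DedupTarget()
    monday = [b"os-image" * 1000, b"app-data" * 1000]
    tuesday = monday + [b"new-logs" * 1000]
    print(source_side_backup(monday, target))   # all segments are new
    print(source_side_backup(tuesday, target))  # only the changed segment is sent
```

In this toy run the second backup transfers one segment instead of three, which is the mechanism behind the bandwidth savings claimed for backups across the WAN or through shared host resources. Actual savings depend on how much data changes between backups.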
Improved Disaster Recovery

NetBackup 5000 offers two convenient disaster recovery options:
1. Built-in replication
2. DR backup to NetBackup with tape-out capability

Disk-to-disk replication is a standard feature available in NetBackup 5000 at no additional cost. The data is replicated in a deduplicated format and is encrypted during transit and at rest. The replication option requires a NetBackup 5000 setup at the secondary site. If NetBackup 5000 is being used as a storage target for NetBackup backups in the data center use case, the replication is controlled by NetBackup via the optimized duplication process. NetBackup remains fully aware of the secondary copy, which allows for easy recovery of data in the case of a disaster.

Disk-to-disk replication allows complete independence from tape, but if a customer is looking to keep a copy of the data on tape for DR purposes, NetBackup 5000 offers disaster recovery backups to tape via its integration with NetBackup. With this option, the data that has been backed up to a NetBackup 5000 can be sent to tapes via NetBackup. The process is fully automated via policies defined in the product GUI.

Centralized Management of Backups and Hardware

The NetBackup 5000 appliance manages multi-site backup operations from a single, intuitive, web-based management console. This includes administration of day-to-day backups, restores, replication, and reporting. In a multi-node configuration, all nodes report into the web-based GUI. This eliminates the need for multiple management interfaces. The appliance also allows monitoring of key hardware components such as the disks, memory, power supplies, and fans. The user is notified of any hardware component failures via alerts in the product GUI.

Reliable Hardware Platform

The NetBackup 5000 appliance is built on a rugged Telco grade hardware platform.
The Telco industry hardware guidelines ensure that the appliance can operate reliably under adverse environmental conditions, optimizing uptime for the backup environment. The data disks within the appliance are protected using a RAID 6 configuration. The disks, power entry modules, and fans all come with redundant components and are hot pluggable. The appliance also offers four network interface cards (NICs) that can be bonded together for high availability.
Summary of Key NetBackup 5000 Benefits

1. Operational Simplicity: Turnkey appliance sold and supported by Symantec
2. Reliable Hardware Platform: Telco grade hardware platform helps to optimize availability
3. NetBackup Integration: Integration with NetBackup for scalable, high-performance, load-balanced disk backups
4. Global Deduplication: Applicable across remote offices, data center, and virtual machines, all in one single deployment
5. Storage and Bandwidth Reduction: Up to 500x reduction in storage and up to 50x reduction in bandwidth [2]
6. Storage Scalability: Highly scalable with up to 16 TB of usable dedupe capacity per node and 96 TB of usable dedupe capacity per deployment. Protect several petabytes of data in one deployment.
7. Performance Scalability: Backup throughput as high as 4.3 TB/hr
8. Flexible DR Options: Built-in replication that allows for tapeless disaster recovery
9. Choice of Deduplication Process: Inline or post-process
10. Choice of Deduplication Type: Source or target deduplication in one single solution
11. Data Encryption: Built-in 256-bit Blowfish algorithm that ensures security of data in flight and at rest on the NetBackup PureDisk server

Conclusion

NetBackup 5000 solves the challenges associated with traditional backup approaches, enabling fast, reliable, storage- and bandwidth-efficient backups of data across data centers, remote offices, and virtual machines. The turnkey nature of the appliance simplifies the testing, purchasing, implementation, and support of deduplication in backup environments. The built-in replication option improves disaster recovery by eliminating reliance on tapes. With NetBackup 5000, customers can fearlessly take on the challenges of protecting the next generation of platforms and applications.

2-
About Symantec

Symantec is a global leader in providing security, storage and systems management solutions to help consumers and organizations secure and manage their information-driven world.

For specific country offices and contact numbers, please visit our website.

Symantec World Headquarters
350 Ellis St.
Mountain View, CA 94043 USA
+1 (650) 527 8000
1 (800) 721 3934
www.symantec.com

Symantec helps organizations secure and manage their information-driven world with storage management, email archiving, backup & recovery solutions.

Copyright 2010 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. 9/2010