Engineering White Paper

Backup-to-Disk: An Overview

Abstract

This white paper is an overview of disk-based backup methodologies. It compares disk and tape backup topologies and describes important considerations to be taken into account in enterprise-class backup-to-disk environments.

Published 3/3/2003
Copyright 2003 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. Part Number C1018
Table of Contents

Summary
Disk Backup Overview
    Advantages of Backup-to-Disk
    Backup Performance
    Restore Performance
    Media Reliability and Data Availability
    Overall IT Efficiency
    Backup-to-Tape Evolution
    Backup-to-Disk Evolves
    ATA Technology
    Comparing Disk and Tape-Based Configurations
    Backup/Restore Time
    CLARiiON Storage Array versus Native Tape Drive Performance
Conclusions
References
Summary

For many years, backing up data to tape has been the standard. Now, backup-to-disk has surpassed tape as the premier backup medium, given disk's greater value proposition. Storage backup solutions that incorporate backup-to-disk yield significant benefits over traditional backup-to-tape. These benefits include:
- Near-term recovery of mission-critical data
- Rapid restore from disk
- Greater reliability of the backup medium
- Multiple host data streams to disk

As a complement to using tape for long-term storage, backup-to-disk is an emerging and powerful solution for rapid recovery of mission-critical data. Many applications require frequent retrieval of recently captured data, and there is a pervasive need to back up business-crucial data that must be preserved and retrieved quickly and efficiently. Leading-edge customers are using disk as the destination for storage management application output. Advanced Technology Attachment (ATA) disk technology narrows the cost gap with tape while retaining the performance of disk. Backup arrays and disk-based backups will not replace tape, but will shift tape into an archival role.

This paper:
- Describes the use of disk technology in backup topologies
- Compares and contrasts disk and tape backup
- Highlights considerations to be taken into account in disk-based backup environments

Disk Backup Overview

Traditionally, backup software was created to write to a tape device. Today, most backup software products support writing to disk, which means writing to a disk file in a file system. The file system may be on a Windows, NetWare, Linux, UNIX, or other platform, depending on the backup server. Disk-based RAID configurations enhance data protection beyond what tape can provide. In most situations, backup to disk is faster than backup to tape when raw throughput is compared. Faster backup shortens backup windows, helping businesses meet their availability requirements.
Faster restore time provides faster access to information, helping businesses resume operations more quickly.

Advantages of Backup-to-Disk

Traditionally, tape has been the backup medium of choice due to its cost-per-MB advantage over disk. However, the economics of disk are narrowing that gap. EMC CLARiiON's implementation of ATA drive technology is described in the ATA Technology section. The advantages of using disk over tape in backup solutions can be grouped into four major categories:
- Backup performance
- Restore performance
- Media reliability and data availability
- Overall IT efficiency

Each of these advantages is explained in more detail below.
Backup Performance

CLARiiON storage systems deliver higher backup throughput than current-generation tape drives such as SDLT and LTO. Some tape technologies respond to a slow or intermittent data stream by shoe-shining (repeatedly stopping, repositioning, and restarting the tape). This physical behavior can significantly reduce a tape drive's performance. Disks do not experience this behavior because they are inherently random access.

Restore Performance

Recovery from disk is faster than recovery from tape. The difference can be seconds or minutes for disk versus hours for tape. Disks support random and sequential access; tapes support sequential access only. Random access enables faster access to data files on disk, improving overall performance.

If data is on several tape cartridges:
1. The library must mount each tape. Time: up to a minute per tape.
2. The tape must load. Time: thirty seconds to a few minutes.
3. The tape must be positioned to the desired data. Time: average access time is a few minutes.
4. The tape must be rewound and unloaded. Time: thirty seconds to a few minutes.
5. The next tape is loaded and the cycle repeats.

Time to first byte is milliseconds for disk versus seconds to minutes for tape.

Media Reliability and Data Availability

Disk system RAID protection enhances data availability and prevents data loss in the event of a disk drive failure, whereas tape-specific media errors are common. Tape handling is reduced or eliminated. Maintaining the set of tapes from a tape library can be problematic and requires properly trained personnel.

Overall IT Efficiency

Disk does not require tape handling or positioning, and RAID protection makes it inherently more reliable. There is less need to perform frequent full backups. Fewer backups need to be performed, saving network and CPU load. Tape undergoes a technology shift every three years, so a conversion process from old to new media must be undertaken at that interval.
Disk technology does not go through these types of transitions, since the format of the data does not change as it does with tape technology. New larger-capacity disk drives reduce floor space requirements compared with equivalent-capacity tape libraries.

Backup-to-Tape Evolution

Tape's primary advantages are cost and its ability to be used as removable media. A relatively inexpensive tape cartridge (when compared to the price of the tape drive itself) can be inserted, filled with data, and replaced with another cartridge, allowing the tape drive to write a practically unlimited amount of data. However, this movement of media in and out of the drive requires costly manual intervention. This mechanical process was automated with the tape library.
Initially, all tape library devices (the robot and all tape drives) were connected to one system, so all data written to the tape drives had to pass through that system. In this model, that system (the backup server) was often a large machine, capable of handling heavy I/O as well as the CPU load of de-packetizing the backup data arriving in network packets.

As backup models matured, backup software products supported connecting the devices housed in one tape library to different systems. In a common configuration, the autochanger and one or more tape drives were connected to the backup server, and the remaining tape drives were connected to one or more other systems. These systems, either database or application servers, had significant amounts of data needing to be backed up. These systems and the backup server could also write backup tapes for other systems that sent their data over an IP network. The chief limitation was that the tape drives were statically assigned to systems. Using a tape drive to back up another system's data required that the data be transferred over an IP network, which placed a substantial burden on both the sending and receiving systems.

The arrival of storage area networks (SANs) allowed many host systems to access the same tape and disk devices. Now, multiple host systems could write to the same tape drive. Backup software products became yet more sophisticated, with the backup server functioning as a traffic director, ensuring that only one system at a time wrote to a tape drive. Conceptually, tape drives could be moved from host system to host system as needed to perform backups or restores. SAN-based centralized backup topologies are the desired goal for many companies; LAN-based backup is the reality.

Backup-to-Disk Evolves

Initially, backup software implementations of backup-to-disk were not as complete as those of backup-to-tape.
The biggest single reason for this was the price of disk storage. The relatively high cost of disk versus tape made backups to disk unaffordable in most situations. Some backup software products, however, used disk as an intermediary medium: the initial backup performed during the backup window was done from disk to disk; then, at some later time, the backed-up data was moved to tape. This has several advantages, particularly when incremental or differential backups are performed. Because incremental and differential backups capture only new and changed data, the backup application can spend a considerable amount of time scanning the data before finding something that needs to be backed up. While the scan is occurring, the backup device (tape) is idle and not available for other purposes. Many tape drives do not perform well when subjected to alternating idle/busy/idle activity. Disk drives, on the other hand, were designed for exactly this kind of use, and perform much better as receivers of incremental data.

ATA Technology

EMC has implemented ATA disk technology with CLARiiON. This enables customers to keep more data online for longer periods of time. For many, previous alternatives were not affordable or justifiable. Customers can now mix and match performance-oriented Fibre Channel drives and capacity-oriented ATA drives within the same array, under common management. The CLARiiON software suite supports ATA drives. This single-array implementation provides the deployment flexibility customers seek.

Comparing Disk and Tape-Based Configurations

In both backup-to-tape and backup-to-disk environments, a restore of the data involves a bulk movement of data from the backup medium to the destination disk. Though all system components have increased in speed tremendously over the years, so, too, has the size of the data sets. It can still take many hours to do a bulk restore of the data.
Enterprise-class storage arrays, including CLARiiON, used as backup destinations address many of the issues found in tape and provide many features that separate them from JBOD (just a bunch of disks):
- RAID protects data from the failure of a single disk drive.
- Snapshots, mirrors, and clones provide rapid and near-instant backups and restores.
- Storage array disks, in addition to serving in a backup-to-disk scenario, can also be used in replica-based backup scenarios. This is a very powerful capability that tape technology cannot provide, and almost all backup software vendors are now shipping products that use storage array replicas as backups.

So far, we've assumed that a single system writes a single backup stream to a single device. This is often undesirable, particularly when a particular backup stream (for any of many reasons) runs slowly. To maximize the investment in the backup device, most enterprise-class backup software products can write multiple backup streams to the same device concurrently. This process is called multiplexing. Multiplexing enables a host system with four data disks to back up all four disks simultaneously to the same output device. Multiplexing was particularly valuable when network backups from relatively slow host systems over a relatively slow network were sent to a relatively high-speed tape device. Now, with the advent of storage area networks and faster host systems, the bottleneck has shifted to the output device. Backup-to-disk provides the throughput needed to relieve this bottleneck, improving overall system performance. With multiplexed data, restores from tape can be much slower than restores from disk, because the streams are interleaved on the backup media and a sequential device must read past the other streams' data to reach the data it needs.

This is not to say that tape is dead. It is not. But its role has shifted from a backup and restore medium to an archival one.
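To make the earlier point about multiplexed restores concrete, the following Python sketch models multiplexing as a simple round-robin interleave of blocks from four data disks onto one device, then counts how many records a sequential device must pass over to recover a single stream. This is a simplified illustration, not the on-media layout of any particular backup product; the function names, stream names, and block labels are all hypothetical.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several backup streams, round-robin, into the
    order they would be written to a single output device."""
    media = []
    for group in zip_longest(*streams.values()):
        for stream_id, block in zip(streams, group):
            if block is not None:  # a shorter stream has run out of blocks
                media.append((stream_id, block))
    return media


def restore_from_sequential_media(media, stream_id):
    """Recover one stream from sequential media.

    Returns the stream's blocks and the number of records that had to be
    read past, including records that belong to other streams.
    """
    positions = [i for i, (sid, _) in enumerate(media) if sid == stream_id]
    if not positions:
        return [], 0
    blocks = [media[i][1] for i in positions]
    records_read = positions[-1] + 1
    return blocks, records_read


# Four data disks on one host, three blocks each (hypothetical labels).
streams = {
    "disk1": ["a0", "a1", "a2"],
    "disk2": ["b0", "b1", "b2"],
    "disk3": ["c0", "c1", "c2"],
    "disk4": ["d0", "d1", "d2"],
}

media = multiplex(streams)
blocks, records_read = restore_from_sequential_media(media, "disk1")
print(blocks, records_read, len(media))
```

In this model, restoring the three blocks of disk1 means passing over 9 of the 12 records on the media. A random-access device could seek directly to the three records it needs.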
Customers should leverage disk solutions for their backup and restore operations and position tape resources for long-term archival solutions. The high cost of downtime to the business requires that disk-based backup and restore technologies replace traditional backup-to-tape practices. Once disk-based backups are complete, the data is not only more secure than it would be on tape media, but it can also be archived to tape (or other non-alterable media) at a later time to satisfy legal, governmental, and other retention mandates. An organization's current investment in tape technology is still valid, but its role changes from primary backup to archival. Tape can no longer fulfill its original backup role.

Because tape is a sequential-access medium, it is not possible to perform both a backup and a restore using the same tape drive at the same time. So, if a restore must use tape media that is already in use for a backup, either the restore must wait for completion of the backup, or the backup must be aborted. Since disk is a random-access medium, backups and restores can use the same disk-based backup device simultaneously.

Backup/Restore Time

When comparing the performance of a backup-to-disk implementation with that of a backup-to-tape implementation, both the throughput and the overall time to complete a backup or restore operation must be considered. There are vast differences between tape and disk in the overall time it takes to perform a backup or restore task. The following two charts compare restore times for tape and disk.
Disk-to-Disk Restore Time (total elapsed time 0:45)
    File Access Time: 4%
    Xfer Data: 96%

Tape-to-Disk Restore Time (total elapsed time 12:45)
    Tape Load: 4%
    Tape Ready: 6%
    File Access Time: 45%
    Xfer Data: 37%
    Rewind/Unload: 8%

These examples show a typical scenario in which a subset of data is requested for restoration. As the charts show, the disk-to-disk restore took about 45 seconds to restore the data (1.5 GB). The tape-to-disk restore took roughly 12 minutes, 45 seconds. This example also accounts for the fact that the requested data can be located on several incremental backup sets or media that must all be loaded and unloaded. Users should account for this overhead when comparing the performance of restore media. Customers should consider their recovery time objective, reliability needs, and footprint requirements when determining their optimal solution.
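As a worked illustration of the chart figures, the Python sketch below converts each phase's percentage share into seconds and separates data transfer from positioning and handling overhead. It assumes that the percentages pair with the phase labels in the order they appear in the charts; the function and variable names are illustrative only.

```python
# Totals and phase shares as read from the two restore-time charts.
# Assumption: each percentage belongs to the label in the same position.
DISK_TOTAL_S = 45             # 0:45 total elapsed time
TAPE_TOTAL_S = 12 * 60 + 45   # 12:45 total elapsed time

DISK_PHASES_PCT = {"file access": 4, "transfer data": 96}
TAPE_PHASES_PCT = {
    "tape load": 4,
    "tape ready": 6,
    "file access": 45,
    "transfer data": 37,
    "rewind/unload": 8,
}


def phase_seconds(total_s, shares_pct):
    """Convert percentage shares of a total elapsed time into seconds."""
    return {name: total_s * pct / 100 for name, pct in shares_pct.items()}


disk = phase_seconds(DISK_TOTAL_S, DISK_PHASES_PCT)
tape = phase_seconds(TAPE_TOTAL_S, TAPE_PHASES_PCT)

# Time the tape restore spends on anything other than moving data.
tape_overhead_s = TAPE_TOTAL_S - tape["transfer data"]
# How many times longer the tape restore takes end to end.
slowdown = TAPE_TOTAL_S / DISK_TOTAL_S

for name, seconds in tape.items():
    print(f"tape {name:<14} {seconds:7.1f} s")
print(f"tape overhead      {tape_overhead_s:7.1f} s")
print(f"tape/disk ratio    {slowdown:7.1f}x")
```

Under these assumptions, the tape restore spends roughly 480 of its 765 seconds on loading, positioning, and unloading rather than on transferring data, and takes 17 times as long as the disk restore overall.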
CLARiiON Storage Array versus Native Tape Drive Performance

The following chart compares performance results for disk and tape devices while backing up a large dataset. Each Fibre Channel RAID 5 group used a five-disk (4+1) configuration. Each Serial ATA RAID 5 group used a nine-disk (8+1) configuration.

CLARiiON versus Tape Drive Performance, 2:1 Data Compression (backup application average, MB/s)

    Device       MB/s
    CX600 FC     41
    CX600 ATA    39
    CX400 FC     36
    CX400 ATA    34
    CX200 FC     34
    SDLT 320     30
    LTOs         28
    SDLT 220     19

Specific performance and implementation details are available in expanded white papers tailored to individual backup applications. Please refer to these papers for additional detail.
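To put these throughput figures in terms of backup time, the following Python sketch estimates how long each device would need to stream a dataset at its average rate. Two things here are assumptions rather than data from the paper: the pairing of each MB/s value with a device label follows the order printed in the chart, and the 500 GB dataset size is a hypothetical value chosen for illustration. The estimate ignores tape mount and positioning time, so it reflects sustained transfer only.

```python
# Average backup throughput in MB/s at 2:1 compression, as read from the
# chart. Assumption: values pair with device labels in printed order.
THROUGHPUT_MB_S = {
    "CX600 FC": 41,
    "CX600 ATA": 39,
    "CX400 FC": 36,
    "CX400 ATA": 34,
    "CX200 FC": 34,
    "SDLT 320": 30,
    "LTOs": 28,
    "SDLT 220": 19,
}

DATASET_GB = 500  # hypothetical dataset size, not taken from the paper


def backup_hours(dataset_gb, mb_per_s):
    """Hours needed to stream dataset_gb at a sustained rate (1 GB = 1024 MB)."""
    return dataset_gb * 1024 / mb_per_s / 3600


hours = {
    device: backup_hours(DATASET_GB, rate)
    for device, rate in THROUGHPUT_MB_S.items()
}

# Print devices from fastest to slowest.
for device, h in sorted(hours.items(), key=lambda item: item[1]):
    print(f"{device:<10} {h:5.2f} h")
```

With these inputs the fastest array finishes in about 3.5 hours and the slowest tape drive in about 7.5 hours, which shows how a throughput difference translates directly into backup window length.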
Conclusions

Backup-to-disk is emerging as a technology that offers significant benefits over the traditional tape backup process. With the changing economics of disk technology, backup-to-disk solutions are now affordable. Leading-edge customers are implementing backup-to-disk solutions as improvements to their existing tape implementations. Major advantages of backup-to-disk include:
- Faster backup performance
- Faster restore performance
- Enhanced media reliability and data availability
- Improved IT efficiency
- Elimination of tape positioning, tape errors, and other mechanical issues
- Improved backup reliability

Many backup software applications support backup-to-disk functionality today. They are able to leverage the superior performance of CLARiiON disk storage systems. EMC Engineering has tested and currently supports these configurations. This effort helps customers leverage their application investments to meet today's business needs for improved backup and restore operations.

References

- CX-Series Backup-to-Disk Guide (Overview)
- Backup-to-Disk Guide with EMC Data Manager (EDM)
- Backup-to-Disk Guide with CA BrightStor ARCserve Backup
- Backup-to-Disk Guide with CA BrightStor Enterprise Backup
- Backup-to-Disk Guide with CommVault Galaxy
- Backup-to-Disk Guide with LEGATO NetWorker
- Backup-to-Disk Guide with VERITAS Backup Exec
- Backup-to-Disk Guide with VERITAS NetBackup