WHITE PAPER: DATA PROTECTION
Symantec NetBackup, Cisco UCS, and VMware vSphere Joint Backup Performance Benchmark
George Winter, Symantec Corporation; Roger Andersson, Cisco Systems, Inc.; Paul Vasquez, VMware
CONTENTS
1.0 Executive overview
2.0 Technology overview
2.1 NetBackup 7 for VMware
2.2 Cisco Systems Unified Computing System platform
2.3 NetBackup 7 Media Server Deduplication Pool
2.4 VMware vStorage APIs for Data Protection
3.0 Benchmark configuration
3.1 NetBackup 7 for VMware
3.2 VMware backup host
3.3 VMware configuration
4.0 Performance baseline testing
4.1 Baseline testing overview
5.0 Benchmark results
Additional performance topics
Protecting multiterabyte virtual machine environments
NetBackup for VMware performance advantage
Backup performance impact comparison
ESX server
NetBackup 7 for VMware Configuration Tips
Additional resources
Appendix A: HD_Speed Utility
Glossary
1.0 Executive overview
Since the 6.5 release, Symantec NetBackup software has featured an award-winning backup technology designed specifically for protecting VMware virtual machines. In addition to providing integrated support for virtual machine backups, NetBackup engineering created a unique single-file restore solution. NetBackup Granular File Restore can index, search for, and restore anything from a single file (Windows only) to an entire virtual machine, all from a single backup pass, and it does so with no modifications to the existing backup environment.
With the release of VMware vSphere 4, VMware significantly improved virtual machine data protection through the vStorage APIs for Data Protection (VADP). The NetBackup 7 release natively supports all of the new features provided by this API and adds patent-pending enhancements that no other backup vendor provides.
We suggest that you use this benchmark as a guideline when configuring your virtual machine backup strategy. We recommend the technologies described in this benchmark, but whatever components you deploy, the information and methodologies in this document can help you make design decisions about protecting your virtual machine environment.
2.0 Technology overview
Obtaining the best possible backup performance is always a team effort: multiple technologies must work together to deliver optimal data protection performance. For this benchmark, Cisco, VMware, and Symantec bring together a group of product offerings that together provide superior backup capabilities and performance for VMware virtual machine protection.
These technologies are described in detail as follows:
2.1 NetBackup 7 for VMware
Since the NetBackup release, NetBackup has provided technical innovations that simplify tasks for backup and virtual machine administrators. The NetBackup 7.0 release builds on features introduced in the 6.5.x release. Backup and virtual machine administrators want relief from slow backups, long backup windows, and clumsy restore processes, and NetBackup 7.0 can provide that relief.
No hardware requirement. The vSphere 4 vStorage APIs for Data Protection are designed so that no additional holding tank or staging area is required. This also means that the concept of a backup proxy no longer applies: virtual machine backups can be configured using a standard NetBackup master server, media server, or client. Special-purpose backup proxy systems and additional staging-area storage no longer need to be purchased.
Incrementals by the block. Another feature of the vStorage APIs is changed block tracking (CBT), a block-level incremental backup implementation. After the initial full backup, each subsequent block-level incremental backup transfers to the backup system only the blocks that have changed since the previous full or incremental backup. This shortens backup windows while retaining full disaster recovery restore functionality.
Single file restores no matter what. Block-level incrementals are ideal for restoring an entire virtual machine, but administrators also need to handle single-file restore requests. NetBackup 7.0 extends its award-winning Granular File Restore capability to block-level incrementals (Windows). Whether the restore request is based on a full, differential, or cumulative block-level backup, individual files can be instantly searched for and restored from any backup storage destination, including disk, tape, VTL, or deduplication target.
NetBackup never forces you to use disk as a primary backup target or to recreate or restage the virtual machine to disk before a restore. No other vendor can instantly search for an individual file through full or block-level incremental backups and then instantly restore that file from any backup destination (disk, tape, VTL, deduplication target, etc.).
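To make the block-level incremental idea concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not the VADP interface or NetBackup's implementation: the `TrackedDisk` class, its method names, and the 4-byte block size are all hypothetical. What it illustrates is that changed block tracking records which blocks the guest writes, so an incremental backup reads only those blocks instead of scanning or comparing the entire virtual disk.

```python
from __future__ import annotations

BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the example is easy to follow


class TrackedDisk:
    """Toy virtual disk that records which blocks are written (hypothetical model, not VADP)."""

    def __init__(self, size: int, block_size: int = BLOCK_SIZE) -> None:
        if size % block_size:
            raise ValueError("size must be a multiple of block_size")
        self.block_size = block_size
        self.data = bytearray(size)
        self.changed: set[int] = set()  # block indices written since the last backup

    def _block(self, index: int) -> bytes:
        start = index * self.block_size
        return bytes(self.data[start:start + self.block_size])

    def write(self, offset: int, payload: bytes) -> None:
        """Apply a guest write and mark every block it touches as changed."""
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the virtual disk")
        if not payload:
            return
        self.data[offset:offset + len(payload)] = payload
        first = offset // self.block_size
        last = (offset + len(payload) - 1) // self.block_size
        self.changed.update(range(first, last + 1))

    def full_backup(self) -> dict[int, bytes]:
        """Copy every block and start a new change-tracking interval."""
        self.changed.clear()
        return {i: self._block(i) for i in range(len(self.data) // self.block_size)}

    def incremental_backup(self) -> dict[int, bytes]:
        """Copy only the blocks written since the previous full or incremental backup."""
        blocks = {i: self._block(i) for i in sorted(self.changed)}
        self.changed.clear()
        return blocks


disk = TrackedDisk(16)            # four 4-byte blocks
full = disk.full_backup()         # copies all four blocks
disk.write(5, b"xy")              # touches block 1 only
first_inc = disk.incremental_backup()
print(sorted(first_inc))          # [1]
```

A write that straddles a block boundary marks both blocks, which is why the tracker works on block indices rather than byte ranges.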
Most efficient backup possible. When a file is moved or deleted, only the references to that file are removed; the data in the blocks that used to be associated with the deleted file still exists. NetBackup detects this and can automatically skip these blocks, which still hold data but are no longer in use (Windows). The amount of backed-up data can be reduced by 30 percent or more compared with standard backup technologies, which can translate into backup storage (disk, tape, etc.) reductions exceeding hundreds of terabytes in many environments. Because less data is backed up, the backup window also shrinks by a similar percentage.
2.2 Cisco Systems Unified Computing System platform
The Cisco Unified Computing System (UCS) is a next-generation data center platform that unites compute, network, and storage access. The platform, optimized for (but not limited to) virtual environments, is built on open industry-standard technologies and aims to reduce total cost of ownership (TCO) and increase business agility. The system integrates a low-latency, lossless 10 Gb Ethernet unified network fabric with enterprise-class, x86-architecture servers. It is an integrated, scalable, multi-chassis platform in which all resources participate in a unified management domain. Cisco's UCS and Fibre Channel over Ethernet (FCoE) technologies (Figure 1) are the backbone of the virtual infrastructure, providing a data center architecture that is easy for administrators to use and manage.
Figure 1. Cisco Unified Computing System (UCS)
Modern CPUs with built-in memory controllers support a limited number of memory channels and slots per CPU. Virtualization software running multiple OS instances demands large amounts of memory, and because CPU performance is outstripping memory performance, this can lead to memory bottlenecks. Even some traditional nonvirtualized applications demand large amounts of main memory.
Database management system performance can be improved dramatically by caching database tables in memory, and modeling and simulation software can benefit from caching more of the problem state in memory. To obtain a larger memory footprint, most IT organizations are forced to upgrade to larger, more expensive four-socket servers. CPUs that support four-socket configurations are typically more expensive, require more power, and entail higher licensing costs.
Cisco Extended Memory Technology expands the capabilities of CPU-based memory controllers by logically changing the geometry of main memory while still using standard DDR3 memory. The technology makes every four DIMM slots in the expanded memory blade server appear to the CPU's memory controller as a single DIMM that is four times the size. For example, using standard DDR3 DIMMs, the technology makes four 8-GB DIMMs appear as a single 32-GB DIMM. This patented technology allows the CPU to access more industry-standard memory than ever before in a two-socket server. For memory-intensive environments (for example, backup deduplication targets), data centers can better balance the ratio of CPU power to memory and install larger amounts of memory without the expense and energy waste of moving to four-socket servers simply to gain memory capacity. With a larger main-memory footprint, CPU utilization can improve because of fewer disk waits on page-in and other I/O operations, making more effective use of capital investments and more conservative use of energy. For environments that need significant amounts of main memory but not a full 384 GB, smaller DIMMs can be used in place of 8-GB DIMMs, with resulting cost savings: four 2-GB DIMMs are typically less expensive than one 8-GB DIMM. Deduplication technologies, such as the embedded Media Server Deduplication Pool (MSDP) in NetBackup, can use this extra CPU power and memory capacity to improve deduplication efficiency and backup performance.
2.3 NetBackup 7 Media Server Deduplication Pool
NetBackup 7 integrates the deduplication capabilities of Symantec NetBackup PureDisk software with the standard NetBackup media server to provide a single solution for both regular and deduplicated backups. A media server enabled with this feature is called an MSDP, referring to the pool of deduplicated data it hosts locally on disk.
A dedicated server or appliance is no longer required to host and deduplicate backup data. As well as acting as a media server in its own right, an MSDP media server can act as a deduplication target for other media servers, in the same way that a PureDisk server can with the PureDisk deduplication option.
Deduplicate everything. Deduplication technologies typically look at files and search for blocks or segments within those files that have previously been backed up. They take the same approach with virtual machine vmdk files, treating these typically very large files like any other file and simply trying to find blocks that match blocks already backed up. But vmdk files are not simple files. They are virtual machine containers that hold operating system files, applications, and user data such as Word documents or Excel spreadsheets. The key is understanding exactly what is inside these vmdk container files. NetBackup uses its patent-pending Granular File Restore technology to look inside vmdk files, and then uses this information to deduplicate files inside a vmdk file against files that exist on physical systems. This means that, for the first time, a deduplication technology can deduplicate data across the virtual and physical worlds.
Global storage savings of 20x or more. Backups are a natural fit for deduplication, because each new backup of the same file system or application protects a largely unchanged data set, producing a lot of redundant data. Deduplication identifies data segments that are common between backup sets and saves only one copy of each segment, which leads to large storage savings. Tests have shown that more than 20x storage savings can be achieved when protecting unstructured data with deduplication.
Higher backup speeds.
Because target deduplication removes redundant data inline before it is stored on disk, less data needs to be moved to and stored at the final destination. This reduction in I/O increases backup speed by an average of 30 percent, and in some cases even doubles it.
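The basic mechanism behind these savings can be sketched in Python. This is a toy illustration under stated assumptions, not the MSDP implementation: it uses fixed-size 8-byte segments and SHA-256 fingerprints, and `DedupPool` and its methods are hypothetical names. Each backup stream is split into segments, each segment is fingerprinted, and only segments whose fingerprint is not already indexed are stored; a backup is then just an ordered list of fingerprints (its "recipe"). The usage lines show the virtual-to-physical case: a file backed up from inside a VM image and again from a physical host is stored once.

```python
from __future__ import annotations

import hashlib

SEGMENT_SIZE = 8  # bytes per segment; deliberately tiny for illustration


class DedupPool:
    """Toy fingerprint-indexed segment store (hypothetical; not the MSDP implementation)."""

    def __init__(self, segment_size: int = SEGMENT_SIZE) -> None:
        self.segment_size = segment_size
        self.index: dict[str, bytes] = {}  # fingerprint -> the single stored copy of a segment
        self.logical_bytes = 0             # total bytes presented by all backups

    def backup(self, stream: bytes) -> list[str]:
        """Store a backup stream and return its recipe: the ordered list of fingerprints."""
        recipe = []
        for start in range(0, len(stream), self.segment_size):
            segment = stream[start:start + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()
            # A segment is written only if its fingerprint is not already in the index.
            self.index.setdefault(fingerprint, segment)
            recipe.append(fingerprint)
        self.logical_bytes += len(stream)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble a backup stream from its recipe."""
        return b"".join(self.index[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(segment) for segment in self.index.values())

    def ratio(self) -> float:
        """Deduplication ratio: logical bytes backed up per byte actually stored."""
        stored = self.stored_bytes()
        return self.logical_bytes / stored if stored else 0.0


shared_file = bytes(range(64))                       # the same file on a VM and on a physical host
pool = DedupPool()
vm_recipe = pool.backup(b"OSBLOCKS" + shared_file)   # file inside a VM image, segment-aligned
physical_recipe = pool.backup(shared_file)           # the same file on a physical system
print(pool.logical_bytes, pool.stored_bytes())       # 136 72
```

The example deliberately places the shared file on a segment boundary. Naive fixed-size segmenting of a vmdk does not guarantee that alignment, which is plausibly one reason looking inside the vmdk, as described above, improves virtual-to-physical deduplication.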
Reduced cost by leveraging commodity servers and commodity disk. While most deduplication solutions require dedicated hardware and an extra layer of management, NetBackup 7.0 breaks those boundaries by delivering built-in deduplication in the media server. Because it is bundled in the same installation package as NetBackup 7, this deduplication delivers the simplicity and flexibility backup administrators are looking for. With this ready-to-go configuration, customers can activate the option at any time and can choose less expensive storage and servers, further reducing overall infrastructure cost. Customers with mixed-platform environments can activate the deduplication server on a Windows, Oracle Solaris, or Red Hat Enterprise Linux media server without having to maintain a new platform.
Faster restore process. Because deduplication allows more versions to be kept on disk, customers can reduce or even replace their daily tape backups by extending backup retention on disk. Having more versions available online speeds up the restore process and increases the restore success rate.
More flexibility in mixed environments. Customers can install media server deduplication pools and standalone PureDisk pools in the same NetBackup domain. Both types of pools are presented as a storage server with storage units, keeping management transparent and consistent. One media server can contribute to the deduplication process and route data to MSDP and PureDisk pools simultaneously, which makes for a flexible architecture.
2.4 VMware vStorage APIs for Data Protection
The VMware vSphere 4.0 release is a significant departure from the previous Virtual Infrastructure 3 release. vSphere 4 introduces an abundance of new features, and fortunately for backup administrators, many of these enhancements relate to virtual machine protection.
These improvements are delivered through the vStorage APIs for Data Protection, which VMware developed specifically for virtual machine backup. This API allows NetBackup to integrate directly with all of the backup functionality that vSphere 4 provides. The vStorage APIs for Data Protection completely replace VMware Consolidated Backup. VMware Consolidated Backup-style backups are still supported, but only for backward compatibility. Both ESX 3 and ESX 4 systems can be protected with the vStorage APIs for Data Protection, which offer the following additional architectural advantages:
No staging area required. NetBackup 7.0 no longer requires any disk for a staging area or holding tank. With NetBackup 7.0, the backup data stream goes directly from the source ESX datastore to any destination storage unit type that NetBackup supports, including disk, tape, VTL, or deduplication target (including the new NetBackup Media Server Deduplication). Because backed-up data no longer needs to be temporarily staged to disk, eliminating the staging area (or holding tank) significantly improves backup performance. This applies to both ESX 3 and ESX 4 (vSphere) systems.
Enhanced incremental backup technology. Another exciting feature of the vStorage APIs for Data Protection relates to incremental backups. Previous NetBackup releases provided file-level incremental backups, but NetBackup 7.0 features a much faster and more efficient block-level incremental backup technology. As a result, a virtual machine running on an ESX 4 server can be completely protected by backing up only the blocks that have changed since the previous incremental or full backup.
Better single file restores? You bet! VMware Consolidated Backup did not support an incremental backup technology in which the entire virtual machine could be automatically restored from both full and incremental (vmdk) backups to a specific point in time.
The vStorage APIs now provide this ability, and any vendor can enable it. Not only does NetBackup 7 fully support this capability, it goes a step further by uniquely restoring individual files (Windows only) directly from an incremental backup, without the need to reconstruct the entire virtual machine. These single files are indexed and searchable as well. While all vendors will be able to use VMware's changed block tracking technology, only NetBackup will be able to restore individual files and/or the entire virtual machine directly from a block-level incremental backup. One hundred percent of this data is indexed, and any file or virtual machine can be searched for, instantly found, and restored without first restoring or restaging the entire virtual machine to disk.
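The point-in-time restore described above can be sketched as a minimal Python model. It assumes each backup is a mapping from block index to block contents and that the disk size does not change between backups; the function names are hypothetical. Mapping a file to the blocks it occupies requires parsing the guest file system (NTFS for Windows guests), which the sketch takes as a given `file_blocks` list. It shows why a single file can be served from full and incremental backups without first reconstructing the entire virtual machine: only the blocks that file occupies are looked up, newest backup first.

```python
from __future__ import annotations

Backup = dict[int, bytes]  # block index -> block contents captured by one backup


def block_as_of(index: int, full: Backup, incrementals: list[Backup]) -> bytes:
    """Return a block's contents from the newest backup that captured it."""
    for incremental in reversed(incrementals):  # newest incremental first
        if index in incremental:
            return incremental[index]
    return full[index]  # unchanged since the full backup


def restore_image(full: Backup, incrementals: list[Backup]) -> bytes:
    """Rebuild the whole virtual disk as of the last incremental supplied."""
    return b"".join(block_as_of(i, full, incrementals) for i in sorted(full))


def restore_file(file_blocks: list[int], full: Backup, incrementals: list[Backup]) -> bytes:
    """Read only the blocks one file occupies; the rest of the disk is never rebuilt."""
    return b"".join(block_as_of(i, full, incrementals) for i in file_blocks)


full = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
monday = {1: b"bbbb"}
tuesday = {1: b"XXXX", 2: b"cccc"}
print(restore_image(full, [monday]))                  # b'AAAAbbbbCCCC'
print(restore_file([1, 2], full, [monday, tuesday]))  # b'XXXXcccc'
```

Choosing a point in time is simply a matter of passing the incrementals up to that day: the first call restores the disk as of Monday, while the second reads one file as of Tuesday.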
3.0 Benchmark configuration
One of the most important aspects of maximizing backup performance is careful software and hardware component selection. Although you may be familiar and comfortable with older operating systems, newer versions typically provide performance and scalability improvements. For example, the difference between a 32-bit and a 64-bit OS can be very important if a larger memory footprint is desired. Selecting the proper hardware is equally important. When deciding what hardware to deploy in the backup environment, choose the components in the backup path with performance in mind: any component in the backup path that is inherently slow or performing poorly will degrade overall backup performance.
The following sections describe the hardware and software environment selected for this benchmark (Figure 2). When selecting this hardware, special care was taken to balance hardware cost against performance. In addition to describing these components, we provide comments and suggestions on component selection and on methods for improving backup performance. Taken together, this information can help improve overall backup performance, which in turn can reduce hardware requirements and expenses, shorten backup windows, and reduce backup impact on your virtual machines.
Figure 2. Benchmark configuration
3.1 NetBackup 7 for VMware
With NetBackup 7, the system responsible for virtual machine backups is termed the VMware backup host. A dedicated system (that is, a backup proxy) no longer needs to be allocated to support VMware backups. Any new or existing backup system running a supported Windows platform can be designated as a VMware backup host. The VMware backup host can also be configured as a NetBackup master server, media server, or even a client.
This offers considerable flexibility in configuring your VMware backups and can also reduce licensing and/or hardware costs. For this benchmark, a single NetBackup system configured as a combined master/media server was defined as the VMware backup host. In larger environments, the NetBackup master and media server(s) are commonly configured on separate hosts.
NetBackup 7 for VMware also provides new features that by themselves can dramatically reduce backup times. VMware vSphere provides a true incremental backup technology through its changed block tracking (CBT) feature. NetBackup fully supports this and implements it as Block Level Incremental Backups (BLIB). NetBackup takes this a step further and can restore single files on Windows virtual machines directly from any full or BLIB-style backup, without ever having to restage anything to disk or recombine incremental backups to create a full virtual machine image. This means that incremental backups can be used more often, reducing the amount of data backed up without any loss of restore options (single file or entire virtual machine). Backing up less data improves overall backup performance and shortens backup windows.

NetBackup Setting               Value     Notes
NetBackup Version               7.0 GA
Number of NetBackup Policies    4         Policies aligned with ESX datastore
Limit Jobs Per Policy           8
Max Concurrent Jobs             32
NUMBER_DATA_BUFFERS_DISK        512
SIZE_DATA_BUFFERS_DISK
Table 1. NetBackup 7 for VMware settings

The NetBackup configuration settings used in this benchmark are listed in Table 1. Because there were four ESX server/datastore pairs, four NetBackup policies were configured, one per ESX server. This allowed us to limit the number of simultaneous backups running against each ESX server. With this method, the backup I/O load on each ESX datastore was similar, while backup performance and reliability were optimized. Multiple test runs were made to fine-tune the NetBackup buffer settings; the resulting settings maximized disk storage unit performance.
NetBackup storage unit
In NetBackup terminology, a storage unit designates a backup destination. A NetBackup storage unit can be configured as disk, tape, virtual tape library (VTL), or a deduplication target. This benchmark used two types of storage units: a basic disk storage unit and a deduplication (MSDP) storage unit. The LUN used for the storage unit was configured as a simple disk partition built on top of two RAID 5 LUNs mirrored within Windows. Two 4 Gb Fibre Channel connections to this disk were configured with multipathing software, providing the same theoretical throughput as a single 8 Gb Fibre Channel connection.
This mirrored RAID 5 LUN was used for both the standard disk storage unit and the deduplication storage unit.
3.2 VMware backup host
The VMware backup host is easily the single most important component in the backup environment. Generally speaking, backup systems are I/O machines: they are responsible for large amounts of I/O. For this reason, we recommend placing special focus on the I/O capabilities of the VMware backup host. This benchmark also features backup deduplication, which relies heavily on both memory (RAM) and CPU resources, so we extend our VMware backup host selection criteria to include RAM and CPU. These attributes are addressed below.
I/O capacity. Two key areas influence I/O capacity. The first is the internal bus structure of the VMware backup host. Many computing platforms still use a Front Side Bus architecture. This architecture has been around for some time but has been superseded by newer interconnects such as Intel's QuickPath Interconnect (QPI) and AMD's HyperTransport. The Cisco UCS architecture supports QPI. Regardless of which technology you use or have available, keep in mind that many of the computing platforms at your disposal may not be designed or optimized for I/O.
The second is the ability to connect to the backup environment through host bus adapters, so I/O slot capacity is key. We recommend configuring each I/O source and destination on a separate host bus adapter (HBA) where possible. This includes network backup traffic, which is essentially I/O as well. For example, the backup source (in this case the ESX datastore) and the backup destination (for example, disk or tape) should not share a connection. Isolating backup traffic on separate HBAs can improve performance. Also consider that I/O slots that are empty and not needed today can be used in the future as your backup needs expand.
CPU. Fast, multi-core CPUs are commonplace today, and they are so powerful that for traditional backups the backup system CPU is often somewhat underutilized. Deduplication technologies have changed this significantly. Deduplication relies heavily on CPU power to compare segments (or blocks) of data to determine whether they have been previously backed up or are unique. More and faster CPUs can improve overall deduplication performance, which in turn improves backup performance. Once again, expandability should be an important consideration: the ability to add CPU capacity on demand can future-proof your backup system and delay the need to upgrade it.
Memory (RAM). Deduplication technologies are particularly well suited to take advantage of large amounts of RAM. Before backup data is committed to disk, it is compared with data that has previously been backed up. This comparison is performed in RAM rather than against backup data on disk, which significantly speeds up deduplication and improves its efficiency, but it tends to require a lot of RAM. If deduplication is to be used, we recommend a system that can support at least 32 GB of RAM.
This system should also have the capacity to expand well beyond this. Once again, this expansion capability can future-proof the backup system so that it can scale to much larger deduplication environments.
VMware backup host configuration
The Cisco UCS platform was selected as the VMware backup host. These systems are well suited to this benchmark, which focuses on virtual machine backup performance to both traditional and deduplicated backup destinations. Cisco UCS computers excel in all important performance metrics. The UCS platform features Intel's QuickPath Interconnect technology for fast data transfers between critical internal system components, which provides extremely fast I/O. UCS systems can also expand to as many as 32 cores and up to 384 GB of RAM. The configuration used for the VMware backup host is listed in Table 2.

Component                        Description                               Notes
NetBackup VMware Backup Host     Cisco UCS B200-M1
Operating System                 Windows Server 2008, SP 2                 64-bit
Processors                       Xeon x GHz CPU                            2 Sockets, 4 Cores/Socket
RAM                              48 GB
Unified Network/FCoE CNA         Cisco UCS M81KR Virtual Interface Card    Two dual-port cards
Table 2. NetBackup for VMware backup host configuration

3.3 VMware configuration
A complete VMware vSphere 4.0 cloud computing environment was used for this benchmark, including a vCenter Server and four separate ESX 4.0 systems. The specific VMware components are as follows:
vCenter Server. The vCenter Server is a standard vCenter system running on Windows 2008 (64-bit).
ESX 4. All four ESX systems were configured identically. Each ran ESX 4.0 and housed 23 virtual machines. The ESX datastores were configured on top of RAID 5 LUNs.
Virtual machines. A total of 92 virtual machines were used as backup targets. The virtual machines ran Windows, held an average of 43 GB of data each, and contained approximately 100,000 files each (see Table 3).

Component                   Description                                                              Notes
ESX Server                  Cisco UCS B200-M1
Operating System            ESX 4.0.0, Build                                                         Patch ESX
Processors                  Xeon x GHz CPU                                                           2 Sockets, 4 Cores/Socket
RAM                         24 GB
Unified Network/FCoE CNA    Cisco CNA M71KR-C (AKA Palo or Cisco Virtualization Interface Adapter)   One per ESX server
Virtual Machine OS          Windows 2008                                                             Avg Size = 43 GB
Number of VMs per ESX       23                                                                       Total number of VMs = 92
Number of files per VM      ~100,000
Table 3. ESX server configuration

4.0 Performance baseline testing
No matter how fast their backups are, most backup administrators always want a little more speed. Baseline performance testing is arguably the most important step in optimizing the backup performance of your environment. Baseline testing determines the performance characteristics, or performance baseline, of the hardware environment before any backup software is installed. This lets us find and fix performance issues or bottlenecks before actual backup testing begins. We can also record this information and use it later to determine whether any performance degradation has occurred in the environment. The key is to make sure that, wherever the performance bottleneck is, we understand why it performs as measured and ensure that it is performing optimally. A common mistake is to configure a backup environment, test backups, and complain that backups are slow without understanding what performance is actually obtainable.
During the course of this benchmark, much time was spent testing and tuning the environment. After the Fibre Channel, network, and storage environment was initially configured, few of the components performed as expected.
Before any backup software was configured or any virtual machines were installed, a complete and thorough testing regimen was performed to ensure that the maximum backup performance could be achieved.
4.1 Baseline testing overview
This section details the process we used to tune our virtual machine backup environment. The baseline performance tests were designed to involve as few components as possible. Because each test limited the number of devices that could be causing performance issues, we could quickly isolate and fix any issue we encountered without having to guess where the problem lay among dozens of devices. We broke the baseline testing into three distinct areas:
Read performance from each ESX datastore
Disk storage unit read and write performance
Base NetBackup buffer tuning
These three tests were defined to (1) simulate the I/O path that data takes during the backup process, and (2) limit the number of components involved in each test.
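Buffer tuning is easier to reason about once the Table 1 settings are translated into concurrency and memory. The arithmetic below is a rough sketch under stated assumptions: one policy per ESX server with 23 virtual machines each, and every concurrent stream allocating its own NUMBER_DATA_BUFFERS_DISK buffers of SIZE_DATA_BUFFERS_DISK bytes. Table 1 does not give the SIZE_DATA_BUFFERS_DISK value used in the benchmark, so the 256 KB figure below is purely hypothetical, as are the function names.

```python
from __future__ import annotations

import math


def concurrency_plan(vms_per_policy: int, policies: int, jobs_per_policy: int,
                     max_jobs: int) -> tuple[int, int]:
    """Return (concurrent backup streams, rounds of jobs each policy needs).

    Rounds assume backups of similar duration; in practice a new job starts
    as soon as any slot frees up.
    """
    streams = min(policies * jobs_per_policy, max_jobs)
    rounds = math.ceil(vms_per_policy / jobs_per_policy)
    return streams, rounds


def buffer_memory_bytes(streams: int, number_data_buffers: int, size_data_buffers: int) -> int:
    """Rough data-buffer memory estimate, assuming each stream allocates its own buffers."""
    return streams * number_data_buffers * size_data_buffers


HYPOTHETICAL_SIZE_DATA_BUFFERS_DISK = 262_144  # 256 KB; assumed, not the benchmark's value

streams, rounds = concurrency_plan(vms_per_policy=23, policies=4, jobs_per_policy=8, max_jobs=32)
memory = buffer_memory_bytes(streams, 512, HYPOTHETICAL_SIZE_DATA_BUFFERS_DISK)
print(streams, rounds, memory / 2**30)  # 32 3 4.0
```

With four policies of eight jobs each, the per-policy limits add up exactly to the 32-job global limit, so neither setting throttles the other. Under the assumed buffer size, 32 streams would need about 4 GiB for data buffers; substitute your own SIZE_DATA_BUFFERS_DISK value when sizing a real host.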
ESX datastore read performance
During the NetBackup 7 for VMware backup process, virtual machine data is read from the ESX datastore and sent to the destination storage unit. For this part of the process, we are concerned only with how fast we can read data from the datastore. Because the datastore is typically designed and configured separately from the backup environment, the backup administrator usually has little say in how it is configured. Yet it is still important to understand the performance capabilities of each datastore so that backup performance expectations can be set properly. The HD_Speed utility (see Appendix A) can help here; its I/O read tests are non-destructive. During this baseline testing, HD_Speed was used to determine the read I/O capacity of each ESX datastore separately. This test simulates reading the virtual machine files from the datastore, as happens during the NetBackup 7 for VMware backup process, and is performed from the VMware backup host. The test stresses not only the ESX datastore but also the FC environment, the storage adapters, and the drivers, any one of which could be responsible for performance problems. While this test doesn't exactly match the I/O pattern of a VMware vStorage APIs for Data Protection backup, it does provide a reasonably accurate idea of the read performance of each datastore. Figure 3 shows the performance measurements obtained from each datastore.
Figure 3. ESX datastore read performance
NetBackup disk storage unit performance
Once virtual machine data is read from the datastore, it is written to its ultimate destination. Although NetBackup 7 for VMware supports a broad array of backup destinations (for example, disk, tape, and VTL), for this testing we used basic disk.
From the VMware backup host, we ran the HD_Speed utility to determine the storage unit's read performance. These read tests are non-destructive.
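HD_Speed is the tool used in this benchmark. For readers who want a rough, scriptable equivalent of a non-destructive sequential read test, the following Python sketch times a read-only pass over a file. It is not HD_Speed and does not reproduce its I/O pattern; the function names, the 1 MB chunk size, and the decimal MB convention are assumptions. It also encodes the reasoning behind baseline testing: because the stages of the backup path run in series, the slowest measured stage sets the ceiling for backup throughput.

```python
from __future__ import annotations

import time


def read_throughput_mb_s(path: str, chunk_size: int = 1024 * 1024,
                         limit_bytes: int | None = None) -> float:
    """Sequentially read a file and return throughput in MB/sec (1 MB = 10**6 bytes).

    The file is opened read-only, so the test is non-destructive. Reading stops at
    end of file or once at least limit_bytes have been read. Results can be inflated
    by the operating system's file cache if the data was read recently.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while limit_bytes is None or total < limit_bytes:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


def path_ceiling(measured_mb_s: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage of a serial backup path; it bounds end-to-end throughput."""
    slowest = min(measured_mb_s, key=measured_mb_s.__getitem__)
    return slowest, measured_mb_s[slowest]


# Illustrative numbers only; they are not the Figure 3 or Figure 4 measurements.
print(path_ceiling({"datastore read": 700.0, "storage unit write": 500.0}))
# ('storage unit write', 500.0)
```

Only a read test is sketched here on purpose: a write test against a storage unit overwrites data, which is why such tests must be run before any valid backups are stored on it.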
Figure 4. NetBackup disk storage unit performance
Write performance tests using the HD_Speed utility are destructive, so care must be taken when performing them. These tests were run before any valid data was placed on the storage unit. Because I/O write operations tend to be more expensive, and therefore slower, than read operations, we expected the write results to be lower than the read results. Note also that this disk storage unit serves a dual purpose: in addition to being used as a basic disk storage unit, it is also used as a deduplication target, though never at the same time. This is another reason why these performance numbers are important (see Figure 4).
NetBackup tuning
Now that we have tested and fixed any hardware performance issues, we can focus on software. NetBackup can fine-tune its internal buffers to improve backup performance; these buffers are designed to improve the speed at which data is written to storage units. Because our tests exclusively involve disk, we tuned the NetBackup disk buffers. This is an iterative process that involves creating a simple NetBackup policy that writes backed-up data to our disk storage unit. Only two buffer settings are involved, so the process does not take long. The results of this process are listed in Table 1. One caveat: while these buffer settings allowed us to obtain maximum performance from our environment, do not assume that they will be optimal for yours. We recommend that you perform your own testing to find the settings that provide the best possible NetBackup performance in your environment.
5.0 Benchmark results
VMware's vStorage APIs for Data Protection have significantly changed the virtual machine backup process. Instead of first being staged to disk, virtual machine backup data can now be sent directly to its final backup destination.
By itself, this change in backup processing has the potential to improve virtual machine backup performance significantly. NetBackup 7 for VMware uses a combination of technologies from VMware and Cisco to deliver considerable backup performance improvements. These technologies include:

- NetBackup 7 for VMware Granular File Restore
- NetBackup 7 Media Server Deduplication
- VMware vStorage APIs for Data Protection
- VMware vSphere's Changed Block Tracking feature
- Cisco UCS performance and scalability enhancements

In the following sections, we discuss the basic benchmark results as well as variations related to how these technologies might be implemented.

Performance results: basic disk storage unit

Using the configuration described in Section 3, we achieved the following sustained backup performance rate:

Basic disk storage unit backup performance = 450 MB/sec

This is a significant improvement over VMware Consolidated Backup. We previously benchmarked VMware Consolidated Backup using NetBackup and obtained a backup rate of 63 MB/sec. With NetBackup 7, we achieved more than a sevenfold increase in throughput (450 / 63 ≈ 7.1). Let's compare this in practical terms. If we assume that the average size of your virtual machines is 40 GB, VMware Consolidated Backup could protect 56 virtual machines with full backups in a 10-hour window. At the rate demonstrated in this benchmark, NetBackup 7 can protect more than 400 virtual machines in that same backup window.

Performance results: Media Server Deduplication Pool storage unit

During this testing we discovered that the integrated NetBackup 7 deduplication technology provides a significant performance enhancement that essentially changes the dynamics of virtual machine backups. The performance we achieved when writing to a NetBackup Media Server Deduplication Pool (MSDP) was even better than the performance we achieved by writing to basic disk. The backup rate we achieved was as follows:

Media Server Deduplication Pool storage unit backup performance = 600 MB/sec

To clarify, these MSDP backups were performed using exactly the same hardware configuration as the basic disk storage backups. The FC environment and the underlying disk were unchanged. The only difference between the two tests is that a deduplication pool was used as the backup target.
The results are summarized in Figure 5.

Figure 5. Performance Benchmark Results
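The virtual machine counts quoted for the basic disk result follow from simple arithmetic. A worked version, assuming decimal units (1 GB = 1,000 MB), a 40 GB average virtual machine, and a 10-hour (36,000-second) window with no per-job overhead. The MSDP line only applies the same arithmetic to the 600 MB/sec rate; it is not a separately measured result.

```latex
\begin{aligned}
\text{speedup} &= \frac{450\ \text{MB/s}}{63\ \text{MB/s}} \approx 7.1 \\
N_{\text{VCB}} &= \left\lfloor \frac{63\ \text{MB/s} \times 36{,}000\ \text{s}}{40{,}000\ \text{MB}} \right\rfloor = \lfloor 56.7 \rfloor = 56 \\
N_{\text{basic disk}} &= \left\lfloor \frac{450\ \text{MB/s} \times 36{,}000\ \text{s}}{40{,}000\ \text{MB}} \right\rfloor = 405 \\
N_{\text{MSDP}} &= \left\lfloor \frac{600\ \text{MB/s} \times 36{,}000\ \text{s}}{40{,}000\ \text{MB}} \right\rfloor = 540
\end{aligned}
```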
5.1 Additional performance topics

Basic performance numbers are important when determining overall throughput; however, it is also important to understand the implications of these technologies as they are applied to backups. This section provides detailed information on a number of performance topics.

Incremental backups with VMware vSphere's Changed Block Tracking feature

This is probably the most significant backup performance enhancement that VMware introduced with the vSphere 4 release. With VMware Consolidated Backup, incremental backups were not supported when protecting at the vmdk level. Backup administrators were required to constantly perform full backups, which lengthened backup windows, required more back-end (backup) storage, and placed additional backup load on the virtual machines.

vSphere 4 introduced Changed Block Tracking (CBT). This mechanism keeps track of the blocks that have changed since the previous backup. Tracking is performed at the virtual machine level, not at the file system (VMFS) level. The advantage is that true, integrated incremental backups are now a reality, with very little impact on the virtual machine (VMware claims a 1 to 2 percent impact). NetBackup for VMware implements CBT through its Block Level Incremental Backup (BLIB) option.

We also tested the performance of this new incremental backup technology. To do this, we changed 5 percent of the data within each virtual machine and then performed an incremental backup. We were able to fully protect 92 virtual machines in only 12 minutes:

Incremental backup of 92 virtual machines = 12 minutes

But what restore options does this provide? CBT by itself only supports all-or-nothing restores: a CBT incremental can only be used to restore the entire virtual machine to a specific point in time.
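To make the all-or-nothing restore model concrete, here is a toy Python sketch of CBT-style incrementals. It is illustrative only and uses no VMware or NetBackup API; the names `VirtualDisk` and `restore_image` are hypothetical. Each incremental records only the block numbers that changed, with no notion of which file they belong to, so a point-in-time image can only be produced by replaying the full backup plus every incremental up to that point.

```python
"""Toy model of changed-block-tracking (CBT) style incremental backups.

Illustrative only: no VMware or NetBackup API is used, and all names are hypothetical.
"""
from dataclasses import dataclass, field


@dataclass
class VirtualDisk:
    blocks: list[bytes]
    changed: set[int] = field(default_factory=set)  # block indexes changed since the last backup

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.changed.add(index)  # CBT-style tracking: remember which block changed

    def full_backup(self) -> dict[int, bytes]:
        self.changed.clear()  # a full backup resets the change tracking
        return dict(enumerate(self.blocks))

    def incremental_backup(self) -> dict[int, bytes]:
        delta = {i: self.blocks[i] for i in sorted(self.changed)}  # only changed blocks
        self.changed.clear()
        return delta


def restore_image(full: dict[int, bytes], incrementals: list[dict[int, bytes]]) -> list[bytes]:
    """Rebuild the whole disk image: start from the full, then replay each incremental in order."""
    image = dict(full)
    for delta in incrementals:
        image.update(delta)
    return [image[i] for i in range(len(image))]


if __name__ == "__main__":
    disk = VirtualDisk(blocks=[b"A", b"B", b"C", b"D"])
    full = disk.full_backup()
    disk.write(1, b"B1")
    inc1 = disk.incremental_backup()
    disk.write(3, b"D1")
    inc2 = disk.incremental_backup()
    print(restore_image(full, [inc1, inc2]))  # [b'A', b'B1', b'C', b'D1']
```

The incremental dictionaries carry block indexes only; mapping those blocks back to individual files is exactly the step that block-level CBT does not provide on its own.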
No single-file restores (for example, of a Word document) are possible from a CBT incremental unless the entire virtual machine is recombined from all incrementals and restaged to disk. This can be extremely time-consuming.

NetBackup 7 provides a true single-file restore capability from any incremental or full backup of Windows virtual machines. NetBackup uses its patent-pending Granular File Restore technology to identify which blocks are associated with a changed file and includes those blocks in the incremental backup. This means that single files can be found (indexed) and restored immediately, regardless of what type of backup was performed. The entire virtual machine never has to be restaged to disk, and the restore process is exactly the same whether the backup was full or incremental. This technology is unique to NetBackup; no other vendor is able to provide this capability.

Media Server Deduplication Pool performance dynamics

The NetBackup Media Server Deduplication Pool feature provides an embedded deduplication technology that is completely integrated into the NetBackup 7 release. Once you've installed NetBackup 7, you've already installed all of the software required for Media Server Deduplication, which is enabled by running a simple configuration wizard.

The positive impact of this technology on virtual machine backups is significant. Media Server Deduplication of virtual machine data is extremely efficient: during this testing we obtained deduplication rates as high as 98 percent. In addition, deduplication improved overall backup performance significantly. But why does this performance improvement occur? During the backup process, virtual machine data is read from the ESX datastore and then written, or committed, to the NetBackup storage unit disk.
As our baseline performance testing showed, the disk storage unit was slower than the Fibre environment. We demonstrated that we could stream data through our Fibre environment at 600 MB/sec but could only write to the disk target at 450 MB/sec. So, with backups writing to standard disk, our performance was limited by the write performance of this disk.
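The bottleneck argument above, and why deduplication relaxes it, can be expressed as a rough model. This is a hypothetical Python illustration, not a description of how MSDP works internally: it assumes fixed-size 4 KiB chunks, SHA-256 fingerprints held in an in-memory set, and a pipeline whose throughput is capped by its slowest stage. The 600 MB/sec and 450 MB/sec figures come from this benchmark; the 2 percent written fraction corresponds to the 98 percent deduplication rate reported for this testing.

```python
"""Rough model of how deduplication shifts the bottleneck from disk to the link.

Assumptions (illustrative, not NetBackup internals): fixed-size chunks,
SHA-256 fingerprints kept in memory, throughput capped by the slowest stage.
"""
import hashlib


def dedupe(stream: bytes, index: set[bytes], chunk_size: int = 4096) -> int:
    """Return how many bytes of `stream` would actually be written to disk."""
    written = 0
    for offset in range(0, len(stream), chunk_size):
        chunk = stream[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in index:  # unique chunk: must be committed to disk
            index.add(fingerprint)
            written += len(chunk)
    return written  # duplicate chunks were skipped entirely


def effective_rate(link_mb_s: float, disk_write_mb_s: float, written_fraction: float) -> float:
    """Source-data backup rate (MB/s), limited by the slower of the link and the disk.

    If only `written_fraction` of the source bytes reach disk, the disk can
    absorb source data at disk_write_mb_s / written_fraction.
    """
    if written_fraction <= 0:
        return float(link_mb_s)
    return float(min(link_mb_s, disk_write_mb_s / written_fraction))


if __name__ == "__main__":
    index: set[bytes] = set()
    sample = bytes(range(256)) * 64  # 16 KiB of hypothetical sample data
    dedupe(sample, index)  # seed the pool
    print(dedupe(sample + b"new" * 100, index), "new bytes written")
    print(effective_rate(600, 450, 1.0))   # basic disk: 450.0
    print(effective_rate(600, 450, 0.02))  # ~98% deduplication: 600.0
```

The model is deliberately crude (it ignores fingerprinting CPU cost and index lookups), but it shows why, once only a small fraction of the data reaches disk, the link rather than the disk sets the backup rate.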
When we implemented the MSDP-based storage unit, backup performance increased to 600 MB/sec, the limit of the Fibre environment. But why did it improve? In both cases we were using exactly the same disk. We can understand this better by looking at the backup data path. NetBackup uses the vStorage APIs for Data Protection to stream data off the ESX storage through the NetBackup VMware backup host to a storage unit. When a basic storage unit is used, all of the data must be written to the storage unit (in our case, disk). When an MSDP storage unit is used, however, the data is compared in memory (RAM) with the data already stored in the deduplication pool before it is committed to disk, and most of it turns out not to be unique. (This is why the CPU and RAM performance characteristics of the Cisco UCS system are important.) Therefore, with Media Server Deduplication, 100 percent of the data no longer has to be written to disk. With the high deduplication levels that MSDP provides, most of the data can be skipped, and far less disk I/O is involved. It is this process that moves the performance bottleneck from the disk storage unit to the connectivity technology, in this case, Fibre. Deduplication changes the basic dynamics of the backup process, which means that less disk, and less expensive (slower) disk, can be used without impacting overall backup performance.

Granular File Restore impact

The NetBackup Granular File Restore technology is very attractive to backup administrators because it provides a single-file restore capability for both full and incremental Windows virtual machine backups. But at what price? What is the performance impact when this feature is enabled?
The backup environment was as follows:

Number of virtual machines = 92
Average size of the virtual machines = 43 GB
Number of files in each virtual machine = 100,000
Total number of files (92 VMs) = 9.2 million

We tested this with two backup runs: the first with the Granular File Restore feature turned off, and then exactly the same backup with the feature turned on. The results were as follows:

Backup time impact of the Granular File Restore technology = 5 minutes

This means that indexing more than 9 million files (which enables instant single-file search and restore) added only a very small amount of time to the overall backup process. Why is this important? Other backup vendors either index nothing within the virtual machine or have implemented slow, disk-intensive workarounds for the indexing process. The NetBackup 7 for VMware indexing process is built into the backup process. It is enabled via a simple checkbox, and no post-backup processing is ever required.

Network-based transfers

Much of the performance testing we have done focuses on Fibre-based transfers. In environments where Fibre is not deployed, it is important to understand the performance characteristics when the backup data path runs over the network. We tested this using exactly the same configuration settings as the Fibre-based backups, but directing backup traffic over the network. The results were very similar to those of the Fibre-based backups:

Backup performance over Fibre (san transfer) to basic disk storage unit = 450 MB/sec
Backup performance over network (nbd transfer) to basic disk storage unit = 436 MB/sec

As you can see, the performance is nearly identical. The Cisco UCS system features a capability known as the unified network fabric, which allows multiple types of traffic over a single physical Ethernet network adapter. This adapter can carry both LAN and SAN traffic on the same cable.
Our environment was a 10 GbE environment, which enabled these fast transfers.
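As a rough check on the network figures, assuming the nominal 10 Gb/s line rate and ignoring protocol and encoding overhead (real usable bandwidth is somewhat lower):

```latex
\begin{aligned}
10\ \text{Gb/s} &= \frac{10 \times 10^{9}\ \text{bits/s}}{8\ \text{bits/byte}} = 1.25 \times 10^{9}\ \text{B/s} = 1{,}250\ \text{MB/s} \\
\frac{436\ \text{MB/s}}{1{,}250\ \text{MB/s}} &\approx 0.35 \\
\frac{436\ \text{MB/s}}{450\ \text{MB/s}} &\approx 0.97
\end{aligned}
```

In other words, the nbd transfer ran within about 3 percent of the san transfer while using roughly a third of the nominal link rate.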
6.0 Protecting multiterabyte virtual machine environments

Up to this point we've discussed performance, speeds, and feeds. Let's take a look at how these new technologies can be implemented to provide a virtual machine protection solution that ensures every virtual machine is backed up at least once every 24 hours, uses the least amount of disk resources possible, and can restore either a single file or an entire virtual machine.

For the purposes of this exercise we make the following assumptions:

- Backups are performed by a single VMware backup host.
- The backup target is configured using MSDP.
- Backups are scheduled on a two-week rotation period. In other words, the time between full backups is two weeks.
- Full backups are only performed during the weekend and have a backup window of 60 hours.
- Incremental backups are performed during the week and have a backup window of 10 hours.
- The average amount of data that changes between backups is 5 percent.
- The average virtual machine size is 40 GB.

Using these constraints and the performance numbers collected in this benchmark, we can calculate how much virtual machine data can be backed up during the weekend full backups:

Backup performance during full = 600 MB/sec = 2.16 TB/hour

Next we calculate the amount of raw virtual machine data that can be protected in the 60-hour backup window designated for full backups:

(2.16 TB/hour) x (60 hours) = 129.6 TB of virtual machine data

We then translate this amount of data into the number of virtual machines that can be protected:

(129.6 TB of VM data) / (40 GB avg VM size) = 3,240 protected virtual machines

Next we determine how many virtual machines can be protected with incremental backups. Here we use the benchmark result showing that we can incrementally protect 92 virtual machines in 12 minutes (assuming 5 percent data change), which translates into 460 virtual machines backed up per hour:
(460 VMs per hour) x (10 hours) = 4,600 VMs

In this example we can protect 3,240 virtual machines during full backups and 4,600 virtual machines during the incremental backup window. The smaller of these two numbers determines the ultimate capacity of the VMware backup host. This indicates that a single VMware backup host can be configured to protect more than 3,000 virtual machines!

Single VMware Backup Host Capacity = 3,240 virtual machines

Figure 6. Total number of VMs protected with full backups
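The Section 6.0 arithmetic can be captured in a small calculator so the assumptions can be varied for your own environment. This is a hypothetical Python sketch: `Assumptions` and `capacity` are illustrative names, the defaults are the figures stated above, and decimal units (1 TB = 1,000,000 MB) are assumed, consistent with the 2.16 TB/hour figure.

```python
"""Back-of-the-envelope capacity model for a single VMware backup host.

Defaults mirror the Section 6.0 assumptions; decimal units are assumed.
"""
from dataclasses import dataclass


@dataclass
class Assumptions:
    full_rate_mb_s: float = 600.0  # MSDP full-backup throughput from this benchmark
    full_window_h: float = 60.0    # weekend window for full backups
    avg_vm_gb: float = 40.0        # average virtual machine size
    incr_vms: int = 92             # VMs in the benchmark incremental run
    incr_minutes: float = 12.0     # duration of that run (5 percent data change)
    incr_window_h: float = 10.0    # weekday incremental window


def capacity(a: Assumptions) -> dict[str, float]:
    """Return the intermediate figures and the limiting VM count for one backup host."""
    full_mb = a.full_rate_mb_s * 3600 * a.full_window_h   # MB moved in the full window
    full_vms = full_mb / (a.avg_vm_gb * 1000)              # VMs that fit in that window
    incr_vms_per_hour = a.incr_vms * 60 / a.incr_minutes   # 92 VMs per 12 min -> 460 per hour
    incr_vms = incr_vms_per_hour * a.incr_window_h
    return {
        "tb_per_hour": a.full_rate_mb_s * 3600 / 1_000_000,
        "full_tb": full_mb / 1_000_000,
        "full_vms": full_vms,
        "incr_vms": incr_vms,
        "host_capacity": min(full_vms, incr_vms),  # the tighter window sets the limit
    }


if __name__ == "__main__":
    for name, value in capacity(Assumptions()).items():
        print(f"{name}: {value:,.2f}")
```

With the defaults it reproduces the 2.16 TB/hour, 129.6 TB, 3,240 and 4,600 figures; changing, for example, `avg_vm_gb` or `full_window_h` shows which window becomes the limiting one.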
7.0 NetBackup for VMware performance advantage

Up to this point we've focused on backup performance. In this section we consider the impact that backups have on the ESX server itself. By definition, ESX servers are almost always busy; it is not uncommon to see ESX servers hosting 40 or even 50 virtual machines. VMware's technology is extremely efficient at using existing physical resources in support of virtual machines, but at some point these resources can become scarce. Once an ESX server is fully loaded with virtual machines and extremely busy, we still need to protect and back up all of these virtual machines. But how can this be accomplished on a busy ESX server?

This is the advantage of VMware's vStorage APIs for Data Protection, an off-host backup technology designed to remove nearly all of the backup processing load from the ESX server. The backup load does not magically disappear; when the ESX datastore is configured in a shared (SAN, iSCSI) storage environment, the load is moved from the ESX server to the NetBackup VMware backup host.

To illustrate how much load is actually placed on the ESX server during client-based backups, we performed two backup load tests. The first test used standard clients inside each virtual machine. The second test used the vStorage APIs for Data Protection in a shared storage environment. During the backups we measured the load on the ESX server. The results are discussed in the following sections.

7.1 Backup performance impact comparison: ESX server

The first test was performed by backing up the virtual machines with a standard client placed inside each virtual machine. We measured the CPU load on the ESX server and the I/O (disk) load on the ESX datastore, as reported by the vCenter server. In both tests, a full backup of all 92 virtual machines was performed.

Test 1. Client Backup

Figure 7 and Figure 8 show the CPU and disk load on the ESX server during standard client-based backups. The entire backup load must be shouldered by the ESX server itself, and this backup processing load impacts every virtual machine hosted on the server. This style of backup also takes longer. This backup run was characterized as follows:

Number of VMs backed up: 23
Backup time: 57 minutes

Figure 7. ESX CPU Load: Client Backup
Figure 8. ESX Disk Load: Client Backup

Test 2. NetBackup 7 for VMware Backup

For the second test, we relied on NetBackup 7's native integration with the vStorage APIs for Data Protection. We backed up exactly the same virtual machines as in Test 1, but this test was an off-host backup using shared storage (SAN). As Figure 9 and Figure 10 show, nearly all of the backup processing was offloaded to the NetBackup VMware backup host, leaving the ESX server's resources for the virtual machine environment. The backups were faster as well: backing up these virtual machines with the vStorage APIs for Data Protection took only about one-third of the time (21 minutes versus 57 minutes).
Number of VMs backed up: 23
Backup time: 21 minutes

Figure 9. ESX CPU Load: NBU 7 for VMware
Figure 10. ESX Disk Load: NBU 7 for VMware

8.0 NetBackup 7 for VMware Configuration Tips

Up to this point, we have focused primarily on the hardware aspects of virtual machine backups. Proper configuration of NetBackup can also contribute to the fastest possible backups. As mentioned in Section 6, incremental backups can be an extremely effective method for minimizing the amount of data that is backed up on a daily basis. We also used several other NetBackup configuration practices during this benchmark:

Align backup policies with storage. This benchmark configuration had four ESX datastores. We configured four separate policies, each containing the virtual machines associated with one specific datastore. In this way, we could control the number of backups running against any datastore and, in turn, minimize the I/O impact on every datastore.

Limit simultaneous backups. During this testing, using NetBackup 7 for VMware, we were able to simultaneously back up 15 virtual machines on a single ESX server. In most environments, the maximum number of simultaneous backups that should be run against each ESX server will be lower. The actual number you should use will be determined in large part by the initial testing you performed in the performance baseline section (Section 4) of this paper. We recommend a limit because too many simultaneous backups can actually slow the overall backup speed. Find a number that works well in your environment, and then do not exceed it.

VMware backup host configuration. The VMware backup proxy can be configured as a NetBackup master server, media server, or enterprise client. We recommend that the VMware backup host be configured as a media server. The VMware backup host is a natural focal point of backup-related I/O, and its access to virtual machine files is typically made through a fast Fibre or iSCSI connection. In most configurations, it makes sense to avoid configuring the backup proxy as a NetBackup enterprise client, because this forces all of the virtual machine backup data through the NetBackup client network, which is typically a slow, shared resource.

Deduplication. As shown in this benchmark, MSDP can be an extremely effective tool for reducing backup storage requirements and increasing overall backup speeds. We highly recommend the use of MSDP to improve overall backup efficiency.
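The first two tips (one policy per datastore, and a cap on simultaneous backups) can be illustrated with a small simulation. This is a hypothetical Python sketch using `asyncio`; it does not call NetBackup or VMware, the names `group_by_datastore`, `run_policy` and `run_all` are illustrative, and the cap value stands in for whatever per-datastore or per-ESX-server limit your own baseline testing suggests.

```python
"""Sketch: one backup 'policy' per datastore, with a cap on concurrent jobs.

Hypothetical names throughout; jobs are simulated with asyncio.sleep.
"""
import asyncio
from collections import defaultdict


def group_by_datastore(vms: dict[str, str]) -> dict[str, list[str]]:
    """Map datastore -> VM names, i.e. one 'policy' per datastore."""
    policies: dict[str, list[str]] = defaultdict(list)
    for vm, datastore in vms.items():
        policies[datastore].append(vm)
    return dict(policies)


async def run_policy(vms: list[str], limit: int) -> int:
    """Back up every VM in one policy, never more than `limit` at once; return peak concurrency."""
    gate = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def backup(vm: str) -> None:
        nonlocal active, peak
        async with gate:  # waits here while `limit` jobs are already running
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the real data transfer of `vm`
            active -= 1

    await asyncio.gather(*(backup(vm) for vm in vms))
    return peak


async def run_all(vms: dict[str, str], limit: int) -> dict[str, int]:
    """Run all datastore policies in parallel and report peak concurrency per datastore."""
    policies = group_by_datastore(vms)
    peaks = await asyncio.gather(*(run_policy(v, limit) for v in policies.values()))
    return dict(zip(policies.keys(), peaks))


if __name__ == "__main__":
    # Hypothetical inventory: 92 VMs spread evenly over four datastores.
    inventory = {f"vm{i:02d}": f"datastore{i % 4}" for i in range(92)}
    print(asyncio.run(run_all(inventory, limit=4)))
```

Grouping by datastore keeps each semaphore scoped to one pool of disks, so raising or lowering `limit` directly controls the I/O pressure on each datastore rather than on the environment as a whole.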
Additional resources

VMware Hardware Compatibility Guide. A Web-based searchable guide that provides compatibility information for systems, SAN, I/O devices, and more.
VMware SAN Configuration Guide.
Cisco Unified Computing System (UCS). Provides updated information related to Cisco Unified Computing products.
Symantec NetBackup Backup Planning and Performance Tuning Guide. Provides significant detail related to the NetBackup media server (and, in turn, the backup proxy).
Overview of support for NetBackup 7.x in virtual environments. Details all aspects of the virtual machine support that NetBackup provides.
Symantec NetBackup 7.0 for VMware Guide. Administrator guide for NetBackup 7 VMware functionality.
Symantec NetBackup for VMware Guide Update. Provides updated information related to the release.
VMware vStorage APIs for Data Protection. The vStorage APIs for Data Protection enable backup software to perform centralized virtual machine backups without the disruption and overhead of running backup tasks from inside each virtual machine.
VMware Virtualization Performance Resources. Learn more about VMware technologies designed to improve performance.
Appendix A: HD_Speed Utility

Disk manufacturers provide I/O performance metrics for their hard drives; however, those transfer rates can be misleading. The figures manufacturers publish tend to be measured under ideal, controlled conditions, and in real operating environments hardware rarely achieves them. There are a number of reasons for this. The performance of any device can be significantly affected by the environment in which it is placed. Factors that can impact performance include connection type (IDE/SCSI/SATA), disk and host controller cache, OS type, file system type, the host's internal bus, and so on.

Instead of relying on manufacturers' published performance figures, our goal was to accurately determine the real-life performance we could expect from our I/O devices. To that end, we used a free utility called HD_Speed. HD_Speed is an I/O performance testing utility that is available here:

HD_Speed is small (< 100 KB) and only runs on Windows. It measures both the sustained and burst data transfer rates of disk drives. It also provides a real-time graphical display of the results.

Before performing any tests in this benchmark, we thoroughly stress-tested every I/O subsystem using HD_Speed. During this testing, performance issues and misconfigured hardware were found and corrected. Had we not first tested these devices, those issues would have surfaced later as performance problems, and it would have been difficult to determine their cause quickly and accurately.

Another utility that provides I/O performance information is Iometer (http://www.iometer.org). Iometer was originally created by Intel Corporation and is now made available via SourceForge.net. Iometer supports many operating systems in addition to Windows.

WARNING: Regardless of which utility is used, we recommend that I/O testing be performed on non-production or pre-production systems. Testing I/O writes with HD_Speed is destructive: any existing data on the disk will be destroyed. Before running any tests, make sure you understand their implications, and take the necessary steps to ensure that valuable data is not destroyed.
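For readers who want a quick, scriptable sanity check of sustained sequential throughput, here is a hypothetical Python sketch. It is not HD_Speed or Iometer, and the function name and default sizes are assumptions. Unlike HD_Speed's write test, it only writes to a temporary file it creates and then deletes, so it does not destroy existing data. Treat its numbers as rough: the `fsync` call forces written data out to the device, but the read pass will typically be served from the operating system's page cache unless the file is much larger than RAM.

```python
"""Minimal sequential write/read throughput check against a scratch file.

Hypothetical stand-in for a disk benchmark; it only touches a temporary file it creates.
"""
import os
import tempfile
import time


def measure_mb_s(directory: str, total_mb: int = 256, block_kb: int = 1024) -> tuple[float, float]:
    """Return (write MB/s, read MB/s) for one sequential pass over a temporary file."""
    block_bytes = block_kb * 1024
    block = os.urandom(block_bytes)  # random data, so compression cannot flatter the result
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data to the device, not just the OS cache
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_bytes):  # sequential read until end of file
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)  # leave nothing behind
    size_mb = blocks * block_kb / 1024
    return size_mb / max(write_s, 1e-9), size_mb / max(read_s, 1e-9)


if __name__ == "__main__":
    write_rate, read_rate = measure_mb_s(tempfile.gettempdir(), total_mb=64)
    print(f"write {write_rate:.0f} MB/s, read {read_rate:.0f} MB/s")
```

Point `directory` at a path on the storage unit under test, and use a `total_mb` well above the host's RAM if you want the read figure to reflect the disk rather than the cache.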