openbench Labs
Executive Briefing: January 19, 2012
Comparison of Host-Level vs. VM-Level Backup with AppAssure Smart Agents
Optimizing Hypervisor-Independent VM Data Protection
Executive Report: Jack Fegreus
January 25, 2012

ROADMAP TO THE VIRTUAL ENTERPRISE

For IT managers, data centers are becoming more difficult to manage and protect as more data and applications move into virtual environments. Adding fuel to the fire, CIOs must now deal with corporate mandates to build an IT infrastructure that scales to unknown demand levels and provides service assurance under fluctuating conditions that cannot be accurately projected. The solution requires a transition to private cloud computing characterized by a hypervisor-independent Virtual Infrastructure (VI). Unfortunately, this solution also exacerbates a classic data protection problem.

The problem stems from the reliance software vendors put on client infrastructure to optimize backup performance. This is the real issue behind the Agent vs. Agentless marketecture debate, not the hyperbole over licenses and management. In backup and recovery tests, openbench Labs found that backup server load consistently trumped host Virtual Machine (VM) load as the critical scalability factor. What's more, host-level agentless VM backups introduced significant variability in performance between VMs configured identically on vSphere and Hyper-V hypervisors.

AppAssure Smart Agent Key Findings
1) AppAssure Smart Agents were 3X faster than agentless running 3 simultaneous full backups of vSphere VMs without a SAN.
2) AppAssure Smart Agents were 34% faster than agentless running 3 simultaneous full backups of vSphere VMs with a SAN.
3) AppAssure Smart Agents were 40% faster than agentless backup solutions running 3 simultaneous full backups of Hyper-V VMs.
4) AppAssure Smart Agents were 9X faster running incremental backups on vSphere, 15X faster on Hyper-V, and provided zero-perceived recovery time on both hypervisors.
5) Legacy agents working on a file-level basis generated incremental backup files 100 times larger than block-level backup files.
6) AppAssure Smart Agents restore data instantly (Zero RTO) without the data remapping that requires a later consolidation step to complete the recovery.

We measured no difference in VM overhead between AppAssure VM-level Smart Agents and host-level agentless software, which needs to invoke processes on VMs to collect application item data and implement VSS. Furthermore, by optimizing data transfers between clients and backup servers, Smart Agents ran incremental backups 9X faster on vSphere and 15X faster on Hyper-V. More importantly, Smart Agents provided zero-perceived recovery time on both hypervisors without remapping data, a technique that adds I/O overhead until IT runs a consolidation process to finalize the recovery.

In a vSphere VI environment, leveraging the hypervisor equates to leveraging the vStorage APIs for Data Protection, which provide Changed Block Tracking (CBT) and direct SAN
access to ESXi datastores. AppAssure, however, takes a radically different approach. AppAssure uses a Smart Agent and filter driver between the client's file system and OS kernel to capture changes to disk blocks on any virtual or physical client.

What makes CBT so important is the ability to create incremental backup files that contain just the disk blocks that have changed since the previous backup, rather than complete copies of every file that contains a changed block. Not only does this speed the processing of incremental backups, it means incremental backup files are nearly free of duplicated data.

By limiting the data transferred in an incremental backup to just changed data blocks, CBT-based backup schemes generate incremental backup files that are a fraction of the size of backup files built from full data files containing changed blocks. In our tests, legacy agents working on a file-level basis generated incremental backups that were 100 times larger than the incremental backup files created with AppAssure.

More importantly, this means a CBT-based backup scheme limits the need for data deduplication to just the contents of each incremental backup file. As a result, there is no need to implement a complex, CPU-intensive data deduplication scheme that maintains a global store of unique blocks that must be checked every time a backup is processed.

Using a filter driver to collect CBT data, the AppAssure client service puts negligible CPU overhead on a VM running in production. On VMs performing heavy IOPS transaction processing, we measured AppAssure overhead to be just 1%. As a result, IT can readily deploy AppAssure across any hypervisor or physical server environment to garner the performance benefits of CBT-based backup and recovery processing.

APPASSURE SMART AGENT OVERHEAD
We observed AppAssure client service activity while we ran an Iometer benchmark that wrote 450MB of data per second using 128KB writes at 3,600 IOPS. During this test, the AppAssure client service was writing 40KB of data per second to a log in the System Volume directory. Meanwhile, CPU usage for the AppAssure client service was just 0.9%.
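The size difference between file-level and block-level incremental backups described above can be illustrated with a minimal Python sketch. It is not AppAssure's implementation: a real filter driver records block writes as they occur, whereas this sketch simulates the resulting change list by comparing two snapshots. The 4KB block size, the function names, and the sample file are all assumptions made for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, chosen only for illustration


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return {block_index: new_block_bytes} for each block that differs.

    A real filter driver would record block writes as they happen; comparing
    two snapshots here simply simulates the resulting change list.
    """
    delta = {}
    block_count = (max(len(old), len(new)) + block_size - 1) // block_size
    for index in range(block_count):
        start = index * block_size
        new_block = new[start:start + block_size]
        if old[start:start + block_size] != new_block:
            delta[index] = new_block
    return delta


def block_level_bytes(old_files: dict, new_files: dict) -> int:
    """Bytes in an incremental that carries only changed blocks."""
    return sum(
        len(block)
        for name, data in new_files.items()
        for block in changed_blocks(old_files.get(name, b""), data).values()
    )


def file_level_bytes(old_files: dict, new_files: dict) -> int:
    """Bytes in an incremental that carries every file containing a change."""
    return sum(
        len(data)
        for name, data in new_files.items()
        if old_files.get(name) != data
    )


def apply_delta(old: bytes, delta: dict, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file from the old copy plus the changed blocks.

    Assumes the file did not shrink between backups.
    """
    restored = bytearray(old)
    for index in sorted(delta):
        start = index * block_size
        restored[start:start + len(delta[index])] = delta[index]
    return bytes(restored)


# Hypothetical data: a 4 MiB database file in which 10 bytes change.
old_db = bytes(4 * 1024 * 1024)
new_db = bytearray(old_db)
new_db[8192:8202] = b"x" * 10
before = {"sales.mdf": old_db}
after = {"sales.mdf": bytes(new_db)}

print("file-level incremental:", file_level_bytes(before, after), "bytes")
print("block-level incremental:", block_level_bytes(before, after), "bytes")
```

The ratio in this contrived case depends entirely on the made-up file size and block size. It is not a reproduction of the 100X difference measured in the tests; it only shows why a small change inside a large file inflates a file-level incremental.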
Nonetheless, most vendors of data protection software designed for VI environments simply leverage the VMware Virtual Machine File System (VMFS) CBT scheme exposed by the vStorage APIs for Data Protection. By following this approach, data protection software can treat VMs as VMs without any need for a special agent to perform a standard VM backup. Working directly with an ESXi or ESX hypervisor, however, is not enough to unlock many important advanced backup and recovery features, such as the recovery of database tables without having to recover an entire VM.

AGENTLESS OVERHEAD PROCESSES
Using a dedicated agentless VI backup package, it is still necessary to configure user login credentials to run VM-specific tasks, such as the Windows VSS writer for transactional consistency and content indexing services to enhance file-level restores. More importantly, backup and restore performance potential was gated by hypervisor communications and performance overhead.

SMOKE, MIRRORS, AND AGENTLESS AGENTS

In the black-and-white marketecture debate of agentless versus agent-based data protection, individually licensed and managed agents are the issue. In the real world, there are many shades of gray, and agentless host-level backup is not the only alternative to heavyweight legacy agents, which simply masquerade VMs as physical systems. AppAssure Smart Agents focus on adding value to any client, virtual or physical, and neither require a license nor come with a management GUI.

An AppAssure Smart Agent provides a backup client with a dynamic CBT mechanism that dramatically enhances the functionality and the performance of backup and recovery processes. For CIOs who need to rigorously support a Service Level Agreement (SLA) addressing business continuity, AppAssure creates a new dynamic with respect to a Recovery Time Objective and a Recovery Point Objective (RTO and RPO) that works the same way in every environment.
To get a clear picture of the role that an AppAssure Smart Agent can play in a VM backup scenario, openbench Labs set up three VM servers on vSphere and Hyper-V. Each of the VM servers ran Windows Server 2008 R2, and the set included a Domain Controller and a SQL Server 2008 R2 database server. For comparison, we set up agentless backups leveraging the vStorage APIs for Data Protection to run over both a network and a SAN.
SUMMARY: VM DATA PROTECTION PERFORMANCE BY AGENT TYPE

Key VM Metric: Host Server Overhead
Backup overhead on the host governs backup window options and directly impacts backup RPO.
  AppAssure v4.7 Smart Agent: AppAssure's CBT measured less than 1% CPU utilization during an Iometer stress test. Total processing load on VMs was identical with agentless testing.
  Agentless: For advanced object-level data protection within VMs, agentless host-level backup software still needs to launch nonpersistent processes on VMs.
  Legacy Agents: Masqueraded VMs as physical systems with minimal VM optimization.

Key VM Metric: Backup Functionality
Backup processing scalability directly impacts resource utilization and capital expenses.
  AppAssure v4.7 Smart Agent: Multiple simultaneous backups scaled by increasing CPU resource utilization while wall clock time remained fixed. AppAssure automatically discovers protected applications and adds daily validation tests for application recovery. GUI displays the status of SQL database and Exchange mailbox tests. Any backup can be used to recover data objects.
  Agentless: Multiple simultaneous backups scaled by increasing wall clock time by a factor of 3X, as CPU utilization remained fixed. Automatically discovered protected applications. Any backup can be used to recover data objects. Minimum time to set up and complete an incremental backup averaged 2 minutes.
  Legacy Agents: Global block-level deduplication dramatically increased backup and recovery process time. File-level incremental backups based on changed files consumed up to 100X the disk space of block-level incremental backups.

Key VM Metric: RTO and RPO Capabilities for Business Continuity
Update standby VMs' OS and data with production server disk snapshots from backups. Restore applications and data with no perceived delay.
  AppAssure v4.7 Smart Agent: Smart Agents restored disk volumes with no perceived delay in data delivery. Smart Agent-based incremental backups started and completed in under 15 seconds. Automatically configured and updated standby VMs, which were identical to production servers.
  Agentless: Agentless backup enables direct recovery from backup files, which are read-only, by redirecting writes via VM snapshots and redo logs. Full recovery requires consolidation of the original and redirected data files using vMotion or a proprietary VM copy utility.
  Legacy Agents: Legacy agents supported all of the bare-metal recovery features on VMs that were supported by physical clients.
SMART AGENT BACKUP SCALABILITY
AppAssure backups with Smart Agents on our quad-core backup server were more efficient than similar backup processes that relied on the vStorage APIs without agents. A backup of a VM running SQL Server took slightly less time and dramatically less server CPU resources with Smart Agents. As a result, Smart Agents scaled three simultaneous backups in the same backup window, while an agentless backup of the same three VMs took three times as long as numerous subprocesses came into play.

The key question for sites considering AppAssure comes down to the value of an independent, driver-based CBT mechanism. Performing three simultaneous full backups in an environment without a high-speed SAN, AppAssure was 3X faster than an agentless backup that exploited the vStorage APIs. More importantly, adding an 8-Gbps SAN to give backup servers direct access to vSphere datastores via the vStorage APIs did not trump the advantages provided by AppAssure's Smart Agents. Using AppAssure, three VMs could be backed up 34% faster than using agentless
backup with an 8-Gbps SAN in place. This advantage carried over to a Hyper-V environment, where AppAssure was 40% faster. As a result, both SMB sites that cannot afford a costly SAN and large enterprise sites with extensive SAN infrastructure can leverage AppAssure to garner considerably smaller backup windows.

The key backup advantage provided by AppAssure's Smart Agents, however, was not in the processing of full backups with lots of data. The key to an effective disk-to-disk (D2D) incremental backup regime with automatic synthetic full backups is the ability to generate highly efficient incremental backups. AppAssure Smart Agents processed incremental backups of both vSphere and Hyper-V VMs running SQL Server in just 14 seconds and generated backup files that contained 17MB of compressed and deduplicated data. What's more, the AppAssure Smart Agents scaled three simultaneous incremental backups in just 19 seconds. In sharp contrast, an agentless incremental backup of our SQL Server VM took 9X longer (2 minutes and 5 seconds) on vSphere and 15X longer (3 minutes and 25 seconds) on Hyper-V.

The bottom line for CIOs is that Smart Agents provide unparalleled RPO and RTO for business continuity. The capability to turn around incremental backups in under one minute gives IT the flexibility to create more recovery points using Smart Agents. What's more, that same technology can then be used to complement that RPO advantage with an equally impressive RTO advantage via AppAssure's Live Recovery.

It is also important to note that it was the backup server load, and not the VM client load on our host servers, that was the controlling factor in our scalability testing. On all of the tests, the processing load of the three combined test VMs on our ESXi host remained below 10% with both agentless and Smart Agent processing.

5-MINUTE RPO AND RTO

From the perspective of a Line of Business (LoB) executive, the value of backup and restore lies entirely in the recovery process.
Their attention is focused on an aggressive Recovery Point Objective (RPO), which limits data loss, and an aggressive Recovery Time Objective (RTO), which limits the time needed to complete a recovery process.

AppAssure doubles down on the basic universality provided by its block-based architecture with three key technologies: Live Recovery to meet near-zero RTO and five-minute RPO; Recovery Assure to automatically verify both that a backup is recoverable and that application data within the backup is recoverable; and Universal Recovery to support the granular recovery of files and application items in virtual-to-virtual (V2V), virtual-to-physical (V2P), physical-to-virtual (P2V), or physical-to-physical (P2P) scenarios. No data protection software tested by openbench Labs offers the same range of recovery options from a single incremental backup file.

For improving IT's image with end users, the most magical and impressive Smart Agent technology is without doubt AppAssure Live Recovery. Live Recovery is a unique
volume restoration feature. During a recovery, a Smart Agent monitors block data requests from users and instructs the backup server to reorder the data being transferred to meet user requests. As a result, users perceive all data as having been restored immediately, as they have immediate access to any data on any volume.

With Live Recovery, the restoration legerdemain for a logical volume starts with the transfer of the volume's Master File Table (MFT). With the MFT in place, users immediately see all of the files that were on the drive at the time of the restore point. As a result, end users have the impression that the drive has been magically restored to full operational status in seconds.

LIVE RECOVERY OPTIMIZED DATA BLOCK RESTORATION
When we started a restore of a data volume, Live Recovery immediately restored the file structure data and enabled access to the volume. When we accessed any file during the restoration process, the AppAssure Core server immediately reordered the block data flow and sent the blocks that we needed. As a result, we could instantaneously access any file or even a corrupt Exchange mailbox database.

Live Recovery is distinctly different from booting a VM from a backup file. When a VM is booted from a backup file, its contents are represented by read-only pointers with no active mechanism to intelligently handle redirection of writes. As a result, a static method, such as a VM snapshot or redo logs, must be chosen. Unless specialized hardware is provided, for example a solid-state disk, only half the normal level of IOPS will be sustainable. In addition, it will also be necessary to recover the snapshot or redo log data by running vMotion or a similar proprietary function.

The AppAssure instant recovery scheme is not without serious pitfalls. The most obvious question is: What happens when an end user clicks on a file that is not there?
In a database-driven environment, there is the distinct risk of corrupting one or more critical internal tables or causing considerable delays and problems for mission-critical, transaction-processing applications that utilize a database.
To avoid these issues, AppAssure Live Recovery relies on critical coordination between the AppAssure Smart Agent on the client system and a Smart Service on the AppAssure Core server to intelligently coordinate and reorder I/O during a Live Recovery process. When an application attempts to access data queued on the AppAssure backup server, the requested data addresses are transferred to the backup server, which reorders the block stream queued for the client so that the requested blocks are at the top of the queue. As a result, a large volume file server or an Exchange mailbox database can be restored instantaneously from the perspective of an end user.

APPASSURE LIVE RECOVERY WITH EXCHANGE
From an Exchange 2010 server, we deleted one of two mailbox databases, which contained 100 email accounts. After starting Live Recovery, we launched Jetstress, which generated a message for each account every second. Never did a sampled transaction exceed the access time requirements of Jetstress. As the heavy transaction load stressed the database, AppAssure increased the throughput rate for data recovery measured in MBs per second. During this period, the IOPS rate continuously escalated until it reached about 175 messages per second for each database. More importantly, we reached the steady state point for Jetstress with only 60 percent of the Live Recovery process completed.

To thoroughly test all of the nuances of the AppAssure recovery options, we set up an Exchange 2010 server. We placed two databases with 100 user accounts each on two separate disks. A third logical disk was used for the Exchange log files. Running Jetstress on the Exchange configuration, we easily complied with a Jetstress benchmark designed to simultaneously stress 200 accounts with one email message per second. Our configuration easily passed muster with enough headroom to support 50 to 100 additional simultaneously active accounts.
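The queue reordering described above for Live Recovery, in which blocks requested by an application are moved to the top of the restore queue, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AppAssure's protocol: the RestoreQueue class, its method names, and the integer block IDs are hypothetical, and the sketch ignores networking, the MFT-first transfer, and write handling.

```python
from collections import OrderedDict


class RestoreQueue:
    """Hypothetical model of the blocks a backup server still has to send."""

    def __init__(self, block_ids):
        # Insertion order is the default, sequential transfer order.
        self._pending = OrderedDict((block_id, None) for block_id in block_ids)

    def request(self, block_ids):
        """Move blocks an application just asked for to the front of the queue.

        Blocks that were already sent are ignored. Iterating in reverse keeps
        the requested blocks in their original relative order at the front.
        """
        for block_id in reversed(list(block_ids)):
            if block_id in self._pending:
                self._pending.move_to_end(block_id, last=False)

    def next_block(self):
        """Return the next block to transfer, or None when the restore is done."""
        if not self._pending:
            return None
        block_id, _ = self._pending.popitem(last=False)
        return block_id

    def drain(self):
        """Return all remaining blocks in the order they would be sent."""
        order = []
        while self._pending:
            order.append(self.next_block())
        return order


queue = RestoreQueue(range(8))
first = queue.next_block()      # sequential restore sends block 0 first
queue.request([5, 6, 0])        # a user opens a file stored in blocks 5 and 6
print(first, queue.drain())     # prints: 0 [5, 6, 1, 2, 3, 4, 7]
```

The design point is that the restore never stops being a full sequential copy; user demand only changes the order, so the volume still ends up completely restored while the blocks in active use arrive first.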
To test Live Recovery, we deleted one of the email databases and selected a restore point corresponding to an incremental backup on our backup server. AppAssure automatically generated a synthetic backup for our restore point and began copying the database to the disk on the Exchange server.

Once the restore process began, we configured and started a Jetstress benchmark that required the presence of the missing database. From the start of the test, the AppAssure client coordinated the order in which mailbox data was transferred from the AppAssure backup server. Never in the process did a transaction sampled by Jetstress exceed the stringent access time requirements of the Exchange performance benchmark. What's more, as the heavy transaction load stressed the database that we were
recovering, AppAssure not only transferred the required data, but also increased the throughput rate for data recovery measured in MBs transferred per second. During this period of the test, the IOPS rate continuously escalated until it reached the steady state of about 175 messages per second for each of the two databases. More importantly, we reached the steady state point for Jetstress with only 60 percent of the Live Recovery process completed.

By far the most unexpected result of the Live Recovery email database restore test turned out to be the Jetstress performance report: Not only did we not crash or stall the test, we actually passed certification for supporting 200 simultaneously active mailboxes. Even more incredible, the average IOPS rate and access time overhead for the benchmark test turned out to be virtually identical to running Jetstress on our Exchange server with no other processing load.

Live Recovery is just one of the ways that AppAssure works to assuage the concerns of LoB executives over business continuity. In a competitive 24x7 economic environment, computer downtime represents more than lost revenue to sales and marketing executives. These executives equate lengthy computer outages with potential losses in customer confidence and market share. As a result, senior LoB executives expect IT to meet an RTO that is measured in hours rather than days and an RPO that is close to lossless. AppAssure empowers IT to meet and exceed those stringent RPO and RTO goals.
What's more, with the rich hypervisor-independent capabilities of AppAssure, CIOs can get a jump on the competition by bringing compelling, innovative services to market faster and supporting them more effectively once they are rolled out.

NEXT GENERATION CLOUD COMPATIBILITY

Given the CBT architecture of AppAssure, we were able to immediately utilize AppAssure v4.7 in a next generation hypervisor-independent cloud environment, which increases management options and lowers administrative and labor costs dramatically. We used the Release Candidate of Microsoft System Center Virtual Machine Manager (SCVMM) 2012 to provide a single-pane-of-glass management environment. Specifically, we were able to create multiple private clouds provisioned with VMs hosted on a mix of vSphere 5 and Hyper-V servers.

In addition, SCVMM provided our Hyper-V hosts with a shared library in which to store full VM images and virtual disk files in order to automate the creation and provisioning of VMs. As a result, IT administrators have the means to create and provision VMs from a library of disk images created to be compliant with business
policies. Not only does this provide a means for IT administrators to roll out new applications and services more accurately, they can do it up to 35 times faster than when working within a traditional IT environment.

In this configuration, we were able to contain the growing problem of VM sprawl by assigning VMs to distinct clouds. Each cloud represented a distinct management zone. Given the potential to further simplify cloud management with this operational paradigm, it is not surprising that many data protection software vendors are in the process of introducing new software releases to work with both vSphere 5 and Hyper-V.

APPASSURE CROSS PLATFORM FUNCTIONALITY
From the AppAssure console, we were able to back up and restore Windows-based VMs without regard to the underlying host supporting the VM. More importantly, we were able to utilize the Create VM wizard to create a warm standby VM on an ESXi host for any protected system, including any VM running on a Hyper-V host. We were also able to use the same wizard to export logical disks from any protected system, including any VM running on an ESXi host, as Hyper-V formatted virtual disks for use as templates in the SCVMM library.

Nonetheless, most of the new data protection software remains highly dependent on leveraging underlying host hypervisors to enhance performance and functionality. As a result, those packages cannot provide data protection services that integrate across heterogeneous hypervisor platforms in order to extend the hypervisor-independent paradigm.

In sharp contrast to competitive offerings from other vendors, AppAssure v4.7 Backup and Replication software extends many of its advanced functions across VMs resident on vSphere 5 and Hyper-V hosts. Specifically, we were able to leverage the low overhead of AppAssure backups to create a warm standby VM on any vSphere 5 host corresponding to any production VM on a Hyper-V host. What's more, we were able to
leverage the cross-platform functionality of AppAssure v4.7 with the SCVMM 2012 library to create templates from VMs running on vSphere 5 hosts and then use the templates to automate VM provisioning on Hyper-V hosts.

ESXi and ESX hosts can update VMs that are not in a running state. That allows an AppAssure backup server to generate disk snapshots for a warm standby VM from incremental backups of any virtual or physical system.

A key to keeping a warm standby VM updated is the minimal amount of CPU overhead imposed on both the AppAssure client and the target backup server when processing incremental updates. The entire process, from sending an incremental backup from the client to processing and saving the update on the AppAssure backup server, can be repeated over intervals measured in minutes, which allows the AppAssure backup server to quickly send disk snapshots to the vSphere 5 host supporting the warm standby VM.

The warm standby VM is a complete copy of the original system down to the state of the OS. The standby VM boots directly from a standard host datastore, immediately takes the identity of the original system, and exhibits full disk and network I/O performance. When a protected AppAssure virtual or physical client crashes, the warm standby VM can be booted in seconds and brought online processing data in minutes. In this way, AppAssure can be used to satisfy the most stringent business continuity SLA with respect to an aggressive Recovery Time Objective (RTO).

What's more, AppAssure backups are self-contained, which allows backups on one server to be replicated to another backup server in an HA scenario. Replicating the small, compressed, and deduplicated incremental backups puts minimal stress on a LAN and is very well suited to WAN infrastructure.
IT garners greater efficiency by replicating incremental backups to an off-site secondary AppAssure backup server via a WAN and using the off-site server to generate disk snapshots for an off-site, warm standby VM.

We were also able to leverage the AppAssure Create VM wizard to create virtual disks for use by VMs on a Hyper-V server from any backup file. Specifically, we used backups of VMs on vSphere 5 to create virtual disks and exported the virtual disks to the SCVMM library.

BACKUP AND RESTORE BENCHMARK SUMMARY

In all of the openbench Labs benchmark tests, we consistently measured no difference in the overall CPU utilization levels between VMs with Smart Agents installed and VMs running in an agentless environment on VMware vSphere and Microsoft Hyper-V hosts. This highlights the fact that the agentless construct of host-level backups is nothing more than an artifact. To process advanced application-centric functions, both VM-level and host-level backups need to run processes on the VM.
AppAssure Smart Agent Benchmarks

1) Minimize Backup Windows on Virtual and Physical Clients: Smart Agents install a filter driver to track data block changes and accelerate data transfer.
   Smart Agents 9X faster than agentless on an incremental backup of a vSphere VM: 17MB saved in 14 seconds vs. 24MB saved in 125 seconds.
   Smart Agents 15X faster than agentless on an incremental backup of a Hyper-V VM: 17MB saved in 14 seconds vs. 24MB saved in 209 seconds.
   Smart Agents 3X faster than agentless running 3 simultaneous full backups of vSphere VMs without a SAN: 23.3GB saved in 18 minutes 24 seconds vs. 30GB saved in 55 minutes 22 seconds.
   Smart Agents 34% faster than agentless running 3 simultaneous full backups of vSphere VMs with a SAN: 23.3GB saved in 18 minutes 26 seconds vs. 30GB saved in 24 minutes 41 seconds.
   Smart Agents 40% faster than agentless running 3 simultaneous full backups of Hyper-V VMs: 27.8GB saved in 18 minutes 24 seconds vs. 24.7GB saved in 25 minutes 48 seconds.

2) Zero Perceived Delay on Restores of Disk Volumes: Smart Agents reorder the transfer of blocks based on user access of files during recovery.
   Smart Agents recovered an Exchange mailbox database while running Jetstress: recovered a mailbox database for 100 users while processing one email transaction per second per user.

The key difference lies in the ability of a host-level backup to exploit the hypervisor. For VMware, this comes down to the advantages provided by the vStorage APIs versus the advantages that Smart Agents can provide by dynamically optimizing data flow.

In our tests, backup and recovery with Smart Agents generated significantly lower overhead on our backup server. As a result, we were able to scale multiple processes with Smart Agents without increasing the time needed for backup windows. Using quad-core servers, we were able to run up to three simultaneous processes in the same time frame as a single process.
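The speedup figures in the benchmark list can be checked directly from the elapsed times quoted there. The short Python sketch below does that arithmetic. It assumes that "NX faster" and "N% faster" both refer to the ratio of agentless elapsed time to Smart Agent elapsed time, which is an interpretation rather than a definition given in the report; the helper names are illustrative.

```python
def to_seconds(minutes: int = 0, seconds: int = 0) -> int:
    """Convert a minutes-and-seconds elapsed time to seconds."""
    return minutes * 60 + seconds


def speedup(agentless_s: float, smart_agent_s: float) -> float:
    """Ratio of agentless elapsed time to Smart Agent elapsed time."""
    return agentless_s / smart_agent_s


def percent_faster(agentless_s: float, smart_agent_s: float) -> float:
    """The same ratio expressed as a percentage above parity."""
    return (speedup(agentless_s, smart_agent_s) - 1) * 100


# (label, Smart Agent seconds, agentless seconds), taken from the benchmark list.
RESULTS = [
    ("vSphere incremental", to_seconds(seconds=14), to_seconds(seconds=125)),
    ("Hyper-V incremental", to_seconds(seconds=14), to_seconds(seconds=209)),
    ("vSphere 3 full, no SAN", to_seconds(18, 24), to_seconds(55, 22)),
    ("vSphere 3 full, SAN", to_seconds(18, 26), to_seconds(24, 41)),
    ("Hyper-V 3 full", to_seconds(18, 24), to_seconds(25, 48)),
]

for label, smart, agentless in RESULTS:
    print(
        f"{label}: {speedup(agentless, smart):.2f}X, "
        f"{percent_faster(agentless, smart):.0f}% faster"
    )
```

Under that interpretation the ratios round to 9X, 15X, and 3X for the first three rows, and to 34 percent and 40 percent for the two SAN and Hyper-V full-backup rows.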
What's more, unlike traditional legacy backup agents, AppAssure Smart Agents run entirely in the background on their hosts. There are no associated management tasks, monitoring, or licensing functions.

Jack Fegreus is Managing Director of openbench Labs and consults through Ridgetop Research. He also contributes to InfoStor, Virtual Strategy Magazine, and Open Magazine, and serves as CTO of Strategic Communications. Previously he was Editor in Chief of Open Magazine, Data Storage, BackOffice CTO, Client/Server Today, and Digital Review. Jack also served as a consultant to Demax Software and was IT Director at Riley Stoker Corp. Jack holds a Ph.D. in Mathematics and worked on the application of computers to symbolic logic.