Hype Cycle for Business Continuity Management and IT Disaster Recovery Management, 2014



Published: 22 July 2014    Analyst(s): John P Morency, Roberta J. Witty

Increasing deployments of cloud-based solutions and the adoption of the ISO 22301:2012 BCM standard are redefining the technology and process foundations of continuity and recovery management. Use this research to determine what can best support the specific requirements of your organization.

Table of Contents

Strategic Planning Assumption ... 3
Analysis ... 3
What You Need to Know ... 3
The Hype Cycle ... 4
The Priority Matrix ... 10
Off the Hype Cycle
On the Rise
Copy Data Management ... 12
Data Dependency Mapping Technology
Recovery Assurance ... 15
At the Peak ... 16
Cloud-Based Backup Services
Disaster Recovery Service-Level Management ... 18
IT DRM Exercising
Sliding Into the Trough ... 22
Mobile Satellite Services
Cloud-Based Disaster Recovery Services
Cloud Storage Gateway ... 26

IT Service Failover Automation for DR
IT Vendor Risk Management ... 29
Virtual Machine Backup and Recovery ... 31
IT Service Dependency Mapping
Public Cloud Storage ... 36
Hazard Risk Analysis and Communication Services
Humanitarian Disaster Relief
Workforce Resilience
Risk Assessment for BCM
Climbing the Slope
BCM Planning Software ... 45
Crisis/Incident Management Software ... 48
Emergency/Mass Notification Services
Ka Band Satellite Communications
Appliance-Based Replication
BCM Methodologies, Standards and Frameworks ... 56
Hosted Virtual Desktops
Data Deduplication
Continuous Data Protection ... 63
Load Forecasting ... 65
Lights-Out Recovery Operations Management
Server Repurposing ... 68
IT DRM Insourcing
WAN Optimization Services
Bare-Metal Restore
Outage Management Systems ... 73
Entering the Plateau
Server-Based Replication
Continuity ... 76
WAN Optimization Controllers ... 78
Work Area Recovery ... 80
Appendixes
Hype Cycle Phases, Benefit Ratings and Maturity Levels
Gartner Recommended Reading

Page 2 of 88 Gartner, Inc. G

List of Tables

Table 1. BCM Standards Adoption ... 83
Table 2. Hype Cycle Phases ... 85
Table 3. Benefit Ratings ... 86
Table 4. Maturity Levels ... 86

List of Figures

Figure 1. Hype Cycle for Business Continuity Management and IT Disaster Recovery Management, 2014
Figure 2. Priority Matrix for Business Continuity Management and IT Disaster Recovery Management, 2014
Figure 3. Hype Cycle for Business Continuity Management and IT Disaster Recovery Management, ...
Figure 4. Plans for BCM Certification by Standard

Strategic Planning Assumption

By 2019, BCM will be widely used to support strategic and operational business activities.

Analysis

What You Need to Know

Business continuity management (BCM) is the risk management practice of coordinating, facilitating and executing activities that ensure an enterprise's effectiveness in:

Identifying operational risks that can lead to business disruptions before they occur

Implementing mitigation controls, disaster recovery strategies and recovery plans according to the organization's recovery requirements

Responding to disruptive events (natural and man-made; accidental and intentional) in a manner that demonstrates command and control of crisis event responses by your organization

Recovering and restoring mission-critical business operations after a disruptive event turns into a disaster

Conducting a postmortem to improve future recovery operations

IT disaster recovery management (IT DRM) supports BCM through its focus on the recovery of IT services. Due to the increase in disasters around the world, the importance of having an effective BCM program is growing. 1 Formalization of BCM programs is gathering momentum, especially following the introduction of the International Organization for Standardization (ISO) standard. 2, 3

For many enterprises, no single regulation, standard or framework defines the complete set of BCM requirements they need to meet, because they operate in multiple countries, each of which has its own approach to BCM. This is especially true for financial services, where each central bank has its own standard (see the Business Continuity Institute [BCI] document "BCM Legislations, Regulations, Standards and Good Practice" for a list). Finally, there will be wide diversity in the strategic nature of BCM across enterprises and governments: Those that see BCM as an operational risk management component, or that must prove to their customers that they have an effective program meeting the customers' recovery needs, will view BCM as strategic; those that lack these drivers will continue to see BCM as a compliance/checklist activity and may therefore fail in the face of a disaster. Gartner believes that it will take at least another five years to reach a state in which BCM is widely used to support strategic and operational activities across the organization (that is, by 2019, BCM will be widely used to support strategic and operational business activities).

The Hype Cycle

This Hype Cycle will aid BCM and IT DRM leaders in identifying and implementing important processes and technologies that can make the most significant contributions to improving BCM and IT DRM program maturity.
Today, BCM is a mature management discipline: many of the standards and processes, as well as many of the technologies used to respond to, recover from and restore operations after disasters, are well-defined. Some BCM disciplines, such as crisis/incident management, IT DRM and some aspects of business recovery, are more mature than others, usually because the organizations involved have experienced a disaster and felt the pain of a failed recovery. For example, Hurricane Sandy presented many challenges to workforce resilience (a component of business recovery), and organizations have since strengthened those aspects of their overall BCM program.

However, implementation still lags, and varies, across all industries, even those required to support effective BCM programs because of one or more regulatory requirements. The main reason for the implementation lag continues to be the lack of executive interest in, or focus on, BCM. This lack of interest results in no executive sponsorship and no program governance to provide the people and financial resources needed to match the recovery requirements of the organization. This is especially important in a changing business and IT environment: every change requires a review of BCM strategy and recovery plans to ensure that recovery practices match current production practices. Without this constant review, your recovery may not be successful due to out-of-date recovery plans, procedures and supporting technologies. Hence the need for strong and ongoing executive sponsorship that supports an enterprisewide BCM governance structure and program management office.

At the beginning of 2014, the average client BCM and IT DRM maturity score (based on the Gartner maturity self-assessment tool called ITScore for Business Continuity Management; see "ITScore for Business Continuity Management") is just below 2.5 (on a scale of 1 to 5, where 1 is the least mature and 5 is the most mature). Maturity improvement barriers include:

Business unit and business operation leadership resistance to taking BCM ownership. In many cases, the result has been little to no focus on business process recovery itself; more often than not, the emphasis is placed on just IT service recovery.

"Out of sight/out of mind" behavior. Organizations tend to rush to fix recovery gaps after a large-scale disaster, but after nine to 12 months, memories fade and other events take precedence, moving continuity and recovery planning and implementation to the back burner. Also, if the disaster isn't near one of your operating locations, it is atypical for management to ask, "How would we recover from a similar event?"

Each year typically brings at least one or two new business products, processes, locations, third parties, customer demands for recovery and technologies that require a revisiting of recovery strategies and procedures. This can be a daunting task in multiproduct or multilocation organizations.

One trend that we thought could help improve maturity for some organizations is the level of BCM program formalization being facilitated through the U.S. PS-Prep program for organization-level certification. However, certification under this program has not taken hold (only eight certifications have been issued since its inception), and the delay in FEMA endorsing ISO 22301:2012 as one of the standards that an organization can leverage to achieve certification will only ensure low adoption for certification under PS-Prep.
Therefore, we have retired the PS-Prep technology profile and included the program under the profile "BCM Methodologies, Standards and Frameworks."

For their part, IT DRM and IT service availability management are converging, primarily because of increased business requirements to reduce the business operations impact of unplanned downtime, regardless of whether the root cause is a major disaster event or an internal IT service disruption that does not result in a data center shutdown. Effectively integrating IT DRM and IT service availability management into a more holistic discipline (that is, IT service continuity management [IT SCM]) will likely require significant technology and process management changes, and these changes will take time to implement fully. As a result, many organizations are finding that a complete transition to IT SCM is taking from 18 to as much as 48 months, given the technology, process and operations maturity improvements that need to take place. There are several reasons why this is the case:

IT DRM remains labor-intensive for many organizations, especially in the area of recovery plan exercising. This labor intensity will become a significant barrier to scaling IT DRM program coverage as more in-scope business processes, applications and data are added. Improved recovery services and management automation are critical to overcoming this barrier.

The deployment of several technologies, including virtual machine mobility and recovery, IT service

dependency mapping, data dependency mapping, disaster recovery assurance and automated lights-out recovery operations management, is increasing as related technology maturity and user adoption rates improve. However, with the exception of virtual machine mobility and recovery, their implementation rates, as a percentage of the total number of production data centers, remain relatively small (that is, less than 10%).

In general, the investment required to fully implement an IT infrastructure capable of supporting and sustaining IT service continuity for all production applications and data (not just mission-critical ones) is significant. Its implementation will not take the form of a single project. Rather, it will be a sequence of phased projects, each with a tightly bounded completion time frame within that 18- to 48-month period. Gartner recommends that measurable implementation costs, project deliverables, benefits and a discrete set of reportable success metrics be defined as the basis for justifying the initiation of each phase. For many clients, this approach has become a preferred implementation alternative to justifying a one-shot, high-priced, disaster-recovery-only solution.

At the same time, new generations of cloud-based recovery service offerings, often referred to as recovery as a service (RaaS), as well as managed backup cloud services, have the potential to significantly improve IT DRM efficiency, effectiveness and economics. While the market uptake of RaaS has continued to increase during the past year and the number of related providers has grown to more than 150, Gartner believes that it is still at a fairly early stage of delivery maturity.
Therefore, clients should not assume that the use of cloud-based recovery services for failing over some or all of the corporate data center infrastructure will largely subsume the use of traditional IT DRM approaches, at least for the next five years. Hybrid pilots and initial production implementations that combine the use of cloud-based and non-cloud-based recovery and failover will become increasingly common for different application recovery tiers, especially for small and midsize enterprises, over the next three to five years. Despite the potential of public cloud services to transform the data center into a logical entity, a brick-and-mortar production data center will remain the norm for most organizations for at least the next five years. As long as a production data center is a physical entity, some form of in-house disaster recovery infrastructure and management will be required.

Because the transition from IT DRM into the more encompassing IT SCM has already begun for many organizations, the IT DRM portion of the BCM and IT DRM Hype Cycle report will be completely replaced by the IT SCM Hype Cycle report. In 2014, both reports are being published in order to facilitate a smooth content transition between key subject area themes. For 2014, the BCM and IT DRM Hype Cycle changes were made to ensure that the major technology, service and methodology changes that occurred during the past year are properly documented and positioned correctly on the Hype Cycle curve.

Forward-Moving Technologies of Particular Note

Server virtualization is continuing to transform the manner in which disaster recovery is managed, especially for Microsoft Windows- and Linux-based computing platforms. The replication, testing and restart of virtual machines at remote recovery facilities are being increasingly adopted

by Gartner clients as less costly and more flexible alternatives to more traditional subscription recovery services. As a reflection of this trend, Gartner notes three technologies, all specific to virtual machine (VM) recovery and restart, as forward-moving technologies:

IT Service Failover Automation for DR advanced from postpeak 10% in 2013 to postpeak 35% in 2014 because of the increased deployment of tools that support both VM image replication and the restart of collections of VMs supporting one or more end-to-end application services at a secondary recovery site.

Recovery Assurance also advanced rapidly, from post-trigger 30% in 2013 to the trigger-peak midpoint, primarily because it:

Supports the means to improve recovery time predictability through the setup and activation of production application test beds that can be used for more frequent recovery testing of virtual-machine-based applications at a secondary site. This can be done on a monthly, weekly or even overnight basis.

Brings one of the key benefits of cloud-based recovery services (that is, the means to exercise application recoverability more frequently) to the enterprise network without requiring any form of private or public cloud service as a technical prerequisite.

Cloud-Based Disaster Recovery Services progressed rapidly from postpeak 15% in 2013 to postpeak 35% due to the following:

A significantly increased number of production implementations (Gartner estimates that there are approximately 14,000 today), as well as the fact that nearly every major colocation, managed hosting and disaster recovery service provider currently offers one or more cloud-based recovery services.

Less complex and more compelling service pricing (that is, more clearly defined service tiers and more standardized pricing policy for service bursting).
Improvement in the quality and maturity of provider operations controls, especially for production data privacy management.

Added Technologies

Cloud Backup. Cloud-based backup services aim to replace or augment traditional on-premises backup. In 2013, cloud-based VM recovery and cloud-based backup were discussed in the same technology profile (Cloud-Based Recovery Services). Because of its very different service focus, Cloud Backup now has its own separate profile.

Copy Data Management. Copy data management, a rapidly evolving technology, facilitates the use of one copy of data for supporting backup, archive, replication and test/development, thereby dramatically reducing the need for multiple unmanaged copies of data.

Hazard Risk Analysis and Communication Services. Hazard risk analysis and communication services evaluate worldwide incidents that threaten the health and safety of citizens and the

workforce, cause damage to critical physical and technology infrastructure, or cause a disruption to normal business operations.

IT Vendor Risk Management (VRM). VRM products and processes are emerging to enable the assessment and management of risks from third-party service providers and IT suppliers. VRM is an important element of enterprise and IT risk management, and is mandated by many privacy and data breach notification regulations, such as the Gramm-Leach-Bliley Act in the U.S. and the Federal Data Protection Act, or Bundesdatenschutzgesetz (BDSG), in Germany.

Load Forecasting. Load forecasting is a utility application category that minimizes risk by predicting future consumption of commodities transmitted or delivered by a utility.

Backward-Moving Profiles of Particular Note

None

Obsolete-Before-Plateau Technologies

Data Dependency Mapping. As the data protection and integrity problem detection capabilities of storage vendors improve, this product category will likely become obsolete well before it reaches the Plateau of Productivity.

Outage Management Systems (OMSs). This technology is predicted to become obsolete before maturity, because the new breed of distribution management systems (DMSs) will eventually incorporate OMS functionality as we know it. DMSs will include OMSs within real-time advanced distribution supervisory control and data acquisition (SCADA), which will also include automated restoration and self-healing smart grid functionality.

Figure 1. Hype Cycle for Business Continuity Management and IT Disaster Recovery Management, 2014

[Hype Cycle chart: the profiled technologies plotted against expectations (y-axis) and time (x-axis) across the Innovation Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity phases, with markers indicating whether the plateau will be reached in less than 2 years, 2 to 5 years, 5 to 10 years or more than 10 years, or whether the profile will be obsolete before plateau. As of July 2014.]

Source: Gartner (July 2014)

The Priority Matrix

The BCM Priority Matrix shows the business benefit ratings of the 38 continuity and recovery technologies on the 2014 Hype Cycle. The Priority Matrix maps the benefit rating of a process or technology against the length of time that Gartner expects it will take to reach the Plateau of Productivity. This mapping is displayed in an easy-to-read grid format that answers these questions:

How much value will an enterprise get from a process or technology in its BCM program?

When will the process or technology be mature enough to provide this value? In the case of a process, when will most enterprises surmount the obstacle that inhibits their ability to achieve mature BCM programs?

This alternative perspective helps users determine how to prioritize their BCM investments. In general, companies should begin in the upper-left quadrant of the chart, where the processes and technologies have the most dramatic impact on ensuring a strong ability to recover and restore business and IT operations after a business disruption (these processes and technologies are available now or will be in the near term). Organizations should continue to evaluate alternatives that are high impact but further out on the time scale, as well as those that have less impact but are closer in time.

Many technologies designated high on the Priority Matrix are process-oriented. Therefore, the most important piece of advice that Gartner can provide is to look at the BCM methodology being used in the BCM program, so that consistency of program implementation is achieved across all lines of business. Data deduplication technology is designated transformational because it can reduce disk storage costs by a factor of 15 to 25 times compared with nondeduplicated recovery solutions.
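That 15- to 25-times reduction is simple capacity arithmetic. The sketch below illustrates it with hypothetical figures (a 500 TB backup set and a midrange 20:1 ratio; these numbers are illustrative, not Gartner data):

```python
def dedup_storage_required(logical_tb: float, dedup_ratio: float) -> float:
    """Physical disk capacity needed to hold `logical_tb` of backup data
    when the storage system deduplicates at `dedup_ratio`:1."""
    return logical_tb / dedup_ratio

# Hypothetical example: 500 TB of logical backup data.
logical_tb = 500.0
without_dedup = dedup_storage_required(logical_tb, 1.0)   # 500.0 TB of disk
with_dedup = dedup_storage_required(logical_tb, 20.0)     # 25.0 TB of disk

# A 20:1 ratio (midpoint of the 15x-25x range) cuts the disk
# requirement by roughly 95%.
savings_pct = 100.0 * (1.0 - with_dedup / without_dedup)
```

Actual ratios depend heavily on data type and retention policy, which is why the quoted range is broad.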

Figure 2. Priority Matrix for Business Continuity Management and IT Disaster Recovery Management, 2014

[Priority Matrix grid: each technology placed by benefit rating (transformational, high, moderate or low) against years to mainstream adoption (less than 2 years, 2 to 5 years, 5 to 10 years or more than 10 years); Data Deduplication is the sole transformational entry. As of July 2014.]

Source: Gartner (July 2014)

Off the Hype Cycle

The following technologies were removed from this year's Hype Cycle because they completed the transition to the Plateau of Productivity, became technically obsolete, stalled in their progress, or fit better with a different Hype Cycle:

Business Impact Analysis (Plateau of Productivity transition completed)

Continuous Availability Architectures (moved to IT Service Continuity Hype Cycle)

Data Restoration Services (Plateau of Productivity transition completed)

Distributed Virtual Tape (Plateau of Productivity transition completed)

Long-Distance Live VM Migration (moved to IT Service Continuity Hype Cycle)

Mobile Service Level Management Software (Plateau of Productivity transition completed)

PS-Prep (U.S. PL , Title IX; significantly stalled progress on broad-based implementation)

Test Lab Provisioning (Plateau of Productivity transition completed)

On the Rise

Copy Data Management

Analysis By: Pushan Rinnen

Definition: Copy data management refers to products that use a live clone to consolidate, reduce and centrally manage multiple physical copies of production data that are usually generated by different software tools and reside in separate storage locations. Those copies could be snapshots, clones or replicas in primary storage arrays, or backup and remote replicas in various secondary storage (disk or tape).

Position and Adoption Speed Justification: Many organizations have become acutely aware of the increasing cost of managing copy data, whose capacity is often significantly higher than that of production storage due to multiple copies for different use cases and loosely managed retention periods. IT organizations have historically used different storage and software products to deliver backup, archive, replication, test/development and other data-intensive services, with very little control or management across these services. This results in over-investment in storage capacity, software licenses and the operational expenditure associated with managing excessive storage and software.
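The "single golden copy" idea behind copy data management can be sketched as a toy model (the class, names and figures below are purely illustrative assumptions, not any vendor's product or API): virtual copies share the master's blocks and consume storage only for their own changes, so total capacity grows with change volume rather than with the number of copies.

```python
class CopyDataManager:
    """Toy model: one golden copy, many lightweight virtual copies.

    Real products in this category layer on snapshots, deduplication
    and compression; here a virtual copy is just a pointer plus its
    changed blocks (copy-on-write), so storage grows only with changes.
    """

    def __init__(self, golden_copy_gb: float):
        self.golden_copy_gb = golden_copy_gb
        self.virtual_copies: dict[str, float] = {}  # name -> changed GB

    def create_virtual_copy(self, name: str) -> None:
        self.virtual_copies[name] = 0.0  # initially shares all blocks

    def write_changes(self, name: str, changed_gb: float) -> None:
        self.virtual_copies[name] += changed_gb  # copy-on-write growth

    def total_storage_gb(self) -> float:
        return self.golden_copy_gb + sum(self.virtual_copies.values())

# Hypothetical example: a 1,000 GB production image reused for backup,
# test/dev and analytics instead of three full physical copies.
mgr = CopyDataManager(1000.0)
for use_case in ("backup", "test-dev", "analytics"):
    mgr.create_virtual_copy(use_case)
mgr.write_changes("test-dev", 50.0)       # only test/dev diverges

full_copies_gb = 1000.0 * 4               # production + 3 physical copies
managed_gb = mgr.total_storage_gb()       # golden copy + 50 GB of changes
```

In this sketch, three full copies would need 4,000 GB, while the managed approach needs 1,050 GB, which is the cost reduction the profile describes.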
Copy data management facilitates the use of one copy of data for all of these functions, thereby dramatically reducing the need for multiple unmanaged copies of data and enabling organizations to cut the costs associated with multiple disparate software licenses and storage islands. In the past two years, the concept of copy data management has gathered some momentum as a few vendors have started to use the same term to describe their existing products' capabilities, although very few vendors have truly centralized copy data management products. From the technology perspective, the techniques used by copy data management tools to reduce storage are not new; they include pointer-based virtual snapshots and clones, deduplication and compression, as well as thin provisioning. What is new with copy data management products is that they effectively separate copy data from production data, so that production data suffers minimal performance impact when copy data activities such as backup, replication or testing/development are performed. Moreover, they help organizations to manage traditionally

disparate copies more efficiently. The main challenge faced by copy data management products is that the concept has to resonate with higher-level executives, as adoption of such products is usually a more strategic move and will disrupt the existing IT silos.

User Advice: Copy data management is still an emerging concept, with very few qualifying products in the market that can consolidate many types of copies. IT should look at copy data management as part of a backup modernization effort, or when managing multiple copies of testing/development databases has become costly, overwhelming or a bottleneck. Copy data management is also useful for organizations that are looking for active access to secondary data sources for reporting or analytics, due to its separation from the production environment.

Business Impact: The business impact of copy data management is threefold:

It enables organizations to rethink and redesign their strategy for managing secondary copies of data to achieve operational efficiency.

It reduces the storage and management costs associated with various copies.

It enables organizations to better leverage their secondary data for reporting, analytics and other non-mission-critical activities going forward.

Benefit Rating: High

Market Penetration: 1% to 5% of target audience

Maturity: Emerging

Sample Vendors: Actifio; Delphix

Data Dependency Mapping Technology

Analysis By: John P Morency

Definition: Data dependency mapping products are software products that determine and report on the likelihood of achieving specified recovery targets, based on analyzing and correlating data from applications, databases, clusters, OSs, virtual systems, and networking and storage replication mechanisms. These products operate on direct-attached storage (DAS), storage area network (SAN)-connected storage and network-attached storage (NAS) at both primary production and secondary recovery data centers.
Position and Adoption Speed Justification: Before these solutions became available, there were only two ways to determine whether a particular recovery time objective (RTO) could be achieved: through data restoration testing, or through operations failovers conducted during a live recovery test exercise. A frequent outcome of the live test was the discovery of missing, unsynchronized or corrupt data that had not been detected during normal backup, asynchronous replication or synchronous mirroring, resulting in unplanned data losses that could disrupt the operation of one or more business processes if a production operations recovery occurred.

Because of the high risk and cost incurred to discover and remediate potential data loss, a new generation of data assurance technologies was developed. These newer products support a more granular knowledge of application-specific data dependencies, as well as the identification of content inconsistencies that result from application software bugs or misapplied changes. These changes may be attributable either to human error or to the complex and dynamic nature of IT operations. One technology approach used by various vendors is well-defined storage management problem signatures, supported by industry-standard storage and data management software and used in combination with passive monitoring of local and remote storage traffic (through software-resident agents). This traffic monitoring is used to detect control and user data anomalies and inconsistencies in a more timely way, to notify storage operations staff when an issue occurs, and to project the negative RTO effect through onboard analytics. The automation of the verification process, and an attempt to quantify the impact on the business, are the key deliverables of these solutions.

Based on the number of direct Gartner client inquiries, this market does not appear to be evolving as quickly as we originally expected. Possible reasons for this slower-than-expected adoption rate include lack of knowledge regarding data dependency products, lack of the resources needed to deploy the tools, and difficulty in justifying investments in what may be perceived as a luxury technology. Another contributing factor is that data dependency mapping products are still offered primarily on a stand-alone basis, as opposed to being bundled into larger storage management or backup solutions.
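The monitoring-and-signature approach described above amounts to checking observed replica state against recovery targets. The sketch below is a deliberately simplified illustration (the fields, dataset names and thresholds are hypothetical; real products correlate far richer telemetry across the storage stack):

```python
from dataclasses import dataclass

@dataclass
class ReplicaState:
    """Observed state of one replicated dataset (hypothetical fields)."""
    name: str
    lag_seconds: float        # replication lag seen by the monitor
    checksum_match: bool      # content consistency check result
    rpo_seconds: float        # recovery point objective for this dataset

def find_recovery_risks(replicas: list[ReplicaState]) -> list[str]:
    """Flag datasets whose current state would jeopardize recovery targets."""
    risks = []
    for r in replicas:
        if not r.checksum_match:
            risks.append(f"{r.name}: content inconsistency detected")
        elif r.lag_seconds > r.rpo_seconds:
            risks.append(f"{r.name}: lag {r.lag_seconds:.0f}s exceeds RPO {r.rpo_seconds:.0f}s")
    return risks

# Hypothetical monitoring snapshot: one lagging replica, one corrupt one.
snapshot = [
    ReplicaState("orders-db", lag_seconds=30, checksum_match=True, rpo_seconds=60),
    ReplicaState("crm-db", lag_seconds=900, checksum_match=True, rpo_seconds=300),
    ReplicaState("mail-store", lag_seconds=10, checksum_match=False, rpo_seconds=60),
]
alerts = find_recovery_risks(snapshot)
```

The value proposition of the product category is that checks like these run continuously against live replication traffic, rather than being discovered during an annual live failover.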
For these reasons, Gartner positions data dependency mapping at the trigger-peak midpoint. In addition, Gartner believes that, as the self-management and self-healing capabilities of storage vendors improve, this product category will likely become obsolete well before it reaches the Plateau of Productivity.

User Advice: Preproduction pilots are a viable option that may be worth pursuing. In some cases, vendors offer very low-cost pilot projects in which the vendor software autodiscovers and reports potential problems that may have been unknown to storage and server operations staff. Validating the solution in the actual environment reduces the risk that it will not meet the organization's needs, and a positive pilot outcome can be used to justify the purchase of the solution. Some vendor products go one step further through the use of proprietary, multivendor data checkpoint and replication technology. As a result, a much richer degree of data integrity and consistency assurance can be supported across primary production and secondary recovery data centers. In addition, data checkpoint, replication and validation support for application platform-specific software, such as Microsoft Exchange, BlackBerry Enterprise Server and Microsoft SharePoint, constitutes an additional product alternative for users whose immediate recovery needs are more product-specific.

Business Impact: The primary benefits of this technology are improved predictability and efficiency in both achieving and sustaining required recovery times, especially for mission-critical Tier 1 and Tier 2 applications. This is especially important for high-volume transaction applications, and for low- to medium-volume transaction applications for which the average revenue per transaction is high. In recent years, the frequency of live recovery exercises has become much more limited in

many organizations, making such tools even more valuable in both the early detection and the long-term avoidance of data loss.

Benefit Rating: Moderate

Market Penetration: Less than 1% of target audience

Maturity: Emerging

Sample Vendors: 21st Century Software; Continuity Software; Egenera; EMC; FalconStor Software; InMage; NetApp; Sanovi; Symantec; Unitrends; Zerto

Recovery Assurance

Analysis By: John P Morency; Robert Naegle

Definition: Recovery assurance products reduce the cost and complexity of recovery exercising, increase exercise execution flexibility and help ensure that application failover is both successful and sustainable. These products configure and manage sets of virtual servers for the purpose of orchestrating disaster recovery plan exercising.

Position and Adoption Speed Justification: Effective recovery plan testing is a multidisciplinary and complex challenge that spans multiple types of systems, applications, databases and even organizations. The most important challenge is to ensure that postrecovery IT operations are as stable as predisaster IT operations, to the extent possible. As a result, annual IT disaster recovery management (DRM) exercise budget allocations can range from $20,000 to more than $150,000. IT DRM costs include hardware, software, personnel, travel expenses, data center usage, client desktops and peripherals, help desk, and voice and data networks.

Most organizations want to reduce recovery exercise time and cost. The real question is how best to do so: by reducing the frequency of test exercises, by increasing the use of exercise automation technology, or by some combination of the two. Recovery assurance is one technology-based alternative. The primary objective of recovery assurance products is to reduce the cost and improve the predictability of meeting critical recovery targets.
Through the use of device driver and network address remapping functionality, recovery assurance products create isolated testing environments, also known as test "sandboxes." Test sandboxes are similar in structure to public cloud-based customer recovery configurations. These virtual test configurations simulate the production environment in which the business service runs and include all the virtual machines and related production data that support the service.

After the virtual test configuration is activated at the recovery site, IT administrators can initiate automated failover of the in-scope applications from the primary site as often as needed. They can carry out recovery tests of individual IT services on a weekly or daily basis, rather than only yearly or quarterly. Reporting capabilities alert IT administrators to potential problems based on either standard or user-defined thresholds.

Recovery assurance is a relatively new recovery management category, with only a handful of supporting products and vendors today.
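The automated sandbox exercise described above can be sketched as a simple control loop. This is a sketch under stated assumptions: real products drive hypervisor and orchestration APIs, whereas here the boot step is a caller-supplied stand-in function.

```python
"""Illustrative sketch of a sandbox failover exercise: boot each
in-scope VM clone in isolation and check the result against an RTO
threshold. The orchestration interface is hypothetical."""

import time

def run_sandbox_exercise(vms, boot, rto_seconds):
    """Run one exercise. `boot` is a caller-supplied function that
    brings one VM up in the sandbox and returns True on success."""
    start = time.monotonic()
    failures = [vm for vm in vms if not boot(vm)]
    elapsed = time.monotonic() - start
    return {
        "failed_vms": failures,
        "elapsed_seconds": round(elapsed, 3),
        # The exercise passes only if every VM booted within the RTO.
        "rto_met": not failures and elapsed <= rto_seconds,
    }

# Simulated exercise: every VM boots instantly, so the RTO is met.
report = run_sandbox_exercise(["db01", "app01"], boot=lambda vm: True,
                              rto_seconds=3600)
print(report["rto_met"])  # True
```

The reporting dictionary stands in for the threshold-based alerting the products provide; a scheduler invoking such a routine daily or weekly captures the exercise-frequency benefit described above.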

However, as the deployment scope of virtual machine recovery increases, so too will the need for recovery assurance functionality to more effectively manage "guaranteed" recovery times, as well as to reduce recovery plan testing time and cost. Because of the broader use of these products in both private and public clouds over the past year, as well as their demonstrated effectiveness through several customer testimonials, Gartner has raised the Hype Cycle position of recovery assurance to the trigger-peak midpoint.

User Advice: Recovery assurance products have the potential to improve recoverability as well as reduce the cost and complexity of recovery plan exercising. However, these products are still somewhat unproven in large enterprise environments. Today, organizations should consider initiating limited pilots intended to evaluate the use of recovery assurance products for smaller, non-mission-critical business services. Over time, as these products broaden support to virtual machine environments besides VMware and market experience with their usage increases, applying this technology to the recovery of a broader set of mission-critical applications will become more viable.

Business Impact: Companies experience increased staffing and logistics costs to support recovery testing, and they struggle to maintain consistency between primary and secondary configurations. Recovery assurance products are targeted squarely at reducing testing costs and improving recoverability. Although still somewhat unproven, there is a clear potential benefit in augmenting more traditional disaster recovery exercising with recovery assurance technology. The related business impact is moderate.
Benefit Rating: Moderate

Market Penetration: Less than 1% of target audience

Maturity: Emerging

Sample Vendors: Actifio; CloudVelocity; Continuity Software; Egenera; NetIQ; Sanovi; Sios Technology; Unitrends; VMware; Zerto

Recommended Reading: "Cool Vendors in Business Continuity Management and IT Disaster Recovery Management, 2014"

At the Peak

Cloud-Based Backup Services

Analysis By: Pushan Rinnen

Definition: Also known as backup as a service (BaaS), cloud-based backup services aim to replace or augment traditional on-premises backup with three main deployment models: (1) using local host agents to send backup data directly to cloud data centers; (2) backing up first to a local device, which in turn sends backup data to the cloud either as another replica or as a lower tier; and (3) backing up data that is generated in the cloud.

Position and Adoption Speed Justification: Internet and WAN links have become larger and cheaper in the past few years, enabling more data to be transmitted to the cloud within the same time period, or the same amount of data to be transferred faster. Improved network throughput is a key factor for cloud backup adoption, and we are starting to see 1 Gbps or even 10 Gbps links become available for some community clouds or in areas located near public cloud data centers.

The first deployment model is traditional online backup, increasingly used for endpoint backup as the enterprise workforce becomes more mobile. Small businesses and small branch offices with limited amounts of data also leverage online backup to eliminate the hassle of managing local backup. The downside of online backup is its limited backup window and slow online recovery speeds.

The second deployment model is also called hybrid cloud backup: the local device offers much faster backup and restore capabilities because it uses local networks instead of the Internet or WAN. It can therefore scale to a much larger server environment than the first model. All successful cloud server backup providers offer a local device. The more innovative solutions also offer integrated cloud backup and cloud disaster recovery, where the backup copies stored in the cloud can be used to boot standby virtual machines in the cloud for fast failover.

The third deployment model is still nascent. Cloud-native applications such as Google Apps and salesforce.com are just starting to be used by enterprises, and some enterprises do not realize that data accidentally or maliciously deleted from the cloud either cannot be recovered by the cloud application provider or can be restored only at a high cost.
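Back-of-envelope arithmetic shows why network throughput gates cloud backup adoption. The sketch below computes transfer time for a given backup set and link speed; the ~70% effective link utilization is an assumption for illustration, not a measured figure.

```python
"""Rough backup-window math: hours needed to push a backup set of
`data_tb` terabytes over a `link_gbps` link, assuming ~70% effective
utilization (an assumption made for this sketch)."""

def transfer_hours(data_tb: float, link_gbps: float,
                   utilization: float = 0.7) -> float:
    """Hours to move `data_tb` TB over a `link_gbps` Gbps link."""
    bits = data_tb * 8 * 10**12                      # TB -> bits (decimal units)
    seconds = bits / (link_gbps * 10**9 * utilization)
    return seconds / 3600

# 500 GB of daily incrementals vs. a 20 TB full set, both over 100 Mbps:
print(round(transfer_hours(0.5, 0.1), 1))   # ~15.9 hours: barely fits in a day
print(round(transfer_hours(20, 0.1), 1))    # ~634.9 hours: clearly impractical
```

Numbers like these are behind the adoption pattern described in this section: endpoint and small-office data sets fit through typical links, while multi-terabyte server estates generally do not.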
Overall, adoption of cloud-based backup services among midsize and large enterprises is very low for server backup, due to the large amounts of data to be protected, limited network bandwidth and security concerns. Some organizations that want to eliminate tape for off-site backup and lack a desirable secondary site have shown interest in replication services for deduplication backup target appliances deployed on customer premises. For endpoint and small branch office server backup, Gartner is witnessing increased interest in the use of Web-scale public cloud providers, especially among organizations with many global offices and employees.

User Advice: Cloud backup is inherently more complex than local backup at the adoption stage because of the additional considerations of networking and security. However, once implemented successfully, cloud backup can eliminate much of the daily management overhead of on-premises backup. Although technological limitations have mostly been overcome for laptop backup and for backup of a small number of servers, cloud backup remains largely impractical for environments with 20TB or more of production data. For most environments today, Gartner recommends deploying a local backup/restore device when daily incremental backup or restore workloads reach 500GB.

Business Impact: Cloud server backup is often used to replace traditional off-site tape backup, eliminating the daily operational complexities associated with tape backup and the management of removable media. Solutions offering integrated cloud backup and cloud DR provide small organizations with business continuity benefits they couldn't afford before. Although the business impact for small businesses is high, the impact for large enterprises is low today.

Benefit Rating: Moderate

Market Penetration: 1% to 5% of target audience

Maturity: Emerging

Sample Vendors: Asigra; Axcient; Backupify; Barracuda Networks; Ctera Networks; EVault; Hosting; HP Autonomy; Microsoft; nscaled; NaviSite; Spanning; SunGard; Verizon Terremark; Zetta

Recommended Reading: "How to Determine If Cloud Backup Is Right for Your Servers"

"Exploring Common Cloud Backup Options"

"Pricing Differences Complicate the Adoption of Backup and Disaster Recovery as a Service"

Disaster Recovery Service-Level Management

Analysis By: John P Morency

Definition: Disaster recovery service-level management refers to the support procedures and technology needed to ensure that committed recovery time objective (RTO) and recovery point objective (RPO) service levels are met during recovery plan exercising or following an actual disaster declaration.

Position and Adoption Speed Justification: Disaster recovery service-level management processes support IT disaster recovery management (DRM) service levels that are defined for specific business process and production application RTOs, RPOs or a combination of the two. The objectives themselves are typically defined in units of minutes, hours or days. Both types of service levels are measured manually (typically by business unit end users), although service-level tracking automation (especially for RTO service levels) is now supported in recovery assurance products and cloud-based recovery.

External service providers offer two types of disaster recovery service levels. The first is application-specific, RTO- or RPO-based (or both), which software as a service (SaaS) providers sometimes offer. One example is the service-level commitment from salesforce.com: for some customers, salesforce.com supports disaster recovery service levels that include a 12-hour RTO and a four-hour RPO.
The second type of disaster recovery service level is application-independent and is supported by a combination of server virtualization and virtual machine failover to a provider's managed facility or cloud. A few examples of application-independent service levels managed by recovery as a service (RaaS) providers are:

- EVault's cloud disaster recovery offering supports three separate RTO service tiers with associated guarantees of four, 24 and 48 hours, respectively.

- HP's Enterprise Cloud Services Continuity supports service guarantees of four hours for RTO and 15 minutes for RPO.

- SunGard Availability Services' Recover2Cloud for Server Replication provides a contractually guaranteed SLA of four hours or less for RTO and 15 minutes or less for RPO.
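The compliance check implied by commitments like these is straightforward to express. The sketch below uses the four-hour RTO / 15-minute RPO figures quoted above; the data structures are illustrative, not any provider's API.

```python
"""Illustrative sketch: checking a measured recovery exercise against
a committed RTO/RPO service tier. Structures are assumptions, not a
real provider interface."""

from dataclasses import dataclass

@dataclass
class ServiceTier:
    rto_minutes: int   # committed recovery time objective
    rpo_minutes: int   # committed recovery point objective

@dataclass
class ExerciseResult:
    actual_rto_minutes: int
    actual_rpo_minutes: int

def sla_met(tier: ServiceTier, result: ExerciseResult) -> bool:
    """True only when the exercise stayed within both committed objectives."""
    return (result.actual_rto_minutes <= tier.rto_minutes
            and result.actual_rpo_minutes <= tier.rpo_minutes)

tier = ServiceTier(rto_minutes=240, rpo_minutes=15)   # 4 h RTO / 15 min RPO
print(sla_met(tier, ExerciseResult(180, 10)))   # True: both objectives met
print(sla_met(tier, ExerciseResult(300, 10)))   # False: RTO missed
```

In practice, the output of such a check would feed the service-credit process discussed next, since a missed objective is what triggers the (typically modest) financial penalty.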

Regardless of the type of service levels offered by a given provider, however, financial penalties for missed service levels are fairly small and, in general, consist only of a monthly service fee credit. Provider master service agreement wording will typically limit the provider's financial liability.

In addition to external providers, in-house IT recovery teams also support formal IT DRM and RTO- or RPO-based (or both) service-level targets. Formal service-level definition and management is typically found in organizations with high IT DRM maturity. While lower-maturity organizations also need to support formal or informal recovery time targets, the definition and management of formalized service levels to support those targets is far less common.

Not only is the market need for more predictable operations recovery increasing, but the required recovery times for the most important mission-critical applications continue to be measured in minutes or hours rather than days. This improvement will not happen at the same pace in all enterprises. Over time, disaster recovery service-level targets will be defined at a relatively early stage in the application design and implementation life cycle. Because of its dependence on technologies such as virtual machine recovery, application-specific failover mechanisms and continuously available infrastructure (such as stretch clusters), disaster recovery service-level management cannot be positioned on the Hype Cycle at a point later than its technological prerequisites. For this reason, disaster recovery service-level management has been moved up to the Peak of Inflated Expectations in 2014.

User Advice: Over time, recovery, hosting, application and storage cloud providers may offer more robust service availability and data protection alternatives than in-house IT. This is a nascent, albeit fast-growing, provider service differentiator.
Therefore, it is important to continually re-evaluate the recovery sourcing strategy to ensure that IT operations recovery remains predictable, sustainable and cost-effective, regardless of who is responsible for delivering service-level protection. Because service-level excellence is so critical to long-term provider viability, it is important for customers to understand the type of service-level management that individual service providers offer and to hold providers accountable for supporting the RTO and RPO objectives of the business.

Business Impact: The ability to manage recovery service levels in an automated, repeatable and timely manner is becoming increasingly critical for many organizations. As Web-based applications support more business-critical processes, managed recovery service levels will become an important basis for improving business resiliency.

Benefit Rating: High

Market Penetration: 1% to 5% of target audience

Maturity: Adolescent

Sample Vendors: EVault; HP; IBM; SunGard Availability Services

Recommended Reading: "Critical Capabilities for Recovery as a Service"

"IT DRM Modernization Effect on RTO, RPO, and Budget Allocation"

"Do Your Homework Before Committing to Cloud-Based Recovery Services"

IT DRM Exercising

Analysis By: John P Morency

Definition: Exercising an IT disaster recovery (DR) plan (also known as DR testing) involves a set of sequenced testing tasks typically performed at a recovery data center. These tasks focus on ensuring that the availability of, and access to, a production application (or group of production applications) can be restarted within a specified time (the recovery time objective [RTO]) with the required level of data consistency and an acceptable level of data loss (the recovery point objective [RPO]).

Position and Adoption Speed Justification: As the recovery scope of mission-critical business processes, applications and data increases, sustaining the quality and consistency of recovery exercises can be a daunting technical and logistical challenge, especially as the frequency of recovery exercises increases alongside the frequency of change. Regardless of how often recovery exercises are held, the consistency between the current state of the production data center infrastructure, applications and data, and their state at the time of the last recovery test, erodes daily. This is a direct side effect of the changes applied to the production configuration to support new business requirements.

For many organizations, recovery exercising is still either a partially or totally manual exercise, making exercise scalability more difficult as new in-scope applications and data are brought into production. An additional risk is that labor-intensive manual testing, regardless of how thorough it is, cannot fully guarantee 100% correct operation of production applications should the initiation of recovery operations become necessary.
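The daily erosion of consistency described above can be made concrete with a drift comparison between the production configuration and a snapshot taken at the last exercise. This is an illustrative sketch; the item names are invented for the example, and real tools would compare far richer configuration records.

```python
"""Illustrative sketch: diffing the current production configuration
against the snapshot captured at the last recovery exercise, to show
what an unexercised failover would miss. Names are hypothetical."""

def config_drift(last_tested: dict, production: dict) -> dict:
    """Report items added, removed, or changed since the last exercise."""
    added = sorted(set(production) - set(last_tested))
    removed = sorted(set(last_tested) - set(production))
    changed = sorted(k for k in set(last_tested) & set(production)
                     if last_tested[k] != production[k])
    return {"added": added, "removed": removed, "changed": changed}

last_exercise = {"app01": "v1.2", "db01": "v5.0"}
today = {"app01": "v1.4", "db01": "v5.0", "app02": "v1.0"}
print(config_drift(last_exercise, today))
# {'added': ['app02'], 'removed': [], 'changed': ['app01']}
```

Any non-empty drift report means the last exercise no longer fully validates the current environment, which is exactly the argument for the more frequent, automated exercising discussed in this section.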
To reduce test time, especially for mission-critical applications, some organizations are rearchitecting their most critical applications for active-active operations, meaning that the application runs live in two or more data centers and that exercising happens every day by means of the production implementation across two sites.

A new generation of recovery-in-the-cloud offerings also has the potential to improve the frequency with which customers can conduct live exercising and to eliminate the capital cost of recovery configuration servers and storage. The key attributes of cloud computing (service-based, scalable and elastic, shared, metered by use and Internet-based access) offer a strong alternative to more inflexible and expensive traditional DR services, and may even be a superior alternative to an in-house, self-managed approach. As a result, Gartner expects recovery as a service to continue to grow as a logical extension of cloud infrastructure services.

To remain competitive, cloud-based recovery service providers are continuously improving remote access to their data centers, which enables in-house recovery management teams to orchestrate live exercises remotely, without having to travel. Supporting this, however, means that the provider must ensure that proper authentication, access and (if needed) data encryption controls are in place.