C. Da Rold, S. Mingay
Research Note
7 November 2003
Commentary

Italian Blackout Impacts IBM Image and Clients' Business

IBM's data center in Vimercate failed to deliver IT services to several clients after the nationwide power cut in September. This is a wake-up call for all businesses to check if their infrastructure really is resilient.

© 2003 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The reader assumes sole responsibility for the selection of these materials to achieve its intended results. The opinions expressed herein are subject to change without notice.

The Italian blackout of 28 September 2003 (see "Italy's Blackout Shows Enterprises Need Greater Resilience") happened at the "best time," during a weekend, at 3am on Sunday morning. It caused no major problems or public order issues. Many people worked that Sunday on contingency containment or emergency services and then forgot the event, saying, "Luckily, it wasn't a working day."

But signs of problems started to appear in the middle of the following week. IBM's main data center in Italy, at Vimercate near Milan, was not delivering services to some clients with outsourcing contracts. A week later, Gartner asked IBM for clarification of the event and discussed the situation with some of the affected organizations. As a matter of due diligence, we polled other organizations (clients running their own data centers, telecommunications companies and the clients of other outsourcing providers) to check if further blackout effects had been experienced. None of them reported significant issues.

On 11 October, IBM Italy provided the following response to Gartner:

"The nationwide blackout of 28 September affected the IBM data center located in Vimercate, which delivers IT services to both IBM users and customers' organizations. At the time the power went out, 3:21am, the UPS system (backup battery) started to work, as expected in such cases. For reasons that are currently under investigation, overheating developed, impacting also some areas of the data center. Also, the fire alarm went off, causing the intervention of the fire brigade. Although there was no fire in the data center, the above events caused serious damage to various pieces of equipment and interruption of the delivery of certain IT services. The disaster recovery procedures started immediately. Services to IBM users were restored within 48 hours. As for the customers served in outsourcing by the Vimercate data center, disaster recovery procedures were also started in accordance with the customers' specific requirements.

"Normal disaster recovery procedures provide restoration of services within a previously agreed time frame, on the basis of the data resulting from the last available backup. The backup is usually provided on a periodic basis, normally daily or weekly in accordance with the relevant contracts in force between IBM and each customer. A number of customers who had disaster recovery services and contracted for frequent backups followed the normal procedures of disaster recovery and were able to restart operations within the agreed time frame.

"However, other customers, who had disaster recovery services but did not contract for a daily backup, decided, with IBM, not to run disaster recovery services immediately on the last available backup, but to focus on restoring the most recent data, in spite of a possible longer time of recovery. In fact, they aimed at getting their operations back to the level where they were at the moment of the national blackout (including therefore the data not present in their last available backup). IBM has also worked for all those customers who did not have any disaster recovery procedure included in their contracts, in order to allow them to recover their most recent data in any case.

"Customers' operations started to be recovered from 29 September. IBM has been in continuous communication with the highest managerial and IT levels of its customers throughout these events. The efforts have been extraordinary, for all those involved, both IBM and its customers. IBM made available and put to work its national and international resources and capabilities (experts from its software laboratories and production centers, its support centers, and its advanced techniques and tools for data restoring). More than 400 IBM staff worked in shifts, 24 hours a day, until the restart of full operations. Furthermore, IBM installed for the benefit of its customers additional and more powerful hardware and storage (than that used before 28 September), in order to restore customers' normal operations more quickly."

(Source: IBM Italy.)

How Some Clients Were Affected

On 30 September (the second working day after the event), an Italian bank announced to its clients that business operations based on central mainframe services, which it had outsourced to the Vimercate data center, were not operational. Branch processing (supported by local configuration and software) and point-of-sale operations (managed in-house) were available as normal. The bank only reported a return to full business operations on 7 October, nine calendar days after the blackout.

Press reports indicate that two other banks suffered similar if not worse problems and described the significant difficulties experienced by customers. It was also reported that a Nestlé production plant for Perugina chocolate, near Perugia, was closed and the workforce sent home for three days (from 30 September to 2 October) because of the IBM Vimercate event. This made it impossible for the client to manage the plant's logistics and warehouses. These clients' business problems have been confirmed by their customers and other indirect sources.

Gartner Comment

A real disaster affected the IBM site. It had a significant business impact on some outsourcing clients for up to nine calendar days. The fault, if any, may lie with IBM, with its clients' disaster recovery practices, or somewhere in between. Our focus here is on the impact of this kind of event on the market and, more importantly, how to avoid it.

The event raises some unanswered questions. Nevertheless, the lessons learned will affect:

- Italian businesses. The event will push security, disaster recovery and business continuity higher up the corporate priority list.
- Organizations that have outsourced services, including disaster recovery and business continuity. The event demonstrates that business risk cannot be easily transferred to providers.
- IBM. A logical and advisable source of data center outsourcing services, the company is left managing an embarrassing situation.
- The outsourcing market in Italy. We expect resistance against outsourcing to rise.
- Other IT services providers. Competitors may take advantage of IBM's failure to enhance their position. Another provider has already declared, "Our clients encountered no problems from the blackout, because continuity is part of our contracts and our culture."

For Italian businesses, disaster recovery and business continuity practices have not been a high enough priority for many years. A separate and distinct market for these services never grew enough to allow the development of recovery specialists. These services are usually delivered by outsourcing providers or managed in-house. They are already hot topics in financial institutions, because Basel II regulations are adding new scrutiny and financial burdens to operational risks. Business dependence on IT services continuity is already strong in many other verticals, including manufacturing, where integration between enterprise resource planning and plants, and automation of the supply chain, make continuity a must.

Security, disaster recovery, business continuity best practices, the continuous update of these practices and plans, and recurring tests must become embedded in day-to-day business processes. Every contingency plan must include communications that limit the damage to an organization's business and image. Businesses can't just "hope nothing will go wrong." Unfortunately, sometimes it does.

Client organizations must understand which contractual responsibilities belong to their provider and which rest with them. Very often, the client still owns the majority of the risk. We advise organizations to ask themselves the questions in these five areas:

Do you understand your own recovery needs?
Do you understand the impact of service failure or loss of data on business operations? Do you understand how the impact increases over time? With this business information, you can judge the requirements and value of backup and recovery practices. You can also define the appropriate recovery time objectives (RTO), the recovery "window," and the recovery point objectives (RPO), the acceptable data loss. Without these business requirements, disaster recovery and business continuity services can look like a costly overhead to be cut.

Are requirements clearly communicated to the provider?
Do the service provider and the internal IT organization recognize these requirements? Are they prepared to meet them? Does your organization accept that there is a price to be paid?

Does your contract reflect current requirements for backup and recovery services?
Are the contractual recovery plans based on your current business environment? During negotiations of service contracts, these kinds of service easily drop off the "radar screen," especially when the price is driven down over multiple rounds of negotiation.

Does your change management process capture and communicate changing requirements?
RTO, RPO and critical loads change over time. The test of real business continuity capability lies in the ability to capture and act on those changes.

Do you test and audit recovery capability and requirements?
Audits are needed to ensure the appropriate controls and practices are up and running and to provide a level of assurance that requirements and deliverables are aligned. An untested recovery plan is at best a set of good intentions. Providers and clients must work together to conduct regular tests of these capabilities.

In summary, adopting best practices for disaster recovery and business continuity, enforcing contractual audit rights on the provider and carefully co-managing recovery tests are not exotic requests. They are healthy practices from both a business and a personal perspective.

For IBM, many questions about its management of this event remain unanswered:

Why did a nonevent precipitate a disaster?
In the Milan area, the blackout lasted three hours, a nonevent for a data center. Italy was in a declared state of "controlled blackout" for most of the summer. Every data center and plant should have tested its power supplies and contingency plans. IBM's other Milan data center suffered an accidental fire a few years ago, and U.S. companies are a potential target of terrorist activity in Europe. Plant maintenance, recovery plans and tests, best practices in problem containment and business continuity should have already been enforced to the letter.

Were damages minimized by the first reaction?
Although IBM reports there was no fire, the intervention of the fire brigade added delay and perhaps uncertainty. If there was no fire, what damaged the equipment? Was the fire brigade trained for an emergency on this site? Best practices for managing data centers aim to limit initial damages.

Were contingency and recovery plans implemented quickly and efficiently?
Gartner has talked with clients affected by the outage. Their view of the timing of events doesn't fit entirely with information from IBM, possibly the result of miscommunication or misunderstanding. As confirmed by IBM, a few clients requested an unplanned recovery path (using original data from the damaged center) instead of the more usual plan (48 hours, using the last data backup). Was a realistic evaluation of the time needed and the risk to customers carried out? Was a quicker and sounder path to recovery expected? Were regular recovery tests done with these clients? Did IBM consultants advise clients of the risks? Was the risk to clients clearly described in their contracts? While it's clear that IBM's efforts after the event have been remarkably strong, continuity of service and disaster recovery best practices are about anticipating risk, avoiding and containing damage, and executing contractual recovery plans that have been tested.

Can a leading provider give clients complete freedom on security and recovery issues?
Even if all blame lies with clients, this event harmed IBM's image in key areas: mainframes, data centers, disaster recovery and business continuity services. Service providers and their clients are obviously free to define their service relationship when drawing up a contract. Nevertheless, shouldn't wise outsourcing providers and security consultants prevent their clients from operating below a certain security level, to avoid being damaged if something goes wrong?

Has the communication plan worked well?
Every continuity expert teaches that communications (internal, to clients and to external parties) are of paramount importance in an emergency. Sometimes, organizations affected by a problem elect not to talk about it, perhaps in the hope of limiting the spread of bad news. This approach does not work, because it fuels speculation and can damage a company's image unnecessarily.

Bottom Line: The events at Vimercate re-emphasize the need to build security and resilience into business infrastructure, whether delivered internally or outsourced. We advise every business to evaluate the potential direct and indirect damage in the event of a disaster, and to check the efficiency of its contingency plans. The events have a wider implication for the image of IBM in Europe, because a provider's disaster recovery and business continuity practices are usually strictly aligned, at least at a regional level. Clients of any outsourcing provider who are not clear about their service-level guarantees and procedures in the event of a major incident should talk to their account representatives as soon as possible. IBM must work hard to assure clients and prospects of the continuity of its services, while significantly improving its communication plans, if it is to avoid significant damage to its image in the region.
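
The RTO and RPO questions raised in this note lend themselves to a simple check. What follows is a minimal Python sketch, not a method described by Gartner or IBM. It assumes that periodic backups can lose up to one full backup interval of data, and it compares that exposure, together with a tested restore time, against stated objectives. The class name, function names and the objective values for the example service are hypothetical illustrations. Only the daily and weekly backup intervals, the 48-hour recovery figure and the nine-day outage echo figures mentioned in the text.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class RecoveryRequirement:
    """Business requirement for one service (values used below are hypothetical)."""
    service: str
    rto: timedelta  # recovery time objective: longest tolerable outage
    rpo: timedelta  # recovery point objective: largest tolerable data loss


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumption: a failure just before the next periodic backup
    loses up to one full backup interval of data."""
    return backup_interval


def find_gaps(req: RecoveryRequirement,
              backup_interval: timedelta,
              tested_restore_time: timedelta) -> List[str]:
    """Return a description of every way the plan misses the requirement."""
    gaps = []
    if worst_case_data_loss(backup_interval) > req.rpo:
        gaps.append(
            f"RPO gap for {req.service}: backups every {backup_interval} "
            f"can lose more data than the {req.rpo} objective"
        )
    if tested_restore_time > req.rto:
        gaps.append(
            f"RTO gap for {req.service}: tested restore time "
            f"{tested_restore_time} exceeds the {req.rto} objective"
        )
    return gaps


if __name__ == "__main__":
    # Hypothetical requirement: at most 48 hours of outage, one day of data loss.
    core_banking = RecoveryRequirement(
        service="core banking",
        rto=timedelta(hours=48),
        rpo=timedelta(days=1),
    )
    # Weekly backup with a 48-hour tested restore: the RPO is not met.
    for gap in find_gaps(core_banking, timedelta(weeks=1), timedelta(hours=48)):
        print(gap)
```

A check of this kind is only as good as its inputs: the restore time should come from an actual recovery test, and the objectives should be revisited whenever change management reports a change in business requirements.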