Why Email Fails MessageOne Survey of Email Outages



Similar documents
WHITE PAPER. Best Practices to Ensure SAP Availability. Software for Innovative Open Solutions. Abstract. What is high availability?

High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach

PROTECTING MICROSOFT SQL SERVER TM

Integrated Application and Data Protection. NEC ExpressCluster White Paper

Neverfail Solutions for VMware: Continuous Availability for Mission-Critical Applications throughout the Virtual Lifecycle

Blackboard Managed Hosting SM Disaster Recovery Planning Document

Protecting Microsoft SQL Server

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

Real-time Protection for Hyper-V

Creating A Highly Available Database Solution

INSIDE. Preventing Data Loss. > Disaster Recovery Types and Categories. > Disaster Recovery Site Types. > Disaster Recovery Procedure Lists

How To Fix A Powerline From Disaster To Powerline

5054A: Designing a High Availability Messaging Solution Using Microsoft Exchange Server 2007

Everything You Need to Know About Network Failover

Disaster Recovery (DR) Planning with the Cloud Desktop

Solution Brief Availability and Recovery Options: Microsoft Exchange Solutions on VMware

Blackboard Collaborate Web Conferencing Hosted Environment Technical Infrastructure and Security

Designing, Optimizing and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008

Availability and Disaster Recovery: Basic Principles

IBM Software Information Management. Scaling strategies for mission-critical discovery and navigation applications

WHITE PAPER: ENTERPRISE SECURITY. Symantec Backup Exec Quick Recovery and Off-Host Backup Solutions

ARCHITECTURAL OVERVIEW Availability Service (EAS) with Activ box

High availability and disaster recovery with Microsoft, Citrix and HP

WHITE PAPER PPAPER. Symantec Backup Exec Quick Recovery & Off-Host Backup Solutions. for Microsoft Exchange Server 2003 & Microsoft SQL Server

SOLUTION BRIEF Citrix Cloud Solutions Citrix Cloud Solution for Disaster Recovery

MS Design, Optimize and Maintain Database for Microsoft SQL Server 2008

Disaster Recovery Hosting Provider Selection Criteria

DISASTER RECOVERY WITH AWS

MSP Service Matrix. Servers

Maximizing Data Center Uptime with Business Continuity Planning Next to ensuring the safety of your employees, the most important business continuity

Business Process Desktop: Acronis backup & Recovery 11.5 Deployment Guide

Whitepaper Continuous Availability Suite: Neverfail Solution Architecture

Veritas Storage Foundation High Availability for Windows by Symantec

Security+ Guide to Network Security Fundamentals, Fourth Edition. Chapter 13 Business Continuity

Complete Managed Services. Proposal for managed services for the City of Tontitown

IBM Virtualization Engine TS7700 GRID Solutions for Business Continuity

IT Disaster Recovery Plan Template

3. Where can I obtain the Service Pack 5 software?

Business Continuity: Choosing the Right Technology Solution

DISASTER RECOVERY PLANNING GUIDE

Protecting SQL Server in Physical And Virtual Environments

Backup and Redundancy

Veritas Cluster Server by Symantec

M4 Systems. M4 Online Backup. M4 Systems Ltd Tel: International: +44 (0)

GSN Cloud Contact Centre Availability & DR Datasheet

HP Data Protector software Zero Downtime Backup and Instant Recovery. Data sheet

CA ARCserve Family r15

Client Hardware and Infrastructure Suggested Best Practices

Oracle Maps Cloud Service Enterprise Hosting and Delivery Policies Effective Date: October 1, 2015 Version 1.0

NEC Corporation of America Intro to High Availability / Fault Tolerant Solutions

Virtual Infrastructure Security

White Paper. Optimizing the SAN Fiber Optic Physical Layer in Your Data Center

Veritas Cluster Server from Symantec

DeltaV Virtualization High Availability and Disaster Recovery

Online Transaction Processing in SQL Server 2008

Cloud Computing Disaster Recovery (DR)

White Paper. Prepared by: Neil Shah Director, Product Management March, 2014 Version: 1. Copyright 2014, ezdi, LLC.

Selecting the Right NAS File Server

High Availability for Citrix XenApp

QuickSpecs. HP Smart Array 5312 Controller. Overview

<Client Name> IT Disaster Recovery Plan Template. By Paul Kirvan, CISA, CISSP, FBCI, CBCP

HP StorageWorks Data Protection Strategy brief

AdvancedHosting SM Solutions from SunGard Availability Services

Distribution One Server Requirements

Disaster Recovery for Small Businesses

Eliminate SQL Server Downtime Even for maintenance

Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup

Leveraging Virtualization for Disaster Recovery in Your Growing Business

Enhancing Exchange Server 2010 Availability with Neverfail Best Practices for Simplifying and Automating Continuity

EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage

Disaster Recovery & Business Continuity Dell IT Executive Learning Series

End-to-End Availability for Microsoft SQL Server

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

An Oracle White Paper November Oracle Real Application Clusters One Node: The Always On Single-Instance Database

Preface Introduction... 1 High Availability... 2 Users... 4 Other Resources... 5 Conventions... 5

Windows Server 2008 R2 Hyper-V Live Migration

Disaster Recovery Plan

Ezi Managed Services Pty Ltd Introduction to Our Managed Service Agreement

Reducing the Cost and Complexity of Business Continuity and Disaster Recovery for

Is online backup right for your business? Eight reasons to consider protecting your data with a hybrid backup solution

CA XOsoft Replication r12.5 and CA XOsoft High Availability r12.5

CA ARCserve Replication and High Availability Deployment Options for Hyper-V

Informix Dynamic Server May Availability Solutions with Informix Dynamic Server 11

Dell PowerVault DL Backup to Disk Appliance Powered by CommVault. Backup-to-disk and recovery with deduplication

Total Business Continuity with Cyberoam High Availability

VERITAS Business Solutions. for DB2

CA XOsoft Replication and CA XOsoft High Availability CA Partner Frequently Asked Questions

The 5 Most Commonly Used Disaster Recovery Process

Transcription:

Why Email Fails MessageOne Survey of Email Outages White Paper MessageOne, Inc. 11044 Research Blvd. Building C, Fifth Floor Austin, TX 78759 Toll-Free: 888.367.0777 Telephone: 512.652.4500 Fax: 512.652.4504 www.messageone.com

Executive Summary Email has become the most pervasive form of business communication, impacting every aspect of every organization: communications between management, employees, prospects, customers, vendors, suppliers, partners, investors, and analysts. The average email user sends 34 emails and receives 99 emails every day, and overall email use is growing 53% per year. Despite large enterprise investments in replication, mirroring, and tape back up systems, email systems continue to fail. While it is widely known that natural and man-made disasters can lead to email outages, new data shows that email systems are more frequently brought down by technological failures. MessageOne, a leading provider of email continuity solutions, commissioned this research report to understand the frequency, severity, and cause of email outages in North American corporations using Microsoft Exchange, Lotus Notes, and Novell GroupWise. This research shows that enterprise email systems are prone to a variety of potential breakdowns including SAN (Storage Area Network) failures, mis-configuration, losses in network access, database corruption, and viruses. Data from the survey shows that in any given 12-month time period, there is a 75% likelihood of an unplanned email outage and a 14% likelihood of a planned email outage for any given company. This research report analyses the leading causes of failure with enterprise email systems and provides preventative guidance to lower the probability of unplanned email outages. 2

MessageOne Survey Results: Email Failures by Cause MessageOne recently surveyed its customers on email outages during a recent 30 week period. MessageOne provides hundreds of companies serving over 1,000,000 email users with a highly scalable standby messaging system that can be activated instantly at the customer s request, guaranteeing uninterrupted email services in the event that an organization s primary messaging system becomes unavailable or incapacitated. Companies were surveyed after activation of MessageOne s backup email system, providing timely and reliable reporting on the cause, impact, and duration of email outages. Email Outage Frequency & Duration Survey results show that in any given 12-month time period, there is a 75% likelihood of an unplanned email outage and a 14% likelihood of a planned email outage in any given company. The length of email outages in the companies surveyed ranged from a minimum of 2 minutes to a maximum of 120 hours with the average email outage being 32.1 hours long. The largest concentration of outages was between 4 and 24 hours in duration (29%). More than 43% of the outages lasted longer than 24 hours, a length of time that can lead to significant business disruption and damage. 48+Hours 1-4 Hours 26% 28% 24-48 Hours 17% 29% 4-24 Hours Email Outage Causation - Unplanned Outages A large majority of email outages were caused by unplanned events, most of which were due to technological failures. Of the technological failures, 35% were due to server hardware failures, averaging 18.1 hours in outage duration, 19% were due to connectivity losses, averaging 27.4 hours, 16% were due to SAN failures averaging 25.5 hours, and 16% were due to database corruption averaging 9.0 hours in duration. The majority of these, specifically database corruption and SAN failures, are troubling because they are the most difficult to prevent. Even with expensive mirroring and replication backup solutions, data corruption and SAN failures are often propagated to the mirror or replicated backup server. These failures usually result in long outages as companies resort to incremental tape backups to locate the last backup before the corruption occurred. While natural disasters accounted for only 14% of unplanned email outages, the average downtime due to such disasters was over 60 hours, meaning these can lead to significant impact on businesses. 3

Email outages by cause Natural disaster 14% Server hardware failure SAN failure 16% 35% Database corruption 16% 19% Connectivity loss Downtime by failure cause Database/Software Corruption Server Hardware Failure SAN Failure Connectivity Loss 9.0 Hrs 18.1 Hrs 25.5 Hrs 27.4 Hrs Natural Disaster 60.8 Hrs 6 12 18 24 60 Outage Length (Hours) SAN Failures Many enterprises have installed a Storage Area Network (SAN) to provide high availability for corporate email services within their headquarters facilities. SAN hardware, while designed to provide highly available and highly redundant storage for data, adds a significant level of complexity to a messaging infrastructure. Combined with the unique footprint and usage patterns of messaging systems, the design and implementation of SAN infrastructure requires continual optimization to ensure reliable performance. Mis-configuration of LUNs (Logical Unit Number), out of date drivers, and administration of physical hardware by teams outside of the messaging group were all contributors to SAN-related outages for customers surveyed. In some cases, these errors led to significant data corruption that was replicated to backup email systems. Hardware Server Failure A wide array of hardware related failures contributed to outages for customers surveyed. From catastrophic drive failure, to bad RAM (Random Access Memory), over a quarter of customer-related outages could be traced to hardware failure. In many of these scenarios, customers had already taken steps to mitigate hardware related issues by building highly redundant servers, including dual backplanes, and redundant RAID (Redundant Array of Independent Discs) controllers. 4

Several items of note: Branch office messaging servers were often not as fully redundant as their datacenter counterparts. New hardware or recently upgraded hardware was more commonly the source of server related outages. Server sizing issues, contributed in several cases to performance degradation or outages. Database Corruption Potential downtime due to database corruption is a well-known hazard for mail administrators. With the typical customer having.75tb or more of messaging data, this downtime can be significant. Less well publicized is the risk of downtime associated with Microsoft Active Directory (AD) corruption. Several customers experienced significant system-wide downtime as the result of AD-related corruption. In each instance, Exchange-specific attributes or data was corrupted in a manner that disrupted communications. In several of these cases, identification, repair, and recovery resulted in outages exceeding 48 hours. Connectivity Losses Connectivity loss includes LAN (Local Area Network) or WAN (Wide Area Network) outages which prevent users from accessing an otherwise functioning server. Causes of loss include hub, switch, or router failure as well as broken or damaged cable or fiber from a variety causes such as construction (backhoe) and damage during moves or maintenance. In one instance, construction down the street from a surveyed company resulted in the loss of both primary and secondary WAN connections through two separate providers. Natural Disasters Flooding, hurricanes, power outages, and construction-related network outages accounted for 14% of outages among those surveyed. In one example, a company s top-floor datacenter was unavailable for a week due to flooding. Extreme cold had caused water pipes in the ceiling to burst, leaving their datacenter in four inches of water. In another example, a company had to evacuate their datacenter to exterminate termites. Email Outage Causation - Planned Outages Survey results showed that planned events account for 14.3% of email outages in any given 12-month time period. On average, the planned outages last for 36.1 hours. A number of reasons can be sited for planned outages including email platform upgrade or migration, datacenter or office move, planned power outage, system maintenance, required patch management and disaster recovery testing. For example, maintenance windows are necessary to keep servers appropriately tuned or patched. In several instances, customers needed to bring servers down for prolonged maintenance tasks (such as an integrity check on a Microsoft Exchange database), to replace hardware components, to address performance problems, or to preempt impending outages. In some cases theses outages were planned well in advance, in others, they were quickly executed to address problems and potential risks. 5

Conclusions: Despite Heavy Investment, Email Still Fails Every day, more and more companies are concluding that email is a mission-critical application worthy of inclusion in a business continuity plan. Generally speaking, organizations email business continuity and disaster recovery plans fall into two camps: tape backup or replication and mirroring solutions. While tape backup is the most inexpensive way to back up data, tape backup does not provide email continuity only recovery after a lengthy amount of outage with the potential for lost data. While traditional replication and mirroring solutions have their place in disaster recovery and business continuity planning, trying to use such solutions for email continuity can prove futile there is a host of common scenarios for which replication fails to provide high availability. High Availability Email: Key Points of Failure Based on the research data collected, there are numerous points of failure with tape backup and traditional replication and mirroring solutions. 1. Failure Point: Replicated Database Corruption When a corruption occurs in a database store, it can cause a main server to go offline. In most cases, replication software, which transfers data byte-by-byte, will copy the corrupted data to the backup server. In this case, the backup server will be corrupted as well. Typically, corruptions are a slow process of degradation and may require administrators to restore many backup tapes until a tape is found before the corruption. 2. Failure Point: Single Platform Dependency While most organizations depend on backup email systems, the secondary systems are usually on the same email platform as the primary system. For example, companies that use Microsoft Exchange may have a primary and backup email server running the same version of Microsoft Exchange. This dependency on a single platform creates a point-of-failure where a virus, worm, or bug can incapacitate both the primary and backup system simultaneously. 3. Failure Point: SAN Complexity Complexity makes systems more prone to failure, more difficult to test, and more dependent on key resources that can be deployed on higher value problems. For example, consider a firm that purchased an expensive SAN cluster and configured it to perform Exchange database replication. A truly esoteric mis-configuration caused a complete failure of the SAN and 2 1/2 days without email. Increasingly complex SAN hardware, often used for primary and backup data stores, is becoming a common failure point for enterprise email systems. 4. Failure Point: Dependency on Tape The very nature of tape backup is just that: backup. An organisation uses tape backup generally to backup data files, databases, applications, etc., which are used/created regularly by the employees of the organization. Tape backup is by far one of the most inexpensive and least complex ways to backup an organization s data. Where tape backup fails as an email continuity and recovery solution, is the fact that it takes anywhere from hours to days to recover a company s data from tape. In the event of a disaster, whether natural, man-made or technological, keeping the lines of communication up and running is critical to recovery. If used as an email backup option, tape backup is too slow to meet reasonable recovery goals. 6

How MessageOne s EMS TM Provides Unbreakable Email Continuity MessageOne EMS Email Continuity Overview Email has evolved to become a mission-critical application as essential as electricity or telephone service. In fact, according to a META Group survey, 80% of corporate email users believe that email is far more valuable than telephone for business communications. Today, 90% of corporate communication is driven by email. As part of MessageOne s Email Management Service TM, EMS Email Continuity is the only affordable solution on the market today that addresses the shortfalls of tape backup and traditional mirroring and replication solutions. EMS is a highly-scalable standby messaging and employee notification system that is hosted at world-class disaster recovery facilities (including SunGard and IBM), equipped with redundant power, servers and internet backbones, and manned 24x7 by expert support staff. EMS provides guaranteed email continuity 24x7x365 no matter what. Features of EMS Email Continuity include: Guaranteed Email Continuity 24x7x365 1/20th the cost of traditional replication and high availability solutions Makes sure email system outages are never evident to the outside world Provides secure email continuity to employees anywhere and any time Sends critical notification information to cell phones, BlackBerries, alternate email accounts, etc. Can deploy for an entire global enterprise in less than a day Is easily tested on a monthly or quarterly basis Built on Linux to avoid a single platform point of failure How EMS Email Continuity Works Prior to Activation Prior to activation, critical information about the primary mail system, such as the user directory and employee notification information, is automatically replicated at predefined intervals using MessageOne software that runs in the customer s environment and integrates with multiple back-end systems. Upon Activation Upon activation, EMS simultaneously performs two actions: It broadcasts alerts to the company s employees, notifying them that EMS has been activated. Employees will receive these alerts via SMS-to-mobile-phone, RIM BlackBerry (using PIN), text pagers, and alternate email addresses previously provided by the employees. It automatically routes all email sent to corporate addresses to the secure, 128-bit SSL EMS system hosted at SunGard or IBM, readily accessible by employees who have received login instructions via the EMS alerts. Authentication to use the web-based email system can be done with pass phrases, SecureID, and other methods that are designed to integrate with the customer s existing security system. Upon Deactivation Upon deactivation, once the company s primary mail system has been restored, all emails that were sent or received during the emergency are seamlessly and automatically merged into the company s primary mail system, including all forensic information and timestamps. To learn more about EMS, Contact MessageOne toll-free at 888-367-0777 or 512-652-4500, email us at info@messageone.com or visit our web site at www.messageone.com. 7

About MessageOne, Inc. Founded in 1998 by Adam Dell, MessageOne helps enterprises prepare for and respond to disruptions in their normal business operations with the best and most cost-efficient solutions in the industry. When problems arise, you can t always depend on normal lines of communication to be available. MessageOne provides a suite of products and services that guarantee email continuity and offer powerful emergency notification and escalation tools. The company has become a leader in business continuity, disaster recovery and crisis communication solutions. MessageOne is headquartered in Austin, Texas with offices in New York, New York. MessageOne, 2005. All Rights Reserved. MessageOne, the MessageOne logo, MessageOne.com, the Emergency Mail System, EMS and AlertFind are trademarks of MessageOne, Inc. All other product or company names mentioned are used for identification purposes only, and may be trademarks of their respective owners. 8