Exchange Data Protection: To the DAG and Beyond. Whitepaper by Brien Posey




Exchange is Mission Critical

Ask network administrators to name their most mission-critical applications and Exchange Server is almost certain to be near the top of the list. Exchange admins know all too well that Exchange servers store more than just e-mail messages. Users also use Exchange to store customer contact data and appointments for client meetings. Even a brief Exchange Server outage can have a significant impact on the organization's bottom line. Users might, for example, miss an important appointment or be unable to respond to a message from a potential customer in a timely manner. As such, it has become critically important for Exchange admins to keep Exchange Server up and running.

Fortunately, Microsoft also understands the importance of keeping Exchange Server online and has designed Exchange Server 2010 so that it can be configured for resilience to failure. The primary feature for mailbox server resilience is the Database Availability Group (DAG).

Built-in High Availability: Microsoft Exchange DAGs

DAGs are designed to be a next-generation replacement for the continuous replication features found in Exchange Server 2007. Although DAGs have a few things in common with Exchange 2007's continuous replication, there are some important differences. For instance, in Exchange 2007, administrators were forced to choose the type of continuous replication that they wanted to use (local continuous replication, cluster continuous replication, or standby continuous replication). In Exchange 2010, DAGs are the only high availability solution for mailbox servers. There are no longer multiple continuous replication models to choose from.

DAG Benefits

DAGs use failover clustering to make Exchange 2010 mailbox databases highly available. Prior to the release of Exchange Server 2010, mailbox databases were considered to be server-level objects.
In Exchange Server 2010, however, mailbox databases become organization-level objects, which means that a mailbox database can reside on any Exchange 2010 mailbox server within the organization. DAGs build on this concept by allowing mailbox databases to be replicated to multiple mailbox servers.

A DAG can consist of up to sixteen Exchange 2010 mailbox servers. Each DAG member can host a mixture of active and passive mailbox databases. Mailbox databases can be replicated to other DAG members on an as-needed basis, and replica databases can be mixed and matched across the DAG members. It is not necessary to replicate a mailbox database to every DAG member; the Exchange administrator is free to choose which mailbox databases are replicated and where those replicas reside.

DAGs ensure mailbox database resiliency by allowing a mailbox database to fail over automatically to another DAG member if the server hosting the active copy of the database fails. Although this type of resilience is critical to many organizations, DAGs have another capability that is often equally appealing. Exchange Server 2010 makes it possible to create a stretched DAG that spans multiple datacenters. Stretched DAGs make it possible to replicate mailbox databases to remote datacenters for safekeeping. These remote replicas can be activated if the primary datacenter experiences a catastrophic failure.

DAG Limitations

In spite of all their promise, DAGs are not without limitations. DAGs tend to work extremely well as long as they are confined to a local datacenter, but stretched DAGs require very careful planning. All too often, organizations invest in Exchange Server 2010 with the intention of building a stretched DAG, only to discover that their existing network infrastructure rules out the possibility.

The factors with the greatest impact on an organization's ability to create a stretched DAG are network latency and Active Directory structure. Microsoft only supports the use of DAGs on networks with a round-trip latency of 500 milliseconds or less. This limit often prevents organizations from passing DAG traffic across a WAN link.

The Active Directory structure can also affect an organization's ability to create a stretched DAG. DAG members must all belong to a common domain. Many organizations use a separate domain (or domain hierarchy) for each physical location. For instance, an organization might use one domain name for the Miami office and another for the office in Las Vegas. This domain structure would rule out a stretched DAG between the two facilities. Stretched DAGs can only be used if the Active Directory domain also spans the physical locations.

Even though network latency and Active Directory structure tend to be the greatest barriers to creating a stretched DAG, they are far from the only limitations. Microsoft has a very strict set of requirements that must be met when creating a DAG. First, DAGs can only be composed of Exchange 2010 mailbox servers. Servers running earlier versions of Exchange cannot take part in a DAG. Likewise, DAGs can only be used with native Exchange 2010 mailbox databases.

This requirement carries a couple of implications. First, it means that public folder databases are not supported. If an organization wants to provide high availability for public folders, it will have to use the public folder replication feature. The other implication of the native Exchange 2010 database requirement is that DAGs cannot be used with Exchange 2007 databases. For example, an organization cannot build a DAG, copy the database from an Exchange 2007 mailbox server to a DAG member, and mount the database.

In addition to meeting the requirements that Microsoft has established for DAGs, a lot of careful planning must go into the DAG creation process. DAGs are relatively easy to deploy, but without careful planning a DAG might not provide the expected level of protection.

When an organization creates a DAG, it is creating a Majority Node Set (MNS) failover cluster (the Node and File Share Majority model is used for DAGs with an even number of members). An MNS cluster can only function if the cluster is able to maintain quorum, which means that more than half of the DAG members are functional and responsive. Because a strict majority of the DAG members must be functional for the cluster to retain quorum, a DAG that relies on member votes alone cannot contain fewer than three servers. If a DAG consisted of only two servers, the failure of a single DAG member would cause the cluster to lose quorum.

Of course, there are hardware and licensing costs associated with deploying DAG members. Smaller organizations may want the benefits of a DAG but lack the budget for a three-member DAG. In these situations it is possible to create a DAG consisting of two DAG members and a witness server. The witness server is simply a server with a dedicated file share that takes the place of a third DAG member. Exchange mailbox databases cannot fail over to the witness server, but the witness server can prevent the DAG from losing quorum if a DAG member fails.

It might at first seem that maintaining quorum is only a concern for smaller organizations. However, it can be just as important in larger organizations, especially when stretched DAGs come into the picture. The main problem with stretched DAGs is that if the WAN link between datacenters fails, Exchange interprets the network failure as a failure of every DAG member on the other side of the link. An organization must therefore design its DAG so that enough DAG members reside in the primary datacenter for it to retain quorum even if a WAN failure occurs. This means placing more than half of the DAG members in the primary datacenter.

While this concept probably seems straightforward, there are a few things to consider. First, any time a DAG member is added to a remote office, the new server changes the number of DAG members that must remain online to retain quorum. As such, adding an Exchange mailbox server to a remote site might also require adding a mailbox server to the primary datacenter to ensure that the primary datacenter is always able to maintain quorum.
Needless to say, this approach can be expensive, and it can also be difficult to work within Exchange Server's limitations, since a DAG cannot exceed sixteen members.

Another consideration is that the primary datacenter might actually need more than the minimum number of DAG servers required to maintain quorum. The primary datacenter can theoretically maintain quorum if more than half of the DAG members reside there. However, if a WAN failure and the failure of a DAG member at the primary datacenter occurred simultaneously, the primary datacenter would lose quorum unless it contained enough DAG members to absorb the loss.

One big problem with this scenario is that if a WAN failure did occur and the primary datacenter retained quorum, the remote datacenter would lose quorum, which would mean an outage for users served by the remote datacenter. The only way to avoid this situation using native Exchange Server components is to create multiple DAGs. However, each mailbox server can only belong to a single DAG, so this approach can become quite expensive.

So what about the failure of an individual DAG member within a stretched DAG? By default, when a DAG member fails, its databases can fail over to any other DAG member with a suitable database replica. This is a concern in a stretched DAG because it is generally undesirable to have a failover occur across a WAN link. Microsoft allows you to prevent cross-site failovers by setting the DatabaseCopyAutoActivationPolicy to Blocked. Of course, this also means that if a cross-site failover is ever needed, the Exchange administrator will have to work through a manual procedure to perform it.
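The quorum arithmetic discussed above can be sketched in a few lines of Python. This is only an illustrative model, not how the cluster service is implemented: it treats each DAG member (and, optionally, a witness) as one vote, and the function names and the witness_in_primary flag are hypothetical names chosen for this example.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """A majority cluster keeps quorum only with MORE than half of all votes."""
    return 2 * votes_present > total_votes


def primary_keeps_quorum(primary: int, remote: int,
                         failed_in_primary: int = 0,
                         witness_in_primary: bool = False) -> bool:
    """Does the primary datacenter keep quorum after the WAN link fails?

    Assumption: every remote member is unreachable, so only the surviving
    primary members (plus the witness, if it lives there) can vote.
    """
    witness = 1 if witness_in_primary else 0
    total_votes = primary + remote + witness
    votes_present = primary - failed_in_primary + witness
    return has_quorum(votes_present, total_votes)


def min_primary_members(remote: int, tolerated_failures: int = 0,
                        limit: int = 16):
    """Smallest primary-site member count that survives a WAN failure plus
    `tolerated_failures` simultaneous member failures in the primary site.

    Returns None when no layout fits within the 16-member DAG limit.
    Assumption: member votes only, no witness.
    """
    for primary in range(1, limit - remote + 1):
        if primary_keeps_quorum(primary, remote, tolerated_failures):
            return primary
    return None
```

Under these assumptions, a DAG with two remote members needs three members in the primary datacenter to ride out a WAN failure (min_primary_members(2) returns 3), and five if it must also absorb one local server failure. With eight remote members there is no valid layout inside the sixteen-member limit, so the function returns None.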

Database Availability Groups work really well as a fault-tolerant solution in a single-datacenter environment. However, stretched DAGs can be expensive and complex to implement, and may not fully address all of an organization's needs.

Considering DAG Lagged Copies

Consultants and other commentators often consider the use of a lagged database copy within a DAG for Exchange 2010 deployments. Typically, once there are more than two passive database copies, thoughts turn to creating a lagged copy to allow a point-in-time recovery should the need arise.

A lagged database copy is one that is not updated by replaying transactions as they become available. Instead, the transaction logs are kept for a certain period and are then replayed. The lagged database copy is therefore maintained at a fixed point behind the active database and the other, non-lagged database copies. The primary reason to use lagged database copies (with a lag of 7 or 14 days, for example) is to be able to go back to a point in time when you are sure that a database was in a good state. By delaying the replay of replicated transaction logs into a database copy, you can always go to that copy and know that it represents the database as it was at a known point in the past.

But the promise is stronger than the reality: Microsoft IT does not recommend the use of lagged copies with database availability groups. Microsoft no longer necessarily recommends lagged copies as a backup strategy for Exchange data, and if you choose to implement them, you should have a solid justification based on your organizational size, available hardware, and so forth. Tony Redmond, a noted Exchange author, also recommends storage-based approaches when looking at point-in-time recovery solutions.
Considerations when using lagged copies:

- A lagged copy consumes a large amount of free space (space for the database plus 7 to 14 days' worth of log files).
- Microsoft provides no wizard or GUI to recover data from a lagged database.
- Recovery is manual: the steps are straightforward, but they depend on the administrator.
- Lagged copies are difficult to use for single-item or mailbox recovery.
- If the lagged copy becomes the primary, you force a reseed of all other database copies.

Regarding the use of lagged copies:

"Recovery of complete databases is a different matter. My recommendation is that you should invest in storage or backup technology that incorporates strong recovery capabilities. Some storage offers very good snapshot recovery capabilities so that recovery is a matter of selecting the appropriate snapshot and recovering from it; some backup products provide similar capabilities. Your choice will be dictated by personal preference, previous deployment history, and your knowledge of how strong support personnel are within your company. In other words, you'll select the best tool for the job to fit the unique circumstances of your Exchange 2010 deployment."
Tony Redmond, Exchange MVP
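To make the lagged-copy mechanics and the storage cost noted above more concrete, here is a minimal Python sketch. It is a toy model rather than Exchange's replication engine: the LaggedCopy class, its method names, and the day-based granularity are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LaggedCopy:
    """Toy model of a database copy that delays log replay by a fixed lag."""
    replay_lag_days: int
    pending: list = field(default_factory=list)  # (day_generated, size_mb)
    replayed: int = 0

    def receive(self, day: int, size_mb: float) -> None:
        """Logs are copied as they are generated, but not replayed yet."""
        self.pending.append((day, size_mb))

    def replay_due_logs(self, today: int) -> int:
        """Replay only the logs that are at least `replay_lag_days` old."""
        cutoff = today - self.replay_lag_days
        due = [log for log in self.pending if log[0] <= cutoff]
        self.pending = [log for log in self.pending if log[0] > cutoff]
        self.replayed += len(due)
        return len(due)

    def pending_mb(self) -> float:
        """Disk space still held by logs waiting out the lag period."""
        return sum(size for _, size in self.pending)


def lagged_copy_space_gb(database_gb: float, daily_log_gb: float,
                         lag_days: int) -> float:
    """Rough space estimate: the database plus `lag_days` of retained logs."""
    return database_gb + daily_log_gb * lag_days
```

For example, with purely hypothetical figures of a 500 GB database generating 10 GB of logs per day, lagged_copy_space_gb(500, 10, 14) gives 640 GB for a 14-day lag, before any headroom is added.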

Complementing Exchange Availability with Array-Based Technology

Many companies protect Exchange 2010 with backups and built-in availability technologies such as Database Availability Groups, but have started to realize that while DAGs are a great technology, there are often areas where third-party replication and management technology will have a positive impact. Database Availability Groups can be difficult to set up and manage, especially in multi-site implementations. As such, companies are looking to complement the local high availability offered by DAGs with remote replication technologies offered by infrastructure companies like EMC.

Another reason to consider storage-based solutions is that companies also need to restore copies quickly for operational recovery purposes, such as single-item e-mail restores, which cannot be fulfilled quickly and easily by DAGs or DAG lagged copies. Technologies such as snapshots and continuous data protection can enhance these protection technologies, enabling instant, granular restores and zero-data-loss replication that can be managed from a central location.

Snapshots Overview

In general, a snapshot is a copy taken of a network share or an application using host-based resources (software VSS snapshots) or hardware-based options (using hardware providers). Snapshots can be made quickly and provide multiple application restore points, primarily for faster recoveries but also for use cases such as test/dev, reporting, analysis, and repurposing.
Advantages of snapshots:

- Improves speed of backup and restore
- Conserves resources on the server
- Provides a point-in-time copy of your application's data
- Data can be restored quickly to the same state as the copy
- Volume remains available to applications during the snapshot process
- Can remove the need for Exchange DAG lagged copies

Disadvantages of snapshots:

- Offered only on some storage systems
- Requires additional host software to enable application consistency
- Can consume excess capacity if not managed correctly

Continuous Data Protection Overview

Continuous data protection (CDP) is a protection technology that is becoming increasingly popular. CDP creates an electronic journal of changes for every instant in time at which data is modified. CDP technology enables recovery to any point in time, down to the millisecond, using a DVR-like rollback mechanism.
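The journal-and-rollback idea behind CDP can be illustrated with a small Python sketch. Real CDP products typically journal block-level writes inside the storage or replication layer; the key-value ChangeJournal class below, with millisecond integer timestamps, is a simplified stand-in invented for this example.

```python
class ChangeJournal:
    """Toy CDP journal: every change is recorded with its timestamp."""

    def __init__(self):
        self._entries = []  # (timestamp_ms, key, value), in time order

    def record(self, timestamp_ms: int, key: str, value) -> None:
        """Append a change; a value of None records a deletion."""
        if self._entries and timestamp_ms < self._entries[-1][0]:
            raise ValueError("journal entries must be recorded in time order")
        self._entries.append((timestamp_ms, key, value))

    def state_at(self, timestamp_ms: int) -> dict:
        """Rebuild the data as it looked at the given instant by replaying
        every journal entry up to and including that timestamp."""
        state = {}
        for entry_time, key, value in self._entries:
            if entry_time > timestamp_ms:
                break
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state
```

If an item is written at 1000 ms and overwritten with corrupt data at 1005 ms, calling state_at(1004) returns the item as it was just before the corruption, which is the "rewind" behavior described above.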

Advantages of continuous data protection include:

- It preserves a record of every transaction that takes place in the company.
- If the system becomes infected with a virus or Trojan, or if a file becomes mutilated or corrupted and the problem is not discovered until sometime later, it is possible to recover the most recent clean copy of the affected file.
- It offers data recovery in a matter of seconds, much less time than tape backups or archives.
- The installation of hardware and programming is straightforward and does not put existing data at risk.

Disadvantages of CDP include:

- Performance of the production application decreases while it waits for confirmation that the replication has occurred.
- In the event of data corruption, the restoration process can be lengthy and contingent on network speed and file size.
- It requires proprietary hardware that adds cost and complexity.

Looking to the Future

Some administrators may be understandably hesitant to invest in a high availability / disaster recovery solution for Exchange Server because Microsoft is expected to release Exchange Server 2013 next year. With each new Exchange Server release, Microsoft has traditionally improved Exchange Server's resiliency to failure, so it is only natural to question whether the improvements in Exchange 2013 will remove the need to invest in a third-party solution.

Database Availability Groups continue to be the mechanism for ensuring the resiliency of mailbox servers in Exchange 2013. However, the improvements that Microsoft has made to DAGs might best be described as modest in scope. The improvements in Exchange Server 2013 fall into two general categories: improvements to DAGs and improvements to the databases themselves. Let's start with the database-level improvements.

Database Improvements

Without a doubt, the most significant improvement that Microsoft has made at the database level is replacing the old store.exe with a managed store that has been entirely rewritten in C#. This change is important because the managed store provides failure isolation. Thanks to the managed store, each database now has a dedicated worker process. This means that if anything goes wrong with a database process, the problem will only affect that database. Other databases will remain unaffected because they are managed by separate worker processes.

Some of the other database-level improvements are not quite as drastic, but may prove helpful nonetheless. For example, Microsoft has added support for disks of up to 8 TB in size, and Exchange 2013 will make it possible to mix active and passive database copies on the same disk, something that was not previously possible.
DAG Improvements

The improvements to DAGs in Exchange 2013 are more evolutionary than revolutionary. For instance, Microsoft has automated the configuration of DAG networks and has enhanced some of the DAG-related PowerShell cmdlets.

DAGs have also been optimized so that failover can occur more quickly. Exchange 2010 DAGs were designed to fail over at the database level. When a failure was detected, Exchange would launch a process to determine which database copy would be the best choice to use during the failover. Exchange 2013 still supports database-level failovers, but the process of selecting the best database copy has been enhanced. The criteria that are considered include the copy queue length, the replay queue length, the database status, and the content index status.

In addition to supporting database-level failovers, Exchange 2013 also supports failovers at the server level or even the datacenter level. Although the prospect of datacenter-level failovers initially sounds promising, early Exchange 2013 builds indicate that many of the problems that plagued stretched DAGs in Exchange 2010 may still be an issue in Exchange 2013. For example, there is simply no getting around the cost and complexity of planning and implementing a stretched DAG.
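As a rough illustration of how the selection criteria mentioned above (copy queue length, replay queue length, database status, and content index status) might be combined, consider the Python sketch below. The ranking order is an assumption made for this example and is not Microsoft's actual selection algorithm; the DatabaseCopy class and pick_failover_copy function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseCopy:
    server: str
    copy_queue_length: int      # logs not yet copied to this server
    replay_queue_length: int    # logs copied but not yet replayed
    database_healthy: bool
    content_index_healthy: bool


def pick_failover_copy(copies):
    """Pick a passive copy to activate, or None if no copy is usable.

    Assumed ranking: unhealthy databases are excluded outright; among the
    rest, prefer a healthy content index, then the shortest copy queue
    (least potential data loss), then the shortest replay queue (fastest
    mount).
    """
    candidates = [c for c in copies if c.database_healthy]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: (not c.content_index_healthy,
                       c.copy_queue_length,
                       c.replay_queue_length),
    )
```

In this model a copy with an empty copy queue but a long replay queue is preferred over one that is missing log files, on the reasoning that slower activation is usually more acceptable than lost data.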

WAN bandwidth limitations will also likely continue to be an issue. The WAN link must deliver enough bandwidth to carry the replication traffic without becoming saturated. Organizations must also consider the ongoing costs associated with the logistics of managing remote Exchange Server deployments. It may ultimately be easier and more cost effective for organizations to use a third-party solution for Exchange Server data replication than to rely on stretched DAGs.

Summary

In the future, Exchange teams should continue to use what they know, but should also invest in leading-edge technologies from Microsoft and from third-party companies with proven technology.