Managing Data Center Interconnect Performance for Disaster Recovery

Overview

Data center interconnects (DCI) are traditionally used for critical data replication to meet corporate compliance and continuity requirements. Virtualization technology now also allows IT to shift computing pools across the DCI for greater resource efficiency and accessibility. With greater reliance on the services running in DCI environments, IT organizations and service providers must have visibility into the performance issues and events that can affect end users and business processes. Managing these services, and the critical wide-area network (WAN) links that carry them, requires an analysis solution that can identify baseline behavior and isolate performance degradation before it disrupts the most important services carried across these links.

Managing data center interconnect performance for disaster recovery highlights:

- Industry-leading packet analysis and inspection of critical WAN links ensures data compliance and accessibility for business-essential backup and replication services.
- Synchronous-replication environments require packet-level visibility and baseline analysis to maintain application performance.
- Storage network equipment manufacturers (NEMs) trust the Xgig Analyzer to measure storage performance and to isolate communication failures.
- Data backup and replication performance baselines help service managers understand normal behavior and identify slow degradation that can lead to service disruption.
- Packet-level visibility shortens problem isolation and resolution when triaging multivendor issues.

DCI and Synchronous Replication

The use of DCI to enable high-availability and continuity replication services is critical. At the same time, the data center industry is embracing DCI to dynamically move and geographically extend virtual computing pools for high performance at low cost. Organizations of all sizes benefit from the accessibility of interconnected data centers, but replication and backup remain the two services used predominantly to protect against catastrophic events such as weather and electrical outages.

The most resilient form of data integrity and accessibility in DCI environments is synchronous replication using fibre channel across wavelength-division multiplexed (WDM) optical networks, typically supplied by third-party carriers. Synchronous replication is unnecessary for every application within an organization; rather, it is typically reserved for applications with the most stringent, near-instantaneous recovery point and recovery time objectives (RPO and RTO), such as those found in the financial, insurance, and telecom industries. Synchronous replication achieves a near-zero RPO because data is written simultaneously to both local and remote volumes. When this type of replication is used, changes made to the source storage array are immediately propagated to and visible at the target storage array (see Figure 1).
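
The write ordering shown in Figure 1 can be summarized in a short sketch. This is a minimal illustration, not any vendor's replication implementation; the function names and the simulated WAN delay are assumptions made purely for clarity.

    # Minimal sketch of the synchronous replication write path described above.
    # All names (replicate_to_remote, synchronous_write) are illustrative, not
    # any vendor's API; the point is that the host's write does not complete
    # until the remote array has committed and acknowledged the data.

    import time

    def replicate_to_remote(block, wan_rtt_s):
        """Ship the block across the DCI and wait for the remote commit (steps 2-3)."""
        time.sleep(wan_rtt_s)          # WAN transport plus remote commit time
        return True                    # acknowledgment from the secondary array

    def synchronous_write(block, wan_rtt_s=0.002):
        local_committed = True                          # step 1: write committed locally
        acked = replicate_to_remote(block, wan_rtt_s)   # steps 2-3: replication and acknowledgment
        if not (local_committed and acked):
            raise IOError("replication failed; write cannot complete")
        return "write complete"                         # step 4: host sees the write only now

    if __name__ == "__main__":
        print(synchronous_write(b"payload"))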

Figure 1. Synchronous replication between Site A and Site B: (1) write request to primary storage, (2) replication to secondary storage, (3) acknowledgment, (4) write complete. Changes to the source storage array are immediately propagated to and visible at the target storage array.

Data cannot be written to the source until the target storage array commits and acknowledges the previously written data. Synchronous replication, also called real-time replication, allows no buffering or delay between the data committed on the source and on the target. As a result, synchronous replication requires high-bandwidth, low-latency WAN connections, typical of WDM optical links. Note that even minimal latency increases across these networks directly affect replication performance and, as a result, application performance (see Figure 2).

Transaction processing without synchronous replication depends on:
1. End-user processing time
2. Network transport time
3. Host-server processing time
4. SAN transport time
5. Storage-array processing time

Transaction processing with synchronous replication additionally depends on:
6. WAN transport time
7. Remote-storage processing time

The greatest application-performance bottlenecks are the replication components in steps 6 and 7.

Figure 2. Applications with synchronous replication enabled are critically impacted by DCI performance.

As Figure 2 shows, a poorly managed DCI can severely degrade application performance in a synchronous replication environment. Every bit of data must be written or modified in two separate data centers before the next transaction can occur. No matter how well tuned an application is, the network that carries the replication data always constrains response time.
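
To make the point about steps 6 and 7 concrete, the small calculation below adds up the latency components from Figure 2 using assumed, purely illustrative millisecond values (the document does not give actual numbers).

    # Hedged illustration of the latency components in Figure 2, with assumed
    # millisecond values, to show why the WAN round trip in steps 6-7 tends to
    # dominate synchronous-replication response time.

    local_components_ms = {
        "end_user_processing": 0.5,
        "network_transport": 0.3,
        "host_server_processing": 0.4,
        "san_transport": 0.1,
        "storage_array_processing": 0.5,
    }
    replication_components_ms = {
        "wan_transport_round_trip": 2.0,   # assumed: long fiber route plus equipment delay
        "remote_storage_processing": 0.5,
    }

    without_sync = sum(local_components_ms.values())
    with_sync = without_sync + sum(replication_components_ms.values())

    print(f"Without synchronous replication: {without_sync:.1f} ms per transaction")
    print(f"With synchronous replication:    {with_sync:.1f} ms per transaction")
    print(f"Replication share of total:      {sum(replication_components_ms.values()) / with_sync:.0%}")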

The Need for Data Storage Visibility

Enterprise and carrier-class switches support the collection of basic traffic and error counters for performance monitoring and troubleshooting. However, using these counters to measure performance and isolate the root cause of issues is challenging, and often impossible without deeper analysis. While many organizations embrace protocol analysis in IP and Ethernet networks, this kind of investigation is less commonly understood or used in storage networks, even in DCI environments carrying critical business data.

Data that remains within a data center usually contributes negligible input/output (I/O) latency to overall application/database response times because network congestion is limited. With DCI, WAN performance directly impacts I/O performance and, in turn, application/database performance. While storage networks within a data center commonly operate at 4, 8, 10, or even 16 Gb/s, DCI WAN connections often deliver much lower bandwidth. Furthermore, because third-party network operators often provide the WAN service, issues can arise within their networks from events such as fiber-route failures, network failover, or poor maintenance. Data center operators must have analysis capabilities to determine the root cause of bottlenecks arising from such situations.

Unfortunately, pinpointing the source of issues can be complex. There are virtually no I/O or SCSI statistics in networking equipment and no ability to correlate I/O to disk array volumes. For instance, disk array operating systems typically provide abstract information, such as "Disk Error" messages, that masks details like SCSI completion codes, limiting the ability to debug. DCI further exacerbates the challenge: WDM, for instance, combines many logical unit numbers (LUNs) over a single wavelength, making isolation more complex. Put simply, there is no way to monitor I/O performance issues without equipment built specifically to provide detailed visibility.

Introducing the Xgig Analyzer

The Xgig is the industry's leading solution for storage analysis on fibre channel and Ethernet networks. Used by most of the storage equipment industry, the Xgig Analyzer is recognized as the gold standard for performance insight, with these key capabilities:

- fibre channel, fibre channel over IP, fibre channel over Ethernet, and Ethernet analysis
- industry's best storage analytics
- redundant link analysis
- highly accurate timing clock for precise latency measurements
- simple access to performance metrics such as IOPS by LUN, min/max/avg exchange completion time, read and write MB/s by LUN, frames/s by LUN, and errors by LUN
- protocol visibility, including all errors and primitives
- nonintrusive, 24x7 analysis provided via optical network taps

Figure 3 shows the various options for the Xgig Analyzer platform.
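
As an illustration of what per-LUN metrics like these involve, the sketch below aggregates hypothetical decoded exchange records into IOPS, MB/s, errors, and exchange completion times per LUN. The record format and function names are assumptions made for illustration and do not represent the Xgig software or its export formats.

    # Hypothetical post-processing sketch: given decoded fibre channel exchange
    # records as (lun, start_time_s, end_time_s, bytes, status) tuples, compute
    # the per-LUN metrics listed above.

    from collections import defaultdict

    def per_lun_metrics(exchanges, window_s):
        stats = defaultdict(lambda: {"count": 0, "bytes": 0, "errors": 0, "ect": []})
        for lun, start, end, nbytes, status in exchanges:
            s = stats[lun]
            s["count"] += 1
            s["bytes"] += nbytes
            s["ect"].append(end - start)        # exchange completion time in seconds
            if status != "good":
                s["errors"] += 1
        report = {}
        for lun, s in stats.items():
            report[lun] = {
                "iops": s["count"] / window_s,
                "mb_per_s": s["bytes"] / window_s / 1e6,
                "errors": s["errors"],
                "ect_min_ms": min(s["ect"]) * 1e3,
                "ect_max_ms": max(s["ect"]) * 1e3,
                "ect_avg_ms": sum(s["ect"]) / len(s["ect"]) * 1e3,
            }
        return report

    sample = [(0, 0.000, 0.0021, 65536, "good"), (0, 0.010, 0.0125, 65536, "good"),
              (1, 0.002, 0.0048, 131072, "good")]
    print(per_lun_metrics(sample, window_s=1.0))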

Figure 3. Xgig Analyzer platform options: Xgig Analysis Software with 1-, 2-, and 4-slot Xgig chassis supporting 1 G and 10 G Ethernet and 1, 2, 4, 8, and 16 G fibre channel, plus Load Tester/BERT and Jammer/Delay Emulator functions.

Viewing DCI Traffic

One of the first things to consider when planning analysis of data storage traffic is placement of the equipment. The best location in DCI environments is close to the carrier demarcation point, typically between the primary data center fibre channel switch and the carrier's WAN equipment. Placing the Xgig Analyzer at this location provides optimal visibility into I/O exchange latency, carrier throughput, and overall performance, as shown in Figure 4.

Figure 4. Xgig Analyzer placement between the primary and secondary data centers.

Many metrics become available once you have packet-level visibility in a DCI environment. First, measure baseline performance for future use when analyzing increases or decreases in service performance. Dozens of metrics can be monitored from this location; three of the most useful statistics are:

- exchanges completed/s
- exchange completion time
- fibre channel MB/s
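
One lightweight way to make a baseline reusable is to store the headline statistics with a timestamp, as in the sketch below. It assumes the three values have already been measured (for example, read from the analyzer); the file name and field names are illustrative, not part of any product.

    # Minimal sketch, under assumed data formats, of capturing a DCI performance
    # baseline so later samples can be compared against it.

    import json, time

    def save_baseline(path, exchanges_per_s, exchange_completion_ms, fc_mb_per_s):
        baseline = {
            "captured_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "exchanges_per_s": exchanges_per_s,
            "exchange_completion_ms": exchange_completion_ms,
            "fc_mb_per_s": fc_mb_per_s,
        }
        with open(path, "w") as f:
            json.dump(baseline, f, indent=2)
        return baseline

    if __name__ == "__main__":
        print(save_baseline("dci_baseline.json", 41500, 1.8, 640.0))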

Exchanges completed/s represents the rate at which exchanges complete for all (as shown in Figure 5) or selected initiator/target pairs. This counter is also commonly referred to as IO/s. While it does not reflect potential exchange completion times, it accurately represents the actual performance between the data centers. For an even more detailed view, look at the IO/s rates for different targets to see whether variations exist for individual disks.

Figure 5. Xgig can display top statistics taken on the link over a period of time.

Exchange completion time is measured from the start time of the first frame in the exchange to the time of the last frame in the exchange. It includes latencies from components, the fabric, transmission, and fabric congestion, and it indicates how well the infrastructure is operating for similar types of exchanges. Since replication is a single type of operation, there should not be wide variation; however, variations may occur depending on the operations being carried out.

MB/s is a counter calculated from the transfer rate of a frame type. The calculation does not account for unsuccessful or inefficient data transfer; however, it gives a good indication of the actual data rate the link is providing.

Once you obtain a baseline sample of DCI performance, use it to compare against samples taken at later dates and times. Figure 6 provides an example of how the baseline on the left can be used to uncover a performance degradation issue later. Users may not notice the degradation during daily tasks; however, the data center team should investigate this type of performance decrease before it becomes disruptive.
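
Building on the stored baseline sketched earlier, a later sample can be compared automatically and flagged when it drifts beyond a chosen tolerance. The 20 percent threshold, field names, and sample numbers below are assumptions for illustration, not recommended values.

    # Companion sketch to the baseline capture above (same assumed field names):
    # flag any metric that has degraded by more than a chosen tolerance.

    import json  # the baseline could be loaded with json.load(open("dci_baseline.json"))

    def compare_to_baseline(baseline, sample, tolerance=0.20):
        findings = []
        # Lower is worse for throughput-style metrics, higher is worse for latency.
        for key, higher_is_worse in (("exchanges_per_s", False),
                                     ("fc_mb_per_s", False),
                                     ("exchange_completion_ms", True)):
            base, now = baseline[key], sample[key]
            change = (now - base) / base
            degraded = change > tolerance if higher_is_worse else change < -tolerance
            if degraded:
                findings.append(f"{key}: baseline {base}, now {now} ({change:+.0%})")
        return findings or ["within tolerance of baseline"]

    if __name__ == "__main__":
        baseline = {"exchanges_per_s": 41500, "exchange_completion_ms": 1.8, "fc_mb_per_s": 640.0}
        later = {"exchanges_per_s": 29000, "exchange_completion_ms": 2.9, "fc_mb_per_s": 455.0}
        for line in compare_to_baseline(baseline, later):
            print(line)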

Figure 6. Comparison of a baseline and a degraded condition on the DCI.

DCI Error Condition

Errored frames present on the link can have the greatest impact on DCI performance. If they originate from the service provider supplying the WAN connection(s), follow-up is required to resolve the issue and to pursue possible restitution under service-level agreements (SLAs). Unlike Ethernet connections, fibre channel errors can provide great detail about the type of issue experienced on the link. The Xgig Analyzer is easy to set up to capture these events when investigating the causes and origin of such problems. Figure 7 shows how to uncover the details of a Loss-of-Sync event on the service provider's side.

Figure 7. Viewing errors and their source.
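
As a rough illustration of attributing errors to a side of the link, the sketch below tallies hypothetical exported link events such as Loss-of-Sync and Link Reset by the side on which they were observed, which is the kind of evidence needed for the SLA follow-up described above. The event-record format is an assumption and is not the Xgig trace format.

    # Illustrative sketch only: count fibre channel error events by the side of
    # the tapped link (carrier vs. data center) on which they were seen.

    from collections import Counter

    ERROR_EVENTS = {"LOSS_OF_SYNC", "LINK_RESET", "CRC_ERROR"}

    def tally_errors(events):
        """events: iterable of (timestamp_s, side, event_type) tuples."""
        counts = Counter()
        for ts, side, event_type in events:
            if event_type in ERROR_EVENTS:
                counts[(side, event_type)] += 1
        return counts

    if __name__ == "__main__":
        sample_events = [
            (10.002, "carrier", "LOSS_OF_SYNC"),
            (10.003, "carrier", "LINK_RESET"),
            (10.003, "carrier", "LINK_RESET"),
            (10.950, "data_center", "CRC_ERROR"),
        ]
        for (side, event_type), n in sorted(tally_errors(sample_events).items()):
            print(f"{side:12s} {event_type:13s} {n}")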

When opening an event with the Xgig software, the first indicator of an issue is the histogram at the bottom of the screen showing dropped traffic levels. On a positive note, the event resolved itself over time; however, it still caused a disruption before the link reset. The Xgig traffic summary then shows that the link experienced a Loss-of-Sync event accompanied by a large number of Link Reset frames, originating from the carrier's side.

The event captured by Xgig was a physical connection problem on the service provider's optical ring. When the problem occurred, the switching fabric failed over to the redundant path so traffic could resume. This is how the carrier's architecture should handle such an event; however, without the visibility the Xgig Analyzer provides, data center operators would not have known the event occurred. After this type of event, the baseline performance can also be reviewed again to verify that the secondary path provides the same level of service as the primary path originally did.

Conclusion

DCI design and implementation is growing and impacting both cloud providers and enterprise organizations. The volume of data transmitted across the WAN links enabling DCI is also increasing, driven by the growing amount of information organizations gather and the need to move it continuously between storage systems. Given the expense of provisioning WAN connections for DCI, enterprise and cloud data center organizations should implement dedicated analysis equipment to verify that storage performance and service are delivered as expected. Service providers who deliver network connectivity between data centers should likewise use analysis equipment to troubleshoot their infrastructure more effectively and to measure baseline performance against contracted obligations. The Xgig Analyzer is the only equipment on the market that provides this type of visibility, especially when storage systems connected over the DCI transmit data using fibre channel or fibre channel over IP across long-distance links.

Product specifications and descriptions in this document are subject to change without notice. © 2012 JDS Uniphase Corporation. August 2012. www.jdsu.com/test