Implementing Storage Concentrator FailOver Clusters




Implementing Storage Concentrator FailOver Clusters
Technical Brief

All trademark names are the property of their respective companies. This publication contains opinions of StoneFly, Inc., which are subject to change from time to time. This publication is copyrighted by StoneFly, Inc. and is intended for use only by recipients authorized by StoneFly, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise, to persons not authorized to receive it, without the express consent of StoneFly, Inc., is in violation of U.S. copyright law.

Introduction

The StoneFly Storage Concentrator (SC) is the interface between hosts and storage devices in an IP network. IP-based storage area networks (IP SANs) use the iSCSI protocol over an Ethernet and TCP/IP network. The Concentrator FailOver Cluster is a redundant storage solution comprised of two Concentrators configured so that one actively manages storage volumes while the other stands by, monitoring all fault conditions. If the active Concentrator fails for any reason, a FailOver event is initiated. FailOver to the standby unit is automatic and transparent to users. This application note describes some typical usage scenarios, including:

- FailOver Events
- Using a FailOver Cluster with different types of back-end storage

Note: This paper is not intended to provide detailed setup instructions for a FailOver Cluster. Comprehensive setup and user instructions for FailOver Clusters can be found in the Concentrator Setup Guide and Concentrator User's Guide.

FailOver Cluster Overview

FailOver is an important fault-tolerance function of mission-critical systems that require constant accessibility. A Concentrator FailOver Cluster adds a high-availability layer to an IP storage area network. If a component of the active system fails, the FailOver Cluster automatically substitutes a functionally equivalent standby system that takes over the operations of the failed active system. FailOver automatically redirects user requests from the failed active system to the standby system.

Concentrator FailOver Cluster: Currently, a cluster consists of two Concentrators, one active unit and one standby unit. A cluster appears as a single entity to hosts on the network. The components of a FailOver Cluster include:

- Active Concentrator: The active Concentrator is the Concentrator in a cluster that has active iSCSI sessions with hosts on the network. The active Concentrator configures the volumes and back-end sessions and replicates the metadata to the standby Concentrator in the cluster.
- Standby Concentrator: The standby Concentrator monitors all FailOver conditions. In the event of a failure at the active Concentrator, the standby Concentrator takes over the active sessions and transparently becomes the active Concentrator. The Concentrator volumes are then managed by the new active unit. There can be only one standby Concentrator.
- Cluster IP Address: The cluster IP address is the IP address of the FailOver Cluster. Host server initiators view the cluster as a single entity with the cluster IP address. Each unit in the cluster also has its own IP addresses for both the Gigabit Ethernet I/O traffic and the management traffic. A single cluster uses either five or seven IP addresses, depending on whether each SC has one or two GbE iSCSI ports.

StoneFly, Inc. Page 2 of 9
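The address arithmetic implied above can be sketched as a small helper (the function name is hypothetical; the counts follow directly from the text: one shared cluster IP, plus one management address and one address per GbE iSCSI port for each of the two units):

```python
def cluster_ip_count(gbe_iscsi_ports_per_sc: int) -> int:
    """Total IP addresses used by a two-unit FailOver Cluster.

    One shared cluster IP, plus, for each of the two Storage
    Concentrators, one management address and one address per
    GbE iSCSI port.
    """
    if gbe_iscsi_ports_per_sc not in (1, 2):
        raise ValueError("each SC has one or two GbE iSCSI ports")
    cluster_ip = 1
    per_unit = 1 + gbe_iscsi_ports_per_sc  # management + iSCSI ports
    return cluster_ip + 2 * per_unit

print(cluster_ip_count(1))  # 5 addresses with single-ported units
print(cluster_ip_count(2))  # 7 addresses with dual-ported units
```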

FailOver Events

Each type of FailOver event is described below:

A. Management Port Link Failure
The loss of link between the Active unit's Management Port and its switch causes the Standby unit to take control of the storage devices and then become the Active unit. All future access to the storage devices is through the new Active unit. When the fault is corrected, the failed Concentrator automatically reboots itself and rejoins the Cluster as the Standby unit.

B. iSCSI Gigabit Ethernet Port Link Failure
The loss of link between the Active unit's iSCSI Gigabit Ethernet port and its switch causes the Standby unit to take control of the storage devices and then become the Active unit. When its Gigabit Ethernet connection is restored, the failed unit reboots and becomes the Standby unit in the FailOver Cluster. If you are using dual GbE connections, you can plug in both connections or a single connection. Only by plugging both connections into the same switch will you have automatic Adaptive Load Balancing. If a GbE port fails on an active dual-ported unit in the cluster, it will fail over to the standby unit only if the standby still has two functioning ports.

C. Power Failure to the Active Concentrator
If power fails to the Active unit but not the Standby unit, the Standby unit immediately takes control of the storage devices and then becomes the Active SC. As soon as power is re-applied to the failed Concentrator, it reboots as the Standby unit in the FailOver Cluster.

D. Power Failure to Both Concentrators
Should power fail and then be re-applied to both Concentrators in a FailOver Cluster, both units will reboot, resulting in a negotiation between the units in which one unit becomes the Active unit and the other becomes the Standby unit. No reconfiguration is required by the administrator.

E. Loss of Connection to Devices
FailOver Clusters are established such that each Concentrator has a separate connection to each storage device. These storage devices present the same Logical Units (LUNs) to each Concentrator. If the Active unit loses connection to a storage device while the Standby unit remains connected, the Standby unit takes control of all the storage devices and becomes the Active unit. The connection between the failed unit and the storage device may be re-established at a later time; the failed Concentrator will then reboot to establish itself as the new Standby unit in the FailOver Cluster.

Configuring Storage Behind a Concentrator FailOver Cluster

The storage subsystems behind a Concentrator FailOver Cluster should support RAID and have two SCSI ports or two Fibre Channel ports. For FailOver to operate correctly, storage must be configured so that it is symmetrical and connected to both Concentrators. A system that has a spare JBOD on the active side, for example, may not fail over as expected due to the asymmetrical configuration. The storage must be configured such that both Concentrators can each view all LUNs. For storage with active-active dual controllers, each Concentrator should be connected to a different port on each RAID controller. For single-controller RAID systems, the storage must have two ports on a single controller with a Concentrator connected to each. In the case of active/passive configurations, there must be two ports on the active controller with both Concentrators connected to that active controller. Fibre Channel connections configured through a switch must be zoned correctly to make the LUNs visible to both units in the FailOver configuration. Users can employ two Gigabit Ethernet switches and two Fibre Channel switches for full redundancy.
Note: Contact StoneFly for a list of storage disk arrays certified for FailOver.

Configuring the Cluster with an Active-Active Dual-Controller Array

This section is an overview of setting up a Concentrator FailOver configuration with a dual-controller, active/active RAID storage array that operates in a primary/secondary mode in a Fibre Channel network. A full high-availability solution can be implemented with two Fibre Channel switches for full redundancy and no manual intervention. The same configuration could be designed with a single FC switch and appropriate zoning, but this would create a single point of failure in the Fibre Channel network; likewise, the user could build the configuration with a single Gigabit Ethernet switch.
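Taken together, the FailOver events described earlier amount to a watch loop on the standby unit. The following is a simplified, hypothetical sketch of that behavior; the fault checks and takeover actions are stand-ins, not StoneFly's implementation:

```python
import time

def standby_monitor(check_active, take_over, poll_seconds=1.0):
    """Standby unit's watch loop (simplified sketch).

    check_active() returns None while the active unit is healthy, or a
    fault description for any monitored condition (management or GbE
    link loss, power failure, lost storage-device connection).
    take_over(fault) stands in for assuming the cluster IP address and
    the back-end storage sessions.
    """
    while True:
        fault = check_active()
        if fault is not None:
            take_over(fault)  # standby transparently becomes active
            return fault
        time.sleep(poll_seconds)

# Simulated run: the active unit loses a GbE link on the second poll.
events = iter([None, "iSCSI GbE port link down"])
takeovers = []
standby_monitor(lambda: next(events), takeovers.append, poll_seconds=0.0)
print(takeovers)  # ['iSCSI GbE port link down']
```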

[Figure: Active-Active System — servers connect through Concentrator 1 and Concentrator 2, via FC Switch 1 and FC Switch 2 (ports 0 and 1), to the primary and secondary RAID controllers presenting LUN X.]

Configuration and setup:

1. Allocate LUN X on the Array and assign the LUN to be associated with both the primary and secondary controllers.
2. Configure the FC switches so that Concentrator SC1 can see LUN X on the primary controller. Bring up SC1 and verify that LUN X is visible as a resource. This is exactly the same as for an existing single-SC configuration.
3. Zone FC Switch 2 to make LUN X visible to SC2 on the secondary controller.
4. Bring up SC2 and verify that LUN X is visible as a resource. Do not configure this Concentrator to allow access by any of the connected servers. No I/O should be allowed at this time on SC2.

5. Open the user interface of SC1 and configure it to accept SC2 as a member of the FailOver Cluster.
6. Open the user interface of SC2 and configure it to join SC1 as a member of the FailOver Cluster.
7. The two systems will now form the FailOver Cluster. SC2 will receive a copy of the metadata specifying all attached storage, allocated volumes, and access control lists. Effectively, SC2 will inherit the full configuration of SC1 (except for the management port IP address; each system maintains its own management port address).
8. SC2 will attempt to open LUN X and read block 0. This allows the system to verify that the label on the storage device matches on both sides and to check that the same LUN is accessible on both controllers. The Array will migrate the LUN whenever a read request is received. There is a short delay while all pending I/Os are completed on the primary LUN controller before the read operation can be fulfilled on the secondary controller. While the secondary-controller read is underway, the LUN is not accessible on the primary controller. During this time, I/O operations on the primary are returned with a busy status and are retried a few seconds later, which causes the LUN to migrate back to the primary. This read is only used during system startup, not during normal operations. The impact at the system level will be barely noticeable: no I/O operations are lost, only delayed for a few seconds.
9. The operator should now verify that the Active/Standby Concentrator Cluster has formed and that all resources are now redundant, dual-pathed, and not critical. This status is indicated on the user interface of SC1.
10. During a FailOver operation, the iSCSI IP address will be migrated from SC1 to SC2. SC2 will cause the Array to migrate the LUN to its secondary controller and become available for I/O operations. The Ethernet network is redirected to the MAC address of SC2 with an ARP response packet, and finally the servers reconnect to SC2 and retry any uncompleted I/O operations. This whole transition should complete in less than 15 seconds and will be accomplished with no lost I/O operations.
11. SC1 is now available to attempt to restart and re-form the Cluster with SC2. Operator intervention may be required to repair any physical failures.

Configuring the Cluster with a Single-Controller Dual-Bus Array

In a single-controller Array, each of the Concentrators must be connected to the same controller, but to different buses. This allows both Concentrators to view the same LUN.

1. Bring up SC1 and verify that LUN X is visible as a resource. This is exactly the same as for an existing single-SC configuration.
2. Bring up SC2 and verify that LUN X is visible as a resource. Do not configure this Concentrator to allow access by any of the connected servers. No I/O should be allowed at this time on SC2.

[Figure: Single-Controller Dual-Bus RAID System — servers connect through Concentrator 1 and Concentrator 2 to buses 0 and 1 of a single RAID controller presenting LUN X.]

3. Open the user interface of SC1 and configure it to accept SC2 as a member of the FailOver Cluster.
4. Open the user interface of SC2 and configure it to join SC1 as a member of the FailOver Cluster.
5. The two systems will now form the FailOver Cluster. SC2 will receive a copy of the metadata specifying all attached storage, allocated volumes, and access control lists. Effectively, SC2 will inherit the full configuration of SC1 (except for the management port IP address; each system maintains its own management port address).
6. SC2 will attempt to open LUN X and read block 0. This allows the system to verify that the label on the storage device matches on both sides and to check that the same LUN is accessible. This read is only used during system startup, not during normal operations.

7. The operator should now verify that the Active/Standby Concentrator Cluster has formed and that all resources are now redundant, dual-pathed, and not critical. This status is indicated on the user interface of SC1.
8. During a FailOver operation, the iSCSI IP address will be migrated from SC1 to SC2, and SC2 will become available for I/O operations. The Ethernet network is redirected to the MAC address of SC2 with an ARP packet, and finally the servers reconnect to SC2 and retry any uncompleted I/O operations. This whole transition should complete in less than 15 seconds and will be accomplished with no lost I/O operations.
9. SC1 is now available to attempt to restart and re-form the Cluster with SC2. Operator intervention may be required to repair any physical failures.

Configuring the Cluster with an Active-Passive Array

Some storage arrays with active-passive dual controllers do not allow viewing the same LUN on both the active and passive controllers at the same time. If this is the case, then the Array is compatible with the Concentrator FailOver Cluster only if it allows simultaneous connection to two buses on the active controller. If there are not two buses on the active controller, then it will not be possible to use the Array with the Concentrator FailOver Cluster.

[Figure: Active-Passive System — servers connect through Concentrator 1 and Concentrator 2 to buses 0 and 1 of the active RAID controller presenting LUN X; the second (passive) controller must be bypassed.]

In an active-passive Array, each of the Concentrators must be connected to the same controller, but to different buses. This allows both Concentrators to view the same LUN.

1. Bring up SC1 and verify that LUN X is visible as a resource.
2. Bring up SC2 and verify that LUN X is visible as a resource. If LUN X is visible to SC2, then continue to set up this storage device with the FailOver Cluster.

The remaining steps are identical to those for setting up a single-controller dual-bus storage array.

Conclusion

The Concentrator FailOver Cluster forms the basis of a redundant and reliable IP-based storage area network. The Concentrator FailOver pair features redundant CPUs, power supplies, hard drives, port connections, and operating systems. All customer configuration data from the active Concentrator is replicated to the standby unit for unparalleled data protection. Redundant Concentrators ensure storage volumes are continuously available. Concentrators fit seamlessly into existing storage and TCP/IP data networks and provide storage provisioning for heterogeneous servers and storage systems in a SAN. Setting up and managing the provisioned storage is an administrative task that is easily accomplished with the Concentrator. With StoneFly Concentrators, large, medium, and small organizations alike can finally leverage the benefits of storage networking using IP, including security, manageability, quality of service, and use of their existing infrastructure.