Extending SANs Over TCP/IP
by Richard Froom & Erum Frahim

Extending storage area networks (SANs) over a distance has become a necessity for enterprise networks. A single SAN island with one storage system does not satisfy redundancy and disaster-recovery requirements. Enterprise networks require colocations, redundant data centers and disaster-recovery sites to survive a network outage, an environmental event or other critical business interruptions. As a result, many enterprise customers architect these disaster-recovery sites for data recovery and business-operation continuance during anomalous situations and disasters. In addition, several draft standards and best-practice documents strongly recommend or require that financial institutions adhere to strict disaster-recovery guidelines.

As disaster-recovery solutions extend their reach to distant data centers, replicating, copying, migrating and vaulting data over TCP/IP from a main site to a remote site have become valuable skills. With Cisco MDS switches, enterprises can design their data-replication, data-copy or data-migration solutions using Cisco's Fibre Channel over IP (FCIP) solution without building separate transport networks for Fibre Channel. Furthermore, due to buffer constraints and the requirement for a dedicated transport, extending SANs well beyond several hundred kilometers is limited with native Fibre Channel. FCIP solutions offer the ability to share existing bandwidth with Ethernet/IP solutions and reduce the costs of dedicated transports, such as optical transports. Cisco MDS SAN Extension solutions provide 32 MB of buffering for FCIP, which allows disaster-recovery solutions to overcome the typical distance limitations imposed by Fibre Channel buffering.

In typical data-replication or data-copy environments, a storage system at the local site replicates its data to a remote storage system at a remote data center. There are different methods of replication or data copying.
These methods usually fall into one of the following categories:

Synchronous Data Replication: A data-replication method in which a write to the local storage system is, in turn, written by the local storage system to the remote storage system. The transaction is not complete from the host perspective until the data is written and acknowledged at both the local and remote storage systems. This type of replication ensures no data loss during an anomalous event. Synchronous data replication requires very low latency, so FCIP is not always the most popular choice; generally, optical solutions are the first choice for the transport network. Nevertheless, deploying FCIP over optical solutions transporting Ethernet at short distances is becoming more popular.

Asynchronous Data Replication: A data-replication method in which data on the remote storage array lags the local storage array by some amount of time. This data, although not in sync with the local storage, is still valid data. In disaster-recovery scenarios, some data may be lost, but only by the amount of time the replication is behind the real-time transactions, which is configurable in many storage arrays. This method allows transactions to be written to local storage without waiting for acknowledgement from remote storage, so it does not hinder performance of the local host's application. Although latency is a factor with asynchronous data replication, it is not as critical as with synchronous data replication. As such, asynchronous data replication solutions are commonly used with FCIP.

Data Copying, Data Vaulting, etc.: A data-replication method in which data is copied to a remote data system by a copy algorithm. This category includes data migration and tape backups. It provides point-in-time recovery, but it is not deterministic; it depends on how fast data is copied and when data is taken offline and copied. This category of replicating data commonly uses FCIP.

This article briefly introduces SAN extension using Cisco MDS switches with FCIP for the purpose of data replication. The topic is approached from a high-level point-of-view rather than a technological, marketing or comparative aspect.

Interconnecting SANs Using FCIP

Before delving into a discussion of TCP/IP and its effect on the throughput and latency of transporting storage over IP, let's look at the basic configuration of interconnecting two Cisco MDS switches by building an extended inter-switch link (EISL) over an FCIP tunnel. An ISL is an inter-switch link in a storage fabric, while an EISL is an extended ISL used for carrying multiple virtual SANs (VSANs). Cisco's VSAN standardization segments a physical fabric into multiple autonomous logical fabrics. Propagating multiple fabrics across a single EISL over IP has significant advantages.

The first step to interconnect SANs over FCIP is to configure the egress (outgoing) IP interfaces on the Cisco MDS switch. Aside from the management interface, any Ethernet interface on the IPS-4, IPS-8 or MPS/MDS9216i module is capable of FCIP, with the correct license. The IP configuration of Gigabit Ethernet interfaces in SAN-OS is nearly identical to Cisco IOS. Here is an example of configuring an IP interface on a Cisco MDS switch:

MDS-1# config terminal
Enter configuration commands, one per line. End with CNTL/Z.
MDS-1(config)# interface gigabitethernet 9/1
MDS-1(config-if)# ip address 192.168.1.2 255.255.255.0
MDS-1(config-if)#

FCIP tunnels run on top of the Gigabit Ethernet interfaces. Before configuring the FCIP tunnel, the respective FCIP profile must be configured. FCIP profiles specify the local IP address and TCP parameters to be used on an FCIP tunnel. The FCIP interface configuration includes the peer IP address, the FCIP profile and other configuration options, such as TCP time-stamping, compression and Write Acceleration. Since FCIP profiles and tunnels are independent, multiple FCIP tunnels may use the same FCIP profile in a multi-point configuration. The following example illustrates the minimum FCIP profile and tunnel configuration required to establish an EISL link between two SAN islands:
MDS-1:

fcip enable
fcip profile 1
  ip address 192.168.1.2
interface fcip1
  use-profile 1
  peer-info ipaddr 192.168.3.2
interface GigabitEthernet9/1
  ip address 192.168.1.2 255.255.255.0

MDS-2:

fcip enable
fcip profile 1
  ip address 192.168.3.2
interface fcip1
  use-profile 1
  peer-info ipaddr 192.168.1.2
interface GigabitEthernet9/1
  ip address 192.168.3.2 255.255.255.0

Throughput and Latency

Traditional studies of TCP/IP show that TCP uses windowing and throttling mechanisms to handle congestion (packet drops and latency) in IP networks. When moving SCSI over IP, it is very important not to drop any packets. Storage traffic tends to be bursty in nature, so having a large enough window size (buffering) is critical in handling this type of traffic. With the Cisco MDS solution providing 32 MB of window size (buffering), data-replication traffic is able to sustain Gigabit Ethernet bandwidth over long distances. Furthermore, long-haul solutions generally involve interfaces slower than Gigabit Ethernet, such as OC-3 and lower, and may involve shared-bandwidth environments. As such, proper packet-shaping and TCP window-tuning configurations are required to optimize the throughput and behavior of FCIP so that packet drops are minimized.

The three main parameters to configure when setting up the TCP behavior of FCIP interfaces are round-trip time (RTT) latency, maximum bandwidth and minimum bandwidth. The RTT is used in maintaining the congestion window so that the FCIP tunnel does not send more traffic than the network can handle. Note that if the FCIP tunnel runs over a dedicated-bandwidth environment, such as Gigabit Ethernet over DWDM, non-default configuration of the minimum and maximum bandwidth may not be necessary. Furthermore, reaching high data rates such as OC-12 or Gigabit Ethernet over FCIP requires a significant number of outstanding
transactions (outstanding I/Os). Otherwise, there is not enough pending data to utilize all of the bandwidth, due to the congestion-windowing mechanism and network delay.

The Cisco MDS includes methods by which to accurately gather the RTT. One such method, of course, is the Cisco IOS-style ping command. Another, more accurate method of determining RTT is the ips measure-rtt macro. This command instructs the MDS to send 20 frames of 594 bytes each toward the destination IP address (dest-ip) out the specified egress interface (egress-interface). However, the ips measure-rtt macro does not necessarily give latency under heavy load and does not provide throughput feedback. The best alternative for gauging RTT under load and available bandwidth is the SAN Extension Tuner feature found in SAN-OS 2.0 and higher. This feature creates virtual N-port devices that send true input/output (I/O) for the purpose of gauging network throughput and latency. The following example illustrates output from the SAN Extension Tuner feature:

MDS-1# show san-ext-tuner interface gig9/2 nport pwwn 10:00:00:00:00:00:00:10 vsan 100 counters
Statistics for nport
Node name             10:00:00:00:00:00:00:10
Port name             10:00:00:00:00:00:00:10
I/Os per sec          : 2290
  Reads               : 0%
  Writes              : 100%
Egress throughput     : 4.77 MBs/sec (Max - 49.23 MBs/sec)
Ingress throughput    : 0.29 MBs/sec (Max - 2.98 MBs/sec)
Average response time : Read - 0 us, Write - 415150 us
Minimum response time : Read - 0 us, Write - 874 us
Maximum response time : Read - 0 us, Write - 2912269 us
Errors                : 0

In terms of throughput and effective use of bandwidth, the minimum and maximum throughput configurations aid in maintaining a correct TCP congestion window and manage TCP slow start for better performance. Typically, TCP slow start ramps up throughput at 50 percent per RTT. However, the minimum bandwidth configuration on the MDS allows the FCIP tunnel to reach the minimum bandwidth rate after the first RTT.
This configuration optimizes performance at startup and when a packet drop occurs in the network. The maximum bandwidth parameter is useful in preventing packet drops due to oversubscribing the bandwidth available to the FCIP tunnel.

After you have thoroughly explored the throughput and latency characteristics of your network, you are ready to configure the FCIP profile with the correct RTT, minimum bandwidth and maximum bandwidth. The following FCIP profile command configures these values:

MDS-1(config-profile)# tcp max-bandwidth-mbps <value-in-mbps> min-available-bandwidth-mbps <value-in-mbps> round-trip-time-ms <value-in-msec>
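To see why these parameters matter, consider the bandwidth-delay product (BDP): the amount of data that must be in flight to keep a link full for one round trip. The short Python sketch below is illustrative only; the link speeds and RTTs are example figures, not measurements from any particular deployment. It computes the BDP for a few combinations and checks each against the 32 MB of FCIP buffering discussed above.

```python
# Illustrative bandwidth-delay product (BDP) calculation for FCIP tuning.
# BDP = link bandwidth x round-trip time; the TCP window (buffering) must
# be at least this large to sustain full line rate over the tunnel.

def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bytes in flight needed to fill the link for one round trip."""
    bits_per_rtt = bandwidth_mbps * 1_000_000 * (rtt_ms / 1000.0)
    return bits_per_rtt / 8

FCIP_BUFFER_BYTES = 32 * 1024 * 1024  # 32 MB of FCIP buffering on the MDS

# Hypothetical example links (bandwidth in Mbps, RTT in ms):
for name, mbps, rtt in [("OC-3", 155, 50),
                        ("Gigabit Ethernet", 1000, 50),
                        ("Gigabit Ethernet", 1000, 200)]:
    bdp = bdp_bytes(mbps, rtt)
    fits = bdp <= FCIP_BUFFER_BYTES
    print(f"{name} @ {rtt} ms RTT: BDP = {bdp / 1e6:.2f} MB; "
          f"{'within' if fits else 'exceeds'} the 32 MB buffer")
```

Even at 200 ms of RTT, a Gigabit Ethernet link needs only about 25 MB in flight, which is why the 32 MB of buffering can sustain line rate over very long distances.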
FCIP Advanced Features

Cisco's MDS switches support additional advanced features that improve the performance and availability of FCIP. A few significant features include, but are not limited to, Port-Channeling, EtherChannel, Compression, Write Acceleration, Tape Acceleration and IVR.

Port-Channels and EtherChannels

Cisco MDS switches support Port-Channeling for FCIP interfaces and EtherChannels for Gigabit Ethernet interfaces. Port-Channeling combines multiple FCIP interfaces into a single interface from an FSPF point-of-view and provides faster failover in the event of a path failure for one of the FCIP tunnels. Failover is faster because FSPF does not have to recalculate on failure of a link (or links) in a Port-Channel. Note that the use of both Write Acceleration and Port-Channeling requires SAN-OS 2.0.

EtherChannels combine two Gigabit Ethernet interfaces into a single virtual interface for the purpose of redundancy. Generally, with respect to SANs, this configuration is for bundling links between a Catalyst switch and a Cisco MDS switch. Moreover, in the current hardware and software revisions, EtherChannels provide only link redundancy and do not provide load balancing: the current implementation transmits data traffic on one physical link, while the other transmits control traffic. Refer to the configuration guide for SAN-OS 2.0 for details on configuring EtherChannels and Port-Channels.

Compression

The Cisco MDS also supports compression for traffic on FCIP interfaces. The IPS-4 and IPS-8 line cards offer software compression up to OC-3 speeds, while the MPS/MDS9216i offers hardware compression up to Gigabit speeds. Different modes of compression are available depending on the available bandwidth. Figure 1 provides a guideline for selecting compression modes based on WAN bandwidth.

Inter-VSAN Routing

Inter-VSAN routing (IVR) offers the facility to share resources across VSANs in a Fibre Channel fabric.
Using IVR, you can share resources across VSANs without merging them. Devices such as tape drives and storage systems that are part of one VSAN can be accessed from multiple VSANs using IVR. Without IVR, each of these devices would have to be physically connected to every VSAN whose devices it needs to access. IVR creates a path that allows only those devices that need to access resources across VSANs to communicate, maintaining VSAN isolation.
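As a rough sketch of what this looks like in SAN-OS 2.x (the zone names, pWWNs and switch WWN below are hypothetical; consult the configuration guide for the exact syntax of your release), an IVR zone sharing a tape library in VSAN 20 with a host in VSAN 10 might be configured as follows:

MDS-1(config)# ivr enable
MDS-1(config)# ivr vsan-topology database
MDS-1(config-ivr-topology-db)# autonomous-fabric-id 1 switch-wwn 20:00:00:05:30:00:18:9e vsan-ranges 10,20
MDS-1(config)# ivr zone name TAPE_SHARE
MDS-1(config-ivr-zone)# member pwwn 21:00:00:e0:8b:01:02:03 vsan 10
MDS-1(config-ivr-zone)# member pwwn 50:06:0b:00:00:01:02:03 vsan 20
MDS-1(config)# ivr zoneset name IVR_ZS
MDS-1(config-ivr-zoneset)# member TAPE_SHARE
MDS-1(config)# ivr zoneset activate name IVR_ZS

Only the two members of the IVR zone can communicate across the VSAN boundary; all other devices in VSANs 10 and 20 remain isolated.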
In addition, IVR provides more efficient business-continuity solutions for SAN extension over FCIP. An example of IVR is having a storage system at the local site in one VSAN, a remote storage system in another VSAN and a third VSAN, called a transit VSAN, that carries the traffic between them. In this manner, isolating SANs into VSANs limits any disruption on either fabric to the particular VSAN where the disruption occurred. As a result, IVR offers an extra layer of resiliency by providing a control boundary between the VSANs, where only the related FSPF routes and name-server entries are exchanged and modified in the transit VSAN. This characteristic of IVR reduces the chance of anomalous fabric events spreading over long distances.

Write Acceleration

FCIP Write Acceleration is a performance-enhancement feature that can reduce latency by one RTT per SCSI I/O. In synchronous data-replication environments over longer distances, the benefits of Write Acceleration are significant. In a normal SCSI Write I/O, an initiator issues a Write command and awaits a Transfer Ready acknowledgement from the target before sending the actual data. Over long distances, the Transfer Ready takes at least one RTT to return. With the FCIP Write Acceleration feature, the local Cisco MDS switch proxies the Transfer Ready before transmitting the actual SCSI Write across the FCIP tunnel. This enables the host to start sending data without waiting for an actual Transfer Ready from the target. If the remote MDS has yet to see the Transfer Ready come from the remote storage system, it buffers any data before sending it to the target. For I/Os using transfer lengths of only 2 K or 4 K, this feature significantly reduces latency and improves throughput. The FCIP interface-level configuration command to enable Write Acceleration is write-accelerator.

Tape Acceleration

The Tape Acceleration concept is quite similar to Write Acceleration.
Applications that access tape devices usually store and retrieve data sequentially, utilizing only one outstanding I/O. In other words, each SCSI command is executed serially, and a new command is not processed until the initiator receives a status response from the tape device. In addition, with tape devices, a File Marker command is used to verify data integrity. The FCIP Tape Acceleration feature not only proxies the Transfer Ready when an initiator issues the first Write command, allowing the host to start sending data immediately, but also proxies the Status command so that the initiator starts the second Write command without awaiting completion of the first. The Status command, among other functions, signifies completion and acknowledgement of a Write operation by the target tape device. Since tape devices handle only one outstanding I/O operation, the target switch buffers subsequent Write commands and data while awaiting previous acknowledgements. Just as Write Acceleration does not proxy the Status (I/O acknowledgement) command for SCSI Writes, Tape Acceleration does not proxy the File Marker command, so the actual File Marker sent by the initiator reaches the target; the File Marker process is handled exclusively by the initiator and target. In this manner, Tape Acceleration improves performance while maintaining data integrity when writing to tape devices over distance. The FCIP interface-level configuration command to enable Tape Acceleration is write-accelerator tape-accelerator.
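Putting the two acceleration features together, a minimal sketch of enabling them on an FCIP interface looks like the following (the interface number is an example; exact option syntax may vary by SAN-OS release):

MDS-1(config)# interface fcip1
MDS-1(config-if)# write-accelerator
MDS-1(config-if)# write-accelerator tape-accelerator

The first command enables Write Acceleration alone; the second adds Tape Acceleration on top of it, since Tape Acceleration is configured as an extension of Write Acceleration.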
Richard Froom is a technical leader at Cisco Systems in the Data Center, Switching and Wireless customer operations group. Erum Frahim is a customer support engineer at Cisco Systems with the Data Center, Switching and Wireless customer operations group. Together, they co-authored CCNP Self-Study: BCMSN (2nd edition) from Cisco Press. Cuong Tan, who reviewed this article, is a technical marketing engineer for the Storage Business Unit at Cisco Systems. This article originally appeared in Certification Magazine, www.certmag.com.