TECHNOLOGY CONNECTED: Advances with System Area Networks

PCI Express Speeds Data Transfer between Servers

A new network switch technology is targeted to answer the phenomenal demands on intercommunication transfer speeds between servers, which are becoming all too evident in today's client-server architecture found in all data processing environments.

by Joey Maitra, Magma

The proliferation of the raw processing power of computers has resulted in system architectures where processing tasks are distributed and assigned to various processing elements in the system in order to spread the load and derive better system throughput. The execution of these tasks is closely coordinated and then integrated by some central processing (CPU) entity to produce the desired output. The intent is to have the entire set of these elements share the processing load, thereby boosting the overall throughput of the system. Processing elements must then communicate with the central entity and/or among themselves to synchronize the execution of their respective tasks. In most scenarios, there are also volatile and non-volatile storage elements dedicated to the distributed processing elements comprising the system. For instance, in blade centers, blade servers have their own private storage facilities and also communicate with each other over high-speed connections on the mid-plane, as well as with devices on a storage area network (SAN) through a switch module. This is typically the case in mid- to high-end server environments.

FIGURE 1: A typical PCIe processor architecture prevalent in almost all current computer motherboards (root complex, PCI/PCI-X bridge and communication controller devices and cards, with the PCIe Transaction, Data Link and Physical layers).

However, to extend this paradigm to an environment made up of servers physically located in separate enclosures would require a fast interconnect mechanism. In other words, these servers must communicate among themselves via some sort of a network. In such environments there is also the need to access vast amounts of data via network attached storage (NAS) devices. This scenario is prevalent in data centers and server farms, to mention a few. Today, these access mechanisms are implemented via local area networks (LANs) with technologies such as InfiniBand, 10 Gigabit Ethernet, Fibre Channel and the like. Another point to note: the phenomenal rate of deployment of the Internet has resulted in most LANs using TCP/IP in the upper layers of the communication stack. IP packets from the TCP/IP layers are essentially encapsulated within the frames of the communication protocol used to form the LAN.

The physical connections to the network fabric for servers and computers take place through either a network I/O controller card or a network controller device resident on the motherboard. These motherboards host a PCIe root complex processor as shown in Figure 1. A root complex denotes the root of an I/O hierarchy that connects the CPU/memory subsystem to I/O devices. This hierarchy consists of a root complex (RC), multiple endpoints (I/O devices), a PCIe switch and a PCIe-to-PCI/PCI-X bridge, all interconnected via PCIe links. PCIe is a point-to-point, low-overhead, low-latency communication link that maximizes application payload bandwidth and link efficiency. Inherent in the technology is a very robust communication protocol with its own set of Transaction, Data Link and Physical layers. The network I/O controllers implement some specific communication protocol and provide the interface to the physical media constituting the LAN. The controllers interface to a PCIe endpoint of the root complex processor (RCP) of the server node participating in the network. Incidentally, this architecture is not restricted to servers, since it is common in workstations, desktops and laptops.

FIGURE 2: TCP/IP packet data flow from the Application layer of the sending server through the network to the Application layer of the destination server (TCP/IP packets embedded in PCIe packets on the server motherboard, then in Ethernet frames on the 10 Gigabit Ethernet link).

The vast majority of the industry's prevalent communication protocols were invented before the advent of PCIe technology. These protocols have their own, almost identical, infrastructure made up of Transaction, Data Link and Physical layers. As depicted in Figure 2, data originating at the Application layer are transformed into TCP/IP packets and then embedded in PCIe packets. These packets are then sent to the 10 Gigabit Ethernet controller, which de-packetizes the TCP/IP packet from the PCIe packets and re-packetizes it to be sent in Ethernet frames over the 10 Gigabit Ethernet physical media. The reverse process takes place at the destination server end. It is obvious from the discussion so far that there is an awful lot of protocol duplication.
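To make the cost of this duplication concrete, here is a rough, illustrative calculation of how much of each link's bandwidth is left for application payload. The header and framing sizes are nominal textbook values (a 1460-byte TCP segment in a standard 1500-byte Ethernet frame, and PCIe transactions with a 256-byte maximum payload carrying roughly 24 bytes of packet and link-layer overhead each); they are assumptions for illustration, not measurements of any particular product.

/* Rough estimate of protocol efficiency when a TCP/IP stream crosses
 * both a PCIe link and a 10 Gigabit Ethernet link.
 * All sizes are nominal assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
    /* Ethernet leg: 1460-byte TCP segment in a 1500-byte MTU frame. */
    double payload   = 1460.0;
    double tcp_ip    = 20.0 + 20.0;              /* TCP + IPv4 headers        */
    double eth_frame = 14.0 + 4.0 + 8.0 + 12.0;  /* header, FCS, preamble, IFG */
    double eth_eff   = payload / (payload + tcp_ip + eth_frame);

    /* PCIe leg: the same data split into 256-byte TLP payloads, each
     * carrying roughly 24 bytes of header, sequence number, LCRC and
     * framing (typical values for a Gen 1/Gen 2 link). */
    double tlp_payload  = 256.0;
    double tlp_overhead = 24.0;
    double pcie_eff     = tlp_payload / (tlp_payload + tlp_overhead);

    /* Every packet pays each of these costs on its respective link. */
    printf("Ethernet leg efficiency: %.1f%%\n", eth_eff * 100.0);
    printf("PCIe leg efficiency:     %.1f%%\n", pcie_eff * 100.0);
    return 0;
}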
The cost of such duplication, measured in terms of the overall throughput of the network, becomes more pronounced when the nuances of the various communication protocols are considered as they relate to efficiency, i.e., data rate, maximum payload, packet header overhead, etc. It turns out that the duplication of the communication protocol, even though it may be executed in hardware, causes unnecessary software and hardware overhead that seriously impacts the overall throughput of the network infrastructure. Another important factor affecting the overall performance of a network is the bandwidth limitation of the physical media associated with the communication protocol used. This encompasses transfer rates, maximum supported distances and the connection topography, to name a few.
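To put the transfer-rate question in perspective, the short calculation below compares a single 10 Gigabit Ethernet port with multi-lane PCIe links using the standard per-lane signaling rates (2.5 GT/s for Gen 1 and 5 GT/s for Gen 2). The lane counts are merely examples, and the figures are raw link rates before protocol and encoding overhead.

/* Raw link-rate comparison: one 10 Gigabit Ethernet port versus
 * multi-lane PCIe links.  Signaling rates are the standard per-lane
 * values; lane counts are illustrative. */
#include <stdio.h>

int main(void)
{
    const double gen1_gtps = 2.5;   /* GT/s per lane, PCIe Gen 1 */
    const double gen2_gtps = 5.0;   /* GT/s per lane, PCIe Gen 2 */
    int lanes[] = { 4, 8, 16 };

    printf("10 Gigabit Ethernet:  10.0 Gbit/s per direction\n");
    for (int i = 0; i < 3; i++) {
        int n = lanes[i];
        /* Raw bit rate per direction; 8b/10b encoding on Gen 1/Gen 2
         * leaves about 80% of this for data, and PCIe is full duplex,
         * so transmit and receive each get this rate simultaneously. */
        printf("PCIe Gen 1 x%-2d: %5.1f Gbit/s   Gen 2 x%-2d: %5.1f Gbit/s\n",
               n, n * gen1_gtps, n, n * gen2_gtps);
    }
    return 0;
}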

FIGURE 3: An alternate approach to sending TCP/IP packets over PCIe links with the use of a PCIe switch (each server runs an IP interface driver and a Magma driver over the PCIe Transaction Layer; link rates of 40/80 Gbit/s per direction, 80/160 Gbit/s full duplex).

FIGURE 4: The mechanism for the transfer of TCP/IP packets between server memories without any involvement on the part of the server processors.

For instance, with 10 Gigabit Ethernet the restriction of the data transfer rate to 10 Gbit/s is potentially a very serious limitation for many applications. Given this scenario, the ideal approach to boosting the overall performance of the network would be to use PCIe technology itself as the network fabric. Embedded in the PCIe packet is the IP datagram with the destination IP address of the server node. PCIe is a point-to-point communication protocol and consequently does not have a media access control (MAC) address. Therefore, the most natural and logical approach to routing data from one node in the network to another is to have some entity route the data based on the destination IP address. Implementing this type of routing methodology essentially makes that entity an IP router. This is where the PCIe switch comes into play, as shown in Figure 3.

All of the downstream ports of the PCIe switch connect to servers comprising the nodes of a system area network. Intelligence tied to the upstream port of the switch has already established the correlation between the downstream ports and the IP addresses of the servers attached to them. Data flows from one server to another through the switch. Consequently, the root complex processor (RCP) tied to the upstream port of the switch must communicate with the RCP of each server. This poses the question of how best to communicate between two RCPs. Bus enumeration techniques in the PCIe architecture, which are the same as in the PCI bus architecture, do not allow one RCP to go through the discovery of devices on a bus that belongs to another RCP. However, there is a technique pioneered by PLX Corporation during the heyday of the PCI bus that addresses this issue: Non-Transparent Bridging (NTB). This method allows two RCPs to communicate through the use of base address registers (BARs). This interchange of information is applicable to memory, I/O and configuration spaces in the context of the PCI bus architecture, and it applies to both PCI and PCIe systems.
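At its core, non-transparent bridging is an address translation: a BAR on the local side of the bridge exposes a window, and accesses falling inside that window are redirected to a configured physical address range on the remote side. The following is a minimal conceptual model of that translation; the structure and field names are hypothetical and do not correspond to any particular vendor's NTB register interface.

/* Conceptual model of a non-transparent bridge (NTB) window.
 * Hypothetical fields; real NTB hardware exposes this through
 * device-specific BAR and translation registers. */
#include <stdint.h>
#include <stdio.h>

struct ntb_window {
    uint64_t local_base;   /* where the window appears in the router's space */
    uint64_t size;         /* window size in bytes                           */
    uint64_t remote_base;  /* physical address in the remote server's memory */
};

/* Translate an address inside the local window to the remote physical
 * address the bridge will actually access.  Returns 0 if the address
 * falls outside the window. */
static uint64_t ntb_translate(const struct ntb_window *w, uint64_t local_addr)
{
    if (local_addr < w->local_base || local_addr >= w->local_base + w->size)
        return 0;
    return w->remote_base + (local_addr - w->local_base);
}

int main(void)
{
    /* Example: a 256 MByte window onto another server's kernel buffers. */
    struct ntb_window srv2 = {
        .local_base  = 0x0000002000000000ULL,
        .size        = 256ULL << 20,
        .remote_base = 0x0000000080000000ULL,
    };
    uint64_t remote = ntb_translate(&srv2, srv2.local_base + 0x1000);
    printf("local +0x1000 -> remote physical 0x%llx\n",
           (unsigned long long)remote);
    return 0;
}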

NTB-based communication can only be supported if the underlying hardware of the switch provides NTB functions on the respective downstream ports. The RCP of the IP router sets up the BAR registers on the individual switch ports attached to the respective servers and maps their system memories to respective windows in the logical address space of its own system memory. This gives one entity visibility into the individual system memories of all of the servers in the network, and this access mechanism is used to transfer data, in this case TCP/IP packets, between the servers comprising the LAN.

This method allows memory or I/O data to be transferred between attached servers through the switch ports at the maximum data rate supported by the respective physical links. For example, with 8 lanes of PCIe links using Gen 2 technology the data transfer rate is 40 Gbit/s, and with 16 lanes it is 80 Gbit/s. PCIe is a full-duplex technology, meaning transmit and receive can happen at the same time. This makes the full-duplex bandwidth for 8 lanes of Gen 2 80 Gbit/s, and for 16 lanes 160 Gbit/s. Gen 3 technology, which is currently being developed, will more than double these numbers.

Magma's patent-pending technology, which covers all aspects of a network based on running the TCP/IP protocol over a PCIe fabric, inclusive of the IP router, is the basis of the network switch design. It relies on a pull model for data transfer through the network switch. This leaves the processors on the sending servers totally free of, and oblivious to, how IP data is transferred to the destination server, which significantly reduces the processor overhead of transferring data to and from the network. This is illustrated in Figure 4.

With the PCIe-based network switch technology, the maximum number of nodes that can be on one network is 256 because of the restrictions imposed by PCIe configuration space, which supports a maximum of 256 buses. This may be construed as a limitation, but it allows for a very symmetrical topography with one RCP, that of the network switch, servicing all of the nodes as devices underneath it. No additional RCP is involved in expanding the number of nodes and, therefore, no additional memory resources are required. Consequently, adding nodes to the network simply means daisy-chaining switches, resulting in a significant decrease in cost per port as the number of nodes in the network increases. Moreover, compared to 10 Gigabit Ethernet and other legacy networks, adding nodes to the network switch is seamless because of the plug-and-play attributes of the PCIe architecture.

FIGURE 5: An example of how servers with disparate functions (database, mail and application servers, a RAID subsystem, an optical jukebox and a tape library) participate seamlessly in a symmetrical 80 Gbit/s TCP/IP-based system area network.

Since the servers have no direct visibility into a remote server's memory, any data transfer operation necessarily requires the root switch to be involved. For instance, when a source server needs to read or write data from or to a target server, it notifies the root switch rather than attempting to communicate with the target server directly. It is the root switch that accesses the memory of the source as well as the target server.
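One way to picture what the router's RCP has to maintain is a table associating each downstream port with the IP address of the attached server and the window through which that server's memory is reached. The sketch below is hypothetical, since the article does not describe Magma's internal data structures; it simply illustrates the forwarding decision described above: given a destination IP address, find the downstream port and the mapped window.

/* Hypothetical forwarding table for the PCIe IP router: destination
 * IP address -> downstream switch port and the 64-bit window where
 * that server's receive buffer is mapped into the router's space. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 256   /* bounded by the 256-bus PCIe configuration limit */

struct node_entry {
    uint32_t ip;            /* server IP address (host byte order)   */
    int      port;          /* downstream port on the PCIe switch    */
    uint64_t window_base;   /* mapped window in router address space */
};

static struct node_entry table[MAX_NODES];
static int node_count;

static const struct node_entry *lookup(uint32_t dst_ip)
{
    for (int i = 0; i < node_count; i++)
        if (table[i].ip == dst_ip)
            return &table[i];
    return NULL;           /* not local: hand off to the WAN interface */
}

int main(void)
{
    table[node_count++] = (struct node_entry){ 0x0A000002, 3, 0x2000000000ULL };
    table[node_count++] = (struct node_entry){ 0x0A000003, 4, 0x2010000000ULL };

    const struct node_entry *e = lookup(0x0A000003); /* 10.0.0.3 */
    if (e)
        printf("10.0.0.3 -> port %d, window 0x%llx\n",
               e->port, (unsigned long long)e->window_base);
    return 0;
}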
To further reduce data transfer latencies, the new switch technology uses DMA controllers built into the ports of the switch. This relieves the network switch processor from moving data between servers and allows for concurrent transfers between nodes in the network. The result is peer-to-peer transfers within the switch array, contributing to a drastic reduction in data transfer latencies in the network. Based on the destination IP addresses of the individual packets in a particular server's kernel space, the RCP on the network switch sets up the DMA descriptor file and then fires the DMA engine.
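The pull model can be sketched as follows: once the router knows where a queued IP packet sits in the source server's mapped window and where it must land in the destination server's window, it only has to build a descriptor and hand it to the DMA engine on the relevant port. The descriptor layout and the doorbell function below are hypothetical stand-ins for whatever the actual hardware uses.

/* Hypothetical DMA descriptor for the pull model: the switch port's
 * DMA engine copies an IP packet from the source server's mapped
 * window to the destination server's mapped window with no work by
 * either server's CPU. */
#include <stdint.h>
#include <stdio.h>

struct dma_desc {
    uint64_t src;    /* packet address inside the source server's window     */
    uint64_t dst;    /* receive-buffer address inside the destination window */
    uint32_t len;    /* packet length in bytes                               */
    uint32_t flags;  /* e.g. interrupt-on-completion                         */
};

/* Stand-in for ringing the doorbell of the DMA engine on one port. */
static void fire_dma(int port, const struct dma_desc *d)
{
    printf("port %d: DMA %u bytes 0x%llx -> 0x%llx\n", port, d->len,
           (unsigned long long)d->src, (unsigned long long)d->dst);
}

int main(void)
{
    /* One queued packet found in server 1's kernel-space transmit ring,
     * destined for server 2 (addresses lie within the mapped windows). */
    struct dma_desc d = {
        .src   = 0x2000000000ULL + 0x4000,   /* server 1 window + offset */
        .dst   = 0x2010000000ULL,            /* server 2 window          */
        .len   = 1500,
        .flags = 1,
    };
    fire_dma(4, &d);   /* engine on the downstream port facing server 2 */
    return 0;
}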

PCIe technology is fast becoming ubiquitous, and as a result all server and workstation manufacturers now provide a certain number of PCIe slots for I/O expansion. These form the PCIe endpoints of the RCP on the host computer backplane. A host card takes the signals from the backplane and brings them out on fiber, or alternately on copper, to attach to the ports of the network switch. The number of lanes operational between a server and the network switch depends on the number of lanes supported by the server hardware. PCIe allows for link negotiation, whereby both ends of a link negotiate to support the minimum number of lanes supported by either of the two connection points. Consequently, each port of the network switch will negotiate down to the number of lanes supported by the host connection to that individual port. These ports support Gen 2 signaling and will also negotiate down to Gen 1 signaling to match the corresponding connection to the server. This makes the network switch highly scalable.

The network switch technology is based completely on PCIe standards, with no aspects of the technology being proprietary. With the industry's commitment to PCIe, it provides a migration path to newer generations of the technology, potentially extending its life cycle. The technology also allows for the coexistence of legacy networks as it goes through its adoption cycle, and these can serve as a fallback mechanism for mission-critical applications, allowing for a fail-safe deployment. Another significant advantage is the cost per port as nodes are added to the network, since there is only one root complex processor (RCP), that of the network switch, in this network topology.

Figure 5 shows an example of how servers with disparate functions participate seamlessly in a symmetrical TCP/IP-based system area network. It also shows how storage and processing servers coexist in one homogeneous network. This is facilitated by the increasingly popular use of iSCSI for communication with network attached storage devices; iSCSI is essentially the SCSI protocol, which is widely used in the industry to communicate with storage devices, embedded in TCP/IP packets. Also, connection to the Internet simply means transferring intact, via a wide area network (WAN) interface, all IP packets that are not destined for any server on the network. The deployment of the network switch shown in Figure 5 is representative of a topography that, with different software modules, can be used for clustering, I/O virtualization and cloud computing applications. It is a highly flexible architecture.

Magma, San Diego, CA. (858) 530-2511. [www.magma.com].