» Application Story « PCI Express Fabric breakthrough




Up to 4.4 GB per second TCP/IP communication: VPX-based multi-processor applications with PCI Express and VXFabric

Kontron and PLX recently announced an industry breakthrough in the deployment of PCI Express (PCIe) and VXFabric technology as a VPX backplane interconnect for TCP/IP communication. Up to 4.4 gigabytes per second is now available for TCP/IP traffic, and up to 5.6 gigabytes per second for raw data. This represents a tremendous jump in processing power and I/O data bandwidth for high-performance embedded computing applications such as radar and sonar systems, as well as video-based situational-awareness systems.

PCI Express technology is used across many application areas: storage, server, graphics, communications, instrumentation, medical, embedded and consumer. More recently, however, PCIe has been chosen as the interconnect technology for multi-processor applications. This has been made possible by PCIe Gen 3, which supports signaling rates of up to 8 gigatransfers per second (GT/s) per lane. With a single PCIe switch it is now possible to support 1.5 Tbps of aggregate data transfer, which clearly enables new usage models. Additionally, the latest PCIe switches support features such as two non-transparent (NT) ports for redundancy, direct memory access (DMA) for faster data transfer, and spread-spectrum clock (SSC) isolation for connections across different clock domains. All of these features support the use of PCIe interconnect technology

in newer applications, such as VPX-based multi-processor communication.

VPX is a VITA standard specification for high-bandwidth military and aeronautical systems which allows board computers to move away from parallel bus architectures on the backplane and implement high-speed serial point-to-point links such as PCI Express, GbE or Serial RapidIO between boards. VPX offers a well-defined ecosystem for multi-vendor, multi-module, integrated system environments, providing developers and system integrators a high level of performance and interoperability for their specific COTS-based rugged systems.

Need for advanced connectivity and bandwidth

VPX-based systems are deployed at the forefront of many demanding, image-intensive aerospace and defense applications. In these areas the need for connectivity and bandwidth is constantly growing because of requirements for higher data fidelity (higher resolutions), different types of data (video versus audio), and a larger number of data sources (more sensors). In particular, next-generation radar, targeting and surveillance systems for UAVs, and broadband electronic-warfare monitoring and jamming systems require enhanced-resolution imagery, higher I/O rates, and higher-performance switch-based serial communication fabrics. For all of these size-, weight- and power-constrained applications it is essential to increase performance without exceeding the thermal budget.

Most of these already-fielded applications use TCP/IP over Ethernet for inter-processor communication. But native Ethernet is typically limited to what the processor and/or chipset provide, and 10-gigabit Ethernet, let alone 40-gigabit Ethernet, is currently not natively supported by highly energy-efficient embedded processors and chipsets.
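The PCIe bandwidth headroom quoted earlier is easy to sanity-check: a Gen 3 lane signals at 8 GT/s with 128b/130b line encoding and is full duplex, so a 96-lane switch aggregates roughly 1.5 Tbps. A rough calculation (assuming an ideal link and ignoring protocol overhead beyond line encoding):

```python
# Back-of-the-envelope check of the aggregate bandwidth of a
# 96-lane PCIe Gen 3 switch (8 GT/s per lane, 128b/130b encoding).

GT_PER_S = 8e9            # PCIe Gen 3 raw signaling rate per lane, transfers/s
ENCODING = 128 / 130      # 128b/130b line-encoding efficiency of Gen 3

# Usable bits per second per lane, per direction
lane_bps = GT_PER_S * ENCODING          # roughly 7.88 Gbit/s

# A PCIe lane is full duplex, so count both directions across 96 lanes
aggregate_tbps = 96 * 2 * lane_bps / 1e12

print(f"per lane, per direction: {lane_bps / 1e9:.2f} Gbit/s")
print(f"96-lane switch aggregate: {aggregate_tbps:.2f} Tbit/s")
```

The result, about 1.51 Tbit/s, matches the 1.5 Tbps figure cited above.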
If higher bandwidth is a must, radar and sonar engineers, as well as developers of video-based situational-awareness applications, currently have to use an alternative data plane to increase performance, which usually also requires modifying the application code. One way to bridge this gap, and to deploy higher bandwidths without modifying application code, is to build on the ongoing evolution of PCIe. Switches such as the PLX ExpressLane Gen 3 devices, with up to 96 lanes and 24 ports, are designed for exactly such deployments, as each lane can transfer 8 GT/s.

By using the latest PCIe interconnect technology to implement Ethernet over PCIe (TCP/IP over PCIe), VPX system designers can now achieve considerably higher performance while the application software stays protected from obsolescence. Applications can use the native PCIe ports present on all components of a VPX single-board computer (SBC), which keeps costs down and reduces system latency. The reduced latency results from using PCIe directly for inter-processor communication, without converting between multiple technologies.

From the cooperation of Kontron and PLX engineers comes VXFabric, which reduces development costs by simplifying and accelerating the development of inter-CPU communication in VPX system architectures. VXFabric provides the required software layer between a PLX Technology PCIe Gen 3 switch and the bottom of the standard TCP/IP stack. This allows the boards to run their existing TCP/IP-based applications without modification, while also enabling migration to emerging standards such as 10-gigabit and 40-gigabit Ethernet.

Inter-board communication at hardware speed

VXFabric gives engineers an open infrastructure that implements efficient inter-board communication at hardware speed.
The architecture is compliant with the OpenVPX standard (VITA 65), which defines two main backplane hardware topologies: distributed and centralized. In its current form, VXFabric can simultaneously interconnect up to 12 nodes via PCI Express. The physical interconnect through the backplane can be implemented in various ways, depending on the choice of backplane style:

» A distributed PCIe backplane can be used without any PCIe switch. In this case, very compact architectures are possible, as long as the relevant data paths between nodes are implemented in the backplane. These data paths are mostly application dependent, which makes the distributed backplane the right solution for cost-effective deployments.

» When higher bandwidth and more flexibility in the data plane are required, a centralized topology is recommended. With a PCIe switch board, such as the Kontron 3U VPX PCIe and Ethernet hybrid switch VX3905, it is possible to interconnect up to 12 nodes with VXFabric via PCI Express and establish communication links between any nodes over the same backplane. This approach fits the HPEC application domain well, as well as lab use, where multiple application data paths can be evaluated on the same equipment.

Multiprocessor architectures

From the hardware point of view, the architecture is based on several CPU boards, each featuring several processing cores, interconnected through PCIe via the VPX backplane using a PCIe switch. Software-wise, VXFabric is equivalent to an Ethernet network infrastructure mapped onto a switched PCIe fabric. It implements the layers that allow the user to handle communication through an IP socket programming interface. This API gives direct access to all classic protocols such as TCP and UDP. Furthermore, VXFabric requires no modification of existing applications, which reduces development effort and simplifies migration to the new VPX architecture.
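The promise that existing socket applications run unchanged can be illustrated with a minimal TCP exchange. The sketch below uses plain Python sockets over loopback purely for illustration (the payload is invented for the example); on a VXFabric system the only difference would be binding to the address of the PCIe-backed pseudo-Ethernet interface:

```python
import socket
import threading

def echo_server(sock: socket.socket) -> None:
    """Accept one connection and echo everything back."""
    conn, _ = sock.accept()
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data)

# Standard TCP socket setup; over VXFabric the only change would be
# the IP address, which maps onto the PCIe-backed pseudo-Ethernet device.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # loopback stands in for the fabric here
server.listen(1)
port = server.getsockname()[1]

threading.Thread(target=echo_server, args=(server,), daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"radar frame 0001")
    reply = client.recv(4096)

print(reply.decode())
```

Nothing in this code is fabric-specific, which is exactly the point: the same binary can move from Gigabit Ethernet to the PCIe backplane untouched.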
The standard user programming model of Kontron VXFabric is based on the IP protocol and implements a socket-layer API through emulation of an Ethernet interface over PCIe,

similar to the pseudo-Ethernet implementations found in virtual machines. This API is the main reason why compatibility of existing applications with VXFabric is guaranteed. Migrating from a Gigabit Ethernet TCP/IP infrastructure to Kontron VXFabric is straightforward and avoids the complex, low-level, proprietary APIs of, for example, most Serial RapidIO or InfiniBand implementations. VXFabric is available under Linux for all Kontron VPX CPU boards, and it is designed to be portable to other operating systems offering a modern TCP/IP stack, as well as to other architectures such as FPGAs. The implementation is scalable. Furthermore, VXFabric requires no infrastructure other than the VPX backplane and a VPX PCIe switch to interconnect VPX boards. All of the necessary hardware and silicon comes from the mainstream IT market and is not at the mercy of a small group of suppliers.

Data communication through the socket API

There is no difference between using the socket API on top of VXFabric and on top of a standard Ethernet interface. The datagram-socket (UDP) or stream-socket (TCP) API is the preferred way to address VXFabric, which is usually dedicated to data communication. A good coding example can be found in the source code of the iperf tool (open source, GPL), which has been used unmodified to obtain benchmark and throughput measurements of the VXFabric implementation.

For specific communication links where real-time data transfers are required, an application can instead use the VXFabric raw mode to move large data sets directly from user memory to user memory. Driving the DMA engines of the PCIe silicon, the utility layers of VXFabric handle the low-level programming and the management of scatter/gather DMA job lists, while the VXFabric infrastructure maintains the necessary system-wide address-mapping coherency.
With this raw mode, the user cannot rely on the intrinsic features of TCP/IP communication (guaranteed delivery, data checking, flow control) and must find other means to synchronize the multiple parts of a distributed application.

Figure 1: VXFabric implementation
Figure 2: VXFabric TCP and RAW API
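What those "other means" might look like in practice: since raw mode provides no delivery or integrity guarantees, an application typically adds its own framing. The sketch below is a generic Python illustration (the VXFabric raw API itself is not shown, and the header layout is an invented example), pairing each payload with a sequence number and a CRC32 so the receiver can detect corruption and loss on its own:

```python
import struct
import zlib

def frame(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with a 64-bit sequence number and a CRC32 so the
    receiver can detect loss, reordering and corruption itself."""
    body = struct.pack("!Q", seq) + payload
    return struct.pack("!I", zlib.crc32(body)) + body

def unframe(buf: bytes) -> tuple[int, bytes]:
    """Validate the CRC, then strip the header; raise on corruption."""
    crc, = struct.unpack_from("!I", buf)
    body = buf[4:]
    if zlib.crc32(body) != crc:
        raise ValueError("payload corrupted in transit")
    seq, = struct.unpack_from("!Q", body)
    return seq, body[8:]

# Sender side: wrap the data before handing it to the raw-mode transfer.
msg = frame(42, b"sensor block")

# Receiver side: validate and unwrap after the DMA transfer completes.
seq, payload = unframe(msg)
```

The receiver can track gaps in the sequence numbers to detect lost transfers, something TCP would otherwise do transparently.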

Performance figures

Up to 5.6 gigabytes per second of raw bandwidth can now be made available in VPX form factors. The same configuration supports up to 4.4 gigabytes per second for TCP/IP communication. This represents a tremendous jump in processing power and data I/O bandwidth for high-performance embedded computing applications such as radar and sonar systems, as well as video-based situational-awareness systems.

Sustained throughputs were measured with the standard iperf tool. PCIe Gen 3 performance was demonstrated on VX3042/VX3044 cards hosting the 3rd-generation Intel Core i7 processor (Ivy Bridge). For PCIe Gen 2, results were obtained for communication between VX3035 cards based on the 2nd-generation Intel Core i7 processor (Sandy Bridge). The VX3030/VX6060 cards, featuring the 1st-generation Intel Core i7 (Arrandale), were used to exercise the bandwidth in PCIe Gen 1 mode. During the measurements, Turbo Boost and Hyper-Threading were disabled.

Conclusion

Based on PCIe Gen 3, VXFabric is an ideal technology for many military and aeronautic applications, both for high-performance embedded computing (HPEC) with up to 12 quad-processor nodes and for SWaP-optimized designs with a few compute nodes and many I/Os. The pervasiveness of PCIe and TCP/IP across all computer technologies positions PCIe and VXFabric as the most efficient, inexpensive and durable switch-fabric technologies for embedded systems. Looking forward, PCIe Gen 4, with speeds of up to 16 GT/s per lane, is expected to dramatically accelerate and expand the adoption of PCIe technology even into new market segments, while making it easier and more economical to design with and use. As VXFabric bridges PCIe and TCP/IP, application software can be protected from obsolescence for at least the next 20 years.
Configuration               Sustained TCP bandwidth   TCP CPU load (% of one multicore CPU)   Sustained raw bandwidth (1)
PCIe x8 Gen 3               4.4 GBytes/s              10% (quad core)                         5.6 GBytes/s
PCIe x4 Gen 3               2.5 GBytes/s              6% (quad core)                          2.8 GBytes/s
PCIe x4 Gen 2               1.3 GBytes/s              8% (dual core)                          1.5 GBytes/s
PCIe x4 Gen 1 or x2 Gen 2   0.6 GBytes/s              5%                                      0.8 GBytes/s

(1): Very light CPU load; the transfer is handled directly by the switch DMA engine.

Figure 3: VXFabric sustained bandwidth and CPU usage

Vincent Chuffart, Mil & Aero Product Manager, Kontron
Krishna Mallampati, Senior Director of Product Marketing, PLX Technology
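The measured figures in Figure 3 can be put into perspective by comparing them with the theoretical unidirectional capacity of each link, derived from the per-lane signaling rate and line encoding (the measured values below are copied from the table; the efficiency percentages are computed, not from the source):

```python
# Compare measured VXFabric throughput (Figure 3) against the theoretical
# unidirectional link capacity derived from signaling rate and encoding.

LINKS = {
    # name: (lanes, GT/s per lane, encoding efficiency, raw GB/s, TCP GB/s)
    "x8 Gen 3": (8, 8.0, 128 / 130, 5.6, 4.4),
    "x4 Gen 3": (4, 8.0, 128 / 130, 2.8, 2.5),
    "x4 Gen 2": (4, 5.0, 8 / 10, 1.5, 1.3),   # Gen 2: 5 GT/s, 8b/10b encoding
}

for name, (lanes, gts, enc, raw, tcp) in LINKS.items():
    capacity = lanes * gts * enc / 8        # GB/s, one direction
    print(f"{name}: capacity {capacity:.2f} GB/s, "
          f"raw {raw / capacity:.0%}, TCP {tcp / capacity:.0%}")
```

For x8 Gen 3, for instance, the usable capacity is about 7.88 GB/s per direction, so the measured 5.6 GB/s raw and 4.4 GB/s TCP correspond to roughly 70% and 56% of the theoretical maximum.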

About Kontron

Kontron is a global leader in embedded computing technology. With more than 40% of its employees in research and development, Kontron creates many of the standards that drive the world's embedded computing platforms. Kontron's product longevity, local engineering and support, and value-added services help create sustainable and viable embedded solutions for OEMs and system integrators. Kontron works closely with its customers on their embedded application-ready platforms and custom solutions, enabling them to focus on their core competencies. The result is accelerated time-to-market, reduced total cost of ownership and an improved overall application built on leading-edge, highly reliable embedded technology. Kontron is listed on the German TecDAX stock exchange under the symbol KBC. For more information, please visit:

CORPORATE OFFICES

Europe, Middle East & Africa: Lise-Meitner-Str. 3-5, 86156 Augsburg, Germany. Tel.: +49 (0) 821 4086-0, Fax: +49 (0) 821 4086 111, sales@kontron.com
North America: 14118 Stowe Drive, Poway, CA 92064-7147, USA. Tel.: +1 888 294 4558, Fax: +1 858 677 0898, info@us.kontron.com
Asia Pacific: 17 Building, Block #1, ABP, 188 Southern West 4th Ring Road, Beijing 100070, P.R. China. Tel.: +86 10 63751188, Fax: +86 10 83682438, info@kontron.cn

Application Story: PCI Express Fabric breakthrough #06142013WMH

All data is for information purposes only and not guaranteed for legal purposes. Subject to change without notice. Information in this document has been carefully checked and is believed to be accurate; however, no responsibility is assumed for inaccuracies. All brand or product names are trademarks or registered trademarks of their respective owners.