PCI Express High Speed Networks. Complete Solution for High Speed Networking


Ultra Low Latency, Ultra High Throughput

Maximizing application performance is a combination of processing, communication, and software. Dolphin develops PCI Express interconnect solutions that eliminate system bottlenecks related to communication, allowing applications to reach their potential. Dolphin's PCI Express products reduce latency for database, storage, and high performance distributed applications while improving throughput for data transfers. Dolphin combines the superior performance and ultra-low latency of PCI Express with standardized computer hardware and software to get the most out of applications. Our standard form factor boards reduce time to market, enabling customers to rapidly implement PCI Express solutions in data centers and embedded systems.

[Figure: Socket latency (µs) by message size — SuperSockets vs. 10GigE]

Hardware
IXH610 Host Adapter: This low profile PCI Express x8 Gen2 adapter card provides 40 Gbit/s data rates over a standard PCIe external cabling system. The IXH610 features transparent and non-transparent bridging (NTB), along with clock isolation. These powerful features make the adapter an ideal interconnect for applications such as test and measurement, medical equipment, and others seeking high performance and low latency.

IXH620 XMC Adapter: The Dolphin Express IXH620 XMC Adapter brings 40 Gbit/s data rates and advanced connection features to embedded computers that support standard XMC slots or XMC VPX, VME, or cPCI carrier boards. The x8 Gen2 PCI Express adapter expands the capabilities of embedded systems by enabling very low latency, high throughput cabled expansion using standard PCI Express cables.

Software
Dolphin's software ensures reuse of existing applications, but with better response times and data accessibility. Developers can quickly deploy applications with improved overall performance using our socket and TCP/IP supported development tools. Further tuning is available with our low level API, which delivers maximum performance. Dolphin employs IDT PCI Express components as a road map to higher performance hardware at 40 Gbps and beyond. PCI Express devices furnish access to a complete low cost infrastructure, and Dolphin's software takes advantage of this infrastructure, enabling customers to deliver next generation systems and maximize the performance of existing applications.

Outstanding Performance
Dolphin's PCI Express solutions deliver outstanding latency and throughput compared to other interconnects. Measured latency is roughly ten times lower than standard 10 Gigabit Ethernet, and this is achieved without special tuning or complex optimization schemes. In addition, Dolphin takes advantage of the high throughput of PCI Express: our current solution utilizes Gen2 PCI Express with 40 Gbps of raw bandwidth. Future product generations can easily upgrade to Gen3 PCI Express, doubling the bandwidth. The upgrade to Gen3 requires no software changes and maintains the low latency characteristic of PCI Express devices. The investment in low latency, high performance Dolphin products will yield dividends today and into the future.
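The raw-bandwidth and Gen3-doubling claims follow from standard PCIe link arithmetic. The sketch below uses generic PCIe encoding parameters (not Dolphin-specific measurements) to compute the effective data rate of an x8 link:

```python
def effective_gbps(gt_per_s: float, lanes: int, payload_bits: int, line_bits: int) -> float:
    """Effective data rate in Gbit/s after line-encoding overhead."""
    return gt_per_s * lanes * payload_bits / line_bits

# Gen2: 5 GT/s per lane, 8b/10b encoding -> 20% encoding overhead
gen2_x8 = effective_gbps(5.0, 8, 8, 10)
# Gen3: 8 GT/s per lane, 128b/130b encoding -> ~1.5% encoding overhead
gen3_x8 = effective_gbps(8.0, 8, 128, 130)

print(f"Gen2 x8: 40 Gbps raw, {gen2_x8:.1f} Gbps effective")
print(f"Gen3 x8: 64 Gbps raw, {gen3_x8:.1f} Gbps effective")
```

With 8b/10b stripped off, the Gen2 x8 link's 40 Gbps raw rate leaves 32 Gbps for data; Gen3's more efficient 128b/130b encoding is why moving from 5 GT/s to 8 GT/s roughly doubles usable bandwidth.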

Eliminate Software Bottlenecks
The Dolphin Express Software Suite is aimed at performance critical inter-processor communication. The suite includes advanced performance improving tools for new and existing applications. The SuperSockets API removes traditional stack based network bottlenecks: sockets and IP applications take advantage of the PIO and DMA operations within PCI Express devices without modification, improving performance and reducing time to market. SuperSockets enables applications to experience latencies under 2 µs and throughput of 22 Gigabit/s. The optimized TCP/IP driver enables IP applications to take advantage of the performance of PCI Express. The SISCI embedded software API offers further optimization for shared memory and replicated shared memory applications. Embedded customers can benefit from even lower latencies, in the range of 0.79 µs, with higher throughput of over 3500 MB/s.

Scalable Systems
Scalability is accomplished through the use of switches. Dolphin's IXS600 switch supplies 40 Gbps ports for high throughput and scalability. By linking both transparent and non-transparent devices to the IXS600, distributed systems increase both I/O and processing capacity. This low latency switch enables systems to scale while maintaining high throughput.

Robust Features
- Lowest host to host latency and low jitter, with 0.79 µs for fast connections and data movement
- DMA capabilities to move large amounts of data between nodes with low system overhead and low latency; application to application transfers exceeding 3500 MB/s throughput
- Management software to enable and disable connections and fail over to other connections
- Direct access to local and remote memory; hardware based uni- and multicast capabilities
- Set up and manage PCIe peer to peer device transfers
- High speed sockets and TCP/IP application support
- Ease of installation and plug and play migration using standard network interfaces

"Dolphin software and hardware is a key enabler for developing high performance PCI Express based real-time simulation systems," says Jean Belanger, CEO of Opal-RT, a Montreal based power simulation company. "Dolphin offers excellent tools and features for our high speed eMEGAsim distributed electrical grid simulation, which requires high performance and very low latency solutions."

SuperSockets: Ultra Low Latency Berkeley Sockets API for PCI Express

[Figure: SuperSockets software stack — unmodified application and socket switch in user space; SuperSockets alongside the TCP/IP stack and MAC in kernel space; Dolphin Express NIC]

Dolphin's PCI Express network replaces local Ethernet networks with a high speed, low latency PCI Express network. Dolphin's PCI Express hardware and SuperSockets accomplish this without application changes. SuperSockets is a unique implementation of the Berkeley Sockets API that utilizes the capabilities of our PCI Express over cable hardware transport to transparently achieve performance gains for existing socket-based network applications. The combination of Dolphin Express hardware and the SuperSockets software creates an ultra-low latency, high-bandwidth, low overhead, and high availability platform to support the most demanding sockets based applications. With support for both Linux and Windows operating systems, new and existing applications can easily be deployed on the high performance Dolphin platform without any modifications.

Traditional implementations of TCP sockets require two major CPU consuming tasks: copying data between application buffers and NIC buffers, and TCP transport handling (segmentation, reassembly, checksumming, timers, acknowledgments, etc.). These operations become performance bottlenecks as I/O interconnect speeds increase. SuperSockets eliminates the protocol stack bottlenecks, delivering superior latency performance. Our ultra low latency is achieved through an efficient remote memory access mechanism based on a combination of PIO (Programmed IO) for short transfers and RDMA (Remote Direct Memory Access) for longer transfers, allowing both control and data messages to benefit.

PIO has clear advantages for short messages, such as control messages for simulation systems, as the transfer is completed through a single CPU store operation that moves data from CPU registers into the memory of the remote node. In most cases, data transfers through SuperSockets are completed before alternative technologies have even managed to start their RDMA.

SuperSockets also accelerates intra-system communication: it implements a high speed loopback device that reduces local sockets latency to a minimum.

Embedded and client-server socket programs require a very high degree of stability and reliability. SuperSockets comes with built-in high availability, and complex applications can benefit from Dolphin's switching technology. Dolphin provides instantaneous switching on hardware node or network errors: if the Dolphin Express network is unavailable, socket communication is served by the regular network stack. This automatic process is the most cost-effective way to build high availability systems. The Linux version comes with an instant fail-over and fail-forward mechanism that transparently switches between Dolphin Express and regular networking. Dolphin's PCI Express network and SuperSockets provide an ultra fast and reliable transport mechanism for embedded and business applications.

Features
- Compliant with the Linux socket library, WinSock2, and Berkeley Sockets
- No OS patches or application modifications required
- Windows and Linux OS support
- Supports both user space and kernel space clients
- Full support for socket inheritance/duplication
- Transparent fail-over to Ethernet if the high speed connection is down; fail-forward when the problem is corrected
- Local loopback socket acceleration, up to 10 times faster than standard Linux and Windows
- Supports multiple adapters per host for increased fault tolerance and speed
- Both TCP and UDP support
- Easy to install with no application modifications
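Because SuperSockets preserves the Berkeley Sockets API, the code it accelerates is just ordinary sockets code. The sketch below (plain Python over loopback, nothing Dolphin-specific) implements the same half round-trip ping-pong methodology used for the latency figures in this document:

```python
import socket
import threading
import time

def echo_server(srv):
    """Accept one connection and echo everything back."""
    conn, _ = srv.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

def half_rtt_us(n_iters=1000, msg=b"x" * 64):
    """Average half round-trip time, in microseconds, for an
    unmodified AF_INET ping-pong over loopback."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
    cli = socket.create_connection(srv.getsockname())
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(n_iters):
        cli.sendall(msg)
        got = b""
        while len(got) < len(msg):
            got += cli.recv(4096)
    elapsed = time.perf_counter() - start
    cli.close()
    srv.close()
    return elapsed / n_iters / 2 * 1e6
```

Run under SuperSockets, exactly this kind of program would be redirected to the PCI Express transport without source changes; over the kernel loopback it simply reports the stack's own latency.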

SuperSockets: How Does SuperSockets Work?

In order to divert socket communication without touching the application, the sockets API functions must be intercepted. This is done differently in the Windows and Linux environments.

Dolphin SuperSockets on Linux differs from regular sockets only in the address family: SuperSockets implements an AF_INET compliant socket transport called AF_SSOCK. The Linux LD_PRELOAD functionality is used to preload the standard socket library with a special SuperSockets library that intercepts the socket() call and replaces the AF_INET address family with AF_SSOCK. All other socket calls follow the usual code path and are accelerated by the SuperSockets module if the socket target address falls within the Dolphin Express network.

Dolphin SuperSockets for Windows intercepts the required WinSock2 API calls. Most Windows applications use the dynamic link libraries provided by the operating system (WS2_32.DLL and KERNEL32.DLL) for socket communication. Windows sockets are regular handles, so normal handle operations can be applied to them (closure, duplication, and inheritance when a new child process is created, etc.). During runtime, all socket calls are routed through the Dolphin sockets switch: sockets configured for a Dolphin Express endpoint are routed through the low latency Dolphin SuperSockets library, while all other connections are routed back through the standard WinSock2 AF_INET transport library and regular Ethernet. Measurements show that the redirection is virtually instantaneous and adds no overhead to system calls. An optional configuration file can also be created in which individual sockets can be explicitly enabled or disabled for Dolphin Express communication.

A Dolphin Express communication channel is enabled by a configuration file named dishosts.conf. This file contains the host names of the machines in the cluster and their associated node IDs in the network.

The Dolphin SuperSockets library is optimized for high throughput and low latency communication by reducing the number of system resources and interrupts needed for data transfers. It uses direct remote memory CPU store operations (PIO) to transfer small messages and engages DMA engines to transmit large messages.

[Figure 1: SuperSockets vs. Ethernet movement model]
[Figure 2: SuperSockets vs. 10 GigE latency]
[Figure 3: SuperSockets throughput by block size]

The latency chart (Figure 2) shows performance results for two Intel Core2 2.00 GHz systems interconnected using Dolphin Express, 10 Gigabit Ethernet, and Gigabit Ethernet. The benchmark is a simple application running a socket ping-pong test; the chart shows the half RTT (round trip time). The minimum latency for Dolphin SuperSockets is under 2 microseconds, far better than both 10 Gigabit and Gigabit Ethernet. SuperSockets also delivers high throughput, with over 2700 MB/s of data throughput (Figure 3).
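The routing decision behind the interception can be reduced to two small steps: read the cluster membership table, then accelerate only connections whose target falls within it. The sketch below models that logic; note that the actual dishosts.conf syntax is not shown in this document, so the one-pair-per-line "hostname node_id" format here is an assumption made purely for illustration.

```python
def parse_dishosts(text: str) -> dict:
    """Parse a 'hostname node_id' table.
    ASSUMPTION: the real dishosts.conf syntax is not documented here;
    this line-oriented format is invented for illustration."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # allow trailing comments
        if line:
            host, node_id = line.split()
            table[host] = int(node_id)
    return table

def choose_transport(target_host: str, cluster: dict) -> str:
    """Mirror the described routing rule: accelerate only when the
    target address falls within the Dolphin Express network."""
    return "supersockets" if target_host in cluster else "af_inet"
```

For example, with `cluster = parse_dishosts("nodeA 4\nnodeB 8\n")`, a connection to nodeA takes the fast path while a connection to an outside host is handed back to the regular TCP/IP stack.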

Shared Memory: Reflective Memory Solution

The Dolphin Express Replicated Shared Memory solution improves the performance of embedded and real time applications with critical performance requirements, such as simulation, virtual reality, and other real-time applications. A high speed, flexible communication network, Dolphin Express combines ultra-low-latency data transfers, high availability, and lower cost in a modern high speed data communications solution. With Dolphin Express, you can implement a replicated shared-memory architecture (reflective memory) in a modern switched architecture.

Dolphin Express provides a single high speed, low latency data link that combines both control and data networks in one interconnect. Unlike other replicated shared-memory concepts, Dolphin Express implements a switch based solution. The switched architecture improves system performance through a data broadcast approach, which reduces each node's data update time and reduces data delivery jitter. Once a data packet is transferred to the switch, it is simultaneously broadcast; data written to remote nodes is typically available in all remote memories within less than 1.25 microseconds. Dolphin Express implements hardware based uni- and multicast functionality that allows very efficient, low-cost distribution of data, outperforming serial reflective memory solutions. The Dolphin hardware performs all bus protocols and handshakes, while data is always pipelined to the next level. This improves data throughput and reduces system overhead, and a reliable data delivery mechanism eliminates dropped packets. Data transfers use Programmed IO (PIO) or Remote Direct Memory Access (RDMA) mechanisms with data throughput up to 16 Gbit/s, meeting the needs of the most demanding applications.

Dolphin's solution uses cacheable main system memory to store data. The use of cacheable main memory provides a significant performance and cost benefit. Remote interrupts or polling are used to signal the arrival of data from a remote node. Since the memory segments are normal cacheable main memory, polling is very fast and consumes no memory bandwidth: the CPU polls for changes in its local cache, and when new data arrives from the remote node, the I/O system automatically invalidates the cache and the new value is cached.

Replicated memory solutions are known for their simplicity: just read and write into a shared distributed memory. Our high-performance network includes easy installation and operation. Dolphin's SISCI Developers Kit (Software Infrastructure for Shared-memory Cluster Interconnect) manages the system. The kit includes all the tools and flexibility needed to set up your reflective memory system. Once set up, your application simply reads and writes to remote memory.

Features
- High-performance, ultra low-latency switched 40 Gbps interconnect
- Up to 3500 MB/s throughput
- Hardware based multicast
- Configurable shared memory regions
- Fiber-optic and copper cabling support
- Scalable switched architecture
- SISCI Developers Kit
- Built in CRC, 8b/10b encoding
- PCI Express host adapters
- Expandable switch solutions

Applications
- High speed simulators
- Video information distribution
- Virtual reality systems
- Range and telemetry systems
- CT scanners
- Distributed sensor-to-processor systems
- High speed video inspection
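The "just read and write, then poll your local copy" programming model can be sketched on a single machine with Python's multiprocessing.shared_memory. In this illustration a timer thread stands in for a remote node's write arriving over the interconnect; the segment size, offset, and values are all made up:

```python
import threading
import time
from multiprocessing import shared_memory

def wait_for_update(buf, offset=0, old=0, timeout=2.0):
    """Spin on a memory location until its value changes, the way a
    reflective-memory consumer polls its local copy of a segment."""
    deadline = time.perf_counter() + timeout
    while buf[offset] == old:
        if time.perf_counter() > deadline:
            raise TimeoutError("no update observed")
    return buf[offset]

def demo() -> int:
    shm = shared_memory.SharedMemory(create=True, size=16)
    try:
        shm.buf[0] = 0
        # Stand-in for a remote node's write landing in our segment:
        writer = threading.Timer(0.05, lambda: shm.buf.__setitem__(0, 42))
        writer.start()
        value = wait_for_update(shm.buf, old=0)
        writer.join()
        return value
    finally:
        shm.close()
        shm.unlink()
```

In the real system the polled location sits in cacheable main memory, so the spin loop runs entirely out of the CPU cache until the interconnect's write invalidates the line.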

Shared Memory: High Performance Reflective Memory

Why use Dolphin Reflective Memory? The Dolphin Express reflective shared memory solution creates a reflective memory address space between nodes interconnected with the Dolphin Express product line. The solution offers significantly higher performance at a much lower cost than other reflective memory solutions and is available with both low cost copper and long distance fiber cabling.

The Dolphin reflective memory functionality is implemented in the Dolphin Express switch. Topologies range from two-node systems without a switch to large multi-switch configurations; for larger systems, switches are cascaded. Two-node systems use regular data unicast (data is written only to the single remote memory location, with no local updates). All hosts need to install a Dolphin Express adapter card.

Figure 1 illustrates a Dolphin reflective memory configuration. The solution offers significantly faster access to local data than other solutions: the Dolphin multicast solution uses system main memory as reflective memory, while other solutions use expensive device memory, limiting system configuration options and increasing cost. Main memory solutions benefit from CPU caching and very high performance internal memory buses. Traditional reflective memory device memory, conversely, is non-cacheable, and memory access is very expensive as the CPU must fetch the data from the card through the I/O system.

The Dolphin IXH adapter comes with an x8 PCI Express link, enabling customer applications to take advantage of the exceptional 40 Gbit/s link bandwidth. The fully hardware based, memory mapped data transmission does not rely on any operating system service or kernel driver functionality. The solution includes an extensive software library that makes configuration and setup easy; this software is not active during application runtime.

The Dolphin replicated shared memory system delivers ultra fast performance. The SISCI low level API enables the lowest latency: Dolphin benchmarks show latencies as low as 0.99 µs. This is accomplished while delivering high throughput; the reflective memory solution delivers 2,000 MB/s of data throughput. Figure 2 shows the throughput at various message sizes.

[Figure 1: Reflective Memory Diagram — source node multicasting to reflective memory nodes]
[Figure 2: Reflective Memory Throughput by block size]

SISCI Developers Kit: Ultra Low Latency Shared Memory API for PCI Express

System architects seeking to maximize distributed application performance are exploring PCI Express as a cable interconnect. Dolphin's Software Infrastructure for Shared-Memory Cluster Interconnect (SISCI) API makes developing PCI Express over cable applications faster and easier. The SISCI API is a well established API for shared memory environments. In a multiprocessing architecture with PCI Express over cable, the SISCI API enables PCI Express based applications to use distributed resources like CPUs, I/O, and memory, resulting in reduced system latency and increased data throughput.

For inter-process communication, PCI Express supports both CPU driven programmed IO (PIO) and Direct Memory Access (DMA) as transports through non-transparent bridging (NTB). Dolphin's SISCI API utilizes these components to create a development and runtime environment for system architects seeking maximum performance. The environment is highly deterministic, with low latency and low jitter, and is ideal for traditional high performance applications like real time simulators, reflective memory applications, high availability servers with fast fail-over, and high speed trading applications.

The SISCI API supports data transfers between applications and processes running in an SMP environment as well as between independent servers. Its capabilities include managing and triggering application specific local and remote interrupts, and catching and managing events generated by the underlying PCI Express system (such as a cable being unplugged).

The SISCI API makes extensive use of the resource concept. Resources can be items such as virtual devices, memory segments, and DMA queues. The API removes the need to understand and manage low level PCI Express chip registers at the application level, enabling developers to utilize these resources in their applications without sacrificing performance. Programming features include allocating memory segments, mapping local and remote memory segments into the addressable space of a program, and managing and transferring data with DMA.

The SISCI API includes advanced features to improve overall system performance and availability: caching techniques can be exploited to improve performance, and the API can be used to check for data transfer errors and correct them.

Features
- Shared memory API
- Replicated/reflective memory support
- Distributed shared memory and DMA support
- Low latency messaging API
- Interrupt management
- Direct memory reads and writes
- Windows and Linux OS support
- Caching and error checking support
- Events and callbacks
- Example code available

Applications
- High speed simulation
- High frequency trading
- High availability backup
- Database acceleration
- High speed message passing

SISCI: Why use SISCI?

The SISCI software and underlying drivers simplify the process of building shared memory based applications. The built-in resource management enables multiple concurrent SISCI programs and other users of the PCI Express network to coexist and operate independently of each other. For PCI Express based application development, the API utilizes PCI Express non-transparent bridging to maximize application performance.

The shared memory API drivers allocate memory segments on the local node and make this memory available to other nodes; the local node then connects to memory segments on remote nodes. Once available, a memory segment is accessed in one of two ways: either mapped into the address space of your process and accessed as normal memory, e.g. via pointer operations, or through the DMA engine in the PCI Express chipset. Figure 1 illustrates both data transfer options.

Mapping the remote address space and using PIO may be appropriate for control messages and data transfers up to e.g. 1k bytes, since the processor moves the data with very low latency. PIO optimizes small write transfers: no memory lock down is required, the data may already exist in the CPU cache, and the actual transfer is just a single CPU instruction, a write posted store. A DMA implementation saves CPU cycles for larger transfers, enabling overlapped data transfers and computation. DMA has a higher setup cost, so latencies usually increase slightly because of the time required to lock down memory, set up the DMA engine, and complete the interrupt. However, multiple data transfers can be joined and sent together to the PCI Express switch to amortize the overhead.

The SISCI API provides applications direct access to the low latency messaging enabled by PCI Express. Dolphin SISCI benchmarks show latencies as low as 0.74 µs; Figure 2 shows the latency at various message sizes. The SISCI API also enables high throughput applications: it takes advantage of the PCI Express hardware to deliver over 3500 MB/s of real application data throughput for high performance communication. Figure 3 shows the throughput at various message sizes.

[Figure 1: SISCI Movement Model — PIO and DMA data movement between local and remote memory segments]
[Figure 2: SISCI Latency]
[Figure 3: SISCI PIO/DMA Throughput by block size]
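The PIO-for-small, DMA-for-large tradeoff described above is essentially a two-parameter cost model: a fixed setup time plus size divided by bandwidth. The sketch below uses made-up but plausible parameters (assumptions for illustration, not Dolphin benchmark numbers) to locate the crossover size at which DMA starts to win:

```python
def transfer_time_us(nbytes: int, setup_us: float, bandwidth_mb_s: float) -> float:
    """Linear cost model: fixed setup plus transfer time.
    1 MB/s equals 1 byte/us, so nbytes / (MB/s) yields microseconds."""
    return setup_us + nbytes / bandwidth_mb_s

def crossover_bytes(pio_setup=0.5, pio_bw=800.0, dma_setup=4.0, dma_bw=3500.0):
    """Smallest power-of-two message size at which DMA beats PIO under
    the model. All four parameters are illustrative assumptions: PIO has
    near-zero setup but limited store bandwidth; DMA pays for memory
    lock-down and engine setup but streams at full link bandwidth."""
    n = 1
    while transfer_time_us(n, dma_setup, dma_bw) >= transfer_time_us(n, pio_setup, pio_bw):
        n *= 2
    return n
```

With these assumed numbers the crossover lands in the few-kilobyte range. Batching, i.e. joining several transfers into one DMA submission as described above, effectively divides the setup term across the batch and pushes the crossover toward smaller messages.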

Dolphin Interconnect Solutions AS is a wholly-owned subsidiary of Dolphin Group ASA, which is listed on the Oslo Stock Exchange (OSE ticker: DOLP). Dolphin Interconnect Solutions has been a global provider of ultra-low latency, high-bandwidth computer interconnect solutions for high speed real-time systems, clustered databases, general networking, web services, and industrial applications for more than 20 years. For more information, please visit www.dolphinics.com.

PCI, PCI Express, PCIe, Express Module, and PCI-SIG are trademarks or registered trademarks of PCI-SIG. 2012 Dolphin Interconnect Solutions AS. All rights reserved. Dolphin ICS and the Dolphin logo are registered trademarks. All other trademarks are the property of their respective owners. All statements herein are based on normal operating conditions, are provided for informational purposes only, and are not intended to create any implied warranty of merchantability or fitness for a particular purpose. Dolphin Interconnect Solutions AS reserves the right to modify at any time without prior notice these statements, our products, their performance specifications, availability, price, and warranty and post-warranty programs. www.dolphinics.com