Performance of STAR-System



Similar documents
SpW-10X Network Performance Testing. Peter Mendham, Jon Bowyer, Stuart Mills, Steve Parkes. Space Technology Centre University of Dundee

STAR-LAUNCH AND NETWORK DISCOVERY

D1.2 Network Load Balancing

How To Test A Microsoft Vxworks Vx Works (Vxworks) And Vxwork (Vkworks) (Powerpc) (Vzworks)

The Bus (PCI and PCI-Express)

SIDN Server Measurements

Intel DPDK Boosts Server Appliance Performance White Paper

Performance of Host Identity Protocol on Nokia Internet Tablet

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

Performance Evaluation of Linux Bridge

Virtuoso and Database Scalability

OpenFlow Switching: Data Plane Performance

Networking Virtualization Using FPGAs

AgencyPortal v5.1 Performance Test Summary Table of Contents

Evaluation Report: Emulex OCe GbE and OCe GbE Adapter Comparison with Intel X710 10GbE and XL710 40GbE Adapters

PCI Express High Speed Networks. Complete Solution for High Speed Networking

CT LANforge-FIRE VoIP Call Generator

Benchmarking Cassandra on Violin

Communicating with devices

Understand Performance Monitoring

CT LANforge WiFIRE Chromebook a/b/g/n WiFi Traffic Generator with 128 Virtual STA Interfaces

ZCL for SONYGigECAM Introduction Manual

Wire-speed Packet Capture and Transmission

Performance of Software Switching

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Monitoring high-speed networks using ntop. Luca Deri

Performance Analysis of IPv4 v/s IPv6 in Virtual Environment Using UBUNTU

Table of Contents INTRODUCTION Prerequisites... 3 Audience... 3 Report Metrics... 3

Gigabit Ethernet Packet Capture. User s Guide

3.4 Planning for PCI Express

Linux NIC and iscsi Performance over 40GbE

CT LANforge-FIRE VoIP Call Generator

WaveInsite Mobile WLAN Client Interoperability and Performance Testing

Dualog Connection Suite Hardware and Software Requirements

High-performance vswitch of the user, by the user, for the user

I3: Maximizing Packet Capture Performance. Andrew Brown

Dialogic Diva V-1PRI, V-2PRI and V-4PRI (PCI/PCIe)

Virtualization for Cloud Computing

GigE Vision cameras and network performance

HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring

Performance Tuning Guidelines for PowerExchange for Microsoft Dynamics CRM

Saisei and Intel Maximizing WAN Bandwidth

Introduction to PCI Express Positioning Information

Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking

1000Mbps Ethernet Performance Test Report

Understanding the Performance of an X User Environment

A way towards Lower Latency and Jitter

Stream Processing on GPUs Using Distributed Multimedia Middleware

AP ENPS ANYWHERE. Hardware and software requirements

Informatica Data Director Performance

Hands-On Microsoft Windows Server 2008

FarSync TE1R. A Universal PCI adapter for E1 and T1 (G.703 / G.704) connections for Linux and Windows

Basic Use of the SPC Feature on 1100R+/H+ Testers

Network Performance Evaluation of Latest Windows Operating Systems

Application Performance Analysis and Troubleshooting

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications

Running Native Lustre* Client inside Intel Xeon Phi coprocessor

Networking Driver Performance and Measurement - e1000 A Case Study

10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications

ARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG

Hardware Level IO Benchmarking of PCI Express*

Violin Memory 7300 Flash Storage Platform Supports Multiple Primary Storage Workloads

IxChariot Virtualization Performance Test Plan

Problems of Developing Spacewire Ethernet Bridge and Transferring Spacewire Packages Over Ethernet

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

VLAN for DekTec Network Adapters

Office 365 Migration Performance & Server Requirements

FarSync T2Ue. A 2 port PCI Express synchronous communications adapter

Application Note. Windows 2000/XP TCP Tuning for High Bandwidth Networks. mguard smart mguard PCI mguard blade

Open Source VoIP Traffic Monitoring

PCI Express* Ethernet Networking

What s New in Mike Bailey LabVIEW Technical Evangelist. uk.ni.com

Hardware Performance Optimization and Tuning. Presenter: Tom Arakelian Assistant: Guy Ingalls

An Oracle White Paper July Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

Collecting Packet Traces at High Speed

Remote Access Server - Dial-Out User s Guide

The Elements of GigE Vision

Quality of Service su Linux: Passato Presente e Futuro

Intel PCI and PCI Express*

System Requirements - filesmart

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan

Performance and Recommended Use of AB545A 4-Port Gigabit Ethernet Cards

Amplicon Core i5/i7 Ventrix and Impact-R new Systems

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

VALIDATION AND TESTING OF AN IP CODEC FOR HIGH BANDWIDTH SPACEWIRE LINK 1

How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)

VMWARE WHITE PAPER 1

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

BMV-602 Data Link Manual

Network testing with iperf

Features Overview Guide About new features in WhatsUp Gold v12

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Introduction to Intel Ethernet Flow Director and Memcached Performance

Transcription:

Performance of STAR-System STAR-System Application Note Stuart Mills Performance of STAR-System STAR-Dundee s new software stack, STAR-System, provides a high performance software stack for accessing STAR-Dundee interface and router devices, including the STAR-Dundee PCI Mk2, cpci Mk2 and PMC Mk2 devices and the new STAR-Dundee PCIe device. One of the primary goals of the STAR-System APIs and drivers is to provide high performance while keeping CPU usage to a minimum. This application note describes how performance of the system is measured, and how you can repeat these tests. Most importantly, it provides details of the performance that can be achieved. STAR-System Performance Tester The STAR-System Performance Tester is an application provided with STAR-System. It was written to allow tests to be run to test the performance and CPU usage of both the software stack and the devices being used. It is also provided as source code, so is a useful example of how to write code to get the best performance from the API. The Performance Tester provides three different types of test, each of which is described below. These tests can be used to transmit packets, receive packets, or both. Transmit only tests can be used to check the performance of the transmitter. The packets can be sent to an address where the packets will be dropped (e.g. an invalid port on a router) or to a receiver which is capable of receiving at the maximum link rate being used. Transmit only tests can also be used to test communication with another instance of the Performance Tester, which may be running on the same PC, or on a different PC. In such cases, the other instance of the Performance Tester should be used to receive packets. The most common way to use the software, however, is to both transmit and receive packets. Maximum Data Rate Test The maximum data rate test transmits and receives a large number of packets and times how long it takes from the first packet being transmitted to the last packet being received. It then uses this information to calculate the data and packet rates, and the average time per packet. The total CPU usage is also recorded, as is the CPU usage of the Performance Tester process. Assuming there is no other activity on the PC while the tests are running (which is unlikely due to virus scanners, etc.) the difference between these two figures will probably indicate work being done by the device driver, outside the threads of the Performance Tester process. This test is repeated for each of the packet sizes specified. A minimum and maximum packet size is specified, as is the step to get from the minimum to the maximum. In each run of this test, the packet sizes remain fixed. Application Note

2 Performance of STAR-System Random Packet Size Data Rate Test The random packet size data rate test is very similar to the maximum data rate test it transmits and receives packets as quickly as possible, recording data rate, packet rate, average packet time and CPU usage. The difference from the maximum data rate test is that the size of the packets being sent is random. Each test transmits a number of different packet sizes, each of which is between the current minimum size and the maximum size specified. The minimum size increases after each test by the step value specified. Although this test does not provide an accurate measurement of the system (each run of the test is likely to produce different results), it is a useful indication of how the system can cope with packets of various sizes. Latency Test The latency test transmits a single packet then receives a single packet. It transmits the next packet once the last packet has been received. The test is repeated a number of times for a single packet size, before displaying the data rate, packet rate, average packet time and CPU usage. The test is then repeated for each packet size specified. The important information from the latency test is the average packet time. This indicates the average time taken to transmit a packet and then receive that packet the combined transmit and receive latency. Performance Figures The results obtained by the Performance Tester can be written to file, generating a text file containing tab separated values for the figures recorded in each test. This text file can be loaded in to a spreadsheet such as Microsoft Excel and graphed. The graphs below show the results of running the Performance Tester on a reasonably low spec PC (using an Intel Core 2 Duo E75 running at 2.93 GHz, with 2 GBytes of memory, and bought for around 3 in October 211), with a SpaceWire PCI Mk2 device connected, and using various operating systems. The tests involve transmitting packets out of port 1 of the device, with a cable connected between ports 1 and 3, and receiving the packets on port 3. Each test was performed with the links running at the PCI Mk2 s maximum link speed of 2 Mbits/s. Note that SpaceWire encodes an 8-bit character as 1 bits, which means the maximum theoretical data rate achievable on a 2 Mbits/s link is 1 Mbit/s. Each packet is also terminated with a 4-bit end of packet marker which affects the data rate that can be achieved, particularly for smaller packets, e.g. a 1 byte packet will be represented using 14 bits. The performance test results graphs show the packet size in bytes along the x-axis, with the data rate in Mbits/s and the processor usage as a percentage along the y-axis. The latency tests do not include processor usage figures, as these tests aren t designed to push the system to its limits. As with the performance test results, the latency test results show the packet size on the x-axis. The y-axis shows the average packet time, and hence the average latency, in microseconds. SD_TN_1

Performance of STAR-System 3 Windows 7 32-bit Tests The following tests were run on the 32-bit version of Windows 7. Data Rate Test Results 1 1 12 Process CPU usage (%) Total CPU usage (%) 2 2 Latency Test Results 1 1 1 12 2 2 SD_TN_1

4 Performance of STAR-System Windows 7 64-bit Tests The following tests were run on the 64-bit version of Windows 7. Data Rate Test Results 1 1 12 Process CPU usage (%) Total CPU usage (%) 2 2 Latency Test Results 1 1 1 12 2 2 SD_TN_1

Performance of STAR-System 5 Ubuntu Linux 32-bit Tests The following tests were run on a 32-bit 2.6.38 Linux kernel using the Ubuntu 11.4 distribution. Data Rate Test Results 1 1 12 Process CPU usage (%) Total CPU usage (%) 2 2 Latency Test Results 1 1 1 12 2 2 SD_TN_1

6 Performance of STAR-System Ubuntu Linux 64-bit Tests The following tests were run on a 64-bit 2.6.38 Linux kernel using the Ubuntu 11.4 distribution. Data Rate Test Results 1 1 12 Process CPU usage (%) Total CPU usage (%) 2 2 Latency Test Results 1 1 1 12 2 2 SD_TN_1

Performance of STAR-System 7 Assessing the Results of the Tests These graphs show that performance of STAR-System is relatively consistent across all platforms tested. Data rate throughput performance is very good, with the rates being very close to the maximum possible rates that can be achieved, even with packets as small as bytes. The latency of the packets is reasonable, considering these tests were not being performed on realtime operating systems, with a worst case of less than microsecond latency in addition to the latency on the link. Latency is better on Windows than on Linux. There is very little difference between the 32-bit and 64-bit versions of Windows, although latency is slightly better on 64-bit. The granularity of the latency timing is not as good as on Linux, which is why a stepped graph is seen. On Linux, throughput performance is better than Windows with smaller packets, and CPU usage is lower. Again, there isn t much difference between the 32-bit and 64-bit figures, although latency on the 32-bit kernel seems to be more variable. For comparison, the graphs on the following page show the maximum theoretical data rate that can be achieved on a 2 Mbits/s link, and the minimum latency time to transmit a packet on a SpaceWire link at this rate. Note that these data rate figures are when packets are only going in one direction. If packets were being transmitted in both directions, the maximum rate would be between 4 and 8 Mbits/s lower due to the 4-bit FCTs which would also be transmitted on the link in each direction. Although these results show that STAR-System and the SpaceWire PCI Mk2 provide excellent throughput and latency, STAR-Dundee s engineers are always endeavouring to make improvements. It is hoped that the CPU usage of STAR-System on all platforms can be reduced, and attempts will also be made to improve the latency on Linux in the near future. Getting the Best Performance from Your Applications An important point to note with any tests performed using the Performance Tester is that all tests indicate the maximum possible rates that can be achieved. These figures might not be achievable when other tasks are being performed with the packets and data being transmitted and received. For example, writing data to a file can be very time consuming, so it s unlikely that the same data rates can be achieved when writing each received packet to a file. The code in the Performance Tester is also written to achieve the best possible performance. Multiple packets are transmitted at any one time when running the maximum and random packet size data rate tests, and it s unlikely that the same levels of performance will be achieved if each packet is transmitted individually. When writing an application it s important to take in to consideration the characteristics of the packets to be transmitted and received. If packets should be transmitted with the minimum possible latency, then it might be necessary to transmit each packet as soon as it is ready. If throughput is most important, it might be better to transmit packets in groups. STAR-Dundee prides itself on its support and can provide suggestions on the most effective implementation method to achieve the performance required for a given situation. SD_TN_1

8 Performance of STAR-System Theoretical Maximum for Performance Tests 1 1 12 2 2 Theoretical Maximum for Latency Tests 1 1 1 12 2 2