Informatica Ultra Messaging SMX Shared-Memory Transport



Similar documents
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Informatica Dynamic Data Masking

How To Improve Your Communication With An Informatica Ultra Messaging Streaming Edition

Integrating Verification and Hygiene into Marketo

Consolidating Multiple Salesforce Orgs: A Best Practice Guide. White Paper

JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

Informatica and the Vibe Virtual Data Machine

10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications

5 Ways Informatica Cloud Data Integration Extends PowerCenter and Enables Hybrid IT. White Paper

10Gb Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Latency-Sensitive Applications

The Informatica Solution for Improper Payments

Data Center Consolidation in the Public Sector

Enhance Service Delivery and Accelerate Financial Applications with Consolidated Market Data

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Understanding the Benefits of IBM SPSS Statistics Server

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

White Paper Solarflare High-Performance Computing (HPC) Applications

Test Data Management for Security and Compliance

Ultra-Low Latency, High Density 48 port Switch and Adapter Testing

Informatica Cloud MDM for Salesforce: Hierarchy Management. White Paper

RevoScaleR Speed and Scalability

Low Latency Test Report Ultra-Low Latency 10GbE Switch and Adapter Testing Bruce Tolley, PhD, Solarflare

Performance brief for IBM WebSphere Application Server 7.0 with VMware ESX 4.0 on HP ProLiant DL380 G6 server

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

A Road Map to Successful Customer Centricity in Financial Services. White Paper

Healthcare Data Management

Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820

The Requirements for Universal Master Data Management (MDM) White Paper

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

A Practical Guide to Legacy Application Retirement

Multi-core and Linux* Kernel

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

Low-latency market data delivery to seize competitive advantage. WebSphere Front Office for Financial Markets: Fast, scalable access to market data.

8Gb Fibre Channel Adapter of Choice in Microsoft Hyper-V Environments

Accelerating Microsoft Exchange Servers with I/O Caching

An Oracle Benchmarking Study February Oracle Insurance Insbridge Enterprise Rating: Performance Assessment

Nexenta Performance Scaling for Speed and Cost

VMWARE WHITE PAPER 1

PARALLELS CLOUD SERVER

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

White Paper. Recording Server Virtualization

IOmark- VDI. Nimbus Data Gemini Test Report: VDI a Test Report Date: 6, September

How To Create A Healthcare Data Management For Providers Solution From An Informatica Data Management Solution

Your Old Stack is Slowing You Down. Ajay Patel, Vice President, Fusion Middleware

Big Data and Natural Language: Extracting Insight From Text

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

Top Five Reasons Not to Master Your Data in SAP ERP. White Paper

Vormetric and SanDisk : Encryption-at-Rest for Active Data Sets

The Benefits of POWER7+ and PowerVM over Intel and an x86 Hypervisor

StreamBase White Paper StreamBase versus Native Threading. By Richard Tibbetts Co-Founder and Chief Technology Officer, StreamBase Systems, Inc.

We look beyond IT. Cloud Offerings

Technical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

White Paper. An Introduction to Informatica s Approach to Enterprise Architecture and the Business Transformation Toolkit

Understanding Data Locality in VMware Virtual SAN

Introduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution

What Your CEO Should Know About Master Data Management

HP ProLiant DL380 G5 takes #1 2P performance spot on Siebel CRM Release 8.0 Benchmark Industry Applications running Windows

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

Mission-Critical Java. An Oracle White Paper Updated October 2008

Running Oracle s PeopleSoft Human Capital Management on Oracle SuperCluster T5-8 O R A C L E W H I T E P A P E R L A S T U P D A T E D J U N E

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications.

Enterprise Data Integration

Low Latency Market Data and Ticker Plant Technology. SpryWare.

Benchmarking the OpenCloud SIP Application Server on Intel -Based Modular Communications Platforms

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure

PCI Express High Speed Networks. Complete Solution for High Speed Networking

Salesforce.com to SAP Integration

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April Page 1 of 12

A High-Performance Storage and Ultra-High-Speed File Transfer Solution

An Oracle White Paper September Advanced Java Diagnostics and Monitoring Without Performance Overhead

Improving Grid Processing Efficiency through Compute-Data Confluence

Achieving a High-Performance Virtual Network Infrastructure with PLUMgrid IO Visor & Mellanox ConnectX -3 Pro

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

Intel DPDK Boosts Server Appliance Performance White Paper

An Oracle White Paper October Oracle Data Integrator 12c New Features Overview

Replicating Salesforce.com Data for Operational Reporting and Compliance WHITE PAPER

Adobe LiveCycle Data Services 3 Performance Brief

Informatica Data Director Performance

Express5800 Scalable Enterprise Server Reference Architecture. For NEC PCIe SSD Appliance for Microsoft SQL Server

Interoperability Testing and iwarp Performance. Whitepaper

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

White Paper. Data Quality: Improving the Value of Your Data

Enabling Technologies for Distributed and Cloud Computing

Can High-Performance Interconnects Benefit Memcached and Hadoop?

SQL Server Business Intelligence on HP ProLiant DL785 Server

Centrata IT Management Suite 3.0

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

White Paper. Successful Legacy Systems Modernization for the Insurance Industry

An Oracle White Paper August Oracle VM 3: Server Pool Deployment Planning Considerations for Scalability and Availability

Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003

Cloud Computing through Virtualization and HPC technologies

The Informatica Platform for Data Driven Healthcare

Transcription:

White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance

This document contains Confidential, Proprietary and Trade Secret Information ( Confidential Information ) of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica. While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice. The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product as well as the timing of any such release or upgrade is at the sole discretion of Informatica. Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700. This edition published June 2013

White Paper Table of Contents Executive Summary... 2 SMX Transport Description and Electronic Trading Use Cases.... 3 SMX Performance Benchmark... 4 Purpose and Performance Test Conditions...4 Performance Test Methodology...5 Performance Test Results....6 Conclusion... 11 Appendix A: Numeric Tables of Test Results.... 12 Appendix B: Technology Stack Test Configturations... 15 Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 1

Executive Summary Informatica Ultra Messaging is software that significantly improves communication for electronic trading applications. Its high performance and architectural innovation have advanced business strategies and trading profitability throughout the capital markets industry. Furthering its position as a leader, in May 2013, Informatica Ultra Messaging (formerly 29West) released a new shared-memory communication transport: Shared Memory Acceleration (SMX). SMX is optimized for high-speed electronic trading applications requiring the absolute lowest message latency and highest throughput. It breaks the 100-nanosecond latency barrier and achieves ultra-high throughput that scales linearly with increased message size and additional consumers. This white paper begins with a summary of the newly released SMX shared-memory transport and its most common electronic trading use cases. It then describes the purpose, conditions, and methodology of benchmark tests conducted by Ultra Messaging engineers. Full test results of latency and throughput with C, Java, and C# applications are then reported in graphs and numeric tables, demonstrating the superior streaming messaging performance of SMX. The results show dramatic improvement in performance over the previous Ultra Messaging shared-memory benchmark completed in May 2010 specifically, a 16.5x decrease in latency and an 11x increase in aggregate throughput for like-kind test parameters. With these tests, Informatica Ultra Messaging has demonstrated: Lowest latency messaging in the industry, as low as 39 nanoseconds from sender to receiver Sub-100-nanosecond latency across C, Java, and C# in hyperthreading scenarios Aggregate throughput to multiple receivers of 320 million messages per second Linear scalability across multiple CPU cores for both latency and throughput The Informatica Ultra Messaging SMX benchmark sets a new standard, showing that Ultra Messaging is the fastest and highest performing messaging software in the industry. The delivery of SMX as a core part of the Ultra Messaging software product family, along with the conducting of this benchmark, is a continuation of the significant innovation Informatica has shown in the use of sharedmemory transports for inter-process communication (IPC) within the same physical server. These transports take advantage of parallel computing architectures supported by multi-core CPUs to accelerate the transfer of data from application to application. They also further enable the trading strategy of combining market data feed handlers with trading algorithms on the same host in order to respond to market changes as fast as possible. Additionally, due to its unprecedented performance, SMX is the first shared-memory transport that can effectively be used for inter-thread communication (ITC) as well as IPC, and it provides a superior level of performance natively for all three major application development languages: C, Java, and C#. 2

SMX Transport Description and Electronic Trading Use Cases SMX is new to the family of Informatica Ultra Messaging transports that trading firm customers can use for their application deployments. It is a complete redesign that supplements the existing IPC transport (LBT-IPC) and enables the software to take full advantage of the design and capabilities of modern x86 64-bit multi-core architecture CPUs. (SMX is only supported on 64-bit processors and applications.) SMX is a zero-copy transport that enables direct reads and writes into the same, shared-memory space without contention and synchronization overhead. It supports single-threaded (lock-free), single-writer, and multiplereader patterns with zero-copy APIs and reliable streaming (receiver-paced) delivery. SMX supports fully native C, Java, and C# implementations, is designed for many-core machines, and requires a dedicated CPU core for each SMX receiver thread (just as the existing LBT-IPC transport does). In the Informatica Ultra Messaging family of transport protocols, SMX is the core low-latency transport. As shown in the benchmark tests that follow, it delivers sub-100-nanosecond latencies and extremely high aggregate throughput rates. For these reasons, Informatica believes SMX will be of particular interest in trading firm strategies where having the fastest and highest performing platform is critical for profitability. These strategies include: High-frequency trading employed by proprietary trading firms and hedge funds Market makers on either side of a trade who commit to streaming executable prices that are posted electronically and can change instantaneously Arbitrage strategies that rely on speed to exploit market price discrepancies Algorithmic trading that relies on new market price updates being rapidly streamed from exchanges or consolidated market data feeds as quickly as possible Smart order routing that relies on speed to send orders as quickly as possible to liquidity venues that offer the best prices Due to its unprecedented performance, Informatica Ultra Messaging SMX also changes the game for how messaging software can be utilized. Traditionally, messaging software has been used for communication between different application processes. Although any shared-memory transport can theoretically be used also for communication between threads in the same application (inter-thread communication or ITC), developers have historically not considered messaging as a tool for passing data between threads. This is because of the relatively high latency of messaging transports compared to traditional data passing techniques such as sharing data structures across threads or putting data on internal queues to be processed by other threads. But Informatica Ultra Messaging SMX has shifted this paradigm, becoming the first messaging transport to offer developers a compelling solution for both ITC and IPC. This closes the latency gap between processes and threads and has the potential to significantly reshape application and system design. Furthermore, SMX is the only shared-memory transport to deliver this high level of performance natively and consistently for all three major application development languages C, Java, and C# with interoperability among them. This is important first because high-performance systems are often developed in multiple languages (e.g., market data feed handlers written in C; algo engines written in Java). It s significant also because the high performance and seamless interoperability that SMX provides across all three languages allow developers to use the best features of each language without sacrificing performance. Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 3

SMX Performance Benchmark Purpose and Performance Test Conditions The purpose of this SMX performance benchmark is to demonstrate in a reproducible and transparent way the extremely low-messaging latency and high-messaging throughput achievable using the new shared-memory SMX transport on a single server. The benchmark also demonstrates the advantages of hyperthreading in Intel processors, which have improved tremendously in recent years, and how SMX takes full advantage of this capability. Three tests were performed for each of the three major application development languages C, Java, and C#: two tests measured average latency from one sender to one receiver across a range of message sizes; one test measured aggregate throughput (messages per second) from one sender to an increasing number of receivers. Average latency (nanoseconds) relative to message size: core to hyperthread Range of latencies is from 39 nanoseconds for 16-byte message sizes in C to 135 nanoseconds for 512-byte message sizes in C# Range of R2 coefficient of determination values (indicating degree of linear regression fit to data) is 0.9613 for Java to 0.9974 for C# (a value of 1 equals perfect linear fit) Average latency (nanoseconds) relative to message size: core to sibling core Range of latencies is from 103 nanoseconds for 16-byte message sizes in C to 173 nanoseconds for 512-byte message sizes in Java Range of R2 values is 0.9761 for Java to 0.9916 for C Aggregate throughput (millions of messages/second): 1 sender to 1-19 receivers (C & Java); 1 sender to 1-7 receivers (C#) Highest throughput for C (with batched message delivery) is 320 million messages/second at 16 receivers Highest throughput for Java (with batched message delivery) is 286.1 million messages/second at 18 receivers Highest throughput for C# is 137.3 million messages/second at 7 receivers (identical results for batched and non-batched message delivery) In these tests, senders and receivers were mapped to individual logical cores (meaning a single hyperthread within a given core). Hence, the average latency measurements for core-to-hyperthread scenarios represent SMX communication between two different application threads in a single core. Similarly, the average latency for core-to-sibling-core scenarios reflects SMX communication between two different application threads in separate cores. For technology stack configurations, see Appendix B. 4

Performance Test Methodology Testing was performed to determine the maximum performance of the technology stack for two simple benchmarks: lowest latency (within a single core and across multiple cores on a single CPU socket) and highest aggregate throughput from one sender to multiple receivers on a single CPU (with and without batched message delivery). Latency For all three development languages, the latency tests were measured as the round-trip time (RTT) divided by 2 of sending messages from a single sending process to a single receiving process and back again (Ping-Pong Test). Latencies are shown in nanoseconds and were calculated as the average latency relative to message size for both hyperthreading communication (between two threads in a single core) and communication between a thread in one core and a thread in a sibling core on the same socket/cpu. Latency measurements were performed using the Ultra Messaging lbmlatping and lbmlatpong applications. The ping application generates messages and sends them to the pong application, which simply reflects the messages it receives back to the sending application. The ping side uses the same settings as the pong side to ensure that both sides of the round-trip are equal. For the C version of the test applications, two time stamps were taken for each message, one from just before sending a message from the ping application and another immediately upon receiving that message back from the pong application. For the Java and C# versions of the test applications, a time source with a resolution high enough to measure individual message latencies in nanoseconds was not available natively from the run time, so time stamps were taken before sending the first message in a small group of messages and upon receiving the last reflected message in the group. The average latency for messages in each group was then computed by dividing the total time per group by the number of messages in the group. These round-trip times were then divided by 2 to get the approximate oneway latency. Throughput is measured as a rate of millions of messages delivered per second and shown in aggregate as the number of receivers scaled from 1 to 19. This represents an application, which is on 1 thread in a single core, publishing messages that are subscribed to by the other 19 threads on a 10-core CPU socket (see Appendix B for hardware configuration). The throughput graphs show how the aggregate messaging rate scales linearly for 16-byte messages (with and without batched message delivery) as the additional receivers incrementally subscribe to the message stream. Throughput measurements were performed using the lbmlatsrc and lbmlatrcv applications. lbmlatsrc generates a series of messages, as fast as it can, and publishes them to an SMX shared memory segment. Meanwhile, lbmlatrcv reads messages from the shared memory segment, also going as fast as it can. The purpose of throughput testing with a shared-memory transport is essentially to demonstrate a 1:1 linear growth in message rate as CPU cores are added. This amounts to measuring the throughput of a 1:1 fan-out, and then adding receiving applications until the test system runs out of cores, as well as measuring the relative aggregate increase in messages delivered. Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 5

Performance Test Results The test results are extremely impressive. They show an order of magnitude of improvement over the previous Ultra Messaging shared memory performance benchmark completed in May of 2010 specifically a 16.5x decrease in latency and an 11x increase in aggregate throughput for like-kind test parameters. The following figures supply graphs and descriptions of each test result by development language. For the numeric data of these test results, see Appendix A. Results for C Latency and Throughput C Average Latency from Core to Hyperthread 90 80 Ping-Pong RTT/2 Latency R 2 =0.9899 Latency (nanoseconds) 70 60 50 40 30 20 10 0 1632 64 128 256 512 Figure 1. Average latency for messaging on C applications scales linearly from 39 nanoseconds for 16-byte message sizes to 81 nanoseconds for 512-byte message sizes for hyperthreaded communication between processes on a single core. C Average Latency from Core to Sibling Core Latency (nanoseconds) 160 140 120 100 80 60 40 Ping-Pong RTT/2 Latency R 2 =0.9916 20 0 1632 64 128 256 512 6 Figure 2. Average latency for messaging on C applications scales linearly from 103 nanoseconds for 16-byte message sizes to 135 nanoseconds for 512-byte message sizes for communication between processes on different cores on a single CPU socket.

C Receiver Fan-Out, 16-Byte Messages Throughput (millions of messages/sec) 350 300 250 200 150 100 50 0 w/batching 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Number of Receivers Figure 3. Highest aggregate throughput (with batched message delivery) for C application receiver fan-out is achieved at 16 subscribing receivers with a rate of almost 320 million messages per second. Results for Java Latency and Throughput Java Average Latency from Core to Hyperthread Latency (nanoseconds) 120 100 80 60 40 20 0 Ping-Pong RTT/2 Latency R 2 =0.9613 1632 64 128 256 512 Figure 4. Average latency for messaging on Java applications scales linearly from 87 nanoseconds for 16-byte message sizes to 106 nanoseconds for 512-byte message sizes for hyperthreaded communication between processes on a single core. Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 7

Java Average Latency from Core to Sibling Core 200 180 Ping-Pong RTT/2 Latency R 2 =0.9761 Latency (nanoseconds) 160 140 120 100 80 60 40 20 0 1632 64 128 256 512 Figure 5. Average latency for messaging on Java applications scales linearly from 108 nanoseconds for 16- byte message sizes to 173 nanoseconds for 512-byte message sizes for communication between processes on different cores on a single CPU socket. Java Receiver Fan-Out, 16 Byte Messages Throughput (millions of messages/sec) 350 300 250 200 150 100 50 0 w/batching 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Number of Receivers Figure 6. Highest aggregate throughput (with batched message delivery) for Java application receiver fan-out is achieved at 18 subscribing receivers with a rate of 286.1 million messages per second. 8

Results for C# Latency and Throughput C# Average Latency from Core to Hyperthread Latency (nanoseconds) 160 140 120 100 80 60 40 Ping-Pong RTT/2 Latency R 2 =0.9974 20 0 1632 64 128 256 512 Figure 7. Average latency for messaging on C# applications scales linearly from 86 nanoseconds for 16-byte message sizes to 135 nanoseconds for 512-byte message sizes for hyperthreaded communication between processes on a single core. C# Average Latency from Core to Sibling Core Latency (nanoseconds) 180 160 140 120 100 80 60 40 Ping-Pong RTT/2 Latency R 2 =0.9829 20 0 1632 64 128 256 512 Figure 8. Average latency for messaging on C# applications scales linearly from 114 nanoseconds for 16- byte message sizes to 167 nanoseconds for 512-byte message sizes for communication between processes on different cores on a single CPU socket. Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 9

C# Receiver Fan-Out, 16-Byte Messages Throughput (millions of messages/sec) 160 140 120 100 80 60 40 20 0 w/batching 1 2 3 4 5 6 7 Number of Receivers Figure 9. Highest aggregate throughput for C# application receiver fan-out is achieved at 7 subscribing receivers with a rate of 137.3 million messages per second (same results achieved with and without batched message delivery). 10

Conclusion Informatica Ultra Messaging software has advanced trading strategies throughout the capital markets industry by significantly improving communication for electronic trading applications. With the release of the Ultra Messaging SMX shared-memory transport and this performance benchmark report, Informatica further strengthens its commitment to providing customers with the highest performing messaging software. As demonstrated in performance test data, SMX has broken the 100-nanosecond latency barrier on commodity hardware and proven to be the fastest messaging software available in the industry with latencies as low as 39 nanoseconds. Based on this benchmark, Informatica SMX provides a compelling solution to increase profitability when engaged in trading strategies that require the absolute lowest latency and highest throughput. SMX also increases developer productivity by enabling the use of commercial messaging for both IPC and ITC communication, and it facilitates seamless interoperability among C, Java, and C# applications in shared-memory environments. ABOUT INFORMATICA ULTRA MESSAGING Informatica Ultra Messaging (formerly 29West) is the highest performance, lowest latency, and most functional and efficient messaging software available. With more than 200 front-office electronic trading deployments globally, it is the de facto capital markets industry standard for high-performance messaging. Of the top 10 investment banks, 9 rely on it; in addition, 8 of the top 10 FX dealers use it, as do proprietary trading firms, hedge funds, and leading electronic exchanges. After switching to the Ultra Messaging API, customers have reported up to 10x performance improvements in latency reduction and message throughput increase, dramatic reductions in infrastructure costs, and uninterrupted business operations. ABOUT INFORMATICA Informatica Corporation (NASDAQ: INFA) is the world s number one independent provider of data integration software. Organizations around the world rely on Informatica for maximizing return on data to drive their top business imperatives. Worldwide, over 4,630 enterprises depend on Informatica to fully leverage their information assets residing onpremise, in the Cloud and across social networks. Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 11

Appendix A: Numeric Tables of Test Results Figure 1. C Average Latency from Core to Hyperthread Average Latency (nanoseconds) 16 39 32 39 64 42 128 48 256 63 512 81 Figure 2. C Average Latency from Core to Sibling Core Average Latency (nanoseconds) 16 103 32 104 64 104 128 111 256 120 512 135 Figure 3. C Receiver Fan-Out, 16-Byte Messages 12 Number of Receivers (no batching) (millions of messages/sec) 1 42.02 43.80 2 32.86 56.94 3 58.90 85.75 4 53.21 94.90 5 66.93 121.73 6 72.41 136.51 7 87.35 165.83 8 92.29 176.13 9 105.68 216.57 10 107.58 210.90 11 115.24 248.72 12 113.47 256.50 13 122.36 284.57 14 117.61 292.09 15 129.22 288.70 16 132.23 319.86 17 135.11 299.43 18 132.32 307.68 19 133.92 305.34 (with batching) (millions of messages/sec)

Figure 4. Java Average Latency from Core to Hyperthread 16 87 32 89 64 91 128 93 256 100 512 106 Average Latency (nanoseconds) Figure 5. Java Average Latency from Core to Sibling Core 16 87 32 89 64 91 128 93 256 100 512 106 Average Latency (nanoseconds) Figure 6. Java Receiver Fan-Out, 16-Byte Messages Number of Receivers (no batching) (millions of messages/sec) 1 15.50 15.65 2 27.50 31.11 3 46.50 45.78 4 53.17 62.76 5 77.59 75.57 6 76.79 93.05 7 112.18 111.48 8 99.98 125.64 9 140.38 136.61 10 122.30 155.61 11 163.00 167.08 12 135.16 184.31 13 164.71 201.83 14 145.26 213.83 15 195.63 231.43 16 159.90 245.92 17 176.67 263.15 18 164.95 286.10 19 194.69 285.12 (with batching) (millions of messages/sec) Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 13

Figure 7. C# Average Latency from Core to Hyperthread 16 86 32 88 64 93 128 99 256 110 512 135 Average Latency (nanoseconds) Figure 8. C# Average Latency from Core to Sibling Core 16 114 32 119 64 119 128 124 256 145 512 167 Average Latency (nanoseconds) Figure 9. C# Receiver Fan-Out, 16-Byte Messages Number of Receivers (no batching) (millions of messages/sec) 1 38.46 38.46 2 76.92 76.92 3 115.38 96.77 4 78.43 78.43 5 98.04 98.04 6 117.65 117.65 7 137.25 137.25 (with batching) (millions of messages/sec) 14

Appendix B: Technology Stack Test Configturations Hardware and Operating System Configurations C and Java Latency Tests The test system consisted of one machine with one socket, four cores, and eight hyperthreads. CPU/Processor Intel Xeon E5-1620 @ 3.60GHz RAM 8 GB O/S CentOS 6.4 C and Java Throughput Tests The test system consisted of one machine with two sockets, each with a 10-core CPU package with hyperthreading (total of 20 threads in each CPU package). CPU/Processor Intel Ivy Bridge @ 2.80GHz, 10-core RAM 32 GB O/S RHEL 6.2 C# Latency and Throughput Tests The test system consisted of one machine with one socket, four cores, and eight hyperthreads. CPU/Processor RAM O/S Intel Xeon E5-1620 @ 3.60GHz 8 GB Windows 7 Professional SP1 BIOS Configurations All Tests and Machines C3-Report C6-Report Hyperthreading Turbo Mode Intel SpeedStep Enabled Enabled Enabled Enabled Enabled Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 15

Software Versions and Configurations Informatica Ultra Messaging Version 6.1 of Ultra Messaging Streaming Edition All LBT-SMX transport-specific configurations set to their default values UM Test Applications: lbmlatping/lbmlatpong (latency); lbmlatsrc/lbmlatrcv (throughput) Java Version 1.7.0-21 Java SE Runtime Environment (build 1.7.0-21-b11) Java HotSpot 64-bit Server VM (build 23.21-b01; mixed mode) 16

Informatica Ultra Messaging SMX Shared-Memory Transport: Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance 17

Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA Phone: 650.385.5000 Fax: 650.385.5500 Toll-free in the US: 1.800.653.3871 informatica.com linkedin.com/company/informatica twitter.com/informaticacorp 2013 Informatica Corporation. All rights reserved. Informatica and Put potential to work are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks. IN09_0613_02476