StreamBase White Paper StreamBase versus Native Threading. By Richard Tibbetts Co-Founder and Chief Technology Officer, StreamBase Systems, Inc.



Similar documents
Performance & Scalability Characterization. By Richard Tibbetts Co-Founder and Chief Architect, StreamBase Systems, Inc.

Case Study: BlueCrest Capital Management Ltd.

Informatica Ultra Messaging SMX Shared-Memory Transport

SOLUTION BRIEF. TIBCO StreamBase for Algorithmic Trading

StreamBase White Paper Real-Time Profit & Loss

FX Trading: A Changing Landscape

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

SOLUTION BRIEF. TIBCO StreamBase for Foreign Exchange

An examination of the dual-core capability of the new HP xw4300 Workstation

Improving Grid Processing Efficiency through Compute-Data Confluence

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0

Netezza and Business Analytics Synergy

RevoScaleR Speed and Scalability

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

The IBM Cognos Platform for Enterprise Business Intelligence

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Vectorwise 3.0 Fast Answers from Hadoop. Technical white paper

Esri ArcGIS Server 10 for VMware Infrastructure

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Understanding the Benefits of IBM SPSS Statistics Server

Parallel Computing with MATLAB

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Solido Spam Filter Technology

System Requirements Table of contents

Cisco Data Preparation

TIBCO StreamBase High Availability Deploy Mission-Critical TIBCO StreamBase Applications in a Fault Tolerant Configuration

INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism

PARALLELS CLOUD SERVER

Improved LS-DYNA Performance on Sun Servers

Integrated Grid Solutions. and Greenplum

Zend and IBM: Bringing the power of PHP applications to the enterprise

Virtualizing SQL Server 2008 Using EMC VNX Series and Microsoft Windows Server 2008 R2 Hyper-V. Reference Architecture

TIBCO Live Datamart: Push-Based Real-Time Analytics

Titolo del paragrafo. Titolo del documento - Sottotitolo documento The Benefits of Pushing Real-Time Market Data via a Web Infrastructure

Jitterbit Technical Overview : Salesforce

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

Scalability and Performance Report - Analyzer 2007

4D and SQL Server: Powerful Flexibility

Accelerating Business Intelligence with Large-Scale System Memory

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

Introduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution

Cisco Unified Computing System and EMC VNX5300 Unified Storage Platform

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

on an system with an infinite number of processors. Calculate the speedup of

DEPLOYMENT ARCHITECTURE FOR JAVA ENVIRONMENTS

Data Sheet VISUAL COBOL WHAT S NEW? COBOL JVM. Java Application Servers. Web Tools Platform PERFORMANCE. Web Services and JSP Tutorials

Numerix CrossAsset XL and Windows HPC Server 2008 R2

Multicore Parallel Computing with OpenMP

Business white paper. HP Process Automation. Version 7.0. Server performance

Comparing the Network Performance of Windows File Sharing Environments

Lua as a business logic language in high load application. Ilya Martynov ilya@iponweb.net CTO at IPONWEB

HP ProLiant BL460c takes #1 performance on Siebel CRM Release 8.0 Benchmark Industry Applications running Linux, Oracle

ICONICS Choosing the Correct Edition of MS SQL Server

White Paper. Recording Server Virtualization

CHAPTER FIVE RESULT ANALYSIS

Contents. 2. cttctx Performance Test Utility Server Side Plug-In Index All Rights Reserved.

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

Oracle Utilities Mobile Workforce Management Benchmark

Going to the wire: The next generation financial risk management platform

8Gb Fibre Channel Adapter of Choice in Microsoft Hyper-V Environments

GPUs for Scientific Computing

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Databricks. A Primer

MAGENTO HOSTING Progressive Server Performance Improvements

Drivers to support the growing business data demand for Performance Management solutions and BI Analytics

APPLICATION NOTE. Alcatel-Lucent Virtualized Service Router - Route Reflector. Creating a new benchmark for performance and scalability

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

The 8Gb Fibre Channel Adapter of Choice in Oracle Environments

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

MakeMyTrip CUSTOMER SUCCESS STORY

An Oracle Benchmarking Study February Oracle Insurance Insbridge Enterprise Rating: Performance Assessment

Algorithmic Trading with MATLAB Martin Demel, Application Engineer

SQL Server 2005 Features Comparison

Performance Analysis of Mixed Distributed Filesystem Workloads

Enabling Technologies for Distributed and Cloud Computing

The 21st Century Time-series Database. Making real-time decisions on billions of current data events and trillions of historical records.

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.

Clustering Billions of Data Points Using GPUs

SQL Server 2012 Parallel Data Warehouse. Solution Brief

Dell and JBoss just work Inventory Management Clustering System on JBoss Enterprise Middleware

Low-latency market data delivery to seize competitive advantage. WebSphere Front Office for Financial Markets: Fast, scalable access to market data.

Multi-Threading Performance on Commodity Multi-Core Processors

Four Keys to Successful Multicore Optimization for Machine Vision. White Paper

See the Big Picture. Make Better Decisions. The Armanta Technology Advantage. Technology Whitepaper

CloudHarmony Performance Benchmark: Select High-Performing Public Cloud to Increase Economic Benefits

White Paper. Advantages of IBM Notes and Domino 9 Social Edition for Midsize Businesses. 89 Fifth Avenue, 7th Floor. New York, NY 10003

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications.

FX Trading: The Next Generation FX Trading and Technology Trends in 2010

2009 Oracle Corporation 1

Multi-core architectures. Jernej Barbic , Spring 2007 May 3, 2007

Transcription:

StreamBase White Paper StreamBase versus Native Threading By Richard Tibbetts Co-Founder and Chief Technology Officer, StreamBase Systems, Inc.

Motivation for Benchmarking StreamBase Many StreamBase customers are interested on the relative performance of StreamBase and traditional implementation technologies such as C++ or Java. Some have concerns that the graphical development environment or use of Java will have a negative performance impact. The following benchmark demonstrates how StreamBase s flexible and powerful approach to threading outperforms native threading in both C++ and Java on the kind of analytic workloads found in real-time capital markets applications. Problem Statement The benchmark is an implementation of parallelized options pricing in C++, Java, and StreamBase. Given an algorithm specified in Visual Basic, implementations are to be produced in the three systems, and executed over one million data points. The threading and state management is as follows: Thread 1 runs the analytical version of Black Scholes and produces output Thread 2 runs the Quasi Monte Carlo version of Black Scholes and produces output Thread 3 calculates the average of the two, and also calculates a weighted average of this run, and the previous run (so we have a State ) The result of the benchmark is the elapsed time to process one million calculations. The system is executed on a 64-bit quad core Linux server. Implementation Details All implementations derive from the Visual Basic code delivered in the Excel sheet. Pure C++ The C++ implementation of Black Scholes and QMC Black Scholes is a mechanical translation of the Visual Basic code. The threading is implemented using the boost::thread library from http://www.boost.org, a standard portable C++ threading library. Threads 1 and 2 run independently, synchronizing their work via mutex, while thread 3 is the main thread responsible for calculations as well as overall timing. Pure Java The Java implementation of Black Scholes and QMC Black Scholes is a nearly mechanical translation of the Visual Basic code. The only change was to remove the array allocation in the QMC implementation, because the array was unused. Threading is implemented using standard Java threads, in a similar fashion to the C++ implementation. 2

The StreamBase Implementation StreamBase provides a built-in implementation of Black Scholes, but not Quasi Monte Carlo Black Scholes. Best practice in StreamBase development is to use plugins for this kind of purely mathematical operation, and to reserve StreamBase for the threading and state management. Plugins are either custom developed, or drawn from standard libraries like JQuantlib. The StreamBase implementation uses the Java or C++ implementation from above. The built in Black Scholes is not used, for simplicity of results analysis. Execution Platform The execution platform is a Dell SC1430 with 4 Xeon 5110 cores running at 1.6GHz with 4GB of RAM. The operating system is 64 bit RHEL 5. Benchmark Results The raw results are reflected in the following table. We show the results for full native C++ and Java implementations, as well as StreamBase implementations using either the C++ or the Java plugin functions. All implementations implement the parallelism as described in the specification. Threading Functions Evaluations Elapsed Time Evals/sec Native C++ C++ 1,000,000 32 seconds 31,000 Native Java Java 1,000,000 38 seconds 26,000 StreamBase C++ 1,000,000 41 seconds 25,000 StreamBase Java 1,000,000 27 seconds 37,000 3

Problem Statement Based on analysis of the application performance, the Quasi Monte Carlo (QMC) function is far and away the hotspot in all of these applications. We used the StreamBase functionality for Data Parallelism to dedicate 4 concurrent threads to the QMC function. This resulted in a substantial speedup: Treading Functions Evaluations Elapsed Time Evals/sec Native C++ C++ 1,000,000 32 seconds 31,000 Native Java Java 1,000,000 38 seconds 26,000 StreamBase Data Parallel Java 1,000,000 11 seconds 91,000 StreamBase Data Parallel C++ 1,000,000 24 seconds 42,000 4

Conclusion This micro-benchmark measures the efficiency of numerical function implementations as well as the efficiency of multi-threaded work sharing. By comparing StreamBase to native threading implementations, we can draw conclusions about both the performance of the platform and the ease of implementation. The first StreamBase implementation is 20% faster than the Native C++ implementation, and 40% faster than the Native Java implementation. This is most likely due to the superior efficiency of stream-oriented inter-thread communication, and the resulting improved concurrency. StreamBase stands out not only for the raw performance demonstrated here, but also for the ease with which alternative parallelism structures can be developed and profiled. Having examined the application with a profiler, we discover and demonstrate that the optimal threading structure includes four data-parallel threads dedicated to Quasi Monte Carlo. This implementation is nearly 200% faster than the fastest non-streambase threaded implementation. Five major features explain the performance of StreamBase: StreamSQL more naturally expresses event-based applications StreamBase compiled executor optimizes whole programs Direct bytecode generation delivers superior single-core performance Dataflow optimized multithreading drives multi-core performance Development tools support identification and optimization of bottlenecks While this test effectively illustrates one style of event processing application (a highly parallelizable and partitionable computation), it sheds light on only one aspect of CEP product performance. In fact, for sheer bulk computation, single threaded systems are strong contenders, as are other parallel processing systems such as compute clusters. Other use cases in FX aggregation and trading applications more accurately illustrate the performance superiority of a multi-threaded CEP engine, with support for a variety of message formats, management of large state tables, long running aggregations, and multiple kinds of parallelism. For these use cases, the choice of the most flexible platform is going to deliver the best performance on the widest range of applications, in their first deployment and as scale up is required. 5

About Richard Tibbetts Richard Tibbetts is co-founder and Chief Technology Officer at StreamBase Systems. Richard provides technical leadership for the company and leads architecture design for StreamBase s Event Processing Platform. Richard is also responsible for furthering new StreamBase capabilities such as StreamBase s white-box application frameworks, for example, the Smart Order Routing Framework. As CTO, Richard directs the next-generation of StreamSQL, the event programming language developed by Richard and the StreamBase team, which applies the benefits of SQL for stored data to real-time transitory data. About StreamBase StreamBase Systems, Inc, a leader in high-performance Complex Event Processing (CEP), provides software for rapidly building systems that analyze and act on real-time streaming data for instantaneous decision-making. StreamBase s Event Processing Platform combines a rapid application development environment, an ultra low-latency high-throughput event server, and the broadest connectivity to real-time and historical data. Leading investment banks, hedge funds, and government agencies use StreamBase to power mission-critical applications that increase revenue, lower costs, and reduce risk. Applications in Capital Markets include FX Aggregation and Pricing, Smart Order Routing, Market Data Management and Algorithmic Trading. StreamBase customers include CME Group, ConvergEx Group, RBC Capital Markets, PhaseCapital and BlueCrest Capital Management. The company is headquartered in Lexington, Massachusetts with offices in New York, Washington D.C. and London. For more information, visit www.streambase.com. Corporate Headquarters New York Office Virginia Office European Headquarters 181 Spring Street 845 Third Avenue, 6th Floor 11921 Freedom Drive, Suite 550 60 Cannon Street Lexington, MA 02421 New York, NY 10022 Reston, VA 20190 London, EC4N 6NP +1 (866) 787-6227 +1 (866) 787-6227 +1 (443) 472-0767 +44 (0) 20 7002 1095 www.streambase.com (Website) @streambase (Twitter) streambase.typepad.com (Blog) StreamBase Systems, Inc. All rights reserved. StreamBase and StreamBase Studio are trademarks of StreamBase Systems, Inc. All other trademarks are the property of their respective owners. 6