Improving Grid Processing Efficiency through Compute-Data Confluence




Solution Brief: GemFire*, Symphony*, Intel Xeon processor

A benchmark report featuring GemStone Systems, Intel Corporation and Platform Computing

1. Overview

In competitive industries such as capital markets, greater computing power can enhance the efficiency and profitability of critical functions such as trading and risk management. For instance, a leading investment bank saw a 10x increase in trading volumes and a 600% improvement in risk computation times as a result of improved data architectures and expanded compute resources. Such firms require greater performance, datacenter efficiency, workload management and scalability to make the most of the capabilities that grid computing delivers. Firms are investing in computing and data management resources that rapidly return profitability to the business and provide a competitive advantage.

New processor and computing technologies have combined to make grid computing and virtualization economical, allowing enterprises to reduce physical IT resources and costs, improve manageability and scalability, and improve responsiveness to customers, employees and partners.

Efficient use of Grid infrastructure relies on two critical factors: provisioning of sufficient CPU/compute power, and reliable, low-latency access to data. Only when these factors are in balance are both resources optimally utilized. Data must be delivered in a timely fashion for the compute engines to stay busy, and raw data has little value without resources available to compute upon it.

This benchmark report highlights the combined power of Platform's Symphony* (Compute Grid) and GemStone's GemFire* Enterprise Data Fabric (Data Grid) in a Grid computing environment. The benchmark tests were executed on Dual-Core Intel Xeon 64-bit processor-based hardware.
Symphony and GemFire together offer these benefits and leverage the power of the underlying hardware platform to deliver a confluence of Compute Grid and Data Grid. The combined use of these two technologies enables design patterns that were not previously possible. These patterns include:

1. Intelligent pre-placement of static data in a distributed in-memory data grid for instantaneous access by the compute nodes.
2. Caching and distribution of intermediate results in a workflow-style grid process, removing the need for a staging database, which is usually centralized, slow and not scalable under parallel loads.
3. An event-driven model in which Grid calculations are triggered automatically by data/event updates in real time. This eliminates the need for Grid job requests to package the necessary data with every task. In addition to delivering results to clients instantaneously, such an event-driven approach also ensures that the data propagated back to Grid clients or used for computations is as up to date and consistent as possible.
4. Data-aware routing: based on data placement strategies, the Grid scheduler can route each task to the appropriate node according to the data requirements of that particular task. This promotes data-compute affinity.
5. Loose coupling of interdependent Grid tasks through the Data Grid's listener model. This avoids expensive thread-blocking operations.
6. Automatic scheduling of data load tasks on distributed Grid nodes based on pre-defined policies.

All these advantages help tackle the typical latency, performance and scalability problems that impede grid deployments and growth. For the sake of simplicity, this benchmark report focuses only on the event-driven Grid calculation scenario (the third item on the list above) using Symphony and GemFire.
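The event-driven pattern (item 3 in the list above) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Region` class, the listener signature, the instrument identifiers and the `toy_value` payoff are hypothetical and are not the GemFire or Symphony APIs. The sketch only shows the shape of the idea: a market data update triggers revaluation of the affected instruments, and a client request simply reads precomputed results.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[str, float], None]


class Region:
    """Hypothetical in-memory key/value region with update listeners.

    NOT the GemFire API; it only mimics a data region that notifies
    subscribers whenever an entry is written.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, float] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, key: str, value: float) -> None:
        self._data[key] = value
        for listener in self._listeners:
            listener(key, value)

    def get(self, key: str) -> float:
        return self._data[key]

    def values(self) -> List[float]:
        return list(self._data.values())


def toy_value(spot: float, strike: float, is_call: bool) -> float:
    """Placeholder payoff (intrinsic value) standing in for a real pricing model."""
    return max(spot - strike, 0.0) if is_call else max(strike - spot, 0.0)


market_data = Region("market_data")
valuations = Region("instrument_valuations")

# Hypothetical mini-portfolio: (instrument id, underlying, strike, is_call).
instruments = [
    ("ABC-C-100", "ABC", 100.0, True),
    ("ABC-P-100", "ABC", 100.0, False),
    ("XYZ-C-50", "XYZ", 50.0, True),
]
by_underlying = defaultdict(list)
for inst in instruments:
    by_underlying[inst[1]].append(inst)


def on_price_event(symbol: str, spot: float) -> None:
    """Revalue only the instruments affected by the updated symbol."""
    for inst_id, _, strike, is_call in by_underlying[symbol]:
        valuations.put(inst_id, toy_value(spot, strike, is_call))


market_data.add_listener(on_price_event)


def portfolio_value() -> float:
    """Client request: read precomputed valuations, no recalculation."""
    return sum(valuations.values())


# Price events drive the calculations; the client only reads results.
market_data.put("ABC", 105.0)
market_data.put("XYZ", 48.0)
```

In this sketch the cost of a client request is independent of the pricing work, which is the property the event-driven scenario relies on.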

2. Benchmark Scenarios

For the purposes of this benchmark, a Microsoft* Excel-based derivatives (options) portfolio valuation application of the kind typically used in investment banks was chosen. The tests compared the following two scenarios:

Baseline scenario. Portfolio valuation with Compute Grid alone.
Optimized scenario. Portfolio valuation with Compute and Data Grid.

Portfolio description. 100 stocks, each with 72 puts and 72 calls, resulting in 14400 derivative instruments that are affected by market data updates. Market data inputs consist of price information for the 100 underlying stocks.

2.1 Physical Benchmark Environment

1. 32-node cluster with Dual-Core Intel Xeon Processor 5100/5000 Sequence (Woodcrest/Dempsey). Each processor has:
   Two execution cores
   Intelligent, shared 4MB L2 cache
   3.0 GHz clock speed
   8 GB of FBD-DDR2 RAM
2. Network: 1 Gigabit bandwidth.
3. Compute Grid:
   Symphony Master process: 1 node (4 cores)
   Compute engines: 28 nodes (total compute CPU power of 4 cores per node x 28 nodes = 112 cores)
4. Data Grid:
   2 GemFire* Cache Servers: 2 nodes (4 cores per node x 2 nodes = 8 cores)
   Data feeds and GemFire client caches: 1 node (4 cores)

Two test cases were run for each of these scenarios:

Test 1: Platform Symphony* was configured to use compute engines on 50 cores for portfolio calculations.
Test 2: Platform Symphony was configured to use compute engines on 100 cores for portfolio calculations.

In both test cases, 12 cores were used for satisfying client requests.

2.2 Baseline Scenario: Portfolio Valuation with Compute Grid alone

As shown in Figure 1, user requests for portfolio recalculations are sent from an Excel spreadsheet to the Symphony compute engine. Each client request also includes a market data wave, which is a collection of market data points for all stocks within the options portfolio. The Symphony engine schedules the portfolio calculations on the different compute nodes and, once the computations are complete and the results are ready, sends them back to the client. Portfolio valuation is entirely client driven in this scenario.

Figure 1: Portfolio calculation without a Data Grid
(Components: User Spreadsheet; Platform Symphony / Compute Grid.)
1. Spreadsheet user requests an updated portfolio valuation. The current market data snapshot is sent with each request.
2. Portfolio valuations are calculated and returned to the spreadsheet for display.
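The portfolio size and the baseline request shape described above can be checked with a short Python sketch. The stock and option counts come from the portfolio description; the instrument naming, the even round-robin split across cores, and the idea that each task dictionary carries the whole market data wave are assumptions made for illustration, since the report does not describe how Symphony actually partitions or packages work.

```python
from typing import Dict, List, Tuple

STOCKS = 100
PUTS_PER_STOCK = 72
CALLS_PER_STOCK = 72

Instrument = Tuple[str, str, int]  # (underlying, "put" or "call", index)


def build_portfolio() -> List[Instrument]:
    """Enumerate every option in the benchmark-sized portfolio."""
    portfolio: List[Instrument] = []
    for s in range(STOCKS):
        underlying = f"STOCK{s:03d}"  # hypothetical naming scheme
        for kind, count in (("put", PUTS_PER_STOCK), ("call", CALLS_PER_STOCK)):
            for i in range(count):
                portfolio.append((underlying, kind, i))
    return portfolio


def split_round_robin(items: List[Instrument], workers: int) -> List[List[Instrument]]:
    """Assumed partitioning: deal instruments out evenly, one per worker in turn."""
    return [items[w::workers] for w in range(workers)]


def baseline_tasks(
    portfolio: List[Instrument], wave: Dict[str, float], workers: int
) -> List[dict]:
    """Baseline scenario (assumed shape): each task ships with the market data wave."""
    return [
        {"instruments": chunk, "market_data": wave}
        for chunk in split_round_robin(portfolio, workers)
    ]


portfolio = build_portfolio()
wave = {f"STOCK{s:03d}": 100.0 for s in range(STOCKS)}  # placeholder prices
tasks_50 = baseline_tasks(portfolio, wave, 50)
tasks_100 = baseline_tasks(portfolio, wave, 100)
```

Under these assumptions, each of the 50 cores in Test 1 would handle 288 instruments and each of the 100 cores in Test 2 would handle 144, with the 100-price wave travelling alongside the work in the baseline scenario.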

2.3 Optimized Scenario: Portfolio Valuation with Compute and Data Grid

Figures 2.a and 2.b describe the workflow involved in this scenario.

Figure 2.a. Portfolio Valuation with a Data Grid: event-driven portfolio valuation in real time
(Components: Ticker Plant; Market Data Region in GemFire* Data Grid; Price Event GemFire Client Cache; Platform Symphony* / Compute Grid; Instrument Valuation Data Region in GemFire Data Grid.)
1. The Ticker Plant releases a wave of market data at random intervals.
2. Market data is updated.
3. For each market data event, the Price Event Client triggers a re-valuation of the instruments in the portfolio. The new market data is accessed from the Data Grid, and the interim results are kept in a staging results area of the Instrument Valuation Data Region in the Data Grid.
4. The new instrument valuations are updated in the Data Grid.

Figure 2.b. Client Request Processing with a Data Grid
(Components: User Spreadsheet; Platform Symphony* / Compute Grid; Instrument Valuation Region in GemFire* Data Grid.)
1. Spreadsheet user requests an updated portfolio valuation.
2. The Compute Grid reads from the Instrument Valuation Region the latest instrument valuations calculated as a result of the last market data wave.
3. The new portfolio valuations are returned to the spreadsheet for display.

The key architectural differences between this scenario and the baseline scenario are as follows:

1. Portfolio valuations in this scenario are performed by a Calculation Service that is triggered by real-time market data updates (event-driven) sent to a data region on the GemFire* Data Grid, not by client requests. To ensure consistency across the two scenarios, market data updates in this scenario are also released in waves (data updates are pushed for all stocks in the portfolio at once). Note: In real situations, market data updates are fine-grained and usually released for each individual instrument. In that case, portfolio calculations can be triggered selectively (i.e., run only for the instruments that have been affected) in response to those individual market data updates, and a more balanced CPU utilization profile can be achieved as a result of using the GemFire Data Grid.
2. Client requests to Platform Symphony* are satisfied by a Results Service. This service directly accesses a data region on the GemFire Data Grid, which holds the most up-to-date portfolio values for the client. Thus, client requests are satisfied almost instantaneously.

This scenario highlights that intelligent use of a Compute Grid and a Data Grid not only yields significant performance improvements (see the Benchmark Results section), but also offers a new event-driven design paradigm for running grid computations and ensuring that the most current data is ready and available to Grid clients.

3. Benchmark Results

Test 1. Platform Symphony* configured to use compute engines on 50 CPU cores for portfolio calculations

Metric                            Baseline (s)   Optimized (s)   Efficiency Increase   Net Benefit
End-to-end client response time   70.69          1.15            98.3%                 >61X
Portfolio calculation time        70.69          50.45           28.63%                >1.3X

Test 2. Platform Symphony* configured to use compute engines on 100 CPU cores for portfolio calculations

Metric                            Baseline (s)   Optimized (s)   Efficiency Increase   Net Benefit
End-to-end client response time   36.6           0.99            97.3%                 >36.9X
Portfolio calculation time        36.6           27.11           26.0%                 >1.3X
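The derived columns in the results above can be recomputed from the measured times. The sketch below assumes that "Efficiency Increase" is the percentage reduction in elapsed time and that "Net Benefit" is the ratio of baseline time to optimized time; the report does not define either column, so this reading is an interpretation. The function and variable names are illustrative only.

```python
def time_reduction_pct(baseline_s: float, optimized_s: float) -> float:
    """Percentage reduction in elapsed time relative to the baseline."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


# (baseline seconds, optimized seconds), copied from the results tables.
results = {
    ("Test 1", "End-to-end client response time"): (70.69, 1.15),
    ("Test 1", "Portfolio calculation time"): (70.69, 50.45),
    ("Test 2", "End-to-end client response time"): (36.6, 0.99),
    ("Test 2", "Portfolio calculation time"): (36.6, 27.11),
}

for (test, metric), (base, opt) in results.items():
    print(
        f"{test} | {metric}: "
        f"{time_reduction_pct(base, opt):.2f}% less time, "
        f"{speedup(base, opt):.2f}X faster"
    )
```

With this reading, the recomputed values agree with the published figures to within rounding, and the calculation-time speedups land between 1.3X and 1.5X for both tests.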

4. Results Analysis and Observations

1. The significant improvement in end-user response times is primarily due to the new event-driven grid architecture made possible by Compute and Data Grid integration. Because portfolios are revalued as market data moves in real time, an up-to-date portfolio valuation is always available, which improves the end-user experience dramatically.
2. Using a data grid to manage market data consumption and distribution eliminates the need for the Symphony* Compute Grid to correlate and transport that data with the workload. This accounts for the performance boost of more than 30% (>1.3X) in portfolio calculation, which corresponds to the 26% to 29% reduction in calculation time reported in the results. Moreover, the dataset used in this benchmark is fairly simple and small (100 stocks and a total of 14400 instruments). Real customer datasets would be orders of magnitude larger and more complex, and the Compute-Data Grid combination would be expected to yield an even more dramatic performance improvement. For example, at a large investment bank, the lead time for completing 3 billion risk calculations was reduced from 9 hours to 2 hours (4.5X) through the use of a Data Grid.[1] Furthermore, if grid calculations involve access to static data (e.g., reference data) stored in databases, those calculations can be sped up further by moving the static data entities into the distributed main-memory regions of a Data Grid.
3. As mentioned before, market data is released in waves, that is, all at once for the set of underlying stocks. A more realistic, tick-by-tick market data feed would increase Data Grid efficiencies and result in better CPU utilization.
4. To ensure consistency across the two scenarios, there was no reuse of data across user requests, nor were instrument values shared across portfolios. Such overlaps in real scenarios would further increase system performance and scalability.
5. The tests were set up in Platform Symphony with short-running tasks (again to ensure an apples-to-apples comparison between scenarios). Thus, in the Optimized Scenario, whenever a portfolio value is calculated or retrieved, all the compute nodes must connect to the Data Grid. Registering these tasks as long-running processes in the Platform environment is expected to reduce end-user response time even further.
6. Both scenarios used a single Excel client to drive portfolio valuation requests. In real-life situations, multiple such clients would send requests to the Compute Grid. In such scenarios, the presence of a Data Grid would prevent overall response times from deteriorating as the number of users increases, and would ensure smooth Grid scalability.

5. Summary / Next Steps

Based on the results of this benchmark, a balanced model that combines compute and data grids delivers a marked performance improvement and warrants further exploration.

6. Additional Information Links

GemStone Technologies
GemFire in Grid Computing: www.gemstone.com/solutions/gridcomputing
Download Data Grid Whitepapers and Evaluation Software: www.gemstone.com/download

Platform Technologies
www.platform.com

Intel Technologies
Intel Core Microarchitecture: www.intel.com/technology/architecture/coremicro
Dual-Core Intel Xeon Processor: www.intel.com/business/bss/products/server/xeon
Intel Multi-Core Processor Technologies: www.intel.com/multi-core
Intel Xeon Processor Benchmarks: www.intel.com/xeon

About GemStone Systems, Inc.

GemStone Systems is the leading provider of the Enterprise Data Fabric (EDF), offering data services solutions for enterprise business architects and data infrastructure managers that are building, enhancing or simplifying access, distribution, integration and management of information within and across the enterprise. Founded in 1982, and with over 200 installed customers, GemStone is recognized worldwide for its unique competency and patented technology in object management, virtual memory architectures, high-performance caching, and data distribution technologies. For more information please visit www.gemstone.com.

About Platform Computing

Platform Computing is the global leader for grid computing solutions and a technology pioneer of the supercomputing world. The company's solutions for enterprise and high-performance computing help the world's largest organizations integrate and accelerate business processes, to increase competitive advantage and enjoy a higher return on investment from IT. With over 2000 customers, the company has achieved a clear leadership position in the market through a focus on technology innovation and execution. Founded in 1992, Platform Computing has strategic relationships with Dell, HP, IBM, Intel, Microsoft, Novell and Red Hat, along with the industry's broadest support for third-party applications. For more information please visit www.platform.com.

About Intel Corporation

Intel, the world leader in silicon innovation, develops technologies, products and initiatives to continually advance how people work and live. Additional information about Intel is available at: www.intel.com/pressroom.

For More Information
www.intel.com/go/finance

[1] Source: GemStone Systems, Inc.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.

Intel, the Intel logo, Intel. Leap ahead., the Intel. Leap ahead. logo, Intel Core, and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands may be claimed as the property of others.

Copyright 2006, Intel Corporation. All rights reserved. 0906/AKG/QUA/PG/300 315230-001US