A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks




A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda
Network-Based Computing Laboratory, Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA

Outline: Introduction and Motivation, Problem Statement, Design Considerations, Micro-benchmark Suite, Performance Evaluation, Conclusion & Future Work

Big Data Technology
Apache Hadoop is one of the most popular Big Data technologies. It provides a framework for large-scale, distributed data storage and processing and is an open-source implementation of the MapReduce programming model. The Hadoop Distributed File System (HDFS) is the underlying file system for Hadoop MapReduce and the Hadoop database, HBase. Hadoop Core provides common functionalities such as Remote Procedure Call (RPC).
[Figure: the Hadoop framework stack, with HBase and MapReduce layered over HDFS and Core (RPC, ...).]

Adoption of Hadoop RPC
Hadoop RPC is increasingly used by data-center middleware such as MapReduce, HDFS, and HBase because of its simplicity, productivity, and high performance. Typical uses include metadata exchange, managing compute nodes and tracking system status, efficient data management operations (get block info, create blocks, etc.), and database operations (put, get, etc.).
[Figure: MapReduce & HDFS (NameNode, clients, DataNodes with HDD/SSD) and HBase (clients, HRegion servers, DataNodes) communicating over high-performance networks.]
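To make the RPC usage above concrete, here is a minimal, hypothetical protocol interface in the style of Hadoop's Writable RPC (0.20.x era). Only org.apache.hadoop.ipc.VersionedProtocol and org.apache.hadoop.io.BytesWritable are real Hadoop classes; the EchoProtocol interface and its echo method are illustrative assumptions, not part of the actual HDFS or HBase protocols.

```java
// Hypothetical sketch of a Hadoop Writable RPC protocol, in the spirit of the
// metadata/block/put-get operations the slide lists. Real protocols expose calls
// such as "get block locations" or "put/get a row"; this one just echoes its payload.
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.ipc.VersionedProtocol;

public interface EchoProtocol extends VersionedProtocol {
    long VERSION = 1L;

    // The server returns the payload it received; the payload type (BytesWritable)
    // matches one of the data types used by the micro-benchmark suite.
    BytesWritable echo(BytesWritable payload);
}
```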

Common Protocols using OpenFabrics
[Figure: protocol stack options over the OpenFabrics interface. Applications use either the sockets interface (kernel TCP/IP over the Ethernet driver, IPoIB, TCP/IP with hardware offload, RSockets, SDP) or the verbs interface (user-space RDMA), over Ethernet, iWARP, RoCE, or InfiniBand adapters and the corresponding Ethernet/InfiniBand switches. The resulting options are 1/10/40 GigE, IPoIB, 10/40 GigE-TOE, RSockets, SDP, iWARP, RoCE, and IB Verbs.]

Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?
[Figure: three designs compared. Current design: application over sockets on a 1/10 GigE network. Enhanced designs: application over accelerated sockets (verbs/hardware offload) on 10 GigE or InfiniBand. Our approach: application over an OSU-designed verbs interface on 10 GigE or InfiniBand.]
Sockets were not designed for high performance: their stream semantics often mismatch the needs of upper layers (Memcached, HBase, Hadoop), and zero-copy is not available for non-blocking sockets.

Hadoop RPC over InfiniBand
Our design enables high-performance RDMA communication while still supporting the traditional socket interface.
[Figure: applications call Hadoop RPC; the rpc.ib.enabled parameter switches between the default Java socket interface (1/10 GigE, IPoIB) and the OSU design, which goes through the Java Native Interface (JNI) to IB verbs over InfiniBand.]
Xiaoyi Lu, Nusrat Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. (DK) Panda. High-Performance Design of Hadoop RPC with RDMA over InfiniBand. To be presented at the 42nd International Conference on Parallel Processing (ICPP 2013), Lyon, France, October 2013.
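As a rough illustration of how the socket/verbs switch above could be driven from configuration, the sketch below sets the rpc.ib.enabled parameter named on the slide. The boolean value and the use of org.apache.hadoop.conf.Configuration are assumptions for illustration; the actual Hadoop-RDMA release may expose this knob differently.

```java
// Illustrative sketch (not the Hadoop-RDMA release's actual API): selecting the
// RPC transport through the rpc.ib.enabled parameter shown on the slide.
import org.apache.hadoop.conf.Configuration;

public class RpcTransportConfig {
    public static Configuration withIbRpc(boolean enable) {
        Configuration conf = new Configuration();
        // true  -> assumed to route RPC through the JNI + IB verbs path (OSU design)
        // false -> default Java socket interface over 1/10 GigE or IPoIB
        conf.setBoolean("rpc.ib.enabled", enable);
        return conf;
    }
}
```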

Hadoop RPC over IB: Gain in Latency and Throughput
[Figure: ping-pong latency (us) vs. payload size (1 byte to 4 KB) and throughput (Kops/sec) vs. number of clients (8 to 64) for RPC-10GigE, RPC-IPoIB (32 Gbps), and RPCoIB (32 Gbps).]
Hadoop RPC over IB ping-pong latency: 39 us for 1 byte and 52 us for 4 KB, a 42%-49% and 46%-50% improvement over default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively.
Hadoop RPC over IB throughput: 135.22 Kops/sec at 512 bytes with 48 clients, an 82% and 64% improvement over the peak performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively.

Available in the Hadoop-RDMA Software
A high-performance design of Hadoop over RDMA-enabled interconnects, with native InfiniBand support at the verbs level for the HDFS, MapReduce, and RPC components. It is easily configurable for both native InfiniBand and the traditional sockets-based transports (Ethernet and InfiniBand with IPoIB). The current release (0.9.0) is based on Apache Hadoop 0.20.2 and is compliant with Apache Hadoop 0.20.2 APIs and applications. It has been tested with Mellanox InfiniBand adapters (DDR, QDR, and FDR), various multi-core platforms, and different file systems with disks and SSDs. http://hadoop-rdma.cse.ohio-state.edu

Requirements of Hadoop RPC Benchmarks
To achieve optimal performance, Hadoop RPC needs to be tuned based on cluster and workload characteristics. A micro-benchmark tool suite that evaluates Hadoop RPC performance metrics under different configurations is therefore important for tuning and understanding. For Hadoop developers, such a micro-benchmark suite also helps evaluate and optimize the performance of new designs.

Outline: Introduction and Motivation, Problem Statement, Design Considerations, Micro-benchmark Suite, Performance Evaluation, Conclusion & Future Work

Problem Statement
Can we design and implement a simple, standardized benchmark suite that lets users and developers in the Big Data community evaluate, understand, and optimize Hadoop RPC performance over a range of networks and protocols? What is the performance of Hadoop RPC when evaluated with this benchmark suite on high-performance networks?

Outline: Introduction and Motivation, Problem Statement, Design Considerations, Micro-benchmark Suite, Performance Evaluation, Conclusion & Future Work

Design Considerations
The performance of RPC systems is usually measured by latency and throughput. The performance of Hadoop RPC is determined by: factors related to the network configuration, since faster interconnects and/or protocols can enhance Hadoop RPC performance; controllable parameters at the RPC-engine and benchmark level, such as the number of handlers and clients; data types, i.e., serialization and deserialization costs of the different data types (BytesWritable, Text, etc.) used in the RPC system; and CPU utilization, a trade-off between RPC subsystem performance and whole-system performance.
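The following sketch shows how one of the controllable parameters listed above, the RPC server handler count, might be exposed in a standalone test server. It reuses the hypothetical EchoProtocol from the earlier sketch; RPC.getServer's exact signature varies across Hadoop versions (this form matches the 0.20.x-era API), so treat it as illustrative rather than the suite's actual implementation.

```java
// Hedged sketch: a standalone Hadoop RPC server whose handler-thread count is a
// command-line tunable, mirroring the "handlers" knob varied in the experiments.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.ipc.RPC;

public class EchoServer implements EchoProtocol {
    @Override
    public BytesWritable echo(BytesWritable payload) {
        // Serialization/deserialization cost depends on the payload data type.
        return payload;
    }

    @Override
    public long getProtocolVersion(String protocol, long clientVersion) {
        return VERSION;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        int handlers = Integer.parseInt(args.length > 0 ? args[0] : "16");
        // Handler count is one of the knobs varied in the throughput experiments (1-32).
        RPC.Server server = RPC.getServer(new EchoServer(), "0.0.0.0", 12345, handlers, false, conf);
        server.start();
        server.join();
    }
}
```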

Outline: Introduction and Motivation, Problem Statement, Design Considerations, Micro-benchmark Suite, Performance Evaluation, Conclusion & Future Work

Micro-benchmark Suite
Two different micro-benchmarks: latency (single server, single client) and throughput (single server, multiple clients), plus a script framework for job launching and resource monitoring that calculates statistics such as min, max, and average.
Latency benchmark components (lat_client, lat_server) take as parameters: network address, port, data type, minimum message size, maximum message size, number of iterations, number of handlers, and verbose.
Throughput benchmark components (thr_client, thr_server) take as parameters: network address, port, data type, minimum message size, maximum message size, number of iterations, number of clients, number of handlers, and verbose.
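Below is a hedged sketch of what the latency benchmark's client side might look like: it issues a fixed number of ping-pong echo RPCs for one payload size and reports min/max/average latency, mirroring the statistics the script framework collects. The server address, port, iteration count, payload size, and the hypothetical EchoProtocol are assumptions; this is not the suite's actual lat_client code.

```java
// Illustrative latency client: single client, single server, one payload size.
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.ipc.RPC;

public class LatencyClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "server-host" and 12345 are placeholders for the benchmark's address/port parameters.
        EchoProtocol proxy = (EchoProtocol) RPC.getProxy(
                EchoProtocol.class, EchoProtocol.VERSION,
                new InetSocketAddress("server-host", 12345), conf);

        int iterations = 1000;
        BytesWritable payload = new BytesWritable(new byte[4096]); // e.g. a 4 KB message

        long min = Long.MAX_VALUE, max = 0, total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            proxy.echo(payload);                        // one ping-pong RPC
            long us = (System.nanoTime() - start) / 1000;
            min = Math.min(min, us);
            max = Math.max(max, us);
            total += us;
        }
        // The script framework reports the same kind of min/max/average statistics.
        System.out.printf("min=%d us, max=%d us, avg=%d us%n", min, max, total / iterations);
        RPC.stopProxy(proxy);
    }
}
```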

Outline: Introduction and Motivation, Problem Statement, Design Considerations, Micro-benchmark Suite, Performance Evaluation, Conclusion & Future Work

Experimental Setup
Hardware: an Intel Westmere cluster of 8 nodes; each node has 8 processor cores on two Intel Xeon 2.67 GHz quad-core CPUs and 24 GB of main memory. Networks: 1 GigE, 10 GigE, and IPoIB (32 Gbps).
Software: Enterprise Linux Server release 6.1 (Santiago) with kernel version 2.6.32-131 and OpenFabrics version 1.5.3, Hadoop 0.20.2, and Sun Java SDK 1.7.

RPC Latency for BytesWritable
[Figure: RPC latency over 1 GigE, 10 GigE, and IPoIB (32 Gbps); left panel, small messages (latency in us vs. payload size); right panel, large messages from 128 KB to 64 MB (latency in ms vs. payload size).]
RPC latency decreases when the underlying interconnect is changed from 1 GigE to 10 GigE or IPoIB. With the 10 GigE interconnect, we observe better latency than IPoIB for small payload sizes, while for large payload sizes IPoIB performs better than 10 GigE. IPoIB achieves a 27% gain over 10 GigE for a 64 MB payload, whereas it is 0.66% worse than 10 GigE for a 4 KB payload.

RPC Latency for Text 200 800 Latency (us) 180 160 140 120 100 80 60 40 20 0 1GigE 10GigE IPoIB(32Gbps) 1 2 4 8 16 32 64 128 256 512 102420484096 Payload Size (Byte) Latency (us) 700 600 500 400 300 200 100 0 1GigE 10GigE IPoIB(32Gbps) 128K 256K 512K 1M 2M 4M 8M 16M 32M 64M Payload Size (Byte) Small Messages Large Messages Similar performance characterisac for RPC latency with the data type of Text. 20

RPC Throughput for BytesWritable
[Figure: RPC throughput (Kops/sec) vs. payload size (1 byte to 4 KB) over 1 GigE, 10 GigE, and IPoIB (32 Gbps), with 7 and 16 RPC server handlers.]
IPoIB performs better than 10 GigE as the payload size increases; at 4 KB the improvement reaches up to 26% with seven handler threads. For small payload sizes, 10 GigE performs better than IPoIB by an average margin of 5-6%.

RPC Throughput for BytesWritable
[Figure: left, throughput comparison for a 4 KB payload with 1, 4, 16, and 32 handlers over 1 GigE, 10 GigE, and IPoIB (32 Gbps); right, CPU utilization (%) across the sampling points of the experiment with 4 handlers.]
Keeping the payload size fixed at 4 KB, we observe the trend with different handler counts and networks: IPoIB performs better than 10 GigE by 48%, 5%, 45%, and 47% for 1, 4, 16, and 32 handlers, respectively. The suite can also be used to monitor resource utilization by enabling a parameter in the script framework.

Outline: Introduction and Motivation, Problem Statement, Design Considerations, Micro-benchmark Suite, Performance Evaluation, Conclusion & Future Work

Conclusion and Future Work
We designed and implemented a micro-benchmark suite to evaluate the performance of standalone Hadoop RPC. It provides standard micro-benchmarks to measure the latency and throughput of Hadoop RPC with different data types. We illustrated the performance of Hadoop RPC using our benchmarks over different networks and protocols (1 GigE, 10 GigE, IPoIB). In the future, we will extend the benchmark suite to support performance comparisons among Hadoop Writable RPC, Avro, Thrift, and Protocol Buffers, and we will make it available to the Big Data community via an open-source release.

Thank You!
{luxi, rahmanmd, islamn, panda}@cse.ohio-state.edu
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
MVAPICH Web Page: http://mvapich.cse.ohio-state.edu/
Hadoop-RDMA Web Page: http://hadoop-rdma.cse.ohio-state.edu/