A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks. Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda. Network-Based Computing Laboratory, Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Outline: Introduction and Motivation; Problem Statement; Design Considerations; Micro-benchmark Suite; Performance Evaluation; Conclusion & Future Work
Big Data Technology: Apache Hadoop is one of the most popular Big Data technologies. It provides a framework for large-scale, distributed data storage and processing and is an open-source implementation of the MapReduce programming model. The Hadoop Distributed File System (HDFS) is the underlying file system for Hadoop MapReduce and the Hadoop database, HBase. Hadoop Core provides common functionality, e.g. Remote Procedure Call (RPC). [Figure: the Hadoop framework stack, with HBase, MapReduce, and HDFS layered over Core (RPC, ...)]
Adoption of Hadoop RPC: Hadoop RPC is increasingly used by data-center middleware such as MapReduce, HDFS, and HBase because of its simplicity, productivity, and high performance. Typical uses: metadata exchange; managing compute nodes and tracking system status; efficient data-management operations (get block info, create blocks, etc.); and database operations (put, get, etc.). [Figure: Hadoop RPC paths in MapReduce & HDFS (Map/Reduce tasks, HDFS clients, NameNode, DataNodes with HDD/SSD) and in HBase (HBase clients, HRegionServers, DataNodes), all connected over high-performance networks]
Common Protocols using OpenFabrics: [Figure: protocol stacks from the application interface (sockets or verbs) down through kernel-space TCP/IP, RSockets, SDP, user-space RDMA, and hardware offload, to Ethernet and InfiniBand adapters and switches. Protocol options: 1/10/40 GigE, IPoIB, 10/40 GigE-TOE, RSockets, SDP, iWARP, RoCE, IB Verbs]
Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols? [Figure: three approaches — the current design (application over sockets on 1/10 GigE), enhanced designs (accelerated sockets with verbs/hardware offload on 10 GigE or InfiniBand), and our approach (the OSU design directly over the verbs interface on 10 GigE or InfiniBand)]. Sockets were not designed for high performance: stream semantics often mismatch the needs of upper layers (Memcached, HBase, Hadoop), and zero-copy is not available for non-blocking sockets.
Hadoop RPC over InfiniBand: enables high-performance RDMA communication while supporting the traditional socket interface. [Figure: applications over Hadoop RPC, with the rpc.ib.enabled switch selecting between the default Java socket interface (1/10 GigE, IPoIB) and our JNI-based OSU design over IB verbs on InfiniBand]. Xiaoyi Lu, Nusrat Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. (DK) Panda. High-Performance Design of Hadoop RPC with RDMA over InfiniBand. To be presented at the 42nd International Conference on Parallel Processing (ICPP 2013), Lyon, France, October 2013.
Hadoop RPC over IB: Gain in Latency and Throughput. [Figure: ping-pong latency vs. payload size (1 B to 4 KB) and throughput vs. number of clients (8 to 64) for RPC-10GigE, RPC-IPoIB (32 Gbps), and RPCoIB (32 Gbps)]. Ping-pong latency over IB: 39 us at 1 byte and 52 us at 4 KB, a 42%-49% improvement over default Hadoop RPC on 10 GigE and a 46%-50% improvement over IPoIB (32 Gbps), respectively. Throughput over IB: 135.22 Kops/sec at 512 bytes with 48 clients, an 82% and 64% improvement over the peak performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively.
Available in the Hadoop-RDMA Software: a high-performance design of Hadoop over RDMA-enabled interconnects, with native InfiniBand support at the verbs level for the HDFS, MapReduce, and RPC components. Easily configurable for both native InfiniBand and traditional sockets-based support (Ethernet, and InfiniBand with IPoIB). Current release 0.9.0: based on Apache Hadoop 0.20.2 and compliant with its APIs and applications; tested with Mellanox InfiniBand adapters (DDR, QDR, and FDR), on various multi-core platforms, and with different file systems on disks and SSDs. http://hadoop-rdma.cse.ohio-state.edu
Requirements of Hadoop RPC Benchmarks: To achieve optimal performance, Hadoop RPC needs to be tuned based on cluster and workload characteristics. A micro-benchmark tool suite that evaluates Hadoop RPC performance metrics under different configurations is therefore important for tuning and understanding. For Hadoop developers, such a micro-benchmark suite also helps evaluate and optimize the performance of new designs.
Problem Statement: Can we design and implement a simple, standardized benchmark suite that lets all users and developers in the Big Data community evaluate, understand, and optimize Hadoop RPC performance over a range of networks/protocols? What will the performance of Hadoop RPC be when evaluated with this benchmark suite on high-performance networks?
Design Considerations: The performance of RPC systems is usually measured by latency and throughput. Hadoop RPC performance is determined by: network configuration (faster interconnects and/or protocols can enhance Hadoop RPC performance); controllable parameters at the RPC-engine and benchmark level (handler count, client count, etc.); data types (serialization and deserialization of different data types in the RPC system, e.g. BytesWritable, Text); and CPU utilization (the tradeoff between RPC subsystem performance and whole-system performance).
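The latency/throughput measurement pattern described above can be sketched in a few lines. This is a minimal illustrative stand-in, not the OSU suite: a local TCP echo server plays the role of the RPC server, and the client times round trips for one payload size, reporting the min/max/average statistics the suite's script framework computes.

```python
# Minimal RPC-style latency sketch (illustrative only, NOT the actual
# Hadoop RPC benchmark): a local TCP echo server stands in for the RPC
# server; the client times full round trips for a fixed payload size.
import socket
import threading
import time

def echo_server(listener):
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)        # echo back whatever arrived

def measure_latency(payload_size, iterations=100):
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # ephemeral port
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

    client = socket.socket()
    client.connect(("127.0.0.1", port))
    payload = b"x" * payload_size
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        client.sendall(payload)
        received = 0
        while received < payload_size:       # wait for the full echo
            received += len(client.recv(65536))
        samples.append(time.perf_counter() - start)
    client.close()
    listener.close()
    # min / max / average, as the suite's script framework reports
    return min(samples), max(samples), sum(samples) / len(samples)
```

Throughput follows the same skeleton: run many such clients concurrently against one server and divide completed operations by elapsed time.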
Micro-benchmark Suite: two micro-benchmarks — latency (single server, single client: lat_server/lat_client) and throughput (single server, multiple clients: thr_server/thr_client) — plus a script framework for job launching and resource monitoring that calculates statistics such as min, max, and average. Both benchmarks take: network address, port, data type, minimum and maximum message size, number of iterations, number of handlers, and a verbose flag; the throughput benchmark additionally takes the number of clients.
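A command-line front end for the parameter set listed above might look as follows. The flag names here are illustrative, not the suite's actual options; only the parameter list itself comes from the slide.

```python
# Hypothetical CLI mirroring the parameters of lat_client/thr_client.
# Flag names and defaults are illustrative assumptions.
import argparse

def build_parser(throughput=False):
    p = argparse.ArgumentParser(
        description="Hadoop RPC micro-benchmark client (sketch)")
    p.add_argument("--address", default="localhost",
                   help="RPC server network address")
    p.add_argument("--port", type=int, default=12345,
                   help="RPC server port")
    p.add_argument("--data-type", choices=["BytesWritable", "Text"],
                   default="BytesWritable",
                   help="payload data type to serialize")
    p.add_argument("--min-msg-size", type=int, default=1,
                   help="smallest payload in bytes")
    p.add_argument("--max-msg-size", type=int, default=4096,
                   help="largest payload in bytes")
    p.add_argument("--iterations", type=int, default=1000)
    p.add_argument("--handlers", type=int, default=7,
                   help="RPC server handler threads")
    p.add_argument("--verbose", action="store_true")
    if throughput:   # thr_client additionally takes a client count
        p.add_argument("--clients", type=int, default=8)
    return p
```

With this shape, a throughput run with 48 clients would be launched as e.g. `thr_client --clients 48 --data-type BytesWritable --max-msg-size 512`.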
Experimental Setup. Hardware: an 8-node Intel Westmere cluster; each node has 8 processor cores on two quad-core Intel Xeon 2.67 GHz CPUs and 24 GB main memory; networks: 1 GigE, 10 GigE, and IPoIB (32 Gbps). Software: Enterprise Linux Server release 6.1 (Santiago), kernel 2.6.32-131, OpenFabrics 1.5.3, Hadoop 0.20.2, and Sun Java SDK 1.7.
RPC Latency for BytesWritable. [Figure: latency vs. payload size for 1 GigE, 10 GigE, and IPoIB (32 Gbps); small messages in us, large messages (128 KB to 64 MB) in ms]. RPC latency decreases when the underlying interconnect is changed from 1 GigE to 10 GigE or IPoIB. With 10 GigE we observe better latency than IPoIB for small payload sizes; for large payload sizes, IPoIB performs better than 10 GigE. IPoIB achieves a 27% gain over 10 GigE at a 64 MB payload, while performing 0.66% worse than 10 GigE at a 4 KB payload.
RPC Latency for Text. [Figure: latency vs. payload size (1 B to 4 KB, and 128 KB to 64 MB) for 1 GigE, 10 GigE, and IPoIB (32 Gbps)]. RPC latency with the Text data type shows similar performance characteristics.
RPC Throughput for BytesWritable. [Figure: throughput vs. payload size (1 B to 4 KB) for 1 GigE, 10 GigE, and IPoIB (32 Gbps), with 7 and 16 RPC server handlers]. IPoIB performs better than 10 GigE as the payload size increases; at 4 KB the improvement reaches 26% with seven handler threads. For small payload sizes, 10 GigE performs better than IPoIB by an average margin of 5-6%.
RPC Throughput for BytesWritable (continued). [Figure: throughput comparison for a 4 KB payload across 1, 4, 16, and 32 handlers, and CPU utilization over sampling points for the 4-handler experiment]. Keeping the payload size fixed at 4 KB, we observe the trend across different handler counts and networks: IPoIB performs better than 10 GigE by 48%, 5%, 45%, and 47% for 1, 4, 16, and 32 handlers, respectively. The suite can also monitor resource utilization by enabling a parameter in the script framework.
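The resource-monitoring side boils down to sampling CPU counters and computing a non-idle ratio between snapshots. A sketch of that calculation, assuming Linux /proc/stat "cpu" lines (the two sample snapshots below are made-up values, and this is not the suite's actual monitoring script):

```python
# CPU utilization between two /proc/stat "cpu" snapshots (sketch).
# Field order after the "cpu" label: user nice system idle iowait irq ...
# Utilization = (total delta - idle delta) / total delta.
def cpu_utilization(before, after):
    b = [int(x) for x in before.split()[1:]]
    a = [int(x) for x in after.split()[1:]]
    idle_delta = (a[3] + a[4]) - (b[3] + b[4])   # idle + iowait ticks
    total_delta = sum(a) - sum(b)                # all ticks elapsed
    return 100.0 * (total_delta - idle_delta) / total_delta

# Made-up snapshots for illustration:
snap1 = "cpu 1000 0 500 8000 100 0 0 0"
snap2 = "cpu 1400 0 700 8600 100 0 0 0"
print(cpu_utilization(snap1, snap2))   # 600 busy / 1200 total -> 50.0
```

Sampling such snapshots at a fixed interval during a run yields the per-sampling-point utilization curve shown in the figure.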
Conclusion and Future Work: We designed and implemented a micro-benchmark suite to evaluate the performance of standalone Hadoop RPC, providing standard micro-benchmarks that measure the latency and throughput of Hadoop RPC with different data types, and we illustrated Hadoop RPC performance results obtained with our benchmarks over different networks/protocols (1 GigE/10 GigE/IPoIB). We will extend the benchmark suite to help users compare the performance of Hadoop Writable RPC, Avro, Thrift, and Protocol Buffers, and it will be made available to the Big Data community via an open-source release.
Thank You! {luxi, rahmanmd, islamn, panda}@cse.ohio-state.edu. Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/. MVAPICH web page: http://mvapich.cse.ohio-state.edu/. Hadoop-RDMA web page: http://hadoop-rdma.cse.ohio-state.edu/