Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University, USA
Outline Introduction Overview of Memcached and Hadoop A new approach towards OFA in Cloud Experimental Results & Analysis Summary 2
Introduction
- High-Performance Computing (HPC) has adopted advanced interconnects (e.g., InfiniBand, 10 Gigabit Ethernet)
  - Low latency (a few microseconds), high bandwidth (40 Gb/s)
  - Low CPU overhead
- OpenFabrics has been quite successful in the HPC domain
  - Many machines in the Top500 list
- Beginning to draw interest from the enterprise domain
  - Google keynote shows interest in IB for improving RPC cost
  - Oracle has used IB in Exadata
- Performance in the enterprise domain remains a concern
  - The Google keynote also highlighted this
3
MPI (MVAPICH2) Performance over IB
[Figure: two plots vs. message size -- Small Message Latency, reaching 1.51 us, and Unidirectional Bandwidth, reaching 3023.7 MB/s]
2.4 GHz quad-core (Nehalem) Intel servers with Mellanox ConnectX-2 QDR adapters and IB switch
4
Software Ecosystem in Cloud and Upcoming Challenges
- Memcached: scalable distributed caching
  - Widely adopted caching frontend to MySQL and other DBs
- MapReduce: scalable model to process petabytes of data
  - Hadoop MapReduce framework widely adopted
- Both Memcached and Hadoop were designed with sockets
  - The Sockets API itself was designed decades ago
- At the same time, SSDs have improved I/O characteristics
  - The Google keynote also highlighted that I/O costs are coming down a lot
  - Communication cost will dominate in the future
- Can OFA help cloud computing software performance?
5
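To make the "caching frontend to MySQL and other DBs" role concrete, here is a minimal sketch of the cache-aside pattern Memcached is typically used for. A plain dict stands in for a Memcached client, and `query_db` is a hypothetical stand-in for a real database call; neither is part of any real API.

```python
# Cache-aside pattern: check the cache first, fall back to the DB on a miss.
cache = {}

def query_db(key):
    # Stand-in for an expensive SQL query (illustrative only).
    return f"row-for-{key}"

def cached_get(key):
    if key in cache:           # cache hit: no database round trip
        return cache[key]
    value = query_db(key)      # cache miss: fall through to the database
    cache[key] = value         # populate the cache for subsequent reads
    return value

first = cached_get("user:42")   # miss -> goes to the DB
second = cached_get("user:42")  # hit  -> served from the cache
```

In a real deployment the dict lookups would be network operations against a Memcached server, which is exactly why this usage is so network intensive.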
Typical HPC and Cloud Computing Deployments
[Figure: an HPC compute cluster on high-performance interconnects backed by a parallel virtual file system (meta-data server plus I/O servers with data), alongside a cloud deployment of VMs on physical machines with local storage, connected over GigE through a frontend]
- HPC system design is interconnect centric
- Cloud computing environments have complex software and have historically relied on Sockets and Ethernet
6
Outline Introduction Overview of Memcached and Hadoop A new approach towards OFA in Cloud Experimental Results & Analysis Summary 7
Memcached Architecture
[Figure: web-frontend servers reached over the Internet, with Memcached servers forming a distributed caching layer in front of database servers, all connected by a system area network]
- Distributed caching layer: allows aggregating spare memory from multiple nodes
- General purpose: typically used to cache database queries and results of API calls
- Scalable model, but typical usage is very network intensive
8
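The way clients spread keys across the pool of Memcached servers can be sketched with simple modulo hashing. The server addresses below are hypothetical, and real clients such as libmemcached also offer consistent hashing so that adding or removing a server does not remap every key; this is only the basic idea.

```python
import hashlib

# Hypothetical pool of Memcached servers (addresses are illustrative).
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def server_for(key: str) -> str:
    # Hash the key and pick a server by modulo; every client computes the
    # same mapping, so no central directory is needed.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

owner = server_for("user:42")  # deterministic: same key -> same server
```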
Hadoop Architecture
- Underlying Hadoop Distributed File System (HDFS)
- Fault tolerance by replicating data blocks
- NameNode: stores information on data blocks
- DataNodes: store blocks and host MapReduce computation
- JobTracker: tracks jobs and detects failures
- The model scales, but there is a high amount of communication during intermediate phases
9
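The NameNode's block-replication bookkeeping can be sketched as a toy model. This is an assumption-laden simplification (random placement, no rack awareness, default replication factor 3), not the actual HDFS code.

```python
import random

class ToyNameNode:
    """Toy sketch of HDFS block placement with a replication factor."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # block id -> list of DataNodes holding replicas

    def allocate_block(self, block_id):
        # Real HDFS placement is rack-aware; random sampling of distinct
        # DataNodes is a simplification for illustration.
        replicas = random.sample(self.datanodes, self.replication)
        self.block_map[block_id] = replicas
        return replicas

nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
replicas = nn.allocate_block("blk_0001")
```

Losing one DataNode leaves two live replicas of each of its blocks, which the NameNode can then re-replicate: that is the fault-tolerance mechanism named on this slide.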
Outline Introduction Overview of Memcached and Hadoop A new approach towards OFA in Cloud Experimental Results & Analysis Summary 10
InfiniBand and 10 Gigabit Ethernet
- InfiniBand is an industry-standard packet-switched network
  - Has been increasingly adopted in HPC systems
  - User-level networking with OS-bypass (verbs)
- 10 Gigabit Ethernet: follow-up to Gigabit Ethernet
  - Provides user-level networking with OS-bypass (iWARP)
  - Some vendors have accelerated TCP/IP by putting it on the network card (hardware offload)
- Convergence: possible to use both through OpenFabrics; same software, different networks
11
Modern Interconnects and Protocols
[Figure: protocol stacks from application to switch for five configurations -- 1/10 GigE (sockets over kernel TCP/IP and the Ethernet driver); IPoIB (sockets over kernel TCP/IP on an InfiniBand adapter); 10 GigE-TOE (sockets with TCP/IP hardware-offloaded to the adapter); SDP (sockets mapped onto RDMA over InfiniBand); and native verbs in user space over InfiniBand]
12
A New Approach towards OFA in Cloud
[Figure: three software stacks -- current cloud software design (application over sockets over 1/10 GigE); the current approach towards OFA in cloud (application over accelerated sockets with verbs/hardware offload over 10 GigE or InfiniBand); and our approach (application directly over verbs on 10 GigE or InfiniBand)]
- Sockets were not designed for high performance
  - Stream semantics often mismatch the needs of upper layers (Memcached, Hadoop)
  - Zero-copy is not available for non-blocking sockets (Memcached)
- Significant consolidation in cloud system software
  - Hadoop and Memcached are the developer-facing APIs, not sockets
  - Improving Hadoop and Memcached will benefit many applications immediately!
13
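The stream-semantics mismatch can be made concrete: a TCP byte stream preserves no message boundaries, so sockets-based designs must layer their own framing (and copies) on top, whereas verbs delivers whole messages. A minimal length-prefix framing sketch, with illustrative message contents:

```python
import struct

def frame(msg: bytes) -> bytes:
    # Prefix each message with its 4-byte big-endian length.
    return struct.pack("!I", len(msg)) + msg

def deframe(stream: bytes):
    # Recover the message boundaries that the byte stream itself
    # does not preserve.
    msgs, off = [], 0
    while off < len(stream):
        (n,) = struct.unpack_from("!I", stream, off)
        off += 4
        msgs.append(stream[off:off + n])
        off += n
    return msgs

# Two requests concatenated on the wire, as TCP may deliver them:
wire = frame(b"get k1") + frame(b"set k2 v2")
messages = deframe(wire)
```

Verbs-based designs avoid this layer entirely, which is part of the performance gap this talk quantifies.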
Memcached Design Using Verbs
[Figure: sockets and RDMA clients connecting through a master thread to sockets and verbs worker threads, which all share the Memcached data structures (memory slabs, items)]
- Server and client perform a negotiation protocol
- The master thread assigns clients to the appropriate worker thread
- Once a client is assigned a verbs worker thread, it communicates directly with, and is bound to, that thread
- All other Memcached data structures are shared among RDMA and sockets worker threads
14
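The master/worker dispatch above can be sketched with threads and queues. The transport names ("verbs", "sockets") mirror the figure, but the queue-and-sentinel plumbing is an illustrative assumption, not the actual Memcached code.

```python
import queue
import threading

# One queue per transport-specific worker thread.
workers = {"verbs": queue.Queue(), "sockets": queue.Queue()}
served = []  # records which worker handled which client

def worker_loop(transport):
    while True:
        client = workers[transport].get()
        if client is None:                    # shutdown sentinel
            break
        served.append((client, transport))    # stand-in for request handling

def master_assign(client_id, transport):
    # Outcome of the negotiation protocol: the client is bound to this
    # worker's queue from now on.
    workers[transport].put(client_id)

threads = [threading.Thread(target=worker_loop, args=(t,)) for t in workers]
for t in threads:
    t.start()

master_assign("client-1", "verbs")
master_assign("client-2", "sockets")

for q in workers.values():
    q.put(None)                               # stop the workers
for t in threads:
    t.join()
```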
Outline Introduction Overview of Memcached and Hadoop A new approach towards OFA in Cloud Experimental Results & Analysis Summary 15
Experimental Setup
Memcached experiments
  - Intel Clovertown 2.33 GHz, 6 GB RAM, InfiniBand DDR, Chelsio T320
  - Intel Westmere 2.67 GHz, 12 GB RAM, InfiniBand QDR
  - Memcached server: 1.4.5; Memcached client (libmemcached): 0.45
Hadoop experiments
  - Intel Clovertown 2.33 GHz, 6 GB RAM, InfiniBand DDR, Chelsio T320
  - Intel X25-E 64 GB SSD and 250 GB HDD
  - Hadoop version 0.20.2, Sun/Oracle Java 1.6.0
  - Dedicated NameNode and JobTracker
  - Number of DataNodes used: 2, 4, and 8
- We used unmodified Hadoop for our experiments; OFA used through Sockets
16
Memcached Get Latency
[Figure: Get latency (us) vs. message size (1 byte to 4 KB) for 1G, 10G-TOE, SDP, IPoIB, and the OSU design, on the Intel Clovertown cluster (IB: DDR) and the Intel Westmere cluster (IB: QDR)]
- Memcached Get latency
  - 4 bytes: DDR 6 us; QDR 5 us
  - 4 KB: DDR 20 us; QDR 12 us
- Almost a factor of four improvement over 10GE (TOE) for 4 KB
- We are in the process of evaluating iWARP on 10GE
17
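Per-operation Get latencies like these are typically gathered by timing a tight loop of back-to-back operations and dividing by the iteration count. The sketch below shows that measurement shape; the dictionary `store` is a local stand-in for a Memcached client (so the absolute numbers mean nothing), not a real API.

```python
import time

def avg_latency_us(op, iters=10000):
    # Time many back-to-back operations; report the mean in microseconds.
    start = time.perf_counter()
    for _ in range(iters):
        op()
    return (time.perf_counter() - start) / iters * 1e6

store = {"key": b"x" * 4096}  # a 4 KB value, matching the slide's largest size
latency = avg_latency_us(lambda: store.get("key"))
```

Against a real server each `op` would be a network round trip, so this same loop directly yields the DDR/QDR microsecond figures reported above.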
Memcached Get TPS
[Figure: Get transactions per second (thousands of TPS) for SDP, IPoIB, and the OSU design with 8 and 16 clients, on the Intel Clovertown cluster (IB: DDR) and the Intel Westmere cluster (IB: QDR)]
- Memcached Get transactions per second for 4 bytes
  - On IB DDR: about 700K/s for 16 clients
  - On IB QDR: 1.9M/s for 16 clients
- Almost a factor of six improvement over SDP
- We are in the process of evaluating iWARP on 10GE
18
Hadoop: Java Communication Benchmark
[Figure: ping-pong bandwidth (MB/s) vs. message size (1 byte to 4 MB) for the C and Java versions of the benchmark, comparing 1GE, IPoIB, SDP, 10GE-TOE, and SDP-NIO]
- Sockets-level ping-pong bandwidth test
- Java performance depends on usage of NIO (allocateDirect)
- C and Java versions of the benchmark have similar performance
- HDFS does not use direct allocated blocks or NIO on the DataNode
S. Sur, H. Wang, J. Huang, X. Ouyang and D. K. Panda, "Can High-Performance Interconnects Benefit Hadoop Distributed File System?", MASVDC 10, in conjunction with MICRO 2010, Atlanta, GA.
19
Hadoop: DFS IO Write Performance
[Figure: average write throughput (MB/sec) vs. file size (1-10 GB) for 1GE, IPoIB, SDP, and 10GE-TOE, each with HDD and SSD; four DataNodes]
- DFS IO, included in Hadoop, measures sequential access throughput
- We have two map tasks, each writing to a file of increasing size (1-10 GB)
- Significant improvement with IPoIB, SDP and 10GigE
- With SSD, the performance improvement is almost seven- or eight-fold!
- SSD benefits are not seen without using a high-performance interconnect!
- In line with the comment in the Google keynote about I/O performance
20
Hadoop: RandomWriter Performance
[Figure: execution time (sec) on 2 and 4 DataNodes for 1GE, IPoIB, SDP, and 10GE-TOE, each with HDD and SSD]
- Each map generates 1 GB of random binary data and writes to HDFS
- SSD improves execution time by 50% with 1GigE for two DataNodes
- For four DataNodes, benefits are observed only with an HPC interconnect
- IPoIB, SDP and 10GigE can improve performance by 59% on four DataNodes
21
Hadoop Sort Benchmark
[Figure: execution time (sec) on 2 and 4 DataNodes for 1GE, IPoIB, SDP, and 10GE-TOE, each with HDD and SSD]
- Sort: baseline benchmark for Hadoop
- Sort phase: I/O bound; reduce phase: communication bound
- SSD improves performance by 28% using 1GigE with two DataNodes
- Benefit of 50% on four DataNodes using SDP, IPoIB or 10GigE
22
Summary
- OpenFabrics has come a long way in HPC adoption; it is now facing new frontiers in the cloud computing domain
- Previous attempts at OpenFabrics adoption in the cloud focused on Sockets
- Even using OpenFabrics through Sockets, good gains can be observed
  - 50% faster sorting when OFA is used in conjunction with SSDs
- There is a vast performance gap between Sockets-level and verbs-level performance
  - Factor of four improvement in Memcached Get latency (4 KB)
  - Factor of six improvement in Memcached Get transactions/s (4 bytes)
- Native verbs-level designs will benefit the cloud computing domain
- We are currently working on verbs-level designs of HDFS and HBase
23
Thank You! {panda, surs}@cse.ohio-state.edu Network-Based Computing Laboratory http://nowlab.cse.ohio-state.edu/ MVAPICH Web Page http://mvapich.cse.ohio-state.edu/ 24