Enabling High-Performance Big Data Platform with RDMA
  Tong Liu, HPC Advisory Council
  October 7, 2014
Shortcomings of Hadoop
  Administration tooling
  Performance
  Reliability
  SQL support
  Backup and recovery
  Source: 451 Research, 2013 Hadoop survey
Where can we improve Hadoop?
  Components: Hive, Pig, MapReduce, SQL (e.g. Impala), HBase, HDFS (Hadoop Distributed File System)
  High demand to improve: real-time operation, fast execution, streaming data
  Issues:
    Inherent data latency with HDFS
    Cannot support a large number of small files
    Efficiency of MapReduce, HBase, Hive, etc.
HDFS Operation
  [Diagram: client read and write paths through the NameNodes (HDFS Federation) to three DataNodes, with replication between DataNodes.]
  Callouts: HDFS Federation; faster disks; faster CPU and memory; bigger network pipe
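The callout "bigger network pipe" can be made concrete with a rough traffic count. The sketch below is a simplified model, not HDFS code: it assumes a replication factor of 3, a client that is not itself a DataNode, and a fully pipelined write in which the slowest stage sets the pace. The function names and the bandwidth figures in the usage lines are hypothetical.

```python
def replicated_write_traffic(file_bytes: int, replication: int = 3) -> dict:
    """Count bytes moved when one file is written through a replication pipeline.

    Assumption: the client is remote, so the file crosses the network once per
    replica (client -> DataNode 1 -> DataNode 2 -> DataNode 3).
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return {
        "network_bytes": file_bytes * replication,  # one network hop per replica
        "disk_bytes": file_bytes * replication,     # one disk write per replica
    }


def pipeline_time_lower_bound(file_bytes: int,
                              link_bytes_per_s: float,
                              disk_bytes_per_s: float) -> float:
    """Lower bound on write time when all hops overlap: the slowest stage dominates."""
    return file_bytes / min(link_bytes_per_s, disk_bytes_per_s)


# Hypothetical figures: a 1 GiB file, a 10 Gb/s link, and a 500 MB/s disk.
traffic = replicated_write_traffic(2**30)
seconds = pipeline_time_lower_bound(2**30, 1.25e9, 5e8)
print(traffic["network_bytes"], round(seconds, 2))
```

In this model every byte the client writes crosses the network three times, which is why replicated writes are sensitive to both interconnect bandwidth and per-hop overhead.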
RDMA (Remote Direct Memory Access)
  [Diagram: applications on Rack 1 and Rack 2, shown across the user, kernel, and hardware layers. On the TCP/IP path, the data is copied from the application buffer through OS buffers and NIC buffers. On the RDMA path (over InfiniBand or Ethernet), the HCAs move the data directly between application buffers.]
RDMA: Critical for Efficient Data Movement
  Zero-copy remote data transfer between application buffers
  Kernel bypass
  Protocol offload
  Low-latency, high-performance data transfers
  InfiniBand: 56Gb/s
  RoCE: RDMA over Converged Ethernet
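The difference between a copying path and a zero-copy path can be illustrated inside a single process. The sketch below is only an analogy, not RDMA: it uses plain Python buffers, and the names `send_with_copies` and `send_zero_copy` are hypothetical. The first function duplicates the data at each stage, as a buffered TCP/IP path does; the second hands out a view of the original application buffer.

```python
def send_with_copies(app_buffer: bytearray) -> bytearray:
    """Mimic a buffered path: each stage makes its own copy of the data."""
    os_buffer = bytearray(app_buffer)   # copy 1: application buffer -> "OS" buffer
    nic_buffer = bytearray(os_buffer)   # copy 2: "OS" buffer -> "NIC" buffer
    return nic_buffer


def send_zero_copy(app_buffer: bytearray) -> memoryview:
    """Mimic a zero-copy path: expose the application buffer itself, no duplication."""
    return memoryview(app_buffer)


payload = bytearray(b"payload")
copied = send_with_copies(payload)
view = send_zero_copy(payload)

# Changing the source afterwards shows which path shares memory with it.
payload[0] = ord("P")
print(bytes(copied))  # the copy still holds the old bytes
print(bytes(view))    # the view reflects the change
```

The copied buffer is unaffected by the later write, while the view sees it, because the view never duplicated the bytes. RDMA applies the same idea across machines, with the adapter reading and writing registered application memory directly.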
HDFS Operation with RDMA
  [Diagram: client read and write paths through the NameNodes to three DataNodes, with replication between DataNodes.]
HDFS RDMA Acceleration: Solution 1
  Hadoop HDFS-RDMA acceleration: 100% Java code written on top of JXIO
  Same memory footprint as the vanilla client/server
  First results show double the performance of vanilla HDFS for WRITE operations with 3 replications
Accelio: High-Performance Reliable Messaging and RPC Library
  Open source: https://github.com/accelio/accelio/ and www.accelio.org
  Faster RDMA integration into applications
  Maximizes message and CPU parallelism
HDFS RDMA Acceleration: Solution 2
  Package available at: http://hadooprdma.cse.ohio-state.edu/
  Big performance gain with RDMA support
MapReduce Workflow
RDMA-Enabled MapReduce
  Unstructured Data Accelerator (UDA)
    Uses RDMA to do the shuffle and merge
    Plug-in architecture
    Open source
  Supported Hadoop distributions: Apache 3.0, Apache 2.2.x, Apache 1.3, Cloudera Distribution Hadoop 4.4
  Inbox
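For readers unfamiliar with the step UDA accelerates, the following is a minimal, single-process sketch of what shuffle and merge accomplish logically. It is not UDA code and involves no RDMA: it assumes each map task has already produced a list of (key, value) pairs sorted by key, and the function name `shuffle_and_merge` is hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def shuffle_and_merge(map_outputs):
    """Merge key-sorted map outputs and group the values for each key.

    map_outputs: a list of lists of (key, value) pairs, each list sorted by key.
    Returns a list of (key, [values]) pairs in key order, ready for a reducer.
    """
    # k-way merge of the already-sorted runs, comparing on the key only
    merged = heapq.merge(*map_outputs, key=itemgetter(0))
    # consecutive pairs with the same key form one reducer input
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]


# Three hypothetical map outputs for a word count.
outputs = [
    [("a", 1), ("c", 1)],
    [("a", 1), ("b", 1)],
    [("b", 1), ("c", 1), ("c", 1)],
]
print(shuffle_and_merge(outputs))
```

In a real cluster the sorted runs live on different nodes, so the merge is dominated by moving map output across the network, which is the part an RDMA transport targets.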
Storage Limitations for Hadoop
  Hadoop uses local disk to maintain data locality and reduce latency
  High-value data resides on external storage systems
  Copying data onto HDFS, running analytics, and then copying the results to another system wastes storage space
  As data sources increase, managing data becomes a nightmare
  Option: access the external data directly, without having to deal with copying
  This still needs to provide performance
Storage: From Scale-Up to Scale-Out
  Scale-out storage systems using distributed computing architectures
  Scalable and resilient
Sequential Read Performance (single port)
Fastest and Lowest Latency Storage Access with iSER
  [Chart: K IOPS at 4K I/O size]
  Configuration                                   K IOPS
  iSCSI (TCP/IP)                                     130
  1 x FC 8 Gb port                                   200
  4 x FC 8 Gb port                                   800
  iSER 1 x 40GbE/IB port                            1100
  iSER 2 x 40GbE/IB port (+Acceleration)            2300
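The chart values can be converted into bandwidth to check them against the port speeds. The sketch below is a worked calculation under two assumptions: the five values map to the five configurations in the order they appear on the slide, and "4K" means 4096 bytes. The function names are hypothetical.

```python
# K IOPS at 4K I/O size, as read from the slide (mapping by order is assumed).
KIOPS = {
    "iSCSI (TCP/IP)": 130,
    "1 x FC 8 Gb port": 200,
    "4 x FC 8 Gb port": 800,
    "iSER 1 x 40GbE/IB port": 1100,
    "iSER 2 x 40GbE/IB port (+Acceleration)": 2300,
}
IO_SIZE_BYTES = 4096  # assumption: 4K = 4096 bytes


def throughput_gbit_per_s(kiops: float, io_size_bytes: int = IO_SIZE_BYTES) -> float:
    """Convert thousands of I/O operations per second into Gb/s of payload."""
    return kiops * 1000 * io_size_bytes * 8 / 1e9


def speedup_over(baseline: str, config: str) -> float:
    """Ratio of IOPS between two configurations from the table."""
    return KIOPS[config] / KIOPS[baseline]


for name, kiops in KIOPS.items():
    print(f"{name}: {throughput_gbit_per_s(kiops):.2f} Gb/s, "
          f"{speedup_over('iSCSI (TCP/IP)', name):.2f}x iSCSI")
```

Under these assumptions, 1100 K IOPS corresponds to roughly 36 Gb/s of payload, close to the line rate of one 40 Gb/s port, and 2300 K IOPS to roughly 75 Gb/s across two ports.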
Lustre as Hadoop Storage Solution
  RDMA enables highest Lustre performance
Hadoop over Cloud? Performance?
  Benefits:
    Lowering the cost of innovation
    Procuring large-scale resources quickly
    Running closer to the data
    Simplifying Hadoop operations
  Concerns:
    Heavily utilized, rather than being massively provisioned
    Cloud storage is slower and expensive
    Data locality makes a big difference for performance
Fastest OpenStack Storage Access
  [Diagram: compute servers running VMs on a KVM hypervisor with Open-iSCSI with iSER, connected through adapters over an RDMA-capable interconnect to storage servers running OpenStack (Cinder), an iSCSI/iSER target (tgt), an RDMA cache, and local disks.]
  Using RDMA to accelerate iSCSI storage
  Using OpenStack built-in components and management
  RDMA is already inbox and used by OpenStack
  RDMA enables faster performance, with much lower CPU%
Fast Interconnect with RDMA to Boost Big Data
  4X faster run time! Benchmark: TestDFSIO (1 terabyte, 100 files)
  2X higher performance! Benchmark: 1M records workload (4M operations); 2X faster run time and 2X higher throughput
  2X faster run time! Benchmark: Memcached operations
  3X faster run time! Benchmark: Redis operations
RDMA Can Accelerate All Layers
  Compute, I/O nodes, filesystem, storage
What's Happening with Big Data Platform
  Big Data Meets HPC!
Questions?
  All trademarks are property of their respective owners. All information is provided "As-Is" without any kind of warranty. The HPC Advisory Council makes no representation to the accuracy and completeness of the information contained herein. HPC Advisory Council Mellanox undertakes no duty and assumes no obligation to update or correct any information presented herein.