Big Data: Hadoop and Memcached

Size: px
Start display at page:

Download "Big Data: Hadoop and Memcached"

Transcription

1 Big Data: Hadoop and Memcached Talk at HPC Advisory Council Stanford Conference and Exascale Workshop (Feb 214) by Dhabaleswar K. (DK) Panda The Ohio State University

2 Introduction to Big Data Applications and Analytics Big Data has become the one of the most important elements of business analytics Provides groundbreaking opportunities for enterprise information management and decision making The amount of data is exploding; companies are capturing and digitizing more information than ever The rate of information growth appears to be exceeding Moore s Law Commonly accepted 3V s of Big Data Volume, Velocity, Variety Michael Stonebraker: Big Data Means at Least Three Different Things, 5V s of Big Data 3V + Value, Veracity 2

3 Not Only in Internet Services - Big Data in Scientific Domains Scientific Data Management, Analysis, and Visualization Applications examples Climate modeling Combustion Fusion Astrophysics Bioinformatics Data Intensive Tasks Runs large-scale simulations on supercomputers Dump data on parallel storage systems Collect experimental / observational data Move experimental / observational data to analysis sites Visual analytics help understand data visually 3

4 Typical Solutions or Architectures for Big Data Applications and Analytics Hadoop: The most popular framework for Big Data Analytics HDFS, MapReduce, HBase, RPC, Hive, Pig, ZooKeeper, Mahout, etc. Spark: Provides primitives for in-memory cluster computing; Jobs can load data into memory and query it repeatedly Shark: A fully Apache Hive-compatible data warehousing system Storm: A distributed real-time computation system for real-time analytics, online machine learning, continuous computation, etc. S4: A distributed system for processing continuous unbounded streams of data Web 2.: RDBMS + Memcached ( Memcached: A high-performance, distributed memory object caching systems 4

5 Who Are Using Hadoop? Focuses on large data and data analysis Hadoop (e.g. HDFS, MapReduce, HBase, RPC) environment is gaining a lot of momentum 5

6 Hadoop at Yahoo! Scale: 42, nodes, 365PB HDFS, 1M daily slot hours A new Jim Gray Sort Record: 1.42 TB/min over 2,1 nodes Hadoop.23.7, an early branch of the Hadoop 2.X 6

7 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 7

8 Big Data Processing with Hadoop The open-source implementation of MapReduce programming model for Big Data Analytics Major components HDFS MapReduce Underlying Hadoop Distributed File System (HDFS) used by both MapReduce and HBase Model scales but high amount of communication during intermediate phases can be further optimized User Applications MapReduce HBase HDFS Hadoop Common (RPC) Hadoop Framework 8

9 Hadoop MapReduce Architecture & Operations Disk Operations Map and Reduce Tasks carry out the total job execution Communication in shuffle phase uses HTTP over Java Sockets 9

10 Hadoop Distributed File System (HDFS) Primary storage of Hadoop; highly reliable and fault-tolerant Adopted by many reputed organizations eg: Facebook, Yahoo! NameNode: stores the file system namespace DataNode: stores data blocks Developed in Java for platformindependence and portability Uses sockets for communication! Client RPC RPC RPC RPC (HDFS Architecture) 1

11 System-Level Interaction Between Clients and Data Nodes in HDFS (HDD/SSD) High Performance Networks (HDD/SSD) (HDD/SSD) (HDFS Clients) (HDFS Data Nodes) 11

12 HBase Apache Hadoop Database ( Semi-structured database, which is highly scalable ZooKeeper Integral part of many datacenter applications eg:- Facebook Social Inbox Developed in Java for platformindependence and portability HBase Client HMaster HRegion Servers (HBase Architecture) HDFS Uses sockets for communication! 12

13 System-Level Interaction Among HBase Clients, Region Servers, and Data Nodes (HDD/SSD) High Performance Networks... High Performance Networks... (HDD/SSD) (HDD/SSD) (HBase Clients) (HRegion Servers) (Data Nodes) 13

14 Memcached Architecture Main memory SSD CPUs HDD... Main memory SSD CPUs HDD!"#$%"$#& High Performance Networks High Performance Networks Main memory SSD CPUs HDD High Performance Networks Main memory SSD CPUs HDD (Database Servers) Main memory CPUs Web Frontend Servers (Memcached Clients) (Memcached Servers) SSD HDD Integral part of Web 2. architecture Distributed Caching Layer Allows to aggregate spare memory from multiple nodes General purpose Typically used to cache database queries, results of API calls Scalable model, but typical usage very network intensive 14

15 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 15

16 Number of Clusters Percentage of Clusters Trends for Commodity Computing Clusters in the Top 5 List ( Percentage of Clusters Number of Clusters Timeline 16

17 HPC and Advanced Interconnects High-Performance Computing (HPC) has adopted advanced interconnects and protocols InfiniBand 1 Gigabit Ethernet/iWARP RDMA over Converged Enhanced Ethernet (RoCE) Very Good Performance Low latency (few micro seconds) High Bandwidth (1 Gb/s with dual FDR InfiniBand) Low CPU overhead (5-1%) OpenFabrics software stack with IB, iwarp and RoCE interfaces are driving HPC systems Many such systems in Top5 list 17

18 Trends of Networking Technologies in TOP5 Systems Percentage share of InfiniBand is steadily increasing Interconnect Family Systems Share 18

19 Use of High-Performance Networks for Scientific Computing Message Passing Interface (MPI) Most commonly used programming model for HPC applications Parallel File Systems Storage Systems Almost 13 years of Research and Development since InfiniBand was introduced in October 2 Other Programming Models are emerging to take advantage of High-Performance Networks UPC OpenSHMEM Hybrid MPI+PGAS (OpenSHMEM and UPC) 19

20 Latency (us) Latency (us) One-way Latency: MPI over IB with MVAPICH Small Message Latency Large Message Latency Qlogic-DDR Qlogic-QDR ConnectX-DDR ConnectX2-PCIe2-QDR ConnectX3-PCIe3-FDR Sandy-ConnectIB-DualFDR Ivy-ConnectIB-DualFDR Message Size (bytes) Message Size (bytes) DDR, QDR GHz Quad-core (Westmere) Intel PCI Gen2 with IB switch FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch 2

21 Bandwidth (MBytes/sec) Bandwidth (MBytes/sec) Bandwidth: MPI over IB with MVAPICH Unidirectional Bandwidth Bidirectional Bandwidth Qlogic-DDR Qlogic-QDR ConnectX-DDR ConnectX2-PCIe2-QDR ConnectX3-PCIe3-FDR Sandy-ConnectIB-DualFDR Ivy-ConnectIB-DualFDR Message Size (bytes) Message Size (bytes) DDR, QDR GHz Quad-core (Westmere) Intel PCI Gen2 with IB switch FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch 21

22 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 22

23 Can High-Performance Interconnects Benefit Hadoop? Most of the current Hadoop environments use Ethernet Infrastructure with Sockets Concerns for performance and scalability Usage of High-Performance Networks is beginning to draw interest from many companies What are the challenges? Where do the bottlenecks lie? Can these bottlenecks be alleviated with new designs (similar to the designs adopted for MPI)? Can HPC Clusters with High-Performance networks be used for Big Data applications using Hadoop? 23

24 Designing Communication and I/O Libraries for Big Data Systems: Challenges Applications Benchmarks Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Programming Models (Sockets) Other Protocols? Point-to-Point Communication Communication and I/O Library Threaded Models and Synchronization Virtualization I/O and File Systems QoS Fault-Tolerance Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 24

25 Common Protocols using Open Fabrics Application /Middleware Interface Application/ Middleware Sockets Verbs Protocol Kernel Space TCP/IP TCP/IP RSockets SDP iwarp RDMA RDMA Ethernet Driver IPoIB Hardware Offload User space RDMA User space User space User space Adapter Ethernet Adapter InfiniBand Adapter Ethernet Adapter InfiniBand Adapter InfiniBand Adapter iwarp Adapter RoCE Adapter InfiniBand Adapter Switch Ethernet Switch InfiniBand Switch Ethernet Switch InfiniBand Switch InfiniBand Switch Ethernet Switch Ethernet Switch InfiniBand Switch 1/1/4 GigE IPoIB 1/4 GigE- TOE RSockets SDP iwarp RoCE IB Verbs 25

26 Can Hadoop be Accelerated with High-Performance Networks and Protocols? Current Design Enhanced Designs Our Approach Application Application Application Sockets Accelerated Sockets Verbs / Hardware Offload OSU Design Verbs Interface 1/1 GigE Network 1 GigE or InfiniBand 1 GigE or InfiniBand Sockets not designed for high-performance Stream semantics often mismatch for upper layers (Hadoop and HBase) Zero-copy not available for non-blocking sockets 26

27 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 27

28 RDMA for Apache Hadoop Project High-Performance Design of Hadoop over RDMA-enabled Interconnects High performance design with native InfiniBand and RoCE support at the verbslevel for HDFS, MapReduce, and RPC components Easily configurable for native InfiniBand, RoCE and the traditional socketsbased support (Ethernet and InfiniBand with IPoIB) Current release:.9.8 Based on Apache Hadoop Compliant with Apache Hadoop APIs and applications Tested with Mellanox InfiniBand adapters (DDR, QDR and FDR) RoCE Various multi-core platforms Different file systems with disks and SSDs 28

29 Design Overview of HDFS with RDMA Enables high performance RDMA communication, while supporting traditional socket interface Others Java Socket Interface Applications HDFS Write Java Native Interface (JNI) OSU Design Design Features RDMA-based HDFS write RDMA-based HDFS replication InfiniBand/RoCE support 1/1 GigE, IPoIB Network Verbs RDMA Capable Networks (IB, 1GE/ iwarp, RoCE..) JNI Layer bridges Java based HDFS with communication library written in native code Only the communication part of HDFS Write is modified; No change in HDFS architecture 29

30 Communication Time in HDFS Communication Time (s) GigE IPoIB (QDR) OSU-IB (QDR) 15 1 Reduced by 3% 5 2GB 4GB 6GB 8GB 1GB File Size (GB) Cluster with HDD DataNodes 3% improvement in communication time over IPoIB (QDR) 56% improvement in communication time over 1GigE Similar improvements are obtained for SSD DataNodes N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy and D. K. Panda, High Performance RDMA-Based Design of HDFS over InfiniBand, Supercomputing (SC), Nov 212 3

31 Total Throughput (MBps) Total Throughput (MBps) Evaluations using TestDFSIO GigE IPoIB (QDR) OSU-IB (QDR) Increased by 24% IPoIB (QDR) OSU-IB (QDR) Increased by 61% File Size (GB) File Size (GB) Cluster with 8 HDD Nodes, TestDFSIO with 8 maps Cluster with 4 SSD Nodes, TestDFSIO with 4 maps Cluster with 8 HDD DataNodes, single disk per node 24% improvement over IPoIB (QDR) for 2GB file size Cluster with 4 SSD DataNodes, single SSD per node 61% improvement over IPoIB (QDR) for 2GB file size 31

32 Total Throughput (MBps) Evaluations using TestDFSIO GigE-1HDD IPoIB -1HDD (QDR) OSU-IB -1HDD (QDR) 1GigE-2HDD IPoIB -2HDD (QDR) OSU-IB -2HDD (QDR) 15 Increased by 31% File Size (GB) Cluster with 4 DataNodes, 1 HDD per node 24% improvement over IPoIB (QDR) for 2GB file size Cluster with 4 DataNodes, 2 HDD per node 31% improvement over IPoIB (QDR) for 2GB file size 2 HDD vs 1 HDD 76% improvement for OSU-IB (QDR) 66% improvement for IPoIB (QDR) 32

33 Aggregated Throughput (MBps) Execution Time (s) Evaluations using Enhanced DFSIO of Intel HiBench on SDSC-Gordon 25 2 IPoIB (QDR) OSU-IB (QDR) 16 Increased by 47% IPoIB (QDR) Reduced by 16% OSU-IB (QDR) File Size (GB) File Size (GB) Cluster with 32 DataNodes, single SSD per node 47% improvement in throughput over IPoIB (QDR) for 128GB file size 16% improvement in latency over IPoIB (QDR) for 128GB file size 33

34 Aggregated Throughput (MBps) Execution Time (s) Evaluations using Enhanced DFSIO of Intel HiBench on TACC-Stampede IPoIB (FDR) OSU-IB (FDR) Increased by 64% IPoIB (FDR) OSU-IB (FDR) Reduced by 37% File Size (GB) File Size (GB) Cluster with 64 DataNodes, single HDD per node 64% improvement in throughput over IPoIB (FDR) for 256GB file size 37% improvement in latency over IPoIB (FDR) for 256GB file size 34

35 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 35

36 Design Overview of MapReduce with RDMA Applications MapReduce Job Tracker Task Tracker Map Reduce Design features for RDMA Prefetch/Caching of MOF In-Memory Merge Overlap of Merge & Reduce Enables high performance RDMA communication, while supporting traditional socket interface Java Socket Interface 1/1 GigE, IPoIB Network Java Native Interface (JNI) OSU Design Verbs RDMA Capable Networks (IB, 1GE/ iwarp, RoCE..) JNI Layer bridges Java based MapReduce with communication library written in native code InfiniBand/RoCE support 36

37 Job Execution Time (sec) Job Execution Time (sec) Evaluations using Sort GigE IPoIB (QDR) Hadoop-A-IB (QDR) OSU-IB (QDR) Data Size (GB) 8 HDD DataNodes With 8 HDD DataNodes for 4GB sort 41% improvement over IPoIB (QDR) 39% improvement over Hadoop-A-IB (QDR) GigE IPoIB (QDR) Hadoop-A-IB (QDR) OSU-IB (QDR) Data Size (GB) 8 SSD DataNodes With 8 SSD DataNodes for 4GB sort 52% improvement over IPoIB (QDR) 44% improvement over Hadoop-A-IB (QDR) M. W. Rahman, N. S. Islam, X. Lu, J. Jose, H. Subramon, H. Wang, and D. K. Panda, High-Performance RDMAbased Design of Hadoop MapReduce over InfiniBand, HPDIC Workshop, held in conjunction with IPDPS, May

38 Job Execution Time (sec) Evaluations using TeraSort GigE IPoIB (QDR) Hadoop-A-IB (QDR) OSU-IB (QDR) Data Size (GB) 1 GB TeraSort with 8 DataNodes with 2 HDD per node 51% improvement over IPoIB (QDR) 41% improvement over Hadoop-A-IB (QDR) 38

39 Job Execution Time (sec) Job Execution Time (sec) Performance Evaluation on Larger Clusters 12 1 IPoIB (QDR) OSU-IB (QDR) Hadoop-A-IB (QDR) 7 6 IPoIB (FDR) Hadoop-A-IB (FDR) OSU-IB (FDR) Data Size: 6 GB Data Size: 12 GB Data Size: 24 GB Cluster Size: 16 Cluster Size: 32 Cluster Size: 64 Sort in OSU Cluster For 24GB Sort in 64 nodes 4% improvement over IPoIB (QDR) with HDD used for HDFS Data Size: 8 GB Data Size: 16 GBData Size: 32 GB Cluster Size: 16 Cluster Size: 32 Cluster Size: 64 TeraSort in TACC Stampede For 32GB TeraSort in 64 nodes 38% improvement over IPoIB (FDR) with HDD used for HDFS 39

40 Normalized Execution Time Evaluations using PUMA Workload GigE IPoIB (QDR) OSU-IB (QDR) AdjList (3GB) SelfJoin (8GB) SeqCount (3GB) WordCount (3GB) InvertIndex (3GB) Benchmarks 5% improvement in Self Join over IPoIB (QDR) for 8 GB data size 49% improvement in Sequence Count over IPoIB (QDR) for 3 GB data size 4

41 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 41

42 Time (us) Time (us) Motivation Detailed Analysis on HBase Put/Get Communication Communication Preparation Server Processing Server Serialization Client Processing Client Serialization IPoIB (DDR) 1GigE OSU-IB (DDR) HBase Put 1KB HBase 1KB Put Communication Time 8.9 us A factor of 6X improvement over 1GE for communication time HBase 1KB Get Communication Time 8.9 us A factor of 6X improvement over 1GE for communication time IPoIB (DDR) 1GigE OSU-IB (DDR) Hbase Get 1KB M. W. Rahman, J. Huang, J. Jose, X. Ouyang, H. Wang, N. Islam, H. Subramoni, Chet Murthy and D. K. Panda, Understanding the Communication Characteristics in HBase: What are the Fundamental Bottlenecks?, ISPASS 12 42

43 Time (us) Ops/sec HBase Single-Server-Multi-Client Results IPoIB (DDR) OSU-IB (DDR) 1GigE No. of Clients Latency HBase Get latency 4 clients: 14.5 us; 16 clients: us HBase Get throughput clients: 37.1 Kops/sec; 16 clients: 53.4 Kops/sec 27% improvement in throughput for 16 clients over 1GE No. of Clients Throughput J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, Chet Murthy, and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS 12 43

44 Time (us) Time (us) HBase YCSB Read-Write Workload IPoIB (QDR) 1GigE OSU-IB (QDR) No. of Clients No. of Clients Read Latency Write Latency HBase Get latency (Yahoo! Cloud Service Benchmark) 64 clients: 2. ms; 128 Clients: 3.5 ms 42% improvement over IPoIB for 128 clients HBase Get latency 64 clients: 1.9 ms; 128 Clients: 3.5 ms 4% improvement over IPoIB for 128 clients 44

45 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 45

46 Design Overview of Hadoop RPC with RDMA Applications Design Features Default Java Socket Interface Hadoop RPC Our Design Java Native Interface (JNI) OSU Design Verbs JVM-bypassed buffer management On-demand connection setup RDMA or send/recv based adaptive communication InfiniBand/RoCE support 1/1 GigE, IPoIB Network RDMA Capable Networks (IB, 1GE/ iwarp, RoCE..) Enables high performance RDMA communication, while supporting traditional socket interface 46

47 Latency (us) Throughput (Kops/Sec) Hadoop RPC over IB - Gain in Latency and Throughput GigE IPoIB(QDR) OSU-IB(QDR) Payload Size (Byte) Number of Clients Hadoop RPC over IB PingPong Latency 1 byte: 39 us; 4 KB: 52 us 42%-49% and 46%-5% improvements compared with the performance on 1 GigE and IPoIB (32Gbps), respectively Hadoop RPC over IB Throughput 512 bytes & 48 clients: Kops/sec 82% and 64% improvements compared with the peak performance on 1 GigE and IPoIB (32Gbps), respectively 47

48 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 48

49 Total Throughput (MBps) Total Throughput (MBps) Performance Benefits - TestDFSIO 1GigE IPoIB (QDR) OSU-IB (QDR) GigE IPoIB (QDR) OSU-IB (QDR) Size (GB) Size (GB) For 5GB, Cluster with 4 HDD Nodes, TestDFSIO with total 4 maps Cluster with 4 SSD Nodes, TestDFSIO with total 4 maps 53.4% improvement over IPoIB, 126.9% improvement over 1GigE with HDD used for HDFS 57.8% improvement over IPoIB, 138.1% improvement over 1GigE with SSD used for HDFS For 2GB, 1.7% improvement over IPoIB, 46.8% improvement over 1GigE with HDD used for HDFS 73.3% improvement over IPoIB, 141.3% improvement over 1GigE with SSD used for HDFS 49

50 Job Execution Time (sec) Job Execution Time (sec) Performance Benefits - Sort GigE IPoIB (QDR) OSU-IB (QDR) GigE IPoIB (QDR) OSU-IB (QDR) Sort Size (GB) Sort Size (GB) Cluster with 8 HDD Nodes, Sort with <8 maps, 8 reduces>/host Cluster with 8 SSD Nodes, Sort with <8 maps, 8 reduces>/host For 8GB Sort in a cluster size of 8, 4.7% improvement over IPoIB, 32.2% improvement over 1GigE with HDD used for HDFS 43.6% improvement over IPoIB, 42.9% improvement over 1GigE with SSD used for HDFS 5

51 Job Execution Time (sec) Job Execution Time (sec) Performance Benefits - TeraSort 25 1GigE IPoIB (QDR) OSU-IB (QDR) 25 1GigE IPoIB (QDR) OSU-IB (QDR) Size (GB) Size (GB) Cluster with 8 HDD Nodes, TeraSort with <8 maps, 8reduces>/host Cluster with 8 SSD Nodes, TeraSort with <8 maps, 8reduces>/host For 1GB TeraSort in a cluster size of 8, 45% improvement over IPoIB, 39% improvement over 1GigE with HDD used for HDFS 46% improvement over IPoIB, 4% improvement over 1GigE with SSD used for HDFS 51

52 Job Execution Time (sec) Job Execution Time (sec) Performance Benefits RandomWriter & Sort in SDSC-Gordon IPoIB (QDR) OSU-IB (QDR) IPoIB (QDR) OSU-IB (QDR) 4 Reduced by 2% Reduced by 36% GB 1 GB 2 GB 3 GB 5 GB 1 GB 2 GB 3 GB Cluster: 16 Cluster: 32 Cluster: 48 Cluster: 64 Cluster: 16 Cluster: 32 Cluster: 48 Cluster: 64 RandomWriter in a cluster of single SSD per Sort in a cluster of single SSD per node node 2% improvement over IPoIB (QDR) 16% improvement over IPoIB (QDR) for for 5GB in a cluster of 16 5GB in a cluster of 16 36% improvement over IPoIB (QDR) 2% improvement over IPoIB (QDR) for 3GB in a cluster of 64 for 3GB in a cluster of 64 52

53 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 53

54 Memcached-RDMA Design Sockets Client 1 2 Master Thread Sockets Worker Thread Shared Data RDMA Client 1 2 Sockets Worker Thread Verbs Worker Thread Verbs Worker Thread Memory Slabs Items Server and client perform a negotiation protocol Master thread assigns clients to appropriate worker thread Once a client is assigned a verbs worker thread, it can communicate directly and is bound to that thread All other Memcached data structures are shared among RDMA and Sockets worker threads 54

55 Time%(us)% Time%(us)% Time (us) Time (us) Memcached Get Latency (Small Message) Time%(us)% Time%(us)% Time%(us)% Time%(us)% K 2K Message Size Intel Clovertown Cluster (IB: DDR) Memcached Get latency 4 bytes RC/UD DDR: 6.82/7.55 us; QDR: 4.28/4.86 us 2K bytes RC/UD DDR: 12.31/12.78 us; QDR: 8.19/8.46 us Almost factor of four improvement over 1GE (TOE) for 2K bytes on the DDR cluster Message%Size% Message%Size% IPoIB (DDR) 1GigE (DDR) OSU-RC-IB (DDR) OSU-UD-IB (DDR) Time%(us)% 1 Time%(us)% Message%Size% Message%Size% K 2K Message Size IPoIB (QDR) 1GigE (QDR) Intel Westmere Cluster (IB: QDR) Message%Size% Message%Size% OSU-RC-IB (QDR) OSU-UD-IB (QDR) Message%Size% Message%Size% 55

56 Thousands of Transactions per second (TPS) Thousands of Transactions per second (TPS) Time%(us)% Time%(us)% Time%(us)% Memcached Get TPS (4byte) IPoIB (QDR) OSU-RC-IB (QDR) OSU-UD-IB (QDR) No. of Clients Memcached Get transactions per second for 4 bytes On IB QDR 1.4M/s (RC), 1.3 M/s (UD) for 8 clients K No. of Clients Message%Size% Message%Size% Significant improvement with native IB QDR compared to SDP and IPoIB Message%Size 56

57 K 2K 4K Time (us) Thousands of Transactions per Second (TPS) 1 1 Memcached Performance (FDR Interconnect) 1 Memcached GET Latency OSU-IB (FDR) IPoIB (FDR) Latency Reduced by nearly 2X Memcached Throughput 2X 1 Message Size Memcached Get latency Experiments on TACC Stampede (Intel SandyBridge Cluster, IB: FDR) 4 bytes OSU-IB: 2.84 us; IPoIB: us 2K bytes OSU-IB: 4.49 us; IPoIB: us Memcached Throughput (4bytes) 48 clients OSU-IB: 556 Kops/sec, IPoIB: 233 Kops/s Nearly 2X improvement in throughput No. of Clients 57

58 Time (ms) Time (ms) Application Level Evaluation Olio Benchmark 6 25 IPoIB (QDR) 5 2 OSU-RC-IB (QDR) OSU-UD-IB (QDR) 4 3 Time%(ms)% 15 OSU-Hybrid-IB (QDR) No. of Clients No.%of%Clients% No. of Clients Olio Benchmark RC 1.6 sec, UD 1.9 sec, Hybrid 1.7 sec for 124 clients 4X times better than IPoIB for 8 clients Hybrid design achieves comparable performance to that of pure RC design 58

59 Time (ms) Time (ms) Application Level Evaluation Real Application Workloads Time%(ms)% IPoIB (QDR) OSU-RC-IB (QDR) OSU-UD-IB (QDR) OSU-Hybrid-IB (QDR) No. of Clients Real Application Workload RC 32 ms, UD 318 ms, Hybrid 314 ms for 124 clients 12X times better than IPoIB for 8 clients Hybrid design achieves comparable performance to that of pure RC design No.%of%Clients% No. of Clients J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, W. Rahman, N. Islam, X. Ouyang, H. Wang, S. Sur and D. K. Panda, Memcached Design on High Performance RDMA Capable Interconnects, ICPP 11 J. Jose, H. Subramoni, K. Kandalla, W. Rahman, H. Wang, S. Narravula, and D. K. Panda, Scalable Memcached design for InfiniBand Clusters using Hybrid Transport, CCGrid 12 59

60 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 6

61 Designing Communication and I/O Libraries for Big Data Systems: Solved a Few Initial Challenges Applications Benchmarks Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Upper level Changes? Programming Models (Sockets) Other RDMA Protocols? Point-to-Point Communication Communication and I/O Library Threaded Models and Synchronization Virtualization I/O and File Systems QoS Fault-Tolerance Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 61

62 More Challenges Multi-threading and Synchronization Multi-threaded model exploration Fine-grained synchronization and lock-free design Unified helper threads for different components Multi-endpoint design to support multi-threading communications QoS and Virtualization Network virtualization and locality-aware communication for Big Data middleware Hardware-level virtualization support for End-to-End QoS I/O scheduling and storage virtualization Live migration 62

63 More Challenges (Cont d) Support of Accelerators Efficient designs for Big Data middleware to take advantage of NVIDA GPGPUs and Intel MICs Offload computation-intensive workload to accelerators Explore maximum overlapping between communication and offloaded computation Fault Tolerance Enhancements Novel data replication schemes for high-efficiency fault tolerance Exploration of light-weight fault tolerance mechanisms for Big Data processing Support of Parallel File Systems Optimize Big Data middleware over parallel file systems (e.g. Lustre) on modern HPC clusters 63

64 Job Execution Time (sec) Job Execution Time (sec) Performance Improvement Potential of MapReduce over Lustre on HPC Clusters An Initial Study IPoIB (FDR) OSU-IB (FDR) 2 IPoIB (QDR) OSU-IB (QDR) Data Size (GB) Data Size (GB) Sort in TACC Stampede Sort in SDSC Gordon For 5GB Sort in 64 nodes For 1GB Sort in 16 nodes 44% improvement over IPoIB (FDR) 33% improvement over IPoIB (QDR) Can more optimizations be achieved by leveraging more features of Lustre? 64

65 Are the Current Benchmarks Sufficient for Big Data? The current benchmarks provide some performance behavior However, do not provide any information to the designer/developer on: What is happening at the lower-layer? Where the benefits are coming from? Which design is leading to benefits or bottlenecks? Which component in the design needs to be changed and what will be its impact? Can performance gain/loss at the lower-layer be correlated to the performance gain/loss observed at the upper layer? 65

66 Challenges in Benchmarking of RDMA-based Designs Applications Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Programming Models (Sockets) Benchmarks Other RDMA Protocols? Current Benchmarks Correlation? Communication and I/O Library Point-to-Point Communication Threaded Models and Synchronization Virtualization No Benchmarks I/O and File Systems QoS Fault-Tolerance Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 66

67 OSU MPI Micro-Benchmarks (OMB) Suite A comprehensive suite of benchmarks to Compare performance of different MPI libraries on various networks and systems Validate low-level functionalities Provide insights to the underlying MPI-level designs Started with basic send-recv (MPI-1) micro-benchmarks for latency, bandwidth and bi-directional bandwidth Extended later to MPI-2 one-sided Collectives GPU-aware data movement OpenSHMEM (point-to-point and collectives) UPC Has become an industry standard Extensively used for design/development of MPI libraries, performance comparison of MPI libraries and even in procurement of large-scale systems Available from Available in an integrated manner with MVAPICH2 stack 67

68 Iterative Process Requires Deeper Investigation and Design for Benchmarking Next Generation Big Data Systems and Applications Applications Benchmarks Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Applications- Level Benchmarks Programming Models (Sockets) Other RDMA Protocols? Communication and I/O Library Point-to-Point Communication I/O and File Systems Threaded Models and Synchronization QoS Virtualization Fault-Tolerance Micro- Benchmarks Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 68

69 Future Plans of OSU Big Data Project Upcoming RDMA for Apache Hadoop Releases will support HDFS Parallel Replication Advanced MapReduce HBase Hadoop 2..x Memcached-RDMA Exploration of other components (Threading models, QoS, Virtualization, Accelerators, etc.) Advanced designs with upper-level changes and optimizations Comprehensive set of Benchmarks More details at 69

70 Concluding Remarks Presented an overview of Big Data, Hadoop and Memcached Provided an overview of Networking Technologies Discussed challenges in accelerating Hadoop Presented initial designs to take advantage of InfiniBand/RDMA for HDFS, MapReduce, RPC, HBase and Memcached Results are promising Many other open issues need to be solved Will enable Big Data community to take advantage of modern HPC technologies to carry out their analytics in a fast and scalable manner 7

71 Personnel Acknowledgments Current Students S. Chakraborthy (Ph.D.) M. Rahman (Ph.D.) N. Islam (Ph.D.) D. Shankar (Ph.D.) J. Jose (Ph.D.) R. Shir (Ph.D.) M. Li (Ph.D.) A. Venkatesh (Ph.D.) S. Potluri (Ph.D.) J. Zhang (Ph.D.) R. Rajachandrasekhar (Ph.D.) Past Students P. Balaji (Ph.D.) W. Huang (Ph.D.) D. Buntinas (Ph.D.) W. Jiang (M.S.) S. Bhagvat (M.S.) S. Kini (M.S.) L. Chai (Ph.D.) M. Koop (Ph.D.) B. Chandrasekharan (M.S.) R. Kumar (M.S.) N. Dandapanthula (M.S.) S. Krishnamoorthy (M.S.) V. Dhanraj (M.S.) K. Kandalla (Ph.D.) T. Gangadharappa (M.S.) P. Lai (M.S.) K. Gopalakrishnan (M.S.) J. Liu (Ph.D.) Current Senior Research Associate H. Subramoni Current Post-Docs Current Programmers X. Lu M. Arnold K. Hamidouche J. Perkins M. Luo (Ph.D.) G. Santhanaraman (Ph.D.) A. Mamidala (Ph.D.) A. Singh (Ph.D.) G. Marsh (M.S.) J. Sridhar (M.S.) V. Meshram (M.S.) S. Sur (Ph.D.) S. Naravula (Ph.D.) H. Subramoni (Ph.D.) R. Noronha (Ph.D.) K. Vaidyanathan (Ph.D.) X. Ouyang (Ph.D.) A. Vishnu (Ph.D.) S. Pai (M.S.) J. Wu (Ph.D.) W. Yu (Ph.D.) Past Post-Docs H. Wang X. Besseron H.-W. Jin E. Mancini S. Marcarelli J. Vienne Past Research Scientist S. Sur Past Programmers D. Bureddy M. Luo 71

72 Pointers

Accelerating Big Data Processing with Hadoop, Spark and Memcached

Accelerating Big Data Processing with Hadoop, Spark and Memcached Accelerating Big Data Processing with Hadoop, Spark and Memcached Talk at HPC Advisory Council Stanford Conference (Feb 15) by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

Accelerating Big Data Processing with Hadoop, Spark and Memcached

Accelerating Big Data Processing with Hadoop, Spark and Memcached Accelerating Big Data Processing with Hadoop, Spark and Memcached Talk at HPC Advisory Council Switzerland Conference (Mar 15) by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu

More information

A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks

A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks Xiaoyi Lu, Md. Wasi- ur- Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng Laboratory Department

More information

Accelerating Spark with RDMA for Big Data Processing: Early Experiences

Accelerating Spark with RDMA for Big Data Processing: Early Experiences Accelerating Spark with RDMA for Big Data Processing: Early Experiences Xiaoyi Lu, Md. Wasi- ur- Rahman, Nusrat Islam, Dip7 Shankar, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng Laboratory Department

More information

Accelera'ng Big Data Processing with Hadoop, Spark and Memcached

Accelera'ng Big Data Processing with Hadoop, Spark and Memcached Accelera'ng Big Data Processing with Hadoop, Spark and Memcached Talk at HPC Advisory Council Switzerland Conference (Mar 15) by Dhabaleswar K. (DK) Panda The Ohio State University E- mail: panda@cse.ohio-

More information

RDMA over Ethernet - A Preliminary Study

RDMA over Ethernet - A Preliminary Study RDMA over Ethernet - A Preliminary Study Hari Subramoni, Miao Luo, Ping Lai and Dhabaleswar. K. Panda Computer Science & Engineering Department The Ohio State University Outline Introduction Problem Statement

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

Hari Subramoni. Education: Employment: Research Interests: Projects:

Hari Subramoni. Education: Employment: Research Interests: Projects: Hari Subramoni Senior Research Associate, Dept. of Computer Science and Engineering The Ohio State University, Columbus, OH 43210 1277 Tel: (614) 961 2383, Fax: (614) 292 2911, E-mail: subramoni.1@osu.edu

More information

Hadoop on the Gordon Data Intensive Cluster

Hadoop on the Gordon Data Intensive Cluster Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,

More information

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand Hari Subramoni *, Ping Lai *, Raj Kettimuthu **, Dhabaleswar. K. (DK) Panda * * Computer Science and Engineering Department

More information

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume

More information

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

Accelerating Big Data Processing with Hadoop, Spark, and Memcached over High-Performance Interconnects

Accelerating Big Data Processing with Hadoop, Spark, and Memcached over High-Performance Interconnects Slides are available from http://www.cse.ohio-state.edu/~panda/hoti15-bigdata-tut.pdf Accelerating Big Data Processing with Hadoop, Spark, and Memcached over High-Performance Interconnects Tutorial at

More information

High-Performance Networking for Optimized Hadoop Deployments

High-Performance Networking for Optimized Hadoop Deployments High-Performance Networking for Optimized Hadoop Deployments Chelsio Terminator 4 (T4) Unified Wire adapters deliver a range of performance gains for Hadoop by bringing the Hadoop cluster networking into

More information

Hadoop Optimizations for BigData Analytics

Hadoop Optimizations for BigData Analytics Hadoop Optimizations for BigData Analytics Weikuan Yu Auburn University Outline WBDB, Oct 2012 S-2 Background Network Levitated Merge JVM-Bypass Shuffling Fast Completion Scheduler WBDB, Oct 2012 S-3 Emerging

More information

RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed File system THESIS

RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed File system THESIS RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed File system THESIS Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

High-Performance Design of HBase with RDMA over InfiniBand

High-Performance Design of HBase with RDMA over InfiniBand 212 IEEE 26th International Parallel and Distributed Processing Symposium High-Performance Design of HBase with RDMA over InfiniBand Jian Huang 1, Xiangyong Ouyang 1, Jithin Jose 1, Md. Wasi-ur-Rahman

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

Can High-Performance Interconnects Benefit Hadoop Distributed File System?

Can High-Performance Interconnects Benefit Hadoop Distributed File System? Can High-Performance Interconnects Benefit Hadoop Distributed File System? Sayantan Sur, Hao Wang, Jian Huang, Xiangyong Ouyang and Dhabaleswar K. Panda Department of Computer Science and Engineering,

More information

A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks

A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks Dipti Shankar (B), Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda Department of Computer

More information

RDMA for Apache Hadoop 0.9.9 User Guide

RDMA for Apache Hadoop 0.9.9 User Guide 0.9.9 User Guide HIGH-PERFORMANCE BIG DATA TEAM http://hibd.cse.ohio-state.edu NETWORK-BASED COMPUTING LABORATORY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING THE OHIO STATE UNIVERSITY Copyright (c)

More information

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014 Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet Anand Rangaswamy September 2014 Storage Developer Conference Mellanox Overview Ticker: MLNX Leading provider of high-throughput,

More information

Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand

Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand Ping Lai, Hari Subramoni, Sundeep Narravula, Amith Mamidala, Dhabaleswar K. Panda Department of Computer Science and

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

Performance Evaluation of InfiniBand with PCI Express

Performance Evaluation of InfiniBand with PCI Express Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar

More information

MapReduce Evaluator: User Guide

MapReduce Evaluator: User Guide University of A Coruña Computer Architecture Group MapReduce Evaluator: User Guide Authors: Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño December 9, 2014 Contents 1 Overview

More information

Building Enterprise-Class Storage Using 40GbE

Building Enterprise-Class Storage Using 40GbE Building Enterprise-Class Storage Using 40GbE Unified Storage Hardware Solution using T5 Executive Summary This white paper focuses on providing benchmarking results that highlight the Chelsio T5 performance

More information

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information

Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster

Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster Mahidhar Tatineni (mahidhar@sdsc.edu) MVAPICH User Group Meeting August 27, 2014 NSF grants: OCI #0910847 Gordon: A Data

More information

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 Direct Increased Performance, Scaling and Resiliency July 2012 Motti Beck, Director, Enterprise Market Development Motti@mellanox.com

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Advancing Applications Performance With InfiniBand

Advancing Applications Performance With InfiniBand Advancing Applications Performance With InfiniBand Pak Lui, Application Performance Manager September 12, 2013 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server and

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one

More information

Hadoop Deployment and Performance on Gordon Data Intensive Supercomputer!

Hadoop Deployment and Performance on Gordon Data Intensive Supercomputer! Hadoop Deployment and Performance on Gordon Data Intensive Supercomputer! Mahidhar Tatineni, Rick Wagner, Eva Hocks, Christopher Irving, and Jerry Greenberg! SDSC! XSEDE13, July 22-25, 2013! Overview!!

More information

InfiniBand Software and Protocols Enable Seamless Off-the-shelf Applications Deployment

InfiniBand Software and Protocols Enable Seamless Off-the-shelf Applications Deployment December 2007 InfiniBand Software and Protocols Enable Seamless Off-the-shelf Deployment 1.0 Introduction InfiniBand architecture defines a high-bandwidth, low-latency clustering interconnect that is used

More information

TCP/IP Implementation of Hadoop Acceleration. Cong Xu

TCP/IP Implementation of Hadoop Acceleration. Cong Xu TCP/IP Implementation of Hadoop Acceleration by Cong Xu A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science Auburn,

More information

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Understanding the Significance of Network Performance in End Applications: A Case Study with EtherFabric and InfiniBand

Understanding the Significance of Network Performance in End Applications: A Case Study with EtherFabric and InfiniBand Understanding the Significance of Network Performance in End Applications: A Case Study with EtherFabric and InfiniBand K. VAIDYANATHAN, S. BHAGVAT, P. BALAJI AND D. K. PANDA Technical Report Ohio State

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture He Huang, Shanshan Li, Xiaodong Yi, Feng Zhang, Xiangke Liao and Pan Dong School of Computer Science National

More information

Michael Kagan. michael@mellanox.com

Michael Kagan. michael@mellanox.com Virtualization in Data Center The Network Perspective Michael Kagan CTO, Mellanox Technologies michael@mellanox.com Outline Data Center Transition Servers S as a Service Network as a Service IO as a Service

More information

Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions

Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions 64% of organizations were investing or planning to invest on Big Data technology

More information

LS DYNA Performance Benchmarks and Profiling. January 2009

LS DYNA Performance Benchmarks and Profiling. January 2009 LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The

More information

InfiniBand Update Addressing new I/O challenges in HPC, Cloud, and Web 2.0 infrastructures. Brian Sparks IBTA Marketing Working Group Co-Chair

InfiniBand Update Addressing new I/O challenges in HPC, Cloud, and Web 2.0 infrastructures. Brian Sparks IBTA Marketing Working Group Co-Chair InfiniBand Update Addressing new I/O challenges in HPC, Cloud, and Web 2.0 infrastructures Brian Sparks IBTA Marketing Working Group Co-Chair Page 1 IBTA & OFA Update IBTA today has over 50 members; OFA

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

GraySort on Apache Spark by Databricks

GraySort on Apache Spark by Databricks GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner

More information

SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation

SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation SMB Advanced Networking for Fault Tolerance and Performance Jose Barreto Principal Program Managers Microsoft Corporation Agenda SMB Remote File Storage for Server Apps SMB Direct (SMB over RDMA) SMB Multichannel

More information

ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009

ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009 ECLIPSE Best Practices Performance, Productivity, Efficiency March 29 ECLIPSE Performance, Productivity, Efficiency The following research was performed under the HPC Advisory Council activities HPC Advisory

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

High Throughput File Servers with SMB Direct, Using the 3 Flavors of RDMA network adapters

High Throughput File Servers with SMB Direct, Using the 3 Flavors of RDMA network adapters High Throughput File Servers with SMB Direct, Using the 3 Flavors of network adapters Jose Barreto Principal Program Manager Microsoft Corporation Abstract In Windows Server 2012, we introduce the SMB

More information

FTP 100: An Ultra-High Speed Data Transfer Service over Next Generation 100 Gigabit Per Second Network

FTP 100: An Ultra-High Speed Data Transfer Service over Next Generation 100 Gigabit Per Second Network FTP 100: An Ultra-High Speed Data Transfer Service over Next Generation 100 Gigabit Per Second Network Award Number: 53662 - US Department o f Energy Aw ard Title: ARRA: TAS::89 0227:: TASRecovery A ct

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Improving High Performance Networking Technologies. for Data Center Clusters

Improving High Performance Networking Technologies. for Data Center Clusters Improving High Performance Networking Technologies for Data Center Clusters by Ryan Eric Grant A thesis submitted to the Department of Electrical and Computer Engineering In conformity with the requirements

More information

SR-IOV In High Performance Computing

SR-IOV In High Performance Computing SR-IOV In High Performance Computing Hoot Thompson & Dan Duffy NASA Goddard Space Flight Center Greenbelt, MD 20771 hoot@ptpnow.com daniel.q.duffy@nasa.gov www.nccs.nasa.gov Focus on the research side

More information

A Case for Flash Memory SSD in Hadoop Applications

A Case for Flash Memory SSD in Hadoop Applications A Case for Flash Memory SSD in Hadoop Applications Seok-Hoon Kang, Dong-Hyun Koo, Woon-Hak Kang and Sang-Won Lee Dept of Computer Engineering, Sungkyunkwan University, Korea x860221@gmail.com, smwindy@naver.com,

More information

ECLIPSE Performance Benchmarks and Profiling. January 2009

ECLIPSE Performance Benchmarks and Profiling. January 2009 ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

Why Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics

Why Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics Why Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

Windows 8 SMB 2.2 File Sharing Performance

Windows 8 SMB 2.2 File Sharing Performance Windows 8 SMB 2.2 File Sharing Performance Abstract This paper provides a preliminary analysis of the performance capabilities of the Server Message Block (SMB) 2.2 file sharing protocol with 10 gigabit

More information

An In-Memory RDMA-Based Architecture for the Hadoop Distributed Filesystem

An In-Memory RDMA-Based Architecture for the Hadoop Distributed Filesystem An In-Memory RDMA-Based Architecture for the Hadoop Distributed Filesystem Master Thesis Konstantinos Karampogias August 21, 2012 Supervisor: Advisors: Prof. Dr. Bernhard Plattner Dr. Xenofontas Dimitropoulos

More information

Zadara Storage Cloud A whitepaper. @ZadaraStorage

Zadara Storage Cloud A whitepaper. @ZadaraStorage Zadara Storage Cloud A whitepaper @ZadaraStorage Zadara delivers two solutions to its customers: On- premises storage arrays Storage as a service from 31 locations globally (and counting) Some Zadara customers

More information

Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. ddn.com

Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. ddn.com DDN Technical Brief Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. A Fundamentally Different Approach To Enterprise Analytics Architecture: A Scalable Unit

More information

Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School

Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School Introduction to Infiniband Hussein N. Harake, Performance U! Winter School Agenda Definition of Infiniband Features Hardware Facts Layers OFED Stack OpenSM Tools and Utilities Topologies Infiniband Roadmap

More information

Self service for software development tools

Self service for software development tools Self service for software development tools Michal Husejko, behalf of colleagues in CERN IT/PES CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it Self service for software development tools

More information

A Tour of the Linux OpenFabrics Stack

A Tour of the Linux OpenFabrics Stack A Tour of the OpenFabrics Stack Johann George, QLogic June 2006 1 Overview Beyond Sockets Provides a common interface that allows applications to take advantage of the RDMA (Remote Direct Memory Access),

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

How To Monitor Infiniband Network Data From A Network On A Leaf Switch (Wired) On A Microsoft Powerbook (Wired Or Microsoft) On An Ipa (Wired/Wired) Or Ipa V2 (Wired V2)

How To Monitor Infiniband Network Data From A Network On A Leaf Switch (Wired) On A Microsoft Powerbook (Wired Or Microsoft) On An Ipa (Wired/Wired) Or Ipa V2 (Wired V2) INFINIBAND NETWORK ANALYSIS AND MONITORING USING OPENSM N. Dandapanthula 1, H. Subramoni 1, J. Vienne 1, K. Kandalla 1, S. Sur 1, D. K. Panda 1, and R. Brightwell 2 Presented By Xavier Besseron 1 Date:

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Extending Hadoop beyond MapReduce

Extending Hadoop beyond MapReduce Extending Hadoop beyond MapReduce Mahadev Konar Co-Founder @mahadevkonar (@hortonworks) Page 1 Bio Apache Hadoop since 2006 - committer and PMC member Developed and supported Map Reduce @Yahoo! - Core

More information

Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013

Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 * Other names and brands may be claimed as the property of others. Agenda Hadoop Intro Why run Hadoop on Lustre?

More information

FLOW-3D Performance Benchmark and Profiling. September 2012

FLOW-3D Performance Benchmark and Profiling. September 2012 FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute

More information

Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney

Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney Wrangler: A New Generation of Data-intensive Supercomputing Christopher Jordan, Siva Kulasekaran, Niall Gaffney Project Partners Academic partners: TACC Primary system design, deployment, and operations

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers

Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers Ryan E. Grant 1 Ahmad Afsahi 1 Pavan Balaji 2 1 Department of Electrical and Computer Engineering Queen s University, Kingston, ON,

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

State of the Art Cloud Infrastructure

State of the Art Cloud Infrastructure State of the Art Cloud Infrastructure Motti Beck, Director Enterprise Market Development WHD Global I April 2014 Next Generation Data Centers Require Fast, Smart Interconnect Software Defined Networks

More information

The Greenplum Analytics Workbench

The Greenplum Analytics Workbench The Greenplum Analytics Workbench External Overview 1 The Greenplum Analytics Workbench Definition Is a 1000-node Hadoop Cluster. Pre-configured with publicly available data sets. Contains the entire Hadoop

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

New Storage System Solutions

New Storage System Solutions New Storage System Solutions Craig Prescott Research Computing May 2, 2013 Outline } Existing storage systems } Requirements and Solutions } Lustre } /scratch/lfs } Questions? Existing Storage Systems

More information

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University RAMCloud and the Low- Latency Datacenter John Ousterhout Stanford University Most important driver for innovation in computer systems: Rise of the datacenter Phase 1: large scale Phase 2: low latency Introduction

More information

High Performance OpenStack Cloud. Eli Karpilovski Cloud Advisory Council Chairman

High Performance OpenStack Cloud. Eli Karpilovski Cloud Advisory Council Chairman High Performance OpenStack Cloud Eli Karpilovski Cloud Advisory Council Chairman Cloud Advisory Council Our Mission Development of next generation cloud architecture Providing open specification for cloud

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File

More information

Microsoft SMB 2.2 - Running Over RDMA in Windows Server 8

Microsoft SMB 2.2 - Running Over RDMA in Windows Server 8 Microsoft SMB 2.2 - Running Over RDMA in Windows Server 8 Tom Talpey, Architect Microsoft March 27, 2012 1 SMB2 Background The primary Windows filesharing protocol Initially shipped in Vista and Server

More information