Big Data: Hadoop and Memcached
|
|
- Noah Patterson
- 8 years ago
- Views:
Transcription
1 Big Data: Hadoop and Memcached Talk at HPC Advisory Council Stanford Conference and Exascale Workshop (Feb 214) by Dhabaleswar K. (DK) Panda The Ohio State University
2 Introduction to Big Data Applications and Analytics Big Data has become the one of the most important elements of business analytics Provides groundbreaking opportunities for enterprise information management and decision making The amount of data is exploding; companies are capturing and digitizing more information than ever The rate of information growth appears to be exceeding Moore s Law Commonly accepted 3V s of Big Data Volume, Velocity, Variety Michael Stonebraker: Big Data Means at Least Three Different Things, 5V s of Big Data 3V + Value, Veracity 2
3 Not Only in Internet Services - Big Data in Scientific Domains Scientific Data Management, Analysis, and Visualization Applications examples Climate modeling Combustion Fusion Astrophysics Bioinformatics Data Intensive Tasks Runs large-scale simulations on supercomputers Dump data on parallel storage systems Collect experimental / observational data Move experimental / observational data to analysis sites Visual analytics help understand data visually 3
4 Typical Solutions or Architectures for Big Data Applications and Analytics Hadoop: The most popular framework for Big Data Analytics HDFS, MapReduce, HBase, RPC, Hive, Pig, ZooKeeper, Mahout, etc. Spark: Provides primitives for in-memory cluster computing; Jobs can load data into memory and query it repeatedly Shark: A fully Apache Hive-compatible data warehousing system Storm: A distributed real-time computation system for real-time analytics, online machine learning, continuous computation, etc. S4: A distributed system for processing continuous unbounded streams of data Web 2.: RDBMS + Memcached ( Memcached: A high-performance, distributed memory object caching systems 4
5 Who Are Using Hadoop? Focuses on large data and data analysis Hadoop (e.g. HDFS, MapReduce, HBase, RPC) environment is gaining a lot of momentum 5
6 Hadoop at Yahoo! Scale: 42, nodes, 365PB HDFS, 1M daily slot hours A new Jim Gray Sort Record: 1.42 TB/min over 2,1 nodes Hadoop.23.7, an early branch of the Hadoop 2.X 6
7 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 7
8 Big Data Processing with Hadoop The open-source implementation of MapReduce programming model for Big Data Analytics Major components HDFS MapReduce Underlying Hadoop Distributed File System (HDFS) used by both MapReduce and HBase Model scales but high amount of communication during intermediate phases can be further optimized User Applications MapReduce HBase HDFS Hadoop Common (RPC) Hadoop Framework 8
9 Hadoop MapReduce Architecture & Operations Disk Operations Map and Reduce Tasks carry out the total job execution Communication in shuffle phase uses HTTP over Java Sockets 9
10 Hadoop Distributed File System (HDFS) Primary storage of Hadoop; highly reliable and fault-tolerant Adopted by many reputed organizations eg: Facebook, Yahoo! NameNode: stores the file system namespace DataNode: stores data blocks Developed in Java for platformindependence and portability Uses sockets for communication! Client RPC RPC RPC RPC (HDFS Architecture) 1
11 System-Level Interaction Between Clients and Data Nodes in HDFS (HDD/SSD) High Performance Networks (HDD/SSD) (HDD/SSD) (HDFS Clients) (HDFS Data Nodes) 11
12 HBase Apache Hadoop Database ( Semi-structured database, which is highly scalable ZooKeeper Integral part of many datacenter applications eg:- Facebook Social Inbox Developed in Java for platformindependence and portability HBase Client HMaster HRegion Servers (HBase Architecture) HDFS Uses sockets for communication! 12
13 System-Level Interaction Among HBase Clients, Region Servers, and Data Nodes (HDD/SSD) High Performance Networks... High Performance Networks... (HDD/SSD) (HDD/SSD) (HBase Clients) (HRegion Servers) (Data Nodes) 13
14 Memcached Architecture Main memory SSD CPUs HDD... Main memory SSD CPUs HDD!"#$%"$#& High Performance Networks High Performance Networks Main memory SSD CPUs HDD High Performance Networks Main memory SSD CPUs HDD (Database Servers) Main memory CPUs Web Frontend Servers (Memcached Clients) (Memcached Servers) SSD HDD Integral part of Web 2. architecture Distributed Caching Layer Allows to aggregate spare memory from multiple nodes General purpose Typically used to cache database queries, results of API calls Scalable model, but typical usage very network intensive 14
15 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 15
16 Number of Clusters Percentage of Clusters Trends for Commodity Computing Clusters in the Top 5 List ( Percentage of Clusters Number of Clusters Timeline 16
17 HPC and Advanced Interconnects High-Performance Computing (HPC) has adopted advanced interconnects and protocols InfiniBand 1 Gigabit Ethernet/iWARP RDMA over Converged Enhanced Ethernet (RoCE) Very Good Performance Low latency (few micro seconds) High Bandwidth (1 Gb/s with dual FDR InfiniBand) Low CPU overhead (5-1%) OpenFabrics software stack with IB, iwarp and RoCE interfaces are driving HPC systems Many such systems in Top5 list 17
18 Trends of Networking Technologies in TOP5 Systems Percentage share of InfiniBand is steadily increasing Interconnect Family Systems Share 18
19 Use of High-Performance Networks for Scientific Computing Message Passing Interface (MPI) Most commonly used programming model for HPC applications Parallel File Systems Storage Systems Almost 13 years of Research and Development since InfiniBand was introduced in October 2 Other Programming Models are emerging to take advantage of High-Performance Networks UPC OpenSHMEM Hybrid MPI+PGAS (OpenSHMEM and UPC) 19
20 Latency (us) Latency (us) One-way Latency: MPI over IB with MVAPICH Small Message Latency Large Message Latency Qlogic-DDR Qlogic-QDR ConnectX-DDR ConnectX2-PCIe2-QDR ConnectX3-PCIe3-FDR Sandy-ConnectIB-DualFDR Ivy-ConnectIB-DualFDR Message Size (bytes) Message Size (bytes) DDR, QDR GHz Quad-core (Westmere) Intel PCI Gen2 with IB switch FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch 2
21 Bandwidth (MBytes/sec) Bandwidth (MBytes/sec) Bandwidth: MPI over IB with MVAPICH Unidirectional Bandwidth Bidirectional Bandwidth Qlogic-DDR Qlogic-QDR ConnectX-DDR ConnectX2-PCIe2-QDR ConnectX3-PCIe3-FDR Sandy-ConnectIB-DualFDR Ivy-ConnectIB-DualFDR Message Size (bytes) Message Size (bytes) DDR, QDR GHz Quad-core (Westmere) Intel PCI Gen2 with IB switch FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Octa-core (SandyBridge) Intel PCI Gen3 with IB switch ConnectIB-Dual FDR GHz Deca-core (IvyBridge) Intel PCI Gen3 with IB switch 21
22 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 22
23 Can High-Performance Interconnects Benefit Hadoop? Most of the current Hadoop environments use Ethernet Infrastructure with Sockets Concerns for performance and scalability Usage of High-Performance Networks is beginning to draw interest from many companies What are the challenges? Where do the bottlenecks lie? Can these bottlenecks be alleviated with new designs (similar to the designs adopted for MPI)? Can HPC Clusters with High-Performance networks be used for Big Data applications using Hadoop? 23
24 Designing Communication and I/O Libraries for Big Data Systems: Challenges Applications Benchmarks Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Programming Models (Sockets) Other Protocols? Point-to-Point Communication Communication and I/O Library Threaded Models and Synchronization Virtualization I/O and File Systems QoS Fault-Tolerance Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 24
25 Common Protocols using Open Fabrics Application /Middleware Interface Application/ Middleware Sockets Verbs Protocol Kernel Space TCP/IP TCP/IP RSockets SDP iwarp RDMA RDMA Ethernet Driver IPoIB Hardware Offload User space RDMA User space User space User space Adapter Ethernet Adapter InfiniBand Adapter Ethernet Adapter InfiniBand Adapter InfiniBand Adapter iwarp Adapter RoCE Adapter InfiniBand Adapter Switch Ethernet Switch InfiniBand Switch Ethernet Switch InfiniBand Switch InfiniBand Switch Ethernet Switch Ethernet Switch InfiniBand Switch 1/1/4 GigE IPoIB 1/4 GigE- TOE RSockets SDP iwarp RoCE IB Verbs 25
26 Can Hadoop be Accelerated with High-Performance Networks and Protocols? Current Design Enhanced Designs Our Approach Application Application Application Sockets Accelerated Sockets Verbs / Hardware Offload OSU Design Verbs Interface 1/1 GigE Network 1 GigE or InfiniBand 1 GigE or InfiniBand Sockets not designed for high-performance Stream semantics often mismatch for upper layers (Hadoop and HBase) Zero-copy not available for non-blocking sockets 26
27 Presentation Outline Overview of Hadoop and Memcached Trends in Networking Technologies and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 27
28 RDMA for Apache Hadoop Project High-Performance Design of Hadoop over RDMA-enabled Interconnects High performance design with native InfiniBand and RoCE support at the verbslevel for HDFS, MapReduce, and RPC components Easily configurable for native InfiniBand, RoCE and the traditional socketsbased support (Ethernet and InfiniBand with IPoIB) Current release:.9.8 Based on Apache Hadoop Compliant with Apache Hadoop APIs and applications Tested with Mellanox InfiniBand adapters (DDR, QDR and FDR) RoCE Various multi-core platforms Different file systems with disks and SSDs 28
29 Design Overview of HDFS with RDMA Enables high performance RDMA communication, while supporting traditional socket interface Others Java Socket Interface Applications HDFS Write Java Native Interface (JNI) OSU Design Design Features RDMA-based HDFS write RDMA-based HDFS replication InfiniBand/RoCE support 1/1 GigE, IPoIB Network Verbs RDMA Capable Networks (IB, 1GE/ iwarp, RoCE..) JNI Layer bridges Java based HDFS with communication library written in native code Only the communication part of HDFS Write is modified; No change in HDFS architecture 29
30 Communication Time in HDFS Communication Time (s) GigE IPoIB (QDR) OSU-IB (QDR) 15 1 Reduced by 3% 5 2GB 4GB 6GB 8GB 1GB File Size (GB) Cluster with HDD DataNodes 3% improvement in communication time over IPoIB (QDR) 56% improvement in communication time over 1GigE Similar improvements are obtained for SSD DataNodes N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy and D. K. Panda, High Performance RDMA-Based Design of HDFS over InfiniBand, Supercomputing (SC), Nov 212 3
31 Total Throughput (MBps) Total Throughput (MBps) Evaluations using TestDFSIO GigE IPoIB (QDR) OSU-IB (QDR) Increased by 24% IPoIB (QDR) OSU-IB (QDR) Increased by 61% File Size (GB) File Size (GB) Cluster with 8 HDD Nodes, TestDFSIO with 8 maps Cluster with 4 SSD Nodes, TestDFSIO with 4 maps Cluster with 8 HDD DataNodes, single disk per node 24% improvement over IPoIB (QDR) for 2GB file size Cluster with 4 SSD DataNodes, single SSD per node 61% improvement over IPoIB (QDR) for 2GB file size 31
32 Total Throughput (MBps) Evaluations using TestDFSIO GigE-1HDD IPoIB -1HDD (QDR) OSU-IB -1HDD (QDR) 1GigE-2HDD IPoIB -2HDD (QDR) OSU-IB -2HDD (QDR) 15 Increased by 31% File Size (GB) Cluster with 4 DataNodes, 1 HDD per node 24% improvement over IPoIB (QDR) for 2GB file size Cluster with 4 DataNodes, 2 HDD per node 31% improvement over IPoIB (QDR) for 2GB file size 2 HDD vs 1 HDD 76% improvement for OSU-IB (QDR) 66% improvement for IPoIB (QDR) 32
33 Aggregated Throughput (MBps) Execution Time (s) Evaluations using Enhanced DFSIO of Intel HiBench on SDSC-Gordon 25 2 IPoIB (QDR) OSU-IB (QDR) 16 Increased by 47% IPoIB (QDR) Reduced by 16% OSU-IB (QDR) File Size (GB) File Size (GB) Cluster with 32 DataNodes, single SSD per node 47% improvement in throughput over IPoIB (QDR) for 128GB file size 16% improvement in latency over IPoIB (QDR) for 128GB file size 33
34 Aggregated Throughput (MBps) Execution Time (s) Evaluations using Enhanced DFSIO of Intel HiBench on TACC-Stampede IPoIB (FDR) OSU-IB (FDR) Increased by 64% IPoIB (FDR) OSU-IB (FDR) Reduced by 37% File Size (GB) File Size (GB) Cluster with 64 DataNodes, single HDD per node 64% improvement in throughput over IPoIB (FDR) for 256GB file size 37% improvement in latency over IPoIB (FDR) for 256GB file size 34
35 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 35
36 Design Overview of MapReduce with RDMA Applications MapReduce Job Tracker Task Tracker Map Reduce Design features for RDMA Prefetch/Caching of MOF In-Memory Merge Overlap of Merge & Reduce Enables high performance RDMA communication, while supporting traditional socket interface Java Socket Interface 1/1 GigE, IPoIB Network Java Native Interface (JNI) OSU Design Verbs RDMA Capable Networks (IB, 1GE/ iwarp, RoCE..) JNI Layer bridges Java based MapReduce with communication library written in native code InfiniBand/RoCE support 36
37 Job Execution Time (sec) Job Execution Time (sec) Evaluations using Sort GigE IPoIB (QDR) Hadoop-A-IB (QDR) OSU-IB (QDR) Data Size (GB) 8 HDD DataNodes With 8 HDD DataNodes for 4GB sort 41% improvement over IPoIB (QDR) 39% improvement over Hadoop-A-IB (QDR) GigE IPoIB (QDR) Hadoop-A-IB (QDR) OSU-IB (QDR) Data Size (GB) 8 SSD DataNodes With 8 SSD DataNodes for 4GB sort 52% improvement over IPoIB (QDR) 44% improvement over Hadoop-A-IB (QDR) M. W. Rahman, N. S. Islam, X. Lu, J. Jose, H. Subramon, H. Wang, and D. K. Panda, High-Performance RDMAbased Design of Hadoop MapReduce over InfiniBand, HPDIC Workshop, held in conjunction with IPDPS, May
38 Job Execution Time (sec) Evaluations using TeraSort GigE IPoIB (QDR) Hadoop-A-IB (QDR) OSU-IB (QDR) Data Size (GB) 1 GB TeraSort with 8 DataNodes with 2 HDD per node 51% improvement over IPoIB (QDR) 41% improvement over Hadoop-A-IB (QDR) 38
39 Job Execution Time (sec) Job Execution Time (sec) Performance Evaluation on Larger Clusters 12 1 IPoIB (QDR) OSU-IB (QDR) Hadoop-A-IB (QDR) 7 6 IPoIB (FDR) Hadoop-A-IB (FDR) OSU-IB (FDR) Data Size: 6 GB Data Size: 12 GB Data Size: 24 GB Cluster Size: 16 Cluster Size: 32 Cluster Size: 64 Sort in OSU Cluster For 24GB Sort in 64 nodes 4% improvement over IPoIB (QDR) with HDD used for HDFS Data Size: 8 GB Data Size: 16 GBData Size: 32 GB Cluster Size: 16 Cluster Size: 32 Cluster Size: 64 TeraSort in TACC Stampede For 32GB TeraSort in 64 nodes 38% improvement over IPoIB (FDR) with HDD used for HDFS 39
40 Normalized Execution Time Evaluations using PUMA Workload GigE IPoIB (QDR) OSU-IB (QDR) AdjList (3GB) SelfJoin (8GB) SeqCount (3GB) WordCount (3GB) InvertIndex (3GB) Benchmarks 5% improvement in Self Join over IPoIB (QDR) for 8 GB data size 49% improvement in Sequence Count over IPoIB (QDR) for 3 GB data size 4
41 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 41
42 Time (us) Time (us) Motivation Detailed Analysis on HBase Put/Get Communication Communication Preparation Server Processing Server Serialization Client Processing Client Serialization IPoIB (DDR) 1GigE OSU-IB (DDR) HBase Put 1KB HBase 1KB Put Communication Time 8.9 us A factor of 6X improvement over 1GE for communication time HBase 1KB Get Communication Time 8.9 us A factor of 6X improvement over 1GE for communication time IPoIB (DDR) 1GigE OSU-IB (DDR) Hbase Get 1KB M. W. Rahman, J. Huang, J. Jose, X. Ouyang, H. Wang, N. Islam, H. Subramoni, Chet Murthy and D. K. Panda, Understanding the Communication Characteristics in HBase: What are the Fundamental Bottlenecks?, ISPASS 12 42
43 Time (us) Ops/sec HBase Single-Server-Multi-Client Results IPoIB (DDR) OSU-IB (DDR) 1GigE No. of Clients Latency HBase Get latency 4 clients: 14.5 us; 16 clients: us HBase Get throughput clients: 37.1 Kops/sec; 16 clients: 53.4 Kops/sec 27% improvement in throughput for 16 clients over 1GE No. of Clients Throughput J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, Chet Murthy, and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS 12 43
44 Time (us) Time (us) HBase YCSB Read-Write Workload IPoIB (QDR) 1GigE OSU-IB (QDR) No. of Clients No. of Clients Read Latency Write Latency HBase Get latency (Yahoo! Cloud Service Benchmark) 64 clients: 2. ms; 128 Clients: 3.5 ms 42% improvement over IPoIB for 128 clients HBase Get latency 64 clients: 1.9 ms; 128 Clients: 3.5 ms 4% improvement over IPoIB for 128 clients 44
45 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 45
46 Design Overview of Hadoop RPC with RDMA Applications Design Features Default Java Socket Interface Hadoop RPC Our Design Java Native Interface (JNI) OSU Design Verbs JVM-bypassed buffer management On-demand connection setup RDMA or send/recv based adaptive communication InfiniBand/RoCE support 1/1 GigE, IPoIB Network RDMA Capable Networks (IB, 1GE/ iwarp, RoCE..) Enables high performance RDMA communication, while supporting traditional socket interface 46
47 Latency (us) Throughput (Kops/Sec) Hadoop RPC over IB - Gain in Latency and Throughput GigE IPoIB(QDR) OSU-IB(QDR) Payload Size (Byte) Number of Clients Hadoop RPC over IB PingPong Latency 1 byte: 39 us; 4 KB: 52 us 42%-49% and 46%-5% improvements compared with the performance on 1 GigE and IPoIB (32Gbps), respectively Hadoop RPC over IB Throughput 512 bytes & 48 clients: Kops/sec 82% and 64% improvements compared with the peak performance on 1 GigE and IPoIB (32Gbps), respectively 47
48 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 48
49 Total Throughput (MBps) Total Throughput (MBps) Performance Benefits - TestDFSIO 1GigE IPoIB (QDR) OSU-IB (QDR) GigE IPoIB (QDR) OSU-IB (QDR) Size (GB) Size (GB) For 5GB, Cluster with 4 HDD Nodes, TestDFSIO with total 4 maps Cluster with 4 SSD Nodes, TestDFSIO with total 4 maps 53.4% improvement over IPoIB, 126.9% improvement over 1GigE with HDD used for HDFS 57.8% improvement over IPoIB, 138.1% improvement over 1GigE with SSD used for HDFS For 2GB, 1.7% improvement over IPoIB, 46.8% improvement over 1GigE with HDD used for HDFS 73.3% improvement over IPoIB, 141.3% improvement over 1GigE with SSD used for HDFS 49
50 Job Execution Time (sec) Job Execution Time (sec) Performance Benefits - Sort GigE IPoIB (QDR) OSU-IB (QDR) GigE IPoIB (QDR) OSU-IB (QDR) Sort Size (GB) Sort Size (GB) Cluster with 8 HDD Nodes, Sort with <8 maps, 8 reduces>/host Cluster with 8 SSD Nodes, Sort with <8 maps, 8 reduces>/host For 8GB Sort in a cluster size of 8, 4.7% improvement over IPoIB, 32.2% improvement over 1GigE with HDD used for HDFS 43.6% improvement over IPoIB, 42.9% improvement over 1GigE with SSD used for HDFS 5
51 Job Execution Time (sec) Job Execution Time (sec) Performance Benefits - TeraSort 25 1GigE IPoIB (QDR) OSU-IB (QDR) 25 1GigE IPoIB (QDR) OSU-IB (QDR) Size (GB) Size (GB) Cluster with 8 HDD Nodes, TeraSort with <8 maps, 8reduces>/host Cluster with 8 SSD Nodes, TeraSort with <8 maps, 8reduces>/host For 1GB TeraSort in a cluster size of 8, 45% improvement over IPoIB, 39% improvement over 1GigE with HDD used for HDFS 46% improvement over IPoIB, 4% improvement over 1GigE with SSD used for HDFS 51
52 Job Execution Time (sec) Job Execution Time (sec) Performance Benefits RandomWriter & Sort in SDSC-Gordon IPoIB (QDR) OSU-IB (QDR) IPoIB (QDR) OSU-IB (QDR) 4 Reduced by 2% Reduced by 36% GB 1 GB 2 GB 3 GB 5 GB 1 GB 2 GB 3 GB Cluster: 16 Cluster: 32 Cluster: 48 Cluster: 64 Cluster: 16 Cluster: 32 Cluster: 48 Cluster: 64 RandomWriter in a cluster of single SSD per Sort in a cluster of single SSD per node node 2% improvement over IPoIB (QDR) 16% improvement over IPoIB (QDR) for for 5GB in a cluster of 16 5GB in a cluster of 16 36% improvement over IPoIB (QDR) 2% improvement over IPoIB (QDR) for 3GB in a cluster of 64 for 3GB in a cluster of 64 52
53 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 53
54 Memcached-RDMA Design Sockets Client 1 2 Master Thread Sockets Worker Thread Shared Data RDMA Client 1 2 Sockets Worker Thread Verbs Worker Thread Verbs Worker Thread Memory Slabs Items Server and client perform a negotiation protocol Master thread assigns clients to appropriate worker thread Once a client is assigned a verbs worker thread, it can communicate directly and is bound to that thread All other Memcached data structures are shared among RDMA and Sockets worker threads 54
55 Time%(us)% Time%(us)% Time (us) Time (us) Memcached Get Latency (Small Message) Time%(us)% Time%(us)% Time%(us)% Time%(us)% K 2K Message Size Intel Clovertown Cluster (IB: DDR) Memcached Get latency 4 bytes RC/UD DDR: 6.82/7.55 us; QDR: 4.28/4.86 us 2K bytes RC/UD DDR: 12.31/12.78 us; QDR: 8.19/8.46 us Almost factor of four improvement over 1GE (TOE) for 2K bytes on the DDR cluster Message%Size% Message%Size% IPoIB (DDR) 1GigE (DDR) OSU-RC-IB (DDR) OSU-UD-IB (DDR) Time%(us)% 1 Time%(us)% Message%Size% Message%Size% K 2K Message Size IPoIB (QDR) 1GigE (QDR) Intel Westmere Cluster (IB: QDR) Message%Size% Message%Size% OSU-RC-IB (QDR) OSU-UD-IB (QDR) Message%Size% Message%Size% 55
56 Thousands of Transactions per second (TPS) Thousands of Transactions per second (TPS) Time%(us)% Time%(us)% Time%(us)% Memcached Get TPS (4byte) IPoIB (QDR) OSU-RC-IB (QDR) OSU-UD-IB (QDR) No. of Clients Memcached Get transactions per second for 4 bytes On IB QDR 1.4M/s (RC), 1.3 M/s (UD) for 8 clients K No. of Clients Message%Size% Message%Size% Significant improvement with native IB QDR compared to SDP and IPoIB Message%Size 56
57 K 2K 4K Time (us) Thousands of Transactions per Second (TPS) 1 1 Memcached Performance (FDR Interconnect) 1 Memcached GET Latency OSU-IB (FDR) IPoIB (FDR) Latency Reduced by nearly 2X Memcached Throughput 2X 1 Message Size Memcached Get latency Experiments on TACC Stampede (Intel SandyBridge Cluster, IB: FDR) 4 bytes OSU-IB: 2.84 us; IPoIB: us 2K bytes OSU-IB: 4.49 us; IPoIB: us Memcached Throughput (4bytes) 48 clients OSU-IB: 556 Kops/sec, IPoIB: 233 Kops/s Nearly 2X improvement in throughput No. of Clients 57
58 Time (ms) Time (ms) Application Level Evaluation Olio Benchmark 6 25 IPoIB (QDR) 5 2 OSU-RC-IB (QDR) OSU-UD-IB (QDR) 4 3 Time%(ms)% 15 OSU-Hybrid-IB (QDR) No. of Clients No.%of%Clients% No. of Clients Olio Benchmark RC 1.6 sec, UD 1.9 sec, Hybrid 1.7 sec for 124 clients 4X times better than IPoIB for 8 clients Hybrid design achieves comparable performance to that of pure RC design 58
59 Time (ms) Time (ms) Application Level Evaluation Real Application Workloads Time%(ms)% IPoIB (QDR) OSU-RC-IB (QDR) OSU-UD-IB (QDR) OSU-Hybrid-IB (QDR) No. of Clients Real Application Workload RC 32 ms, UD 318 ms, Hybrid 314 ms for 124 clients 12X times better than IPoIB for 8 clients Hybrid design achieves comparable performance to that of pure RC design No.%of%Clients% No. of Clients J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, W. Rahman, N. Islam, X. Ouyang, H. Wang, S. Sur and D. K. Panda, Memcached Design on High Performance RDMA Capable Interconnects, ICPP 11 J. Jose, H. Subramoni, K. Kandalla, W. Rahman, H. Wang, S. Narravula, and D. K. Panda, Scalable Memcached design for InfiniBand Clusters using Hybrid Transport, CCGrid 12 59
60 Presentation Outline Overview of Hadoop and Memcached Trends in High Performance Networking and Protocols Challenges in Accelerating Hadoop and Memcached Designs and Case Studies Hadoop HDFS MapReduce HBase RPC Combination of components Memcached Open Challenges Conclusion and Q&A 6
61 Designing Communication and I/O Libraries for Big Data Systems: Solved a Few Initial Challenges Applications Benchmarks Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Upper level Changes? Programming Models (Sockets) Other RDMA Protocols? Point-to-Point Communication Communication and I/O Library Threaded Models and Synchronization Virtualization I/O and File Systems QoS Fault-Tolerance Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 61
62 More Challenges Multi-threading and Synchronization Multi-threaded model exploration Fine-grained synchronization and lock-free design Unified helper threads for different components Multi-endpoint design to support multi-threading communications QoS and Virtualization Network virtualization and locality-aware communication for Big Data middleware Hardware-level virtualization support for End-to-End QoS I/O scheduling and storage virtualization Live migration 62
63 More Challenges (Cont d) Support of Accelerators Efficient designs for Big Data middleware to take advantage of NVIDA GPGPUs and Intel MICs Offload computation-intensive workload to accelerators Explore maximum overlapping between communication and offloaded computation Fault Tolerance Enhancements Novel data replication schemes for high-efficiency fault tolerance Exploration of light-weight fault tolerance mechanisms for Big Data processing Support of Parallel File Systems Optimize Big Data middleware over parallel file systems (e.g. Lustre) on modern HPC clusters 63
64 Job Execution Time (sec) Job Execution Time (sec) Performance Improvement Potential of MapReduce over Lustre on HPC Clusters An Initial Study IPoIB (FDR) OSU-IB (FDR) 2 IPoIB (QDR) OSU-IB (QDR) Data Size (GB) Data Size (GB) Sort in TACC Stampede Sort in SDSC Gordon For 5GB Sort in 64 nodes For 1GB Sort in 16 nodes 44% improvement over IPoIB (FDR) 33% improvement over IPoIB (QDR) Can more optimizations be achieved by leveraging more features of Lustre? 64
65 Are the Current Benchmarks Sufficient for Big Data? The current benchmarks provide some performance behavior However, do not provide any information to the designer/developer on: What is happening at the lower-layer? Where the benefits are coming from? Which design is leading to benefits or bottlenecks? Which component in the design needs to be changed and what will be its impact? Can performance gain/loss at the lower-layer be correlated to the performance gain/loss observed at the upper layer? 65
66 Challenges in Benchmarking of RDMA-based Designs Applications Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Programming Models (Sockets) Benchmarks Other RDMA Protocols? Current Benchmarks Correlation? Communication and I/O Library Point-to-Point Communication Threaded Models and Synchronization Virtualization No Benchmarks I/O and File Systems QoS Fault-Tolerance Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 66
67 OSU MPI Micro-Benchmarks (OMB) Suite A comprehensive suite of benchmarks to Compare performance of different MPI libraries on various networks and systems Validate low-level functionalities Provide insights to the underlying MPI-level designs Started with basic send-recv (MPI-1) micro-benchmarks for latency, bandwidth and bi-directional bandwidth Extended later to MPI-2 one-sided Collectives GPU-aware data movement OpenSHMEM (point-to-point and collectives) UPC Has become an industry standard Extensively used for design/development of MPI libraries, performance comparison of MPI libraries and even in procurement of large-scale systems Available from Available in an integrated manner with MVAPICH2 stack 67
68 Iterative Process Requires Deeper Investigation and Design for Benchmarking Next Generation Big Data Systems and Applications Applications Benchmarks Big Data Middleware (HDFS, MapReduce, HBase and Memcached) Applications- Level Benchmarks Programming Models (Sockets) Other RDMA Protocols? Communication and I/O Library Point-to-Point Communication I/O and File Systems Threaded Models and Synchronization QoS Virtualization Fault-Tolerance Micro- Benchmarks Networking Technologies (InfiniBand, 1/1/4GigE and Intelligent NICs) Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators) Storage Technologies (HDD and SSD) 68
69 Future Plans of OSU Big Data Project Upcoming RDMA for Apache Hadoop Releases will support HDFS Parallel Replication Advanced MapReduce HBase Hadoop 2..x Memcached-RDMA Exploration of other components (Threading models, QoS, Virtualization, Accelerators, etc.) Advanced designs with upper-level changes and optimizations Comprehensive set of Benchmarks More details at 69
70 Concluding Remarks Presented an overview of Big Data, Hadoop and Memcached Provided an overview of Networking Technologies Discussed challenges in accelerating Hadoop Presented initial designs to take advantage of InfiniBand/RDMA for HDFS, MapReduce, RPC, HBase and Memcached Results are promising Many other open issues need to be solved Will enable Big Data community to take advantage of modern HPC technologies to carry out their analytics in a fast and scalable manner 7
71 Personnel Acknowledgments Current Students S. Chakraborthy (Ph.D.) M. Rahman (Ph.D.) N. Islam (Ph.D.) D. Shankar (Ph.D.) J. Jose (Ph.D.) R. Shir (Ph.D.) M. Li (Ph.D.) A. Venkatesh (Ph.D.) S. Potluri (Ph.D.) J. Zhang (Ph.D.) R. Rajachandrasekhar (Ph.D.) Past Students P. Balaji (Ph.D.) W. Huang (Ph.D.) D. Buntinas (Ph.D.) W. Jiang (M.S.) S. Bhagvat (M.S.) S. Kini (M.S.) L. Chai (Ph.D.) M. Koop (Ph.D.) B. Chandrasekharan (M.S.) R. Kumar (M.S.) N. Dandapanthula (M.S.) S. Krishnamoorthy (M.S.) V. Dhanraj (M.S.) K. Kandalla (Ph.D.) T. Gangadharappa (M.S.) P. Lai (M.S.) K. Gopalakrishnan (M.S.) J. Liu (Ph.D.) Current Senior Research Associate H. Subramoni Current Post-Docs Current Programmers X. Lu M. Arnold K. Hamidouche J. Perkins M. Luo (Ph.D.) G. Santhanaraman (Ph.D.) A. Mamidala (Ph.D.) A. Singh (Ph.D.) G. Marsh (M.S.) J. Sridhar (M.S.) V. Meshram (M.S.) S. Sur (Ph.D.) S. Naravula (Ph.D.) H. Subramoni (Ph.D.) R. Noronha (Ph.D.) K. Vaidyanathan (Ph.D.) X. Ouyang (Ph.D.) A. Vishnu (Ph.D.) S. Pai (M.S.) J. Wu (Ph.D.) W. Yu (Ph.D.) Past Post-Docs H. Wang X. Besseron H.-W. Jin E. Mancini S. Marcarelli J. Vienne Past Research Scientist S. Sur Past Programmers D. Bureddy M. Luo 71
72 Pointers
Accelerating Big Data Processing with Hadoop, Spark and Memcached
Accelerating Big Data Processing with Hadoop, Spark and Memcached Talk at HPC Advisory Council Stanford Conference (Feb 15) by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu
More informationCan High-Performance Interconnects Benefit Memcached and Hadoop?
Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
More informationAccelerating Big Data Processing with Hadoop, Spark and Memcached
Accelerating Big Data Processing with Hadoop, Spark and Memcached Talk at HPC Advisory Council Switzerland Conference (Mar 15) by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu
More informationA Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks Xiaoyi Lu, Md. Wasi- ur- Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng Laboratory Department
More informationAccelerating Spark with RDMA for Big Data Processing: Early Experiences
Accelerating Spark with RDMA for Big Data Processing: Early Experiences Xiaoyi Lu, Md. Wasi- ur- Rahman, Nusrat Islam, Dip7 Shankar, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng Laboratory Department
More informationAccelera'ng Big Data Processing with Hadoop, Spark and Memcached
Accelera'ng Big Data Processing with Hadoop, Spark and Memcached Talk at HPC Advisory Council Switzerland Conference (Mar 15) by Dhabaleswar K. (DK) Panda The Ohio State University E- mail: panda@cse.ohio-
More informationRDMA over Ethernet - A Preliminary Study
RDMA over Ethernet - A Preliminary Study Hari Subramoni, Miao Luo, Ping Lai and Dhabaleswar. K. Panda Computer Science & Engineering Department The Ohio State University Outline Introduction Problem Statement
More informationEnabling High performance Big Data platform with RDMA
Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery
More informationHari Subramoni. Education: Employment: Research Interests: Projects:
Hari Subramoni Senior Research Associate, Dept. of Computer Science and Engineering The Ohio State University, Columbus, OH 43210 1277 Tel: (614) 961 2383, Fax: (614) 292 2911, E-mail: subramoni.1@osu.edu
More informationHadoop on the Gordon Data Intensive Cluster
Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,
More informationHigh Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand
High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand Hari Subramoni *, Ping Lai *, Raj Kettimuthu **, Dhabaleswar. K. (DK) Panda * * Computer Science and Engineering Department
More informationUnstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012
Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume
More informationDriving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA
WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5
More informationExploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based
More informationAccelerating Big Data Processing with Hadoop, Spark, and Memcached over High-Performance Interconnects
Slides are available from http://www.cse.ohio-state.edu/~panda/hoti15-bigdata-tut.pdf Accelerating Big Data Processing with Hadoop, Spark, and Memcached over High-Performance Interconnects Tutorial at
More informationHigh-Performance Networking for Optimized Hadoop Deployments
High-Performance Networking for Optimized Hadoop Deployments Chelsio Terminator 4 (T4) Unified Wire adapters deliver a range of performance gains for Hadoop by bringing the Hadoop cluster networking into
More informationHadoop Optimizations for BigData Analytics
Hadoop Optimizations for BigData Analytics Weikuan Yu Auburn University Outline WBDB, Oct 2012 S-2 Background Network Levitated Merge JVM-Bypass Shuffling Fast Completion Scheduler WBDB, Oct 2012 S-3 Emerging
More informationRDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed File system THESIS
RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed File system THESIS Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate
More informationAccelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
More informationHigh-Performance Design of HBase with RDMA over InfiniBand
212 IEEE 26th International Parallel and Distributed Processing Symposium High-Performance Design of HBase with RDMA over InfiniBand Jian Huang 1, Xiangyong Ouyang 1, Jithin Jose 1, Md. Wasi-ur-Rahman
More informationSMB Direct for SQL Server and Private Cloud
SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server
More informationAccelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationCan High-Performance Interconnects Benefit Hadoop Distributed File System?
Can High-Performance Interconnects Benefit Hadoop Distributed File System? Sayantan Sur, Hao Wang, Jian Huang, Xiangyong Ouyang and Dhabaleswar K. Panda Department of Computer Science and Engineering,
More informationA Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks
A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks Dipti Shankar (B), Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda Department of Computer
More informationRDMA for Apache Hadoop 0.9.9 User Guide
0.9.9 User Guide HIGH-PERFORMANCE BIG DATA TEAM http://hibd.cse.ohio-state.edu NETWORK-BASED COMPUTING LABORATORY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING THE OHIO STATE UNIVERSITY Copyright (c)
More informationComparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014
Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet Anand Rangaswamy September 2014 Storage Developer Conference Mellanox Overview Ticker: MLNX Leading provider of high-throughput,
More informationDesigning Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand
Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand Ping Lai, Hari Subramoni, Sundeep Narravula, Amith Mamidala, Dhabaleswar K. Panda Department of Computer Science and
More informationOutline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
More informationAchieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar
More informationMapReduce Evaluator: User Guide
University of A Coruña Computer Architecture Group MapReduce Evaluator: User Guide Authors: Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño December 9, 2014 Contents 1 Overview
More informationBuilding Enterprise-Class Storage Using 40GbE
Building Enterprise-Class Storage Using 40GbE Unified Storage Hardware Solution using T5 Executive Summary This white paper focuses on providing benchmarking results that highlight the Chelsio T5 performance
More informationSockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck
Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering
More informationBig Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
More informationApplication and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster
Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster Mahidhar Tatineni (mahidhar@sdsc.edu) MVAPICH User Group Meeting August 27, 2014 NSF grants: OCI #0910847 Gordon: A Data
More informationMellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct
Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 Direct Increased Performance, Scaling and Resiliency July 2012 Motti Beck, Director, Enterprise Market Development Motti@mellanox.com
More informationLS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationMaximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
More informationAdvancing Applications Performance With InfiniBand
Advancing Applications Performance With InfiniBand Pak Lui, Application Performance Manager September 12, 2013 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server and
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationHadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
More informationSolving I/O Bottlenecks to Enable Superior Cloud Efficiency
WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one
More informationHadoop Deployment and Performance on Gordon Data Intensive Supercomputer!
Hadoop Deployment and Performance on Gordon Data Intensive Supercomputer! Mahidhar Tatineni, Rick Wagner, Eva Hocks, Christopher Irving, and Jerry Greenberg! SDSC! XSEDE13, July 22-25, 2013! Overview!!
More informationInfiniBand Software and Protocols Enable Seamless Off-the-shelf Applications Deployment
December 2007 InfiniBand Software and Protocols Enable Seamless Off-the-shelf Deployment 1.0 Introduction InfiniBand architecture defines a high-bandwidth, low-latency clustering interconnect that is used
More informationTCP/IP Implementation of Hadoop Acceleration. Cong Xu
TCP/IP Implementation of Hadoop Acceleration by Cong Xu A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science Auburn,
More informationHiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group
HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
More informationElasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper
More informationBenchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
More informationHadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
More informationUnderstanding the Significance of Network Performance in End Applications: A Case Study with EtherFabric and InfiniBand
Understanding the Significance of Network Performance in End Applications: A Case Study with EtherFabric and InfiniBand K. VAIDYANATHAN, S. BHAGVAT, P. BALAJI AND D. K. PANDA Technical Report Ohio State
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationAnalysis and Optimization of Massive Data Processing on High Performance Computing Architecture
Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture He Huang, Shanshan Li, Xiaodong Yi, Feng Zhang, Xiangke Liao and Pan Dong School of Computer Science National
More informationMichael Kagan. michael@mellanox.com
Virtualization in Data Center The Network Perspective Michael Kagan CTO, Mellanox Technologies michael@mellanox.com Outline Data Center Transition Servers S as a Service Network as a Service IO as a Service
More informationSimplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions
Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions 64% of organizations were investing or planning to invest on Big Data technology
More informationLS DYNA Performance Benchmarks and Profiling. January 2009
LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The
More informationInfiniBand Update Addressing new I/O challenges in HPC, Cloud, and Web 2.0 infrastructures. Brian Sparks IBTA Marketing Working Group Co-Chair
InfiniBand Update Addressing new I/O challenges in HPC, Cloud, and Web 2.0 infrastructures Brian Sparks IBTA Marketing Working Group Co-Chair Page 1 IBTA & OFA Update IBTA today has over 50 members; OFA
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationSMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation
SMB Advanced Networking for Fault Tolerance and Performance Jose Barreto Principal Program Managers Microsoft Corporation Agenda SMB Remote File Storage for Server Apps SMB Direct (SMB over RDMA) SMB Multichannel
More informationECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009
ECLIPSE Best Practices Performance, Productivity, Efficiency March 29 ECLIPSE Performance, Productivity, Efficiency The following research was performed under the HPC Advisory Council activities HPC Advisory
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationHigh Throughput File Servers with SMB Direct, Using the 3 Flavors of RDMA network adapters
High Throughput File Servers with SMB Direct, Using the 3 Flavors of network adapters Jose Barreto Principal Program Manager Microsoft Corporation Abstract In Windows Server 2012, we introduce the SMB
More informationFTP 100: An Ultra-High Speed Data Transfer Service over Next Generation 100 Gigabit Per Second Network
FTP 100: An Ultra-High Speed Data Transfer Service over Next Generation 100 Gigabit Per Second Network Award Number: 53662 - US Department o f Energy Aw ard Title: ARRA: TAS::89 0227:: TASRecovery A ct
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationImproving High Performance Networking Technologies. for Data Center Clusters
Improving High Performance Networking Technologies for Data Center Clusters by Ryan Eric Grant A thesis submitted to the Department of Electrical and Computer Engineering In conformity with the requirements
More informationSR-IOV In High Performance Computing
SR-IOV In High Performance Computing Hoot Thompson & Dan Duffy NASA Goddard Space Flight Center Greenbelt, MD 20771 hoot@ptpnow.com daniel.q.duffy@nasa.gov www.nccs.nasa.gov Focus on the research side
More informationA Case for Flash Memory SSD in Hadoop Applications
A Case for Flash Memory SSD in Hadoop Applications Seok-Hoon Kang, Dong-Hyun Koo, Woon-Hak Kang and Sang-Won Lee Dept of Computer Engineering, Sungkyunkwan University, Korea x860221@gmail.com, smwindy@naver.com,
More informationECLIPSE Performance Benchmarks and Profiling. January 2009
ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationWhy Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics
Why Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400
More informationWindows 8 SMB 2.2 File Sharing Performance
Windows 8 SMB 2.2 File Sharing Performance Abstract This paper provides a preliminary analysis of the performance capabilities of the Server Message Block (SMB) 2.2 file sharing protocol with 10 gigabit
More informationAn In-Memory RDMA-Based Architecture for the Hadoop Distributed Filesystem
An In-Memory RDMA-Based Architecture for the Hadoop Distributed Filesystem Master Thesis Konstantinos Karampogias August 21, 2012 Supervisor: Advisors: Prof. Dr. Bernhard Plattner Dr. Xenofontas Dimitropoulos
More informationZadara Storage Cloud A whitepaper. @ZadaraStorage
Zadara Storage Cloud A whitepaper @ZadaraStorage Zadara delivers two solutions to its customers: On- premises storage arrays Storage as a service from 31 locations globally (and counting) Some Zadara customers
More informationModernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. ddn.com
DDN Technical Brief Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. A Fundamentally Different Approach To Enterprise Analytics Architecture: A Scalable Unit
More informationIntroduction to Infiniband. Hussein N. Harake, Performance U! Winter School
Introduction to Infiniband Hussein N. Harake, Performance U! Winter School Agenda Definition of Infiniband Features Hardware Facts Layers OFED Stack OpenSM Tools and Utilities Topologies Infiniband Roadmap
More informationSelf service for software development tools
Self service for software development tools Michal Husejko, behalf of colleagues in CERN IT/PES CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it Self service for software development tools
More informationA Tour of the Linux OpenFabrics Stack
A Tour of the OpenFabrics Stack Johann George, QLogic June 2006 1 Overview Beyond Sockets Provides a common interface that allows applications to take advantage of the RDMA (Remote Direct Memory Access),
More informationData-Intensive Computing with Map-Reduce and Hadoop
Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationHow To Monitor Infiniband Network Data From A Network On A Leaf Switch (Wired) On A Microsoft Powerbook (Wired Or Microsoft) On An Ipa (Wired/Wired) Or Ipa V2 (Wired V2)
INFINIBAND NETWORK ANALYSIS AND MONITORING USING OPENSM N. Dandapanthula 1, H. Subramoni 1, J. Vienne 1, K. Kandalla 1, S. Sur 1, D. K. Panda 1, and R. Brightwell 2 Presented By Xavier Besseron 1 Date:
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationExtending Hadoop beyond MapReduce
Extending Hadoop beyond MapReduce Mahadev Konar Co-Founder @mahadevkonar (@hortonworks) Page 1 Bio Apache Hadoop since 2006 - committer and PMC member Developed and supported Map Reduce @Yahoo! - Core
More informationHadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013
Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 * Other names and brands may be claimed as the property of others. Agenda Hadoop Intro Why run Hadoop on Lustre?
More informationFLOW-3D Performance Benchmark and Profiling. September 2012
FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute
More informationWrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney
Wrangler: A New Generation of Data-intensive Supercomputing Christopher Jordan, Siva Kulasekaran, Niall Gaffney Project Partners Academic partners: TACC Primary system design, deployment, and operations
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationEvaluation of ConnectX Virtual Protocol Interconnect for Data Centers
Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers Ryan E. Grant 1 Ahmad Afsahi 1 Pavan Balaji 2 1 Department of Electrical and Computer Engineering Queen s University, Kingston, ON,
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationState of the Art Cloud Infrastructure
State of the Art Cloud Infrastructure Motti Beck, Director Enterprise Market Development WHD Global I April 2014 Next Generation Data Centers Require Fast, Smart Interconnect Software Defined Networks
More informationThe Greenplum Analytics Workbench
The Greenplum Analytics Workbench External Overview 1 The Greenplum Analytics Workbench Definition Is a 1000-node Hadoop Cluster. Pre-configured with publicly available data sets. Contains the entire Hadoop
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationNew Storage System Solutions
New Storage System Solutions Craig Prescott Research Computing May 2, 2013 Outline } Existing storage systems } Requirements and Solutions } Lustre } /scratch/lfs } Questions? Existing Storage Systems
More informationRAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University
RAMCloud and the Low- Latency Datacenter John Ousterhout Stanford University Most important driver for innovation in computer systems: Rise of the datacenter Phase 1: large scale Phase 2: low latency Introduction
More informationHigh Performance OpenStack Cloud. Eli Karpilovski Cloud Advisory Council Chairman
High Performance OpenStack Cloud Eli Karpilovski Cloud Advisory Council Chairman Cloud Advisory Council Our Mission Development of next generation cloud architecture Providing open specification for cloud
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More informationDeveloping Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
More informationPerformance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File
More informationMicrosoft SMB 2.2 - Running Over RDMA in Windows Server 8
Microsoft SMB 2.2 - Running Over RDMA in Windows Server 8 Tom Talpey, Architect Microsoft March 27, 2012 1 SMB2 Background The primary Windows filesharing protocol Initially shipped in Vista and Server
More information