Hadoop Optimizations for BigData Analytics


1 Hadoop Optimizations for BigData Analytics
Weikuan Yu, Auburn University

2 Outline (WBDB, Oct 2012)
- Background
- Network-Levitated Merge
- JVM-Bypass Shuffling
- Fast Completion Scheduler

3 Emerging Demand for BigData Analytics
- Big demand from many organizations in various domains
- Scalable computing power without worrying about system maintenance
- Ubiquitously accessible computing and storage resources
- Low-cost, highly reliable, trusted computing infrastructure
- Commercial companies are gearing up resources for BigData

4 MapReduce
- A simple data processing model to process big data
- Designed for commodity off-the-shelf hardware components
- Strong merits for big data analytics:
  - Scalability: increase throughput by increasing the number of nodes
  - Fault tolerance: quick, low-cost recovery from task failures
- Hadoop, an open-source implementation of MapReduce:
  - Widely deployed by many big data companies: AOL, Baidu, eBay, Facebook, IBM, NY Times, Yahoo!
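The processing model on this slide can be illustrated with a minimal in-memory sketch. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the word-count job are assumptions chosen for illustration, and everything runs in a single process rather than across nodes.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every input record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle_phase(pairs):
    """Group values by key, standing in for the shuffle between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply reduce_fn to each key and its grouped values, in sorted key order."""
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def word_count_map(line):
    # Hypothetical map function: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def word_count_reduce(_word, counts):
    # Hypothetical reduce function: sum the counts collected for one word.
    return sum(counts)


def run_job(records):
    """Run map, shuffle, and reduce back to back on an in-memory input."""
    pairs = map_phase(records, word_count_map)
    return reduce_phase(shuffle_phase(pairs), word_count_reduce)
```

Calling `run_job(["big data", "big cluster"])` counts each word across both lines. In a real deployment the map and reduce functions run as separate tasks on different nodes, which is where the scalability and fault-tolerance properties listed above come from.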

5 High-Level Overview of Hadoop
- HDFS and the MapReduce framework
- Data processing with MapTasks and ReduceTasks
- Three main steps of data movement
  - Intermediate data shuffling in MapReduce is time-consuming
[Figure: job submission from Applications to the JobTracker; TaskTrackers/Runners run MapTasks and ReduceTasks; numbered data movement steps 1 (HDFS), 2 (Shuffle Intermediate Data), 3 (HDFS)]

6 Outline
- Background
- Network-Levitated Merge
- JVM-Bypass Shuffling
- Fast Completion Scheduler

7 Motivation for Network-Levitated Merge
- 1: Serialization between the shuffle/merge and reduce phases
[Figure: timeline of the map, shuffle, merge, and reduce phases from Start through the first MOF, marking the serialization between shuffle/merge and reduce; x-axis: Time]

8 Repetitive Merges and Disk Access
- Hadoop data spilling is controlled through parameters
  - To limit the number of outstanding files
- An example with io.sort.factor=3
[Figure: merge example in four steps: 1: merge more, 2: insert, 3: merge, 4: to merge soon]
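The cost of repetitive merges can be illustrated with a small simulation. The policy below is a deliberate simplification and an assumption, not Hadoop's actual merge trigger: whenever the number of outstanding files exceeds `sort_factor`, the `sort_factor` smallest files are merged into one, and every byte in them is counted as read and rewritten. The function name `simulate_spill_merges` is hypothetical.

```python
import heapq


def simulate_spill_merges(segment_sizes, sort_factor=3):
    """Return (final_file_sizes, bytes_rewritten) under a simplified merge policy.

    Assumption: a merge is triggered as soon as more than sort_factor files
    are outstanding, and it always combines the sort_factor smallest files.
    """
    outstanding = []  # min-heap of the sizes of files currently on disk
    bytes_rewritten = 0
    for size in segment_sizes:
        heapq.heappush(outstanding, size)
        if len(outstanding) > sort_factor:
            # Merge the smallest files; their bytes are read and written again.
            merged = sum(heapq.heappop(outstanding) for _ in range(sort_factor))
            bytes_rewritten += merged
            heapq.heappush(outstanding, merged)
    return sorted(outstanding), bytes_rewritten
```

With ten segments of 10 units each and `sort_factor=3`, this model rewrites 160 units to keep only 100 units of data within the file limit, because already-merged files are merged again. That repeated disk traffic is the overhead this slide points at.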

9 Hadoop Acceleration (Hadoop-A)
- Pipelined shuffle, merge and reduce
- Network-levitated data merge
[Figure: Hadoop-A architecture. Hadoop (Java): JobTracker, TaskTrackers, MapTask, ReduceTask. Acceleration (C++): MOFSupplier, NetMerger, Data Engine, RDMA Server, RDMA Client, Fetch Manager, Merge Manager, Merging Thread, Merged Data, RDMA Interconnects]

10 Pipelined Data Shuffle, Merge and Reduce
[Figure: timeline of the map, shuffle (header), PQ setup, merge, and reduce phases from start through the first MOF to the last MOF; x-axis: Time]

11 Network-Levitated Merge Algorithm
[Figure: four panels showing segments S1, S2, and S3 feeding a merge point]
(a) Fetching Header: S1 <k1,v1>, S2 <k2,v2>, S3 <k3,v3>
(b) Priority Queue Setup
(c) Concurrent Fetching & Merging. Merged Data: <k1,v1><k2,v2><k3,v3>,
(d) Towards Completion. Merged Data: <k1,v1><k2,v2><k3,v3>,,<k2,v2 ><k1,v1 ><k3,v3 >,
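The four panels can be sketched as a k-way merge over a priority queue that holds one pair per segment and pulls the next pair only when needed. This is a sequential, in-memory illustration under stated assumptions: `RemoteSegment` and `network_levitated_merge` are hypothetical names, network fetches are simulated by an iterator, and the real design overlaps fetching with merging rather than alternating them.

```python
import heapq


class RemoteSegment:
    """Hypothetical stand-in for a sorted map output segment held on a remote node."""

    def __init__(self, name, records):
        self.name = name
        self._records = iter(records)  # records are assumed sorted by key
        self.fetches = 0

    def fetch_next(self):
        """Simulate fetching the next <key, value> pair; returns None when drained."""
        self.fetches += 1
        return next(self._records, None)


def network_levitated_merge(segments):
    """Yield pairs in key order while pulling data from segments on demand."""
    heap = []
    # (a) and (b): fetch only the leading pair of each segment, then build the queue.
    for index, segment in enumerate(segments):
        header = segment.fetch_next()
        if header is not None:
            heapq.heappush(heap, (header[0], index, header[1]))
    # (c) and (d): emit the smallest pair, then refill from the same segment.
    while heap:
        key, index, value = heapq.heappop(heap)
        yield key, value
        following = segments[index].fetch_next()
        if following is not None:
            heapq.heappush(heap, (following[0], index, following[1]))
```

Because the queue never holds more than one pair per segment, nothing in this sketch is spilled to local storage before merging. The segment index in each heap entry breaks ties between equal keys, so values are never compared.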

12 Job Progression with Network-Levitated Merge
a) Hadoop-A speeds up the execution time by more than 47%
b) Both MapTasks and ReduceTasks are improved
[Figure: (a) Map Progress of TeraSort and (b) Reduce Progress of TeraSort; x-axis: Time (sec); y-axis: Progress (%); series: Hadoop-A, Hadoop on IPoIB, Hadoop on GigE]

13 Breakdown of ReduceTask Execution Time (sec)
- Significantly reduced the execution time of ReduceTasks
  - Most came from reduced shuffle/merge time: an improvement of 2.5 times
- Also improved the time to reduce data: an improvement of 15%
[Table: columns Category, PQ-Setup, Shuffle/Merge, Reduce or Merge/Reduce. Percentage shares per row: Hadoop-GigE (65.0%) (35.0%); Hadoop-IPoIB (65.9%) (34.1%); Hadoop-A (47.4%) (52.6%)]

14 Outline
- Background
- Network-Levitated Merge
- JVM-Bypass Shuffling
- Fast Completion Scheduler

15 JVM-Dependent Intermediate Data Shuffling
- Heavily relies on Java
[Figure: default shuffle path. MapTask (Map) produces MOF1 and MOF2; TaskTrackers stage and serve them through HttpServlet; TCP/IP-only MOFCopiers in the ReduceTask feed Sort/Merge and Reduce, with output to HDFS; JobTracker shown alongside the TaskTrackers]

16 JVM-Bypass Shuffling (JBS)
- JBS removes the JVM from the critical path of intermediate data shuffling
- JBS is a portable library supporting both TCP/IP and RDMA protocols
[Figure: Data Analytics Applications on top of MapTask, TaskTracker, and ReduceTask. Java path: HTTP Servlet, HTTP GET, Sockets, TCP/IP, Ethernet. JVM-Bypass path in C: MOFSupplier and NetMerger over RDMA Verbs and TCP/IP, on InfiniBand/Ethernet]

17 Benefits of JBS: 1/10 Gigabit Ethernet
- JBS is effective for intermediate data of different sizes
  - Using the TeraSort benchmark, size of intermediate data = size of input data
- JBS reduces the execution time by 20.9% on average on 1GigE and by 19.3% on average on 10GigE
[Figure: TeraSort job execution time (sec) vs. input data size (GB). (a) 1 Gigabit Ethernet: Hadoop on 1GigE, JBS on 1GigE. (b) 10 Gigabit Ethernet: Hadoop on 10GigE, JBS on 10GigE]

18 Benefits of JBS: InfiniBand Cluster
- JBS on IPoIB outperforms Hadoop on IPoIB and Hadoop on SDP by 14.1% and 14.8%, respectively
- Hadoop performs similarly when using IPoIB or SDP
[Figure: TeraSort job execution time (sec) vs. input data size (GB); series: Hadoop on IPoIB, Hadoop on SDP, JBS on IPoIB]

19 Outline
- Background
- Network-Levitated Merge
- JVM-Bypass Shuffling
- Fast Completion Scheduler

20 Hadoop Fair Scheduler
- Scheduler assigns tasks to the TaskTrackers
- Tasks occupy slots until completion or failure
[Figure: timeline of map slots (Slot-M1 through Slot-M5) and reduce slots occupied by tasks of jobs J-1, J-2, and J-3, with shuffle and reduce phases marked; x-axis: Job Arrival Time]

21 Fair Completion Scheduler
- Prioritize ReduceTasks based on the shortest remaining map phase
  - When remaining map phases are equal, prioritize ReduceTasks of jobs with the least remaining reduce data
- Track the slowdown of preempted ReduceTasks
  - Prevent large jobs from being preempted for too long
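The prioritization rules above can be sketched as a sort key. This is only an illustration under assumptions: the slide does not say how slowdown is measured or what limit applies, so the `slowdown` field, the `max_slowdown` threshold, and the names `ReduceTaskInfo` and `order_reduce_tasks` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReduceTaskInfo:
    job: str
    remaining_map_seconds: float  # estimated time left in the job's map phase
    remaining_reduce_bytes: int   # reduce data the job still has to process
    slowdown: float = 1.0         # hypothetical metric that grows while preempted


def order_reduce_tasks(tasks, max_slowdown=3.0):
    """Return tasks in scheduling order, highest priority first.

    Assumed rule set: tasks whose slowdown reached max_slowdown go first so
    large jobs are not preempted indefinitely; the rest are ordered by the
    shortest remaining map phase, then by the least remaining reduce data.
    """
    def priority(task):
        starved = task.slowdown >= max_slowdown
        return (not starved, task.remaining_map_seconds, task.remaining_reduce_bytes)

    return sorted(tasks, key=priority)
```

For example, a large job's task with `slowdown=4.0` is placed ahead of small jobs under the default threshold, while tasks with equal remaining map time are ordered by their remaining reduce data.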

22 Average ReduceTask Waiting Time
- ReduceTasks in small jobs are significantly sped up
[Figure: average ReduceTask waiting time (sec) by job group; series: FCS, HFS]

23 Conclusions
- Examined the design and architecture of the Hadoop MapReduce framework and revealed critical issues faced by the existing implementation
- Designed and implemented Hadoop-A as an extensible acceleration framework that addresses these issues
- Provided JVM-Bypass Shuffling to avoid JVM overhead, as a portable library that runs on both TCP/IP and RDMA protocols
- Designed and implemented the Fast Completion Scheduler for fast job completion and job fairness

24 Sponsors of our research
