Understanding Hadoop Performance on Lustre

Understanding Hadoop Performance on Lustre
Stephen Skory, PhD, Seagate Technology
Collaborators: Kelsie Betsch, Daniel Kaslovsky, Daniel Lingenfelter, Dimitar Vlassarev, and Zhenzhen Yan
LUG Conference, 15 April 2015

Our live demo of Hadoop on Lustre at Supercomputing, November 17-20, 2014.
Seagate Confidential 2

- Why Hadoop on Lustre?
- LustreFS Hadoop Plugin
- Our Test System
- HDFS vs Lustre Benchmarks
- Performance experiments on Lustre and Hadoop configuration
- Diskless node optimization
- Final Thoughts

Why Hadoop on Lustre?
- POSIX compliance: Lustre is POSIX compliant and convenient for a wide range of workflows and applications, while HDFS is a distributed object store designed narrowly for Hadoop's write-once, read-many paradigm.
- Disaggregated computation and storage: HDFS requires storage to accompany compute; with Lustre, each part can be scaled or resized according to need.
- Lower storage overhead: Lustre is typically backed by RAID with roughly 40% overhead, while HDFS's default three-way replication imposes 200% overhead.
- Speed: in our testing we see a nearly 30% TeraSort performance boost over HDFS on Hadoop v2; when transfer time is considered, the improvement is over 50%.

Hadoop on HDFS and on Lustre: Data Flow
Typical workflow: ingestion of data before analysis.
- With HDFS: 1) transfer (and usually duplicate) the data onto HDFS; 2) analyze the data.
- With Lustre: 1) analyze the data in place on Lustre.
[Diagram: compute nodes attached to Lustre storage and to HDFS storage under each workflow.]

LustreFS Plugin
- A single Java jar file that must be placed in the CLASSPATH of whatever service needs to access Lustre files.
- Requires minimal changes to the Hadoop configuration XML files: instead of file:/// or hdfs://hostname:8020/, files are accessed using lustrefs:///.
- HDFS can coexist with the plugin, and jobs can run between the two file systems seamlessly.
- Hadoop service accounts (e.g. yarn or mapred) need permissions on a directory hierarchy on Lustre for temporary and other accounting data.
- Freely available today at http://github.com/seagate/lustrefs, with documentation. A PR has been issued to include the plugin with Apache Hadoop.
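As a sketch of those "minimal changes," a core-site.xml might gain entries like the following. The property names and mount path here are illustrative assumptions based on common Hadoop filesystem-plugin conventions; the plugin's README at github.com/seagate/lustrefs is the authoritative reference.

```xml
<!-- Hypothetical core-site.xml additions for the LustreFS plugin;
     verify property names against the plugin's own documentation. -->
<property>
  <name>fs.defaultFS</name>
  <value>lustrefs:///</value>
</property>
<property>
  <name>fs.lustrefs.impl</name>
  <value>org.apache.hadoop.fs.lustrefs.LustreFileSystem</value>
</property>
<property>
  <name>fs.lustrefs.mount</name>
  <value>/mnt/lustre/hadoop</value>
</property>
```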

Our Hadoop on Lustre Development Cluster
All benchmarks shown were performed on this system.
- Hadoop rack: 1 NameNode and 18 DataNodes, with 12 cores and 12 HDDs per DataNode (216 of each in total).
- 19x 10GbE connections within the rack for Hadoop traffic; each Hadoop node also has a second 10GbE connection to the ClusterStor network, for a total of 190Gb full-duplex bandwidth.
- ClusterStor rack: 2 ClusterStor 6000 SSUs with 160 data HDDs.

How Much Faster is Hadoop on Lustre? TeraSort Timings
[Chart: completion time in seconds (transfer time plus analysis time) for Hadoop 1 and Hadoop 2, each on HDFS and on Lustre.]
- Hadoop 1 on Lustre vs. HDFS: 38% faster analysis; 63% faster overall, since no data transfer is needed.
- Hadoop 2 on Lustre vs. HDFS: 28% faster analysis; 53% faster overall, since no data transfer is needed.

How Much Faster is Hadoop on Lustre? Ecosystem Timings Compared to HDFS (transfer times not included)
HoL improvement over HDFS:

Benchmark    Hadoop 1    Hadoop 2
TeraSort     38%         28%
Sort         17%         15%
Wordcount    8%          5%
Mahout       6%          5%
Pig          1%          -7%
Hive         -1%         0%

All remaining benchmarks presented were performed on Hadoop v2.

Hadoop on Lustre Performance: Stripe Size
Stripe size is not a strong factor in Hadoop performance, except perhaps for 1MB stripes in some cases. We recommend a stripe size of 4MB or greater.
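A 4MB stripe size can be set on the Hadoop data directory with the standard lfs setstripe command; the mount path below is a hypothetical example, not the path from our test system.

```shell
# Set a 4MB stripe size on the Hadoop data directory (path is hypothetical);
# -c -1 stripes each file across all available OSTs.
lfs setstripe -S 4M -c -1 /mnt/lustre/hadoop
```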

Hadoop on Lustre Performance: max_rpcs_in_flight and max_dirty_mb
TeraSort performance on 373GB with a 4MB stripe size, 216 Mappers, and 108 Reducers:

max_rpcs_in_flight    max_dirty_mb    Time
8                     32              5m04s
16                    128             5m03s
128                   512             5m07s

Changing these values does not strongly affect performance.
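These are standard Lustre client tunables, set per OSC with lctl. The values below simply reproduce the middle row of the table above, as an example of the syntax.

```shell
# Set the Lustre client RPC and dirty-cache tunables on each Hadoop node
# (requires root; the osc.* wildcard applies to all OSCs on the client).
lctl set_param osc.*.max_rpcs_in_flight=16
lctl set_param osc.*.max_dirty_mb=128
```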

Hadoop on Lustre Performance: Number of Mappers and Reducers
TeraSort, 373GB. The number of Mappers and Reducers has a much stronger effect on performance than any Lustre client configuration.
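As a sketch of how such a job is shaped: the Reducer count can be set directly per job with mapreduce.job.reduces, while the Mapper count follows the number of input splits of the TeraGen output. The jar name and paths below are hypothetical.

```shell
# Run TeraSort with an explicit reducer count; the mapper count is
# determined by the input splits. Jar name and paths are hypothetical.
hadoop jar hadoop-mapreduce-examples.jar terasort \
    -D mapreduce.job.reduces=108 \
    lustrefs:///benchmarks/terasort-in lustrefs:///benchmarks/terasort-out
```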

Hadoop on Lustre Performance: Some Additional Observations
- For TeraSort, and likely other I/O-intensive jobs, most performance gains come from tuning Hadoop rather than Lustre.
- In contrast to HDFS, this means Hadoop performance is less tied to how (or, in the case of HDFS, where) the data is stored on disk, which gives users more options in how to tune their applications.
- It's best to focus on tuning Hadoop parameters for best performance: we have seen that Lustre does better with fewer writers than the same job going to HDFS (e.g. the number of Mappers for TeraGen, or of Reducers in TeraSort).
- A typical Hadoop rule of thumb is to build clusters, and run jobs, following a 1:1 core:HDD ratio. With Hadoop on Lustre, this isn't necessarily true.
- For CPU-intensive jobs in particular, performance tuning for Lustre is very similar to typical Hadoop on HDFS.

Hadoop on Lustre: Diskless Modification
- Standard Hadoop uses local disks for both HDFS and temporary data. When running Hadoop on Lustre on diskless compute nodes, however, all data must go to Lustre, including temporary data.
- Beginning in 2011, we investigated how to optimize Map/Reduce performance when using Lustre as the temporary data store.
- This required a modification to core Hadoop functionality, and we released these changes at SC14:
  https://github.com/seagate/hadoop-on-lustre
  https://github.com/seagate/hadoop-on-lustre2

Map/Reduce Data Flow Basics: Hadoop on HDFS
[Diagram: Mapper reads an HDFS block (usually local), writes intermediate data to local disk, Reducers shuffle that data over the network, sort it in RAM or on local disk, and write output to HDFS as 1 local copy plus 2 network copies.]
- HDFS data is split into blocks, which are analogous to Lustre's stripes; the default block size in Hadoop 2 is 128 MB.
- The ResourceManager makes a best effort to assign computation (Mappers) to nodes that already hold the data; in most cases the HDFS data is local to the Mapper, but it can be remote.
- When done, the Mapper writes intermediate data to local scratch disk (no replication).
- Reducers collect data from the Mappers (reading from local disk) over HTTP (the shuffle phase) and sort the incoming data either in RAM or, when RAM is insufficient, on local scratch disk.
- When the data is sorted, the Reducer applies the reduction algorithm and writes its output to HDFS, where it is 3x replicated.
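The map, shuffle, sort, and reduce steps above can be sketched in miniature in plain Python (this is illustrative only, not Hadoop's API; word count stands in for a real job):

```python
# Minimal in-memory sketch of the map -> shuffle/sort -> reduce data flow.
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Map phase: each record yields (key, value) pairs, analogous to
    # Mappers writing intermediate data to scratch space.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))
    # Shuffle phase: group values by key, as Reducers do when collecting
    # Mapper output over HTTP.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Sort + reduce phase: apply the reduction per key and emit output
    # (which Hadoop would then write to HDFS or Lustre).
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

def word_mapper(line):
    return [(word, 1) for word in line.split()]

def sum_reducer(word, counts):
    return sum(counts)

result = run_mapreduce(["hadoop on lustre", "hadoop on hdfs"],
                       word_mapper, sum_reducer)
print(result)  # {'hadoop': 2, 'hdfs': 1, 'lustre': 1, 'on': 2}
```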

Map/Reduce Data Flow Basics: Hadoop on Lustre with Local Disks
[Diagram: as on the previous slide, except the Mapper reads its input from Lustre over the network, intermediate and sort data still go to local disk, and the Reducer writes a single copy of its output to Lustre over the network.]
Using Lustre instead of HDFS is very similar, except that the Mapper reads data from Lustre over the network, and the Reducer output is not triple replicated. This is the setup used for all the benchmarks presented so far: each Hadoop compute node uses both Lustre and its local disks.

Hadoop on Lustre: Map/Reduce Diskless Modification
[Diagram comparing three data flows: Hadoop on HDFS (intermediate data on local disk; Reduce output as 1 local plus 2 network HDFS copies); Hadoop on Lustre, diskless and unmodified (one Lustre network copy for Reduce output); and Hadoop on Lustre, diskless and modified (Mapper output feeds sort and reduce directly, again with one Lustre network copy for Reduce output).]

Hadoop on Lustre Diskless Modification
[Diagram: on an unmodified Hadoop node, the Mapper and Reducer exchange temporary files through the IO cache; in the modified version, temporary files are saved to Lustre.]
TeraSort timings:

Size      Unmodified    Modified
100 GB    131s          155s
500 GB    538s          558s

The modified method is slower! Takeaway: IO caching speeds up the unmodified method.

Final Thoughts
We have explored three methods of running Hadoop on Lustre:
1) Using local disk(s) for temporary storage
2) Using Lustre for temporary storage with no modifications to Map/Reduce
3) Using Lustre for temporary storage, with modifications to Map/Reduce
Each method in this list is slower than the ones listed before it. The local disks do not need to be anywhere near "big" (where big is the size of the largest dataset analyzed), but it is best to have some kind of local disk for temporary data.
Thank you to my collaborators Kelsie Betsch, Daniel Kaslovsky, Daniel Lingenfelter, Dimitar Vlassarev, and Zhenzhen Yan.