Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013



* Other names and brands may be claimed as the property of others.

Agenda
- Hadoop Intro
- Why run Hadoop on Lustre?
- Optimizing Hadoop for Lustre
- Performance
- What's next?

A Little Intro of Hadoop
- Open source MapReduce framework for data-intensive computing
- Simple programming model, two functions: Map and Reduce
  - Map: transforms input into a list of key-value pairs: Map(D) -> List[Ki, Vi]
  - Reduce: given a key and all associated values, produces a result in the form of a list of values: Reduce(Ki, List[Vi]) -> List[Vo]
- Parallelism is hidden by the framework
- Highly scalable: can be applied to large datasets (Big Data) and run on commodity clusters
- Comes with its own user-space distributed file system (HDFS) based on the local storage of cluster nodes
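The two-function model above can be sketched with the classic word count example. This is a minimal serial illustration in plain Python, not the Hadoop API; function names are invented for the sketch.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(document):
    # Map(D) -> List[(Ki, Vi)]: emit (word, 1) for every word in the input.
    return [(word, 1) for word in document.split()]

def reduce_fn(key, values):
    # Reduce(Ki, List[Vi]) -> List[Vo]: sum all counts seen for one key.
    return [sum(values)]

def run_job(documents):
    # The framework parallelizes and shuffles these phases; here they run serially.
    pairs = sorted(kv for doc in documents for kv in map_fn(doc))
    return {k: reduce_fn(k, [v for _, v in g])[0]
            for k, g in groupby(pairs, key=itemgetter(0))}

counts = run_job(["lustre hadoop lustre", "hadoop"])
```

In a real job the framework would run many mappers in parallel and route all values for a given key to the same reducer; the `sorted`/`groupby` pair stands in for that shuffle.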

A Little Intro of Hadoop (cont.)
[Diagram: input splits feed parallel Map tasks; map outputs are shuffled to Reduce tasks, which write the final output]
The framework handles most of the execution:
- Splits input logically and feeds mappers
- Partitions and sorts map outputs (Collect)
- Transports map outputs to reducers (Shuffle)
- Merges output obtained from each mapper (Merge)
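The Collect/Shuffle/Merge steps can be sketched as a toy pipeline: each mapper's output is partitioned and sorted, and each reducer merges the sorted runs it receives from every mapper. The `partition` function is a deterministic stand-in for Hadoop's default HashPartitioner; everything else is simplified for illustration.

```python
import heapq

R = 2  # number of reducers

def partition(key):
    # Deterministic stand-in for HashPartitioner (hash(key) mod R);
    # ord() keeps this toy example reproducible across runs.
    return ord(key[0]) % R

def collect(map_output):
    """Partition and sort one mapper's (key, value) records (Collect)."""
    partitions = [[] for _ in range(R)]
    for key, value in map_output:
        partitions[partition(key)].append((key, value))
    return [sorted(p) for p in partitions]

def shuffle_and_merge(all_map_outputs, r):
    """Reducer r fetches the r-th sorted run from every mapper and merges them."""
    runs = [collect(out)[r] for out in all_map_outputs]
    return list(heapq.merge(*runs))

maps = [[("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
merged = [shuffle_and_merge(maps, r) for r in range(R)]
```

Because every run is already sorted, the reducer-side merge is a streaming k-way merge (`heapq.merge`) rather than a full re-sort, which mirrors why the framework sorts within partitions on the map side.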

Why Hadoop with Lustre?
- HPC is moving towards Exascale; simulations will only get bigger
  - Need tools to run analyses on the resulting massive datasets
- Natural allies:
  - Hadoop is the most popular software stack for big data analytics
  - Lustre is the file system of choice for most HPC clusters
- Easier to manage a single storage platform
  - No data transfer overhead for staging inputs and extracting results
  - No need to partition storage into HPC (Lustre) and Analytics (HDFS)
- Also, HDFS expects nodes with locally attached disks, while most HPC clusters have diskless compute nodes with a separate storage cluster

How to make them cooperate?
- Hadoop uses pluggable extensions to work with different file system types
- Lustre is POSIX compliant: use Hadoop's built-in LocalFileSystem class, which uses native file system support in Java
- Extend and override the default behavior: LustreFileSystem
  - Defines a new URL scheme for Lustre: lustre:///
  - Controls Lustre striping info
  - Resolves absolute paths to a user-defined directory
  - Leaves room for future enhancements
- Class hierarchy: org.apache.hadoop.fs.FileSystem -> RawLocalFileSystem -> LustreFileSystem
- Configuration allows Hadoop to find it in config files
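As a sketch, wiring the new class into the configuration might look like the following. The property names follow Hadoop 1.x's `fs.SCHEME.impl` convention; the exact package and class name shown here are assumptions for illustration.

```xml
<!-- core-site.xml: register the lustre:/// scheme and make it the default
     file system. Class/package names are illustrative. -->
<configuration>
  <property>
    <name>fs.lustre.impl</name>
    <value>org.apache.hadoop.fs.LustreFileSystem</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>lustre:///</value>
  </property>
</configuration>
```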

Sort, Shuffle & Merge
- M = number of Maps, R = number of Reduces
- Map output records (key-value pairs) are organized into R partitions
- Partitions exist in memory; records within a partition are sorted
- A background thread monitors the buffer and spills to disk if it is full
- Each spill generates a spill file and a corresponding index file
- Eventually, all spill files are merged (partition-wise) into a single file
- A final index file is created containing R index records
  - Index Record = [Offset, Compressed Length, Original Length]
- A servlet extracts partitions and streams them to reducers over HTTP
- The reducer merges all M streams on disk or in memory before reducing
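The index record above is three 64-bit values, which is where the 24-bytes-per-record figure mentioned later comes from. A minimal sketch of such a fixed-size index (field order follows the slide; the byte encoding here is an illustration, not Hadoop's actual IndexFile format):

```python
import struct

# One index record: offset, compressed length, original length,
# each a 64-bit value -> 24 bytes per record.
INDEX_RECORD = struct.Struct(">QQQ")

def pack_index(records):
    """Pack a list of (offset, comp_len, raw_len) tuples into one index blob."""
    return b"".join(INDEX_RECORD.pack(*r) for r in records)

def partition_entry(index_blob, partition):
    """Fixed-size records make finding partition Y's entry an O(1) seek."""
    return INDEX_RECORD.unpack_from(index_blob, partition * INDEX_RECORD.size)

# Toy index for R = 3 partitions of one map output file.
blob = pack_index([(0, 100, 150), (100, 80, 90), (180, 10, 10)])
```

The fixed record size is what lets a reader jump straight to the entry for its partition instead of scanning the whole file.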

Sort, Shuffle & Merge (Cont.)
[Diagram: Mapper X sorts and spills map output into partitions 1..R with index records Idx 1..R, then merges the spills into a single output file plus an index file on HDFS. The TaskTracker's servlet copies Map 1..M: Partition Y over HTTP to Reducer Y, which merges the M streams and reduces into Output Part Y.]

Optimized Shuffle for Lustre
Why? The shuffle is the biggest (but inevitable) bottleneck, and it performs badly on Lustre!
How? On a shared file system, the HTTP transport is redundant.
How would reducers access map outputs?
- First method: let reducers read partitions from the map output files directly
  - But index information is still needed
  - Either let reducers read the index files as well
    - Results in M*R small (24 bytes/record) IO operations
  - Or let the servlet convey index information to the reducer
    - Advantage: read the entire index file at once, and cache it
    - Disadvantage: seeking partition offsets + HTTP latency
- Second method: let mappers put each partition in a separate file
  - Three birds with one stone: no index files, no disk seeks, no HTTP
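The second method can be sketched as a file-layout convention on the shared file system: each mapper writes R partition files, and reducer Y simply opens the M files named for its partition. The naming scheme below is invented for illustration; the actual layout used by the Lustre shuffle may differ.

```python
import os
import tempfile

M, R = 3, 2  # number of mappers and reducers

def run_mappers(outdir):
    # Each mapper writes one file per partition -- no index file needed.
    for m in range(M):
        for r in range(R):
            path = os.path.join(outdir, f"map{m}.part{r}")
            with open(path, "w") as f:
                # In real Hadoop this would be sorted key-value records.
                f.write(f"records from mapper {m} for partition {r}\n")

def reducer_inputs(outdir, r):
    """Reducer r lists its M input files directly: no index reads, no HTTP copy."""
    return sorted(p for p in os.listdir(outdir) if p.endswith(f".part{r}"))

with tempfile.TemporaryDirectory() as d:
    run_mappers(d)
    inputs = reducer_inputs(d, 1)
```

Trading one merged file plus an index for R separate files moves the bookkeeping into the namespace, which is why the slide flags MDS load at scale as something to verify.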

Optimized Shuffle for Lustre (Cont.)
[Diagram: with the optimized shuffle, Mapper X sorts and merges its output into separate files Map X: Partition 1..R directly on Lustre; Reducer Y reads Map 1..M: Partition Y straight from Lustre, merges the streams, and reduces into Output Part Y, with no servlet or HTTP copy step.]

Performance Tests
Standard Hadoop benchmarks were run on the Rosso cluster.
- Hadoop configuration (Intel Distro v1.0.3):
  - 8 nodes, 2 SATA disks per node (used only for HDFS)
  - One node with dual configuration, i.e. both master and slave
- Lustre configuration (v2.3.0):
  - 4 OSS nodes, 4 SATA disks per node (OSTs)
  - 1 MDS with a 4GB SSD MDT
  - All storage handled by Lustre; local disks not used

TestDFSIO Benchmark
- Tests the raw performance of a file system
- Writes and reads very large files (35G each) in parallel
- One mapper per file; a single reducer collects stats
- Embarrassingly parallel; does not test shuffle & sort
- Throughput = filesize / time (MB/s)
[Chart: write and read throughput in MB/s (0-100 scale) for HDFS vs. Lustre; more is better]

Terasort Benchmark
- Distributed sort: the primary Map-Reduce primitive
- Sorts 1 billion records, i.e. approximately 100G
- Record: randomly generated 10-byte key + 90 bytes of garbage data
- Terasort only supplies a custom partitioner for keys; the rest is just default map-reduce behavior
- Block size: 128M; Maps: 752 @ 4/node; Reduces: 16 @ 2/node
[Chart: runtime in seconds (0-500) for HDFS vs. Lustre; less is better. Lustre is 10-15% faster.]
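A quick back-of-the-envelope check of the dataset size quoted above, from the record layout on the slide:

```python
# Terasort record: 10-byte key + 90-byte payload = 100 bytes per record.
records = 1_000_000_000
key_bytes, payload_bytes = 10, 90
record_bytes = key_bytes + payload_bytes

total_bytes = records * record_bytes  # 10^11 bytes
total_gb = total_bytes / 10**9        # ~100G, matching the slide
```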

Work in Progress
Planned so far:
- More exhaustive testing needed
- Test at scale: verify that large-scale jobs don't throttle the MDS
- Port to IDH 3.x (Hadoop 2.x): new architecture, more decoupled
- Scenarios with other tools in the Hadoop stack: Hive, HBase, etc.
Further work:
- Experiment with caching
- Scheduling enhancements
- Exploiting locality