Managing large clusters resources


Managing large clusters resources ID2210 Gautier Berthou (SICS)

Big Processing with No Locality
[Figure: Job(/crawler/bot/jd.io/1) is submitted to a Workflow Manager, which runs it on Compute Grid Nodes; data blocks 1 to 6 are stored with three replicas each.]
This doesn't scale. Bandwidth is the bottleneck.

MapReduce Locality
[Figure: Job(/crawler/bot/jd.io/1) is submitted to the Job Tracker, which sends the job to six Task Trackers, each next to a DataNode (DN); the DataNodes hold blocks 1 to 6 with three replicas each. R = result file(s).]
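The locality idea can be sketched in a few lines of Python. This is an illustration only, not Hadoop's actual Job Tracker logic. The block placement is an assumption (the block numbers from the figure, grouped three per node), and the node names and the function `assign_tasks` are hypothetical.

```python
from collections import defaultdict

# Assumed placement: node name -> blocks stored locally (three replicas per block).
PLACEMENT = {
    "node1": {1, 2, 3},
    "node2": {2, 5, 6},
    "node3": {4, 3, 6},
    "node4": {3, 5, 6},
    "node5": {1, 2, 4},
    "node6": {1, 4, 5},
}


def assign_tasks(placement, blocks):
    """Greedily assign one task per block to the least-loaded node holding a replica."""
    load = defaultdict(int)
    assignment = {}
    for block in sorted(blocks):
        holders = [node for node, local in placement.items() if block in local]
        if not holders:
            raise ValueError(f"no replica of block {block}")
        # Prefer the holder with the fewest tasks so far; ties are broken by name.
        target = min(holders, key=lambda node: (load[node], node))
        assignment[block] = target
        load[target] += 1
    return assignment


assignment = assign_tasks(PLACEMENT, range(1, 7))
```

Every task lands on a node that already stores its input block, so no block has to cross the network before processing starts.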

First step: Single Processing Framework
Batch Apps
Hadoop 1.x
- MapReduce (resource mgmt, job scheduler, data processing)
- HDFS (distributed storage)

The Job Tracker's Job
- Distribute the map and reduce tasks on the nodes of the cluster.
- Ensure fairness of the cluster resource allocation.
- Track the progress of these tasks.
- Authenticate job tenants and make sure that each job is isolated from the others.
- Etc.

Limitations of MapReduce [Zaharia 11]
MapReduce is based on an acyclic data flow from stable storage to stable storage.
- Slow: writes data to HDFS at every stage in the pipeline.
Acyclic data flow is inefficient for applications that repeatedly reuse a working set of data:
- Iterative algorithms (machine learning, graphs)
- Interactive data mining tools (R, Excel, Python)
[Figure: data flow from Input through Map and Reduce to Output.]

Limitations
- Only one programming model.
- The MapReduce framework does not use the cluster to its full capacity.
- The Job Tracker is a bottleneck.
- The Job Tracker is a single point of failure.

Goals for a new Scheduler
- Be able to run different frameworks.
- Scale.
- Provide advanced scheduling policies.
- Run efficiently with different kinds of workloads.

Second step: Multiple Processing Frameworks
Batch, Interactive, Streaming
Hadoop 2.x
- MapReduce (data processing), Others (Spark, MPI, Giraph, etc.)
- YARN (resource mgmt, job scheduler)
- HDFS (distributed storage)

Examples of Scheduling Policies
Capacity scheduler:
- Applications have different levels of priorities.
Fair scheduler:
- Applications have different levels of priorities.
- Used resources can be preempted.
Reservation-based scheduler (1):
- Applications can indicate how long they will run and when they have to be finished.
(1) Reservation-based Scheduling: If you're late don't blame us!, C. Curino et al., Microsoft tech report
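The slide does not say how a fair scheduler defines fairness. One common definition is max-min fair sharing, sketched below in Python as an assumption rather than as the policy of any particular scheduler. The function name `max_min_fair_share` and the example demands are hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Return a max-min fair allocation of `capacity` among `demands` (app -> demand)."""
    allocation = {}
    remaining = float(capacity)
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        # Apps asking for no more than an equal share get their full demand.
        fits = [app for app, demand in pending.items() if demand <= share]
        if not fits:
            # Nobody can be fully satisfied: split what is left equally.
            for app in pending:
                allocation[app] = share
            break
        for app in fits:
            allocation[app] = float(pending.pop(app))
            remaining -= allocation[app]
    return allocation


shares = max_min_fair_share(100, {"a": 10, "b": 40, "c": 80})
```

With a capacity of 100, the small demands of `a` and `b` are met in full and `c` receives the remainder. Preemption, as mentioned for the fair scheduler, would be the mechanism that takes resources back from an application currently holding more than its share.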

Scenario
3 kinds of jobs:
- Emergency jobs: need to be run as soon as possible.
- Production jobs: have a deadline and a known running time, and are very demanding about the nodes they can be scheduled on.
- Best effort jobs: interactive jobs that have lower priority, but for which users expect low latency.

Capacity scheduler
[Figure: timeline from "Now" to "Production work deadline" showing where best effort, emergency and production jobs are placed.]

Fair scheduler
[Figure: timeline from "Now" to "Production work deadline" showing where best effort, emergency and production jobs are placed.]

Reservation-based scheduler
[Figure: timeline from "Now" to "Production work deadline" showing where best effort, emergency and production jobs are placed.]
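A reservation gives the scheduler a duration and a deadline, which leaves it free to decide when the job actually runs. The Python sketch below shows one possible strategy: start the production job as late as the deadline allows, so that the present stays free for emergency work. This is an assumption made for illustration and not necessarily the algorithm of the cited tech report. The single-resource time-slot model and the name `plan_reservation` are hypothetical.

```python
def plan_reservation(now, deadline, duration, busy):
    """Return the latest start slot so that [start, start + duration) is free and ends by `deadline`.

    `busy` is the set of time slots that are already reserved.
    Returns None when no placement exists.
    """
    for start in range(deadline - duration, now - 1, -1):
        if all(slot not in busy for slot in range(start, start + duration)):
            return start
    return None


busy = set()
# An emergency job arrives now (slot 0) and needs 2 slots immediately.
busy.update(range(0, 2))
# The production job declared a running time of 4 slots and a deadline at slot 10.
production_start = plan_reservation(now=0, deadline=10, duration=4, busy=busy)
```

The production job is planned for slots 6 to 9, so it still meets its deadline while the emergency job runs first.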

Scheduler Architectures
Omega: flexible, scalable schedulers for large compute clusters, Malte Schwarzkopf et al., EuroSys '13

The Monolithic Scheduler
Yarn:
- Apache Hadoop YARN: Yet Another Resource Negotiator, V. K. Vavilapalli et al., SoCC '13.
Borg:
- Large-scale cluster management at Google with Borg, A. Verma et al., EuroSys '15.

Architecture 1/3

Architecture 2/3
[Figure: Resource Manager.]

Architecture 3/3
[Figure: ZooKeeper, one master Resource Manager and standby Resource Managers.]

Pros and Cons
Pros:
- Detailed knowledge of the cluster state -> optimal use of the cluster resources.
- Easy to implement new scheduling policies.
Cons:
- Bottleneck.
- The failure of the master scheduler has a big impact on cluster usage.

Two-level Scheduler
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, B. Hindman et al., NSDI '11

Architecture 1/2

Architecture 2/2
[Figure: MapReduce, Spark and Flink schedulers, with a partial state, on top of the Mesos Master.]

Pros and Cons
Pros:
- Scale out by adding schedulers.
- Concurrent scheduling of tasks.
Cons:
- Suboptimal use of the cluster, especially when there are long-running tasks.
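The two-level idea can be sketched as a master that offers free resources to framework schedulers, each of which accepts or declines based only on what it is offered. The Python sketch below is a simplified illustration and does not use the Mesos API; the classes `Master` and `FrameworkScheduler`, the method `on_offer` and the CPU-only resource model are all hypothetical.

```python
class FrameworkScheduler:
    """Framework-level scheduler that only sees the resources it is offered."""

    def __init__(self, name, task_cpus, pending_tasks):
        self.name = name
        self.task_cpus = task_cpus          # CPUs needed by each task
        self.pending_tasks = pending_tasks  # tasks still waiting to be placed

    def on_offer(self, node, free_cpus):
        """Return how many tasks to launch on the offered node (0 declines the offer)."""
        launch = min(self.pending_tasks, free_cpus // self.task_cpus)
        self.pending_tasks -= launch
        return launch


class Master:
    """Cluster-level allocator: decides which framework receives each offer."""

    def __init__(self, free_cpus_per_node):
        self.free = dict(free_cpus_per_node)

    def offer_round(self, frameworks):
        """Offer every node to the frameworks in order and record launched tasks."""
        launched = []
        for node in sorted(self.free):
            for framework in frameworks:
                if self.free[node] == 0:
                    break
                count = framework.on_offer(node, self.free[node])
                self.free[node] -= count * framework.task_cpus
                launched.extend((framework.name, node) for _ in range(count))
        return launched


master = Master({"n1": 4, "n2": 4})
spark = FrameworkScheduler("spark", task_cpus=1, pending_tasks=3)
mapreduce = FrameworkScheduler("mapreduce", task_cpus=4, pending_tasks=1)
launched = master.offer_round([spark, mapreduce])
```

In this run one CPU on `n1` stays unused because no framework that sees it can fit a task there, a small instance of the suboptimal cluster use listed under the cons.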

Shared State Scheduler
Omega: flexible, scalable schedulers for large compute clusters, M. Schwarzkopf et al., EuroSys '13

Architecture 1/2

Architecture 2/2
[Figure: MapReduce, Spark and Flink schedulers sharing the global state held by the State Manager.]

Architecture 2/2
[Figure: same diagram, with a conflict between schedulers on the global state.]

Pros and Cons
Pros:
- Scalable.
- Good use of the cluster resources.
Cons:
- Unpredictable interaction between the policies of the different schedulers.
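A conflict on the shared state can be illustrated with optimistic concurrency: every scheduler decides on its own snapshot of the state and then tries to commit, and the commit fails if the state changed in the meantime. The Python sketch below is a minimal illustration under that assumption. It is not Omega's implementation, and the names `SharedState`, `ConflictError` and `schedule`, as well as the per-node version counter, are hypothetical.

```python
class ConflictError(Exception):
    """Raised when a commit is based on a stale view of the cluster state."""


class SharedState:
    """Global cluster state with a per-node version counter for conflict detection."""

    def __init__(self, free_cpus_per_node):
        self.free = dict(free_cpus_per_node)
        self.version = {node: 0 for node in free_cpus_per_node}

    def snapshot(self):
        """Give a scheduler its own private copy of the state."""
        return dict(self.free), dict(self.version)

    def commit(self, node, cpus, seen_version):
        """Claim `cpus` on `node` only if nobody changed it since the snapshot."""
        if self.version[node] != seen_version or self.free[node] < cpus:
            raise ConflictError(node)
        self.free[node] -= cpus
        self.version[node] += 1


def schedule(state, cpus, max_retries=3):
    """Place one task: decide on a snapshot, then commit, and retry on conflict."""
    for _ in range(max_retries):
        free, versions = state.snapshot()
        candidates = [node for node in sorted(free) if free[node] >= cpus]
        if not candidates:
            return None
        node = candidates[0]
        try:
            state.commit(node, cpus, versions[node])
            return node
        except ConflictError:
            continue  # Another scheduler won the race: refresh the view and retry.
    return None


state = SharedState({"n1": 4, "n2": 4})
free_a, versions_a = state.snapshot()        # view of a first scheduler
free_b, versions_b = state.snapshot()        # view of a second scheduler
state.commit("n1", 4, versions_a["n1"])      # the first scheduler commits
try:
    state.commit("n1", 4, versions_b["n1"])  # the second view is now stale
    conflict = False
except ConflictError:
    conflict = True
retry_node = schedule(state, 4)              # the second scheduler retries
```

The second scheduler loses the race for `n1`, detects the conflict at commit time, and succeeds on `n2` after refreshing its view.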

Comparison

Performance Comparison 1/3
[Figure panels: simple monolithic, advanced monolithic, two-level, shared state.]

Performance Comparison 2/3
What the previous evaluation does not show about two-level scheduling: [Figure]

Performance Comparison 3/3
Trying to handle more batch jobs in Omega by running several batch schedulers in parallel.
How to write a good (accepted) paper.

Sum up
- Two-level and shared state schedulers scale better.
- Shared state schedulers use the cluster resources more optimally than two-level schedulers.
- Monolithic schedulers are a potential bottleneck.
- But as monolithic schedulers are easier to design and allow finer allocation of resources and more advanced scheduling policies, they are the ones used in practice.

Making Yarn more scalable
HOPS YARN: a one and a half level scheduler

Hadoop Yarn HA Implementation
[Figure: ZooKeeper, one master Resource Manager and standby Resource Managers.]

Hops Yarn HA Implementation
[Figure: NDB, one master Resource Manager and standby Resource Managers.]

MySQL Cluster (NDB)
Shared Nothing DB
SQL API, NDB API
Distributed, In-memory
2-Phase Commit
- Replicate DB, not the Log!
Real-time
- Low TransactionInactive timeouts
Commodity Hardware
Scales out
- Millions of transactions/sec
- TB-sized datasets (48 nodes)
Split-Brain solved with Arbitrator Pattern
SQL and Native Blocking/Non-Blocking APIs
30+ million update transactions/second on a 30-node cluster

Standby is boring
[Figure: one master Resource Manager and two standby Resource Managers.]

Difficulties 1/2

Difficulties 2/2
- Pulling from the database whenever the state is needed is inefficient.
- Having an independent thread that regularly pulls from the database is difficult to tune and causes lock problems.

Solution
Luckily, NDB has an event API.
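The difference between pulling and being notified can be sketched with an in-memory stand-in for the database. The Python sketch below does not use the NDB event API; the classes `EventedStore` and `StandbyResourceManager` and the application states are hypothetical and only illustrate the push pattern.

```python
class EventedStore:
    """Stand-in for a database that pushes change events to its subscribers."""

    def __init__(self):
        self._rows = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback invoked as callback(key, value) on every write."""
        self._subscribers.append(callback)

    def write(self, key, value):
        self._rows[key] = value
        for callback in self._subscribers:
            callback(key, value)  # Push the change instead of waiting to be polled.


class StandbyResourceManager:
    """Keeps an in-memory copy of the scheduler state up to date from events."""

    def __init__(self, store):
        self.state = {}
        store.subscribe(self.on_event)

    def on_event(self, key, value):
        self.state[key] = value


store = EventedStore()
standby = StandbyResourceManager(store)
store.write("app_1", "RUNNING")
store.write("app_1", "FINISHED")
```

The standby copy follows every write without a polling thread, so there is no polling interval to tune and no repeated read of the database.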

With streaming
[Figure: one master Resource Manager and two standby Resource Managers.]

Conclusion
There exist three architectures for large cluster resource scheduling:
- Monolithic
- Two-level
- Shared state
Each of these architectures has pros and cons.
The monolithic architecture is the one presently used because it is easier to use and develop.
At KTH and SICS we are exploring the possibilities for a new architecture ensuring more scalability while keeping the advantages of the monolithic architecture.

References
- Reservation-based Scheduling: If you're late don't blame us!, C. Curino et al., Microsoft tech report
- Omega: flexible, scalable schedulers for large compute clusters, Malte Schwarzkopf et al., EuroSys '13
- Apache Hadoop YARN: Yet Another Resource Negotiator, V. K. Vavilapalli et al., SoCC '13
- Large-scale cluster management at Google with Borg, A. Verma et al., EuroSys '15
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, B. Hindman et al., NSDI '11