MANAGING RESOURCES IN A BIG DATA CLUSTER.

1 MANAGING RESOURCES IN A BIG DATA CLUSTER. Gautier Berthou (SICS) EMDC Summer Event 2015

2 We are producing a lot of data

3 Where does it come from?
Online services: PBs per day
Scientific instruments: PBs per minute
Whole genome sequencing: 250 GB per person
Internet of Things: will be lots!

4 Small Big

5 HDFS Big File Name Node

6 What do we do with this data?
Batch: a large quantity of data (terabytes), stored on a large number of machines (100 to 5000). The user wants to run computations on this data as efficiently as possible.
Streaming: a large quantity of data that arrives as a stream generated by different sources. The user wants to run computations on this data on the fly, with the guarantee that the stream of data and the computation will run without interruption.

7 Example of batch processing: MapReduce
[Diagram: Job(/crawler/bot/jd.io/1) is submitted to the Job Tracker; six Task Trackers are shown, each with a copy of the job and a DN.]

8 Scenario
Three kinds of jobs:
Emergency jobs: need to be run as soon as possible.
Production jobs: have a deadline and a known running time, and are very demanding about where they can be scheduled.
Best effort jobs: interactive jobs that have lower priority, but for which users expect low latency.

9 Capacity scheduler
[Figure: timeline from "Now" to the production work deadline, showing where best effort, emergency and production jobs are placed.]
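As a rough illustration of the capacity idea, the sketch below gives each class of job a queue with a guaranteed fraction of the cluster and always serves the queue that is furthest below its guarantee. The `Queue` class, the capacity fractions and the slot model are assumptions made for this example; this is not YARN's Capacity scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    capacity: float                 # guaranteed fraction of the cluster (assumed value)
    used: int = 0                   # slots currently held by this queue
    pending: list = field(default_factory=list)  # jobs waiting for a slot


def pick_queue(queues, total_slots):
    """Return the queue with pending work that is furthest below its guarantee."""
    candidates = [q for q in queues if q.pending]
    if not candidates:
        return None
    return min(candidates, key=lambda q: q.used / (q.capacity * total_slots))


def schedule(queues, total_slots):
    """Hand out free slots one at a time and return the (queue, job) placements."""
    placements = []
    free = total_slots - sum(q.used for q in queues)
    while free > 0:
        q = pick_queue(queues, total_slots)
        if q is None:
            break  # nothing left to run
        job = q.pending.pop(0)
        q.used += 1
        free -= 1
        placements.append((q.name, job))
    return placements
```

In this sketch a queue may exceed its guarantee when the other queues have nothing pending, so idle capacity is not wasted.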

10 Fair scheduler
[Figure: timeline from "Now" to the production work deadline, showing where best effort, emergency and production jobs are placed.]
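Fair sharing is commonly explained through max-min fairness: capacity is split evenly, users that need less than their share keep only what they need, and the remainder is split again among the others. The function below is a minimal sketch of that general idea under assumed demands; it is not the YARN Fair scheduler implementation.

```python
def max_min_fair_share(demands, capacity):
    """Split `capacity` among users by max-min fairness.

    demands: dict mapping user -> requested amount (hypothetical numbers).
    Returns a dict mapping user -> allocated amount.
    """
    allocation = {user: 0.0 for user in demands}
    remaining = dict(demands)       # users whose demand is not yet settled
    left = float(capacity)
    while remaining and left > 1e-9:
        share = left / len(remaining)
        satisfied = {u: d for u, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone wants more than an equal share: split what is left evenly.
            for u in remaining:
                allocation[u] += share
            break
        # Users with small demands get exactly what they asked for.
        for u, d in satisfied.items():
            allocation[u] += d
            left -= d
            del remaining[u]
    return allocation
```

With this policy a small emergency or production job is served in full, while a large best effort job absorbs whatever is left.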

11 Reservation-based scheduler
[Figure: timeline from "Now" to the production work deadline, showing where best effort, emergency and production jobs are placed.]
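A reservation-based scheduler can exploit the fact that production jobs have a known running time and a deadline: it books capacity for them in a plan and leaves the rest of the timeline to other jobs. The sketch below assumes discrete time steps, a single pool of slots, and a "latest feasible start" placement rule; all three are simplifications chosen for this example rather than details taken from the slides.

```python
def reserve(plan, capacity, slots, duration, deadline, now=0):
    """Try to book `slots` for `duration` consecutive steps that end by `deadline`.

    plan: list where plan[t] is the number of slots already reserved at step t
          (assumed to cover at least `deadline` steps).
    Returns the chosen start step, or None if the reservation is rejected.
    """
    # Try the latest possible start first so earlier steps stay free
    # for emergency and best effort work.
    for start in range(deadline - duration, now - 1, -1):
        window = range(start, start + duration)
        if all(plan[t] + slots <= capacity for t in window):
            for t in window:
                plan[t] += slots
            return start
    return None
```

A rejected reservation tells the user up front that the deadline cannot be met, instead of letting the job miss it silently.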

12 Scheduler architectures
Omega: flexible, scalable schedulers for large compute clusters, Malte Schwarzkopf et al., EuroSys '13

13 The monolithic scheduler
YARN: Apache Hadoop YARN: Yet Another Resource Negotiator, V. K. Vavilapalli et al., SoCC '13.
Borg: Large-scale cluster management at Google with Borg, A. Verma et al., EuroSys '15.

14 Architecture 1/3

15 Architecture 2/3
[Diagram: a single Resource Manager.]

16 Architecture 3/3
[Diagram: ZooKeeper, one master Resource Manager and standby Resource Managers.]

17 Pros and cons
Pros: Detailed knowledge of the cluster state, which allows optimal use of the cluster resources. Easy to implement new scheduling policies.
Cons: Bottleneck. The failure of the master scheduler has a big impact on cluster usage.

18 Two-level scheduler
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, B. Hindman et al., NSDI '11

19 Architecture 1/2

20 Architecture 2/2
[Diagram: MapReduce, Spark and Flink schedulers, each working with partial state, on top of the Mesos Master.]

21 Pros and cons
Pros: Scales out by adding schedulers. Concurrent scheduling of tasks.
Cons: Suboptimal use of the cluster, especially when there are long-running tasks.
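The suboptimal use of the cluster follows from each framework scheduler seeing only the resources it is offered. The sketch below imitates a resource-offer round in the spirit of Mesos; the `Framework` class, the `offer_round` function and the CPU-only resource model are hypothetical names and simplifications for this example, not the Mesos API.

```python
class Framework:
    """A framework scheduler that only sees the offers made to it."""

    def __init__(self, name, task_cpus, tasks_wanted):
        self.name = name
        self.task_cpus = task_cpus        # CPUs needed by one task
        self.tasks_wanted = tasks_wanted  # tasks still waiting to be launched

    def on_offer(self, node, free_cpus):
        """Return how many CPUs to take from this offer (0 declines it)."""
        if self.tasks_wanted == 0 or free_cpus < self.task_cpus:
            return 0
        self.tasks_wanted -= 1
        return self.task_cpus


def offer_round(free, frameworks):
    """Master side: offer each node's free CPUs to every framework in turn."""
    launched = []
    for node in sorted(free):
        for fw in frameworks:
            taken = fw.on_offer(node, free[node])
            if taken:
                free[node] -= taken
                launched.append((fw.name, node, taken))
    return launched
```

If a framework with small tasks is offered a node first, it can fragment that node so that a framework needing the whole node is never able to start, even though the cluster as a whole had room for it.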

22 Shared-state scheduler
Omega: flexible, scalable schedulers for large compute clusters, M. Schwarzkopf et al., EuroSys '13

23 Architecture 1/2

24 Architecture 2/2
[Diagram: MapReduce, Spark and Flink schedulers, each working on the global state held by the State Manager.]

25 Architecture 2/2
[Diagram: same as the previous slide, with a conflict between schedulers on the global state.]
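One common way to handle such conflicts is optimistic concurrency: every scheduler plans against its own snapshot of the global state, and the state manager rejects a commit if the targeted node changed in the meantime. The sketch below uses a per-node version counter; the `StateManager` class and its methods are assumptions for this example and are coarser than the conflict detection described in the Omega paper.

```python
class StateManager:
    """Holds the global cluster state shared by all schedulers."""

    def __init__(self, free):
        self.free = dict(free)                   # node -> free CPUs
        self.version = {node: 0 for node in free}  # node -> change counter

    def snapshot(self):
        """Give a scheduler its own copy of the state to plan against."""
        return dict(self.free), dict(self.version)

    def commit(self, node, cpus, seen_version):
        """Apply a claim only if the node is unchanged since the snapshot."""
        if self.version[node] != seen_version or self.free[node] < cpus:
            return False  # conflict: the scheduler must re-read and retry
        self.free[node] -= cpus
        self.version[node] += 1
        return True
```

A scheduler whose commit fails takes a fresh snapshot and plans again, so no lock is held while scheduling decisions are being computed.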

26 Pros and cons
Pros: Scalable. Good use of the cluster resources.
Cons: Unpredictable interactions between the different schedulers' policies.

27 Summary
Two-level and shared-state schedulers scale better. Shared-state schedulers use the cluster resources more efficiently than two-level schedulers. Monolithic schedulers are a potential bottleneck. In practice, the monolithic bottleneck is rarely, if ever, reached. And, as the monolithic scheduler is easier to implement and allows more advanced scheduling policies, it is the scheduler architecture used in Hadoop and by Google.

28 Hadoop
Multiple processing frameworks: batch, interactive, streaming
Hadoop 2.x stack, from top to bottom:
MapReduce (data processing) | Others (Spark, MPI, Giraph, etc.)
YARN (resource management, job scheduler)
HDFS (distributed storage)

29 Making YARN more scalable
HOPS YARN: a one-and-a-half-level scheduler

30 Hadoop YARN HA implementation
[Diagram: ZooKeeper, one master Resource Manager and two standby Resource Managers.]

31 HOPS YARN HA implementation
[Diagram: NDB, one master Resource Manager and standby Resource Managers.]

32 MySQL Cluster (NDB)
Shared-nothing DB
SQL API, NDB API
Distributed, in-memory
2-phase commit
Replicate the DB, not the log!
Real-time
Low TransactionInactive timeouts
Commodity hardware
Scales out
Millions of transactions/sec
TB-sized datasets (48 nodes)
Split-brain solved with the arbitrator pattern
SQL and native blocking/non-blocking APIs
30+ million update transactions/second on a 30-node cluster

33 Standby is boring
[Diagram: one master Resource Manager and two standby Resource Managers.]

34 Difficulties 1/2

35 Difficulties 2/2
Pulling from the database whenever the state is needed is inefficient. Having an independent thread that regularly pulls from the database is difficult to tune and causes locking problems.

36 Solution
Luckily, NDB has an event API.
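The benefit of an event API over polling can be sketched with a plain publish/subscribe pattern: the store pushes every change to its subscribers, so a standby never has to query for the current state. The `StateStore` and `StandbyResourceManager` classes below are in-process stand-ins invented for this example; they do not use or imitate the real NDB event API.

```python
class StateStore:
    """Stand-in for the database: pushes every write to its subscribers."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def subscribe(self, callback):
        """Register a callback invoked as callback(key, value) on each write."""
        self.subscribers.append(callback)

    def write(self, key, value):
        self.rows[key] = value
        for callback in self.subscribers:
            callback(key, value)


class StandbyResourceManager:
    """Keeps a local copy of the state up to date from pushed events."""

    def __init__(self, store):
        self.state = {}
        store.subscribe(self.on_change)

    def on_change(self, key, value):
        # No polling thread and no read locks: changes arrive as they happen.
        self.state[key] = value
```

With this pattern there is no polling interval to tune, and the standby's copy is only touched when something actually changed.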

37 With streaming
[Diagram: one master Resource Manager and two standby Resource Managers.]

38 Conclusion
There exist three architectures for large cluster resource scheduling: monolithic, two-level and shared state. Each of these architectures has pros and cons. The monolithic architecture is the one presently used because it is easier to use and develop. At KTH and SICS we are exploring the possibilities for a new architecture that offers more scalability while keeping the advantages of the monolithic architecture.

39 Project proposal: quota-based scheduling
What we have seen so far allows cluster resources to be used optimally. Scheduling policies implemented in YARN, such as the Capacity scheduler or the Fair scheduler, allow the scheduler to give some users priority over others. But none of this makes it possible to track, manage and limit a user's consumption of resources over time. Google Borg provides this feature; we need to add it to HOPS YARN.
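To make the idea concrete, a quota can be modelled as a budget of slot-hours per user per accounting period, charged whenever a job is admitted. The `QuotaLedger` class, the slot-hour unit and the periodic reset are assumptions made for this sketch; they describe neither Borg's quota system nor an existing HOPS YARN feature.

```python
class QuotaLedger:
    """Tracks each user's resource consumption over an accounting period."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)              # user -> allowed slot-hours per period
        self.used = {user: 0.0 for user in quotas}

    def can_run(self, user, slots, hours):
        """True if the job fits in what is left of the user's quota."""
        return self.used[user] + slots * hours <= self.quotas[user]

    def charge(self, user, slots, hours):
        """Admit the job and record its cost, or reject it if over quota."""
        if not self.can_run(user, slots, hours):
            return False
        self.used[user] += slots * hours
        return True

    def reset_period(self):
        """Start a new accounting period with all consumption cleared."""
        for user in self.used:
            self.used[user] = 0.0
```

A scheduler could consult such a ledger before applying its usual capacity or fairness policy, so that quotas limit consumption over time while the policy still decides placement.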

40 References
Reservation-based Scheduling: If You're Late Don't Blame Us!, C. Curino et al., Microsoft tech report.
Omega: flexible, scalable schedulers for large compute clusters, Malte Schwarzkopf et al., EuroSys '13.
Apache Hadoop YARN: Yet Another Resource Negotiator, V. K. Vavilapalli et al., SoCC '13.
Large-scale cluster management at Google with Borg, A. Verma et al., EuroSys '15.
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, B. Hindman et al., NSDI '11.
