Cloud Federation to Elastically Increase MapReduce Processing Resources
A. Panarello, A. Celesti, M. Villari, M. Fazio and A. Puliafito
{apanarello, acelesti, mfazio, mvillari, apuliafito}@unime.it
DICIEAMA, University of Messina, Contrada di Dio, S. Agata, 98166 Messina, Italy
The second international FedICI'2014 workshop: Federative and Interoperable Cloud Infrastructures
Outline
- Cloud federation introduction
- How cloud federation can elastically increase providers' MapReduce resources
- Case study: a video transcoding service
- System prototype (Hadoop, CLEVER, Amazon S3)
- Main factors involved in job submission
- Conclusions and future work
Toward Cloud Federation
- Currently, only the major cloud providers (e.g., Amazon, Google, Rackspace) hold big datacenters, i.e., virtualization infrastructures
- Small cloud providers cannot directly compete with these market leaders; they have to buy services from these mega-providers
- The largest business is in the hands of the mega-providers
- Possible solution: Cloud Federation
Evolution of the Cloud Ecosystem
- From independent clouds to cloud federation
- Cloud federation: a mesh of cloud providers that are interconnected to provide a universal decentralized computing environment, where everything is driven by constraints and agreements in a ubiquitous, multi-provider infrastructure
- Different distributed services (e.g., IaaS, PaaS, SaaS)
- One of the main challenges: minimizing the barriers to delivering services among different administrative domains
Why Federate Cloud Providers?
Multiple reasons:
- Clouds can benefit from a market in which they can buy/sell resources
- A cloud has saturated its own resources and needs external assets
- A cloud needs particular types of services or resources that it does not hold
- A cloud wants to perform software consolidation in order to save energy costs
- A cloud wants to move part of its processing to other providers (e.g., for security, performance, or for the deployment of particular location-dependent services)
- And so on...
Motivation
- MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster
- The major MapReduce frameworks are not cloud-like:
  - They are often not resilient
  - They often do not scale up/down
  - They often require manual configuration
- Objectives:
  - Make a MapReduce framework cloud-like
  - Investigate the main concerns regarding job submission in a federated cloud environment
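The programming model above can be illustrated with a minimal, framework-free sketch: map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group. This is plain Python with hypothetical function names; real frameworks such as Hadoop distribute these same phases across a cluster:

```python
from collections import defaultdict

def map_phase(text):
    # map: emit one (word, 1) pair per word
    for word in text.split():
        yield word.lower(), 1

def shuffle(pairs):
    # shuffle: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: aggregate each group independently (here: sum the counts)
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase("to be or not to be")))
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In a distributed deployment, map and reduce tasks run in parallel on different nodes, which is why the model suits the federated scenario discussed next.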
MapReduce Distributed Processing in Cloud Federation: A Reference Scenario (1)
Actors:
- Multiple Cloud Providers (CPs), each one running a MapReduce system in its administrative domain
- A public Cloud Storage Provider (CSP), offering storage services and supporting multi-part data download
- Clients, each one submitting a parallel processing request (job) to a particular CP (i.e., the home CP)
The input data is stored in a CSP (e.g., Amazon S3, Dropbox, Drive) to minimize the transmission overhead between federated CPs
MapReduce Distributed Processing in Cloud Federation: A Reference Scenario (2)
- The client contacts the home CP that offers a particular parallel processing service and submits a job (where the input data is stored and how to process it)
- The home CP establishes a federation with other foreign CPs and sends them sub-job instructions
- Exploiting the multi-part download, each federated CP downloads chunks of data and processes them using its local MapReduce system
- Each federated CP uploads its output to the CSP and sends a notification to the home CP
- Finally, the client merges the processed chunks (if required) and reads the whole output
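The multi-part download step can be sketched as follows: the home CP splits the stored object into one contiguous byte range per federated CP, and each CP then fetches only its range (e.g., with an HTTP `Range: bytes=start-end` request). This is a minimal sketch; `chunk_ranges` is a hypothetical helper, not part of the actual prototype:

```python
def chunk_ranges(total_size, num_domains):
    """Split [0, total_size) into contiguous inclusive byte ranges,
    one per federated CP, as used in HTTP Range headers."""
    base, extra = divmod(total_size, num_domains)
    ranges, start = [], 0
    for i in range(num_domains):
        # spread any remainder over the first `extra` domains
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# a 100-byte object shared among 4 federated domains
print(chunk_ranges(100, 4))  # [(0, 24), (25, 49), (50, 74), (75, 99)]
```

Each tuple maps directly onto one `Range` header, so the four domains download disjoint parts of the same S3 object in parallel.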
A Video Transcoding Use Case
- A user would like to watch a movie, stored in a CSP, on his/her mobile phone
- Unfortunately, the movie is stored as an HD file and the user's device is not able to play it
- Thus, the client submits a video transcoding job to a particular home CP to reduce the resolution of the movie
- The job submission includes where the input movie is stored and how to process it
- The home CP establishes a federation with other foreign CPs, submitting a sub-job to each of them
- Each foreign CP downloads a chunk of the file, processes it, uploads it to the CSP, and sends a notification to the home CP
- Once the home CP has received all the notifications, it generates a SMIL file, i.e., an XML file that allows the video to be played without merging the chunks
- The home CP uploads the SMIL file to the CSP
- The client is able to play the movie
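The SMIL step can be sketched as below: a `<seq>` element plays the transcoded chunks back-to-back, so no server-side merging is required. This is a minimal illustration assuming the chunk URLs are already known; `build_smil` is a hypothetical helper, not the prototype's actual code:

```python
import xml.etree.ElementTree as ET

def build_smil(chunk_urls):
    """Build a minimal SMIL document whose <seq> element plays
    the transcoded chunks one after another, without merging them."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")  # children of <seq> play sequentially
    for url in chunk_urls:
        ET.SubElement(seq, "video", src=url)
    return ET.tostring(smil, encoding="unicode")

print(build_smil(["chunk0.mp4", "chunk1.mp4"]))
```

The client's player resolves each `src` against the CSP, so the chunks stream in order as if they were a single file.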
System Prototype (1)
System components:
- Hadoop as the MapReduce framework
- CLEVER as the middleware that makes Hadoop cloud-like, with federation capabilities in CPs
- Amazon S3 as the public CSP
Hadoop Master/Slave architecture:
- It consists of a single master JobTracker node and several slave TaskTracker nodes
- To speed up processing, it relies on a distributed file system (HDFS), whose Name Node and Data Nodes are typically deployed on the same nodes running the JobTracker and TaskTrackers, respectively
System Prototype (2)
CLEVER:
- The CLoud-Enabled Virtual EnviRonment (CLEVER) is a Message-Oriented Middleware for Cloud computing (MOM4C) that enables the arrangement of federated cloud systems
- A Cluster Manager (CM) acts as the interface with clients and manages several Host Managers (HMs)
- Inter-module communication by means of XMPP Multi-User Chat (MUC)
- Pluggable architecture: agents can be added to control third-party components (sensor networks, virtualization, parallel processing, storage, etc.)
System Prototype (3)
Advantages of integrating Hadoop in CLEVER:
- Typically, Hadoop uses the TCP/IP layer for communication, so firewalls can block inter-domain communication. Solution: by integrating Hadoop in CLEVER, communication can be sent on port 80 thanks to XMPP
- The system can automatically scale
- The two main software agents, Hadoop Master Node (HMN) and Hadoop Slave Node (HSN), run in the CM and HMs, respectively
- Two possible configurations: HMs with HSNs in physical hosts (PHs) or in VMs (more resilient)
Experiments (1)
Objective: understanding the main concerns regarding job submission in the federated cloud environment. The processing time of a Hadoop cluster is out of the scope of this paper (many works are available in the literature)
Testbed specification:
- 4 CLEVER/Hadoop administrative domains (i.e., A, B, C, and D) deployed on 4 servers
- CPU: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz, 3GB RAM, running Linux Ubuntu 12.04 x86_64 and VirtualBox
- Overall system deployed in 10 VMs (1 VM in domain A and 3 each in domains B, C, and D)
- Amazon S3 as the CSP
- Each experiment was repeated 50 times in order to compute mean values and confidence intervals
Experiments (2)
Timeline:
- T0: a client submits a video transcoding job to the home CP
- T1: the home CP that receives the request decides to establish a federation with the other CPs, retrieving domain information
- T2: the home CP performs a job assignment involving the whole federated environment. By means of the JobTracker it creates the video transcoding job and assigns the sub-jobs to the other federated domains
- T3: each involved federated CP downloads only particular video chunks from Amazon S3 using the multi-part download mechanism
- T4: each CP uploads the previously downloaded video chunks into the HDFS of its local domain for processing
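Steps T1-T2 amount to the home CP packaging, for each federated domain, which part of the input to fetch and how to transcode it. A minimal sketch of such a sub-job descriptor follows; the names and fields are hypothetical and do not reflect the actual CLEVER message format:

```python
from dataclasses import dataclass

@dataclass
class SubJob:
    """One sub-job sent by the home CP to a foreign CP."""
    domain: str            # federated administrative domain (e.g., "B")
    byte_range: tuple      # chunk of the S3 object to fetch via multi-part download
    target_resolution: str # transcoding parameter carried with the job

def assign_subjobs(domains, ranges, resolution):
    # pair each federated domain with one byte range of the input movie
    return [SubJob(d, r, resolution) for d, r in zip(domains, ranges)]

jobs = assign_subjobs(["B", "C", "D"],
                      [(0, 9), (10, 19), (20, 29)],
                      "640x360")
```

Each descriptor is self-contained, so a foreign CP can act on it (download, transcode, upload, notify) without further coordination with the home CP.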
Experiments (3)
The average time required to retrieve domain information (t1-t0) and to forward the request in parallel to the federated CPs (t2-t1) is roughly 5 seconds.
Experiments (3)
Distribution histogram of the mean times required to download 20MB, 10MB, and 7MB block sizes from Amazon S3 (t3-t2) in each CP, considering one administrative domain. Looking at the summary distribution histogram, it is evident that, thanks to federation, increasing the number of administrative domains reduces the download time, since smaller chunks can be used.
Experiments (4)
The average time to upload chunks into the HDFS of each domain (t4-t3) changes according to the number of active Data Nodes and the video file sizes. We can notice that increasing the number of Hadoop Data Nodes also increases the upload time. We can explain this trend by recalling that Hadoop has been configured with a replication factor equal to 2. In fact, with a single active Data Node the upload time is very low, because the system does not need to replicate the file. Due to Hadoop's data replication mechanism, increasing the number of Data Nodes leads to a linear increase of the upload time.
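This trend follows from a simple back-of-the-envelope model: the cluster-wide write traffic grows with the number of replicas actually stored, and HDFS cannot store more replicas than there are Data Nodes. The helper below is a sketch of that reasoning, not a measured model of the testbed:

```python
def hdfs_write_traffic(chunk_mb, datanodes, replication=2):
    """Approximate cluster-wide data written (MB) when uploading one chunk:
    one copy per replica, capped by the number of available Data Nodes."""
    return chunk_mb * min(replication, datanodes)

# a single Data Node cannot replicate: only one copy is written
print(hdfs_write_traffic(20, 1))  # 20
# with >= 2 Data Nodes and replication factor 2, traffic doubles
print(hdfs_write_traffic(20, 3))  # 40
```

This matches the observation above: the single-Data-Node case is fast because no replication occurs, while adding Data Nodes (up to the replication factor) multiplies the data that must be written per upload.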
Conclusion and Future Work
- The main result has been understanding how a MapReduce parallel processing system can be deployed in a federated cloud environment
- Experiments highlighted the overhead of the system in job submission
- In future work we plan to integrate resource provisioning policies to make the establishment of federation relationships between CPs more flexible
- For those interested in CLEVER, a guide on how to use the middleware and how to develop agents is available on the official website: http://clever.unime.it
Questions?