5 SCS Deployment Infrastructure in Use



Currently, the increasing adoption of cloud computing resources as the base for IT infrastructures enables users to build flexible, scalable, and low-cost production environments. For example, consuming CPU processing power and manipulating large data sets become easy tasks using computing and storage services, respectively. Applications like MapReduce fit these characteristics well, since they process large stored files as input data; moreover, the distributed architecture of MapReduce allows us to experiment with different deployment configurations. This chapter focuses on the use of the SCS Deployment Infrastructure with cloud infrastructures as the target environment. We deployed a SCS MapReduce application on the cloud, where the MasterNode and each WorkerNode were executed on virtual machine instances. Section 5.1 presents the architecture and main concepts of the MapReduce programming model and describes the architecture of a SCS component-based MapReduce application. Section 5.2 explains the steps necessary to deploy a SCS MapReduce application on cloud infrastructures. Finally, section 5.3 discusses some considerations that should be taken into account when an application is deployed.

5.1 MapReduce Application

MapReduce is a programming model for processing and generating large data sets in a distributed and parallel fashion [Dean04]. MapReduce hides the technical details of distributed processing, parallelization, fault tolerance, data distribution, and load balancing from its users, which makes it a popular library for programmers without skills in distributed and parallel programming. MapReduce defines two main functions, map and reduce, both written by the user. The map function takes a key/value pair as input and produces a set of intermediate key/value pairs. The reduce function receives a key and merges all intermediate values associated with that key, generating a new list of values. MapReduce implementations are designed to execute over a large cluster of physical machines and should support many terabytes of input data. The types of the two functions can be summarized as:

map :: (key1, value1) → list(key2, value2)
reduce :: (key2, list(value2)) → list(value3)
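To make the model concrete, the sketch below shows what WordCount's two user-written functions might look like in Lua, the language of our deployment scripts. This is an illustration of the programming model only, not the code of the SCS components described in the following paragraphs; in particular, the emit callback and the local driver are assumptions made for the example.

-- A sketch of WordCount's user-written functions (illustrative only;
-- the "emit" callback is an assumption, not part of the SCS framework).
local function map(key, value, emit)      -- key: document name, value: contents
  for word in string.gmatch(value, "%a+") do
    emit(word, 1)                         -- one intermediate pair per word
  end
end

local function reduce(key, values, emit)  -- key: a word, values: its counts
  local sum = 0
  for _, v in ipairs(values) do
    sum = sum + v
  end
  emit(key, sum)                          -- final pair: (word, total count)
end

-- Minimal local driver showing the data flow (no distribution):
local intermediate, result = {}, {}
map("doc1", "to be or not to be", function(k, v)
  intermediate[k] = intermediate[k] or {}
  table.insert(intermediate[k], v)
end)
for word, counts in pairs(intermediate) do
  reduce(word, counts, function(k, v) result[k] = v end)
end
-- result: { to = 2, be = 2, ["or"] = 1, ["not"] = 1 }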

Figure 5.1 shows an architectural overview of MapReduce. First, the MapReduce framework splits the input files into M pieces of 16 MB to 64 MB, and then starts many copies of the program on the cluster. One copy, called the Master, assigns map and reduce tasks to the other copies, called Workers; there are M map tasks and R reduce tasks. Each Worker executes its map tasks by reading the input split assigned to it and generating intermediate data, partitioned into R regions, on the local disk. Later, Workers execute the reduce tasks over the intermediate data, generating a set of output files.

Figure 5.1: MapReduce Architecture

This work uses a WordCount application, an implementation of the MapReduce architecture built with SCS components [Fonseca09]. This application counts the occurrences of each word in an input set. Figure 5.2 displays its architecture, composed of the Master, Worker, Scheduler, Reporter, and Channel components. The Master controls the whole application execution; it is connected to the Workers through a receptacle and implements the Master facet, which is used to submit jobs. The Workers are connected to a channel whose consumers are connected to the Master component, and they use this channel to notify the final status of a task. The Scheduler is responsible for selecting workers from a set of machines, using a round-robin approach. The Reporter component is used to register debug messages.
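As an illustration of this notification path, a Worker's status report through the channel might look like the sketch below; the event fields and the push operation are assumptions made for the example, since the actual interfaces are those defined in [Fonseca09].

-- Hypothetical sketch of a Worker reporting a finished task through the
-- event channel; the event fields and the push operation are assumptions.
local worker_id, task_id = "worker-2", 17   -- illustrative identifiers
local event = { worker = worker_id, task = task_id, status = "FINISHED" }
channel:push(event)  -- the Master's consumer receives and handles the event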

Figure 5.2: A SCS Component-based MapReduce Framework Architecture

5.2 Deploying a SCS MapReduce Application on the Cloud

Having described the WordCount MapReduce application, we now have all the elements needed to accomplish our main objective: to enable the deployment of distributed component-based applications on cloud infrastructures. Figure 5.3 displays a flow-diagram overview showing the components of the Deployment Infrastructure fulfilling the role of a Platform as a Service (PaaS). The cloud API allows us to consume Infrastructure as a Service (IaaS), and the MapReduce application is used to test our proposed architecture. The MapReduce application is represented by a Master node, N Worker nodes, and an NFS shared storage.

To deploy the MapReduce application we use the mapreduce-deployment script. There are some considerations to take into account before executing an updated version of this script: we need to check our architecture configuration, our policies, and the cloud computing environment. The original version of the mapreduce-deployment script, which supports a local network as its target environment, is displayed in appendix A.2.

Our first test scenario assumes the minimal configuration, as explained in subsection 4.3.2. We use all the policies studied in section 4.3: the cloud infrastructure policy displayed in 4.3, the platform policy specified in 4.4, the application policy in 4.5, and the target environment policy defined in 4.6. The application policy is needed to update the deployment-script parameter with the mapreduce_deployment.lua script.
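Under the first scenario, this update amounts to pointing the deployment_script field of the application policy at the MapReduce script, as in the sketch below (the path is the one used in Listing 5.3):

-- Sketch: updating the application policy's deployment-script parameter
-- (see Listing 5.3 for the complete policy table).
mapreduce_app.deployment_script =
    "/home/user_a/projects/scs-deploysystem/src/lua" ..
    "/scs/demos/deployer/mapreduce_deployment.lua"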

Figure 5.3: Deploying a SCS-based MapReduce Application on a Cloud Infrastructure

To start deploying the SCS MapReduce application, we assume that the Deployment Infrastructure is already running, including the CIS server. We use Amazon Web Services as the cloud infrastructure for our experiments, but it is necessary to update the security parameters located in the application policy. We use an updated version of the mapreduce-deployment script, which loads the policies and defines the Master and Worker nodes. Listing 5.1 shows a code segment that handles the options from the ec2_instances policy, i.e., getting or running virtual machine instances. Listing 5.2 displays the creation of the workers and their containers. The complete code of our first test is shown in appendix A.3.

Our second scenario assumes the Multi-Deployment architectural configuration explained in subsection 4.3.2: two or more deployment actors execute their own Deployment Infrastructure (PaaS) and Target Environment (IaaS), but share the same Cloud Infrastructure Service when requesting cloud resources. We use the same policies as in our first scenario, except for the application policy, because the current configuration is intended to be used by two deployment actors.

Listing 5.1: Getting or Running Virtual Machine Instances Metadata

...
-- Instantiating a cloud engine
cloud_engine = CloudEngine{}

-- One Master and two Workers
local num_instances = 3

if policy == "getting" then
  -- Getting VM instances
  vminstances = cloud_engine.
    instances[cloud_engine:get_cloud_deployment_model()].
    cloud_connection:get_instances(num_instances, num_instances, policies)
elseif policy == "running" then
  -- Starting VM instances
  vminstances = cloud_engine.
    instances[cloud_engine:get_cloud_deployment_model()].
    cloud_connection:run_instances(num_instances, num_instances, policies)
end
...

Both actors need to deploy their applications on the same cloud infrastructure. Listings 5.3 and 5.4 show the application policies for the WordCount application, deployed by user_a and user_b, respectively. To avoid security problems, each deployment actor needs to check and update the security parameters in the ec2_creds policy. The cloud API manages the connections with the cloud infrastructure; specifically, the EucaEngine allows the deployment actors to set different access and secret keys. Finally, both deployment actor A and deployment actor B are ready to execute the same mapreduce-deployment script as in the first scenario.
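The ec2_creds tables in Listings 5.3 and 5.4 are deliberately elided. As an illustration only, each actor's credentials could take a shape along the following lines; the field names are assumptions, not the actual policy schema:

-- Hypothetical shape of an ec2_creds table; the field names are assumptions.
ec2_creds = {
  access_key = "<ACCESS KEY OF THIS ACTOR>",  -- distinct for each actor
  secret_key = "<SECRET KEY OF THIS ACTOR>",  -- never shared between actors
}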

Listing 5.2: Creation of Containers and Workers

...
-- Workers
workers, containers = {}, {}
for i = 2, num_instances do
  containers[i] = plan:create_container("java")
  containers[i]:set_instance(exnodes[i])
  containers[i]:set_property("classpath",
    SCS_HOME .. "/libs/jacorb/:" .. SCS_HOME ..
    "/libs/:" .. SCS_HOME .. "/libs/luaj/")
  workers[i] = plan:create_component()
  workers[i]:set_id(workerId)
  workers[i]:set_container(containers[i])
  workers[i]:set_args({SCS_HOME ..
    "/scripts/execute/mapreduce.properties",
    exnodes[i]:get_host().ip})
  workers[i]:add_connection("Channel", channel, "EventChannel")
  workers[i]:add_connection("Reporter", reporter, "Reporter")
  master:add_connection("WorkerServant", workers[i], "WorkerServant")
end
...

Listing 5.3: Application Policy to Deploy WordCount by Deployment Actor A

-- MapReduce Application
mapreduce_app = {
  name = "WordCount", version = 1.0, author = user_data.user_a,
  deployment_script = "/home/user_a/projects/scs-deploysystem/src/lua" ..
    "/scs/demos/deployer/mapreduce_deployment.lua"
}

-- Deployment Actor A
deployment_actor = {
  -- User Data
  user_data = "user_a",

  -- AWS Credentials for User A
  ec2_creds = { ... },
}

-- Target Environments
target_environment = {
  policy = "cloud_infrastructure",
}

-- Applications to Deploy
applications_to_deploy = {
  -- Application 1
  application_1 = {
    application = mapreduce_app,
    deployment_actor = deployment_actor,
    target_environment = target_environment,
  },
}

Listing 5.4: Application Policy to Deploy WordCount by Deployment Actor B

-- MapReduce Application
mapreduce_app = {
  name = "WordCount", version = 1.0, author = user_data.user_b,
  deployment_script = "/home/user_b/projects/scs-deploysystem/src/lua" ..
    "/scs/demos/deployer/mapreduce_deployment.lua"
}

-- Deployment Actor B
deployment_actor = {
  -- User Data
  user_data = "user_b",

  -- AWS Credentials for User B
  ec2_creds = { ... },
}

-- Target Environments
target_environment = {
  policy = "cloud_infrastructure",
}

-- Applications to Deploy
applications_to_deploy = {
  -- Application 1
  application_1 = {
    application = mapreduce_app,
    deployment_actor = deployment_actor,
    target_environment = target_environment,
  },
}

5.3 Final Considerations

As a consequence of adopting cloud infrastructures as target environments, we obtain a set of metadata from the pool of virtual machine instances. Currently, however, we do not fully exploit this metadata. We could take advantage of it in several ways; for example, we could use this information to obtain customized target environments or to address non-functional application requirements (a sketch of this idea closes the chapter).

Our tests generate a considerable amount of log information, such as log files from the cloud infrastructure, the cloud API, the deployment infrastructure, and the applications. The log files of the cloud API usually contain much irrelevant data, especially while deployment actors are using the cloud resources. Thus, to keep debugging simple in advanced configurations of our architecture, support for additional monitoring tools is necessary.
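As an example of the first direction, the metadata held in vminstances (Listing 5.1) could drive a customized target environment. The following is a minimal sketch, assuming a hypothetical instance_type field in the metadata:

-- A minimal sketch of metadata-driven instance selection; the field name
-- "instance_type" is an assumption about the metadata schema.
local function select_instances(vminstances, wanted_type)
  local selected = {}
  for _, vm in ipairs(vminstances) do
    if vm.instance_type == wanted_type then
      selected[#selected + 1] = vm
    end
  end
  return selected
end

-- e.g., reserve larger instances for Worker nodes:
-- local worker_pool = select_instances(vminstances, "m1.large")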