A Programming Model for the Cloud Platform




International Journal of Advanced Science and Technology

Xiaodong Lu
School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
luxiaodongxht@qq.com

Abstract

Programming models for cloud computing have recently become a research focus. Cloud computing promises to provide on-demand and flexible IT services, which goes beyond traditional programming models and calls for new ones. Some progress has been made on cloud programming models for large-scale data processing, but little has been done on models with predictable performance. With its advantages of predictable performance, ease of programming, and deadlock avoidance, the BSP model has been widely applied in parallel databases, search engines, and scientific computing. This paper aims to adapt the BSP model to the cloud environment. The scheduling of computing tasks and the allocation of cloud resources are integrated into the BSP model, and a BSPCloud programming model with predictable performance is proposed.

Keywords: Programming Model; Cloud Computing; BSPCloud; Bulk Synchronous Parallel

1. Introduction

Cloud computing integrates vast computing and/or storage resources and provides services on demand via networks. Developers request resources on demand and pay for them by the hour, and they can increase or decrease resources according to their demand. Cloud computing makes application development and execution convenient; at the same time, it brings new challenges for cloud programming models. Research on cloud programming models has made some progress, for example Google's MapReduce [1] and Microsoft's Dryad [2]. However, some issues still need further study. First, current cloud programming models mainly focus on processing massive data; programming computation-intensive and I/O-intensive applications in the cloud is a new topic. Second, when programming a cloud application, it is very important for the programmer to be able to rely on a simple yet realistic cost model.
A study of cloud programming models with predictable performance is therefore of great significance. The Bulk Synchronous Parallel (BSP) model [3] was originally proposed by Valiant of Harvard University. Its initial aim was to bridge parallel computation software and architecture. The advantages of the BSP model lie mainly in three aspects: first, its performance can be predicted; second, no deadlock occurs during message passing; third, it is easy to program. Because of these advantages, improved models based on BSP have been used in many programming environments. To program heterogeneous environments, the authors of [4] propose a Heterogeneous Bulk Synchronous Parallel (HBSP) model. The authors of [5, 6] extend the BSP model with process migration to support programming in heterogeneous grid environments. The authors of [7] apply the BSP model to parallel databases, and the authors of [8] apply it to search engines. This paper proposes a new cloud programming model, BSPCloud, whose performance can be predicted. BSPCloud can be used not only for data-intensive applications but also for computation-intensive and I/O-intensive applications.

The BSP model allows communication between any pair of computing nodes, so it cannot exploit communication locality, i.e., the fact that a computing node need not communicate with all other nodes but mainly communicates with adjacent ones (e.g., nodes on the same server or in the same data center). BSPCloud uses a hierarchical communication mechanism, which keeps communication between adjacent nodes as far as possible. To make use of communication locality, this paper organizes computing nodes into a tree according to their communication ability, as discussed further in Section 2.1. The novelty of BSPCloud can be summarized as follows:
1) The performance of the BSPCloud programming model is predictable. The programmer can rely on a simple yet realistic cost model when designing a cloud application.
2) The BSPCloud model adapts to a broad variety of applications, for example data-intensive, computation-intensive, and I/O-intensive applications, and it can be extended.
3) BSPCloud uses a hierarchical communication mechanism, which exploits communication locality. This paper also proposes a virtual resource tree organized according to communication ability.

The rest of the paper is organized as follows. Section 2 describes the BSPCloud programming model. Section 3 presents the BSPCloud performance cost model. Section 4 describes related work. Section 5 concludes the paper.

2. BSPCloud Programming Model/Architecture

BSPCloud is a programming model for cloud computing whose goal is predictable performance: the programmer can rely on a simple yet realistic cost model when designing a cloud program. A schematic of the BSPCloud organization is shown in Figure 1.

[Figure 1. BSPCloud Programming Model (components shown: User Layer; BSPWare; ResourceManager; Monitor; TreeKeeper; BulkManager; ResourcePick; Bulks; Resource pool)]

BSPWare is the core of the system architecture and is responsible for scheduling computing tasks and allocating cloud resources.
Monitor monitors the resources of the entire cloud platform. TreeKeeper is mainly used to construct and maintain the virtual resource tree. ResourcePick is responsible for selecting, from the virtual resource tree, the resources that participate in the computation. BulkManager is the control center of the application program: it divides the application program into many bulks that run in parallel, and it is also responsible for fault tolerance.

2.1. Resource Organization Strategy

In a cloud computing environment, network bandwidth is relatively scarce. To make full use of network resources, BSPCloud uses a hierarchical mechanism. To motivate further discussion, consider the example cloud topology shown in Figure 2.

[Figure 2. Cloud Computing Topological Structure: a Cloud Control Center above Data Center 1 (Cluster 1 ... Cluster n) and Data Center 2 (Cluster 1 ... Cluster m); each cluster consists of multi-core nodes]

Figure 2 is an abstraction of a realistic cloud computing platform. It has two data centers (Data Center 1 and Data Center 2) located in different regions; Data Center 1 has n computing clusters, Data Center 2 has m computing clusters, and each cluster is composed of many multi-core nodes. Communication quality between nodes differs with their locality. For example, communication between two nodes in the same cluster of the same data center is faster than communication between nodes in different data centers. To make full use of network resources, this paper organizes computing nodes as a virtual resource tree, which is managed by TreeKeeper. Figure 3 shows dynamic changes of the virtual resource tree. Each circle in the figure represents a computing node, and each rectangle represents a control node, which manages computing nodes hierarchically. Busy nodes are marked black and idle nodes white; when all computing nodes under a control node are busy, the control node is marked black as well. When users submit an application to the cloud platform, ResourcePick selects computing resources for the application from the virtual resource tree.
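The black/white marking of the virtual resource tree can be illustrated with a short Python sketch. The class `TreeNode`, the function `pick_idle`, and its depth-first selection policy are illustrative assumptions, not BSPCloud's actual TreeKeeper or ResourcePick interfaces; the paper does not specify how ResourcePick chooses among idle nodes. A control node counts as busy (black) exactly when every computing node beneath it is busy, and the walk skips black subtrees, so the selected nodes fill one cluster before spilling into the next.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """A node of the virtual resource tree (hypothetical representation).

    Leaves are computing nodes; nodes with children are control nodes.
    """
    name: str
    children: List["TreeNode"] = field(default_factory=list)
    busy: bool = False  # only meaningful for computing (leaf) nodes

    def is_control(self) -> bool:
        return bool(self.children)

    def is_busy(self) -> bool:
        # A control node is marked black (busy) when all nodes below it are busy.
        if self.is_control():
            return all(child.is_busy() for child in self.children)
        return self.busy


def pick_idle(root: TreeNode, k: int) -> List[TreeNode]:
    """Hypothetical ResourcePick policy: take k idle computing nodes depth-first,
    skipping busy (black) subtrees. Returns [] and changes nothing if fewer
    than k idle nodes exist."""
    picked: List[TreeNode] = []

    def visit(node: TreeNode) -> None:
        if len(picked) == k or node.is_busy():
            return
        if node.is_control():
            for child in node.children:
                visit(child)
        else:
            picked.append(node)

    visit(root)
    if len(picked) < k:
        return []
    for node in picked:
        node.busy = True  # selected computing nodes become busy (black)
    return picked


def cluster(name: str, size: int) -> TreeNode:
    """Hypothetical helper: a cluster control node with `size` computing nodes."""
    return TreeNode(name, [TreeNode(f"{name}-node{i}") for i in range(size)])


# A small instance of the Figure 2 topology: two data centers, three clusters.
cloud = TreeNode("cloud", [
    TreeNode("dc1", [cluster("dc1-c1", 2), cluster("dc1-c2", 2)]),
    TreeNode("dc2", [cluster("dc2-c1", 2)]),
])

chosen = pick_idle(cloud, 3)
print([n.name for n in chosen])  # ['dc1-c1-node0', 'dc1-c1-node1', 'dc1-c2-node0']
print(cloud.children[0].children[0].is_busy())  # True: cluster dc1-c1 is now black
print(cloud.children[0].is_busy())              # False: dc1 still has an idle node
```

Because busy status is derived from the leaves rather than stored on control nodes, marking a computing node busy or idle automatically updates every control node above it, which mirrors the colouring rule described in the text.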

[Figure 3. Execution Overview (components shown: BSPWare; BulkManager; ResourcePick; data partition; Bulks; Computing Resources; Resource Pool; Barrier Synchronization)]

2.2. Computing Task Partitioning Model

In a cloud environment, the data are usually very large, and their unit is often the chunk (e.g., each chunk of Google's GFS is 64 MB). Assume that the size of the data to be processed is $N$ and that the task is divided into $n$ bulks $B = (B_1, B_2, \ldots, B_n)$. In the case where data can be partitioned arbitrarily, assume that the partition $X = (X_1, X_2, \ldots, X_n)$ balances the load, where $X_i$ is given by

$$X_i = \left( \alpha \frac{f_i}{\sum_{j=1}^{n} f_j} + \beta \frac{g_i}{\sum_{j=1}^{n} g_j} \right) N \qquad (1)$$

where $f_i$ is the frequency of $B_i$, $g_i$ is the network throughput rate of $B_i$, and $\alpha$ and $\beta$ are the scale parameters of the computing phase and the communicating phase. Since the $X_i$ sum to $(\alpha + \beta) N$, the partition covers exactly $N$ only when $\alpha + \beta = 1$.

Because the data unit in a cloud environment is often the chunk, this load balance is difficult to achieve. One option is an approximate method, but it has a defect. Consider a simple example of five bulks that process 448 MB of data with a chunk size of 64 MB, and assume that the partition (90, 85, 95, 90, 88) balances the load. With the approximate method, the first four bulks receive (64, 64, 64, 64), and the last bulk has to be assigned 192 MB of data.
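Equation (1) and the chunk problem in the example can be checked numerically. The Python sketch below is illustrative only. `balanced_partition` evaluates Equation (1). `approximate_partition` models the approximate method as rounding each of the first n-1 targets down to whole chunks and giving the remainder to the last bulk; this rule is our assumption, chosen because it reproduces (64, 64, 64, 64, 192), and the paper does not state it. `chunk_aligned_partition` is a greedy allocation we swap in to show one way of choosing whole-chunk sizes close to X: it assumes N is a whole number of chunks and measures closeness by the total absolute deviation, which is our reading of the partition model that follows. Because each |X_i - Y_i| is convex in the number of chunks given to bulk i, handing out chunks one at a time to the bulk whose deviation drops the most is optimal for that particular objective.

```python
from typing import List, Sequence


def balanced_partition(f: Sequence[float], g: Sequence[float], n_total: float,
                       alpha: float, beta: float) -> List[float]:
    """Equation (1): X_i = (alpha * f_i / sum(f) + beta * g_i / sum(g)) * N."""
    sf, sg = sum(f), sum(g)
    return [(alpha * fi / sf + beta * gi / sg) * n_total for fi, gi in zip(f, g)]


def approximate_partition(x: Sequence[float], chunk: int, n_total: int) -> List[int]:
    """Assumed approximate method: round the first n-1 targets down to whole
    chunks and give whatever data is left to the last bulk."""
    head = [int(xi // chunk) * chunk for xi in x[:-1]]
    return head + [n_total - sum(head)]


def chunk_aligned_partition(x: Sequence[float], chunk: int, n_total: int) -> List[int]:
    """Greedy whole-chunk allocation minimizing sum(|X_i - Y_i|) with sum(Y) = N."""
    if n_total % chunk:
        raise ValueError("this sketch assumes N is a whole number of chunks")
    y = [0] * len(x)
    for _ in range(n_total // chunk):
        # Change in |X_i - Y_i| if bulk i receives one more chunk.
        delta = [abs(xi - (yi + chunk)) - abs(xi - yi) for xi, yi in zip(x, y)]
        best = delta.index(min(delta))  # ties go to the lowest-numbered bulk
        y[best] += chunk
    return y


def total_deviation(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


X = [90, 85, 95, 90, 88]  # balanced partition (MB) from the example above
approx = approximate_partition(X, 64, 448)
greedy = chunk_aligned_partition(X, 64, 448)
print(approx, total_deviation(X, approx))  # [64, 64, 64, 64, 192] 208
print(greedy, total_deviation(X, greedy))  # [128, 64, 128, 64, 64] 142
```

With equal weights, Equation (1) simply splits N evenly; for instance, `balanced_partition([1, 1], [1, 1], 100, 0.5, 0.5)` returns `[50.0, 50.0]`.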

In BSPCloud, the following data partition model is used:

$$\text{minimize} \quad \sum_{i=1}^{n} \left| X_i - Y_i \right| \qquad (2)$$

$$\text{subject to} \quad \sum_{i=1}^{n} Y_i \le N \qquad (3)$$

$$Y_i \ge 0, \quad i = 1, \ldots, n \qquad (4)$$

The objective function (2) seeks a partition $Y = (Y_1, Y_2, \ldots, Y_n)$ as close as possible to the load-balanced partition $X$; constraint (3) guarantees that the partitions do not exceed the total amount of data, and constraints (4) guarantee that each partition is non-negative. To make task partitioning easy, a hierarchical task partitioning strategy is used. For example, a data center assigns computing tasks to its clusters according to the computing ability of each cluster, and when a cluster receives tasks, it immediately assigns them to its computing nodes.

2.3. Execution Overview

BulkManager automatically partitions the input data into several bulks, which can themselves be partitioned recursively. ResourcePick selects computing resources from the cloud platform, and BulkManager maps bulks to resources. Bulks compute in parallel and communicate through the hierarchical mechanism. Figure 3 shows the overall flow of BSPCloud; for simplicity, it shows only two levels of bulks. When an application program is submitted to the cloud platform, the following actions occur.
1) Partition. BulkManager partitions the data into first-level bulks $B_1, B_2, \ldots, B_n$, and each first-level control node then partitions its bulk into second-level bulks (e.g., $B_1$ is divided into $B_{11}, B_{12}, \ldots, B_{1m}$). The bulks of the last layer are leaf nodes (see Figure 2), and each computes in parallel.
2) Resource selection. ResourcePick selects resources from the cloud platform according to the application's demands. The selected resources are organized into resource bulks, whose number equals the number of data bulks.
3) Communication phase. BSPCloud keeps communication between adjacent nodes as far as possible: its communication model uses the hierarchical mechanism, and only bulks at the same level can communicate. This reduces communication overhead and improves scalability.
4) Fault tolerance policy. Since a BSPCloud execution is composed of a sequence of super-steps, BSPCloud sets a checkpoint after each super-step. When an error occurs, the program need not restart from the beginning; it only needs to resume from the last checkpoint.
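The checkpoint-and-resume policy of step 4 can be sketched in a few lines. This is a single-process Python illustration under stated assumptions: the whole job state is a list with one value per bulk, a super-step is a plain function that updates it, and failures are injected by super-step index. The names `run_supersteps`, `SuperstepFailure`, and `fail_at` are hypothetical and not part of BSPCloud. A snapshot is taken after each completed super-step (after its barrier), so a failure rolls back only the super-step in progress.

```python
import copy
from typing import Callable, List, Set, Tuple

State = List[int]  # hypothetical job state: one value per bulk


class SuperstepFailure(Exception):
    """Simulated transient failure of a super-step."""


def run_supersteps(state: State, steps: List[Callable[[State], None]],
                   fail_at: Set[int]) -> Tuple[State, List[int]]:
    """Run super-steps in order, checkpointing after each one.

    Each index in fail_at fails once; on failure, execution resumes from the
    last checkpoint instead of restarting the job from the beginning.
    """
    pending_failures = set(fail_at)
    checkpoint = (0, copy.deepcopy(state))  # (next super-step, saved state)
    executed: List[int] = []  # every super-step attempt, including re-runs
    i = 0
    while i < len(steps):
        executed.append(i)
        try:
            steps[i](state)  # local computation and communication
            if i in pending_failures:
                pending_failures.discard(i)
                raise SuperstepFailure(f"simulated failure in super-step {i}")
        except SuperstepFailure:
            # Discard partial work and roll back to the last checkpoint.
            i, state = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue
        # Barrier reached: the super-step is complete, so checkpoint it.
        i += 1
        checkpoint = (i, copy.deepcopy(state))
    return state, executed


def add_one(state: State) -> None:
    """Toy super-step: every bulk increments its value."""
    for k in range(len(state)):
        state[k] += 1


final, attempts = run_supersteps([0, 0], [add_one] * 3, fail_at={1})
print(final)     # [3, 3]: same result as a failure-free run
print(attempts)  # [0, 1, 1, 2]: only super-step 1 was re-executed
```

In a real deployment the snapshot would be written to stable storage rather than kept in memory, and retries would be bounded; the sketch keeps both in memory to show only the control flow.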

3. Cost Analysis

A BSPCloud model with $d$ levels is specified by the tuples $(b_1, g_1, l_1, s_1), (b_2, g_2, l_2, s_2), \ldots, (b_d, g_d, l_d, s_d)$, where $b_i$ is the number of bulks at the $i$-th level, $g_i$ is the network throughput rate of the $i$-th level, $l_i$ is the time required for barrier synchronization at the $i$-th level, and $s_i$ is the number of super-steps at the $i$-th level. For simplicity, BSPCloud assumes that every bulk has the same number of sub-bulks. The time cost is given by the following equations:

$$T(1) = W_1 + M_1 g_1 + l_1 s_1 \qquad (5)$$

$$T(d) = s_d \max_{1 \le i \le b_d} T_i(d-1) + M_d g_d + l_d s_d \quad (d > 1) \qquad (6)$$

$$W_d = \sum_{s=1}^{s_d} \max_{1 \le i \le b_d} \omega_i^{(s)} \qquad (7)$$

$$M_d = \sum_{s=1}^{s_d} \max_{1 \le i \le b_d} m_i^{(s)} \qquad (8)$$

where $\omega_i^{(s)}$ is the processing time of the $i$-th bulk in super-step $s$, $m_i^{(s)}$ is the network throughput of the $i$-th bulk in super-step $s$, and $T_i(d-1)$ is the cost of the level-$(d-1)$ computation inside the $i$-th bulk.

4. Related Work

Research on programming models for cloud computing has been active in recent years. Google's MapReduce [1] hides the details of parallelization, fault tolerance, and load balancing. Users specify the computation in terms of map and reduce functions, and the runtime system automatically parallelizes the computation across large-scale clusters of machines. Hadoop [9] is the open-source implementation of MapReduce. MapReduce's two-stage computation is rigid and allows only one input and one output. Microsoft's Dryad [2] is more flexible and allows an arbitrary number of inputs and outputs: a Dryad job is a directed acyclic graph in which each vertex is a program and edges represent data channels. Some programming models are specialized for massive data processing: Yahoo's Pig Latin [10] and HadoopDB [11] combine the high-level declarative style of SQL with the low-level procedural style of MapReduce. Google's Pregel [12] is a distributed programming model for graph processing based on the BSP model. Hama [13] is a distributed parallel computing framework based on Hadoop that processes graphs using the BSP model. The authors of [14] present a parallel programming model on the grid based on BSP.

5. Conclusions and Future Work

When programming a cloud application, it is very important for the programmer to be able to rely on a simple yet realistic cost model.
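As an illustration of such a cost model, Equations (5) to (8) of Section 3 can be evaluated bottom-up. In the Python sketch below, the function names and the input layout (`omega[s][i]` and `m[s][i]`, indexed by super-step and then by bulk) are our own assumptions, all numbers are made up for illustration, and Equation (6) is read as $s_d$ times the cost of the slowest sub-bulk. The sketch is not the paper's implementation, which the authors list as future work.

```python
from typing import Sequence


def sum_of_maxima(per_step: Sequence[Sequence[float]]) -> float:
    """Equations (7)/(8): sum over super-steps of the slowest bulk's value."""
    return sum(max(values) for values in per_step)


def cost_level1(omega: Sequence[Sequence[float]], m: Sequence[Sequence[float]],
                g: float, l: float) -> float:
    """Equation (5): T(1) = W_1 + M_1 * g_1 + l_1 * s_1.

    omega[s][i] and m[s][i] are the processing time and network throughput of
    bulk i in super-step s; the number of super-steps s_1 is len(omega).
    """
    s = len(omega)
    return sum_of_maxima(omega) + sum_of_maxima(m) * g + l * s


def cost_level(child_costs: Sequence[float], m: Sequence[Sequence[float]],
               g: float, l: float) -> float:
    """Equation (6): T(d) = s_d * max_i T_i(d-1) + M_d * g_d + l_d * s_d.

    child_costs[i] is T_i(d-1), the cost of the lower-level computation
    inside bulk i; the number of super-steps s_d is len(m).
    """
    s = len(m)
    return s * max(child_costs) + sum_of_maxima(m) * g + l * s


# Two level-1 sub-computations (2 super-steps x 2 bulks each), made-up numbers.
t1a = cost_level1([[2, 3], [4, 1]], [[1, 2], [0, 1]], g=0.5, l=1.0)
t1b = cost_level1([[1, 1], [2, 2]], [[0, 0], [2, 2]], g=0.5, l=1.0)
# One level-2 computation over those two bulks, with 3 super-steps.
t2 = cost_level([t1a, t1b], [[1, 1], [2, 0], [0, 0]], g=2.0, l=4.0)
print(t1a, t1b, t2)  # 10.5 6.0 49.5
```

Because every term is a sum of per-super-step maxima plus barrier costs, the total can be estimated before running the program once the per-bulk work and communication are known, which is the sense in which the model's performance is predictable.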
In this paper, a cloud programming model with predictable performance, called BSPCloud, is proposed. BSPCloud adapts the BSP model to the cloud environment, integrating the scheduling of computing tasks and the allocation of cloud resources into the BSP model. To exploit communication locality, BSPCloud communication uses a hierarchical mechanism, in which a computing node does not communicate with all other nodes but only with adjacent ones. In the future, we will implement the programming model and deploy it on a cloud computing platform.

Acknowledgements

This work is supported by the Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality (No.5500200).

References

[1] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters", Communications of the ACM, vol. 51, no. 1, (2008), pp. 107-113.
[2] M. Isard, M. Budiu and Y. Yu, "Dryad: distributed data-parallel programs from sequential building blocks", Operating Systems Review, vol. 41, no. 3, (2007), pp. 59-72.
[3] L. G. Valiant, "A bridging model for parallel computation", Communications of the ACM, vol. 33, no. 8, (1990), pp. 103-111.
[4] T. L. Williams and R. J. Parsons, "The Heterogeneous Bulk Synchronous Parallel model", in J. Rolim (ed.), Parallel and Distributed Processing, Proceedings, Berlin: Springer-Verlag, (2000).
[5] D. Righi, L. Pilla and A. Carissimi, "MigBSP: A Novel Migration Model for Bulk-Synchronous Parallel Processes Rescheduling", New York: IEEE, (2009).
[6] O. Bonorden, "Load balancing in the bulk-synchronous-parallel setting using process migrations", IEEE International Parallel and Distributed Processing Symposium (IEEE Cat No07TH8938), (2007), pp. 1-9.
[7] M. A. H. Hassan and M. Bamha, "Parallel processing of group-by join queries on shared nothing machines", Springer-Verlag New York Inc., (2008).
[8] V. G. Costa, A. Printista and M. Marin, "A parallel search engine with BSP", IEEE, (2005).
[9] Hadoop, http://hadoop.apache.org/.
[10] C. Olston, B. Reed and U. Srivastava, "Pig Latin: a not-so-foreign language for data processing", ACM, (2008).
[11] A. Abouzeid, K. Bajda-Pawlikowski and D. Abadi, "HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads", Proceedings of the VLDB Endowment, vol. 2, no. 1, (2009), pp. 922-933.
[12] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser and G. Czajkowski, "Pregel: A system for large-scale graph processing", Proceedings of the 2010 International Conference on Management of Data, (2010), pp. 135-146.
[13] S. Seo, E. J. Yoon, J. Kim, S. Jin, J.-S. Kim and S. Maeng, "HAMA: An efficient matrix computation with the MapReduce framework", 2nd IEEE International Conference on Cloud Computing Technology and Science, (2010), pp. 721-726.
[14] W. Tong, J. Ding and L. Cai, "A Parallel Programming Environment on Grid", Proc. of ICCS, LNCS, vol. 2658, no. 1, (2003), pp. 225-234.
