ROUTING ALGORITHM BASED COST MINIMIZATION FOR BIG DATA PROCESSING


D. Vinotha, PG Scholar, Department of CSE, RVS Technical
Dr. Y. Baby Kalpana, Head of the Department, Department of CSE, RVS Technical Campus

ABSTRACT: The information explosion is the rapid increase in the amount of published information or data and the effects of this abundance. As the amount of available data grows, the problem of managing the information becomes more difficult, which can lead to information overload. Therefore, it is imperative to study the cost minimization problem for big data processing in geo-distributed data centres. Big data processing is currently not cost-efficient because of the following weaknesses. First, data locality may result in a waste of resources. Second, network links vary in transmission rate and cost according to their individual features, such as the distance and the physical optical fibre facilities between data centers. To overcome these weaknesses, the cost minimization problem for big data processing is studied via joint optimization of task assignment, data placement, and routing in geo-distributed data centers. Finally, a comparison is made and the changes and improvements are studied.

Keywords: Big data, data centre resizing, routing algorithm, data centres, Markov chain process

1. INTRODUCTION
The explosive growth of demands on big data processing imposes a heavy burden on computation, storage, and communication in data centers, which hence incurs considerable operational expenditure for data center providers. Therefore, cost minimization has become an emergent issue for the upcoming big data era. Unlike conventional cloud services, one of the main features of big data services is the tight coupling between data and computation, as computation tasks can be conducted only when the corresponding data is available. As a result, three factors, i.e., task assignment, data placement and data movement, deeply influence the operational expenditure of data centers.
In this paper, we are motivated to study the cost minimization problem via a joint optimization of these three factors for big data services in geo-distributed data centers. Big data analysis has shown its great potential in unearthing valuable insights from data to improve decision making, minimize risk and develop new products and services. By 2015, 71% of worldwide data center hardware spending will come from big data processing, which will surpass $126.2 billion. The cost minimization problem has previously been studied via a joint optimization of the three factors (task assignment, data placement and data movement) that deeply influence the operational expenditure of geo-distributed data centers for big data services. To describe the task completion time with consideration of both data transmission and computation, a two-dimensional Markov chain has been proposed, from which the average task completion time is derived in closed form. Furthermore, the problem has been modelled as a mixed-integer nonlinear program (MINLP), and an efficient solution that linearizes it has been proposed. The high efficiency of this proposal is validated by extensive simulation-based studies [6].

2 RELATED WORKS

2.1 Multilevel Power Management
The coordination problem has been investigated, with two key contributions. First, a power management solution that coordinates different individual approaches has been proposed and validated.
Using simulations based on 180 server traces from nine different real-world enterprises, the authors demonstrate the correctness, stability, and efficiency advantages of the solution. Second, using the unified architecture as the base, a detailed quantitative sensitivity analysis is performed and conclusions are drawn about the impact of different architectures, implementations, workloads, and system design choices. The main advantage is this detailed sensitivity analysis, which evaluates several interesting variations in the architecture and implementation, and in the space of mechanisms and policies. Power delivery, electricity consumption, and heat management are becoming key challenges in data center environments. The drawback of earlier work is that each of these problems was addressed by an individual solution, with no coordination between the solutions [9].

2.2 Poisson Model
Predicting the next request of a user as she visits Web pages has gained importance as Web-based activity increases. There are a number of different approaches to prediction. This work concentrates on the discovery and modeling of the user's aggregate interest in a session. The approach relies on the premise that the visiting time of a page is an indicator of the user's interest in that page. Even the same person may have different desires at different times. The model has an advantage over previous proposals in terms of speed and memory usage. The experiments show that the model can be used on Web sites with different structures. To confirm this finding, the model is compared with two previously proposed recommendation models. Results show that this model improves the efficiency significantly. If the representation is not appropriate for the model, the prediction accuracy will decrease [2].

2.3 Geographical Load Balancing
This work explores whether geographical load balancing can encourage the use of green renewable energy and reduce the use of brown fossil fuel energy. It makes two contributions. First, it derives two distributed algorithms for achieving optimal geographical load balancing. Second, it shows that if electricity is dynamically priced in proportion to the instantaneous fraction of the total energy that is brown, then geographical load balancing provides significant reductions in brown energy use. Geographical load balancing provides a huge opportunity for environmental benefit as the penetration of green, renewable energy sources increases. Specifically, an enormous challenge facing the electric grid is that of incorporating intermittent, unpredictable renewable sources such as wind and solar. Geographical load balancing aims to reduce energy costs, but this can come at the expense of increased total energy usage. By routing to a data center farther from the request source to use cheaper energy, the data center may need to complete the job faster, and so use more service capacity, and thus more energy, than if the request were served closer to the source [6].
2.4 Cost minimization
Data centre resizing (DCR) has been proposed to reduce the computation cost by adjusting the number of activated servers via task placement. To describe the rate-constrained computation and transmission in big data processing, a two-dimensional Markov chain has been proposed, from which the expected task completion time is derived in closed form. To deal with the high computational complexity of solving the MINLP, it is linearized into a mixed-integer linear programming (MILP) problem, which can be solved using a commercial solver. DCR and task placement are usually jointly considered to match the computing requirement [5]. Table 1 lists the various references.

3 SYSTEM MODEL
Based on the study of data placement, task assignment, data center resizing and routing, the overall operational cost in large-scale geo-distributed data centers for big data applications is to be minimized. The data processing process is first characterized using a two-dimensional Markov chain and the expected completion time is derived in closed form, based on which the joint optimization is formulated as an MINLP problem. To tackle the high computational complexity of solving the MINLP, it is linearized into an MILP problem. Through extensive experiments, the joint-optimization solution shows a substantial advantage over the approach of two-step separate optimization. The K shortest path algorithm is used to find the minimum-cost paths for routing.

3.1 Big data and Data Flow
Collecting the dataset for big data is the first task. The whole system can be modelled as a directed graph G = (N, E). Nodes receive data flows from source nodes and forward them according to the routing strategy. The weight w(u, v) of each link, representing the corresponding communication cost, takes one of two values: C_R, the inter-data-centre traffic cost, or C_L, the local transmission cost, with C_R > C_L.
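As a rough illustration of the graph model in Section 3.1, the following Python sketch builds a weighted directed graph in which every link costs either C_R or C_L. The numeric values, the node names, the `data_centre_of` mapping and the rule that a link is local exactly when both endpoints sit in the same data centre are assumptions made for this example; the text above only fixes C_R > C_L.

```python
# Hypothetical cost values; the text only requires C_R > C_L.
C_R = 5.0  # assumed inter-data-centre traffic cost
C_L = 1.0  # assumed local transmission cost


def link_weight(u, v, data_centre_of):
    """Return C_L for a link inside one data centre, C_R otherwise (assumed rule)."""
    return C_L if data_centre_of[u] == data_centre_of[v] else C_R


def build_graph(links, data_centre_of):
    """Build an adjacency list {node: [(neighbour, weight), ...]} for G = (N, E)."""
    graph = {node: [] for node in data_centre_of}
    for u, v in links:
        graph[u].append((v, link_weight(u, v, data_centre_of)))
    return graph


# Illustrative topology: two servers in dc1, one server in dc2.
data_centre_of = {"s1": "dc1", "s2": "dc1", "s3": "dc2"}
links = [("s1", "s2"), ("s2", "s3")]
graph = build_graph(links, data_centre_of)
```

Keeping the weight rule in a separate function means a different cost model (for example, distance-dependent costs) could be swapped in without touching the graph construction.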
3.2 Data placement
We define a binary variable $y_{jk}$ to denote whether chunk k is placed on server j: $y_{jk} = 1$ if chunk k is placed on server j, and $y_{jk} = 0$ otherwise. In the distributed file system, we maintain P copies of each chunk k in K, which leads to the constraint $\sum_{j \in J} y_{jk} = P$ for all k in K. Furthermore, the data stored on each server j in J cannot exceed its storage capacity. The data placement and task assignment are transparent to the data users with guaranteed QoS. Each data chunk k on server j has a processing rate and a loading rate. The processing procedure can then be described by a two-dimensional Markov chain process. The constraints that follow from the QoS requirement are numbered (5) to (7).

3.3 Routing of distributed data centers and Cost minimization
This section considers the cost minimization problem for big data processing via joint optimization of task assignment, data placement, and routing in geo-distributed data centers. Specifically, the following issues are considered in the joint optimization. Servers are equipped with limited storage and computation resources. Each data chunk has a storage requirement and will be required by big data tasks.

K Shortest Path Routing Algorithm
The K shortest path routing algorithm is an extension of the shortest path routing algorithm in a given network. It is sometimes crucial to have more than one path between two nodes in a given network. When there are additional constraints, paths other than the shortest path can be computed. To find the shortest path, one can use a shortest path algorithm such as Dijkstra's algorithm or the Bellman-Ford algorithm and extend it to find more than one path. The K shortest path routing algorithm is a generalization of the shortest path problem. The algorithm finds not only the shortest path but also K - 1 other paths in order of non-decreasing cost, where K is the number of shortest paths to find.
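The generalization described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it extends Dijkstra's algorithm by letting every node be expanded up to K times, so the returned paths may revisit nodes. The adjacency-list format, the function name `k_shortest_paths` and the example graph are hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict
from itertools import count


def k_shortest_paths(graph, source, target, k):
    """Return up to k lowest-cost source-to-target paths as (cost, path) pairs.

    graph maps each node to a list of (neighbour, weight) pairs with
    non-negative weights. Paths are allowed to revisit nodes.
    """
    found = []
    pops = defaultdict(int)   # number of times each node has been expanded
    tie = count()             # tie-breaker so the heap never compares paths
    heap = [(0, next(tie), [source])]
    while heap and pops[target] < k:
        cost, _, path = heapq.heappop(heap)   # cheapest partial path so far
        node = path[-1]
        pops[node] += 1
        if node == target:
            found.append((cost, path))
        if pops[node] <= k:
            for neighbour, weight in graph.get(node, []):
                heapq.heappush(
                    heap, (cost + weight, next(tie), path + [neighbour])
                )
    return found


# Illustrative graph; node names and weights are made up.
example = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 1), ("t", 5)],
    "b": [("t", 1)],
    "t": [],
}
paths = k_shortest_paths(example, "s", "t", 2)
# paths == [(3, ['s', 'a', 'b', 't']), (5, ['s', 'b', 't'])]
```

A binary heap keeps extraction of the cheapest partial path at logarithmic cost, and capping the number of expansions per node at K bounds the amount of work while still yielding the K cheapest paths to the target.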
The problem can be restricted to the K shortest paths without loops (loopless K shortest paths) or allowed to contain loops [4].

3.4 Task assignment
When a task is distributed to a server where its requested data chunk does not reside, it needs to wait for the data chunk to be transferred. Each task should be responded to within time D. Moreover, in practical data center management, many task prediction mechanisms based on historical statistics have been developed and applied to decision making in data centers. To keep the data center settings up to date, data center operators may make adjustments according to the task prediction, period by period. To deal with the high computational complexity of solving the MINLP, it is linearized into a mixed-integer linear programming (MILP) problem, which can be solved using a commercial solver. Extensive numerical studies show the high efficiency of the proposed joint-optimization based algorithm. The flow of work is illustrated in Fig. 1.1. During the file transfer, files of size > 10 MB are transferred to their destination. If the cost of sending a file (S>D) exceeds the server cost, cost minimization is carried out, where D is the number of copies.

Algorithm
Dijkstra's algorithm can be generalized to find the K shortest paths.

    P = empty
    count_u = 0, for all u in V
    insert path P_s = {s} into B with cost 0
    while B is not empty and count_t < K:
        let P_u be the shortest cost path in B with cost C
        B = B - {P_u}, count_u = count_u + 1
        if u = t then P = P U {P_u}
        if count_u <= K then
            for each vertex v adjacent to u:
                let P_v be a new path with cost C + w(u, v) formed by concatenating edge (u, v) to path P_u
                insert P_v into B

The costs for the number of servers, communication and operation are then determined.

4 JOINT OPTIMIZATION
Joint optimization is carried out by linearizing the constraints that contain a product of two variables. A new variable is defined as in (9), which can be equivalently replaced by the linear constraints (10) and (11). The constraints can then be written in linear form as (12) and (13). In a similar way, a second new variable is defined, which can be linearized by (14).

5 PERFORMANCE MEASURE
The performance of the routing algorithm (K-map) is analyzed and compared with a separate optimization scheme (Joint), in which the minimum number of servers to be activated is found and the traffic routing scheme is described using the network flow model. The result graphs cover the non-joint, joint and genetic-algorithm cases. In the graphs, the values of Joint and K-map are compared. The values obtained with K-map are higher than those obtained with the joint linear method. Based on this individual

Figure: (a) server cost, (b) communication cost, (c) operation cost; axes: number of servers, number of replicas; schemes compared: Joint and K-map.

6 CONCLUSION
The data placement, task assignment, data center resizing and routing have been studied in order to minimize the overall operational cost in large-scale geo-distributed data centers for big data applications. The data processing process is first characterized using a two-dimensional Markov chain and the expected completion time is derived in closed form, based on which the joint optimization is formulated as an MINLP problem. To tackle the high computational complexity of solving the MINLP, it is linearized into an MILP problem. Through extensive experiments, the joint-optimization solution is shown to have a substantial advantage over the approach of two-step separate optimization. Through extensive
numerical studies, the proposed joint-optimization based algorithm is also shown to be highly efficient. This work can be enhanced by coupling a genetic algorithm with a grid search method to solve mixed-integer nonlinear programming problems.

REFERENCES
[1] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1.
[2] S. Gunduz and M. Ozsu, "A Poisson model for user accesses to web pages," in Computer and Information Sciences - ISCIS 2003, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2003, vol. 2869.
[3] H. Xu, C. Feng, and B. Li, "Temperature Aware Workload Management in Geo-distributed Datacenters," in Proceedings of the International Conference on Measurement and Modelling of Computer Systems (SIGMETRICS). ACM, 2013.
[4] Shortest path routing.
[5] L. Gu, D. Zeng, P. Li, and S. Guo, "Cost Minimization for Big Data Processing in Geo-Distributed Data Centers," IEEE TETC.
[6] Z. Liu, M. Lin, A. Wierman, S. H. Low, and L. L. Andrew, "Greening Geographical Load Balancing," in Proceedings of the International Conference on Measurement and Modelling of Computer Systems (SIGMETRICS). ACM, 2011.
[7] Z. Liu, Y. Chen, C. Bash, A. Wierman, D. Gmach, Z. Wang, M. Marwah, and C. Hyser, "Renewable and Cooling Aware Workload Management for Sustainable Data Centers," in Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). ACM, 2012.
[8] I. Marshall and C. Roadknight, ss, Computer Networks and ISDN Systems, vol. 30, no. 22-23.
[9] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu, "No Power Struggles: Coordinated Multilevel Power Management for the Data Center," in Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 2008.
[10] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, "XORing elephants: novel erasure codes for big data," in Proceedings of the 39th International Conference on Very Large Data Bases, ser. PVLDB '13. VLDB Endowment, 2013.
[11] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, and B. Maggs, "Cutting the Electric Bill for Internet-scale Systems," in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM). ACM, 2009.
[12] R. Urgaonkar, B. Urgaonkar, M. J. Neely, and A. Sivasubramaniam, "Optimal Power Cost Management Using Stored Energy in Data Centers," in Proceedings of the International Conference on Measurement and Modelling of Computer Systems (SIGMETRICS). ACM, 2011.