Moving Big Data to The Cloud
|
|
|
- Ralf Manning
- 10 years ago
- Views:
Transcription
1 Moving Big Data to The Cloud Linquan Zhang, Chuan Wu, Zongpeng Li, Chuanxiong Guo, Minghua Chen and Francis C.M. Lau The University of Hong Kong, Hong Kong University of Calgary, Canada Microsoft Research Asia, China The Chinese University of Hong Kong, Hong Kong Abstract Cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resource access in a utility-like fashion, especially for the processing of big data. An important open issue here is how to efficiently move the data, from different geographical locations over time, into a cloud for effective processing. The de facto approach of hard drive shipping is not flexible, nor secure. This work studies timely, cost-minimizing upload of massive, dynamically-generated, geodispersed data into the cloud, for processing using a MapReducelike framework. Targeting at a cloud encompassing disparate data centers, we model a cost-minimizing data migration problem, and propose two online algorithms, for optimizing at any given time the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there. The first is an online lazy migration (OLM) algorithm achieving a competitive ratio of as low as 2.55, under typical system settings. The second is a randomized fixed horizon control (RFHC) algorithm achieving a competitive ratio of 1+ 1 κ with l+1 λ a lookahead window of l, where κ and λ are system parameters of similar magnitude. I. INTRODUCTION The cloud computing paradigm enables rapid on-demand provisioning of server resources (CPU, storage, bandwidth) to users, with minimal management efforts. Recent cloud platforms, as exemplified by Amazon EC2 and S3, Microsoft Azure, Google App Engine, Rackspace, etc., organize a shared pool of servers from multiple data centers, and serve their users using virtualization technologies. The elastic and on-demand nature of resource provisioning makes a cloud platform attractive for the execution of various applications, especially computation-intensive ones [1]. More and more data-intensive (big data) applications, e.g., Facebook, Twitter, and big data analytics applications, such as the Human Genome Project [2], are relying on the clouds for processing and analyzing their petabyte-scale data sets, using a computing framework such as MapReduce and Hadoop [3]. An important issue however has largely been left out in this respect: How does one move the massive amounts of data into a cloud, in the very first place? The current practice is to copy the data into large hard drives for physical transportation to the data center [4], or even to move entire machines [5]. Such physical transportation incurs undesirable delay and possible service downtime, while outputs of the data analysis are often needed to be presented to users in the most timely fashion [5]. It is also less secure, given that the hard drives are prone The research was supported in part by Hong Kong RGC General Research Fund (717812, , , and ), a China 973 Program (2012CB315904), an Area of Excellence Grant (AoE/E-02/08), and two gift grants from Microsoft and Cisco. to infection of malicious programs and damages from road accidents. A safer and more flexible data migration strategy is in need, to minimize any potential service downtime. The challenge escalates when we consider that data are dynamically and continuously produced, from different geographical locations, e.g., astronomical data from disparate observatories [6], user data from different Facebook front-end servers. For dynamically-generated data, an efficient online algorithm is desired, for timely guiding the transfer of data into the cloud over time; for geo-dispersed data sets, we wish to select the best data center to aggregate all data onto (e.g., Amazon Elastic MapReduce launches all processing nodes in the same EC2 Availability Zone [7]), given that a MapReducelike framework is most efficient when data to be processed are all in one place, and not across data centers due to the enormous overhead of inter-data center data moving in the stage of shuffle and reduce [8]. As the first dedicated effort in the cloud computing literature, this work studies timely, cost-minimizing migration of massive amounts of dynamically-generated, geo-dispersed data into the cloud, for processing using a MapReduce-like framework. Targeting a typical cloud platform that encompasses disparate data centers of different resource charges, we carefully model the cost-minimizing data migration problem, and propose efficient online algorithms, which optimize the routes of data into the cloud and the choice of the data center for data aggregation and processing, at any give time. Our detailed contributions are as follows: We analyze the detailed cost composition and identify the performance bottleneck for moving data into the cloud, and formulate an offline optimal data migration problem. The optimization computes optimal data routing and aggregation strategies at any given time, and minimizes the overall system cost and data transfer delay, over a long run of the system. Two online algorithms are proposed to practically guide data migration over time: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm. Theoretical analyses show that the OLM algorithm achieves a worst-case competitive ratio of 2.55, without the need of any future information and regardless of the system scale, under the typical settings in real-world scenarios. The κ l+1 RFHC algorithm achieves a competitive ratio of 1+ 1 λ that approaches 1 as the lookahead window l grows. Here κ and λ are system dependent parameters of similar magnitude. We conduct extensive experiments to evaluate the performance of our online algorithms, using real-world meteorolog-
2 ical data generation traces. The online algorithms can achieve close-to-offline-optimum performance in most cases examined, revealing that the theoretical worst-case competitive ratios are pessimistic, and only correspond to rare scenarios in practice. The remainder of this paper is organized as follows. We discuss related work in Sec. II, and present the system model and the offline optimal data migration problem in Sec. III. We design two online algorithms and analyze their competitive ratios in Sec. IV. Evaluation results are presented in Sec. V. Sec. VI concludes the paper. II. RELATED WORK The recent years have witnessed significant interest in migrating different applications onto the cloud platform. Hajjat et al. [9] develop an optimization model for migrating enterprise IT applications onto a hybrid cloud. Wu et al. [10] advocate deploying social media applications into clouds, for leveraging the rich resources and pay-as-you-go pricing. These projects focus on workflow migration and application performance optimization, by carefully deciding the modules to be moved to the cloud and the data caching/replication strategies in the cloud. The very rudimentary question of how to move large volumes of application data into the cloud however is not explored. Few existing work discussed such transfer of large amounts of data to the cloud. Cho et al. [11] design Pandora, a costaware planning system for data transfer to the cloud provider, via both the Internet and courier services. Different from our study, they focus on static scenarios with a fixed amount of bulk data to transfer, rather than dynamically generated data; in addition, a single cloud site is considered, while our study considers multiple data centers. A number of online algorithms have been proposed to address different cloud computing and data center issues. For online algorithms without future information, Lin et al. [12] investigate energy-aware dynamic server provisioning, by proposing a Lazy Capacity Provisioning algorithm with a 3-competitive ratio. Assuming lookahead into the future, Lu and Chen [13] study the dynamic provisioning problem in data centers, design future-aware algorithms based on the classic ski-rental online algorithm. Lin et al. [14] investigate load balancing among geographically-distributed data centers with a receding horizon control (RHC) algorithm, and show that the competitive ratio can be reduced substantially by leveraging the predicted future information. A. System Model III. THE DATA MIGRATION PROBLEM Consider a cloud with K data centers distributed in a set of regions K (K = K ). A user (e.g., a global astronomical telescopes application) continuously produces large volumes of data at a set D of geographic locations (e.g., dispersed telescope sites). The user connects to the data centers from different data generation locations via VPNs, with G VPN gateways (G) at the user side and K VPN gateways each collocated with a data center (Fig. 1). A private network of Data Location 1 Data Location 2 Gateways at the user side GW1 GW2 GW3 GW4 Fig. 1. Gateways at the data center side GW2' GW1' DC 2 DC 1 GW4' Legend Intranet links in cloud Internet links Intranet links at the user side DC 4 GW3' An illustration of the cloud system. the user inter-connects all the data generation locations and the VPN gateways at the user side. Such a model reflects typical connection approaches between users and public clouds (e.g., AWS Direct Connect [15]), where dedicated, private network connections are established between a user s premise and the cloud, for enhanced security and reliability, and guaranteed inter-connection bandwidth. While intra-cloud links and links in the private network are usually over provisioned, the bandwidth U gi on a VPN link (g, i) from user side gateway g to data center i is limited, and constitutes the bottleneck in the system. B. Cost-minimizing Data Migration: Problem Formulation We consider a time-slotted system with slot length τ. F d (t) bytes of data are produced at location d in slot t. l dg is the latency between data location d D and gateway g G, p gi is the delay along VPN link (g, i), and η ik is the latency between data centers i and k. These delays are dictated by the respective geographic distances. A cloud user faces the problem of deciding (i) via which VPN connections to upload the data into the cloud, and (ii) to which data center should they be aggregated, for processing by a MapReduce-like framework, such that the monetary charges incurred, as well as the latency for the data to reach the aggregation point, are minimized. Decision variables. (1) Data routing variable x d,g,i,k (t) denotes the portion of data F d (t) produced at location d in t, to be uploaded through VPN connection (g, i) and then migrated to data center k for processing. x d,g,i,k (t) > 0 indicates that the data routing path d g i k is employed, and x d,g,i,k = 0 otherwise. Let x = (x d,g,i,k (t)) d,g,i,k, the set of feasible data routing variables are: { X = x(t) x d,g,i,k (t) = 1 and x d,g,i,k [0, 1], g G,i K,k K DC 3 } d D, g G, i K, k K. (1) Here g,i,k x d,g,i,k(t) = 1 ensures that all data produced from location d are uploaded into the cloud in t. (2) Binary variable y k (t) indicates whether data center k is target of data aggregation in time slot t (y k (t) = 1) or not (y k (t) = 0). At any given time, exactly one data center is chosen. Let y(t) = (y k (t)) k K, the set of possible data aggregation variables are: { Y = y(t) } y k (t) = 1 and y k (t) {0, 1}, k K. (2) k K
3 Costs. The costs incurred in time slot t include the following components. (1) The overall bandwidth cost for uploading data via the VPN connections, where d D,k K F d(t)x d,g,i,k (t) is the amount uploaded via VPN connection (g, i), and f gi is the unit charge for uploading one byte of data via (g, i): C BW ( x(t)) (f gi F d (t)x d,g,i,k (t)). (3) g G,i K d D,k K d D F d(ν)), (2) Storage and computation costs are important factors to consider in choosing the data aggregation point. In a largescale online application, processing and analyzing in t may involve data produced not only in t, but also from the past, in the form of raw data or intermediate processing results [16]. Without loss of generality, let the amount of current and history data to process in t be F(t) = t (α ν where d D F d(ν) is the total amount of data produced in time slot ν from different data generation locations, and weight α ν [0, 1] is smaller for older times ν and α t = 1 for the current time t. Assume all the other historical data, except those in F(t), are removed from the data centers where they were processed. Let Ψ k (F(t)) be a non-decreasing, convex cost function for storage and computation in data center k in t. The aggregate storage and computing cost in t is: C DC( y(t)) k K y k (t)ψ k (F(t)). (4) (3) The best data center for data aggregation could differ in t than in t 1, due to temporal and spatial variations in data generation. Historical data needed for processing together with d D F d(ν)), the new data in t, at the amount of t 1 (α ν should be moved from the former data center to the current. Let φ ik (z) be the non-decreasing, convex migration cost to move z bytes of data from data center i to date center k, satisfying triangle inequality: φ ik (z) + φ kj (z) φ ij (z). The migration cost between time slot t 1 and time slot t is: CMG( y(t), t y(t 1)) ([y i(t 1) y i(t)] + i K k K t 1 [y k (t) y k (t 1)] + φ ik ( α ν F d (ν))). Here [a b] + = max{a b, 0}. d D (4) We use a routing cost to model delays along the selected routing paths: C RT ( x(t)) Lx d,g,i,k (t)f d (t)(l dg + p gi + η ik ), (6) d,g,i,k where x d,g,i,k (t)f d (t)(l dg + p gi + η ik ) is the product of data volume and delay along the routing path d g i k. L is the routing cost weight converting x d,g,i,k (t)f d (t)(l dg + p gi + η ik ) into a monetary cost, reflecting how latency-sensitive the user is. In this work, L is a constant provided by the user a priori. The latency l dg + p gi + η ik is fixed in each time slot, but can change over time. In summary, the overall cost incurred in t in the system is: C( x(t), y(t)) =C BW ( x(t)) + C DC( y(t))+ C t MG( y(t), y(t 1)) + C RT ( x(t)). (5) (7) The offline optimization problem. The optimization problem of minimizing the overall cost of data upload and processing over a time interval [1, T ], can be formulated as: T minimize C( x(t), y(t)) (8) subject to: t = 1,..., T, t=1 (8a) x(t) X, (8b) d K,k K F d(t)x d,g,i,k (t)/τ U gi, i K, g G, (8c) x d,g,i,k (t) y k (t), d D, g G, i K, k K, (8d) y(t) Y, Constraint (8b) states that the total amount of data routed via (g, i) into the cloud in each time slot should not exceed the upload capacity of (g, i). (8c) ensures that a routing path d g i k is used (x d,g,i,k (t) > 0), only if data center k is the point of data aggregation in t (y k (t) = 1). IV. TWO ONLINE ALGORITHMS We next design two online algorithms for guiding data routing and aggregation over time. The first algorithm relies only on the current and historical information, and the second further exploits predicted information from the future. A. The OLM Algorithm The offline optimization problem in (8) can be divided into T one-shot optimization problems at each time t: minimize C( x(t), y(t)) subject to: (8a)(8b)(8c)(8d). (9) A naive online algorithm that solves (9) in each time slot can be far from optimal, migrating data back and forth prematurely. We design a more judicious online solution by exploring the inter-slot dependencies for data center selection. We divide the overall cost C( x(t), y(t)) into: (i) migration cost CMG t ( y(t), y(t 1)) defined in (5), related to decisions in t 1; and (ii) non-migration cost that relies only on current information at t: C MG( x(t), t y(t)) = C BW ( x(t))+c DC( y(t))+c RT ( x(t)). (10) We design a lazy migration algorithm that postpones data center switching indicated by the one-shot optimum, until the cumulative non-migration cost (in C MG t ( x(t), y(t))) significantly exceeds the potential data migration cost. As shown in Alg. 1, we solve the one-shot optimization in (9) at t = 1, and obtain the optimal data center indicted by y(1), the optimal routes x(1). Let ˆt be the time of the data center switch. In each following time slot t, we compute the overall non-migration cost in [ˆt, t 1], t 1 ν=ˆt Cν MG ( x(ν), y(ν)).the algorithm checks whether this cost is at least β 2 times the migration cost C ˆt MG ( y(ˆt), y(ˆt 1)). If so, it solves the one-shot optimization to derive x(t) and y(t) without considering the migration cost, i.e., by minimizing C MG t ( x(t), y(t)) subject to (8a) (8d) and an additional constraint, that the potential migration cost, CMG t ( y(t), y(t 1)), is no larger than β 1 times the nonmigration cost C MG t ( x(t), y(t)) at time t. If a change of migration data center is indicated ( y(t) y(t 1)), the
4 Algorithm 1 The Online Lazy Migration (OLM) Algorithm 1: t = 1; 2: ˆt = 1; //Time slot when the last change of aggregation data center happens 3: Compute data routing decision x(1) and aggregation decision y(1) by minimizing C( x(1), y(1)) subject to (8a) (8d); 4: Compute CMG 1 ( y(1), y(0)) and C1 MG ( x(1), y(1)); 5: while t T do 6: if Cˆt MG ( y(ˆt), y(ˆt 1)) 1 t 1 β 2 ν=ˆt Cν MG ( x(ν), y(ν)) then 7: Derive x(t) and y(t) by minimizing C MG t ( x(t), y(t)) in (10) subject to (8a) (8d) and constraint CMG t ( y(t), y(t 1)) β 1 C MG t ( x(t), y(t)); 8: if y(t) y(t 1) then 9: Use the new aggregation data center indicated by y(t); 10: ˆt = t; 11: if ˆt < t then //not to use new aggregation data center 12: y(t) = y(t 1), compute data routing decision x(t) by solving (9) if not derived; 13: t = t + 1; algorithm accepts the new aggregation decision, and migrates data accordingly. Otherwise, the aggregation point remains unchanged, and only data routing paths are computed. In Alg. 1, β 2 and β 1 reflect the laziness and aggressiveness of the algorithm: a larger β 2 prolongs the inter-switch interval of the aggregation data center, while a larger β 1 invites more frequent switches. We next analyze the competitive ratio of the OLM algorithm, i.e., the ratio of the worst-case total cost incurred by the OLM algorithm in [1, T ], over that of the offline optimal algorithm. Lemma 1. The overall migration cost in [1, t] is at most max{β 1,1/β 2 } times the overall non-migration cost t in this period, i.e., Cν MG ( y(ν), y(ν 1)) max{β 1, 1/β 2 } t Cν MG ( x(ν), y(ν)). Lemma 2. The overall non-migration cost in [1, t] is at most ɛ times the total offline-optimal cost, i.e., t Cν MG ( x(ν), y(ν)) ɛ t C( x (ν), y (ν)), where max y(ν) Y, x(ν):(8a) (8c) C MG( x(ν), ν y(ν)) ɛ = max ν [1,T ] min y(ν) Y, x(ν):(8a) (8c) C MG ν ( x(ν), y(ν)) is the maximum ratio of the largest over the smallest possible non-migration cost incurred in a time slot, with different data upload and aggregation decisions. Theorem 1. The OLM Algorithm is ɛ(1 + max{β 1, 1/β 2 })- competitive. Detailed proofs of the lemmas and the theorem are in our technical report [17]. The value of ɛ depends more on data generation patterns over time, and less on system scale. Under a typical value ɛ = 1.7 from our experiments, setting β 1 = 0.5 and β 2 = 2 leads to a competitive ratio of B. The Randomized Fixed Horizon Control (RFHC) Algorithm In practical applications, near-term future data generation patterns can often be estimated from history, e.g., using a time series forecasting model [18]. We next design an algorithm that exploits such future information. Time Fig. 2. An illustration of different FHC algorithms with l = 2. We divide time into equal-size frames of l + 1 time slots each (l 0). In the first time slot t of each frame, suppose we can predict all future information on data generation for the next l time slots: F d (t), F d (t + 1),..., F d (t + l), d D. We solve the following cost minimization problem over [t, t + l] given y(t 1), to derive x(ν) and y(ν), ν = t,..., t + l: t+l minimize C( x(ν), y(ν)), (11) ν=t subject to: constraints (8a) (8d), for ν = t,..., t + l. The method is essentially a fixed horizon control (FHC) algorithm, adapted from receding horizon control in the dynamic resource allocation literature [14]. Allowing the first time frame (of l + 1 slots) to start from different initial times p [1, l + 1], we have l + 1 versions of the FHC algorithm (Fig. 2). In particular, for F HC (p) starting from slot p, (11) is solved at t = p, p + l + 1, p + 2(l + 1),..., for routing and aggregation decisions in the following l + 1 time slots. For each F HC (p), an adversary can tailor an input with a surge of data produced at the beginning of each time frame. A high migration cost is likely to occur at each frame start, since the surge was not considered by the decision making in the previous frame. A randomized algorithm defeats such adversaries by randomizing the starting times of the frames. Alg. 2 shows our Randomized Fixed Horizon Control (RFHC) algorithm. It first uniformly randomly chooses p [1, l + 1] as the start of the first time frame of l + 1 slots, i.e., it randomly picks one specific algorithm F HC (p) from the l + 1 finite horizon control algorithms: at t = 1, it solves (11) to decide the optimal data routing and aggregation strategies in the period of t = 1 to p 1 (p 1); then at t = p, p+l+1, p + 2(l + 1),..., it solves (11) for optimal strategies in the following l + 1 time slots, respectively. Algorithm 2 The RFHC Algorithm 1: y(0) = 0; 2: p = rand(1, l + 1); //A random integer within [1,l+1] 3: if p 1 then 4: Derive x(1) x(p 1) and y(1) y(p 1) by solving (11) over the time window [1, p 1]; 5: t = p; 6: while t T do 7: if (t p) mod (l + 1) = 0 then 8: Derive x(t),, x(t + l) and y(t),, y(t + l) by solving (11) over the time frame [t, t + l]; 9: t = t + 1; Lemma 3. The overall cost incurred by F HC (p) is upperbounded by the offline-optimal cost plus the migration costs
5 TABLE I PERFORMANCE COMPARISON AMONG THE ALGORITHMS: SPOT INSTANCE PRICING, P = 0.25, L = 0.01 Simple OLM RFHC(0) RFHC(1) Offline Overall cost ($) Ratio to move data from the aggregation data center computed by F HC (p) to the optimal one, at the end of the time frames. That is, letting x (p) and y (p) be the solution derived by the F HC (p) algorithm and Ω p,t = {ω ω = p + k(l + 1), k = 0, 1,..., t p l+1 }, we have for any t [1, T ], t t C( x p (ν), y p (ν)) C( x (ν), y (ν)) + ω Ω p,t C ω MG( y (ω 1), y (p) (ω 1)). Theorem 2. The RFHC algorithm is (1+ 1 l+1 λ )- competitive. Here l is the number of lookahead C steps, κ = sup t t [1,T ], y 1 (t), y 2 MG ( y1 (t), y 2 (t)) (t) Y t 1 d D (αν F d(ν)) is the maximum migration cost per unit data, and C( x(t), y(t)) λ = inf t [1,T ], x(t), y(t):(8a) (8d) t 1 is d D (αν F d(ν)) the minimum total cost per unit data per time slot. The proofs of the lemma and theorem above can be found in our technical report [17]. Theorem 2 reveals that the more future steps predicted (the larger l is), the closer the RFHC algorithm can approach the offline optimum. Values of κ and λ are related to system input including prices and delays, and are less involved with the data generation patterns and the number of data centers. Under the practical settings used in our experiments, κ λ In this case, even with l = 1, the competitive ratio is already as low as V. PERFORMANCE EVALUATION Due to space limitations, detailed experiment set up and more empirical results are provided in our technical report [17]. We have implemented and compared our offline and online algorithms, as well as a Pandora-like simple algorithm that fixes the choice of the data center for aggregation to be the one in Hong Kong. We investigate dynamical VM prices (time-varying data processing costs) following the Spot Instance prices from Amazon EC2 during Apr to Jul From Tab. I, we see that due to lack of future price information, the OLM algorithm performs slightly worse than the RFHC algorithm with lookahead window l = 1. We also evaluate the RFHC algorithm with different lookahead window sizes, with dynamic prices. Tab. II shows that, with the increase of the lookahead window, the performance is approaching the offline optimum. Just a 1- or 2-step peek into the future drives the performance very close to the offline optimum. κ TABLE II PERFORMANCE OF RFHC ALGORITHMS WITH DIFFERENT LOOKAHEAD WINDOW SIZES: SPOT INSTANCE PRICING, P = 0.25, L = 0.01 RFHC(0) RFHC(1) RFHC(2) RFHC(3) Offline Overall cost ($) Ratio generated, geo-dispersed data into the cloud, for processing using a MapReduce-like framework. Two novel online algorithms are designed to practically guide data migration in an online fashion, based on solid theoretical analysis. The OLM algorithm achieves a worst-case competitive ratio of as low as 2.55 under typical real-world settings, without the need of any future information; the RFHC algorithm provides a decreasing competitive ratio with increasing size of the lookahead window. Our extensive experiments reveal the close-to-offlineoptimum performance of both algorithms, by comparing them with a simple algorithm and the optimal offline algorithm, under real-world meteorological data generation patterns. REFERENCES [1] M. Armbrust, A. Fox, R. Grifth, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. P. A. Rabkin, I. Stoica, and M. Zaharia, Above the Clouds: A Berkeley View of Cloud Computing, EECS, University of California, Berkeley, Tech. Rep., [2] Human Genome Project, [3] Hadoop at Twitter, [4] AWS Import/Export, [5] Moving an Elephant: Large Scale Hadoop Data Migration at Facebook, [6] R. J. Brunner, S. G. Djorgovski, T. A. Prince, and A. S. Szalay, Handbook of Massive Data Sets, J. Abello, P. M. Pardalos, and M. G. C. Resende, Eds. Norwell, MA, USA: Kluwer Academic Publishers, 2002, ch. Massive Datasets in Astronomy, pp [7] Amazon Elastic MapReduce, [8] M. Cardosa, C. Wang, A. Nangia, A. Chandra, and J. Weissman, Exploring mapreduce efficiency with highly-distributed data, in Proc. of MapReduce 2011, [9] M. Hajjat, X. Sun, Y. E. Sung, D. Maltz, and S. Rao, Cloudward Bound: Planning for Beneficial Migration of Enterprise Applications to the Cloud, in Proc. of ACM SIGCOMM, August [10] Y. Wu, C. Wu, B. Li, L. Zhang, Z. Li, and F. Lau, Scaling Social Media Applications into Geo-Distributed Clouds, in Proc. of IEEE INFOCOM, Mar [11] B. Cho and I. Gupta, New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks, in Proc. of IEEE ICDCS, [12] M. Lin, A. Wierman, L. L. Andrew, and E. Thereska, Dynamic Right-sizing for Power-proportional Data Centers, in Proc. of IEEE INFOCOM, April [13] T. Lu and M. Chen, Simple and Effective Dynamic Provisioning for Power-Proportional Data Centers, in Proc. of IEEE CISS, Mar [14] M. Lin, Z. Liu, A. Wierman, and L. Andrew, Online Algorithms for Geographical Load Balancing, in Proc. of IEEE IGCC, [15] Amazone Web Services, [16] D. Logothetis, C. Olston, B. Reed, K. C. Webb, and K. Yocum, Stateful Bulk Processing for Incremental Analytics, in Proc. of ACM SoCC, [17] L. Zhang, C. Wu, Z. Li, C. Guo, M. Chen, and F. C. M. Lau, Move My Data to the Cloud: an Online Cost-Minimizing Approach, Tech. Rep. [18] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control, 4th ed. Wiley, VI. CONCLUDING REMARKS This paper designs efficient algorithms for timely, costminimizing migration of enormous amounts of dynamically-
Moving Big Data to The Cloud: An Online Cost-Minimizing Approach
Moving Big Data to The Cloud: An Online Cost-Minimizing Approach Linquan Zhang, Chuan Wu, Zongpeng Li, Chuanxiong Guo, Minghua Chen and Francis C.M. Lau Abstract Cloud computing, rapidly emerging as a
Hyper Elliptic Curve Encryption and Cost Minimization Approach in Moving Big Data to Cloud
554 Hyper Elliptic Curve Encryption and Cost Minimization Approach in Moving Big Data to Cloud Jerin Jose 1, Dr. Shine N Das 2 1 (Department of CSE, College of Engineering, Munnar) 2 (Department CSE, College
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS Survey of Optimization of Scheduling in Cloud Computing Environment Er.Mandeep kaur 1, Er.Rajinder kaur 2, Er.Sughandha Sharma 3 Research Scholar 1 & 2 Department of Computer
Data Locality-Aware Query Evaluation for Big Data Analytics in Distributed Clouds
Data Locality-Aware Query Evaluation for Big Data Analytics in Distributed Clouds Qiufen Xia, Weifa Liang and Zichuan Xu Research School of Computer Science Australian National University, Canberra, ACT
Data Center Energy Cost Minimization: a Spatio-Temporal Scheduling Approach
23 Proceedings IEEE INFOCOM Data Center Energy Cost Minimization: a Spatio-Temporal Scheduling Approach Jianying Luo Dept. of Electrical Engineering Stanford University [email protected] Lei Rao, Xue
An Efficient Approach for Cost Optimization of the Movement of Big Data
2015 by the authors; licensee RonPub, Lübeck, Germany. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Profit-driven Cloud Service Request Scheduling Under SLA Constraints
Journal of Information & Computational Science 9: 14 (2012) 4065 4073 Available at http://www.joics.com Profit-driven Cloud Service Request Scheduling Under SLA Constraints Zhipiao Liu, Qibo Sun, Shangguang
Algorithms for sustainable data centers
Algorithms for sustainable data centers Adam Wierman (Caltech) Minghong Lin (Caltech) Zhenhua Liu (Caltech) Lachlan Andrew (Swinburne) and many others IT is an energy hog The electricity use of data centers
Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture
Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture He Huang, Shanshan Li, Xiaodong Yi, Feng Zhang, Xiangke Liao and Pan Dong School of Computer Science National
Performance Evaluation of the Illinois Cloud Computing Testbed
Performance Evaluation of the Illinois Cloud Computing Testbed Ahmed Khurshid, Abdullah Al-Nayeem, and Indranil Gupta Department of Computer Science University of Illinois at Urbana-Champaign Abstract.
Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems
215 IEEE International Conference on Big Data (Big Data) Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems Guoxin Liu and Haiying Shen and Haoyu Wang Department of Electrical
Figure 1. The cloud scales: Amazon EC2 growth [2].
- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 [email protected], [email protected] Abstract One of the most important issues
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT
MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,
A Hybrid Load Balancing Policy underlying Cloud Computing Environment
A Hybrid Load Balancing Policy underlying Cloud Computing Environment S.C. WANG, S.C. TSENG, S.S. WANG*, K.Q. YAN* Chaoyang University of Technology 168, Jifeng E. Rd., Wufeng District, Taichung 41349
Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks
Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks Praveenkumar Kondikoppa, Chui-Hui Chiu, Cheng Cui, Lin Xue and Seung-Jong Park Department of Computer Science,
Geoprocessing in Hybrid Clouds
Geoprocessing in Hybrid Clouds Theodor Foerster, Bastian Baranski, Bastian Schäffer & Kristof Lange Institute for Geoinformatics, University of Münster, Germany {theodor.foerster; bastian.baranski;schaeffer;
Optimizing Big Data Processing Performance in the Public Cloud: Opportunities and Approaches
Optimizing Big Data Processing Performance in the Public Cloud: Opportunities and Approaches Dan Wang Department of Computing The Hong Kong Polytechnic University [email protected] Jiangchuan Liu
High performance computing network for cloud environment using simulators
High performance computing network for cloud environment using simulators Ajith Singh. N 1 and M. Hemalatha 2 1 Ph.D, Research Scholar (CS), Karpagam University, Coimbatore, India 2 Prof & Head, Department
Minimal Cost Data Sets Storage in the Cloud
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.1091
Cloud Computing Trends
UT DALLAS Erik Jonsson School of Engineering & Computer Science Cloud Computing Trends What is cloud computing? Cloud computing refers to the apps and services delivered over the internet. Software delivered
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Ensuring Reliability and High Availability in Cloud by Employing a Fault Tolerance Enabled Load Balancing Algorithm G.Gayathri [1], N.Prabakaran [2] Department of Computer
Content Distribution Scheme for Efficient and Interactive Video Streaming Using Cloud
Content Distribution Scheme for Efficient and Interactive Video Streaming Using Cloud Pramod Kumar H N Post-Graduate Student (CSE), P.E.S College of Engineering, Mandya, India Abstract: Now days, more
CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING
Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila
A Data Dependency Based Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems *
A Data Dependency Based Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems * Dong Yuan, Yun Yang, Xiao Liu, Gaofeng Zhang, Jinjun Chen Faculty of Information and Communication
The public-cloud-computing market has
View from the Cloud Editor: George Pallis [email protected] Comparing Public- Cloud Providers Ang Li and Xiaowei Yang Duke University Srikanth Kandula and Ming Zhang Microsoft Research As cloud computing
Beyond the Stars: Revisiting Virtual Cluster Embeddings
Beyond the Stars: Revisiting Virtual Cluster Embeddings Matthias Rost Technische Universität Berlin September 7th, 2015, Télécom-ParisTech Joint work with Carlo Fuerst, Stefan Schmid Published in ACM SIGCOMM
Index Terms: Cloud Computing, Third Party Auditor, Threats In Cloud Computing, Dynamic Encryption.
Secure Privacy-Preserving Cloud Services. Abhaya Ghatkar, Reena Jadhav, Renju Georgekutty, Avriel William, Amita Jajoo DYPCOE, Akurdi, Pune [email protected], [email protected], [email protected],
Optimizing the Cost for Resource Subscription Policy in IaaS Cloud
Optimizing the Cost for Resource Subscription Policy in IaaS Cloud Ms.M.Uthaya Banu #1, Mr.K.Saravanan *2 # Student, * Assistant Professor Department of Computer Science and Engineering Regional Centre
The Hidden Extras. The Pricing Scheme of Cloud Computing. Stephane Rufer
The Hidden Extras The Pricing Scheme of Cloud Computing Stephane Rufer Cloud Computing Hype Cycle Definition Types Architecture Deployment Pricing/Charging in IT Economics of Cloud Computing Pricing Schemes
An Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment
An Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment Daeyong Jung 1, SungHo Chin 1, KwangSik Chung 2, HeonChang Yu 1, JoonMin Gil 3 * 1 Dept. of Computer
Towards Predictable Datacenter Networks
Towards Predictable Datacenter Networks Hitesh Ballani, Paolo Costa, Thomas Karagiannis and Ant Rowstron Microsoft Research, Cambridge This talk is about Guaranteeing network performance for tenants in
On the Amplitude of the Elasticity Offered by Public Cloud Computing Providers
On the Amplitude of the Elasticity Offered by Public Cloud Computing Providers Rostand Costa a,b, Francisco Brasileiro a a Federal University of Campina Grande Systems and Computing Department, Distributed
IMPLEMENTATION CONCEPT FOR ADVANCED CLIENT REPUDIATION DIVERGE AUDITOR IN PUBLIC CLOUD
IMPLEMENTATION CONCEPT FOR ADVANCED CLIENT REPUDIATION DIVERGE AUDITOR IN PUBLIC CLOUD 1 Ms.Nita R. Mhaske, 2 Prof. S.M.Rokade 1 student, Master of Engineering, Dept. of Computer Engineering Sir Visvesvaraya
Energy Constrained Resource Scheduling for Cloud Environment
Energy Constrained Resource Scheduling for Cloud Environment 1 R.Selvi, 2 S.Russia, 3 V.K.Anitha 1 2 nd Year M.E.(Software Engineering), 2 Assistant Professor Department of IT KSR Institute for Engineering
Energy Saving Virtual Machine Allocation in Cloud Computing
13 IEEE 33rd International Conference on Distributed Computing Systems Workshops Energy Saving Virtual Machine Allocation in Cloud Computing Ruitao Xie, Xiaohua Jia, Kan Yang, Bo Zhang Department of Computer
Introduction to Cloud Computing
Discovery 2015: Cloud Computing Workshop June 20-24, 2011 Berkeley, CA Introduction to Cloud Computing Keith R. Jackson Lawrence Berkeley National Lab What is it? NIST Definition Cloud computing is a model
Dynamic Resource Pricing on Federated Clouds
Dynamic Resource Pricing on Federated Clouds Marian Mihailescu and Yong Meng Teo Department of Computer Science National University of Singapore Computing 1, 13 Computing Drive, Singapore 117417 Email:
How To Understand Cloud Computing
Overview of Cloud Computing (ENCS 691K Chapter 1) Roch Glitho, PhD Associate Professor and Canada Research Chair My URL - http://users.encs.concordia.ca/~glitho/ Overview of Cloud Computing Towards a definition
Conceptual Approach for Performance Isolation in Multi-Tenant Systems
Conceptual Approach for Performance Isolation in Multi-Tenant Systems Manuel Loesch 1 and Rouven Krebs 2 1 FZI Research Center for Information Technology, Karlsruhe, Germany 2 SAP AG, Global Research and
A Cloud Data Center Optimization Approach Using Dynamic Data Interchanges
A Cloud Data Center Optimization Approach Using Dynamic Data Interchanges Efstratios Rappos Institute for Information and Communication Technologies, Haute Ecole d Ingénierie et de Geston du Canton de
A Review of Load Balancing Algorithms for Cloud Computing
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume - 3 Issue -9 September, 2014 Page No. 8297-8302 A Review of Load Balancing Algorithms for Cloud Computing Dr.G.N.K.Sureshbabu
Secured Storage of Outsourced Data in Cloud Computing
Secured Storage of Outsourced Data in Cloud Computing Chiranjeevi Kasukurthy 1, Ch. Ramesh Kumar 2 1 M.Tech(CSE), Nalanda Institute of Engineering & Technology,Siddharth Nagar, Sattenapalli, Guntur Affiliated
What Is It? Business Architecture Research Challenges Bibliography. Cloud Computing. Research Challenges Overview. Carlos Eduardo Moreira dos Santos
Research Challenges Overview May 3, 2010 Table of Contents I 1 What Is It? Related Technologies Grid Computing Virtualization Utility Computing Autonomic Computing Is It New? Definition 2 Business Business
A Hybrid Electrical and Optical Networking Topology of Data Center for Big Data Network
ASEE 2014 Zone I Conference, April 3-5, 2014, University of Bridgeport, Bridgpeort, CT, USA A Hybrid Electrical and Optical Networking Topology of Data Center for Big Data Network Mohammad Naimur Rahman
Dynamic Scheduling and Pricing in Wireless Cloud Computing
Dynamic Scheduling and Pricing in Wireless Cloud Computing R.Saranya 1, G.Indra 2, Kalaivani.A 3 Assistant Professor, Dept. of CSE., R.M.K.College of Engineering and Technology, Puduvoyal, Chennai, India.
Improving MapReduce Performance in Heterogeneous Environments
UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University of California at Berkeley Motivation 1. MapReduce
On the Interaction and Competition among Internet Service Providers
On the Interaction and Competition among Internet Service Providers Sam C.M. Lee John C.S. Lui + Abstract The current Internet architecture comprises of different privately owned Internet service providers
Data Integrity Check using Hash Functions in Cloud environment
Data Integrity Check using Hash Functions in Cloud environment Selman Haxhijaha 1, Gazmend Bajrami 1, Fisnik Prekazi 1 1 Faculty of Computer Science and Engineering, University for Business and Tecnology
CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com
` CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS Review Business and Technology Series www.cumulux.com Table of Contents Cloud Computing Model...2 Impact on IT Management and
Chapter 19 Cloud Computing for Multimedia Services
Chapter 19 Cloud Computing for Multimedia Services 19.1 Cloud Computing Overview 19.2 Multimedia Cloud Computing 19.3 Cloud-Assisted Media Sharing 19.4 Computation Offloading for Multimedia Services 19.5
Scheduling using Optimization Decomposition in Wireless Network with Time Performance Analysis
Scheduling using Optimization Decomposition in Wireless Network with Time Performance Analysis Aparna.C 1, Kavitha.V.kakade 2 M.E Student, Department of Computer Science and Engineering, Sri Shakthi Institute
Using Proxies to Accelerate Cloud Applications
Using Proxies to Accelerate Cloud Applications Jon Weissman and Siddharth Ramakrishnan Department of Computer Science and Engineering University of Minnesota, Twin Cities Abstract A rich cloud ecosystem
Cyber Crime Scene Investigations (C 2 SI) through Cloud Computing
Cyber Crime Scene Investigations (C 2 SI) through Cloud Computing Xinwen Fu University of Massachusetts Lowell [email protected] Zhen Ling Southeast University, China zhen [email protected] Wei Yu Towson
[email protected] [email protected]
1 The following is merely a collection of notes taken during works, study and just-for-fun activities No copyright infringements intended: all sources are duly listed at the end of the document This work
ENERGY EFFICIENT AND REDUCTION OF POWER COST IN GEOGRAPHICALLY DISTRIBUTED DATA CARDS
ENERGY EFFICIENT AND REDUCTION OF POWER COST IN GEOGRAPHICALLY DISTRIBUTED DATA CARDS M Vishnu Kumar 1, E Vanitha 2 1 PG Student, 2 Assistant Professor, Department of Computer Science and Engineering,
Dynamic Cloud Resource Reservation via Cloud Brokerage
Dynamic Cloud Resource Reservation via Cloud Brokerage Wei Wang, Di Niu, Baochun Li, Ben Liang Department of Electrical and Computer Engineering, University of Toronto Department of Electrical and Computer
Optimal Service Pricing for a Cloud Cache
Optimal Service Pricing for a Cloud Cache K.SRAVANTHI Department of Computer Science & Engineering (M.Tech.) Sindura College of Engineering and Technology Ramagundam,Telangana G.LAKSHMI Asst. Professor,
Workload Predicting-Based Automatic Scaling in Service Clouds
2013 IEEE Sixth International Conference on Cloud Computing Workload Predicting-Based Automatic Scaling in Service Clouds Jingqi Yang, Chuanchang Liu, Yanlei Shang, Zexiang Mao, Junliang Chen State Key
Dynamic Workload Management in Heterogeneous Cloud Computing Environments
Dynamic Workload Management in Heterogeneous Cloud Computing Environments Qi Zhang and Raouf Boutaba University of Waterloo IEEE/IFIP Network Operations and Management Symposium Krakow, Poland May 7, 2014
Topology-Transparent Distributed Multicast and Broadcast Scheduling in Mobile Ad Hoc Networks
Topology-Transparent Distributed Multicast and Broadcast Scheduling in Mobile d Hoc Networks Yiming Liu, Victor O. K. Li, Ka-Cheong Leung, and Lin Zhang, Department of Electronic Engineering Tsinghua University,
Advanced Task Scheduling for Cloud Service Provider Using Genetic Algorithm
IOSR Journal of Engineering (IOSRJEN) ISSN: 2250-3021 Volume 2, Issue 7(July 2012), PP 141-147 Advanced Task Scheduling for Cloud Service Provider Using Genetic Algorithm 1 Sourav Banerjee, 2 Mainak Adhikari,
Performance Evaluation of Linux Bridge
Performance Evaluation of Linux Bridge James T. Yu School of Computer Science, Telecommunications, and Information System (CTI) DePaul University ABSTRACT This paper studies a unique network feature, Ethernet
A Novel Approach for Calculation Based Cloud Band Width and Cost Diminution Method
A Novel Approach for Calculation Based Cloud Band Width and Cost Diminution Method Radhika Chowdary G PG Scholar, M.Lavanya Assistant professor, P.Satish Reddy HOD, Abstract: In this paper, we present
Data Integrity for Secure Dynamic Cloud Storage System Using TPA
International Journal of Electronic and Electrical Engineering. ISSN 0974-2174, Volume 7, Number 1 (2014), pp. 7-12 International Research Publication House http://www.irphouse.com Data Integrity for Secure
Agent Based Framework for Scalability in Cloud Computing
Agent Based Framework for Scalability in Computing Aarti Singh 1, Manisha Malhotra 2 1 Associate Prof., MMICT & BM, MMU, Mullana 2 Lecturer, MMICT & BM, MMU, Mullana 1 Introduction: Abstract: computing
Cloud Management: Knowing is Half The Battle
Cloud Management: Knowing is Half The Battle Raouf BOUTABA David R. Cheriton School of Computer Science University of Waterloo Joint work with Qi Zhang, Faten Zhani (University of Waterloo) and Joseph
Dynamic Virtual Machine Allocation in Cloud Server Facility Systems with Renewable Energy Sources
Dynamic Virtual Machine Allocation in Cloud Server Facility Systems with Renewable Energy Sources Dimitris Hatzopoulos University of Thessaly, Greece Iordanis Koutsopoulos Athens University of Economics
Mobile Cloud Computing Security Considerations
보안공학연구논문지 (Journal of Security Engineering), 제 9권 제 2호 2012년 4월 Mobile Cloud Computing Security Considerations Soeung-Kon(Victor) Ko 1), Jung-Hoon Lee 2), Sung Woo Kim 3) Abstract Building applications
Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect
on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
Taking Big Data to the Cloud. Enabling cloud computing & storage for big data applications with on-demand, high-speed transport WHITE PAPER
Taking Big Data to the Cloud WHITE PAPER TABLE OF CONTENTS Introduction 2 The Cloud Promise 3 The Big Data Challenge 3 Aspera Solution 4 Delivering on the Promise 4 HIGHLIGHTS Challenges Transporting large
On the Performance-cost Tradeoff for Workflow Scheduling in Hybrid Clouds
On the Performance-cost Tradeoff for Workflow Scheduling in Hybrid Clouds Thiago A. L. Genez, Luiz F. Bittencourt, Edmundo R. M. Madeira Institute of Computing University of Campinas UNICAMP Av. Albert
AN ADAPTIVE DISTRIBUTED LOAD BALANCING TECHNIQUE FOR CLOUD COMPUTING
AN ADAPTIVE DISTRIBUTED LOAD BALANCING TECHNIQUE FOR CLOUD COMPUTING Gurpreet Singh M.Phil Research Scholar, Computer Science Dept. Punjabi University, Patiala [email protected] Abstract: Cloud Computing
Zadara Storage Cloud A whitepaper. @ZadaraStorage
Zadara Storage Cloud A whitepaper @ZadaraStorage Zadara delivers two solutions to its customers: On- premises storage arrays Storage as a service from 31 locations globally (and counting) Some Zadara customers
