Energy Cost Optimization for Geographically Distributed Heterogeneous Data Centers

Eric Jonardi, Mark A. Oxley, Sudeep Pasricha, Anthony A. Maciejewski, Howard Jay Siegel

Abstract

The proliferation of distributed data centers has recently motivated researchers to study energy cost minimization at the geo-distributed level. Researchers have been using models for time-of-use (TOU) electricity pricing and renewable energy sources to help reduce energy costs when performing geographical workload distribution, but have made oversimplifying assumptions at the data center level. Important considerations such as the thermal, power, and co-location interference effects within each data center have a large impact on the performance of workload management techniques. By designing three techniques that possess varying amounts of knowledge of such information, we compare and quantify the benefits of considering detailed models at the data center level, and demonstrate that our best heuristic can on average achieve a cost reduction of 37% compared to state-of-the-art prior work.

Keywords: geo-distributed data centers; workload management; power-aware computing; memory interference.

I. INTRODUCTION

The strong success and extensive growth of cloud computing have resulted in data center operators geographically distributing their data center locations (e.g., Google [1]). Distributing data centers geographically offers benefits to the clients (e.g., low latency due to shorter communication distances). However, a strong motivating factor for data center operators to geographically distribute their data centers is to reduce operating expenditures by exploiting time-of-use (TOU) electricity pricing [2], and reducing electricity costs is now a focus of data center management. Relocating workload among geo-distributed data centers offers several benefits. First, workloads can be shifted to locations in different time zones to concentrate workload in the regions with the lowest electricity prices at that time.
Second, an opportunistic distribution of the workload among data centers during periods of peak demand can allow individual nodes to run in slower but possibly more energy-efficient performance states (P-states), further reducing electricity costs.

Due to the ever-increasing electricity consumption of data centers, the use of on-site renewable energy sources (e.g., solar and wind) has grown in recent years. Adding on-site renewable power can provide additional opportunities for geographical load distribution (GLD) techniques to reduce electricity costs.

The goal of our research is to design techniques for geographical load distribution that will minimize energy cost for executing incoming workloads. We use detailed models of power, temperature, and co-location interference at each data center to provide more accurate information to the geo-distributed workload manager. This work applies to environments where there is information about the history of the types of tasks being executed (e.g., DigitalGlobe, Google, DoD). By considering TOU pricing and renewable power models at each data center, we design three new workload management techniques that assume varying degrees of co-location interference knowledge to distribute or migrate the workload to low-cost data centers at regular time intervals, while ensuring all of the workload completes. We compare to the state-of-the-art method [3], and show that our best heuristic can, on average, achieve a cost reduction of 37% relative to that method.

Author affiliations: Department of Electrical and Computer Engineering and Department of Computer Science, Colorado State University, Fort Collins, Colorado 80523, USA. Email: {eric.jonardi, mark.oxley, sudeep, aam, hj}@colostate.edu. This research was supported by NSF grants CCF-1302693 and CCF-1252500. This research used the CSU ISTeC Cray System supported by NSF Grant CNS-0923386. We also thank Hewlett Packard for donating servers. 978-1-5090-0172-9/15/$31.00 ©2015 IEEE
The contributions of this work are as follows: (1) a new hierarchical framework for the GLD problem that considers cost-minimization workload management at both the geo-distributed and local heterogeneous data center levels; (2) a data center model that considers heterogeneous compute node types, P-states, node temperatures, cooling power, renewable power sources, and co-location interference; and (3) the design of three novel heuristics that possess varying degrees of co-location interference prediction knowledge, to demonstrate and motivate the use of detailed models in workload management decisions.

II. RELATED WORK

Workload distribution for geo-distributed data centers has been studied in [3, 4, 5, 6, 7, 8, 9]. Knowledge of TOU pricing is typically used either to minimize electricity costs across all geo-distributed data centers (e.g., [3, 4, 6, 7, 8, 9]) or to maximize profits when a revenue model is included for computing (e.g., [5]). A quality of service (QoS) constraint of some form is recognized in most of the aforementioned works, typically as a queuing delay constraint [4, 6, 8]. Others incorporate QoS violations into the cost function, where a monetary penalty is associated with violating queuing delay [5], latency [7], or migration [3] service level agreements (SLAs). The modeling detail varies significantly: some works include dynamic voltage and frequency scaling (DVFS) in decision making [6], some include power consumption of the cooling system in addition to the computing system [7, 8], others consider real-world TOU pricing data [5, 6], and one considers renewable energy sources at each data center location [3]. Our research includes all aforementioned modeling aspects to assist in workload management decisions: DVFS to exploit the power/performance tradeoffs of P-states, cooling system power to include thermal awareness and reduce cooling

cost, TOU pricing data from an actual electric company, and renewable power sources at each data center. To the best of our knowledge, our work is the first to encompass all of these aspects within the GLD problem. In addition, unlike any prior work in GLD, we consider co-location interference as part of our load distribution techniques. Co-location interference is a phenomenon that occurs when multiple cores within the same multicore processor execute applications simultaneously and compete for shared resources (e.g., last-level cache or DRAM).

Similar to [3], our study considers a renewable energy source at each geo-distributed data center, a cooling system at each data center, and migration penalties associated with moving already-assigned workloads to different data centers. We differ significantly from [3] by including TOU electricity pricing traces, considering DVFS P-state decisions in our management techniques, and integrating interference caused by the co-location of multiple tasks on cores that share resources.

III. SYSTEM MODEL

A. Geo-distributed Level

The goal of the geo-distributed resource manager (GDRM) is to minimize the total monetary cost of the system while servicing all requests. We divide time evenly into intervals called epochs. As an example, in this work an epoch is an hour of time ($T^e$), thus a 24-epoch period is a full day. We assume that the beginning of each epoch is a steady-state scheduling problem where we assign execution rates of a set of $I$ task types to $D$ data centers. A task type $i \in I$ is characterized by its arrival rate $AR_i$ and its estimated computational speeds on each of the heterogeneous compute nodes in all P-states. The assignment problem at the geo-distributed level is to map execution rates for each task type $i$ to each data center $d \in D$ such that the total energy cost across all data centers is minimized and the execution rates of all task types meet or exceed their arrival rates.
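As a hedged illustration of the assignment requirement described above, the following sketch checks, for a single epoch, that the execution rates assigned across data centers cover every task type's arrival rate. The function name `rates_cover_arrivals`, the nested-list layout, and all numeric values are illustrative assumptions and do not come from the original model or its implementation.

```python
from typing import Sequence


def rates_cover_arrivals(er_dc: Sequence[Sequence[float]],
                         arrival: Sequence[float],
                         tol: float = 1e-9) -> bool:
    """Return True if, for every task type i, the execution rates summed
    over all data centers meet or exceed the arrival rate of task type i.

    er_dc[d][i] is the (hypothetical) execution rate of task type i that
    the geo-distributed manager assigned to data center d for this epoch.
    """
    for i, ar in enumerate(arrival):
        total = sum(per_dc[i] for per_dc in er_dc)
        if total + tol < ar:  # small tolerance for floating-point noise
            return False
    return True


# Hypothetical example: two data centers, two task types.
er_dc = [[30.0, 10.0],   # data center 0
         [25.0, 15.0]]   # data center 1
arrival = [50.0, 25.0]   # arrival rate of each task type
feasible = rates_cover_arrivals(er_dc, arrival)
```

A cost-minimizing manager would search over `er_dc`; this sketch only expresses the feasibility side of the problem.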
For each epoch $\tau$, we assign a desired data center execution rate $ER^{DC}_{d,i}$ for each task type $i$ to each data center $d$ such that the total execution rate for each task type exceeds (or equals) the corresponding arrival rate, $AR_i$, thus ensuring the workload is completed. That is,

\[ \sum_{d=1}^{D} ER^{DC}_{d,i}(\tau) \geq AR_i(\tau), \quad \forall i \in I. \tag{1} \]

B. Data Center Level

1) Overview: Each data center houses $NN$ compute nodes that are arranged in hot aisle/cold aisle fashion (Fig. 1), and a cooling system comprised of $NCR$ computer room air conditioning (CRAC) units. Each compute node $n$ belongs to one of several heterogeneous node types that differ in execution speed, power consumption characteristics, and number of cores. Cores within a compute node are homogeneous, and each core is DVFS-enabled to allow independent configuration of its P-states. The number of cores in node $n$ is $NCN_n$, and $NT_k$ is the compute node type to which core $k$ belongs.

2) Core Execution Rates: At each data center, the sum of execution rates of all cores that are assigned to execute task type $i$ must exceed or equal $ER^{DC}_{d,i}(\tau)$. We assume that we know the estimated computational speed (ECS) of any task of type $i$ on a core of node type $n$ in P-state $p$, $ECS(i, n, p)$. The execution rate of task type $i$ on core $k$, $ER^{core}_{i,k}$, is the product of the assigned desired fraction of time core $k$ spends executing tasks of type $i$, $DF_{i,k}(\tau)$, and the speed at which that core executes tasks of type $i$ in P-state $PS_{i,k}(\tau)$. That is, the execution rate of task type $i$ on core $k$ is

\[ ER^{core}_{i,k}(\tau) = DF_{i,k}(\tau) \cdot ECS(i, NT_k, PS_{i,k}(\tau)). \tag{2} \]

At the data center level, we assign $DF_{i,k}(\tau)$ and $PS_{i,k}(\tau)$ such that power is minimized (see Section III-B3) and the execution rates of all task types on cores in data center $d$ meet or exceed the execution rate assigned by the GDRM, ensuring that the arriving workload is fully executed. That is,

\[ \sum_{n=1}^{NN} \sum_{k=1}^{NCN_n} ER^{core}_{i,k} \geq ER^{DC}_{d,i}, \quad \forall i \in I, \ \forall d \in D. \tag{3} \]

3) Power Model: The power consumption of a compute node consists of the overhead (idle) power consumption and the dynamic power consumed by cores executing tasks. We define $O_n$ as the overhead power consumption of compute node $n$. Let $APC(i, NT_k, PS_{i,k}(\tau))$ be the average power consumed by core $k$ in a node of type $NT_k$ when executing tasks of type $i$ in P-state $PS_{i,k}(\tau)$ during epoch $\tau$. The power consumption of node $n$ during epoch $\tau$, $PN_n(\tau)$, is

\[ PN_n(\tau) = O_n + \sum_{k=1}^{NCN_n} \sum_{i=1}^{I} APC(i, NT_k, PS_{i,k}(\tau)) \cdot DF_{i,k}(\tau). \tag{4} \]

The power consumed by a CRAC unit, $PCR_{d,c}(\tau)$, is a function of the heat removed at that CRAC unit and the Coefficient of Performance (CoP) of the CRAC unit [10], calculated using Eq. 5 of [11].

4) Renewable Energy Model: Solar energy $E^{solar}_d$ and wind energy $E^{wind}_d$ (both in kWh) are calculated for each data center $d$ as an average per epoch $\tau$ (Eqs. 5 and 6 are from [12]). Here, $A^{solar}_d$ is the total active area of all solar panels, and $A^{wind}_d$ is the total swept rotor area of all wind turbines. The solar-to-electricity and wind-to-electricity conversion efficiencies are given by $\alpha$ and $\beta$, respectively. Lastly, $s_d(\tau)$ is the average solar irradiance, $v_d(\tau)$ is the wind speed, and $\rho_d(\tau)$ is the air density, as averaged for data center $d$ during epoch $\tau$. The total renewable energy, $R_d(\tau)$, available at data center $d$ during epoch $\tau$ is the sum of the wind and solar energy available at that time. We use these models with historical data to predict the renewable power available at each data center, given by

\[ E^{solar}_d(\tau) = \alpha \cdot A^{solar}_d \cdot s_d(\tau) \cdot T^e, \tag{5} \]

\[ E^{wind}_d(\tau) = \beta \cdot \tfrac{1}{2} \cdot A^{wind}_d \cdot \rho_d(\tau) \cdot v_d(\tau)^3 \cdot T^e, \tag{6} \]

\[ R_d(\tau) = E^{solar}_d(\tau) + E^{wind}_d(\tau). \tag{7} \]

5) Thermal Model: Using the notion of thermal influence indices [13] that were derived using computational fluid dynamics simulations, we can calculate the steady-state temperatures at compute nodes and CRAC units in each data center. Because we assume the same physical layout for each of the data centers (Fig. 1), we derive these thermal influence indices for one data center, and assume they are the same for all other data centers.
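For readers who want to experiment with the renewable energy model of Eqs. 5 to 7, the following is a minimal sketch for one data center and one epoch. The function names, the unit choices (irradiance in kW/m², air density in kg/m³, wind speed in m/s), the division by 1000 that converts watts to kilowatts, and the example values are all assumptions made for illustration; the source does not state its unit conventions beyond reporting energy in kWh.

```python
def solar_energy_kwh(alpha: float, area_m2: float,
                     irradiance_kw_per_m2: float, epoch_hours: float) -> float:
    """Eq. 5 sketch: efficiency * panel area * average irradiance * epoch length."""
    return alpha * area_m2 * irradiance_kw_per_m2 * epoch_hours


def wind_energy_kwh(beta: float, rotor_area_m2: float, air_density: float,
                    wind_speed: float, epoch_hours: float) -> float:
    """Eq. 6 sketch: efficiency * (1/2) * swept area * density * speed^3 * epoch length.

    With SI inputs the kinetic power term is in watts, so it is divided by
    1000 here to obtain kW (an assumed unit conversion, not from the source).
    """
    power_kw = 0.5 * rotor_area_m2 * air_density * wind_speed ** 3 / 1000.0
    return beta * power_kw * epoch_hours


def renewable_energy_kwh(solar_kwh: float, wind_kwh: float) -> float:
    """Eq. 7 sketch: total renewable energy is the sum of solar and wind energy."""
    return solar_kwh + wind_kwh


# Hypothetical values for a one-hour epoch.
e_solar = solar_energy_kwh(alpha=0.2, area_m2=1000.0,
                           irradiance_kw_per_m2=0.5, epoch_hours=1.0)
e_wind = wind_energy_kwh(beta=0.4, rotor_area_m2=100.0, air_density=1.2,
                         wind_speed=10.0, epoch_hours=1.0)
r_total = renewable_energy_kwh(e_solar, e_wind)
```

Because wind energy grows with the cube of wind speed, small errors in the historical wind data have a much larger effect on the prediction than comparable errors in irradiance.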
The outlet temperature of each compute node is a function of the inlet temperature, the power consumed, and the air flow rate of the node. The inlet temperature of each compute node is a function of the outlet temperatures of each CRAC unit and the outlet temperatures of all compute nodes [11]. Lastly, the inlet temperature of each node is constrained to be less than or equal to the red line temperature (the maximum allowable node temperature).

Fig. 1. Data center in hot aisle/cold aisle configuration [11]. (Labeled elements: CRAC intake, hot aisle, perforated floor tile, cold aisle, CRAC unit, heterogeneous compute nodes.)

6) System Electricity Cost: The electricity price at data center $d$ during epoch $\tau$ is defined as $E^{price}_d(\tau)$. Let $Eff_d$ be the approximation of power overhead in data center $d$ due to the inefficiencies of power supply units and uninterruptible power supplies. The total electricity cost for data center $d$ during epoch $\tau$, $PC_d(\tau)$, is defined as

\[ PC_d(\tau) = E^{price}_d(\tau) \left[ \left( \sum_{c=1}^{NCR} PCR_{d,c}(\tau) + \sum_{n=1}^{NN} PN_n(\tau) \right) \cdot Eff_d - R_d(\tau) \right]. \tag{8} \]

7) Node Activation/Deactivation Cost: At each data center, the number of nodes of each node type that are in use changes frequently between epochs. Inactive nodes are placed in a sleep state, but entering and exiting this sleep state takes a non-negligible amount of time. Each node that is active is considered to be active for the entire epoch, which requires that any node transitioning to a sleep state do so during the following epoch, and any node transitioning from a sleep state do so during the prior epoch. For each data center $d$, let $N^{start}_{d,j}(\tau)$ be the number of nodes of type $j$ that are inactive during epoch $\tau$ and active during epoch $\tau + 1$, and let $N^{stop}_{d,j}(\tau)$ be the number of nodes of type $j$ that are active during epoch $\tau - 1$ and inactive during epoch $\tau$. Let $P^S_j$, $P^D_j$, and $P^{Sleep}_j$ be the average static power, average peak dynamic power, and average sleep power for node type $j$, respectively, with the average CPU utilization of node type $j$ defined as $\phi^E_j$. Let the coefficient to approximate CRAC unit power at data center $d$ be $CUP_d$.
We assume each data center contains the same number of nodes; however, each data center is heterogeneous in the sense that the number of nodes belonging to each node type varies among data centers. Let $J_d$ be the set of node types in data center $d$. Let $T^S$ be the time required for a node to transition to/from a sleep state. Recall that $T^e$ is the duration of an epoch. The node assignment cost $AC_d$ for data center $d$ during epoch $\tau$ is calculated as

\[ AC_d(\tau) = \sum_{j \in J_d} E^{price}_d(\tau) \cdot Eff_d \cdot \left( 1 + \frac{1}{CUP_d} \right) \cdot \frac{T^S}{T^e} \cdot \left[ \phi^E_j P^D_j + P^S_j - P^{Sleep}_j \right] \cdot \left( N^{start}_{d,j}(\tau) + N^{stop}_{d,j}(\tau) \right). \tag{9} \]

8) Co-Location Interference Model: Tasks competing for shared memory in multicore processors can cause severe performance degradation, especially when competing tasks are memory-intensive [14]. The memory intensity of a task refers to the ratio of last-level cache misses to the total number of instructions executed [15]. We adapt a linear regression model from [15] that uses a set of features (i.e., inputs) based on the current applications assigned to a multicore processor to predict the execution time of a target application $i$ on core $k$. These features are $A_{i,k}$, the number of applications co-located on that multicore processor; $B_{i,k}$, the base execution time; $C_{i,k}$, the clock frequency; $D_k$, the average memory intensity of all applications on that multicore processor; and $E_{i,k}$, the memory intensity of application $i$ on core $k$. In a linear model, the output is a linear combination of all features and their calculated coefficients. We classify the task types into memory-intensity classes on each of the node types, and calculate the coefficients for each memory-intensity class using the linear regression model. If we denote $u$, $v$, $w$, $x$, and $y$ as the linear model coefficients for feature symbols $A$, $B$, $C$, $D$, and $E$, respectively, plus the constant term $z$, the equation for the co-located execution time of a task type $i$ of memory-intensity class $m$ on core $k$, $CET_{i,k}(\tau)$, is

\[ CET_{i,k}(\tau) = u_{m,k} A_{i,k} + v_{m,k} B_{i,k} + w_{m,k} C_{i,k} + x_{m,k} D_k + y_{m,k} E_{i,k} + z_{m,k}. \tag{10} \]

The execution rate is the reciprocal of the execution time. Therefore, the co-located execution rate for task type $i$ on core $k$, $CER^{core}_{i,k}(\tau)$, is $1/CET_{i,k}(\tau)$. The total execution rate for task type $i$ in epoch $\tau$ is therefore given by

\[ CER_i(\tau) = \sum_{d=1}^{D} \sum_{k=1}^{NC_d} CER^{core}_{i,k}(\tau). \tag{11} \]

To allocate tasks to cores when considering co-location interference, some of our techniques use knowledge of $CER^{core}_{i,k}$ to judge actual execution rates more accurately than techniques that do not consider co-location interference. When considering co-location, the execution rate constraint becomes

\[ \sum_{k=1}^{NC_d} CER^{core}_{i,k}(\tau) \geq ER^{DC}_{d,i}(\tau), \quad \forall i \in I, \ \forall d \in D. \tag{12} \]

IV. HEURISTIC DESCRIPTIONS

A. Problem Statement

The GDRM allocates the incoming workload to specific nodes within each data center. The GLD problem is NP-hard [3], and therefore we propose three heuristics for the GDRM (FDLD-TAO, FDLD-CL, and GALD-CL), each having a different level of detail of the system available to it. The system as a whole is under-subscribed, i.e., all tasks must be completed without dropping. The objective of a GDRM is to minimize the monetary electricity cost of the geo-distributed system (the sum of Eq. 8 across all data centers) while ensuring the workload is completed (Eqs. 1 and 12).

B. Force Directed Load Distribution Heuristics

Force-directed load distribution (FDLD) is a variation of force-directed scheduling [16]. We adapt the FDLD proposed in [3], designated FDLD-SO, to our rate-based allocation environment, and propose two new FDLD heuristics (FDLD-TAO and FDLD-CL) that each possess different amounts of

co-location interference information to solve this problem. In FDLD-SO, so that the technique can meet the execution rate constraint at a given data center (Eq. 12) despite co-location interference, we apply simple over-provisioning (SO) to compensate for the resulting performance degradation. This technique over-provisions all task types equally by scaling estimated task execution rates by the factor $\phi^C$. The FDLD-TAO technique improves upon FDLD-SO by using task-aware over-provisioning (TAO), which estimates co-location effects using a factor specific to each task type $i$, $\phi^C_i$. For both FDLD-SO and FDLD-TAO, the degree of over-provisioning ($\phi^C$ and $\phi^C_i$, respectively) is determined empirically. Lastly, the FDLD-CL heuristic uses the co-location models given in Sec. III-B8 to account for co-location effects when calculating task execution rates. All versions of FDLD consider a system implementation where the computing time of each core in a node is evenly divided among its assigned tasks.

The fundamental operation of all FDLD variants is described in Algorithm 1. To generate the initial solution, every node in every data center in every epoch is assigned to execute all task types (step 1). Each iteration of the FDLD removes one instance of one task type from a single node, selecting the removal that would result in the lowest total system force, $F^S$ (steps 3-20). $F^S$ is the sum of the execution rate forces ($F^{ER}(\tau)$) and cost forces ($F^C(\tau)$) across all epochs. The execution rate force $F^{ER}$ is based on the ratio of the task execution rate (calculated during steps 6-11) to the task arrival rate. Task execution rate is a function of the P-state of the node the task is executing on, but the FDLD is not designed to make DVFS decisions to set the execution rates of task types. Therefore, an average execution rate must be determined for all task types using the average node utilization factor $\phi^E_j$ for each node type $j$.
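To make the difference between the three FDLD variants concrete, the sketch below contrasts how each one might estimate the execution rate of a task on a core: a single scaling factor (FDLD-SO), a per-task-type scaling factor (FDLD-TAO), and the reciprocal of the linear co-located execution time model of Sec. III-B8 (FDLD-CL). The function names, the coefficient ordering `(u, v, w, x, y, z)`, and all numeric values are hypothetical; the empirically determined factors used in the experiments are not given in the text.

```python
from typing import Mapping, Sequence


def rate_simple_overprovision(base_rate: float, phi_c: float) -> float:
    """FDLD-SO style: scale every task type's rate by one common factor."""
    return base_rate * phi_c


def rate_task_aware(base_rate: float, task_type: str,
                    phi_c_by_task: Mapping[str, float]) -> float:
    """FDLD-TAO style: scale by a factor specific to the task type."""
    return base_rate * phi_c_by_task[task_type]


def rate_colocation_model(coeffs: Sequence[float],
                          features: Sequence[float]) -> float:
    """FDLD-CL style: rate = 1 / CET, with CET a linear model of the features.

    coeffs   = (u, v, w, x, y, z): weights for features A..E plus constant z.
    features = (A, B, C, D, E): co-located app count, base execution time,
               clock frequency, average memory intensity, own memory intensity.
    """
    *weights, constant = coeffs
    if len(weights) != len(features):
        raise ValueError("expected one weight per feature plus a constant")
    cet = sum(w * f for w, f in zip(weights, features)) + constant
    return 1.0 / cet


# Hypothetical numbers, chosen only to exercise the three estimators.
so_rate = rate_simple_overprovision(base_rate=10.0, phi_c=0.8)
tao_rate = rate_task_aware(10.0, "canneal", {"canneal": 0.6, "swaptions": 0.95})
cl_rate = rate_colocation_model(coeffs=(0.5, 1.0, 0.0, 0.0, 0.0, 0.0),
                                features=(2.0, 4.0, 3.0, 0.1, 0.2))
```

The design trade-off is the one the text describes: the scaling factors are cheap but must be set conservatively, whereas the model-based estimate needs per-class regression coefficients but can avoid provisioning more nodes than necessary.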
Let $ER_{j,i}(P_{MAX})$ and $ER_{j,i}(P_0)$ be the execution rates of task type $i$ running on a single core of a node of type $j$ in the highest numbered P-state and lowest numbered P-state, respectively. Therefore, the equivalent single core execution rate $R_{j,i}$ of task type $i$ on node type $j$ is

\[ R_{j,i} = ER_{j,i}(P_{MAX}) + \left[ ER_{j,i}(P_0) - ER_{j,i}(P_{MAX}) \right] \cdot \phi^E_j. \tag{13} \]

Let $N_{d,j}$ be the number of nodes of type $j$ in data center $d$. Let $W_{d,j,m}(\tau)$ be the set of task type instances placed on node $m$ of node type $j$ in data center $d$ during epoch $\tau$. Let $Q_{d,j,i}(\tau)$ be the equivalent number of nodes of type $j$ running task type $i$ in data center $d$ during epoch $\tau$, given by

\[ Q_{d,j,i}(\tau) = \sum_{m=1}^{N_{d,j}} \begin{cases} \dfrac{1}{|W_{d,j,m}(\tau)|} & \text{if } i \in W_{d,j,m}(\tau) \\ 0 & \text{else.} \end{cases} \tag{14} \]

Let $K_j$ be the number of cores in a node of type $j$. The average estimated execution rate $ER^E_{j,i}(\tau)$ of task type $i$ on machine type $j$ during epoch $\tau$, when using either the FDLD-SO or FDLD-TAO versions, is given by

\[ ER^E_{j,i}(\tau) = \sum_{d=1}^{D} \sum_{j \in J_d} K_j \cdot R_{j,i} \cdot F \cdot Q_{d,j,i}(\tau) \tag{15} \]

subject to the constraint

\[ ER^E_{j,i}(\tau) \geq AR_i(\tau), \quad \forall i \in I. \tag{16} \]

To compensate for performance degradation due to co-location effects, node over-provisioning is accomplished by the factor $F$. $F$ is replaced by either $\phi^C$ or $\phi^C_i$ in Eq. 15 when using either FDLD-SO or FDLD-TAO, respectively.

Algorithm 1 Pseudo-code for FDLD heuristics
 1. allocate an instance of each task type to every node in every data center in every epoch
 2. while
 3.    for each node with tasks still allocated to it
 4.       for each task type on the node
 5.          temporarily remove task type from node
 6.          if FDLD-CL
 7.             estimate execution rates using Eq. 11 ($CER_i$)
 8.          else if FDLD-TAO
 9.             estimate execution rates using Eq. 15 and $\phi^C_i$
10.          else if FDLD-SO
11.             calculate execution rates using Eq. 15 and $\phi^C$
12.          estimate power costs using Eq. 18
13.          calculate $F^S$ from $F^{ER}$ and $F^C$
14.          if execution rate constraints are not violated (Eq. 12 for FDLD-CL, Eq. 16 for FDLD-SO and FDLD-TAO)
15.             add to set of possible task removal operations
16.          restore task type to node
17.    if set of possible task removal operations is empty
18.       break
19.    else
20.       choose and implement the task type removal operation that would result in the lowest $F^S$
21. end while
22. calculate final execution rates ($CER_i(\tau)$, $\forall i \in I$, $\forall \tau \in N_\tau$)
23. calculate final cost from sum of power costs and allocation costs ($PC_d(\tau)$ and $AC_d(\tau)$, $\forall d \in D$, $\forall \tau \in N_\tau$)

The execution rate force $F^{ER}$ is calculated using

\[ F^{ER}(\tau) = \sum_{i \in I} \left( e^{\left( \frac{Z}{AR_i(\tau)} - 1 \right)} - 1 \right). \tag{17} \]

When considering the FDLD-CL heuristic, the term $Z$ is replaced by $CER_i(\tau)$; it is replaced by $ER^E_{j,i}(\tau)$ when using either FDLD-SO or FDLD-TAO. Observe that $F^{ER}(\tau)$ decreases to zero as the ratio of $Z$ to $AR_i(\tau)$ decreases to one.

Recall that $R_d(\tau)$ is the renewable power available at data center $d$ during epoch $\tau$. For all FDLD variants, let $PC^E_d(\tau)$ be the estimated power cost at data center $d$ during epoch $\tau$, calculated as

\[ PC^E_d(\tau) = E^{price}_d(\tau) \left[ \sum_{j \in J_d} \sum_{m=1}^{N_{d,j}} P^E_{d,j,m} \cdot \left( 1 + \frac{1}{CUP_d} \right) \cdot Eff_d - R_d(\tau) \right] \tag{18} \]

where

\[ P^E_{d,j,m} = \begin{cases} P^{Sleep}_j & \text{if } |W_{d,j,m}| = 0 \\ \phi^E_j P^D_j + P^S_j & \text{else.} \end{cases} \tag{19} \]

Let $C^{actual}_d(\tau)$ be the sum of power ($PC^E_d(\tau)$) and allocation ($AC_d(\tau)$) costs incurred at data center $d$ during epoch $\tau$. Let $C^{max}_d(\tau)$ be the maximum real power cost possible at data center $d$, calculated using

\[ C^{max}_d(\tau) = E^{price}_d(\tau) \sum_{j \in J_d} N_{d,j} \left[ \phi^E_j P^D_j + P^S_j \right]. \tag{20} \]

The cost force $F^C$ can then be calculated with

\[ F^C(\tau) = \sum_{d=1}^{D} \left( e^{\frac{C^{actual}_d(\tau)}{C^{max}_d(\tau)}} - 1 \right). \tag{21} \]

Observe that the value of $F^C$ goes to zero as the ratio of $C^{actual}_d(\tau)$ to $C^{max}_d(\tau)$ decreases to zero.
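The two force terms of Eqs. 17 and 21 can be sketched for a single epoch as follows. This is an illustrative reading of the formulas, assuming that the estimated execution rate of each task type (the term Z) and the per-data-center actual and maximum costs have already been computed; the function names and example values are not from the original work.

```python
import math
from typing import Sequence


def execution_rate_force(est_rates: Sequence[float],
                         arrival_rates: Sequence[float]) -> float:
    """Eq. 17 sketch: sum over task types of exp(Z / AR - 1) - 1.

    The term is zero when the estimated rate exactly equals the arrival
    rate and grows as the system executes faster than required.
    """
    return sum(math.exp(z / ar - 1.0) - 1.0
               for z, ar in zip(est_rates, arrival_rates))


def cost_force(actual_costs: Sequence[float],
               max_costs: Sequence[float]) -> float:
    """Eq. 21 sketch: sum over data centers of exp(C_actual / C_max) - 1."""
    return sum(math.exp(c / c_max) - 1.0
               for c, c_max in zip(actual_costs, max_costs))


# Hypothetical single-epoch values: two task types, two data centers.
f_er = execution_rate_force(est_rates=[60.0, 25.0], arrival_rates=[50.0, 25.0])
f_c = cost_force(actual_costs=[40.0, 10.0], max_costs=[100.0, 80.0])
epoch_force = f_er + f_c  # one epoch's contribution to the total system force
```

Both terms reward removals that shed excess capacity: lowering an over-provisioned execution rate toward the arrival rate reduces the first force, and powering down nodes reduces the second.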

Let $N_\tau$ be the total number of epochs being considered. The total system force across all epochs, $F^S$, is calculated as

\[ F^S = \sum_{\tau=1}^{N_\tau} \left( F^{ER}(\tau) + F^C(\tau) \right). \tag{22} \]

C. Genetic Algorithm Heuristic

We also designed a third heuristic: a genetic algorithm load distribution with full co-location awareness (GALD-CL). The GALD-CL heuristic (Algorithm 2) has two parts: a genetic algorithm based GDRM and a greedy heuristic serving as the fitness function of the genetic algorithm. The GALD-CL assigns fractions of the global task arrival rate to each of the data centers in the simulation (step 3), with the arrival rates of each task type $i$ at each data center acting as the genes of the chromosomes. Using the task arrival rates assigned to each data center by the genetic algorithm at the geo-distributed level, the local greedy heuristic assigns task types to execute on specific nodes (steps 5-15). If the greedy heuristic finds that the task arrival rate assigned to a data center exceeds the capacity of that data center (step 16), the global arrival rates are adjusted slightly and the chromosome is evaluated once again (steps 5-15), with further adjustments made to the global allocations within the chromosome until a valid solution can be reached. The greedy heuristic has full knowledge of the entire system model, including the co-location models and task-node power models, allowing it to make better placement decisions.

The GALD-CL heuristic addresses two potential shortcomings of the FDLD variants. First, the nature of the FDLD variants prevents them from making DVFS decisions. The greedy heuristic in the GALD-CL approach chooses the most efficient P-state for each task type on each node type [11]. Second, the FDLD variants are susceptible to becoming trapped in local minima. The genetic algorithm portion of the GALD-CL approach intrinsically enables escape from local minima, allowing a more complete search of the solution space.

Algorithm 2 Pseudo-code for GALD-CL heuristic
 1. create an initial population of chromosomes
 2. while within time limit do
 3.    perform selection, crossover and mutation to create new chromosomes
 4.    for each new chromosome, evaluate:
 5.       for each data center
 6.          find most efficient P-state for all task type/node type pairs
 7.          sort all task type/node type pairs by efficiency
 8.          while power constraint not violated do
 9.             choose first task type/node type pair
10.             assign 100% desired fraction of time for selected task type to a single core from selected node type
11.             remove core from future consideration
12.             if no cores within selected node type available
13.                remove task type/node type pair from use
14.             set CRAC outlet temperatures to hottest temperatures such that thermal constraints are met
15.          end while
16.       if solution is invalid
17.          modify chromosome, return to step 5
18.    trim population (with elitism)
19. end while
20. take final allocation from the best chromosome
21. calculate final execution rates ($CER_i(\tau)$, $\forall i \in I$, $\forall \tau \in N_\tau$)
22. calculate final cost from sum of power costs and allocation costs ($PC_d(\tau)$ and $AC_d(\tau)$, $\forall d \in D$, $\forall \tau \in N_\tau$)

V. SIMULATION RESULTS

A. Experimental Setup

Experiments were conducted for groups of four, eight, and sixteen data centers. The site locations for data centers were selected so that each group would have a fairly even east coast to west coast distribution to better exploit TOU pricing and renewable power. Each data center consists of 4,320 nodes arranged in four aisles, and is heterogeneous within itself, having nodes from either two or three of the node types given in Table I, with most locations having three node types and per-node core counts that range from 4 to 12 cores depending on the mix of node types.

TABLE I. NODE PROCESSOR TYPES USED IN EXPERIMENTS
Intel processor    # cores    L3 cache    frequency range
Xeon E3-1225v3     4          8MB         0.8-3.20 GHz
Xeon E5649         6          12MB        1.60-2.53 GHz
Xeon E5-2697v2     12         30MB        1.20-2.70 GHz

The electricity prices used during experiments were taken directly from Pacific Gas and Electric (PG&E) Schedule E-19 [17]. Each data center had an installed renewable power generating capacity equivalent to 20% of the maximum power consumption of the location during the month with the highest generated power for that location. Renewable power data was obtained from [18], where each location uses either wind power, solar power, or a combination of the two. Sleep power for all nodes is calculated as a fixed percentage of static power for each node type, assumed to be 16% based on a recent study of node power states [19]. The average node utilization factor used during FDLD allocations, $\phi^E$, is set as 0.75. The coefficient to approximate CRAC unit power at data center $d$ ($CUP_d$) was determined empirically by simulating workloads of multiple levels at each data center location, and its value ranged between 1.43 and 2.08 for the different configurations. The time of each epoch $\tau$ was set to be one hour. The time required to transition a node to or from a sleep state, $T^S$, was assumed to be five minutes. The GALD-CL heuristic was limited to a run time of one hour for each epoch it was solving for, to mimic the representative time of each epoch. The FDLD heuristics for four, eight, and sixteen locations completed on average in one, four, and thirteen minutes per epoch simulated, respectively.

Each of five task types is representative of a different benchmark from the PARSEC benchmark suite. Task execution times and co-located performance data were obtained from running the benchmark applications on the nodes listed in Table I [15]. Synthetic task arrival rates were constructed that follow a sinusoidal pattern, peaking during business hours and declining during the evening and until the next morning.
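A synthetic sinusoidal arrival pattern of the kind described above could be generated as in the sketch below. The mean rate, amplitude, and peak hour are placeholder assumptions; the source states only that the rates peak during business hours and decline overnight, and does not give the parameters it used.

```python
import math
from typing import List


def sinusoidal_arrival_rates(mean_rate: float, amplitude: float,
                             peak_hour: int = 14, epochs: int = 24) -> List[float]:
    """Return one arrival rate per hourly epoch, peaking at `peak_hour`.

    A cosine centered on the peak hour gives the maximum at that hour and
    the minimum half a day later. All parameters are hypothetical.
    """
    return [mean_rate + amplitude * math.cos(2.0 * math.pi * (h - peak_hour) / epochs)
            for h in range(epochs)]


# Hypothetical workload: 100 tasks/s on average, swinging by +/- 40 tasks/s.
rates = sinusoidal_arrival_rates(mean_rate=100.0, amplitude=40.0)
busiest_epoch = rates.index(max(rates))
quietest_epoch = rates.index(min(rates))
```

Keeping the amplitude below the mean rate ensures that every epoch has a positive arrival rate.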
It can be observed that the FDLD-CL technique, using the co-location models, performs the best of the FDLD variants at provisioning the minimum number of nodes necessary to meet execution rate requirements. The FDLD-SO technique performed the worst, severely over-provisioning nodes. The GALD-CL heuristic outperformed all other approaches. While not performing as well as the GALD-CL, the FDLD variants do have the advantage of reaching a solution more quickly, which may be beneficial in some cases.

Fig. 2. System costs across twenty-four epoch period for each heuristic, four locations.

C. Workload Type Analysis

The experiment in Section V-B used a workload that was a mix of memory-intensive and CPU-intensive task types. Fig. 3 shows experiments for the FDLD-CL and GALD-CL heuristics for a group of four data centers where two additional workload types have been added: one where all of the tasks are highly memory-intensive (using data from the canneal, cg, ua, sp, and lu benchmarks), and one where the tasks are highly CPU-intensive (using data from the fluidanimate, blackscholes, bodytrack, ep, and swaptions benchmarks). The composition of data center workloads can vary greatly and can impact the resource requirements, and these experiments show that the techniques presented in this work will perform well for a variety of workload types.

Fig. 3. System costs across twenty-four epoch period for different workload types, for a group of four locations. FDLD-CL shown as solid line, and GALD-CL shown as dashed line.

D. Scalability Analysis

Additional experiments were conducted using groups of eight and sixteen separate data centers. For each of the data center group sizes, the average performance improvement of each technique over the FDLD-SO method is given in Table II. It should be noted that as the number of data centers in the group grows larger, the time for the FDLD variants to reach a solution increases, and the number of GALD-CL generations that can take place within the time limit decreases. As previously mentioned, the increase in the runtime of the FDLD heuristics was very manageable as the number of data centers in the group increased.

TABLE II. MONETARY COST REDUCTION COMPARED TO FDLD-SO [3]
Heuristic    4 data centers    8 data centers    16 data centers
FDLD-TAO     5.2%              4.6%              5.7%
FDLD-CL      14.2%             15.7%             18.8%
GALD-CL      39.7%             39.2%             36.8%

VI. CONCLUSIONS

We proposed three heuristics for workload allocation across geographically distributed data centers.
In this work, we explored adding different levels of knowledge of the system, particularly co-location interference, to geographical workload distribution algorithms. We demonstrated that including additional information about the co-location interference in the decision process of the heuristics resulted in a lower energy cost by reducing or eliminating node over-provisioning while still meeting all required workload execution rates. Our FDLD-CL and GALD-CL heuristics resulted on average in 10% and 37%, respectively, lower total cost than the prior work (represented by the FDLD-SO heuristic) [3]. In systems where the workload profile changes rapidly and therefore requires short epochs (a few minutes), we recommend FDLD-CL. When the workload profile is not changing rapidly and workload distribution decisions are given more time (an hour), GALD-CL is a more suitable technique.

REFERENCES

[1] "Data center locations," http://www.google.com/about/datacenters/inside/locations/index.html.
[2] Y. Li et al., "Operating cost reduction for distributed data centers," in CCGRID '13, May 2013, pp. 589-596.
[3] H. Goudarzi and M. Pedram, "Geographical load balancing for online service applications in distributed datacenters," in CLOUD '13, June 2013, pp. 351-358.
[4] L. Gu et al., "Joint optimization of VM placement and request distribution for electricity cost cut in geo-distributed data centers," in ICNC '15, Feb. 2015, pp. 717-721.
[5] J. Zhao et al., "Dynamic pricing and profit maximization for the cloud with geo-distributed data centers," in INFOCOM '14, Apr. 2014, pp. 118-126.
[6] L. Gu et al., "Optimal task placement with QoS constraints in geo-distributed data centers using DVFS," IEEE Trans. Comp., vol. 64, no. 7, pp. 2049-2059, June 2015.
[7] H. Xu, C. Feng, and B. Li, "Temperature aware workload management in geo-distributed data centers," IEEE Trans. Parallel and Distributed Systems, vol. 26, no. 6, pp. 1743-1753, May 2015.
[8] M. Polverini et al., "Thermal-aware scheduling of batch jobs in geographically distributed data centers," IEEE Trans. Cloud Comp., vol. 2, no. 1, pp. 71-84, Apr. 2014.
[9] D. Mehta, B. O'Sullivan, and H. Simonis, "Energy cost management for geographically distributed data centres under time-variable demands and energy prices," in UCC '13, Dec. 2013, pp. 26-33.
[10] J. Moore et al., "Making scheduling 'cool': Temperature-aware workload placement in data centers," in ATEC '05, Apr. 2005, pp. 61-75.
[11] M. A. Oxley et al., "Thermal, power, and co-location aware resource allocation in heterogeneous high performance computing systems," in IGCC '14, Nov. 2014, 10 pp.
[12] X. Deng et al., "Eco-aware online power management and load scheduling for green cloud datacenters," IEEE Sys. Journal, no. 99, pp. 1-10, 2014.
[13] H. Bhagwat et al., "Fast and accurate evaluation of cooling in data centers," J. Electronic Packaging, vol. 137, no. 1, Mar. 2015, 9 pp.
[14] S. Govindan et al., "Cuanta: Quantifying effects of shared on-chip resource interference for consolidated virtual machines," in SOCC '11, Oct. 2011, pp. 1-14.
[15] D. Dauwe et al., "A methodology for co-location aware application performance modeling in multicore computing," in APDCM '15, May 2015, pp. 434-443.
[16] P. Paulin and J. Knight, "Force-directed scheduling for the behavioral synthesis of ASICs," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 8, no. 6, pp. 661-679, Jun. 1989.
[17] Pacific Gas and Electric Company, "Electric schedule E-19," Apr. 2015, http://www.pge.com/tariffs/tm2/pdf/elec_SCHEDS_E-19.pdf.
[18] NREL, "National solar radiation database," https://mapsbeta.nrel.gov/nsrdb-viewer/, April 2015.
[19] C. Isci et al., "Agile, efficient virtualization power management with low-latency server power states," in ISCA '13, June 2013, pp. 96-107.