IaaS-Clouds in the MaDgIK Sky
Konstantinos Tsakalozos, PhD candidate
Advisor: Alex Delis
Research Topics
1. Nefeli: hint-based deployment of virtual infrastructures
2. How profit maximization drives resource allocation in highly scalable infrastructures
3. MigrateFS: towards a true share-nothing cloud
4. Tackling cloud heterogeneity
Nefeli: VM Placement
The idea behind Nefeli: the virtual-infrastructure consumer/user is aware of the operation and data flows among VMs. Can we harvest this information to tackle performance bottlenecks?
BUT: the physical cloud infrastructure must never be revealed to the cloud consumers.
Interfacing with Nefeli
- The consumer/user expresses a set of constraints/hints describing an ideal deployment.
- Nefeli takes these user constraints/wishes into consideration when VMs are mapped to physical machines (PMs).
- Example: VMs holding database replicas have to be deployed on different PMs.
- Example: VMs producing excessive network traffic should be co-deployed.
Constraints
- User constraints: VMs to be co-deployed, spread across physical machines (PMs), favored against others, data gravity.
- Administrative constraints: offload a PM, power saving.
- Solver: simulated annealing. You specify the time you are willing to spend producing the VM-to-PM mapping.
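A minimal sketch of how simulated annealing can search for a VM-to-PM mapping under such hints. The PM/VM names, penalty weights, and cooling schedule are illustrative assumptions, not Nefeli's actual cost model:

```python
import math
import random

random.seed(7)

PMS = ["pm0", "pm1", "pm2"]
VMS = ["db_a", "db_b", "web", "cache"]

# Toy hints: violated constraints add a penalty (weights are assumed).
ANTI_COLOCATE = [("db_a", "db_b")]   # DB replicas on different PMs
CO_LOCATE     = [("web", "cache")]   # heavy mutual traffic -> same PM

def cost(mapping):
    c = 0.0
    for a, b in ANTI_COLOCATE:
        if mapping[a] == mapping[b]:
            c += 10.0
    for a, b in CO_LOCATE:
        if mapping[a] != mapping[b]:
            c += 5.0
    return c

def anneal(steps=2000, temp=5.0, cooling=0.995):
    current = {vm: random.choice(PMS) for vm in VMS}
    best = dict(current)
    for _ in range(steps):
        # Neighbor: move one random VM to another PM.
        cand = dict(current)
        cand[random.choice(VMS)] = random.choice(PMS)
        delta = cost(cand) - cost(current)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            current = cand
        if cost(current) < cost(best):
            best = dict(current)
        temp *= cooling  # a fixed step budget bounds the solver time
    return best

mapping = anneal()
print(mapping, cost(mapping))
```

The fixed number of annealing steps is what lets the user trade mapping quality for solver time, as the slide describes.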
Runtime Interaction
- The consumer/user expresses a set of states for her infrastructure; these states activate different constraints.
- State changes are trapped, and Nefeli migrates VMs to accommodate the user's wishes.
- Active hints may change over time, offering a dynamic virtual infrastructure.
Nefeli vs. Other Placement Policies
Simulation measuring end-node throughput against: random VM placement, balanced VM placement, and using as few hosting nodes as possible (power saving).
Nefeli in a Real Cloud
Nefeli achieves a 17% reduction in the time required to complete video and audio transcoding, compared to the default scheduler of OpenNebula 1.2.
2. Resource Allocation in Highly Scalable Infrastructures
- Highly scalable frameworks: the more resources consumed, the higher the performance. Do they scale linearly?
- Clouds: seemingly endless resources. Performance guarantees?
- How many resources (e.g., satellites, VMs) should we use for a scalable infrastructure?
Clouds... It Is All About Money
- Cost: pay for the resources you consume.
- Revenue: sell products coming from the processing taking place within the cloud.
- Budget function B: maps response time to revenue.
- Pay more -> reduce response time -> increase your revenue.
Finding the Maximum Profit Point
The max-profit point B* changes at runtime. Why?
- Some cloud resources are shared among users (disk, network I/O, CPU).
- Workloads (processing time) change based on input.
To specify B* we assume recurring user workloads: DB loads (day/night), index updates, query-execution-plan updates.
Finding the Maximum Profit Point
Recurring user workload: in each iteration compute marginal revenue (MR) and marginal cost (MC), and increase or decrease the number of VMs used accordingly so that MR == MC.
- When the current budget B is too far away from B*: increase/decrease VMs exponentially.
- When B is close to B*: increase/decrease VMs linearly.
- Revolve around an unchanged B*.
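The iteration above can be sketched numerically. The revenue curve (diminishing returns as extra VMs shrink response time), the per-VM cost, and the "near equilibrium" threshold are all assumed for illustration; this toy only grows the VM pool from a small start, whereas the full scheme also shrinks it:

```python
# Toy search for the max-profit VM count: exponential steps while far
# from the MR == MC equilibrium, linear steps when close to it.

def revenue(n):       # assumed diminishing-returns revenue curve
    return 100.0 * (1.0 - 0.9 ** n)

def cost(n):          # pay per VM consumed
    return 2.0 * n

def profit(n):
    return revenue(n) - cost(n)

def find_equilibrium(n=1, near=2.0):
    step = 1
    while True:
        mr = revenue(n + 1) - revenue(n)   # marginal revenue
        mc = cost(n + 1) - cost(n)         # marginal cost
        if mr <= mc and profit(n) >= profit(n + 1):
            return n                        # MR == MC: stop growing
        # far from equilibrium: double the step; near it: step by one
        step = step * 2 if mr - mc > near else 1
        n += step

best = find_equilibrium()
print(best, round(profit(best), 2))
```

With these curves the search doubles its step while the MR/MC gap is large and settles with single-VM steps once near the equilibrium.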
Applications - Evaluation
- Used by the cloud provider. Cost: the cloud's operational cost; revenue: per VM.
- Used by each consumer separately. Revenue: the degree of satisfaction the service offers.
- Resources are shared proportionally to the money offered.
Evaluation - Two Users
Evaluated using:
- Real infrastructures: elastic Hadoop/Condor.
- Simulations for large infrastructures.
Scenario: a single user computing Pi over and over again, with exponential and linear VM adjustments. A second user entering the cloud forces the equilibrium point to change.
3. A True Share-Nothing Cloud
- Suspend/resume VM migration is a show-stopper for load balancing: you must have shared storage facilities.
- Shared storage is a single point of failure and a performance bottleneck.
- Clouds are based on commodity hardware to be cost-effective.
MigrateFS. Why?
Distributed file systems have scaling issues and relaxed semantics, and offer much more than what clouds need.
The migration operation:
1. Sync the VM disk image between target and source PM.
2. Sync the VM RAM between target and source PM.
3. Instantly suspend the VM on the source and resume it on the target.
Step 1 must be assisted by the file system.
MigrateFS Prototype
Two modes of operation:
- "I need to move VM v from PM A to PM B in less than t seconds."
- "I need to move VM v from PM A to PM B with guaranteed VM I/O performance" (respect SLAs).
At any time you can get an estimate of the time the migration will take (it depends on the I/O load of the VM).
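A back-of-the-envelope sketch of how such a time estimate might be derived. The bandwidth and dirty-rate figures are assumed inputs, not values measured by MigrateFS:

```python
def migration_time_estimate(disk_gb, dirty_mb_s, bandwidth_mb_s):
    """Rough time to sync a VM disk image from source to target PM.

    While the image is being copied, the running VM keeps dirtying
    blocks, so the effective copy rate is bandwidth minus dirty rate.
    """
    effective = bandwidth_mb_s - dirty_mb_s
    if effective <= 0:
        raise ValueError("VM dirties data faster than we can copy it")
    return disk_gb * 1024 / effective  # seconds

# 20 GB image, VM writing 5 MB/s, 100 MB/s link between the PMs
print(round(migration_time_estimate(20, 5, 100), 1))
```

This also hints at the two modes: a deadline t bounds the answer of this estimate, while guaranteeing VM I/O performance caps how much of the bandwidth the sync may take, lengthening it.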
4. Handling Heterogeneity
How we dealt with heterogeneity:
- Organize physical nodes into sites.
- Specially crafted VMs able to boot in multiple sites.
- Universal instantiation/configuration schema.
Heterogeneity as a challenge:
- Sky computing: a cloud of clouds.
- System upgrades leaving old equipment operational.
- How do we balance load in a large non-homogeneous IaaS-cloud?
Load Balancing in IaaS-Clouds
Load balancing through VM migration:
- Live migration: almost no downtime; RAM is copied while the VM is online. Requirements: PMs share storage, compatible hypervisors.
- Suspend/resume: memory and disk content must be copied before resuming.
Load balancing is itself a costly (time & resources) operation.
VM Scheduling - Placement
Physical/virtual infrastructure properties:
- Resource availability, VM requirements (CPU, RAM, network).
- Topology: distance from repositories, neighboring nodes.
- Future load-balancing prospects.
- User-provided hints/constraints.
- System properties: compatibility (kernel, virtualization), features (high availability, RAID).
- Constraints set by already-deployed infrastructures.
Two-Phase VM Scheduling
How to form a site:
- Load-balancing prospects: favor site formation among PMs allowing live migration; when live-migration-enabled nodes are not enough, allow suspend/resume migration.
- The resources of the site must exceed those requested.
Site formation is formulated as a constraint-satisfaction problem; the VM-to-PM mapping is also a constraint-satisfaction problem (Nefeli).
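A greedy sketch of the first phase, under assumed PM attributes and a simple ranking; the actual system solves this as a constraint-satisfaction problem rather than greedily:

```python
# Phase 1 sketch: pick a hosting site for a request.

PMS = [
    {"name": "pm0", "free_cpu": 8,  "live": True},
    {"name": "pm1", "free_cpu": 4,  "live": True},
    {"name": "pm2", "free_cpu": 16, "live": False},
    {"name": "pm3", "free_cpu": 2,  "live": True},
]

def form_site(pms, cpu_needed):
    # Prefer live-migration-capable PMs; fall back to suspend/resume
    # ones only when the live-capable pool cannot cover the request.
    ranked = sorted(pms, key=lambda p: (not p["live"], -p["free_cpu"]))
    site, total = [], 0
    for pm in ranked:
        if total >= cpu_needed:
            break
        site.append(pm["name"])
        total += pm["free_cpu"]
    if total < cpu_needed:
        raise ValueError("cloud cannot satisfy the request")
    return site

print(form_site(PMS, 10))   # live-capable PMs suffice
print(form_site(PMS, 20))   # must also include a suspend/resume PM
```

Phase 2 would then run the Nefeli-style VM-to-PM mapping only inside the chosen site, which is what shrinks the search space.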
Elastic Solver
- Consume resources from the cloud to fill underutilized, isolated physical nodes.
- Simulated annealing is easily parallelizable through simultaneous executions.
- More resources yield better site formation and VM-to-PM mappings.
Results?
Reduction of the search space yields:
- Improvements in the time consumed.
- No degradation in VM-to-PM mapping quality compared to a one-phase approach.
Related Work
[Tsak11] K. Tsakalozos, H. Kllapi, E. Sitaridi, M. Roussopoulos, D. Paparas and A. Delis, "Flexible Use of Cloud Resources through Profit Maximization and Price Discrimination," ICDE 2011, Hannover, Germany, April 2011.
[Tsak10] K. Tsakalozos, M. Roussopoulos, V. Floros and A. Delis, "Nefeli: Hint-based Execution of Workloads in Clouds," ICDCS 2010, Genoa, Italy, June 2010.
[TsakF] K. Tsakalozos, M. Roussopoulos, and A. Delis, "VM Placement in non-homogeneous IaaS-Clouds," under review.
J. O. Kephart and D. M. Chess, "The Vision of Autonomic Computing," IEEE Computer, vol. 36, no. 1, pp. 41-50, 2003.
K. Lee, N. Paton, R. Sakellariou, and A. Fernandes, "Utility Driven Adaptive Workflow Execution," in Proc. of the 9th IEEE/ACM Int. Symposium on Cluster Computing and the Grid, Shanghai, PR China, 2009.
J. O. Kephart and R. Das, "Achieving Self-Management via Utility Functions," IEEE Internet Computing, 2007.
D. Grosu and A. Das, "Auctioning resources in Grids: model and protocols," Concurrency and Computation: Practice and Experience, vol. 18, no. 15, pp. 1909-1927, 2006.
Related Work (cont.)
K. Subramoniam, M. Maheswaran, and M. Toulouse, "Towards a Micro-Economic Model for Resource Allocation," in IEEE Canadian Conference on Electrical and Computer Engineering, IEEE Press, 2002.
H. R. Varian, Intermediate Microeconomics: A Modern Approach, 7th ed., W. W. Norton and Company, Dec. 2005, ch. 25, "Monopoly."
Y. Luo, B. Zhang, X. Wang, Z. Wang, Y. Sun, and H. Chen, "Live and incremental whole-system migration of virtual machines using block-bitmap," in IEEE International Conference on Cluster Computing, pp. 99-106, Sept.-Oct. 2008.
R. Bradford, E. Kotsovinos, A. Feldmann, and H. Schioberg, "Live wide-area migration of virtual machines including local persistent state," in Proc. of the 3rd International Conference on Virtual Execution Environments (VEE '07), 2007.
K. Keahey, M. Tsugawa, A. Matsunaga, and J. Fortes, "Sky Computing," IEEE Internet Computing, Sept.-Oct. 2009.
F. Hermenier, X. Lorca, J.-M. Menaud, G. Muller, and J. Lawall, "Entropy: a consolidation manager for clusters," in Proc. of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '09).
C. Hyser, B. McKee, R. Gardner, and B. J. Watson, "Autonomic virtual machine placement in the data center," HP Laboratories HPL-2007-189, 2008.