Scheduling and Monitoring of Internally Structured Services in Cloud Federations Lars Larsson, Daniel Henriksson and Erik Elmroth {larsson, danielh, elmroth}@cs.umu.se
Where are the VMs now?
Cloud hosting:
- No detailed information about current deployment
- No way of specifying relationships to other VMs
- No control over scheduling decisions made regarding the VMs
Cloud federations:
- No control over which Cloud provider is executing each VM in federated Clouds (location unawareness)
Where should each VM be?
It is possible to specify the internal relationships between VMs and affect the placement of the service via constraints, without managing the infrastructure in detail.
Constraints:
- express intra-service, inter-component relationships
- are defined at the beginning of the service lifecycle
- are preserved for the entire service duration
- offer influence over placement decisions, but not full control
What is this talk about?
Our ongoing and early work on constraint-driven Cloud (IaaS) management:
- a way of defining service structure and placement constraints
- a model and heuristic for scheduling in Cloud federations that abides by the constraints
- a monitoring data distribution architecture that provides the data upon which the scheduler bases its decisions
Federations of Clouds
Example: What do we want?
A 3-tier web application:
- deployed completely in Europe
- all components connected to an internal network
- front-ends accessible via an external network
Conditions:
- primary and secondary database replicas may not be deployed on the same host
- no secondary database replica may be deployed on the same host as another secondary database replica
...and these conditions must be retained even as parts of the service are deployed on remote sites!
How do we get it?
Definition of service components:
- component types act as templates for instances
- several instances can be instantiated from each type
Inter-component affinity and anti-affinity:
- levels: {geographical, site, host}
- for a given level and set of components, either requires or forbids co-placement
Constraint scope
Constraints can have type or instance scope.
Type scope relates instances of different types:
- "Primary and secondary database replicas may not be deployed on the same host"
Instance scope relates individual instances, regardless of their type:
- "Deployed completely in Europe"
- "No secondary database replica may be deployed on the same host as another secondary database replica"
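To make the constraint model concrete, here is a minimal sketch of how a scheduler might represent AA-constraints with their level and scope. All type and class names (AAConstraint, Level, Scope) and the component identifiers are illustrative assumptions, not the authors' actual data model.

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    GEOGRAPHICAL = "geographical"
    SITE = "site"
    HOST = "host"

class Scope(Enum):
    TYPE = "type"          # relates instances of the listed component types
    INSTANCE = "instance"  # relates individual instances, regardless of type

@dataclass(frozen=True)
class AAConstraint:
    affinity: bool      # True = require co-placement, False = forbid it
    level: Level        # granularity at which the rule applies
    scope: Scope
    members: frozenset  # component types (TYPE scope) or instance ids (INSTANCE scope)

# "Primary and secondary database replicas may not be deployed on the same host"
no_shared_db_host = AAConstraint(
    affinity=False, level=Level.HOST, scope=Scope.TYPE,
    members=frozenset({"db-primary", "db-secondary"}),
)

# "Deployed completely in Europe", expressed as a geographical affinity
# among all instances of the service (instance scope)
keep_in_europe = AAConstraint(
    affinity=True, level=Level.GEOGRAPHICAL, scope=Scope.INSTANCE,
    members=frozenset({"fe-1", "fe-2", "app-1", "db-primary", "db-sec-1"}),
)
```

Frozen dataclasses keep constraints immutable, matching the idea that they are fixed at the beginning of the service lifecycle.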
Types
Node Type         Abbr.  Description
Service Root             Common ancestor for all service components.
Compute Resource  C      Compute resource, which can be connected to networks and storage units.
AA-constraint     A      Metadata used by the scheduler to determine placement according to affinity and anti-affinity rules. Scope must be specified as either type or instance.
Block Storage     Sb     Mountable data storage for a Compute resource. Cf. Amazon EBS.
File Storage      Sf     Data storage that may be accessed by multiple Compute resources simultaneously. Cf. Amazon S3.
Internal Network  Ni     Internal network for all underlying Compute resources and File storages.
External Network  Ne     External network connection (IP address) for the parent Compute or File storage resource.
Type Relations
Example
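A hedged sketch of how a service tree built from the node types above might be encoded, using the 3-tier example: a root with networks, compute resources, storages, and an AA-constraint node as children. The nested-dict encoding and all names are hypothetical; the actual representation is given in the example figure.

```python
# Hypothetical encoding of a service tree: Root -> networks, compute
# resources (with attached storage/external networks), and AA-constraints.
service = {
    "type": "Root",
    "children": [
        {"type": "Ni", "name": "internal-net"},  # internal network
        {"type": "C", "name": "fe-1",            # front-end with external net
         "children": [{"type": "Ne", "name": "public-ip-1"}]},
        {"type": "C", "name": "db-primary",      # database with block storage
         "children": [{"type": "Sb", "name": "db-volume"}]},
        {"type": "C", "name": "db-sec-1"},       # secondary database replica
        {"type": "A", "scope": "type", "affinity": False,
         "level": "host", "members": ["db-primary", "db-sec-1"]},
    ],
}

def count_nodes(node):
    """Count all nodes in the service tree, the given root included."""
    return 1 + sum(count_nodes(c) for c in node.get("children", []))
```

The AA-constraint appears as an ordinary child node carrying placement metadata, mirroring the "A" row of the type table.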
Scheduling (VM placement)
- Schedulers create mappings of sets of VMs to host machines (or remote Clouds) that maximize some benefit function (e.g. profit, utilization, reputation).
- In Cloud federations, remote Clouds can be regarded as logical hosts with different characteristics (e.g. network connectivity, topology, and bandwidth).
- The general problem is NP-complete.
Model for scheduling
- V: set of all VMs
- H: set of host machines (including remote sites)
- M: set of mappings m_(v,h) of VMs v in V to hosts h in H
- B: benefit gained from deploying the VMs in V
- C: cost function of a mapping
- S: estimated costs due to the risk of SLA violations incurred when migrating from one mapping to another
The goal is to find a new mapping M' that maximizes the benefit after subtracting the cost and the potential penalties due to SLA violations.
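The objective above can be sketched as picking, among candidate mappings M', the one maximizing B(M') - C(M') - S(M, M'). The toy cost, benefit, and risk functions below are invented for illustration only, not the paper's actual functions.

```python
def net_benefit(new_m, old_m, benefit, cost, sla_risk):
    """B(M') - C(M') - S(M, M'): the net value of switching to new_m."""
    return benefit(new_m) - cost(new_m) - sla_risk(old_m, new_m)

def best_mapping(candidates, old_m, benefit, cost, sla_risk):
    """Among candidate mappings M', pick the one with highest net benefit."""
    return max(candidates,
               key=lambda m: net_benefit(m, old_m, benefit, cost, sla_risk))

# Toy example: mappings as dicts VM -> host; all numbers are invented.
old = {"v1": "h1", "v2": "h2"}
consolidated = {"v1": "h1", "v2": "h1"}
benefit = lambda m: 10 * len(m)                            # reward per placed VM
cost = lambda m: 5 * len(set(m.values()))                  # each used host costs 5
sla_risk = lambda o, n: 2 * sum(o[v] != n[v] for v in o)   # per-migration risk

chosen = best_mapping([old, consolidated], old, benefit, cost, sla_risk)
# Consolidating wins here: the saved host cost (5) outweighs the
# migration's SLA-violation risk (2).
```

Note how S depends on both the old and the new mapping: staying put incurs no migration risk, so a move must pay for itself.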
Constraints-driven heuristics
The key is to modify mappings, i.e. to perform migrations. Which migrations are good? Finding the optimum is NP-complete, but AA-constraints help us define a heuristic:
- if we migrate a VM that has affinity to others, we must move those VMs as well
- anti-affinities rule out certain migrations (or trigger a series of migrations of other VMs)
The risk of SLA violations can be assessed using:
- long-term monitoring data, to predict spikes
- short-term monitoring of VM activity
- estimation of the total data transfer caused by the VM migrations
We use this to suggest only such migrations (changes of mappings) that have a low risk of violating SLAs for sets of VMs related by AA-constraints.
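The two heuristic rules can be sketched as follows: expand a proposed migration to the transitive closure of affinities (the whole group must move together), then reject it if the resulting placement would violate an anti-affinity. The plain-dict constraint format and all identifiers are assumptions for illustration.

```python
def affinity_group(vm, constraints):
    """Transitive closure of affinities: if vm moves, every VM it has
    (direct or indirect) affinity with must move as well."""
    group = {vm}
    changed = True
    while changed:
        changed = False
        for c in constraints:
            if c["affinity"] and group & set(c["members"]):
                new = set(c["members"]) - group
                if new:
                    group |= new
                    changed = True
    return group

def migration_allowed(vm, target_host, mapping, constraints):
    """Would moving vm's whole affinity group to target_host violate
    any host-level anti-affinity constraint?"""
    group = affinity_group(vm, constraints)
    new_mapping = dict(mapping, **{v: target_host for v in group})
    for c in constraints:
        if not c["affinity"]:
            hosts = [new_mapping[v] for v in c["members"]]
            if len(hosts) != len(set(hosts)):  # forbidden co-placement
                return False
    return True

# Illustrative scenario: app-1 and cache-1 must stay together;
# the two database replicas must stay apart.
constraints = [
    {"affinity": True, "members": ["app-1", "cache-1"]},
    {"affinity": False, "members": ["db-primary", "db-sec-1"]},
]
mapping = {"app-1": "h1", "cache-1": "h1", "db-primary": "h2", "db-sec-1": "h3"}
```

A real scheduler would combine this feasibility check with the monitoring-based SLA-risk estimate to rank the surviving candidate migrations.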
Monitoring
- Scheduling requires pertinent, up-to-date information.
- Contemporary monitoring systems are incompatible with each other, which is troublesome for Cloud federations.
- Semantic metadata can help remove this technical barrier!
We introduce the Medici monitoring data distribution architecture:
- plugins translate the specific data formats of the underlying monitoring systems
- designed for scalability: asynchronous publish/subscribe
- designed to handle both private and public data
Medici architecture
Medici architecture
- The distribution hub uses Google's PubSubHubbub protocol to notify subscribers when new data is available.
- Data is presented as an Atom feed with semantic metadata extensions, in a format whose content is based on that of libvirt.
- A SPARQL server is deployed as a subscriber, so that the scheduler can issue queries against it.
- The server can subscribe to data from remote sites as well, and can thus give the scheduler information from remote sites in a familiar format.
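As a sketch of the consumer side, here is how a subscriber might extract metric values from such an Atom feed. The feed below is a hypothetical stand-in: the entry layout, the `m:cpuLoad` extension element, and its namespace are invented for illustration, not Medici's actual libvirt-based format.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
MEDICI = "{http://example.org/medici}"  # hypothetical extension namespace

# Illustrative Atom feed with one monitoring entry per VM.
feed_xml = """\
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:m="http://example.org/medici">
  <title>host monitoring data</title>
  <entry><title>vm-42</title><m:cpuLoad>0.73</m:cpuLoad></entry>
  <entry><title>vm-43</title><m:cpuLoad>0.12</m:cpuLoad></entry>
</feed>
"""

def cpu_loads(xml_text):
    """Map each entry's title to its cpuLoad extension value."""
    root = ET.fromstring(xml_text)
    loads = {}
    for entry in root.findall(ATOM + "entry"):
        name = entry.find(ATOM + "title").text
        loads[name] = float(entry.find(MEDICI + "cpuLoad").text)
    return loads
```

In the actual architecture the hub pushes such feeds to subscribers asynchronously, and a SPARQL server ingests the semantic metadata so the scheduler can query it instead of parsing feeds directly.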
Summary
- Service structure and constraints give the service provider a reasonable amount of control over scheduling decisions.
- A scheduling model in which decisions are influenced by AA-constraints and monitoring data.
- Medici adds semantic metadata to bridge the technical gaps caused by incompatibility between monitoring systems in Cloud federations.
Future work directions
- Investigate a larger set of constraints for service structure than AA-constraints.
- Quantify the benefit of using this scheduling model compared to others.
- Formalize and evaluate the heuristic outlined here.
- Validate the scalability of the monitoring architecture.
- Determine reasonable sizes for collated data sets.
Thank you for your attention! Questions?
Service Representation
- Parts of services (compute nodes and file storages) may have different affinities affecting their placement.
- An affinity may be geographical or relate to other components in the service.
- Anti-affinity is an unwanted relation and follows the same pattern as affinity.
- We call the union of these AA-constraints.
- AA-constraints have two different scopes: type scope relates instances of different types, while instance scope relates individual instances, regardless of their type.