Cloud Computing and Datacenters
Prof. Sasu Tarkoma
University of Helsinki
Matemaattis-luonnontieteellinen tiedekunta (Faculty of Science) / Sasu Tarkoma, 22.1.2010
Contents
  Introduction
  Views to Cloud Computing
  Eucalyptus Open Source Platform
  MapReduce and Hadoop
  NEXUS/Mesos
  Network and Data Center Virtualization
  PSIRP
Focus
  Distributed applications and services on the cloud
    Collaboration & P2P
  Massive scale
    Millions and billions of users and data items
    Significant growth is expected in the mobile sector (a 50-fold increase by 2015?)
  Evaluation methods for wide-scale experiments
  Exploiting parallelism & locality
  Custom software & solutions
  Building blocks
    Interfaces, policies, protocols, algorithms, runtimes, platforms
  State-of-the-art examples: Amazon, Google, Yahoo, Microsoft
Cloud Computing
The services of cloud computing can be divided into three categories:
  1. Software-as-a-Service (SaaS), in which a vendor supplies the hardware infrastructure and the software product, and interacts with the user through a portal (software on demand, pay-as-you-go).
  2. Platform-as-a-Service (PaaS), in which a set of software and development tools is hosted on the provider's infrastructure, for example, Google's App Engine.
  3. Infrastructure-as-a-Service (IaaS), which involves virtual server instances with unique IP addresses and blocks of on-demand storage, for example, Amazon Web Services.
Cloud Computing
  Private, community, public, hybrid clouds
  Software as a Service (SaaS)
  Platform as a Service (PaaS)
  Infrastructure as a Service (IaaS)
  On-demand service
  Information demand and supply (Open APIs)
  Location-independent resource pooling
  Ubiquitous network access
  Elasticity
  Virtualization
  Web application frameworks
  Datacenters and clusters
  Browser as a platform
Datacenter for Experiments
  The University of Helsinki (UH) has a new datacenter for experimenting with networking technology, services, and computational science
  240 CPUs with 8 cores each, virtualization support (Ubuntu and KVM)
  Aim: support network virtualization with OpenFlow switches and NetFPGA software-defined routers
  Platform experiments with Eucalyptus
  Platform experiments with Nexus (joint work with ICSI and UCB)
  Pub/sub control and data planes for data centers
Toward New Deployments
  [Figure: layered diagram. Labels: New deployments; Services, applications; Distribution middleware; Networking; Solutions and Basic Processes; Datacenter / cluster / testbed]
Toward New Knowledge
  [Figure: layered diagram. Labels: New Knowledge on Distributed Systems; Real-life experiments; Wide-area simulation; Formal models; Evaluation Methodology; Datacenter / cluster / testbed]
Eucalyptus
  Elastic Utility Computing Architecture Linking Your Programs To Useful Systems
  Originally a research project at UC Santa Barbara
  Now a private company (Eucalyptus Systems)
  Web-services-based implementation of elastic/utility/cloud computing infrastructure
    Linux image hosting à la Amazon
  How do we know if it is a cloud?
    Try to emulate an existing cloud: Amazon AWS
  Functions as a software overlay
    Existing installation should not be violated (too much)
  Focus on installation and maintenance
Eucalyptus
  Supports KVM and Xen
  Open source platform with Amazon AWS features
  Support for Amazon AWS interfaces (Elastic Compute Cloud EC2, Simple Storage Service S3, and Elastic Block Storage EBS)
  Includes Walrus: an Amazon S3 interface-compatible storage manager
  Added support for elastic IP assignment
  Web-based interface for cloud configuration
  Image registration and image attribute manipulation
  Configurable scheduling policies and SLAs
  Support for multiple hypervisor technologies within the same cloud
Eucalyptus Usage
  Foster greater understanding and uptake of cloud computing
    Provide a vehicle for extending what is known about the utility model of computing
  Experimentation vehicle prior to buying commercial services
    Provide a development, debugging, and tech preview platform for public clouds
  Homogenize the local IT environment with public clouds
    Having AWS functionality locally makes moving to and using Amazon AWS easier, cheaper, and more sustainable
  Provide a basic software development platform for the open source community
Architecture
  [Figure: component diagram. Labels: Client-side Interface (via network); Client-side API Translator; Cloud Controller; Database; Walrus (S3); Cluster Controller; Node Controller]
Components
  Node Controller controls the execution, inspection, and termination of VM instances on the host where it runs.
  Cluster Controller gathers information about VM execution and schedules it on specific node controllers, and also manages the virtual instance network.
  Storage Controller (Walrus) is a put/get storage service that implements Amazon's S3 interface, providing a mechanism for storing and accessing virtual machine images and user data.
  Cloud Controller is the entry point into the cloud for users and administrators. It queries node managers for information about resources, makes high-level scheduling decisions, and implements them by making requests to cluster controllers.
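The division of labor between the three controllers can be illustrated with a minimal Python sketch. It is not Eucalyptus code: the class and method names, the core-count resource model, and the first-fit and most-free-cluster policies are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeController:
    """Runs on one host and tracks the VM instances placed there."""
    name: str
    free_cores: int
    instances: list = field(default_factory=list)

    def run_instance(self, instance_id: str, cores: int) -> None:
        self.free_cores -= cores
        self.instances.append(instance_id)


@dataclass
class ClusterController:
    """Gathers node state and picks a node for each VM (first fit, an assumption)."""
    name: str
    nodes: list

    def free_cores(self) -> int:
        return sum(n.free_cores for n in self.nodes)

    def schedule(self, instance_id: str, cores: int):
        for node in self.nodes:
            if node.free_cores >= cores:
                node.run_instance(instance_id, cores)
                return node.name
        return None  # no node in this cluster can host the instance


@dataclass
class CloudController:
    """Entry point: chooses a cluster, then delegates placement to it."""
    clusters: list

    def run_instance(self, instance_id: str, cores: int):
        # High-level decision (hypothetical policy): try the cluster
        # with the most free cores first.
        ranked = sorted(self.clusters, key=lambda c: c.free_cores(), reverse=True)
        for cluster in ranked:
            node = cluster.schedule(instance_id, cores)
            if node is not None:
                return cluster.name, node
        return None


cloud = CloudController([
    ClusterController("cluster-a", [NodeController("a1", 2), NodeController("a2", 8)]),
    ClusterController("cluster-b", [NodeController("b1", 4)]),
])
placement = cloud.run_instance("i-0001", 4)
print(placement)
```

The point of the sketch is the delegation chain: the cloud controller only picks a cluster, the cluster controller only picks a node, and the node controller is the only component that touches instances.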
Cloud Controller
Web services in three categories:
  Resource Services perform system-wide arbitration of resource allocations, let users manipulate properties of the virtual machines and networks, and monitor both system components and virtual resources.
  Data Services govern persistent user and system data and provide a configurable user environment for formulating resource allocation request properties.
  Interface Services present user-visible interfaces, handle authentication and protocol translation, and expose system management tools.
Virtual Interfaces
  Each VM instance is assigned a virtual interface that is connected to a software Ethernet bridge
  Cluster Controller configures VM traffic isolation and dynamic public IP assignment
  [Figure: physical resource hosting VM instances. Labels: VM Instance; Virtual Interface; Physical Interface; VLAN Tagged Virtual Interface; Software Ethernet Bridge; To physical Ethernet]
Walrus
  Walrus is a data storage service
    Leverages standard web services technologies (Axis2, Mule)
    Interface-compatible with Amazon's Simple Storage Service (S3)
  Walrus implements both the REST (via HTTP) interface, sometimes termed the Query interface, and the SOAP interface
  Users who have access to Eucalyptus can use Walrus to stream data into and out of the cloud, as well as from instances that they have started on nodes
  Walrus also acts as a storage service for VM images
What's It Made Out Of?
  Axis2 and Axis2c version 1.4.0
  Hibernate 3.2.2
  HSQLDB 1.8.0
  jetty 6.1.9
  JiBX (March 30th sourceforge)
  Mule 2.0.1
  Rampart version 1.3
  libvirt version 0.4.2
  socat-1.6.0
  VDE version 2.2.0-pre2
Cloud Speed
  Performance study using HPC applications and benchmarks
  Two questions:
    What is the performance impact of virtualization?
    What is the performance impact of cloud infrastructure?
  Tested Xen, Eucalyptus, and AWS (small SLA)
  Many answers:
    Random-access disk I/O is slower with Xen
    CPU-bound workloads can be faster with Xen (depends on configuration)
    Kernel version is far more important
    Eucalyptus imposes no statistically detectable overhead
    AWS small appears to throttle network bandwidth and (maybe) disk bandwidth
MapReduce and Hadoop
  MapReduce was developed by Google to process large datasets in clusters
    Used for sorting, searching and indexing, counting, clustering, machine learning, etc.
  Inspired by the map and reduce operations used in functional programming
  Uses a filesystem or a database to store intermediate values and solutions
  Solves problems by splitting them into smaller problems, then combining the solutions
    Coordinated by a master node
  Hadoop is a Java framework inspired by the Google File System and MapReduce
    Rack-aware processing of vast data sets
    Batch-oriented workloads
    Used by Facebook and Amazon
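The split-then-combine idea can be illustrated with the classic word count, written here as a single-process Python sketch. It only mirrors the programming model; the function names are hypothetical and nothing here uses Hadoop or any cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(splits):
    intermediate = []
    for split in splits:  # in a real cluster, each split goes to a map worker
        intermediate.extend(map_phase(split))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["the cloud is elastic", "the cloud scales"])
print(counts)
```

Because the map function sees one split at a time and the reduce function sees one key at a time, both phases can be spread over many workers without changing the user's code.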
[Figure: MapReduce execution overview. The user program (1) forks a master and workers; the master (2) assigns map and reduce tasks; map workers (3) read input splits (split 0 to split 4) and (4) write intermediate files to local disk; reduce workers (5) read the intermediate files remotely and (6) write output files (output file 0, output file 1). Stages, left to right: Input files, Map phase, Intermediate files (on local disks), Reduce phase, Output files]
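The numbered steps of the execution overview can be traced in a small single-process Python sketch. The value of R, the word-count workload, and the CRC32-based partitioner are assumptions for illustration; a real deployment runs the tasks on separate machines and stores the regions on disk.

```python
import zlib
from collections import defaultdict

R = 2  # number of reduce tasks (hypothetical value)


def partition(key: str, num_reducers: int = R) -> int:
    """Decide which reduce task owns a key (deterministic hash mod R)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(split: str):
    """Steps (3)-(4): read one split, write R intermediate regions locally."""
    regions = [[] for _ in range(R)]
    for word in split.split():
        regions[partition(word)].append((word, 1))
    return regions


def run_reduce_task(r: int, all_map_outputs):
    """Steps (5)-(6): read region r from every map task, produce one output file."""
    totals = defaultdict(int)
    for regions in all_map_outputs:
        for key, value in regions[r]:
            totals[key] += value
    return dict(sorted(totals.items()))


splits = ["a b a", "b c", "c c a"]
# Step (2): the master assigns one map task per split, then R reduce tasks.
map_outputs = [run_map_task(s) for s in splits]
output_files = [run_reduce_task(r, map_outputs) for r in range(R)]
merged = {k: v for f in output_files for k, v in f.items()}
print(output_files)
```

Each key lands in exactly one reduce region, which is why the output files can be written independently and never need to be reconciled.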
Nexus / Mesos
  Nexus is about running multiple frameworks in the same cluster
    Resource manager; can run systems such as Hadoop
    Multiplexes resources across frameworks
  Nexus decouples job execution management from resource management by providing a simple resource management layer over which frameworks like Hadoop and Dryad can run
  Design analogies
    Microkernel: make the reliable component as small as possible
    Exokernel: give maximal control to frameworks
    IP model: a narrow waist over which diverse frameworks can run
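The separation between resource management and job execution management can be sketched in Python as follows. This is not the Nexus or Mesos API: the offer-based interaction, the slot model, the round-robin order, and every class and method name are assumptions made for illustration.

```python
from collections import deque


class Framework:
    """A framework (for example a Hadoop-like scheduler) with its own task queue."""

    def __init__(self, name, pending_tasks):
        self.name = name
        self.pending = deque(pending_tasks)

    def on_offer(self, node, free_slots):
        """The framework, not the manager, decides which tasks to launch."""
        launched = []
        while self.pending and len(launched) < free_slots:
            launched.append(self.pending.popleft())
        return launched


class ResourceManager:
    """Thin layer: tracks free slots and offers them to frameworks in turn."""

    def __init__(self, slots_per_node):
        self.free = dict(slots_per_node)
        self.frameworks = deque()
        self.placements = []  # (framework name, task, node)

    def register(self, framework):
        self.frameworks.append(framework)

    def offer_round(self):
        for node in self.free:
            if self.free[node] == 0 or not self.frameworks:
                continue
            fw = self.frameworks[0]
            self.frameworks.rotate(-1)  # the next framework gets the next offer
            for task in fw.on_offer(node, self.free[node]):
                self.free[node] -= 1
                self.placements.append((fw.name, task, node))


manager = ResourceManager({"node1": 2, "node2": 2})
manager.register(Framework("hadoop", ["map-0", "map-1", "map-2"]))
manager.register(Framework("dryad", ["vertex-0"]))
manager.offer_round()
print(manager.placements)
```

The manager knows nothing about tasks beyond counting slots, which is the microkernel-style minimalism the slide describes; all scheduling logic lives in the frameworks.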
Task Management
Task Management Operations
Intro to OpenFlow
  OpenFlow is an open protocol for router configuration and control
  Opportunities for convergence of packet-switched and circuit-switched networks, and for clean-slate designs
  Slides from the Stanford Clean Slate program: http://cleanslate.stanford.edu
  OpenFlow Web site: http://www.openflowswitch.org
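OpenFlow: Enable Innovations within the Infrastructure
  [Figure: an OpenFlow switch and its controller. Labels: Controller (PC); Net Services; API; OpenFlow Switch; Secure Channel (SSL), in software (sw); Flow Table, in hardware (hw). Controller functions: add/delete flow entries; encapsulated packets; controller discovery]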
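The interaction between the flow table and the controller can be sketched in Python. This models the idea only and is not the OpenFlow wire protocol: the packet fields, the action tuples, the static destination-to-port policy, and all class and method names are assumptions for illustration.

```python
class Controller:
    """Stand-in for the remote controller reached over the secure channel."""

    def __init__(self):
        self.seen = []  # packets that missed the flow table

    def packet_in(self, switch, packet):
        # Hypothetical policy: a static destination-to-port mapping.
        self.seen.append(packet)
        ports = {"10.0.0.1": 1, "10.0.0.2": 2}
        out_port = ports.get(packet["dst"])
        if out_port is not None:
            # Install a flow entry so later packets stay on the fast path.
            switch.add_flow({"dst": packet["dst"]}, ("output", out_port))
        return out_port


class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = []  # list of (match fields, action)

    def add_flow(self, match, action):
        self.flow_table.append((match, action))

    def delete_flow(self, match):
        self.flow_table = [(m, a) for m, a in self.flow_table if m != match]

    def handle(self, packet):
        for match, action in self.flow_table:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        # Table miss: hand the packet to the controller.
        out_port = self.controller.packet_in(self, packet)
        return ("output", out_port) if out_port is not None else ("drop", None)


ctrl = Controller()
sw = Switch(ctrl)
first = sw.handle({"src": "10.0.0.2", "dst": "10.0.0.1"})   # miss, then installed
second = sw.handle({"src": "10.0.0.2", "dst": "10.0.0.1"})  # hit in the flow table
print(first, second, len(ctrl.seen))
```

Only the first packet of a flow reaches the controller; once the entry is installed, forwarding is a table lookup, which is what lets control logic run on a PC while the switch hardware keeps forwarding at speed.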
PSIRP
  Publish/Subscribe Internet Routing Paradigm (PSIRP)
  FP7 project coordinated by HIIT
  Creating a new clean-slate protocol suite based on publish/subscribe
PSIRP
  Rendezvous, Routing, Fragmentation
  Observations
    No topological addresses, only labels
    No explicit layering (blackboard pattern)
    Security enhanced using self-certification
    End-to-end reachability, control in the network
    Natural support for multicast; it is the norm
    Support for broadcast and all-optical label-switching technologies
    Dynamic state is introduced into the network
      How do we make it scale?
  (slide dated 14.10.2008)
Forwarding Design
  Fast path
    In-packet Bloom filters
    Line-rate forwarding
  Slow path (Rendezvous)
    Content-centric functions
    Policies
    Caching configuration
    Security
[Figure: PSIRP publish/subscribe delivery across autonomous systems. Labels: AS: Rendezvous (two instances); AS: Topology (two instances); Publish; Subscribe; Create delivery path; Configure; Forwarding path; Forwarding nodes; Forwarding edge node; Publisher; Subscriber; Data; Forwarding]
Characteristics
  Links have identifiers (Link IDs)
  Source routing mechanism
  Install forwarding state on demand (traffic aggregation)
  Topology Manager
    Maintains the network topology graph
    Constructs Bloom filter-based forwarding identifiers
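The Topology Manager's first job, choosing the links of a delivery path from its topology graph, can be sketched in Python. Breadth-first search over an adjacency list is an assumption made for this sketch, as are the node names; the slides do not say which path computation PSIRP uses.

```python
from collections import deque


def delivery_path(graph, publisher, subscriber):
    """Breadth-first search; returns the links (u, v) of a shortest path."""
    parent = {publisher: None}
    queue = deque([publisher])
    while queue:
        node = queue.popleft()
        if node == subscriber:
            break
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    if subscriber not in parent:
        return None  # the subscriber is unreachable
    links = []
    node = subscriber
    while parent[node] is not None:
        links.append((parent[node], node))
        node = parent[node]
    return list(reversed(links))


# Hypothetical topology: P is the publisher, S the subscriber.
topology = {
    "P": ["A"],
    "A": ["P", "B", "C"],
    "B": ["A", "S"],
    "C": ["A"],
    "S": ["B"],
}
path = delivery_path(topology, "P", "S")
print(path)
```

The resulting list of links is the input for a source-routed forwarding identifier: each link's ID is encoded into the packet header rather than installed as per-flow state in the routers.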
Efficient flat identifier based forwarding
  Current zfilter size: 256 bits
  Link IDs are added to the zfilter (OR operation)
  Verification requires one comparison (AND operation)
  Limitations
    Possible false positives
    Wrong forwarding path
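The OR and AND operations can be made concrete with a short Python sketch. The 256-bit width comes from the slide; the number of bits set per Link ID (K = 5), the SHA-256-based derivation of Link IDs, and the link names are assumptions for illustration.

```python
import hashlib

M = 256  # zfilter width in bits, as stated above
K = 5    # bits set per Link ID (an assumption for this sketch)


def link_id(name: str) -> int:
    """Derive a hypothetical M-bit Link ID with up to K bits set."""
    lid = 0
    for i in range(K):
        digest = hashlib.sha256(f"{name}:{i}".encode()).digest()
        lid |= 1 << (int.from_bytes(digest[:4], "big") % M)
    return lid


def build_zfilter(link_names) -> int:
    """OR the Link IDs of every link on the delivery path into one filter."""
    z = 0
    for name in link_names:
        z |= link_id(name)
    return z


def should_forward(zfilter: int, lid: int) -> bool:
    """A node forwards on a link if all of the link's bits are set in the filter."""
    return (zfilter & lid) == lid


path_links = ["A->B", "B->C", "C->D"]
z = build_zfilter(path_links)
on_path = [should_forward(z, link_id(name)) for name in path_links]
print(on_path)
```

Links on the path always match, so there are no false negatives. A link that is not on the path can still match if its bits happen to be covered by other Link IDs, which is the false-positive limitation noted above; the probability grows as more links are ORed into the same filter.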
Overlay Networks Book
  Introduction: Overview; Overlay Technology; Applications; Properties of Data; Structure of the Book
  Network Technologies: Networking; Firewalls and NATs; Naming; Addressing; Routing; Multicast; Network Coordinates; Network Metrics
  Properties of Networks and Data: Data on the Internet; Zipf's Law; Scale-free Networks; Robustness; Small Worlds
  Unstructured Overlays: Overview; Early Systems; Locating Data; Napster; Gnutella; Skype; BitTorrent; Cross-ISP BitTorrent; Freenet; Comparison
  Foundations of Structured Overlays: Overview; Geometries; Consistent Hashing; Distributed Data Structures for Clusters
  Distributed Hash Tables: Overview; APIs; Plaxton's Algorithm; Chord; Pastry; Koorde; Tapestry; Kademlia; Content Addressable Network; Viceroy; Skip Graph; Comparison
  Probabilistic Algorithms: Overview of Bloom Filters; Bloom Filters; Bloom Filters in Distributed Computing; Gossip Algorithms
  Content-based Networking and Publish/Subscribe: Overview; DHT-based Data-centric Communications; Content-based Routing; Router Configurations; Siena and Routing Structures; Hermes; Formal Specification of Content-based Routing Systems; Pub/sub Mobility
  Security: Overview; Attacks and Threats; Securing Data; Security Issues in P2P Networks; Anonymous Routing; Security Issues in Pub/Sub Networks
  Applications: Amazon Dynamo; Overlay Video Delivery; SIP and P2PSIP; CDN Solutions
  Conclusions
  References
  Index
Conclusions and Future Work
  Excellent facilities and connections for doing high-impact research in cloud computing in Helsinki
  Current work with PSIRP, Future Internet SHOK, Cloud Software SHOK
  Not covered by current activities, seeds for a new project
    Software-defined networking
      Dynamic controller for OpenFlow-enabled routers
    PSIRP in data centers
    Petabyte storage mechanisms