Building a real world public cloud from the ground up Vangelis Koukis vkoukis@grnet.gr Technical Coordinator, okeanos Project GRNET DeIC konference 2012 1
okeanos? Outline Rationale Design Platform Features Unity Automation Opensource Upcoming Greek Research and Technology Network DeIC konference 2012 2
What is okeanos? okeanos is Greek for ocean. Oceans capture, store and deliver energy, oxygen and life around the planet. Greek Research and Technology Network DeIC konference 2012 3
Simplicity i GRNET DeIC konference 2012 4
vkoukis@grnet.gr 20121112 GRNET DeIC konference 2012 5
Virtual Compute Machines Virtual Network Ethernets Virtual Storage Disks ss Virtual Security Firewalls GRNET DeIC konference 2012 6
Flexibility GRNET DeIC konference 2012 7
GRNET DeIC konference 2012 8
5x 8x 2x 1x GRNET DeIC konference 2012 9
okeanos service Goal: Production quality IaaS Bt Beta in Dec, current Alpha: >1600 VMs / >1000 users Target group: GRNET s customers direct: IT depts of connected institutions indirect: university students, researchers in academia Users manage resources over a simple, elegant UI, or a REST API, for full programmatic control Greek Research and Technology Network DeIC konference 2012 10
okeanos features Compute/Network Service: Cyclades File Storage Service: Pithos+ Image Service: Plankton Identity Service: Astakos Volume Service: Archipelago Greek Research and Technology Network DeIC konference 2012 11
Rationale GRNET DeIC konference 2012 12
How it all started Need for easy, secure access to GRNET s datacenters User fi friendliness, simplicity i Scalable to the thousands #VMs, TBs, users (Pithos: 10k) running within GRNET s AAI Federation Resell or build your own? IaaS cloud provider, vendor, or own infrastructure? It all depends on your needs Greek Research and Technology Network DeIC konference 2012 13
Commercial IaaS Build on commercial IaaS? Amazon EC2 not an end user service Need to develop custom UI, AAI layers Vendor lock in Unsuitable for IT depts persistent, long term servers custom networking requirements GRNET has invested heavily in its core network > 8000km of dark kfiber Greek Research and Technology Network DeIC konference 2012 14
Bring vendor into datacenter? Hypervisor lock in Is a turn key solution suitable for a public cloud? Building public clouds is an ongoing process Manageable by GRNET s operation Integrated into the rest of the infrastructure Scaling to thousands of users Build on existing know how Gain know how, build own IaaS reuse for own services Greek Research and Technology Network DeIC konference 2012 15
What about opensource? OpenStack, Eucalyptus, OpenNebula Need a mature opensource core to build around Maturity, production readiness? readiness? proven in production environments, predictable Extensibility? Flexibility? Upgradeability, maintainability? Greek Research and Technology Network DeIC konference 2012 16
Design GRNET DeIC konference 2012 17
okeanos design decisions Reuse existing components Build on Google Ganeti target commodity hardware release to the community as opensource Greek Research and Technology Network DeIC konference 2012 18
okeanos design principles No need to make the world No need to support everything Service developed and maintained by 10 15 people p Start from the architecture then discover, combine, reuse the right components And for everything that s not already available Do it yourself! Greek Research and Technology Network DeIC konference 2012 19
GRNET DeIC konference 2012 20
Jigsaw puzzle Synnefo custom cloud management software to power okeanos Google Ganeti backend VM cluster management: physical nodes, VMs, migrations OpenStack APIs: Compute API v1.1, Object Storage API with custom extensions whenever necessary Then everything comes togetherth UI, Networking, Images, Storage, Monitoring, Identity management, Accounting, Billing, Clients, Helpdesk Greek Research and Technology Network DeIC konference 2012 21
Why Ganeti? No need to reinvent the wheel Scalable, proven software infrastructure Built with reliability and redundancy in mind Combines open components (KVM, LVM, DRBD) Well maintained, readable code VM cluster management in production is serious business reliable VM control, VM migrations, resource allocation handling node downtime, software upgrades Greek Research and Technology Network DeIC konference 2012 22
Why Ganeti? GRNET already had long experience with Ganeti provides 280 VMs to NOCs through the ViMa service involved in development, contributing patches upstream Build on existing know how for okeanos Common backend, common fixes reuse of experience and operational procedures simplified, less error prone deployment Greek Research and Technology Network DeIC konference 2012 23
Platform GRNET DeIC konference 2012 24
Software Stack REST API Multiple users, multiple resources Synnefo Multiple VMs on cluster Single VM Ganeti KVM Greek Research and Technology Network DeIC konference 2012 25
user@home Platform Design Web Client CLI Client Web Client 2 admin@home OpenStack Compute API v1.1 GRNET Proprietary GRNET datacenter Synnefo cloud management software Google Ganeti KVM Debian Direct Outof Band Access Virtual Hardware Greek Research and Technology Network DeIC konference 2012 26
Features GRNET DeIC konference 2012 27
Virtual Machine Actions My_Windows_desktop Start Console Reboot Shutdown Destroy Greek Research and Technology Network DeIC konference 2012 28
Virtual Machines powered by KVM IaaS Compute (1) Linux and Windows guests, on Debian hosts Google Ganeti for VM cluster management accessible by the end user over the Web or programmatically (OpenStack Compute v1.1) Greek Research and Technology Network DeIC konference 2012 29
IaaS Compute (2) User has full control over own VMs Create Select # CPUs, RAM, System Disk OS selection from pre defined or custom Images popular Linux distros (Fedora, Debian, Ubuntu) Windows Server 2008 R2 Start, Shutdown, Reboot, Destroy Out of Band console over VNC for troubleshooting Greek Research and Technology Network DeIC konference 2012 30
IaaS Compute (3) REST API for VM management OpenStack Compute v1.1 1 compatible 3rd party tools and client libraries custom extensions for yet unsupported functionality Python & Django implementation Full featured UI in JS/jQuery UI is just another API client All UI operations happen over the API Greek Research and Technology Network DeIC konference 2012 31
IaaS Network (Virtual Ethernets) Internet Private Network 1 Private Network2 Pi Private Network k3 Greek Research and Technology Network DeIC konference 2012 32
IaaS Network Functionality Dual IPv4/IPv6 connectivity for each VM Easy, platform provided ltf d firewalling Array of pre configured firewall profiles Or roll your own firewall inside VM Multiple private, virtual L2 networks Construct arbitrary network topologies e.g., deploy VMs in multi tier configurations Exported all the way to the API and the UI Greek Research and Technology Network DeIC konference 2012 33
Unity GRNET DeIC konference 2012 34
Images Spawn my own Ubuntu Freeze Greek Research and Technology Network DeIC konference 2012 35
Untrusted images Custom Images: snf image Host cannot touch user provided d data Resize fs, change hostname, change passwords, inject files Split design snf image host snf image helper All customization in helper VM Greek Research and Technology Network DeIC konference 2012 36
OpenStack Object Storage API Block storage Content based addressing for blocks Every fl file is a collection of blocks Web based, command line, and native clients Synchronization, deduplication An integral part of okeanos User files, Image registry for VM Images Goal: use common backend with Archipelago Greek Research and Technology Network DeIC konference 2012 37
Images Spawn my own Ubuntu Freeze Greek Research and Technology Network DeIC konference 2012 38
Images Storage Clone Ubuntu + user data Snapshot Greek Research and Technology Network DeIC konference 2012 39
Images Golden Image Greek Research and Technology Network DeIC konference 2012 40
IaaS Storage Greek Research and Technology Network DeIC konference 2012 41
IaaS Storage Volume Composer RADOS object I/O Monitor nodes Maps Archipelago Storage Object Storage nodes Greek Research and Technology Network DeIC konference 2012 42
IaaS Storage Volume Composer RADOS object I/O Monitor nodes Maps Storage nodes Greek Research and Technology Network DeIC konference 2012 43
First phase deployment IaaS Storage (1) System provided and custom user Images Redundant storage based on DRBD VMs survive physical node downtime or failure Currently under testing Reliable distributed storage over RADOS Combined with ihcustom software for snapshotting, cloning Dynamic virtual storage volumes Greek Research and Technology Network DeIC konference 2012 44
IaaS Storage (2) Multi tier storage architecture Ddi Dedicated dstorage Nodes (SSD, SAS, and SATA storage) OSDs, e.g., for RADOS Custom storage layer: Archipelago manages snapshots, creates clones over block pools OS Images held as snapshots VMs created as clones of snapshots Greek Research and Technology Network DeIC konference 2012 45
Integration GRNET DeIC konference 2012 46
Greek Research and Technology Network DeIC konference 2012 47
Greek Research and Technology Network DeIC konference 2012 48
Greek Research and Technology Network DeIC konference 2012 49
Greek Research and Technology Network DeIC konference 2012 50
Identity: Astakos Support services Provides the user base for okeanos Once authenticated, the user retrieves a common auth token for programmatic access Greek Research and Technology Network DeIC konference 2012 51
Automation GRNET DeIC konference 2012 52
./kamaki $./kamaki Usage: kamaki <group> <command> [options] api=api API can be either openstack or synnefo url=url API URL token=token use token TOKEN Commands: flavor info get flavor details flavor list list flavors image create create image image delete delete image $./kamaki server shutdown 101 url=http://localhost:8000/api/v1.1 token=1234527db2 t k Greek Research and Technology Network DeIC konference 2012 53
./kamaki $ ipython In [1]: from kamaki.client import Client In [2]: c = Client('http://localhost:8000/api/v1.1', "1234527db2 ") In [3]: c.list_flavors() In [4]: i = c.list_images() In [5]: i[5] {u'created': u'2011 06 09T00:00:00+00:00', u'id': 7, u'metadata': {u'values': {u'os': u'windows', u'size': u'11000'}}, u'name': u'windows', u'progress': 100, u'status': u'active', u'updated': u'2011 09 12T14:47:12+00:00'} In [6]: c.create_server('mywin1', 3, 5) Greek Research and Technology Network DeIC konference 2012 54
Sights GRNET DeIC konference 2012 55
Live Demo Prepare and upload Image from local template VM Spawn compute cluster to run MPI app Make local modifications and repeat What if it was over a 3G connection? Time needed to upload 1GB Image file? Time needed to prepare and spawn virtual nodes? Greek Research and Technology Network DeIC konference 2012 56
Internals GRNET DeIC konference 2012 57
Deployment DB SQL WbS Web Server API Server REST API ui SQL api aai Logic snf dispatcher RAPI Queue Ganeti Master Ganeti node snf gnt eventd snf gnt hook KVM Greek Research and Technology Network DeIC konference 2012 58
Upcoming goals Credit based resource allocation Abstract away the Ganeti backend, replace with backend connector behind the MQ Release to community as reference implementation of OpenStack Compute v1.1 Support live modification of VMs in Ganeti Snapshots, clones in storage layer Dramatic decrease in VM initialization i i i time Support workloads with 100s of ephemeral VMs e.g. for scientific computation, MPI jobs Greek Research and Technology Network DeIC konference 2012 59
GRNET DeIC konference 2012 60
GRNET DeIC konference 2012 61
GRNET DeIC konference 2012 62
GRNET DeIC konference 2012 63
GRNET DeIC konference 2012 64
GRNET DeIC konference 2012 65
GRNET DeIC konference 2012 66
GRNET DeIC konference 2012 67
GRNET DeIC konference 2012 68
GRNET DeIC konference 2012 69
GRNET DeIC konference 2012 70
GRNET DeIC konference 2012 71
GRNET DeIC konference 2012 72
GRNET DeIC konference 2012 73
GRNET DeIC konference 2012 74
GRNET DeIC konference 2012 75
GRNET DeIC konference 2012 76
GRNET DeIC konference 2012 77
GRNET DeIC konference 2012 78
GRNET DeIC konference 2012 79
GRNET DeIC konference 2012 80
GRNET DeIC konference 2012 81
GRNET DeIC konference 2012 82
GRNET DeIC konference 2012 83
GRNET DeIC konference 2012 84
GRNET DeIC konference 2012 85
GRNET DeIC konference 2012 86
GRNET DeIC konference 2012 87
GRNET DeIC konference 2012 88
GRNET DeIC konference 2012 89
GRNET DeIC konference 2012 90
GRNET DeIC konference 2012 91
GRNET DeIC konference 2012 92
GRNET DeIC konference 2012 93
GRNET DeIC konference 2012 94
GRNET DeIC konference 2012 95
GRNET DeIC konference 2012 96
GRNET DeIC konference 2012 97
GRNET DeIC konference 2012 98
GRNET DeIC konference 2012 99
GRNET DeIC konference 2012 100
GRNET DeIC konference 2012 101
GRNET DeIC konference 2012 102
GRNET DeIC konference 2012 103
GRNET DeIC konference 2012 104
GRNET DeIC konference 2012 105
GRNET DeIC konference 2012 106
GRNET DeIC konference 2012 107
GRNET DeIC konference 2012 108
GRNET DeIC konference 2012 109
GRNET DeIC konference 2012 110
GRNET DeIC konference 2012 111
GRNET DeIC konference 2012 112
GRNET DeIC konference 2012 113
GRNET DeIC konference 2012 114
GRNET DeIC konference 2012 115
Upcoming GRNET DeIC konference 2012 116
Now: Alpha2 Current and Upcoming features Common user base, custom user images on Pithos+ short term: Synnefo v0.12, Beta Ultra lightweight VMs on Archipelago with RADOS backend medium term Volumes: clonable / snapshottable / attachable disks Network and storage hotplugging Upcoming beta in fully populated datacenter Greek Research and Technology Network DeIC konference 2012 117
Opensource Synnefo: Cyclades / Pithos+ / Astakos https://code.grnet.gr/projects/synnefo https://code.grnet.gr/projects/pithos https://code.grnet.gr/projects/astakos kamaki https://code.grnet.gr/projects/kamaki pip install or apt get install everything! Greek Research and Technology Network DeIC konference 2012 118
http://okeanos.io
Thank You! Questions? Greek Research and Technology Network DeIC konference 2012 120
Asynchronous design DB contains All state needed to handle API queries no need to reach the backend Ganeti GetInstanceInfo() is a proper job, too slow Two distinct paths, effect and update Effect changes to VMs when servicing API requests to modify VM state issue commands to Ganeti backend, over RAPI ACK reception of request to user Update DB, when interesting things happen user or admin initiated Queue notifications to Message Queue, over AMQP Greek Research and Technology Network DeIC konference 2012 121
Synnefo deployment DB SQL WbS Web Server API Server REST API ui SQL api aai Logic snf dispatcher RAPI Queue Ganeti Master snf gnt eventd snf gnt hook Ganeti node KVM Greek Research and Technology Network DeIC konference 2012 122
The effect Path Reception of API request to modify VM state (e.g., PUT /servers over HTTP) API enforces access rights and policy Ganeti knows no cloud users or access rights Need to translate from Openstack Compute to backend ops (e.g., CreateInstance()) Asynchronous request processing Return HTTP 202 Accepted it s up to the API client to poll for completion Greek Research and Technology Network DeIC konference 2012 123
Synnefo deployment DB SQL WbS Web Server API Server REST API ui SQL api aai Logic snf dispatcher RAPI Queue Ganeti Master snf gnt eventd snf gnt hook Ganeti node KVM Greek Research and Technology Network DeIC konference 2012 124
May run at any time The update path Completely decoupled from effect path Design goal: Ganeti admins free to bypass frontend Synnefo adapts Synnefo logic triggered on backend events Ganeti operation progressing in the queue Synnefo hook running inside Ganeti Hooks run at various phases in a VM s lifecyclel Greek Research and Technology Network DeIC konference 2012 125
Synnefo deployment DB SQL WbS Web Server API Server REST API ui SQL api aai Logic snf dispatcher RAPI Queue Ganeti Master snf gnt eventd snf gnt hook Ganeti node KVM Greek Research and Technology Network DeIC konference 2012 126
The Ganeti event daemon Ganeti master manages job queue Jobs pass Queued, Waiting, Running, end up in Canceled, Success, Error. Need a way for Synnefo to monitor job progress Synnefo specific solution: Ganeti event daemon Passively monitor the Ganeti job queue Notifications over AMQP on job progress Synnefo logic listens to Message Queue, updates DB inotify() based mechanism, no code changes to Ganeti Greek Research and Technology Network DeIC konference 2012 127
The Synnefo hook in Ganeti Different phases in a VM s lifecyclele {pre, post} {add, start, stop, reboot, modify} Run Synnefo specific hook in post * Pushes VM configuration notifications to MQ e.g., g, NIC setup Greek Research and Technology Network DeIC konference 2012 128
IaaS Network Implementation Custom modifications to Ganeti IP pool management for the public network Custom written DHCP server over NFQUEUE Custom interface handling scripts Enforce VM networking configuration Private Networks Alpha: pre provisioned bridges to 802.1Q VLANs Later on: MAC prefix based filtering Greek Research and Technology Network DeIC konference 2012 129
Synnefo deployment DB SQL WbS Web Server API Server REST API ui SQL api aai Logic snf dispatcher RAPI Queue Ganeti Master snf gnt eventd snf gnt hook Ganeti node KVM Greek Research and Technology Network DeIC konference 2012 130
Reconciliation with Ganeti What if the MQ is down, and messages are lost? Ganeti is the Single Source of Truth for VM state Reconcile DB state asynchronously On success notification for a Ganeti GetInstanceInfo() op Triggered periodically, e.g., g, using cron or even by the administrator, running gnt instance t info manually Greek Research and Technology Network DeIC konference 2012 131