CCMA & Cloud OS
Director, System Software Division, Cloud Computing Center, ITRI
Agenda
- Introduction
- CCMA @ ITRI (ITRI Cloud Computing Center for Mobile Applications)
- Cloud OS
- Virtual Data Center & Virtual Clusters
- Virtualized Storage
- Networking in a Cloud Data Center
- Runtime Virtual Machine Management
- Security
- System Management
- Summary
Cloud Computing Definition
Provisioning of dynamically scalable and virtualized resources as a service over the Internet.
- Multi-tenancy
- Device and location independence
- Ability to obtain virtual computing resources on demand
- Provides the illusion of infinite computing resources
- Self-provisioning of virtual resources
- Eliminates the need for up-front commitment by cloud developers
- Provides the ability to pay as you go for use of computing resources
- Reliability, scalability, security, manageability
Cloud Computing vs. Utility Services
[Figure labels: IaaS Providers, PaaS Providers, ISVs, SaaS Providers, End Users. Source: IEK (2010/02)]
Timing is right
Market Pull
- Big Data
- Software installed on premise -> Software as a service (SaaS)
- Information technology (IT) on premise -> IT service as a rented utility (as in electricity)
- IT should not and will not be a core competence for most corporations (Nicholas Carr's "Does IT Matter?" and "The Big Switch")
- Lowering up-front and day-to-day IT cost: pay only for actual resource usage
Technology Push
- Broadband network connectivity is getting faster and more reliable
- Internet service availability has significantly improved
- Sufficient trust in infrastructure providers
- By many measures, Google is already a critical service for most of the world, and it is in the cloud!
Types of Clouds
[Figure labels: Cloud Providers, Service Providers, Service Users. Cloud types: Hybrid Cloud, Public Cloud, Private Cloud. Layers: Cloud End-User Services (SaaS), Cloud Platform Services (PaaS), Cloud Infrastructure Services (IaaS), Physical Infrastructure.]
Infrastructure as a Service
Example players: Amazon, GoGrid, RightScale, AbiCloud, Nimbus, Eucalyptus, ElasticHosts
Platform as a Service
Example players: Microsoft Azure, Google App Engine, Force.com, Rackspace Cloud, Heroku, QuickBase, Caspio
Software as a Service
Example players: Salesforce.com, Adobe.com, Autodesk, WebEx, Microsoft Office, Gmail & other Google Apps, Flickr
Datacenter as a Computer
- The majority of cloud computing infrastructure consists of reliable services delivered through data centers
- Traditional datacenter
  - Multiple servers and communications gear collocated due to common environmental and security needs
  - Hosts a large number of relatively small or medium-sized applications, each running on a dedicated hardware infrastructure
- Datacenters for a cloud computing platform
  - Belong to a single organization, use a relatively homogeneous hardware and system software platform, and share a common system management layer
  - Run a smaller number of very large applications
- Cloud computing workloads must be designed to gracefully tolerate large numbers of component faults with little or no impact on service-level performance and availability
Cost of Data Center
[Figure: pie chart of the data center budget. Labels: Servers, Power distribution & Cooling, Power Draw (utility), Network, Power Usage. Values: 25%, 15%, 45%, 15%.]
Google Warehouse Style Computer Data Center
Secret Sauce of Cloud Computing
- Virtualization: servers, memory, storage, network
- Self provisioning
- Programmatic control
- Elasticity
- Data vs. response time: data and traffic keep growing, but response time must remain relatively constant
- Data center must scale out
- Manageability
- High availability
- Green computing
The New Data Center Industry
- Container computers for high efficiency and environmental conservation (Packaging, PUE, ...)
- Bundled software (Cloud OS) for integrated service, high scalability, and availability
- Large enterprises will bypass traditional server channels (IBM, HP, Dell, ...)
- Purchase of entire data centers directly from ODM manufacturers
  - Significant cost reductions
  - Horizontal scalability
  - High availability
- Google already purchases directly from Taiwanese manufacturers
CCMA@ITRI: ITRI Cloud Computing Center for Mobile Applications
Mission Statement
Deliver end-to-end data center architecture know-how and a system software suite that enable a cloud service provider to operate a mega data center that is the most efficient and capable in the world.
Cloud Computing Food Chain
Build a cloud data center the Google way
[Figure labels: Hardware, Data Center Know-how, Cloud OS]
Container Computers
Data Center Architecture Know-how
- Treat the entire data center as a computer
  - Air flow analysis
  - Cooling architecture (thermal management)
  - Power/energy management
  - Focus on ease of system and network management
  - What cannot be managed/monitored does not get deployed
- Modular and scalable (card to rack to container to warehouse)
- Explore low-power, commodity CPUs as a building block
System Software (Cloud OS)
- Virtualization platform: CPUs, storage (filesystems), network
- Resource management: provisioning of virtual clusters, physical machine load balancing, network traffic load balancing, power management
- Security: hypervisor protection, compartmentalization between clusters
- System management: FCAPS
- High availability: physical component failure does not interrupt availability of virtual resources
- Cloud applications management
[Figure labels: Mail, Bkup, HC, and AppX virtual clusters, each containing VMs; CCMA Infrastructure SW; Physical Nodes.]
Cloud OS
What's different about WSCs?
"As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of colocated servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC)."
Source: The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, May 2009
Architecture Principles
- Commodity hardware
  - A set of compute servers, each equipped with multiple homogeneous CPUs; requires CPU/memory/IO virtualization support
  - A set of JBOD (just a bunch of disks) storage servers proportionally intermixed with the compute servers; a low-power CPU is sufficient and RAID is optional
  - A layer-2-only network, consisting of top-of-rack switches and core switches, connects all servers
- Everything is virtualized: CPU, memory, storage, network
- If a resource cannot be remotely managed, it should not be part of the CCMA data center
Commodity Hardware-Only System Architecture
[Figure labels: Layer-3 Border Routers; Layer-2-Only Data Center Network; Server Load Balancer Cluster; Compute Server Rack; Physical Server hosting VM0, VM1, ..., VMn; Storage Server.]
Software Stack for Cloud OS
[Figure labels: Cloud Application Management Tool; Virtual Cluster Provisioning; Network/System Management; Physical Cluster Deployment Tool; Physical Compute Servers; Distributed Main/Secondary Storage; All-layer-2 Network; Security; Power Management; Virtual Machine Management; Intra-Virtual-Cluster Load Balancing.]
Virtualization Platform
- Leverage existing hypervisors
- Allocation of virtual machine instances
- Monitor VM performance
- Virtual storage provisioning
- Scalable data center network
- Isolation between virtual clusters
[Figure labels: Mail, Bkup, HC, and AppXYZ virtual clusters; Compute Nodes, Service Nodes, Data Nodes; System Service daemons; Cloud OS agents; Virtual machine migration; Physical Nodes; Storage Servers.]
Virtual Resource Provisioning
- Physical cluster deployment
- Virtual cluster: a group of VMs providing the same service, front-ended by a network load balancer
- Virtual cluster configuration
  - Storage space requirement
  - External network bandwidth requirement
  - Load balancing policy
  - Firewall/IDS setting
  - Network configuration, including DNS and DHCP
  - OS image and application image
- Virtual data center: one or more virtual clusters working in coordination (multi-tier web services, VDIs, etc.)
- Physical machine load balancing: satisfy each virtual cluster's performance requirement while minimizing the total amount of physical resources reserved
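The configuration items listed for a virtual cluster can be pictured as a small request record. The following is only a sketch: the class names, field names, units, and default values are all hypothetical and are not taken from the CCMA software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualClusterSpec:
    """Hypothetical provisioning request for one virtual cluster."""
    name: str
    vm_count: int                       # VMs providing the same service
    storage_gb: int                     # storage space requirement
    external_bandwidth_mbps: int        # external network bandwidth requirement
    load_balancing_policy: str = "least-loaded"   # assumed policy name
    firewall_rules: List[str] = field(default_factory=list)
    dns_name: str = ""                  # network configuration: DNS
    use_dhcp: bool = True               # network configuration: DHCP
    os_image: str = ""
    app_image: str = ""

    def validate(self) -> None:
        """Reject requests that cannot describe a usable cluster."""
        if self.vm_count < 1:
            raise ValueError("a virtual cluster needs at least one VM")
        if self.storage_gb < 0 or self.external_bandwidth_mbps < 0:
            raise ValueError("resource requirements cannot be negative")


@dataclass
class VirtualDataCenter:
    """One or more virtual clusters working in coordination."""
    name: str
    clusters: List[VirtualClusterSpec] = field(default_factory=list)

    def total_vms(self) -> int:
        return sum(c.vm_count for c in self.clusters)

    def total_storage_gb(self) -> int:
        return sum(c.storage_gb for c in self.clusters)
```

A multi-tier web service would then be a VirtualDataCenter holding, for example, one spec for the web tier and one for the database tier.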
Virtual Storage Management
- Storage virtualization service models: dedicated or shared volume, shared filesystem, shared database
- Distributed main storage: provides a global storage abstraction over a large number of distributed storage servers
- Distributed secondary storage: replication, snapshot, deduplication
- Unification of SAN and LAN: 10G Ethernet interconnect
- Each storage block in a disk volume remains available despite failure of a switch, server, or disk drive
- Scales to a very large number of concurrent accesses
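One common way to keep a block available through a switch, server, or disk failure is to place each replica under a different switch. The sketch below illustrates that idea only; the function names, the default of three replicas, and the hash-based choice of servers are assumptions, not the placement policy of the CCMA storage layer.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Placement = List[Tuple[str, str]]  # (switch, server) pairs, one per replica


def place_replicas(block_id: str,
                   servers_by_switch: Dict[str, List[str]],
                   replicas: int = 3) -> Placement:
    """Pick one server under each of `replicas` distinct switches."""
    switches = sorted(s for s, servers in servers_by_switch.items() if servers)
    if replicas > len(switches):
        raise ValueError("need at least one switch per replica")
    # Derive a stable starting point from the block id so that placement
    # is repeatable and blocks spread across the switches.
    digest = hashlib.sha256(block_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    placement: Placement = []
    for i in range(replicas):
        switch = switches[(seed + i) % len(switches)]
        servers = sorted(servers_by_switch[switch])
        placement.append((switch, servers[seed % len(servers)]))
    return placement


def is_available(placement: Placement,
                 failed_switches: Iterable[str] = (),
                 failed_servers: Iterable[str] = ()) -> bool:
    """A block is readable while at least one replica is reachable.

    A failed disk drive is modelled here as a failed server.
    """
    down_switches = set(failed_switches)
    down_servers = set(failed_servers)
    return any(sw not in down_switches and srv not in down_servers
               for sw, srv in placement)
```

Because the replicas sit under distinct switches, any single switch, server, or disk failure leaves at least two copies reachable.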
Cloud Storage System Architecture
[Figure labels: DMS; DFS Metadata; VM Volume; iSCSI Initiator; iSCSI Target; DFS Client; DFS DataNodes.]
Networking in Cloud OS
- Scalable Load Balancer (SLB) cluster
  - Inter-virtual-cluster load balancing
  - Each member of the SLB cluster is responsible for load balancing one or more virtual clusters (VCs)
  - Load balancing based on the current load on virtual machines
- Layer 2 only
  - How to scale up to 100,000 physical servers with commodity Ethernet switches
  - Load balancing of network packet routing
- Support for fast fail-over
  - Pre-computed main and alternative routes
  - Fast failure detection and re-routing
- Use Valiant load balancing to avoid congestion or bottlenecks
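Valiant load balancing sends each flow through a randomly chosen intermediate switch before forwarding it to its destination, so no single core link is tied to a particular traffic pattern. A minimal sketch follows, assuming a two-hop topology of top-of-rack and core switches; the function names and switch names are illustrative, not part of the CCMA network software.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def vlb_path(src_tor: str, dst_tor: str,
             core_switches: Sequence[str],
             rng: random.Random) -> List[str]:
    """Return the hop sequence for one flow under Valiant load balancing."""
    if src_tor == dst_tor:
        return [src_tor]            # traffic stays inside the rack
    if not core_switches:
        raise ValueError("no core switch available")
    # Phase 1: bounce off a random core switch. Phase 2: go to destination.
    intermediate = rng.choice(list(core_switches))
    return [src_tor, intermediate, dst_tor]


def core_load(flows: Sequence[Tuple[str, str]],
              core_switches: Sequence[str],
              seed: int = 0) -> Counter:
    """Count how many flows each core switch carries."""
    rng = random.Random(seed)
    load = Counter({core: 0 for core in core_switches})
    for src, dst in flows:
        path = vlb_path(src, dst, core_switches, rng)
        if len(path) == 3:
            load[path[1]] += 1
    return load
```

Even when every flow runs between the same two racks, the per-core counts come out roughly equal, which is the property that avoids hot spots.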
Layer-2-Only Data Center Network
[Figure: three tiers of layer-2 switches (Core, Region, Top of Rack) above compute server racks. Nodes are numbered #1 to #400, each hosting VM #1 to VM #24, with address pairs such as IP1/MAC1 and IP2/MAC2. Annotations: Valiant load balancing, network load balancing, server load balancing, fail-over server, fast failure detection and rerouting.]
Virtual Machine Management
Objective: power management and physical machine (PM) load balancing
- Monitor runtime VM statistics
- Heuristic calculation to predict workload for virtual clusters
- Determine power down/up of machines
- 2-dimensional bin packing VM migration algorithm
- Physical machine load balancing: migration of VMs to other physical machines to balance out CPU and I/O load
Terminology: CONSIGNEE; CONSIGNOR = PM to be turned off; CONSIGNMENT = VM to be migrated
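To illustrate 2-dimensional bin packing for consolidation, the sketch below packs VMs onto as few physical machines as possible so that the rest can be powered down. It uses the textbook first-fit-decreasing heuristic, which is a stand-in and not necessarily the algorithm used in Cloud OS. Treating CPU and I/O as the two dimensions, expressing them as integer percentages of one machine, and the name pack_vms are all assumptions.

```python
from typing import List, Tuple

VM = Tuple[str, int, int]  # (name, cpu_percent, io_percent)


def pack_vms(vms: List[VM], cpu_cap: int = 100, io_cap: int = 100) -> List[List[VM]]:
    """First-fit decreasing over two dimensions (CPU and I/O).

    Returns one list of VMs per physical machine that must stay powered on.
    """
    for name, cpu, io in vms:
        if cpu > cpu_cap or io > io_cap:
            raise ValueError(f"{name} does not fit on any machine")
    # Place the most demanding VMs first; rank by the larger of the two demands.
    ordered = sorted(vms, key=lambda vm: max(vm[1], vm[2]), reverse=True)
    machines: List[List[VM]] = []
    for vm in ordered:
        for hosted in machines:
            cpu_used = sum(v[1] for v in hosted)
            io_used = sum(v[2] for v in hosted)
            if cpu_used + vm[1] <= cpu_cap and io_used + vm[2] <= io_cap:
                hosted.append(vm)   # first machine with room in both dimensions
                break
        else:
            machines.append([vm])   # no machine had room: keep one more powered on
    return machines
```

Any machine that is currently running but absent from the result would be a candidate to be emptied and turned off.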
Fail-over & Load Balancing
[Figure: the Virtual Machine Manager monitors each hypervisor.]
1. One VM dies
   1.1 Restart the dead VM (the replacement announces "I am the new one!")
2. System is busy
   2.1 Migrate VMs to meet load balancing
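The two cases on this slide can be read as one pass of a monitoring loop: restart dead VMs, then migrate work away from busy hosts. The sketch below shows one such pass. The data shapes, the 80 percent threshold, and the rule of moving the smallest VM to the least-loaded host are illustrative assumptions, not the behavior of the actual Virtual Machine Manager.

```python
from typing import Dict, List, NamedTuple, Tuple


class VMState(NamedTuple):
    host: str
    alive: bool
    load: int  # percent of one host's capacity


def plan_actions(vms: Dict[str, VMState], hosts: List[str],
                 busy_threshold: int = 80) -> List[Tuple[str, ...]]:
    """Return the restart and migrate actions for one monitoring pass."""
    actions: List[Tuple[str, ...]] = []

    # Case 1: a VM died, so restart it.
    for name in sorted(vms):
        if not vms[name].alive:
            actions.append(("restart", name, vms[name].host))

    # Case 2: a host is busy, so migrate one VM to the least-loaded host.
    host_load = {h: 0 for h in hosts}
    for state in vms.values():
        if state.alive:
            host_load[state.host] += state.load
    for host in sorted(hosts):
        if host_load[host] <= busy_threshold:
            continue
        movable = sorted((s.load, n) for n, s in vms.items()
                         if s.alive and s.host == host)
        if not movable:
            continue
        vm_load, vm_name = movable[0]   # smallest VM is the cheapest to move
        target = min((h for h in hosts if h != host),
                     key=lambda h: (host_load[h], h), default=None)
        if target is not None and host_load[target] + vm_load <= busy_threshold:
            actions.append(("migrate", vm_name, host, target))
            host_load[host] -= vm_load
            host_load[target] += vm_load
    return actions
```

A real manager would run this on fresh hypervisor statistics at every monitoring interval and hand the resulting actions to the hypervisors.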
Security
- Multi-tenancy architecture
- Inter-virtual-cluster compartmentalization
- Works in the presence of constant VM motion
- Virtual appliance-based firewall and IDS/IPS
- Leverages open-source firewall/IDS/IPS technology
- Support for AAA, VPN, and standard access control
System Management
- Leverages open-source network/system monitoring tools and server configuration tools
- Discovery of a comprehensive inter-service dependency map: how an arbitrary service depends on other services, and in what temporal order
- Provides application-level performance monitoring support to the cloud application management tool
- Comprehensive resource usage accounting for SLA or billing purposes
- Virtualization-aware, temperature-aware, and power-aware
[Figure: FCAPS tools. Fault: Mantis, IPMI agent. Configuration: CFEngine. Accounting: RADIUS. Performance: Ganglia, SNMP. Security: LDAP. Managed layers: Container Computer, Network, Operating System.]
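An inter-service dependency map with a temporal order is naturally a directed graph, and a topological sort of it gives an order in which services can be started. The sketch below uses Python's standard graphlib module (Python 3.9 or later). The service names and the helper functions are hypothetical, and how Cloud OS actually discovers the map is not shown.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def startup_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order services so that each one starts after everything it depends on.

    `depends_on` maps a service to the set of services it needs.
    """
    return list(TopologicalSorter(depends_on).static_order())


def has_cycle(depends_on: Dict[str, Set[str]]) -> bool:
    """A circular dependency means no valid startup order exists."""
    try:
        startup_order(depends_on)
    except CycleError:
        return True
    return False


# Hypothetical dependency map for a small multi-tier service.
example_map = {
    "web": {"app"},
    "app": {"db", "dns"},
    "db": {"dns"},
    "dns": set(),
}
```

Reversing the same order gives a safe shutdown sequence, and the map also shows which services are affected when one of them fails.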
Summary
Why are we building Cloud OS?
- Cloud OS integrates server virtualization, storage virtualization, and network virtualization to provide:
  - Resource management for virtual data centers and virtual clusters
  - Scalable data center networking
  - Load balancing of virtual clusters, network traffic, and physical machines
  - Ease of management for all data center resources
  - Highly available services
  - End-to-end security and QoS guarantees
- Taiwan's ODM manufacturers are uniquely positioned to take advantage of the growth of the data center industry driven by cloud computing
- WSCs will be used in both public clouds and private clouds
- Cloud OS will significantly enhance the value of Warehouse Style Computers (WSCs)
Thank you!