The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)




The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
J.A. Coarasa, CERN, Geneva, Switzerland, for the CMS TriDAS group
CHEP2013, 14-18 October 2013, Amsterdam, The Netherlands

Outline" Introduction" The CMS Online Cluster" Opportunistic Usage" The Architecture" Overlay in detail" Openvswitch-ed" OpenStack infrastructure" Onset, Operation, Outlook and Conclusion"

Introduction" The CMS Online Cluster" Opportunistic Usage"

The CMS Online Cluster" A large " More than 3000 computers" More than 100 switches (>7000 ports)" and complex cluster " Different kinds of hardware and ages" Computers configured in more than 100 ways" Set of segmented networks using sometimes VLANs"» 1 network per rack"» Up to 2 Additional networks in VLANs in some racks" designed as a data acquisition system to process data at 100GBytes/s and select and archive 20TBytes/day" The CMS online cluster: Setup, operation and maintenance of an evolving cluster. PoS ISGC2012 (2012) 023."

The CMS Data Acquisition System
[Diagram of the DAQ chain: 40 MHz collision rate, Level-1 trigger over 16 million detector channels (charge/time/pattern, energy/tracks, 3 Gigacell buffers), 100-50 kHz accept rate, 1 MB event data, 1 Terabit/s readout over 50,000 data channels into ~400 readout memories (200 GB buffers), event builder switch network, ~400 CPU farms, 300 Hz of filtered events to the Petabyte archive via the service LAN and computing services.]
EVENT BUILDER: a large switching network (400+400 ports) with a total throughput of ~400 Gbit/s forms the interconnection between the sources (deep buffers) and the destinations (buffers before the farm CPUs).
EVENT FILTER: a set of high-performance commercial processors organized into many farms convenient for on-line and off-line applications (~5 TeraIPS); the cluster doing event filtering comprises ~1300 computers.

The CMS High Level Trigger (HLT) Cluster

Cluster        Nodes   Cores (HT on)/node   Cores   Memory (GBytes/node)   Disk (GBytes/node)
(1)            720     8                    5760    16                     72
(2)            288     12 (24)              3456    24                     225
(3)            256     16 (32)              4096    32                     451
(1)+(2)+(3)    1264                         13312   26 TBytes total        227 TBytes total

Three generations of hardware, some with limited local storage.
Good connectivity to the CERN GRID Tier 0, where the data is stored (~20 Gbit/s, can be increased to 40 Gbit/s without a large investment).

The HLT Clusters versus Tier[0,1,2]
CPU in HEP-SPEC06:

            HLT farm       Tier0   Tier1   Tier2
sum         602k + ALICE   356k    603k    985k
ATLAS       197k           111k    260k    396k
CMS         195k           121k    150k    399k
ALICE                      90k     101k    143k
LHCb        210k           34k     92k     47k

Source: http://w3.hepix.org/benchmarks/doku.php/

The CMS Online Cluster: Network Details
CMS networks:
- Private networks:
  - Service networks (per rack) (~3000 1 Gbit ports)
  - Data networks (~4000 1 Gbit ports): source routing on the computers, VLANs on the switches
  - To Tier 0 network
  - Private networks for Oracle RAC
  - Private networks for subdetectors
- Public CERN Campus Network
[Diagram: CMS sites; CMS networks (ToTier0 network, data networks for control, readout, HLT and storage manager, service networks); computer gateways; CERN Campus Network; Internet; firewall.]

Opportunistic Usage" The cluster can be used when not 100% in use:" During the technical stops (~1 week every 10):" These timeslots already used during data taking to set up the cloud infrastructure and test it;" During the shutdown used to upgrade the accelerator (since Mid-February for more than a year)" When the cluster is under-used:" Already used simultaneously while taking data (Heavy Ions, ppb 2013) as a proof of concept; " This needs cautious testing and deeper integration to allow dynamical resources reallocation;" Technically feasible but may arise concerns about putting in danger data taking."

The Architecture in the two scenarios (proof of concept and production): Requirements, Overlay in detail, OpenStack infrastructure

Requirements" - Opportunistic usage (the motivation! Use the resources!)." - No impact on data taking." - Online groups retain full control of resources." - Clear responsibility boundaries among online/users (offline groups)." " Virtualization" " Overlay Cloud"

Cloud Architecture with OpenStack
Auxiliary infrastructure:
- Database backend (MySQL)
- Message passing system (RabbitMQ)
OpenStack components:
- User frontends to control the cloud: command line API (nova, EC2), web interface (Dashboard)
- VM disk image service (glance)
- Network virtualization (nova-network, neutron)
- Compute virtualization (nova-compute)
- Identity (keystone)
- Storage virtualization
A minimal usage sketch of these components follows below.
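For illustration only, a minimal sketch of driving the components listed above through the OpenStack compute API (python-novaclient, Grizzly-era v1.1 interface); the credentials, endpoint, image and flavor names are placeholders, not the CMS tooling itself.

```python
#!/usr/bin/env python
"""Minimal sketch, under assumed credentials/endpoint/names, of booting one VM
through keystone + nova; not the actual CMS scripts."""
from novaclient.v1_1 import client

# Authenticate against keystone and talk to the nova API.
nova = client.Client("cmsuser", "secret", "cmscloud",
                     "http://cloud-controller.cms:5000/v2.0")

image = nova.images.find(name="cms-hlt-worker")   # image registered in glance (assumed name)
flavor = nova.flavors.find(name="m1.xlarge")      # flavor to instantiate (assumed name)

# Boot one VM; nova-scheduler picks a compute node, nova-network wires it up.
server = nova.servers.create("hlt-vm-001", image, flavor)
print(server.id, server.status)
```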

Openstack Components"

Overview of the Controlling Architectures
Proof of Concept phase:
- Minimal changes implemented to test the cloud concept
- Minimal software changes on the computers
- Untouched routing/network hardware
Production phase:
- Designed for high availability and easy scalability
- Changes to the network allowed, to enable the overlay network and the use of the high-bandwidth network

Overlay on HLT nodes" Overlay! Not a dedicated cloud. The HLT Nodes add software to be compute nodes." - Minimal changes to convert HLT nodes in compute nodes participating in the cloud." not to have any impact on data taking" Losing 1 min of data is wasting accelerator time (worth ~O(1000)CHF/min)" - Easy to quickly move resources between data taking and cloud usage. " "

Overlay on HLT nodes: Details
Networking:
- A virtual switch/VLAN added in the computer
Software added:
- libvirt, KVM
- OpenStack[1] compute (Essex/Grizzly from EPEL-RHEL 6)
- OpenStack metadata-api and network (Grizzly only)
- Open vSwitch[2] (version 1.7.0-1) (NOT used with Grizzly)
- RabbitMQ and MySQL clients
- Home-made scripts to:
  - configure all components (and the virtual network if necessary)
  - clean up leftovers if necessary (VMs, image files, hooks to the bridge), as sketched below
[1] http://www.openstack.org
[2] http://openvswitch.org
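Below is an illustrative sketch of what such a clean-up helper could look like, assuming nova's default instance directory and domain naming convention; it is not the actual CMS script.

```python
#!/usr/bin/env python
"""Hypothetical sketch of a compute-node clean-up helper, in the spirit of the
"home-made scripts" above; paths and naming conventions are assumptions."""
import glob
import os
import shutil
import subprocess

INSTANCES_DIR = "/var/lib/nova/instances"   # default nova instance/image path (assumed)

def leftover_domains():
    """List libvirt domains left behind by a previous cloud session."""
    out = subprocess.check_output(["virsh", "list", "--all"]).decode()
    rows = [line.split() for line in out.splitlines()[2:] if line.split()]
    # nova names its domains 'instance-XXXXXXXX'
    return [cols[1] for cols in rows if cols[1].startswith("instance-")]

def cleanup():
    for name in leftover_domains():
        subprocess.call(["virsh", "destroy", name])    # stop the VM if still running
        subprocess.call(["virsh", "undefine", name])   # remove its libvirt definition
    for path in glob.glob(os.path.join(INSTANCES_DIR, "instance-*")):
        shutil.rmtree(path, ignore_errors=True)        # drop leftover disk image files

if __name__ == "__main__":
    cleanup()
```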

The Overlay/Virtual Network in Detail in the Proof of Concept Phase: Open vSwitch
[Diagram: the controller (10.176.w.z/25, 10.0.w.z/15) running the network service, connected over GRE tunnels to the compute nodes (10.176.x.y/25, 10.0.x.y/15) hosting VMs (10.1.a.b, 10.1.c.d), with a router to the CERN network.]
- Switches/routers NOT modified.
- Compute nodes have a virtual switch:
  - where the VMs hook up using 10.1.0.0/16 (flat network for the VMs!);
  - where the compute node has a virtual interface (10.0.x.y/15) and traffic is routed to the control network (10.176.x.y/25);
  - and where a port is connected to a central computer on the control network, encapsulating with GRE.
- A central computer acts as a big switch (potential bottleneck):
  - with reciprocating GRE ports;
  - it SNATs (OpenStack nova-network service) and routes to the internet through the CERN network.
An Open vSwitch GRE setup along these lines is sketched below.
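A minimal sketch of the per-compute-node Open vSwitch setup just described: a local bridge for the VMs plus one GRE port towards the central switch. The bridge name, local address and controller IP are illustrative assumptions, not the real CMS configuration.

```python
#!/usr/bin/env python
"""Sketch (not the actual CMS scripts) of building the per-node virtual switch
with a GRE tunnel to the central "big switch"."""
import subprocess

BRIDGE = "br-vm"                 # virtual switch the VM taps plug into (assumed name)
LOCAL_IP = "10.0.1.17/15"        # virtual interface of this compute node (example)
CENTRAL_SWITCH_IP = "10.176.1.1" # control-network IP of the central GRE endpoint (example)

def run(*cmd):
    print(" ".join(cmd))
    subprocess.check_call(cmd)

# Create the bridge and give the hypervisor its virtual interface on it.
run("ovs-vsctl", "--may-exist", "add-br", BRIDGE)
run("ip", "addr", "add", LOCAL_IP, "dev", BRIDGE)
run("ip", "link", "set", BRIDGE, "up")

# One GRE port towards the central computer; the central node holds the
# reciprocating GRE port for every compute node.
run("ovs-vsctl", "--may-exist", "add-port", BRIDGE, "gre0",
    "--", "set", "interface", "gre0", "type=gre",
    "options:remote_ip=%s" % CENTRAL_SWITCH_IP)
```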

The Overlay/Virtual Network in Detail in the Production Phase
[Diagram: controllers (network service, SNAT, proxy, metadata API service), compute nodes with SNAT hosting VMs (10.29.a.b, 10.29.c.d), a router, the CERN network, the ToTier0 network and the data networks.]
- Added cables and routing from the data network switches to CDR and back (128.142.x.y), and VLANs fencing the traffic of the VMs.
- Added routing tables and iptables rules to the hypervisor (a sketch follows below):
  - VMs talking to 128.142.x.y will talk through the data networks: SNAT (OpenStack nova-network provided with the multi-host with external gateway configuration).
  - VMs talking to cmsproxy.cms:3128 will talk to hypervisor:3128: DNAT (to make access to Frontier scalable, profiting from the online infrastructure).
- The controllers, in round-robin manner, are the default gateways for the other addresses; they SNAT and route to the internet through the CERN network.
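An illustrative sketch of the hypervisor NAT rules described above; the VM subnet, the hypervisor's data-network address and the proxy address are placeholders, and the real setup is done by nova-network plus the site's own scripts.

```python
#!/usr/bin/env python
"""Sketch of the per-hypervisor SNAT/DNAT rules; all addresses are placeholders."""
import subprocess

VM_NET = "10.29.0.0/16"            # VM addresses (exact prefix assumed)
DATA_NET_IP = "10.29.255.1"        # hypervisor address on the data network (placeholder)
CDR_NET = "128.142.0.0/16"         # CDR / Tier-0 transfer addresses
PROXY_IP = "10.176.0.10"           # address cmsproxy.cms resolves to (placeholder)
HYPERVISOR_IP = "10.29.255.1"      # local proxy endpoint on the hypervisor (placeholder)

def iptables(*rule):
    subprocess.check_call(("iptables",) + rule)

# SNAT: VM traffic towards 128.142.x.y leaves through the data networks.
iptables("-t", "nat", "-A", "POSTROUTING", "-s", VM_NET, "-d", CDR_NET,
         "-j", "SNAT", "--to-source", DATA_NET_IP)

# DNAT: VMs talking to cmsproxy.cms:3128 are redirected to the hypervisor's own proxy.
iptables("-t", "nat", "-A", "PREROUTING", "-s", VM_NET, "-d", PROXY_IP,
         "-p", "tcp", "--dport", "3128",
         "-j", "DNAT", "--to-destination", "%s:3128" % HYPERVISOR_IP)
```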

CMS cloud: the Flat Network
[Diagram: VMs attached to switches on the 10.0.0.0/15 flat network, connected through a router + proxy + NAT to the CDR network and the CERN Campus Network.]

The Cloud Controlling Layer in the Proof of Concept Phase
- 1x fat controller node (Dell PowerEdge R610, 48 GBytes, 8 CPU, 8x1 Gbit Ethernet) running MySQL, RabbitMQ, keystone, the scheduler, the APIs (EC2, OpenStack), and the network service with SNAT and proxy.
- 1x VM image store with the Glance image service.
- 1300x compute nodes, each running the compute service and a bridge hosting the VMs.

The CMSoCloud: Proof of Concept Phase
[Diagram: the controller (MySQL, RabbitMQ, keystone, scheduler, API (EC2, OpenStack), network service with SNAT and proxy, Glance image service) connected via GRE to the bridged compute nodes hosting the VMs, and via a router to the CERN Campus Network.]

The Cloud Controlling Layer in the Production Phase
Purpose:
- Highly available
- Easy to scale out (services replicated)
The solution:
- MySQL Cluster as the backend DB (N head nodes in a round-robin alias (RRa), RAID 10 for the disk nodes); this needed changes to the table definitions.
- The rest of the services come in a brick (N instances):
  - RabbitMQ in a cluster with replicated queues (round-robin order in the configuration files); a sketch follows below
  - Gateway virtual IP (VIP) for the VMs (round-robin in the configuration files), with SNAT to the outside world for the VMs
  - Corosync/Pacemaker to provide OpenStack HA
  - APIs with a VIP (RRa)
  - Keystones with a VIP (RRa)
  - Schedulers
  - Dashboard
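A rough sketch of how a new controller brick could be joined to a RabbitMQ cluster with mirrored queues (RabbitMQ >= 3.0 syntax); node names are placeholders and this is not the actual CMS deployment procedure.

```python
#!/usr/bin/env python
"""Sketch of joining a controller to the RabbitMQ cluster and mirroring queues."""
import subprocess

SEED_NODE = "rabbit@controller01"   # an existing cluster member (placeholder)

def rabbitmqctl(*args):
    subprocess.check_call(("rabbitmqctl",) + args)

rabbitmqctl("stop_app")                     # stop the local broker application
rabbitmqctl("join_cluster", SEED_NODE)      # join the existing cluster
rabbitmqctl("start_app")

# Mirror every queue across all cluster nodes so losing a controller is transparent.
rabbitmqctl("set_policy", "ha-all", ".*", '{"ha-mode":"all"}')
```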

The CMSoCloud: The Production Phase
[Diagram: replicated controllers (RabbitMQ, keystone services, NAT, proxy, corosync/pacemaker, Dashboard, scheduler services, APIs (EC2, OpenStack)), the MLP MySQL database and the Glance image service, and the compute nodes (compute service, bridge, network service with SNAT, metadata API service, proxy) hosting the VMs, connected via a router to the CERN Campus Network, the data networks and the ToTier0 network.]

Operating Architecture for the GRID, our first client as IaaS!
The HLT clusters aim to run jobs as a GRID site:
- A dedicated factory in CERN IT instantiates VMs of the specific flavor from the specific VM image.
- In CMS a dedicated VM image has been created.
- The factory uses Condor to control the life of the VM through EC2 commands (a sketch of such a call is given below).
- CVMFS is used:
  - to get the proper workflow;
  - to get the proper CMSSW software.
- Frontier is used; the controller is a Frontier server.
- Xrootd is used to stage files into and out of the cluster (a patched version, due to bugs in staging out).
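A hedged sketch of the kind of EC2 call the factory issues (via Condor) to start a worker VM; the endpoint, credentials, image id and instance type are placeholders, not the real CMS values.

```python
#!/usr/bin/env python
"""Sketch of starting one worker VM through nova's EC2-compatible API using boto."""
import boto
from boto.ec2.regioninfo import RegionInfo

# EC2-compatible endpoint exposed by nova-api (placeholder host; standard port/path assumed).
region = RegionInfo(name="cmsonline", endpoint="cloud-controller.cms")
conn = boto.connect_ec2(aws_access_key_id="EC2_ACCESS_KEY",
                        aws_secret_access_key="EC2_SECRET_KEY",
                        is_secure=False, port=8773, path="/services/Cloud",
                        region=region)

# Boot one worker VM from the dedicated CMS image with the chosen flavor.
reservation = conn.run_instances("ami-00000001",
                                 instance_type="m1.xlarge",
                                 min_count=1, max_count=1)
print([inst.id for inst in reservation.instances])
```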

Onset, Operation, Outlook and Conclusion

Onset and Evolution of the CMS online cloud: Growing on the Technical Stops
- July 2012: Deployment of the first OpenStack infrastructure.
- 17-18/9/2012: HLT cluster cloudified/migrated to SLC6.
- 8-12/10/2012: First tests of big-scale VM deployment. One of the largest(?) OpenStack clouds in service. We ran the Folding@home project.
- Mid-December 2012: First working image and revamped hardware for the controller/proxy. We ran the Folding@home project and the first CMSSW workflows over Christmas.
- Cloud running since January 2013 (if conditions permit), also simultaneously with data taking during the heavy-ion runs.
- June 2013: Cloud using the 20 Gbit links.
- August 2013: Fully redundant, scalable Grizzly deployment.

CMSooooCloud Achievements
- No impact on data taking: neither during the setup phase, nor during data taking in the heavy-ion runs.
- Controlled ~1300 compute nodes (hypervisors).
- Deployed to simultaneously run ~3500 VMs in a stable manner.
- Deployed ~250 VMs (newest cluster) in ~5 min when previously deployed (cached image in the hypervisor).
- Resources can be moved into or out of the cloud in seconds.
- Able to run more than 6000 jobs simultaneously.
- The manpower dedicated to cloudifying the HLT cluster was low (~1.5 FTE or less for ~6 months) for the potential offered.

Outlook of the CMSooooCloud
- Continue the operation as a GRID site.
- Integrate OpenStack with the online control to allow dynamic allocation of resources to the cloud.
- Interoperate with CERN IT's OpenStack cloud, as a connected cell/zone.
- Use it for CMS online internal developments, to benefit from the lower threshold from idea to results.

CMSooooCloud: Conclusions
- An overlay cloud layer has been deployed on the CMS online High Level Trigger cluster with zero impact on data taking; it is one of the largest OpenStack clouds.
- We shared the knowledge of how to deploy such an overlay layer with existing sites that may transition to cloud infrastructures (ATLAS online, Bologna, IFAE).
- We gained experience in how to contextualize and deploy VMs on cloud infrastructures, which are becoming commonplace, to run GRID jobs.
- We were able to run different kinds of GRID and non-GRID jobs on our cloud.

Thank you. Questions?