Themis Athanassiadou HPC Project Manager
About
- 12 years as Europe's dedicated specialist for High-Performance Computing
- End-to-end hardware/software/services solution provider
- HPC engineering and innovation is at the heart of what we do
- Active in Europe, Middle East & Africa, Asia-Pacific
- Amsterdam based: well connected to major European business locations
- Quality: ISO 9001:2008 & ISO 14001 certified
- More than 400 projects and 250 customers
Customers Industry Government Education
HPC in Industry
Systems share, Top 500:
- Industry 51%
- Research 24%
- Academic 18%
- Government 4%
- Vendor 3%
Today, to Out-Compute is to Out-Compete
HPC:
- Enables development of new products and services
- Reduces time to market
- Reduces R&D costs
- Increases quality
- Reduces personnel costs
HPC powers industry giants
and empowers a broad spectrum of other businesses
Adopting HPC can be challenging for many businesses
- Lack of infrastructure
- Cost of equipment
- Cost of operation
- Lack of expertise
- Lack of experience
CV Innovations that ease HPC adoption
OpenStack Cloud Compute
- Convergence of big data, cloud and HPC
- Merge with the general IT infrastructure
- Open industry standard, open source
- Securely host CPU cycles and storage
HPC mineral oil based cooling solution
- Saving ~20% power: no air-cooling or fans
- Less current leakage, higher HPC performance
- Skinless servers
- Re-use of racks and same oil for > 15 years
Remote System Administration
- Outsourced infrastructure management
- Power of scaling: lower your cost
- Especially suitable for OpenStack
- You can focus on workflow instead of hardware
Trinity HPC enabled cloud environment
- OpenStack base. Support for containers
- Enhanced for High Performance Compute
- Full HPC Ecosystem
Why cloud?
Freedom of choice and flexibility
Main characteristics:
- On demand self-service
- Broad network access
- Resource pooling
- Rapid elasticity
- Measured service
Service models:
- Software as a Service
- Platform as a Service
- Infrastructure as a Service
Deployment models:
- Private Cloud
- Public Cloud
- Hybrid Cloud
- Community Cloud
Ideally
[Diagram: Engineering (HPC nodes), IT department and Finance on top of shared Physical Hardware (Servers, Database, Storage, GPU Pool)]
To get there, you need some form of virtualization
[Diagram: Engineering, IT department and Finance each in a VM, on top of a Virtual Machine Monitor, on top of Physical Hardware (Servers, Database, Storage, GPU Pool)]
Virtualization technologies
HYPERVISORS (e.g. VMware, Xen, KVM)
- Full virtualization
- Great workload isolation
- Slower
VS
CONTAINERS (e.g. LXC, Docker)
- Lightweight virtualization
- Good workload isolation
- Faster
Image credit: CISCO
Which is best for HPC?
HPC applications:
- are usually tuned to specific hardware
- strive to maximize performance (compute, I/O)
- sometimes require a very fast network (InfiniBand)
Clouds only guarantee a "minimal" level of performance
[Figure: CPU test, Linpack performance on 2 sockets (16 cores)]
Source: IBM research report RC25482 (AUS1407-001), July 21, 2014
Is performance using container virtualization good enough?
According to the IBM study:
- Docker equals or exceeds KVM performance in every case tested (CPU, memory, I/O)
- For I/O-intensive workloads, both forms of virtualization should be used carefully.
- Network, which is very important in many applications, needs to be tested.
[Figure: CPU test, Linpack performance on 2 sockets (16 cores)]
Container based virtualization is a great starting point for HPC in the Cloud.
Building a suitable HPC Cloud An HPC Cloud should strive to find the balance between flexibility, convenience and acceptable performance.
Choosing a suitable Cloud for HPC
- The choice depends on the need (Public? Private? Level of abstraction? Specialized hardware? Performance?)
- A number of companies, including Penguin, R-HPC, Amazon, Univa, SGI, Sabalcore, UberCloud and Gompute, offer specialized HPC clouds. Evaluate and choose.
- For full freedom of choice/customization/security, build your cloud from an open-source project (OpenStack, OpenNebula, Eucalyptus) and add HPC functionality using containers.
OpenStack: Cloud building toolkit of choice
- Open source set of software tools for building and managing cloud computing platforms for public and private clouds.
- Currently managed by the OpenStack Foundation.
- More than 200 companies have joined, including Dell, Intel, Red Hat and Oracle
- Rapidly becoming an industry standard
- It is primarily deployed as an infrastructure as a service (IaaS) solution.
Challenges addressed by OpenStack
Problem (Manager): Virtual server and HPC environments running independently.
Solution: With OpenStack, you merge these into one single efficient environment.
Problem (Manager): Resources wasted when the Virtual Desktop Infrastructure (VDI) is idle at night.
Solution: With OpenStack, you can easily switch from the Virtual Desktop Infrastructure (VDI) to HPC.
Problem (Admin): Inability to collaborate with external parties due to lack of a security infrastructure for hosting CPUs/disks.
Solution: With OpenStack, you can securely host CPU/DISK for paying customers through the use of virtual instances / environments.
Problem (Admin): Inflexibility in building and maintaining similar environments on multiple physical platforms.
Solution: With OpenStack, you can easily share images which contain predefined application binaries and/or environments.
Problem (User): The Finance department needs to run the payroll for tomorrow, and doesn't have the resources to do so in time!
Solution: With OpenStack, this user would more easily access a larger or even external infrastructure with the required CPU / Storage environment.
Enhancing OpenStack for HPC
Full HPC Stack
- Monitoring
- Checking
- Module and Library Environment
- Scheduler (SLURM, PBS, SGE etc)
Performance Optimisations (containers)
Typical HPC services and integration
- InfiniBand
- Parallel filesystems
- GPUs/Accelerators
Trinity: Linking Cloud and HPC
Trinity is a set of software tools for building and managing virtual HPC or OpenStack environments in a platform as a service (PaaS) solution, customized for HPC performance.
- Adds ease of management (Trinity dashboard) to HPC
- Scalable to tens of thousands of nodes
- Full hardware support (IPMI, InfiniBand, PXE)
- Provides full HPC stacks (schedulers, MPI, libraries)
- No performance loss (virtualization based on Docker)
- Allows customers to host their own private or public IaaS cloud (for general IT)
- Load balancing (HPC) partitions
- Environment customization
All of the standard HPC cluster manager requirements included
Features (compared for Bright CM, IBM Platform and Trinity):
- Node provisioning
- Health check and monitoring
- GUI and command line interfaces
- SLURM, SGE & PBS support
- Parallel shell
- Modules environment
- Compilers, debuggers & profilers
- MPI + Scientific libraries
- Containerized HPC building blocks
- Cloud Computing Ready
A traditional cluster Login node(s) Worker node(s) Storage node(s)
A Trinity managed cluster
Trinity Dashboard (Single management interface)
Login node(s), Worker node(s), Storage node(s)
- Virtual Cluster A: runs the HPC stack for department A
- Virtual Cluster B: runs the HPC stack for department B
- Virtual Cluster C: runs VDI
- Virtual Cluster D: runs general IT infrastructure using OpenStack
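The partitioning idea on this slide can be illustrated with a minimal sketch. It is not Trinity's implementation: the `VirtualCluster` class, the `partition_nodes` function and the node names are hypothetical, and a real deployment would also have to handle login and storage nodes, networking and images.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VirtualCluster:
    """A named slice of the physical worker nodes (hypothetical model)."""
    name: str
    purpose: str
    nodes: List[str] = field(default_factory=list)


def partition_nodes(
    nodes: List[str], requests: Dict[Tuple[str, str], int]
) -> List[VirtualCluster]:
    """Hand out physical nodes to virtual clusters in request order.

    `requests` maps (cluster name, purpose) to the number of nodes wanted.
    Raises ValueError when more nodes are requested than exist.
    """
    wanted = sum(requests.values())
    if wanted > len(nodes):
        raise ValueError(f"requested {wanted} nodes, only {len(nodes)} available")
    clusters, start = [], 0
    for (name, purpose), count in requests.items():
        clusters.append(VirtualCluster(name, purpose, nodes[start:start + count]))
        start += count
    return clusters


if __name__ == "__main__":
    workers = [f"node{i:02d}" for i in range(1, 9)]  # assumed node names
    layout = partition_nodes(workers, {
        ("A", "HPC stack for department A"): 3,
        ("B", "HPC stack for department B"): 3,
        ("C", "VDI"): 1,
        ("D", "general IT infrastructure using OpenStack"): 1,
    })
    for vc in layout:
        print(f"Virtual Cluster {vc.name} ({vc.purpose}): {', '.join(vc.nodes)}")
```

Each physical node belongs to exactly one virtual cluster at a time, which is what lets the clusters run different stacks side by side.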
In summary:
An HPC Cloud is a powerful tool for the arsenal of any industry, small or big.
- It gives both power and flexibility at many stages of product design and testing
- It can reduce cost by consolidating resources used for different purposes
- New software technologies are alleviating performance concerns
- Many vendor choices to suit every need
Trinity is a great choice for a private cloud, providing a full cluster manager, HPC stack and cloud management for HPC, IT, Data.
Thank You!
HPC, Cloud and BigData are coming together
[Diagram: HPC, CRM, Database, VDI and Email on top of Compute, Virtualization, Authentication, Object storage, Dashboard, Resource Management, Monitoring and Deployment, on top of Hardware Resources (nodes, network, disks etc)]
HPC, Cloud and BigData have a lot of overlap:
- Centralised resources
- Same complex management, same complex environment
- Similar high performance storage and powerful server requirements
- Almost same physical networking
- Same controller <-> worker-node relationship
Neat tricks in Virtual Machines
[Figure: timelines (t) for Node A, Node B, Node C and Node D, with the current time marked]
- Reliable checkpoint restart of jobs
- Towards a 100% scheduling efficiency
- Fast track high priority users
- Move jobs within the cluster & outside the cluster
- The price: losing performance (5% -> 1%)
Why HPC in the Cloud?
What is OpenStack?
- A set of software tools for building and managing cloud computing platforms for public and private clouds.
- OpenStack is primarily deployed as an infrastructure as a service (IaaS) solution.
- OpenStack began in 2010 as a joint project of Rackspace Hosting and NASA.
- Currently, it is managed by the OpenStack Foundation.
- More than 200 companies have joined this project, including Dell, Intel and Oracle.
Challenges addressed with OpenStack
Problem (Manager): VMware and HPC environments running independently from each other.
Solution: With OpenStack, you merge these into one single efficient environment.
Problem (Manager): Resources wasted when the Virtual Desktop Infrastructure (VDI) is idle at night.
Solution: With OpenStack, you can easily switch from the Virtual Desktop Infrastructure (VDI) to HPC.
Problem (Admin): Inability to collaborate with external parties due to lack of a security infrastructure for hosting CPUs/disks.
Solution: With OpenStack, you can securely host CPU/DISK for paying customers through the use of virtual instances / environments.
Problem (User Group): Inflexibility in building and maintaining similar environments on multiple physical platforms.
Solution: With OpenStack, you can easily share images which contain predefined application binaries and/or environments.
Problem (User): Has a deadline for a paper/conference tomorrow and requires vast amounts of immediate CPU cycles.
Solution: With OpenStack, this user would more easily access a larger or even external infrastructure with the required CPU / Storage environment.
Using the cloud to address growing challenges in HPC
IT Manager
- Rising infrastructure & personnel costs
- Growing complexity / fragmented infrastructure
- Increasingly complicated personnel needs
System Administrator
- Growing complexity of hardware & software environment
- Managing tenants with different needs/workflows
- Managing secure access to resources
- Dealing with hardware changes
User
- Cluster environment different from workstation
- Software stack needed for workflow not pre-installed
- Non-availability of resource when needed most
Why HPC in the Cloud?
What can the Cloud bring to HPC?
HPC and Cloud also have significant differences
Cloud:
- Split bigger computing units into smaller
- Timesharing execution model
- Elastic
- Provides commodity (virtual) hardware
- Increases utilisation to 85%
HPC:
- Merge smaller computing units into a single whole
- Batch execution model
- Backfilling
- Needs specialised hardware (GPU, IB)
- Utilisation already above 85%
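Backfilling is one reason batch systems already reach high utilisation. The toy sketch below shows the idea in the style of EASY backfilling. It is an illustration under simplifying assumptions (a single node pool, exact runtimes known in advance, time starting at 0), not the algorithm of SLURM, PBS or SGE, and the function name and tuple layouts are made up for this example.

```python
def backfill(free_nodes, running, queue):
    """Pick the queued jobs that may start now (time 0), EASY-backfill style.

    free_nodes: number of idle nodes right now.
    running:    list of (end_time, nodes) tuples for jobs already running.
    queue:      list of (name, nodes, runtime) tuples in priority order.
    Returns the names of the jobs to start, in start order.
    """
    running = list(running)
    queue = list(queue)
    started = []

    # Plain batch scheduling: start jobs from the head while they fit.
    while queue and queue[0][1] <= free_nodes:
        name, nodes, runtime = queue.pop(0)
        free_nodes -= nodes
        running.append((runtime, nodes))
        started.append(name)
    if not queue:
        return started

    # The head job is blocked. Find the earliest time ("shadow time") at
    # which enough nodes will have been released for it to start.
    head_nodes = queue[0][1]
    available, shadow_time = free_nodes, None
    for end_time, nodes in sorted(running):
        available += nodes
        if available >= head_nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started  # head job is larger than the whole cluster
    spare = available - head_nodes  # nodes the head job leaves unused

    # Backfill: a lower-priority job may start now if it fits in the idle
    # nodes and either ends before the shadow time or only uses spare nodes.
    for name, nodes, runtime in queue[1:]:
        if nodes > free_nodes:
            continue
        if runtime <= shadow_time:
            free_nodes -= nodes
            started.append(name)
        elif nodes <= spare:
            free_nodes -= nodes
            spare -= nodes
            started.append(name)
    return started
```

For example, with 4 idle nodes, one running job that frees 4 nodes at time 10, and a queue of `("big", 8, 20)`, `("small", 2, 5)`, `("long", 2, 50)`, only `small` is started: it finishes before `big` can run, whereas `long` would delay it.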
Recent References
Dolphin Geophysical
- Lasting Partnership
- Onshore / offshore HPC solutions & extended consulting services
- Bright, BeeGFS, Dell / Asus hardware
Volvo IT
- HPC services partner for Dell
- Main installs: Lyon (France), Gothenburg (Sweden)
Ecole Polytechnique Fédérale de Lausanne
- Framework contract
- 512+ compute nodes
- Intel Ivy Bridge / Haswell (Truescale & Servers)
- Close collaboration on application fine tuning
- XCat2
National Supercomputer Center (Sweden)
- 640 compute nodes
- Asus 4 nodes in 2U systems
- First Haswell reference: 2640V3 (8 cores / 2.6GHz)
What is RSA and why does it fit so nicely?
What?
- End-to-end HPC management
Why?
- Avoid single point of failure
- Empower your highly qualified admins
- Use the power of scaling: lower your cost
- Reliability of service delivery
How?
- Central monitoring system
- Know instantly when something is wrong and react upon it
- Software updates
- Remote and onsite repair
- Notification and explanation for actions
- Management reporting
OpenStack Trinity
Based on open source components with custom OpenStack plugins
- xcat: Deployment
- Docker: Virtualization and image management
- SLURM: Scheduling
- OpenMPI: Communication
- RSA/Nagios: Monitoring and Health-checks
Cookbook style recipes and easy customizations
Trinity
Why xcat?
1. xcat is stable, full featured, well tested & Open Source
2. xcat is also usable without OpenStack
3. Supports node discovery, image management, stateless nodes, IPMI abstraction, PXE boot
4. OpenStack Ironic is still in beta
5. Bright is expensive
6. The main xcat weakness (lousy UI) will be mitigated by standardization and adding a custom OpenStack dashboard to xcat
Trinity
Why Docker?
Docker is an operating system level virtualization framework.
1. Used extensively by Google and other industry giants
2. Lightweight
3. Much faster than full machine virtualization
4. Good support for image management and versioning
5. Pluggable virtualization drivers (lxc, OpenVZ)
6. Pluggable storage virtualization (AUFS, devicemapper, VFS)
Current status of Trinity
Phase 1 of the project is completed
We support the following core features
1. Cluster partitioning (virtual clusters)
2. Multiple OS and application versions between clusters
3. Isolated (sandboxed) applications between clusters
4. Setup OpenStack inside Trinity
Will require manual customizations and improvisation to support this in the field.
Overall dashboard is absent.
POC Q3 2014: Bristol & Cambridge University.
A new approach: Container-based virtualization
Linux containers
LXC is an operating system level virtualization method for running multiple isolated Linux systems (containers) on a single control host.
Docker is an open-source project that automates the deployment of applications inside containers, by providing an additional layer of abstraction and automation.
- Lightweight
- Fast provisioning
- Workload isolation
- Near bare metal performance
Adopting cloud computing is easier
- Lack of infrastructure
- Cost of equipment
- Cost of operation
- Lack of expertise
- Lack of experience
Current status of HARP
Phase 2 (Q3 2014):
1. Allow remote management (RSA inside OpenStack dashboard)
2. Allow self-service partitioning by customers (use case 5)
Phase 3 (Q4 2014):
1. Allow automated elasticity (meta-scheduling, repartition resources based on a calendar, use case 2)
2. Allow sharing of resources between HPC clusters
3. Allow self-service of jobs by customers of our customers (SaaS model)
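Calendar-based repartitioning (Phase 3, item 1) could look roughly like the sketch below, which gives nodes to VDI during office hours and returns them to HPC at night and on weekends. The function name, the office hours and the node counts are assumptions for illustration only; the slides do not describe the actual meta-scheduling logic.

```python
from datetime import datetime


def plan_partitions(now, total_nodes, vdi_nodes_daytime, office_hours=(8, 18)):
    """Return how many nodes VDI and HPC should each get at time `now`.

    Hypothetical policy: VDI gets `vdi_nodes_daytime` nodes on weekdays
    between office_hours[0] (inclusive) and office_hours[1] (exclusive);
    at all other times every node goes to HPC.
    """
    if vdi_nodes_daytime > total_nodes:
        raise ValueError("VDI share exceeds the number of nodes")
    start, end = office_hours
    is_office_time = now.weekday() < 5 and start <= now.hour < end
    vdi = vdi_nodes_daytime if is_office_time else 0
    return {"vdi": vdi, "hpc": total_nodes - vdi}


if __name__ == "__main__":
    for when in (datetime(2014, 9, 1, 10), datetime(2014, 9, 1, 23)):
        print(when, plan_partitions(when, total_nodes=16, vdi_nodes_daytime=6))
```

A scheduler hook could call such a function periodically and drain or reassign nodes whenever the returned split changes.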