Virtual InfiniBand Clusters for HPC Clouds




April 10, 2012
Marius Hillenbrand, Viktor Mauch, Jan Stoess, Konrad Miller, Frank Bellosa
System Architecture Group, KIT
KIT: University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
www.kit.edu

High Performance Computing + Clouds?

- HPC applications
  - Weather forecast, crash test simulations
  - Today in use in all scientific disciplines
- Supercomputers / HPC clusters
  - Owned and operated by single institutions
  - Fixed and inflexible run-time environments
- Cloud promise: Infrastructure-as-a-Service
  - Rent a service instead of buying and operating HW
  - Pay and use capacity adapting to current demand
- Cloud reality: no viable choice for HPC today

Analysis: Clouds Today

- Contemporary clouds not viable for HPC
  - High communication latency and jitter
  - Performance acceptable for loosely-coupled applications [1,2]
  - Communication-intensive workloads do not scale [3,4]
  - Only premium offers compete with small commodity clusters (EC2 cluster compute instances) [5]
- Existing clouds cannot run communication-intensive applications, which are crucial for HPC

[1] Juve et al.: Scientific workflow applications on Amazon EC2, 2010.
[2] Montero et al.: An elasticity model for HTC clusters, 2011.
[3] Napper and Bientinesi: Can cloud computing reach the TOP500?, 2009.
[4] Gupta and Milojicic: Evaluation of HPC applications on cloud, 2011.
[5] Church et al.: IaaS clouds vs. clusters for HPC: A performance study, 2010.

Proposal: Clouds on HPC

- Base cloud environment on HPC infrastructure
  - InfiniBand clusters
  - BlueGene supercomputers
  - Future PCI Express interconnects


Differences of HPC and Clouds

              Clouds                        HPC
Network       Gigabit/10G Ethernet          InfiniBand, BlueGene torus, PCI Express
Network QoS   77.5 µs in EC2 premium VMs    2-4 µs with InfiniBand
              Best effort                   QoS features in HW
Flexibility   on-demand (re)configuration   months for installation, weeks for re-partitioning
              custom OS image               fixed userbase, applications
              exchangeable SW layers        HW constraints are fixed, e.g. isolation/QoS per node

HPC Cloud Architecture

[Figure: HPC cloud architecture diagram]


Network Isolation

- Goal: Prevent illegitimate traffic between virtual clusters
- Base: InfiniBand partitions
  - Membership per node, not per VM
  - Applications freely choose partition to use
- Our extension: Transparent enforcement of partitions per VM
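The partition rule behind this slide can be illustrated with a small model. This is a minimal sketch, assuming the standard InfiniBand convention that a partition key (P_Key) is 16 bits wide, with the top bit marking full membership and the low 15 bits naming the partition. The VM names and the `VM_PKEY` table are hypothetical and only stand in for whatever per-VM assignment the host keeps.

```python
from typing import Dict

FULL_MEMBER_BIT = 0x8000  # top bit of the 16-bit P_Key marks full membership
KEY_MASK = 0x7FFF         # low 15 bits identify the partition


def pkeys_match(a: int, b: int) -> bool:
    """Return True if two P_Keys permit communication.

    Both keys must name the same partition, and at least one side
    must be a full member of it.
    """
    same_partition = (a & KEY_MASK) == (b & KEY_MASK)
    one_full_member = bool((a | b) & FULL_MEMBER_BIT)
    return same_partition and one_full_member


# Hypothetical per-VM assignment kept by the host: two virtual clusters,
# each mapped to its own partition.
VM_PKEY: Dict[str, int] = {"a1": 0x8001, "a2": 0x8001, "b1": 0x8002}


def can_talk(src: str, dst: str, table: Dict[str, int] = VM_PKEY) -> bool:
    """Check isolation between two VMs using their host-assigned P_Keys."""
    return pkeys_match(table[src], table[dst])


if __name__ == "__main__":
    print(can_talk("a1", "a2"))  # VMs of the same virtual cluster
    print(can_talk("a1", "b1"))  # VMs of different virtual clusters
```

Keeping the table per VM rather than per node is the point of the extension: two VMs on the same physical node can then belong to different virtual clusters without seeing each other's traffic.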

Network Performance Isolation

- Goal: Ensure bandwidth and latency SLAs
- Base: InfiniBand virtual lanes
  - Configurable traffic scheduling
  - Known policies for QoS [6]
  - Applications freely choose traffic class
- Our extension: Transparent enforcement of traffic classes per VM

[6] Alfaro et al.: A formal model to manage the InfiniBand arbitration tables providing QoS, 2007.
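To make the idea of traffic classes on virtual lanes concrete, the following sketch maps each VM to a service level, each service level to a virtual lane, and then serves the lanes by weighted round robin. It is a simplified illustration, not the arbitration scheme of real hardware: the tables, VM names, and weights are assumptions, and weights here count packets, whereas InfiniBand arbitration tables express weights in data units and distinguish high- and low-priority tables.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical policy: the host, not the application, fixes each VM's service level.
VM_SERVICE_LEVEL: Dict[str, int] = {"vm-latency": 1, "vm-bulk": 0}
# Service level to virtual lane mapping (illustrative values).
SL_TO_VL: Dict[int, int] = {0: 0, 1: 1}
# (virtual lane, weight): packets a lane may send per arbitration round.
VL_WEIGHTS: List[Tuple[int, int]] = [(1, 3), (0, 1)]


def schedule(packets: List[Tuple[str, str]]) -> List[str]:
    """Order (vm, payload) packets by weighted round robin over virtual lanes."""
    queues: Dict[int, Deque[str]] = {vl: deque() for vl, _ in VL_WEIGHTS}
    for vm, payload in packets:
        vl = SL_TO_VL[VM_SERVICE_LEVEL[vm]]
        queues[vl].append(payload)

    order: List[str] = []
    while any(queues.values()):
        for vl, weight in VL_WEIGHTS:
            for _ in range(weight):
                if queues[vl]:
                    order.append(queues[vl].popleft())
    return order


if __name__ == "__main__":
    demo = [("vm-bulk", "b1"), ("vm-bulk", "b2"),
            ("vm-latency", "l1"), ("vm-latency", "l2"),
            ("vm-latency", "l3"), ("vm-latency", "l4")]
    print(schedule(demo))  # ['l1', 'l2', 'l3', 'b1', 'l4', 'b2']
```

With these weights the latency-sensitive lane gets three send slots for every one of the bulk lane, yet the bulk lane is never starved.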

Implementation: Intercept Commands

- Base: HPC network virtualization
  - Proposed by Liu et al. [7]
  - Apps issue send/receive operations directly to HW
  - Connection establishment via host OS
  - Applied with SR-IOV
- Our extension: Intercept connection management in the host
  - Map users' partitions and traffic classes
  - Protect physical network configuration
  - Enforce isolation transparently to user

[7] Liu et al.: High performance VMM-bypass I/O in virtual machines, 2006.
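The interception step can be pictured as a small rewrite function on the control path. The sketch below is an assumption-laden model, not the authors' implementation: `ConnRequest`, `VmPolicy`, and `HostInterceptor` are hypothetical names, and the fields only mirror the three things the slide mentions (partition, traffic class, destination). The data path is untouched, which matches the slide's point that send/receive operations go directly to hardware.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConnRequest:
    """Connection parameters a guest asks for (field names are illustrative)."""
    pkey_index: int     # guest's choice of partition table entry
    service_level: int  # guest's choice of traffic class
    dest_addr: int      # destination node address


@dataclass(frozen=True)
class VmPolicy:
    """Partition and traffic class the cloud provider grants to one VM."""
    pkey_index: int
    service_level: int


class HostInterceptor:
    """Host-side hook on connection management (hypothetical)."""

    def __init__(self, policies: Dict[str, VmPolicy]) -> None:
        self._policies = policies

    def on_connect(self, vm: str, req: ConnRequest) -> ConnRequest:
        """Rewrite a guest request before it is programmed into the adapter."""
        policy = self._policies.get(vm)
        if policy is None:
            raise PermissionError(f"{vm} has no InfiniBand access")
        # Partition and traffic class come from the provider, not the guest;
        # only the destination is taken from the guest's request.
        return ConnRequest(policy.pkey_index, policy.service_level, req.dest_addr)


if __name__ == "__main__":
    host = HostInterceptor({"vm-a": VmPolicy(pkey_index=2, service_level=1)})
    print(host.on_connect("vm-a", ConnRequest(0, 7, 42)))
```

Because the guest's own values are silently replaced rather than rejected, unmodified applications keep working, which is what "transparently to user" suggests.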

Virtual HPC Network View

- Impression of a dedicated HPC network
  - Behaving like physical network for user apps and config tools
  - Custom node addresses, isolation and QoS
  - Routing customized for communication pattern
- Topology state machine per virtual cluster
  - Simulate configuration interface
  - Redirect users' accesses
  - Repurpose debug tool ibsim for InfiniBand
- Cloud provider's challenge
  - Virtual cluster placement according to constraints
  - Merging virtual configuration of users
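One way to read "custom node addresses" and "simulate configuration interface" is that a user's configuration requests change only per-cluster virtual state, while the provider keeps a private mapping to physical addresses. The sketch below illustrates that reading under stated assumptions: `VirtualFabric` and its methods are hypothetical, it does not use ibsim, and it ignores routing and topology.

```python
from typing import Dict


class VirtualFabric:
    """Per-virtual-cluster address state (hypothetical model)."""

    def __init__(self, physical_addrs: Dict[str, int]) -> None:
        # VM name -> physical node address, owned by the provider.
        self._physical = dict(physical_addrs)
        # VM name -> address chosen by the user inside the virtual cluster.
        self._virtual: Dict[str, int] = {}

    def set_address(self, vm: str, virtual_addr: int) -> None:
        """Apply a user's configuration request to virtual state only."""
        if vm not in self._physical:
            raise KeyError(f"{vm} is not part of this virtual cluster")
        for other, addr in self._virtual.items():
            if addr == virtual_addr and other != vm:
                raise ValueError(f"address {virtual_addr} already in use")
        self._virtual[vm] = virtual_addr

    def resolve(self, virtual_addr: int) -> int:
        """Translate a user-visible address to the physical one."""
        for vm, addr in self._virtual.items():
            if addr == virtual_addr:
                return self._physical[vm]
        raise LookupError(f"no node with virtual address {virtual_addr}")


if __name__ == "__main__":
    fabric = VirtualFabric({"vm1": 17, "vm2": 23})
    fabric.set_address("vm1", 1)
    fabric.set_address("vm2", 2)
    print(fabric.resolve(2))  # physical address of vm2
```

The physical table is never written by `set_address`, which mirrors the slide's claim that the physical network configuration stays protected.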

Results

- Prototype
  - VMs with InfiniBand access
  - Automated isolation setup (partitions)
- Measurements cannot be published
  - SR-IOV drivers in non-public beta
  - PCI passthrough as substitute
- MPI application latency (SKaMPI)
  - 77.5 µs in premium cloud offering (10GE)
  - 3.4 µs in our prototype (IB @ 10 Gbit/s)
- Conceptual evaluation with published pre-alpha SR-IOV drivers
  - Transparent enforcement of isolation works
  - Protection of network configuration is inherent
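The two latency figures on this slide can be put side by side with a short calculation. The numbers are taken from the slide; the variable names are illustrative, and the ratio is only as meaningful as the comparison itself, since the prototype figure was obtained with PCI passthrough as a substitute for SR-IOV.

```python
# MPI latency figures quoted on the slide (SKaMPI), in microseconds.
cloud_latency_us = 77.5      # premium cloud offering, 10 Gigabit Ethernet
prototype_latency_us = 3.4   # prototype, InfiniBand at 10 Gbit/s

# How many times lower the prototype's latency is.
ratio = cloud_latency_us / prototype_latency_us

if __name__ == "__main__":
    print(f"Latency ratio: {ratio:.1f}x")  # Latency ratio: 22.8x
```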

Future Work

- Transparent live migration on HPC networks
  - Protocol state in hardware
  - Node addresses bound to physical nodes
- Low-latency clouds for non-HPC workloads
  - Scale-out workloads bound by latency
  - Future tightly-coupled cloud environments
  - Adapt workloads to new communication primitives

Conclusion

- Architecture for HPC cloud computing
  - InfiniBand virtualization
- Network and performance isolation
  - Transparent enforcement of isolation
- Virtual HPC network view
  - Impression of exclusive use
  - Behavior of a physical cluster
  - Physical network configuration is protected
- Next step: Evaluation with SR-IOV