GASPI - A PGAS API for Scalable and Fault Tolerant Computing



GASPI - A PGAS API for Scalable and Fault Tolerant Computing
Specification of a general-purpose API for one-sided and asynchronous communication, and provision of libraries, tools, examples and best practices
Release 1.0 in November 2012
Reported by J.-P. Weiß, Facing the Multicore-Challenge III, Sep 21, 2012

GASPI - Overview
Funding Programme: ICT 2020 - Research for Innovation
Funding Focus: HPC Software for Scalable Parallel Computers
Funding Code: 01IH11007A
Funding Volume: 2 million euros
Duration: June 1, 2011 - May 31, 2014
Coordinator: Dr. Christian Simmendinger, T-Systems SfR, Pfaffenwaldring 38-40, 70569 Stuttgart, christian.simmendinger@t-systems.com

Background and Motivation
Current parallel software is mainly MPI-based.
Adaptation to current hardware has highlighted significant weaknesses that preclude scalability on heterogeneous multi-core systems.
MPI is huge and, due to backward compatibility, not very flexible.
New demands on programming models:
Flexible threading models
Support for data locality
Asynchronous communication
Management of storage subsystems with varying bandwidth and latency
This "Multicore-Challenge" stimulates the development of new programming models and programming languages and leads to new challenges for mathematical modeling and algorithms.

GASPI - Motivation I
GASPI targets extreme scalability in the exascale age and aims to overcome the limitations of MPI (are there any?!).
GASPI aims to initiate a paradigm shift: from bulk-synchronous two-sided communication patterns towards an asynchronous communication and execution model.
GASPI challenges algorithms, implementations and applications: rethink your communication patterns and reformulate them towards an asynchronous data flow model.
GASPI provides multiple memory segments per GASPI process.
GASPI addresses heterogeneous machines.

GASPI - Motivation II
GASPI is not a new language (not like X10, UPC or Chapel).
GASPI is not a language extension (not like Co-Array Fortran).
GASPI complements existing languages with a PGAS API, very much like MPI.
GASPI supports multiple memory models (not like OpenShmem or Global Arrays).
GASPI can be combined with any threading model.
GASPI is not fixed to an SPMD or MPMD style of execution.

GASPI - Motivation III
GASPI is fault tolerant:
It provides time-out mechanisms for all non-local procedures.
Failure detection can handle node failures and delayed responses.
The sanity of communication partners can be checked via state vectors.
GASPI can be adapted to shrinking or growing node sets.
GASPI leverages one-sided, RDMA-driven communication:
It is implemented on top of the IB verbs layer and the OFED stack.
Communication is handled by the network infrastructure, with no involvement of CPU cores.
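
To make the timeout and state-vector mechanisms concrete, here is a minimal sketch in C. It assumes a GASPI-1.0-style C API as provided by the open-source GPI-2 implementation (header GASPI.h); names such as gaspi_state_vec_get and GASPI_STATE_HEALTHY follow the specification but should be treated as assumptions, not project code.

```c
#include <GASPI.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: a non-local operation with a finite timeout, plus a
 * state-vector check of the communication partners. */
int main(void)
{
  if (gaspi_proc_init(GASPI_BLOCK) != GASPI_SUCCESS)
    return EXIT_FAILURE;

  gaspi_rank_t nprocs;
  gaspi_proc_num(&nprocs);

  /* Wait for outstanding requests on queue 0, but give up after
   * 1000 ms instead of blocking forever. */
  gaspi_return_t ret = gaspi_wait(0, 1000);

  if (ret == GASPI_TIMEOUT)
  {
    /* The partner may have failed or may just be slow: inspect the
     * state vector (one health entry per rank) to decide. */
    gaspi_state_vector_t state = malloc(nprocs);
    if (state != NULL && gaspi_state_vec_get(state) == GASPI_SUCCESS)
    {
      for (gaspi_rank_t r = 0; r < nprocs; ++r)
        if (state[r] != GASPI_STATE_HEALTHY)
          fprintf(stderr, "rank %u looks unhealthy\n", (unsigned)r);
    }
    free(state);
    /* An application could now shrink its node set and continue. */
  }

  gaspi_proc_term(GASPI_BLOCK);
  return EXIT_SUCCESS;
}
```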

GASPI Features I
Processes, groups, ranks
Multiple PGAS memory segments per process
Dynamic support for heterogeneous systems (GPUs, MICs, ...)
One-sided communication primitives:
Asynchronous communication by remote read and write
Handled by local queues, no copy operations into buffers
Notification mechanisms for communication partners
Passive communication:
Two-sided semantics for cases where the sender may be unknown
Fair distributed updates of globally shared parts of data
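
As an illustration of the one-sided primitives and the notification mechanism, the following C sketch performs a remote write combined with a notification, and the matching wait on the receiving side. The segment id, offsets, queue id and notification values are arbitrary illustration choices; the calls (gaspi_write_notify, gaspi_notify_waitsome, gaspi_notify_reset) follow the GASPI specification but are not taken from project code.

```c
#include <GASPI.h>

/* Sketch of asynchronous one-sided communication with notification,
 * assuming both ranks have already created segment 0 of sufficient
 * size. Segment ids, offsets, queue and notification ids are
 * illustrative. */
void exchange(gaspi_rank_t partner, gaspi_size_t nbytes)
{
  gaspi_rank_t myrank;
  gaspi_proc_rank(&myrank);

  if (myrank == 0)
  {
    /* Write nbytes from local segment 0 (offset 0) into the partner's
     * segment 0 (offset 0) and piggyback notification id 1 with value
     * 42. The call only enqueues the request; the network hardware
     * performs the transfer. */
    gaspi_write_notify(0, 0, partner,
                       0, 0, nbytes,
                       1, 42,
                       0, GASPI_BLOCK);

    /* Local completion: wait until queue 0 has drained, i.e. the
     * local buffer may be reused. */
    gaspi_wait(0, GASPI_BLOCK);
  }
  else
  {
    /* Receiver: wait for notification id 1 on segment 0, then reset
     * it so it can be reused in the next iteration. */
    gaspi_notification_id_t first;
    gaspi_notification_t value;
    gaspi_notify_waitsome(0, 1, 1, &first, GASPI_BLOCK);
    gaspi_notify_reset(0, first, &value);
    /* Data written by rank 0 is now visible in the local segment. */
  }
}
```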

GASPI Features II
Weak synchronization primitives
Global atomics: fetch_and_add, compare_and_swap
Counters as globally shared variables or for synchronization
Collective communication:
Allreduce, broadcast, barrier with group support
User-defined global collectives
Asynchronous versions provided
Time-out mechanisms for non-blocking routines enable fault tolerance
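
The following C sketch illustrates the global atomics and the group-based collectives. It assumes a counter stored at offset 0 of segment 0 on rank 0; the segment id, offset and the use of GASPI_GROUP_ALL are illustrative assumptions, and the calls follow the GASPI specification rather than any particular project code.

```c
#include <GASPI.h>

/* Sketch of the global atomics and group collectives. Assumes segment 0
 * exists on rank 0 and holds a gaspi_atomic_value_t counter at offset 0;
 * all ids and offsets are illustrative. */
double atomics_and_collectives(double local_residual)
{
  /* Globally shared counter: atomically add 1 to the value stored on
   * rank 0 and obtain the previous value, e.g. to claim a work item. */
  gaspi_atomic_value_t old_value;
  gaspi_atomic_fetch_add(0, 0, 0, 1, &old_value, GASPI_BLOCK);
  (void)old_value; /* old_value would index the claimed work item */

  /* Collective with group support: sum the local residuals over all
   * ranks of GASPI_GROUP_ALL. */
  double global_residual = 0.0;
  gaspi_allreduce(&local_residual, &global_residual, 1,
                  GASPI_OP_SUM, GASPI_TYPE_DOUBLE,
                  GASPI_GROUP_ALL, GASPI_BLOCK);

  /* A barrier on the same group; like all non-local calls it takes a
   * timeout (here: block until completion). */
  gaspi_barrier(GASPI_GROUP_ALL, GASPI_BLOCK);

  return global_residual;
}
```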

Project Activities I
Specification of the GASPI standard for a PGAS API:
Ensure interoperability with MPI
Take into account the requirements of applications
Provision of an open-source GASPI implementation: a portable, high-performance library for one-sided and asynchronous communication
Adaptation and further development of the Vampir performance analysis suite for the GASPI standard

Project Activities II
Development of efficient numerical libraries based on GASPI core functions: sparse and dense linear algebra routines, high-level solvers, FEM code
Verification through porting of complex, industry-oriented applications
Evaluation, benchmarking and performance analysis
Outreach to the HPC and scientific computing community through information dissemination, formation of user groups, trainings and workshops

Key Objectives
In a Partitioned Global Address Space, every thread can read and write the entire global memory of an application.
Scalability: from bulk-synchronous two-sided communication patterns to asynchronous one-sided communication
Fault tolerance: timeouts in non-local operations, dynamic node sets
Flexibility: support for multiple memory models, multiple segments, configurable hardware resources
Versatility: a PGAS API beyond the message-passing model of MPI
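
As a minimal illustration of the PGAS idea, the C sketch below lets every process create a segment that becomes remotely readable and writable by all other ranks, while the owning process accesses its partition through an ordinary local pointer. Segment id 0 and the 64 MiB size are arbitrary illustration values; the calls follow the GASPI specification (as implemented, for example, by GPI-2).

```c
#include <GASPI.h>
#include <stdlib.h>

/* Minimal sketch of setting up a PGAS segment. Segment id 0 and the
 * 64 MiB size are illustrative choices, not values from the project. */
int main(void)
{
  if (gaspi_proc_init(GASPI_BLOCK) != GASPI_SUCCESS)
    return EXIT_FAILURE;

  gaspi_rank_t rank, nprocs;
  gaspi_proc_rank(&rank);
  gaspi_proc_num(&nprocs);

  /* Create segment 0 on every rank of GASPI_GROUP_ALL. The memory is
   * registered for RDMA and becomes remotely readable and writable. */
  gaspi_segment_create(0, 64UL << 20,
                       GASPI_GROUP_ALL, GASPI_BLOCK,
                       GASPI_MEM_INITIALIZED);

  /* The owning process accesses its partition through a plain pointer. */
  gaspi_pointer_t ptr;
  gaspi_segment_ptr(0, &ptr);
  ((double *)ptr)[0] = (double)rank;

  gaspi_barrier(GASPI_GROUP_ALL, GASPI_BLOCK);
  /* From here on, any rank could access this segment with one-sided
   * reads and writes. */

  gaspi_proc_term(GASPI_BLOCK);
  return EXIT_SUCCESS;
}
```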

Project Partners
Fraunhofer-Gesellschaft e.V. (Fraunhofer ITWM, Fraunhofer SCAI)
T-Systems Solutions for Research GmbH
Forschungszentrum Jülich
Karlsruhe Institute of Technology
Deutsches Zentrum für Luft- und Raumfahrt e.V. (Institute of Aerodynamics and Flow Technology, Institute of Propulsion Technology)
Technische Universität Dresden, Center for Information Services and HPC
Deutscher Wetterdienst
scapos AG

Contributors
Thomas Alrutz (1), Jan Backhaus (2), Thomas Brandes (3), Vanessa End (1), Thomas Gerhold (4), Alfred Geiger (1), Daniel Grünewald (5), Vincent Heuveline (6), Jens Jägersküpper (4), Andreas Knüpfer (7), Olaf Krzikalla (7), Edmund Kügeler (2), Carsten Lojewski (5), Guy Lonsdale (8), Ralph Müller-Pfefferkorn (7), Wolfgang Nagel (7), Lena Oden (5), Franz-Josef Pfreundt (5), Mirko Rahn (5), Michael Sattler (1), Mareike Schmidtobreick (6), Annika Schiller (9), Christian Simmendinger (1), Thomas Soddemann (3), Godehard Sutmann (9), Henning Weber (10), Jan-Philipp Weiß (2)
1 T-Systems SfR, Stuttgart & Göttingen
2 DLR, Institut für Antriebstechnik, Köln
3 Fraunhofer SCAI, Sankt Augustin
4 DLR, Institut für Aerodynamik und Strömungstechnik, Braunschweig & Göttingen
5 Fraunhofer ITWM, Kaiserslautern
6 Engineering Mathematics and Computing Lab (EMCL), KIT Karlsruhe
7 Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), TU Dresden
8 scapos AG, Sankt Augustin
9 Forschungszentrum Jülich
10 Deutscher Wetterdienst (DWD), Offenbach