OASIS3-MCT & Open-PALM: 2 open source code couplers

OASIS3-MCT & Open-PALM: 2 open source code couplers https://verc.enes.org/oasis/ http://www.cerfacs.fr/globc/palm_web/ Anthony Thevenin (thevenin@cerfacs.fr)

Code coupling issues
Why couple scientific computing codes?
- To treat a global system, e.g. ocean-atmosphere coupling for climate modeling: NEMO (ocean) + ARPEGE (atmosphere) with flux exchanges
- To set up new applications from existing codes, e.g. fluid-structure coupling to find the optimal position of the intakes in a combustion chamber
- To build modular applications by splitting complex problems and assembling their elementary components, e.g. data assimilation
Page 2

Code coupling issues
What does it imply?
- Drive the information exchanges between the codes (data exchanges between Code1 and Code2)
- Manage the execution of the codes (serial or parallel)
Requirements:
- Ease of implementation: non-intrusive interfaces
- Efficiency: no loss in performance
- Portability: standard technical solutions
- Flexibility: test different configurations with only a few changes
- Genericity: reuse components in other couplings
Page 3

Technical coupling solutions
First solution: code merging

  Code 1: ocean model                Code 2: atmosphere model
  Program prog1                      ! Program prog2 becomes:
    Call sub_prog2(in, out)          Subroutine sub_prog2(in, out)
  end                                end

- Very efficient (in-memory exchange) and portable
- BUT:
  - integration problems (no independence between the codes)
  - no flexibility (problems in maintenance, evolution, ...)
  - potential memory waste
  - the parallelism of the master code is imposed on the whole application
- This is merging, not coupling
Page 4

Technical coupling solutions
Second solution: use a communication protocol (MPI, CORBA, files, ...)

  Code 1: ocean model                Code 2: atmosphere model
  Program prog1                      Program prog2
    Call XXX_send(prog2, data)         Call XXX_recv(prog1, data)
  end                                end

- Can be very efficient and preserves the existing codes (and their parallelism)
- But:
  - the coupling is not generic
  - good experience in parallel computing is required
  - lack of flexibility (interpolation, transformation, ...)
  - not always portable, depending on the chosen mechanism
  - too complex with more than two codes and many exchanges
(A minimal sketch of this approach is given below)
Page 5
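To make this second solution concrete, the following is a minimal, hedged sketch (not taken from the presentation) of a direct MPI coupling between two separately built executables; the field size, tag, rank numbering and launch command are illustrative assumptions:

  ! ocean.f90 -- hypothetical ocean side, assumed to run as rank 0 of a
  ! common MPI_COMM_WORLD (e.g. mpirun -np 1 ./ocean : -np 1 ./atmos)
  program ocean
    use mpi
    implicit none
    integer :: ierr
    real(8) :: sst(100)
    call MPI_Init(ierr)
    sst = 290.0d0                       ! field produced by an ocean time step
    ! hand the field directly to the atmosphere executable (rank 1), tag 0
    call MPI_Send(sst, size(sst), MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, ierr)
    call MPI_Finalize(ierr)
  end program ocean

  ! atmos.f90 -- hypothetical atmosphere side, assumed to run as rank 1
  program atmos
    use mpi
    implicit none
    integer :: ierr, status(MPI_STATUS_SIZE)
    real(8) :: sst(100)
    call MPI_Init(ierr)
    call MPI_Recv(sst, size(sst), MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, status, ierr)
    call MPI_Finalize(ierr)
  end program atmos

Every exchange, redistribution and interpolation has to be coded by hand in this way, which is precisely the lack of genericity and flexibility listed above.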

Technical coupling solutions
Third solution: choose a coupler (or a coupling library)
- A coupler is a piece of software that allows data exchanges between several codes and manages their execution
- It starts the executables in sequence or in parallel
- Data exchanges are invoked through generic PUT/GET primitives
- It uses a portable and efficient communication protocol (MPI)
- It provides tools:
  - time and spatial interpolations,
  - parallel data redistribution,
  - monitoring, performance analysis
(Figure: a COUPLER connecting Code 1, Code 2, ..., Code n)
Page 6

Technical coupling solutions
Static coupling (OASIS) vs. dynamic coupling (Open-PALM)
(Figures: static coupling shows Code1 and Code2 running side by side over time; dynamic coupling shows Code1, Code2, Code3 and Code4 launched inside loops and under logical conditions)
- Static: the codes are started at the beginning of the simulation and all exit together at the end of the application
- Dynamic: a component can be launched, and can release its resources upon termination, at any moment during the simulation; the computing resources (required memory and number of concurrent processors) are handled by the coupler
- Dynamic: programs can be executed in loops or under logical conditions (ability to describe complex coupling algorithms)
Page 7

Technical coupling solutions
Managing the data exchanges between components: end-point communications (data flowing from Code1 to Code2 through the coupler)
- When a code produces data potentially of interest to other components, it informs the coupler: it calls a generic Put primitive (non-blocking)
- When a code needs data, it tells the coupler that it is waiting for some information: it calls a generic Get primitive (blocking)
- Total independence between the codes:
  - NO explicit indication of the receiving units on the producer side, NOR of the producer on the receiver side
  - a code can be replaced by another one without interfering with the rest of the application
Page 8

Technical coupling solutions
The coupler: probably the best solution to couple independently developed codes
- At run time, the component models remain separate executables and the coupler acts as a communication library linked to the models
- Two levels of parallelism:
  - distributed components (each code can be a parallel program)
  - task parallelism (the codes can be executed in parallel)
- BUT:
  - multi-executable: possible waste of resources if sequential execution of the components is enforced
  - multi-executable: more difficult to debug; harder for the OS to manage
  - more or less efficient depending on the coupling algorithm
Page 9

OASIS3-MCT, an open source code coupler for massively parallel climate modeling https://verc.enes.org/oasis/ Page 10

OASIS3-MCT
(Timeline: OASIS1 (1991), OASIS2 (1993), OASIS3 (2001), OASIS4 and OASIS3-MCT (around 2010), developed within the PRISM and IS-ENES projects)
- PRISM: Partnership for Research Infrastructure in Earth System Modeling
- IS-ENES: InfraStructure for ENES, a European project
- Current developers: 2 permanent staff (CERFACS, CNRS) and 1 consultant
- All sources are written in F90 and C
- Open source product, distributed under an LGPL license
- All external libraries used are either public domain (MPI, NetCDF) or open source: SCRIP (Los Alamos National Laboratory), MCT (Argonne National Laboratory)
Page 11

OASIS3-MCT
About 35 groups world-wide (climate modelling or operational monthly/seasonal forecasting):
- France: CERFACS, METEO-FRANCE, IPSL (LOCEAN, LMD, LSCE), OMP, LGGE, IFREMER
- Europe: ECMWF + EC-Earth community
- Germany: MPI-M, IFM-GEOMAR, HZG, U. Frankfurt
- UK: MetOffice, NCAS/U. Reading, ICL
- Denmark: DMI
- Norway: U. Bergen
- Sweden: SMHI, U. Lund
- Ireland: ICHEC, NUI Galway
- The Netherlands: KNMI
- Switzerland: ETH Zurich
- Italy: INGV, ENEA, CASPUR
- Czech Republic: CHMI
- Spain: U. Castilla
- Tunisia: Inst. Nat. Met.
- Japan: JMA, JAMSTEC
- China: IAP-CAS, Met. Nat. Centre, SCSIO
- Korea: KMA
- Australia: CSIRO
- New Zealand: NIWA
- Canada: RPN-Environment Canada, UQAM
- USA: Oregon State U., Hawaii U., JPL, MIT
- Peru: IGP
OASIS3 is used in 5 European ESMs that participate in IPCC AR5
Page 12

OASIS3-MCT
To use OASIS3-MCT:
- Register on the website and download the sources
- Compile and run the tutorial on the target platform
To configure OASIS3-MCT:
- Identify the component models and their grids
- Identify the coupling fields to be exchanged between those models
- Identify the coupling parameters:
  - total duration
  - source and target symbolic names
  - exchange period
  - field transformations and interpolations
  => write a full namcouple configuration file (a schematic sketch is given below)
- Insert calls to the OASIS3-MCT communication library
And then:
- Compile OASIS3-MCT and link the component models with it
- Start the models and let OASIS3-MCT manage the coupling exchanges
Page 13
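Purely as an illustration of what such a configuration file looks like, here is a heavily abridged, hypothetical namcouple fragment; the field names, file name and values are invented, several mandatory lines are elided with "...", and the exact syntax must be taken from the OASIS3-MCT user guide:

  $RUNTIME
   86400                  # total duration of the run, in seconds
  $END
  $STRINGS
  # one block per coupling field: source and target symbolic names,
  # exchange period (s), number of transformations, restart file, field status
   SSTOCEAN SSTATMOS 1 3600 1 rst_sst.nc EXPORTED
   ...                    # grid sizes/names and transformation keywords (e.g. SCRIPR / BILINEAR)
  $END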

OASIS3-MCT
Component model instrumentation (a schematic example follows below):
- Initialization:                call oasis_init_comp(...)
- Grid definition:               call oasis_write_grid(...)
- Local partition definition:    call oasis_def_partition(...)
- Coupling field declaration:    call oasis_def_var(...)
- End of definition phase:       call oasis_enddef(...)
- Coupling field exchange, in the model time stepping loop:
    call oasis_put(..., date, field, ...)
    call oasis_get(..., date, field, ...)
  - user-defined source or target
  - sending or receiving at the appropriate time only
  - automatic averaging / accumulation if requested
  - automatic writing of the coupling restart file at the end of the run
- Termination:                   call oasis_terminate(...)
Page 14
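As an illustration only, a minimal serial component instrumented along these lines could look like the sketch below; the component name, field name, 100-point partition and argument lists are assumptions, simplified with respect to the actual OASIS3-MCT interfaces documented in the user guide:

  ! toy_ocean.f90 -- hedged sketch of the instrumentation sequence above
  program toy_ocean
    use mod_oasis                           ! OASIS3-MCT Fortran module
    implicit none
    integer :: comp_id, part_id, var_id, ierr, info, date
    integer :: ig_paral(3), var_nodims(2), var_shape(4)
    real(8) :: sst(100,1)                   ! toy coupling field on 100 grid points

    call oasis_init_comp(comp_id, 'toyoce', ierr)           ! initialization

    ig_paral = (/ 0, 0, 100 /)                               ! serial partition holding all 100 points
    call oasis_def_partition(part_id, ig_paral, ierr)        ! local partition definition

    var_nodims = (/ 2, 1 /)                                  ! field rank and number of bundles (illustrative)
    var_shape  = (/ 1, 100, 1, 1 /)
    call oasis_def_var(var_id, 'SSTOCEAN', part_id, var_nodims, &
                       OASIS_Out, var_shape, OASIS_Real, ierr)    ! coupling field declaration
    call oasis_enddef(ierr)                                  ! end of definition phase

    do date = 0, 82800, 3600                                 ! time stepping loop (model time in seconds)
       sst = 290.0d0                                         ! ... ocean physics would go here ...
       call oasis_put(var_id, date, sst, info)               ! actually sent only at the coupling dates
    end do

    call oasis_terminate(ierr)                               ! termination
  end program toy_ocean

A receiving component would be symmetric, declaring its field with OASIS_In and calling oasis_get inside its own time loop.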

OASIS3-MCT
Communication:
- Fully parallel communication between parallel models, based on MPI (figure: the processes pe1, pe2, pe3 of model1 exchanging directly with the processes of model2)
- Interpolation of the coupling fields performed directly on the source or target component processes
- Parallel redistribution of the coupling fields done directly from the source to the target component processes, without gathering the whole coupling field on an additional process as in the previous OASIS3 versions (no bottleneck)
I/O functionality:
- Switch between coupled and forced mode (figure: the processes of model1 reading from or writing to a file instead of exchanging with another model)
Page 15

OASIS3-MCT
Interpolations & transformations:
- On 2D vector fields
- On different types of grids: lat-lon, rotated, Gaussian reduced, unstructured
Interpolations: nearest-neighbour, bilinear, conservative (figure: source grid points x1, x2, x3, x4 contributing to a target grid point x); a generic weighting sketch follows below
Transformations:
- Statistics, addition / multiplication by a scalar, global conservation
Page 16
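For readers unfamiliar with these remappings, the weighting behind a bilinear interpolation can be sketched as follows; this is a generic illustration, not code extracted from OASIS3-MCT or SCRIP (in practice the weights are precomputed by the SCRIP library on the real grids):

  ! Generic bilinear weighting on a unit cell: a target point at local
  ! coordinates (x, y) in [0,1] x [0,1] receives a weighted combination of
  ! the four surrounding source grid point values f00, f10, f01, f11.
  pure real(8) function bilinear(f00, f10, f01, f11, x, y)
    implicit none
    real(8), intent(in) :: f00, f10, f01, f11, x, y
    bilinear = f00*(1.d0-x)*(1.d0-y) + f10*x*(1.d0-y) &
             + f01*(1.d0-x)*y        + f11*x*y
  end function bilinear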

OASIS3-MCT
Performance:
(Figure: time in seconds for one ping-pong exchange on Curie (toy models), for OASIS3-MCT and OASIS3.3, as a function of the number of cores from 1 to 2048; times range from about 0.00 to 0.35 s)
Page 17

OASIS3-MCT
Conclusions:
- First version of OASIS3-MCT released in August 2012
- Very simple to use for traditional OASIS3 users
- OASIS3-MCT solves the problem of the coupling bottleneck observed at high numbers of cores with previous OASIS versions, in which the coupling fields had to be reassembled onto one (or more) coupler processes to perform a mono-process interpolation
- Excellent example of the benefit of shared software
Perspectives:
- Bi-cubic interpolation
- Work on scalability beyond 1000 cores
Regular training sessions at CERFACS
Page 18

Open-PALM: an open source code coupler for massively parallel multiphysics/multi-component applications and dynamic algorithms http://www.cerfacs.fr/globc/palm_web/ Page 19

Open-PALM
History:
- 1996: Mercator project (operational oceanography)
  - data assimilation with a high-resolution model
  - need for a modular and efficient tool to compose assimilation algorithms (component coupling approach)
- Expertise at CERFACS:
  - OASIS coupler for climate modelling
  - parallel computing, data assimilation
- Human power for the project:
  - 2 Mercator engineers
  - 1 CNRS engineer
  - 1 to 2 CERFACS engineers
- 2000: first prototype, PALM_PROTO, for Mercator
- 2002: PALM_Research, an SPMD version that simulates MPMD
- 2003: PALM_MP, true MPMD
- 2011: Open-PALM, open source (LGPL), and collaboration with ONERA
Page 20

Open-PALM
- 60 groups use OpenPALM, half for data assimilation, half for multiphysics and multi-component simulations
- More than 300 people have been trained in OpenPALM
- Various domains of use:
  - data assimilation: oceanography, hydrology, atmospheric chemistry, ...
  - multiphysics: hydrology, CFD, agriculture, ...
- At CERFACS: 3 teams regularly use OpenPALM (GLOBC, CFD, AE), with more than 15 regular users among the roughly 120 people; most CERFACS partners use the code or plan to use it: Météo-France, EDF, CNES, SAFRAN, ONERA
- Current developers: CERFACS: 3 permanent staff and 1 post-doc; ONERA: 1 permanent staff member
Page 21

Open-PALM
The graphical user interface PrePALM (CERFACS):
- Definition of the coupling scheme
- Monitoring and post-processing
- Runs on a standard PC or on clusters; used to define the coupling algorithms and prepare the compilation process
The PALM library (CERFACS):
- Dynamic launching by the PALM driver, which drives the parallel/sequential execution of the components
- Management of the data exchanges
- The PALM library is compiled on the different machines where the applications are executed
The CWIPI library (ONERA):
- Manages fully parallel data exchanges based on distributed mesh definitions
- Interpolation between non-coincident meshes
=> Open-PALM = PALM (GUI and library) + CWIPI
Page 22

Open-PALM
PrePALM: a friendly and simple GUI for:
- Representing the components
- Handling the parallelism (the two levels)
- Managing the resources
- Describing loops and conditions
- Describing the communications (exchanges)
=> implementing and supervising coupled applications
With all the benefits of a GUI, for instance:
- Coherency checks on the exchanged data
- Pictorial flow-chart representation of the algorithm
- Performance analysis, debugging and run-time monitoring
Page 23

Open-PALM Page 24

Open-PALM (figure: execution of the components along a TIME axis) Page 25

Open-PALM
(Annotated PrePALM screenshot)
- Branch: a sequence of units and instructions, i.e. an algorithm; the branch code can contain control structures and Fortran regions
- Unit: a computational component; it can be a code, a module of a code, a data loader (NetCDF) or a pre-defined unit
- Communications: described by connecting the plugs
Page 26

Open-PALM
Two levels of parallelism, to take the best advantage of the intrinsic parallelism of an application (MPI + OpenMP possible):
1. Task parallelism: by drawing branches we program task parallelism
2. Distributed components (each component is an MPI executable): parallel units
Page 27

Open-PALM
Process management:
- The PrePALM scheme is converted into a graph for the PALM driver
- The driver schedules this graph in order to launch the components depending on the available resources
- MPI-2 spawn as well as MPI-2 client/server are used for the dynamic launching of the units (a minimal illustration of the spawn mechanism is sketched below); MPI-1 launching is also available (choice made at installation time)
Target machines:
- HPC systems or clusters
- Designed for homogeneous computing, but functionalities exist for heterogeneous computing (TCP/IP)
- Open-PALM is ported to many architectures (BULL, NEC, CRAY, IBM, SGI, Linux, Mac OS X, Windows via Cygwin)
Page 28
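The MPI-2 spawn mechanism mentioned above is standard MPI; the fragment below is only a generic illustration of dynamic process creation (the executable name "unit_cfd" and the process count are invented), not the actual PALM driver code:

  ! Hypothetical driver fragment: dynamically start 4 processes of a unit
  ! executable and obtain an intercommunicator to talk to them.
  program toy_driver
    use mpi
    implicit none
    integer :: ierr, intercomm
    integer :: errcodes(4)
    call MPI_Init(ierr)
    call MPI_Comm_spawn('unit_cfd', MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0, &
                        MPI_COMM_SELF, intercomm, errcodes, ierr)
    ! ... exchange data with the spawned unit through intercomm ...
    call MPI_Finalize(ierr)
  end program toy_driver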

Open-PALM
Transformation of a code into a PALM unit (a schematic before/after sketch follows below):
- Creation of an ID card (read by PrePALM)
- Replace PROGRAM (FORTRAN) by a subroutine, or MAIN (C) by a named function
- For a parallel unit:
  - skip the calls to MPI_INIT and MPI_FINALIZE
  - replace MPI_COMM_WORLD by PL_COMM_EXEC
  - replace the calls to STOP and EXIT by PALM_Abort
- Add the PALM primitives to the codes:
  - PALM_Put(args): to send data
  - PALM_Get(args): to receive data
Prerequisites:
- Codes must run under Linux/Unix
- Code sources should be available AND written in a compiled language (C, C++, FORTRAN 77 or 90), OR
- User functions (C or FORTRAN) with pointers to the data provided by the code
Page 29
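As a purely schematic before/after illustration of this transformation (the unit and variable names are invented, the PALM-specific symbols come from the PALM library headers which are not shown here, and the PALM_Put argument list is left as a comment because its exact form is defined by the PrePALM-generated ID card and the user guide):

  ! Before: a stand-alone parallel program
  !   program mysolver
  !     call MPI_Init(ierr)
  !     ... computations on MPI_COMM_WORLD ...
  !     call MPI_Finalize(ierr)
  !   end program mysolver

  ! After: the same code turned into a PALM unit
  subroutine mysolver()
    use mpi
    implicit none
    integer :: ierr, rank
    integer :: PL_COMM_EXEC        ! in a real unit this comes from the PALM headers;
                                   ! declared here only to keep the sketch self-contained
    real(8) :: field(100)
    ! MPI_Init / MPI_Finalize are skipped: the PALM driver owns MPI
    call MPI_Comm_rank(PL_COMM_EXEC, rank, ierr)   ! PL_COMM_EXEC replaces MPI_COMM_WORLD
    field = 0.d0                                   ! ... computations would go here ...
    ! call PALM_Put(...)                           ! send "field" to the coupler
    ! on failure, call PALM_Abort instead of STOP / EXIT
  end subroutine mysolver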

Open-PALM
Algebra toolbox:
- Pre-defined units interfacing common mathematical libraries (BLAS, LAPACK, ...), applied on the fly to the objects exchanged between units (useful for unit conversions, ...)
- The unit is loaded from the Algebra Toolbox; its execution and communications are managed as if it were a user unit
Page 30

Open-PALM
Monitoring tools:
- Run-time monitoring: PrePALM makes it possible to monitor the progress of the computation and the resource usage during the execution of an application
- Performance analysis:
  - CPU and elapsed time statistics for the user-defined units
  - post-mortem replay of the algorithm progress in the graphical interface
- Debugging tools:
  - a user-defined debugging function, called on the PALM_Put and PALM_Get sides of an active communication, to check the coherency of the objects, track the execution and compute the global performance of the parallel execution
  - a PALM output file per branch, with different verbosity levels set by the user for different classes of messages
Page 31

Open-PALM Page 32

Open-PALM Page 33

Open-PALM
CWIPI: a library for parallel data exchange with interpolation
- Open source (LGPL), developed at ONERA since 2009
- Manages the coupling between n parallel codes
- Direct exchange of interpolated fields between a source and a target mesh (non-coincident and partitioned differently over several processes)
- CWIPI primitives are interfaced for C/C++, Fortran and Python codes
Types of meshes:
- 1D: edges
- 2D: triangles, quadrangles, polygons
- 3D: tetrahedra, pyramids, prisms, hexahedra, polyhedra
The source and target mesh types can be the same or different; the primitives are called inside the codes
Page 34

Open-PALM
PALM is a piece of software for couplings and for complex applications built from sub-modules, while preserving the performance of the codes
- Friendly and simple GUI to describe, implement and supervise applications
- Main benefits:
  - facilitates the evolution and maintenance of complex applications and of their components,
  - easy integration or replacement of codes in existing applications (multiphysics coupling, collaborative development, ...)
  - maximises the use of the intrinsic parallelism of applications and algorithms
Regular training sessions at CERFACS (or on site)
Page 35

OASIS3-MCT & Open-PALM: 2 open source code couplers https://verc.enes.org/oasis/ http://www.cerfacs.fr/globc/palm_web/ Anthony Thevenin (thevenin@cerfacs.fr)