PARIS*: Programming parallel and distributed systems for large scale numerical simulation applications. Christine Morin IRISA/INRIA

PARIS*: Programming parallel and distributed systems for large scale numerical simulation applications
Kerrighed, Vigne
Christine Morin, IRISA/INRIA
* Common project with CNRS, ENS-Cachan, INRIA, INSA, Université de Rennes 1

Members of PARIS Project (Sept. 2005)
Scientific leader: T. Priol (DR INRIA)
Researchers: F. André (Prof IFSIC), G. Antoniu (CR INRIA), J-P. Banâtre (Prof IFSIC), M. Bertier (MdC INSA), L. Bougé (Prof ENS), Y. Jégou (CR INRIA), A-M. Kermarrec (DR INRIA), C. Morin (DR INRIA), J.L. Pazat (MdC INSA), C. Perez (CR INRIA)
Post-docs: A. Viana, A. Ribes, G. Vallée
Engineers: D. Margery (IR INRIA), P. Morillon (IE IFSIC)
Engineers: P. Gallard (DGA), V. Lefèvre (G5K - IFSIC), R. Lottiaux (DGA), G. Mornet (G5K - PRIR), P. Palosaari (CoreGRID), J. Parpaillon (Ing. Associé)
PhD candidates: H-L. Bouziane (INRIA 2), J. Buisson (MENRT 3), Y. Busnel (ENS 1), L. Cudennec (INRIA-Région 1), M. Fertré (MENRT 1), M. Jan (MENRT 3), E. Jeanvoine (CIFRE 2), S. Lacour (INRIA 3), E. Le Merrer (CIFRE FT), S. Monnet (INRIA-Région 3), Y. Radenac (MENRT 3), E. Riviere (MENRT 2), L. Rilling (ENS 3)

Studied Systems
Clusters
  A set of interconnected PCs used as a single computing resource
Grid
  A set of resources (processor, memory, disk, ...) interconnected via the Internet
P2P systems
  A dynamic distributed system without any global state

Research Directions
Single system image operating system
  Problem: clusters are difficult to program and use
  Challenge: giving the illusion that a cluster is a single machine
Component-based middleware
  Problem: code coupling applications are complex
  Challenge: how to facilitate the design of such applications while providing high performance?
Advanced programming models
  Problem: current programming models are not adequate for highly dynamic systems
  Challenge: how to express computing and coordination in such an environment?
Data sharing service
  Problem: data sharing in large scale grids
  Challenge: sharing mutable data
P2P systems
  Problem: mastering and optimizing a P2P system
  Challenge: characterizing a P2P system and searching for relevant information
Experimental grid platform
  Problem: need to experiment to validate our research results
  Challenge: building a reconfigurable grid

Grid 5000 Experimental Platform
Contribution to the construction of Grid 5000
  9 sites, 5000 processors
Rennes site
  500 processors (PowerPC, Xeon, Opteron)
  Dual processor nodes
Participants
  Researcher: Y. Jégou
  Engineers: V. Lefèvre, D. Margery, P. Morillon, G. Mornet
[Figure: Grid 5000 sites with per-site processor counts (500, 500, 1000, 500, 500, 500, 500, 500, 500); label: Grid-5000 Rennes]

International Collaborations (funded)
Cluster OS
  University of Ulm (Germany), ORNL (USA), Rutgers University (USA)
Grids
  Pisa University (Italy), SNU (Korea)
Large scale data management
  UIUC (USA)
P2P systems
  Vrije Universiteit (The Netherlands)

OS for Clusters and Grids
Kerrighed
  Single System Image (SSI) operating system for high performance computing on clusters
Vigne
  Operating system to ease the use and programming of grids

Kerrighed
Objectives
  Virtual shared memory multiprocessor
  Global and transparent resource management
  Tolerating node failures transparently for the applications
  High performance
Approach
  Design of distributed OS mechanisms within an existing OS (Linux)

Kerrighed Achievements
Customizable, efficient, full SSI operating system for high performance computing on clusters
  Small clusters (up to 256 nodes)
Advanced research prototype
  Integration of the work of 3 Ph.D. students (R. Lottiaux (2001), Geoffroy Vallée (2004), P. Gallard (2004))
  Robust prototype able to execute real applications provided by EDF R&D and DGA
Open source software: www.kerrighed.org
  Stable version (K V1.0.2) based on Linux 2.4.29
  Demo LiveCD based on Knoppix
  kerrighed.users@irisa.fr
  Integrated in OSCAR: ssi-oscar.irisa.fr
OSCAR is a snapshot of methods for building, programming, and using clusters. It consists of a fully integrated and easy to install software bundle designed for high performance cluster computing.

Efficient Operating System
Comparison with other SSI systems for clusters
  OpenSSI, openMosix
  Results published in CC-GRID 2005
  Internship of Benoît Boissinot
Efficient communication system
  Highly reactive communication system to support Kerrighed distributed operating system services
  Compatibility of Kerrighed with efficient communication drivers used by HPC applications (such as GM for Myrinet)

Collaborations
EDF R&D (since 2000)
  PhD and post-doc grants
DGA (2003-2005)
  Funding for research engineers
ORNL (S. Scott)
  Integration of Kerrighed in OSCAR
University of Ulm (M. Schöttner)
  Fault tolerant SDSM
Rutgers University (L. Iftode)
  High availability
Invited researchers
  R. Badrinath (IIT Kharagpur)
  Isaac Scherson (UCI)

Current Research Directions
Fault tolerance
  Large scale parallel application checkpointing
  System initiated checkpoints
  Checkpointing grid applications
  Master & PhD Thesis of Matthieu Fertré
High availability
  Current work of Pascal Gallard and Renaud Lottiaux
  Tolerating hot node addition and eviction
Phenix
  Investigating the backdoor approach in the context of Kerrighed
  Master Thesis of Benoît Boissinot
[Figure labels: Application, SSI cluster OS, Node 1, Node 2, Node 3]

Technology Transfer
KerLabs (http://www.kerlabs.com)
  Start-up founded by Pascal Gallard and Renaud Lottiaux
Software suite based on Kerrighed technologies
  EasyAdmin: global cluster management
  EasyCheckpoint: checkpoint/restart of parallel applications
  EasyRun: application deployment & scheduling on clusters
  EasyCluster: the whole Kerrighed SSI solution
Optimized support for high performance networking technologies
Open source model

Vigne: a Grid OS
Design and implementation of a Grid OS to ease the use and programming of very large grids
Highly decentralized system
  Algorithms based on local knowledge
Self-healing system
  Dealing with multiple quasi-simultaneous reconfigurations
Single System Image
Flexible system

Vigne
Infrastructure based on decentralized overlays
  Structured and unstructured overlays
Application manager for reliable application execution
Resource discovery & allocation service
  On-going work (PhD Thesis of Emmanuel Jeanvoine)
Volatile data sharing service
  PhD Thesis of Louis Rilling
Complex application deployment
  PhD Thesis of Boris Daix (co-advised with Christian Pérez)
  To start at the beginning of 2006

Collaborations
EDF R&D
  PhD grants

Future Work
XtreemOS Project
  Integrated Project (FP6 - Call 5)
  Under evaluation
Goal
  Building and Promoting a Linux-based Operating System to Support Virtual Organisations for Next Generation Grids
18 partners
  Academic & industrial partners
  8 countries (including China)

XtreemOS Main Objectives
We will design, implement, evaluate and distribute an open source Grid OS with native support for virtual organizations (VOs).
Development of a Grid Operating System
  Enhance Linux to support VOs across multiple administrative domains
  Manage very large and
  Self-organizing and self-healing system
  Available on PC, SMP, clusters, PDA and mobile phones
XtreemOS software: 3 flavours
  Standard flavour for PC
  Federation flavour based on Kerrighed
  SD flavour for small devices
XtreemOS software will make VO management easy for administrators, and make work within VOs easy, secure and efficient.
Experimentation and evaluation with a comprehensive set of real use cases provided by ISVs and end users
Integration in well-known Linux distributions
  Mandriva, Red Flag Linux
Building a reference open source Grid OS
[Figure labels: Application, Appli (x3), Middleware, XtreemOS, Linux Computer (x4)]

Talks
Kerrighed
  Pascal Gallard
  Matthieu Fertré
Vigne
  Emmanuel Jeanvoine
JuxMem
  Sébastien Monnet