
Programming MareNostrum III
David Vicente, Head of User Support, BSC
www.bsc.es

Agenda WEDNESDAY 17-04-13
9:00  Introduction to BSC, PRACE PATC and this training
9:30  New MareNostrum III: the view from the System Administration group (Javier Bartolome)
10:30 Coffee Break
11:00 Visualization at BSC
11:30 How to use MareNostrum3, Part 1
12:15 Hands-on I (Carlos Tripiana, Christian Simarro)
13:00 Lunch (not hosted)
14:30 Tuning applications I: how to put your program up and running
15:15 Hands-on II
16:00 Tuning your application! (David Vicente)
17:00 Visit MN3 (David Vicente)

Agenda THURSDAY 18-04-13
9:00  Introduction to RES and PRACE infrastructures (Jorge Rodriguez)
9:30  How can I get resources from you (PRACE and RES)? (Jorge Rodriguez)
10:45 Coffee Break
11:15 Tuning applications II: BSC performance tools (Extrae and Paraver)
11:45 Hands-on III
12:15 Can we help you with your porting? How? When? (Pablo Rodenas, Christian Simarro)
13:30 End of the course

Who are we? Centro Nacional de Supercomputación (www.bsc.es) and PRACE, the Partnership for Advanced Computing in Europe (www.prace-ri.eu).

Centro Nacional de Supercomputación www.bsc.es

BSC-CNS Objectives:
- Operate the national supercomputing facility
- R&D in supercomputing
- Collaborate in R&D e-science
Public consortium: Spanish Government 51%, Catalonian Government 37%, Technical University of Catalonia 12%

Organization structure at BSC-CNS: BSC brings together service and research within a single organizational structure.

Life Science Department
- Atomic (and electronic) modeling of protein biochemistry and biophysics
- Micro- and mesoscopic modeling of macromolecules
- Drug design
- Identification of the structural bases of protein-protein interaction
- Protein-protein interaction networks
- Systems biology
- Web services, applications, databases
- Analysis of genomes and networks to model diseases, systems and the evolution of organisms

CASE Department: Computational Fluid Dynamics, Geophysics, ITER plasma physics, Bio-mechanics, Ab-initio Molecular Dynamics

Earth Science Department: Air Quality, Mineral Dust, Climate Change, a global model for Mineral Dust, Technology Transfer

Computer Science Department
- Benchmarking, analysis and prediction tools: tracing scalability; pattern and structure identification; visualization and analysis; processor, memory, network and system
- Computer architecture: superscalar and VLIW; hardware multithreading; design-space exploration for multicore chips and Hw accelerators; transactional memory (Hw, Hw-assisted); SIMD and vector extensions/units; embedded architectures; future exaflop systems
- Programming models: scalability of MPI and UPC; OpenMP for multicore, SMP and cc-NUMA; DSM for clusters; CellSs, streaming; transactional memory; chip programming models
- Grid and cluster computing: GRID; large cluster systems; small DMM cc-numa; resource management; I/O for Grid
- Operating environments: on-board SMP; autonomic application servers; resource management for heterogeneous workloads; coordinated scheduling and resource management; parallel file system scalability

Operations team: MareNostrum is managed by the Operations team, which takes care of its availability, security and performance. An important task of this team is to support scientists in the usage of MareNostrum, as well as to help them improve their applications and obtain better research results.
System administration area: covers MareNostrum's pure system administration, security, resource management, networking and helpdesk.
User support area: covers direct user support, with knowledge of programming models, libraries, tools, applications, etc.

What does HPC Support do? The main objectives of the HPC Support group are:
- Solve the requests of researchers using the BSC HPC resources
- Install and debug applications
- Enable and port codes to the MareNostrum architecture
- Assist users in the efficient use of supercomputing resources: optimization and scalability studies, parallelization assistance, benchmarking
- Manage accounting information and user accounts

Benchmark Suite. The codes currently used in the parallel BSC benchmark suite are:
- Molecular dynamics: CPMD, GROMACS, AMBER
- DNS codes: LISO
- Astrophysical simulations: GADGET-2
- Weather forecast simulations: WRF
- Others: HPCC
The parameters used in the benchmark study are: elapsed time, CPU time, MFlops per process, MFlops for the total parallel execution, total instructions per process, and total instructions per job.
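As an illustration of how the timing side of these per-process metrics can be gathered, the following is a minimal sketch, not the actual BSC benchmark harness; run_benchmark_kernel is a hypothetical stand-in for one of the codes above, and the MFlop and instruction counts would normally come from hardware counters (for example via PAPI), which are omitted here.

/* Sketch: per-process elapsed and CPU time for an MPI benchmark run. */
#include <mpi.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the real benchmark kernel (CPMD, GROMACS, ...). */
static void run_benchmark_kernel(void)
{
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; ++i)
        x += 1e-9 * (double)i;
}

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Barrier(MPI_COMM_WORLD);          /* common starting point */
    double t0 = MPI_Wtime();
    clock_t c0 = clock();

    run_benchmark_kernel();

    double elapsed = MPI_Wtime() - t0;                     /* wall-clock time */
    double cpu = (double)(clock() - c0) / CLOCKS_PER_SEC;  /* CPU time        */

    /* Elapsed time of the whole parallel job = slowest process. */
    double max_elapsed;
    MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    printf("rank %d of %d: elapsed %.3f s, CPU %.3f s\n", rank, nprocs, elapsed, cpu);
    if (rank == 0)
        printf("job elapsed time (max over ranks): %.3f s\n", max_elapsed);

    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with the site's MPI launcher, each rank reports its own elapsed and CPU time, and rank 0 reports the job elapsed time as the maximum over ranks; the per-process counter data would then be combined with these timings.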

Datasets. Widely used application: NAMD was one of the most CPU-time-consuming applications on MareNostrum in the last period. As molecular dynamics and quantum calculation software, it addresses a field which is currently among the most computationally demanding in terms of compute load, communication speed and memory load.
MareNostrum's NAMD dataset is a realistic simulation: the system consists of three different proteins interacting with a cell membrane, all surrounded by water molecules, in three different problem sizes (small, medium and large).

BSC new building, BSC as DATA CENTER, MinoTauro, SHM

BSC/HPC as Data Centers. Storage infrastructure behind the BSC Force10 E1200i 10G switch:
- Data BB1: Read 5.7 GB/s, Write 4.7 GB/s
- Data BB2: Read 5.7 GB/s, Write 4.7 GB/s
- Data BB3: Read 5.7 GB/s, Write 4.7 GB/s
- MetaData
Aggregate over the three data building blocks: Read 17.1 GB/s, Write 14.1 GB/s.

CNAG, Centro Nacional de Análisis Genómico (National Centre for Genomic Analysis). BSC provides HPC and data IT services to CNAG.
Next-generation sequencing: rapid sequencing of whole individuals and detailed studies of cellular processes.
- Raw data: 1-2 TB/run, 2 runs/week, 10 machines
- Image processing to generate sequence data
- Sequence analysis: alignment and clustering
- Aligned results: 250-500 GB/run

MinoTauro: the GPU machine
- 128 compute nodes, each with 2 Intel chips, 2 NVIDIA M2090 GPUs and a 250 GB SSD
- Most power-efficient system in Europe; highest-performing system in Spain
- 15 Tflops peak in x86_64, 167 Tflops peak in GPUs
- 2 login nodes, 2 admin servers
- Networks: administration, file system (10GE), IB-QDR non-blocking

Altix: large shared memory for specific requirements
The SGI Altix 4700 is a shared-memory machine with a cc-NUMA (Cache-Coherent Non-Uniform Memory Access) architecture. Its hardware configuration is:
- 64 dual-core Montecito CPUs (IA-64 at 1.6 GHz), 8 MB L3 cache and 533 MHz bus
- 1.5 TB RAM (shared by the 128 cores)
- Peak performance: 819.2 Gflops
- 2 internal SAS disks of 146 GB at 15,000 RPM
- 12 external SAS disks of 300 GB at 10,000 RPM
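As a cross-check, assuming the usual 4 floating-point operations per cycle per Itanium 2 Montecito core (two FMA units), the quoted peak follows directly from the core count and clock:

$128\ \text{cores} \times 1.6\ \text{GHz} \times 4\ \text{flops/cycle} = 819.2\ \text{Gflops}$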

Nord 2: MN2 is still alive!!
BSC Nord is a cluster of 256 JS21 blades with the following configuration:
- 4 PowerPC 970MP CPUs at 2.3 GHz per blade
- 8 GB RAM per blade
- Peak performance: 9.42 Tflops
- Myrinet and Gigabit interconnection networks
- SLES 10 SP1 operating system
- GPFS 3.5 shared filesystems: /gpfs/projects 612 TB, /gpfs/scratch 1.1 PB, /gpfs/home 59 TB, /gpfs/apps 30 TB

Tibidabo: green computing, the future? System overview:
- Tegra2 SoC: 2x ARM Cortex-A9 cores; 2 GFLOPS @ 0.5 W
- Tegra2 Q7 module: 1x Tegra2 SoC (2x ARM Cortex-A9 cores), 1 GB DDR2 DRAM, 1 GbE interconnect; 2 GFLOPS @ ~4 W
- 1U multi-board container: 8x Q7 carrier boards (8x Tegra2 SoC, 16x ARM Cortex-A9 cores), 8 GB DDR2 DRAM; 16 GFLOPS @ ~35 W
- Tibidabo rack: 32x board containers, 10x 48-port 1GbE switches, 256x Q7 carrier boards (256x Tegra2 SoC, 512x ARM Cortex-A9 cores), 256 GB DDR2 DRAM; 512 GFLOPS @ ~1.7 kW, i.e. 300 MFLOPS/W
- Entire prototype: 2 racks; 1 TFLOPS @ ~3.4 kW
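The quoted efficiency is simply the rack-level peak divided by its power draw, and the prototype figure is two such racks:

$\frac{512\ \text{GFLOPS}}{1.7\ \text{kW}} \approx 300\ \text{MFLOPS/W}, \qquad 2 \times 512\ \text{GFLOPS} \approx 1\ \text{TFLOPS at} \sim 3.4\ \text{kW}$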

PRACE, Partnership for Advanced Computing in Europe WWW.prace-ri.eu

ESFRI: European Infrastructure Roadmap
- The high-end (capability) resources should be renewed every 2-3 years in a spiral renewal process
- The total cost of a Tier-0 centre over a 5-year period shall be in the range of 200-400 M€
- Supporting actions in the national/regional centres maintain the transfer of knowledge and feed projects into the top capability layer
- Pyramid of tiers: Tier-0, Tier-1, Tier-2

PATC: PRACE Advanced Training Centres
The mission of the PRACE Advanced Training Centres (PATCs) is to carry out and coordinate training and education activities that enable the European research community to utilise the computational infrastructure available through PRACE. The long-term vision is that such centres will become the hubs and key drivers of European high-performance computing education. Six PATCs have been created:
- Barcelona Supercomputing Center (Spain)
- CINECA - Consorzio Interuniversitario (Italy)
- CSC - IT Center for Science Ltd (Finland)
- EPCC at the University of Edinburgh (UK)
- Gauss Centre for Supercomputing (Germany)
- Maison de la Simulation (France)

PATC: Next Activities (PATC@BSC until the end of June 2013)
- Programming MareNostrum III: 17-18 Apr 2013
- Performance Analysis and Tools: 13 May 2013
- Heterogeneous Programming on GPUs with MPI + OmpSs: 15 May 2013
- Programming ARM-based prototypes: 17 May 2013
- Introduction to CUDA Programming: 3 Jun 2013

Thanks!