Programming MareNostrum III
David Vicente, Head of User Support, BSC
www.bsc.es
Agenda WEDNESDAY - 17-04-13
9:00  Introduction to BSC, PRACE PATC and this training
9:30  The new MareNostrum III: the view from the System Administration group (Javier Bartolome)
10:30 Coffee Break
11:00 Visualization at BSC
11:30 How to use MareNostrum3, Part 1 (Carlos Tripiana, Christian Simarro)
12:15 Hands-on I
13:00 Lunch (not hosted)
14:30 Tuning applications I: how to get your program up and running
15:15 Hands-on II
16:00 Tuning your application! (David Vicente)
17:00 Visit to MN3 (David Vicente)
Agenda THURSDAY - 18-04-13
9:00  Introduction to RES and PRACE Infrastructures (Jorge Rodriguez)
9:30  How can I get resources from you (PRACE and RES)? (Jorge Rodriguez)
10:45 Coffee Break
11:15 Tuning applications II: BSC performance tools, Extrae and Paraver (Pablo Rodenas, Christian Simarro)
11:45 Hands-on III
12:15 Can we help you with your porting? How? When?
13:30 End of the course
Who are we?
Centro Nacional de Supercomputación, www.bsc.es
PRACE, Partnership for Advanced Computing in Europe, www.prace-ri.eu
Centro Nacional de Supercomputación www.bsc.es
BSC-CNS Objectives
- Operate the national supercomputing facility
- R&D in supercomputing
- Collaborate in e-science R&D
Public consortium:
- Spanish Government: 51%
- Catalan Government: 37%
- Technical University of Catalonia: 12%
Organization structure at BSC-CNS: BSC brings together service and research within a single organizational structure.
Life Science Department
- Atomic (and electronic) modeling of protein biochemistry and biophysics
- Micro- and mesoscopic modeling of macromolecules
- Drug design
- Identification of the structural bases of protein-protein interactions
- Protein-protein interaction networks
- Systems biology
- Web services, applications, databases
- Analysis of genomes and networks to model diseases, systems and the evolution of organisms
CASE Department
- Computational Fluid Dynamics
- Geophysics
- ITER: plasma physics
- Bio-mechanics
- Ab-initio Molecular Dynamics
Earth Science Department
- Air quality
- Mineral dust (global model for mineral dust)
- Climate change
- Technology transfer
Computer Science Department
- Benchmarking, analysis and prediction tools: tracing scalability; pattern and structure identification; visualization and analysis; processor, memory, network, system; GRID
- Computer architecture: superscalar and VLIW; hardware multithreading; design-space exploration for multicore chips and HW accelerators; transactional memory (HW, HW-assisted); SIMD and vector extensions/units; embedded architectures; future exaflop systems
- Programming models: scalability of MPI and UPC; OpenMP for multicore, SMP and cc-NUMA; DSM for clusters; CellSs, streaming; transactional memory; targets from chip and on-board SMP to small DMM, cc-NUMA and large cluster systems
- Grid and cluster computing: programming models; resource management; I/O for Grid; autonomic application servers
- Operating environments: resource management for heterogeneous workloads; coordinated scheduling and resource management; parallel file system scalability
Operations team
MareNostrum is managed by the Operations team, which takes care of its availability, security and performance. An important task of this team is to support scientists in the use of MareNostrum and to help them improve their applications so they obtain better research results.
- System administration area: MareNostrum's pure system administration, security, resource management, networking and helpdesk.
- User support area: direct user support, with knowledge of programming models, libraries, tools, applications, etc.
What does HPC Support do?
The main objectives of the HPC Support group are:
- Resolve the requests of researchers using the BSC HPC resources
- Install and debug applications
- Enable and port codes to the MareNostrum architecture
- Assist users in the efficient use of supercomputing resources: optimization and scalability studies, parallelization assistance, benchmarking
- Manage accounting information and user accounts
Benchmark Suite
The codes currently used in the parallel BSC benchmark suite are:
- Molecular dynamics: CPMD, GROMACS, AMBER
- DNS codes: LISO
- Astrophysical simulations: GADGET-2
- Weather forecast simulations: WRF
- Others: HPCC
The parameters measured in the benchmark study are:
- Elapsed time
- CPU time
- MFlops per process
- MFlops for the total parallel execution
- Total instructions per process
- Total instructions per job
Datasets
- Widely used application: NAMD was one of the most CPU-time-consuming applications on MareNostrum in the last period. As molecular dynamics and quantum calculation software, it addresses a field that is currently among the most computationally demanding in terms of compute load, communication speed and memory load.
- MareNostrum's NAMD dataset: a realistic simulation in which three different proteins interact with a cell membrane, all surrounded by water molecules.
- Three different problem sizes: small, medium and large.
BSC new building, BSC as DATA CENTER, MinoTauro, SHM
BSC/HPC as Data Centers
- Core network: Force10 E1200i 10G switch
- Data building blocks BB1, BB2 and BB3: 5.7 GB/s read, 4.7 GB/s write each
- Aggregate: 17.1 GB/s read, 14.1 GB/s write
- Plus a dedicated metadata building block
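The aggregate figures are simply the sum over the three identical data building blocks; a quick sanity check (a Python sketch, using only the per-building-block numbers from the slide):

```python
# Aggregate GPFS storage bandwidth across the data building blocks.
# Per-building-block figures are the ones quoted on the slide.
read_per_bb_gbs = 5.7
write_per_bb_gbs = 4.7
n_building_blocks = 3

aggregate_read = n_building_blocks * read_per_bb_gbs    # 17.1 GB/s
aggregate_write = n_building_blocks * write_per_bb_gbs  # 14.1 GB/s
print(f"{aggregate_read:.1f} GB/s read, {aggregate_write:.1f} GB/s write")
```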
CNAG, Centro Nacional de Análisis Genómico (National Centre for Genomic Analysis)
- BSC provides HPC and data IT services to CNAG
- Next-generation sequencing: rapid sequencing of whole individuals, detailed studies of cellular processes
- Raw data: 1-2 TB/run, 2 runs/week, 10 machines
- Image processing to generate sequence data
- Sequence analysis, alignment and clustering: aligned results of 250-500 GB/run
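The slide's raw-data figures imply a substantial weekly ingest; a back-of-the-envelope estimate, assuming the "2 runs/week" figure is per machine (the slide does not say explicitly):

```python
# Weekly raw-data volume for CNAG sequencing, from the slide's figures.
# Assumption (not stated on the slide): 2 runs/week applies per machine.
machines = 10
runs_per_week_per_machine = 2
tb_per_run_low, tb_per_run_high = 1, 2

weekly_low = machines * runs_per_week_per_machine * tb_per_run_low    # 20 TB/week
weekly_high = machines * runs_per_week_per_machine * tb_per_run_high  # 40 TB/week
print(f"{weekly_low}-{weekly_high} TB of raw data per week")
```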
MinoTauro: the GPU machine
- 128 compute nodes, each with 2 Intel chips, 2 NVIDIA M2090 GPUs and one 250 GB SSD
- Most power-efficient system in Europe; highest-performing system in Spain
- 15 Tflops peak in x86_64; 167 Tflops peak in GPUs
- 2 login and 2 admin servers
- Networks: administration; file system, 10GbE; IB-QDR non-blocking
Altix: large shared memory for specific requirements
The SGI Altix 4700 is a shared-memory machine with a cc-NUMA (Cache-Coherent Non-Uniform Memory Access) architecture. Its hardware configuration is:
- 64 dual-core Montecito CPUs (IA-64 at 1.6 GHz), with 8 MB L3 cache and a 533 MHz bus
- 1.5 TB RAM (shared by the 128 cores)
- Peak performance: 819.2 Gflops
- 2 internal SAS disks of 146 GB at 15000 RPM
- 12 external SAS disks of 300 GB at 10000 RPM
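The peak figure follows from cores x clock x FLOPs per cycle. The sketch below assumes 4 double-precision FLOPs per cycle per core (the Itanium 2 core's two fused multiply-add units), which reproduces the 819.2 Gflops quoted above:

```python
# Peak-performance sanity check for the SGI Altix 4700.
# Core count and clock come from the slide; 4 FLOPs/cycle/core is an
# assumption based on the Itanium 2's two FMA units.
cores = 64 * 2       # 64 dual-core Montecito sockets
clock_ghz = 1.6
flops_per_cycle = 4  # 2 fused multiply-adds per cycle per core

peak_gflops = cores * clock_ghz * flops_per_cycle
print(f"{peak_gflops:.1f} Gflops peak")  # matches the slide's 819.2 Gflops
```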
Nord 2: MN2 is still alive!!
BSC Nord is a cluster of 256 JS21 blades with the following configuration:
- 4 PowerPC 970MP CPUs at 2.3 GHz per blade
- 8 GB RAM per blade
- Peak performance: 9.42 Tflops
- Myrinet and Gigabit interconnection networks
- SLES 10 SP1 operating system
- GPFS 3.5 shared filesystems: /gpfs/projects: 612 TB; /gpfs/scratch: 1.1 PB; /gpfs/home: 59 TB; /gpfs/apps: 30 TB
Tibidabo: green computing, the future?
System overview:
- Tegra2 SoC: 2x ARM Cortex-A9 cores; 2 GFLOPS @ 0.5 W
- Tegra2 Q7 module: 1x Tegra2 SoC (2x Cortex-A9 cores), 1 GB DDR2 DRAM, 1 GbE interconnect; 2 GFLOPS @ ~4 W
- 1U multi-board container: 8x Q7 carrier boards (8x Tegra2 SoC, 16x Cortex-A9 cores), 8 GB DDR2 DRAM; 16 GFLOPS @ ~35 W
- Tibidabo rack: 32x board containers, 10x 48-port 1GbE switches, 256x Q7 carrier boards (256x Tegra2 SoC, 512x Cortex-A9 cores), 256 GB DDR2 DRAM; 512 GFLOPS @ ~1.7 kW, i.e. ~300 MFLOPS/W
- The entire prototype contains 2 racks: 1 TFLOPS @ ~3.4 kW
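The headline efficiency number is just peak performance divided by power draw; a one-line check using the rack figures from the slide:

```python
# Energy efficiency of one Tibidabo rack (figures from the slide).
rack_gflops = 512
rack_watts = 1700  # ~1.7 kW per rack

mflops_per_watt = rack_gflops * 1000 / rack_watts
print(f"{mflops_per_watt:.0f} MFLOPS/W")  # ~301, i.e. the ~300 MFLOPS/W quoted
```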
PRACE, Partnership for Advanced Computing in Europe www.prace-ri.eu
ESFRI: European Infrastructure Roadmap
- The high-end (capability) resources should be renewed every 2-3 years in a renewal spiral process
- The total cost of a Tier-0 centre over a 5-year period shall be in the range of 200-400 M€
- With supporting actions in the national/regional centres to maintain the transfer of knowledge and feed projects to the top capability layer
- Pyramid: Tier-0, Tier-1, Tier-2
PATC: PRACE Advanced Training Centres
The mission of the PRACE Advanced Training Centres (PATCs) is to carry out and coordinate training and education activities that enable the European research community to utilise the computational infrastructure available through PRACE. The long-term vision is that such centres will become the hubs and key drivers of European high-performance computing education.
Six PATCs have been created:
- Barcelona Supercomputing Center (Spain)
- CINECA - Consorzio Interuniversitario (Italy)
- CSC - IT Center for Science Ltd (Finland)
- EPCC at the University of Edinburgh (UK)
- Gauss Centre for Supercomputing (Germany)
- Maison de la Simulation (France)
PATC: Next Activities
Activities at PATC@BSC until the end of June 2013:
- Programming MareNostrum III: 17-18 Apr 2013
- Performance Analysis and Tools: 13 May 2013
- Heterogeneous Programming on GPUs with MPI + OmpSs: 15 May 2013
- Programming ARM based prototypes: 17 May 2013
- Introduction to CUDA Programming: 3 Jun 2013
Thanks!