Red Española de Supercomputación Zaragoza


1 Red Española de Supercomputación (Spanish Supercomputing Network), Zaragoza. Zaragoza, 1 July 2010. Mateo Valero, Director

2 Top10 2

3 Looking at the Gordon Bell Prize
1 GFlop/s; 1988; Cray Y-MP; 8 processors. Static finite element analysis.
1 TFlop/s; 1998; Cray T3E; 1024 processors. Modeling of metallic magnet atoms, using a variation of the locally self-consistent multiple scattering method.
1 PFlop/s; 2008; Cray XT5; 1.5x10^5 processors. Superconductive materials.
1 EFlop/s; ~2018; ?; 1x10^7 processors (10^9 threads).
Source: Jack Dongarra
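The milestones on this slide are each a factor of 1000 and ten years apart. A minimal sketch of the yearly growth this implies, assuming smooth exponential growth between milestones (the function and variable names are illustrative, not from the talk):

```python
# Gordon Bell milestones quoted on the slide: (year, Flop/s).
milestones = [(1988, 1e9), (1998, 1e12), (2008, 1e15), (2018, 1e18)]

def annual_growth(y0, f0, y1, f1):
    """Average yearly performance multiplier between two milestones."""
    return (f1 / f0) ** (1.0 / (y1 - y0))

# One growth factor per decade (pairs of consecutive milestones).
growth = [annual_growth(*a, *b) for a, b in zip(milestones, milestones[1:])]

for (y0, _), (y1, _), g in zip(milestones, milestones[1:], growth):
    print(f"{y0}-{y1}: x{g:.3f} per year")
```

Every decade gives the same multiplier, about 1.995, which is roughly a doubling of performance every year.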

4 Cores in the Top25 Over Last 10 Years 4

5 Exponential growth in parallelism for the foreseeable future 5

6 Increasing chip performance: Intel's Petaflop chip
80 processors in a die of 300 square mm. Terabytes per second of memory bandwidth.
Note: the barrier of the Teraflops was obtained by Intel in 1991 using Pentium Pro processors contained in more than 85 cabinets occupying 200 square meters.
This will be possible in 3 years from now.
Thanks to Intel. ICPP-2009, September 23rd

7 Intel/UPC Since 2002 (Roger Espasa, Toni Juan) 40 People Microprocessor Development (Larrabee x86 many core) 7

8 NVIDIA Fermi Architecture
16 Streaming Multiprocessors (512 cores) execute Thread Blocks. 620 Gigaflops.
Wide DRAM interface provides 12 GB/s bandwidth.
Unified 768 KB L2 cache serves all threads.
GigaThread hardware scheduler assigns Thread Blocks to SMs.

9 Cell Broadband Engine TM: A Heterogeneous Multi-core Architecture * Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. 9

10 Hybrid SMP-cluster parallel systems
Interconnect (Myrinet, IB, Ge, 3D torus, tree, ...) linking the nodes.
Node: SMP with memory, IN and several multicore chips.
Chip types: homogeneous multicore (e.g. Larrabee); heterogeneous multicore, with general-purpose cores plus an accelerator (e.g. Cell); GPU; FPGA; ASIC (e.g. Anton for MD).
Network-on-chip (bus, ring, direct, ...).

11 HPC hierarchy in current Top 10 Based on June 2010 list 11

12 Evolution towards Exaflop supercomputers
[Table: cores, nodes, chips/node, cores/chip, ops/core, GHz and GFlops for: current #1 (6/2010); current #2 (Fermi) (6/2010); current #3 (Cell) (6/2010); all in Top500 (6/2010); personal supercomputer (CPU); personal supercomputer (CPU/GPU); Sequoia LANL (announced); possible exaflop? Values lost in extraction.]
Sequoia: 1.6 PB memory; 96 racks; 98,304 nodes; 1.6 M cores (1 GB/core); 50 PB Lustre file system; 6.0 MW power.
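The Sequoia figures quoted on this slide can be cross-checked against each other. The sketch below only does that arithmetic; it assumes decimal units (1 PB = 10^6 GB) and treats "1.6 M cores" as a rounded value, and the variable names are illustrative.

```python
# Figures quoted on the slide for Sequoia.
racks = 96
nodes = 98_304
cores = 1.6e6          # "1.6 M cores" (rounded on the slide)
gb_per_core = 1.0      # "1 GB/core"

nodes_per_rack = nodes / racks
cores_per_node = cores / nodes
memory_pb = cores * gb_per_core / 1e6   # decimal units: 1 PB = 1e6 GB

print(f"nodes per rack: {nodes_per_rack:.0f}")
print(f"cores per node: {cores_per_node:.1f}")
print(f"total memory:   {memory_pb:.1f} PB")
```

The node count divides into exactly 1024 nodes per rack, the rounded core count gives about 16 cores per node, and 1 GB per core reproduces the 1.6 PB of memory.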


14 Holistic approach: towards exaflop
Layers: Applications; Performance tools; Programming model; Load balancing; Interconnection; Processor/node architecture.
"Can you imagine how it would be if there was no distance, if everything was here?" "Yes, he can!"
Thanks to Jesus Labarta. New York, June 9th

15 The holistic approach: towards exaflop
Layers: Applications; Performance tools; Programming model; Load balancing; Interconnection; Processor/node architecture.
Also shown: Address spaces; Latency; Dependences.
"Can you imagine how we should think about exaflop? In a holistic way?" "Yes, he can!"
New York, June 10th, 2009

16 Holistic approach: towards exaflop. "YES... We can"
Applications: comput. complexity; async. algs.; moldability.
Job scheduling; load balancing; resource awareness; user satisfaction.
Programming model: concurrency extraction; locality optimization; work generation; dependencies; address space.
Run time.
Interconnection: topology and routing; external contention; NIC design.
Processor/node architecture: hw counters; run time support; memory subsystem; core structure.
Rotated labels on the figure: Malleability; Efficiency; Power; Overhead.
M. Valero, keynote at ICS, NY, June

17 Challenges: view from Jesús Labarta
Variability: everywhere, huge.
Efficiency: performance and power reasons; avoid overkills.
Memory: logical and physical structure; bandwidth and latency.
Resilience: impossible to run an app without errors happening halfway.
Constraint: power. Locality scheduling, minimize bandwidth.
Programmability: Don Grice: "we can do the hardware, but if it cannot be programmed ..." (approx.)
Programming model: machine independent; what, not how; smooth migration path.
Runtime/execution model: data access awareness; asynchrony/dataflow; automatic load balance; malleability.
Algorithms: asynchrony, overlap; minimize bandwidth.
Resilience: from recovery to tolerance.
Holistic: applications as co-design vehicles between system software layers and architecture.
Tools: fly with instruments; the importance of detail.

18 BSC-CNS and international initiatives: IESP
Improve the world's simulation and modeling capability by improving the coordination and development of the HPC software environment.
Build an international plan for developing the next generation of open source software for scientific high-performance computing.

19 Back to Babel?
Book of Genesis: "Now the whole earth had one language and the same words." "Come, let us make bricks, and burn them thoroughly." "Come, let us build ourselves a city, and a tower with its top in the heavens, and let us make a name for ourselves." And the LORD said, "Look, they are one people, and they have all one language; and this is only the beginning of what they will do; nothing that they propose to do will now be impossible for them. Come, let us go down, and confuse their language there, so that they will not understand one another's speech."
The computer age: Fortran & MPI, followed by Cilk++, Fortress, X10, CUDA, Sisal, HPF, RapidMind, StarSs, Sequoia, CAF, ALF, OpenMP, UPC, SDK, Chapel, MPI.

20 Different models of computation
The dream of automatically parallelizing compilers has not come true, so the programmer needs to express the opportunities for parallel execution in the application.
Models: SPMD; OpenMP 2.5 (nested fork-join); OpenMP 3.0; DAG data flow.
Huge lookahead and reuse. Latency/EBW/scheduling.
And asynchrony (MPI and OpenMP are too synchronous): collectives and barriers multiply the effects of microscopic load imbalance, OS noise, ...
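The claim that barriers multiply the effect of microscopic imbalance can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement: every process runs the same number of phases, each phase takes a fixed time plus a small uniform random delay standing in for load imbalance or OS noise, and the asynchronous case ignores all dependences between processes. The function name and all parameter values are hypothetical.

```python
import random

def simulate(procs=64, steps=200, work=1.0, noise=0.05, seed=0):
    """Return (synchronous, asynchronous) completion times in seconds."""
    rng = random.Random(seed)
    # t[s][p] is the time process p needs for phase s.
    t = [[work + rng.uniform(0.0, noise) for _ in range(procs)]
         for _ in range(steps)]
    # Barrier after every phase: each phase costs as much as its slowest process.
    synchronous = sum(max(phase) for phase in t)
    # No barriers (idealized): the run ends when the process with the
    # largest total time finishes.
    asynchronous = max(sum(phase[p] for phase in t) for p in range(procs))
    return synchronous, asynchronous

sync_time, async_time = simulate()
print(f"with barriers: {sync_time:.2f} s, without: {async_time:.2f} s")
```

With a barrier, every phase pays for the worst delay among all processes, so the delays add up across phases; without it, the delays of each process average out. The sum of per-phase maxima can never be smaller than the maximum of per-process sums, which is why the synchronous time is always the larger of the two.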

21 The holistic approach: towards exaflop
Layers: Applications; Performance tools; Programming model; Load balancing; Interconnection; Processor/node architecture.
Concerns spanning the layers: Latency; Bandwidth; Parallelism; Scheduling; Power.
Clean programming practices. Abstraction.
Mechanisms to inject probes. Merge nicely with node level. Asynchrony.
Asynchrony/dependences/dataflow; data access specification. Abstract/simple memory model.
Fine grain DLP. Basic transformations (unroll, ...).
Monitor/set tunable and scheduling variables. Detect data production to enable overlapped transfer.
Efficient support for basic mechanisms: data transfers, thread creation, dependence handling.

22 The TEXT project: Towards EXaflop applications
Demonstrate that hybrid MPI/SMPSs addresses the Exascale challenges in a productive and efficient way.
Deploy at supercomputing centers: Jülich, EPCC, HLRS, BSC.
Port applications (HLA, SPECFEM3D, PEPC, PSC, BEST, CPMD, LS1 MarDyn) and develop algorithms.
Develop additional environment capabilities: tools (debug, performance); improvements in runtime systems (load balance and GPUSs).
Support other users: identify users of TEXT applications; identify and support interested application developers.
Contribute to standards (OpenMP ARB, PERI-XML).

23 BSC-CNS
The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) was established on 1 April 2005 as the Spanish national laboratory in supercomputing.
In the short term, CSIC will join the BSC board of trustees. The new ownership shares will be: Spanish Government 54%; Catalan Government 30%; UPC 11%; CSIC 5%.

24 BSC: Spanish National Center
More than 300 people from 27 different countries (Argentina, Belgium, Brazil, Bulgaria, Canada, Colombia, Cuba, China, Cuba, Dominican Republic, France, Germany, India, Iran, Ireland, Italy, Jordan, Lebanon, Mexico, Pakistan, Poland, Russia, Serbia, Spain, Turkey, UK, USA)


26 Awaiting a new MareNostrum
[Chart: world and European ranking positions of MareNostrum v1 and v2 over successive November and June lists; labels on the chart: MNv1, MNv2, MNv3, MN, 50 TF, 100 TF, 400 TF]

27 Top20 in Europe 27

28 Red Española de Supercomputación (Spanish Supercomputing Network)
Altamira (Universidad de Cantabria); MareNostrum (BSC); CaesarAugusta (Universidad de Zaragoza); Magerit (Universidad Politécnica Madrid); Picasso (Universidad de Málaga); La Palma (IAC); Tirant (Universidad de Valencia); Atlante (ITC)

29 1550 e-science projects

30 Dynamic load balancing
MPI + OpenMP: overcommit threads per core, with only one per processor active at a time. Shift processors between processes in a node. E.g. 800 procs: 2.5x speedup.
MPI + StarSs (in development): should result in higher flexibility. Overcoming Amdahl's law in hybrid parallelization: start with one process per core, and use StarSs only in unbalanced regions.
LeWI: A Runtime Balancing Algorithm for Nested Parallelism. M. Garcia, J. Corbalan, J. Labarta. ICPP09
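The idea of shifting processors between the processes of a node can be sketched with a toy model. This is not the LeWI algorithm from the cited paper: it is an idealized illustration that assumes one process per core at the start, perfectly malleable work, and zero cost for handing a core over. The function names and the sample loads are hypothetical.

```python
def time_static(loads):
    """One process per core, no balancing: the slowest process sets the time."""
    return max(loads)

def time_with_lending(loads):
    """Idealized lending: a finished process hands its core to the others.

    The cores are shared evenly among the processes still running, so all
    running processes advance at the same rate.
    """
    cores = len(loads)
    elapsed = 0.0
    done = 0.0                      # work completed so far by each running process
    for i, load in enumerate(sorted(loads)):
        running = cores - i         # processes that have not finished yet
        speed = cores / running     # cores available to each running process
        elapsed += (load - done) / speed
        done = load
    return elapsed

loads = [2.0, 2.0, 2.0, 10.0]       # hypothetical per-process work, in seconds
print(f"static:  {time_static(loads):.1f} s")
print(f"lending: {time_with_lending(loads):.1f} s")
```

In this model the lent cores let the overloaded process finish in 4 s instead of 10 s, which is the total work divided by the number of cores. A real runtime would fall short of that bound because work is not perfectly divisible and lending has a cost.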

31 Performance analysis of real production applications
Describe actual behavior. Identify optimization approaches. Estimate potential.
To optimize the applications in cooperation with developers.
To drive other developments: programming model, run times, architecture, interconnect, ...
[Figures for GROMACS, PEPC, GADGET and SPECFEM3D: profiles per code region, bandwidth (MB/s), CPU ratio, % of computation time, % elapsed time, speedup, and predicted speedup at 1, 5, 10 and 100 MB/s versus ideal. Annotations: "Serialization!!", "load balance!!", "Endpoint contention!!"]

32 Kaleidoscope Project
[Table: Gflops, speed-up, power (W) and Gflops/W for the JS21, QS22 and 2 TESLA platforms; values garbled in extraction: "JS21 8, ,03 QS22 116, ,32 2 TESLA ,8 0,76"]
The work of 3 months is now done in 1 week (speed-up 14) or 2 days (speed-up 42).
On the Cell, 23.5 GB/s of memory bandwidth is used out of a maximum of 25.6 GB/s.
On TESLA, I/O is now the real bottleneck.
Awarded by "IEEE Spectrum" as one of the 2008 top 5 innovative technologies.
Platts award to the commercial technology of the year 2009.
Barcelona, 10 February
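The speed-up and bandwidth figures on this slide can be checked with a few lines of arithmetic. The sketch assumes that "3 months" means 90 days; that conversion and the variable names are not from the talk.

```python
baseline_days = 90.0        # "3 months", taken here as 90 days (assumption)

# Run time after each speed-up quoted on the slide.
days = {speedup: baseline_days / speedup for speedup in (14, 42)}
for speedup, d in days.items():
    print(f"speed-up {speedup}: {d:.1f} days")

# Memory bandwidth on the Cell, in GB/s, as quoted on the slide.
cell_bw_used, cell_bw_max = 23.5, 25.6
bw_utilization = cell_bw_used / cell_bw_max
print(f"Cell memory bandwidth utilization: {bw_utilization:.1%}")
```

A speed-up of 14 brings 90 days down to about 6.4 days (roughly one week), a speed-up of 42 to about 2.1 days, and the Cell code uses about 92% of the peak memory bandwidth.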

33 Consolider
BSC-CNS coordinates a Consolider programme on supercomputing and e-science that brings together research groups expert in applications that require supercomputing and groups expert in the design of the base hardware and software of supercomputers.
Application areas: Life Sciences; Earth Sciences; Astrophysics; Engineering; Physics. Computer science areas: compilers and tuning of application kernels; programming models and performance tuning tools; architectures and hardware technologies.
Application scope, Life Sciences: Grupo de Modelización Molecular y Bioinformática (U. de Barcelona, M. Orozco); Grup de Bioinformàtica del Genoma (Centre de Regulació Genòmica, R. Guigó); Grupo de Biología Estructural Computacional (Centro Nacional de Investigaciones Oncológicas, A. Valencia).
Application scope, Earth Sciences: Grupo de Ciencias de la Tierra (BSC-CNS, J. M. Baldasano); Unidad de Contaminación Atmosférica (CIEMAT, F. Martín); Grupo de Diagnóstico y Modelización del Clima (U. Complutense de Madrid, R. García-Herrera).
Application scope, Astrophysics: Grupo de Astrofísica Relativista y Cosmología (U. de Valencia, J. M. Ibáñez); Grupo de Simulación Numérica de Procesos Astrofísicos (Instituto de Astrofísica de Canarias, F. Moreno); Grupo GAIA de Astronomía Galáctica (U. de Barcelona, J. Torra); Grupo de Cosmología Computacional (U. Autónoma de Madrid, G. Yepes).
Application scope, Engineering: Grupo de Mecánica de Fluidos Computacional (U. Politécnica de Madrid, J. Jiménez Sendín); Unidad de Simulación Numérica y Modelización de Procesos (CIEMAT, M. Uhlmann).
Application scope, Physics: Grupo SIESTA (U. Autónoma de Madrid, J. M. Soler); Grupo SIESTA (Laboratorio de Estructura Electrónica de los Materiales del Instituto de Ciencia de Materiales de Barcelona ICMAB-CSIC, P. J. Ordejón).

34 Consolider (2)
Basic research in supercomputing, alongside the application scopes (Life Sciences, Earth Sciences, Astrophysics, Engineering, Physics).
Compilers and tuning of application kernels: Departamento de Tecnologías de la Información (BSC-CNS, M. Valero); Grupo Computación de Altas Prestaciones (U. Politècnica de Catalunya, J. M. Llabería); Grupo de Arquitectura y Tecnología de Sistemas Informáticos (U. Complutense de Madrid, F. Tirado); Grupo de Arquitectura de Computadores (U. de Málaga, E. López Zapata).
Programming models and performance tuning tools: Departamento de Tecnologías de la Información (BSC-CNS, M. Valero); Grupo Computación de Altas Prestaciones (U. Politècnica de Catalunya, J. M. Llabería); Parallel Processing and Distributed Systems group (U. Autónoma de Barcelona, A. Ripoll).
Architectures and hardware technologies: Departamento de Tecnologías de la Información (BSC-CNS, M. Valero); Grupo Computación de Altas Prestaciones (U. Politècnica de Catalunya, J. M. Llabería); Grupo de Arquitectura de Computadores (U. de Zaragoza, V. Viñals); Grupo de Arquitectura y Tecnología de Sistemas Informáticos (U. Complutense de Madrid, F. Tirado); Grupo de Arquitectura y Tecnología de Computadores (U. de Cantabria, J. R. Beivide); Grupo de Arquitectura de Computadores (U. de Málaga, E. López Zapata); Grupo de Arquitectura de Computadores (U. de Las Palmas de Gran Canaria, Instituto Universitario de Ciencias y Tecnologías Cibernéticas, E. Fernández).

35 SyeC Consolider Project: Nanotechnology (SIESTA code)
Improving scalability up to 1000 cores. Load balancing. Hybrid parallelism MPI+OpenMP. Parallel I/O.
[Plots: matrix-per-vector scalability, eigenvalue solver scalability and Hamiltonian builder scalability; speed-up versus number of processors, compared with ideal]

36 SyeC Consolider Project: Astrophysics
Codes on general relativistic resistive magnetohydrodynamics. Porting to MPI + OpenMP or GPUs. GAIA operational infrastructure.

37 SyeC Consolider Project: DNS in CFD (turbulence + particles)
Global communications are critical. Boundary layer modeling. Hybrid code MPI+OpenMP.
Reθ = 6150 on BG/P at ANL cores (34Mh). Parallel I/O.
100 TB total output data => 5 years to analyze these data.

38 SyeC Consolider Project: Atmospheric modeling
New numerical schemes. Improving scalability: MPI + OpenMP + parallel I/O. Cell and GPU code.
First dust simulation including all processes except dry deposition.

39 SyeC Consolider Project: Life Sciences (protein interactions & genomics)
Workflows. Databases. GPU code porting. Small folding.
New sequencing hardware: 1 experiment: 4 Tb of data, 40 Gb of processed data. 1 machine: 2 experiments a week. A medium-sized center: 10 machines.

40 SyeC Consolider Project Computer Sciences Code scalability Programming Models Computer architecture 40

41 PRACE Project Consortium tier 1 Hosting Partners tier 0 General Partners Non Hosting Partners 41

42 First machine available JUGENE IBM FZJ, Jülich, Germany Next machine France Summer

43 PRACE Early Access Call Opening date : 10th May 2010 Closing date : 10th June 2010 Start date : 1st August or 1st December 2010 Allocation period : 4 months Type of access: Project (1 proposal) or Preparatory + Project (combined -2 linked proposals) Information: 43

44 MareIncognito: Project structure
Applications: 4 relevant apps (Materials: SIESTA; Geophysics imaging: RTM; Comp. mechanics: ALYA; Plasma: EUTERPE); general kernels.
Performance analysis tools: automatic analysis; coarse/fine grain prediction; sampling; clustering; integration with Peekperf.
Programming models: StarSs (CellSs, SMPSs); OpenMP++; MPI + OpenMP/StarSs.
Models and prototype.
Interconnect: contention; collectives; overlap of computation and communication; slimmed networks; direct versus indirect networks.
Load balancing: coordinated scheduling (run time, process, job); power efficiency.
Processor and node: contribution to new Cell design; support for programming model; support for load balancing; support for performance tools; issues for future processors.

45 RIS: BSC-CNS and Latin America
Establishment of RIS, the Red Iberoamericana de Supercomputación (Ibero-American Supercomputing Network), through CYTED:
Shared use of resources: RES plus Ibero-American centers. Training. Research.
Countries: Portugal, Argentina, Brazil, Colombia, Chile, Dominican Republic, Cuba, Mexico, Ecuador.
Connect RIS to the EU programmes.

46 Barcelona Computing Week, July 5-9, 2010: Programming and Tuning Massively Parallel Systems
Coordinators: Mateo Valero, BSC; Wen-mei Hwu, Illinois.
Instructors: Wen-mei Hwu, University of Illinois; David B. Kirk, NVIDIA Corporation.
Audience: three parallel tracks specially designed for beginner, advanced and teacher profiles.
Programming languages: CUDA, OpenCL, OpenMP, StarSs.
Numerical methods and case studies: FFT, graph, tiling, grid, Monte Carlo, FDTD, sparse matrices.
Hands-on labs: afternoon labs with teaching assistants for each audience/level.
250 applications for July

47 ACACES 10 "HiPEAC Summer School": one-week summer school for computer architects and compiler builders
294 applications, 197 participants. 17 industry attendants, 8 companies. 31 countries.
Keynotes: Insup Lee, University of Pennsylvania; Jesus Labarta, BSC.
Courses (instructor, affiliation: title):
Andreas Herkersdorf, TU Muenchen: Application-Specific (MP)SoC Architectures for Internet Networking
Michael Scott, University of Rochester: Transactional Memory
Vivek Sarkar, Rice University: Multicore Programming Models and their Compilation Challenges
David Brooks, Harvard University: Variation-Aware Processor Design
Derek Chiou, University of Texas at Austin: Fast and Accurate Computer System Simulators
Scott Mahlke, University of Michigan: Compilation for Multicore Processors
Dan Sorin, Duke University: Fault Tolerant Computer Architecture
Donatella Sciuto, Politecnico di Milano: FPGA-based reconfigurable computing
Steven Hand, Citrix: System Virtualization
Theodore Ts'o, Google: File Systems and Storage Technologies
Mahmut Kandemir, Pennsylvania State University: Embedded Systems: A Software Perspective
Andrzej Brud and Per Stenström, Chalmers: How to transform research results into a business

48 BMW10: Barcelona Multicore Workshop, October 22-23, 2010
Multi-core and many-core processors have already arrived. The issue facing the software community is how to program those machines in the most productive way. The hardware community has to design the manycores so as to maximize the potential performance.
The BMW workshop consists of a combination of invited talks, two panel discussions and time for discussion.
Co-organized by BSC-Microsoft Research Center and HiPEAC.
Organizers: BSC-Microsoft: Mateo Valero, Fabrizio Gagliardi, Osman Unsal. HiPEAC: Per Stenstrom, Georgi Gaydadjiev, Manolis Katevenis, Eduard Ayguade.

49 Master on HPC
Kernel: Supercomputer Architectures; Methods and Algorithms for Parallel Programming; Optimization and Parallelization of Numerical Simulations.
Free options: High Performance Computational Mechanics; Performance tuning and analysis tools; Data Mining 2; Seminar on Supercomputing I, II, III.
Applications: Computational Astrophysics; Bioinformatics; Earth Sciences; Applications of Computational Astrophysics; Applications of Bioinformatics; Applications of Earth Sciences.
Thesis.

50 Education for Parallel Programming I I many-core programming multi-core programming We all massive parallel prog. I games Multicore-based pacifier 50

51 Thank you very much

52 SIESTA project
Ab initio DFT molecular dynamics code. BSC is working on its development.
Example: Neptune mid-layer, H2O + NH3 + CH4.
Temperature: 1500 K, 2500 K. Pressure: 0.15 GPa, 60 GPa.
Simulated time: 10 ps, equivalent to 20,000 molecular dynamics steps.
Number of atoms: 1269 atoms (100 processors, 2007). Now, more than atoms using more than 1000 processors.
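The simulated time and the step count on this slide fix the length of one molecular dynamics step. A small sketch of that arithmetic, using illustrative variable names:

```python
simulated_time_fs = 10 * 1000    # 10 ps expressed in femtoseconds
md_steps = 20_000                # molecular dynamics steps quoted on the slide

timestep_fs = simulated_time_fs / md_steps
print(f"time step: {timestep_fs} fs")
```

So each step advances the system by 0.5 fs.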

53 High Performance Computing as key-enabler
[Chart, 1980 to 2030: available computational capacity (Flop/s) and capacity as number of overnight load cases run, progressing through RANS low speed, RANS high speed, unsteady RANS and LES. Milestones: HS design; data set; CFD-based loads & HQ; aero optimisation & CFD-CSM; full MDO; CFD-based noise simulation; real-time CFD-based in-flight simulation. Caption: capability achieved during one night batch. Smart use of HPC power: algorithms, data mining, knowledge.]
Courtesy AIRBUS France

54 Design of the ITER tokamak (JET, Oxford)

55 Weather, Climate and Earth Sciences: Roadmap
2009: resolution 80 km; storage 8 TB; NEC-SX9, 48 vector procs: 40 days run. [Memory and FLOPS values garbled in extraction: "Memory: FLOPS * 1014 GB"]
2015: resolution 20 km; memory 3.5 TB; storage 180 TB. High resolution model with complete carbon cycle model. Challenges: data viz and post-processing, data discovery, archiving. [FLOPS value lost]
[Year lost]: resolution 1 km; memory 4 PB; storage 150 PB. Higher resolution with global cloud resolving model. Challenges: data sharing, transfer, memory management, I/O management. [FLOPS value lost]

56 Supercomputing, theory and experimentation. Courtesy of IBM


58 ORNL: 1.75 PF/s Cray XT5-HE system
Quad-core AMD Opteron processors running at 2.6 GHz; 224,162 cores.
Power: 6.95 MW.
300 terabytes of memory. 10 petabytes of disk space. 240 gigabytes per second of disk bandwidth.
Cray's SeaStar2+ interconnect network.
Source: Jack Dongarra
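The performance, power and core count on this slide give the energy efficiency of the machine. The sketch below derives it, then extrapolates to 1 EFlop/s at unchanged efficiency; that extrapolation is a hypothetical illustration of why power is treated as a constraint, not a figure from the talk.

```python
flops = 1.75e15          # 1.75 PF/s, as quoted on the slide
power_watts = 6.95e6     # 6.95 MW
cores = 224_162

mflops_per_watt = flops / power_watts / 1e6
gflops_per_core = flops / cores / 1e9

# Hypothetical extrapolation: power needed for 1 EFlop/s at the same efficiency.
exaflop_power_gw = 1e18 / (flops / power_watts) / 1e9

print(f"{mflops_per_watt:.1f} MFlops/W, {gflops_per_core:.2f} GFlops per core")
print(f"1 EFlop/s at this efficiency: {exaflop_power_gw:.2f} GW")
```

At about 252 MFlops/W, an exaflop machine built the same way would draw close to 4 GW, so efficiency has to improve by orders of magnitude.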

59 PRACE AISBL
First Council meeting of the legal form. Three more countries formally adhered to the legal form: Sweden, Cyprus and Czech Republic.
BoD was formally appointed. Chair of the BoD selected: Sergi Girona.
Operating budget approved.

60 PRACE regular calls
Preparatory access: code testing and optimisation; technical support if requested; continuous call; fast track assessment; maximum allocation of 6 months.
Project access: 2 calls per year; 1 year allocation; allocation in November and May.
Programme access: large projects of a research group; 2 years allocation; calls coincide with the calls for project access; possible review at the end of the first year of allocation.
Final report mandatory for all types of proposals.

61 PRACE regular calls 1st PRACE regular call opened 15th June for allocation starting 1st November Only available machine at present is JUGENE All proposals will be subject to PRACE Peer Review, which will be handled on-line The Scientific Steering Committee will be responsible for advising on the scientific direction of PRACE 61

62 Grand Challenge problems
Systems biology: model & simulation leading to predictive models with clinical or environmental impact.
Sustainable systems: taking into account multi-scale nature; models are linked to experimental data, providing corroboration of experiments.
Turbulence & chaos: characterize boundary layer effects and their impact on global solution and stability.
Environmental: global warming/climate change; energy; water; biodiversity and land use; chemicals, toxics and heavy metals; air pollution; waste management; stratospheric ozone depletion; oceans & fisheries; deforestation.
Multi-scale patient-specific data: genetic variability; gene expression profiling; protein expression profiling; multi-modal imaging; data analysis and modeling.

63 BSC-CNS: synergy with infrastructures
CNS is a fundamental complement to experimental scientific infrastructures: IAC; ICFO; Synchrotron; IRB; CIEMAT (TJ II).
Barcelona, 20 July 2009

64 Factors that Necessitate Redesign
Steepness of the ascent from terascale to petascale to exascale.
Extreme parallelism and hybrid design: preparing for million/billion-way parallelism.
Tightening memory/bandwidth bottleneck: limits on power/clock speed and their implication on multicore; reducing communication will become much more intense; memory per core changes, byte-to-flop ratio will change.
Necessary fault tolerance: MTTF will drop; checkpoint/restart has limitations.
Software infrastructure does not exist today.
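The two fault-tolerance points, that MTTF will drop and that checkpoint/restart has limitations, can be made concrete with a back-of-the-envelope model. The sketch assumes independent, exponentially distributed node failures, so the machine MTTF is the node MTTF divided by the node count, and it uses Young's classic first-order approximation for the checkpoint interval. Neither the model nor the numbers come from the talk; the node MTTF and checkpoint cost are hypothetical.

```python
import math

def system_mttf(node_mttf_hours, nodes):
    """MTTF of the whole machine, assuming independent exponential node failures."""
    return node_mttf_hours / nodes

def young_interval(checkpoint_hours, mttf_hours):
    """Young's first-order estimate of the best time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_hours * mttf_hours)

node_mttf = 5 * 365 * 24.0     # hypothetical: one failure per node every 5 years
checkpoint_cost = 0.25         # hypothetical: 15 minutes to write a checkpoint

for nodes in (1_000, 100_000):
    mttf = system_mttf(node_mttf, nodes)
    interval = young_interval(checkpoint_cost, mttf)
    print(f"{nodes} nodes: MTTF {mttf:.2f} h, checkpoint every {interval:.2f} h")
```

With these inputs, 1,000 nodes fail about every 44 hours and a checkpoint every 4.7 hours is cheap. At 100,000 nodes the machine fails about every 26 minutes, the suggested interval is less than twice the time it takes to write one checkpoint, and the approximation itself stops being reliable, because it assumes the checkpoint cost is small compared with the MTTF.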

65 BSC-CNS
Mission of BSC-CNS: to research, develop and manage technologies that facilitate the advancement of science.
Objectives of BSC-CNS: R&D in computer sciences, life sciences and earth sciences; supercomputing support for external research.
5 scientific/technical departments.

66 BSC-CNS: backbone of the supercomputing service in Spain
MareNostrum (BSC); Magerit (Universidad Politécnica Madrid); CaesarAugusta (Universidad de Zaragoza); La Palma (IAC); Picasso (Universidad de Málaga); Altamira (Universidad de Cantabria); Atlante (Gobierno Canarias); Tirant (Universidad de Valencia)
Barcelona, 10 February 2010

67 BSC-CNS and REPSOL: Kaleidoscope project
2008: awarded by "IEEE Spectrum" as one of the top 5 innovative technologies.

68 Kaleidoscope: WEM vs. RTM Data Property of TGS 68



Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory June 2010 Highlights First Petaflop Supercomputer

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Lecture 1. Course Introduction

Lecture 1. Course Introduction Lecture 1 Course Introduction Welcome to CSE 262! Your instructor is Scott B. Baden Office hours (week 1) Tues/Thurs 3.30 to 4.30 Room 3244 EBU3B 2010 Scott B. Baden / CSE 262 /Spring 2011 2 Content Our

More information

PRACE: access to Tier-0 systems and enabling the access to ExaScale systems Dr. Sergi Girona Managing Director and Chair of the PRACE Board of

PRACE: access to Tier-0 systems and enabling the access to ExaScale systems Dr. Sergi Girona Managing Director and Chair of the PRACE Board of PRACE: access to Tier-0 systems and enabling the access to ExaScale systems Dr. Sergi Girona Managing Director and Chair of the PRACE Board of Directors PRACE aisbl, a persistent pan-european supercomputing

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

Software for High Performance. Computing. Requirements & Research Directions. Marc Snir

Software for High Performance. Computing. Requirements & Research Directions. Marc Snir Software for High Performance Requirements & Research Directions Computing Marc Snir May 2006 Outline Petascale hardware Petascale operating system Programming models 2 Jun-06 Petascale Systems are Coming

More information

Relations with ISV and Open Source. Stephane Requena GENCI Stephane.requena@genci.fr

Relations with ISV and Open Source. Stephane Requena GENCI Stephane.requena@genci.fr Relations with ISV and Open Source Stephane Requena GENCI Stephane.requena@genci.fr Agenda of this session 09:15 09:30 Prof. Hrvoje Jasak: Director, Wikki Ltd. «HPC Deployment of OpenFOAM in an Industrial

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

ANALYSIS OF SUPERCOMPUTER DESIGN

ANALYSIS OF SUPERCOMPUTER DESIGN ANALYSIS OF SUPERCOMPUTER DESIGN CS/ECE 566 Parallel Processing Fall 2011 1 Anh Huy Bui Nilesh Malpekar Vishnu Gajendran AGENDA Brief introduction of supercomputer Supercomputer design concerns and analysis

More information

Cloud+X: Exploring Asynchronous Concurrent Applications

Cloud+X: Exploring Asynchronous Concurrent Applications Cloud+X: Exploring Asynchronous Concurrent Applications Authors: Allen McPherson, James Ahrens, Christopher Mitchell Date: 4/11/2012 Slides for: LA-UR-12-10472 Abstract: We present a position paper that

More information

Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp

Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges Why should you care? What are they? Which are different from non-petascale? What has changed since

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

YALES2 porting on the Xeon- Phi Early results

YALES2 porting on the Xeon- Phi Early results YALES2 porting on the Xeon- Phi Early results Othman Bouizi Ghislain Lartigue Innovation and Pathfinding Architecture Group in Europe, Exascale Lab. Paris CRIHAN - Demi-journée calcul intensif, 16 juin

More information

Amazon Cloud Performance Compared. David Adams

Amazon Cloud Performance Compared. David Adams Amazon Cloud Performance Compared David Adams Amazon EC2 performance comparison How does EC2 compare to traditional supercomputer for scientific applications? "Performance Analysis of High Performance

More information

Introduction History Design Blue Gene/Q Job Scheduler Filesystem Power usage Performance Summary Sequoia is a petascale Blue Gene/Q supercomputer Being constructed by IBM for the National Nuclear Security

More information

HPC enabling of OpenFOAM R for CFD applications

HPC enabling of OpenFOAM R for CFD applications HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,

More information

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

More information

Introducing the Singlechip Cloud Computer

Introducing the Singlechip Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

Performance of the JMA NWP models on the PC cluster TSUBAME.

Performance of the JMA NWP models on the PC cluster TSUBAME. Performance of the JMA NWP models on the PC cluster TSUBAME. K.Takenouchi 1), S.Yokoi 1), T.Hara 1) *, T.Aoki 2), C.Muroi 1), K.Aranami 1), K.Iwamura 1), Y.Aikawa 1) 1) Japan Meteorological Agency (JMA)

More information

PRACE hardware, software and services. David Henty, EPCC, d.henty@epcc.ed.ac.uk

PRACE hardware, software and services. David Henty, EPCC, d.henty@epcc.ed.ac.uk PRACE hardware, software and services David Henty, EPCC, d.henty@epcc.ed.ac.uk Why? Weather, Climatology, Earth Science degree of warming, scenarios for our future climate. understand and predict ocean

More information

Sun Constellation System: The Open Petascale Computing Architecture

Sun Constellation System: The Open Petascale Computing Architecture CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical

More information

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833

More information

HPC Programming Framework Research Team

HPC Programming Framework Research Team HPC Programming Framework Research Team 1. Team Members Naoya Maruyama (Team Leader) Motohiko Matsuda (Research Scientist) Soichiro Suzuki (Technical Staff) Mohamed Wahib (Postdoctoral Researcher) Shinichiro

More information

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

More information

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.

More information

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University

More information

The PRACE Project Applications, Benchmarks and Prototypes. Dr. Peter Michielse (NCF, Netherlands)

The PRACE Project Applications, Benchmarks and Prototypes. Dr. Peter Michielse (NCF, Netherlands) The PRACE Project Applications, Benchmarks and Prototypes Dr. Peter Michielse (NCF, Netherlands) Introduction to me Ph.D. in numerical mathematics (parallel adaptive multigrid solvers) from Delft University

More information

CENTRO DE SUPERCOMPUTACIÓN GALICIA CESGA

CENTRO DE SUPERCOMPUTACIÓN GALICIA CESGA CENTRO DE SUPERCOMPUTACIÓN DE GALICIA CENTRO DE SUPERCOMPUTACIÓN GALICIA CESGA Javier García Tobío (Managing Director, Galicia Supercomputing Centre) MISSION STATEMENT To provide high performance computing,

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

Hank Childs, University of Oregon

Hank Childs, University of Oregon Exascale Analysis & Visualization: Get Ready For a Whole New World Sept. 16, 2015 Hank Childs, University of Oregon Before I forget VisIt: visualization and analysis for very big data DOE Workshop for

More information

A Flexible Cluster Infrastructure for Systems Research and Software Development

A Flexible Cluster Infrastructure for Systems Research and Software Development Award Number: CNS-551555 Title: CRI: Acquisition of an InfiniBand Cluster with SMP Nodes Institution: Florida State University PIs: Xin Yuan, Robert van Engelen, Kartik Gopalan A Flexible Cluster Infrastructure

More information

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich

Welcome to the. Jülich Supercomputing Centre. D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich Mitglied der Helmholtz-Gemeinschaft Welcome to the Jülich Supercomputing Centre D. Rohe and N. Attig Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich Schedule: Monday, May 19 13:00-13:30 Welcome

More information

Overview of High Performance Computing

Overview of High Performance Computing Overview of High Performance Computing Timothy H. Kaiser, PH.D. tkaiser@mines.edu http://geco.mines.edu/workshop 1 This tutorial will cover all three time slots. In the first session we will discuss the

More information

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Performance analysis of parallel applications on modern multithreaded processor architectures

Performance analysis of parallel applications on modern multithreaded processor architectures Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Performance analysis of parallel applications on modern multithreaded processor architectures Maciej Cytowski* a, Maciej

More information

Energy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez

Energy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez Energy efficient computing on Embedded and Mobile devices Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez A brief look at the (outdated) Top500 list Most systems are built

More information

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of

More information

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics

More information

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

Clusters: Mainstream Technology for CAE

Clusters: Mainstream Technology for CAE Clusters: Mainstream Technology for CAE Alanna Dwyer HPC Division, HP Linux and Clusters Sparked a Revolution in High Performance Computing! Supercomputing performance now affordable and accessible Linux

More information

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC HPC Architecture End to End Alexandre Chauvin Agenda HPC Software Stack Visualization National Scientific Center 2 Agenda HPC Software Stack Alexandre Chauvin Typical HPC Software Stack Externes LAN Typical

More information

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Exascale Challenges and General Purpose Processors Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Jun-93 Aug-94 Oct-95 Dec-96 Feb-98 Apr-99 Jun-00 Aug-01 Oct-02 Dec-03

More information

OpenMP Programming on ScaleMP

OpenMP Programming on ScaleMP OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign

More information

Experiences With Mobile Processors for Energy Efficient HPC

Experiences With Mobile Processors for Energy Efficient HPC Experiences With Mobile Processors for Energy Efficient HPC Nikola Rajovic, Alejandro Rico, James Vipond, Isaac Gelado, Nikola Puzovic, Alex Ramirez Barcelona Supercomputing Center Universitat Politècnica

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

High Performance Computing in the Multi-core Area

High Performance Computing in the Multi-core Area High Performance Computing in the Multi-core Area Arndt Bode Technische Universität München Technology Trends for Petascale Computing Architectures: Multicore Accelerators Special Purpose Reconfigurable

More information

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Research Group Scientific Computing Faculty of Computer Science University of Vienna AUSTRIA http://www.par.univie.ac.at

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

Big Data Management in the Clouds and HPC Systems

Big Data Management in the Clouds and HPC Systems Big Data Management in the Clouds and HPC Systems Hemera Final Evaluation Paris 17 th December 2014 Shadi Ibrahim Shadi.ibrahim@inria.fr Era of Big Data! Source: CNRS Magazine 2013 2 Era of Big Data! Source:

More information

Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

Automating Big Data Benchmarking for Different Architectures with ALOJA

Automating Big Data Benchmarking for Different Architectures with ALOJA www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

~ Greetings from WSU CAPPLab ~

~ Greetings from WSU CAPPLab ~ ~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)

More information

Current Status of FEFS for the K computer

Current Status of FEFS for the K computer Current Status of FEFS for the K computer Shinji Sumimoto Fujitsu Limited Apr.24 2012 LUG2012@Austin Outline RIKEN and Fujitsu are jointly developing the K computer * Development continues with system

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Supercomputing 2004 - Status und Trends (Conference Report) Peter Wegner

Supercomputing 2004 - Status und Trends (Conference Report) Peter Wegner (Conference Report) Peter Wegner SC2004 conference Top500 List BG/L Moors Law, problems of recent architectures Solutions Interconnects Software Lattice QCD machines DESY @SC2004 QCDOC Conclusions Technical

More information

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver 1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution

More information

Using the Intel Xeon Phi (with the Stampede Supercomputer) ISC 13 Tutorial

Using the Intel Xeon Phi (with the Stampede Supercomputer) ISC 13 Tutorial Using the Intel Xeon Phi (with the Stampede Supercomputer) ISC 13 Tutorial Bill Barth, Kent Milfeld, Dan Stanzione Tommy Minyard Texas Advanced Computing Center Jim Jeffers, Intel June 2013, Leipzig, Germany

More information

Parallel file I/O bottlenecks and solutions

Parallel file I/O bottlenecks and solutions Mitglied der Helmholtz-Gemeinschaft Parallel file I/O bottlenecks and solutions Views to Parallel I/O: Hardware, Software, Application Challenges at Large Scale Introduction SIONlib Pitfalls, Darshan,

More information

High Performance Computing (HPC)

High Performance Computing (HPC) High Performance Computing (HPC) High Performance Computing (HPC) White Paper Attn: Name, Title Phone: xxx.xxx.xxxx Fax: xxx.xxx.xxxx 1.0 OVERVIEW When heterogeneous enterprise environments are involved,

More information

Parallel Software usage on UK National HPC Facilities 2009-2015: How well have applications kept up with increasingly parallel hardware?

Parallel Software usage on UK National HPC Facilities 2009-2015: How well have applications kept up with increasingly parallel hardware? Parallel Software usage on UK National HPC Facilities 2009-2015: How well have applications kept up with increasingly parallel hardware? Dr Andrew Turner EPCC University of Edinburgh Edinburgh, UK a.turner@epcc.ed.ac.uk

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

Multicore Parallel Computing with OpenMP

Multicore Parallel Computing with OpenMP Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large

More information