Supercomputers of the Future


1 Supercomputers of the Future
Real Academia de Ingeniería
Mateo Valero, Director
Madrid, April 29th, 2014

2 How does science advance today?
- Experimentation
- Theory
- Simulation
Simulation = computing the formulas of the theory
EXPENSIVE / DANGEROUS / IMPOSSIBLE

3 Computers are now an essential part of almost all research
October 11, 2013. "Science: Beyond the God particle", by Clive Cookson.
The Nobel prizes in chemistry and physics show how computing is changing every field of research.

5 Higgs and Englert's Nobel for Physics 2013
Last year one of the most computer-intensive scientific experiments ever undertaken confirmed Peter Higgs and François Englert's theory by making the Higgs boson, the so-called God particle, in an $8bn atom smasher, the Large Hadron Collider at CERN outside Geneva.
The LHC produces 1 petabyte of data every second.

6 Internet & Big Data

7 Barcelona Supercomputing Center - Centro Nacional de Supercomputación
BSC-CNS objectives:
- R&D in Computer, Life, Earth and Engineering Sciences
- Supercomputing services and support to Spanish and European researchers
BSC-CNS is a consortium that includes:
- Spanish Government: 51%
- Catalonian Government: 37%
- Universitat Politècnica de Catalunya (UPC): 12%
More than 400 people from 40 countries
[Charts: BSC staff 2012; funding from personnel grants]

8 Mission of BSC Scientific Departments
- COMPUTER SCIENCES: To influence the way machines are built, programmed and used: programming models, performance tools, Big Data, computer architecture, energy efficiency.
- EARTH SCIENCES: To develop and implement global and regional state-of-the-art models for short-term air quality forecast and long-term climate applications.
- LIFE SCIENCES: To understand living organisms by means of theoretical and computational methods (molecular modeling, genomics, proteomics).
- CASE: To develop scientific and engineering software to efficiently exploit supercomputing capabilities (biomedical, geophysics, atmospheric, energy, social and economic simulations).

9 Severo Ochoa: programme
The BSC is one of only eight Spanish research centres awarded the prestigious Severo Ochoa grant. The aim of the Severo Ochoa programme is to strengthen the very best Spanish research centres, those that are among the most internationally competitive in their field. With the Severo Ochoa grant, the BSC-CNS will strengthen its strategic research capacities, human resources, international collaboration and the dissemination of its results to society.

10 In the beginning... there were vector supercomputers
- Built to order
- Very few of them
- Special purpose hardware
- Very expensive
Machines:
- Control Data
- Cray , 160 MFLOPS, 80 units, 5-8 M$
- Cray X-MP 1982, 800 MFLOPS
- Cray , 1.9 GFLOPS
- Cray Y-MP 1988, 2.6 GFLOPS
- ...
Fortran + Vectorizing Compilers

11 Evolution of the computing power of Supercomputers
[Chart: FLOP/second (operations on 64-bit real numbers) over time]
- 1988: Cray Y-MP (8 processors)
- 1998: Cray T3E (1,024 processors)
- 2008: Cray XT5 (15,000 processors)
- ~2018?: (1x10^7 processors)

12 The Formula 1 of Supercomputers today
[Chart: FLOP/second (operations on 64-bit real numbers)]
- #1 (55 PF): National University of Defense Technology, 54.9 PFlops
- #1 EU (5 PF): Forschungszentrum Jülich
- #1 Spain (1 PF): BSC
- BSC prototypes
- Samsung Exynos: > 50 Gflops

14 Top10
Rank | Site | Computer | Procs | Rmax (PFlop/s) | Rpeak (PFlop/s) | Power (MW) | GFlops/Watt | Name
1 | National University of Defense Technology | TH-IVB-FEP Cluster, Intel Xeon E C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P | | ,86 | 54,90 | 17,8 | 1,90 | Tianhe-2 (MilkyWay-2)
2 | DOE/SC/Oak Ridge National Lab | Cray XK7, Opteron C, 2.20 GHz, Cray Gemini interconnect, NVIDIA K20x | | ,59 | 27,11 | 8,21 | 2,14 | Titan
3 | DOE/NNSA/LLNL | BlueGene/Q, Power BQC 16C 1.60 GHz, Custom | | ,17 | 20,13 | 7,89 | 2,18 | Sequoia
4 | RIKEN Advanced Institute for Computational Science (AICS) | Fujitsu, K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect | | ,51 | 11,28 | 12,65 | 0,83 | K
5 | DOE/SC/Argonne National Laboratory | BlueGene/Q, Power BQC 16C 1.60GHz, Custom | | ,58 | 10,06 | 3,94 | 2,18 | Mira
6 | CSCS | Cray XC30, Xeon E C 2.600GHz, Aries interconnect, NVIDIA K20x | | ,27 | 7,79 | 2,32 | 2,70 | Piz Daint
7 | Texas Advanced Computing Center | PowerEdge C8220, Xeon E C 2.700GHz, Infiniband FDR, Intel Xeon Phi | | ,17 | 8,52 | 4,51 | 1,14 | Stampede
8 | Forschungszentrum Juelich (FZJ) | BlueGene/Q, Power BQC 16C 1.60GHz, Custom | | ,00 | 5,87 | 2,30 | 2,18 | JUQUEEN
9 | DOE/NNSA/LLNL | BlueGene/Q, Power BQC 16C 1.60 GHz, Custom | | ,29 | 5,03 | 1,97 | 2,18 | Vulcan
10 | Leibniz Rechenzentrum | NUDT YH MPP, Xeon X5670 6C 2.93 GHz, NVIDIA | | ,90 | 3,18 | 3,42 | 0,85 | SuperMUC

15 Building Supercomputers
[Diagram: nodes joined by an interconnect]
- Interconnect (Myrinet, IB, GE, 3D torus, tree, ...)
- Node: SMP with memory, IN and several multicore chips
- Multicore chip options:
  - homogeneous multicore (BlueGene/Q, Sandy Bridge)
  - heterogeneous multicore: general-purpose accelerator (e.g. Cell), GPU, FPGA, ASIC (e.g. Anton for MD)
- Network-on-chip (bus, ring, direct, ...)

16 Homogeneous Architectures
Intel Sandy Bridge (2011)
- 8 cores x86, 32 nm
- DP Performance: 0.16 TF
- Power: 130 W
- 256 KB / core L2 coherent
- 20 MB L3 shared
- Int Netw: Ring, BW = 400 GB/s
- Mem BW = 51.2 GB/s
Intel Ivy Bridge (2013)
- 12 cores x86, 22 nm
- DP Performance: 0.27 TF
- Power: 130 W
- 256 KB / core L2 coherent
- 30 MB L3 shared
- Int Netw: Ring
- Mem BW: 80 GB/s

18 Intel Xeon Phi or Intel Many Integrated Core Architecture (MIC)
Knights Corner (2011)
- Coprocessor, 61 x86 cores, 22 nm, AVX-512, 4 HTs
- 1.2 TFLOPS (DP), 300 W TDP, 4 GFLOPS/W
- 512 KB/core L2 coherent
- Int Netw: Ring
- Mem BW: 352 GB/s
Knights Landing (exp 2015)
- Coprocessor or host processor
- 72 Atom cores, 14 nm, AVX512 per core, 4 HTs
- Up to 16 GB of DRAM 3D stacked on-package, 384 GB GDDR
- 3 TFLOPS (DP), 200 W TDP, 15 GFLOPS/W
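The GFLOPS/W figures on this slide follow directly from the quoted peak performance and TDP. A minimal sketch of that arithmetic, where the function name `gflops_per_watt` is only illustrative and not part of the talk:

```c
#include <assert.h>

/* Energy efficiency in GFLOPS per watt, from peak performance in TFLOPS
 * and thermal design power (TDP) in watts. */
double gflops_per_watt(double peak_tflops, double tdp_watts)
{
    assert(tdp_watts > 0.0);
    return peak_tflops * 1000.0 / tdp_watts;   /* 1 TFLOPS = 1000 GFLOPS */
}
```

With the slide's numbers, `gflops_per_watt(1.2, 300.0)` gives about 4 GFLOPS/W for Knights Corner and `gflops_per_watt(3.0, 200.0)` gives about 15 GFLOPS/W for Knights Landing.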

19 Accelerators: NVIDIA Kepler GK110 GPU (2014)
- DP Performance: 1.43 Tflop
- Mem BW (ECC off): 288 GB/s
- Memory size (GDDR5): 12 GB
- 15 SMX units, each with:
  - 192 single precision CUDA cores
  - 64 double precision units
  - 32 special function units
  - 32 load/store units
- Six 64-bit memory controllers

20 Hardware/Myrinet
[Diagram: a spine with 1280 spine links above a row of Clos 256x256 switches; links (1 to each node) numbered from 0 to 255, 250 MB/s in each direction]

21 Dragonfly networks
- Minimal routing: longest path is 3 hops (local, global, local)
- Deadlock avoidance: 3 logical VCs [2], VC0 - VC1 - VC2
- 2 physical VCs per local port + 1 physical VC per global port
- Good performance under UN traffic
- Saturation of the global link with adversarial traffic ADV+N
[Diagram: source node in source group i, saturated global link, destination node in destination group i+n]
[2] K. Gunther, Prevention of deadlocks in packet-switched data transport systems, Trans. Communications
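The minimal routing rule on this slide (at most local, global, local, with an increasing virtual channel per hop) can be sketched as follows. This is only an illustration under stated assumptions: the group size `ROUTERS_PER_GROUP`, the wiring rule in `gateway()`, the type names, and the choice of VC0 for traffic inside a group are all hypothetical and not taken from the talk.

```c
#include <assert.h>

#define ROUTERS_PER_GROUP 4            /* assumed group size */

typedef struct { int group; int router; } node_t;
typedef enum { HOP_LOCAL, HOP_GLOBAL } hop_kind_t;
typedef struct { hop_kind_t kind; int vc; } hop_t;

/* Hypothetical wiring rule: the router of group g that owns the global
 * link towards group h. Real systems define their own arrangement. */
static int gateway(int g, int h)
{
    (void)g;
    return h % ROUTERS_PER_GROUP;
}

/* Fills path[] with the hops of a minimal route and returns their number
 * (0 to 3). Hops use logical VC0, VC1, VC2 in increasing order, so a packet
 * never waits for a lower-numbered channel than the one it already holds. */
int minimal_route(node_t src, node_t dst, hop_t path[3])
{
    int n = 0;
    if (src.group == dst.group) {
        if (src.router != dst.router) {           /* one local hop */
            path[n].kind = HOP_LOCAL; path[n].vc = 0; n++;
        }
        return n;
    }
    if (src.router != gateway(src.group, dst.group)) {
        path[n].kind = HOP_LOCAL; path[n].vc = 0; n++;   /* reach the gateway */
    }
    path[n].kind = HOP_GLOBAL; path[n].vc = 1; n++;      /* cross to dst group */
    if (dst.router != gateway(dst.group, src.group)) {
        path[n].kind = HOP_LOCAL; path[n].vc = 2; n++;   /* reach dst router */
    }
    assert(n <= 3);
    return n;
}
```

Under adversarial traffic, every packet from one group to another shares the single global hop in the middle of this path, which is the saturation the slide points out.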

22 Tianhe-2
Compute node with:
- 2 IvyBridge Xeon sockets (12 cores each) with 88 GB memory
- 3 Xeon Phi sockets with 8 GB memory
- for a total of 195 cores
Peak performance:
- IvyBridge socket: 8 flops/cycle per core * 12 cores/socket * 2.2 GHz = 211.2 Gflop/s peak
- Xeon Phi socket: 16 flops/cycle per core * 57 cores/socket * 1.1 GHz = 1.0032 Tflop/s peak
- On a node there are 2 IvyBridge * 0.2112 Tflop/s + 3 Phi * 1.0032 Tflop/s = 3.432 Tflop/s per node
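The per-socket and per-node peak figures are plain products of the operands quoted on this slide. A small sketch of the calculation, with function names chosen here purely for illustration:

```c
#include <assert.h>

/* Peak Gflop/s of one socket:
 * flops per cycle per core x cores per socket x clock in GHz. */
double socket_peak_gflops(double flops_per_cycle, int cores, double ghz)
{
    assert(cores > 0);
    return flops_per_cycle * cores * ghz;
}

/* Peak Gflop/s of one Tianhe-2 node: 2 IvyBridge sockets + 3 Xeon Phi sockets. */
double tianhe2_node_peak_gflops(void)
{
    double ivy = socket_peak_gflops(8.0, 12, 2.2);    /* about 211.2 Gflop/s  */
    double phi = socket_peak_gflops(16.0, 57, 1.1);   /* about 1003.2 Gflop/s */
    return 2.0 * ivy + 3.0 * phi;                     /* about 3432 Gflop/s   */
}

/* Cores per node: 2 sockets x 12 cores + 3 sockets x 57 cores. */
int tianhe2_node_cores(void)
{
    return 2 * 12 + 3 * 57;
}
```

Almost 88% of the node's peak comes from the three Xeon Phi sockets, which is why the programming model for the accelerators matters so much on this machine.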

23 Tianhe-2
- The complete system has 16,000 nodes: 2 nodes per board, 16 boards per frame, 4 frames per rack and a total of 125 racks
- 54.9 Pflop/s peak
- Peak power consumption under load for the system (processors, memory and interconnect) is 17.6 MW (24 MW with cooling)
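The node count and the system peak can be cross-checked from the packaging hierarchy. The sketch below assumes a per-node peak of about 3.432 Tflop/s, which is what the socket figures on the previous slide multiply out to; the function names are illustrative only.

```c
#include <assert.h>

/* Nodes in the full system: nodes/board x boards/frame x frames/rack x racks. */
long tianhe2_nodes(void)
{
    return 2L * 16L * 4L * 125L;
}

/* System peak in Pflop/s from the node count and the per-node peak in Tflop/s. */
double system_peak_pflops(long nodes, double node_peak_tflops)
{
    assert(nodes > 0);
    return (double)nodes * node_peak_tflops / 1000.0;
}

/* Ratio of total facility power to IT power, e.g. 24 MW / 17.6 MW. */
double cooling_overhead(double total_mw, double it_mw)
{
    assert(it_mw > 0.0);
    return total_mw / it_mw;
}
```

`system_peak_pflops(tianhe2_nodes(), 3.432)` comes to roughly 54.9 Pflop/s, matching the slide, and `cooling_overhead(24.0, 17.6)` is about 1.36, so cooling adds a little over a third to the power bill.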

24 MareNostrum 3
- 48,896 Intel SandyBridge cores at 2.6 GHz
- 84 Intel Xeon Phi
- Peak performance of 1.1 Petaflops
- TB of main memory
- 2 PB of disk storage
- 8.5 PB of archive storage
- 9th in Europe, 29th in the world (June 2013 Top500 List)

25 Future Exascale Supercomputers
- Energy efficient circuit, power and cooling technologies.
- High performance processors and interconnect technologies.
- Advanced memory technologies to dramatically improve capacity and bandwidth.
- Scalable system software that is power and resilience aware.
- Data management software that can handle the volume, velocity and diversity of data-storage.
- Programming environments to express massive parallelism, data locality, and resilience.
- Reformulating science problems and refactoring solution algorithms for exascale.
- Ensuring correctness in the face of faults, reproducibility, and algorithm verification.
- Mathematical optimization and uncertainty quantification for discovery, design, and decision.
- Software engineering and supporting structures to enable scientific productivity.

26 Going back to Formula 1: who plays the most important role?

27 Influencing the design of HPC systems
R
- Co-design hardware-software: ILP / memory / power / resiliency walls
- HPC everywhere: desktop, mobile, real-time embedded
- Performance analytics
- StarSs
E
- Publications in the best magazines and conferences
- Influencing programming standards

28 The killer microprocessors
[Chart: MFLOPS over time for Cray-1, Cray-C90, NEC SX4, SX5, Alpha EV4, EV5, Intel Pentium, IBM P2SC, HP PA]
- Microprocessors killed the vector supercomputers
- They were not faster, but they were significantly cheaper and greener
- Need 10 microprocessors to achieve the performance of 1 vector CPU
- SIMD vs. MIMD programming paradigms
M. Valero. Vector Architectures: Past, Present and Future. Keynote talk. ICS-11. International Conference on Supercomputers. IEEE-ACM. Melbourne,

29 The killer mobile processors TM
[Chart: MFLOPS over time for Alpha, Intel, AMD, NVIDIA Tegra, Samsung Exynos, 4-core ARMv8 1.5 GHz]
- Microprocessors killed the vector supercomputers
- They were not faster, but they were significantly cheaper and greener
- History may be about to repeat itself
- Mobile processors are not faster, but they are significantly cheaper

30 A Supercomputer built from mobile devices?
BSC leads the Mont-Blanc project to design a new type of computer architecture using mobile processors and new programming paradigms. The resulting systems should be capable of supporting future Exascale price, power and performance requirements.

31 Back to Babel?
Book of Genesis:
- "Now the whole earth had one language and the same words"
- "Come, let us make bricks, and burn them thoroughly."
- "Come, let us build ourselves a city, and a tower with its top in the heavens, and let us make a name for ourselves"
- And the LORD said, "Look, they are one people, and they have all one language; and this is only the beginning of what they will do; nothing that they propose to do will now be impossible for them. Come, let us go down, and confuse their language there, so that they will not understand one another's speech."
The computer age:
- Fortran & MPI
- Fortress Sisal StarSs CAF UPC ALF Chapel X10 RapidMind OpenMP MPI ++ Cilk++ HPF CUDA Sequoia SDK
Thanks to Jesus Labarta

32 StarSs: generates task graph at run time

    #pragma css task input(A, B) output(C)
    void vadd3 (float A[BS], float B[BS], float C[BS]);
    #pragma css task input(sum, A) output(B)
    void scale_add (float sum, float A[BS], float B[BS]);
    #pragma css task input(A) inout(sum)
    void accum (float A[BS], float *sum);

    for (i=0; i<n; i+=BS)   // C=A+B
        vadd3 (&A[i], &B[i], &C[i]);
    ...
    for (i=0; i<n; i+=BS)   // sum(C[i])
        accum (&C[i], &sum);
    ...
    for (i=0; i<n; i+=BS)   // B=sum*E
        scale_add (sum, &E[i], &B[i]);
    ...
    for (i=0; i<n; i+=BS)   // A=C+D
        vadd3 (&C[i], &D[i], &A[i]);
    ...
    for (i=0; i<n; i+=BS)   // E=C+F
        vadd3 (&C[i], &F[i], &E[i]);

Task Graph Generation
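How a runtime could turn the input/output/inout annotations into graph edges can be sketched as below. This is a simplified illustration and not the StarSs runtime: the `task_t` layout, the integer block identifiers and the function names are all assumptions made for the example. A later task depends on an earlier one when it reads a block the earlier task writes (RAW), or writes a block the earlier task reads (WAR) or writes (WAW).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ARGS 3

/* A task instance with the blocks it reads and writes, identified by
 * hypothetical integer ids. An inout argument appears in both lists. */
typedef struct {
    const char *name;
    int in[MAX_ARGS];  int n_in;
    int out[MAX_ARGS]; int n_out;
} task_t;

static bool contains(const int *set, int n, int block)
{
    assert(n <= MAX_ARGS);
    for (int i = 0; i < n; i++)
        if (set[i] == block)
            return true;
    return false;
}

/* True if `later` must wait for `earlier`, which was submitted before it. */
bool depends_on(const task_t *earlier, const task_t *later)
{
    for (int i = 0; i < later->n_in; i++)                 /* read after write */
        if (contains(earlier->out, earlier->n_out, later->in[i]))
            return true;
    for (int i = 0; i < later->n_out; i++) {
        if (contains(earlier->out, earlier->n_out, later->out[i]))
            return true;                                  /* write after write */
        if (contains(earlier->in, earlier->n_in, later->out[i]))
            return true;                                  /* write after read */
    }
    return false;
}
```

For one block of the slide's program, `accum(C, sum)` depends on `vadd3(A, B, C)` through C, while `vadd3(C, D, A)` does not depend on `accum`, so those two can run concurrently once C is ready.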

33 StarSs: and executes as efficiently as possible

    #pragma css task input(A, B) output(C)
    void vadd3 (float A[BS], float B[BS], float C[BS]);
    #pragma css task input(sum, A) output(B)
    void scale_add (float sum, float A[BS], float B[BS]);
    #pragma css task input(A) inout(sum)
    void accum (float A[BS], float *sum);

    for (i=0; i<n; i+=BS)   // C=A+B
        vadd3 (&A[i], &B[i], &C[i]);
    ...
    for (i=0; i<n; i+=BS)   // sum(C[i])
        accum (&C[i], &sum);
    ...
    for (i=0; i<n; i+=BS)   // B=sum*E
        scale_add (sum, &E[i], &B[i]);
    ...
    for (i=0; i<n; i+=BS)   // A=C+D
        vadd3 (&C[i], &D[i], &A[i]);
    ...
    for (i=0; i<n; i+=BS)   // E=C+F
        vadd3 (&C[i], &F[i], &E[i]);

Task Graph Execution

34 StarSs: benefiting from data access information
- Flat global address space seen by programmer
- Flexibility to dynamically traverse dataflow graph, optimizing:
  - Concurrency, critical path
  - Memory access
- Opportunities for:
  - Prefetch
  - Reuse
  - Eliminate antidependences (rename)
  - Replication management

35 BSC contribution to OpenMP 4.0
- OmpSs data directionality hints (in, out, inout)
  - Express data needs for tasks
  - Runtime schedules tasks as soon as the data is available, following task dependences
- They have been incorporated in OpenMP 4.0
  - Specification will be released during the summer
  - Compiler vendors probably already working on it

    #pragma omp task \
        depend (in: A[i]) \
        depend (out: B[i]) \
        depend (inout: C[i])
    { ... = A[i]; B[i] = ...; C[i] = C[i] + ...; }
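A complete, compilable illustration of the `depend` clause follows. It is a sketch written for this transcript rather than code from the talk: the array size `N`, the function name `run_tasks` and the arithmetic inside the tasks are arbitrary choices. Built with OpenMP enabled (for example `-fopenmp` on GCC or Clang), the second task for each `i` waits for the first; built without it, the pragmas are ignored and the same result is computed serially.

```c
#include <assert.h>

#define N 8

/* Creates two dependent tasks per element and returns the sum of C. */
double run_tasks(void)
{
    double A[N], B[N], C[N];
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 0.0; C[i] = 1.0; }

    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < N; i++) {
        /* Producer: reads A[i], writes B[i]. */
        #pragma omp task depend(in: A[i]) depend(out: B[i])
        B[i] = 2.0 * A[i];

        /* Consumer: may start only once B[i] has been produced. */
        #pragma omp task depend(in: B[i]) depend(inout: C[i])
        C[i] = C[i] + B[i];
    }
    /* The implicit barrier closing the region guarantees all tasks are done. */

    assert(C[0] == 1.0);              /* 1 + 2*0 */
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += C[i];
    return sum;
}
```

Tasks for different values of `i` touch disjoint elements, so the runtime is free to run them in parallel; only the producer and consumer of the same `B[i]` are ordered.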

36 Why Tools?
Measurement techniques as enablers of science:
- Are becoming vital for program development at exascale
- Are important for lawyers
- Are vital for system architects
Performance analyst: a specialist understanding displays

37 Leveraging techniques from many areas
- Spectral analysis techniques: wavelet, high frequency, spectral density, autocorrelation
- Simulation
- Software counters, CPI stack model
- Correlate tracing and sampling
- Histograms
- Sequence alignment algorithms
- Models
- Clustering

38 The Human Brain Project
In the life sciences, one of the most spectacular applications of information technology will be the EU's 10-year 1.2bn Human Brain Project, the world's largest neuroscience research programme. Every aspect of the project depends on computing, from neuroinformatics to eventually simulating a working brain in a machine.

39 Homogeneous Architectures: IBM POWER 8 (2014)
- Coprocessor, 12 cores PPC (8 SMT), 22 nm
- DP Performance: 0.4 TF
- Power: 200+ watts
- 512 KB / core L2
- 96 MB L3 shared
- Int Netw: Ring topology, BW = 3.6 TB/s
- Mem BW: 230 GB/s
Power 9 (2017): 1 Tflop?

40 Accelerators: NVIDIA Pascal (expected 2016)
- Aimed at fixing the data movement bottleneck
- Based on NVLink, a chip-to-chip communication approach composed of bidirectional 8-lane links that provide between 80 and 200 GB/s of bandwidth
- This approach is expected to provide 4x speedups with respect to current GPU-based designs
Future? ( )

41 Nvidia: Node for the Exaflop Computer ( ) Thanks Bill Dally

42 Intel: Node for the Exaflop Computer ( ) Thanks to S. Borkar, Intel

43 Advanced ERC - Riding on Moore's Law
5-year ERC Advanced Grant (M. Valero)
- Idea: a radically new conception of parallel architectures, built using a higher level of abstraction
- Objective: ensure continued performance improvements by riding on Moore's Law
- Holistic approach with parallel architecture partially implemented as a software runtime management layer
- Multicore architecture with vector accelerators exploiting both thread and data level parallelism to optimize data movement
- Handling parallelism, the memory wall and the power wall, in application domains from mobile to supercomputers

44 RoMoL works/ideas so far
[Diagram: OmpSs programming model on top of clusters of cores (C), each core with an L1 cache and a local memory (LM), plus shared instruction cache (I$), L2 and L3 caches, memory controllers (MC), cluster interconnects, and DRAM and MRAM memories]
- Combination of L1 cache and LM (Lluc)
- Runtime-assisted block prefetching (Victor)
- Instruction cache sharing, loop buffer in core to reduce contention (Ugi)
- Runtime-managed cache hierarchy (fetch+flush), no cache coherence (Victor)
- When there is no cache coherence, runtime and OS data reside in the last-level coherent cache
- Runtime-assisted object migration between different memory types

45 Education for Parallel Programming I many-core programming I multi-core programming We all massive parallel prog. I games Multicore-based pacifier

46 Navigating the Mare Nostrum

47 Are we planning to upgrade?.. Negotiating our next site ;)

48 Thank you!


Supercomputing 2004 - Status und Trends (Conference Report) Peter Wegner

Supercomputing 2004 - Status und Trends (Conference Report) Peter Wegner (Conference Report) Peter Wegner SC2004 conference Top500 List BG/L Moors Law, problems of recent architectures Solutions Interconnects Software Lattice QCD machines DESY @SC2004 QCDOC Conclusions Technical

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools

More information

High Performance Computing in the Multi-core Area

High Performance Computing in the Multi-core Area High Performance Computing in the Multi-core Area Arndt Bode Technische Universität München Technology Trends for Petascale Computing Architectures: Multicore Accelerators Special Purpose Reconfigurable

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver 1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution

More information

Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

More information

Barry Bolding, Ph.D. VP, Cray Product Division

Barry Bolding, Ph.D. VP, Cray Product Division Barry Bolding, Ph.D. VP, Cray Product Division 1 Corporate Overview Trends in Supercomputing Types of Supercomputing and Cray s Approach The Cloud The Exascale Challenge Conclusion 2 Slide 3 Seymour Cray

More information

High Performance Computing

High Performance Computing High Performance Computing Oliver Rheinbach oliver.rheinbach@math.tu-freiberg.de http://www.mathe.tu-freiberg.de/nmo/ Vorlesung Introduction to High Performance Computing Hörergruppen Woche Tag Zeit Raum

More information

OpenMP Programming on ScaleMP

OpenMP Programming on ScaleMP OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign

More information

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University

More information

SGI High Performance Computing

SGI High Performance Computing SGI High Performance Computing Accelerate time to discovery, innovation, and profitability 2014 SGI SGI Company Proprietary 1 Typical Use Cases for SGI HPC Products Large scale-out, distributed memory

More information

Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, June 2016

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, June 2016 Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, June 2016 Mellanox Leadership in High Performance Computing Most Deployed Interconnect in High Performance

More information

Jeff Wolf Deputy Director HPC Innovation Center

Jeff Wolf Deputy Director HPC Innovation Center Public Presentation for Blue Gene Consortium Nov. 19, 2013 www.hpcinnovationcenter.com Jeff Wolf Deputy Director HPC Innovation Center This work was performed under the auspices of the U.S. Department

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

GPU Computing. The GPU Advantage. To ExaScale and Beyond. The GPU is the Computer

GPU Computing. The GPU Advantage. To ExaScale and Beyond. The GPU is the Computer GU Computing 1 2 3 The GU Advantage To ExaScale and Beyond The GU is the Computer The GU Advantage The GU Advantage A Tale of Two Machines Tianhe-1A at NSC Tianjin Tianhe-1A at NSC Tianjin The World s

More information

Experiences With Mobile Processors for Energy Efficient HPC

Experiences With Mobile Processors for Energy Efficient HPC Experiences With Mobile Processors for Energy Efficient HPC Nikola Rajovic, Alejandro Rico, James Vipond, Isaac Gelado, Nikola Puzovic, Alex Ramirez Barcelona Supercomputing Center Universitat Politècnica

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster

Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster Application and Micro-benchmark Performance using MVAPICH2-X on SDSC Gordon Cluster Mahidhar Tatineni (mahidhar@sdsc.edu) MVAPICH User Group Meeting August 27, 2014 NSF grants: OCI #0910847 Gordon: A Data

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

InfiniBand Strengthens Leadership as the High-Speed Interconnect Of Choice

InfiniBand Strengthens Leadership as the High-Speed Interconnect Of Choice InfiniBand Strengthens Leadership as the High-Speed Interconnect Of Choice Provides the Best Return-on-Investment by Delivering the Highest System Efficiency and Utilization TOP500 Supercomputers June

More information

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory 21 st Century Research Continuum Theory Theory embodied in computation Hypotheses tested through experiment SCIENTIFIC METHODS

More information

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications Amazon Cloud Performance Compared David Adams Amazon EC2 performance comparison How does EC2 compare to traditional supercomputer for scientific applications? "Performance Analysis of High Performance

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

Big Data Visualization on the MIC

Big Data Visualization on the MIC Big Data Visualization on the MIC Tim Dykes School of Creative Technologies University of Portsmouth timothy.dykes@port.ac.uk Many-Core Seminar Series 26/02/14 Splotch Team Tim Dykes, University of Portsmouth

More information

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services Data Analytics at NERSC Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services NERSC User Meeting August, 2015 Data analytics at NERSC Science Applications Climate, Cosmology, Kbase, Materials,

More information

Cosmological simulations on High Performance Computers

Cosmological simulations on High Performance Computers Cosmological simulations on High Performance Computers Cosmic Web Morphology and Topology Cosmological workshop meeting Warsaw, 12-17 July 2011 Maciej Cytowski Interdisciplinary Centre for Mathematical

More information

Build an Energy Efficient Supercomputer from Items You can Find in Your Home (Sort of)!

Build an Energy Efficient Supercomputer from Items You can Find in Your Home (Sort of)! Build an Energy Efficient Supercomputer from Items You can Find in Your Home (Sort of)! Marty Deneroff Chief Technology Officer Green Wave Systems, Inc. deneroff@grnwv.com 1 Using COTS Intellectual Property,

More information

High Performance Computing, an Introduction to

High Performance Computing, an Introduction to High Performance ing, an Introduction to Nicolas Renon, Ph. D, Research Engineer in Scientific ations CALMIP - DTSI Université Paul Sabatier University of Toulouse (nicolas.renon@univ-tlse3.fr) Michel

More information

Keys to node-level performance analysis and threading in HPC applications

Keys to node-level performance analysis and threading in HPC applications Keys to node-level performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION

More information

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.

LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. LS-DYNA Scalability on Cray Supercomputers Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. WP-LS-DYNA-12213 www.cray.com Table of Contents Abstract... 3 Introduction... 3 Scalability

More information

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics

More information

White Paper The Numascale Solution: Extreme BIG DATA Computing

White Paper The Numascale Solution: Extreme BIG DATA Computing White Paper The Numascale Solution: Extreme BIG DATA Computing By: Einar Rustad ABOUT THE AUTHOR Einar Rustad is CTO of Numascale and has a background as CPU, Computer Systems and HPC Systems De-signer

More information

numascale White Paper The Numascale Solution: Extreme BIG DATA Computing Hardware Accellerated Data Intensive Computing By: Einar Rustad ABSTRACT

numascale White Paper The Numascale Solution: Extreme BIG DATA Computing Hardware Accellerated Data Intensive Computing By: Einar Rustad ABSTRACT numascale Hardware Accellerated Data Intensive Computing White Paper The Numascale Solution: Extreme BIG DATA Computing By: Einar Rustad www.numascale.com Supemicro delivers 108 node system with Numascale

More information

HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK

HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK Barry Davis, General Manager, High Performance Fabrics Operation Data Center Group, Intel Corporation Legal Disclaimer Today s presentations contain

More information

Access to the Federal High-Performance Computing-Centers

Access to the Federal High-Performance Computing-Centers Access to the Federal High-Performance Computing-Centers rabenseifner@hlrs.de University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 TOP 500 Nov. List German Sites,

More information

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC Alan Gara Intel Fellow Exascale Chief Architect Legal Disclaimer Today s presentations contain forward-looking

More information

Jezelf Groen Rekenen met Supercomputers

Jezelf Groen Rekenen met Supercomputers Jezelf Groen Rekenen met Supercomputers Symposium Groene ICT en duurzaamheid: Nieuwe energie in het hoger onderwijs Walter Lioen Groepsleider Supercomputing About SURFsara SURFsara

More information

Architecture of Hitachi SR-8000

Architecture of Hitachi SR-8000 Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

Rethinking SIMD Vectorization for In-Memory Databases

Rethinking SIMD Vectorization for In-Memory Databases SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

More information

How Cineca supports IT

How Cineca supports IT How Cineca supports IT Topics CINECA: an overview Systems and Services for Higher Education HPC for Research Activities and Industries Cineca: the Consortium Not For Profit Founded in 1969 HPC FERMI: TOP500

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007 Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer

More information

Lecture 1. Course Introduction

Lecture 1. Course Introduction Lecture 1 Course Introduction Welcome to CSE 262! Your instructor is Scott B. Baden Office hours (week 1) Tues/Thurs 3.30 to 4.30 Room 3244 EBU3B 2010 Scott B. Baden / CSE 262 /Spring 2011 2 Content Our

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Optimizing Code for Accelerators: The Long Road to High Performance

Optimizing Code for Accelerators: The Long Road to High Performance Optimizing Code for Accelerators: The Long Road to High Performance Hans Vandierendonck Mons GPU Day November 9 th, 2010 The Age of Accelerators 2 Accelerators in Real Life 3 Latency (ps/inst) Why Accelerators?

More information