Pedraforca: ARM + GPU prototype

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Pedraforca: ARM + GPU prototype"

Transcription

1 Pedraforca: ARM + GPU prototype Filippo Mantovani Workshop on exascale and PRACE prototypes Barcelona, 20 May 2014

2 Overview Goals: Test the performance, scalability, and energy efficiency of the ARM multicore processors + high-end GPGPU accelerators Test scalability to high number of compute nodes Budget: BSC State of the art: Carma BSC: ARM + mobile Nvidia GPU Workshop on exascale and PRACE prototypes, 20 May

3 Prototype architecture: components Workshop on exascale and PRACE prototypes, 20 May

4 Prototype architecture: housing E4 boxes Bull racks Workshop on exascale and PRACE prototypes, 20 May

5 Prototype architecture: rack layout and network topology 3 bullx 1200 rack 78 compute nodes 2 login nodes 4 36-port InfiniBand switches (MPI) 2 50-port GbE switches (storage) Workshop on exascale and PRACE prototypes, 20 May

6 Partners + roles BSC PRACE partner Bull System integrator E4 Subcontractor of Bull for system integration Seco Boards provider (Q7 + carrier board) Nvidia GP-GPU provider Support for CUDA software stack Mellanox High speed interconnection provider Support for IB software stack Workshop on exascale and PRACE prototypes, 20 May

7 Prototype procurement Legal process Public tender with publicity required Spain: amount >60 KEuro EU: amount >200 KEuro Exclusive contract to Bull Timeline: Mar Project start 18 Jan Tender published 25 Feb Proposal deadline 28 Mar Bull proposal accepted 22 Apr Contract signed 28 Aug First node delivered Sep Final installation Workshop on exascale and PRACE prototypes, 20 May

8 Prototype deployment Power supply Data center hosting the final installation did not have enough power capability SOLUTION: connected to a second power grid, no issues Fans System is completely air cooled: 2 fans per node, >150 in total! No dynamic regulation of revolution speed of the fans Problem of noise Problem of power consumption (~25 W per node) SOLUTION: installed manual speed regulators for fans Temperature sensor of the computer room Installed on top of one of the temperature sensor of the computing room False temperature measurements of part of the data center SOLUTION: move the sensor Workshop on exascale and PRACE prototypes, 20 May

9 Architectural issues Coherency protocol within the memory controller (it seems that) Ordering of PCIe transaction is not guaranteed between different PCI devices Polling does not work Requires re-write of drivers avoiding polling SOLUTION: Extremely difficult, unless you have strong commitment of providers PCIe bandwidth CPU: 4x Gen1, 1 GB/sec GPU: 16x Gen3, ~15 GB/sec SOLUTION: Impossible to overcome with current technology Memory issues: 2 GB on the host / 5 GB on GPU SOLUTION: Impossible to overcome with current technology Board Management Control Impossibility of monitor/handle remotely behaviour of the system Due to nature of the hardware (embedded not HPC) SOLUTION: Development of special motherboard (with obvious impacts on the prices) Workshop on exascale and PRACE prototypes, 20 May

10 Prototype evaluation What works Multi-core CPU GPU + CUDA support GbE interconnection IBoverIP Login nodes HPC software stack GPU 1 NW 1 NW 2 GPU 2 PCIe switch PCIe switch MEM SoC 1 SoC 2 What doesn t work InfiniBand over RDMA due to lack of Mellanox support for 32bit platforms No full OpenFabrics Enterprise Distribution (OFED) available for ARM therefore verbs not usable!!! GPUdirect!!!!!! MEM Workshop on exascale and PRACE prototypes, 20 May

11 Advances over State Of The Art Contribution to the development of HPC software ecosystem on ARM Including CUDA support on ARM based architecture Performance evaluation of ARM based architecture Test of ARM based platforms on large scale Pedraforca has been the first large HPC system based on ARM architecture Contribution pointing out limiting factors of current ARM based platforms (last slide of Alex yesterday) Workshop on exascale and PRACE prototypes, 20 May

12 Results: benchmarks CPU stream benchmarks Per-node power consumption Op. Threads Perf. Power Eff. E.Eff. [MB/s] W % [MB/J] Copy Scale Sum Triad * Arka desktop node excludes inefficient fans 12

13 Results: Lattice Boltzmann on Pedraforca Fluid dynamics simulations evolving a 2D array of particles (double) interacting with their third neighbours. This translates in a regular pattern of: floating point computation collide memory accesses propagate. Propagate Collide Machine Power [W] Performance [GB/s] Perf/Power [GB/J] Time per iteration [ms] Power [W] Performance [GB/s] Perf/Power [GB/J] Time per iteration [ms] Pedraforca Coka * * Coka: 2 x Intel SandyBridge 12-core + 2 x Nvidia K20M (idle Intel MIC was removed for power measurement) Workshop on exascale and PRACE prototypes, 20 May

14 Results: Education Teaching PATC Course (last Friday): Programming ARM based prototypes Workshop on exascale and PRACE prototypes, 20 May

15 Results: Product of E4 Computer Engineering Arka EK002 twin server: Workshop on exascale and PRACE prototypes, 20 May

16 Results: Feelings Like when you are a kid: you put a lot of effort in building a beautiful LEGO construction but You miss a few bricks to finish it (InfiniBand support) You do not have friends to play with (underutilized prototype) Workshop on exascale and PRACE prototypes, 20 May

17 Lessons learned: OK stress unbalanced architectures, but not too much PCIe: Gen1 vs Gen3 RAM: 2GB on host vs 5GB on device Strong commitment of all the parts involved is required No only OEM, but also final providers! How to obtain commitment is an open question: Carlo suggested informing providers Radek suggested pushing providers with the power of a community making appealing/interesting/profitable the needs of the project Size matters: bigger prototype means bigger risk!!! Workshop on exascale and PRACE prototypes, 20 May

18 Conclusions Large scale cluster with ARM + GPU Some delays in deployment But system is truly new Prototype as network of GPUs: failure due to hw configuration + missing sw support from the providers System software leadership First ARM based HPC system with CUDA Benefit to scientists who ported codes to x86 + CUDA GPU Encourage European industry Embedded industry to develop HPC-ready components E4 Computer Systems commercialising technology in ARKA series Good educational platform 18

Paving The Road to Exascale Computing HPC Technology Update

Paving The Road to Exascale Computing HPC Technology Update Paving The Road to Exascale Computing HPC Technology Update Todd Wilde, Director of Technical Computing and HPC Mellanox HPC Technology Accelerates the TOP500 InfiniBand has become the de-factor interconnect

More information

HP ProLiant SL270s Gen8 Server. Evaluation Report

HP ProLiant SL270s Gen8 Server. Evaluation Report HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich schoenemeyer@cscs.ch

More information

Mellanox Academy Online Training (E-learning)

Mellanox Academy Online Training (E-learning) Mellanox Academy Online Training (E-learning) 2013-2014 30 P age Mellanox offers a variety of training methods and learning solutions for instructor-led training classes and remote online learning (e-learning),

More information

Energy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez

Energy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez Energy efficient computing on Embedded and Mobile devices Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez A brief look at the (outdated) Top500 list Most systems are built

More information

LS DYNA Performance Benchmarks and Profiling. January 2009

LS DYNA Performance Benchmarks and Profiling. January 2009 LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The

More information

FLOW-3D Performance Benchmark and Profiling. September 2012

FLOW-3D Performance Benchmark and Profiling. September 2012 FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute

More information

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Making small GeForce gtx 750Ti GPU cluster with less than $5000

Making small GeForce gtx 750Ti GPU cluster with less than $5000 From NVidia's Web Making small GeForce gtx 750Ti GPU cluster with less than $5000 K.Hoshina Sep. 15 2014 IceCube Collaboration Meeting Motivation All simulation production uses GPU Currently we have ~200

More information

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes Anthony Kenisky, VP of North America Sales About Appro Over 20 Years of Experience 1991 2000 OEM Server Manufacturer 2001-2007

More information

PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation

PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation PCI Express Impact on Storage Architectures and Future Data Centers Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies

More information

ECLIPSE Performance Benchmarks and Profiling. January 2009

ECLIPSE Performance Benchmarks and Profiling. January 2009 ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster

More information

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures

Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy Perspectives of GPU Computing in Physics

More information

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM

More information

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

More information

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband A P P R O I N T E R N A T I O N A L I N C Steve Lyness Vice President, HPC Solutions Engineering slyness@appro.com Company Overview

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

Accelerating CST MWS Performance with GPU and MPI Computing. CST workshop series

Accelerating CST MWS Performance with GPU and MPI Computing.  CST workshop series Accelerating CST MWS Performance with GPU and MPI Computing www.cst.com CST workshop series 2010 1 Hardware Based Acceleration Techniques - Overview - Multithreading GPU Computing Distributed Computing

More information

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 Direct Increased Performance, Scaling and Resiliency July 2012 Motti Beck, Director, Enterprise Market Development Motti@mellanox.com

More information

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833

More information

Automating Big Data Benchmarking for Different Architectures with ALOJA

Automating Big Data Benchmarking for Different Architectures with ALOJA www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.

More information

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

More information

Advancing Applications Performance With InfiniBand

Advancing Applications Performance With InfiniBand Advancing Applications Performance With InfiniBand Pak Lui, Application Performance Manager September 12, 2013 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server and

More information

LS-DYNA Performance Benchmark and Profiling on Windows. July 2009

LS-DYNA Performance Benchmark and Profiling on Windows. July 2009 LS-DYNA Performance Benchmark and Profiling on Windows July 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center

More information

WG 4.1 HPC R&D cartography. H. Huber, Chair R. Brunino, Vice Chair (Speaker)

WG 4.1 HPC R&D cartography. H. Huber, Chair R. Brunino, Vice Chair (Speaker) WG 4.1 HPC R&D cartography H. Huber, Chair R. Brunino, Vice Chair (Speaker) 1 WG4.1 Experts Leif Nordlund AMD SE Business Development Manager Jean-Pierre Panziera Bull FR Director of Performance Engineering

More information

Abaqus Performance Benchmark and Profiling. March 2015

Abaqus Performance Benchmark and Profiling. March 2015 Abaqus 6.14-2 Performance Benchmark and Profiling March 2015 2 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information

More information

EDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation

EDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies

More information

PCI Express Impact on Storage Architectures. Ron Emerick, Sun Microsystems

PCI Express Impact on Storage Architectures. Ron Emerick, Sun Microsystems PCI Express Impact on Storage Architectures Ron Emerick, Sun Microsystems SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may

More information

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS) PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

PCI Express IO Virtualization Overview

PCI Express IO Virtualization Overview Ron Emerick, Oracle Corporation Author: Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and

More information

Cooling and thermal efficiently in

Cooling and thermal efficiently in Cooling and thermal efficiently in the datacentre George Brown HPC Systems Engineer Viglen Overview Viglen Overview Products and Technologies Looking forward Company Profile IT hardware manufacture, reseller

More information

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015 INF5063: Programming heterogeneous multi-core processors because the OS-course is just to easy! Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks October 20 th 2015 Håkon Kvale

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014 Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet Anand Rangaswamy September 2014 Storage Developer Conference Mellanox Overview Ticker: MLNX Leading provider of high-throughput,

More information

CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER

CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER Tender Notice No. 3/2014-15 dated 29.12.2014 (IIT/CE/ENQ/COM/HPC/2014-15/569) Tender Submission Deadline Last date for submission of sealed bids is extended

More information

Exadata HW Overview. Marek Mintal

Exadata HW Overview. Marek Mintal Exadata HW Overview Marek Mintal marek.mintal@phaetech.com Oracle Day 2011 20.10.2011 Exadata Hardware Architecture Scalable Grid of industry standard servers for Compute and Storage Eliminates long-standing

More information

A quick tutorial on Intel's Xeon Phi Coprocessor

A quick tutorial on Intel's Xeon Phi Coprocessor A quick tutorial on Intel's Xeon Phi Coprocessor www.cism.ucl.ac.be damien.francois@uclouvain.be Architecture Setup Programming The beginning of wisdom is the definition of terms. * Name Is a... As opposed

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

TECHNICAL OVERVIEW NVIDIA TESLA P100: INFINITE COMPUTE POWER FOR THE MODERN DATA CENTER

TECHNICAL OVERVIEW NVIDIA TESLA P100: INFINITE COMPUTE POWER FOR THE MODERN DATA CENTER TECHNICAL OVERVIEW NVIDIA TESLA : INFINITE COMPUTE POWER FOR THE MODERN DATA CENTER Nearly a decade ago, NVIDIA pioneered the use of s to accelerate parallel computing with the introduction of the G80

More information

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca Carlo Cavazzoni CINECA Supercomputing Application & Innovation www.cineca.it 21 Aprile 2015 FERMI Name: Fermi Architecture: BlueGene/Q

More information

White Paper Solarflare High-Performance Computing (HPC) Applications

White Paper Solarflare High-Performance Computing (HPC) Applications Solarflare High-Performance Computing (HPC) Applications 10G Ethernet: Now Ready for Low-Latency HPC Applications Solarflare extends the benefits of its low-latency, high-bandwidth 10GbE server adapters

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

3G Converged-NICs A Platform for Server I/O to Converged Networks

3G Converged-NICs A Platform for Server I/O to Converged Networks White Paper 3G Converged-NICs A Platform for Server I/O to Converged Networks This document helps those responsible for connecting servers to networks achieve network convergence by providing an overview

More information

Building Blocks. CPUs, Memory and Accelerators

Building Blocks. CPUs, Memory and Accelerators Building Blocks CPUs, Memory and Accelerators Outline Computer layout CPU and Memory What does performance depend on? Limits to performance Silicon-level parallelism Single Instruction Multiple Data (SIMD/Vector)

More information

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 December 2014 FPGAs in the news» Catapult» Accelerate BING» 2x search acceleration:» ½ the number of servers»

More information

Deep Learning GPU-Based Hardware Platform

Deep Learning GPU-Based Hardware Platform Deep Learning GPU-Based Hardware Platform Hardware and Software Criteria and Selection Mourad Bouache Yahoo! Performance Engineering Group Sunnyvale, CA +1.408.784.1446 bouache@yahoo-inc.com John Glover

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

OpenMP Programming on ScaleMP

OpenMP Programming on ScaleMP OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign

More information

INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering

INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering Enquiry No: Enq/IITK/ME/JB/02 Enquiry Date: 14/12/15 Last Date of Submission: 21/12/15 Formal quotations are invited for HPC cluster.

More information

PRACE-3IP PCP: A journey to the Energy Efficient HPC Dr. Piero Altoè, E4 Computer Engineering

PRACE-3IP PCP: A journey to the Energy Efficient HPC Dr. Piero Altoè, E4 Computer Engineering PRACE-3IP PCP: A journey to the Energy Efficient HPC Dr. Piero Altoè, E4 Computer Engineering 1 E4 Computer Engineering S.p.A. specializes in the manufacturing of high performance IT systems of medium

More information

Kriterien für ein PetaFlop System

Kriterien für ein PetaFlop System Kriterien für ein PetaFlop System Rainer Keller, HLRS :: :: :: Context: Organizational HLRS is one of the three national supercomputing centers in Germany. The national supercomputing centers are working

More information

PRIMERGY server-based High Performance Computing solutions

PRIMERGY server-based High Performance Computing solutions PRIMERGY server-based High Performance Computing solutions PreSales - May 2010 - HPC Revenue OS & Processor Type Increasing standardization with shift in HPC to x86 with 70% in 2008.. HPC revenue by operating

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

Intel PCI and PCI Express*

Intel PCI and PCI Express* Intel PCI and PCI Express* PCI Express* keeps in step with an evolving industry The technology vision for PCI and PCI Express* From the first Peripheral Component Interconnect (PCI) specification through

More information

Choosing the Best Network Interface Card Mellanox ConnectX -3 Pro EN vs. Intel X520

Choosing the Best Network Interface Card Mellanox ConnectX -3 Pro EN vs. Intel X520 COMPETITIVE BRIEF August 2014 Choosing the Best Network Interface Card Mellanox ConnectX -3 Pro EN vs. Intel X520 Introduction: How to Choose a Network Interface Card...1 Comparison: Mellanox ConnectX

More information

Findings in High-Speed OrthoMosaic

Findings in High-Speed OrthoMosaic Findings in High-Speed OrthoMosaic David Piekny, Solutions Product Manager PCI Geomatics Committed To Image-Centric Excellence Technical Session 6, Rm. 203D Tuesday May 3 rd, 9:30-11:00 AM ASPRS 2011,

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

PCI Express and Storage. Ron Emerick, Sun Microsystems

PCI Express and Storage. Ron Emerick, Sun Microsystems Ron Emerick, Sun Microsystems SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature

More information

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Copyright 2013, Oracle and/or its affiliates. All rights reserved. 1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

High Performance Computing: A Review of Parallel Computing with ANSYS solutions. Efficient and Smart Solutions for Large Models

High Performance Computing: A Review of Parallel Computing with ANSYS solutions. Efficient and Smart Solutions for Large Models High Performance Computing: A Review of Parallel Computing with ANSYS solutions Efficient and Smart Solutions for Large Models 1 Use ANSYS HPC solutions to perform efficient design variations of large

More information

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging

More information

Thematic Unit of Excellence on Computational Materials Science Solid State and Structural Chemistry Unit, Indian Institute of Science

Thematic Unit of Excellence on Computational Materials Science Solid State and Structural Chemistry Unit, Indian Institute of Science Thematic Unit of Excellence on Computational Materials Science Solid State and Structural Chemistry Unit, Indian Institute of Science Call for Expression of Interest (EOI) for the Supply, Installation

More information

High Performance Computing at CINECA

High Performance Computing at CINECA High Performance Computing at CINECA Marzia Rivi m.rivi@cineca.it Supercomputing Applications & Innovation Department - CINECA CINECA Non profit Consortium, made up of 51 Italian universities, The National

More information

GPU multiprocessing. Manuel Ujaldón Martínez Computer Architecture Department University of Malaga (Spain)

GPU multiprocessing. Manuel Ujaldón Martínez Computer Architecture Department University of Malaga (Spain) GPU multiprocessing Manuel Ujaldón Martínez Computer Architecture Department University of Malaga (Spain) Outline 1. Multichip solutions [10 slides] 2. Multicard solutions [2 slides] 3. Multichip + multicard

More information

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business

More information

Testing Performed at: Clemson Center of Excellence in Next Generation Computing Evaluation & Usability Labs October 2013

Testing Performed at: Clemson Center of Excellence in Next Generation Computing Evaluation & Usability Labs October 2013 OrangeFS DDN SFA12K Architecture Testing Performed at: Clemson Center of Excellence in Next Generation Computing Evaluation & Usability Labs October 2013 THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES

More information

Hyperscale. The new frontier for HPC. Philippe Trautmann. HPC/POD Sales Manager EMEA March 13th, 2011

Hyperscale. The new frontier for HPC. Philippe Trautmann. HPC/POD Sales Manager EMEA March 13th, 2011 Hyperscale The new frontier for HPC Philippe Trautmann HPC/POD Sales Manager EMEA March 13th, 2011 Hyperscale the new frontier for HPC New HPC customer requirements demand a shift in technology and market

More information

Review of SC13; Look Ahead to HPC in 2014. Addison Snell addison@intersect360.com

Review of SC13; Look Ahead to HPC in 2014. Addison Snell addison@intersect360.com Review of SC13; Look Ahead to HPC in 2014 Addison Snell addison@intersect360.com New at Intersect360 Research HPC500 user organization, www.hpc500.com Goal: 500 users worldwide, demographically representative

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers

Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers Information Technology Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers Effective for FY2016 Purpose This document summarizes High Performance Computing

More information

Commoditisation of the High-End Research Storage Market with the Dell MD3460 & Intel Enterprise Edition Lustre

Commoditisation of the High-End Research Storage Market with the Dell MD3460 & Intel Enterprise Edition Lustre Commoditisation of the High-End Research Storage Market with the Dell MD3460 & Intel Enterprise Edition Lustre University of Cambridge, UIS, HPC Service Authors: Wojciech Turek, Paul Calleja, John Taylor

More information

Headline in Arial Bold 30pt. The Need For Speed. Rick Reid Principal Engineer SGI

Headline in Arial Bold 30pt. The Need For Speed. Rick Reid Principal Engineer SGI Headline in Arial Bold 30pt The Need For Speed Rick Reid Principal Engineer SGI Commodity Systems Linux Red Hat SUSE SE-Linux X86-64 Intel Xeon AMD Scalable Programming Model MPI Global Data Access NFS

More information

Why ClearCube Technology for VDI?

Why ClearCube Technology for VDI? Why ClearCube Technology for VDI? January 2014 2014 ClearCube Technology, Inc. All Rights Reserved 1 Why ClearCube for VDI? There are many VDI platforms to choose from. Some have evolved inefficiently

More information

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. June 2015

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. June 2015 Weather Research and Forecasting (WRF) Performance Benchmark and Profiling June 2015 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel,

More information

RoCE vs. iwarp Competitive Analysis

RoCE vs. iwarp Competitive Analysis WHITE PAPER August 21 RoCE vs. iwarp Competitive Analysis Executive Summary...1 RoCE s Advantages over iwarp...1 Performance and Benchmark Examples...3 Best Performance for Virtualization...4 Summary...

More information

Shattering the 1U Server Performance Record. Figure 1: Supermicro Product and Market Opportunity Growth

Shattering the 1U Server Performance Record. Figure 1: Supermicro Product and Market Opportunity Growth Shattering the 1U Server Performance Record Supermicro and NVIDIA recently announced a new class of servers that combines massively parallel GPUs with multi-core CPUs in a single server system. This unique

More information

RDMA over Ethernet - A Preliminary Study

RDMA over Ethernet - A Preliminary Study RDMA over Ethernet - A Preliminary Study Hari Subramoni, Miao Luo, Ping Lai and Dhabaleswar. K. Panda Computer Science & Engineering Department The Ohio State University Outline Introduction Problem Statement

More information

Share and aggregate GPUs in your cluster. F. Silla Technical University of Valencia Spain

Share and aggregate GPUs in your cluster. F. Silla Technical University of Valencia Spain Share and aggregate s in your cluster F. Silla Technical University of Valencia Spain ... more technically... Remote virtualization F. Silla Technical University of Valencia Spain Accelerating applications

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

SGI High Performance Computing

SGI High Performance Computing SGI High Performance Computing Accelerate time to discovery, innovation, and profitability 2014 SGI SGI Company Proprietary 1 Typical Use Cases for SGI HPC Products Large scale-out, distributed memory

More information

A-CLASS The rack-level supercomputer platform with hot-water cooling

A-CLASS The rack-level supercomputer platform with hot-water cooling A-CLASS The rack-level supercomputer platform with hot-water cooling INTRODUCTORY PRESENTATION JUNE 2014 Rev 1 ENG COMPUTE PRODUCT SEGMENTATION 3 rd party board T-MINI P (PRODUCTION): Minicluster/WS systems

More information

Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School

Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School Introduction to Infiniband Hussein N. Harake, Performance U! Winter School Agenda Definition of Infiniband Features Hardware Facts Layers OFED Stack OpenSM Tools and Utilities Topologies Infiniband Roadmap

More information

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke wienke@rz.rwth-aachen.de November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative

More information

PCI Express Impact on Storage Architectures and Future Data Centers

PCI Express Impact on Storage Architectures and Future Data Centers PCI Express Impact on Storage Architectures and Future Data Centers Ron Emerick, Oracle Corporation Author: Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is

More information

Experiences With Mobile Processors for Energy Efficient HPC

Experiences With Mobile Processors for Energy Efficient HPC Experiences With Mobile Processors for Energy Efficient HPC Nikola Rajovic, Alejandro Rico, James Vipond, Isaac Gelado, Nikola Puzovic, Alex Ramirez Barcelona Supercomputing Center Universitat Politècnica

More information

Dutch HPC Cloud: flexible HPC for high productivity in science & business

Dutch HPC Cloud: flexible HPC for high productivity in science & business Dutch HPC Cloud: flexible HPC for high productivity in science & business Dr. Axel Berg SARA national HPC & e-science Support Center, Amsterdam, NL April 17, 2012 4 th PRACE Executive Industrial Seminar,

More information

ALPS Supercomputing System A Scalable Supercomputer with Flexible Services

ALPS Supercomputing System A Scalable Supercomputer with Flexible Services ALPS Supercomputing System A Scalable Supercomputer with Flexible Services 1 Abstract Supercomputing is moving from the realm of abstract to mainstream with more and more applications and research being

More information

Building an energy dashboard. Energy measurement and visualization in current HPC systems

Building an energy dashboard. Energy measurement and visualization in current HPC systems Building an energy dashboard Energy measurement and visualization in current HPC systems Thomas Geenen 1/58 thomas.geenen@surfsara.nl SURFsara The Dutch national HPC center 2H 2014 > 1PFlop GPGPU accelerators

More information

David Vicente Head of User Support BSC

David Vicente Head of User Support BSC www.bsc.es Programming MareNostrum III David Vicente Head of User Support BSC Agenda WEDNESDAY - 17-04-13 9:00 Introduction to BSC, PRACE PATC and this training 9:30 New MareNostrum III the views from

More information

Visualization @ SUN. Linda Fellingham, Ph. D Manager, Visualization and Graphics Sun Microsystems

Visualization @ SUN. Linda Fellingham, Ph. D Manager, Visualization and Graphics Sun Microsystems Visualization @ SUN Shared Visualization 1.1 Software Scalable Visualization 1.1 Solutions Linda Fellingham, Ph. D Manager, Visualization and Graphics Sun Microsystems The Data Tsunami Visualization is

More information

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices WS on Models, Algorithms and Methodologies for Hierarchical Parallelism in new HPC Systems The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

More information

D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Version 1.0

D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Version 1.0 D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Document Information Contract Number 288777 Project Website www.montblanc-project.eu Contractual Deadline

More information

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre Graphics Processing Unit (GPU) Memory Hierarchy Presented by Vu Dinh and Donald MacIntyre 1 Agenda Introduction to Graphics Processing CPU Memory Hierarchy GPU Memory Hierarchy GPU Architecture Comparison

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation

Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Exascale Challenges and General Purpose Processors Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Jun-93 Aug-94 Oct-95 Dec-96 Feb-98 Apr-99 Jun-00 Aug-01 Oct-02 Dec-03

More information

Michael Kagan. michael@mellanox.com

Michael Kagan. michael@mellanox.com Virtualization in Data Center The Network Perspective Michael Kagan CTO, Mellanox Technologies michael@mellanox.com Outline Data Center Transition Servers S as a Service Network as a Service IO as a Service

More information

White Paper The Numascale Solution: Extreme BIG DATA Computing

White Paper The Numascale Solution: Extreme BIG DATA Computing White Paper The Numascale Solution: Extreme BIG DATA Computing By: Einar Rustad ABOUT THE AUTHOR Einar Rustad is CTO of Numascale and has a background as CPU, Computer Systems and HPC Systems De-signer

More information