ANALYSIS OF SUPERCOMPUTER DESIGN
- Magdalen Chandler
- 9 years ago
Transcription
1 ANALYSIS OF SUPERCOMPUTER DESIGN CS/ECE 566 Parallel Processing Fall Anh Huy Bui Nilesh Malpekar Vishnu Gajendran
2 AGENDA Brief introduction of supercomputer Supercomputer design concerns and analysis Processor architecture, memory Interconnection model Cluster design Software (System & Application) Conclusions 2
3 1. BRIEF INTRODUCTION Brief history of the supercomputer Introduced in the 1960s by the father of supercomputing, Seymour Cray, at CDC The first supercomputer: CDC Scalar processor 40 MHz 3
4 1. BRIEF INTRODUCTION Roadmap of supercomputers since then Processor roadmap Early machines: scalar processors 1970s: most supercomputers used vector processors Mid-1980s: a number of vector processors working in parallel Late 1980s and 1990s: massively parallel processing systems with thousands of ordinary CPUs, either off-the-shelf units or custom designs Today: supercomputers are highly tuned computer clusters using commodity processors combined with custom interconnects 4
5 1. BRIEF INTRODUCTION Roadmap of supercomputer since then until now Speed 5
6 SUPERCOMPUTER DEFINITION As per Landau and Fink The class of fastest and most powerful computers available As per Dictionary of Science and Technology Any computer that is one of the largest, fastest and most powerful available at a given time 6
7 LINPACK BENCHMARK Introduced by Jack Dongarra Reflects the performance of a dedicated system for solving a dense system of linear equations The algorithm must conform to LU factorization with partial pivoting and perform 2/3 n^3 + O(n^2) double-precision floating-point operations 7
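The operation count above becomes a flop rate once a solve is timed. The following is a minimal pure-Python sketch, not the reference LINPACK code: `leading_flops`, `lu_solve` and `measured_gflops` are hypothetical names, the O(n^2) term is dropped, and an interpreted solver will report rates far below what the hardware can deliver.

```python
import random
import time


def leading_flops(n):
    """Leading term of the LINPACK operation count, 2/3 n^3 (O(n^2) term dropped)."""
    return (2.0 / 3.0) * n ** 3


def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: use the row with the largest |a[i][k]| for i >= k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper triangle.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measured_gflops(n, seed=0):
    """Time one random n x n solve and convert the flop count to Gigaflops."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    lu_solve(a, b)
    elapsed = time.perf_counter() - start
    return leading_flops(n) / elapsed / 1e9


if __name__ == "__main__":
    print(measured_gflops(100))
```

The rate is always computed from the fixed operation count, not from the operations a particular implementation happens to execute, which is why the benchmark constrains the algorithm.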
8 LINPACK BENCHMARK - DETAILS Flops/s 64-bit floating point operations per second Operations refer to addition or multiplication Gigaflops => 10^9 flops/s Teraflops => 10^12 flops/s Petaflops => 10^15 flops/s Exaflops => 10^18 flops/s 8
9 LINPACK BENCHMARK DETAILS Rpeak Theoretical peak performance Number of full-precision floating-point additions and multiplications completed within the cycle time of the machine E.g., if a 1.5 GHz computer completes 4 floating-point operations per cycle, then Rpeak is 6 Gigaflops 9
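The Rpeak example is a single multiplication. A small sketch, in which the function name and the `cores` parameter are assumptions added for illustration:

```python
def rpeak_gflops(clock_ghz, flops_per_cycle, cores=1):
    """Theoretical peak in Gigaflops: clock (GHz) x flops per cycle x cores."""
    return clock_ghz * flops_per_cycle * cores


# The slide's example: 1.5 GHz, 4 floating-point operations per cycle.
print(rpeak_gflops(1.5, 4))  # 6.0
```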
10 LINPACK BENCHMARK DETAILS Rmax Maximum performance of a supercomputer measured in Gigaflops Nhalf Size of the problem for which machine achieves half its peak speed Good indicator of machine bandwidth Small value of Nhalf => good machine balance 10
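Given a series of benchmark runs, Nhalf can be read off as the smallest problem size that reaches half of a reference rate. The sketch below assumes the reference is passed in explicitly (the slide uses the peak speed); `n_half` and the sample figures are hypothetical, not measured data.

```python
def n_half(measurements, reference_gflops):
    """Smallest measured problem size whose rate reaches half the reference rate.

    measurements: iterable of (problem_size, gflops) pairs.
    Returns None if no run reaches the threshold.
    """
    threshold = reference_gflops / 2.0
    for n, gflops in sorted(measurements):
        if gflops >= threshold:
            return n
    return None


# Hypothetical runs: (problem size, achieved Gigaflops).
runs = [(100, 1.0), (200, 2.5), (400, 3.2), (800, 4.0)]
print(n_half(runs, reference_gflops=6.0))  # 400
```

A machine that reaches the threshold at a small problem size is delivering data to its arithmetic units efficiently, which is the sense in which a small Nhalf indicates good balance.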
11 2. DESIGN CONCERNS Processor architecture, memory Interconnection model Cluster design Software (System and Application) 11
12 SUPERCOMPUTER ARCHITECTURE Processor architecture Flynn's taxonomy SISD SIMD MISD MIMD Memory Shared memory Distributed memory Virtual shared memory 12
13 VECTOR PROCESSING (SIMD) Acts on an array of data instead of a single data item Pipelines the data to the ALU. Scalar processors pipeline only the instruction execution Example: A[i] = B[i] + C[i] for i = 1 to 10 13
14 SCALAR PROCESSOR EXECUTION Execute this loop 10 times read the next instruction and decode it fetch this number fetch that number add them put the result here end loop Demerits: Instruction fetched and decoded ten times Memory is accessed ten times 14
15 VECTOR PROCESSOR EXECUTION Read instruction and decode it. Fetch array B[1..10] and fetch array C[1..10], add them and put the results in A[1..10] Merits Only two address translations are needed Instruction fetch and decode is done only once Demerits Increase in the complexity of the decoder Might slow down the decoding of normal instructions 15
16 VECTOR PROCESSOR BASED Fujitsu VPP500 series Cray-1, Cray-2, Cray X-MP, Cray Y-MP NEC SX-4 series 16
17 RISC ARCHITECTURE Simple instructions Simple hardware design Pipelining is used to speed up RISC machines Lower cost and good performance 17
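The contrast between the two execution styles for A[i] = B[i] + C[i] can be sketched with a toy counting model. This is an illustration only: Python does not issue vector instructions here, and the `decodes` counter is an assumed stand-in for instruction fetch/decode events.

```python
def scalar_add(b, c):
    """Element-at-a-time loop: the add is fetched and decoded on every iteration."""
    a = []
    decodes = 0
    for i in range(len(b)):
        decodes += 1              # one fetch/decode per element
        a.append(b[i] + c[i])
    return a, decodes


def vector_add(b, c):
    """Whole-array operation: one fetch/decode covers all elements."""
    decodes = 1
    a = [x + y for x, y in zip(b, c)]
    return a, decodes


B = list(range(1, 11))
C = list(range(10, 101, 10))
print(scalar_add(B, C)[1], vector_add(B, C)[1])  # 10 1
```

Both functions produce the same array; only the modelled decode count differs.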
18 PIPELINED VS. NON-PIPELINED 18
19 RISC BASED SUPERCOMPUTERS IBM Roadrunner #1 spot among supercomputers in 2008 uses Cell processor Tianhe-1A #1 spot among supercomputers in 2010 uses Intel Xeon processors and Nvidia Tesla GPGPUs 19
20 GPGPU General purpose computing on graphics processing units GPU Stream processor Processor that can run single kernel on many records SIMD High arithmetic intensity 20
21 GPGPU BASED SUPERCOMPUTERS 3 out of the top 5 supercomputers in the world use NVIDIA Tesla GPUs Tianhe-1A Nebulae Tsubame
22 SPECIAL PURPOSE SUPERCOMPUTERS High performance computing device with hardware architecture dedicated to a single problem Custom FPGA or VLSI chips are used Examples GRAPE for astrophysics D.E. SHAW RESEARCH ANTON for simulating molecular dynamics MDGRAPE-3 for protein structure computation BELLE for playing chess 22
23 TOP 500 THE CPU ARCHITECTURE The CPU Architecture Share of Top500 Rankings between 1993 and
24 SHARED AND DISTRIBUTED MEMORY 24
25 SHARED AND DISTRIBUTED MEMORY Virtual Shared memory Programming model that allows processors on the distributed memory machine to be programmed as if they had shared memory Software layer takes care of the necessary communications 25
26 MEMORY HIERARCHIES Two types Cache based Vector register based Factors affecting memory latency Temporal locality - for instruction and data Spatial locality - for data only 26
27 CACHE BASED Hierarchy of memory Most recently used data is kept in the cache memory Cost increases and access time decreases going up the hierarchy 27
28 VECTOR REGISTER BASED Consists of small set of vector registers Main memory built from SRAM Instructions to move data from main memory to vector register in a high bandwidth bulk transfer 28
29 CACHE BASED & VECTOR REGISTER BASED Cache based Merits lower average access time low cost Demerits Lower bandwidth to memory Programs not exhibiting spatial or temporal locality are penalized Vector register based Merits Faster access to main memory Demerits Expensive 29
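The penalty for poor locality can be seen in a small simulation. The sketch below models a fully associative cache with least-recently-used replacement; the cache geometry (`num_lines`, `line_size`) and the access patterns are assumptions chosen for illustration, not parameters of any machine named here.

```python
from collections import OrderedDict
import random


def hit_rate(addresses, num_lines=64, line_size=8):
    """Fraction of accesses served by a small fully associative LRU cache."""
    cache = OrderedDict()              # cache-line numbers, ordered by recency
    hits = 0
    for addr in addresses:
        line = addr // line_size       # neighbouring addresses share a line
        if line in cache:
            hits += 1
            cache.move_to_end(line)    # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)   # evict the least recently used line
    return hits / len(addresses)


sequential = list(range(4096))                 # spatial locality
repeated = list(range(256)) * 16               # temporal locality
rng = random.Random(0)
scattered = [rng.randrange(1_000_000) for _ in range(4096)]   # neither

print(hit_rate(sequential))   # 0.875: one miss per 8-element line
print(hit_rate(repeated))
print(hit_rate(scattered))
```

Sequential and repeated accesses hit almost every time, while the scattered pattern misses on nearly every access and pays the main-memory latency each time.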
30 LATEST DEVELOPMENTS Using Flash memory instead of DRAM Cheaper than DRAM Retains data when the current is turned off Reduces the space and power requirements Livermore's Hyperion supercomputer uses Flash based memory 30
31 2.2 INTERCONNECTION Supercomputer interconnect Joins nodes within supercomputer Compute node I/O node Service node Network node Needs to support High bandwidth Very low communication latency 31
32 INTERCONNECT TOPOLOGY Static (fixed) Dynamic (switches) Routing Involves large quantities of network cabling often must fit within small spaces do NOT utilize wireless networking technology internally! 32
33 INTERCONNECT USAGE 33
34 WIDELY USED INTERCONNECTS Quadrics 6 of the 10 fastest supercomputers used Quadrics in 2003 Hardware QsNet I : 350 5us MPI latency QsNet II : MPI latency QsTenG : 10 Gigabit Ethernet switches, from 24-port QsNet III : Approx 2 GB/s in each 1.3us MPI latency 34
35 WIDELY USED INTERCONNECTS Infiniband Switched fabric communication link point-to-point bi-directional serial links between processor node and high-speed peripherals supports several signaling rates links can be bonded together for additional throughput 35
36 WIDELY USED INTERCONNECTS Myrinet High speed LAN Much lower protocol overhead better throughput less interference and latency can bypass operating system physically two fiber-optic cables upstream and downstream 36
37 TOFU : 6D MESH/TORUS From Fujitsu For large-scale supercomputers that exceed 10 petaflops Stands for TOrus FUsion Can be divided into an arbitrary size of rectangular submeshes, provides torus topology for each submesh 37
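What a torus topology means for addressing can be sketched in a few lines: each coordinate wraps around at the edge of its dimension, so every node has neighbours in both directions. This is a generic N-dimensional illustration, not Fujitsu's routing logic; the function names and the extents used in the example are hypothetical.

```python
def torus_neighbors(coord, shape):
    """Distinct neighbours of `coord` in a torus with the given extent per axis."""
    coord = tuple(coord)
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size      # wrap around the edge
            n = tuple(n)
            # Extents of 1 or 2 make both steps land on the same node.
            if n != coord and n not in neighbors:
                neighbors.append(n)
    return neighbors


def torus_hops(a, b, shape):
    """Minimum hop count between two nodes, taking the shorter way round each axis."""
    return sum(min(abs(x - y), size - abs(x - y))
               for x, y, size in zip(a, b, shape))


print(torus_neighbors((0, 0, 0), (4, 4, 4)))
print(torus_hops((0, 0, 0), (3, 0, 0), (4, 4, 4)))  # 1, via the wraparound link
```

The same functions work for six coordinates; only the length of `coord` and `shape` changes.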
38 TOFU : 6D MESH/TORUS 38
39 TOFU : MULTIPATH ROUTING 39
40 TOFU : 3D TORUS VIEW 40
41 TOFU: OTHER FEATURES Throughput and Packet Transfer 10 GB/s of fully bidirectional bandwidth for each 100 GB/s of the off-chip bandwidth for each node to feed enough data to a massive array of 128-Gflops processors Variable packet length; 32 B to 2 KB including header and CRC 41
42 TOFU : 6D MESH/TORUS 42
43 TOFU : 6D MESH/TORUS 43
44 TOFU : 6D MESH/TORUS 44
45 2.3 CLUSTER DESIGN Nowadays, most supercomputers are clusters: Typical nodes in a cluster Tiered architecture of a cluster Energy consumption Cooling problem 45
46 2.3 CLUSTER DESIGN Typical nodes in a cluster Compute nodes: Comprise the heart of a system. This is where user jobs run I/O nodes: Dedicated to performing all I/O requests by compute nodes - not available to users directly Login/Front-end nodes: These are where users login, compile and interact with the batch system Service nodes : for management functions such as system boot, machine partitioning, system performance measurements, system health monitoring, etc. 46
47 2.3 CLUSTER DESIGN Nodes in BlueGene/P General Configuration 47
48 2.3 CLUSTER DESIGN Scaling Architecture(H/W Scaling) 48
49 2.3 CLUSTER DESIGN A schematic overview of a Blue Gene/L supercomputer 49
50 2.3 CLUSTER DESIGN A schematic overview of the tiered composition of the Roadrunner supercomputer cluster. 50
51 2.3 CLUSTER DESIGN Energy consumption A typical supercomputer consumes a lot of energy Most of it turns into heat, which then requires cooling Examples Tianhe-1A: 4.04 MW; at 10 cents/kWh, about $400/hr and $3.5M/year K computer: 9.89 MW ~ 10,000 suburban homes. $10M/year Energy efficiency is measured in FLOPS/Watt Green 500 June 2011: IBM BlueGene/Q is 1st: MFLOPS/Watt. 51
52 2.3 CLUSTER DESIGN Cooling techniques Liquid cooling Fluorinert "cooling waterfall": Cray-2 Hot-water cooling: IBM Aquasar system (water is used to heat the building as well) Air cooling IBM BlueGene/P Combination of air conditioning with liquid cooling System X Virginia Tech Using low power processors IBM BlueGene systems 52
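The cost figures follow from simple arithmetic, reading the slide's numbers as a 4.04 MW power draw and an electricity price of 10 cents per kWh (both readings are assumptions about the intended units). The function names and the efficiency example values are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def energy_cost(power_mw, price_per_kwh=0.10):
    """Return (cost per hour, cost per year) in dollars for a constant power draw."""
    power_kw = power_mw * 1000.0
    per_hour = power_kw * price_per_kwh    # kW for one hour = kWh
    return per_hour, per_hour * HOURS_PER_YEAR


def mflops_per_watt(rmax_gflops, power_kw):
    """Energy efficiency: sustained Megaflops delivered per Watt consumed."""
    return (rmax_gflops * 1000.0) / (power_kw * 1000.0)


per_hour, per_year = energy_cost(4.04)
print(round(per_hour), round(per_year))   # about 404 dollars/hr, about 3.54 million/year

# Hypothetical machine: 1000 Gigaflops sustained at 2 kW.
print(mflops_per_watt(1000.0, 2.0))
```

These results match the rounded $400/hr and $3.5M/year quoted for the 4.04 MW system.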
53 2.3 CLUSTER DESIGN IBM BlueGene/P cooling system 53
54 2.3 CLUSTER DESIGN IBM Aquasar cooling system 54
55 2.4 SOFTWARE - SYSTEM SOFTWARE Operating systems Most supercomputers are now using Linux Operating systems used by top
56 2.4 SOFTWARE - APPLICATION SOFTWARE/TOOLS Programming languages: Base languages: Fortran, C Variants of C: C for CUDA or OpenCL for GPGPUs Libraries Loosely connected clusters: PVM, MPI Tightly coordinated shared memory clusters: OpenMP Key software for different functions Full Linux kernel on I/O nodes Proprietary kernel dedicated for compute nodes Scalable control system based on an external service node Tools: open-source solutions Beowulf, WareWulf... 56
57 2.4 SOFTWARE - APPLICATION SOFTWARE/TOOLS Software stacks IBM BlueGene 57
58 3. CONCLUSIONS Gave an overview of concerns when designing a supercomputer Hardware design Interconnection Software design Cluster layout Other concerns: power consumption and cooling Not all topics are covered; designs vary and many are proprietary. 58
59 REFERENCES Supercomputer Wikipedia Tofu: a 6D mesh/torus interconnect for exascale computers. Yuichiro Ajima, Shinji Sumimoto and Toshiyuki Shimizu, Fujitsu Evolution of IBM System Blue Gene Solution, RedPaper REDP Using the Dawn BG/P System. 59
60 THANK YOU! 60
