# Trends in High-Performance Computing for Power Grid Applications

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University Co-Founder, SpiralGen This talk presents my personal views and is not endorsed by any other party mentioned in it. The copyright of all images is held by the respective owners. They are used under fair use.

2 This Talk Is Not About 2050 Quantum computing DNA computing Optical computing Other far-out technologies

3 This Talk: HPC and Supercomputing in 2018 Where are we today? How did we end up here? Where will we go? How will it impact you?

4 What Is High Performance Computing? 1 flop/s = one floating-point operation (addition or multiplication) per second mega (M) = 10 6, giga (G) = 10 9, tera (T) = 10 12, peta (P) = 10 15, exa E) = Computing systems in 2010 Cell phone 1 CPUs 1 Gflop/s \$300 1 W power Laptop 2 CPUs 20 Gflop/s \$1, W power Workstation 8 CPUs 1 Tflop/s \$10,000 1 kw power HPC 200 CPUs 20 Tflop/s \$700,000 8 kw power #1 supercomputer 224,162 CPUs 2.3 Pflop/s \$100,000,000 7 MW power Economical HPC Power grid scenario Central servers (planning, contingency analysis) Autonomous controllers (smart grids) Operator workstations (decision support)

5 How Big are the Computational Problems? 1 flop/s = one floating-point operation (addition or multiplication) per second mega (M) = 10 6, giga (G) = 10 9, tera (T) = 10 12, peta (P) = 10 15, exa E) = Matrix-matrix multiplication = for i=1:n for j=1:n for k=1:n C[i,j] = A[i,k]*B[k,j]..running on Cell phone 1 Gflop/s 1k 1k 8MB, 2s Laptop 20 Gflop/s 8k 8k 0.5 GB, 5.5s Workstation 1 Tflop/s 16k 16k 2 GB, 8s HPC 20 Tflop/s 64k 64k 32 GB, 28s #1 supercomputer 2.3 Pflop/s 1M 1M 8 TB, 1,000s

6 The Evolution of Performance Carnegie Mellon

7 How Do We Compare? 1 flop/s = one floating-point operation (addition or multiplication) per second mega (M) = 10 6, giga (G) = 10 9, tera (T) = 10 12, peta (P) = 10 15, exa E) = In 2010 Cell phone 1 Gflop/s Laptop 20 Gflop/s Workstation 1 Tflop/s HPC 20 Tflop/s #1 supercomputer 2.3 Pflop/s would have been the #1 supercomputer back in Cray X-MP/ Mflop/s 1984 NEC SX-3/44R 23.2 Gflop/s 1990 Intel ASCI Red Tflop/s 1997 Earth Simulator Tflop/s 2002

8 If History Predicted the Future the performance of the #1 supercomputer of 2010 #1 supercomputer 1 Pflop/s could be available as HPC 1 Pflop/s 2018 Workstation 1 Pflop/s 2023 Laptop 1 Pflop/s 2030 Cell phone 1 Pflop/s 2036 How do we get here?

9 HPC: ExtremeScale Computing DARPA UHPC ExtremeScale system goals Time-frame: Pflop/s, air-cooled, single 19-inch cabinet Power budget: 57 kw, including cooling 50 Gflop/W for HPL benchmark

10 Developing The New #1: DoE ExaScale Challenges to achieve ExaScale Energy and power Memory and storage Concurrency and locality Resiliency X-Stack: Software for ExaScale System software Fault management Programming environments Applications frameworks Workflow systems #1 supercomputer 2 Pflop/s 2010 #1 supercomputer 1 Eflop/s 2018

11 Some Predictions for ExaScale Machines Processors 10 billion-way concurrency 100 s of cores per die 10 to 10-way per-core concurrency 100 million to 1 billion cores at 1 to 2 GHz Multi-threaded fine grain concurrency 10,000s of cycles system-wide latency Memory Global address space without cache coherence Explicitly managed high speed buffer caches 128 PB capacity Deep memory hierarchies Technology 22 to 11 nanometers CMOS 3-D packaging of dies Optical communications at 1TB/s Fault tolerance Active power management

12 International Semiconductor Roadmap Near-term (through 2016) and long-term (2017 through 2024) Process Integration, Devices, and Structures RF and Analog / Mixed-signal Technologies for Wireless Communications Emerging Research Devices System Drivers Design Test and Test Equipment Front End Processes Lithography Interconnect Factory Integration Assembly and Packaging

13 Prediction 1: Network of Nodes Why? State-of-the-art for large machines Allows scaling from Tflop/s to Eflop/s Designs can be tailored to application Fault tolerance Implications Segmented address space Multiple instructions, multiple data (MIMD) Packet-based messaging Long inter-node latencies

14 Prediction 2: Multicore CPUs Why? State-of-the-art CPU design Growing transistor count (Moore s law) Limited power budget Implications On-chip multithreading Instruction set extensions targeting applications Physically segmented cache Software and/or hardware managed cache Non-uniform memory access (NUMA) IBM Cell BE 8+1 cores Intel Core i7 8 cores, 2-way SMT IBM POWER7 8 cores, 4-way SMT Intel SCC 48 cores Nvidia Fermi 448 cores, SMT Tilera TILE Gx 100 cores

15 Prediction 3: Accelerators Why? Special purpose enables better efficiency 10x to 100x gain for data parallel problems Limited applicability, thus co-processor Can be discrete chip or integrated on die Implications Multiple programming models Coarse-grain partitioning necessary Programs often become non-portable RoadRunner 6,480 CPUs + 12,960 Cells 3240 TriBlades Rack-mount server components 2 quad-core CPUs + 4 GPUs 200 Gflop/s + 4 Tflop/s HPC cabinet CPU blades + GPU blades Custom interconnect

16 Prediction 4: Memory Capacity Limited Why? Good machine balance: 1 byte/flop Multicore CPUs have huge performance Limited power budget Need to limit memory size Implications Saving memory complicates programs Trade-off: memory vs. operations Requires new algorithm optimization Dell PowerEdge R910 2 x 8-core CPUs 256 GB, 145 Gflop/s 1 core: 16 GB for 9 Gflop/s 1.7 byte/flop BlueGene/L 65,536 dual-core CPUs 16 TB RAM, 360 Tflop/s 1 core: 128 MB for 2.8 Gflop/s byte/flop Nvidia Tesla M2050 (Fermi) 1 GPU, 448 cores 6 GB, 515 Gflop/s 1 core: 13 MB for 1.15 Gflop/s byte/flop

17 HPC Software Development Popular HPC programming languages 1953: Fortran 1973: C 1985: C : OpenMP 2007: CUDA Popular HPC libraries 1979: BLAS 1992: LAPACK 1994: MPI 1995: ScaLAPACK 1995: PETSc 1997: FFTW Proposed and maturing (?) Chapel, X10, Fortress, UPC, GA, HTA, OpenCL, Brook, Sequoia, Charm++, CnC, STAPL, TBB, Cilk, Slow change in direction

18 The Cost Of Portability and Maintainability Matrix-Matrix Multiplication Performance [Gflop/s] Best GPU code 10,000 lines CUDA ,500x ,024 2,048 4,096 8,192 matrix size =15 years technology loss Best CPU code 100,000 lines, SSE, OpenMP simple C code 4 lines

19 Summary Hardware vendors will somehow keep Moore s law on track Software development changes very slowly Portable and maintainable code costs performance Unoptimized program = 15 years technology loss

### GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

### Parallel Programming Survey

Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

### Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

### Lecture 1. Course Introduction

Lecture 1 Course Introduction Welcome to CSE 262! Your instructor is Scott B. Baden Office hours (week 1) Tues/Thurs 3.30 to 4.30 Room 3244 EBU3B 2010 Scott B. Baden / CSE 262 /Spring 2011 2 Content Our

### HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk

HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training

### Accelerating CST MWS Performance with GPU and MPI Computing. CST workshop series

Accelerating CST MWS Performance with GPU and MPI Computing www.cst.com CST workshop series 2010 1 Hardware Based Acceleration Techniques - Overview - Multithreading GPU Computing Distributed Computing

### COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1) Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University

### Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

### HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

### Energy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez

Energy efficient computing on Embedded and Mobile devices Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez A brief look at the (outdated) Top500 list Most systems are built

### High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

### 1 Bull, 2011 Bull Extreme Computing

1 Bull, 2011 Bull Extreme Computing Table of Contents HPC Overview. Cluster Overview. FLOPS. 2 Bull, 2011 Bull Extreme Computing HPC Overview Ares, Gerardo, HPC Team HPC concepts HPC: High Performance

### Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

### Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics

### Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

### Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

### Parallel Computing. Frank McKenna. UC Berkeley. OpenSees Parallel Workshop Berkeley, CA

Parallel Computing Frank McKenna UC Berkeley OpenSees Parallel Workshop Berkeley, CA Overview Introduction to Parallel Computers Parallel Programming Models Race Conditions and Deadlock Problems Performance

### High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

### Amazon Cloud Performance Compared. David Adams

Amazon Cloud Performance Compared David Adams Amazon EC2 performance comparison How does EC2 compare to traditional supercomputer for scientific applications? "Performance Analysis of High Performance

### Jean-Pierre Panziera Teratec 2011

Technologies for the future HPC systems Jean-Pierre Panziera Teratec 2011 3 petaflop systems : TERA 100, CURIE & IFERC Tera100 Curie IFERC 1.25 PetaFlops 256 TB ory 30 PB disk storage 140 000+ Xeon cores

### Evaluation of CUDA Fortran for the CFD code Strukti

Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center

### Cluster Computing at HRI

Cluster Computing at HRI J.S.Bagla Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. E-mail: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing

### Using the Intel Xeon Phi (with the Stampede Supercomputer) ISC 13 Tutorial

Using the Intel Xeon Phi (with the Stampede Supercomputer) ISC 13 Tutorial Bill Barth, Kent Milfeld, Dan Stanzione Tommy Minyard Texas Advanced Computing Center Jim Jeffers, Intel June 2013, Leipzig, Germany

### A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs R.Montella, G.Coviello, G.Giunta* G. Laccetti #, F. Isaila, J. Garcia Blas *Department of Applied Science University

### Visit to the National University for Defense Technology Changsha, China. Jack Dongarra. University of Tennessee. Oak Ridge National Laboratory

Visit to the National University for Defense Technology Changsha, China Jack Dongarra University of Tennessee Oak Ridge National Laboratory June 3, 2013 On May 28-29, 2013, I had the opportunity to attend

ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview

### 10- High Performance Compu5ng

10- High Performance Compu5ng (Herramientas Computacionales Avanzadas para la Inves6gación Aplicada) Rafael Palacios, Fernando de Cuadra MRE Contents Implemen8ng computa8onal tools 1. High Performance

### Lecture 1: the anatomy of a supercomputer

Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers of the future may have only 1,000 vacuum tubes and perhaps weigh 1½ tons. Popular Mechanics, March 1949

### Multi-core Systems What can we buy today?

Multi-core Systems What can we buy today? Ian Watson & Mikel Lujan Advanced Processor Technologies Group COMP60012 Future Multi-core Computing 1 A Bit of History AMD Opteron introduced in 2003 Hypertransport

### Kalray MPPA Massively Parallel Processing Array

Kalray MPPA Massively Parallel Processing Array Next-Generation Accelerated Computing February 2015 2015 Kalray, Inc. All Rights Reserved February 2015 1 Accelerated Computing 2015 Kalray, Inc. All Rights

### The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation

The Assessment of Benchmarks Executed on Bare-Metal and Using Para-Virtualisation Mark Baker, Garry Smith and Ahmad Hasaan SSE, University of Reading Paravirtualization A full assessment of paravirtualization

### Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools

### Supercomputing 2004 - Status und Trends (Conference Report) Peter Wegner

(Conference Report) Peter Wegner SC2004 conference Top500 List BG/L Moors Law, problems of recent architectures Solutions Interconnects Software Lattice QCD machines DESY @SC2004 QCDOC Conclusions Technical

### Virtualization of ArcGIS Pro. An Esri White Paper December 2015

An Esri White Paper December 2015 Copyright 2015 Esri All rights reserved. Printed in the United States of America. The information contained in this document is the exclusive property of Esri. This work

### Jezelf Groen Rekenen met Supercomputers

Jezelf Groen Rekenen met Supercomputers Symposium Groene ICT en duurzaamheid: Nieuwe energie in het hoger onderwijs Walter Lioen Groepsleider Supercomputing About SURFsara SURFsara

### OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

### GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once

### Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

### Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

### Accelerating CFD using OpenFOAM with GPUs

Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide

### The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems

202 IEEE 202 26th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum The Green Index: A Metric

### GPU Computing. The GPU Advantage. To ExaScale and Beyond. The GPU is the Computer

GU Computing 1 2 3 The GU Advantage To ExaScale and Beyond The GU is the Computer The GU Advantage The GU Advantage A Tale of Two Machines Tianhe-1A at NSC Tianjin Tianhe-1A at NSC Tianjin The World s

### Building Clusters for Gromacs and other HPC applications

Building Clusters for Gromacs and other HPC applications Erik Lindahl lindahl@cbr.su.se CBR Outline: Clusters Clusters vs. small networks of machines Why do YOU need a cluster? Computer hardware Network

### Petascale Visualization: Approaches and Initial Results

Petascale Visualization: Approaches and Initial Results James Ahrens Li-Ta Lo, Boonthanome Nouanesengsy, John Patchett, Allen McPherson Los Alamos National Laboratory LA-UR- 08-07337 Operated by Los Alamos

### Parallel Algorithm Engineering

Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

### Parallel Computing. Introduction

Parallel Computing Introduction Thorsten Grahs, 14. April 2014 Administration Lecturer Dr. Thorsten Grahs (that s me) t.grahs@tu-bs.de Institute of Scientific Computing Room RZ 120 Lecture Monday 11:30-13:00

### OpenMP Programming on ScaleMP

OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign

### HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

### Overview of HPC systems and software available within

Overview of HPC systems and software available within Overview Available HPC Systems Ba Cy-Tera Available Visualization Facilities Software Environments HPC System at Bibliotheca Alexandrina SUN cluster

### Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com

CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Modern GPU

### Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

### SX-ACE Processor: NEC s Brand-New Vector Processor

HOTCHIPS26 SX-ACE Processor: NEC s Brand-New Vector Processor Shintaro Momose, Ph.D. NEC Corporation IT Platform Division Manager of SX vector supercomputer development August 11th, 2014 Performance SX

### CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER

CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER Tender Notice No. 3/2014-15 dated 29.12.2014 (IIT/CE/ENQ/COM/HPC/2014-15/569) Tender Submission Deadline Last date for submission of sealed bids is extended

### Chances and Challenges in Developing Future Parallel Applications

Chances and Challenges Prof. Dr. Rudolf Berrendorf rudolf.berrendorf@h brs.de http://berrendorf.inf.h brs.de/, Germany Computer Science Department Outline Why Parallelism? Parallel Systems are Complex

### David Vicente Head of User Support BSC

www.bsc.es Programming MareNostrum III David Vicente Head of User Support BSC Agenda WEDNESDAY - 17-04-13 9:00 Introduction to BSC, PRACE PATC and this training 9:30 New MareNostrum III the views from

### Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University

### The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

WS on Models, Algorithms and Methodologies for Hierarchical Parallelism in new HPC Systems The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

### Barry Bolding, Ph.D. VP, Cray Product Division

Barry Bolding, Ph.D. VP, Cray Product Division 1 Corporate Overview Trends in Supercomputing Types of Supercomputing and Cray s Approach The Cloud The Exascale Challenge Conclusion 2 Slide 3 Seymour Cray

### Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State

### Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

### SC15 SYNOPSIS FOR FEDERAL GOVERNMENT

SC15 SYNOPSIS FOR FEDERAL GOVERNMENT SC15 Synopsis for Federal Government As a service to our clients, Engility offers a few notes from the Supercomputing Conference 2015 (SC15). Engility recently attended

### CS140: Parallel Scientific Computing. Class Introduction Tao Yang, UCSB Tuesday/Thursday. 11:00-12:15 GIRV 1115

CS140: Parallel Scientific Computing Class Introduction Tao Yang, UCSB Tuesday/Thursday. 11:00-12:15 GIRV 1115 1 CS 140 Course Information Instructor: Tao Yang (tyang@cs). Office Hours: T/Th 10-11(or email

### Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre

Graphics Processing Unit (GPU) Memory Hierarchy Presented by Vu Dinh and Donald MacIntyre 1 Agenda Introduction to Graphics Processing CPU Memory Hierarchy GPU Memory Hierarchy GPU Architecture Comparison

### HPC with Multicore and GPUs

HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware

### Introduction to Cloud Computing

Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

### A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833

### ~ Greetings from WSU CAPPLab ~

~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)

### ANALYSIS OF SUPERCOMPUTER DESIGN

ANALYSIS OF SUPERCOMPUTER DESIGN CS/ECE 566 Parallel Processing Fall 2011 1 Anh Huy Bui Nilesh Malpekar Vishnu Gajendran AGENDA Brief introduction of supercomputer Supercomputer design concerns and analysis

### GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

### The K computer: Project overview

The Next-Generation Supercomputer The K computer: Project overview SHOJI, Fumiyoshi Next-Generation Supercomputer R&D Center, RIKEN The K computer Outline Project Overview System Configuration of the K

### Embedded Systems: map to FPGA, GPU, CPU?

Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven jos@vectorfabrics.com Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware

### BMW11: Dealing with the Massive Data Generated by Many-Core Systems. Dr Don Grice. 2011 IBM Corporation

BMW11: Dealing with the Massive Data Generated by Many-Core Systems Dr Don Grice IBM Systems and Technology Group Title: Dealing with the Massive Data Generated by Many Core Systems. Abstract: Multi-core

### Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, June 2016

Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, June 2016 Mellanox Leadership in High Performance Computing Most Deployed Interconnect in High Performance

### Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

Introduction to High Performance Cluster Computing Cluster Training for UCL Part 1 What is HPC HPC = High Performance Computing Includes Supercomputing HPCC = High Performance Cluster Computing Note: these

### Summit and Sierra Supercomputers:

Whitepaper Summit and Sierra Supercomputers: An Inside Look at the U.S. Department of Energy s New Pre-Exascale Systems November 2014 1 Contents New Flagship Supercomputers in U.S. to Pave Path to Exascale

### Cray XT3 Supercomputer Scalable by Design CRAY XT3 DATASHEET

CRAY XT3 DATASHEET Cray XT3 Supercomputer Scalable by Design The Cray XT3 system offers a new level of scalable computing where: a single powerful computing system handles the most complex problems every

### Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

### High Performance Matrix Inversion with Several GPUs

High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República

### High Performance Computing for Silicon Design

High Performance Computing for Silicon Design Shesha Krishnapura Senior Principal Engineer April 2009 Authors & Contributors: Raju Nallapa, Doug Austin, Ananth Sankaranarayanan, Shaji Achuthan, Vipul Lal,

### Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

### Big Data Challenges In Leadership Computing

Big Data Challenges In Leadership Computing Presented to: Data Direct Network s SC 2011 Technical Lunch November 14, 2011 Galen Shipman Technology Integration Group Leader Office of Science Computing at

### Parallel Large-Scale Visualization

Parallel Large-Scale Visualization Aaron Birkland Cornell Center for Advanced Computing Data Analysis on Ranger January 2012 Parallel Visualization Why? Performance Processing may be too slow on one CPU

### HPC Software Requirements to Support an HPC Cluster Supercomputer

HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417

### BSC - Barcelona Supercomputer Center

Objectives Research in Supercomputing and Computer Architecture Collaborate in R&D e-science projects with prestigious scientific teams Manage BSC supercomputers to accelerate relevant contributions to

### Retargeting PLAPACK to Clusters with Hardware Accelerators

Retargeting PLAPACK to Clusters with Hardware Accelerators Manuel Fogué 1 Francisco Igual 1 Enrique S. Quintana-Ortí 1 Robert van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores.

### HPC enabling of OpenFOAM R for CFD applications

HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,

### ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop Emily Apsey Performance Engineer Presentation Overview What it takes to successfully virtualize ArcGIS Pro in Citrix XenApp and XenDesktop - Shareable

### Data Centric Systems (DCS)

Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

### Defense Technical Information Center Compilation Part Notice

UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP023801 TITLE: Application Scalability and Performance on Multicore Architectures DISTRIBUTION: Approved for public release,

### GeoImaging Accelerator Pansharp Test Results

GeoImaging Accelerator Pansharp Test Results Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance

### Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation

Exascale Challenges and General Purpose Processors Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Jun-93 Aug-94 Oct-95 Dec-96 Feb-98 Apr-99 Jun-00 Aug-01 Oct-02 Dec-03

### Introduction to GPU Programming Languages

CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

### Cloud Computing. Alex Crawford Ben Johnstone

Cloud Computing Alex Crawford Ben Johnstone Overview What is cloud computing? Amazon EC2 Performance Conclusions What is the Cloud? A large cluster of machines o Economies of scale [1] Customers use a

### An examination of the dual-core capability of the new HP xw4300 Workstation

An examination of the dual-core capability of the new HP xw4300 Workstation By employing single- and dual-core Intel Pentium processor technology, users have a choice of processing power options in a compact,

### TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

### Experiences With Mobile Processors for Energy Efficient HPC

Experiences With Mobile Processors for Energy Efficient HPC Nikola Rajovic, Alejandro Rico, James Vipond, Isaac Gelado, Nikola Puzovic, Alex Ramirez Barcelona Supercomputing Center Universitat Politècnica

### 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC

### LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

### Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it

t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate