A Very Brief History of High-Performance Computing



CPS343: Parallel and High Performance Computing
Spring 2016

Outline

High-Performance Computing in the modern computing era:
- 1940s-1960s: the first supercomputers
- 1975-1990: the Cray era
- 1990-2010: the cluster era
- 2000-present: the GPGPU and hybrid era

Acknowledgements

Some material used in creating these slides comes from:
- a lecture on the history of supercomputing from the 2009 offering of CMPE113 by Andrea Di Blas, University of California Santa Cruz
- Wikipedia, which provided a starting point and led to many pages on which facts and images were found
- Don Becker: The Inside Story of the Beowulf Saga

1940s-1960s: the first supercomputers

Military-driven evolution of supercomputing

The development of the modern supercomputer was largely driven by military needs.

World War II
- Hand-computed artillery tables used during the war led to the development in the USA of ENIAC (Electronic Numerical Integrator and Computer) from 1943 to 1946.
- Nazi codes like Enigma were cracked in large part by the machines Bombe and Colossus in the UK.

Cold War
- Nuclear weapon design
- Aircraft, submarine, etc. design
- Intelligence gathering and processing
- Code breaking

ENIAC, 1943

The Electronic Numerical Integrator and Computer was the first general-purpose electronic digital computer. Designed and built at the University of Pennsylvania from 1943 to 1946, it was transferred to Maryland in 1947 and remained in operation until 1955.

Image source: http://ftp.arl.army.mil/ftp/historic-computers/gif/eniac3.gif

CDC 6600, 1964

The Control Data Corporation's CDC 6600 could sustain 500 kiloflop/s, with peaks up to 1 megaflop/s, and was the first computer dubbed a supercomputer.

Image source: http://en.wikipedia.org/wiki/file:cdc_6600.jc.jpg

ILLIAC-IV, 1966-1976

The ILLIAC-IV was the fourth Illinois Automatic Computer.
- Design began in 1966; the goal was 1 GFLOP/s at an estimated cost of $8 million.
- It was finished in 1971-1972 at a cost of $31 million, with a top speed well below the 1 GFLOP/s goal.
- Although designed as a parallel computer with a linear array of 256 64-bit processing elements, the final machine had only a fraction of that number.

Image source: http://en.wikipedia.org/wiki/file:illiac_4_parallel_computer.jpg

1975-1990: the Cray era

Cray 1, 1976

Seymour Cray, the architect of the 6600 and other CDC computers, started his own company, Cray Research Inc., in 1972.

Cray 1:
- scalar and vector processor, 80 MHz clock, capable of 133 MFLOP/s in normal use
- cost $5 to $8 million
- 20-ton compressor for its Freon cooling system
- its shape facilitated shorter wire lengths, allowing a higher clock speed

Image source: http://en.wikipedia.org/wiki/file:cray-1-deutsches-museum.jpg

Cray X-MP (1982) and Y-MP (1988)

Cray X-MP:
- 105 MHz
- two vector processors, each capable of 200 MFLOPS
- memory shared by all processors

Cray Y-MP:
- 167 MHz
- 2, 4, or 8 vector processors, each capable of 333 MFLOPS
- memory shared by all processors

Image sources: http://www.piggynap.com/awesome/cray-supercomputer-computing-in-the-80s/ and http://www.flickr.com/photos/llnl/4886635870/

Vector computers

During the 1980s, speed in supercomputers was primarily achieved through two mechanisms:

1 Vector processors: these were designed with a pipeline architecture to rapidly perform a single floating point operation on a large amount of data (see the loop sketched below). Achieving high performance depended on data arriving at the processing unit in an uninterrupted stream.

2 Shared-memory multiprocessing: a small number of processors (up to 8) with access to the same memory space. Interprocess communication took place via the shared memory.

Cray was not the only player; other companies like Convex and Alliant marketed vector supercomputers during the 1980s. Some believed vector processing was a better approach to HPC than using many smaller processors. Seymour Cray famously said, "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"
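To make the vector idea concrete, here is the classic SAXPY loop in C (an illustrative sketch; the function name and style are mine, not taken from any particular vector machine). On a Cray-style vector processor the multiply-add pipeline consumes one element of x and y per clock once the pipeline is full, which is why an uninterrupted data stream mattered so much:

    /* saxpy: y[i] = a*x[i] + y[i], the archetypal vector kernel.
       A vector processor streams x and y through a floating-point
       pipeline, completing one multiply-add per clock once the
       pipe is full; any interruption of the stream stalls it. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }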

1990-2010: the cluster era

Distributed memory computers

The age of truly effective parallel computers had begun, but it was already limited by access to shared memory. Memory contention was a major impediment to increasing speed: the vector processors required high-speed access to memory, but multiple processors working simultaneously created contention that reduced access speed. Vector processing worked well with 4 or 8 processors, but memory contention would prevent a 64- or 128-processor machine from working efficiently.

Distributed memory computers

The alternative to shared memory is distributed memory, where each processor has a dedicated memory space. The challenge became implementing effective interprocess communication: processes can't communicate with one another by writing data into shared memory, so a message must be passed instead (see the sketch below). During the 1980s, interest in distributed memory computers grew rapidly.
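Message passing was eventually standardized as MPI in 1994, well after the machines of this era, but it captures exactly the model described above. A minimal sketch in C, assuming the program is launched with at least two ranks:

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal message-passing sketch: rank 0 sends a value to rank 1.
       Each rank has its own private memory; the only way to share
       data is an explicit send/receive pair. */
    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;   /* exists only in rank 0's memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and run with mpirun -np 2, rank 1 prints the value it received from rank 0.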

Intel iPSC Hypercube, 1985

- between 32 and 128 nodes
- each node had an 80286 processor, an 80287 math co-processor, 512K of RAM, and 8 Ethernet ports (7 for other compute nodes, one for the control node)
- used a hypercube connection scheme between processors (sketched below)
- the base model was a 5-dimensional hypercube (2^5 = 32 processors)
- superseded by the iPSC/2 in 1987

Image source: http://www.flickr.com/photos/carrierdetect/3599265588/
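In a hypercube network, each of the 2^d nodes gets a d-bit label, and two nodes are linked exactly when their labels differ in one bit, so every node has d neighbors and any message needs at most d hops. A small illustrative C sketch (not Intel's actual software) that lists a node's neighbors:

    #include <stdio.h>

    /* In a d-dimensional hypercube, node labels are d-bit numbers
       and two nodes are linked iff their labels differ in exactly
       one bit. Flipping each bit of a label enumerates its d
       neighbors. */
    int main(void)
    {
        int d = 5;       /* base iPSC: 2^5 = 32 nodes */
        int node = 13;   /* binary 01101 */
        for (int bit = 0; bit < d; bit++)
            printf("neighbor across dimension %d: %d\n",
                   bit, node ^ (1 << bit));
        return 0;
    }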

Thinking Machines Connection Machines

The CM-1 was a massively parallel computer with 65,536 SIMD processing elements arranged in a hypercube. Each element was composed of a 1-bit processor and 4 Kbits of RAM (the CM-2 upped this to 64 Kbits).

The CM-5, a MIMD computer with a different network topology, was at the top of the first Top 500 list in 1993. Located at Los Alamos National Lab, it had 1024 processors and was capable of 59.7 GFLOPS.

http://www.computerhistory.org/revolution/supercomputers/10/73

Beowulf Clusters, 1994-present

In 1994 Donald Becker and Thomas Sterling, both at NASA, built a cluster using commodity PCs and networking hardware:
- 16 Intel 486DX PCs connected with 10 Mb/s Ethernet
- by 1996, Beowulf-class clusters achieved 1 GFLOP/s on systems costing about $50,000
- named Beowulf by Sterling in reference to a quote found in some translations of the epic poem Beowulf: "Because my heart is pure, I have the strength of a thousand men."

Image source: http://www.ligo.caltech.edu/ligo_web/0203news/0203mit.html

2000-present: the GPGPU and hybrid era

Hybrid clusters, 2000-present

During the 2000s the trend of increasing processor speeds was replaced by increasing the number of processor cores. This led to hybrid clusters with a large number of processors, each with a small number of cores sharing RAM and some cache space.

With the development of GPGPU and other general-purpose accelerator hardware, today's top supercomputers are hybrid clusters with a large number of standard processor nodes, where
- each node has a multicore processor with some individual cache, some shared cache, and RAM shared by all cores
- some number of GPGPUs or other accelerators are used for offloading specific types of computation from the CPU nodes (see the sketch below)

Computation speed is often limited by the data-movement rate.
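As one concrete illustration of the offload pattern, here is a sketch in C using OpenMP target directives, one of several ways to program such accelerators (CUDA and OpenCL are other common choices). The map clauses spell out the host-to-device data movement that often dominates total run time; with no accelerator present, the loop simply runs on the host:

    #include <stdio.h>

    /* Offload pattern: copy data to the accelerator, run the
       compute-heavy loop there, copy results back. The map()
       clauses are the data movement that often limits speed. */
    #define N 1000000
    float x[N], y[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        #pragma omp target teams distribute parallel for \
                map(to: x[0:N]) map(tofrom: y[0:N])
        for (int i = 0; i < N; i++)
            y[i] = 2.0f * x[i] + y[i];  /* runs on the device if one exists */

        printf("y[0] = %f\n", y[0]);    /* expect 4.0 */
        return 0;
    }

Built with an offloading-capable compiler (for example, gcc -fopenmp), the marked loop is dispatched to the accelerator; the same source still compiles and runs correctly on a plain CPU.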