High Performance Computing

1 High Performance Computing
Trey Breckenridge, Computing Systems Manager
Engineering Research Center, Mississippi State University

2 What is High Performance Computing?
HPC is ill-defined and context dependent. In the late 1980s, the US Government defined supercomputers as processors capable of more than 100 MFlops. This definition is clearly obsolete, as modern desktop PCs are capable of roughly 5 GFlops. Another approach is to describe HPC as the fastest computers at any point in time, but that definition depends more on budget than on technology. For the intent of this presentation, we will define HPC as: computing resources which provide at least an order of magnitude more computing power than is normally available on a desktop computer.

3 What does the definition really mean?
That definition sounds like HPC is hardware only. Isn't the software important too? A broader description: "the full range of supercomputing activities including existing supercomputer systems, special purpose and experimental systems, and the new generation of large scale parallel architectures."
HPC exists on a broad range of computer systems, from departmental clusters of desktop workstations to large parallel processing systems.

4 Why High Performance Computing?
To achieve the maximum amount of computation in a minimum amount of time: SPEED!
To solve problems that could not be solved without large computer systems.
Traditionally, HPC has been used in scientific and engineering fields for massively complex simulations. The computations are typically floating-point intensive.

5 Areas of HPC Use
Traditional:
- Computational Fluid Dynamics (CFD)
- Climate, Weather, and Ocean Modeling and Simulation (CWO)
- Nuclear Modeling and Simulation
- Geophysical/Petroleum Modeling
Emerging:
- Computer Graphics/Scientific Visualization
- Financial Modeling
- Database Applications
- Bioinformatics
- Biomedical

6 Parallel Computing
A collection of processing elements that can communicate and cooperate to solve large problems more quickly than a single processing element; the simultaneous use of multiple processors to execute different parts of a program.
Goal: to reduce the wall-clock time of a run.
No single processor is ever again likely to match the performance of existing parallel HPC systems: HPC => Parallel.

7 Types of Parallelism
Overt:
- Parallelism is visible to the programmer
- May be difficult to program (correctly)
- Large improvements in performance
Covert:
- Parallelism is not visible to the programmer
- Compiler is responsible for parallelism
- Easy to do
- Small improvements in performance are typical

8 Speed Up
Speed up is one quantitative measure of the benefit of parallelism.
Speed up is defined as S / T(N), where S = best serial time and T(N) = time required on N processors.
Since S/N is the best possible parallel time, speedup typically should not exceed N.
S is sometimes difficult to measure, causing many people to substitute T(1) for S.

9 Types of Speed Up

10 Efficiency
Speed up does not measure how efficiently the processors are being used: is it worth using 100 processors to get a speed up of 2?
Efficiency is defined as the ratio of the speed up to the number of processors required to achieve it.
The best efficiency is 1; in reality, it lies between 0 and 1. (A small worked example follows.)
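
To make speed up and efficiency concrete, here is a minimal sketch in Fortran (the language used elsewhere in this deck); the timings and processor count are hypothetical stand-in values, not measurements from any real system.

    program speedup_demo
      implicit none
      real :: s, t_n, speedup, efficiency
      integer :: n

      ! Hypothetical measured wall-clock times (seconds)
      s   = 100.0   ! S: best serial time
      t_n = 8.0     ! T(N): time measured on N processors
      n   = 16      ! N: number of processors

      speedup    = s / t_n            ! S / T(N) = 12.5
      efficiency = speedup / real(n)  ! 12.5 / 16 ~= 0.78

      print *, 'Speedup:    ', speedup
      print *, 'Efficiency: ', efficiency
    end program speedup_demo

Here 16 processors yield a speed up of 12.5, for an efficiency of about 0.78; by contrast, using 100 processors for a speed up of 2 would give an efficiency of only 0.02.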

11 HPC Architecture and Design

12 Vector Processors
Vector: large rows of data are operated on simultaneously.
Scalar: data is operated on in a sequential fashion.
Instruction sets:
- Complex Instruction Set Computer (CISC)
- Reduced Instruction Set Computer (RISC)
- Post-RISC or CISC/RISC (e.g., UltraSPARC, IBM Power4, IA64)

13 Scalar vs. Vector Arithmetic

          DO 10 i = 1,n
            a(i) = b(i) + c(i)
    10    CONTINUE

Scalar: a(1) = b(1) + c(1); a(2) = b(2) + c(2); ...; a(n) = b(n) + c(n) (n instructions).
Vector: a = b + c (one vector instruction).

14 Where is Scalar better?
- If the vector length is small
- If the loop contains IF statements
- If partial vectorization involves large overhead
- If recursion is used
- If the budget for capital expenditures is small!

15 Architectural Classifications
Flynn's Taxonomy, published by Flynn in 1972. Outdated, but still widely used.
It categorizes machines by instruction streams and data streams: a stream of instructions (the algorithm) tells the computer what to do, while a stream of data (the input) is affected by these instructions.
Four categories:
- SISD: Single Instruction, Single Data
- MISD: Multiple Instruction, Single Data
- SIMD: Single Instruction, Multiple Data
- MIMD: Multiple Instruction, Multiple Data

16 SISD: Single Instruction, Single Data
- Conventional single-processor computers
- Each arithmetic instruction initiates an operation on a data item taken from a single stream of data elements
- Historical supercomputers and most contemporary microprocessors are SISD

17 SIMD: Single Instruction, Multiple Data
- Many simple processing elements (1000s)
- Each processor has its own local memory
- Each processor runs the same program
- Each processor processes a different data stream
- All processors work in lock-step (synchronously)
- Very efficient for array/matrix operations
- Most older vector/array computers are SIMD
- Example machines: Cray Y-MP, Thinking Machines' CM-200

18 MISD: Multiple Instruction, Single Data
- Very few machines fit this category
- None have been commercially successful or have had any impact on computational science

19 MIMD: Multiple Instruction, Multiple Data
- Most diverse of the four classifications
- Multiple processors
- Each processor either has its own memory or accesses shared memory
- Each processor can run the same or a different program
- Each processor processes a different data stream
- Processors can work synchronously or asynchronously

20 MIMD (cont.)
Processors can be either tightly or loosely coupled. Examples include:
- Processors and memory units specifically designed to be components of a parallel architecture (e.g., the Intel Paragon)
- Large-scale parallel machines built from off-the-shelf workstations (e.g., Beowulf clusters)
- Small-scale multiprocessors made by connecting multiple vector processors together (e.g., the Cray T90)
- A wide variety of other designs as well

21 SPMD Computing
Not a Flynn category, per se, but instead a combination of categories. SPMD stands for single program, multiple data: the same program is run on the processors of an MIMD machine, and occasionally the processors may synchronize. Because the entire program is executed on separate data, it is possible that different branches are taken, leading to asynchronous parallelism. SPMD came about from a desire to do SIMD-like calculations on MIMD machines; it is not a hardware paradigm, but instead the software equivalent of SIMD. (A minimal sketch follows.)
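
As an illustration, here is a minimal SPMD sketch in Fortran, assuming an MPI library (MPI is introduced under Message Passing Environments later in this deck). Every process runs the same program but branches on its rank, so different processors may follow different code paths over different data.

    program spmd_demo
      use mpi
      implicit none
      integer :: ierr, rank, nprocs

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      ! One program everywhere, but rank-dependent branches are taken
      if (rank == 0) then
         print *, 'Process 0 of ', nprocs, ': coordinating'
      else
         print *, 'Process ', rank, ': working on my share of the data'
      end if

      ! An occasional synchronization point, as described above
      call MPI_Barrier(MPI_COMM_WORLD, ierr)

      call MPI_Finalize(ierr)
    end program spmd_demo

Launched with, for example, mpirun -np 4, the one executable exhibits asynchronous, rank-dependent behavior.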

22 Memory Classifications
Organization:
- Shared Memory (SM-MIMD): bus based or interconnection network
- Distributed Memory (DM-MIMD): local memory, message passing
- Virtual Shared Memory (VSM-MIMD): physically distributed, but appears as one image
Access:
- Uniform Memory Access (UMA): all processors take the same time to reach all memory locations
- Non-Uniform Memory Access (NUMA)

23 Memory Organization: Shared Memory
One common memory block is shared between all processors.
Bus based: since the bus has limited bandwidth, the number of processors which can be used is limited to a few tens.
Examples include typical multiprocessor PCs and the SGI Challenge.

24 Memory Organization: Switch Based
Utilizes a (complex) interconnection network to connect processors to shared memory modules, and may use multi-stage networks (NUMA).
Increases bandwidth to memory over bus-based systems.
Every processor still has access to global memory.
Examples include the Sun E10000.

25 Memory Organization: Distributed Memory (Message Passing)
Memory is physically distributed through the machine, and each processor has private memory.
The contents of private memory can only be accessed by that processor; if the data is required by another processor, it must be sent explicitly.
In general, such machines can be scaled to thousands of processors, but they require special programming techniques.
Examples include the Cray T3E and IBM SP.

26 Memory Organization: Virtual Shared Memory
The objective is to have the scalability of distributed memory with the programmability of shared memory.
A global address space is mapped onto physically distributed memory.
Data moves between processors on demand, as it is accessed.

27 Compute Clusters
Connecting multiple standalone machines via a network interconnect, utilizing software to access the combined systems as one computer.
The standalone machines could be inexpensive single-processor workstations or multi-million dollar multiprocessor servers.
Individual machines can be connected via numerous networking technologies using a variety of topologies:
- 100BaseT Ethernet: inexpensive, low performance, high latency
- Myrinet (2 Gb/s): expensive, high performance, low latency
- Proprietary high-speed networks
Nearly 20% of the fastest 500 supercomputers in the world are clusters.

28 Beowulf Clusters
First developed in 1994 at NASA Goddard.
Goal is to build a supercomputer utilizing a large number of inexpensive, commodity off-the-shelf (COTS) parts.
Increasingly used for HPC applications due to the high cost of MPPs and the wide availability of networked workstations.
Not a panacea for HPC: many applications require shared memory or vector solutions.
Existing Beowulf clusters range from 2 to 4000 processors, and are likely to reach even larger counts in the near future.

29 Metacomputing
Metacomputing is a dynamic environment with an informal pool of nodes that can join or leave the environment whenever they desire.
Why do we need metacomputing? Our computational needs are infinite, but our financial resources are finite.
Someday we will utilize computing cycles just like we utilize electricity from the power company; metacomputing enables us to buy cycles on an as-needed basis.
Commonly referred to as The Grid or Computational Grids.

30 Job Execution
Most HPC systems do not allow interactive access; batch-style jobs are submitted to the system via a queuing mechanism.
Schedulers determine the order in which jobs should be run; factors include user priority and resource availability.
The goal of the scheduler is to maximize system utilization.
Scheduler optimization is an important component and is a field of study of its own.

31 HPC Software

32 Programming Languages
It has been said, "I don't know what language they will be using to program high performance computers 10 years from now, but we do know it will be called FORTRAN."
C and C++ are making strides in the HPC community due to their ability to create complex data structures and their better I/O routines.
FORTRAN 90 incorporated many of the features of C (e.g., pointers).
High Performance Fortran (HPF) is FORTRAN 90 with directive-based extensions allowing for shared and distributed memory machines: clusters, traditional supercomputers, and massively parallel processors.
Today, many programmers prefer to do their data structures, communications, etc. in C, while doing the computations in FORTRAN.

33 Compilers
Compilers are an often overlooked area of HPC, but are of critical importance: application run times are directly related to the ability of the compiler to produce highly optimized code.
Poor compiler optimization could result in run times increasing by an order of magnitude.
Optimization levels: none, basic, interprocedural analysis, runtime profile analysis, floating-point, data flow analysis, advanced.

34 Distributed Memory Parallel Programming
Message passing is a programming paradigm where one effectively writes multiple programs for parallel execution.
The problem must be decomposed, typically by domain or function.
Each process knows only about its own local data; if data is required from a different process, it must send a message to that process asking for the data.
Access to remote data is much slower than access to local data, so a major objective is to minimize remote communications.

35 Message Passing Environments
PVM (Parallel Virtual Machine):
- Portable and operable across heterogeneous computers
- Performance sacrificed for flexibility
- A well-defined protocol allows for interoperability between different implementations
MPI (Message Passing Interface):
- Today's standard for message passing
- Widely adopted by most vendors
- Portable and operable across heterogeneous computers
- Good performance with reasonable efficiency
- No standard for interoperability between implementations
A minimal MPI example is sketched below.
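
To show what the explicit transfers from the previous slide look like in practice, here is a minimal two-process MPI sketch (Fortran bindings, illustration only): the value exists only in process 1's private memory, so process 0 can obtain it only through an explicit message.

    program sendrecv_demo
      use mpi
      implicit none
      integer :: ierr, rank, status(MPI_STATUS_SIZE)
      real :: x

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      if (rank == 1) then
         x = 3.14   ! exists only in process 1's private memory
         call MPI_Send(x, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, ierr)
      else if (rank == 0) then
         call MPI_Recv(x, 1, MPI_REAL, 1, 0, MPI_COMM_WORLD, status, ierr)
         print *, 'Process 0 received ', x
      end if

      call MPI_Finalize(ierr)
    end program sendrecv_demo

Run with at least two processes (e.g., mpirun -np 2); with fewer, the receive would block with no matching sender.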

36 Shared Memory Parallel Programming
Every processor has direct access to the memory of every other processor in the system.
Not widely used at the programmer level, but widely used at the system level (even on single-processor systems, via multithreading).
Allows low-latency, high-bandwidth communications, and is easy to program (compared to message passing), but portability is poor.
Parallelism is directive controlled.

37 Shared Memory Environments
POSIX Threads (Pthreads)
SHMEM
OpenMP:
- Quickly becoming the standard API for shared memory programming
- Emphasis on performance and scalability
- Allows for fine-grain or coarse-grain parallelism
- Some implementations are interoperable with MPI and PVM message passing
An OpenMP sketch follows.
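
For contrast with message passing, here is a minimal OpenMP sketch in Fortran of directive-controlled parallelism: a single directive asks the compiler to split the loop iterations among threads that all share the arrays a, b, and c.

    program openmp_demo
      implicit none
      integer, parameter :: n = 1000000
      real, allocatable :: a(:), b(:), c(:)
      integer :: i

      allocate(a(n), b(n), c(n))
      b = 1.0
      c = 2.0

      ! The directive splits the iterations among threads; compiled
      ! without OpenMP support, it is ignored as an ordinary comment.
      !$omp parallel do
      do i = 1, n
         a(i) = b(i) + c(i)
      end do
      !$omp end parallel do

      print *, 'a(1) = ', a(1), '  a(n) = ', a(n)
    end program openmp_demo

Built with an OpenMP flag (e.g., -fopenmp with gfortran), the loop runs across all available threads; built without one, the same source remains valid serial Fortran.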

38 Benchmarking
Benchmarking is an important aspect of HPC and is used for purchase decisions, system configuration, and application tuning.
Rule 1: All vendors lie about their benchmarks!! Purchase decisions should not be based on published benchmark results; if at all possible, run your code on the exact machine you are considering for purchase.
LINPACK, the mother of all benchmarks, was not originally designed to be a benchmark, but rather a set of high performance library routines for linear algebra.
It reports average megaflop rates by dividing the total number of floating-point operations by the elapsed time, and it is used for the TOP500 Supercomputing Sites report. (A toy version of that rate calculation is sketched below.)
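
As an illustration of that rate calculation (this is not LINPACK itself), the sketch below times a simple Fortran loop and divides the floating-point operation count by the elapsed time.

    program mflops_demo
      implicit none
      integer, parameter :: n = 10000000
      real(kind=8), allocatable :: a(:), b(:), c(:)
      real(kind=8) :: t0, t1, flops
      integer :: i

      allocate(a(n), b(n), c(n))
      b = 1.0d0
      c = 2.0d0

      call cpu_time(t0)
      do i = 1, n
         a(i) = b(i) + 2.0d0 * c(i)   ! one add and one multiply per iteration
      end do
      call cpu_time(t1)

      flops = 2.0d0 * real(n, kind=8)   ! total floating-point operations
      print *, 'average MFLOP/s: ', flops / ((t1 - t0) * 1.0d6)
      print *, a(1)   ! referencing a keeps the loop from being optimized away
    end program mflops_demo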

40 Summary HPC is parallel computing. HPC involves a broad spectrum of components, and is only as fast as the weakest component, whether that be processor, memory, network interconnect, compiler, or software. HPC exists on a broad range of computer systems, from departmental clusters of desktop workstations to large parallel processing systems.

41 Additional Information
Dowd, Kevin and Severance, Charles. High Performance Computing, Second Edition. O'Reilly & Associates, Inc.
Dongarra, Jack. High Performance Computing: Technology, Methods and Applications. Elsevier.
Buyya, Rajkumar. High Performance Cluster Computing, Volume 1. Prentice Hall PTR.
Foster, Ian and Kesselman, Carl. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, Inc., 1999.
