High-Performance Computing: Architecture and APIs

1 High-Performance Computing: Architecture and APIs Douglas Fuller ASU Fulton High Performance Computing

2 Why HPC? Capacity computing Do similar jobs, but lots of them. Capability computing Run programs we could not before! Insufficient resources (memory, usually) Insufficient time

3 Commonly Discussed HPC Terms Supercomputer Cluster (Beowulf Cluster) Shared Memory Machine Grid Scalability

4 Intel Core 2 Duo - 3GF

5 Scaling von Neumann Where are the bottlenecks? [Diagram: processor (P), cache (C), memory (M), and I/O.]

6 Vector processing SIMD approach (sound familiar?) Special register/execution units Handles large amounts of data at once Good for linear algebra / scientific computing Can be assisted by language support Can be partially leveraged by compilers
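
To make the SIMD idea concrete, here is a minimal C sketch (not from the slides) that adds two float arrays four elements at a time using SSE intrinsics; the function name, the requirement that n be a multiple of 4, and the 16-byte alignment are illustrative assumptions. The equivalent plain scalar loop is exactly what a vectorizing compiler tries to turn into code like this automatically.

#include <xmmintrin.h>   /* SSE intrinsics */

/* c[i] = a[i] + b[i]; assumes n is a multiple of 4 and the arrays
   are 16-byte aligned (illustrative assumptions only). */
void vec_add(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);           /* load 4 floats at once */
        __m128 vb = _mm_load_ps(&b[i]);
        _mm_store_ps(&c[i], _mm_add_ps(va, vb));  /* 4 additions in one instruction */
    }
}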

7 Cray-1 First successful vector system: 64-bit, 80 MHz, 8 MB RAM. 250 MFLOPS peak (136 typical). 115 kW.

8 Vector processing Resurgence with the high-end A/V market: MMX, 3DNow!, SSE, SSE2. GPUs, game consoles, iPhones. Vector processing is leading to gaming/HPC convergence (Cell).

9 SMP Systems MIMD approach. Commodity processors connect through an interconnect to a single logical memory. Demands on the interconnection bus are extremely high: sustained memory bandwidth, fetch latency. Cache coherence is a real problem. Programs must still have concurrency, but variables are shared. [Diagram: processors (P) with private caches (C) sharing one memory (M) and I/O over a common bus.]

10 Digression: cache coherence Multiple processors using the same data: these processors' caches must stay synchronized! This introduces considerable overhead and limits scalability. [Diagram: one processor writes A=3; when another processor issues "Get A", the value 3 must be propagated to its cache.]

11 Programming SMPs The same as multithreaded serial programming, right? (example) Well, almost. More locality issues False sharing Toolkits to help OpenMP
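
As a concrete illustration of the OpenMP toolkit mentioned above, here is a minimal sketch (not from the slides): a single pragma parallelizes the loop across the threads of an SMP, and the reduction clause gives each thread a private partial sum so the shared variable is not a race.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    double sum = 0.0;

    /* Loop iterations are divided among the threads; each thread keeps a
       private partial sum, and the partial sums are combined at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000000; i++)
        sum += 1.0 / i;

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}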

12 SMP systems today Just about everything has multiple cores Intel Core, Core 2, Xeon,... AMD Opteron,... Cache strategies vary Transistor count (AMD vs. Intel) Memory bandwidth (and local IC buses)

13 Scaling the SMP Remember the critical system bus. Broadcast coherence messages and shared links impose high bandwidth requirements. Cores aren't the problem. Solution: serialize memory bus communication. Removes the S in SMP.

14 NUMA Similar to SMP, but we give up the S Reduces bus bandwidth requirements Requires interconnect design (more later) Introduces a penalty for remote memory access! Cache coherence pops up again What about for remote memory?

15 The directory Tracks which CPUs have each cache line. Allows point-to-point messages for cache coherence. How do you locate a remote block that's cached by another processor? [Diagram: nodes, each with processor (P), cache (C), memory (M), and directory (D), linked to I/O.]

16 Programming NUMA It's just like writing for SMPs. (example) Right? Sort of. It looks the same, but there are more factors to consider. Architecture design imposes a performance impact. Code still must be architecture-aware!

17 NUMA today Still exists for HPC, but expensive. Custom hardware, directory units, interconnects. Custom software (single system image). Commodity processors: AMD Opteron (Direct Connect brings the memory controller on-chip).

18 MPP Systems Use a large number of weaker processors Most decouple their memory subsystems - Distributed memory Relies on: Smart system, Smart compiler, or Smart programmer

19 MPP Systems Processors interconnected with custom hardware Architectures vary widely

20 Thinking Machines CM-5 Up to 16,384 32 MHz processors. The largest ever built was 512 processors: 64 GF peak, 16 GB main memory. Where was the most famous CM-5?

21 Programming MPPs Program follows architecture (including interconnect) Many MPPs support multiple models More architecture-aware models perform better Less architecture-aware models are more portable What to choose when developing a program?

22 Interconnecting Topology choice critical; considerations include: Performance (latency and bandwidth) Conformity/uniformity Cost Scalability

23 Interconnecting Completely connected: each processor has a direct communication link to every other processor (fully connected). Star connected network: the middle processor is the central processor and every other processor is connected to it; the counterpart of the crossbar switch in dynamic interconnects. Arrays and rings: linear array, ring. Mesh network (e.g. 2D array).

24 Interconnecting Mesh, torus, hypercube, tree. Hypercube: a multidimensional mesh of processors with exactly two processors in each dimension; a d-dimensional hypercube consists of p = 2^d processors (0-D, 1-D, 2-D, and 3-D hypercubes shown). Torus: the 2-D torus is the 2-D version of the ring. Fat trees: multiple switches; each level has the same number of links in as out, with an increasing number of links at each level. Gives full bandwidth between the links, but adds latency the higher you go.

25 Look familiar? Desktop systems use the same architectures Token Ring SONET FDDI Ethernet

26 Desktop systems Leverage the economics Commodity parts CPUs and memory Circuit City supercomputing Interconnect is now a commodity network.

27 Beowulf Clusters A Beowulf is a parallel computer consisting of a collection of nodes built from commodity parts. Each node has its own processors, memory, and I/O. Nodes communicate through an interconnection network. One node, designated the master or head node, is attached to both the public network and the interconnection network. [Diagram: basic Beowulf, with compute nodes on an interconnection network and the master node connected to the Internet or an internal network.]

28 Clusters Important Characteristics Commodity components - mass-market R&D investment keeps the technology moving forward. Distributed memory - your old program won't speed up. Communication between processors has a cost.

29 Programming clusters Multiple system images, therefore there is NO shared memory. Many models try to emulate earlier architectures. Why? By far the most popular is MPI.

30 A 10 minute introduction to MPI

31 What is MPI? The Message Passing Interface. De facto standard for message passing. Unified many vendor-specific message-passing libraries in the 1990s. Works with C and Fortran/F90 (always), C++ (usually), and more exotic things (e.g. Python) occasionally. Allows the programmer to explicitly send/receive messages among processes in a parallel program. Supports the data parallel programming model.

32 Data Parallel Programming One program, many copies. Each instance of the program (a task) does the same instructions on different data. Each task has its own local memory. The trick (for the programmer): remember it's parallel; remember what's in what memory.
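
In practice this usually means each task computes which slice of the data it owns from its rank. The fragment below is an illustrative sketch, not code from the slides; N and work() are hypothetical placeholders, and rank/size come from the MPI calls introduced later.

/* Each of 'size' tasks processes its own slice of an N-element problem. */
int chunk = N / size;
int lo = rank * chunk;
int hi = (rank == size - 1) ? N : lo + chunk;   /* last task takes any remainder */
for (int i = lo; i < hi; i++)
    work(i);   /* same instructions on every task, different data */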

33 Why the Data Parallel Model? Only one program to worry about. Easier to debug program. Easier to visualize program behavior. Naturally load balances (sometimes).

34 Introduction to MPI MPI is a standard for message passing interfaces. MPI-1 covers point-to-point and collective communication. Point-to-point: explicit messages (send/receive). Collective: express patterns of communication. MPI-2 covers connection-based communication and I/O. Typical implementations include MPICH, LAM/MPI, and Open MPI.

35 MPI in Six Functions MPI_Init - start using MPI MPI_Comm_size - get the number of tasks MPI_Comm_rank - the unique index of this task MPI_Send - send a message MPI_Recv - receive a message MPI_Finalize - stop using MPI

36 Initialize and Finalize The first MPI call must be to MPI_Init. The last MPI call must be to MPI_Finalize.
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* put program here */
    MPI_Finalize();
    return 0;
}

37 Initialize and Finalize
C:       int MPI_Init(int *argc, char ***argv);
         int MPI_Finalize();
Fortran: MPI_INIT(ierror)
         integer ierror
         MPI_FINALIZE(ierror)
         integer ierror
C++:     void MPI::Init(int& argc, char**& argv);
         void MPI::Finalize();

38 Size and Rank MPI_Comm_size returns the number of tasks in the job:
int size;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank returns the number of the current task (0 .. size-1):
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

39 MPI Communicators Abstract structure represents a group of MPI tasks that can communicate MPI_COMM_WORLD represents all of the tasks in a given job Programmer can create new communicators to subset MPI_COMM_WORLD RANK or task number is relative to a given communicator Messages from different communicators do not interfere
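
As a sketch of subsetting MPI_COMM_WORLD (not from the slides), MPI_Comm_split groups tasks by a "color" value; here the job is split into two halves, and each task's rank is renumbered relative to the new communicator.

int world_rank, world_size;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

int color = (world_rank < world_size / 2) ? 0 : 1;   /* which half this task joins */
MPI_Comm half;
MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &half);

int half_rank;
MPI_Comm_rank(half, &half_rank);   /* rank relative to the new communicator */
MPI_Comm_free(&half);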

40 A Simple Example
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

41 Send and Recv MPI_Send to send a message:
char sbuf[COUNT];
MPI_Send(sbuf, COUNT, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
MPI_Recv to receive a message:
char rbuf[COUNT];
MPI_Status status;
MPI_Recv(rbuf, COUNT, MPI_CHAR, 1, 99, MPI_COMM_WORLD, &status);

42 Anatomy of MPI_Recv MPI_Recv(rbuf, COUNT, MPI_CHAR, 1, 99, MPI_COMM_WORLD, &status); rbuf : pointer to receive buffer COUNT : items in receive buffer MPI_CHAR : MPI datatype 1 : source task number (rank) 99 : message tag MPI_COMM_WORLD : communicator &status : pointer to status struct
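
To show what the status struct is for, here is a small fragment (not from the slides) that receives from any sender with any tag and then reads the actual source, tag, and element count back out of the status; rbuf and COUNT are the same placeholders used above.

MPI_Status status;
MPI_Recv(rbuf, COUNT, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

int got;
MPI_Get_count(&status, MPI_CHAR, &got);   /* how many items actually arrived */
printf("received %d chars from rank %d with tag %d\n",
       got, status.MPI_SOURCE, status.MPI_TAG);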

43 MPI Datatypes Encodes type of data sent and received Built-in types MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE MPI_BYTE, MPI_PACKED User defined types MPI_Type_contiguous, MPI_Type_vector, MPI_Type_indexed, MPI_Type_struct MPI_Pack, MPI_Unpack
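
A small sketch of a user-defined type (not from the slides): MPI_Type_contiguous builds a "point" type out of three consecutive doubles, which can then be sent as a single element.

MPI_Datatype point_type;
MPI_Type_contiguous(3, MPI_DOUBLE, &point_type);   /* 3 consecutive doubles */
MPI_Type_commit(&point_type);

double p[3] = {1.0, 2.0, 3.0};
MPI_Send(p, 1, point_type, 1, 99, MPI_COMM_WORLD); /* one "point", not 3 doubles */

MPI_Type_free(&point_type);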

44 A Quick Send and Receive Example
#include "mpi.h"
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int numprocs, myrank, namelen, i;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    char greeting[MPI_MAX_PROCESSOR_NAME + 80];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Get_processor_name(processor_name, &namelen);
    sprintf(greeting, "Hello world from process %d of %d on %s\n",
            myrank, numprocs, processor_name);

45 A Quick Send and Receive Example (continued)
    if (myrank == 0) {
        printf("%s\n", greeting);
        for (i = 1; i < numprocs; i++) {
            MPI_Recv(greeting, sizeof(greeting), MPI_CHAR, i, 1,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", greeting);
        }
    } else {
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 1,
                 MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}

46 Collective Operations Rather than dealing with individual messages, express common patterns of communication Simpler coding Hide optimization Hide cluster topology details Called at the same time by every task in the communicator (no if/else) - true data parallel

47 Common Collectives Broadcast / Reduce Scatter / Gather Barrier All-to-all
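
A sketch of scatter/gather (not from the slides): rank 0 hands each task one chunk of an array, every task transforms its own chunk, and the results are gathered back. CHUNK and MAX_TASKS are hypothetical constants used only for illustration.

double full[CHUNK * MAX_TASKS];   /* meaningful data only on rank 0 */
double mine[CHUNK];

MPI_Scatter(full, CHUNK, MPI_DOUBLE, mine, CHUNK, MPI_DOUBLE,
            0, MPI_COMM_WORLD);           /* every task receives its own chunk */
for (int i = 0; i < CHUNK; i++)
    mine[i] *= mine[i];                   /* local work on the chunk */
MPI_Gather(mine, CHUNK, MPI_DOUBLE, full, CHUNK, MPI_DOUBLE,
           0, MPI_COMM_WORLD);            /* results collected back on rank 0 */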

48 Sending to 8 nodes with a for loop and MPI_Send vs. MPI_Bcast
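
The contrast on that slide might look like the following sketch (assuming data is an int array of length N already filled on rank 0). The hand-written loop issues size-1 separate sends from rank 0; the collective lets the library distribute the data, typically along a tree, with one call on every task.

/* Hand-rolled broadcast: rank 0 sends to every other task, one at a time. */
if (rank == 0) {
    for (int dest = 1; dest < size; dest++)
        MPI_Send(data, N, MPI_INT, dest, 0, MPI_COMM_WORLD);
} else {
    MPI_Recv(data, N, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

/* Collective broadcast: every task makes the same call. */
MPI_Bcast(data, N, MPI_INT, 0, MPI_COMM_WORLD);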

49 Message Passing Cautions All messages are overhead (the non-parallel program wouldn't have them). Messages take substantial time. Use them only when necessary, and group together as many as possible (long blocks of computation between communications improve performance!).

50

51 Parallelism in Monte Carlo Methods Monte Carlo methods are often amenable to parallelism: find an estimate about p times faster, OR reduce the error of the estimate by a factor of p^(1/2). The trick to parallelizing MC methods is developing independent random number generators!!!
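
In the spirit of the pi-simple_mpi.c listing referenced on slide 91 (this is a hypothetical sketch, not that listing), each task throws darts at the unit square with its own naively seeded generator and a single MPI_Reduce combines the hit counts; the per-rank seeding used here is exactly the shortcut the following slides on parallel RNGs warn against.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long n = 1000000;                    /* samples per task (arbitrary) */
    unsigned int seed = 1234 + rank;     /* naive per-task seed; see the RNG slides */
    long hits = 0;
    for (long i = 0; i < n; i++) {
        double x = rand_r(&seed) / (double)RAND_MAX;
        double y = rand_r(&seed) / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;                      /* dart landed inside the quarter circle */
    }

    long total;
    MPI_Reduce(&hits, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %f\n", 4.0 * total / ((double)n * size));

    MPI_Finalize();
    return 0;
}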

52 Linear Congruential RNGs X_i = (a X_{i-1} + c) mod M

53 Linear Congruential RNGs X_i = (a X_{i-1} + c) mod M, where a is the multiplier

54 Linear Congruential RNGs X_i = (a X_{i-1} + c) mod M, where a is the multiplier and c is the additive constant

55 Linear Congruential RNGs X_i = (a X_{i-1} + c) mod M, where a is the multiplier, c is the additive constant, and M is the modulus

56 Linear Congruential RNGs X_i = (a X_{i-1} + c) mod M, where a is the multiplier, c is the additive constant, and M is the modulus. The sequence depends on the choice of seed, X_0

57 Period of Linear Congruential RNG Maximum period is M. For 32-bit integers the maximum period is 2^32, or about 4 billion. This is too small for modern computers. Use a generator with at least 48 bits of precision.

58 Producing Floating-Point Numbers X_i, a, c, and M are all integers. The X_i range in value from 0 to M-1. To produce floating-point numbers in the range [0, 1), divide X_i by M.
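
A minimal C sketch of that recipe (the constants a = 1103515245, c = 12345, M = 2^31 are the old textbook rand() example values, chosen only for illustration):

/* X_i = (a X_{i-1} + c) mod M, scaled into [0, 1). */
static unsigned long lcg_state = 1;   /* the seed, X_0 */

double lcg_uniform(void)
{
    const unsigned long a = 1103515245UL, c = 12345UL, M = 1UL << 31;
    lcg_state = (a * lcg_state + c) % M;      /* integer in 0 .. M-1 */
    return (double)lcg_state / (double)M;     /* floating point in [0, 1) */
}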

59 Defects of Linear Congruential RNGs Least significant bits correlated Especially when M is a power of 2 k-tuples of random numbers form a lattice Especially pronounced when k is large

60 Lagged Fibonacci RNGs X_i = X_{i-p} * X_{i-q}

61 Lagged Fibonacci RNGs p and q are lags, p > q

62 Lagged Fibonacci RNGs p and q are lags, p > q; * is any binary arithmetic operation

63 Lagged Fibonacci RNGs p and q are lags, p > q; * is any binary arithmetic operation: addition modulo M

64 Lagged Fibonacci RNGs p and q are lags, p > q; * is any binary arithmetic operation: addition modulo M, subtraction modulo M

65 Lagged Fibonacci RNGs p and q are lags, p > q; * is any binary arithmetic operation: addition modulo M, subtraction modulo M, multiplication modulo M

66 Lagged Fibonacci RNGs p and q are lags, p > q; * is any binary arithmetic operation: addition modulo M, subtraction modulo M, multiplication modulo M, bitwise exclusive or

67 Properties of Lagged Fibonacci RNGs Require p seed values. Careful selection of seed values, p, and q can result in very long periods and good randomness. For example, suppose M has b bits: the maximum period for an additive lagged Fibonacci RNG is (2^p - 1) 2^(b-1).
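
A sketch of an additive lagged Fibonacci generator (not from the slides), using the classic but small lags p = 17, q = 5 and working modulo 2^32 through unsigned overflow; the lag values and the requirement that the ring buffer be pre-filled with 17 seed values are illustrative assumptions.

#define LAG_P 17
#define LAG_Q 5

static unsigned int ring[LAG_P];   /* must hold p seed values before first use */
static int pos = 0;                /* slot currently holding X_{i-p}, the oldest value */

unsigned int lfg_next(void)
{
    int iq = (pos + LAG_P - LAG_Q) % LAG_P;    /* slot holding X_{i-q} */
    unsigned int x = ring[pos] + ring[iq];     /* X_i = (X_{i-p} + X_{i-q}) mod 2^32 */
    ring[pos] = x;                             /* X_i replaces the oldest value */
    pos = (pos + 1) % LAG_P;
    return x;
}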

68 Ideal Parallel RNGs All properties of sequential RNGs No correlations among numbers in different sequences Scalability Locality

69 Parallel RNG Designs Manager-worker Leapfrog Sequence splitting Independent sequences

70 Manager-Worker Parallel RNG Manager process generates random numbers Worker processes consume them If algorithm is synchronous, may achieve goal of consistency Not scalable Does not exhibit locality

71 Leapfrog Method [Figure, built up across slides 71-78: the process with rank 1 of 4 processes takes every fourth element of the global random number sequence.]

79 Properties of Leapfrog Method Easy to modify a linear congruential RNG to support jumping by p. Can allow a parallel program to generate the same tuples as the sequential program. Does not support dynamic creation of new random number streams.
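
A sketch of that modification (not from the slides): composing the LCG recurrence p times gives X_{i+p} = (A X_i + C) mod M, with A = a^p mod M and C = c (a^{p-1} + ... + a + 1) mod M, so each process can start at its rank's element and then jump p elements per call. The parameters are assumed small enough that the products fit in an unsigned long.

typedef struct { unsigned long a, c, m, x; } lcg_t;

void lcg_leapfrog_init(lcg_t *g, unsigned long a, unsigned long c,
                       unsigned long m, unsigned long seed, int rank, int p)
{
    unsigned long A = 1, C = 0;
    for (int i = 0; i < p; i++) {
        C = (a * C + c) % m;   /* builds c*(a^i + ... + a + 1) step by step */
        A = (A * a) % m;       /* builds a^(i+1) */
    }
    g->a = A;  g->c = C;  g->m = m;

    g->x = seed % m;           /* advance 'rank' steps so each stream starts offset */
    for (int i = 0; i < rank; i++)
        g->x = (a * g->x + c) % m;
}

unsigned long lcg_leapfrog_next(lcg_t *g)
{
    g->x = (g->a * g->x + g->c) % g->m;   /* one call = p steps of the original LCG */
    return g->x;
}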

80 Sequence Splitting [Figure, built up across slides 80-87: the sequence is divided into contiguous blocks, and the process with rank 1 of 4 processes takes the second block.]

88 Properties of Sequence Splitting Forces each process to move ahead to its starting point Does not support goal of reproducibility May run into long-range correlation problems Can be modified to support dynamic creation of new sequences

89 Independent Sequences Run sequential RNG on each process Start each with different seed(s) or other parameters Example: linear congruential RNGs with different additive constants Works well with lagged Fibonacci RNGs Supports goals of locality and scalability

90 Best Approach - Use an Existing Library SPRNG (Scalable Parallel Random Number Generator) from Florida State is an MPI-based library for generating random numbers independently. A linear congruential generator on one node provides seed values for lagged Fibonacci generators on the other nodes. Ridiculously long period, good statistical properties. SPRNG is simple and robust, and is highly recommended.

91 SPRNG Example See code listings: sprng_mpi.c seed_mpi.c 2streams_mpi.c pi-simple_mpi.c

92 Parting Note... If you are doing anything special (massive runs, massive storage, massive memory, meeting deadlines, non-traditional usage), please contact us and let us work with you to meet your needs. Policies are there to keep automated systems running well; they are not locked in stone.

93 How to Get More Help Online If something isn't there, fill out a service request to ask for help (same form as the account request). Someone will respond the next business day. hpc@asu.edu Phone -- Leah Kritzer - the HPCI front desk. More lectures would be fun: short courses offered again soon; CSE 494/598 (SP07) - a one-semester course in MPI and HPC.

94 Grids Loosely coupled sets of HPC (and other) compute resources. No centralized control. Middleware moves jobs to resources. A way to share resources. [Diagram: a grid portal connecting a workstation, Cluster 1, Cluster 2, an SMP, and a database server.]
