Symmetric Multiprocessing


Multicore Computing

A multi-core processor is a processing system composed of two or more independent cores. It can be described as an integrated circuit to which two or more individual processors (called cores in this context) have been attached. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor, or CMP), or onto multiple dies in a single chip package. A dual-core processor contains two cores, a quad-core processor contains four, and a hexa-core processor contains six.
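As a quick illustration, most operating systems report how many logical cores the processor exposes. This short Python sketch (standard library only) queries that count; the exact number printed depends on the machine, and with simultaneous multithreading the logical count can exceed the physical core count:

```python
import os

# Number of logical cores the OS exposes. On a quad-core chip without
# SMT this is 4; with two-way SMT it would typically report 8.
cores = os.cpu_count()
print(f"logical cores: {cores}")
```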

Symmetric Multiprocessing

A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and are connected via a bus. SMPs generally comprise no more than 32 processors. Because of the small size of the processors and the significant reduction in bus-bandwidth requirements achieved by large caches, such symmetric multiprocessors are extremely cost-effective, provided that sufficient memory bandwidth exists.

Symmetric Multiprocessor (SMP)

Memory: centralized, with uniform memory access time (UMA) and a bus interconnect. Examples: Sun Enterprise 5000, SGI Challenge, Intel SystemPro.

Decentralized Memory Versions

1. Shared memory with non-uniform memory access time (NUMA).
2. Message-passing "multicomputer" with a separate address space per processor. Communication can be invoked in software with Remote Procedure Call (RPC), often via a library such as MPI (Message Passing Interface). Also called "synchronous communication", since communication causes synchronization between the two processes.
3. Software DSM: a layer of the operating system built on top of a message-passing multiprocessor to give the programmer a shared-memory view.

Distributed Directory MPs

Communication Models

Shared memory: processors communicate through a shared address space. Easy on small-scale machines. Advantages:
1. Model of choice for uniprocessors and small-scale MPs
2. Ease of programming
3. Lower latency
4. Easier to use hardware-controlled caching

Message passing: processors have private memories and communicate via messages. Advantages:
1. Less hardware, easier to design
2. Good scalability
3. Focuses attention on costly non-local operations

Virtual Shared Memory (VSM), also called software DSM: a layer of the operating system built on top of a message-passing multiprocessor to give the programmer a shared-memory view.

Shared Address/Memory Multiprocessor Model

Processors communicate via loads and stores; this is the oldest and most popular model. It is based on timesharing: processes run on multiple processors rather than sharing a single processor. A process is a virtual address space with roughly one thread of control. Multiple processes can overlap (share) memory, but ALL threads of a process share its address space: writes to the shared address space by one thread are visible to reads by the other threads. The usual model: shared code, private stacks, some shared heap, and some private heap.
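The point that all threads share one process address space can be sketched with a short Python example (standard library only): a write by one thread is visible to the others, which is also why the concurrent update below must be protected by a lock.

```python
import threading

counter = {"value": 0}      # lives in the single shared address space
lock = threading.Lock()     # shared writes must be synchronized explicitly

def worker():
    for _ in range(10_000):
        with lock:          # without the lock, increments could be lost
            counter["value"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])     # 40000: every thread saw the others' writes
```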

Advantages of the Shared-Memory Communication Model

1. Compatibility with SMP hardware
2. Ease of programming when communication patterns are complex or vary dynamically during execution
3. Ability to develop applications using the familiar SMP model, with attention needed only on performance-critical accesses
4. Lower communication overhead and better use of bandwidth for small items, due to implicit communication and memory mapping to implement protection in hardware rather than through the I/O system
5. Hardware-controlled caching to reduce remote communication, by caching all data, both shared and private

Message Passing Model

Whole computers (CPU, memory, I/O devices) communicate via explicit I/O operations; essentially NUMA, but integrated at the I/O devices rather than the memory system. A send specifies a local buffer plus the receiving process on the remote computer; a receive specifies the sending process on the remote computer plus a local buffer in which to place the data. Usually a send includes a process tag, and the receive has a matching rule on the tag: match one, or match any. Synchronization options: when the send completes, when the buffer is free, when the request is accepted, or the receive waits for the send. A send plus a matching receive amounts to a memory-to-memory copy, where each side supplies its local address, AND performs pairwise synchronization.

Advantages of the Message-Passing Communication Model

1. The hardware can be simpler
2. Communication is explicit and therefore simpler to understand; in shared memory it can be hard to know when communication is occurring and how costly it is
3. Explicit communication focuses attention on the costly aspect of parallel computation, sometimes leading to improved structure in a multiprocessor program
4. Synchronization is naturally associated with sending messages, reducing the possibility of errors introduced by incorrect synchronization
5. Easier to use sender-initiated communication, which may have some performance advantages

Decentralized Memory Types

A decentralized-memory machine is also known as a distributed computer. Types:
1. Cluster computing
2. Massively parallel processing (MPP)
3. Grid computing

Cluster Definition

A cluster is a group of computers and servers, connected together, that act like a single system. Each system is called a node. A node contains one or more processors, RAM, a hard disk, and a LAN card. Nodes work in parallel.

Cluster Types

1. Load-balancing clusters
2. Computing clusters (e.g., parallel sequence alignment)
3. High-availability (HA) clusters

Cluster Types: Load-Balancing Clusters

[Figure: a load-balancing cluster with two servers and four user stations]

Load-balancing clusters are configurations in which cluster nodes share the computational workload to provide better overall performance. For example, a web-server cluster may assign different queries to different nodes, so that overall response time is optimized.
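The query-assignment idea can be sketched in a few lines of Python. The node names are hypothetical; a real front end would hold connections to actual servers, but the round-robin rotation is the same:

```python
from itertools import cycle

# Hypothetical backend nodes; a real balancer would track live servers.
nodes = cycle(["node-a", "node-b"])

def dispatch(request):
    """Round-robin: hand each incoming query to the next node in turn."""
    return next(nodes)

queries = ["GET /a", "GET /b", "GET /c", "GET /d"]
assignments = [dispatch(q) for q in queries]
print(assignments)  # ['node-a', 'node-b', 'node-a', 'node-b']
```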

Cluster Types: Computing Clusters

Computing clusters are used for computation-intensive purposes, rather than for I/O-oriented operations such as web service or databases.

Cluster Types: High-Availability Clusters

High-availability clusters improve the availability of the cluster approach. They operate by having redundant nodes, which are used to provide service when system components fail.
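The redundant-node idea can be sketched in a few lines of Python. The health flags below are hypothetical stand-ins for a real heartbeat or health-check mechanism: service is directed to the first node that reports healthy.

```python
# Each entry is (node name, healthy?). Here the primary has failed,
# so the standby takes over. Flags are placeholders for heartbeats.
nodes = [("primary", False), ("standby", True)]

def active_node(nodes):
    """Return the first node reporting healthy (failover)."""
    for name, healthy in nodes:
        if healthy:
            return name
    raise RuntimeError("no healthy node available")

print(active_node(nodes))  # standby
```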

Cluster Advantages

Performance. Scalability. Maintenance. Cost.

Massively Parallel Processing (MPP)

An MPP is a single computer with many networked processors. MPPs have many of the same characteristics as clusters, but MPPs have specialized interconnect networks. MPPs also tend to be larger than clusters, typically having far more than 100 processors. In an MPP, each CPU contains its own memory and its own copy of the operating system and application.

Grid Computing

Grid computing is a form of distributed computing in which a "super, virtual computer" is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks. Grid computing (Foster and Kesselman, 1999) is a growing technology that facilitates the execution of large-scale, resource-intensive applications on geographically distributed computing resources. It facilitates flexible, secure, coordinated, large-scale resource sharing among dynamic collections of individuals, institutions, and resources, and it enables communities ("virtual organizations") to share geographically distributed resources as they pursue common goals (Ian Foster and Carl Kesselman).

Grid Computing, Cont.

An embarrassingly parallel problem is one for which little or no effort is required to separate the problem into a number of parallel tasks. This is often the case when there is no dependency (or communication) between those parallel tasks.