Symmetric Multiprocessing


Multicore Computing

A multi-core processor is a processing system composed of two or more independent cores. It can be described as an integrated circuit to which two or more individual processors (called cores in this context) have been attached. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor, or CMP), or onto multiple dies in a single chip package. A dual-core processor contains two cores, a quad-core processor contains four, and a hexa-core processor contains six.
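As a quick illustration, most operating systems report how many logical cores the processor exposes. This short Python sketch (standard library only) queries that count; the exact number printed depends on the machine, and with simultaneous multithreading the logical count can exceed the physical core count:

```python
import os

# Number of logical cores the OS exposes. On a quad-core chip without
# SMT this is 4; with two-way SMT it would typically report 8.
cores = os.cpu_count()
print(f"logical cores: {cores}")
```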

Symmetric Multiprocessing

A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and are connected via a bus. SMPs generally comprise no more than 32 processors. Because of the small size of the processors and the significant reduction in bus-bandwidth requirements achieved by large caches, such symmetric multiprocessors are extremely cost-effective, provided that sufficient memory bandwidth exists.

Symmetric Multiprocessor (SMP)

Memory: centralized, with uniform memory access time (UMA) and a bus interconnect. Examples: Sun Enterprise 5000, SGI Challenge, Intel SystemPro.

Decentralized Memory Versions

1. Shared memory with non-uniform memory access time (NUMA).
2. Message-passing "multicomputer" with a separate address space per processor. Communication can be invoked in software with Remote Procedure Call (RPC), often via a library such as MPI (Message Passing Interface). Also called "synchronous communication", since communication causes synchronization between the two processes.
3. Software DSM: a layer of the operating system built on top of a message-passing multiprocessor to give the programmer a shared-memory view.

Distributed Directory MPs

Communication Models

Shared memory: processors communicate through a shared address space. Easy on small-scale machines. Advantages:
1. Model of choice for uniprocessors and small-scale MPs
2. Ease of programming
3. Lower latency
4. Easier to use hardware-controlled caching

Message passing: processors have private memories and communicate via messages. Advantages:
1. Less hardware, easier to design
2. Good scalability
3. Focuses attention on costly non-local operations

Virtual Shared Memory (VSM), also called software DSM: a layer of the operating system built on top of a message-passing multiprocessor to give the programmer a shared-memory view.

Shared Address/Memory Multiprocessor Model

Processors communicate via loads and stores; this is the oldest and most popular model. It is based on timesharing: processes run on multiple processors rather than sharing a single processor. A process is a virtual address space with roughly one thread of control. Multiple processes can overlap (share) memory, but ALL threads of a process share its address space: writes to the shared address space by one thread are visible to reads by the other threads. The usual model: shared code, private stacks, some shared heap, and some private heap.
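The point that all threads share one process address space can be sketched with a short Python example (standard library only): a write by one thread is visible to the others, which is also why the concurrent update below must be protected by a lock.

```python
import threading

counter = {"value": 0}      # lives in the single shared address space
lock = threading.Lock()     # shared writes must be synchronized explicitly

def worker():
    for _ in range(10_000):
        with lock:          # without the lock, increments could be lost
            counter["value"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])     # 40000: every thread saw the others' writes
```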

Advantages of the Shared-Memory Communication Model

1. Compatibility with SMP hardware
2. Ease of programming when communication patterns are complex or vary dynamically during execution
3. Ability to develop applications using the familiar SMP model, with attention needed only on performance-critical accesses
4. Lower communication overhead and better use of bandwidth for small items, due to implicit communication and memory mapping to implement protection in hardware rather than through the I/O system
5. Hardware-controlled caching to reduce remote communication, by caching all data, both shared and private

Message Passing Model

Whole computers (CPU, memory, I/O devices) communicate via explicit I/O operations; essentially NUMA, but integrated at the I/O devices rather than the memory system. A send specifies a local buffer plus the receiving process on the remote computer; a receive specifies the sending process on the remote computer plus a local buffer in which to place the data. Usually a send includes a process tag, and the receive has a matching rule on the tag: match one, or match any. Synchronization options: when the send completes, when the buffer is free, when the request is accepted, or the receive waits for the send. A send plus a matching receive amounts to a memory-to-memory copy, where each side supplies its local address, AND performs pairwise synchronization.

Advantages of the Message-Passing Communication Model

1. The hardware can be simpler
2. Communication is explicit and therefore simpler to understand; in shared memory it can be hard to know when communication is occurring and how costly it is
3. Explicit communication focuses attention on the costly aspect of parallel computation, sometimes leading to improved structure in a multiprocessor program
4. Synchronization is naturally associated with sending messages, reducing the possibility of errors introduced by incorrect synchronization
5. Easier to use sender-initiated communication, which may have some performance advantages

Decentralized Memory Types

A decentralized-memory machine is also known as a distributed computer. Types:
1. Cluster computing
2. Massively parallel processing (MPP)
3. Grid computing

Cluster Definition

A cluster is a group of computers and servers, connected together, that act like a single system. Each system is called a node. A node contains one or more processors, RAM, a hard disk, and a LAN card. Nodes work in parallel.

Cluster Types

1. Load-balancing clusters
2. Computing clusters (e.g., parallel sequence alignment)
3. High-availability (HA) clusters

Cluster Types: Load-Balancing Clusters

[Figure: a load-balancing cluster with two servers and four user stations]

Load-balancing clusters are configurations in which cluster nodes share the computational workload to provide better overall performance. For example, a web-server cluster may assign different queries to different nodes, so that overall response time is optimized.
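The query-assignment idea can be sketched in a few lines of Python. The node names are hypothetical; a real front end would hold connections to actual servers, but the round-robin rotation is the same:

```python
from itertools import cycle

# Hypothetical backend nodes; a real balancer would track live servers.
nodes = cycle(["node-a", "node-b"])

def dispatch(request):
    """Round-robin: hand each incoming query to the next node in turn."""
    return next(nodes)

queries = ["GET /a", "GET /b", "GET /c", "GET /d"]
assignments = [dispatch(q) for q in queries]
print(assignments)  # ['node-a', 'node-b', 'node-a', 'node-b']
```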

Cluster Types: Computing Clusters

Computing clusters are used for computation-intensive purposes, rather than for I/O-oriented operations such as web service or databases.

Cluster Types: High-Availability Clusters

High-availability clusters improve the availability of the cluster approach. They operate by having redundant nodes, which are used to provide service when system components fail.
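The redundant-node idea can be sketched in a few lines of Python. The health flags below are hypothetical stand-ins for a real heartbeat or health-check mechanism: service is directed to the first node that reports healthy.

```python
# Each entry is (node name, healthy?). Here the primary has failed,
# so the standby takes over. Flags are placeholders for heartbeats.
nodes = [("primary", False), ("standby", True)]

def active_node(nodes):
    """Return the first node reporting healthy (failover)."""
    for name, healthy in nodes:
        if healthy:
            return name
    raise RuntimeError("no healthy node available")

print(active_node(nodes))  # standby
```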

Cluster Advantages

Performance. Scalability. Maintenance. Cost.

Massively Parallel Processing (MPP)

An MPP is a single computer with many networked processors. MPPs have many of the same characteristics as clusters, but MPPs have specialized interconnect networks. MPPs also tend to be larger than clusters, typically having far more than 100 processors. In an MPP, each CPU contains its own memory and its own copy of the operating system and application.

Grid Computing

Grid computing is a form of distributed computing in which a "super, virtual computer" is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks. Grid computing (Foster and Kesselman, 1999) is a growing technology that facilitates the execution of large-scale, resource-intensive applications on geographically distributed computing resources. It facilitates flexible, secure, coordinated, large-scale resource sharing among dynamic collections of individuals, institutions, and resources, and it enables communities ("virtual organizations") to share geographically distributed resources as they pursue common goals (Ian Foster and Carl Kesselman).

Grid Computing, Cont.

An embarrassingly parallel problem is one for which little or no effort is required to separate the problem into a number of parallel tasks. This is often the case when there is no dependency (or communication) between those parallel tasks.