Distributed (Operating) Systems. Introduction

Schedule

Sessions:
  1. Introduction: Distributed systems (hardware/software issues)
  2. Process management in clusters: Load balancing and job scheduling
  3. Distributed communications
  4. Distributed services

Scenarios:
  - High-performance solutions for scientific applications (process management)
  - Distributed systems for transactional services

Timetable (Mon and Tue, 8:00 to 17:00): sessions 1-Intro, 2-proc, 3-comm and 4-serv, a lunch break, and Scenarios 1 and 2.

Bibliography

  - G. Coulouris, J. Dollimore, T. Kindberg. Distributed Systems: Concepts and Design. Addison-Wesley, 2001.
  - A. S. Tanenbaum, M. Van Steen. Distributed Systems: Principles and Paradigms. Prentice-Hall, 2007.
  - D. L. Galli. Distributed Operating Systems: Concepts & Practice. Prentice-Hall, 2000.
  - R. Chow, T. Johnson. Distributed Operating Systems & Algorithms. Addison-Wesley, 1997.
  - M. L. Liu. Distributed Computing: Principles and Applications. Addison-Wesley, 2004.

Distributed (Operating) Systems: Introduction and Concepts

Distributed System (DS)

Hardware: network-connected processors without shared physical memory.
  - Loosely coupled system
  - No common clock
  - Processor-dependent I/O systems
  - Independent failures of system components
  - Heterogeneous system

Goal of this seminar: distributed system software.
  - Distributed operating systems (classical view)
  - A software interface that hides the complexity of the distributed system: Single System Image

Advantages and Drawbacks

Advantages:
  - Cost/performance ratio
  - Parallel processing: high performance
  - Fault tolerance: high availability
  - Scalable, open and heterogeneous
  - Most appropriate for applications that are distributed by nature (e.g., a geographically distributed enterprise)

Drawbacks:
  - More complex software development
  - Network connection problems: latency, bandwidth and availability
  - Security

New Paradigms for DS

Cluster computing:
  - Dedicated systems: high performance, high availability.
  - Homogeneous system: nodes and LAN (general-purpose or specific).
  - Open issues: degree of coupling, distributed services.

Grid computing:
  - Resource sharing and use of idle processors.
  - Restricted to some specific tasks.
  - Different scopes: inter-departmental grids, inter-organization grids.
  - Open issues: coordination, security and dynamic changes.

Operating System Support

  1. OS for Distributed Systems: Requirements; Characteristics
  2. Distributed Systems
  3. Parallel/Distributed OS: Operating systems parallelisation; Distributed system services; Microkernels

Distributed Architectures

A distributed system is a collection of independent computers presented to the user as a single computer.

Distributed computer architectures:
  - Flynn (1972): SISD, SIMD, MISD, MIMD
  - Johnson (1988): UMA, NUMA, NORMA

Distributed System Applications

  - Internet services: news, web, ...
  - Corporate networks or intranets.
  - Parallel processing:
      - Massive processing (+efficiency).
      - Distributed topology (problems that are distributed by nature).
  - Distributed massive data management.
  - High-performance multimedia.
  - Industrial and control systems.
  - Real-time systems.
  - And many others.

Distributed System Profile

Distributed systems have:
  1. No common clock: message and co-ordination aspects.
  2. Global concurrency: real parallel execution.
  3. Independent failures: partial failures.

Distributed system usage:
  1. Collaborative processing: combined features and services.
  2. Parallel processing: massive or high-performance calculation.

System Requirements

Collaborative systems    Parallel systems
---------------------    ----------------
Openness                 Performance
Scalability              Scalability
Reliability              Reliability
Transparency             Transparency
Security                 Security

Common characteristics, but different hardware platforms and applications. All of them are DISTRIBUTED.

Operating System Distribution

System                                                Software          Hardware
----------------------------------------------------  ----------------  ----------------
OS for multiprocessors with shared memory (SMP)       Tightly coupled   Tightly coupled
Distributed operating system (DOS)                    Tightly coupled   Loosely coupled
Network operating system                              Loosely coupled   Loosely coupled

Operating Systems for SMPs

Architectures with multiple processors (2 to 8) and uniform-access shared memory (SMP: Symmetric Multiprocessors).

Characteristics:
  - Small variations of traditional OS versions.
  - There is only one copy of the OS.
  - Concurrency with real parallelism (as opposed to time sharing).
  - Commercial versions (Linux, WinNT, Solaris, AIX, ...).
  - Different problems: kernel code running on multiple processors (concurrent system calls), synchronisation mechanisms (spin-locks), optimisation and scheduling (processor affinity), ...

Distributed Operating Systems (DOS)

A distributed operating system manages a group of processors interconnected by a communication network and hides that complexity, presenting a virtual uniprocessor to the user.

Characteristics:
  - It runs on a distributed system and makes it appear as a centralised system.
  - Transparency: it must hide the complex factors of distribution.
  - This is easier said than done.
  - Experimental systems reach this goal only partially.
  - Failures make the users comply.

Distributed Operating Systems (DOS): Problems

  - Each node has a copy of the OS: which tasks are performed locally and which globally?
  - How is mutual exclusion achieved without shared memory?
  - How are deadlocks detected without global state?
  - Process scheduling: each copy of the operating system has its own task queue (process migration).
  - How is a single directory tree defined?
  - Problems due to the lack of a common clock, partial failures and heterogeneity.

Main result: new concepts have been developed, and they are useful in other domains.
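The slides list the lack of a common clock as a core problem without naming a remedy. One classical technique, brought in here as an illustration and not taken from the slides, is a Lamport logical clock: each node keeps a counter, advances it on every local event or send, and on receipt jumps past the timestamp carried by the message. The class and method names below are hypothetical.

```python
class LamportClock:
    """Logical clock that orders events without a shared physical clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter by one."""
        self.time += 1
        return self.time

    def receive(self, sender_time):
        """Message receipt: move past both our own and the sender's time."""
        self.time = max(self.time, sender_time) + 1
        return self.time


# Two nodes with independent counters (no common clock).
a, b = LamportClock(), LamportClock()
a.tick()                        # local event on node a
stamp = a.tick()                # node a sends a message carrying this timestamp
b.tick()                        # unrelated local event on node b
received_at = b.receive(stamp)  # node b receives the message
```

The receive event always gets a larger timestamp than the matching send, which is the ordering property that timestamp-based mutual exclusion algorithms build on.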

DOS Evolution

First network operating systems:
  - New network services in a conventional OS. E.g.: UNIX 4BSD (c. 1980)
  - New network functionality: Sun's ONC (c. 1985), which includes NFS, RPC and NIS

First DOS:
  - New OS based on conventional (monolithic) versions. E.g.: Sprite, University of California, Berkeley (c. 1988)
  - DOS based on a μ-kernel. E.g.:
      - Mach, CMU (c. 1986)
      - Amoeba, designed by Tanenbaum (c. 1984)
      - Chorus, INRIA, France (c. 1988)

Network Operating Systems

A network of loosely coupled computers that share resources, with no external control over the hardware or software of each node.

Characteristics:
  - No virtual uniprocessor view is presented (independent nodes).
  - Each node runs its own copy of an OS (possibly a different one).
  - Conventional OS plus network utilities.
  - Communication protocols for resource sharing and access to high-level services.
  - From rcp/rlogin to Sun's Open Network Computing (ONC).

Cooperative Systems

High-level, service-oriented software systems that require communication mechanisms to build upper-level services.

Characteristics:
  - A degree of transparency is provided, but the single-system view is not presented.
  - Autonomous, independent systems.
  - They are built on middleware (CORBA, DCE, COM+, ...).
  - These systems are designed as a combination of multiple services offered by different network elements.

Middleware

Middleware: a software layer over the operating system that provides standard distributed services.
  - Open systems, independent of the vendor.
  - Hardware and OS independent.
  - Examples: DCE (Open Group), CORBA (OMG), ...

Diagram: one middleware layer spanning several nodes, each with its own OS and hardware.

Single System Image (SSI)

The illusion, created by hardware or software, that presents a collection of resources as one.
  - Hardware SSI: DEC Memory Channel or SMPs
  - Operating system: DOS or gluing layer
  - Applications and services: middleware (many levels)

Every SSI has a boundary.

Why is SSI useful?

  - It is easy to program and use: traditional programming, known interfaces.
  - Low-level issues are hidden.
  - It allows centralized or distributed management, depending on the requirements of the task.
  - It (potentially) provides:
      - Fault tolerance.
      - Scalability.
      - Modular improvement.

Operating System Layers

A simplified view of an operating system has the following layers, from bottom to top:
  - Hardware.
  - Kernel.
  - System services.
  - Application programs.
  - Users.

Kernel Responsibilities

Monolithic kernels: many OS functions live inside the kernel (scheduler, memory manager, drivers, file systems, ...).

μ-kernels: many OS tasks are performed outside the kernel. What remains inside:
  (i) process communication
  (ii) memory management
  (iii) low-level management and scheduling
  (iv) low-level I/O

Distributed services: distributed system structure. Depending on the level:
  - Distributed operating systems
  - Network operating systems (or cooperative systems)

Diagrams: services over a kernel on one computer; services over a μ-kernel on one computer; services spanning several μ-kernels.

Operating System on Distributed Systems

                MPPs              SMPs                       Clusters                        Distributed
--------------  ----------------  -------------------------  ------------------------------  -----------------------------
Size            100s-1000s        10s                        100s or less                    10s-1000s
OS              N x kernels       Single OS kernel           N x OS platforms                N x OS platforms
OS type         Specific purpose  Special variants of        Standard OS plus tools          Standard OS and special tools
                                  standard OSs               (not always)
Communication   Message / DSM     Shared memory              Message passing (e.g., MPI)     Message passing or middleware
Scheduling      Single queue      Single queue               Multiple coordinated queues     Independent queues

(Diagram label: Single System Image (SSI).)

Tools for Distributed/Cluster Systems

Operating system:
  - Modular/layered
  - Monolithic
  - Based on μ-kernels

Runtime systems:
  - Parallel file systems or I/O libraries
  - Distributed shared memory software

Resource management:
  - Process scheduling tools
  - Load balancing

Applications:
  - Management and administration tools.
  - Processing tasks and jobs

Distributed Operating Systems: Hardware and Software Overview

Concept of Cluster

An alternative to traditional supercomputing facilities.

Traditional systems:
  - Specific hardware.
  - High cost.
  - Slow hardware development.
  - Painful software development.

General-purpose systems provide instead:
  - Commodity hardware (commercial off-the-shelf: COTS).
  - Moderate cost.
  - Fast hardware development.
  - Even more painful software development.

Concept of Cluster (continued)

Cluster: a hardware system built from commodity hardware connected by a dedicated (high-performance) network.
  - Nodes: PCs or workstations (SMPs).
  - Network: from high-speed networks to specific hardware.

Mysterious acronyms:
  - PoPCs: Pile of PCs
  - COWs: Clusters of workstations
  - CLUMPS: Clusters of multiprocessors
  - NOWs: Networks of workstations
  - ...

Hardware Characteristics

Nodes:
  - Processor: Intel Pentium, AMD Athlon, Compaq Alpha, IBM PowerPC, Sun SuperSparc (3-4... GHz)
  - Memory: SDRAM, DDR or similar (2-8 GB)
  - Storage: SCSI or RAID

Network:
  - Key element. It can account for 50% or more of the system cost.
  - Cheap alternative: Ethernet ( Mb/s)

Cluster Networks (I)

General-purpose network technologies:
  - Network bandwidth has improved.
  - Latency has improved only slightly.
  - They are therefore not well suited to clusters.

Low-latency protocols:
  - Active Messages (Berkeley): zero-copy synchronous model. GAM.
  - Fast Messages (Illinois): reliable, in-order Active Messages.
  - VMMC (Princeton): distributed shared memory pages (DSM).
  - U-Net (Cornell): virtual interfaces for memory pages.
  - BIP (ENS Lyon): low-latency basic interface.
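The point that bandwidth gains do not help latency-bound traffic can be illustrated with a simple cost model, transfer time = latency + size / bandwidth. This is a minimal sketch; the function name and all figures (100 μs latency, 12.5 and 125 MB/s links) are hypothetical and not taken from the slides.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model: fixed per-message latency plus serialisation time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical links: same latency, ten times the bandwidth.
slow = dict(latency_s=100e-6, bandwidth_bytes_per_s=12.5e6)
fast = dict(latency_s=100e-6, bandwidth_bytes_per_s=125e6)

small = 64            # bytes, e.g. a synchronisation message
large = 10_000_000    # bytes, e.g. a bulk data transfer

# How much faster each message gets when only bandwidth improves.
speedup_small = transfer_time(small, **slow) / transfer_time(small, **fast)
speedup_large = transfer_time(large, **slow) / transfer_time(large, **fast)

print(f"small message speedup: {speedup_small:.2f}x")
print(f"large message speedup: {speedup_large:.2f}x")
```

Under these assumptions the bulk transfer gains almost the full factor of ten, while the small message barely changes, which is why clusters that exchange many short messages look for low-latency protocols.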

Cluster Networks (II)

Cluster communication standards:
  - VIA: hardware interface (native or emulated) for communications. Maps physical memory regions and virtual network interfaces. MPI versions over VIA.
  - InfiniBand: I/O hardware standard (2.5 Gbps) using one-way connections. 6 communication models. Uses RDMA and IPv6.

Network hardware:
  - Ethernet, Fast Ethernet, Gigabit Ethernet: cheap but limited. Collision problems. VIA emulations.
  - Giganet (cLAN): implementation over VIA (1.26 Gbps).
  - Myrinet: low-latency programmable networks. Cut-through routing and failure detection. GM protocol.
  - Others: QsNet, ServerNet, SCI, ATM, Fibre Channel, HIPPI, ATOLL, ...

Technology Comparison

Technology         Max. nodes  VIA support  MPI support
-----------------  ----------  -----------  -----------------------
Gigabit Ethernet   1000s       Win/Linux    MPICH over MVIA or TCP
Giganet            1000s       Win/Linux    Third parties
Myrinet            1000s       Over GM      Third parties
QsNet              1000s       None         Quadrics or Compaq
SCI                1000s       Software     Third parties
ServerNet2         64k         Hardware     Compaq or third parties

Rows whose values are missing here: MPI stable bandwidth (MB/s), MPI latency (μs).

Source: Amy Apon / Mark Baker, 2000

Software Development (I)

Operating systems:
  - Linux: free, cheap, fast, and quick to develop for. E.g., Beowulf
  - Solaris: good parallelism support and good network services. E.g., Solaris MC
  - AIX: powerful and well-optimized software development tools. E.g., SP2
  - Windows: why not? E.g., Wolfpack

Software Development (II)

Middleware and SSI:
  - SSI (Single System Image): the whole cluster is presented as a single uniprocessor.
  - Layered development:
      - Hardware (local).
      - Operating system (μ-kernel) or gluing level: GLUnix or MOSIX
      - Applications, services and middleware: CODINE

Common services (desirable):
  - Single access point.
  - Single file hierarchy.
  - Single management point.
  - Single network connection.
  - Single work-management service.
  - Single user interface.
  - Single I/O space.
  - Single process space.
  - Checkpointing.
  - Process migration.

Software Development (III)

Programming tools:
  - Thread support: Pthreads or OpenMP
  - Message passing in clusters:
      - MPI: MPICH or LAM/MPI.
      - PVM: worse performance but more features.
  - DSM (distributed shared memory):
      - Software: TreadMarks, Linda or Nanos
      - Hardware: DASH or Merlin
  - Parallel debuggers.
  - Instrumentation tools.
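The message-passing style listed above can be sketched without a cluster. The example below is not MPI or PVM: it uses only the Python standard library, with threads standing in for nodes and in-process queues standing in for the network. The function names and the scatter/gather vocabulary are illustrative assumptions.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive one chunk of numbers and send back its partial sum."""
    chunk = inbox.get()        # blocking receive
    outbox.put(sum(chunk))     # send the result to the coordinator


def parallel_sum(data, n_workers):
    """Scatter data to workers by message, then gather the partial sums."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(box, results))
        for box in inboxes
    ]
    for t in threads:
        t.start()
    # Scatter: worker `rank` gets every n_workers-th element.
    for rank, box in enumerate(inboxes):
        box.put(data[rank::n_workers])
    for t in threads:
        t.join()
    # Gather: one partial sum per worker, in arrival order.
    return sum(results.get() for _ in range(n_workers))


total = parallel_sum(list(range(1, 101)), 4)
print(total)
```

The workers share no variables; all coordination happens through explicit send and receive operations, which is the property that lets the same structure run across machines once the queues are replaced by a real transport.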

Software Development (IV)

Administration tools:
  - Remote management:
      - Administrative commands: install software, copy files.
      - Process-level resource management.
      - User list and other system information: NIS.
      - E.g., SP2 tools, Cluster Command & Control (C3)
  - Scheduling systems:
      - Work queues and workload management.
      - Resource supervision.
      - E.g., CODINE, Condor, PBS (Portable Batch System)
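As a rough illustration of what a workload manager does with a work queue, the sketch below places each job on the node with the smallest accumulated load. This greedy policy is an assumption chosen for brevity, not the algorithm used by CODINE, Condor or PBS; the job names, costs and node names are hypothetical.

```python
import heapq


def place_jobs(job_costs, node_names):
    """Greedy placement: each job goes to the currently least-loaded node."""
    heap = [(0, name) for name in node_names]   # (accumulated load, node)
    heapq.heapify(heap)
    placement = {name: [] for name in node_names}
    for job, cost in job_costs:
        load, name = heapq.heappop(heap)        # least-loaded node so far
        placement[name].append(job)
        heapq.heappush(heap, (load + cost, name))
    loads = {name: load for load, name in heap}
    return placement, loads


# Jobs are taken in queue order; costs are estimated run times.
jobs = [("j1", 5), ("j2", 3), ("j3", 2), ("j4", 4)]
placement, loads = place_jobs(jobs, ["node-a", "node-b"])
print(placement)
print(loads)
```

Ties are broken by node name because the heap compares tuples. A real scheduler would also account for priorities, resource requests and node failures.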

Input/Output System

I/O crisis:
  - Exponential growth of CPU power (Moore's law).
  - Much slower growth of I/O systems.
  - The I/O phase is the actual bottleneck of high-performance systems.

Solution based on I/O parallelism:
  - Parallel I/O systems: MPI I/O
  - Parallel file systems: ParFiSys, GPFS
  - Intelligent I/O: Armada, Panda
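One common way to obtain I/O parallelism is striping: a file is cut into fixed-size blocks that are spread round-robin over several devices, so that they can be read or written concurrently. The sketch below shows only the data layout, with in-memory lists standing in for disks. It is a simplified illustration, not the layout used by ParFiSys or GPFS, and the function names are hypothetical.

```python
def stripe(data, n_disks, stripe_size):
    """Split data into blocks and distribute them round-robin over disks."""
    disks = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), stripe_size):
        block_index = offset // stripe_size
        disks[block_index % n_disks].append(data[offset:offset + stripe_size])
    return disks


def unstripe(disks):
    """Rebuild the original data by reading one block per disk in turn."""
    blocks = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


data = b"abcdefghij"
disks = stripe(data, n_disks=3, stripe_size=2)
restored = unstripe(disks)
print(disks)
```

With three disks, any three consecutive blocks live on different devices, so a sequential read can keep all of them busy at once.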


More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING José Daniel García Sánchez ARCOS Group University Carlos III of Madrid Contents 2 The ARCOS Group. Expand motivation. Expand

More information

Cluster Computing in the Classroom: Topics, Guidelines, and Experiences

Cluster Computing in the Classroom: Topics, Guidelines, and Experiences Cluster Computing in the Classroom: Topics, Guidelines, and Experiences Amy Apon α, Rajkumar Buyya β, Hai Jin δ, and Jens Mache φ Computer Science and Computer Engineering α University of Arkansas, Fayetteville,

More information

Solid State Storage in Massive Data Environments Erik Eyberg

Solid State Storage in Massive Data Environments Erik Eyberg Solid State Storage in Massive Data Environments Erik Eyberg Senior Analyst Texas Memory Systems, Inc. Agenda Taxonomy Performance Considerations Reliability Considerations Q&A Solid State Storage Taxonomy

More information

CHAPTER 15: Operating Systems: An Overview

CHAPTER 15: Operating Systems: An Overview CHAPTER 15: Operating Systems: An Overview The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint

More information

HPC Software Requirements to Support an HPC Cluster Supercomputer

HPC Software Requirements to Support an HPC Cluster Supercomputer HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417

More information

Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1 Introduction to High Performance Cluster Computing Cluster Training for UCL Part 1 What is HPC HPC = High Performance Computing Includes Supercomputing HPCC = High Performance Cluster Computing Note: these

More information

Lecture 1: the anatomy of a supercomputer

Lecture 1: the anatomy of a supercomputer Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers of the future may have only 1,000 vacuum tubes and perhaps weigh 1½ tons. Popular Mechanics, March 1949

More information

Client/Server and Distributed Computing

Client/Server and Distributed Computing Adapted from:operating Systems: Internals and Design Principles, 6/E William Stallings CS571 Fall 2010 Client/Server and Distributed Computing Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Traditional

More information

Scheduling and Resource Management in Computational Mini-Grids

Scheduling and Resource Management in Computational Mini-Grids Scheduling and Resource Management in Computational Mini-Grids July 1, 2002 Project Description The concept of grid computing is becoming a more and more important one in the high performance computing

More information

Introduction to Virtual Machines

Introduction to Virtual Machines Introduction to Virtual Machines Introduction Abstraction and interfaces Virtualization Computer system architecture Process virtual machines System virtual machines 1 Abstraction Mechanism to manage complexity

More information

A Comparison on Current Distributed File Systems for Beowulf Clusters

A Comparison on Current Distributed File Systems for Beowulf Clusters A Comparison on Current Distributed File Systems for Beowulf Clusters Rafael Bohrer Ávila 1 Philippe Olivier Alexandre Navaux 2 Yves Denneulin 3 Abstract This paper presents a comparison on current file

More information

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Microkernels, virtualization, exokernels. Tutorial 1 CSC469 Microkernels, virtualization, exokernels Tutorial 1 CSC469 Monolithic kernel vs Microkernel Monolithic OS kernel Application VFS System call User mode What was the main idea? What were the problems? IPC,

More information

159.735. Final Report. Cluster Scheduling. Submitted by: Priti Lohani 04244354

159.735. Final Report. Cluster Scheduling. Submitted by: Priti Lohani 04244354 159.735 Final Report Cluster Scheduling Submitted by: Priti Lohani 04244354 1 Table of contents: 159.735... 1 Final Report... 1 Cluster Scheduling... 1 Table of contents:... 2 1. Introduction:... 3 1.1

More information

Software Concepts. Uniprocessor Operating Systems. System software structures. CIS 505: Software Systems Architectures of Distributed Systems

Software Concepts. Uniprocessor Operating Systems. System software structures. CIS 505: Software Systems Architectures of Distributed Systems CIS 505: Software Systems Architectures of Distributed Systems System DOS Software Concepts Description Tightly-coupled operating system for multiprocessors and homogeneous multicomputers Main Goal Hide

More information

Chapter 16 Distributed Processing, Client/Server, and Clusters

Chapter 16 Distributed Processing, Client/Server, and Clusters Operating Systems: Internals and Design Principles Chapter 16 Distributed Processing, Client/Server, and Clusters Eighth Edition By William Stallings Table 16.1 Client/Server Terminology Applications Programming

More information

Cluster Computing at HRI

Cluster Computing at HRI Cluster Computing at HRI J.S.Bagla Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. E-mail: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing

More information

Simple Introduction to Clusters

Simple Introduction to Clusters Simple Introduction to Clusters Cluster Concepts Cluster is a widely used term meaning independent computers combined into a unified system through software and networking. At the most fundamental level,

More information

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

More information

IBM Deep Computing Visualization Offering

IBM Deep Computing Visualization Offering P - 271 IBM Deep Computing Visualization Offering Parijat Sharma, Infrastructure Solution Architect, IBM India Pvt Ltd. email: parijatsharma@in.ibm.com Summary Deep Computing Visualization in Oil & Gas

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

PERFORMANCE CONSIDERATIONS FOR NETWORK SWITCH FABRICS ON LINUX CLUSTERS

PERFORMANCE CONSIDERATIONS FOR NETWORK SWITCH FABRICS ON LINUX CLUSTERS PERFORMANCE CONSIDERATIONS FOR NETWORK SWITCH FABRICS ON LINUX CLUSTERS Philip J. Sokolowski Department of Electrical and Computer Engineering Wayne State University 55 Anthony Wayne Dr. Detroit, MI 822

More information

Components of a Computer System

Components of a Computer System SFWR ENG 3B04 Software Design III 1.1 3 Hardware Processor(s) Memory I/O devices Operating system Kernel System programs Components of a Computer System Application programs Users SFWR ENG 3B04 Software

More information

Clusters: Mainstream Technology for CAE

Clusters: Mainstream Technology for CAE Clusters: Mainstream Technology for CAE Alanna Dwyer HPC Division, HP Linux and Clusters Sparked a Revolution in High Performance Computing! Supercomputing performance now affordable and accessible Linux

More information

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

More information

Chapter 1: Operating System Models 1 2 Operating System Models 2.1 Introduction Over the past several years, a number of trends affecting operating system design are witnessed and foremost among them is

More information

Using Linux Clusters as VoD Servers

Using Linux Clusters as VoD Servers HAC LUCE Using Linux Clusters as VoD Servers Víctor M. Guĺıas Fernández gulias@lfcia.org Computer Science Department University of A Corunha funded by: Outline Background: The Borg Cluster Video on Demand.

More information

A Flexible Cluster Infrastructure for Systems Research and Software Development

A Flexible Cluster Infrastructure for Systems Research and Software Development Award Number: CNS-551555 Title: CRI: Acquisition of an InfiniBand Cluster with SMP Nodes Institution: Florida State University PIs: Xin Yuan, Robert van Engelen, Kartik Gopalan A Flexible Cluster Infrastructure

More information

Chapter 2 Parallel Computer Architecture

Chapter 2 Parallel Computer Architecture Chapter 2 Parallel Computer Architecture The possibility for a parallel execution of computations strongly depends on the architecture of the execution platform. This chapter gives an overview of the general

More information

Red Hat Enterprise Linux 6. Stanislav Polášek ELOS Technologies sp@elostech.cz

Red Hat Enterprise Linux 6. Stanislav Polášek ELOS Technologies sp@elostech.cz Stanislav Polášek ELOS Technologies sp@elostech.cz Red Hat - an Established Global Leader Compiler Development Identity & Authentication Storage & File Systems Middleware Kernel Development Virtualization

More information

- Behind The Cloud -

- Behind The Cloud - - Behind The Cloud - Infrastructure and Technologies used for Cloud Computing Alexander Huemer, 0025380 Johann Taferl, 0320039 Florian Landolt, 0420673 Seminar aus Informatik, University of Salzburg Overview

More information

UNIT I LESSON 1: DISTRIBUTED SYSTEMS

UNIT I LESSON 1: DISTRIBUTED SYSTEMS LESSON 1: DISTRIBUTED SYSTEMS UNIT I CONTENTS 1.0 Aim and Objectives 1.1. Introduction 1.2. Organization 1.3. Goals and Advantages 1.4. Disadvantages 1.5. Architecture 1.6. Concurrency 1.7. Languages 1.8.

More information

Building a Linux Cluster

Building a Linux Cluster Building a Linux Cluster CUG Conference May 21-25, 2001 by Cary Whitney Clwhitney@lbl.gov Outline What is PDSF and a little about its history. Growth problems and solutions. Storage Network Hardware Administration

More information

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Kurt Klemperer, Principal System Performance Engineer kklemperer@blackboard.com Agenda Session Length:

More information

Analysis and Implementation of Cluster Computing Using Linux Operating System

Analysis and Implementation of Cluster Computing Using Linux Operating System IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661 Volume 2, Issue 3 (July-Aug. 2012), PP 06-11 Analysis and Implementation of Cluster Computing Using Linux Operating System Zinnia Sultana

More information

Chapter 3 Operating-System Structures

Chapter 3 Operating-System Structures Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual

More information

LinuxWorld Conference & Expo Server Farms and XML Web Services

LinuxWorld Conference & Expo Server Farms and XML Web Services LinuxWorld Conference & Expo Server Farms and XML Web Services Jorgen Thelin, CapeConnect Chief Architect PJ Murray, Product Manager Cape Clear Software Objectives What aspects must a developer be aware

More information

MCA Standards For Closely Distributed Multicore

MCA Standards For Closely Distributed Multicore MCA Standards For Closely Distributed Multicore Sven Brehmer Multicore Association, cofounder, board member, and MCAPI WG Chair CEO of PolyCore Software 2 Embedded Systems Spans the computing industry

More information

Cluster, Grid, Cloud Concepts

Cluster, Grid, Cloud Concepts Cluster, Grid, Cloud Concepts Kalaiselvan.K Contents Section 1: Cluster Section 2: Grid Section 3: Cloud Cluster An Overview Need for a Cluster Cluster categorizations A computer cluster is a group of

More information

Technical Overview of Windows HPC Server 2008

Technical Overview of Windows HPC Server 2008 Technical Overview of Windows HPC Server 2008 Published: June, 2008, Revised September 2008 Abstract Windows HPC Server 2008 brings the power, performance, and scale of high performance computing (HPC)

More information

Glosim: Global System Image for Cluster Computing

Glosim: Global System Image for Cluster Computing Glosim: Global System Image for Cluster Computing Hai Jin, Li Guo, Zongfen Han Internet and Cluster Computing Center Huazhong University of Science and Technology, Wuhan, 4374, China Abstract: This paper

More information

Weighted Total Mark. Weighted Exam Mark

Weighted Total Mark. Weighted Exam Mark CMP2204 Operating System Technologies Period per Week Contact Hour per Semester Total Mark Exam Mark Continuous Assessment Mark Credit Units LH PH TH CH WTM WEM WCM CU 45 30 00 60 100 40 100 4 Rationale

More information