Distributed (Operating) Systems. Introduction





Schedule
Sessions:
  1. Introduction: Distributed systems (hardware/software issues)
  2. Process management in clusters: Load balancing and job scheduling
  3. Distributed communications
  4. Distributed services
Scenarios:
  - High-performance solutions for scientific applications (process management)
  - Distributed systems for transactional services
[Timetable: Monday and Tuesday, 8:00 to 17:00, with slots for 1-Intro, 2-proc, 3-comm, 4-serv, lunch, Scenario 1 and Scenario 2]

Bibliography
  - G. Coulouris, J. Dollimore, T. Kindberg. Distributed Systems: Concepts and Design. Addison-Wesley, 2001.
  - A. S. Tanenbaum, M. Van Steen. Distributed Systems: Principles and Paradigms. Prentice-Hall, 2007.
  - D. L. Galli. Distributed Operating Systems: Concepts & Practice. Prentice-Hall, 2000.
  - R. Chow, T. Johnson. Distributed Operating Systems & Algorithms. Addison-Wesley, 1997.
  - M. L. Liu. Distributed Computing: Principles and Applications. Addison-Wesley, 2004.

Distributed (Operating) Systems Introduction and Concepts

Distributed System (DS)
Hardware: Network-connected processors without shared physical memory:
  - Loosely coupled system
  - No common clock
  - Processor-dependent I/O systems
  - Independent failures of system components
  - Heterogeneous system
Goal of this seminar: Distributed system software
  - Distributed operating systems (classical view)
  - Software interface that hides the complexity of the distributed system: Single System Image

Advantages and Drawbacks
Advantages:
  - Cost/performance ratio
  - Parallel processing: high performance
  - Fault tolerance: high availability
  - Scalable, open and heterogeneous
  - Most appropriate for inherently distributed applications (e.g., a geographically distributed enterprise)
Drawbacks:
  - More complex software development
  - Network connection problems: latency, bandwidth and availability
  - Security

New Paradigms for DS
Cluster Computing:
  - Dedicated systems: High performance. High availability.
  - Homogeneous system: Nodes. LAN (general-purpose or specific).
  - Open issues: Coupling degree, distributed services.
Grid Computing:
  - Resource sharing and idle processor usage.
  - Restricted to some specific tasks.
  - Different scopes: Inter-departmental grids. Inter-organization grids.
  - Open issues: Coordination, security and dynamic changes.

Operating System Support
  1. OS for Distributed Systems:
     - Requirements
     - Characteristics
  2. Distributed Systems
  3. Parallel/Distributed OS:
     - Operating Systems Parallelisation
     - Distributed System Services
     - Microkernels

Distributed Architectures
A distributed system is a collection of independent computers presented to the user as a single computer.
Distributed computer architectures:
  - Flynn (1972): SISD, SIMD, MISD, MIMD
  - Johnson (1988): UMA, NUMA, NORMA

Distributed System Applications
  - Internet services: e-mail, news, web, ...
  - Corporate networks or intranets.
  - Parallel processing:
      - Massive processing (+efficiency).
      - Distributed topology (problems that are distributed by nature).
  - Distributed massive data management.
  - High-performance multimedia.
  - Industrial and control systems. Real-time systems.
  - <and many others...>

Distributed System Profile
Distributed systems have:
  1. No common clock: Message and co-ordination aspects.
  2. Global concurrency: Real parallel execution.
  3. Independent failures: Partial failures.
Distributed system usage:
  1. Collaborative processing: combined features and services.
  2. Parallel processing: massive or high-performance calculation.
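The "no common clock" property can be illustrated with a logical clock. The sketch below uses Lamport-style counters, a technique the slides do not name; it is brought in here only as one common way to order events using messages alone. The class and method names are hypothetical.

```python
class LamportClock:
    """Logical clock: orders events without any shared physical clock.

    Illustrative only; this technique is an assumption, not taken from the slides.
    """

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending counts as an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, move past both the local time and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                  # local event on node a
stamp = a.send()          # a sends a message carrying its timestamp
b.tick()                  # unrelated local event on node b
recv = b.receive(stamp)   # b's clock jumps past the sender's timestamp
print(stamp, recv)
```

The receive event always gets a larger timestamp than the matching send, so causally related events stay ordered even though the two nodes never compare wall-clock times.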

System Requirements
Collaborative systems:
  - Openness
  - Scalability
  - Reliability
  - Transparency
  - Security
Parallel systems:
  - Performance
  - Scalability
  - Reliability
  - Transparency
  - Security
Common characteristics but different hardware platforms and applications. All of them DISTRIBUTED.

Operating System Distribution
  - Operating systems for multiprocessors with shared memory (SMP):
      - Software tightly coupled
      - Hardware tightly coupled
  - Distributed operating systems (DOS):
      - Software tightly coupled
      - Hardware loosely coupled
  - Network operating systems:
      - Software loosely coupled
      - Hardware loosely coupled

Operating Systems for SMPs
Architectures with multiple processors (2 to 8) and uniform-access shared memory (SMP: Symmetric Multiprocessors).
Characteristics:
  - Small variations of the traditional OS versions.
  - There is only one copy of the OS.
  - Concurrency with real parallelism (as opposed to time-sharing).
  - Commercial versions (Linux, WinNT, Solaris, AIX, ...).
  - Different problems: kernel code running on multiple processors (concurrent system calls), synchronisation mechanisms (spin-locks), optimisation and scheduling (processor affinity), ...

Distributed Operating Systems (DOS)
A distributed operating system runs on a group of processors interconnected by a communication network and hides that complexity, presenting a virtual uniprocessor to the user.
Characteristics:
  - It runs on a distributed system and makes it appear as a centralised system.
  - Transparency: It must hide the complex factors of distribution.
  - This is easier said than done.
  - Experimental systems reach this goal only partially.
  - Failures make the users comply.

Distributed Operating Systems (DOS)
Problems:
  - Each node has a copy of the OS: Which tasks are performed locally and which globally?
  - How is mutual exclusion achieved without shared memory?
  - How are deadlocks detected without global state?
  - Process scheduling: Each operating system copy has its own task queue (process migration).
  - How is a single directory tree defined?
  - Problems due to the lack of a common clock, partial failures and heterogeneity.
Main result: New concepts have been developed, and they are useful in other domains.
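The scheduling problem above (one task queue per OS copy, with process migration between them) can be sketched as follows. This is a toy model under a strong assumption: a single balancer sees every queue length at once, which a real distributed OS cannot rely on. The function name, node names and process names are hypothetical.

```python
from collections import deque


def balance(queues):
    """Migrate tasks from the longest queue to the shortest one until
    queue lengths differ by at most one.

    `queues` maps a node name to a deque of task names. Returns the list of
    migrations as (task, source_node, destination_node) tuples.
    Assumes global knowledge of all queues, which is a simplification.
    """
    migrations = []
    while True:
        src = max(queues, key=lambda n: len(queues[n]))
        dst = min(queues, key=lambda n: len(queues[n]))
        if len(queues[src]) - len(queues[dst]) <= 1:
            return migrations
        task = queues[src].pop()      # take the most recently queued task
        queues[dst].append(task)      # "migrate" it to the least loaded node
        migrations.append((task, src, dst))


nodes = {
    "node0": deque(["p1", "p2", "p3", "p4", "p5"]),
    "node1": deque(["p6"]),
    "node2": deque(),
}
moves = balance(nodes)
for task, src, dst in moves:
    print(f"{task}: {src} -> {dst}")
```

In practice each node would only have a delayed, partial view of the other queues, and migrating a running process costs far more than moving a name between lists; the sketch only shows the balancing decision itself.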

DOS Evolution
First network operating systems:
  - New network services in a conventional OS. E.g.: UNIX 4BSD (c. 1980)
  - New network functionalities: Sun's ONC (c. 1985): includes NFS, RPC, NIS
First DOS:
  - New OS based on conventional (monolithic) versions. E.g.: Sprite, University of California, Berkeley (c. 1988)
  - DOS based on μ-kernels. E.g.:
      - Mach, CMU (c. 1986)
      - Amoeba, designed by Tanenbaum (c. 1984)
      - Chorus, INRIA, France (c. 1988)

Network Operating Systems
A network of loosely coupled computers that share resources, with no external control over the hardware/software of each node.
Characteristics:
  - No virtual uniprocessor view is presented (independent nodes).
  - Each node runs its own copy of an OS (possibly different ones).
  - Conventional OS + network utilities.
  - Communication protocols for resource sharing and high-level service access.
  - From rcp/rlogin to Sun's Open Network Computing (ONC).

Cooperative Systems
High-level, service-oriented software systems that require communication mechanisms to build upper-level services.
Characteristics:
  - A degree of transparency is provided, but the single-system view is not presented.
  - Autonomous independent systems.
  - They are built on middleware (CORBA, DCE, COM+, ...).
  - These systems are designed as a combination of multiple services offered by different network elements.

Middleware
Middleware: Software layer over the operating system that provides standard distributed services.
  - Open systems independent of the vendor.
  - Hardware and OS independent.
  - Examples: DCE (Open Group). CORBA (OMG). ...
[Figure: a single middleware layer spanning three nodes, each with its own OS and hardware]

Single System Image (SSI)
The illusion, created by hardware/software, that presents a collection of resources as one.
  - Hardware SSI: DEC Memory Channel or SMPs
  - Operating system: DOS or gluing layer
  - Application and services: Middleware (many levels)
Every SSI has a boundary.

Why is SSI useful?
  - It is easy to program/use: Traditional programming, known interfaces. Low-level issues hidden.
  - Allows centralized or distributed management, depending on task requirements.
  - (Potentially) provides: Fault tolerance. Scalability. Modular improvement.

Operating System Layers
A simplified view of an operating system has the following layers:
  - Hardware.
  - Kernel.
  - System services.
  - Application programs.
  - Users.
[Figure: layer stack, top to bottom: Users, Applications, Services, Kernel, Hardware]

Kernel Responsibilities
Monolithic kernels: Many OS functionalities inside the kernel: scheduler, memory manager, drivers, file systems...
[Figure: Services / Kernel / Computer]
μ-kernels: Many OS tasks are performed outside the kernel. Remaining in the kernel: (i) process communication, (ii) memory management, (iii) low-level management and scheduling, and (iv) low-level I/O.
[Figure: Services / μ-kernel / Computer]
Distributed services: Distributed system structure. Depending on the level: distributed operating systems, network operating systems or (cooperative) systems.
[Figure: Services over several μ-kernels]

Operating System on Distributed Systems
               | MPPs             | SMPs                             | Clusters                            | Distributed
Size           | 100s to 1000s    | 10s                              | 100s or less                        | 10s to 1000s
OS             | N x kernels      | Single OS kernel                 | N x OS platforms                    | N x OS platforms
OS type        | Specific purpose | Special variants of standard OSs | Standard OS plus tools (not always) | Standard OS and special tools
Communication  | Message / DSM    | Shared memory                    | Message passing (e.g., MPI)         | Message passing or middleware
Scheduling     | Single queue     | Single queue                     | Multiple coordinated queues         | Independent queues
Single System Image (SSI)

Tools for Distributed/Cluster Systems
Operating system:
  - Modular/layered
  - Monolithic
  - Based on μ-kernels
Runtime systems:
  - Parallel file systems or I/O libraries
  - Distributed shared memory software
Resource management:
  - Process scheduling tools
  - Load balancing
Applications:
  - Management and administration tools.
  - Processing tasks and jobs

Distributed Operating Systems Hardware and Software Overview

Concept of Cluster
Alternative to traditional supercomputing facilities.
Instead of traditional systems:
  - Specific hardware.
  - High cost.
  - Slow hardware development.
  - Painful software development.
the use of general-purpose systems provides:
  - Commodity hardware (commercial off-the-shelf: COTS).
  - Moderate cost.
  - Fast hardware development.
  - Even more painful software development.

Concept of Cluster
Cluster: Hardware system based on commodity hardware connected by a dedicated (high-performance) network.
  - Nodes: PCs or workstations (SMPs).
  - Network: From high-speed networks to specific hardware.
Mysterious acronyms:
  - PoPCs: Pile of PCs
  - COWs: Clusters of workstations
  - CLUMPS: Clusters of multiprocessors
  - NOWs: Networks of workstations
  - ...

Hardware Characteristics
Nodes:
  - Processor: Intel Pentium, AMD Athlon, Compaq Alpha, IBM PowerPC, Sun SuperSparc (3-4... GHz)
  - Memory: SDRAM, DDR or similar (2-8 GB)
  - Storage: SCSI or RAID
Network:
  - Key element. It can account for 50% or more of the system cost.
  - Cheap alternative: Ethernet (100-1000 Mb/s)

Cluster Networks (I)
General-purpose network technologies:
  - Improvement in network bandwidth.
  - Only modest improvements in latency.
  - Not well suited.
Low-latency protocols:
  - Active Messages (Berkeley): Zero-copy synchronous model. GAM.
  - Fast Messages (Illinois): Reliable AM in order.
  - VMMC (Princeton): Distributed shared memory pages (DSM).
  - U-Net (Cornell): Virtual interfaces for memory pages.
  - BIP (ENS Lyon): Low-latency basic interface.

Cluster Networks (II)
Cluster communication standards:
  - VIA: Hardware interface (native/emulated) for communications. Maps physical memory regions and virtual network interfaces. MPI versions over VIA.
  - InfiniBand: I/O hardware standard (2.5 Gbps) using one-way connections. 6 communication models. Uses RDMA and IPv6.
Network hardware:
  - Ethernet, Fast Ethernet, Gigabit Ethernet: Cheap but limited. Collision problems. VIA emulations.
  - Giganet (cLAN): Implementation over VIA (1.26 Gbps)
  - Myrinet: Low-latency programmable networks. Cut-through routing and failure detection. GM protocol.
  - Others: QsNet, ServerNet, SCI, ATM, Fibre Channel, HIPPI, ATOLL, ...

Technologies Comparison
                            | Gigabit Ethernet       | Giganet       | Myrinet       | QsNet              | SCI           | ServerNet2
Sustained MPI bandwidth (MB/s) | 35-50               | 105           | 140           | 208                | 80            | 65
MPI latency (μs)            | 100-200                | 20-40         | ~18           | 5                  | 6             | 20.2
Maximum number of nodes     | 1000s                  | 1000s         | 1000s         | 1000s              | 1000s         | 64k
VIA support                 | Win/Linux              | Win/Linux     | Over GM       | None               | Software      | Hardware
MPI support type            | MPICH over MVIA or TCP | Third parties | Third parties | Quadrics or Compaq | Third parties | Compaq or third parties
Source: Amy Apon / Mark Baker, 2000
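To see how the latency and bandwidth figures in the comparison interact, a rough first-order model is time = latency + size / bandwidth. The sketch below applies it to three of the technologies. The model itself and the choice of the best-case end of each quoted range are assumptions for illustration, and 1 MB is taken as 10^6 bytes.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_mb_s):
    """Estimated one-way transfer time in microseconds.

    First-order model (an assumption, not from the slides):
        time = latency + size / bandwidth
    With 1 MB = 10**6 bytes, bytes divided by MB/s gives microseconds directly.
    """
    return latency_us + size_bytes / bandwidth_mb_s


# (latency in microseconds, bandwidth in MB/s), best-case values from the comparison
networks = {
    "Gigabit Ethernet": (100, 50),
    "Myrinet": (18, 140),
    "QsNet": (5, 208),
}

for size in (1_000, 1_000_000):
    for name, (latency, bandwidth) in networks.items():
        t = transfer_time_us(size, latency, bandwidth)
        print(f"{size:>9} bytes over {name:<16}: {t:10.1f} us")
```

For small messages the latency term dominates, which is why the slides stress latency over raw bandwidth for cluster networks; for large messages the bandwidth term takes over.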

Software Development (I)
Operating systems:
  - Linux: Free, cheap, fast and fast-developing. E.g., Beowulf
  - Solaris: Good parallelism support and good network services. E.g., Solaris MC
  - AIX: Powerful and well-optimized software development tools. E.g., SP2
  - Windows: Why not? E.g., Wolfpack

Software Development (II)
Middleware and SSI:
  - SSI (Single System Image): The whole cluster is presented as a single uniprocessor.
Layered development:
  - Hardware (local).
  - Operating system (μ-kernel) or gluing level: GLUnix or MOSIX
  - Application, services and middleware: CODINE
Common services (desirable):
  - Single access point.
  - Single file hierarchy.
  - Single management point.
  - Single network connection.
  - Single work-management service.
  - Single user interface.
  - Single I/O space.
  - Single process space.
  - Checkpointing.
  - Process migration.

Software Development (III)
Programming tools:
  - Thread support: Pthreads or OpenMP
  - Message passing in clusters:
      - MPI: MPICH or LAM/MPI.
      - PVM: Worse performance but more features.
  - DSM (distributed shared memory):
      - Software: TreadMarks, Linda or Nanos
      - Hardware: DASH or Merlin
  - Parallel debuggers.
  - Instrumentation tools.

Software Development (IV)
Administration tools:
  - Remote management:
      - Administrative commands: install software, copy files.
      - Process-level resource management.
      - User list and other system information: NIS.
      - E.g., SP2 tools, Cluster Command & Control (C3)
  - Scheduling systems:
      - Work queues and workload management.
      - Resource supervision.
      - E.g., CODINE, CONDOR, PBS (Portable Batch System)

Input/Output System
I/O crisis:
  - Exponential growth of CPU power (Moore's law).
  - I/O systems grow much more slowly.
  - The I/O phase is the actual bottleneck of high-performance systems.
Solution based on I/O parallelism:
  - Parallel I/O systems: MPI I/O
  - Parallel file systems: ParFiSys, GPFS
  - Intelligent I/O: Armada, Panda
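One common way to obtain I/O parallelism is to stripe a file across several I/O servers so that consecutive blocks can be read or written at the same time. The sketch below shows a round-robin mapping from a file offset to a server and an offset on that server. It is a generic illustration with hypothetical names and parameters, not the actual layout used by ParFiSys, GPFS or any other system named above.

```python
def locate(offset, stripe_size, num_servers):
    """Map a byte offset in a striped file to (server index, offset on that server).

    Assumes simple round-robin striping: stripe 0 goes to server 0,
    stripe 1 to server 1, and so on, wrapping around after the last server.
    """
    stripe = offset // stripe_size        # which stripe the byte falls in
    server = stripe % num_servers         # round-robin server choice
    # Each server stores every num_servers-th stripe back to back.
    local_offset = (stripe // num_servers) * stripe_size + offset % stripe_size
    return server, local_offset


# Hypothetical layout: 64-byte stripes over 4 I/O servers
for off in (0, 64, 256, 300):
    print(off, locate(off, stripe_size=64, num_servers=4))
```

A request that spans several stripes touches several servers, so a large sequential read can proceed at roughly the combined bandwidth of the servers involved, subject to network and client limits.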