DISTRIBUTED SYSTEMS (TDDB37). Basic Issues. Course Information.


Distributed Systems Fö 1-1

DISTRIBUTED SYSTEMS (TDDB37)

Petru Eles
Institutionen för Datavetenskap (IDA)
Linköpings Universitet
email: petel@ida.liu.se
http://www.ida.liu.se/~petel
phone: 28 1396
B building, 329:220

Course Information

Web page: http://www.ida.liu.se/~tddb37
Examination: written
Lecture notes: available from the web page, at the latest 24 hours before the lecture.

Text book:
- George Coulouris, Jean Dollimore, Tim Kindberg: "Distributed Systems - Concepts and Design", Addison Wesley Publ. Comp., 4th edition, 2005.

Other titles can be used in addition:
- Andrew S. Tanenbaum, Maarten van Steen: "Distributed Systems", Prentice-Hall International, 2002.
- Mukesh Singhal, Niranjan G. Shivaratri: "Advanced Concepts in Operating Systems. Distributed, Database, and Multiprocessor Operating Systems", McGraw-Hill, 1994.

Course Information (cont'd)

Labs & Lessons:
- Traian Pop, Institutionen för Datavetenskap (IDA), email: trapo@ida.liu.se, http://www.ida.liu.se/~trapo, phone: 28 1970, B building, 3D:437
- Jakob Rosén, Institutionen för Datavetenskap (IDA), email: jakro@ida.liu.se, http://www.ida.liu.se/~jakro, phone: 28 40 46, B building, 329:224

DISTRIBUTED SYSTEMS: Basic Issues

1. What is a Distributed System?
2. Examples of Distributed Systems
3. Advantages and Disadvantages
4. Design Issues with Distributed Systems
5. Preliminary Course Topics

What is a Distributed System?

A distributed system is a collection of autonomous computers linked by a computer network that appear to the users of the system as a single computer.

Some comments:
- System architecture: the machines are autonomous; this means they are computers which, in principle, could work independently.
- The user's perception: the distributed system is perceived as a single system solving a certain problem (even though, in reality, we have several computers placed in different locations).

By running distributed system software, the computers are enabled to:
- coordinate their activities;
- share resources: hardware, software, data.

According to this definition, the Internet as such is not a distributed system, but an infrastructure on which to implement distributed applications/services (such as the World Wide Web).

Examples of Distributed Systems

Network of workstations

[Figure: personal workstations and file servers on a local area network, with a gateway to the wide area network.]

- Personal workstations + processors not assigned to specific users.
- Single file system, with all files accessible from all machines in the same way and using the same path name.
- For a certain command the system can look for the best place (workstation) to execute it.

Examples of Distributed Systems (cont'd)

Automatic banking (teller machine) system

[Figure: teller machines connected to Bank_1 and Bank_2; each bank has its data and a backup.]

- Primary requirements: security and reliability.
- Consistency of replicated data.
- Concurrent transactions (operations which involve accounts in different banks; simultaneous access from several users, etc.).
- Fault tolerance.

Examples of Distributed Systems (cont'd)

Automotive system (a distributed real-time system)

[Figure: in-vehicle nodes (CPU, RAM, cache, FPGA, network interface, input/output to sensors and actuators) for functions such as X-by-wire, anti-lock braking, adaptive cruise control, engine control and throttle control, connected through gateways to a safety-critical network, a non-safety-critical high-speed network, a non-safety-critical low-speed network and an entertainment network.]

Examples of Distributed Systems (cont'd)

Distributed Real-Time Systems
- Synchronization of physical clocks
- Scheduling with hard time constraints
- Real-time communication
- Fault tolerance

Why do we Need Them? Advantages of Distributed Systems

- Performance: very often a collection of processors can provide higher performance (and a better price/performance ratio) than a centralized computer.
- Distribution: many applications involve, by their nature, spatially separated machines (banking, commercial, automotive systems).
- Reliability (fault tolerance): if some of the machines crash, the system can survive.
- Incremental growth: as requirements on processing power grow, new machines can be added incrementally.
- Sharing of data/resources: shared data is essential to many applications (banking, computer-supported cooperative work, reservation systems); other resources can also be shared (e.g. expensive printers).
- Communication: facilitates human-to-human communication.

Disadvantages of Distributed Systems

- Difficulties of developing distributed software: what should operating systems, programming languages and applications look like?
- Networking problems: several problems are created by the network infrastructure and have to be dealt with: loss of messages, overloading, ...
- Security problems: sharing generates the problem of data security.

Design Issues with Distributed Systems

Design issues that arise specifically from the distributed nature of the application:
- Transparency
- Communication
- Performance & scalability
- Heterogeneity
- Openness
- Reliability & fault tolerance
- Security

Transparency

How to achieve the single system image? How to "fool" everyone into thinking that the collection of machines is a "simple" computer?

- Access transparency: local and remote resources are accessed using identical operations.
- Location transparency: users cannot tell where hardware and software resources (CPUs, files, databases) are located; the name of the resource shouldn't encode the location of the resource.
- Migration (mobility) transparency: resources should be free to move from one location to another without having their names changed.
- Replication transparency: the system is free to make additional copies of files and other resources (for the purpose of performance and/or reliability), without the users noticing. Example: there are several copies of a file; a given request accesses the copy that is closest to the client.

Transparency (cont'd)

- Concurrency transparency: the users will not notice the existence of other users in the system (even if they access the same resources).
- Failure transparency: applications should be able to complete their task despite failures occurring in certain components of the system.
- Performance transparency: load variation should not lead to performance degradation. This could be achieved by automatic reconfiguration in response to changes of the load; it is difficult to achieve.

Communication

Components of a distributed system have to communicate in order to interact. This implies support at two levels:

1. Networking infrastructure (interconnections & network software).
2. Appropriate communication primitives and models and their implementation:
   - communication primitives:
     - message passing: send, receive
     - remote procedure call (RPC)
   - communication models:
     - client-server communication: implies a message exchange between two processes: the process which requests a service and the one which provides it;
     - group multicast: the target of a message is a set of processes, which are members of a given group.

Performance and Scalability

Several factors influence the performance of a distributed system:
- The performance of individual workstations.
- The speed of the communication infrastructure.
- The extent to which reliability (fault tolerance) is provided (replication and preservation of coherence imply large overheads).
- Flexibility in workload allocation: for example, idle processors (workstations) could be allocated automatically to a user's task.

Scalability

The system should remain efficient even with a significant increase in the number of users and resources connected:
- the cost of adding resources should be reasonable;
- the performance loss with an increased number of users and resources should be controlled;
- software resources should not run out (number of bits allocated to addresses, number of entries in tables, etc.).
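The client-server model from the Communication slide (one process requests a service, another provides it, using send and receive) can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: a local socket pair stands in for the network, the "service" (upper-casing text) is invented for the example, and the names serve_once, request_service and demo are hypothetical.

```python
import socket
import threading


def serve_once(conn: socket.socket) -> None:
    """Server process: receive one request, send back one reply."""
    with conn:
        request = conn.recv(1024)   # receive primitive
        reply = request.upper()     # the (made-up) service: upper-case the text
        conn.sendall(reply)         # send primitive


def request_service(conn: socket.socket, payload: bytes) -> bytes:
    """Client process: send a request, then block until the reply arrives."""
    with conn:
        conn.sendall(payload)       # send primitive
        return conn.recv(1024)      # receive primitive


def demo() -> bytes:
    # A connected socket pair stands in for a real network link (assumption).
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=serve_once, args=(server_end,))
    server.start()
    reply = request_service(client_end, b"hello")
    server.join()
    return reply


if __name__ == "__main__":
    print(demo())
```

A remote procedure call hides exactly this exchange behind an ordinary-looking function call: request_service plays the role of a hand-written client stub.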

Heterogeneity

Distributed applications are typically heterogeneous:
- different hardware: mainframes, workstations, PCs, servers, etc.;
- different software: UNIX, MS Windows, IBM OS/2, real-time OSs, etc.;
- unconventional devices: teller machines, telephone switches, robots, manufacturing systems, etc.;
- diverse networks and protocols: Ethernet, FDDI, ATM, TCP/IP, Novell Netware, etc.

The solution: middleware, an additional software layer to mask heterogeneity.

Openness

One of the important features of distributed systems is openness and flexibility:
- every service is equally accessible to every client (local or remote);
- it is easy to implement, install and debug new services;
- users can write and install their own services.

Key aspects of openness:
- Standard interfaces and protocols (like Internet communication protocols)
- Support of heterogeneity (by adequate middleware, like CORBA)

Openness (cont'd)

Software architecture (layers, top to bottom):
- Applications & Services
- Middleware
- Operating System
- Hardware: Computer & Network

The operating system and the hardware together form "the platform".

Openness (cont'd)

The same, looking at two distributed nodes:

[Figure: Node 1 and Node 2 each have their own Applications & Services on top and their own platform below (platform 1 and platform 2, each consisting of Operating System and Hardware: Computer & Network); the Middleware layer sits between the applications and the two platforms.]
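One way a middleware layer can mask hardware heterogeneity is by converting data to an agreed external representation before it leaves a node. The sketch below is a simplified illustration of that idea, not CORBA's actual encoding: the wire format (a 32-bit integer followed by a 64-bit float, both big-endian) and the names WIRE_FORMAT, marshal and unmarshal are assumptions made for the example.

```python
import struct
from typing import Tuple

# Hypothetical wire format: 32-bit signed account id followed by a 64-bit
# float amount, always in network (big-endian) byte order, whatever the
# byte order of the host CPU.
WIRE_FORMAT = "!id"


def marshal(account_id: int, amount: float) -> bytes:
    """Convert host values into the agreed external representation."""
    return struct.pack(WIRE_FORMAT, account_id, amount)


def unmarshal(message: bytes) -> Tuple[int, float]:
    """Convert the external representation back into host values."""
    account_id, amount = struct.unpack(WIRE_FORMAT, message)
    return account_id, amount


if __name__ == "__main__":
    wire = marshal(42, 12.5)
    print(len(wire), unmarshal(wire))
```

Because both sides agree on the external format, a little-endian PC and a big-endian server can exchange the same 12-byte message without either application knowing about the other's hardware.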

Reliability and Fault Tolerance

One of the main goals of building distributed systems is improvement of reliability.

Availability: if machines go down, the system should work with the reduced amount of resources.
- There should be a very small number of critical resources. Critical resources are resources which have to be up in order for the distributed system to work.
- Key pieces of hardware and software (critical resources) should be replicated: if one of them fails, another one takes over (redundancy).

Data on the system must not be lost, and copies stored redundantly on different servers must be kept consistent.
- The more copies are kept, the better the availability, but keeping them consistent becomes more difficult.

Fault tolerance is a main issue related to reliability: the system has to detect faults and act in a reasonable way:
- mask the fault: continue to work with possibly reduced performance but without loss of data/information;
- fail gracefully: react to the fault in a predictable way and possibly stop functionality for a short period, but without loss of data/information.

Security

Security of information resources:
1. Confidentiality: protection against disclosure to unauthorised persons.
2. Integrity: protection against alteration and corruption.
3. Availability: keep the resource accessible.

Distributed systems should allow communication between programs/users/resources on different computers. This free access brings security risks. The appropriate use of resources by different users has to be guaranteed.
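The two reactions to a fault named on the Reliability and Fault Tolerance slide, masking the fault and failing gracefully, can be sketched for a replicated service. This is a minimal illustration under assumptions: replicas are modelled as plain functions, and the names ReplicaDown, call_with_failover, crashed and healthy are hypothetical.

```python
from typing import Callable, Sequence


class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request (assumed fault signal)."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in turn; the first successful reply masks earlier faults."""
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown:
            continue  # fault detected: another replica takes over
    # Every replica failed: fail in a predictable way instead of hanging.
    raise RuntimeError("no replica available")


def crashed(request: str) -> str:
    """A replica that is down."""
    raise ReplicaDown()


def healthy(request: str) -> str:
    """A replica that answers normally."""
    return "balance(" + request + ") = 100"


if __name__ == "__main__":
    print(call_with_failover([crashed, healthy], "acct-1"))
```

The caller sees a normal reply as long as one replica is up, which is the redundancy argument from the slide; the sketch deliberately ignores the harder problem of keeping the replicas' data consistent.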
Course Topics at a Glance

Basics
- Introduction
- Models of Distributed Systems
- Communication in Distributed Systems

Middleware
- Distributed Heterogeneous Applications and CORBA
- Peer-to-Peer Systems

Theoretical Aspects/Distributed Algorithms
- Time and State in Distributed Systems
- Distributed Mutual Exclusion
- Election and Agreement

Distributed Data and Fault Tolerance
- Replication
- Recovery and Fault Tolerance

Distributed Real-Time Systems

Course Topics

Introduction - just finished!

Communication in Distributed Systems
- Message passing and the client/server model
- Remote Procedure Call
- Group Communication

Distributed Heterogeneous Applications and CORBA
- Heterogeneity in distributed systems
- Middleware
- Objects in distributed systems
- The CORBA approach

Peer-to-Peer Systems
- Basic design issues
- The Napster file sharing system
- Peer-to-peer middleware

Time and State in Distributed Systems
- Time in distributed systems
- Logical clocks
- Vector clocks
- Causal ordering of messages
- Global states and state recording

Course Topics (cont'd)

Distributed Mutual Exclusion
- Mutual exclusion in distributed systems
- Non-token based algorithms
- Token based algorithms
- Distributed elections

Replication
- Motivation for replication
- Consistency and ordering
- Total and causal ordering
- Update protocols and voting

Recovery and Fault Tolerance
- Transaction recovery
- Checkpointing and recovery
- Fault tolerance in distributed systems
- Hardware and software redundancy
- Byzantine agreement

Distributed Real-Time Systems
- Physical Clocks
- Clock Synchronization
- Real-Time Scheduling
- Real-Time Communication