Synchronization in. Distributed Systems. Cooperation and Coordination in. Distributed Systems. Kinds of Synchronization.



Similar documents
Distributed Synchronization

3.7. Clock Synch hronisation

Homework 1 (Time, Synchronization and Global State) Points

Clock Synchronization

Clock Synchronization

Distributed Systems Theory 6. Clock synchronization - logical vs. physical clocks. November 9, 2009

Chapter 6: distributed systems

Clock Synchronization

LANs. Local Area Networks. via the Media Access Control (MAC) SubLayer. Networks: Local Area Networks

Table of Contents. Cisco Network Time Protocol: Best Practices White Paper

Time Calibrator Fountain Computer Products

Dr Markus Hagenbuchner CSCI319. Distributed Systems

ECE 358: Computer Networks. Homework #3. Chapter 5 and 6 Review Questions 1

The Role of Precise Timing in High-Speed, Low-Latency Trading

Stop And Wait. ACK received; transmit frame 2 CS 455 3

Wide Area Monitoring, Control, and Protection

Time Synchronization in IP Networks J.M.Suri, DDG(I), TEC B.K.Nath, Dir(I), TEC

Precision Time Protocol (PTP/IEEE-1588)

RARP: Reverse Address Resolution Protocol

Monitoring the NTP Server. eg Enterprise v6.0

Distributed Databases

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

IEEE-1588 Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems

Computer Time Synchronization

Choosing the correct Time Synchronization Protocol and incorporating the 1756-TIME module into your Application

Time Synchronization & Timekeeping

Introduction to Distributed Systems

STANDPOINT FOR QUALITY-OF-SERVICE MEASUREMENT

Network Layer: Network Layer and IP Protocol

Redes de Comunicação em Ambientes Industriais Aula 7

Data Link Layer(1) Principal service: Transferring data from the network layer of the source machine to the one of the destination machine

Ethernet Port Quick Start Manual

Local Area Networks transmission system private speedy and secure kilometres shared transmission medium hardware & software

Distributed Software Systems

Network Time Management Configuration. Content CHAPTER 1 SNTP CONFIGURATION CHAPTER 2 NTP FUNCTION CONFIGURATION

Final Exam. Route Computation: One reason why link state routing is preferable to distance vector style routing.

Accordingly, it is useful to explore alternative access strategies using simpler software appropriate for less stringent accuracy expectations.

Computer Time Synchronization

A Visualization System and Monitoring Tool to Measure Concurrency in MPICH Programs

Transport Layer Protocols

Clocks/timers, Time, and GPS

Real Time Programming: Concepts

Clocking. Clocking in the Digital Network

IP - The Internet Protocol

Based on Computer Networking, 4 th Edition by Kurose and Ross

Ethernet. Ethernet. Network Devices

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Pessimistic Timestamp Ordering. Write Operations and Timestamps

Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages

OSPF Routing Protocol

Variable length subnetting

HD Radio FM Transmission System Specifications Rev. F August 24, 2011

Professor: Ian Foster TAs: Xuehai Zhang, Yong Zhao. Winter Quarter.

CONTROL MICROSYSTEMS DNP3. User and Reference Manual

Lecture 15. IP address space managed by Internet Assigned Numbers Authority (IANA)

Principles and characteristics of distributed systems and environments

Middleware and Distributed Systems. System Models. Dr. Martin v. Löwis. Freitag, 14. Oktober 11

Route Discovery Protocols

The basic mode for adjusting a time zone clock are primarily: 21, 24 and 51-1 (51-1 is for Alpha Characters) Entering Mode Programming

Clearing the Way for VoIP

Metrics One Way. Two way. Availability Other metrics (R, BW,..?) latency packet loss Jitter delay variation. extensions of one-way metrics

TI GPS PPS Timing Application Note

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (I)

Dynamic Source Routing in Ad Hoc Wireless Networks

GPS NTP Time Server for Intranet Networks DIN RAIL Version

Module 5. Broadcast Communication Networks. Version 2 CSE IIT, Kharagpur

MN-700 Base Station Configuration Guide

Final for ECE374 05/06/13 Solution!!

Managing Users and Identity Stores

Best Practices for Leap Second Event Occurring on 30 June 2015

ZooKeeper. Table of contents

Delivering NIST Time to Financial Markets Via Common-View GPS Measurements

Database Replication with Oracle 11g and MS SQL Server 2008

LAN Switching Computer Networking. Switched Network Advantages. Hubs (more) Hubs. Bridges/Switches, , PPP. Interconnecting LANs

Precision Time Protocol on Linux ~ Introduction to linuxptp

CS 457 Lecture 19 Global Internet - BGP. Fall 2011

RESILIENT NETWORK DESIGN

Transport layer protocols. Message destination: Socket +Port. Asynchronous vs. Synchronous. Operations of Request-Reply. Sockets

Chapter 9: Distributed Mutual Exclusion Algorithms

Operating Systems and Networks Sample Solution 1

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Westek Technology Snapshot and HA iscsi Replication Suite

Wireless LAN Concepts

IP SLAs Overview. Finding Feature Information. Information About IP SLAs. IP SLAs Technology Overview

TIMING SIGNALS, IRIG-B AND PULSES

Integrating Network Attached Mass Storage Systems into Educational Networks: Performance and Security Issues

Microsoft SQL Server Data Replication Techniques

This section will focus on basic operation of the interface including pan/tilt, video, audio, etc.

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

04 Internet Protocol (IP)

PHASOR MEASUREMENT UNIT (PMU) AKANKSHA PACHPINDE

CCNA R&S: Introduction to Networks. Chapter 5: Ethernet

Using Network Time Protocol (NTP): Introduction and Recommended Practices

Lehrstuhl für Informatik 4 Kommunikation und verteilte Systeme. Auxiliary Protocols

Internet Timekeeping Around the Globe 1,2

Construct User Guide

Network Time Protocol: Best Practices White Paper

How To Monitor And Test An Ethernet Network On A Computer Or Network Card

Implementing Storage Concentrator FailOver Clusters

Transcription:

Cooperation and Coordination in Distributed Systems Communication Mechanisms for the communication between processes Naming for searching communication partners Synchronization in Distributed Systems But... not enough for cooperation: Simultaneous access to a Synchronization shared resource Ordering of events Deadlock avoidance Agreement on actions Consistency in transaction processing Groups of replicated objects More complicated problems than in central systems! Page 1 1 Page Kinds of Synchronization The Role of Synchronization based on actual (absolute) time Synchronization by relative ordering of events Distributed global states Using a coordinator: election mechanisms Mutual exclusion for protection against multiple access Distributed transactions A distributed system consists of a number of processes Each process has a state (values of variables) Each process takes actions to change its state, or to communicate with other processes (send, receive) An event is the occurrence of an action Events within a process can be ordered by the time of occurrence In distributed systems, also the time order of events on different machines and between different processes has to be known Needed: concept of global time, i.e. local clocks of machines have to be synchronized Page 3 3 Page 4 4

Clock Synchronization Clocks in distributed systems are independent Some (or even all) clocks are inaccurate When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time. How to determine the right sequence of events? Clocks Necessary for synchronization: assign a timestamp with each event But... how to determine the own resp. all other times in the system? Example Compiler synchronization is needed considering the absolute time on all machines: Network How can we - synchronize clocks with real world? - synchronize clocks with each other? Page 5 5 Skew: the difference between the times on two clocks (at any instant) Computer clocks are subject to clock drift (they count time at different speeds) Clock drift rate: the difference per unit of time from some ideal reference clock Ordinary quartz clocks drift by about 1 sec in 11-1 days. (1-6 secs/sec). High precision quartz clocks drift rate is about 1-7 or 1-8 secs/sec Page 6 6 Universal Coordinated (UTC) Clock Synchronization Algorithms International Atomic is based on very accurate atomic clocks (drift rate 1-13 ). Problem: Atomic day is 3 msec shorter than a solar day UTC is an international standard for time keeping solving this problem It is based on atomic time, but occasionally adjusted to astronomical time: when the difference to the solar time grows up 8 msec, an additional leap second is inserted It is broadcasted from radio stations on land and satellite (e.g. GPS) Computers with receivers can synchronise their clocks with these timing signals (But: only a small fraction of all computers have such receivers!) Problem with received UTC: propagation delay has to be considered Signals from land-based stations are accurate to about.1-1 milliseconds Signals from GPS are accurate to about 1 microsecond Universal Coordinated (as reference time): t Clock time on machine p: C p (t) Perfect world: C p (t) = t, i.e. dc/dt = 1 Reality: there is a clock drift so that a maximum drift rate can be specified: ρ : 1 - ρ dc/dt 1 + ρ Needed for synchronization: definition of a tolerable skew, the maximum time drift δ With this, re-synchronization has to be made in certain intervals: all δ/ρ seconds How to make such a re-synchronization? Page 7 7 Page 8 8

Cristian's Algorithm There is one central time server T with a UTC receiver All other machines M are contacting the time server at least all δ/ρ seconds T responds as fast as it can M computes current time: Hold time t send for sending the request Measure time when response with t UTC arrives (t receive ) Subtract service time t response of T Divide by two to consider only the time since the reply was sent Add 'delivery time' to the time t UTC sent by T Result t synchronous becomes new system time M M time? Page 9 9 t UTC T t send t UTC t receive } t response Both values are measured with the same clock t receive t send t response t synchronous = t UTC + Consider message run-time, avoid M's time to be moved back The Berkeley Algorithm Another approach (Berkeley Unix): active time server logical synchronization 1. time server sends its time to all machines. the machines answer with their current deviation from the time server 3. the time server sums up all deviations and divides by the number of machines (including itself!) 4. the new time for each machine is given by the mean time Important: fast clocks are not moved back, but instructed to move slower 1:8 1 T 1:8 1:8 1:8 M 1 1: M 1 1: Page 1 1 M M 1:3 M 3 1:6 1:3 1:8 3 T d = -1 M 3 1:6 d=-6 M 1 1: M 1 1:7 1:8 d= T d=- 1:3 1:8, s.d. 4 T +5 +1-5 d=4 M M M 3 1:6 M 3 1:7 1:3, slow down Distributed Algorithms Problem with Cristian/Berkeley: use of a centralized server; mainly used in Intranets Simple mechanism for decentralized synchronization (based on Berkeley Algorithm): Divide time into fixed-length synchronization intervals At the beginning of each interval all machines Broadcast their current time Collect all values of other machines arriving in a given time span Compute the new time - by simply averaging all answers, or - by discarding the m highest and the m lowest answers before averaging (to protect against faulty clocks), or - by averaging values corrected by an estimation of their propagation time.... but: in large-scale networks, the broadcasting could become a problem widely used algorithm in the Internet: Network Protocol (NTP) Page 11 11 Network Protocol (NTP) NTP is a time service designed for the Internet Reliability by using redundant paths Scalable to large number of clients and servers Authenticates time sources to protect against wrong time data NTP is provided by a network of time servers distributed across the Internet Hierarchical structure: synchronization subnet tree Primary servers are connected to UTC sources Secondary servers are synchronized to primary servers More accurate time 3 (Synchronization subnet ) 1 3 3 Lowest level servers in users computers, synchronised to secondary servers Page 1 1 Note: this is only an example, there can be more than three layers

NTP - Synchronization of Servers The synchronization subnet can reconfigure if failures occur, e.g. a primary that loses its UTC source can become a secondary a secondary that loses its primary can use another primary Modes of synchronization: Multicast A server within a LAN multicasts time to others which set clocks assuming some delay (not very accurate) Procedure call A server accepts requests from other computers (like in Cristiain s algorithm). Higher accuracy than using multicast (and a solution if no multicast is supported) Symmetric Pairs of servers exchange messages containing time information Used when very high accuracy is needed (e.g. for higher levels) All modes use UDP to transfer time data Messages exchanged between a Pair of NTP Peers UTC is sent in messages between the servers Each message contains timestamps of recent events, e.g. for message m : Local times of Send (T i-3 ) and Receive (T i- ) of previous message m Local time of Send (T i-1 ) of current message m Recipient of m notes the time of receipt T i ( it then knows T i-3, T i-, T i-1, T i ) In symmetric mode there can be a non-negligible delay between messages Machine B Machine A T i- 3 T i- m T i-1 m' T i Page 13 13 Page 14 14 Accuracy of NTP Accuracy of NTP Server B T i- Server A T i- 3 T i For each pair of messages between two servers, NTP estimates an offset o i between the two clocks and a delay d i (total time for transmitting the two messages, which take t and t ). You have: T i- = T i-3 + t + o and T i = T i-1 + t o for the current offset o between A and B This gives us (by adding the equations) : d i = t + t = T i- -T i-3 + T i - T i-1 Also (by subtracting the equations) o = o i + (t -t )/ where o i = (T i- -T i-3 - T i + T i-1 )/ m Page 15 15 T i-1 m' Server B Server A T i- 3 T i- Using the fact that t, t > it can be shown that o i - d i / o o i + d i /. m Thus o i is an estimation of the offset and d i is a measure of the accuracy NTP servers filter pairs <o i, d i >, estimating reliability of time servers from variations in pairs and accuracy of estimations by low delays d i, allowing them to select peers Accuracy of 1s of milliseconds over Internet paths, 1 millisecond on LANs Page 16 16 T i-1 m' T i

Lamport stamps The absolute time is not needed in any case. Often enough: ordering of events only with respect to logical clocks Relation: happens-before: a b means that a happens before b (Meaning: all processes agree that event a happens before event b) 1. a b is true, when both events occur in the same process. a b is true, if one process is sending a message (event a) and another process is receiving this message (event b) 3. is transitive 4. neither a b nor b a is true, if they occur in two processes which do not exchange messages (Concurrent Processes/Events, notation: a b) Needed: assign a (time) value C(a) to an event a on which all processes agree, with C(a) < C(b) if a b Lamport's Algorithm Page 17 17 Lamport's Algorithm Process 1 Process Process 3 6 1 18 4 3 36 4 48 54 6 A D 8 16 4 3 4 48 56 64 7 8 B C 1 3 4 5 6 7 8 9 1 Unsynchronized clocks: messages C and D arrive before they are sent Solution using the 'happens before' relation: Process 1 Process Process 3 6 1 18 4 3 36 4 48 7 76 A(6) D (69) Page 18 18 8 16 4 3 4 48 61 69 77 86 B (4) C (6) 1 3 4 5 6 7 8 9 1 initialize all clocks with sending local time with each message arriving before sending violates the 'happens before' relation. In this case, forward the clock of the receiver to the next higher value Addition: for all events a and b holds C(a) C(b). This can be achieved by attaching the local process numbers to the local time (eg. 8561453.13) Application of Lamport stamps Replicated database: updates have to be performed in a certain order Enhancement: Vector stamps Problem with Lamport timestamps: they do not capture causality Using vector timestamps Definition: A vector timestamp VT(a) for event a is in relation VT(a) < VT(b) to event b, if a is known to causally precede b. VT is constructed by each process P i as a vector V i with: 1. V i [i] is the number of events that have occurred so far at P i Required: totally-ordered multicast Using Lamport's stamps: Each message is time stamped with the current (logical) time of the sender The messages are sent to all receivers (and to the sender itself!) Received messages are ordered by their timestamps Receivers multicast acknowledgements Only after receiving acknowledgements from all receivers, the message with the lowest timestamp is read by the processes Page 19 19. If V i [j] = k then P i knows that k events have occurred at P j When P i sends a message m, then it sends along its current V i This timestamp vector tells the receiver P j how many events in other processes have preceded m P j adjusts its own vector for each k to V j [k] = max{v j [k], V i [k]} (These entries reflect the number of messages that P j has to receive to have at least seen the same messages that preceded the sending of m) Add 1 to entry V j [j] for the event of receiving m Page

Vector stamps - Example Vector stamps - Example Vector clock V i at process p i is an array of N integers initially V i [j] = for i, j = 1,, N Before p i timestamps an event it sets V i [i] := V i [i] +1 p i piggybacks V i on every message it sends When p j receives (m, V i ) it sets V j [j] = V j [j] +1 for the receiving event and afterwardsv j [k] := max(v i [k], V j [k]) k = 1,, N p 1 (1,,) a (,,) b m 1 Host 1 Host Host 3 Host 4,,,,,,,,,,,, Physical 1,,,,,, (1,,,) 1,,, 1,1,, (,,,) (1,,,),,,,,1,,,3, (,,,),,,1,,, 3,,, (,,,) 4,,, (4,,,) 4,,4, p p 3 (,,1) e (,1,) (,,) c d m f (,,) Physical time n,m,p,q Vector logical clock (vector timestamp) Message Page 1 1 Page Causality Violation Vector timestamps can be used for detecting causality violations: P1 P P3 P has obj1 1 Physical 1 4 3 4 5 Include(obj1) 6 obj1.method() Causality violation occurs when the order of messages causes an action based on information that another host has not yet received. In designing a distributed system, potential for causality violation is important Detecting Causality Violation P1 P P3,,,,,, 1,, Physical,, (,,) (1,,),,1,1, (,,),,,, Violation: (1,,) < (,1,) Potential causality violation can be detected by vector timestamps. If the vector timestamp of a message is less than the local vector timestamp, on arrival, there is a potential causality violation. Page 3 3 Page 4 4

Global State Often required: not only ordering of events, but global state of a distributed system Global state = local state of each process + messages currently in transit Examples: a. Garbage collection object reference p 1 message o p garbage object Object o seems to be garbage, but it has sent a message containing a reference to it Global State Problem with getting a global state: there is no global time! To do: get a global state from lots of local states recorded at different real times Graphically for global state: cut b. Deadlock c. Termination p 1 p 1 passive wait-for wait-for activate Page 5 5 p p passive Both processes are waiting for a message from the other process Both processes are passive and seem to be terminated, but in fact there is a message sent by p to activate p 1 Allows sent messages Consistent cut Allows no received-butnot-sent messages A global state is consistent if it corresponds to a consistent cut Inconsistent cut Page 6 6 Distributed Snapshot Chandy/Lamport: distributed snapshot (reflects a consistent global state) Assumptions: No process or communication failures occur, all messages arrive intact, exactly once Communication channels are unidirectional and FIFO-ordered There is a communication path between any two processes Any process may initiate the snapshot (sends Marker) Snapshot does not interfere with normal execution Each process records its state and the state of its incoming channels (no central collection) Distributed Snapshot Taking a snapshot: Any process P can initialize the computation by recording the local state P sends a marker to each process to which he has a communication channel Q receives marker First marker received record local state and send a marker on each outgoing channel All other markers: record all incoming messages for each channel One marker for each incoming channel received: stop recording and send results to P Local state is recorded, send markers Record all messages received after recording Local state and messages are recorded Page 7 7 Page 8 8

Snapshot Algorithm of Chandy/Lamport Marker receiving rule for process p i On p i s receipt of a marker message over channel c: if (p i has not yet recorded its state) it records its process state now; records the state of c as the empty set; turns on recording of messages arriving over other incoming channels; else p i records the state of c as the set of messages it has received over c since it saved its state. end if Marker sending rule for process p i After p i has recorded its state, for each outgoing channel c: p i sends one marker message over c (before it sends any other message over c). Page 9 9