CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University



Similar documents
RARP: Reverse Address Resolution Protocol

Protocols and Architecture. Protocol Architecture.

Introduction to IP v6

Network Programming TDC 561

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

04 Internet Protocol (IP)

Computer Networks. Chapter 5 Transport Protocols

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

Communications and Computer Networks

IP Multicasting. Applications with multiple receivers

Operating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師

Definition. A Historical Example

Tomás P. de Miguel DIT-UPM. dit UPM

Implementing DHCPv6 on an IPv6 network

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

Datagram-based network layer: forwarding; routing. Additional function of VCbased network layer: call setup.

What is CSG150 about? Fundamentals of Computer Networking. Course Outline. Lecture 1 Outline. Guevara Noubir noubir@ccs.neu.

Chapter 14: Distributed Operating Systems

Internet Protocols Fall Lectures 7-8 Andreas Terzis

CS 457 Lecture 19 Global Internet - BGP. Fall 2011

Slide 1 Introduction cnds@napier 1 Lecture 6 (Network Layer)

Internet Protocol Address

Chapter 11. User Datagram Protocol (UDP)

Agenda. Distributed System Structures. Why Distributed Systems? Motivation

Module 15: Network Structures

Chapter 16: Distributed Operating Systems

Objectives of Lecture. Network Architecture. Protocols. Contents

IP Network Layer. Datagram ID FLAG Fragment Offset. IP Datagrams. IP Addresses. IP Addresses. CSCE 515: Computer Network Programming TCP/IP

Concepts and Mechanisms for Consistent Route Transitions in Software-defined Networks

Three Key Design Considerations of IP Video Surveillance Systems

Computer Network Foundation. Chun-Jen (James) Chung. Arizona State University

SFWR 4C03: Computer Networks & Computer Security Jan 3-7, Lecturer: Kartik Krishnan Lecture 1-3

Peer-to-Peer Networks. Chapter 6: P2P Content Distribution

IP Addressing A Simplified Tutorial

8.2 The Internet Protocol

Guide to Network Defense and Countermeasures Third Edition. Chapter 2 TCP/IP

Chapter 3. Internet Applications and Network Programming

Variable length subnetting

Internet Control Protocols Reading: Chapter 3

Neighbour Discovery in IPv6

CS268 Exam Solutions. 1) End-to-End (20 pts)

Module 2 Overview of Computer Networks

The necessity of multicast for IPTV streaming

PART OF THE PICTURE: The TCP/IP Communications Architecture

A Transport Protocol for Multimedia Wireless Sensor Networks

Per-Flow Queuing Allot's Approach to Bandwidth Management

IPv6 Associated Protocols

Proxy Server, Network Address Translator, Firewall. Proxy Server

How To. Instreamer to Exstreamer connection. Project Name: Document Type: Document Revision: Instreamer to Exstreamer connection. How To 1.

Internet Protocol (IP) IP - Network Layer. IP Routing. Advantages of Connectionless. CSCE 515: Computer Network Programming IP routing

Mobile IP Network Layer Lesson 02 TCP/IP Suite and IP Protocol

Network Level Multihoming and BGP Challenges

Internetworking. Problem: There is more than one network (heterogeneity & scale)

Review: Lecture 1 - Internet History

Computer Networks/DV2 Lab

Internet Protocols. Addressing & Services. Updated:

Network Layers. CSC358 - Introduction to Computer Networks

IP addressing and forwarding Network layer

Efficient Addressing. Outline. Addressing Subnetting Supernetting CS 640 1

HPAM: Hybrid Protocol for Application Level Multicast. Yeo Chai Kiat

[Prof. Rupesh G Vaishnav] Page 1

Network-Oriented Software Development. Course: CSc4360/CSc6360 Instructor: Dr. Beyah Sessions: M-W, 3:00 4:40pm Lecture 2

Indirection. science can be solved by adding another level of indirection" -- Butler Lampson. "Every problem in computer

IP and Mobility. Requirements to a Mobile IP. Terminology in Mobile IP

Ref: A. Leon Garcia and I. Widjaja, Communication Networks, 2 nd Ed. McGraw Hill, 2006 Latest update of this lecture was on

Lecture 8. IP Fundamentals

Course Overview: Learn the essential skills needed to set up, configure, support, and troubleshoot your TCP/IP-based network.

Chapter 17: Distributed Systems

Guide to TCP/IP, Third Edition. Chapter 3: Data Link and Network Layer TCP/IP Protocols

ECE 358: Computer Networks. Solutions to Homework #4. Chapter 4 - The Network Layer

Module 2: Assigning IP Addresses in a Multiple Subnet Network

Internet Protocol: IP packet headers. vendredi 18 octobre 13

12 Quality of Service (QoS)

Chapter 4. Distance Vector Routing Protocols

TCP Session Management (SesM) Protocol Specification

IP addressing. Interface: Connection between host, router and physical link. IP address: 32-bit identifier for host, router interface

Instructor Notes for Lab 3

(Refer Slide Time: 02:17)

VXLAN: Scaling Data Center Capacity. White Paper

Interconnecting IPv6 Domains Using Tunnels

Protocols. Packets. What's in an IP packet

ACHILLES CERTIFICATION. SIS Module SLS 1508

Objectives. The Role of Redundancy in a Switched Network. Layer 2 Loops. Broadcast Storms. More problems with Layer 2 loops

Ethernet. Ethernet. Network Devices

Lecture Computer Networks

Module 7. Routing and Congestion Control. Version 2 CSE IIT, Kharagpur

Web Service Robust GridFTP

Virtual PortChannels: Building Networks without Spanning Tree Protocol

CSE 3461 / 5461: Computer Networking & Internet Technologies

ΕΠΛ 674: Εργαστήριο 5 Firewalls

What is VLAN Routing?

Internet Packets. Forwarding Datagrams

Internet Protocol version 4 Part I

Transcription:

CS 555: DISTRIBUTED SYSTEMS [MESSAGING SYSTEMS] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Daisy chain MapReduce jobs? Multiple mappers or reducers in a row Mapper has produced intermediate results and then fails before data is fetched by reducer. What happens? If we run the same job multiple times are the combiner groups the same every time? Problem domains where MapReduce is not useful? Alternatives to Hadoop? If source code contains while(true) will Hadoop skip it? How do static variables work in Mapper/Reducer Can we configure skip bad records in MapReduce? org.apache.hadoop.mapred.skipbadrecords Setting number of mappers: conf.setnummaptasks(int num) Can we define a combiner that is different from the reducer? If mapper produces (k1, v1) can combiner produce (k2, v2)? L13.1 L13.2 Topics covered in this lecture Messaging Systems IP Multicast Indirect communications Group-based communications Several distributed systems built on top of the service offered by transport layer Sockets Message Passing Interface (MPI) L13.3 L13.4 Interfaces to the transport layer Socket primitives for TCP/IP Sockets interface Introduced in the 1970s in Berkley Unix X/Open Transport Interface (XTI) by AT&T Sockets and XTI are very similar Primitive Socket Bind Listen Accept Connect Send Receive Close Meaning Create a new communication endpoint Attach a local address to a socket Announce willingness to accept connections Block caller until a request arrives Actively attempt to establish a connection Send some data over the connection Receive some data over the connection Release the connection L13.5 L13.6 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.1

The communication pattern using sockets Message Passing Interface (MPI) Server socket bind listen accept read write close Synchronization point Communications socket connect write read close High performance computers need highly efficient communications Primitives must be: At the right level of abstraction Implementation must have minimal overhead Client L13.7 L13.8 Sockets were deemed inefficient Wrong level of abstraction Only simple {send, receive} primitives Performance issues Designed for network communications Uses general-purpose stacks Not suitable for proprietary protocols in HPC Result? Most HPC systems were shipped with proprietary communication libraries High-level and efficient communication primitives Of course they were all mutually incompatible Portability became an issue L13.9 L13.10 Need for hardware and platform independence led to a standard Message Passing Interface (MPI) Designed for parallel applications Tailored for transient communications Assumes process crashes/partitions are fatal No automatic recovery MPI assumes communications takes place within a group of processes Each group is assigned an identifier Each process within a group has an identifier (groupid, processid) pair Uniquely indentifies endpoints Instead of transport-level addresses L13.11 L13.12 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.2

Some of the message passing primitives in MPI MPI offers a very large number of communication primitives Primitive MPI_bsend MPI_send MPI_ssend Meaning Append outgoing message to local send buffer Send message and wait until it is copied to local or remote buffer Send message and wait until receipt starts MPI_sendrecv Send message and wait for reply MPI_isend Pass reference to outgoing message, and continue Official reason More possibilities to improve performance Pick and choose most suitable one Cynical view Committee could not make up its collective mind Threw in everything! MPI_issend Pass reference to outgoing message Wait until receipt starts MPI_recv Receive a message; block if there is none MPI_irecv Check if there is an incoming message; But do not block L13.13 L13.14 Multicast communications Uses datagram packets Sender sends packet to multicast address 224.0.0.0 to 239.255.255.255 Class D IPv6 Multicast addresses have the prefix ff00::/8 MULTICAST COMMUNICATIONS Bits 8 4 4 112 Field Prefix Flags Scape Group ID L13.15 L13.16 Multicast communications: Routing data Routers make sure packet delivered to all hosts in multicast group Choose points where streams are duplicated Pay attention to TTL for the datagrams Maximum number of routers a datagram is allowed to cross Multicast issues Packet size restrictions in Multicast Practical considerations Turned off at several institutions to curb free-riding L13.17 L13.18 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.3

Multicast is not particularly suitable in some situations Consumption patterns change dynamically n Groups cannot be pre-allocated Representing consumption profiles as groups n Enormous number of groups potentially 2 N for N consumers n Eliminating impossible groups would still require millions of groups APPLICATION LEVEL MULTICASTING L13.19 L13.20 Application-level multicasting Sending data to multiple receivers Multicasting Belonged in the domain of network protocols Set up communication paths No convergence on protocols ISPs are reluctant to support this Construction of the overlay network Nodes organize into a tree Unique path between every pair of nodes Organize nodes into a mesh network Every node has multiple neighbors Multiple paths exist between pairs of nodes More robust to failures L13.21 L13.22 Use of rendezvous nodes New node contacts rendezvous node to get a partial list of nodes New node uses this list to determine which nodes to connect to Metrics used in decision making n Load, delays, bandwidths, etc INDIRECT COMMUNICATIONS L13.23 L13.24 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.4

Indirect communications Communications between entities in a distributed system through an intermediary Senders and receivers have no direct coupling Direct coupling E.g. TCP communications between 2 endpoints Leads to a rigidity in terms of dealing with change Example A client-server system Difficult to replace a server with another one that offers equivalent functionality L13.25 L13.26 Key properties of using an intermediary for communications Space uncoupling Time uncoupling Space uncoupling Sender does not know (or need to know) the identity of receivers And vice versa Developer has a degree of freedom in dealing with change Participants (senders or receivers) can be replaced, updated, replicated or migrated L13.27 L13.28 Time uncoupling Sender and receiver(s) can have independent lifetimes Do not need to exist at the same time to communicate Useful in volatile environments Where senders and receivers join and leave constantly Indirect communications: Primary uses In distributed systems where change is anticipated E.g. mobile environments Event disseminations Situations where receivers are unknown and may be liable to change L13.29 L13.30 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.5

Indirect communications: Disadvantages Performance overhead due to the added level of indirection Systems developed using this concept are more difficult to manage Precisely because of the lack of coupling Indirection may imply decoupling in space and time. but not always the case (1/2) Space-uncoupled, time-coupled IP multicast Space uncoupled n Messages directed to a group, not a specific receiver Time coupled n All receivers must exist at the time of the message-send to L13.31 L13.32 Indirection may imply decoupling in space and time. but not always the case (2/2) Space and time coupling in distributed systems Space-coupled but time-uncoupled Sender knows the identity of specific receiver or a set of receivers Receiver(s) may not exist at the time of sending Example: E-mail Space coupling Space uncoupling Time-coupled Communication directed towards a given receiver(s) Receiver(s) must exist at that moment in time E.g.: Remote invocation Sender does not need to know the identity of the receiver(s) Receiver(s) must exist at that moment in time E.g.: IP Multicast Time-uncoupled Communication directed towards a given receiver(s) Sender and Receiver(s) can have independent lifetimes E.g.: E-mail Sender does not need to know the identity of the receiver(s) Sender and Receiver(s) can have independent lifetimes E.g.: publish/subscribe L13.33 L13.34 Time-coupling and asynchronous communications Asynchronous communications Sender sends message and continues without blocking Time uncoupling Extra dimension where sender and receiver can have independent existence GROUP COMMUNICATIONS L13.35 L13.36 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.6

Group communications Service where message is sent to a group Message is then delivered to all members of the group Sender is not aware of the identities of the receivers Group communications May be implemented over IP Multicast Overlay network Adds extra value in terms of Managing group membership, detecting failures and providing reliability, and ordering guarantees Group communication is to IP Multicast What TCP is to point-to-point service in IP L13.37 L13.38 The programming model Central concept is a group This is associated with group membership Processes may join or leave the group Process sends message to this group Propagated to all members with guarantees relating to reliability and ordering Communications: All members in a group: broadcast Communication with a single member: unicast Key feature of group communications Process issues only one operation to send the message to the group E.g. agroup.send(amessage) It does not issue multiple send operations to individual processes This has much more than convenience implications for the programmer L13.39 L13.40 Advantage of a single group-send over multiple individual-send operations Enables efficient utilization of bandwidth Take steps to send the message no more than once over a communication link n For e.g., send it over a distribution tree n Use network hardware support for multicast when available Minimize total time taken to deliver message to all destinations As compared to transmitting separately and serially This is also important in terms of delivery guarantees If a process performs multiple, independent send operations? No way to provide guarantees that affect the group as a whole If sender fails halfway through sending? n Some members receive while others don t Relative ordering of two messages delivered to any two group members is also undefined L13.41 L13.42 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.7

As opposed to multiple, independent individual messages Group communications can provide a range of guarantees for: Reliability Ordering Process groups Closed If only members of the group may multicast to it Open If processes outside the group may send messages to it Overlapping Entities within a group may be members of multiple groups Non-overlapping groups: memberships do not overlap L13.43 L13.44 Reliability in one-to-one communications? PROCESS GROUPS IMPLEMENTATION ISSUES Integrity Message received is the same as the one that was sent Validity Any outgoing message is eventually delivered Duplicate elimination Exactly one copy of the message is delivered L13.45 L13.46 Group communications Extends semantics to cover delivery to multiple receivers Adds an an additional property to the reliable delivery requirements Agreement If message is delivered to one process, it is delivered to all processes in the group Relative ordering of messages delivered to multiple destinations FIFO ordering Preserve order from the perspective of a sending process Causal ordering Based on the happens before relationship Total ordering Same order will be preserved at all processes L13.47 L13.48 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.8

Group membership management [1/4] Provide interface for group membership changes Interfaces to: Create and destroy process groups Add or withdraw process from a group Group membership management [2/4] Failure detection Monitor group members Not just for crash failures at the nodes But also for unreachability due to communication failures Relies on failures detectors to reach decisions about group membership n Detector marks processes as Suspected or Unsuspected L13.49 L13.50 Group membership management [3/4] Notify members about membership changes When a process Is added Excluded n Due to failure detection n Deliberate withdrawal Group membership management [4/4] Perform group address expansion When process multicasts, it supplies group identifier rather than a list of processes Membership service expands identifier into the current membership for delivery L13.51 L13.52 IP Multicast WHY IP MULTICAST IS A WEAK CASE OF GROUP MEMBERSHIP SERVICE Allows processes to join or leave groups dynamically Performs address expansion Senders only specify the IP multicast address as the destination of the message But it does not: Provide members with information abut membership Delivery is not coordinated with membership changes Achieving these properties is referred to as viewsynchronous group communication L13.53 L13.54 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.9

The contents of this slide set are based on the following references Distributed Systems: Concepts and Design. George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair. 5th Edition. Addison Wesley. ISBN: 978-0132143011. [Chapter 6] Distributed Systems: Principles and Paradigms. Andrew S. Tanenbaum and Maarten Van Steen. 2nd Edition. Prentice Hall. ISBN: 0132392275/978-0132392273. [Chapter 4] L13.55 SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.10