Distributed Applications - Verteilte Anwendungen
Johann Schlichter
Institut für Informatik, TU München, Munich, Germany
April 2013
Lecture Notes (Student Script, generated by Targeteam; not for general distribution)
Contents
1 Overview
  1.1 Lecture Content
  1.2 Bibliography
    1.2.1 Course Text Books
    1.2.2 Further Reading
  1.3 Abbreviations
2 Introduction
  2.1 Issues
  2.2 Background
    2.2.1 Development of computer technology
    2.2.2 Internet computing
    2.2.3 Enterprise Computing
  2.3 Key Characteristics of distributed Systems
    2.3.1 Motivation
    2.3.2 Properties of distributed systems
    2.3.3 Challenges of distributed systems
    2.3.4 Examples for development frameworks
  2.4 Distributed application
    2.4.1 Definition
    2.4.2 Programmer's perspective
    2.4.3 Distributed application vs. parallel program
  2.5 Influential distributed systems
    2.5.1 Mach
    2.5.2 Sun Network File System (NFS)
    2.5.3 Java 2 Platform Enterprise Edition (J2EE)
    2.5.4 Google
3 Architecture of distributed systems
  3.1 Issues
  3.2 System Models
    3.2.1 Architectural model
    3.2.2 Interaction model
    3.2.3 Failure model
    3.2.4 Security model
  3.3 Transparency
    3.3.1 Location transparency
    3.3.2 Access transparency
    3.3.3 Replication transparency
    3.3.4 Migration transparency
    3.3.5 Language transparency
    3.3.6 Other transparencies
    3.3.7 Goal for distributed applications
  3.4 Paradigms for distributed applications
    3.4.1 Information Sharing
    3.4.2 Message exchange
    3.4.3 Naming entities
    3.4.4 Bidirectional communication
    3.4.5 Producer-consumer interaction
    3.4.6 Client-server model
    3.4.7 Peer-to-peer model
    3.4.8 Group model
    3.4.9 Taxonomy of communication
    3.4.10 Levels of Abstraction
  3.5 Client-server model
    3.5.1 Terms and definitions
    3.5.2 Concepts for client-server applications
    3.5.3 Processing of service requests
    3.5.4 File service
    3.5.5 Time service
    3.5.6 Name service
    3.5.7 LDAP - Lightweight Directory Access Protocol
    3.5.8 Failure tolerant services
4 Remote Invocation (RPC/RMI)
  4.1 Issues
  4.2 Introduction
    4.2.1 Local vs. remote procedure call
    4.2.2 Definition
    4.2.3 RPC properties
  4.3 Distributed applications based on RPC
    4.3.1 Distributed application
    4.3.2 RPC language
    4.3.3 Phases of RPC based distributed applications
  4.4 Remote Method Invocation (RMI)
    4.4.1 Definitions
    4.4.2 RMI characteristics
    4.4.3 RMI architecture
    4.4.4 Locating remote objects
    4.4.5 Developing RMI applications
    4.4.6 Parameter Passing in RMI
    4.4.7 Distributed garbage collection
  4.5 Servlets
    4.5.1 Servlet Properties
    4.5.2 Servlet Lifecycle
    4.5.3 HttpServlet Interface
    4.5.4 Structure of a Servlet
5 Basic mechanisms for distributed applications
  5.1 Issues
  5.2 External data representation
    5.2.1 Marshalling and unmarshalling
    5.2.2 Centralized transformation
    5.2.3 Decentralized transformation
    5.2.4 Common external data representation
    5.2.5 XML as common data representation
  5.3 Time
    5.3.1 Introduction
    5.3.2 Synchronizing physical clocks
  5.4 Distributed execution model
    5.4.1 Events
    5.4.2 Ordering by logical clocks
    5.4.3 Logical clocks based on scalar values
    5.4.4 Logical clocks based on vectors
  5.5 Failure handling in distributed applications
    5.5.1 Motivation
    5.5.2 Steps for testing a distributed application
    5.5.3 Debugging of distributed applications
    5.5.4 Approaches of distributed debugging
  5.6 Distributed transactions
    5.6.1 General observations
    5.6.2 Isolation
    5.6.3 Atomicity and persistence
    5.6.4 Two-phase commit protocol (2PC)
    5.6.5 Distributed Deadlock
  5.7 Group communication
    5.7.1 Introduction
    5.7.2 Groups of components
    5.7.3 Management of groups
    5.7.4 Message dissemination
    5.7.5 Message delivery
    5.7.6 Taxonomy of multicast
    5.7.7 Group communication in ISIS
    5.7.8 JGroups
  5.8 Distributed Consensus
  5.9 Authentication service Kerberos
    5.9.1 Introduction
    5.9.2 Authentication process scenario
6 Web Services
  6.1 Motivation - Example
  6.2 Service Oriented Architecture - SOA
    6.2.1 Characteristics
    6.2.2 Layered Approach
    6.2.3 Adopting Service Oriented Architecture (SOA)
  6.3 Web Services - Characteristics
    6.3.1 Informal Definition
    6.3.2 Integration
    6.3.3 Features of Web Services
    6.3.4 Potential of Web Services
    6.3.5 Web Services - Distributed Objects
  6.4 Web Services Architecture
    6.4.1 Web Services interoperability Stack
    6.4.2 Basic Architecture
    6.4.3 Roles
    6.4.4 Operations of the Web Service Architecture
    6.4.5 Basic Standard Technologies
    6.4.6 Message Exchange Patterns
  6.5 Simple Object Access Protocol (SOAP)
    6.5.1 Parts of SOAP
    6.5.2 Exchange Model
    6.5.3 Using SOAP in HTTP
    6.5.4 SOAP RPC Conventions
    6.5.5 Minimalist Infrastructure for Web Services
    6.5.6 SOAP-Router
  6.6 Web Services Description Language (WSDL)
    6.6.1 WSDL Information Model
    6.6.2 Example for SOAP Request/Response
    6.6.3 Generating code from WSDL
    6.6.4 Common bad Practices
  6.7 Universal Description, Discovery, and Integration (UDDI)
  6.8 REST
  6.9 Web Service Composition
    6.9.1 Dimensions to handle complexity
    6.9.2 Web Service Orchestration
  6.10 Adopting Web Services
    6.10.1 Example Web Services
    6.10.2 Apache Axis
    6.10.3 Web Services and Java
    6.10.4 Integration and WS Standards
    6.10.5 Supporting - Restraining Forces
    6.10.6 Distributed Process Architecture
    6.10.7 Semantic Web Services
  6.11 Mashups
    6.11.1 Mashup Techniques
    6.11.2 Development Support
7 Design of distributed applications
  7.1 Issues
  7.2 Steps in the design of distributed applications
  7.3 Design - Development environment
  7.4 Service-Oriented Modeling
    7.4.1 Service Evolution
    7.4.2 Life Cycle Structure
    7.4.3 Life Cycle Modeling
    7.4.4 SOM Framework
    7.4.5 Other SOA Design Methodologies
8 Distributed file service
  8.1 Issues
  8.2 Introduction
    8.2.1 Definitions
    8.2.2 Motivation for replicated files
    8.2.3 Two consistency types
    8.2.4 Replica placement
  8.3 Layers of a distributed file service
    8.3.1 Layer semantics
  8.4 Update of replicated files
    8.4.1 Optimistic concurrency control
    8.4.2 Pessimistic concurrency control
    8.4.3 Voting schemes
  8.5 Coda file system
    8.5.1 Architecture
    8.5.2 Naming
    8.5.3 Replication strategy
    8.5.4 Disconnected operation
9 Distributed Shared Memory
  9.1 Introduction
  9.2 Programming model
  9.3 Consistency model
  9.4 Tuple space
    9.4.1 Atomic operations
    9.4.2 Tuple space implementation
    9.4.3 Example for client-server communication
  9.5 Object Space
    9.5.1 Introduction
    9.5.2 Features of JavaSpaces
    9.5.3 Data structures
    9.5.4 Basic operations
    9.5.5 Events
    9.5.6 Example Java Spaces
10 Object-based Distributed Systems
  10.1 Object Management Architecture - OMA
  10.2 Object Request Brokers ORB
    10.2.1 General features
    10.2.2 Structure of ORB
  10.3 Common object services
  10.4 Inter-ORB protocol
    10.4.1 GIOP Features
    10.4.2 External data representation
    10.4.3 Object reference
    10.4.4 GIOP message
    10.4.5 Example for IIOP use
    10.4.6 RMI over IIOP
  10.5 Distributed COM
    10.5.1 Object Model
    10.5.2 Architecture
    10.5.3 Object Invocation Model
  10.6 .NET Framework
    10.6.1 Common Language Runtime (CLR)
    10.6.2 Frame Class Library
    10.6.3 .NET-Remoting
11 Summary
Prof. J. Schlichter
Chair of Applied Informatics / Cooperative Systems, Faculty of Informatics, TU München
Boltzmannstr. 3, 85748 Garching
Email: schlichter@in.tum.de
Tel.: 089-289 18654
URL: http://www11.in.tum.de/
Chapter 1 Overview
This lecture introduces the basic concepts for the design and implementation of distributed applications.
Figure: overview of the lecture topics grouped around "Distributed applications": architecture of distributed applications, distributed object-based systems, remote invocation (RPC/RMI), distributed shared memory, design and concepts of distributed applications, Web Services, distributed file service.
1.1 Lecture Content
Discussion of various aspects, concepts and mechanisms of distributed applications.
- Basic principles for the design of distributed applications.
  - Terminology, communication mechanisms, client-server model, aspects of remote invocation (RPC, RMI).
- Execution model for distributed applications.
  - Happened-before relation, clocks for synchronization.
- Introduction to distributed transactions and group communication.
  - Two-phase commit, aspects of consistent message delivery ("atomic multicast", virtual synchrony) in groups, group management.
- Information replication and distributed file systems.
  - Consistency of replicated information, concurrency control.
- Designing distributed applications.
  - Web services, MDA (Model Driven Architecture), SOA modeling.
- Object-oriented distributed systems.
  - Impact of the object-oriented paradigm on the design of distributed applications, especially Corba.
- Secure communication in distributed systems.
  - Brief introduction to the authentication of users and systems, and discussion of the Kerberos system.
1.2 Bibliography
The following literature was used to prepare this lecture.
1.2.1 Course Text Books
- George F. Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair, "Distributed Systems: Concepts and Design", Addison-Wesley, 2012. See also the web site (URL: http://www.cdk5.net/) for references and additional information.
- George F. Coulouris, Jean Dollimore, Tim Kindberg, "Verteilte Systeme: Konzepte und Design", Pearson Studium, 2005 (German).
- Andrew S. Tanenbaum, Maarten van Steen, "Distributed Systems - Principles and Paradigms", Prentice Hall, 2007.
- Andrew S. Tanenbaum, Maarten van Steen, "Verteilte Systeme - Prinzipien und Paradigmen", Pearson Studium, 2007 (German).
1.2.2 Further Reading
- S. Allamaraju et al., "Professional Java Server Programming - J2EE Edition", Wrox Press, 2000.
- G. Alonso, F. Casati, H. Kuno, V. Machiraju, "Web Services: Concepts, Architectures and Applications", Springer-Verlag, 2004.
- D. K. Barry, "Web Services and Service-Oriented Architectures", Morgan Kaufmann, 2003.
- M. Bell, "Service-Oriented Modeling", John Wiley & Sons, 2008.
- K. Birman, "Reliable Distributed Systems", Springer, 2005.
- M. Liu, "Distributed Computing - Principles and Applications", Pearson Addison-Wesley, 2004.
- G. Glass, "Web Services: Building Blocks for Distributed Systems", Prentice Hall, 2002.
- S. Graham, D. Davis, S. Simeonov, G. Daniels, P. Brittenham, Y. Nakamura, P. Fremantle, D. König, C. Zentner, "Building Web Services with Java", Sams Publishing, 2005.
- U. Hammerschall, "Verteilte Systeme und Anwendungen", Pearson Studium, 2005 (German).
- E. Newcomer, "Understanding Web Services", Addison-Wesley, 2002.
- F. Shanahan, "Amazon.com Mashups", Wiley Publishing, 2007.
- A. Tanenbaum, "Modern Operating Systems", Prentice Hall, 2008.
1.3 Abbreviations
API      Application Programming Interface
BPEL4WS  Business Process Execution Language for Web Services
B2B      Business-to-Business
B2C      Business-to-Consumer
CLSID    Class Identifier (in the context of DCOM)
CORBA    Common Object Request Broker Architecture
CSCW     Computer Supported Cooperative Work
DCE      Distributed Computing Environment (OSF)
DCOM     Distributed Component Object Model
DIT      Directory Information Tree (LDAP)
DME      Distributed Management Environment (OSF)
DNS      Domain Name System
DSM      Distributed Shared Memory
EAR      Enterprise Archive
EJB      Enterprise JavaBeans
GIOP     General Inter-ORB Protocol
IDL      Interface Definition Language
IETF     Internet Engineering Task Force
IID      Interface Identifier (in the context of DCOM)
IIOP     Internet Inter-ORB Protocol
IPC      Interprocess Communication
ISO      International Organization for Standardization
J2EE     Java 2 Platform Enterprise Edition
JAF      JavaBeans Activation Framework
JAR      Java Archive
JDBC     Java Database Connectivity Extension
JMS      Java Message Service
JNDI     Java Naming and Directory Interface
JSP      JavaServer Pages
JTA      Java Transaction API
KDC      Key Distribution Center (part of Kerberos)
LDAP     Lightweight Directory Access Protocol
LDIF     LDAP Data Interchange Format
NFS      Network File System (by Sun)
ODP      Open Distributed Processing
OLE      Object Linking and Embedding
OMA      Object Management Architecture
OMG      Object Management Group
ONC      Open Network Computing (by Sun)
ORB      Object Request Broker (Corba)
OSF      Open Software Foundation
QoS      Quality of Service
RMI      Remote Method Invocation
RPC      Remote Procedure Call
SDL      Specification and Description Language
SOA      Service Oriented Architecture
SOAP     Simple Object Access Protocol
SOM      Service Oriented Modeling
SSL      Secure Sockets Layer
UDDI     Universal Description, Discovery, and Integration
WAR      Web Archive
WSDL     Web Services Description Language
XDR      External Data Representation
Chapter 2 Introduction
2.1 Issues
Issues of the following section:
- Motivation for distributed systems and distributed applications.
- Basic terminology for distributed systems, e.g. terms like distributed application and interface.
- Introduction to some influential historic distributed systems, such as the NFS file system, Mach and the Java 2 Platform Enterprise Edition.
2.2 Background
Distributed applications are used in a variety of domains, e.g. collaborative information spaces, workflow management, telecooperation and autonomous agents.
2.2.1 Development of computer technology
Figure: timeline of computer technology from 1950 to 2000. Applications evolve from specialized applications (reserved computing time) via numerical applications (batch), commercial applications (time sharing) and presentation-oriented applications (personal workstation) to distributed applications and Internet computing; data handling evolves from isolated data via data modeling, isolated data with desktop publishing, distributed information management and multimedia to Web Services and service oriented architecture (SOA).
2.2.2 Internet computing
Internet computing comprises networks of heterogeneous computers and applications that use shared, geographically dispersed resources for information communication (i.e. improved information flow) and activity coordination. Examples are:
- online flight reservation
- distributed automated teller machines (ATMs)
- audio/video conferencing applications, e.g. Microsoft NetMeeting (see the application domain "Computer-Supported Cooperative Work"), and Internet telephony (e.g. Skype)
- World Wide Web
- Grid computing: uses the resources of many separate computers connected by a network to solve large-scale computation problems, e.g. SETI@home (URL: http://seticlassic.ssl.berkeley.edu/), the Search for Extraterrestrial Intelligence.
- Social software: sharing of private information and collaborative tagging, e.g. blogs, Flickr, YouTube, Twitter, Facebook.
- Massively multiplayer online games (MMOGs): a very large number of users interact through the Internet with a persistent virtual world.
2.2.3 Enterprise Computing
Figure: enterprise computing with application 1 and application 2, databases, file systems, a network and services.
Enterprise computing systems provide a close, direct coupling of application programs running on multiple, heterogeneous platforms in a networked environment. These systems must be completely integrated and very reliable; in particular they require:
- information consistency, even in case of partial system breakdown,
- security and guaranteed privacy,
- adequate system response times,
- high tolerance of input errors and hardware/user errors (fault tolerance),
- autonomy of the individual system components.
2.3 Key Characteristics of distributed Systems
2.3.1 Motivation
The following factors contribute to the increasing importance of distributed systems:
- decreasing processor and storage costs,
- high-bandwidth networks,
- insufficient and often unpredictable response times of mainframe systems,
- a growing number of applications with complex information management and complex graphical user interfaces,
- growing cooperation and usage of shared resources by geographically dispersed users, caused by the globalization of markets and enterprises, e.g. applying telecooperation (groupware, CSCW) and mobile communication to improve distributed teamwork.
Informal definition
The term distributed system may be defined informally:
1. After Tanenbaum: a distributed system is a collection of independent computers that appears to its users as a single computer.
2. After Lamport: a distributed system is a system that stops you from getting any work done when a machine you've never heard of crashes.
3. Definition: We define a distributed system as one in which hardware and software components located at networked computers communicate and coordinate their actions mainly by passing messages.
Methods of distribution
There are five fundamental methods of distribution:
1. Hardware components.
2. Load.
3. Data.
4. Control, e.g. a distributed operating system.
5. Processing, e.g. distributed execution of an application.
In the following sections, we focus on the last three types of distribution, in particular on the distribution of processing.
2.3.2 Properties of distributed systems
Distributed systems have a number of characteristics, among them:
1. Existence of multiple functional units (physical, logical), e.g. software services.
2. Distribution of physical and logical functional units.
3. Functional units fail independently.
4. Distributed component control: a distributed operating system controls, integrates and homogenizes the distributed functional units.
5. Transparency (see section 3.3): details irrelevant for the user (e.g. the distribution of data across several computers) remain hidden in order to reduce complexity.
6. Cooperative autonomy during the interaction among the physical and logical functional units, which implies concurrency during process execution.
2.3.3 Challenges of distributed systems
The design of distributed systems poses a number of challenges:
- Heterogeneity applies to networks, computer hardware, operating systems, programming languages and implementations by different programmers.
  - Middleware provides a programming abstraction that masks the heterogeneity of the underlying system.
  - Middleware provides a uniform computational model for use by the programmers of servers and distributed applications.
  - Examples of middleware are Corba (see chapter 10) and Java RMI (see section 4.4).
- Openness requires standardized interfaces between the various resources.
- Scalability: the ability to add new resources to the overall system.
- Security of information resources.
- Privacy: protection of user profile information.
2.3.4 Examples for development frameworks
There is a strong motivation to use standardized development frameworks:
- Sun Network File System (NFS) (see section 2.5.2) by Sun: a distributed file system that behaves like a centralized file system.
- Open Network Computing (ONC) by Sun: a platform for distributed application design; it contains libraries for remote procedure call (RPC) (see chapter 4) and for external data representation (XDR) (see section 5.2).
- ODP (Open Distributed Processing) by ISO: specification of the interfaces and the component behavior of distributed applications.
- Common Object Request Broker Architecture (CORBA) (see chapter 10) by the OMG: defines a common architecture model for heterogeneous environments based on the object-oriented paradigm. It aims at interoperability between distributed objects residing on different platforms.
- Java 2 Platform Enterprise Edition (J2EE) by Sun, e.g. RMI (see section 4.4): a component-based Java framework providing a simple, standardized platform for distributed applications, consisting of a runtime infrastructure and a set of Java APIs.
- .NET (URL: http://www.microsoft.com/net/) framework of Microsoft: a middleware platform especially for Microsoft environments; it consists of a class library and a runtime environment and incorporates the Distributed Component Object Model (DCOM).
2.4 Distributed application
A distributed application is a set of cooperating, interacting functional units. Reasons for distribution are parallelism during execution, fault tolerance, and the inherent distribution of the application domain.
2.4.1 Definition
Definition: The term distributed application comprises three aspects.
1. The functionality of the application $A$ is split into a set of cooperating, interacting components $A_1, \dots, A_n$ with $n \in \mathbb{N}$, $n > 1$; each component has an internal state (data) and operations to manipulate that state.
2. The components $A_i$ are autonomous entities which can be assigned to different machines $F_i$.
3. The components $A_i$ exchange information via the network.
2.4.2 Programmer's perspective
Interfaces help to establish well-defined interaction points between the components of a distributed application.
Interfaces of a distributed application
Consider the distributed application $A$ that consists of the components $A_1, \dots, A_5$:
Figure: the components $A_1$ to $A_5$ of $A$ interact via export interfaces (operations a component offers to others) and import interfaces (operations a component uses from others).
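The import/export relationship can be sketched in Java, with a Java interface playing the role of an export interface. The sketch assumes a hypothetical inventory component that exports the operations `stock` and `reserve`, and a hypothetical order component that imports them; all names and the initial state are invented for illustration. Both components run locally in one JVM here; in a real distributed application the imported reference would typically be a proxy for a component on another machine.

```java
import java.util.HashMap;
import java.util.Map;

public class ComponentInterfaces {

    // Export interface of a hypothetical inventory component: the operations
    // (names, parameter types, result types) that other components may invoke.
    public interface Inventory {
        // Returns the number of items in stock for the given article id (0 if unknown).
        int stock(String articleId);

        // Reserves count items; visible side effect: decreases the stock.
        // Returns false (and changes nothing) if count is not positive or too large.
        boolean reserve(String articleId, int count);
    }

    // Exporting component: encapsulates its internal state behind Inventory.
    public static class InventoryComponent implements Inventory {
        private final Map<String, Integer> items = new HashMap<>();

        public InventoryComponent() {
            items.put("A-42", 10); // hypothetical initial state
        }

        @Override
        public int stock(String articleId) {
            return items.getOrDefault(articleId, 0);
        }

        @Override
        public boolean reserve(String articleId, int count) {
            int available = stock(articleId);
            if (count <= 0 || count > available) {
                return false;
            }
            items.put(articleId, available - count);
            return true;
        }
    }

    // Importing component: depends only on the Inventory interface, not on
    // where or how it is implemented.
    public static class OrderComponent {
        private final Inventory inventory;

        public OrderComponent(Inventory inventory) {
            this.inventory = inventory;
        }

        public String placeOrder(String articleId, int count) {
            return inventory.reserve(articleId, count) ? "accepted" : "rejected";
        }
    }

    public static void main(String[] args) {
        Inventory inventory = new InventoryComponent();
        OrderComponent orders = new OrderComponent(inventory);
        System.out.println(orders.placeOrder("A-42", 3));  // accepted
        System.out.println(orders.placeOrder("A-42", 20)); // rejected
        System.out.println(inventory.stock("A-42"));       // 7
    }
}
```

Because the order component only knows the interface, the inventory component could be replaced by a remote implementation without changing the importing component.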
Interface specification
The interface specification specifies:
- the component operations (names, functionality) and the communication between components.
- the required parameters (including their types).
- the results returned by the operation, including arity and type.
- visible side effects caused by the component execution, for example data entry into a database.
- effects of an operation on the results of subsequent operations.
- constraints concerning the sequence of operations.
2.4.3 Distributed application vs. parallel program
Although distributed applications might look similar to parallel programs at first glance, there are some important differences:

                    distributed application              parallel program
granularity         coarse                               fine
data space          private                              shared
failure handling    within the communication protocols   not considered

2.5 Influential distributed systems
Xerox PARC experimented with distributed applications in the 1970s (Alto workstation, Ethernet).
The book by Ken Birman (chapter 27) gives a brief overview of a number of distributed systems, e.g. Amoeba, NavTech, Totem, Argus, etc.
2.5.1 Mach
Mach is an operating system developed at Carnegie Mellon University. It is characterized by its small kernel, called a microkernel.
Goals of Mach
Mach is especially suitable for multiprocessor applications and for applications designed for distributed systems. Major design goals were:
- Emulation of Unix.
- Transparent extension to network operation.
- Portability.
Architecture
A process (called a task in Mach) defines an execution environment that provides protected access to system resources such as virtual memory and communication channels. A process consists of a set of threads.
Figure: Mach architecture. Applications and server subsystems (virtual memory management, network server, process management) run on top of the microkernel, which provides threads, communication channels and memory objects.
- Threads are the unit of distribution, i.e. only entire threads are assigned to different processors.
- Memory objects realize virtual storage units; shared use of memory objects by different processes is based on copy-on-write, i.e. the memory object is copied when a write operation takes place.
Mach message exchange
Processes communicate through communication channels, called ports.
Figure: process 1 sends a message to port 47; process 2 receives it from port 47.
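The port concept can be approximated in Java by a blocking queue: many sender threads put messages into the same queue, while exactly one receiver takes them out in FIFO order. This is only a simplified analogy under stated assumptions (threads in one JVM instead of Mach tasks, string messages, no capabilities and no network servers); `Port`, `send` and `receive` are hypothetical names, not Mach system calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PortSketch {

    // Hypothetical port: a FIFO message queue with many senders and one receiver.
    static final class Port {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final Thread receiver; // the single thread allowed to receive

        Port(Thread receiver) {
            this.receiver = receiver;
        }

        // Any thread may send; put() does not block on an unbounded queue.
        void send(String message) throws InterruptedException {
            queue.put(message);
        }

        // Only the owning receiver may take messages; blocks until one arrives.
        String receive() throws InterruptedException {
            if (Thread.currentThread() != receiver) {
                throw new IllegalStateException("only the receiver may read from this port");
            }
            return queue.take();
        }
    }

    // Starts several sender threads that all send to the same port; the calling
    // thread acts as the single receiver and returns the messages in arrival order.
    public static List<String> run(int senders, int messagesPerSender) {
        Port port47 = new Port(Thread.currentThread());
        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < senders; s++) {
            final int id = s;
            Thread t = new Thread(() -> {
                try {
                    for (int m = 0; m < messagesPerSender; m++) {
                        port47.send("sender " + id + ", message " + m);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        List<String> received = new ArrayList<>();
        try {
            for (int i = 0; i < senders * messagesPerSender; i++) {
                received.add(port47.receive());
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("receiver interrupted", e);
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> received = run(3, 2);
        System.out.println(received.size() + " messages received"); // 6 messages received
    }
}
```

The interleaving of messages from different senders varies between runs, but the messages of any single sender arrive in the order in which that sender sent them.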
A port is realized as a message queue to which multiple senders may send messages; there is only one receiving process per queue. Ports are protected by capabilities.
Mach supports network communication through network servers.
Figure: network communication in Mach. Process 1 sends a message to port 47 of the local network server; this network server forwards it to network port 135467, and the network server on the remote machine delivers it to port 67, from which process 2 receives it.
2.5.2 Sun Network File System (NFS)
NFS is a network extension to Unix and other operating systems for distributed file management.
Characteristics
- Directories (file catalogs) are exported (by server subsystems) and mounted (by the client machines).
Figure: NFS mounting example with a Sun client and two servers (HP server, IBM server); the directory tree of the client contains both local mounts and remote mounts of directories exported by the servers.
- Support of a mount service:
  - the file /etc/exports on the NFS server lists the names of local file systems available for remote mounting.
  - a mount request by a client specifies the remote host, the directory pathname and the local name under which it is to be mounted.
  - automounter: dynamically mounts a remote directory whenever an empty mount point is referenced by a client.
- NFS supports access transparency (see section 3.3.2).
NFS implementation
NFS is implemented using RPC calls between the involved operating systems. It can be configured to use UDP or TCP.
Figure: NFS architecture. On the client machine, a client process issues system calls to the virtual file system in the Unix kernel, which passes local requests to the Unix file system and remote requests to the NFS client; the NFS client uses the NFS protocol over the network to communicate with the NFS server on the server machine, which accesses its Unix file system through the server's virtual file system.
Earlier versions of NFS used a stateless file server, i.e. the server subsystem does not store state information about its clients and their past operations.
The current version (URL: http://tools.ietf.org/html/rfc3530) of NFS uses a stateful file server, i.e. the server subsystem supports locking and the delegation of actions to clients to improve client-side caching.
2.5.3 Java 2 Platform Enterprise Edition (J2EE)
The J2EE platform (now called Java Platform, Enterprise Edition - Java EE) is essentially a distributed application server environment. It is a Java environment that provides:
- a runtime infrastructure for hosting applications.
- a set of Java extension APIs to build applications.
Objectives of J2EE
The idea of J2EE is to provide a standardized programming model for the realization of distributed applications at the organizational level.
- Java-based, but with interfaces to legacy applications, for example through Corba.
- component-based.
- network-oriented: supporting Web Services.
J2EE consists of two components:
- a runtime infrastructure for applications.
- a set of Java extension APIs to build applications. Examples are Enterprise JavaBeans (EJB), Java Servlets, JavaServer Pages (JSP), RMI over the Internet Inter-ORB Protocol (RMI-IIOP), Java Naming and Directory Interface (JNDI), Java Transaction API (JTA) and JavaMail.
J2EE architecture
A J2EE platform consists of the J2EE application server (runtime environment), one or several J2EE containers, and the data storage.
Figure: J2EE architecture with a Web container (Java Servlets, JSP pages) and an EJB container (Enterprise JavaBeans), each offering the APIs RMI/IIOP, JNDI, JTA, JDBC, JMS, JavaMail and JAF; application clients; the J2EE application server; and the data storage.
- EJB is a specification of a server-side, managed component architecture.
  - A bean offers one or more business interfaces to clients.
  - EJB is especially suited for 3-tier architectures.
J2EE container
A typical J2EE platform has one or several containers. A J2EE container has two principal tasks:
- to provide a runtime environment for managing application components.
- to provide access to the J2EE APIs.
Available APIs of the J2EE platform:
- RMI/IIOP: Remote Method Invocation (via IIOP)
- JNDI: Java Naming and Directory Interface
- JTA: Java Transaction API
- JDBC: Java Database Connectivity Extension
- JMS: Java Message Service
- JavaMail
- JAF: JavaBeans Activation Framework
Examples of application components: Java Servlets, JavaServer Pages, Enterprise JavaBeans.
J2EE supports the following general containers:
- Web container: Java Servlets, JSP pages
- EJB container: Enterprise JavaBean components
- Applet container: Java applets
- Application container: standard Java applications
J2EE application
A J2EE application consists of several modules, each of which contains several application components. Modules and application components are listed in an archive file: EAR (Enterprise Archive), WAR (Web Archive) or JAR (Java Archive).
[Figure: structure of a J2EE application. The EAR file contains the deployment descriptor application.xml and the modules: an EJB module (JAR file with ejb-jar.xml and the EJBs), a Web module (WAR file with web.xml and the Web components) and a Java module (JAR file with application-client.xml and the Java classes).]
JavaServer Pages
JavaServer Pages technology uses XML-like tags and scriptlets written in the Java programming language to encapsulate the logic that generates the content of a Web page.
- Comment: <%-- comment --%>
- Declaration: <%! int x = 0; %>
- Expression: <%= expression %>
- Scriptlets contain Java code: <% code fragments %>
Example:
<% if (value.getName().length() != 0) { %>
  <h2>The value is: <%= value.getName() %></h2>
<% } else { %>
  <h2>Value is empty</h2>
<% } %>
Implicit objects available to JSP: request, response, session, out, page.
Example implementations
- JBoss (URL: http://www.jboss.org/): open-source advanced middleware for J2EE-based distributed applications.
- IBM WebSphere (URL: http://www-01.ibm.com/software/websphere/): proprietary integration and application infrastructure software; provides J2EE support.
J2EE is continuously extended by new technologies, e.g. by integrating support for Web Services.
2.5.4 Google
Google is one of the largest distributed systems in use today.
- Besides providing a search engine, Google is now a major player in cloud computing.
- At the end of 2010 it handled more than 88 billion queries a month and had not experienced a major outage since its beginning in 1998.
- Google provides a significant number of applications, such as Gmail, Google Docs, Google Calendar, Google Wave, Google Maps, Google Earth, Google News, Google App Engine, etc.
- Physical infrastructure: commodity PCs organized in racks, which are organized into clusters with very large storage capacity. Clusters are housed in Google data centers.
- Middleware: communication paradigm based on protocol buffers and publish-subscribe; distributed computation based on MapReduce and Sawzall.
Key principle of MapReduce
- break the input data into a number of chunks;
- carry out initial processing on these chunks of data to produce intermediate results;
- combine the intermediate results to produce the final output.
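The three MapReduce steps can be illustrated with a small single-process sketch. It is not Google's framework: word counting is an assumed example task, the input is assumed to be already split into chunks, and parallelStream() merely stands in for handing chunks to separate worker machines.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Single-process sketch of the MapReduce principle, using word counting as the example task. */
public class WordCountMapReduce {

    // Initial processing of one chunk: produce intermediate (word, count) results.
    public static Map<String, Integer> mapChunk(String chunk) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : chunk.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // Combine the intermediate results into the final output.
    public static Map<String, Integer> combine(List<Map<String, Integer>> partials) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> result.merge(word, count, Integer::sum));
        }
        return result;
    }

    // The input is assumed to be already broken into chunks.
    public static Map<String, Integer> wordCount(List<String> chunks) {
        // parallelStream() only stands in for distributing chunks to worker machines.
        List<Map<String, Integer>> partials = chunks.parallelStream()
                .map(WordCountMapReduce::mapChunk)
                .collect(Collectors.toList());
        return combine(partials);
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("the cat sat", "the dog sat", "a cat");
        System.out.println(wordCount(chunks)); // {a=1, cat=2, dog=1, sat=2, the=2}
    }
}
```

Because each chunk is processed independently, the map step can run anywhere; only the combine step needs to see all intermediate results.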
Chapter 3 Architecture of distributed systems
3.1 Issues
This section focuses on the following issues:
- Discussion of basic aspects of distributed systems.
- Transparency as a key concept of distributed systems.
- How do distributed components cooperate? To answer this, we discuss models of cooperation among the components of distributed applications.
- What is the client-server model?
3.2 System Models
A distributed system can be described by means of descriptive models.
3.2.1 Architectural model
The architectural model defines the interaction between components and their mapping onto the underlying network.
Software layers
[Figure: software layers, from top to bottom: applications and services; middleware; operating system; computer and network devices.]
Middleware is defined as a layer of software whose purpose is to mask heterogeneity and to provide a convenient programming model to application programmers.
- It hides the complexity of the communication between two or more systems or services.
- Major categories of middleware: distributed objects, distributed components, publish-subscribe, Web services, peer-to-peer. Examples are CORBA, Java RMI and DCOM (Microsoft's Distributed Component Object Model).
- Middleware services are, e.g., communication facilities, naming of remote entities (objects), persistence (distributed file system), distributed transactions and security facilities.
System architectures
System architectures deal with the placement of components across a network of computers and the functional roles they assume during interaction, for example:
- client-server model;
- proxy servers;
- peer processes;
- community of software agents.
3.2.2 Interaction model
The interaction model deals with performance and with the difficulty of setting time limits, e.g. for message delivery.
- It is impossible to maintain a single global time; logical clocks are therefore required for synchronization.
- Messages do not arrive in the same order at all locations; a consistent ordering of events is required.
3.2.3 Failure model
The failure model defines the ways in which failures may occur and how they are handled.
- Distinction between failures of processes and failures of communication channels.
- Different types of failures:
  - crash faults: the process simply stops due to hardware failures or software errors.
  - message loss: messages may be lost due to buffer overflow in routers or network congestion.
  - fail-stop failures: the process fails by crashing, and the system notifies the relevant partners.
  - timing failures: a local clock exceeds the bounds on its rate of drift from real time, or a transmission takes longer than the specified bound.
  - arbitrary failures (non-malicious Byzantine failures): a process arbitrarily omits intended processing steps, takes unintended processing steps or sends corrupted messages.
  - malicious Byzantine failures: an attacker who has studied the system attempts to break it. Examples are the corruption or replay of messages, or the modification of the program (installing a hacked version).
3.2.4 Security model
The security model defines the possible threats to processes and communication and the ways in which they are handled.
- secure communication channels, e.g. use of cryptography;
- protection of objects against unauthorized access;
- authentication of messages, i.e. proving the identities of their senders.
The following sections of the course discuss various aspects of these system models in more detail.
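The interaction model states that logical clocks are required because no single global time exists. The text does not fix a particular algorithm; the following is a minimal sketch of one standard choice, Lamport's logical clock, in which every process keeps a counter that it increments on local events and advances past the timestamp of every received message.

```java
/** Minimal sketch of a Lamport logical clock (one counter per process). */
public class LamportClock {
    private long time = 0;

    // Local event: advance the clock by one.
    public synchronized long tick() {
        return ++time;
    }

    // Sending a message is an event; the returned value is attached to the message.
    public synchronized long send() {
        return tick();
    }

    // Receiving a message with timestamp t: move past both the local and the sender's time.
    public synchronized long receive(long t) {
        time = Math.max(time, t) + 1;
        return time;
    }

    public synchronized long now() {
        return time;
    }

    public static void main(String[] args) {
        LamportClock p1 = new LamportClock();
        LamportClock p2 = new LamportClock();
        p1.tick();                // p1: 1
        long ts = p1.send();      // p1: 2, message carries timestamp 2
        p2.tick();                // p2: 1
        long r = p2.receive(ts);  // p2: max(1, 2) + 1 = 3
        System.out.println(ts + " " + r); // 2 3
    }
}
```

The receive rule guarantees that a message is always timestamped later at the receiver than at the sender, which is what a consistent ordering of events needs; it does not recover real time.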
3.3 Transparency
Transparency is a key concept for better exploitation of resources within a distributed, heterogeneous system.
3.3.1 Location transparency
Problem: location of an object (resource or service) in a distributed system.
Location transparency implies that the user need not know the physical location of the object within the network; access is realized through a location-independent name of the object.
[Figure: a user, a printer, computer 1 and computer 2 connected by a network.]
Important aspect of location transparency: the object name contains no information about the current object location.
3.3.2 Access transparency
Problem: how to access objects in a distributed system.
Access transparency provides access to local and remote objects in exactly the same way.
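Access transparency can be sketched in Java by programming the client only against an interface and giving it either a local object or a client-side stub. The sketch below builds the stub with java.lang.reflect.Proxy; the FileService and Transport interfaces are hypothetical, and the "network" is simulated by a direct call on a server-side object, since a real stub would marshal the call into a request message.

```java
import java.lang.reflect.Proxy;
import java.util.Map;

/** Sketch: local and "remote" objects accessed through the same interface. */
public class AccessTransparencyDemo {

    // Hypothetical service interface; the client is written only against it.
    public interface FileService {
        String read(String name);
    }

    // Local implementation.
    public static class LocalFileService implements FileService {
        private final Map<String, String> files;

        public LocalFileService(Map<String, String> files) {
            this.files = files;
        }

        @Override
        public String read(String name) {
            return files.getOrDefault(name, "");
        }
    }

    // Hypothetical transport: a real one would send method name and arguments over the network.
    public interface Transport {
        Object call(String method, Object[] args);
    }

    // Client-side stub: implements FileService by forwarding every call to the transport.
    public static FileService remoteStub(Transport transport) {
        return (FileService) Proxy.newProxyInstance(
                FileService.class.getClassLoader(),
                new Class<?>[] {FileService.class},
                (proxy, method, args) -> transport.call(method.getName(), args));
    }

    // Client code: identical for local and remote objects.
    public static String firstLine(FileService fs, String name) {
        return fs.read(name).split("\n", 2)[0];
    }

    public static void main(String[] args) {
        FileService local = new LocalFileService(Map.of("D1", "local file\nmore"));
        // Simulated remote side: the "network" is a direct call on a server-side object.
        FileService serverSide = new LocalFileService(Map.of("D2", "remote file\nmore"));
        FileService remote = remoteStub((method, a) -> {
            if (method.equals("read")) {
                return serverSide.read((String) a[0]);
            }
            throw new UnsupportedOperationException(method);
        });
        System.out.println(firstLine(local, "D1"));  // local file
        System.out.println(firstLine(remote, "D2")); // remote file
    }
}
```

The same mechanism underlies RMI-style stubs: the client cannot tell from the call syntax whether the object is local or remote, although failures and latency may still differ.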
[Figure: file D1 on computer 1 and file D2 on computer 2, connected by a network.]
3.3.3 Replication transparency
For reasons of availability or fast access, resources, e.g. objects, may be replicated.
Problem: management of several copies of an object in a distributed system.
Replication transparency means that the user is unaware of whether an object is replicated or not. The user accesses replicated objects as if they existed only once.
[Figure: replica 1 and replica 2 of file f on computer 1 and computer 2; the user accesses file f via the network.]
A variety of protocols have been proposed to deal with the problem of consistency among replicated files (Update of replicated files, see page 194).
3.3.4 Migration transparency
Problem: object relocation in distributed systems.
Migration transparency solves the problem of relocating objects in distributed systems: objects may migrate from one computer to another without affecting the correct behavior of running applications.
Host migration transparency
Problem: a computer migrates from one subnetwork to another, e.g. when a user connects a laptop to different subnetworks. This requires a dynamic assignment of the IP address (e.g. DHCP), a name server, etc. The computer provides the same environment, the same applications and the same look-and-feel, no matter where the mobile worker is currently connected to the network.
Types of migration:
- off-line migration;
- on-line migration.
3.3.5 Language transparency
Problem: the components of an application are implemented in different programming languages.
The interaction between individual components is independent of the programming languages used to implement them.
Example: calendar system
[Figure: a calendar system composed of a C-based, a C++-based and a Lisp-based subsystem, covering the user interface, communication, management of calendar information, and inferencing with respect to calendar information.]
3.3.6 Other transparencies
There are a number of other transparencies relevant for distributed systems.
Failure transparency
Problem: partial failures in distributed systems, for example computer crashes or network failures. Up to a certain degree, failures are masked by the system.
Concurrency transparency
Concurrent access to shared resources by distributed users or application components.
Problem: shared access to objects in distributed systems. Several users or application programs can access objects (for example shared data) simultaneously without mutual interference.
Execution transparency
Execution transparency implies that processes may be executed on different runtime systems.
Performance transparency
Performance transparency allows for dynamic reconfiguration of the system to improve the overall system performance when changes in load characteristics are detected.
Scalability transparency
Scalability transparency supports extensions and enhancements of the system or the applications without modifying the system structure or changing the application algorithms.
3.3.7 Goal for distributed applications
A major goal of most distributed systems, especially of distributed file or operating systems, is the realization of a rich set of transparency levels. This causes problems for some application types.
Problem: Computer-supported Cooperative Work (CSCW). Due to concurrency transparency, team members are not always aware of each other's simultaneous activities (there is no "group awareness").
Solution: selective transparency, i.e. location and access transparency, but no strict concurrency transparency.
3.4 Paradigms for distributed applications
This section discusses mechanisms for cooperation at different levels of abstraction. Communication may take place between entire applications or between components of a distributed application: information sharing, message exchange, the producer-consumer model (pipe mechanism), the client-server model, the P2P model and group communication.
3.4.1 Information Sharing
Components of a distributed application communicate through shared, integrated information management.
Examples:
- sharing documents, URLs: BSCW Workspace (URL: http://bscw.gmd.de/)
- sharing objects, software components: JavaSpaces (see page 206)
[Figure: components 1, 2 and 3 all access a common integrated information management.]
There is no direct communication between the components, e.g. in distributed shared memory.
3.4.2 Message exchange
Interprocess communication (IPC): message exchange between sender and receiver.
[Figure: classification of message exchange: both the send and the receive operation can be blocking or non-blocking; blocking operations lead to synchronous communication, non-blocking operations to asynchronous communication.]
Background
Message exchange takes place between a sending and a receiving process.
Basic functionality
send(e: receiver, N: message);
receive(s: sender, B: buffer);
Communication perspectives
We can distinguish between different perspectives on the communication among the involved processes: the sender's view and the receiver's view.
Assumption: sender S has invoked the operation send(e, N); receiver E performs the operation receive(s, B).
Categories of message exchange
We distinguish between asynchronous and synchronous message exchange, as well as the so-called remote-invocation send.
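The difference between the categories can be made concrete with in-process queues standing in for the message system. This is only a sketch under that assumption: an ArrayBlockingQueue plays the message buffer NP for asynchronous exchange (offer/poll never block), and a SynchronousQueue models a synchronous send in which the sender is blocked until a receiver takes the message, with a timeout for decoupling. Network transfer and acknowledgements are not modelled.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

/** Sketch of send/receive semantics, with queues standing in for the message system. */
public class MessageExchangeDemo {

    // Asynchronous send: put N into the message buffer NP and return at once.
    // Returns false if the buffer is full (buffer overflow).
    public static boolean asyncSend(BlockingQueue<String> np, String n) {
        return np.offer(n);
    }

    // Non-blocking receive: returns the next message, or null if none has arrived yet.
    public static String pollReceive(BlockingQueue<String> np) {
        return np.poll();
    }

    // Synchronous send with timeout: the sender is blocked until a receiver takes N,
    // but stops waiting after timeoutMs (decoupling of sender and receiver).
    public static boolean syncSend(SynchronousQueue<String> channel, String n, long timeoutMs) {
        try {
            return channel.offer(n, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> np = new ArrayBlockingQueue<>(1); // buffer with room for one message
        System.out.println(pollReceive(np));     // null: nothing has arrived yet
        System.out.println(asyncSend(np, "N1")); // true: the sender continues immediately
        System.out.println(asyncSend(np, "N2")); // false: the buffer overflows
        System.out.println(pollReceive(np));     // N1

        SynchronousQueue<String> channel = new SynchronousQueue<>();
        System.out.println(syncSend(channel, "N3", 100)); // false: no receiver, timeout expires

        Thread receiver = new Thread(() -> {
            try {
                System.out.println("received " + channel.take()); // blocking receive
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        System.out.println(syncSend(channel, "N4", 1000)); // true once the receiver has taken N4
        receiver.join();
    }
}
```

The failed second asyncSend illustrates the buffer-overflow problem of asynchronous exchange, and the timed-out syncSend shows how a timeout prevents endless waiting in synchronous exchange.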
Asynchronous message exchange (non-blocking)
Sender S can resume its processing immediately after the message N has been put into the message queue NP (NP is also called the message buffer). S does not wait until the receiver E has received the message N. A receive operation indicates that the receiver is interested in receiving a message.
Example
The receiver E repeatedly invokes the receive operation until a message arrives. If the message N is available, it is transferred; otherwise E continues with its normal processing.
[Figure: time diagram of sender S, message system and receiver E: S sends; E invokes receive repeatedly.]
Advantages and disadvantages of asynchronous message exchange
Advantages
- useful for real-time applications, especially if the sending process should not be blocked;
- supports parallel execution threads at the sender's and the receiver's sites;
- can be used for event signaling purposes.
Disadvantages
- management of message buffers, handling of buffer overflow, access control problems, and handling of (receiver) process crashes;
- notifying S in case of failures may be a problem, since S has usually already continued with its regular processing;
- designing a correct system is difficult. The failure behavior depends heavily on buffer sizes, buffer contents, and the timing of the exchanged messages.
Synchronous message exchange (blocking)
Sender S is blocked until receiver E has actually received message N; similarly, receiver E is blocked until N has been transferred, i.e. until N is stored in the receiver's buffer.
[Figure: time diagram: S is suspended after send until the acknowledgement arrives; E is suspended in receive until N arrives.]
Decoupling of sender and receiver
To avoid endless waiting times, sender S and receiver E can be decoupled:
- associate a timeout with every send operation, i.e. the sender stops waiting after a certain time span has elapsed;
- create subprocesses for sending the message, e.g. using threads.
Remote-invocation send
Sender S suspends execution until the receiver has received and processed the request that was delivered as part of the message.
3.4.3 Naming entities
Names are used to uniquely identify entities and to refer to locations. An important issue is name resolution.
Names
A name is a string of characters that is used to refer to an entity (e.g. host, printer, file).
- Entities have access points through which operations on them are invoked; an address is the name of an access point.
- An identifier is a name that uniquely identifies an entity.
Name space
Names in distributed systems are organized into a name space.
- Name spaces are organized hierarchically.
- Representation as a labeled directed graph. A path along the graph edges specifies the entity name, e.g. documents/projects/lecture2003/concept.tex; absolute vs. relative path names.
- Name resolution: a name lookup returns the identifier or the address of an entity, e.g. the LDAP (see page 54) name service.
3.4.4 Bidirectional communication
Usage of the request-answer scheme for message exchange.
Sockets
Sockets provide a low-level abstraction for programming bidirectional communication. A socket is an application-created, OS-controlled interface through which an application can both send messages to and receive messages from another application.
- Unique identification: IP address and port number.
[Figure: socket connection between client and server, consisting of an input stream and an output stream.]
Sockets in Java
Java package java.net
Socket constructors and methods
Constructors of java.net.Socket:
- Socket(): Creates an unconnected socket, with the system-default type of SocketImpl.
- Socket(InetAddress address, int port): Creates a stream socket and connects it to the specified port number at the specified IP address.
- Socket(Proxy proxy): Creates an unconnected socket, specifying the type of proxy, if any, that should be used regardless of any other settings.
- Socket(String host, int port): Creates a stream socket and connects it to the specified port number on the named host.
Methods of java.net.Socket:
- void bind(SocketAddress bindpoint): Binds the socket to a local address.
- void close(): Closes this socket.
- void connect(SocketAddress endpoint): Connects this socket to the server.
- void connect(SocketAddress endpoint, int timeout): Connects this socket to the server with a specified timeout value.
Example
Example from the client perspective:

import java.io.*;
import java.net.*;

public class EchoClient {
    public static void main(String[] args) throws IOException {
        Socket echoSocket = null;
        PrintWriter out = null;
        BufferedReader in = null;
        try {
            echoSocket = new Socket("www.in.tum.de", 7); // create socket (echo port 7)
            // create writer and reader on the socket streams
            out = new PrintWriter(echoSocket.getOutputStream(), true);
            in = new BufferedReader(
                    new InputStreamReader(echoSocket.getInputStream()));
        } catch (UnknownHostException e) {
            System.err.println("Unknown host: www.in.tum.de");
            System.exit(1);
        } catch (IOException e) {
            System.err.println("No I/O from www.in.tum.de");
            System.exit(1);
        }
        // read lines from the keyboard, send them and print the echo
        BufferedReader stdIn = new BufferedReader(
                new InputStreamReader(System.in));
        String userInput;
        while ((userInput = stdIn.readLine()) != null) {
            out.println(userInput);
            System.out.println("echo: " + in.readLine());
        }
        // close streams and socket
        out.close();
        in.close();
        stdIn.close();
        echoSocket.close();
    }
}

Call semantics
Communication between sender and receiver is influenced by the following situations:
- loss of request messages;
- loss of answer messages;
- the sender crashes and is restarted;
- the receiver crashes and is restarted.
Different types of call semantics
Any communication between a sender and a receiver is subject to communication failures. Therefore, we distinguish between different call semantics.
at-least-once semantics
Under an at-least-once semantics, the requested service operation is processed once or several times.
[Figure: at-least-once semantics: after a timeout, S retransmits the request, and E executes the operation a second time and answers again.]
exactly-once semantics
The requested service operation is processed exactly once; this requires the detection of repeatedly sent requests.
[Figure: exactly-once semantics: after timeouts, S retransmits the request; E keeps a list of current requests to detect duplicates; S acknowledges the answer.]
last semantics
Under a last semantics, the requested service operation is processed once or several times; however, only the last processing produces a result and, potentially, some side effects.
at-most-once semantics
Under an at-most-once semantics, the requested service operation is processed once or not at all. If the service operation is processed successfully, the at-most-once semantics coincides with the exactly-once semantics.
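The duplicate detection needed for exactly-once semantics (the "list of current requests" kept by the receiver) can be sketched as a reply table keyed by request ID. The sketch assumes that the sender attaches a unique ID to each request and reuses it on retransmission; the class and method names are hypothetical. The table lives in memory only, so a receiver crash would lose it, which is why this alone does not give exactly-once behavior across crashes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch: a receiver that filters retransmitted requests by request ID. */
public class DuplicateFilteringServer {
    private final Map<Long, String> replies = new HashMap<>(); // "list of current requests"
    private final Function<String, String> operation;        // the service operation
    private int executions = 0;

    public DuplicateFilteringServer(Function<String, String> operation) {
        this.operation = operation;
    }

    // Handle a possibly retransmitted request: a known ID is answered from the
    // reply table instead of executing the operation again.
    public synchronized String handle(long requestId, String argument) {
        if (replies.containsKey(requestId)) {
            return replies.get(requestId);
        }
        executions++;
        String reply = operation.apply(argument);
        replies.put(requestId, reply);
        return reply;
    }

    // The sender acknowledges the answer, so the entry can be discarded.
    public synchronized void acknowledge(long requestId) {
        replies.remove(requestId);
    }

    public synchronized int executions() {
        return executions;
    }

    public static void main(String[] args) {
        DuplicateFilteringServer server = new DuplicateFilteringServer(s -> s.toUpperCase());
        server.handle(1, "debit 10");                 // the answer to this call is lost
        String answer = server.handle(1, "debit 10"); // retransmission after the timeout
        server.acknowledge(1);
        System.out.println(answer + " executed " + server.executions() + " time(s)");
        // DEBIT 10 executed 1 time(s)
    }
}
```

Without the reply table, the retransmission would execute the operation twice, which is exactly the at-least-once behavior shown in the first figure.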
Examples of providing at-most-once semantics:
- after a timeout at the sending site, the request is not retransmitted;
- the request is transmitted in the context of a transaction.
3.4.5 Producer-consumer interaction
In this interaction type (also called fire & forget interaction), the producer resumes its execution immediately after invoking the consumer (it is not suspended).
[Figure: the producer invokes the consumer.]
Special case: pipe mechanism (similar to Unix pipes); after the information has been provided to the consumer, the producer terminates its execution.
3.4.6 Client-server model
A central component (the server) provides a service to requesting clients.
[Figure: several clients connected to one server.]
Request-answer interaction
The client-server model (see page 46) implements a kind of handshaking principle: a client invokes a server operation, suspends its own operation (in most implementations), and resumes work once the server has fulfilled the requested service.
[Figure: the client sends a request to the server; the server returns an answer.]
SOA
Service-oriented architecture (SOA) is an abstract architectural approach:
- loose coupling and dynamic binding between services;
- based on the principles of modularized software and interface/component-based design;
- a collection of services: services communicate with each other, e.g. by data passing or remote invocation, and each service must manage its own data;
- SOA comprises three roles: service requestor, service provider and service registry.
Web services are an implementation of the SOA concept (currently the most important one).
Examples of servers
In a distributed environment, a server manages access to shared resources (e.g. a file server).
Problems:
- if the server crashes, the resource is no longer available in the network;
- the server becomes a bottleneck for accessing the resource.
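The three SOA roles and dynamic binding can be illustrated with a hypothetical in-memory registry. This sketch is not a Web services stack (no WSDL, UDDI or SOAP): providers publish a function under a service name, and the requestor looks the name up at call time, so a provider can be replaced without changing the requestor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Sketch of the SOA roles: provider publishes, registry stores, requestor finds and binds. */
public class SoaRoles {

    // Hypothetical service registry: service name -> provider implementation.
    public static class ServiceRegistry {
        private final Map<String, Function<String, String>> services = new HashMap<>();

        public void publish(String name, Function<String, String> provider) {
            services.put(name, provider);
        }

        public Optional<Function<String, String>> find(String name) {
            return Optional.ofNullable(services.get(name));
        }
    }

    // Service requestor: knows only the service name and binds at call time.
    public static String request(ServiceRegistry registry, String service, String input) {
        return registry.find(service)
                .map(provider -> provider.apply(input))
                .orElse("service not found: " + service);
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.publish("uppercase", s -> s.toUpperCase());
        System.out.println(request(registry, "uppercase", "soa")); // SOA
        // A new provider is published under the same name; the requestor is unchanged.
        registry.publish("uppercase", s -> s.toUpperCase() + "!");
        System.out.println(request(registry, "uppercase", "soa")); // SOA!
        System.out.println(request(registry, "translate", "soa")); // service not found: translate
    }
}
```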
Browsers such as Internet Explorer, Netscape or Opera are examples of clients, and the Apache Web server is an example of a server.
Web server: HTTP
Communication between Web browser and Web server is based on the HTTP protocol.
- HTTP is a stateless (see page 52) protocol.
- It is based on TCP sockets, typically using port 80.
- Session information is handled by the application layer (cookies).
- The HTTP protocol supports the methods GET, PUT, POST, ...
- Return values / status codes, such as 404: Not Found, 401: Unauthorized, 400: Bad Request.
3.4.7 Peer-to-peer model
All processes play a similar role, interacting cooperatively as peers to perform a distributed computation.
- There is no distinction between clients and servers.
- Clients talk directly to one another.
[Figure: several peers, each running the application and its coordination code and acting as both client and server.]
Client-server vs. peer-to-peer
Client-server:
- servers are centrally maintained and administered;
- a client has fewer resources than a server.
Peer-to-peer (P2P):
- a peer's resources are similar to the resources of the other participants;
- peers communicate directly with other peers and share resources.
Issues of P2P
- peer discovery and group management;
- data location and placement;
- reliable and efficient file exchange;
- security/privacy/anonymity/trust.
Napster
Napster was one of the first P2P applications.
Schlichter, TU München
[Figure: Napster: (1) home PCs publish their lists of music to the Napster server, which keeps the music list; (2) a home PC searches the music list; (3) it loads the music directly from the source home PC.]
Technical issues
- Many clients are not accessible:
  - many client systems come and go;
  - firewalls limit incoming connections to clients.
- Round-trip times to some regions are very slow.
- Most clients had slow upload links.
- Clients might withdraw a file unexpectedly.
Legal issues
When the service was launched, the Napster designers hoped they had found a way around the legal limits on sharing music:
- clients advertise "stuff"; if some of that stuff happens to be music, that is the responsibility of the person who advertises it;
- the directory server "helps clients to advertise stuff" but does not endorse the sharing of protected intellectual property;
- Napster makes money by integrating ads.
In the court case the judges saw it differently: Napster's clear purpose is to facilitate the theft of intellectual property.
Gnutella
Gnutella was one of the first examples of a pure P2P system.
- The system is not run by a single company.
- There are no nodes that act only as servers; Gnutella eliminates the directory server.
- To share files, the user must connect to the Gnutella network, a loose federation of computers running Gnutella.
  - To connect, a computer only has to know the address of one other Gnutella machine, e.g. one of the machines published at well-known Web sites.
  - At the first connection the computer receives hundreds of addresses of machines, which may be used on subsequent occasions.
  - A Gnutella program tries to maintain 3 or 4 connections to other Gnutella machines at any one time.
- Finding a file: a node sends a request with the file name and the current hop count to its neighbors.
  - If a neighbor has a matching file, it responds with the location of the file.
  - The neighbor increments the hop count; if the hop count is less than the maximum hop count, it propagates the request to its own neighbors.
Other system examples
- BitTorrent is a P2P communications protocol for file sharing. The recipients of data also supply data to newer recipients, reducing the cost and burden on any individual source and reducing dependence on the original distributor.
- eDonkey is a P2P file-sharing network used primarily to exchange audio and video files and computer software. Files are identified using compound MD4 hash checksums, which are a function of the bit content of the file. Overnet is the successor of the eDonkey protocol.
Gossip-based approach
Propagate information in the same way as epidemic diseases spread.
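The epidemic spreading just described can be simulated in a few lines. The sketch assumes synchronous rounds and a push-pull exchange (in each round every node contacts one random partner, and if either of the two knows the information, both know it afterwards); the number of nodes and the random seed are arbitrary parameters.

```java
import java.util.Random;

/** Sketch: push-pull gossip among n nodes, counting rounds until every node is informed. */
public class GossipSimulation {

    // Returns the number of synchronous rounds until all n nodes know the information.
    public static int roundsToInformAll(int n, long seed) {
        if (n <= 1) {
            return 0; // a single node already knows everything
        }
        Random random = new Random(seed);
        boolean[] informed = new boolean[n];
        informed[0] = true; // time t0: one node knows something new
        int informedCount = 1;
        int rounds = 0;
        while (informedCount < n) {
            boolean[] next = informed.clone();
            for (int node = 0; node < n; node++) {
                int partner = random.nextInt(n - 1);
                if (partner >= node) {
                    partner++; // choose a partner different from the node itself
                }
                // push (informed node tells partner) or pull (node asks an informed partner)
                if (informed[node] || informed[partner]) {
                    next[node] = true;
                    next[partner] = true;
                }
            }
            informed = next;
            informedCount = 0;
            for (boolean b : informed) {
                if (b) {
                    informedCount++;
                }
            }
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        for (int n : new int[] {16, 1024, 65536}) {
            System.out.println(n + " nodes: " + roundsToInformAll(n, 42) + " rounds");
        }
    }
}
```

Running it for growing n shows that the number of rounds grows only slowly with the system size, while each node contacts just one partner per round, so its load does not depend on n.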
The approach explained informally:
- time $t_0$: suppose I know something new;
- time $t_1$: I pick a friend and tell him; now 2 people know;
- time $t_2$: we each pick a friend and tell them; now 4 people know;
- time $t_3$: ...
Information spreads at an exponential rate.
- Due to re-infection (a contacted friend may already know), information spreads to approx. $1.8^k$ people after $k$ rounds.
- A combination of push and pull works best.
The algorithm is quite robust and scalable:
- information travels on exponentially many paths, so it is difficult to slow down;
- the load of the participating nodes is independent of the system size;
- information spreads in log(system size) time;
- the network load is linear in the system size.
3.4.8 Group model
Combining a set of components (e.g. processes) into a group.
Example: for reasons of fault tolerance, a service is provided by a group of servers.
Important aspects
- processing of a shared global problem in a shared environment;
- need to exchange information;
- group awareness;
- coordination of communication and actions within the group.
Groupware / Computer-supported Cooperative Work (CSCW) is an application domain that applies the group model paradigm.
3.4.9 Taxonomy of communication
As the cooperation models show, communication between the components of distributed applications is essential; it is mostly based on message exchange.
Message serialization
Message serialization establishes a certain order for message delivery. With messages to a group of recipients, messages may arrive in different orders due to different transmission times.
One sender
There are the following ordering schemes:
- according to the message arrival on the recipient's side; different receivers can have different message arrival sequences;
- according to a message sequence number generated by the sender; this approach is sender-dominated;
- the receiver creates a serialization according to its own criteria.
Several senders
If several senders are involved, the following message ordering schemes may be applied:
1. no serialization.
2. loosely synchronous: a loosely synchronized global time provides a consistent time ordering.
3. virtually synchronous: the message order is determined by causal interdependencies (see page 126) among the messages. For example, if a message N has been sent after another message M has been received, N is potentially dependent on M.
4. totally ordered:
   - by token: before a sender can send a message, it must request the send token;
   - by coordinator: a selected component (the coordinator) determines the order of message delivery for all recipients.
3.4.10 Levels of Abstraction
[Figure: levels of abstraction, from high to low: object space, collaborative applications; network services, object request broker; remote procedure call, remote method invocation; client/server, peer-to-peer; message passing.]
3.5 Client-server model
The client-server model implements a kind of handshaking principle: a client invokes a server operation, suspends its own operation (in most implementations), and resumes work once the server has fulfilled the requested service.
3.5.1 Terms and definitions
[Figure: a client on the client machine sends a request to server $S_1$ on the server machine and receives an answer; the server provides service $S$ through the operations $O_{11}, \ldots, O_{1k}$.]
Definitions
By now it should be clear that we have so far used technical terms at different levels of abstraction:
- sender, receiver: pure message-exchanging entities;
- client, server: entities acting in some specialized protocol.
Client
Definition: A client is a process (some say, an application) that runs on a client machine and that typically initiates requests for service operations. Potential clients are a priori unknown.
Service
Definition: A service is a piece of software that provides a well-defined set of service operations. This piece of software may run on one or multiple (server) machines.
Server
Definition: A server is a subsystem that provides a particular service to a set of a priori unknown clients. A server executes a (piece of) service software on a particular server machine. Obviously, a single server machine can host multiple server subsystems. A server provides a set of operations (procedures).
Client-server interfaces
[Figure: the client imports the interface that the server exports.]
1. Client interface (import interface)
   - It represents the server within the client.
   - It prepares parameters and sends the request messages to the server.
   - It prepares the interpretation of the result that is extracted from the answer message submitted by the server.
2. Server interface (export interface)
   - It represents all potential clients within the server.
   - It accepts client requests, interprets the parameters and prepares results.
   - It invokes the respective service operation.
   - It prepares and sends the answer message containing the result of the service operation.
Multitier architectures
[Figure: a client calls a component that acts as both client and server, which in turn calls a server.]
Example of a multitier architecture
- The Web server is the interface between the client (browser) and the application.
- The Web server is both server and client.
[Figure: Web browser (applets) --HTTP (RMI)--> Web server --CGI (RMI)--> application server --SQL--> database.]
Timing process
[Figure: time diagram: the Web browser sends a request to the Web server, which requests an operation from the application server, which requests data from the database server; each caller waits until the data or the result is returned.]
3.5.2 Concepts for client-server applications
[Figure: different distributions of presentation, execution and database (possibly a local database) between client and server, labeled Case 1 to Case 4.]
Different cases
- Case 1: remote data storage, with access, for example, via Sun NFS.
- Case 2: remote presentation (for example the X Window System).
- Case 3: distributed application, i.e. cooperative processing among the individual components of an application.
- Case 4: distributed data storage. The information is distributed between client and server; information replication is possible.
3.5.3 Processing of service requests
Clients and servers have different life spans; servers manage incoming service requests in a queue.
Single dedicated server process
A single dedicated server process is in charge of processing requests for service operations.
[Figure: client requests enter a queue; the server takes them one by one, invokes the server operation and returns the results to the client.]
There is no parallel processing of requests, which results in the following disadvantages:
- the approach may be time-consuming;
- the processing of the current request is not interrupted when a request with higher priority appears in the queue;
- the server becomes a bottleneck.
Cloning of new server processes
Every incoming request is handled by a new server process.
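The idea of handling every incoming request by a new worker can be sketched with a dispatcher that starts one thread per queued request. Threads here stand in for cloned server processes (an assumption for the sake of a self-contained example), and the service operation is a hypothetical placeholder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: a dispatcher takes requests from the queue and starts a new worker for each one. */
public class DispatcherServer {

    // Hypothetical service operation; a real server would do I/O or computation here.
    private static String serviceOperation(String request) {
        return "result of " + request;
    }

    // Dispatches every queued request to a newly created worker thread and
    // returns the results once all workers have finished.
    public static List<String> serveAll(BlockingQueue<String> queue) {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        String request;
        while ((request = queue.poll()) != null) {
            final String req = request;
            Thread worker = new Thread(() -> results.add(serviceOperation(req)));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(List.of("r1", "r2", "r3"));
        List<String> results = serveAll(queue);
        Collections.sort(results); // completion order is nondeterministic
        System.out.println(results); // [result of r1, result of r2, result of r3]
    }
}
```

Creating a fresh worker per request is simple but costly under high load; a bounded pool such as Executors.newFixedThreadPool is a common way to limit that cost.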
[Figure: client requests enter a queue; a dispatcher hands each request to a newly created server process, which returns the results to the client.]
- Cloning new server processes is expensive.
- Access to shared persistent data must be synchronized.
- Parallel processing of several applications is possible.
Parallel request processing through threads
This is a variant of the second approach.
- Shared address space, i.e. the approach allows shared use of variables.
3.5.4 File service
Definition: A file service [Svobodova 1984] provides (remote) centralized data storage facilities to clients distributed across a network.
- The server deals with bulk data storage, high-performance computation, and collecting/managing large amounts of data.
- The client deals with "attractive" display and quick interaction times.
- Caching is used to speed up response time.
Use of cache "hints" to facilitate cache management:
- hints speed up the system when they are correct;
- a mechanism is needed to detect a wrong hint and to seek up-to-date information.
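The use of cache hints can be sketched as a client that remembers on which server a file was last found. The directory map and the server names are hypothetical; contacting the hinted server is simulated by a check against the directory, which is where a wrong hint is detected and up-to-date information is fetched.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of cache "hints": the client remembers where a file was last found. */
public class HintCache {
    private final Map<String, String> directory; // hypothetical authoritative source: file -> server
    private final Map<String, String> hints = new HashMap<>(); // may be stale
    private int authoritativeLookups = 0;

    public HintCache(Map<String, String> directory) {
        this.directory = directory;
    }

    // Stand-in for contacting the hinted server and asking whether it still holds the file.
    private boolean serverStillHolds(String server, String file) {
        return server.equals(directory.get(file));
    }

    public String locate(String file) {
        String hint = hints.get(file);
        if (hint != null && serverStillHolds(hint, file)) {
            return hint; // correct hint: no authoritative lookup needed
        }
        // No hint or a wrong hint: seek up-to-date information and refresh the hint.
        authoritativeLookups++;
        String current = directory.get(file);
        if (current != null) {
            hints.put(file, current);
        } else {
            hints.remove(file);
        }
        return current;
    }

    public int authoritativeLookups() {
        return authoritativeLookups;
    }

    public static void main(String[] args) {
        Map<String, String> directory = new HashMap<>(Map.of("report.txt", "server-a"));
        HintCache cache = new HintCache(directory);
        System.out.println(cache.locate("report.txt"));  // server-a (authoritative lookup)
        System.out.println(cache.locate("report.txt"));  // server-a (hint was correct)
        directory.put("report.txt", "server-b");         // the file moves; the hint is now wrong
        System.out.println(cache.locate("report.txt"));  // server-b (wrong hint detected)
        System.out.println(cache.authoritativeLookups()); // 2
    }
}
```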
Distinction between stateless and stateful servers
Stateless server
Stateless servers do not manage any state information about their clients; the client must supply all parameters necessary to process a request.
[Figure: the client keeps the state (file name, access mode, displacement) and sends read(file, displacement, nbytes); the server returns the data read as well as the new position in the file.]
- A stateless server does not track its clients or ensure that cached data stays up to date; refreshing the cache is the responsibility of the client.
- The client often uses a write-through caching policy.
- A crashed server can be restarted without having to reinstall any state.
Stateful server
Stateful server subsystems manage state information about their clients.
[Figure: the client sends open(file, mode) and receives a file descriptor fd; it then sends read(fd, nbytes) and receives the data; the server keeps the state (file name, access mode, displacement).]
- The server tracks its clients and takes actions to keep their cached states up to date.
- The client can trust its cached data, since the cache is owned by the server. As a consequence, programming at the client site becomes less complex.
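The two interfaces above can be contrasted with a small in-memory sketch. Simplifications: files are byte arrays in a map, the access mode of open is omitted, and error handling (unknown files or descriptors) is left out. In the stateless version every read carries file name and displacement; in the stateful version the server stores both per file descriptor.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch contrasting a stateless and a stateful file server on in-memory files. */
public class FileServers {

    // Stateless: every read carries file name and displacement; the server keeps nothing.
    public static class StatelessServer {
        private final Map<String, byte[]> files;

        public StatelessServer(Map<String, byte[]> files) {
            this.files = files;
        }

        public byte[] read(String file, int displacement, int nbytes) {
            byte[] data = files.get(file);
            int from = Math.min(displacement, data.length);
            int to = Math.min(from + nbytes, data.length);
            return Arrays.copyOfRange(data, from, to); // the client computes the new position
        }
    }

    // Stateful: open() returns a descriptor; the server remembers file and displacement.
    public static class StatefulServer {
        private final Map<String, byte[]> files;
        private final Map<Integer, String> openFiles = new HashMap<>();
        private final Map<Integer, Integer> positions = new HashMap<>();
        private int nextFd = 0;

        public StatefulServer(Map<String, byte[]> files) {
            this.files = files;
        }

        public int open(String file) {
            int fd = nextFd++;
            openFiles.put(fd, file);
            positions.put(fd, 0);
            return fd;
        }

        public byte[] read(int fd, int nbytes) {
            byte[] data = files.get(openFiles.get(fd));
            int from = positions.get(fd);
            int to = Math.min(from + nbytes, data.length);
            positions.put(fd, to); // the server advances the displacement
            return Arrays.copyOfRange(data, from, to);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = Map.of("f", "hello world".getBytes(StandardCharsets.US_ASCII));

        StatelessServer stateless = new StatelessServer(files);
        int pos = 0;                              // state kept by the client
        byte[] a = stateless.read("f", pos, 5);
        pos += a.length;
        byte[] b = stateless.read("f", pos, 6);

        StatefulServer stateful = new StatefulServer(files);
        int fd = stateful.open("f");              // state kept by the server
        byte[] c = stateful.read(fd, 5);
        byte[] d = stateful.read(fd, 6);

        System.out.println(new String(a, StandardCharsets.US_ASCII) + "|" + new String(b, StandardCharsets.US_ASCII)); // hello| world
        System.out.println(new String(c, StandardCharsets.US_ASCII) + "|" + new String(d, StandardCharsets.US_ASCII)); // hello| world
    }
}
```

If the stateful server crashes, the positions map is lost and open descriptors become invalid, whereas the stateless server can simply be restarted because the client still holds the displacement.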
Stateful transactional server architecture: after recovering from a server crash, an abort message is sent to the client.

3.5.5 Time service

Definition: A time service provides a synchronized system-wide time for all nodes in the network.

3.5.6 Name service

Definition: A name service, sometimes called a directory service, provides (remote) centralized name management facilities to clients distributed across a network; names refer to objects, for example files, other servers, services, personal computers, printers, and users.

Name servers manage a list of names. A directory entry might be stored in a data structure such as:

name                /* Name of the object as parameterized in a client request. */
address             /* Address of the object within the network, e.g., host number concatenated with communication port number. */
access information  /* This access information may limit access to the object for particular clients. */
attributes          /* Additional attributes of the object. */

Example for a name service
Domain Name System (DNS):
- hierarchical domain-based naming scheme for the Internet;
- distributed database implementing this naming scheme;
- mapping of host names and email destinations (e.g. www11.in.tum) to their respective IP addresses;
- top-level domains, e.g. the organizational domains
  edu: universities and other educational institutions
  com: commercial organizations
  and country domains such as
  de: organizations in Germany
The DNS database is distributed across a logical network of name servers. Each server primarily stores data for its local domain.
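The hierarchical, distributed organization of DNS can be illustrated with a small in-memory model. This is a simplified sketch under stated assumptions: DnsServer and DnsResolver are hypothetical classes, each server knows only the records of its own domain and the servers responsible for its subdomains, and the host name host1.in.tum.de with the documentation address 192.0.2.11 is made-up sample data. Real DNS additionally uses resource record types, referrals between separate machines, and caching.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical name server: records of its own domain plus delegations to subdomain servers.
class DnsServer {
    final String domain;                                         // "" stands for the root
    final Map<String, String> records = new HashMap<>();         // host name -> IP address
    final Map<String, DnsServer> delegations = new HashMap<>();  // subdomain -> responsible server

    DnsServer(String domain) { this.domain = domain; }

    DnsServer delegate(String subdomain) {
        DnsServer child = new DnsServer(subdomain);
        delegations.put(subdomain, child);
        return child;
    }
}

class DnsResolver {
    // Iterative resolution: start at the root and follow delegations down the domain hierarchy.
    static String resolve(DnsServer root, String host) {
        DnsServer current = root;
        while (current != null) {
            String address = current.records.get(host);
            if (address != null) return address;
            DnsServer next = null;
            for (Map.Entry<String, DnsServer> d : current.delegations.entrySet()) {
                if (host.equals(d.getKey()) || host.endsWith("." + d.getKey())) {
                    next = d.getValue();
                    break;
                }
            }
            current = next; // null: no server is responsible, the name is unknown
        }
        return null;
    }
}
```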
3.5.7 LDAP - Lightweight Directory Access Protocol

In distributed environments, information about persons, applications, files, printers, and other resources available via the network is often stored in a special database, the so-called directory. LDAP is a protocol supporting access to and update of directory information. It is an open industry standard.

LDAP is used by the IntegraTUM (URL: http://portal.mytum.de/iuk/integratum/index_html) project to provide a university-wide directory service at TUM.

Basics
Definition: A directory is a list of objects arranged in some order and with descriptive information (meta-data).

Differences between a directory and a database:
- a directory has a high volume of read requests;
- directories do not support transactions;
- they use different query languages.

A directory service is a name service containing object names and meta-data. Queries in directories are based on names and meta-data:
- White Pages: object access according to the object name.
- Yellow Pages: object access according to the object meta-data.

LDAP is a communication protocol supporting access to and update of directory information.
- It has been developed as a simple alternative to the X.500 standard.
- It is based on TCP/IP rather than on the ISO/OSI protocol stack.
- Web browsers (for example Netscape) support LDAP.

LDAP specifies several models:
- information model: basic data structures;
- naming model: referencing of objects (distinguished names);
- functional model: communication protocol and operations;
- security model: control of directory access.
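White-pages and yellow-pages access can be contrasted in a few lines of Java. The DirectoryLookup class and the printer and file-server sample data are hypothetical; meta-data is reduced to string attribute-value pairs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DirectoryLookup {

    // A directory object: its name plus descriptive meta-data (attribute -> value).
    static class Entry {
        final String name;
        final Map<String, String> meta;
        Entry(String name, Map<String, String> meta) { this.name = name; this.meta = meta; }
    }

    // Helper: meta("type", "printer", "floor", "2") builds a meta-data map.
    static Map<String, String> meta(String... keyValues) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) m.put(keyValues[i], keyValues[i + 1]);
        return m;
    }

    private final Map<String, Entry> byName = new HashMap<>();

    void add(String name, Map<String, String> meta) { byName.put(name, new Entry(name, meta)); }

    // White pages: access by object name.
    Entry whitePages(String name) { return byName.get(name); }

    // Yellow pages: access by meta-data; returns the names of all objects matching every criterion.
    List<String> yellowPages(Map<String, String> criteria) {
        List<String> hits = new ArrayList<>();
        for (Entry e : byName.values()) {
            if (e.meta.entrySet().containsAll(criteria.entrySet())) hits.add(e.name);
        }
        Collections.sort(hits); // deterministic order for display
        return hits;
    }
}
```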
LDAP architecture
The LDAP architecture is based on the client-server model and the TCP/IP protocols.
Figure: the application passes a request to the LDAP client, which sends a request message via TCP/IP to the LDAP server; the server receives the message, accesses the directory, and returns a reply message, which the LDAP client hands to the application.
LDAP uses strings for data representation.

General interaction process
1. The client initiates a session with the LDAP server (binding). The client specifies the name or IP address and the port (e.g. port 389) of the LDAP server, as well as a user name and password.
2. The client invokes LDAP operations (read, write, search).
3. The client terminates the session (unbinding).

Information model
A directory entry describes an object, for example a person, printer, server, or organization.
- Each entry has a distinguished name (DN).
- Each entry has a set of attributes, each with a type and one or several values.

Attribute syntax
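The information model (a DN plus a set of attributes, each with a type and one or more values) maps directly onto the JNDI classes javax.naming.directory.BasicAttributes and BasicAttribute that ship with the JDK. The sketch below only builds an entry locally and does not contact an LDAP server; the class LdapEntryExample and the entry itself, including the second mail address, are hypothetical sample data.

```java
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;

class LdapEntryExample {
    // Hypothetical entry: the DN identifies it, the attributes describe it.
    static final String DN = "cn=John Smith,ou=Informatik,o=TUM,c=de";

    static Attributes johnSmith() {
        // true: attribute types are matched case-insensitively, as in LDAP ("CN" finds "cn")
        Attributes attrs = new BasicAttributes(true);
        attrs.put("cn", "John Smith");               // attribute with one value
        attrs.put("sn", "Smith");
        Attribute mail = new BasicAttribute("mail"); // one type, several values
        mail.add("jsmith@mail.com");
        mail.add("john.smith@example.org");
        attrs.put(mail);
        Attribute objectClass = new BasicAttribute("objectClass");
        objectClass.add("top");
        objectClass.add("person");
        objectClass.add("inetOrgPerson");
        attrs.put(objectClass);
        return attrs;
    }
}
```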
Syntax            | Description
bin               | binary information
ces               | case exact string, also known as a "directory string"; case is significant during comparisons
cis               | case ignore string; case is not significant during comparisons
tel               | telephone number; numbers are interpreted as text (without blanks or colons)
dn                | distinguished name
generalized time  | year, month, day, time represented as a printable string
postal address    | postal address with lines separated by "$" characters

Examples
Attribute, Alias            | Syntax | Description                         | Example
commonName, cn              | cis    | name of entry                       | John Smith
surname, sn                 | cis    | surname of a person                 | Smith
telephoneNumber             | tel    | telephone number                    | 089-289 25700
organizationalUnitName, ou  | cis    | name of organizational unit         | Informatik
organizationName, o         | cis    | name of organization                | TUM
owner                       | dn     | distinguished name of entry owner   | cn=John Smith, o=TUM, c=de
jpegPhoto                   | bin    | JPEG photo                          | photo of John Smith

Based on these attributes, schemas for entries can be defined. Examples of schemas:
- inetOrgPerson: entry for one person; attributes: commonName (cn), surname (sn).
- organization: entry for an organization; attribute: organizationName (o).

Naming model
The LDAP naming model defines how entries are identified and organized. Every distinguished name (DN) of an object consists of a sequence of parts, the so-called relative distinguished names (RDNs).
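The decomposition of a DN into RDNs can be tried out with the JDK class javax.naming.ldap.LdapName, which parses DNs in the RFC 2253 string format. Note that LdapName numbers its RDNs from the right, i.e. index 0 is the RDN closest to the root of the tree. The helper class DnExample and the sample DNs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

class DnExample {
    // Splits a DN into its RDNs, from the most specific (leftmost) to the root (rightmost).
    static List<String> rdnsLeftToRight(String dn) throws InvalidNameException {
        LdapName name = new LdapName(dn);
        List<String> parts = new ArrayList<>();
        // LdapName indexes RDNs from the right: index 0 is the rightmost RDN.
        for (int i = name.size() - 1; i >= 0; i--) {
            Rdn rdn = name.getRdn(i);
            parts.add(rdn.getType() + "=" + rdn.getValue());
        }
        return parts;
    }

    // The DN of the parent entry is obtained by dropping the leftmost RDN.
    static String parent(String dn) throws InvalidNameException {
        LdapName name = new LdapName(dn);
        return name.getPrefix(name.size() - 1).toString();
    }
}
```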
The entries in an LDAP directory are hierarchically structured as a tree, the Directory Information Tree (DIT). Entries are arranged within the DIT based on their distinguished names. Example of a DN: cn=John Smith, o=ibm, c=de.
- The DIT also supports aliases.
- The DIT can be distributed across several servers; entries of other LDAP servers are referenced via URLs.
Figure: example DIT below the directory root with the country entries c=de and c=us, the organizations o=TUM (tel = 089-289-01) and o=ibm, the organizational units ou=Informatik and ou=sales, the person entries cn: John Smith (mail: jsmith@mail.com) and cn: Ernst Mayr (mail: mayr@in.tum.de), and the alias entry cn = John.

Functional model
The functional model defines operations for accessing and modifying directory entries. Among others, LDAP supports the following directory operations:
- create an LDAP entry;
- delete an LDAP entry;
- update an LDAP entry, e.g. modify its distinguished name (= move it within the DIT);
- compare LDAP entries;
- search for LDAP entries that meet certain criteria.
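Because the position of an entry in the DIT is fully determined by its DN, the scopes of the search operation (baseObject, singleLevel, wholeSubtree; see the Search subsection) reduce to comparisons of DNs. The following sketch assumes an in-memory DIT represented as a map from DN to attributes, hypothetical sample entries loosely based on the DIT figure, and a filter reduced to a single case-ignoring equality test; a real LDAP server evaluates arbitrary filters. It relies on the JDK class javax.naming.ldap.LdapName.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;

class ScopeSearch {
    enum Scope { BASE_OBJECT, SINGLE_LEVEL, WHOLE_SUBTREE }

    // Hypothetical sample DIT (DN -> attributes).
    static Map<String, Map<String, String>> sampleDit() {
        Map<String, Map<String, String>> dit = new LinkedHashMap<>();
        dit.put("o=TUM,c=de", attrs("objectClass", "organization"));
        dit.put("ou=Informatik,o=TUM,c=de", attrs("objectClass", "organizationalUnit"));
        dit.put("cn=Ernst Mayr,ou=Informatik,o=TUM,c=de", attrs("objectClass", "person"));
        dit.put("cn=John Smith,o=ibm,c=de", attrs("objectClass", "person"));
        return dit;
    }

    private static Map<String, String> attrs(String type, String value) {
        Map<String, String> m = new HashMap<>();
        m.put(type, value);
        return m;
    }

    // DNs of all entries within the scope of base whose attribute equals value (case ignored).
    static List<String> search(Map<String, Map<String, String>> dit, String base, Scope scope,
                               String attribute, String value) throws InvalidNameException {
        LdapName baseName = new LdapName(base);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : dit.entrySet()) {
            LdapName dn = new LdapName(e.getKey());
            // startsWith compares the rightmost RDNs: is dn the base entry or below it?
            boolean inScope;
            switch (scope) {
                case BASE_OBJECT:
                    inScope = dn.equals(baseName);
                    break;
                case SINGLE_LEVEL:
                    inScope = dn.startsWith(baseName) && dn.size() == baseName.size() + 1;
                    break;
                default: // WHOLE_SUBTREE
                    inScope = dn.startsWith(baseName);
                    break;
            }
            if (inScope && value.equalsIgnoreCase(e.getValue().get(attribute))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```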
Search
The search operation allows a client to request that an LDAP server search through some portion of the DIT for information meeting user-specified criteria, in order to read and list the result(s).

Examples
- Find the postal address for cn=John Smith,o=IBM,c=DE.
- Find all entries that are children of ou=Informatik,o=TUM,c=de.

Search constraints
- base object: defines the starting point of the search. The base object is a node within the DIT.
- scope: specifies how deep within the DIT to search from the base object:
  baseObject: only the base object is examined;
  singleLevel: only the immediate children of the base object are examined; the base object itself is not examined;
  wholeSubtree: the base object and all of its descendants are examined.
- filter: search filter on entry attributes; a Boolean combination of attribute value assertions, for example (&(cn=schmi*)(!(c=de))), which selects entries whose cn starts with "schmi" and whose c is not "de".

Code example
#define LDAP_DEPRECATED 1   /* OpenLDAP: enables the classic API functions used below */
#include <stdio.h>
#include <stdlib.h>
#include <ldap.h>

#define SEARCHBASE "o=tum,c=de"

int main(void)
{
    LDAP *ld;
    LDAPMessage *result = NULL;   /* receives the search results */
    char *User = NULL;
    char *Passwd = NULL;
    char searchfilter[] = "cn=mayr";

    /* open a connection */
    if ((ld = ldap_open("ldapserver.in.tum.de", LDAP_PORT)) == NULL)
        exit(1);

    /* authenticate as nobody (anonymous bind) */
    if (ldap_simple_bind_s(ld, User, Passwd) != LDAP_SUCCESS) {
        ldap_perror(ld, "ldap_simple_bind_s");
        exit(1);
    }

    /* search the database */
    if (ldap_search_s(ld, SEARCHBASE, LDAP_SCOPE_SUBTREE, searchfilter,
                      NULL, 0, &result) != LDAP_SUCCESS) {
        ldap_perror(ld, "ldap_search_s");
        exit(1);
    }
    /* ... */

    /* free the results, close and free connection resources */
    ldap_msgfree(result);
    ldap_unbind(ld);
    return 0;
}

ldif - exchange format
LDIF (LDAP Data Interchange Format) is used to import and export directory information.
dn: cn=informatik
cn: Informatik
objectclass: top
objectclass: groupofnames
member: cn=baumgarten,uwe, mail=baumgaru@in.tum.de
member: cn=schlichter,johann, mail=schlicht@in.tum.de
...

dn: cn=baumgarten,uwe, mail=baumgaru@in.tum.de
cn: Baumgarten,Uwe
modifytimestamp: 20001213084405Z
mail: baumgaru@informatik.tu-muenchen.de
givenname: Uwe
sn: Baumgarten
objectclass: top
objectclass: person
...

dn: cn=schlichter,johann, mail=schlicht@in.tum.de
cn: Schlichter, Johann
modifytimestamp: 20001213084406Z
mail: schlicht@in.tum.de
givenname: Johann
sn: Schlichter
telephonenumber: +49-89 289-25700
o: Technische Universität München
streetaddress: Arcisstr 21
st: Germany
objectclass: top
objectclass: person

3.5.8 Failure tolerant services
A service may be provided by multiple redundant servers; server copies and client copies are grouped together into server groups and client groups.

Modular redundancy
- Client requests are sent to and processed by all server replicas (active replication).
- Each server replica sends its result to the voting unit of the client.
- The voting unit decides on the received results (e.g. by majority voting).
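The voting unit of modular redundancy can be sketched as a function that sends the request to all replicas and accepts a result only if a strict majority of the replicas delivered it. In this hypothetical sketch, replicas are modeled as local Java functions, a crashed replica is modeled by an exception, and results are assumed to be non-null.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class Voter {
    // Active replication: every replica processes the request; the voting unit accepts
    // a result only if a strict majority of all replicas returned it.
    static <R> Optional<R> invokeAll(List<Function<String, R>> replicas, String request) {
        Map<R, Integer> votes = new HashMap<>();
        for (Function<String, R> replica : replicas) {
            try {
                votes.merge(replica.apply(request), 1, Integer::sum);
            } catch (RuntimeException crashed) {
                // a failed replica casts no vote
            }
        }
        int needed = replicas.size() / 2 + 1;
        for (Map.Entry<R, Integer> e : votes.entrySet()) {
            if (e.getValue() >= needed) return Optional.of(e.getKey());
        }
        return Optional.empty(); // no majority: the voting unit cannot decide
    }
}
```

With three replicas, one faulty or crashed replica is outvoted; two disagreeing failures leave the voting unit without a decision.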
Figure: modular redundancy; client C calls P, the call is executed by the servers S1, S2, and S3 of the server group, and the voting unit of the client decides on their results.

Primary-standby approach
Figure: a client group (C1, C2) calls P on a server group (S1, S2, S3); the numbered arrows (1 to 5) trace the request to the primary S1, its execution and propagation to S2 and S3, their "performed" confirmations, and the answer returned to the clients.
- At any specific time, only one replica acts as the master (primary replica).
- RPC requests are always propagated to the primary replica.
- At checkpoints, the current state is propagated to the secondary replicas.
- In case of an error, the master is replaced by a backup replica.
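A distinction is made between hot and cold standby: with hot standby, the backup replicas are running and kept up to date, so they can take over immediately; with cold standby, a backup replica must first be started and loaded with the last checkpointed state.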
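The primary-standby approach with periodic checkpoints can be sketched as follows. PrimaryStandbyGroup is a hypothetical class, the service state is just a counter, and the replicas are local objects rather than remote servers. The checkpoint interval illustrates the trade-off: updates made after the last checkpoint are lost when the primary fails, while an interval of 1 keeps the standbys continuously up to date, which is closer to a hot standby.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replicated counter service illustrating the primary-standby approach.
class PrimaryStandbyGroup {
    private static class Replica {
        final String name;
        int state;            // service state (here: a counter)
        boolean alive = true;
        Replica(String name) { this.name = name; }
    }

    private final List<Replica> replicas = new ArrayList<>();
    private final int checkpointInterval; // number of requests between two checkpoints
    private int primary = 0;
    private int sinceCheckpoint = 0;

    PrimaryStandbyGroup(int checkpointInterval, String... names) {
        this.checkpointInterval = checkpointInterval;
        for (String n : names) replicas.add(new Replica(n));
    }

    // Every request is executed by the primary only.
    int add(int amount) {
        Replica p = currentPrimary();
        p.state += amount;
        if (++sinceCheckpoint >= checkpointInterval) { // checkpoint: copy the state to the standbys
            for (Replica r : replicas) {
                if (r != p && r.alive) r.state = p.state;
            }
            sinceCheckpoint = 0;
        }
        return p.state;
    }

    void crash(String name) {
        for (Replica r : replicas) if (r.name.equals(name)) r.alive = false;
    }

    String primaryName() { return currentPrimary().name; }

    // If the primary has failed, the next live standby takes over with its last checkpointed state.
    private Replica currentPrimary() {
        for (int i = 0; i < replicas.size(); i++) {
            int candidate = (primary + i) % replicas.size();
            if (replicas.get(candidate).alive) {
                primary = candidate;
                return replicas.get(candidate);
            }
        }
        throw new IllegalStateException("all replicas have failed");
    }
}
```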
Chapter 4 Remote Invocation (RPC/RMI)

4.1 Issues
Invocation of remote services is an essential aspect of distributed applications.
- What is a remote procedure call (RPC), a remote method invocation (RMI)?
- Which problems arise from their usage?
- What are client and server stubs?
- What is an RPC generator?
- How do client and server systems find each other?

4.2 Introduction

4.2.1 Local vs. remote procedure call
Figure: within a single process, the caller sends a request to the procedure and receives the answer.
RPC extends the same type of communication to programs running on different computers; there is a single thread of execution and a transfer of data.
Figure: the caller program sends a request across the interface between the remote systems to the callee program and receives the answer.

4.2.2 Definition
Definition: Birrell and Nelson (1982) define an RPC as a synchronous flow of control and data passing scheme achieved through procedure calls between processes running in separate address spaces where the needed communication is via small channels (with respect to bandwidth and duration time).
- synchronous: the calling process (client) is blocked until it receives the answer of the called procedure (server); the answer contains the results of the processed request.
- procedure calls: the format of an RPC call is defined by the signature of the called procedure.
- different address spaces: it cannot be assumed that client and server have network-wide unique memory addresses; therefore pointers must be handled differently during parameter passing than in local procedure calls.
- small channels: reduced bandwidth for the communication between the involved computers.

4.2.3 RPC properties
Neither the client nor the server assumes that the procedure call is performed over a network.
Control flow for RPC calls
Figure: timeline of an RPC; the server registers its service and the client binds to the server; then (1) the client prepares and sends the RPC request, (2) the server processes it, and (3) the server returns the RPC response, which the client unpacks.

Differences between RPC and local procedure call
- For an RPC, the caller and the callee run in different processes:
  both processes (caller and callee) have no shared address space;
  there is no common runtime environment;
  client and server have different life spans (see page 46).
- Errors occurring during an RPC call, e.g. caused by machine crashes or communication failures, must be handled; RPC-based applications must take communication failures into consideration.

Basic RPC characteristics
An RPC can be characterized as follows:
1. uniform call semantics.
2. "type-checking" of parameters and results.
3. parameter functionality.
4. optimizing response times rather than throughput.
5. new error cases: bind operation failed; request timed out; arguments are too large.
The goal is a degree of transparency (see page 26) concerning exception handling and communication failures (relevant for the programmer).

RPC and OSI
Integration of the RPC into the ISO/OSI protocol stack:

layer 7 application layer   | client-server model
layer 6 presentation layer  | RPC (hides communication details)
layer 5 session layer       | message exchange, e.g. request-response protocol (operating system interface to the underlying communication protocols)
layer 4 transport layer     | transport protocols, e.g. TCP/UDP or OSI TP4 (transfer of data packets)

- transport protocols: UDP (User Datagram Protocol) transports data packets without guarantees; TCP (Transmission Control Protocol) verifies correct delivery of data streams.
- message exchange: socket interface to the underlying communication protocols.
- RPC: hides communication details behind a procedure call and helps bridge heterogeneous platforms.
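The new error case "request timed out" can be made explicit to the programmer by bounding how long the caller blocks. In this hypothetical sketch, the remote call is simulated by a Callable executed on an ExecutorService that stands in for the transport, and TimedCall and RpcTimeoutException are names chosen for illustration; Future.get with a timeout is standard java.util.concurrent API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCall {
    // Hypothetical RPC-specific error type for the "request timed out" case.
    static class RpcTimeoutException extends Exception {
        RpcTimeoutException(String message) { super(message); }
    }

    // The caller blocks as in a synchronous RPC, but waits at most timeoutMillis for the reply.
    // Failures inside the simulated remote call surface as ExecutionException.
    static <T> T call(ExecutorService transport, Callable<T> remoteCall, long timeoutMillis)
            throws RpcTimeoutException, ExecutionException, InterruptedException {
        Future<T> reply = transport.submit(remoteCall);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            reply.cancel(true); // stop waiting; the caller cannot tell whether the call was executed
            throw new RpcTimeoutException("request timed out after " + timeoutMillis + " ms");
        }
    }
}
```

A timeout does not tell the client whether the server executed the procedure, which is one reason why exception handling cannot be made fully transparent.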
RPC vs. message exchange

RPC                                        | message exchange
synchronous (generally)                    | asynchronous
1 primitive operation (RPC call)           | 2 primitive operations (send, receive)
messages are configured by the RPC system  | messages are specified by the programmer
one open RPC                               | several parallel messages possible

The RPC protocol defines only the structure of the request/answer messages; it does not supply a mechanism for secure data transfer.

RPC exchange protocols
There are different types of RPC exchange protocols:
- the request (R) protocol;
- the request-reply (RR) protocol;
- the request-reply-acknowledge (RRA) protocol.

4.3 Distributed applications based on RPC
How can distributed applications be implemented based on remote procedure calls?

4.3.1 Distributed application
In order to isolate the communication idiosyncrasies of RPCs and to make the network interfaces transparent to the application programmer, so-called stubs are introduced.

Stubs
- Stubs integrate the software handling the communication between the components of a distributed application.
- Stubs encapsulate the distribution-specific aspects.
- Stubs represent interfaces.
- Client stub: contains the proxy definition of the remote procedure P.
- Server stub: contains the proxy call for the procedure P.
Figure: client C and server S are connected by a logical interface; physically, a request travels from C through the client stub and its network code, via message transfer, to the network code and the server stub of S (steps 1 to 4), and the answer returns along the reverse path (steps 5 to 8).

Stub functionality
Client and server stubs have the following tasks during client-server interaction.
1. Client stub
- specification of the remote service operation;
- assigning the call to the correct server;
- representation of the parameters in the transmission format;
- decoding the results and propagating them to the client application;
- unblocking the client application.
2. Server stub
- decoding the parameter values;
- determining the address of the service operation (e.g. by a table lookup);
- invoking the service operation;
- preparing the result values in the transmission format and propagating them to the client.
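The tasks of the two stubs can be illustrated with a hand-written pair of stubs for a hypothetical remote procedure int add(int a, int b). In this sketch, the "network" is a direct method call that passes byte arrays, the transmission format is simply a sequence of big-endian ints written by DataOutputStream, and the operation number OP_ADD is an assumption; an RPC generator would produce comparable code from an interface description.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Server stub for the hypothetical remote procedure "int add(int a, int b)".
class ServerStub {
    static final int OP_ADD = 1; // operation number agreed on by both stubs

    // Dispatch table: operation number -> service operation (the "table lookup").
    private final Map<Integer, IntBinaryOperator> operations = new HashMap<>();

    ServerStub() {
        operations.put(OP_ADD, (a, b) -> a + b); // the actual service operation
    }

    // Receives a request message and returns the reply message.
    byte[] handle(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        int opnum = in.readInt();
        int a = in.readInt();                                  // decode the parameter values
        int b = in.readInt();
        int result = operations.get(opnum).applyAsInt(a, b);   // invoke the service operation
        ByteArrayOutputStream reply = new ByteArrayOutputStream();
        new DataOutputStream(reply).writeInt(result);          // result in transmission format
        return reply.toByteArray();
    }
}

// Client stub: offers add() as if it were a local procedure.
class ClientStub {
    private final ServerStub server; // stands in for network code and message transfer

    ClientStub(ServerStub server) { this.server = server; }

    int add(int a, int b) throws IOException {
        ByteArrayOutputStream request = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(request);
        out.writeInt(ServerStub.OP_ADD); // which remote operation
        out.writeInt(a);                 // parameters in transmission format
        out.writeInt(b);
        byte[] reply = server.handle(request.toByteArray()); // caller blocks until the reply arrives
        return new DataInputStream(new ByteArrayInputStream(reply)).readInt(); // decode the result
    }
}
```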
Implementing a distributed application
Manual implementation of stubs is error-prone; therefore an RPC generator is used to generate the stubs from a declarative specification.

RPC generator
- An RPC generator reduces the time necessary for implementing and managing the components of a distributed application.
- A declarative interface description is easier to modify and therefore less error-prone.
Figure: from the interface description ms.idl, the RPC generator produces the client stub, the server stub, the header file ms.h, and the data transformation routines, which are used together with client.c and server.c.

Applying the RPC generator
The individual steps for generating a distributed application are illustrated in the following figure.
Figure: the RPC generator translates the RPC interface specification into client stubs, data transformation routines, header files, and server stubs; the compiler translates the client application, the stubs, and the server operations into the application component, the client stub component, the server stub component, and the operations component; the linker then combines them into the client program and the server program.

Structure of a distributed application
The internal structure of a distributed application created using an RPC generator is as follows:
Figure: on the client side, client.o (implemented by the application programmer) calls the client stub (generated by the RPC generator), which uses the RPC system with its filter, send, and receive functions; the server side mirrors this with the RPC system, the server stub, and server.o; both sides communicate over the network.

4.3.2 RPC language
A declarative language is used to specify the interfaces between clients and servers.

Structure of the interface description
The interface between the components of a distributed application is specified in this declarative language:

[interface attribute-list] interface identifier
    constant declarations
    type declarations
    operation declarations

Purpose of the interface attribute list:
- version of the RPC system;
- fixed ports through which the server may be invoked.

4.3.3 Phases of RPC-based distributed applications
We distinguish between three phases:
a) design and implementation;
b) binding of components;
c) invocation: a client invokes a server operation.
Component binding
The components of a distributed application (client and server) may be started independently; binding links the components so that RPC calls become possible.

Static binding
Static binding takes place when the client program is generated. In this case, the server address is hard-coded within the client program.

Semistatic binding
The client determines the server address during the initialization of the client process. The server address remains unchanged for the whole life span of the client process. Binding can take place via
- an entry in a database;
- a broadcast or multicast message;
- a name service;
- a mediation mechanism ("broker" or "trader"); a broker mediates between client and server.

Dynamic binding
The server address is determined immediately before an RPC is performed.
- Client operation is not hindered by the following situations:
  server migration;
  client binding to alternative servers (if the commonly called server is not available);
  dynamic server replacement.
- Binding sometimes integrates a solution to the factory problem, i.e. the startup of a non-operational server.

Mediation and brokering
Possible terms for a mediation component are registry, broker, or trader; CORBA uses the term object request broker.

Functionality of a broker
- Servers register their available service interfaces with the broker ("export interface").
- The broker supplies the client with information to locate a suitable server and to determine the correct service interface ("import interface").
Figure: client-to-server binding; 1. server S exports its interface to broker V, 2. client C imports it from V, 3. C performs the RPC call to S.

Broker information
A broker manages information about the available, exported interfaces:
- server names ("white pages");
- service types ("yellow pages");
- behavioral or functional attributes:
  static attributes: functionality of the provided services, cost, required bandwidth;
  dynamic attributes: current server state.

Handling client requests
The broker may either just provide the service interface to the client or act as a mediator between client and server:
- direct communication between C and S;
- indirect communication between C and S; communication between C and S is only possible via broker V (or several brokers).

4.4 Remote Method Invocation (RMI)
RMI supports communication among objects residing on different Java virtual machines (JVMs). RMI (URL: http://www.oracle.com/technetwork/java/javase/tech/indexjsp-136424.html) is an RPC (see page 63) for the object-oriented Java environment.

4.4.1 Definitions
Definition: A remote object is an object whose methods can be called by an object residing on another Java virtual machine (JVM), even on another computer.
Definition: A remote interface is a Java interface specifying the methods of a remote object.
Definition: Remote method invocation (RMI) allows object-to-object communication between different Java virtual machines (JVMs), i.e. it is the action of invoking a method of a remote interface on a remote object. Method calls for local and remote objects have the same syntax.
Figure: a remote object with its data and the implementation of its methods; the remote interface lists the methods m1, m2, m3, while further methods (m4, m5) are not part of it.

4.4.2 RMI characteristics
Note: in RMI, client and server are objects.
- RMI supports location and access transparency.
- Localization of remote objects.
- Communication with remote objects (using method calls).
- Automated class loading for objects passed as parameters or results.
Clients interact with remote interfaces rather than with the classes implementing these interfaces.

How does RMI work?
Java RMI uses a registry to provide naming services for remote objects, and stubs and skeletons to facilitate the communication between client and server.
Figure: the server host registers the server object with the RMI registry host (1); the client program looks up the server object in the registry (2); the registry returns the client stub (3); client stub and server skeleton then handle the data communication between the client program and the server object (4).

RMI works as follows:
1. A server object is registered with the RMI registry.
2. A client looks up the remote object in the RMI registry.
3. Once the remote object is located, its stub is returned to the client.
4. The remote object can be used in the same way as a local object; the communication between client and server is handled by stubs and skeletons.

4.4.3 RMI architecture
Figure: the client's method invocation and the remote object form the application layer; the proxy object (stub) and the skeleton correspond to the presentation layer; the remote reference layers on both sides correspond to the session layer; the transport system (TCP/IP, network) lies below.

Stub/Skeleton layer
- The layer intercepts method calls by the client and redirects these calls to the remote object.
- Object serialization/deserialization, hidden from the application.

Remote Reference layer
- Connects the client with remote objects exported by the server environment via a 1-to-1 connection link.
- The layer provides JRMP (Java Remote Method Protocol) via TCP/IP.
- Maps stub/skeleton operations to the transport protocol of the host; it interfaces the application code with the network communication.
- The layer supports the method invoke (interface java.rmi.server.RemoteRef):

  Object invoke(Remote obj, java.lang.reflect.Method method, Object[] params, long opnum) throws Exception
4.4.4 Locating remote objects
How does the client find the remote object?
Figure: the client calls Naming.lookup; the Naming class calls Registry.lookup on the registry.

RMI supports a special name service, the RMI registry:
- mapping of names to remote objects;
- a stand-alone Java application;
- the RMI registry runs on all machines hosting remote objects;
- the standard port for registry requests is 1099;
- the RMI registry is itself a remote object;
- the RMI registry is accessed via the class java.rmi.Naming.

Methods of the class Naming
public static void bind(String name, Remote obj) throws AlreadyBoundException, java.net.MalformedURLException, RemoteException
- associates the remote object obj with name (in URL format); example for name: rmi://host[:service-port]/service-name
- if name is already bound to an object, an AlreadyBoundException is thrown.
public static void rebind(String name, Remote obj) throws java.net.MalformedURLException, RemoteException
- always associates the remote object obj with name (in URL format), replacing any existing binding.
public static Remote lookup(String name) throws NotBoundException, java.net.MalformedURLException, RemoteException
- returns a reference (a stub) to the remote object;
- if name is not bound to an object, a NotBoundException is thrown.
public static void unbind(String name) throws NotBoundException, java.net.MalformedURLException, RemoteException
public static String[] list(String name) throws java.net.MalformedURLException, RemoteException
- returns all names entered in the registry; the name parameter specifies only the host and port information.

Registry lookup
The client invokes a lookup for a particular URL containing the name of the service (rmi://host:port/service). The following steps are performed:
1) a socket connection is opened with the host on the specified port;
2) a stub to the remote registry is returned;
3) the method Registry.lookup() is performed on this stub; it returns a stub for the remote object;
4) the client interacts with the remote object through its stub.

4.4.5 Developing RMI applications
The steps for developing an RMI application differ slightly from the development steps of a traditional RPC application.

1. Defining a remote interface
A remote interface is the set of methods that can be invoked remotely by a client.
- The remote interface must be declared public.
- The remote interface must extend the interface java.rmi.Remote.
- Each method must declare java.rmi.RemoteException in its throws clause.
- If the remote methods have remote objects as parameters or return types, these must be declared as interfaces rather than implementation classes.

Example: remote interface definition
public interface HelloInterface extends java.rmi.Remote {
    /* this method is called by remote clients and it is implemented by the remote object */
    public String sayHello() throws java.rmi.RemoteException;
}

2. Implementing the remote interface
An implementation class defines the methods of the remote interface; the abstract class java.rmi.server.RemoteServer provides the basic semantics to support remote references. java.rmi.server.RemoteServer has the subclasses
- java.rmi.server.UnicastRemoteObject: defines a non-replicated remote object whose references are valid only while the server process is alive;
- java.rmi.activation.Activatable: defines a remote object that can be instantiated on demand (if it has not been started already).

Example: remote interface implementation
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.Date;

public class HelloServer extends UnicastRemoteObject implements HelloInterface {

    public HelloServer() throws RemoteException {
        super(); /* call superclass constructor to export this object */
    }

    public String sayHello() throws RemoteException {
        return "Hello World, the current system time is " + new Date();
    }
}

3. Generating stubs and skeletons
The tool rmic generates stub and skeleton classes from the implementation class; this step is needed up to Java version 5, while later versions generate stubs dynamically at runtime.

rmic HelloServer
Figure: development steps of an RMI application: 1. define the server object interface; 2. define the server implementation class; 3. run rmic to obtain the client stub and the server skeleton (bytecode); 4. create and register the server object; 5. develop the client program.

4. Remote object registration
Every remotely accessible object must be registered in a registry in order to make it available; stubs are needed for registration. The registry (the rmiregistry program) is started on the host of the remote object.

Example for object registration

import java.rmi.*;

public class RegisterIt {
    public static void main(String[] args) {
        try {
            // Instantiate the object
            HelloServer obj = new HelloServer();
            System.out.println("Object instantiated: " + obj);
            Naming.rebind("/HelloServer", obj);
            System.out.println("HelloServer bound in registry");
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

5. Client implementation
This step encompasses writing the client that uses the remote objects.
- The client must perform a registry lookup in order to obtain a reference to the remote object.
- The client interacts with the remote interface, never with the object implementation.

Example: client implementation
import java.rmi.*;

public class HelloClient {
    public static void main(String[] args) {
        if (System.getSecurityManager() == null)
            System.setSecurityManager(new RMISecurityManager());
        try {
            String name = "//" + args[0] + "/HelloServer";
            HelloInterface obj = (HelloInterface) Naming.lookup(name);
            String message = obj.sayHello();
            System.out.println(message);
        } catch (Exception e) {
            System.out.println("HelloClient exception: " + e);
        }
    }
}

Missing access rights result in the exception
java.security.AccessControlException: access denied
Finally, the client is started.

4.4.6 Parameter passing in RMI
Parameters with primitive data types are passed by value between JVMs; for object parameters, a distinction is made between local and remote objects:
1. local object parameter
RMI passes a copy of the object itself, rather than the object reference. The transmitted object must implement the interface java.io.Serializable or java.io.Externalizable. Classes requiring special handling must implement
private void writeObject(java.io.ObjectOutputStream out) throws IOException;
private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException;
2. remote object parameter
RMI transmits the stub of the remote object; the stub is a reference to the remote object.

4.4.7 Distributed garbage collection
Each JVM keeps track of its live references; a reference counter represents the number of live references.
Figure: client and server object each sit on top of a remote reference layer that maintains a reference counter.
- When a client first accesses a remote object, a referenced message is sent to the server.
- When a client no longer holds a valid reference, an unreferenced message is sent to the server.
- References have a time limit ("lease time", e.g. 10 minutes); the client must renew the lease with the server, otherwise the reference becomes invalid.

4.5 Servlets
Servlets (Java Servlets) are programs invoked by a client and executed on the server host; they are used to extend the functionality of the server.
Figure: clients 1 to 3 send requests to servers 1 and 2; a servlet engine executes the code of servlets 1 to 3.

4.5.1 Servlet properties
- A servlet is executed in the context provided by the servlet engine. Apache Tomcat (URL: http://jakarta.apache.org) is a free, open-source implementation of the Java servlet technology.
- Methods specified within each servlet object are invoked by the servlet engine:
  init: when a servlet is initialized;
  destroy: when a servlet is no longer needed;
  service: when a client request is forwarded to the servlet.
- Servlets are invoked via HTTP requests (GET or POST method), e.g.

<form method="post" action="http://myhost:8080/servlet/formservlet">
... arguments of the form ...
</form>

4.5.2 Servlet lifecycle
The interface javax.servlet.Servlet specifies the methods to be implemented by servlets:
public void init(ServletConfig config) throws ServletException;
public void service(ServletRequest request, ServletResponse response) throws ServletException, IOException;
public void destroy();

Figure: servlet lifecycle; the JVM loads the servlet class (state: loaded), creates the servlet using its constructor (created), and invokes the init method (initialized); when the servlet is invoked for the first time, and each time it is invoked again, the service method is invoked (served); after a timeout or when the web server is stopped, the destroy method is invoked (destroyed).

4.5.3 HttpServlet
The abstract class HttpServlet extends the abstract class GenericServlet, which implements the interfaces Servlet and ServletConfig.
- GenericServlet defines a generic, protocol-independent servlet.
- HttpServlet defines a servlet for the HTTP protocol.
Figure: class diagram; javax.servlet.http.HttpServlet extends javax.servlet.GenericServlet, which implements javax.servlet.Servlet and javax.servlet.ServletConfig:
javax.servlet.http.HttpServlet
    doGet(req: HttpServletRequest, resp: HttpServletResponse): void
    doPost(req: HttpServletRequest, resp: HttpServletResponse): void
    doDelete(req: HttpServletRequest, resp: HttpServletResponse): void
    doPut(req: HttpServletRequest, resp: HttpServletResponse): void
    ...
javax.servlet.Servlet
    init(config: ServletConfig): void
    service(req: ServletRequest, resp: ServletResponse): void
    destroy(): void
javax.servlet.ServletConfig
    getInitParameter(name: String): String
    getInitParameterNames(): Enumeration
    getServletContext(): ServletContext
    getServletName(): String

- doGet is invoked to respond to a GET request.
- doPost is invoked to respond to a POST request.
- doDelete is invoked to respond to a DELETE request; normally used to delete a file on the server.
- doPut is invoked to respond to a PUT request; normally used to send a file to the server.
The HttpServlet class provides default implementations of these methods; a servlet must override them to process its requests.

4.5.4 Structure of a servlet

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
    public class MyServlet extends HttpServlet {
        /** called by the servlet engine to initialize the servlet */
        public void init() throws ServletException { /* ... */ }

        /** process the HTTP GET request */
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException { /* ... */ }

        /** process the HTTP POST request */
        public void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException { /* ... */ }

        /** called by the servlet engine to release resources */
        public void destroy() { /* ... */ }

        // other methods
    }

Example - CurrentTime

    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.io.*;

    public class CurrentTime extends HttpServlet {
        /** process the HTTP GET request */
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<p>The current time is " + new java.util.Date());
            out.close(); // close stream
        }
    }

Invocation: http://localhost:8080/.../servlet/currenttime

Example - Registration of Students

This HTML form and the associated servlet process student registrations. The HTML form is an input form whose data is sent to the servlet via the GET method.
HTML Form

    <html>
    <head>
    <title>Student Registration Form</title>
    </head>
    <body>
    <p>Student Registration Form
    <form action="http://localhost:8080/examples/servlet/getparameters" method="get">
      Last Name <input type="text" name="lastname" size="20">
      First Name <input type="text" name="firstname" size="20">
      <p>Gender: <input type="radio" name="gender" value="m" checked>Male
         <input type="radio" name="gender" value="f">Female</p>
      <p>Major <select name="major" size="1">
           <option value="cs">Computer Science
           <option value="ma">Mathematics
         </select>
         Minor <select name="minor" size="2" multiple>
           <option>Computer Science
           <option>Mathematics
           <option>Economics
           <option>Mechanical Engineering
         </select></p>
      <p>Hobby: <input type="checkbox" name="tennis">Tennis
         <input type="checkbox" name="soccer">Soccer
         <input type="checkbox" name="golf">Golf</p>
      <p>Remarks:</p>
      <p><textarea name="remarks" rows="3" cols="56"></textarea></p>
      <p><input type="submit" value="Submit"> <input type="reset" value="Reset"></p>
    </form>
    </body>
    </html>

Servlet

    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.io.*;
    public class GetParameters extends HttpServlet {
        /** process the HTTP GET request */
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();

            // obtain parameters from the client
            String lastname = request.getParameter("lastname");
            String firstname = request.getParameter("firstname");
            String gender = request.getParameter("gender");
            String major = request.getParameter("major");
            String[] minors = request.getParameterValues("minor"); // multiple selection
            // checkboxes: "on" if checked, null otherwise
            String tennis = request.getParameter("tennis");
            String soccer = request.getParameter("soccer");
            String golf = request.getParameter("golf");
            String remarks = request.getParameter("remarks");

            out.println("Last Name: <b>" + lastname + "</b> First Name: <b>" + firstname + "</b><br>");
            out.println("Gender: <b>" + gender + "</b><br>");
            out.println("Major: <b>" + major + "</b> Minor: <b>");
            if (minors != null)
                for (int i = 0; i < minors.length; i++)
                    out.println(minors[i] + " ");
            out.println("</b><br> Tennis: <b>" + tennis + "</b> Soccer: <b>" + soccer
                    + "</b> Golf: <b>" + golf + "</b><br>");
            out.println("Remarks: <b>" + remarks + "</b>");
            out.close(); // close stream
        }
    }
Chapter 5 Basic mechanisms for distributed applications

5.1 Issues

The following sections discuss several important basic issues of distributed applications.
Data representation in heterogeneous environments.
An execution model for distributed applications.
What is the appropriate error handling?
What are the characteristics of distributed transactions?
What are the basic aspects of group communication (e.g. algorithms used by ISIS)? How are messages propagated and delivered within a process group in order to maintain a consistent state?

5.2 External data representation

A heterogeneous environment means different data representations, which requires data transformation.
Independence from hardware characteristics when exchanging messages means: use of an external data representation.
5.2.1 Marshalling and unmarshalling

[Figure: the client marshals the arguments and unmarshals the results; the server unmarshals the arguments and marshals the results; the data stream travels across the network.]

Marshal: serialization of the parameters into a data stream.
Unmarshal: extraction of the data from the data stream and reassembly of the arguments.
The software for argument transformation is either provided by the RPC system or supplied as a plugin by the application programmer.

5.2.2 Centralized transformation

[Figure: node A and node B; the transformation takes place at node B.]

Only B transforms data, both the data to be sent to A and the data received from A.

5.2.3 Decentralized transformation

[Figure: node A and node B, with transformation.]
All nodes execute data transformations. Variants:
1. A transforms the data it sends to B; B transforms the data it sends to A.
2. A transforms the data received from B; B transforms the data received from A.
3. A and B transform data into a network-wide standard format; each recipient transforms the received data back into its local format. If new system components are dynamically added to the distributed system, they only have to "learn" the network-wide standard representation. No special hardware is required. Example: XDR as part of ONC by Sun.

5.2.4 Common external data representation

Two aspects of a common external data representation are important: a machine-independent format for data representation, and a language for describing complex data structures.
Examples: XDR (External Data Representation) by Sun and ASN.1 (URL: http://www.asn1.org/) (Abstract Syntax Notation One).
Other formats are:
CORBA's Common Data Representation (CDR): structured and primitive types can be passed as arguments and results.
Java's object serialization: flattening of single objects or trees of objects.

Representation of numbers

For the representation of numbers in main memory, one of the following methods is generally used:
"little endian" representation: the least significant byte of a number is stored at the lowest memory address.
"big endian" representation: the most significant byte of a number is stored at the lowest memory address, e.g. the Sun SPARC architecture.
Example: representation of the number 1347 (hexadecimal 0x00000543)

    Memory address   1000       1001       1002       1003
    Big endian       00000000   00000000   00000101   01000011
    Little endian    01000011   00000101   00000000   00000000

Convention: for network transfer, numbers that occupy several bytes are encoded in a well-defined byte order, such as big endian.

External representation of strings

There are different internal representations for strings:
C: "abc" is stored as  a b c \0  (terminated by a null byte)
Pascal: "abc" is stored as  3 a b c  (preceded by its length)

Standardized external representation:

    | length n (4 bytes) | byte 0 | byte 1 | ... | byte n-1 | 0 ... 0 (r bytes) |
    total size: 4 + n + r bytes, with (n + r) mod 4 = 0

External representation of arrays

    | length n (4 bytes) | element 0 | element 1 | ... | element n-1 |
Arrays with a variable number of elements are represented as a "counted array".

Transfer of pointers

Client and server do not share an address space, so transferring pointers is problematic. Options:
1. Prohibit pointers in a remote procedure call.
2. Dereference pointers in a remote procedure call: serialize ("marshal") the data structure the pointer points to, and transfer the whole data structure.
   Null pointers cannot be transferred; instead, boolean variables indicate whether data is present.
   In heterogeneous environments function pointers cannot be used. However, in a homogeneous Java environment function pointers can be dereferenced and the function transferred to the server site.
3. Transfer the pointer itself.

5.2.5 XML as common data representation

Complex data types can be mapped to XML for transmission across the network.
Primitive data types have XSD equivalents: boolean, byte, unsignedShort (used for char), int, long, float, string, ...

Example: primitive data types
Request for the invocation of the Java method echoString("cat"); SOAP body of the request that sends a string:

    <soap:Body>
      <n:echoString xmlns:n="http://tempuri.org/mapping.server.primitive">
        <value xsi:type="xsd:string">cat</value>
      </n:echoString>
    </soap:Body>

SOAP provides built-in support for encoding arrays.
Example: array data type
Request for the Java method invocation echoInts([1, 2, 3]); SOAP body of the request that sends an array:

    <soap:Body>
      <n:echoInts xmlns:n="http://tempuri.org/mapping.server.array">
        <ints href="#id0"/>
      </n:echoInts>
      <id0 id="id0" soapenc:root="0" xsi:type="soapenc:Array" soapenc:arrayType="xsd:int[3]">
        <i xsi:type="xsd:int">1</i>
        <i xsi:type="xsd:int">2</i>
        <i xsi:type="xsd:int">3</i>
      </id0>
    </soap:Body>

Complex data types are mapped to XML Schema types; SOAP platforms provide an API for creating custom mappings, e.g. writeschema to specify an XML Schema definition.

[Figure: data encoding languages by level of abstraction, from high to low: application-specific data encoding language (XML), general data encoding language (ASN.1), network data encoding language (Sun XDR).]
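The encoding rules of 5.2.4 (big-endian integers, strings with a 4-byte length and zero padding to a multiple of 4 bytes, counted arrays, and a boolean flag in place of a null pointer) can be illustrated with a small marshalling sketch. It assumes ASCII strings and 32-bit int elements; the class XdrSketch and its method names are hypothetical, and this is not Sun's XDR library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a sketch of XDR-style marshalling, not the ONC/Sun XDR library.
final class XdrSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // 32-bit integer in big-endian (network) byte order.
    XdrSketch putInt(int value) {
        byte[] b = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
        out.write(b, 0, 4);
        return this;
    }

    // String: 4-byte length n, n data bytes, then r zero bytes so that (n + r) mod 4 == 0.
    XdrSketch putString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        putInt(bytes.length);
        out.write(bytes, 0, bytes.length);
        int r = (4 - bytes.length % 4) % 4;
        for (int i = 0; i < r; i++) out.write(0);
        return this;
    }

    // Counted array: 4-byte element count n, followed by the n elements.
    XdrSketch putIntArray(int[] values) {
        putInt(values.length);
        for (int v : values) putInt(v);
        return this;
    }

    // Instead of a null pointer: a boolean flag (0 or 1), followed by the data if present.
    XdrSketch putOptionalString(String s) {
        putInt(s == null ? 0 : 1);
        if (s != null) putString(s);
        return this;
    }

    byte[] toBytes() {
        return out.toByteArray();
    }

    // Unmarshalling counterpart for a string; ByteBuffer reads big-endian by default.
    static String readString(ByteBuffer in) {
        int n = in.getInt();
        byte[] bytes = new byte[n];
        in.get(bytes);
        in.position(in.position() + (4 - n % 4) % 4); // skip the padding bytes
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

For example, new XdrSketch().putInt(1347).toBytes() yields the bytes 0x00 0x00 0x05 0x43, the big-endian representation of 1347, and putString("abc") produces 8 bytes (4 length bytes, 3 data bytes, 1 padding byte).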
5.3 Time

Time is an important and interesting issue in distributed systems.
We need to measure time accurately:
to know the time at which an event occurred at a computer;
to do this, we need to synchronize the computer's clock with an authoritative external clock.
Algorithms for clock synchronization are useful for
concurrency control based on timestamp ordering;
checking the authenticity of requests, e.g. in Kerberos.
Three notions of time:
time seen by an external observer: a global clock of perfect accuracy. However, there is no global clock in a distributed system;
time seen on the clocks of individual processes;
logical notion of time: event a occurs before event b.

5.3.1 Introduction

Each computer in a distributed system (DS) has its own internal clock,
used by local processes to obtain the value of the current time;
processes on different computers can timestamp their events,
but clocks on different computers may give different times.
Computer clocks drift from perfect time, and their drift rates differ from one another.
Clock drift rate: the relative amount by which a computer clock differs from a perfect clock.
Even if the clocks on all computers in a DS are set to the same time, they will eventually vary quite significantly unless corrections are applied.

Timestamp

To timestamp events, we use the computer's clock:
1. At real time t, the operating system reads the time on the computer's hardware clock $H_i(t)$.
2. It calculates the time on its software clock $C_i(t) = a \cdot H_i(t) + b$, e.g. a 64-bit number giving nanoseconds since some "base time".
In general, the clock is not completely accurate, but if $C_i$ behaves well enough, it can be used to timestamp events at process $p_i$.

Skew between clocks

Computer clocks are not generally in perfect agreement.
Skew: the difference between the readings of two clocks (at any instant).
Computer clocks are subject to clock drift (they count time at different rates).
Clock drift rate: the difference per unit of time from some ideal reference clock.
Ordinary quartz clocks drift by about 1 second in 11 to 12 days.

Coordinated Universal Time (UTC)

International Atomic Time is based on very accurate physical clocks (drift rate about $10^{-13}$).
UTC is an international standard for timekeeping:
it is based on atomic time, but occasionally adjusted to astronomical time;
it is broadcast from land-based radio stations and from satellites (e.g. GPS).
Computers with receivers can synchronize their clocks with these timing signals.
Signals from land-based stations are accurate to about 0.1 to 10 milliseconds.
Signals from GPS are accurate to about 1 microsecond.

5.3.2 Synchronizing physical clocks

Physical clocks are used to compute the current time in order to timestamp events, such as
the modification date of a file;
the time of an e-commerce transaction, for auditing purposes.

External - internal synchronization

External synchronization: a computer's clock $C_i$ is synchronized with an external authoritative time source S, so that
$|S(t) - C_i(t)| < D$ for $i = 1, 2, \dots, N$ and for all real times t in an interval I.
The clocks $C_i$ are accurate to within the bound D.

Internal synchronization: the clocks of pairs of computers are synchronized with one another, so that
$|C_i(t) - C_j(t)| < D$ for $i, j = 1, 2, \dots, N$ and for all real times t in an interval I.
The clocks $C_i$ and $C_j$ agree within the bound D.

Internally synchronized clocks are not necessarily externally synchronized, as they may drift collectively.
If the set of processes P is synchronized externally within a bound D, it is also internally synchronized within the bound 2D.

Clock correctness

A hardware clock H is said to be correct if its drift rate is within a bound $\rho > 0$ (e.g. $10^{-6}$ seconds per second).
Then the error in measuring the interval between real times t and t' (with t' > t) is bounded:
$(1 - \rho)(t' - t) \le H(t') - H(t) \le (1 + \rho)(t' - t)$
i.e. there are no jumps in the readings of correct hardware clocks.
Weaker condition of monotonicity: $t' > t \Rightarrow C(t') > C(t)$, e.g. required by Unix make.
Monotonicity can be achieved even with a hardware clock that runs fast, by adjusting the values of a and b in $C_i(t) = a \cdot H_i(t) + b$.
A faulty clock is one that does not obey its correctness condition:
crash failure: the clock stops ticking;
arbitrary failure: any other failure, e.g. jumps in time.

Synchronization in a synchronous system

A synchronous distributed system is one in which the following bounds are defined:
the time to execute each step of a process has known lower and upper bounds;
each message transmitted over a channel is received within a known bounded time;
each process has a local clock whose drift rate from real time has a known bound.

Internal synchronization in a synchronous system

One process p1 sends its local time t to process p2 in a message m.
p2 could set its clock to $t + T_{trans}$, where $T_{trans}$ is the time taken to transmit m.
$T_{trans}$ is unknown, but $min \le T_{trans} \le max$; the uncertainty is $u = max - min$.
If p2 sets its clock to $t + (max + min)/2$, the skew is at most $u/2$.
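A small numerical sketch of the synchronous case, assuming hypothetical bounds given in milliseconds: p2 only knows that min ≤ T_trans ≤ max, so it sets its clock to the midpoint t + (min + max)/2. Whatever the actual transmission time was, the resulting skew is then at most u/2 = (max - min)/2. The class SyncInternal is hypothetical and ignores clock drift during the transmission.

```java
// Hypothetical illustration of internal synchronization in a synchronous system.
// All times are in milliseconds; clock drift during transmission is ignored.
final class SyncInternal {

    // Value p2 assigns to its clock when it receives time t from p1.
    static double adjustedClock(double t, double min, double max) {
        return t + (min + max) / 2.0;
    }

    // Worst-case skew after the adjustment: u / 2 with u = max - min.
    static double worstCaseSkew(double min, double max) {
        return (max - min) / 2.0;
    }

    // Skew for one concrete (in practice unknown) transmission time.
    static double actualSkew(double t, double min, double max, double actualTrans) {
        double p1ClockAtReceipt = t + actualTrans; // p1's clock reading when m arrives
        return Math.abs(adjustedClock(t, min, max) - p1ClockAtReceipt);
    }
}
```

With t = 1000, min = 2 and max = 10, p2 sets its clock to 1006; the skew is 4 for the extreme transmission times 2 and 10 and 0 for a transmission time of 6.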
Cristian's method for an asynchronous system

Observations:
round-trip times between processes are often reasonably short in practice, yet theoretically unbounded;
a practical estimate is possible if round-trip times are sufficiently short in comparison to the required accuracy.

Approach
A time server S receives signals from a UTC source.
Process p requests the time in message m1 and receives the time t in message m2 from S.
p sets its clock to $t + T_{round}/2$, where $T_{round}$ is the measured round-trip time.
The accuracy is $\pm(T_{round}/2 - min)$, where min is the minimum transmission time of a message, because
the earliest time at which S can have put t into m2 is min after p sent m1;
the latest time is min before m2 arrived at p;
hence the time by S's clock when m2 arrives lies in the range $[t + min, t + T_{round} - min]$, whose width is $T_{round} - 2 \cdot min$.

[Figure: process p sends message m1 to the time server S; S replies with message m2.]

Discussion
The approach has several problems:
it is only suitable for a deterministic LAN environment or an intranet;
a single time server might fail; remedy: redundancy through a group of servers and multicast requests;
it does not deal with faulty time servers: how to decide if the replies vary (Byzantine agreement problem); an impostor may provide false clock readings.

Berkeley algorithm
An algorithm for internal synchronization of a group of computers.
A master polls the others (slaves) to collect their clock values.
The master uses round-trip times to estimate the slaves' clock values.
It takes an average, eliminating values whose round-trip times are above some average and values from faulty clocks.
It sends the required adjustment to each slave (better than sending the time, which would depend on the round-trip time).
If the master fails, the group can elect a new master to take over.

Network Time Protocol (NTP)

Cristian's method and the Berkeley algorithm are intended for intranets and are not really suitable for the Internet.
NTP defines an architecture for a time service and a protocol to distribute time information over the Internet.
It uses 64-bit timestamps.
NTP synchronizes clients to UTC.

[Figure: tree of NTP servers arranged in strata 1, 2 and 3.]
Primary servers (stratum 1) are connected to UTC sources.
Secondary servers are synchronized to primary servers.
Synchronization subnet: the lowest-level servers run in users' computers.

NTP - synchronization of servers
The synchronization subnet can reconfigure if failures occur, e.g.
a primary that loses its UTC source can become a secondary;
a secondary that loses its primary can use another primary.

Modes of synchronization
Multicast: a server within a high-speed LAN multicasts the time to the others, which set their clocks assuming some delay (not very accurate).
Procedure call: a server accepts requests from other computers (as in Cristian's algorithm). Higher accuracy. Useful if no hardware multicast is available.
Symmetric: pairs of servers exchange messages containing time information. Used where very high accuracies are needed (e.g. for the higher levels).

Messages between a pair of NTP peers

All modes use the UDP transport protocol for the message exchange.

[Figure: server A sends message m at T0; server B receives m at T1 and sends message m' at T2; server A receives m' at T3.]

Each message bears timestamps of recent events:
the local times of sending and receiving the previous message;
the local time of sending the current message.
The recipient (server A) notes the time of receipt T3 (so it knows T0, T1, T2, T3).
In symmetric mode there can be a non-negligible delay between messages.

Accuracy of NTP

For each pair of messages between two servers, NTP computes an estimate $o_i$ of the offset o between the two clocks and a round-trip delay $d_i$ (the total transmission time of the two messages m and m', which take t and t'). With $T_{i-3}, T_{i-2}, T_{i-1}, T_i$ corresponding to T0, T1, T2, T3 above, and o the true offset of B's clock relative to A's:
$T_{i-2} = T_{i-3} + t + o$ and $T_i = T_{i-1} + t' - o$
Adding the equations gives the delay:
$d_i = t + t' = T_{i-2} - T_{i-3} + T_i - T_{i-1}$
Subtracting the equations gives the offset:
$o = o_i + (t' - t)/2$, where $o_i = (T_{i-2} - T_{i-3} + T_{i-1} - T_i)/2$
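The delay and offset formulas above can be checked numerically. The sketch computes $d_i$ and $o_i$ from the four timestamps of one message pair and, because $t, t' \ge 0$, the interval $[o_i - d_i/2, o_i + d_i/2]$ that must contain the true offset o. The class NtpEstimate and the sample timestamps (in milliseconds) are hypothetical; real NTP additionally filters several such pairs.

```java
// Hypothetical sketch of the NTP estimate for one message pair; all times in ms.
final class NtpEstimate {
    final double delay;  // d_i = t + t'
    final double offset; // o_i, estimate of the offset o of B's clock relative to A's

    // tIm3 = T_{i-3} (A sends m),  tIm2 = T_{i-2} (B receives m),
    // tIm1 = T_{i-1} (B sends m'), tI   = T_i     (A receives m').
    NtpEstimate(double tIm3, double tIm2, double tIm1, double tI) {
        this.delay = (tIm2 - tIm3) + (tI - tIm1);
        this.offset = ((tIm2 - tIm3) + (tIm1 - tI)) / 2.0;
    }

    // Since t, t' >= 0, the true offset o lies in [o_i - d_i/2, o_i + d_i/2].
    double lowerBound() {
        return offset - delay / 2.0;
    }

    double upperBound() {
        return offset + delay / 2.0;
    }
}
```

For timestamps 100, 115, 118, 123 (B's clock 5 ms ahead, t = t' = 10 ms) the sketch gives d_i = 20 and o_i = 5, the exact offset. If m' instead takes 20 ms (T_i = 133), it gives d_i = 30 and o_i = 0: the asymmetric delay shifts the estimate by (t' - t)/2 = 5, but the true offset 5 still lies in [-15, 15].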
Estimate of offset

Using the fact that $t, t' \ge 0$, it can be shown that $o_i - d_i/2 \le o \le o_i + d_i/2$.
Thus $o_i$ is an estimate of the offset and $d_i$ is a measure of its accuracy.
NTP servers filter the pairs $\langle o_i, d_i \rangle$: they retain the 8 most recent pairs and estimate the offset o from them, choosing the $o_i$ with the smallest delay $d_i$.
NTP also applies a peer-selection algorithm, based on reliability estimates, to choose the peers used for synchronization.
Accuracy: over the Internet, tens of milliseconds; over a LAN, about 1 ms.

Precision Time Protocol (PTP)

PTP is a protocol to synchronize clocks throughout a computer network.
NTP is typically used over the Internet and has to handle large, nondeterministic delays; its accuracy is in the millisecond range.
PTP is designed for LANs and achieves clock accuracy in the sub-microsecond range.

Synchronization message exchange

PTP defines a master-slave hierarchy.

[Figure: PTP message exchange over time. The master sends Sync at T1, followed by Follow-Up; the slave receives Sync at T2. The slave sends Delay request at T3, which the master receives at T4 and answers with Delay response. Timestamps known by the slave after each message: T2; T2, T1; T2, T1, T3; T2, T1, T3, T4.]
The master periodically transmits a Sync message using UDP multicast.
The Follow-Up message contains the actual time T1 at which the Sync message left the master.
The slave initiates an exchange with the master to determine the round-trip delay: Delay-Request and Delay-Response messages.
If d is the transit time of a message (assumed equal in both directions) and o the constant offset between the slave and master clocks, then
$T_2 - T_1 = o + d$ and $T_4 - T_3 = -o + d$, hence
$o = (T_2 - T_1 - T_4 + T_3)/2$ and $d = (T_2 - T_1 + T_4 - T_3)/2$.
PTP supports an algorithm for the distributed selection of the best candidate clock.

5.4 Distributed execution model

5.4.1 Events

Classes of events
Components of a distributed application communicate through messages, which cause events in the components. The execution of a component is characterized by three classes of events:
internal events (e.g. the execution of an operation);
message sending;
message reception.
In some cases, message reception and message delivery to the application are distinguished as separate events.
The execution of a component TK creates a sequence of events $e_1, \dots, e_n, \dots$
The execution of the component $TK_i$ is defined by $(E_i, \rightarrow_i)$, where
$E_i$ is the set of events created by the execution of $TK_i$;
$\rightarrow_i$ defines a total order of the events of $TK_i$.
The relation $\rightarrow_{msg}$ defines a causal relationship for the message exchange: $send(m) \rightarrow_{msg} receive(m)$, i.e. the sending of message m must take place before m is received.
There are the following interpretations:
$a \rightarrow b$: a happens before b; b causally depends on a.
$a \parallel b$: a and b are concurrent events.

Rules for "happened-before" after Lamport

In order to guarantee consistent states among the communicating components, the messages must be delivered in the correct order. Lamport's happened-before relation helps to determine a message order for a distributed application. The following rules apply:
1. Events within a component are ordered by the local order: if $a \rightarrow_i b$, then $a \rightarrow b$.
2. If a is the send event of a message at component TK1, and b the corresponding receive event at component TK2, then $a \rightarrow b$.
3. If $a \rightarrow b$ and $b \rightarrow c$, then $a \rightarrow c$.
4. If $\neg(a \rightarrow b)$ and $\neg(b \rightarrow a)$, then $a \parallel b$, i.e. a and b are concurrent (not ordered).

Logical clocks are used to determine the order of events. Let
T be a set of timestamps, and
$C: E \rightarrow T$ a mapping that assigns a timestamp to each event, such that
$a \rightarrow b \Rightarrow C(a) < C(b)$.
If the reverse implication also holds ($C(a) < C(b) \Rightarrow a \rightarrow b$), the clock is called strictly consistent.
5.4.2 Ordering by logical clocks

Each component manages the following information:
its local logical clock lc; lc measures the local progress with respect to occurring events;
its view on the global logical clock gc; the value of the local clock is determined according to the value of the global clock.
There are functions for updating logical clocks in order to maintain consistency; the following two rules apply.

Rules
Rule R1 specifies the update of the local clock lc when events occur.
Rule R2 specifies the update of the view on the global clock gc:
1. Sending event: determine the current value of the local clock and attach it to the message.
2. Receiving event: the received clock value (attached to the message) is used to update the view on the global clock.

5.4.3 Logical clocks based on scalar values

Description
The clock value is a positive integer.
The local clock lc and the view on the global clock gc are both represented by the counter C.

Execution of R1
Before an event is executed, C is updated: C := C + d (d > 0).

Execution of R2
After receiving a message with timestamp $C_{msg}$ (the timestamp is part of the message), the following actions are performed:
C := max(C, $C_{msg}$);
execute R1;
deliver the message to the application component.
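Rules R1 and R2 for scalar clocks translate directly into a few lines of code. In this sketch the class LamportClock and its method names are hypothetical; every internal or send event applies R1, and a receive event applies R2 (maximum, then R1) before the message would be handed to the application.

```java
// Hypothetical sketch of a scalar (Lamport) logical clock for one component TK.
final class LamportClock {
    private final int d; // increment used by R1, d > 0
    private int c = 0;   // counter C: local clock and view on the global clock

    LamportClock(int d) {
        if (d <= 0) throw new IllegalArgumentException("d must be positive");
        this.d = d;
    }

    // R1: executed before each event; returns the event's timestamp.
    int tick() {
        c += d;
        return c;
    }

    // Send event: apply R1 and attach the resulting value to the message.
    int send() {
        return tick();
    }

    // R2: on receipt of timestamp cMsg, C := max(C, cMsg), then R1.
    // Afterwards the message can be delivered to the application component.
    int receive(int cMsg) {
        c = Math.max(c, cMsg);
        return tick();
    }

    int current() {
        return c;
    }
}
```

For example, if TK1 performs an internal event (C = 1) and then sends a message (C = 2), while TK2 has already executed one event (C = 1), the receive event at TK2 gets max(1, 2) + 1 = 3, so the send event is timestamped before the receive event.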
Example
[Figure: scalar clock values of the events of components TK1, TK2 and TK3, ranging from 1 to 9.]

The scalar clock mechanism defines a partial order on the occurring events.
Scalar clocks are not strictly consistent, i.e. $C(a) < C(b) \Rightarrow a \rightarrow b$ does not hold in general.

5.4.4 Logical clocks based on vectors

Description
Time is represented by n-dimensional vectors of non-negative integers.
Each component $TK_i$ manages its own vector $vt_i[1..n]$. The dimension n is the number of components of the distributed application.
$vt_i[i]$ is the local logical clock of $TK_i$.
$vt_i[k]$ is the view of $TK_i$ on the local logical clock of $TK_k$; it determines what $TK_i$ knows about the progress of $TK_k$.
Example: $vt_i[k] = y$ means that, according to the view of $TK_i$, $TK_k$ has advanced to state y, i.e. up to event y.
The vector $vt_i[1..n]$ represents the view of $TK_i$ on the global time (i.e. the global execution progress of all components).

Execution of R1
$vt_i[i] := vt_i[i] + d$

Execution of R2
After receiving a message with vector vt from another component, the following actions are performed at component $TK_i$:
update the logical global time: $\forall k, 1 \le k \le n: vt_i[k] := \max(vt_i[k], vt[k])$;
execute R1;
deliver the message to the application process of component $TK_i$.

Example for vector clocks
[Figure: vector timestamps of the events of the components TK1, TK2 and TK3.]

Optimization: when a burst of multicasts is sent, vector timestamps can be omitted; a missing timestamp means: use the values of the previous vector timestamp and increment only the sender's field.

Characteristics of vector clocks

Comparison of two vector clocks (timestamps) vh[1..n] and vk[1..n]:
$vh \le vk \Leftrightarrow \forall x: vh[x] \le vk[x]$
$vh < vk \Leftrightarrow vh \le vk \wedge \exists x: vh[x] < vk[x]$
$vh \parallel vk \Leftrightarrow \neg(vh < vk) \wedge \neg(vk < vh)$
Let a and b be events with timestamps (vector clocks) va and vb; then:
$a \rightarrow b \Leftrightarrow va < vb$
$a \parallel b \Leftrightarrow va \parallel vb$
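The update rules R1 and R2 and the comparison of vector timestamps can be sketched as follows. The sketch assumes d = 1 and numbers the components from 0 to n - 1 (the text uses 1 to n); the class VectorClock and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of the vector clock vt_i of component i among n components (d = 1).
final class VectorClock {
    private final int i;    // index of the owning component (0-based)
    private final int[] vt; // vt[k]: view of component i on the progress of component k

    VectorClock(int i, int n) {
        this.i = i;
        this.vt = new int[n];
    }

    // R1: increment the own entry before each event; returns a copy as the event's timestamp.
    int[] tick() {
        vt[i] += 1;
        return vt.clone();
    }

    // R2: component-wise maximum with the received vector, then R1.
    int[] receive(int[] msg) {
        for (int k = 0; k < vt.length; k++) vt[k] = Math.max(vt[k], msg[k]);
        return tick();
    }

    // vh <= vk: every entry of vh is <= the corresponding entry of vk.
    static boolean lessOrEqual(int[] vh, int[] vk) {
        for (int x = 0; x < vh.length; x++) {
            if (vh[x] > vk[x]) return false;
        }
        return true;
    }

    // vh < vk: vh <= vk and at least one entry is strictly smaller (i.e. vh != vk).
    static boolean less(int[] vh, int[] vk) {
        return lessOrEqual(vh, vk) && !Arrays.equals(vh, vk);
    }

    // vh || vk: neither vh < vk nor vk < vh.
    static boolean concurrent(int[] vh, int[] vk) {
        return !less(vh, vk) && !less(vk, vh);
    }
}
```

With three components, a send event at TK1 gets [1,0,0] and the corresponding receive event at TK2 gets [1,1,0], so less(a, b) holds; an independent first event at TK3 gets [0,0,1] and is concurrent with both.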
If a has been triggered by $TK_i$ and b by $TK_j$ ($i \ne j$), then:
$a \rightarrow b \Leftrightarrow va[i] \le vb[i] \wedge va[j] < vb[j]$
$a \parallel b \Leftrightarrow va[i] > vb[i] \wedge va[j] < vb[j]$
Vector clocks are strictly consistent.

5.5 Failure handling in distributed applications

5.5.1 Motivation

Failures in a local application are
handled through a programmer-defined exception-handling routine, or
not handled at all.
Failures in a distributed application may be caused by
communication link failures;
crashes of machines hosting individual subsystems of the distributed application:
  the client crashes: the server waits for RPC calls of the crashed client and does not free reserved resources;
  the server crashes: the client cannot connect to the server;
Byzantine failures: processes fail, but may still respond to their environment with arbitrary, erratic behavior (e.g. send false acknowledgements);
failure-prone RPC interfaces;
bugs in the distributed subsystems themselves.

5.5.2 Steps for testing a distributed application

Testing is done in three subsequent steps:
1. Test of the distributed application without the communication parts: enables testing of the component functionality. Aspects such as asynchrony, parallelism and time are not taken into consideration.
2. Test of the distributed application with local communication: enables time predictions about components without considering network transport times.
3. Test of the distributed application with network-wide communication. In this step, the following problems are identified:
time dependencies between component executions;
sequential and parallel execution of components;
support of multiple clients.

5.5.3 Debugging of distributed applications

Setting a breakpoint in the server code and inspecting the local variables can cause a timeout in the client process.

Problems with distributed applications
Due to the distribution of the components and the necessary communication between them, debugging must handle the following issues:
1. Communication between components: observation and control of the message flow between components.
2. Snapshots: there is no shared memory and no strict clock synchronization, yet the state of the entire system is needed; the global state of a distributed system consists of the local states of all components and the messages under way in the network.
3. Breakpoints and single stepping in distributed applications.
4. Nondeterminism: in general, message transmission times and the delivery order are not deterministic; failure situations are difficult, if at all possible, to reproduce.
5. Interference between debugger and distributed application: debugging operations cause irregular time delays in component execution.
5.5.4 Approaches of distributed debugging

Focus on the send/receive events caused by the message exchange rather than on the internal component operations.

Monitoring the communication between components
Only the message flow between components is considered; components are treated as "black boxes"; local test aids are used for debugging the individual components.
[Figure: components whose communication is monitored by a debugger.]

Global breakpoint
Approach
Global breakpoints are based on the events caused by the message exchange between the components of the distributed application.
The events are partially ordered.
Logical clocks (scalar or vector clocks) are used to determine event dependencies.
[Figure: component 1 with events t11, t12; component 2 with events t21, t22, t23; component 3 with events t31, t32, t33.]
t12 and t23 are not ordered; t11 and t33 are ordered.

Causally distributed breakpoint
The breakpoint condition is met in at least one component.
All components are rolled back to the earliest possible consistent state that follows the last event in a happened-before relationship with the triggering event.
[Figure: components 1, 2 and 3 with their events t11 to t33.]
Example of a distributed debugger: IBM IDEBUG, a multilanguage, multiplatform debugger with remote debugging capabilities.

5.6 Distributed transactions

Distributed transactions are an important paradigm for designing reliable and fault-tolerant distributed applications, particularly those that access shared data concurrently.

5.6.1 General observations

Several requests to remote servers (e.g. RPC calls) may be bundled into a transaction. Any transaction whose activities involve multiple servers, i.e. which uses the services of several servers, is a distributed transaction.

    begin-transaction
        callrpc(OP1, ...)
        ...
        callrpc(OPn, ...)
    end-transaction

Transactions satisfy the ACID properties: Atomicity, Consistency, Isolation, Durability.
1. Atomicity: either all operations of the transaction are executed or none is, i.e. the transaction either succeeds (commit) or has no effect (abort).
2. Durability: the results of the transaction are persistent, even if a system failure occurs afterwards.
3. Isolation: a transaction that has not yet completed does not influence other transactions; the effect of several concurrent transactions is the same as if they had been executed in sequence.
4. Consistency: a transaction transfers the system from a consistent state to a new consistent state.

5.6.2 Isolation

Isolation refers to the serializability of transactions. All involved servers are responsible for the serialization of distributed transactions.
Example: let U and T be distributed transactions accessing shared data on the two servers R and S. If the transactions at server R are successfully executed in the order U before T, then the same commit order must apply at server S.
Mechanisms for handling concurrent distributed transactions are: timestamps, locking, optimistic concurrency control.

Timestamp ordering
In a single-server transaction, the server issues a unique timestamp to each transaction when it starts.
In a distributed transaction, each server must be able to issue globally unique timestamps:
for distributed transactions, the timestamp is the pair (local timestamp, server-id);
the timestamp is issued by the first server accessed by the transaction.
Assume timestamp(trans) = $t_{trans}$ and timestamp(obj) = $t_{obj}$. When transaction trans accesses object obj:

    if (t_trans < t_obj) then abort(trans) else access obj;
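The pair (local timestamp, server-id) and the access rule above can be sketched as follows. Comparing the local timestamps first and breaking ties by server id is an assumption that makes the timestamps totally ordered; so is setting $t_{obj}$ to $t_{trans}$ after a successful access, since the text does not state how $t_{obj}$ is maintained. All class and method names are hypothetical.

```java
// Hypothetical sketch of timestamp ordering with globally unique timestamps.
final class TimestampOrdering {

    // Globally unique timestamp (local timestamp, server-id);
    // ties on the local part are broken by the server id (assumption).
    static final class Ts implements Comparable<Ts> {
        final long local;
        final int serverId;

        Ts(long local, int serverId) {
            this.local = local;
            this.serverId = serverId;
        }

        @Override
        public int compareTo(Ts o) {
            int c = Long.compare(local, o.local);
            return c != 0 ? c : Integer.compare(serverId, o.serverId);
        }
    }

    // A shared object remembering the timestamp of the last transaction that accessed it.
    static final class Obj {
        Ts ts = new Ts(0, 0);
    }

    // Returns false (abort) if t_trans < t_obj; otherwise the access is allowed.
    static boolean access(Ts tTrans, Obj obj) {
        if (tTrans.compareTo(obj.ts) < 0) return false; // abort(trans)
        obj.ts = tTrans;                                 // assumption: t_obj := t_trans
        return true;
    }
}
```

If transaction T with timestamp (5, 2) accesses an object first, a transaction U with the older timestamp (5, 1) is aborted when it later tries to access the same object.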
Locking
Each server maintains locks for its own data items.
A transaction trans requests a lock (e.g. a read or write lock) before accessing a data item.
A transaction trans is well-formed if
trans locks an object obj before accessing it;
trans does not lock an object obj that is already locked by another transaction, unless the locks can coexist, e.g. two read locks;
trans releases all its object locks before it terminates.
A transaction is called a two-phase transaction if it requests no further locks after it has released a lock ("two-phase locking").

Optimistic concurrency control
If conflicts are rare, optimistic concurrency control may be useful: no additional coordination is necessary during transaction execution. Access conflicts are checked when transactions are ready to commit.

Examples
The following examples show the concurrency control approaches used by some current systems.
Dropbox: a cloud service that provides file backup and enables users to share files and folders and access them from anywhere; uses optimistic concurrency control at file granularity.
Wikipedia: creation and management of wiki pages; uses optimistic concurrency control for editing.
Google Docs: a cloud service providing web-based applications (word processor, spreadsheet and presentation) that allow users to collaborate by means of shared documents; uses awareness-based concurrency control: if several people edit the same document simultaneously, they see each other's changes.
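The commit-time check of optimistic concurrency control can be illustrated with version numbers: a transaction reads without locks, records the version of every object it reads and buffers its writes; at commit it is validated and aborted if one of those versions has changed in the meantime. Version-number validation is one common way to implement the check and is an assumption here, as are all names; the sketch covers a single server only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-server sketch of optimistic concurrency control
// with version numbers; conflicts are only checked at commit time.
final class OccStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    final class Tx {
        private final Map<String, Integer> readVersions = new HashMap<>();
        private final Map<String, String> writes = new HashMap<>();

        // Read without locking; remember the version that was seen.
        String read(String key) {
            if (writes.containsKey(key)) return writes.get(key); // own buffered write
            synchronized (OccStore.this) {
                readVersions.putIfAbsent(key, versions.getOrDefault(key, 0));
                return values.get(key);
            }
        }

        // Writes are buffered locally until commit.
        void write(String key, String value) {
            writes.put(key, value);
        }

        // Validation: commit only if no object read by this transaction has changed.
        boolean commit() {
            synchronized (OccStore.this) {
                for (Map.Entry<String, Integer> e : readVersions.entrySet()) {
                    int current = versions.getOrDefault(e.getKey(), 0);
                    if (current != e.getValue()) return false; // conflict: abort
                }
                for (Map.Entry<String, String> e : writes.entrySet()) {
                    values.put(e.getKey(), e.getValue());
                    versions.merge(e.getKey(), 1, Integer::sum); // new version
                }
                return true;
            }
        }
    }

    Tx begin() {
        return new Tx();
    }
}
```

Two transactions that read the same page version and both try to commit behave roughly like two users editing the same wiki page: the first commit succeeds, the second is detected as a conflict and aborted.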
5.6.3 Atomicity and persistence
These aspects of distributed transactions can be realized by one of the following approaches. Let trans be a transaction.
Intention list
- all object modifications performed by trans are entered into the intention list (log file);
- when trans commits successfully, each server S performs all the modifications specified in its intention list $AL_S(trans)$ in order to update the local objects; afterwards $AL_S(trans)$ is deleted.
New version
- when trans accesses the object obj, the server S creates the new version $obj_{trans}$; the new version is only visible to trans;
- when trans commits successfully, $obj_{trans}$ becomes the new, commonly visible version of obj;
- if trans aborts, $obj_{trans}$ is deleted.

5.6.4 Two-phase commit protocol (2PC)
This protocol supports the communication between all servers involved in a distributed transaction in order to jointly decide whether the transaction should commit or abort. We can distinguish between two phases:
- Voting phase: the servers submit their vote on whether they are prepared to commit their part of the distributed transaction or abort it.
- Completion phase: it is decided whether the transaction can be committed or has to be aborted; all servers must carry out this decision.
Steps of the two-phase commit protocol
One component (e.g. the client initiating the transaction or the first server accessed by the transaction) becomes the coordinator for the commit process. In the following we assume that client C is the coordinator.
[Figure: coordinator C connected to the servers S_1, ..., S_i, ..., S_n]
1. Coordinator C contacts all servers $S_i$ of the distributed transaction trans, requesting their status for the commit (CanCommit?).
   - if server $S_k$ is not ready, i.e. it votes no, then the part of the transaction at $S_k$ is aborted;
   - if $\exists i$ such that $S_i$ is not ready, then trans is aborted; the coordinator sends an abort message to all servers that have voted ready (i.e. yes).
2. If $\forall i$: $S_i$ is ready, then transaction trans is committed; the coordinator sends a commit message to all servers.
3. The servers send an acknowledgement to the coordinator.
Operations
The coordinator communicates with the participants to carry out the two-phase commit protocol by means of the following operations:
- canCommit?(trans) → Yes/No: call from the coordinator to ask whether the participant can commit a transaction; the participant replies with its vote.
- doCommit(trans): call from the coordinator to tell the participant to commit its part of a transaction.
- doAbort(trans): call from the coordinator to tell the participant to abort its part of a transaction.
- haveCommitted(trans, participant): call from the participant to the coordinator to confirm that it has committed the transaction.
- getDecision(trans) → Yes/No: call from the participant to the coordinator to ask for the decision on trans.
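The two phases and the operations canCommit?, doCommit and doAbort can be sketched in Java with in-process participants. This is an illustration only, assuming no failures and no message passing: the votes are collected sequentially, acknowledgements (haveCommitted) and getDecision are omitted, and the class names and the fixed-vote Server are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the two-phase commit protocol with in-process participants (no failures). */
public class TwoPhaseCommit {

    /** Participant operations modelled on canCommit?, doCommit and doAbort. */
    interface Participant {
        boolean canCommit(String trans);   // vote: true = yes (ready), false = no
        void doCommit(String trans);
        void doAbort(String trans);
    }

    /** Hypothetical participant whose vote is fixed in advance. */
    static final class Server implements Participant {
        final String name;
        final boolean ready;
        String state = "active";
        Server(String name, boolean ready) { this.name = name; this.ready = ready; }
        public boolean canCommit(String t) {
            if (!ready) state = "aborted";  // a "no" vote aborts the local part at once
            return ready;
        }
        public void doCommit(String t) { state = "committed"; }
        public void doAbort(String t)  { state = "aborted"; }
    }

    /** Coordinator: returns true if trans commits, false if it aborts. */
    static boolean run(String trans, List<Participant> servers) {
        // Phase 1 (voting): ask every server canCommit?
        List<Participant> yesVoters = new ArrayList<>();
        boolean allReady = true;
        for (Participant p : servers) {
            if (p.canCommit(trans)) yesVoters.add(p); else allReady = false;
        }
        // Phase 2 (completion): commit everywhere, or abort at the servers that voted yes
        if (allReady) {
            for (Participant p : servers) p.doCommit(trans);
        } else {
            for (Participant p : yesVoters) p.doAbort(trans);
        }
        return allReady;
    }

    public static void main(String[] args) {
        Server s1 = new Server("S1", true), s2 = new Server("S2", false);
        List<Participant> servers = List.of(s1, s2);
        boolean committed = run("trans", servers);
        System.out.println(committed + " " + s1.state + " " + s2.state); // false aborted aborted
    }
}
```

Only the servers that voted yes receive doAbort, matching step 1 above: a server that voted no has already aborted its part.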
Communication in the two-phase commit protocol
[Figure: message sequence between coordinator and a server. The coordinator (ready to commit) sends CanCommit?; the server (ready to commit) replies yes; the coordinator (committed) sends DoCommit; the server (committed) replies HaveCommitted; the coordinator is finished.]
Number of messages: 4N messages for N servers (CanCommit?, vote, DoCommit and HaveCommitted for each server).
Problems
During the 2PC process several failures may occur:
- one of the servers crashes;
- the coordinator crashes.
Depending on their state, this may result in blocking situations, e.g. the coordinator waits for the commit acknowledgement of a server, or a server waits for the final decision (commit or abort).
Extended 2PC
Coordinator:
    multicast: ok to commit?
    collect replies
    all ok => log commit to outcomes table
              wait until saved to persistent store
              send commit
    else   => send abort
    collect acknowledgements
    garbage collect data from outcomes table
  After failure:
    for each pending protocol in outcomes table
        send outcome (commit or abort)
        wait for acknowledgements
        garbage collect data from outcomes table
Server:
    first time message (CanCommit) received:
        ok to commit => save data to temp area (persistent store)
                        reply ok
    commit => make change permanent
              send acknowledgement
    abort  => delete temp area
    message is a duplicate (recovering coordinator):
        send acknowledgement
  After failure:
    for each pending protocol
        contact coordinator to learn outcome
The Three-Phase Commit protocol (3PC) is another approach; it avoids blocking the servers until a crashed coordinator recovers.

5.6.5 Distributed deadlock
Multiple transactions may access objects on multiple servers, which can result in a distributed deadlock.
- on object access, the lock manager of the server locks the object for the transaction;
- deadlock detection schemes try to find cycles in a wait-for graph.
[Figure: distributed deadlock across the servers X, Y and Z. W waits for object A at server X, which is held by U; U waits for object B at server Y, which is held by V; V waits for object C at server Z, which is held by W. The wait-for graph contains the cycle W → U → V → W.]
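Finding a deadlock amounts to finding a cycle in the wait-for graph. The following sketch merges the local wait-for graphs of the servers (edges "transaction waits for transaction") into one global graph and searches for a cycle with depth-first search, using the example in which W waits for U (object A at server X), U waits for V (object B at server Y) and V waits for W (object C at server Z). The data layout and all names are assumptions; it corresponds to the centralized scheme, not to edge chasing.

```java
import java.util.*;

/** Sketch: merge local wait-for graphs and search the global graph for a cycle. */
public class WaitForGraph {

    /** Merges local wait-for graphs (transaction -> transactions it waits for) into a global one. */
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> locals) {
        Map<String, Set<String>> global = new HashMap<>();
        for (Map<String, Set<String>> local : locals)
            local.forEach((t, waitsFor) -> global.computeIfAbsent(t, k -> new HashSet<>()).addAll(waitsFor));
        return global;
    }

    /** Returns one cycle (first transaction repeated at the end), or an empty list if there is none. */
    static List<String> findCycle(Map<String, Set<String>> g) {
        Set<String> done = new HashSet<>();
        for (String start : g.keySet()) {
            List<String> cycle = dfs(start, g, new ArrayList<>(), done);
            if (!cycle.isEmpty()) return cycle;
        }
        return List.of();
    }

    private static List<String> dfs(String t, Map<String, Set<String>> g, List<String> path, Set<String> done) {
        int idx = path.indexOf(t);
        if (idx >= 0) {                                  // t is already on the current path: cycle found
            List<String> cycle = new ArrayList<>(path.subList(idx, path.size()));
            cycle.add(t);
            return cycle;
        }
        if (done.contains(t)) return List.of();          // fully explored earlier, no cycle through t
        path.add(t);
        for (String next : g.getOrDefault(t, Set.of())) {
            List<String> c = dfs(next, g, path, done);
            if (!c.isEmpty()) return c;
        }
        path.remove(path.size() - 1);
        done.add(t);
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> x = Map.of("W", Set.of("U")); // server X: W waits for A, held by U
        Map<String, Set<String>> y = Map.of("U", Set.of("V")); // server Y: U waits for B, held by V
        Map<String, Set<String>> z = Map.of("V", Set.of("W")); // server Z: V waits for C, held by W
        System.out.println(findCycle(merge(List.of(x, y, z))));
    }
}
```

No single server sees the cycle: each local graph contains only one edge, which is why the local graphs have to be combined (or probes forwarded, as in edge chasing).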
Theory: a central server constructs a global wait-for graph from all local wait-for graphs of the involved servers.
Problems:
- the central server is a single point of failure;
- communication between the servers takes time.
Edge chasing
A distributed approach to deadlock detection:
- no global wait-for graph is constructed;
- each involved server has some knowledge about the edges of the wait-for graph;
- the servers attempt to find cycles by forwarding messages (called probes).
Each distributed transaction T starts at a server, the coordinator of T:
- the coordinator records whether T is active or waiting for a particular object on a server;
- the lock manager informs the coordinator of T when T starts waiting for an object and when T finally acquires the lock.
Edge chasing algorithm
The algorithm consists of three steps: initiation, detection and resolution.
[Figure: edge chasing for the deadlock above. Initiation at server X (W waits for U, which holds A): the probe W → U is sent via the coordinator of U to server Y; at server Y (U waits for V, which holds B) it is extended to W → U → V and forwarded via the coordinator of V to server Z; at server Z (V waits for W, which holds C) the probe W → U → V → W reveals the deadlock.]
- initiation: server X notes that W is waiting for another transaction U; it sends the probe "W → U" to the server of B (the object U is waiting for) via the coordinator of U.
- detection: detection consists of receiving probes and deciding whether a deadlock has occurred and whether to forward the probes. Server Y receives the probe "W → U"; it notes that B is held by transaction V and appends V to the probe to produce "W → U → V"; the probe is forwarded to server Z via the coordinator of V.
- resolution: when a cycle is detected, a transaction in the cycle is aborted to break the deadlock.
Transaction priorities
Every transaction involved in a deadlock cycle may cause the initiation of deadlock detection:
- several servers may initiate deadlock detection in parallel;
- possibly more than one transaction in the cycle is aborted.
Example: transaction T attempts to access an object A locked by U; transaction W attempts to access an object B locked by V.
[Figure: (a) initial situation: the cycle T → U → W → V → T over the objects A and B; (b) detection started at object A: the probe T → U → W → V → T detects the deadlock; (c) detection started at object B: the probe W → V → T → U → W detects the deadlock.]
- transactions are totally ordered by priorities;
- in a cycle, the transaction with the lowest priority is aborted, so all servers that detect the same cycle abort the same transaction.
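Probe forwarding and priority-based resolution can be simulated in a single process. In this sketch each waiting transaction waits for exactly one other transaction, following a waits-for map stands in for forwarding the probe to the next server via the coordinator, and the priorities are hypothetical values. The example cycle is T → U → W → V → T from above, started once at object A (by T) and once at object B (by W).

```java
import java.util.*;

/** Sketch of edge-chasing deadlock detection with priority-based resolution (single-process simulation). */
public class EdgeChasing {

    /** Forwards a probe starting at initiator; returns the detected cycle, or an empty list. */
    static List<String> chase(String initiator, Map<String, String> waitsFor) {
        List<String> probe = new ArrayList<>(List.of(initiator));
        String next = waitsFor.get(initiator);
        while (next != null) {
            if (probe.contains(next)) {          // probe came back to a transaction on it: deadlock
                return new ArrayList<>(probe.subList(probe.indexOf(next), probe.size()));
            }
            probe.add(next);                     // append the lock holder, forward via its coordinator
            next = waitsFor.get(next);
        }
        return List.of();                        // chain ends at an active transaction: no deadlock
    }

    /** Resolution: abort the transaction with the lowest priority in the cycle. */
    static String victim(List<String> cycle, Map<String, Integer> priority) {
        return Collections.min(cycle, Comparator.comparing(priority::get));
    }

    public static void main(String[] args) {
        Map<String, String> waitsFor = Map.of("T", "U", "U", "W", "W", "V", "V", "T");
        Map<String, Integer> priority = Map.of("T", 4, "U", 3, "W", 2, "V", 1); // hypothetical
        List<String> fromA = chase("T", waitsFor);   // detection started at object A
        List<String> fromB = chase("W", waitsFor);   // detection started at object B
        System.out.println(fromA + " -> abort " + victim(fromA, priority)); // [T, U, W, V] -> abort V
        System.out.println(fromB + " -> abort " + victim(fromB, priority)); // [W, V, T, U] -> abort V
    }
}
```

Both detections find the same cycle and, thanks to the total order on priorities, choose the same victim, so only one transaction is aborted.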
5.7 Group communication
5.7.1 Introduction
Usually, the communication primitives known from operating systems are binary, i.e. an individual sender opens a communication path to a single selected receiver. Group communication facilitates the interaction between groups of processes.
Motivation
Many application areas, such as CSCW, benefit considerably if primitives for group communication are properly supported. Other relevant application areas include fault-tolerant file services and replication-transparent file systems. In both cases, all communication to the primary file server subsystem must also be propagated to the so-called stand-by file service or to the file replicas, respectively. In the first case, a stand-by file service takes over when the primary site crashes or becomes unavailable for some other reason (e.g. link failures, network partitioning); in the second case, the communication helps to maintain a consistent state among the file replicas.
Typical applications for group communication:
- fault tolerance using replicated services, e.g. a fault-tolerant file service;
- object localization in distributed systems: a request is sent to a group of potential object servers;
- conferencing systems and groupware.
Functional components (e.g. processes) are combined into a group; a group is considered a single abstraction.
Important issues
Important issues of group communication are the following:
- Group membership: the structural characteristics of the group; composition and management of the group.
- Support of group communication: this refers to the addressing of group members, error handling for members that are unreachable, and the message delivery sequence.
Communication within the group: unicasting, broadcasting, multicasting.
Multicast messages are a useful tool for constructing distributed systems with the following characteristics:
- fault tolerance based on replicated services;
- locating objects in distributed services;
- multiple update of distributed, replicated data.
Synchronization: the sequence of actions performed by each group member must be consistent.
Conventional approaches
Group addressing
- Central approach: there is a central group server which knows the current composition of the group.
- Decentralized approach: each group member is aware of the group structure and its members.
Communication services
This issue refers to the technology used for the communication between group members:
- datagrams (for example UDP);
- reliable data streams (for example TCP).
In order to obtain a consistent global group behavior, even in the presence of errors, special support for group communication is needed, for example ISIS (and its successor project Horus) from Cornell University.
5.7.2 Groups of components
Classification of groups
Groups can be categorized according to various criteria.
Closed vs. open group
[Figure: a process outside a closed group is not permitted to send messages to the group; for an open group this is permitted.]
Distinction between flat and hierarchical groups. A flat group may also be called a peer group.
Distinction between implicit (anonymous) and explicit groups. In the first case, the group address is implicitly expanded to all group members.
5.7.3 Management of groups
Operations for group management
- query for existing group names;
- creation of a new group and deletion of an existing group: groupcreate, groupdelete;
- joining or leaving a group: groupjoin, groupleave;
- reading and modifying group attributes dynamically;
- reading information about group members.
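In the central approach, a single group server implements these operations. The following is a minimal in-memory sketch; the method names are hypothetical camel-case versions of groupcreate, groupdelete, groupjoin and groupleave, group attributes are omitted, and there is no networking or replication.

```java
import java.util.*;

/** Sketch of a central group server offering the group management operations listed above. */
public class GroupServer {
    private final Map<String, Set<String>> groups = new TreeMap<>(); // group name -> members

    Set<String> groupNames()                   { return Collections.unmodifiableSet(groups.keySet()); }
    boolean groupCreate(String g)              { return groups.putIfAbsent(g, new TreeSet<>()) == null; }
    boolean groupDelete(String g)              { return groups.remove(g) != null; }
    boolean groupJoin(String g, String member) { return groupOf(g).add(member); }
    boolean groupLeave(String g, String member){ return groupOf(g).remove(member); }
    Set<String> members(String g)              { return Collections.unmodifiableSet(groupOf(g)); }

    private Set<String> groupOf(String g) {
        Set<String> m = groups.get(g);
        if (m == null) throw new NoSuchElementException("unknown group " + g);
        return m;
    }

    public static void main(String[] args) {
        GroupServer gs = new GroupServer();
        gs.groupCreate("replicas");
        gs.groupJoin("replicas", "S1");
        gs.groupJoin("replicas", "S2");
        gs.groupJoin("replicas", "S3");
        gs.groupLeave("replicas", "S2");
        System.out.println(gs.groupNames() + " " + gs.members("replicas")); // [replicas] [S1, S3]
    }
}
```

Keeping all membership data in one place makes joins and leaves trivially consistent, at the price of a single point of failure, which is the trade-off discussed under group management architecture.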
Group management architecture
Again, there are different approaches for providing the group management functionality:
- centralized: group managers realized as an individual group server;
- decentralized: all components perform management tasks; this requires replication of the group membership information, whose consistency must be maintained; joining and leaving a group must happen synchronously;
- hybrid: for each LAN cluster there is a central group manager; replication of the group membership information and consistency control are limited to the group managers. A group manager knows all local components as well as the remote group managers; when executing a group function (e.g. a modification of the group membership), it contacts the local components and also propagates the information to all other group managers.
[Figure: group managers GM1, GM2 and GM3, each responsible for a local cluster, connected via the Internet.]
5.7.4 Message dissemination
For message dissemination to the group members, the following mechanisms are possible options:
- Unicast: send and receive messages addressed to individual group members.
- Group multicast: send and receive messages addressed to the group as a whole.
- Inter-group multicast: send and receive messages addressed to several groups.
- Broadcast: send and receive messages addressed to all components (requires filtering by the receivers).
Hybrid approach for wide-area networks
[Figure: group managers GM1, GM2 and GM3 connected via the Internet.]
5.7.5 Message delivery
Message delivery is an important issue of group communication; two aspects are relevant: a) who gets the message, and b) when the message is delivered.
Atomicity
Atomicity specifies who receives a message.
- in the absence of errors, we have "exactly-once" semantics, i.e. messages to the group are delivered exactly once to all group members;
- "all-or-nothing" semantics for messages to the group ("atomic broadcast"), i.e. a message is either delivered to all group members or to none;
- atomicity facilitates the programming of distributed applications.
Sequence of message delivery
It is desirable to deliver all messages sent to the group G to all members of G in the same sequence; otherwise the system may behave non-deterministically.
Example for group reconfiguration
[Figure: clients C1 and C2 send the messages m1 to m4 to the group of servers S1, S2 and S3; S3 crashes.]
m4 is sent by C1 before the group composition is modified. However, in order to guarantee atomicity, m4 should not be delivered to S1 and S2 (since, due to the crash, it is no longer possible to deliver m4 to S3).
Ordering for message delivery
Delivering messages without delay and in the same sequence is not possible in a distributed system; therefore ordering methods for message delivery are used:
- synchronous, i.e. there is a system-wide global time ordering;
- loosely synchronous, i.e. a consistent time ordering, but no system-wide global (absolute) time.
Total ordering by sequencer
A selected group member (the sequencer) serializes all messages sent to the group.
[Figure: sender, sequencer and receiver exchanging the message N and the sequence number of N.]
- 1st step: the sender distributes the message N to all group members;
- 2nd step: the sequencer (serializer) determines a sequence number for N and distributes it to all group members; N is delivered to the application processes according to this number.
Virtually synchronous ordering
A correct sequence is determined based on the before relation between two events, which models their causal dependency (see causally distributed breakpoints, page 112).
Example
1. T1 sends N1, and T2 sends N2, with N2 dependent on N1;
2. T4 sends N3, with N1 and N3 concurrent;
3. at T2, N3 is received before N1;
4. at T3, N3 is received after N1.
[Figure: sync-ordering of the messages N1, N2 and N3 between the processes T1 to T4.]
This approach to message delivery introduces synchronization points. Synchronously ordered messages are delivered to all group members in sync:
- let $N_i$ be a synchronously ordered message;
- all other messages $N_k$ are delivered either before or after $N_i$ has been delivered to all group members.
The ordering method enables the group members to synchronize their local states (at synchronization points the group members have a common consistent state).
5.7.6 Taxonomy of multicast
Multicast messages are used for constructing distributed systems based on group communication; different multicast communication semantics exist.
Multicast classes
Depending on the message delivery guarantee, five classes of multicast services can be distinguished:
1. unreliable multicast: an attempt is made to transmit the message to all members without acknowledgement; at-most-once semantics with respect to available members; message ordering is not guaranteed.
2. reliable multicast: the system transmits the messages on a "best-effort" basis, i.e. "at-least-once" semantics is applied.
   - B-multicast primitive: guarantees that a correct process will eventually deliver the message as long as the multicaster does not crash.
   - B-deliver primitive: the corresponding primitive when a message is received.
3. serialized multicast: consistent sequence for message delivery; distinction between totally ordered and causally ordered (i.e. virtually synchronous).
4. atomic multicast: a reliable multicast which guarantees that either all operational group members receive a message, or none of them do.
5. atomic, serialized multicast: atomic message delivery with a consistent delivery sequence.
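For the causally ordered variant of serialized multicast, a receiver holds back a message until all messages it causally depends on have been delivered. The sketch below applies the delivery conditions of the ISIS cbcast protocol (Section 5.7.7) at a group member j: a message from member i carrying the state vector $z_i$ is delivered when $z_{ji} = z_{ii} - 1$ and $z_{ik} \le z_{jk}$ for all $k \neq i$. Assumptions: members are numbered 0 to n-1 instead of 1 to n, the network is replaced by direct method calls, and all class and method names are hypothetical.

```java
import java.util.*;

/** Sketch of causally ordered delivery (cbcast delivery conditions) at one group member j. */
public class CausalMember {
    record Msg(int sender, int[] z, String body) {}

    private final int j;                        // own index, 0..n-1
    private final int[] z;                      // state vector z_j
    private final List<Msg> holdBack = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();

    CausalMember(int j, int n) { this.j = j; this.z = new int[n]; }

    /** Sending: z_jj := z_jj + 1, then attach a copy of the state vector. */
    Msg send(String body) {
        z[j]++;
        delivered.add(body);                    // the sender delivers its own message immediately
        return new Msg(j, z.clone(), body);
    }

    /** Receiving: buffer the message, then deliver every buffered message whose conditions hold. */
    void receive(Msg m) {
        holdBack.add(m);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Iterator<Msg> it = holdBack.iterator(); it.hasNext(); ) {
                Msg h = it.next();
                if (deliverable(h)) {
                    it.remove();
                    z[h.sender()]++;            // one more message from the sender received in sequence
                    delivered.add(h.body());
                    progress = true;
                }
            }
        }
    }

    private boolean deliverable(Msg m) {
        int i = m.sender();
        if (z[i] != m.z()[i] - 1) return false;            // (C1): next message from i
        for (int k = 0; k < z.length; k++)
            if (k != i && m.z()[k] > z[k]) return false;   // (C2): causal predecessors delivered
        return true;
    }

    public static void main(String[] args) {
        CausalMember p0 = new CausalMember(0, 3), p1 = new CausalMember(1, 3), p2 = new CausalMember(2, 3);
        Msg n1 = p0.send("N1");
        p1.receive(n1);
        Msg n2 = p1.send("N2");                 // N2 causally depends on N1
        p2.receive(n2);                         // held back: N1 not yet delivered at p2
        p2.receive(n1);                         // now N1 and then N2 are delivered
        System.out.println(p2.delivered);       // [N1, N2]
    }
}
```

Messages that are concurrent satisfy both conditions independently of each other and are therefore delivered in arrival order, which is what keeps the delay of causal ordering low compared with total ordering.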
Relationship between multicast classes
[Figure: relationship between unreliable multicast, reliable multicast, serialized multicast, atomic multicast and atomic serialized multicast.]
Multicasting can be realized using IP multicast, which is built on top of the Internet protocol IP. The Java API provides a datagram interface to IP multicast through the class MulticastSocket.
5.7.7 Group communication in ISIS
The ISIS system developed at Cornell University is a framework for reliable distributed computing based upon process groups. It specifically supports group communication. The successor of ISIS was Horus (URL: http://www.cs.cornell.edu/info/projects/horus).
ISIS is a toolkit whose basic functions include process group management and ordered multicast primitives for communication with the members of a process group:
- abcast: totally ordered multicast;
- cbcast: causally ordered multicast.
abcast protocol
Atomic broadcast supports a total ordering for message delivery, i.e. all messages to the group G are delivered to all members of G in the same sequence.
- abcast realizes a serialized multicast;
- abcast is based on a 2-phase commit protocol; message serialization is supported by a distributed algorithm and logical timestamps.
Phase 1
- Sender S sends the message N with logical timestamp $T_S(N)$ to all members of G (e.g. by multicast).
- Each $g \in G$ determines a new logical timestamp $T_g(N)$ for the received message N and returns it to S.
Phase 2
- S determines a new logical timestamp for N; it is derived from all timestamps $T_g(N)$ proposed by the group members g:
  $T_{S,new}(N) = \max_{g \in G} T_g(N) + j/|G|$, with j being a unique identifier of sender S.
- S sends a commit with $T_{S,new}(N)$ to all $g \in G$.
- Each $g \in G$ delivers the message to its associated application process according to the logical timestamp.
cbcast protocol
Causal broadcast guarantees the correct sequence of message delivery for causally related messages. Concurrent messages can be delivered in any sequence; this approach minimizes message delay.
Introduction
The cbcast protocol uses vector timestamps to implement causally ordered message exchange between the members of a peer group.
[Figure: members S1, S2 and S3 exchanging the messages N1 and N2.]
Algorithm of the cbcast protocol
Let n be the number of members of group G. Each $g \in G$ has a unique number from $\{1, \dots, n\}$ and a state vector z which stores information about the received group messages. The state vector represents a vector clock (see page 108).
Each message N of a sender S has a unique number; message numbers are linearly ordered with increasing numbers.
Let j be a member of group G. The state vector $z_j = (z_{ji})_{i \in \{1, \dots, n\}}$ specifies the number of messages received in sequence from group member i.
Example: $z_{ji} = k$; k is the number of the last message sent by member $i \in G$ and received in correct sequence by group member j.
- At group initialization, all state vectors are reset (all components are 0).
- Sending a message N: $j \in G$ sends a message to all other group members. $z_{jj} := z_{jj} + 1$; the current state vector is appended to N and sent to all group members.
- Receiving a message N sent by member $i \in G$: message N contains the state vector $z_i$. There are two conditions for the delivery of N to the application process of j:
  (C1): $z_{ji} = z_{ii} - 1$;
  (C2): $\forall k \neq i: z_{ik} \le z_{jk}$.
  Otherwise N is held back until both conditions hold.
5.7.8 JGroups
JGroups (URL: http://www.jgroups.org/) is a reliable group communication toolkit written in Java. It is based on IP multicast and extends it with
- reliability, especially ordering of messages and atomicity;
- management of group membership.
Programming interface of JGroups
- groups are identified via channels:
    channel.connect("mygroup");
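- a channel is connected to a protocol stack specifying its properties.
[Figure: protocol stack below the application: Sequencer (total ordering of messages using a coordinator), GMS (group membership layer), Frag (fragmentation layer), UDP, network.]
Code example (older JGroups 2.x API; channel.receive() is not available in later JGroups versions)
    import org.jgroups.Channel;
    import org.jgroups.JChannel;
    import org.jgroups.Message;

    public class HelloGroup {
        public static void main(String[] args) throws Exception {
            String props = "UDP:Frag:GMS:causal";        // protocol stack specification
            Channel channel = new JChannel(props);
            channel.connect("mygroup");                  // join the group "mygroup"
            // null destination: the message is sent to all group members
            Message send_msg = new Message(null, null, "hello World");
            channel.send(send_msg);
            // receive() may also return views or events, so no cast to Message
            Object recv_msg = channel.receive(0);
            System.out.println("Received " + recv_msg);
            channel.disconnect();                        // leave the group
            channel.close();
        }
    }

5.8 Distributed consensus
Problem: distributed processes have to agree on a value; the processes communicate by message passing.
Examples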
- all correct computers controlling a spaceship should decide to proceed with landing, or all of them should decide to abort (after each has proposed one action or the other);
- in an electronic money transfer transaction, all involved processes must consistently agree on whether to perform the transaction (debit and credit) or not.
Desirable: reaching consensus even in the presence of faults.
Assumption: communication is reliable, but processes may fail.
Consensus problem
Agreement on the value of a decision variable amongst all correct processes:
- $p_i$ is in state undecided and proposes a single value $v_i$, drawn from a set of values;
- next, the processes communicate with each other to exchange values;
- in doing so, $p_i$ sets its decision variable $d_i$ and enters the decided state, after which the value of $d_i$ remains unchanged.
[Figure: consensus for three processes: P1 proposes v1 = proceed, P2 proposes v2 = proceed, P3 proposes v3 = abort and crashes; the consensus algorithm yields d1 := proceed and d2 := proceed.]
Properties
The following conditions should hold for every execution of the algorithm:
- termination: eventually, each correct process sets its decision variable;
- agreement: the decision variable of all correct processes is the same in the decided state;
- integrity: if the correct processes all proposed the same value, then any correct process in the decided state has chosen that value.
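The failure-free algorithm that follows lets every process evaluate a consensus function majority(v_1, ..., v_n) on the same reliably multicast proposals. A minimal Java sketch of that function is shown here; it interprets "majority" as a value proposed by more than half of the processes (a plurality variant would behave differently when no value exceeds n/2) and represents "undefined" as an empty Optional. The class name is hypothetical.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch of the consensus function majority(v_1, ..., v_n). */
public class MajorityConsensus {

    /** Returns the value proposed by more than half of the processes, or empty (undefined). */
    static <V> Optional<V> majority(List<V> proposals) {
        Map<V, Long> counts = proposals.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() * 2 > proposals.size()) // at most one value can pass
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // every correct process evaluates majority() on the same proposals, so all decide alike
        System.out.println(majority(List.of("proceed", "proceed", "abort"))); // Optional[proceed]
        System.out.println(majority(List.of("proceed", "abort")));            // Optional.empty
    }
}
```

Agreement follows because the function is deterministic and all processes apply it to the same data; integrity follows because unanimous proposals always form a majority.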
Algorithm
Algorithm to solve consensus in a failure-free environment:
- each process reliably multicasts its proposed value;
- after receiving the responses, each process evaluates the consensus function majority(v_1, ..., v_n), which returns the most often proposed value, or undefined if no majority exists.
Properties:
- termination: guaranteed by the reliability of the multicast;
- agreement, integrity: by definition of majority and the integrity of reliable multicast (all processes evaluate the same function on the same data).
When crashes occur:
- how can a failure be detected?
- will the algorithm terminate?
When byzantine failures occur:
- processes communicate random values;
- the evaluation of the consensus function may be inconsistent;
- malevolent processes may deliberately propose false or inconsistent values.
The Byzantine generals problem
- three or more generals are to agree to attack or to retreat;
- one general, the commander, issues the order;
- the others (lieutenants to the commander) have to decide whether to attack or retreat;
- one of the generals may be treacherous:
  - if the commander is treacherous, it proposes attacking to one general and retreating to another;
  - if a lieutenant is treacherous, it tells one of its peers that the commander ordered to attack, and others that the commander ordered to retreat.
Difference to the consensus problem: one process supplies a value that the others have to agree on.
Properties:
- termination: eventually each correct process sets its decision variable;
- agreement: the decision value of all correct processes is the same;
- integrity: if the commander is correct, then all correct processes decide on the value that the commander proposed.
Interactive consistency problem
Each process suggests a single value.
- Goal: all correct processes agree on a vector of values (the "decision vector"); each component corresponds to one process's agreed value.
- Example: agreement about each process's local state.
Properties:
- termination: eventually each correct process sets its decision vector;
- agreement: the decision vector of all correct processes is the same;
- integrity: if $p_i$ is correct, then all correct processes decide on $v_i$ as the i-th component of their vector.
Relationship between these problems
Assume that the previous problems could be solved, yielding the following decision variables:
- Consensus: $C_i(v_1, \dots, v_n)$ returns the decision value of process $p_i$;
- Byzantine generals: $BG_i(k, v)$ returns the decision value of process $p_i$, where $p_k$ is the commander, which proposes the value v;
- Interactive consistency: $IC_i(v_1, \dots, v_n)[k]$ returns the k-th value in the decision vector of process $p_i$,
where $v_1, \dots, v_n$ are the values that the processes proposed.
Possibilities to derive solutions from the solutions to other problems:
- solution to IC from BG: run BG n times, once with each process $p_k$ acting as commander:
  $IC_i(v_1, \dots, v_n)[k] = BG_i(k, v_k)$ for $i, k = 1, \dots, n$
- solution to C from IC: run IC to produce a vector of values at each process, then apply an appropriate function to the vector's values to derive a single value:
  $C_i(v_1, \dots, v_n) = majority(IC_i(v_1, \dots, v_n)[1], \dots, IC_i(v_1, \dots, v_n)[n])$
- solution to BG from C: commander $p_k$ sends its proposed value v to itself and to each of the remaining processes; all processes run C with the values $v_1, \dots, v_n$ that they receive and derive
  $BG_i(k, v) = C_i(v_1, \dots, v_n)$ for $i = 1, \dots, n$
Termination, agreement and integrity are preserved in each case.
Consensus in synchronous networks
Assumption: no more than f of the n processes crash (f < n).
- The algorithm proceeds in f+1 rounds in order to reach consensus.
- The processes B-multicast values between them.
- At the end of f+1 rounds, all surviving processes are in a position to agree.
Algorithm for process $p_i$ in consensus group g:
    On initialization
        values_i(1) := {v_i}; values_i(0) := {};
    In round r (1 ≤ r ≤ f+1)
        B-multicast(g, values_i(r) - values_i(r-1));   // send only values that have not been sent
        values_i(r+1) := values_i(r);
        while (in round r) {
            On B-deliver(V_j) from some p_j
                values_i(r+1) := values_i(r+1) ∪ V_j;
        }
    After (f+1) rounds
        assign d_i = minimum(values_i(f+1));

5.9 Authentication service Kerberos
Definition: authentication means verifying the identities of the communicating partners to one another in a secure manner.
Kerberos was developed at MIT as part of the distributed framework Athena. Kerberos is part of a variety of authentication components. The Kerberos authentication protocol is based on the protocol by Needham and Schröder.
5.9.1 Introduction
This course provides only a short introduction to Kerberos (for further information, consult the Kerberos web site (URL: http://web.mit.edu/kerberos/www/)).
Motivation
Kerberos assumes the following components: client C, server S, key distribution center KDC, and ticket granting service TGS.
Goal of Kerberos
A client C requests the service of the server S. KDC and TGS are supposed to guarantee the secrecy and authenticity requirements.
1. KDC manages the secret keys of the registered components.
2. Within a session, TGS provides the client C with tickets for authentication with servers of the distributed system.
Security objects of Kerberos
Kerberos enables authentication through the following three security objects:
1. TGS ticket: issued by KDC to the client C for presentation at TGS.
2. Authenticator: generated by client C; it identifies the client and guarantees the validity of the communication with server S.
3. Session key: generated by Kerberos for the communication between client C and server S.
5.9.2 Authentication process scenario
Graphical representation
[Figure: Kerberos (KDC and TGS), client C and server S. 1: C requests a TGS ticket from KDC; 2: KDC returns the TGS ticket; 3: C requests a server ticket from TGS; 4: TGS returns the server ticket; 5: C presents the authenticator to S.]
Animation Kerberos: see online version.
Description of exchanged messages
Message 1: C to KDC
C → KDC with information C, TGS
KDC determines, by querying a database, the secret key $K_C$ for the communication between Kerberos and C; KDC generates a good random session key $K_{C,tgs}$.
Message 2: KDC to C
KDC → C with information
$(K_{C,tgs})_{K[C]}$, $(C, TGS, T_{kdc}, L_{kdc}, K_{C,tgs})_{K[tgs]} = ticket(C, TGS)_{K[tgs]}$
- In the following, the terms $K_C$ and $K[C]$ are equivalent.
- The secret key $K_C$ is derived from the user password.
- The second part of the message is not interpreted by C. Rather, it is forwarded to TGS as a whole (representing the TGS ticket). The ticket is encrypted with the secret key $K_{tgs}$ of TGS.
- $T_{kdc}$: timestamp of the ticket creation time.
- $L_{kdc}$: life-span of the ticket.
Message 3: C to TGS
C → TGS with information
$(C, T_C)_{K[C,tgs]}$, $ticket(C, TGS)_{K[tgs]}$, S
TGS determines a random session key $K_{C,S}$ if the TGS ticket is still valid, $T_C$ is current, and the field C matches (in the first parameter and in the ticket).
Message 4: TGS to C
TGS → C with information
$(K_{C,S})_{K[C,tgs]}$, $(C, S, T_{tgs}, L_{tgs}, K_{C,S})_{K[S]} = ticket(C, S)_{K[S]}$
The second part of the message serves as a ticket of C for server S. $K_S$ is the secret key of server S, known to Kerberos.
Message 5: C to S
C → S with information
$(C, T_C)_{K[C,S]}$, $ticket(C, S)_{K[S]}$
Messages 5 and 6 support the mutual authentication of C and S.
Message 6: S to C
S → C with information
$(T_C)_{K[C,S]}$
Problems with Kerberos
Local computer clocks can be manipulated to circumvent the validity period of tickets, i.e. the synchronization of clocks in distributed systems must be authorized and authenticated.
Example: user login with Kerberos
1. The login program of workstation W sends the user name N to KDC.
2. If the user is known, KDC sends a session key $K_N$ encrypted with the user password, as well as a TGS ticket.
3. The login program requests the password from the user and decrypts the session key $K_N$ using the password; if the password was correct, then the decrypted session key $K_N$ and the session key $K_N$ within the TGS ticket are identical.
4. The password can be removed from main memory because further communication uses only $K_N$ and the TGS ticket; both are used to authenticate the user at TGS when the user requests a server S.
5. A user login session is established on workstation W.
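Messages 5 and 6 can be illustrated with a small Java sketch that uses AES-GCM from javax.crypto as the symmetric cipher. Several simplifying assumptions are not part of Kerberos as described here: tickets and authenticators are encoded as "|"-separated strings, times are plain millisecond values, and maxSkew is a hypothetical clock-skew tolerance; the names enc, dec and serverCheck are illustrative. Server S decrypts ticket(C, S) with its secret key K_S, extracts the session key K_C,S, decrypts the authenticator, checks the client name, the ticket lifetime and the freshness of T_C, and answers with (T_C)_K[C,S].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of Kerberos messages 5 and 6 (ticket + authenticator check, mutual authentication). */
public class KerberosSketch {
    static final SecureRandom RNG = new SecureRandom();

    /** (data)_K: AES-GCM encryption with a random 12-byte IV prepended to the ciphertext. */
    static byte[] enc(SecretKey k, String data) throws Exception {
        byte[] iv = new byte[12];
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, k, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(data.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
    }

    /** Decrypts (data)_K; throws if the key is wrong or the message was modified. */
    static String dec(SecretKey k, byte[] msg) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, k, new GCMParameterSpec(128, msg, 0, 12));
        return new String(c.doFinal(msg, 12, msg.length - 12), StandardCharsets.UTF_8);
    }

    /** Server S: checks message 5 and returns message 6, or null if authentication fails. */
    static byte[] serverCheck(SecretKey kS, byte[] authenticator, byte[] ticket, long now, long maxSkew)
            throws Exception {
        String[] t = dec(kS, ticket).split("\\|");            // C, S, T_tgs, L_tgs, K_C,S
        SecretKey kCS = new SecretKeySpec(Base64.getDecoder().decode(t[4]), "AES");
        if (now > Long.parseLong(t[2]) + Long.parseLong(t[3])) return null;    // ticket expired
        String[] a = dec(kCS, authenticator).split("\\|");    // C, T_C
        long tC = Long.parseLong(a[1]);
        if (!a[0].equals(t[0]) || Math.abs(now - tC) > maxSkew) return null;   // wrong client or stale
        return enc(kCS, Long.toString(tC));                   // message 6: (T_C)_K[C,S]
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey kS = kg.generateKey();     // secret key of S, shared with Kerberos
        SecretKey kCS = kg.generateKey();    // session key K_C,S issued by TGS
        long now = 1_000_000L;               // hypothetical clock value in ms
        // second part of message 4: ticket(C, S)_K[S] = (C, S, T_tgs, L_tgs, K_C,S)_K[S]
        byte[] ticket = enc(kS, String.join("|", "C", "S", Long.toString(now - 5_000), "3600000",
                Base64.getEncoder().encodeToString(kCS.getEncoded())));
        byte[] auth = enc(kCS, "C|" + now);  // message 5: (C, T_C)_K[C,S] plus the ticket
        byte[] msg6 = serverCheck(kS, auth, ticket, now, 300_000);
        // C accepts S only if S could decrypt the ticket and return T_C under K_C,S
        System.out.println(msg6 != null && dec(kCS, msg6).equals(Long.toString(now))); // true
    }
}
```

Because S can only produce message 6 after decrypting the ticket with K_S, the reply authenticates S to C. The freshness check on T_C is also why manipulated clocks are a problem, as noted under Problems with Kerberos.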
Chapter 6 Web Services
Web services provide a standard means of communication among distributed software applications based on Web technology. They are standardized by the W3C community.
6.1 Motivation - Example
Today, we normally use Web browsers to interact with Web sites:
- the browser names a document via a URL;
- request and reply messages are exchanged using HTTP as the communication protocol; the replies are encoded in HTML.
Web services generalize this model so that computers can talk to other computers.
Use of Web services in a distributed travel arrangement application
[Figure: a client application accesses a travel agency Web service via the Internet; the travel agency Web service in turn accesses several hotel Web services via the Internet. Source: Schlichter, TU München]
6.2 Service Oriented Architecture - SOA
SOA evolved from component-based architectures. SOA is a collection of services with loose coupling and dynamic binding between the services.
6.2.1 Characteristics
[Figure: the service provider (server) publishes its service description in a service registry; the service requestor (client) finds the service in the registry, sends a service request to the provider and receives a service response.]
A service
- is a well-defined, self-contained function;
- does not depend on the context or state of other services;
- manages its own data;
- has coarse granularity;
- communicates with other services for data passing and for coordinating activities;
- the focus is on the design of the service interface.
SOA vs. component-based architecture
SOA differs from today's component-based architectures in the following respects:
- component-based: tight integration; code-oriented development; technical complexity of the IT infrastructure; built to last;
- SOA: loose horizontal integration; process-oriented development; interoperable architecture for business and IT; built to change.
6.2.2 Layered approach
The focus is on the business processes of enterprises: mapping of business processes to services.
[Figure: layered approach with application layer, process layer, service layer and component (object) layer.]
6.2.3 Adopting Service Oriented Architecture (SOA)
The adoption within organizations depends on a variety of issues:
Supporting issues
- interoperable networked applications;
- easier exchange of distributed data;
- easier access to enterprise-wide data;
- availability of external services;
- cross-organizational computing;
- reduced maintenance cost;
- small effects on existing operational systems.
Restraining issues
- different formats and semantics of data sources;
- security issues due to network access;
- standards are evolving and some are not yet fixed;
- lack of understanding.
The Enterprise Service Bus (ESB) refers to both a software architecture and a class of software products used for the realization of SOA:
- messaging middleware that provides interoperability between enterprise applications via XML, Web Services interfaces and standardized rule-based routing of documents;
- Mule (URL: http://mule.mulesource.org/display/mule/home) is an open source ESB.
SOA blueprints initiative: defines the requirements for a reference example that highlights the best SOA practices.
Web services are an approach to building an SOA based on Web technologies:
- encapsulation of application components in Web services.
6.3 Web Services - Characteristics
A Web service is a standardized way of integrating Web-based applications.
6.3.1 Informal Definition
Web Services
- can live anywhere in the network.
- are described using a service-description language which
  - is in formal XML notation,
  - covers all the details necessary to interact with the service (message formats for operations, transport protocols and location),
  - hides the implementation details of the service.
- are published to a registry of services.
- are available through their declared API and invocation mechanism.
- provide an entry point for accessing local/remote services.

6.3.2 Integration
Web Services allow the integration of application functionality
- within organizations,
- between business partners,
- across organizational boundaries.

6.3.3 Features of Web Services
Specific features of Web Services:
- programmable: WS are accessed via a programmable interface.
- self-descriptive: metadata describe the WS.
- encapsulation: self-contained application component.
- loosely coupled: communication via message passing using platform-independent and language-neutral protocols.
- location transparent: access to WS from different locations via the network.
- protocol transparent: WS are based on the Internet protocol suite; an operation may support several protocols, e.g. HTTP, SMTP.
- composition: several WS may be combined into a new WS.
Web services are software components which enable loosely coupled, component-oriented, cross-technology application implementations.

Web Services are document-centric
- communication consists of sending documents to the server and back.
- most properties are associated with the document itself, not with the service.
6.3.4 Potential of Web Services
Web Services have the potential to change the IT infrastructure of organizations:
- setting up a service-oriented architecture based on Web services
- process-oriented integration of existing systems
  - intra- and inter-organizational scenarios
  - approach for enterprise application integration (EAI)
  - development of complex cooperative processes
- paradigm for the development of new software architectures
  - reuse of software components
  - redesign of monolithic enterprise resource planning (ERP) systems
- increase of the process-oriented interoperability and of the flexibility of the technical infrastructure.

6.3.5 Web Services - Distributed Objects
Common features: Web services and distributed objects
- both have some sort of description language:
  - what to call: operations, signatures, return types, exceptions.
  - how to make an invocation.
  - compilers generate the client stub and the server skeleton.
- both have well-defined network interactions.
- both have a similar mechanism for registering and discovering available components.
Differences
- Web services are usually designed for stateless computing; distributed objects enable stateful computing.
- Web services are a technology supporting integration on the Web; distributed objects are mainly used within an intranet.
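The stateless/stateful difference can be made concrete with a small local sketch. `CounterObject` and `increment_service` are hypothetical names; the code only models the two programming styles in one process, without any network or middleware.

```python
class CounterObject:
    """Distributed-object style: the server-side instance keeps state between calls."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, step: int) -> int:
        self.value += step  # state survives until the next invocation
        return self.value


def increment_service(request: dict) -> dict:
    """Web-service style: each request document carries all the state the operation needs."""
    return {"value": request["value"] + request["step"]}


# Stateful usage: the client holds a reference to one (remote) instance.
counter = CounterObject()
counter.increment(2)
stateful_result = counter.increment(3)

# Stateless usage: the client keeps the state and sends it with every request.
doc = {"value": 0}
doc = increment_service({"value": doc["value"], "step": 2})
doc = increment_service({"value": doc["value"], "step": 3})
stateless_result = doc["value"]

print(stateful_result, stateless_result)  # 5 5
```

Because `increment_service` remembers nothing between calls, any copy of the service can answer any request; the stateful object, in contrast, ties the client to one particular instance.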
6.4 Web Services Architecture
Definition: A Web service (W3C) is a software system identified by a URI, whose public interfaces and bindings are defined and described using XML. Its definition can be discovered by other software systems. These systems may then interact with the Web service in a manner prescribed by its definition, using XML-based messages conveyed by Internet protocols.

A Web Service is a standardized way of integrating Web-based applications using the XML, SOAP, WSDL and UDDI open standards over an Internet protocol backbone.
- XML: tag the data
- SOAP: transfer the data
- WSDL: describe the available services
- UDDI: list the available services
Simplified view: a Web service is a remote procedure call over the Internet using XML messages.

6.4.1 Web Services Interoperability Stack
    Compositional            BPEL4WS, WS-Notification
    Quality of Experience    WS-Security, WS-Transactions, ...
    Description              WSDL, UDDI, WS-Policy, ...
    Messaging                XML, SOAP, WS-Addressing
    Transport                HTTP, SMTP, ...

6.4.2 Basic Architecture
The basic architecture defines an interaction between software components as an exchange of messages between service requesters and service providers.
Functions of the architecture
- exchanging messages.
- describing Web services.
- publishing and discovering Web service descriptions.
The service: a Web service is an interface; its implementation is the service.
The service description: details of the interface and the implementation of the service.

6.4.3 Roles
The basic Web service architecture models the interactions between three roles:
- Service Provider: processes a Web service request.
- Service Discovery Agency: agency through which a Web service description is published and made discoverable.
- Service Requestor: requests the execution of a Web service.

6.4.4 Operations of the Web Service Architecture
imac Schlichter, TU München
[Figure: the service provider (server) publishes its service description to the discovery agencies; the service requestor (client) finds the description there and then interacts with the service provider]
- Publish: a service needs to publish its description such that a requestor can subsequently find it.
- Find: the requestor queries a registry for the required service and retrieves a service description.
- Interact: the service is invoked and the results are returned.

6.4.5 Basic Standard Technologies
Web services are based on 3 basic standards:
- WSDL: Web Services Description Language
- UDDI: Universal Description, Discovery and Integration
- SOAP: Simple Object Access Protocol
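The publish, find and interact operations of 6.4.4 can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: `DiscoveryAgency` stands in for a real registry such as UDDI, a dictionary named `network` replaces the transport, and all names as well as the endpoint URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    endpoint: str                # address at which the service can be invoked
    operations: Tuple[str, ...]  # operations offered by the service


class DiscoveryAgency:
    """Stand-in for a registry: stores descriptions and returns them on request."""

    def __init__(self) -> None:
        self._descriptions: Dict[str, ServiceDescription] = {}

    def publish(self, description: ServiceDescription) -> None:
        self._descriptions[description.name] = description

    def find(self, name: str) -> Optional[ServiceDescription]:
        return self._descriptions.get(name)


def stock_quote_provider(operation: str, params: dict) -> dict:
    """Service provider: processes a request for one of its operations."""
    prices = {"DIS": 33.2}
    if operation != "GetLastTradePrice":
        raise ValueError(f"unsupported operation: {operation}")
    return {"price": prices[params["symbol"]]}


# Maps endpoint addresses to providers; simulates the network.
network: Dict[str, Callable[[str, dict], dict]] = {
    "http://example.com/stockquote": stock_quote_provider,
}

agency = DiscoveryAgency()

# Publish: the provider makes its description available.
agency.publish(ServiceDescription("StockQuote", "http://example.com/stockquote", ("GetLastTradePrice",)))

# Find: the requestor retrieves the description.
description = agency.find("StockQuote")

# Interact: the requestor invokes the service at the advertised endpoint.
result = network[description.endpoint]("GetLastTradePrice", {"symbol": "DIS"})
print(result)  # {'price': 33.2}
```

The requestor never hard-codes the endpoint; it only knows the service name and learns the address from the description, which is the point of separating find from interact.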
[Figure: steps 1 to 5 between service provider, discovery agencies (directory) and service requestor; the service description is written in WSDL, request and response are XML SOAP messages]
Steps involved in providing and consuming a service:
1. A service provider describes its service using WSDL. This definition is published to a directory of services.
2. A service requestor queries the directory to locate a service and determine how to communicate with that service.
3. The directory sends the service description to the service requestor.
4. The service requestor sends a service request based on the WSDL description.
5. The service provider sends a response based on the WSDL description.

Web Service Messages
WSDL uses XML to define messages.
    <element name="CustomerInfoRequest">
      <element name="account" type="string"/>
      ...
    </element>
    <element name="CustomerInfoResponse">
      <element name="name" type="string"/>
      <element name="phone" type="string"/>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      ...
    </element>
[Figure: the service requestor queries the directory and receives the WSDL service description; it then exchanges the following request and response with the service provider]
Request:
    <m:GetCustomerInfo ...>
      <account>1069</account>
    </m:GetCustomerInfo>
Response:
    <m:GetCustomerInfoResponse ...>
      <name>Huber</name>
      <phone>289-18655</phone>
      <street>Boltzmannstr. 3</street>
      <city>Garching</city>
      ...
    </m:GetCustomerInfoResponse>

6.4.6 Message Exchange Patterns
Message exchange patterns define the sequence of one or more messages exchanged between service requestor and service provider. Examples are: one-way, request/response, broadcast.
The Web service architecture may support different interaction scenarios.

Peer-to-Peer
[Figure: two nodes, each acting as service requestor/provider (client and server), find and publish service descriptions via the discovery agencies and interact directly with each other]
In the peer-to-peer scenario, each Web service instance serves in both the service requestor and the service provider role.

Direct Interaction
[Figure: the service provider (server) publishes its service description directly to the client, which acts as service requestor and discovery agency and interacts with the provider]
The roles of service requestor and discovery agency are both fulfilled by the client.

Intermediary
[Figure: the service requestor (client) interacts with an intermediary, which in turn interacts with the service provider (server); service descriptions are published to and found via the discovery agencies]
Intermediaries may perform additional functions on a message (besides the operations defined by the message exchange patterns), such as routing, security, management.

6.5 Simple Object Access Protocol (SOAP)
Simple and lightweight XML-based mechanism for exchanging data between network applications.
SOAP (URL: http://www.w3.org/tr/soap12-part1/) is a de-facto standard for XML messaging:
- relatively simple.
- flexible and extensible.
- based on XML.
- not bound to a specific protocol; Internet protocols such as HTTP or SMTP are used.
- may be used for RPC or document transfer.
[Figure: use of SOAP for sending Web Services messages; the Web client sends the method name and parameter list in an HTTP request to the service object on the Web server and receives the result in the HTTP response]
6.5.1 Parts of SOAP
SOAP is a lightweight protocol for the exchange of information in a decentralized, distributed environment. SOAP consists of three parts:
- an envelope.
- a set of encoding rules.
- a convention for representing remote procedure calls and responses.

SOAP Message
A SOAP message is an XML document that consists of
- a mandatory SOAP envelope; defines the various XML namespaces.
- an optional SOAP header; defines auxiliary information, e.g. authentication, encoding of data.
- a mandatory SOAP body; payload of the message, e.g. method name and arguments (in case of RPC).

6.5.2 Exchange Model
- one-way transmissions from a sender to a receiver.
- combination of SOAP messages to implement interaction patterns such as request/response.
A SOAP application receiving a SOAP message must process the message by performing the following actions:
1. Identify all parts of the SOAP message intended for that application; interpret the "SOAP actor" attribute of the SOAP header.
2. Verify that all mandatory parts are supported by the application for this message and process them accordingly.
3. If the SOAP application is not the ultimate destination of the message, remove all parts identified in step 1 before forwarding the message.

6.5.3 Using SOAP in HTTP
SOAP naturally follows the HTTP request/response message model, providing SOAP request parameters in an HTTP request and SOAP response parameters in an HTTP response.
- use of media type "text/xml".
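The three processing steps listed in 6.5.2 can be sketched for an intermediary node with Python's standard `xml.etree.ElementTree`. The sketch assumes the SOAP 1.1 envelope namespace with its `actor` and `mustUnderstand` attributes; the header blocks, actor URIs and the function name are hypothetical, and a missing mandatory feature is signalled with a plain exception instead of a real SOAP fault message.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
NEXT_ACTOR = "http://schemas.xmlsoap.org/soap/actor/next"  # SOAP 1.1: "the next node on the path"

# Hypothetical message: one header block for a logging intermediary, one for the ultimate receiver.
MESSAGE = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Header>
    <t:Log xmlns:t="urn:example:log" SOAP-ENV:actor="urn:example:logger"
           SOAP-ENV:mustUnderstand="1">entry-42</t:Log>
    <a:Auth xmlns:a="urn:example:auth">token-xyz</a:Auth>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI"><symbol>DIS</symbol></m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def process_as_intermediary(xml_text: str, my_actor: str, understood: set) -> tuple:
    """Apply the three processing steps at a node that is not the ultimate destination."""
    envelope = ET.fromstring(xml_text)
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    blocks = list(header) if header is not None else []

    # Step 1: identify the header blocks intended for this node via the actor attribute.
    mine = [b for b in blocks if b.get(f"{{{SOAP_ENV}}}actor") in (my_actor, NEXT_ACTOR)]

    # Step 2: every mandatory block (mustUnderstand="1") intended for this node must be supported.
    for b in mine:
        if b.get(f"{{{SOAP_ENV}}}mustUnderstand") == "1" and b.tag not in understood:
            raise ValueError(f"MustUnderstand fault: {b.tag}")
    handled = [b.text for b in mine if b.tag in understood]  # "processing" = collecting the text

    # Step 3: remove the identified blocks before forwarding the message.
    for b in mine:
        header.remove(b)
    return handled, ET.tostring(envelope, encoding="unicode")


handled, forwarded = process_as_intermediary(MESSAGE, "urn:example:logger", {"{urn:example:log}Log"})
print(handled)    # ['entry-42']
print(forwarded)  # the Log block is gone; the Auth block and the body are still present
```

The `Auth` block carries no actor attribute, so it is addressed to the ultimate receiver and passes through the intermediary untouched.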
SOAP Message Embedded in HTTP Request
    POST /StockQuote HTTP/1.1
    Host: www.stockquoteserver.com
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn
    SOAPAction: "Some-URI"

    <SOAP-ENV:Envelope
        xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <SOAP-ENV:Body>
        <m:GetLastTradePrice xmlns:m="Some-URI">
          <symbol>DIS</symbol>
        </m:GetLastTradePrice>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
The SOAP request is processed by a servlet, a CGI program or a standalone daemon running on a remote Web server.

SOAP Message Embedded in HTTP Response
    HTTP/1.1 200 OK
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn

    <SOAP-ENV:Envelope
        xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <SOAP-ENV:Body>
        <m:GetLastTradePriceResponse xmlns:m="Some-URI">
          <Price>33.2</Price>
        </m:GetLastTradePriceResponse>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

6.5.4 SOAP RPC Conventions
RPC interactions may be mapped to SOAP.
[Figure: on the client and on the provider side, SOAP-based middleware converts procedure calls of the application objects to/from XML messages; these are exchanged as SOAP messages on top of HTTP across the network]
Example
Java method:
    public int addfive(int arg);
Request message in SOAP:
    <env:Envelope>
      <env:Body>
        <myns:addfive xmlns:myns="http://my-domain.de/"
                      env:encodingStyle="http://">
          <arg xsi:type="xsd:int">33</arg>
        </myns:addfive>
      </env:Body>
    </env:Envelope>
Response message in SOAP:
    <env:Envelope>
      <env:Body>
        <myns:addfiveResponse xmlns:myns="http://my-domain.de/"
                              xmlns:rpc="http://www.w3.org/2003/05/soap-rpc"
                              env:encodingStyle="http://">
          <rpc:result>ret</rpc:result>
          <ret xsi:type="xsd:int">38</ret>
        </myns:addfiveResponse>
      </env:Body>
    </env:Envelope>
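The conversion that SOAP-based middleware performs in the client stub and the server skeleton can be sketched with `xml.etree.ElementTree`. This is a local sketch, not the behaviour of a particular middleware product: the function names are hypothetical, the envelope uses the SOAP 1.2 namespace to match the `rpc` namespace of the response above, the `encodingStyle` attribute is omitted, and no HTTP transport is involved.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"
RPC = "http://www.w3.org/2003/05/soap-rpc"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
APP = "http://my-domain.de/"

# Use readable prefixes when serializing.
for prefix, uri in (("env", ENV), ("rpc", RPC), ("xsi", XSI), ("myns", APP)):
    ET.register_namespace(prefix, uri)


def add_five(arg: int) -> int:
    """Provider-side implementation of int addfive(int arg)."""
    return arg + 5


def build_request(method: str, arg: int) -> str:
    """Client stub: convert a procedure call into a SOAP request message."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP}}}{method}")
    param = ET.SubElement(call, "arg", {f"{{{XSI}}}type": "xsd:int"})
    param.text = str(arg)
    return ET.tostring(envelope, encoding="unicode")


def handle_request(xml_text: str) -> str:
    """Server skeleton: decode the call, invoke the implementation, encode the result."""
    call = ET.fromstring(xml_text).find(f"{{{ENV}}}Body")[0]
    method = call.tag.split("}")[1]
    if method != "addfive":
        raise ValueError(f"unknown method {method}")
    value = add_five(int(call.find("arg").text))

    envelope = ET.Element(f"{{{ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    resp = ET.SubElement(body, f"{{{APP}}}{method}Response")
    ET.SubElement(resp, f"{{{RPC}}}result").text = "ret"  # names the element holding the return value
    ret = ET.SubElement(resp, "ret", {f"{{{XSI}}}type": "xsd:int"})
    ret.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text: str) -> int:
    """Client stub: extract the return value from the SOAP response."""
    resp = ET.fromstring(xml_text).find(f"{{{ENV}}}Body")[0]
    result_name = resp.find(f"{{{RPC}}}result").text
    return int(resp.find(result_name).text)


request = build_request("addfive", 33)
response = handle_request(request)
print(parse_response(response))  # 38
```

The application code on both sides only sees `build_request`/`parse_response` and `add_five`; the XML in between corresponds to the request and response messages shown above.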
6.5.5 Minimalist Infrastructure for Web Services
[Figure: the application objects of service requestor and service provider each use SOAP-based middleware, which converts procedure calls to/from XML messages sent through HTTP; the provider publishes its service description in a UDDI registry, in which the requestor looks for services]
1. Providers advertise their services in a UDDI registry.
2. Clients look for services in a UDDI registry
   - statically: at development time,
   - dynamically: at run time.
3. The client invokes the service.

6.5.6 SOAP-Router
Routing is the process of delivering messages through a series of nodes or intermediaries, called SOAP-Routers in the context of SOAP. The SOAP-Router is the entity that moves SOAP messages between internal and external networks.
[Figure: the service requestor (client) in the external network sends messages to the SOAP-Router, which forwards them to services 1 to n in the internal network]
Besides routing capabilities, the SOAP-Router may provide value-added services such as logging, auditing and enforcement of security policies.
WS-Routing is a protocol that defines how SOAP messages can be delivered using various transports.

6.6 Web Services Description Language (WSDL)
Ian Foster states: "Web services have little value if others cannot discover, access, and make sense of them."
Definition: A WSDL (URL: http://www.w3.org/tr/2007/rec-wsdl20-primer-20070626/) document defines services as collections of network endpoints, or ports.
WSDL has a purpose similar to that of IDLs in conventional middleware platforms.
A WSDL description describes 3 fundamental properties of a Web Service:
- What a service does: operations and the arguments needed to invoke them.
- How a service is accessed: details of data formats and protocols.
- Where a service is located: details of the protocol-specific network address, such as a URI.
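The three properties (what, how, where) correspond to concrete WSDL elements. A minimal sketch that reads them with Python's `xml.etree.ElementTree` from an abbreviated WSDL 1.1 document; the document follows the stock quote example of 6.6.2 but omits types, messages and the operation details of the binding, and `describe` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# Abbreviated WSDL 1.1 document (types and messages omitted) for illustration.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:tns="http://example.com/stockquote.wsdl"
             targetNamespace="http://example.com/stockquote.wsdl">
  <portType name="StockQuotePortType">
    <operation name="GetLastTradePrice"/>
  </portType>
  <binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType">
    <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="StockQuoteService">
    <port name="StockQuotePort" binding="tns:StockQuoteSoapBinding">
      <soap:address location="http://example.com/stockquote"/>
    </port>
  </service>
</definitions>"""


def describe(wsdl_text: str) -> dict:
    root = ET.fromstring(wsdl_text)
    ns = {"wsdl": WSDL_NS, "soap": SOAP_NS}
    # What: the operations declared in the (abstract) port types.
    what = [op.get("name") for op in root.findall("wsdl:portType/wsdl:operation", ns)]
    # How: message style and transport declared by the SOAP binding.
    soap_binding = root.find("wsdl:binding/soap:binding", ns)
    how = (soap_binding.get("style"), soap_binding.get("transport"))
    # Where: the network addresses of the ports of each service.
    where = [a.get("location") for a in root.findall("wsdl:service/wsdl:port/soap:address", ns)]
    return {"what": what, "how": how, "where": where}


print(describe(WSDL))
```

Only the port type belongs to the abstract part; style, transport and address all come from the concrete part, which is why the same port type can be offered over several bindings and addresses.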
6.6.1 WSDL Information Model
A WSDL document uses the following elements in the definition of network services:
- Types: a container for non-built-in data type definitions using some type system, e.g. arrays and structures.
- Message: an abstract, typed definition of the data being transferred between the requestor and the service; a method call (request/response) is modeled as 2 messages.
- Port Type: an abstract set of operations supported by one or more endpoints; an operation specifies a specific input/output message sequence.
- Operation: an abstract description of an action supported by the service.
- Binding: specifies a concrete protocol and data format, such as SOAP or CORBA, for the operations and messages defined by a particular port type.
- Port: a single endpoint defined as a combination of a binding and a network address.
- Service: a collection of related endpoints.

Parts of WSDL
WSDL is divided into 2 parts:
- an abstract part which describes what is offered; it consists of types, messages, operations and port types.
- a concrete part which describes how and where it is offered; it consists of bindings, services and ports.
    <definitions>
      <types> ... </types>                    abstract part (what)
      <message name="..."> ... </message>
      <portType name="..."> ... </portType>
      <binding name="..."> ... </binding>     concrete part (how)
      <service name="..."> ... </service>     concrete part (where)
    </definitions>

Relationship of parts
[Figure: definitions comprise data type definitions, message definitions and operations; operations are grouped into port types; service bindings connect a port type to a port with its network address]
- Definitions are generally expressed in XML.
- Operations describe actions for the messages supported by a Web Service; they are the equivalent of a method signature in Java.
- Service bindings connect port types to a port.

6.6.2 Example for SOAP Request/Response
WSDL definition of a simple service providing stock quotes; the service supports the single operation GetLastTradePrice(ticker symbol) and returns the price as a float.
    <?xml version="1.0"?>
    <definitions name="StockQuote"
        targetNamespace="http://example.com/stockquote.wsdl"
        xmlns:tns="http://example.com/stockquote.wsdl"
        xmlns:xsd1="http://example.com/stockquote.xsd"
        xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
        xmlns="http://schemas.xmlsoap.org/wsdl/">
      <types>
        <schema targetNamespace="http://example.com/stockquote.xsd"
                xmlns="http://www.w3.org/2000/10/XMLSchema">
          <element name="TradePriceRequest">
            <complexType>
              <all><element name="tickerSymbol" type="string"/></all>
            </complexType>
          </element>
          <element name="TradePrice">
            <complexType>
              <all><element name="price" type="float"/></all>
            </complexType>
          </element>
        </schema>
      </types>
      <!-- parameters of the request message -->
      <message name="GetLastTradePriceInput">
        <part name="body" element="xsd1:TradePriceRequest"/>
      </message>
      <!-- parameters of the response message -->
      <message name="GetLastTradePriceOutput">
        <part name="body" element="xsd1:TradePrice"/>
      </message>
      <portType name="StockQuotePortType">
        <operation name="GetLastTradePrice">
          <input message="tns:GetLastTradePriceInput"/>
          <output message="tns:GetLastTradePriceOutput"/>
        </operation>
      </portType>
      <binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType">
        <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>
        <operation name="GetLastTradePrice">
          <soap:operation soapAction="http://example.com/GetLastTradePrice"/>
          <input><soap:body use="literal"/></input>
          <output><soap:body use="literal"/></output>
        </operation>
      </binding>
      <service name="StockQuoteSoapService">
        <documentation>our defined service</documentation>
        <port name="StockQuotePort" binding="tns:StockQuoteSoapBinding">
          <soap:address location="http://example.com/stockquote"/>
        </port>
      </service>
    </definitions>

6.6.3 Generating Code from WSDL
A WSDL compiler is used to automatically generate code (e.g. a Java interface) from a WSDL file.
[Figure: a WSDL generator creates the WSDL of the service provider (1); WSDL compilers on the client and on the server side generate the stub of the service requestor and the skeleton of the service provider (2); both communicate via SOAP-based middleware exchanging SOAP messages]
- WSDL documents can be generated from APIs (1).
- Stubs and skeletons can be generated from a WSDL document (2).

6.6.4 Common Bad Practices
Analysis of existing WSDL documents shows that the functionality of many Web services is hard to understand due to bad practices:
- developers do not take sufficient care of names and comments.
- port types are tied to concrete protocols.
- semantically unrelated operations are placed in a single port type.
- output messages are overloaded to transport both results and error information.

6.7 Universal Description, Discovery, and Integration (UDDI)
UDDI provides the definition of a set of services supporting the description and discovery of
- businesses, organizations, and Web Service providers,
- the Web services they make available,
- the technical interfaces to access those services.
UDDI itself is a Web Service; it has a WSDL interface and can be described by a UDDI registry.

UDDI Business Registry System
Categorization of the information contained in a UDDI registry:
- UDDI white pages: basic information such as company name, contact information, and the services these organizations provide.
- UDDI yellow pages: detailed business data and Web Services, organized by relevant business classification.
- UDDI green pages: information on how a given Web Service can be invoked.

UDDI Entities
UDDI allows storing and manipulating four main types of entities:
[Figure: a businessEntity contains businessServices; each businessService contains bindingTemplates, each of which refers to a tModel]
- businessEntity: represents the owner of a Web Service. Attributes: name, unique key, zero or more services, descriptions, ...
- businessService: represents a group of one or more Web Services. Attributes: name, unique key, one binding template per Web Service, descriptions, ...
- bindingTemplate: represents a single Web Service; contains all the information to locate and invoke the service. Attributes: unique key, an access point that indicates the URL of the Web Service.
- tModel: represents WSDL interface types. Attributes: name, unique key, a URL that points to the data associated with the tModel, description, ...

UDDI Registry API
UDDI registries have 3 main types of users:
- service providers that publish services,
- requesters that look for services,
- other registries that need to exchange information.
UDDI supports the following sets of APIs:
- UDDI Inquiry API: operations to find registry entries, such as find_service, or to get details on a specific entity, e.g. get_serviceDetail.
- UDDI Publication API: add, modify, and delete entries, e.g. save_service or delete_service.
- UDDI Security API: get and discard authentication tokens to be used in the communication with the registry.
- UDDI Ownership Transfer API: transfer ownership of structures between registries.
- UDDI Subscription API: enables monitoring of changes in a registry by subscribing to track new, modified, and deleted entries.
- UDDI Replication API: supports replication of information between registries.
UDDI registry xmethods (URL: http://www.xmethods.com/ve2/index.po) for publicly available Web Services.

6.8 REST
REST (Representational State Transfer) is an architectural style for distributed applications. REST is not a standard; it is a set of principles on how to use Web standards such as HTTP, URIs and MIME types. The Web is a REST system.
REST is based on the following key principles:
- give every relevant resource an ID: use URIs to identify everything that is an item of interest.
  URL: http://www.boeing.com/aircraft/747
  A representation of the resource is returned (e.g., Boeing747.html). The representation places the client application in a state.
- link resources together: navigating links results in state transfers of the client application.
- use standard methods, such as GET, POST, PUT, DELETE.
- communication is stateless.
- resources with multiple representations: the client may specify the formats which it accepts, e.g.
      GET /customers/1234 HTTP/1.1
      Accept: text/x-vcard

6.9 Web Service Composition
An important issue is the choice of the appropriate granularity:
- small vs. large Web Services; thousands vs. a handful of Web Services.
- what are the appropriate reusable, shared business components?
Composition of complex Web Services from smaller reusable Web Services:
[Figure: a client accesses the composite Web service "travel arrangement" of a travel agency via the Internet; the composition handles complexity by combining the hotel, airline reservation and rental car reservation Web services]

6.9.1 Dimensions to Handle Complexity
- component model: defines the sub-services.
- orchestration model: defines the order in which the sub-services are invoked.
  WS-Coordination is an extensible framework that describes how different Web Services work together reliably. The coordination framework contains Activation, Registration and Coordination services.
- data access model: specifies the data exchange between the sub-services.
- transactional model: transactional semantics of the composed service.
  WS-Transaction specifies the protocols for each coordination type (used by WS-Coordination):
  - AtomicTransactions: all-or-nothing property, 2-phase commit.
  - Business Activity: handles long-lived activities and applies business logic to handle business exceptions; BusinessAgreement protocol.
- exception handling: handling of errors in the sub-services.

6.9.2 Web Service Orchestration
[Figure: three ways of chaining web service 1 and web service 2. Transparent chaining: the client queries a UDDI registry and invokes each service itself. Translucent chaining: the client invokes a workflow service, which invokes the services in order and propagates status to the client. Opaque chaining: the client invokes an aggregate service, which invokes the services on its own]
- transparent chaining: the client evaluates the service description and determines the appropriate usage.
- translucent chaining: a workflow service invokes the sub-services in the correct order; status is propagated to the client (e.g. for user authentication to a Web service).
- opaque chaining: an aggregate service handles the invocation of the sub-services autonomously; the client is not aware of the sub-services.
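Opaque chaining can be sketched with the travel arrangement example of 6.9: the client calls a single operation of an aggregate service, which invokes the sub-services in a fixed order. The sub-service functions and their confirmation codes are hypothetical local stand-ins for the airline, hotel and rental car Web services; no transactional model or exception handling is included.

```python
from typing import Callable, Dict, List


# Hypothetical sub-services; each returns a confirmation code.
def reserve_flight(trip: dict) -> str:
    return f"FL-{trip['destination']}"


def reserve_hotel(trip: dict) -> str:
    return f"HO-{trip['destination']}-{trip['nights']}"


def reserve_car(trip: dict) -> str:
    return f"CA-{trip['destination']}"


class TravelArrangementService:
    """Aggregate service (opaque chaining): the client sees one operation only."""

    def __init__(self, steps: List[Callable[[dict], str]]) -> None:
        self.steps = steps  # orchestration model: order of the sub-service invocations

    def arrange(self, trip: dict) -> Dict[str, str]:
        confirmations = {}
        for step in self.steps:
            # data access model: every sub-service receives the same trip data
            confirmations[step.__name__] = step(trip)
        return confirmations


service = TravelArrangementService([reserve_flight, reserve_hotel, reserve_car])
booking = service.arrange({"destination": "MUC", "nights": 3})
print(booking)
```

With transparent chaining, the client itself would call `reserve_flight`, `reserve_hotel` and `reserve_car` in turn; with opaque chaining this knowledge is hidden inside `TravelArrangementService`.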
6.10 Adopting Web Services
There already exists a variety of free and commercial Web services, provided especially by Internet companies such as Google, Amazon or Yahoo.

6.10.1 Example Web Services
The following lists some available Web services; registration is often necessary in order to use them.

Amazon E-Commerce Service (ECS)
ECS (URL: http://aws.amazon.com/) provides access to Amazon's product database with the following types of data:
- detailed product information.
- customer-contributed content, e.g. wish lists, product reviews.
- seller information.
- 3rd-party product information.
- shopping cart contents.
ECS supports both SOAP and REST style interactions.
- product operations
  - ItemSearch: performs a search for a specific item, typically using a set of keywords.
  - SimilarityLookup: returns a list of products similar to a given product ID (based on product specifications and features).
  - ItemLookup: access to the data related to a specific product.
- remote shopping cart operations
  - CartCreate: creates a remote shopping cart.
  - CartAdd: adds an item to the shopping cart.
  - further operations are CartGet (obtains the content of the cart), CartModify (removes an item from the cart) and CartClear (removes all items from the cart).
Example of a REST-style invocation:
    http://webservices.amazon.com/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=my-key&Operation=ItemSearch&SearchIndex=Books&Keywords=Amazon
Amazon Mashups
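The REST-style invocation above is an ordinary HTTP GET whose query string names the operation and its parameters. A minimal sketch of assembling that URL with Python's `urllib.parse`; it only builds the string and sends no request, and `my-key` is the placeholder access key from the example, not a real credential.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://webservices.amazon.com/onca/xml"

params = {
    "Service": "AWSECommerceService",
    "AWSAccessKeyId": "my-key",  # placeholder, as in the example above
    "Operation": "ItemSearch",
    "SearchIndex": "Books",
    "Keywords": "Amazon",
}

# urlencode keeps the insertion order and escapes values (e.g. spaces become "+").
url = f"{BASE}?{urlencode(params)}"
print(url)

# The service side sees the same parameters again after decoding the query string.
decoded = parse_qs(urlsplit(url).query)
print(decoded["Operation"])  # ['ItemSearch']
```

Invoking a different operation only changes the query parameters; the URL, the HTTP method and the XML response format stay the same, which is typical of the REST style.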
FedEx Office and Printing Service
- printing of online documents and distribution of paper documents as a commercial Web Service.
- free print plug-in for standard office applications; the plug-in is added to the list of printers.
- pick up your document at any U.S. location or ship it via FedEx for added convenience.
- location-independent use of the printing service.

Via Michelin (URL: http://dev.viamichelin.com/wswebsite/gbr/jsp/vmdn/vmdn-WebServices-Documentation.jsp?WSDoc=GeoV3)
The Reverse Geocoding Web Service allows users to obtain the closest road segment (named or not) for each supplied geographic coordinate (WGS84).

XMethods (URL: http://www.xmethods.net): clearinghouse for Web Services.

6.10.2 Apache Axis
Apache Axis (URL: http://ws.apache.org/axis/) provides an environment to implement and provide Web services:
- a set of client-side APIs for dynamically invoking SOAP Web services (with or without WSDL descriptions).
- tools to translate WSDL documents into Java frameworks.
- mechanisms for hosting Web services either within a servlet container (e.g. Tomcat) or via a standalone server.
- a set of APIs for manipulating SOAP envelopes, bodies, and headers, and using them inside Message objects.
- data binding which enables mapping of Java classes into XML schemas and vice versa.
- a transport framework that allows usage of a variety of underlying transport mechanisms (e.g. JMS, email, etc.).
Axis2
Meanwhile, Apache Axis2 (URL: http://axis.apache.org/axis2/java/core/) is available:
- Java-based implementation of both the client and the server side of Web services.
- Axis2 is more flexible, efficient and configurable than Axis 1.x.
- Axis2 supports not only SOAP messages but also RESTful Web services.

6.10.3 Web Services and Java
Java provides a number of APIs implementing the Web Services standards:
- SAAJ (SOAP with Attachments API for Java): SOAP messages as Java objects.
- JAX-WS (Java API for XML-Based Web Services): programming model for Web Services; replaces JAX-RPC.
- JWSDL: accessing WSDL descriptions.
- JAXR (Java API for XML Registries): accessing Web Services registries, e.g. UDDI.
- JAXP (Java API for XML Processing): abstract XML API specification implemented by e.g. Apache Xalan (XSLT) and Apache Xerces2 (XML parsing: DOM, SAX, ...).
- XWSS (Java Web-Services Security): signatures, encryption (roughly for SOAP what SSL is for HTTP).

6.10.4 Integration and WS Standards
Integration layers and the corresponding WS standards ("-" = no standard listed):
- process management: process modeling, execution: BPEL4WS; process control: -; user interface integration: WSUI/WSXL
- message management: transformation services: XSLT; synchronization: -; transaction services: WS-Coordination, WS-Transactions
- adapter: interface description: WSDL; messaging: SOAP
- middleware: interface description: WSDL; reliability: WS-Reliability; messaging: SOAP, XML; transport: HTTP, SMTP, ...
- physical network: transport layer: TCP, UDP; network layer: IP
- meta data & additional services: meta database: UDDI; system management: -; security services: WS-Security, SAML, XML-Encryption, XML-Signature; development support: -

6.10.5 Supporting and Restraining Forces
The adoption of Web Services in organizations depends on the following forces.
Supporting forces
- interoperable networked applications, i.e. independence of hardware, operating system, application server, ...
- easier exchange of distributed data
- easier access to enterprise-wide data
- availability of external services, encapsulation of legacy applications
- cross-organizational computing
- reduced maintenance cost, easier reuse of components
- emerging industry-wide standards
Restraining forces
- different formats and semantics of data sources
- security issues due to network access
- standards are evolving and not fixed yet
- lack of understanding of the effects on operational systems

6.10.6 Distributed Process Architecture
[Figure: clients access Web services of an application server, which uses adapters to reach a CORBA ORB with CORBA services, an RMI objects environment, and other existing legacy applications and enterprise data]

6.10.7 Semantic Web Services
In order to allow for automatic discovery of appropriate Web services and for automatic interaction with, chaining and incorporation of Web services, we need semantic metadata for Web services: Web service ontologies, data types with rich semantics, ...
Example: map service
- Input: (int, int)
- Output: APPLICATION/GIF
Open questions:
- Input: (int, int): (x,y) of the center of the map? Of a corner of the map? Which corner? What coordinate system? WGS84? Gauss-Krüger? ...
- Output: APPLICATION/GIF: What kind of map? Topological? Political? POI? Traffic? Units of measure?
Candidate technology: OWL-S (URL: http://www.daml.org/services/owl-s/) (Web Ontology Language for Web Services): an OWL-based Web service ontology, which supplies Web service providers with a core set of constructs for describing the properties and capabilities of their Web services in unambiguous, computer-interpretable form.

6.11 Mashups
Definition: Mashup simply denotes a way to create new Web applications by combining existing Web resources, utilizing data and Web APIs.

6.11.1 Mashup Techniques
The work of combining data and services can be done on the server, on the client, or on both.
1. Mashing on the Web Server
All the work of mashing is done on a Web server while the browser just waits for a response.
[Figure: (1) the user's browser sends an HTTP GET to the mashup Web server, which (2) sends a SOAP request to Amazon, (3) receives the SOAP response, (4) sends a REST request to Yahoo, (5) receives the XML response, (6) mashes the data and (7) returns the response to the browser]
- The Web browser requests a page from the server using straight HTTP.
- The page is constructed by the server from data of partner sites (e.g. Amazon and Yahoo).
- First a SOAP request is sent to Amazon; Amazon returns a SOAP response.
- The 2nd request is sent to Yahoo using the REST-style approach; Yahoo responds with an XML document over HTTP.
- The Web server aggregates the responses, combining the data in a well-defined manner.
- The resulting data is bound to HTML and inserted into the response, which is sent back to the browser.
Characteristics
- The browser is decoupled from the partner sites supplying the data.
- The Web server acts as a proxy and aggregator for the responses.
- The browser requests the entire page.
- Scalability problem, because the server does all the work.
2. Mashing using Ajax
This approach allows a richer user experience; the work is divided between the server and the browser.
[Figure: Mashing using Ajax. The browser (default form, JavaScript Ajax request, callback function with XSLT transformation) communicates with the mashup Web server, which acts as a proxy to Amazon (SOAP) and Yahoo (REST); steps (a) to (l) are described below.]
(a) The Web browser requests a page from the server using plain HTTP.
(b) The browser first loads the page; no mashup content is present yet.
(c) The browser issues a request back to the server for additional content (SOAP, REST or XML-RPC).
(d) The mashup server acts as a proxy.
(e) A SOAP request is sent to Amazon.
(f) Amazon returns a SOAP response.
(g) The second request is sent to Yahoo using the REST-style approach.
(h) Yahoo responds with an XML document over HTTP.
(i) Some mashing may occur on the server.
(j) Once the data is ready, it is sent back to the browser.
(k) An XSLT transformation is applied to the XML data to convert it into XHTML, which contains both data and presentation markup.
(l) The generated snippet is inserted into the page's structure and presented to the user.
Characteristics
More complex, because developers face challenges with JavaScript, communication and asynchronicity.
Ajax may refresh only a portion of the page.
The navigation mechanism of the browser is bypassed.
The approach may result in a rich Internet application.
The presentation of results is driven by an XSLT style sheet.
The browser does most of the work.
All data is routed through a common point on the server.
3. Mashing with JSON
JSON (URL: http://www.json.org/) (JavaScript Object Notation) is a lightweight data-interchange format that is gaining popularity in the mashup community.
[Figure: Mashing with JSON. The browser loads the page from the mashup Web server, adds a <SCRIPT> tag with src pointing to Amazon, and the JSON returned by Amazon is passed to a Display (render) function that displays the results; steps (a) to (h) are described below.]
(a) The Web browser requests a page from the server using plain HTTP.
(b) The Web server provides a page containing a couple of key JavaScript functions.
(c) A "new" <SCRIPT> tag whose src points to the partner site is added, and the browser attempts to load its source code.
(d) Loading the script sends an HTTP GET request to the partner site.
(e) The partner site responds with a JavaScript object serialized in JSON.
(f) The JSON script is wrapped in a function call to the render function.
(g) The browser now attempts to execute this new piece of JavaScript.
(h) The render method is invoked, and the JSON script is evaluated and turned into a JavaScript object.
Characteristics
The browser communicates directly with the partner site.
Programmers must handle pre-made objects supplied in JSON.
JSON objects are easier to read than XML.
There is no data consolidation on the server.
6.11.2 Development Support
A number of tools and frameworks have recently emerged to facilitate and speed up mashup development. Two dimensions may be distinguished:
Component model: describes the characteristic properties of the mashup components; a well-defined component interface facilitates the reusability of components.
Component properties:
type: a component can be data, application logic or user interface.
interface: create-read-update-delete (CRUD) interface, API for a specific programming language, or IDL/WSDL.
extensibility: whether the user may extend the component model.
Composition model: specifies how the components are glued together to create the mashup application.
flow-based: defines the orchestration as a sequence or partial order among components.
event-based: uses the publish-subscribe model.
Examples for tool-assisted mashup development
Yahoo Pipes (URL: http://pipes.yahoo.com): mixes data feeds to create data mashups using a visual editor. Yahoo Pipes are hosted and executed on a Yahoo server.
QedWiki was a Wiki-based mashup maker by IBM; pages were hosted on an IBM server and mostly executed on the client side.
ProgrammableWeb (URL: http://www.programmableweb.com/mashups) provides a mashup directory and marketplace which lets users rank and discuss mashups.
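The two composition models can be contrasted in a few lines of Java. This is a minimal sketch, not the API of Yahoo Pipes or any other tool: the class MashupComposition, its methods and the string-list "components" are hypothetical. In the flow-based variant the orchestration is an explicit sequence of components. In the event-based variant components only subscribe to a topic, and publishing an event triggers every subscriber.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/** Hypothetical contrast of flow-based and event-based mashup composition. */
public class MashupComposition {
    /** Flow-based: the orchestration is an explicit sequence of components. */
    public static List<String> runFlow(List<String> input, List<UnaryOperator<List<String>>> components) {
        List<String> data = input;
        for (UnaryOperator<List<String>> component : components) {
            data = component.apply(data); // output of one component is input of the next
        }
        return data;
    }

    /** Event-based: components subscribe to topics (publish-subscribe). */
    public static class EventBus {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        public void subscribe(String topic, Consumer<String> component) {
            subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(component);
        }

        public void publish(String topic, String event) {
            for (Consumer<String> component : subscribers.getOrDefault(topic, List.of())) {
                component.accept(event);
            }
        }
    }

    public static void main(String[] args) {
        List<UnaryOperator<List<String>>> pipeline = List.of(
                l -> { List<String> s = new ArrayList<>(l); Collections.sort(s); return s; }, // sort the feed
                l -> l.subList(0, Math.min(2, l.size())));                                      // keep two items
        System.out.println(runFlow(List.of("b", "a", "c"), pipeline)); // [a, b]

        EventBus bus = new EventBus();
        bus.subscribe("location", loc -> System.out.println("map component shows " + loc));
        bus.subscribe("location", loc -> System.out.println("weather component shows " + loc));
        bus.publish("location", "Munich");
    }
}
```

The flow-based pipeline fixes the order of the components at composition time. In the event-based variant a new component can be attached by subscribing, without changing the publisher.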
Chapter 7 Design of distributed applications
In traditional, nondistributed applications, procedures or modules help to structure system functionality and data structures. Components of the application use procedures or modules to encapsulate algorithms that logically belong together. This encapsulation may also solve certain reusability issues. The binding of all components into a complete software system is purely static.
The following sections discuss the structured design of distributed applications.
7.1 Issues
Software engineering of distributed applications raises interesting issues. In particular, the following problems must be considered:
1. Specification of a suitable software structure
Applications must be decomposed into smaller, distributable components; encapsulation of data and functions.
Which functionality is provided locally and which remotely?
How should distributed applications be tested and debugged?
2. Mechanisms for name resolution
How can an application locate and use a remotely provided service?
Assignment of names to addresses.
What should happen if a client cannot contact the located server subsystem?
3. Communication mechanisms
Selection of the desired communication model, e.g. client-server model, group communication or peer-to-peer.
How does the application (both client and server) handle network communication errors?
4. Consistency
How can the data be kept consistent, particularly replicated data?
If a cache is used to improve performance, it must be kept consistent with the stored data.
User interface consistency across the individual components.
5. User requirements
Functionality and reconfigurability of the distributed application and its components.
Service quality, such as security, reliability, fault tolerance and performance.
What kind of security mechanisms are provided? Is authentication an issue?
Which actions will be triggered if a client cannot communicate with its server?
What type of heterogeneity is necessary?
What efficiency (performance) is expected?
7.2 Steps in the design of distributed applications
Designing a distributed application follows a 7-step approach:
1. The repositories of the application data are identified.
2. Data are assigned to individual modules. This is a fundamental step of any software engineering approach.
3. The module interface is defined.
4. A network interface is defined.
5. Each module is classified as client or server.
6. Registration of servers: the method by which servers are made available to other functional units is determined.
7. A strategy for binding client and server subsystems is defined.
7.3 Design - Development environment
Use of software engineering concepts, methods and tools to design and develop distributed applications.
The software development cycle is divided into phases: requirements analysis, specification, design, implementation, test and integration, maintenance. For details see the Software Engineering courses.
Open Distributed Processing (ODP)
introduced by ISO with the goal of defining a reference model for distributed applications that integrates a wide range of standards for distributed systems, e.g. the ISO/OSI reference model.
Complexity is reduced by specifying different levels of abstraction of the distributed system ("viewpoints"):
Enterprise viewpoint: deals with the overall goals that the distributed system should reach within the organization.
Information viewpoint: focuses on the structure of, the control of and the access to information.
Computation viewpoint: aspects of the logical distribution of data and subsystems.
Engineering viewpoint: physical distribution of data and subsystems.
Technology viewpoint: the different physical and technical subsystems, e.g. network, hardware platforms.
Model Driven Architecture (MDA)
concept for structured and documented software development
OMG standard (Object Management Group)
use of architectural models
Models
Definition: A model is a description of (part of) a system written in a well-defined language.
Definition: A well-defined language is a language with well-defined form (syntax) and meaning (semantics), which is suitable for automated interpretation by a computer.
MDA Concept
consists of 3 steps:
development of platform independent models (PIMs)
mapping to platform specific models (PSMs)
implementation, integration and test
transformation between models (PIM → PSM, PSM → code)
[Figure: Platform independent model (PIM): business functions → Platform specific model (PSM): specifics of the implementation → code generation, system development, test]
1. Step: development of the PIM
The PIM models the functionality and behavior of the software system.
specifies components, classes, pre-/postconditions, semantics
no technological details, e.g. the type of communication (such as SOAP)
use of UML (Unified Modeling Language) to model information in diagrams:
use case diagrams
class and component diagrams
sequence diagrams
state diagrams
2. Step: mapping to PSM
mapping of the PIM to platform specific models (PSMs)
[Figure: one platform independent model (PIM) mapped to several PSMs: Web service model, Java/EJB model, CORBA model, ...]
The PSM models the realization of the software solution in UML.
Example: software components are Web services, and communication is via SOAP.
3. Step: code generation
generation of specific technological constructs, e.g. Java packages
implementation of system functionality
use of tools for automatic code generation
[Figure: PIM → PSMs (Web service model, Java/EJB model, CORBA model, ...) → generated constructs, e.g. WSDL and Java constructs, Java and EJB constructs, IDL/C++ constructs, ...]
AutoFocus
AutoFocus is a platform to specify distributed systems, developed by the group of Prof. Broy, TU München.
based on formal methods of systems engineering
integrates hierarchical description techniques
allows distributed and platform independent development
The project advanced to AutoFocus 2 (URL: http://www4.informatik.tu-muenchen.de/~af2/), supporting the following functionality:
requirements analysis tool (AutoRAID), covering e.g. use cases and scenarios, business and application requirements
design: modelling views and editors, such as system structure diagrams, state transition diagrams, message sequence charts
interactive simulation environment, code generation, support for consistency maintenance
7.4 Service-Oriented Modeling
Ideas and proposals emerged to transfer the service approach to the design and modeling of software systems.
Definition: Service-oriented modeling (SOM) is the discipline of modeling business and systems for the purpose of designing and specifying service-oriented business systems within an SOA.
create models that provide a comprehensive view for the analysis, design and architecture of all software components in an organization.
envision the coexistence of services in an interoperable computing environment.
Definition: The service-oriented modeling framework (SOMF) is a service-oriented development life cycle methodology that provides practices, disciplines and a universal language to provide tactical and strategic solutions to enterprise problems.
7.4.1 Service Evolution
SOM advocates the transformation of a service through 4 states.
[Figure: Conceptual Service → Analysis Service → Design Service → Solution Service]
1. conceptual service: at its inception, a service appears merely as an idea or concept.
2. analysis service: it becomes a unit of analysis.
3. design service: it evolves into a design entity.
4. solution service: it ends as a physical solution that is ready to be deployed in the production environment.
7.4.2 Life Cycle Structure
The life cycle structure identifies the elements for service development and operations. It consists of 4 major components:
timeline: defines the life span of a service.
events: 2 types of events occur during the service life span:
predicted and scheduled events, e.g. deployment stage, milestone, planning stage.
unexpected events, e.g. stock market crash, trading volume exceeds the capacity of the trading service.
events have a beginning and may last for a while.
seasons: services live through 2 major life cycle seasons:
design-time season: services are conceptualized, analyzed, designed, constructed and tested.
run-time season: services are managed, monitored and controlled to ensure proper performance.
disciplines: identify modeling and non-modeling best practices and standards to be pursued throughout the service life cycle:
season disciplines: e.g. service-oriented conceptualization, business integration or construction.
continuous disciplines: e.g. service portfolio management, service governance.
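The service evolution of 7.4.1, together with the two seasons of 7.4.2, can be read as a small state machine. This is a minimal sketch. It assumes, for illustration only, that a service stays in the design-time season until its solution service is deployed. The class and method names are hypothetical and not part of SOMF.

```java
/** Hypothetical state machine for the SOM service evolution and life cycle seasons. */
public class ServiceEvolution {
    public enum State { CONCEPTUAL, ANALYSIS, DESIGN, SOLUTION }
    public enum Season { DESIGN_TIME, RUN_TIME }

    public static class Service {
        private final String name;
        private State state = State.CONCEPTUAL;     // at its inception a service is merely an idea
        private Season season = Season.DESIGN_TIME;  // assumption: design time until deployment

        public Service(String name) { this.name = name; }

        /** Moves the service to the next evolution state; the solution service is final. */
        public void advance() {
            if (state == State.SOLUTION) {
                throw new IllegalStateException(name + " is already a solution service");
            }
            state = State.values()[state.ordinal() + 1];
        }

        /** Only a solution service is ready for deployment; deployment starts the run-time season. */
        public void deploy() {
            if (state != State.SOLUTION) {
                throw new IllegalStateException(name + " is not yet a solution service");
            }
            season = Season.RUN_TIME;
        }

        public State state() { return state; }
        public Season season() { return season; }
    }

    public static void main(String[] args) {
        Service s = new Service("trading service");
        while (s.state() != State.SOLUTION) {
            s.advance(); // conceptual -> analysis -> design -> solution
        }
        s.deploy();
        System.out.println(s.state() + " / " + s.season()); // SOLUTION / RUN_TIME
    }
}
```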
[Figure: service life cycle timeline from start to end, showing life cycle events (e.g. planning stage, financial crisis), the design-time and run-time seasons with their season disciplines, and continuous disciplines spanning the whole life cycle]
7.4.3 Life Cycle Modeling
The following core processes can be identified in which business and IT personnel must be engaged to produce design and solution artifacts.
[Figure: life cycle modeling processes leading from the conceptual service via analysis service, design service and packaged services to the solution service: service-oriented conceptual modeling, discovery & analysis modeling, business integration modeling, conceptual architecture modeling, logical design modeling, logical architecture modeling]
Conceptual modeling: identify the driving concepts behind future solution services.
Discovery & analysis modeling: discover and analyze services for granularity, reusability, interoperability and loose coupling, and identify consolidation opportunities for existing software assets.
Business integration modeling: identify service integration and alignment opportunities with business domain processes (organizations, IT, products, geographical locations).
Logical design modeling: establish service relationships and message exchange paths; address service visibility; prepare logical service compositions; model service transactions.
Conceptual architecture modeling: establish an SOA architectural direction; select an SOA technological environment; establish an SOA technological stack; identify technological asset ownership.
Logical architecture modeling: integrate SOA software assets; establish SOA logical environment dependencies; foster service reuse, discoverability, loose coupling and interoperability.
7.4.4 SOM Framework
Modeling components and disciplines are integrated into a SOM framework.
[Figure: SOM framework combining modeling practices (abstraction practice, realization practice), modeling disciplines, modeling artifacts and modeling environments: conceptual environment (conceptual modeling, conceptual service, conceptual architecture modeling, conceptual architecture), analysis environment (discovery & analysis modeling, business integration modeling, analysis service), logical environment (logical design modeling, design service, logical architecture modeling, logical architecture), physical environment (solution service, physical architecture)]
7.4.5 Other SOA Design Methodologies
A brief overview of some other SOA design methodologies:
Creating Service-Oriented Architectures (CSOA) by Barry & Associates
focus is on technical aspects
consists of 5 phases:
experiment with Web Services
adapt existing systems to use Web Services
remove intersystem dependencies
establish an internal SOA
incorporate external services
Service-Oriented Transformation of Legacy Systems (SOTLS) by Nadhan
targets the stepwise evolution of existing application systems towards service-oriented architectures
focus is on technical aspects
Service-Oriented Design and Development (SOAD) by Papazoglou
incorporates the perspectives of the service provider as well as the service consumer
consists of the phases planning, analysis, service design, service construction, service test, service deployment/execution and service management/monitoring
Chapter 8 Distributed file service
8.1 Issues
This section introduces schemes for replication and concurrency control in the context of distributed file services.
What are the general characteristics of a distributed file service?
How can the consistency of replicated files be maintained?
What are voting schemes?
Presentation of the Coda file service.
8.2 Introduction
When a group of programmers has the task of building a distributed application, there is a need for distributed file services in addition to distributed code management.
8.2.1 Definitions
Definition: A distributed file system (e.g. Sun Network File System (NFS) (see page 16)) is characterized by:
the logical combination of files on different computers into a common file system, and
computers storing files that are connected through a network.
Definition: A distributed file service is the set of services supported by a distributed file system. The services are provided by one or several file servers; a file server is the execution of file service software on a computer.
Definition: Allocation is the placement of files of a distributed file system on different computers.
Definition: Relocation changes the file allocation within the distributed file system.
Definition: Replication means that multiple copies of the same file exist on several computers. The replication degree REP_d of a file d is the total number of copies of d within the distributed file system.
If replication transparency (see page 27) is supported, the user is unaware of whether a file is replicated or not.
8.2.2 Motivation for replicated files
A distributed file system supporting replicated files has the following characteristics:
Less network traffic and better response times.
Higher availability and fault tolerance with respect to communication and server errors.
Parallel processing of several client requests.
The key concept of a distributed file system is transparency.
User's impression: interaction with a normal, central file system.
The goal is to support the following transparency types (see page 26): location, access, name, replication and concurrency transparency.
8.2.3 Two consistency types
In the context of replicated files we can distinguish between two types of file consistency.
Internal consistency
A single file copy is internally consistent, e.g. by applying a "2-phase commit" protocol.
Mutual consistency
All copies of replicated information should obviously be identical, i.e. all file copies are mutually consistent, for example by applying the "multiple copy update" protocol.
Strict mutual consistency: after an operation has been executed, all copies have the same state.
Loose mutual consistency: all copies converge to the same consistent state of information.
8.2.4 Replica placement
A major issue of a distributed data store is the decision when and where to place the file replicas.
Permanent replicas
The number and placement of replicas is decided in advance, e.g. mirroring of files at different sites.
Server-initiated replicas
They are intended to enhance the performance of the server.
Dynamic replication reduces the load on a server.
File replicas migrate to a server placed in the proximity of clients that issue file requests.
Client-initiated replicas
Client-initiated replicas are more commonly known as caches.
They are used only to improve access times to data.
Client caches are normally placed on the same machine as their client.
Replicas are only kept for a limited time.
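Client-initiated replicas that are only kept for a limited time can be sketched as a cache with an expiry time. This is a minimal sketch. The server access is a hypothetical function parameter and the lifetime ttlMillis is an assumed parameter. The clock is injectable so that expiry can be tested. The sketch deliberately ignores consistency with other copies, which is the task of the replication service.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical client-side cache: replicas of files are only kept for a limited time. */
public class ClientFileCache {
    // a cached copy together with the time it was fetched from the server
    private record Replica(byte[] content, long fetchedAt) {}

    private final Map<String, Replica> replicas = new HashMap<>();
    private final Function<String, byte[]> server; // hypothetical stand-in for a remote file read
    private final long ttlMillis;                  // assumed lifetime of a cached replica
    private final LongSupplier clock;              // e.g. System::currentTimeMillis
    private int serverReads = 0;

    public ClientFileCache(Function<String, byte[]> server, long ttlMillis, LongSupplier clock) {
        this.server = server;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Serves reads from the local replica while it is fresh; otherwise fetches a new copy. */
    public byte[] read(String fileName) {
        long now = clock.getAsLong();
        Replica r = replicas.get(fileName);
        if (r == null || now - r.fetchedAt() >= ttlMillis) {
            r = new Replica(server.apply(fileName), now);
            replicas.put(fileName, r);
            serverReads++;
        }
        return r.content();
    }

    public int serverReads() { return serverReads; }

    public static void main(String[] args) {
        ClientFileCache cache = new ClientFileCache(name -> ("contents of " + name).getBytes(),
                60_000, System::currentTimeMillis);
        cache.read("/doc/a.txt");
        cache.read("/doc/a.txt");               // served from the local replica
        System.out.println(cache.serverReads()); // 1
    }
}
```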
8.3 Layers of a distributed file service
The functions of a distributed file service are usually arranged hierarchically, from top to bottom:
naming/directory service
replication service
transaction service
file service
block service
8.3.1 Layer semantics
Each layer of the distributed file service has a specific task.
Name/directory service
placement of files; file relocation for load balancing and performance improvement; location of the server which manages the referenced file.
mapping of textual file names to file references (server name and file identifier).
Replication service
file replication for shorter response times and increased availability.
handles data consistency and the multiple copy update problem.
Transaction service
provides a mechanism for grouping elementary operations so as to execute them atomically;
mechanisms for concurrency control;
mechanisms for reboot after errors;
File service
relates file identifiers to particular files; performs read and write operations on the file content and file attributes.
Block service
accesses and allocates disk blocks for the file.
8.4 Update of replicated files
Basically, there are two types of approaches to multiple copy update control: the optimistic and the pessimistic approach.
8.4.1 Optimistic concurrency control
Data consistency is not guaranteed. The concurrency control scheme does not constrain the activities of the user; it allows access to inconsistent data.
Example: Coda file system of Carnegie Mellon University.
The Available Copy scheme
read access to the local or to the best-available file copy.
in case of write access, all available file copies are updated.
8.4.2 Pessimistic concurrency control
Pessimistic concurrency control is used for data-critical applications, e.g. banking applications.
Access is always to consistent data.
Classification of pessimistic concurrency control
[Figure: classification tree: multiple copy update → nonvoting (primary site, token passing) and voting (majority voting, weighted voting)]
Primary site
A well-defined file copy, the primary site, serializes and synchronizes all (write) operations.
Token passing
Access to the replicated file (i.e. a file copy) is only permitted if the client holds the token.
Voting schemes
The result of the negotiation between all file replicas determines whether a file access is granted or not.
global consent is necessary, but control is decentralized.
in case of consent, the relevant file block is locked.
Examples: majority consensus, weighted voting.
8.4.3 Voting schemes
Voting schemes provide pessimistic concurrency control.
Introduction
Voting schemes are algorithms for maintaining mutual consistency of replicas even in situations of computer crashes and network partitioning.
Let us assume there exist REP replicas of file d.
Let sg(r) be the weight of the vote of computer r, and K the set of all computers considered. The sum of all weights is SUM = Σ_{r ∈ K} sg(r).
Definitions
Definition: The votum for a desired file access is defined as the sum of the votes of the computers that have voted for the desired access.
Definition: The obtained votum is called successful if the sum of votes from the computers that have voted for the desired access is equal to or greater than a lower bound, the so-called quorum.
File access is permitted (positive votum) if the following holds:
for read access: at least R positive votes (read quorum).
for write access: at least W positive votes (write quorum).
Multiple-reader-single-writer strategy
The quorums must support the multiple-reader-single-writer strategy; at least one computer must have the most up-to-date physical replica of the file.
The multiple-reader-single-writer strategy maintains file consistency if:
R + W > SUM, i.e. a reader excludes a writer, and vice versa (every read quorum overlaps every write quorum).
W + W > SUM, i.e. only one writer gets a positive votum.
Voting scheme variants
For further variants and details see the book Borghoff/Schlichter, Springer-Verlag, 2000.
Write-All-Read-Any
Write access to all copies; read access to any copy of the file.
The scheme can be considered an extreme case of a voting scheme:
write quorum: W = n, with n the number of computers.
read quorum: R = 1
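The quorum conditions can be checked mechanically. The following minimal sketch (class and method names are hypothetical) computes a votum as the sum of the weights sg(r) of the computers voting for the access. It rejects quorums that violate R + W > SUM or W + W > SUM, and it uses Write-All-Read-Any with n = 3 computers and one vote each as its example. The helper majority(REP) computes ⌊REP/2⌋ + 1. This equals the majority quorum of the majority consensus scheme for both even and odd REP.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical quorum checker for (weighted) voting schemes. */
public class VotingQuorum {
    private final Map<String, Integer> weights; // sg(r) for every computer r in K
    private final int sum;                       // SUM = sum of all sg(r)
    private final int readQuorum;                // R
    private final int writeQuorum;               // W

    public VotingQuorum(Map<String, Integer> weights, int readQuorum, int writeQuorum) {
        this.weights = Map.copyOf(weights);
        this.sum = weights.values().stream().mapToInt(Integer::intValue).sum();
        // multiple-reader-single-writer strategy: R + W > SUM and W + W > SUM
        if (readQuorum + writeQuorum <= sum || writeQuorum + writeQuorum <= sum) {
            throw new IllegalArgumentException("quorums violate R + W > SUM or W + W > SUM");
        }
        this.readQuorum = readQuorum;
        this.writeQuorum = writeQuorum;
    }

    /** Votum: sum of the votes of the computers that voted for the desired access. */
    public int votum(Set<String> yesVoters) {
        return yesVoters.stream().mapToInt(r -> weights.getOrDefault(r, 0)).sum();
    }

    public boolean readGranted(Set<String> yesVoters) { return votum(yesVoters) >= readQuorum; }

    public boolean writeGranted(Set<String> yesVoters) { return votum(yesVoters) >= writeQuorum; }

    /** Majority quorum for REP replicas with one vote each: floor(REP/2) + 1. */
    public static int majority(int rep) { return rep / 2 + 1; }

    public static void main(String[] args) {
        // Write-All-Read-Any with n = 3 computers and one vote each: W = 3, R = 1
        VotingQuorum q = new VotingQuorum(Map.of("A", 1, "B", 1, "C", 1), 1, 3);
        System.out.println(q.readGranted(Set.of("B")));       // true
        System.out.println(q.writeGranted(Set.of("A", "B"))); // false: C did not vote
    }
}
```

With weights A = 2, B = 1, C = 1 (SUM = 4) and W = R = 3, a write is granted by A and B together but not by B and C alone.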
Majority consensus
In the majority consensus scheme, a votum is successful if at least a majority of the computers with a right to vote have voted for the desired access.
The vote of each computer with a replica has the same weight ("one person, one vote"), i.e. ∀ r ∈ K: sg(r) = 1.
A votum is successful if the majority of all relevant computers agree to the desired access:
W = R = REP/2 + 1, if REP is even.
W = R = (REP+1)/2, if REP is odd.
Weighted voting
Each computer possessing a file copy receives a certain number of votes, i.e. ∀ r ∈ K: sg(r) ∈ {0, 1, 2, ...}.
We get: SUM = Σ_{r ∈ K} sg(r).
Example: identical read and write quorum
W = R = SUM/2 + 1, if SUM is even.
W = R = (SUM+1)/2, if SUM is odd.
8.5 Coda file system
Coda was designed to be a scalable, secure and highly available distributed file service:
supporting the mobile use of computers.
files are organized in volumes.
Coda relies on the replication of volumes.
8.5.1 Architecture
[Figure: Coda architecture. On the client machine, user processes and the Venus process run above the virtual file system layer and the local file system; Venus communicates via an RPC client stub with the RPC server stub of the Vice process on the server machine.]
Venus processes provide access to files maintained by the Vice file servers.
their role is similar to that of an NFS client.
they are responsible for allowing the client to continue operation even if access to the file servers is (temporarily) impossible.
8.5.2 Naming
Each file is contained in exactly one volume.
Distinction between
physical volumes, and
the logical volume (representing all replicas of a volume).
RVID (Replicated Volume Identifier): identifier of a logical volume.
VID (Volume Identifier): identifier of a physical volume.
File identifier
Coda assigns each file a 96-bit file identifier.
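The resolution of such a file identifier can be sketched with two lookup tables. One stands in for the volume replication database (RVID → VIDs), the other for the volume location database (VID → server). The split of the 96 bits into a 32-bit RVID and a 64-bit file handle follows the usual description of Coda, but it is an assumption here, as are all class and method names. Real Coda servers do not expose such an API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical resolution of a Coda file identifier (RVID + file handle) to file servers. */
public class CodaFidResolver {
    /** 96-bit file identifier: assumed 32-bit RVID and 64-bit file handle. */
    public record FileId(int rvid, long fileHandle) {}

    /** One physical replica: server, physical volume (VID) and the file handle within that volume. */
    public record Location(String server, int vid, long fileHandle) {}

    private final Map<Integer, List<Integer>> volumeReplicationDb; // RVID -> VIDs of the physical volumes
    private final Map<Integer, String> volumeLocationDb;           // VID  -> server storing that volume

    public CodaFidResolver(Map<Integer, List<Integer>> volumeReplicationDb,
                           Map<Integer, String> volumeLocationDb) {
        this.volumeReplicationDb = Map.copyOf(volumeReplicationDb);
        this.volumeLocationDb = Map.copyOf(volumeLocationDb);
    }

    /** Looks up all physical replicas of the file; the file handle is assumed to be reused on each server. */
    public List<Location> resolve(FileId fid) {
        List<Location> result = new ArrayList<>();
        for (int vid : volumeReplicationDb.getOrDefault(fid.rvid(), List.of())) {
            String server = volumeLocationDb.get(vid);
            if (server != null) {
                result.add(new Location(server, vid, fid.fileHandle()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CodaFidResolver resolver = new CodaFidResolver(
                Map.of(7, List.of(101, 102)),
                Map.of(101, "server1", 102, "server2"));
        for (Location loc : resolver.resolve(new FileId(7, 42L))) {
            System.out.println(loc);
        }
    }
}
```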
[Figure: resolution of a file identifier. The RVID is looked up in the volume replication DB, yielding the VIDs of the physical volumes (VID1, VID2); the volume location DB maps each VID to a server (server 1, server 2); together with the file handle, this identifies the file on each file server.]
8.5.3 Replication strategy
Coda relies on replication to achieve high availability. It distinguishes between two types of replication.
Client caching
When a file is opened, an entire copy of the file is transferred to the client, i.e. the file is cached.
The client becomes less dependent on the availability of the server.
Cache coherence is maintained by means of callbacks:
the server records a callback promise for a client.
update of the file by a client → notification to the server → invalidation message to the other clients.
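The callback mechanism can be sketched from the server's point of view. Opening a file records a callback promise. Closing a modified file breaks the promises of all other clients, which receive an invalidation message. This is a minimal single-process sketch with hypothetical names: messages are only logged, and the actual Vice/Venus protocol is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical server-side bookkeeping of callback promises (single process, messages only logged). */
public class CallbackServer {
    private final Map<String, Set<String>> promises = new HashMap<>(); // file -> clients with a callback promise
    private final List<String> invalidations = new ArrayList<>();      // sent messages, as "client:file"

    /** A client opens a file and caches a copy: the server records a callback promise. */
    public void open(String client, String file) {
        promises.computeIfAbsent(file, f -> new LinkedHashSet<>()).add(client);
    }

    /** A client closes a file it has updated: the promises of all other clients are broken. */
    public void closeAfterUpdate(String writer, String file) {
        Set<String> holders = promises.computeIfAbsent(file, f -> new LinkedHashSet<>());
        for (String client : holders) {
            if (!client.equals(writer)) {
                invalidations.add(client + ":" + file); // invalidation message to that client
            }
        }
        holders.removeIf(c -> !c.equals(writer)); // invalidated clients must fetch the file again
        holders.add(writer);                       // the writer's cached copy is up to date
    }

    public boolean hasPromise(String client, String file) {
        return promises.getOrDefault(file, Set.of()).contains(client);
    }

    public List<String> sentInvalidations() { return invalidations; }

    public static void main(String[] args) {
        CallbackServer server = new CallbackServer();
        server.open("A", "f");                 // client A opens f for reading
        server.open("B", "f");                 // client B opens f for writing
        server.closeAfterUpdate("B", "f");     // B closes the modified file
        System.out.println(server.sentInvalidations()); // [A:f]
    }
}
```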
[Figure: callback example over time. Client A opens file f for reading (open(rd)) and client B opens f for writing (open(wr)); both receive a copy of f. When B closes f, the server sends an invalidate message to A. A later open(wr) by B is answered with OK and no file transfer, while a later open(rd) by A transfers f again.]
Server replication
Coda allows file servers to be replicated; the unit of replication is a volume.
Volume Storage Group (VSG): collection of servers that have a copy of a volume.
A client's Accessible Volume Storage Group (AVSG): list of those servers in the volume's VSG that the client can contact.
AVSG = {}: the client is disconnected.
Coda uses a variant of the "read-one, write-all" update protocol.
Coda version vector
Coda uses an optimistic strategy for file replication.
For each file version there exists a Coda version vector (CVV).
The CVV is a vector timestamp (see page 108) with one element for each server in the relevant VSG.
The CVV is initialized to [1, ..., 1].
On file close, the Venus process of the client broadcasts an update message to all servers in the AVSG → all servers of the AVSG update the relevant CVV entries.
Let v1 and v2 be the CVVs of two versions of a file f. If neither v1 ≤ v2 nor v2 ≤ v1, there is a conflict between the two file versions.
8.5.4 Disconnected operation
In the disconnected situation a client will simply resort to its local copy in the cache.
AVSG = {} for the volume.
Venus supports a priority list of files which should be cached locally.
Venus supports hoarding.
Reintegration
When disconnected operation ends, a process of reintegration starts.
for each cached file that has been modified, Venus sends update operations to all servers in the AVSG.
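The conflict rule for Coda version vectors from 8.5.3 can be sketched as a componentwise comparison: v1 ≤ v2 holds if every entry of v1 is less than or equal to the corresponding entry of v2. As a simplification, the sketch assumes that an update committed by the servers of an AVSG increments exactly their own entries. The names are hypothetical. The example models a network partition in which two clients update the same file in different AVSGs.

```java
import java.util.Arrays;

/** Hypothetical comparison of Coda version vectors (one entry per server of the VSG). */
public class CodaVersionVector {
    public enum Order { EQUAL, BEFORE, AFTER, CONFLICT }

    /** v1 <= v2 iff every entry of v1 is <= the corresponding entry of v2. */
    static boolean lessOrEqual(int[] v1, int[] v2) {
        for (int i = 0; i < v1.length; i++) {
            if (v1[i] > v2[i]) return false;
        }
        return true;
    }

    public static Order compare(int[] v1, int[] v2) {
        if (v1.length != v2.length) throw new IllegalArgumentException("CVVs must refer to the same VSG");
        boolean le = lessOrEqual(v1, v2);
        boolean ge = lessOrEqual(v2, v1);
        if (le && ge) return Order.EQUAL;
        if (le) return Order.BEFORE;   // v2 is the newer version
        if (ge) return Order.AFTER;    // v1 is the newer version
        return Order.CONFLICT;         // neither v1 <= v2 nor v2 <= v1
    }

    /** Simplification: an update committed by the servers of the AVSG increments their entries. */
    public static int[] afterUpdate(int[] cvv, int... avsgIndices) {
        int[] next = Arrays.copyOf(cvv, cvv.length);
        for (int i : avsgIndices) next[i]++;
        return next;
    }

    public static void main(String[] args) {
        int[] initial = {1, 1, 1};              // VSG with three servers
        int[] a = afterUpdate(initial, 0, 1);   // one partition, AVSG = servers 0 and 1 -> [2, 2, 1]
        int[] b = afterUpdate(initial, 2);      // other partition, AVSG = server 2 -> [1, 1, 2]
        System.out.println(compare(initial, a)); // BEFORE
        System.out.println(compare(a, b));       // CONFLICT
    }
}
```

Because [2, 2, 1] and [1, 1, 2] are incomparable, compare reports CONFLICT for the two file versions.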
Chapter 9 Distributed Shared Memory
Issues of this section:
implicit communication via shared memory
What is the Linda tuple space?
JavaSpaces as a modern tuple space
9.1 Introduction
Distributed shared memory (DSM) is an abstraction used for sharing data between computers that do not share physical memory.
[Figure: processes access the distributed shared memory, which appears as memory in the address space of each process but is backed by the separate physical memories of the computers]
9.2 Programming model
Message passing model
variables have to be marshalled in one process, transmitted, and unmarshalled into other variables at the receiving process.
Distributed shared memory
the involved processes access the shared variables directly; no marshalling is necessary.
processes may communicate via DSM even if they have non-overlapping lifetimes.
Implementation approaches
in hardware: shared memory multiprocessor architectures, e.g. NUMA architectures.
in middleware: language support such as the Linda tuple space or JavaSpaces.
9.3 Consistency model
The content of DSM may be replicated by caching it at the separate computers:
data is read from the local replica.
updates have to be propagated to the other replicas of the shared memory.
Approaches to keep the replicas consistent
Write-update
updates are made locally and multicast to all replicas possessing a copy of the data item.
the remote data items are modified immediately.
Write-invalidate
before an update takes place, a multicast message is sent to all copies to invalidate them; the remote sites acknowledge before the write can take place.
other processes are prevented from accessing the blocked data item.
the update is propagated to all copies, and the blocking is removed.
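The two consistency approaches can be contrasted in a single-process simulation of one replicated data item. The class and method names are hypothetical. The multicast is simulated by updating a map, and acknowledgements are simply counted. The write-invalidate part follows the steps listed above: invalidate, acknowledge, write and propagate, remove the blocking.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical single-process simulation of replicated DSM copies of one data item. */
public class DsmReplicas {
    private final Map<String, Integer> copies = new LinkedHashMap<>(); // node -> local replica
    private final Set<String> blocked = new HashSet<>();              // invalidated (blocked) copies
    private String pendingWriter = null;

    public DsmReplicas(int initial, String... nodes) {
        for (String n : nodes) copies.put(n, initial);
    }

    /** Write-update: update locally, then "multicast" the new value to every replica. */
    public void writeUpdate(String writer, int value) {
        if (pendingWriter != null) throw new IllegalStateException("a write-invalidate is in progress");
        copies.put(writer, value);
        copies.replaceAll((node, old) -> value);
    }

    /** Write-invalidate, phase 1: invalidate all other copies; returns the number of acknowledgements. */
    public int beginInvalidatingWrite(String writer) {
        if (pendingWriter != null) throw new IllegalStateException("another write is in progress");
        pendingWriter = writer;
        int acks = 0;
        for (String node : copies.keySet()) {
            if (!node.equals(writer)) {
                blocked.add(node); // copy is invalidated and blocked
                acks++;            // the remote site acknowledges the invalidation
            }
        }
        return acks;
    }

    /** Write-invalidate, phase 2: write, propagate the update to all copies, remove the blocking. */
    public void completeInvalidatingWrite(String writer, int value) {
        if (!writer.equals(pendingWriter)) throw new IllegalStateException("write was not started by " + writer);
        copies.replaceAll((node, old) -> value);
        blocked.clear();
        pendingWriter = null;
    }

    /** Reads use the local replica; a blocked copy cannot be accessed. */
    public int read(String node) {
        if (blocked.contains(node)) throw new IllegalStateException(node + " is blocked during a write");
        return copies.get(node);
    }

    public static void main(String[] args) {
        DsmReplicas dsm = new DsmReplicas(0, "P1", "P2", "P3");
        dsm.writeUpdate("P1", 5);
        System.out.println(dsm.read("P3"));                   // 5
        System.out.println(dsm.beginInvalidatingWrite("P2")); // 2
        dsm.completeInvalidatingWrite("P2", 9);
        System.out.println(dsm.read("P1"));                   // 9
    }
}
```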
9.4 Tuple space
The tuple space was invented by Gelernter (Yale University) as an object-oriented approach to managing distributed data. It was specially designed for the Linda language.
A tuple space consists of a set of tuples, which can be interpreted as lists of typed fields.
A tuple space has the following basic characteristics:
it is based on the shared-memory model.
tuples represent information, e.g. ("Linda", 3).
9.4.1 Atomic operations
The tuple space supports read and write operations on the shared memory.
1. Operations on a tuple t:
out(t): creates a new tuple t in the tuple space.
in(t): reads and simultaneously removes a tuple from the tuple space.
read(t): reads a tuple; t remains in the tuple space, and subsequent operations can refer to it.
2. Read access is associative, e.g. in("order", ?i, ?j).
3. in and read are synchronous (blocking).
4. inp and readp are asynchronous (non-blocking).
5. Generation of new processes: eval(t).
9.4.2 Tuple space implementation
Implementation alternatives:
1. central tuple space.
2. replicated tuple space: each computer maintains a complete copy of the tuple space.
3. distributed tuple space: division into subspaces; each computer owns part of the tuple space; out operations are executed locally.
9.4.3 Example for client-server communication
The following simple example shows how the client-server style of communication could be programmed in the tuple space model.
/* Client */
begin
    int client-id = unique identifier;
    list of unspecified parameterlist;
    list of unspecified resultlist;
    ...
    collect arguments for server call;
    out ("request", client-id, parameterlist);
    in ("reply", client-id, ?resultlist);
    process results
end
/* Server */
begin
    int client-id;
    list of unspecified parameterlist;
    list of unspecified resultlist;
    ...
    while (true) do
        in ("request", ?client-id, ?parameterlist);
        compute result;
        out ("reply", client-id, resultlist);
end
9.5 Object Space
An object space is used for sharing and exchanging objects between the components of a distributed application.
JavaSpaces (URL: http://java.sun.com/developer/technicalarticles/tools/javaspaces/index.html) supports an object space.
based on the Linda tuple concept.
tuples are references to Java objects.
9.5.1 Introduction
A space is a shared, network-accessible object repository. Processes use the repository as a persistent object storage and exchange mechanism.
[Figure: processes write entries into a JavaSpace; other processes read or take them.]

JavaSpaces uses the Jini technology.

9.5.2 Features of JavaSpaces

The JavaSpaces programming interface is simple; a space provides the following key features.
- Objects in a space are passive: processes do not manipulate objects directly in the space, and they do not invoke methods of objects in the space.
- Spaces are shared: they represent a network-accessible memory that many remote processes can interact with concurrently.
- Spaces are persistent: objects are stored until a process explicitly removes them or until their lease time expires.
- Spaces are associative: objects are accessed via associative lookup rather than by identifier or by memory address.
- Spaces are transaction oriented: access operations to the space are atomic.
- Spaces support the exchange of executable code.

9.5.3 Data structures

Entry interface
Objects in a space must implement the Entry interface (package net.jini.core.entry).

Interface definition:
```java
public interface Entry extends java.io.Serializable {
    // this interface is empty
}
```

Example of an object representing a shared variable in the distributed system:
```java
import net.jini.core.entry.Entry;

public class SharedVar implements Entry {
    // entry fields must be public and of an object type (hence Integer, not int)
    public String name;
    public Integer value;

    // a public no-argument constructor is required for entries
    public SharedVar() { }

    // creates a template that matches on the name only (value stays null = wildcard)
    public SharedVar(String name) {
        this.name = name;
    }

    public SharedVar(String name, int value) {
        this.name = name;
        this.value = Integer.valueOf(value);
    }
}
```

Instantiation of a shared variable within a process:
```java
SharedVar global_counter = new SharedVar("counter", 0);
```

SpaceAccessor
The shared space is obtained via the method getSpace of the helper class SpaceAccessor (package jsbook.util):
```java
JavaSpace space = SpaceAccessor.getSpace();
```
There are two options for locating the space:
- the space is registered as a Jini service, i.e. Jini lookup services can be used;
- the space is registered in the RMI registry (see page 73).

9.5.4 Basic operations

Overview
The following basic operations are available for manipulating space entries:
- read;
- take, i.e. read and remove;
- write;
- notify, i.e. inform the process when an entry matching a given pattern has arrived.
Write operation
```java
Lease write(Entry e, Transaction txn, long lease)
    throws RemoteException, TransactionException
```
Parameter semantics:
- Entry e: e is entered into the space; it is transmitted to and stored in the space in serialized form.
- Transaction txn: allows several operations to be grouped into one transaction; the value null means that the operation forms a transaction of its own.
- long lease: specifies how long the entry e is to be stored in the space before the space automatically removes it.

The returned Lease specifies how long the space will actually store the entry e.
write can throw RemoteException (communication problems) and TransactionException (transaction txn not valid).

Example:
```java
space.write(global_counter, null, Lease.FOREVER);
```

Read and take operations
The methods read and take access an object in a space: read returns a copy of a matching object to the local process and leaves the object in the space, while take returns it and removes it from the space.
To access an object, a process needs a template. A template is a kind of entry that
- contains some specified and some empty fields (i.e. fields with the value null);
- associatively matches the relevant objects in the space.
If several objects in the space match the template, an arbitrary one of them is selected.

Example:
```java
SharedVar template = new SharedVar("counter");
SharedVar result = (SharedVar) space.take(template, null, Long.MAX_VALUE);
```
The third parameter is a timeout in milliseconds; with Long.MAX_VALUE, the take operation waits until a matching entry is available in the space.
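Template matching as used by read and take follows the matching rules listed next. The following is a minimal sketch of those rules using Java reflection. TemplateMatcher and its nested classes are hypothetical; the sketch compares public instance fields with equals(), whereas JavaSpaces implementations compare the fields in their serialized form.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical sketch of JavaSpaces-style template matching. */
public class TemplateMatcher {

    public static boolean matches(Object template, Object entry) {
        // Rule 1: the template class equals the entry's class or is a superclass of it.
        if (!template.getClass().isAssignableFrom(entry.getClass())) return false;
        try {
            for (Field f : template.getClass().getFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue; // assumption: only public instance fields count
                Object wanted = f.get(template);
                // Rule 2: null is a wildcard. Rule 3: otherwise the values must be the same.
                if (wanted != null && !wanted.equals(f.get(entry))) return false;
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return true;
    }

    public static class SharedVar {
        public String name;
        public Integer value;
        public SharedVar(String name, Integer value) { this.name = name; this.value = value; }
    }

    public static class NamedSharedVar extends SharedVar {
        public String owner;
        public NamedSharedVar(String name, Integer value, String owner) {
            super(name, value);
            this.owner = owner;
        }
    }

    public static void main(String[] args) {
        SharedVar counter = new SharedVar("counter", 0);
        System.out.println(matches(new SharedVar("counter", null), counter)); // true: value is a wildcard
        System.out.println(matches(new SharedVar("counter", 1), counter));    // false: 1 differs from 0
        System.out.println(matches(new SharedVar(null, null),
                new NamedSharedVar("x", 3, "p1")));                         // true: superclass template
        System.out.println(matches(new NamedSharedVar("counter", null, null),
                counter));                                                  // false: subclass template
    }
}
```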
Matching rules
A template matches an object in the space if the following rules hold:
- the template class is the same as the object's class, or else a superclass of the entry's class;
- if a template field has a wildcard (null), it matches the corresponding object field;
- if a template field is specified, it matches the object's corresponding field if the two values are the same.

Atomicity
All basic operations are atomic. The following code segment performs an atomic update of a shared global variable in the space:
```java
SharedVar template = new SharedVar("counter");
SharedVar result = (SharedVar) space.take(template, null, Long.MAX_VALUE);
result.value = Integer.valueOf(result.value.intValue() + 5);
space.write(result, null, Lease.FOREVER);
```
Because take removes the entry, any other process that tries to take the counter blocks until the updated entry has been written back. Thus, there are no race conditions between concurrent processes for the shared variable.

9.5.5 Events
A client can request to be notified when an entry matching a given template is written to the JavaSpace.
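A local, single-JVM sketch of the notification mechanism described here: a listener is registered together with a template, is called after each write of a matching entry, and then reads the entry itself. NotifyingSpace, notifyOn and readIfExists are hypothetical names; the real JavaSpace notify operation instead delivers a remote event to a RemoteEventListener.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical local sketch of template-based notification, modelled on notify. */
public class NotifyingSpace {
    /** Minimal entry with one public field; null in a template acts as a wildcard. */
    public static class Message {
        public String content;
        public Message(String content) { this.content = content; }

        boolean matches(Message template) {
            return template.content == null || template.content.equals(content);
        }
    }

    private final List<Message> entries = new ArrayList<>();
    private final List<Message> templates = new ArrayList<>();
    private final List<Runnable> listeners = new ArrayList<>();

    /** Register interest: the listener runs after each write of an entry matching the template. */
    public synchronized void notifyOn(Message template, Runnable listener) {
        templates.add(template);
        listeners.add(listener);
    }

    /** Store the entry, then call every listener whose template matches it. */
    public void write(Message m) {
        List<Runnable> toCall = new ArrayList<>();
        synchronized (this) {
            entries.add(m);
            for (int i = 0; i < templates.size(); i++) {
                if (m.matches(templates.get(i))) toCall.add(listeners.get(i));
            }
        }
        // Called synchronously and outside the lock; a real space sends a remote event asynchronously.
        toCall.forEach(Runnable::run);
    }

    /** Non-blocking read: return some matching entry (left in the space) or null. */
    public synchronized Message readIfExists(Message template) {
        for (Message m : entries) {
            if (m.matches(template)) return m;
        }
        return null;
    }

    public static void main(String[] args) {
        NotifyingSpace space = new NotifyingSpace();
        Message template = new Message(null);   // matches every Message
        List<String> received = new ArrayList<>();
        // register interest, then look the entry up when notified
        space.notifyOn(template, () -> received.add(space.readIfExists(template).content));
        space.write(new Message("Hello World"));
        System.out.println(received);           // prints [Hello World]
    }
}
```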
[Figure: notification in a JavaSpace. 1. A process requests notification for template T. 2. Another process writes entry C; a copy of C is inserted into the space, which already holds entries A, B and D. 3. The space notifies the first process when C is inserted. 4. The notified process looks up a tuple that matches T. 5. The space returns C.]

9.5.6 Example Java Spaces

A process is notified when a new message is deposited in the object space. The process then retrieves the new message from the object space.

Message entry
```java
import net.jini.core.entry.Entry;

public class Message implements Entry {
    public String content;

    public Message() { }
}
```

Listener
```java
import java.rmi.RemoteException;
import java.rmi.server.*;
import net.jini.core.event.*;
import net.jini.space.JavaSpace;

public class Listener implements RemoteEventListener {
    private JavaSpace space;

    public Listener(JavaSpace space) throws RemoteException {
        this.space = space;
        // export this object (on an anonymous port) so that the space can call notify remotely
        UnicastRemoteObject.exportObject(this, 0);
    }

    // called by the space when an entry matching the registered template has been written
    public void notify(RemoteEvent ev) {
        Message template = new Message();   // all fields null: matches every Message
        try {
            Message result = (Message) space.read(template, null, Long.MAX_VALUE);
            System.out.println(result.content);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

HelloWorld
```java
import jsbook.util.SpaceAccessor;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class HelloWorldNotify {
    public static void main(String[] args) {
        JavaSpace space = SpaceAccessor.getSpace();
        try {
            Listener listener = new Listener(space);
            Message template = new Message();
            // register the listener for Message entries written from now on
            space.notify(template, null, listener, Lease.FOREVER, null);

            Message msg = new Message();
            msg.content = "Hello World";
            space.write(msg, null, Lease.FOREVER);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
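The atomicity argument of 9.5.4 (take the counter, modify it, write it back) can be checked with a local simulation. As an assumption, a LinkedBlockingQueue holding the single counter entry stands in for the space: its blocking take() plays the role of space.take, and put() that of space.write. AtomicCounterDemo and run are hypothetical names, not part of JavaSpaces.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical local simulation of the take-modify-write update of a shared counter. */
public class AtomicCounterDemo {
    /** Immutable stand-in for the SharedVar entry. */
    static final class SharedVar {
        final String name;
        final int value;
        SharedVar(String name, int value) { this.name = name; this.value = value; }
    }

    /** Runs `processes` threads, each adding 5 to the counter `updates` times; returns the final value. */
    public static int run(int processes, int updates) throws InterruptedException {
        // Stand-in for a space holding exactly one "counter" entry: take() blocks while it is taken.
        BlockingQueue<SharedVar> space = new LinkedBlockingQueue<>();
        space.put(new SharedVar("counter", 0));                     // like space.write(global_counter, ...)

        Runnable process = () -> {
            try {
                for (int i = 0; i < updates; i++) {
                    SharedVar v = space.take();                     // like space.take(template, null, Long.MAX_VALUE)
                    space.put(new SharedVar(v.name, v.value + 5));  // like space.write(result, null, Lease.FOREVER)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread[] workers = new Thread[processes];
        for (int i = 0; i < processes; i++) {
            workers[i] = new Thread(process);
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return space.take().value;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 1000)); // prints 20000 (4 processes x 1000 updates x 5)
    }
}
```

While one thread holds the counter entry, it is absent from the queue, so the other threads block in take(); no update is lost, and the result is always processes x updates x 5.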
Chapter 10 Object-based Distributed Systems

There are a number of object-oriented systems; some of them have been ported to a distributed environment, whereas others have been designed specifically for distributed environments, e.g. Emerald, Argus and Linda. What all these systems have in common is that they usually target homogeneous environments.

The OMG (Object Management Group, URL: http://www.omg.org/) was founded in 1989 by a number of companies to encourage the adoption of distributed object systems and to enable interoperability in heterogeneous environments (hardware, networks, operating systems and programming languages).

10.1 Object Management Architecture - OMA

The architecture is also referred to as CORBA ("Common Object Request Broker Architecture"). OMA is one possible middleware for object-oriented distributed applications.
10.2. OBJECT REQUEST BROKERS ORB application objects object request broker (ORB) object bus object services (system layer) common functionality (application layer) ORB supports the communication among the objects through a request/reply protocol. ORB includes object localization, message delivery, method binding, parameter marshalling, and synchronization of request and reply messages. ORB itself does not execute methods. Rather, it mediates between application objects, service objects, and shared functionalities of the application layer ("application frameworks"). 10.2 Object Request Brokers ORB The ORB connects distributed objects dynamically at runtime and supports the invocation of distributed service objects. 10.2.1 General features ORB supports the following general characteristics 1. static and dynamic invocation of object methods static: method interface is determined at compilation time. dynamic: method interface is determined at runtime. 214
2. interfaces for high-level programming languages, e.g. C++, Smalltalk, Java.
3. a self-descriptive system.
4. location transparency.
5. security checks, e.g. object authentication.
6. polymorphic method invocation, i.e. the execution of the method depends on the specific object instance.
   Difference between RPC and ORB: RPC calls a specific server function, and the data are separate from it; the ORB calls the method of a specific object.
7. hierarchical object naming.

10.2.2 Structure of ORB

[Figure: structure of an ORB with client and server object, interface repository, runtime repository, dynamic interface, static interfaces, ORB interface, static skeletons, dynamic skeleton, object adapter, and ORB kernel.]

ORB components
- ORB core (kernel): mediates requests between client and server objects and handles the network communication within the distributed system. It also provides
  - operations to convert between remote object references and strings;
  - operations to provide argument lists for requests using dynamic invocation.
- Static invocation interface: operations and parameters are determined at compile time; an object class may have several different static interfaces.
- Dynamic invocation interface: procedures and parameters are determined at runtime; the interface is identical for all ORB implementations, i.e. there is only one dynamic invocation interface.
- ORB interface: supports ORB service calls, e.g. conversion of object references to strings and vice versa; the interface is determined by the ORB.
- Interface repository: stores the signatures of the available methods, described in IDL notation, for use at runtime; for the dynamic invocation interface, a lookup in the interface repository is performed.
- Object adapter:
  - bridges the gap between Corba objects with IDL interfaces and the programming language interfaces of the server class;
  - forwards client calls to the appropriate server object using the skeletons;
  - defines a runtime environment for initializing server objects and assigning object identifiers;
  - activates objects.
- Runtime repository: stores information about the object classes supported by the server, as well as the already instantiated objects and their identifiers.
- Skeletons: skeleton classes are generated by the IDL compiler in the language of the server; they contain stubs for server object calls.
  - static: stubs for the static interfaces; they are generated by the IDL compiler together with the static stubs on the client side; there may be several static skeletons.
  - dynamic: provides a runtime binding mechanism for servers that need to handle incoming method calls for objects that do not have a static skeleton (generated by the IDL compiler).

Embedding in distributed applications
Usually the ORB is embedded as a library.

[Figure: client machine with client application, static and dynamic interface, ORB interface, client ORB and local operating system; server machine with object implementation, object adapter, static and dynamic skeleton, ORB interface, server ORB and local operating system; both machines are connected by the network.]

- ORBIX by Progress (URL: http://web.progress.com/de/products/allproducts.html), formerly Iona Technologies:
  - available as libraries: client and server library;
  - based on the TCP/IP transport mechanism.
- The TAO system by Doug Schmidt (URL: http://www.cs.wustl.edu/~schmidt/tao.html) is an implementation of the Corba model; it exists as a free platform and as a commercial product.

10.3 Common object services

A collection of system-level services that can be used by the application objects; they extend the ORB functionality.
- Life-cycle Service: defines operations for object creation, copying, migration and deletion.
- Persistence Service: provides an interface for persistent object storage, e.g. in relational or object-oriented databases.
- Name Service: allows objects on the object bus to locate other objects by name; integrates existing network directory services, e.g. OSF's DCE, LDAP (see page 54) or X.500.
- Event Service: allows objects to register their interest in specific events; producers and consumers of events need not know each other.
- Concurrency Control Service: provides a lock manager.
- Transaction Service: supports two-phase commit coordination for flat and nested transactions.
- Relationship Service: supports the dynamic creation of relations between objects that know nothing of each other; the service supports navigation along these links, as well as mechanisms for enforcing referential integrity constraints.
- Query Service: supports SQL operations for objects.

10.4 Inter-ORB protocol

Communication between ORBs is based on GIOP ("General Inter-ORB Protocol").

10.4.1 GIOP Features

GIOP specifies a set of message formats, e.g. Request, Reply, CancelRequest, and a common data representation (CDR) for communication between ORBs. It also specifies a standard form for remote object references. GIOP works directly over any connection-oriented, reliable transport system; IIOP (Internet Inter-ORB Protocol) is GIOP on top of TCP/IP.
10.4.2 External data representation

Data types are divided into primitive and complex types; so-called typecodes assign integer values to identify the data types.
- primitive: char, octet, short, long, float, double, boolean;
- complex: struct, union, sequence, string, array.

The format of complex data types is described in the interface repository. Example:

```
tk_struct (TypeCode for struct):
    string    repository_id
    string    name
    ulong     count
    { string member_name; TypeCode member_type }   // one entry per member, count times
```

10.4.3 Object reference

The object reference identifies the object that can be accessed through the Inter-ORB protocol. For IIOP, the object reference (IOR profile) consists of
- the IP host address (e.g. host name);
- the TCP port number;
- the object key.

10.4.4 GIOP message

A GIOP message has three components:
- the GIOP message head;
- a header that depends on the message type, e.g. request message, reply message;
- the message content.
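To illustrate the GIOP message head whose IDL definition is given next, the following sketch encodes the 12-byte head with java.nio.ByteBuffer. It assumes the GIOP 1.1/1.2 layout, in which bit 0 of flags selects the byte order (1 = little endian) and bit 1 signals that more fragments follow; message_size counts the bytes that follow the head. GiopHeader and encode are hypothetical names, and the body itself is not encoded.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: encodes the fixed 12-byte GIOP message head. */
public class GiopHeader {
    // Same order as the MsgType enumeration, so ordinal() is the octet value (Request = 0, ...).
    public enum MsgType { Request, Reply, CancelRequest, LocateRequest,
                          LocateReply, CloseConnection, MessageError, Fragment }

    public static byte[] encode(byte major, byte minor, boolean littleEndian,
                                boolean moreFragments, MsgType type, int messageSize) {
        ByteBuffer buf = ByteBuffer.allocate(12)
                .order(littleEndian ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        buf.put("GIOP".getBytes(StandardCharsets.US_ASCII));                 // magic
        buf.put(major).put(minor);                                           // GIOP_version
        buf.put((byte) ((littleEndian ? 1 : 0) | (moreFragments ? 2 : 0))); // flags
        buf.put((byte) type.ordinal());                                      // message_type
        buf.putInt(messageSize);   // message_size, in the byte order announced by flags bit 0
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] h = encode((byte) 1, (byte) 2, false, false, MsgType.Request, 256);
        StringBuilder sb = new StringBuilder();
        for (byte b : h) sb.append(String.format("%02x ", b));
        System.out.println(sb.toString().trim()); // prints 47 49 4f 50 01 02 00 00 00 00 01 00
    }
}
```

Because the receiver reads the flags octet before message_size, it can decode the size correctly whichever byte order the sender chose.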
GIOP message head

The GIOP message head has the same format for all message types; it identifies the type of the message sent to another ORB.

GIOP message head structure:
```
module GIOP {
    struct Version { octet major; octet minor; };
    enum MsgType { Request, Reply, CancelRequest, LocateRequest,
                   LocateReply, CloseConnection, MessageError, Fragment };
    struct Message_Header {
        char          magic[4];      // this is the string "GIOP"
        Version       GIOP_version;
        octet         flags;
        octet         message_type;
        unsigned long message_size;
    };
};
```
- flags determines the byte ordering used (big/little endian) and whether or not the message has been divided into several fragments.
- message_type is an element of the MsgType enumeration; it identifies the message type.

GIOP message types
GIOP supports the following message types:
1. Request: requests the execution of an operation at the remote object, e.g. access to an attribute; the message contains the call parameters.
2. Reply: answer to a Request message.
3. CancelRequest: termination of a request; the calling ORB no longer expects an answer to the original request.
4. LocateRequest: used to determine whether the given object reference is valid and whether the destination ORB processes the object reference, and if not, to which address requests for the object reference should be sent.
5. LocateReply: answer to a LocateRequest.
6. CloseConnection: the destination ORB notifies the calling ORB that it is closing the connection.
7. MessageError: exchange of error information.
8. Fragment: if, for instance, a request consists of several parts, first a Request message is sent, and then the remaining parts are sent as Fragment messages.

10.4.5 Example for IIOP use

Web access to a database using a Java applet and Corba.

[Figure: (1) the Web browser in the client environment gets an HTML page from the Web server via HTTP; (2) the Web server sends the Java applet; (3) the applet is executed in the browser; (4) the applet's ORB communicates via IIOP with the ORB of the Corba server objects, which access the database.]

10.4.6 RMI over IIOP

RMI uses JRMP (Java Remote Method Protocol) for the communication between client and server objects, i.e. there is no interoperability with Corba.
[Figure: an RMI client (Java) communicates with an RMI server (Java) via JRMP; a Corba client (any programming language) communicates with a Corba server (any programming language) via IIOP.]

Extension of RMI to RMI-IIOP
RMI-IIOP uses JNDI to register objects by their names.

[Figure: RMI clients and servers (Java) communicate via JRMP; RMI-IIOP clients and servers (Java) communicate via IIOP, the protocol also used by Corba clients and servers (any programming language).]

Moreover, there is Java IDL for Corba:
- it does not use JRMP for communication between remote objects;
- no interaction with RMI objects is supported.

10.5 Distributed COM

DCOM grew out of COM (Component Object Model), which is tightly integrated into the Windows OS.
- Goal of COM: support the development of components that can be dynamically activated and that can interact with each other.
- component: executable code, either contained in a DLL or in the form of an executable program.
- COM is offered in the form of a library that is linked to a process.
DCOM: extension of COM that allows a process to communicate with components placed on another machine. DCOM provides access transparency.

10.5.1 Object Model

DCOM adopts the remote-object model.
- A DCOM object is simply an implementation of an interface.
- Each interface has a unique 128-bit identifier, called the Interface Identifier (IID); each IID is globally unique.
- DCOM supports only binary interfaces; a binary interface is essentially a table of pointers to the implementations of the methods that are part of the interface.

[Figure: an IDL specification is translated either by an IDL-to-interface compiler into binary interfaces (a memory table of pointers to method implementations) or by an IDL-to-language compiler into language-defined interfaces (e.g. Java or C++ class definitions), which a standard compiler turns into compiler-specific code.]

A DCOM object is created as an instance of a class. DCOM objects are transient.

10.5.2 Architecture

The overall architecture of DCOM, in combination with class objects, objects and proxies, has the following form.
[Figure: DCOM architecture. Client machine: client application, client proxy, proxy marshaler, type library, COM, service control manager (SCM), registry, local OS. Object server: class object, object, object stub, proxy marshaler, COM, SCM, registry, local OS. The machines communicate via Microsoft RPC over the network.]

- The type library specifies the exact signature of a method to be invoked dynamically.
- The registry records the mapping of a class identifier to the local file name containing the implementation of that class.
- The service control manager (SCM) is responsible for activating objects; the port for incoming requests and the object identifier are registered by the SCM.
- The proxy marshaler transforms a proxy into a series of bytes for network transmission.
- The client proxy represents the object's interface on the client side; it is responsible for (un)marshaling object invocations.
- The object stub does the (un)marshaling of invocations on the server side.

10.5.3 Object Invocation Model

DCOM supports the remote-invocation model, i.e. a client is blocked until it receives an answer from the remote object. A cancel object can be used to cancel a pending synchronous call.

Passing of Object References
- A client references a remote object via an interface pointer; an interface is implemented by means of a proxy.
- How does a process A pass an object reference to process B?
[Figure: passing an object reference. In process A, the proxy marshaler turns the client proxy into a marshaled client proxy (interface ID plus binding information), which is sent over the network to process B; there, a proxy marshaler creates a client proxy with the same binding information. Both client proxies refer to the object stub and the object in the object server.]

DCOM was combined with Microsoft Transaction Server (MTS) and Microsoft Message Queue Server (MSMQ) to form COM+.
- COM+ supports distributed transactions, enabling transactional distributed applications.
- COM+ is integrated into the Windows OS.

10.6 .NET Framework

The Microsoft .NET Framework is a software framework available with the Windows OS for building distributed applications.
- It represents a strategy change from the product-oriented desktop world to the service-oriented component world.
- Goal: many future Windows applications should be built using .NET.
- .NET has two core elements: the Common Language Runtime (CLR) and the Framework Class Library.

10.6.1 Common Language Runtime (CLR)

The CLR provides a runtime environment for applications that may be developed in different languages, e.g. C#, C++, Java, Perl or Python. The CLR supports the following services:
- memory management;
- thread management;
- libraries that encapsulate access to OS functions;
- a common intermediate language (MSIL).
All .NET programs execute under the supervision of the CLR.

Common Type System (CTS)
- CTS defines all possible data types and programming constructs supported by the CLR.
- Data structures are interpreted uniformly at the MSIL layer.
- CTS enables interoperability between the languages supported by .NET.

10.6.2 Framework Class Library

- Object-oriented library of common functions available to all languages using the .NET Framework, e.g. file access, XML document manipulation, or database interaction.
- Organized in a hierarchy of namespaces, e.g. System, System.Xml, System.Web, System.Web.Services, Microsoft, Microsoft.Win32.
- System.Object is the base of all library classes and application classes.

10.6.3 .NET Remoting

- Technology for remote method invocation provided by the framework.
- The relevant classes are in the namespace System.Runtime.Remoting.
- Supports different transport protocols, e.g. TCP (binary formatting) or HTTP (SOAP formatting).
- Activation of remote objects.
Chapter 11 Summary

This lecture discussed basic concepts that are important for the design and implementation of distributed applications. The major issues presented were:
- basic interaction models, such as client-server, remote procedure call, and remote method invocation;
- distributed execution model, distributed transactions, and group communication;
- distributed shared memory, e.g. tuple space;
- Web Services;
- distributed file systems, replication, and voting schemes;
- design issues for distributed applications;
- object-based distributed systems, such as Corba;
- authentication in distributed systems.