Large Scale Server Publishing for Dynamic Content

Large Scale Server Publishing for Dynamic Content
Karl Andersson Thunberg
June 7, 2013
Master's Thesis in Computing Science, 30 credits
Supervisor at CS-UmU: Jan-Erik Moström
Supervisor at Dohi Sweden: Linus Mähler Lundgren
Examiner: Fredrik Georgsson
Umeå University, Department of Computing Science, SE UMEÅ SWEDEN


Abstract

The number of interactive and dynamic web services on the Internet is steadily growing, and to accommodate as much functionality as possible, many techniques for asynchronous web communication are being developed. This thesis report describes the evaluation of an existing web service that uses bidirectional communication over the web to provide voting functionality in real-time on web pages. The thesis consists of an assessment of problem domains, an evaluation of the system and an implementation addressing some of the identified problems. It focuses on a few core issues of the current solution, namely the communication techniques between the client and the server, the setup of the overarching structure of the system and the separation of messaging channels for different use cases. The evaluation of the reference system was motivated by the need to package the service better as a product and to create a distinction between the use case and the underlying system. It was done so that the stakeholders of the product may more easily define the way the service can be used and so that a better course of action can be taken for continuing the development of the service. The implemented solution shows an example of how the messaging channels could be separated and what kind of trade-offs exist between the current and the implemented solution.


Contents

1 Introduction
    Background
    Goals
    Purpose
    Related Work
    Thesis Overview
2 Method
    Outline
    Development Method
        Planning Sessions
        Ongoing Sprint Activities
        Sprint Transitions
    Evaluation of Existing Software
    Results Analysis
    In-depth Study
    Documenting and Tracking Results
    System Orientation
3 Assessment of Problem Domains
    Web Communication
        Communication Techniques
        Web Servers
        Web Application Libraries
    Internal Messaging
        Messaging Paradigms
        Qualities of Service (QoS)
        Messaging Systems
    Data Processing
        Batch Data Processing
        Stream Data Processing
    Distributed Coordination with Zookeeper
4 Evaluation of Reference System
    Use Case
    System Overview
    Discussion
    Assessment of Issues
        Inseparable Use Case
        Functionality Not Defined
        Required Maintenance
        Single Messaging Channel
        Complete Replica of Global State
        Intra-dependencies
        Simple Messaging
        State Synchronisation
        System Definition
    Proposal of Improvements
        System Architecture
        Messaging Channels
5 Publish-Subscribe Pattern
    Introduction
        Overview
        Subscription Models
        Problems
        Measurements of Results
        Classification of Methods
    Research Summary
        Decentralising Brokers
        Peer-to-peer Methods
        Self-managed Self-optimised Methods
    Conclusion and discussion
        Decentralising Brokers
        Peer-to-peer Methods
        Self-managed Self-optimised Methods
        The Future
6 Implementation of Improvements
    Goals
    Measurements
    Implementation Details
        Channel Definition
        Components
    Results
7 Results
    Results Analysis
        Separation of Messaging Data
        Separation of Messages
        Initialization
8 Discussion
    Restrictions
    Future Work
Acknowledgement


List of Figures

1.1 A high level representation of the system, outlining three important aspects of the system: the asynchronous web communication between client and server (1), the back-end architecture (2) and the separation of the messaging channels (3)
2.1 A work-flow overview of a typical sprint planning session
2.2 The work-flow during a pre-planned workday
2.3 Work-flow at the end of a sprint
2.4 An overview of the system and its underlying components. The numbers represent three technical aspects that were deemed important for this thesis. They are: 1. the asynchronous communication between a web client and a web server, 2. the back-end architecture and 3. the separation of the messaging channels for events
4.1 A representation of how a client currently is exposed to the service. It is exposed to the service on a web page where the static content, such as HTML and CSS, is downloaded from a static file store. This static content also contains logic for communicating with the web server that contains the voting functionality
4.2 Overarching components in the reference solution. A web client fetches the static content from a static file store, which also contains the logic to communicate with the web server that contains the voting functionality. A web client connects to a web server using a load balancer. Any votes are put on a processing queue so that votes are buffered to a data processing component that calculates a result and returns it using a notification system. The service also uses a database to store the global state of all events
4.3 An overview of the service bus architecture describing the way components interact with each other. This solution proposes the use of a message bus which all components use to communicate with each other
4.4 An overview of the two-tier architecture presenting the layout of the components of the system. This solution divides the service in two parts that communicate with a message bus in between them. On one side is the client functionality and on the other side is the use case functionality
4.5 Outline of the layered web server solution. It introduces two levels in the web server, where the higher level contains use case specific logic, separate from each other, while the lower level manages these use cases
5.1 A description of how types in a type-based publish-subscribe solution may relate to each other. A subscriber may subscribe to the type sports news, in which case it will receive all sports news, both those that are golf news and those that are football news. A subscriber may also subscribe to, for example, golf news, in which case it will only receive golf news and not football news
6.1 An overview of the components in the solution which highlights the general functionality of the messaging channels that is not tied to any use case. It separates the web server in two levels: a higher level which contains use case specific functionality in separate modules and a lower level that manages these modules. It introduces a web client interface that the clients use to express their interest in events or channels. The client validation cache is also used directly by the lower level of the web servers. The load balancer and the notification system are also viewed as part of the basic functionality
6.2 Overarching components in the solution with modified components highlighted. Those that were modified during the implementation are the web client, web server, notification system and database. Other components were used as they were
7.1 Number of messages received by three clients, where the first client is publishing and subscribing to channel 1, the second client is subscribing to channel 1 and the third client is subscribing to channel 2
7.2 The perceived latency for a client when initializing a connection to the service with one subscription
7.3 The perceived latency of ten consecutive client initializations
7.4 The size of the bootstrap when ten channels are added to the service, each with a bootstrap of 1000 B of data. The bootstrap is the initial message that the web server sends to a client when the client first connects

Chapter 1 Introduction

Dynamic web applications are applications that attempt to enhance the web experience for users by providing interactive or interchanging content such as video or live chatting. Dynamic content (content provided by dynamic web applications) could be incorporated client-side, which includes the ways a web page behaves in response to user input. It can also be present on the server-side, which refers to the way a server responds to a client based on its internal state. Combining these two methods is a third alternative that can provide further functionality by using additional communication between a web client and a web server.

Finding ways to create an asynchronous connection between a web client and a web server has long been studied because of the additional content such functionality can provide. Earlier solutions include making use of the HTTP request/response protocol to simulate a bidirectional connection using polling techniques. Because of HTML5 and the continuously improving performance of web browsers, dynamic content is being used more and more in web applications, and this has brought forward newer and more efficient ways to communicate over the web. Browser plugins and controllers have also been used to establish further communication functionality, but these kinds of solutions are slowly being phased out as more regular web communication techniques are standardised. Today, virtually any larger website provides some dynamic content using asynchronous communication. A couple of examples are Facebook's chat functionality, automatic updates on Twitter feeds and streaming video on YouTube. Games are a large part of the web, and as HTML is getting more versatile and web communication is getting faster and more standardised, more use cases open up.

As web communication develops, existing web applications and their software architecture may need to be revised to support new methods and technologies. This can be a complex task because of the large scale infrastructure needed for a web application and the higher demands on web application performance.

1.1 Background

Dohi Sweden is a holding company that offers, inter alia, products and services for the interactive web. The company has developed a web service that makes use of asynchronous communication to provide a solution that can send and receive information between a web client and a web server without the need of a page reload. This service is used for receiving large amounts of asynchronous data from the web clients, processing it on the back end and then

pushing the results back to the web clients. It was made for another company that had a specific use case for it, and it includes a web client interface, a web administration interface and a back end that serves these two interfaces. The use of this service is tied to specific events or time intervals, where its functionality is provided for these events separately and the data transmitted during an event is completely decoupled from any other event. The current service that Dohi Sweden provides was developed entirely for this use case and may be of limited use to other companies or for similar use cases. The web clients for a specific event cannot be distinguished from web clients for other events, and the data that is received and processed during an event is broadcast to every connected web client, no matter if a web client is interested in the event or not. The service could be used in many other ways if it featured more general functionality, with the specific use case separated from the underlying system and the web clients separated based on the event they are interested in. Making a more versatile solution is of interest for the stakeholders of Dohi Sweden as it enables packaging this solution so that it can be offered to other companies and applied to other use cases. Supporting multiple use cases that are clearly separated from each other would make a more appealing product, as it would let clients focus on the work for their specific use of the product instead of doing redundant work that may be the same between use cases and events.

1.2 Goals

The overall goal of this thesis project is to evaluate the stakeholders' current solution and the underlying technologies in order to improve the service with a solution that may better suit their needs.
This evaluation addresses the issue of separating the use case from the underlying system to make a solution that behaves more like a framework, which can be used to implement other types of web services and use cases. Web services that are within the scope of the desired solution are those that make use of asynchronous communication between the front end and the back end and those that feature a scalable back end where received data may be processed before a result is sent back. It is assumed that data transmitted between web clients and the back end is linked to an event and is completely separated from data that is transmitted for other events.

To be able to create a base on which decisions for the evaluation can be made, it is necessary to include an assessment of the problem domains that are relevant for this project. The assessment does not include every part of the overall solution but instead focuses on three problem domains: client-server communication, internal back-end messaging and back-end data processing. A fourth part, which is at least equally important but is not included in this work, is the data model of the solution. Which data model should be used depends heavily on the use case it is designed for, but the work needed to create a data model that suits all targeted use cases is outside the scope of this thesis. Other parts of the solution are not explicitly evaluated but may differ from the current solution, as replacing one may make it easier to use any of the suggested improvements. The work also includes an implementation to test and validate the results of the evaluation, which means that the project can be broken down into the following three goals:

– Make an assessment of already existing techniques and solutions relevant to the project domains and describe in what way they are relevant, what they do and how they are used. Solutions for a specific task must be compared and their differences documented.
– Evaluate the current solution and identify problems related to separating the use case and the system. Make a proposal of improvements for the identified problems

and describe in what way they differ from the current solution. This also includes documentation of how the differences introduced by the improvements impact the current solution in terms of scalability, latency, manageability and extendability.

– Test and evaluate a set of the proposed improvements by applying them to the current implementation, including tests to validate the behaviour of the original use case and to document the differences in terms of scalability, latency, manageability and extendability.

An abstract view of the service is presented in Figure 1.1, which does not consider its individual components but instead focuses on the higher level functionality and three aspects that are of importance for this kind of system. The overview is purposely presented at this high level, postponing the detailed description of the components until later chapters. The three important aspects of the service are the client-server communication, the back-end application architecture and the separation of the data transmitted on different events. The client-server communication (1) refers to the service's ability to uphold an asynchronous connection that provides a reliable and fast way of transmitting data. It is important that the service can be provided to as many clients as possible and does not feature any heavy processing on the client-side of the solution, so that the service does not degrade the performance of any other web service it is integrated into. The back-end application architecture (2) is of interest, as the way the internal components are set up and interact with each other determines what the solution can be used for. The last aspect (3) addresses the importance of being able to distinguish between the messaging channels used for the different events, as that makes it much clearer how the use case can be separated from the system.
1.3 Purpose

These goals, when met, will result in a documented evaluation of the current service which contains a list of proposed improvements, where a subset of these improvements has been tested and demonstrated. This documentation outlines ways to proceed when continuing with the development of the service and how this development could be done depending on the desired path. This report will serve as documentation of current techniques and solutions that, together with the evaluation and the implementation, provides material for gaining more knowledge in this field as well as an example of how this type of service can be made. The implemented improvements consist of a back end that contains a clearer distinction from the current use case, so that it is more apparent how the system works and what it could be used for. The solution is able to support more than the current single use case and is able to send data to only the correct clients. This creates a good base that can be used when incorporating this functionality into the current system.

1.4 Related Work

This project is an evaluation of a system that is owned and developed by Dohi Sweden, and it is currently still being developed to include more use cases for the initial client. This may help during the development of this thesis, as it may give further directions of where this project is heading and what areas are of interest for this thesis to study. The result of this thesis is meant to provide additional information when proceeding with developing the current service, so that it may be easier to implement the addressed improvements. The

Figure 1.1: A high level representation of the system, outlining three important aspects of the system: the asynchronous web communication between client and server (1), the back-end architecture (2) and the separation of the messaging channels (3).

current solution is evaluated from the state it was in when this project was started, and any further functionality will not be included in this evaluation.

1.5 Thesis Overview

Chapter 2 presents the methodology used during this project to reach the defined goals of the thesis, including how the steps of the project were planned and documented. This

chapter also describes the tools used during the development and how results were measured and analysed. Chapter 2 concludes with a brief overview of the entire system together with an explanation of how to read subsequent chapters. By presenting the overview as early as possible, it allows readers to skip uninteresting parts while still understanding how each component is related in the actual solution.

Chapter 3 describes an assessment of the problem domains that are relevant for this thesis work. This chapter is meant to document the base on which future decisions for the implementation were made, so that the reader may better follow the reasoning behind the path that was taken. It represents the analysis of existing techniques and solutions that was made prior to the evaluation of the current system. Readers that are already confident in the problem domains that this chapter addresses may skip it completely and continue reading without any problems understanding the other chapters.

Having made an assessment of the problem domains for this thesis, Chapter 4 focuses on the task of evaluating the current solution and identifying problems with regard to the project goals. It presents the use case for the system and gives an overview of how the system is built and how its different components interact with each other. It includes a discussion about the different components focusing on the problem domains and concludes with a list of proposed improvements for the identified problems.

Chapter 5 is an in-depth study of scientific research in the area of the publish-subscribe pattern. It focuses on ways to build the internal messaging system so that one of the immediate goals of the thesis can be fulfilled, namely the separation of the use case and the underlying system. This chapter starts with an overview of the pattern that explains how it works and what it is used for, including different variations of it.
It contains a summary of three domains that are of interest for this project and outlines their impact and how they can be of use in this system.

Chapter 6 presents an implementation that aims to solve some of the addressed problems using some of the proposed solutions from Chapter 4. It describes the plan that was followed during the implementation as well as the design of the system, with a list of what needs to be changed from the reference system. It also contains motivation as to why the proposed improvements were implemented, how they were tested and what the results from the tests may conclude.

Chapter 7 addresses the implementation that was made to test and evaluate a set of proposed improvements. It combines the results provided by the previous chapters and presents a suggested implementation, viewing the solution in its entirety. It proceeds to evaluate the results of the tests for the implementation in accordance with the evaluation criteria from Chapter 2. It concludes with some quantitative results derived from the test runs on the complete system, based on the goals of this thesis.

Chapter 8 contains an overall discussion and conclusions of both the implementation results and how ideas derived from the in-depth study in Chapter 5 can be realised with the developed framework as a base. It highlights the restrictions and limitations present in the implementation and what implications the developed system may have in the future.
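As background for the in-depth study of the publish-subscribe pattern, its core idea can be sketched in a few lines. The sketch below is a deliberately minimal, single-process illustration; the broker class, channel names and messages are invented for this example and are not taken from the reference system, which uses a full messaging stack.

```python
from collections import defaultdict

class Broker:
    """A minimal topic-based publish-subscribe broker (illustrative only)."""

    def __init__(self):
        # Maps a channel name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Only subscribers of this channel receive the message; this is
        # the separation property the thesis aims for between events.
        for callback in self._subscribers[channel]:
            callback(message)

# Example: two events with separate channels.
received_a, received_b = [], []
broker = Broker()
broker.subscribe("event-a", received_a.append)
broker.subscribe("event-b", received_b.append)
broker.publish("event-a", "vote: option 1")

print(received_a)  # ['vote: option 1']
print(received_b)  # []
```

A message published on one channel never reaches subscribers of another, which is exactly the behaviour the reference system lacks when it broadcasts event data to every connected client.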


Chapter 2 Method

The general methodology used when performing this thesis project is described in this chapter. Section 2.1 lists the steps performed to divide, implement and analyze each individual component. The agile development method called Scrum was used for planning and documentation during this work, and this method is presented in Section 2.2. This chapter includes a description of how existing software is to be evaluated in Section 2.3. Section 2.4 explains how the results related to the system evaluation are to be analyzed and how the tested improvements should be applied to create as fair a comparison as possible. The approach used in the in-depth study is presented in Section 2.5, while Section 2.6 presents how the general project progress was documented. The chapter concludes with a system overview in Section 2.7 that describes the components of the system and which parts are focused on during the thesis.

2.1 Outline

This section describes a general outline of the steps taken, for the assessment of domains, the theoretical evaluation, the in-depth study and the implementation alike. It explains how the work was divided and performed, and each step can be related to the thesis in its entirety as well as to each individual chapter.

1. Create a list of system requirements from the stakeholders, or in the case of current software analysis, assemble a list of supported features and potential limitations.
2. Split up the project into logical parts. These parts will be addressed individually, with a background focus on how these components make up the entire solution.
3. Do a pre-study related to the addressed subject and areas related to the different parts of the system.
4. Set up quantitative measurements for performance and validation for the thesis's individual components and the solution as a whole.
5. Produce data for the quantitative measurements by either implementing tests or gathering information from other sources.
6. Analyze and document the results and relate these to the list of quantitative measurements.

2.2 Development Method

Dohi Sweden uses an agile development method called Scrum [40]. Scrum focuses on the iteration of Sprints, where a Sprint is a limited time-frame in which the development of a product is incremented towards a potentially releasable product. Each Sprint has a clear and decisive goal that is supposed to be reached in a designated time-frame, which usually spans one to a few weeks. Sprints generally have a consistent duration throughout the development and are immediately followed by another Sprint when the current one is done. Short iterative steps are used to quickly build prototypes which can be tested to make quick decisions on how to proceed.

Companies and project groups usually use their own modified version of the Scrum development method. The version and work-flow used at Dohi Sweden is a compromise between previous experiences at the company and the limited time span of the thesis project. The daily and weekly Scrum activities that were held with the external supervisor at Dohi Sweden are the following:

Every day began with a Scrum meeting, including the discussion of:
– What was done yesterday
– Problems that had been encountered
– What had to be done until tomorrow

Each week concluded with a summary of the following three aspects:
– Completed weekly tasks
– Problems that had been encountered
– Tasks that had to be postponed or modified

After every month (roughly one iteration) the progress was presented to the external supervisor and summarized to provide a detailed project progress report. From this general activity work-flow there are three central Scrum activities that need to be specified further: the planning sessions, the activities during an ongoing sprint and finally the transition between two subsequent sprints.

Planning Sessions

Planning sessions are an important part where all requirements are formulated into tasks, subsequently divided to fit a specific time granularity and then grouped into components. Although the development method belongs in the agile class, the planning process can be considered a generally iterative process. Each requirement is first formulated into high-order tasks, which are then iteratively divided into lower-level tasks. As long as the expected granularity of these tasks exceeds a specified time threshold, they should be divided further. Keeping the tasks at the lowest possible level without specifying implementation details ensures that they are conceptually easy, allowing a more steady flow of progress to be reported. By grouping the divided tasks into components they are also easier to order, by either expected difficulty or by the order of dependency on other tasks. Figure 2.1 gives a work-flow overview of how the weekly sprint planning sessions of the thesis project were performed.

Figure 2.1: A work-flow overview of a typical sprint planning session.

Ongoing Sprint Activities

These planning sessions produced (at least) a week's workload divided into 4-hour tasks. The granularity of these tasks varied slightly, but as a consistency measure and to aid in the project progress overview they all averaged roughly four hours each. The typical workday was 8 hours, five days a week, yielding an expected task completion rate of ten tasks per week. A work-flow overview of the daily activities of a pre-planned sprint is shown in Figure 2.2. It was not explicitly stated in the planning sessions, but it was considered that the time assigned to tasks would also include testing and documentation. The completion of any task (on time or not) results in progress that is either completed and tested or marked for further testing. Any task may also be postponed for whatever reason.

Sprint Transitions

A sprint is considered completed, whether the tasks have been completed or not, when a new week begins. Tasks that are not completed in one sprint will carry over to the next one. As the new sprint begins, it starts with a summary of the progress of the completed sprint, where tasks that were not completed may either be discarded, re-prioritised or postponed even further in the new planning session that follows directly. Figure 2.3 shows how a recently completed sprint is summarised and its results combined into a new planning session.

The concept of accepting a weekly progress is related to a measurement of how well the actual progress matched the planned activities. If there were any unplanned interrupts or bugs introduced that consumed a lot of time, this may have led to some tasks not being

Figure 2.2: The work-flow during a pre-planned workday.

Figure 2.3: Work-flow at the end of a sprint.

completed as planned. In these cases, the progress may be considered unacceptable, and some tasks may have to carry over to the next sprint. Some tasks may have to be discarded completely or postponed indefinitely.

The decision to discard or postpone tasks may, in projects involving more than one person, instead be reduced to a re-distribution of the workload, allowing other team members to accumulate more tasks to ease the burden of particularly bug-ridden or difficult tasks assigned to specific persons. Of course, the purpose of having daily Scrum meetings should limit the necessity of this, as the progress of each individual and the team as a whole is monitored regularly. This makes it easier to make adaptive changes to regulate the progress, with the ultimate ambition of avoiding having to resort to discarding or postponing tasks. Short sprints instead of long iterations not only offer the possibility of short-term workload readjustments for particular team members, they also provide natural milestones for both iterations and the project road-map as a whole.

2.3 Evaluation of Existing Software

There are several solutions for asynchronous web communication already available on the market. Companies are providing both complete solutions and software frameworks to support web applications with dynamic content. However, many frameworks are custom-built for a particular use case or application area. This introduces limitations that apply to the size of the application scope, the general cost of production and the choice of different software architecture components. By evaluating the current software, it is not only possible to compare its performance to a proprietary solution but also to gain insight into what problems exist and how they can be solved, or worked around. This evaluation will promote feature extension instead of re-iteration of already implemented and well-functioning solutions. The motivation for developing a proprietary framework is to promote the possibility of easy continuation of the already existing use case while still being able to extend the system to other applicable areas.

Three main evaluation criteria were used in the evaluation of current software solutions:

1. The ability to support as many web clients as possible over asynchronous connections, no matter which platform or browser the client uses.
2. The computational complexity and calculation time of vital software procedures, such as separating the different messaging channels for events.
3. The software architecture, in the context of being able to support different use cases and flows for the application tasks.

2.4 Results Analysis

Since the different goals of the project have been split into three separate chapters, the results for each goal will be addressed in the respective chapter, while an analysis and discussion of the overall system as well as the work for this project are presented in Chapter 7. To be able to perform fair comparisons with the current software, the implementation will imitate the original behaviour as closely as possible, and any differences between the two systems will be thoroughly analysed and documented with respect to in what way they differ and what impact that may have on the test results and on a future integration of the implemented functionality into the current service.

22 12 Chapter 2. Method 2.5 In-depth Study The in-depth part of the thesis project (see Chapter 5) makes up the theoretical foundation that concludes with the thesis project implementation. Focus was on three aspects of the publish-subscribe pattern and the in-depth study tries to give an overview of how these work and what they may be used for in the current system. The placement of the in-depth study chapter was chosen explicitly to be preceded by the chapter on implementing proposed improvements, as the techniques and methods described in the in-depth review are closely related to those explained in detail in the development of the improvements. 2.6 Documenting and Tracking Results Throughout the whole project, a project diary was used and shared online with the internal supervisor so that he could get a better understanding of the current status of the project. The work for the project diary consisted of noting down the following aspects for each week: What I have done. What kind of experiences I have gained. A status update of the schedule, including a revision when the schedule have not been followed. Who I have been communicating with and in what way. During the thesis project at Dohi Sweden, a project management tool called Pivotal Tracker 1 was used to document the tasks for each sprint and note down what has been done each day and how much time that has taken. It was also used to keep track of the milestones for the project and to generate various backlog information such as burn-down charts and estimations of when milestones and tasks will be done using a calculated project velocity. 2.7 System Orientation Figure 2.4 shows an overview of the current system, including the components of the back end and how they communicate with each other. Three important aspects for this evaluation is the communication between the front end and the back end, the architecture of the back end and the separation of messaging channels for different events. 
The client-server communication will focus on the link between the front end and the web server. The setup of the overarching system and its components will address most of the components presented in the figure, excluding the static file store, and focus on the way components relate to and interact with each other. The separation of events addresses the problem of logically distinguishing between different event data inside the system components as well as on the actual channels of the internal messaging bus.
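The channel separation described above can be illustrated with a minimal in-process sketch, where events are routed only to handlers registered on the same logical channel. The channel names and event payloads are hypothetical and chosen for illustration only; the real system's internal messaging bus is not modelled here.

```python
from collections import defaultdict

class ChannelRouter:
    """Route events to handlers based on a logical channel name."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._handlers[channel].append(handler)

    def publish(self, channel, event):
        # Only handlers registered on this channel see the event.
        for handler in self._handlers[channel]:
            handler(event)

router = ChannelRouter()
votes, admin = [], []
router.subscribe("poll-42/votes", votes.append)   # hypothetical channel names
router.subscribe("poll-42/admin", admin.append)
router.publish("poll-42/votes", {"option": "A"})
router.publish("poll-42/admin", {"action": "close"})
```

The point of the sketch is that vote events and administrative events never mix, even though they flow through the same router, which is the kind of logical distinction the evaluation is concerned with.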

Figure 2.4: An overview of the system and its underlying components. The numbers represent three technical aspects that were deemed important for this thesis: 1. the asynchronous communication between a web client and a web server, 2. the back end architecture and 3. the separation of the messaging channels for events.


Chapter 3

Assessment of Problem Domains

This chapter describes an assessment of problems and questions that are relevant for this thesis work. These problems have been grouped into four domains, which are represented by the four sections of this chapter. The first three domains reflect those that were defined by the goals of this work, while the last domain centres around a specific problem that is prevalent in the other domains.

Section 3.1 focuses on the problems of creating large scale asynchronous communication over the web. This section addresses underlying communication protocols as well as abstractions of messaging implementations.

Section 3.2 centres around the internal messaging on the back end. Since the solution must be able to scale well, there are questions related to how scalable back-end components may synchronise and communicate with each other.

Section 3.3 is an assessment of problems in handling large scale data processing on the back end, both distributed and as a distinct logical unit. It addresses ways to manage and process data as well as making a clear separation between different use cases.

Section 3.4 refers to the problem of ensuring and managing reliable distributed systems. This section is an assessment of how coordination services can be used to handle components of the solution that must be distributed.

3.1 Web Communication

The solution that is addressed in this thesis is defined as a web service, since the front end of the system is aimed to be incorporated into a web page or any other container that displays and manages web content. This means that the service needs to use some kind of web communication with the clients. Furthermore, since the communication should go both ways, the way a client and the service communicate is of great interest for this study, as the HTTP request/response protocol used for web communication is not bidirectional.
Because of this, this section includes an assessment of commonly used ways to create asynchronous and bidirectional communication between a web client and a web server. Using web communication between the clients and the service also means that the service needs some kind of web server in the front. The service may also be used as some kind of host for web content, which also motivates a web server. This section includes a small description

of a number of web servers and a summary, based on the studied web servers, of properties found to be of importance for this kind of solution.

Communication Techniques

This section describes a comprehensive list of available techniques, in the form of methods and protocols, used to create asynchronous connections between a web server and a web client. This list is aimed to document how these methods and protocols are used and in what way they differ from each other. An important aspect that is addressed for these techniques is the ability to conform to a standard for web communication. This includes, for example, the way a technique depends on the client's web browser, whether a protocol uses a common port, or if and in what way a technique uses the HTTP request/response protocol.

Three terms that relate heavily to this text are Ajax, Comet (Reverse Ajax) and XMLHttpRequest (XHR). Ajax is a general term that refers to a set of client-side techniques used to make it possible for a web client to send data to and retrieve data from a web server without reloading the whole page. Comet, or Reverse Ajax, is a general term that refers to a set of techniques used to make it possible for a web server to push data to a web client. Comet capabilities are usually achieved by using HTTP streaming or by using Ajax with long polling. XHR is a widely used API that a web client can use inside a web page, using scripting languages, to communicate with a web server over the HTTP request/response protocol without loading a new web page.

Long Polling

Long polling is an HTTP request/response technique used by a client to poll a web server for new content, where the server withholds the HTTP response if it does not have any data to send to the client. When the server has new data it uses the connection it is still holding to push the data to the client. After the client receives a response it sends another request to the server.
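The server-side "hold" at the heart of long polling can be mimicked in a few lines with a blocking queue: the handler waits up to a timeout for new data and otherwise returns an empty response, after which the client would immediately re-poll. This is a simplified sketch of the mechanism only; the function name, the response shape and the timeout are illustrative assumptions, not part of any real HTTP server.

```python
import queue

def long_poll(events: "queue.Queue", timeout: float):
    """Block until an event arrives or the timeout expires,
    mimicking a server holding an HTTP response open."""
    try:
        return {"status": 200, "data": events.get(timeout=timeout)}
    except queue.Empty:
        # No data arrived in time: respond empty; the client re-polls.
        return {"status": 200, "data": None}

events = queue.Queue()
events.put("new vote")
resp = long_poll(events, timeout=0.1)   # returns immediately: data was waiting
empty = long_poll(events, timeout=0.1)  # waits 0.1 s, then gives up
```

The timeout branch corresponds to the idle, held connection mentioned above: the handler occupies resources while waiting even though no data is exchanged.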
This ensures that the server always has an open connection it can use to send data to the client. Long polling is widely supported by web browsers, since it is a technique that has been around for a very long time. This also means that there are a lot of solutions and implementations available that use this technique, so it becomes easier to find a solution that fits a more specific use case. The technique is an improvement over regular polling, since it is likely to reduce the polling or message exchange frequency and it enables the server to respond immediately when it has new data to send. When many messages are being sent between the server and the client, however, there is no real performance improvement over regular polling[32]. As the server needs to hold a connection open when it does not have anything to send, it creates idle processes which take up some resources on the server[41]. Long polling is usually enabled by loading some kind of library onto the client, which can add some delay when accessing a page.

HTTP Streaming

HTTP streaming, or HTTP server push, is a technique for sending data from a web server to a web client by making the web server hold the HTTP connection open after a response has been sent to the client. By holding the connection open, the server can push data to the client whenever it gets new data to send. HTTP streaming can make use of XHR to handle ingoing and outgoing events, or use page streaming, where the server keeps

the client in the loading state and sends script tags which get executed by the client as soon as they are received. Since a connection is kept open, there is not as much overhead from sending multiple messages as with long polling, since there is no need to make an HTTP request/response exchange for every message. This gives greater performance than long polling when exchanging a lot of messages between a server and a client[14].

WebSocket

WebSocket consists of a protocol and an API that provide full-duplex communication using TCP over port 80, attained through an upgrade handshake from HTTP[59]. This means that web servers can communicate with web browsers asynchronously, both by letting the browser send messages to the server and by letting the server send messages to the browser. Since the connection uses the standard HTTP ports for the web, there is no need to open up any additional ports to allow WebSocket communication. What is needed is a web server and a web browser that support WebSocket. A WebSocket connection can send both UTF-8 encoded text and binary data. Full-duplex communication gives increased efficiency over long polling and HTTP streaming because of lower per-message overhead[49]. With the use of TCP and the ability to send messages in binary, WebSocket can be used to serve any type of data that a regular program can. WebSocket is supported by a lot of browsers, but it is still a fairly new technique and its API has not yet been standardised, so there are still some older but still used versions of web browsers that do not fully support WebSocket[5]. It may also be more difficult to find and make use of software architecture solutions such as load balancing, since they also need to support WebSocket[26]. WebSocket has its own API, which lets developers focus less on the message exchange handling, as they have a unified way of communicating and do not need to learn multiple ways, compare different methods, etc.
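The upgrade handshake mentioned above is small enough to show concretely. In the WebSocket handshake defined by RFC 6455, the server proves it understood the upgrade request by hashing the client's Sec-WebSocket-Key together with a fixed GUID and returning the result in the Sec-WebSocket-Accept header:

```python
import base64
import hashlib

# GUID fixed by RFC 6455 for the WebSocket opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server must return
    for a given client Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Example key/accept pair taken from RFC 6455, section 1.3.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this exchange the HTTP connection is repurposed as a raw, full-duplex WebSocket channel; everything before it is ordinary HTTP, which is why no extra ports need to be opened.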
Bi-directional Streams Over Synchronous HTTP (BOSH)

BOSH is a transport protocol that uses pairs of HTTP requests and responses to emulate bi-directional communication. It uses long polling to ensure that the server always has a connection over which it can send messages to the client. To let the client communicate with the server, it uses an additional HTTP request aside from the long polling request. BOSH claims to have a very low latency for a protocol that only uses regular HTTP communication[17]. This makes it a very viable alternative to WebSocket, as it enables messages to be sent both ways with methods that may work when WebSocket does not. BOSH uses a JavaScript library to handle messages on the client side, and that library needs to be loaded into the client before BOSH can be used.

Web Real-Time Communication (WebRTC)

WebRTC is an API designed to make voice calling, video chat and file sharing between browsers possible without the need for a plugin. It is being drafted by the World Wide Web Consortium (W3C) and, even though it is not complete, it is still partially supported by different browsers[57]. The WebRTC project includes components such as network modules and audio and video codecs to make it easier for developers to create high quality, real-time communication functionality inside the browser. WebRTC uses the Real-time Transport Protocol (RTP), which makes it a better choice for serving content such as audio and video. This also makes WebRTC a peer-to-peer (P2P) solution for transferring data, both between a web browser and a web server and between two web browsers, which means that communication can be set up to work between two clients without the need of going through a server. This is something WebSocket alone cannot do. WebRTC is still being drafted, though, and is not as well supported by web browsers as WebSocket[57]. WebRTC needs a signalling protocol to set up the communication between two clients, which is something WebSocket can provide[10].

Server-Sent Events (SSE)

Server-Sent Events is a technology used to provide functionality for sending data from a web server to a web client[56]. It describes how a server can communicate with a client and contains a JavaScript API called EventSource that the client can use to request and handle a connection to a data stream on the server. EventSource is being standardised as part of HTML5 by the W3C. SSE is unidirectional and only provides a way for a server to push messages to a client; because of this, it has a narrower use case than WebSocket and other bidirectional communication methods. SSE can be emulated using JavaScript, which makes it possible to support older browsers that do not support SSE natively[38].

Browser Plugins

There are numerous browser plugins that can be used to enable additional communication techniques between a web browser and a web server. One possible push technique is to make use of a one-pixel Adobe Flash movie to establish an additional connection to the server. This hides the movie from the user while providing the extra communication link through the Flash movie. The extra connection is used as a regular TCP connection without the need for HTTP requests or responses. SilverLight is an application framework from Microsoft that aims to fulfill a similar purpose as Adobe Flash.
SilverLight can also be incorporated into a web browser as a plugin and provides different ways for a web server and a browser to communicate. ActiveX is another alternative that gives extra control of the browser. These methods use other or additional protocols than the HTTP request/response protocol and need extra software and explicit support in the browser, which makes them a secondary option for this solution. Using these types of browser controls has also introduced many extra security problems, and many of them are widely known for being exploited by malicious web applications.

Web Servers

This section is an assessment of web servers that can be used as the party that explicitly communicates with the clients of the service. It focuses on asynchronous, event-driven web servers that can scale and perform well when handling many concurrent clients. A very popular HTTP web server that hosts a significant share of web sites (63.7% of all active websites[25]) over the world is the Apache web server. This server was not included in this assessment since it is not an event-driven web server. Event-driven web servers see handling all network connections as a single task, which is solved by one (most commonly) or multiple threads in a loop, iterating over the connections and reading from them and writing to them asynchronously. This is different from the more traditional way of creating a separate thread for each connection. The main drawback of the traditional method is its scalability, as creating a thread for each connection creates a lot more memory and thread management overhead.
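The single-threaded, readiness-based loop described above can be sketched with Python's standard selectors module. Here a pair of socketpairs stands in for client connections; a real server would register an accepting socket and far more clients, so this is a minimal illustration of the model, not a usable server.

```python
import selectors
import socket

# One thread, one selector: every connection is just a registered
# file object, and the loop services whichever connections are ready.
sel = selectors.DefaultSelector()
a1, b1 = socket.socketpair()   # stand-ins for two client connections
a2, b2 = socket.socketpair()
for conn in (b1, b2):
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ)

a1.sendall(b"hello")           # "clients" send data
a2.sendall(b"world")

received = {}
while len(received) < 2:
    # Block until at least one registered connection is readable.
    for key, _ in sel.select(timeout=1):
        received[key.fileobj] = key.fileobj.recv(1024)

for s in (a1, b1, a2, b2):
    s.close()
```

The loop handles both connections without creating a thread per connection, which is the overhead saving the event-driven servers below are built around.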

Nginx

Nginx is an asynchronous event-driven web server that can also function as a reverse proxy server for the HTTP, SMTP, POP3 and IMAP protocols. It was found to be the second most used web server for all active sites according to Netcraft's December 2012 Web Server Survey[15] and is used by, for example, Netflix and Wordpress. It is designed to be a very lightweight, scalable and high performance web server and is widely used for load balancing as a front-end server. When using Nginx as a web server, inactive HTTP keep-alive connections can be handled with only 2.5 MB of memory[29].

Tornado

Tornado is an open source solution for a scalable, non-blocking web server that also provides an application framework written in Python. It contains a set of modules that includes a WebSocket module and has numerous web application implementations built on top of it, such as Socket.IO and SockJS, as well as many helper libraries and frameworks[53]. This makes it a good choice when choosing a web server that can support other, perhaps already used, frameworks and libraries. Including libraries in the Python framework might not be straightforward, though, as Tornado needs asynchronous libraries to be able to fully use the non-blocking functionality.

Jetty

Jetty is an open source HTTP server, HTTP client and Servlet container based on Java and is used in many products such as Apache Maven, Google App Engine, Eclipse and Hadoop. It supports different embedded solutions, frameworks and clusters, and supports WebSocket, SPDY and the Java Servlet API[19]. Jetty features very extensive and pluggable functionality, which makes it easier to customize and optimize for different use cases.

Mongrel2

Mongrel2 is an open source web server that supports HTTP, Flash, WebSocket and long polling communication[28]. It supports 13 different programming languages and can be built on many different platforms.
It is built on ZeroMQ for concurrency and is therefore perhaps easier than others to manage when scaling. The ability to implement components or functionality using different languages makes it a very versatile server solution that can handle many types of use cases.

Lighttpd

Lighttpd, pronounced Lighty, is a web server that was designed to be very lightweight and to handle parallel connections on a single web server[24]. It is a very secure and flexible web server that has a very low memory footprint. It is similar to Nginx but is not considered to be as stable and easy to use; it does, however, feature a full-blown bug tracking system[63], which is of great use during development.

Node.js

Node.js, or Node, is a system for deploying fast and scalable network applications written in JavaScript. Node uses event-driven, asynchronous I/O to handle communication, and the system can be used to create a web server written in JavaScript. Node has a fairly large number of modules that can provide additional functionality beyond the Node core, such as implementations of networking techniques, authentication and testing[31]. Event-driven asynchronous I/O generates less overhead than an OS-threaded solution, and as Node is not threaded it has no possibility of deadlocking. This makes Node more viable for the development of scalable web solutions, as developers do not have to spend as much time managing a threaded environment[48]. Applications for Node are written in JavaScript, which means that a web client and a web server can use the same language. This makes it easier to build web applications, as there is no need for translation between two languages, and it is also much easier to write test cases incorporating both sides. JavaScript is a dynamic language, and code can be compiled into it from other languages such as C or Java.

Vert.x

Vert.x is a Java-based application framework which, similar to Node.js, uses event-driven I/O and an asynchronous programming model to provide concurrent applications[55]. Vert.x runs on a Java Virtual Machine (JVM) and supports a fair number of programming languages, such as Ruby, Java, Groovy, JavaScript and Python. One difference from Node.js is that Vert.x can use multiple threads in a single Vert.x instance and does not need to add more instances to fully saturate a multicore server. Vert.x also natively supports different communication patterns, such as the publish/subscribe pattern.

EventMachine

EventMachine is a software system for the Ruby programming language that provides event-driven I/O.
It is designed for concurrent programming where scalability and performance are of concern. EventMachine has been around for many years and is a mature and well-tested system compared to Node.js and Vert.x, which have been around for much less time[12]. Applications for EventMachine are written in Ruby, which makes it a less language-agnostic system than, for example, Vert.x, and web solutions need to be written using different languages on the front end and the back end. EventMachine is also restricted to using EventMachine libraries and cannot ensure that other types of Ruby functionality work.

Twisted

Twisted is a programming framework for event-driven networking written in Python and supports many different protocols, such as TCP, UDP, SSL/TLS, IP, HTTP, CMPP, SSH, IRC and FTP[54]. Twisted is, like EventMachine, a more mature solution than Node.js and Vert.x and has a lot of additional functionality. Like the Tornado web server, Twisted cannot directly use just any Python library, because of the asynchronous nature of Twisted and the synchronous nature of most regular Python libraries. Twisted might have a steeper learning curve than, for example, Node.js because of the additional functionality it supports, and it perhaps needs more code and time to set up an application with capabilities equivalent to an application for Node.js.

Summary

When looking at event-driven web servers for asynchronous communication, there are some properties that are used to motivate the choice of a specific web server. The programming language that a web server supports is naturally of importance, since a certain use case may better fit a specific language based on factors such as the functionality of the use case and the experience of the developer. Supporting multiple languages minimizes the risk of a developer choosing another web server because of the language. Node.js has gained a fair amount of popularity lately because of the use of JavaScript on the server side, which enables developers to create web sites that use the same, widely known, language on both sides.

Two aspects that are commonly used as low-level performance indicators of a web server are the memory footprint and the message throughput when scaling the number of connected clients. The memory footprint refers to the amount of memory a web server needs to handle a certain number of clients; using an event-driven web server, this can be kept very small. Message throughput refers to the number of messages a web server can send and receive within a specific time frame for a specific number of clients. Both these aspects relate to the overhead of managing multiple connected clients, which is one of the main attractions of event-driven servers. They are important, as one of the prerequisites of the targeted solution is to be able to scale well and handle many clients concurrently.

The number of supported libraries and modules for a web server is important, as they may help abstract parts of the solution so that the developer can focus on the core functionality. Support for general and well-known libraries is also a factor, as previous experience can then be used.
A prevalent problem with the assessed web servers is the disparity between the asynchronous framework of the web server and the common libraries in the language which the web server uses. Most of these libraries are not made for asynchronous tasks, and it may be difficult to assess whether they are reliable for use with a specific web server. One way the listed web servers differ is in their take on multi-threading with regard to scalability and manageability. Node, for example, uses only one thread to handle client connections and instead pushes the responsibility of making full use of a processor to the developer, where the idea is to use multiple instances of Node. This makes a Node instance much easier to implement, since developers do not have to worry about multi-threading[48], while the scalability of Node on a single processing unit needs more work outside of the Node instance implementation.

Web Application Libraries

One important aspect when creating a messaging system is the use of abstraction in the networking layer of an implementation. Incorporating a library to use as a higher level interface for communication between a client and a server makes it easier for the developer to focus on the core functionality. Socket.IO and SockJS are two JavaScript libraries that provide the use of multiple communication techniques using fallbacks, where a second technique can be used if a first one fails. Socket.IO can be used as a wrapper for WebSocket with fallbacks to other techniques, such as Flash and Ajax long polling. SockJS core does not support Flash but provides different streaming protocols instead. Atmosphere is a Java framework that also contains components for creating asynchronous web applications and has support for WebSocket, Server-Sent Events, long polling, HTTP streaming and JSONP[2]. One of the benefits of using JavaScript on the server side is that the same language can be used on both sides.
Socket.IO contains two parts: a web-server application that is supposed to run on Node and a client-side API that can be run in a web browser. This

makes it possible to use the same solution on both sides, which minimizes the overhead of using two different programming environments. SockJS tries to make its API as similar to the regular WebSocket API as possible, so that there is no need to learn and conform to another API. The JavaScript version of SockJS is made for Node, but implementations currently exist for other languages and server solutions such as Tornado, Netty and Twisted. Socket.IO also has a version for Tornado. Providing an API for multiple server solutions makes it easier for developers to incorporate SockJS functionality using a web server and a language they are familiar with.

The Channel API is an API for Google App Engine (GAE) that enables messaging capabilities between applications written for Google App Engine and web clients[58]. Google App Engine is a cloud computing platform where applications can be deployed and served across multiple servers without the need to manually supervise the infrastructure of the application, as components such as web servers are provided and managed by GAE using virtualization. The Channel API is a way to marshal the different communication techniques and instead provide an API that is easier to use when implementing messaging functionality such as server-to-client publishing.

3.2 Internal Messaging

The web service that is studied in this thesis is supposed to serve a very large number of concurrent connections and must keep the client-side latency to a minimum. One important aspect of a web service that should both scale and perform well is the internal synchronisation and communication between back-end components. It is therefore very important to choose a messaging solution that properly fits the targeted solution.
This section describes some different messaging paradigms to get a better understanding of how they work and when they should be applied.

Messaging Paradigms

The messaging paradigms that are described are remote procedure call (RPC), distributed shared memory (DSM), message queueing and publish-subscribe. These paradigms differ in many ways and have different uses, and this study tries to assess when and how they are applied. The entities in a messaging system are usually called the message sender, the message recipient (receiver) and the messenger (broker). A messenger, or broker, is an intermediate component that makes sure that a message is routed from a sender to a recipient.

Remote Procedure Call (RPC)

Remote procedure call, or RPC, refers to when a program invokes a procedure in another program environment, commonly over a network[3]. This is usually implemented in such a way that there is no large difference between coding local functionality and issuing remote invocations. RPC tries to make remote processing transparent by treating a distributed system as a single process. RPC is used to minimize the impedance of accessing local versus remote functionality and to reduce the complexity of using a full-fledged messaging system and interface. Using RPC is problematic when addressing the difficulties of distributed messaging, as remote procedure calls face different problems than local calls, such as partial failures and different memory access models in the communicating parties. One of the disadvantages

of RPC is that there currently exists no general solution that has proven RPC fitting for large scale communication over a wide area.

Distributed Shared Memory (DSM)

Distributed shared memory, or DSM, refers to the solution where a virtual address space is shared among loosely coupled processes, often over a network[30]. A process views the shared memory as regular local memory, which moves the issue of scaling a distributed solution to mapping a memory access request to a space in shared memory. This method gives a very simple abstraction, as implementations using a distributed shared memory do not have to worry about moving data. This also simplifies process migration, as different machines all still use the same memory. DSM provides time and space decoupling, as processes using a DSM do not know about each other, but it does not provide synchronisation decoupling. A DSM is generally not used for typical client-server models, where resources need to be viewed and accessed in an abstract manner, because of security and modularity issues[9].

Message Queueing

Message queueing enables buffering of a sent message for later retrieval and decouples the sender and the receiver with respect to space and time[13]. This is commonly done by introducing a messenger between a sender and a receiver that first enqueues sent messages and then delivers them when the receiver expresses its interest in them. This paradigm often provides transactional, timing and ordering guarantees for the messages that are sent. It is used to provide greater resilience between communicating processes and is an appropriate paradigm where there exists a discrepancy between the rate at which messages are sent to a process and the rate at which the process can receive and handle them. Message queues do not provide synchronisation decoupling, as consumers synchronously pull messages from the queue.
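The queueing paradigm can be sketched with Python's standard library: the queue acts as the messenger, buffering messages so that a faster producer and a slower consumer never interact directly, while the consumer still pulls synchronously. The producer/consumer names and message contents are illustrative only.

```python
import queue
import threading

broker = queue.Queue()  # the messenger, buffering sent messages

def producer():
    # The sender enqueues and moves on; it never waits for the
    # receiver (time and space decoupling).
    for i in range(5):
        broker.put(f"msg-{i}")

def consumer(out):
    for _ in range(5):
        # Synchronous pull: the consumer blocks until a message
        # is available (no synchronisation decoupling).
        out.append(broker.get())

received = []
t = threading.Thread(target=consumer, args=(received,))
t.start()
producer()
t.join()
```

With a single producer and consumer the queue also preserves ordering, one of the delivery guarantees mentioned above.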
Publish-subscribe

The publish-subscribe pattern decouples the communicating processes with regard to space, time and synchronisation[13]. Receiving processes register their interest in some kind of content and sending processes declare their interest in sending some kind of content. The implemented publish-subscribe system is then responsible for routing sent messages to the processes that have expressed interest in the content that the message consists of. This paradigm is very similar to message queueing, as they decouple the processes in the same manner. The publish-subscribe pattern is more appropriate when there exists a need for more intricate message routing than a one-way communication path. This paradigm also enables higher scalability, as it can make better use of the underlying network architecture and process topology.

Qualities of Service (QoS)

This section addresses three qualities of service that may be of importance when implementing a messaging system: persistence, reliability and availability.

Persistence

Persistence is a quality of service that is desirable in a messaging system when the sender and receiver are decoupled with respect to time and there exists a situation where the sender cannot

directly be guaranteed that the message has been delivered to the receiver[13]. Persistence often means storing messages on a hard drive or flash memory, so that messages are not lost between the sender and the receiver if, for example, a messenger were to crash. This quality of service may degrade the performance of a system, since messages cannot be stored only in, for example, RAM but must also be put on some persistent, often much slower, storage. This is of concern for a system that should perform well while scaling, as distributing more messengers may make the system more complex when trying to guarantee the delivery of a message along a path in a larger messenger architecture.

Reliability

Reliability is another quality of service, related to guaranteeing the delivery of a message[13]. Reliability addresses end-to-end delivery guarantees with different properties, such as using reliable protocols, persistent messages and message ordering. This is an important quality of a messaging service when using a more loosely coupled messaging system and the end-to-end participants are not using point-to-point communication. One problem with distributed systems is the fault tolerance of the distributed architecture and ensuring that messages are delivered in case of a messenger crash.

Availability

Availability refers to the portion of a time interval during which a service can ensure its operability. Different systems may have different definitions of operability; for example, in a multi-server architecture, the system may still be considered operational as long as one server is still functioning. This is not as much a messaging aspect as it is a more general quality of service for distributed systems[13]. When implementing a solution that gives the internal messaging a central role, it is important that the messaging system is robust enough to guarantee a functional state for as long as possible.
An example of increasing a messaging system's robustness and availability is to replicate messengers so that one messenger can pick up a delivery should another crash or somehow fail. This comes at the expense of increased resource management and cost, as well as possible degradation of performance. Messaging Systems The following parts of this section describe a couple of messaging systems that, as of today, have fairly large support and use. This section tries to highlight the main use cases of the messaging systems as well as describe in what ways they are used and how they differ from each other. Kafka Kafka is a distributed publish-subscribe messaging system that is written in Scala and was made for managing activity stream processing on a website[22]. It is an explicitly distributed broker system that was developed for LinkedIn and is currently an open-source project by Apache. Kafka is designed for persistent messages and focuses on throughput instead of features. It offers a JVM-based client and a primitive Python client but supports any language that makes use of standard socket I/O. Kafka can use different data structures, some of which support transactional semantics, though this can reduce performance because of the disk operations Kafka uses for persistence.
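The log-centred design described above can be illustrated with a small toy model: a topic is an append-only log, and each consumer pulls messages from an offset it tracks itself. This is a sketch of the idea only; the real Kafka broker and client APIs differ.

```python
# Toy model of Kafka's core design: an append-only log per topic,
# where each consumer tracks its own read offset. Illustration only;
# the real Kafka client and broker APIs are different.

class TopicLog:
    def __init__(self):
        self.messages = []          # append-only message log

    def publish(self, message):
        self.messages.append(message)
        return len(self.messages) - 1   # offset of the stored message

    def read(self, offset, max_messages=10):
        # Consumers pull messages from a given offset onwards.
        return self.messages[offset:offset + max_messages]

log = TopicLog()
log.publish("vote:A")
log.publish("vote:B")

# Two independent consumers can read from their own offsets.
batch = log.read(0)   # -> ["vote:A", "vote:B"]
later = log.read(1)   # -> ["vote:B"]
```

Because consumers manage their own offsets, the broker-side state stays small, which is one reason the design scales out well.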

Kafka is a fairly recent messaging system which does not use AMQP 4 and does not offer as much functionality or complex routing because of its focus on throughput instead of features. Kafka has a low overhead in network transfer and on-disk storage which gives it good performance[36]. It uses a very lightweight API that is simple to use, and together with its distributed architecture it performs very well when scaling out. As Kafka uses its own non-AMQP protocol it is more difficult to replace. RabbitMQ RabbitMQ is a widely used open-source messaging system that implements a broker architecture and uses AMQP[35]. It supports HTTP, STOMP and MQTT and provides client libraries in many different languages such as Java, Python and Erlang. The RabbitMQ brokers are written in Erlang and use it to pass messages around. A RabbitMQ messaging solution can be distributed in three different ways: clustering, federation and shovel. Clustering refers to when multiple brokers are grouped and act as a single logical broker. Federation is when brokers connect to each other, typically over the Internet, to exchange and share messages so that data is mirrored between them. Shovel refers to when brokers are connected but, instead of mirroring their data, forward it to others. RabbitMQ contains solutions for high availability and fail-over using many different techniques such as clustering and mirroring data. RabbitMQ is one of the most used AMQP messaging systems and has been around for some time. Because of this, RabbitMQ has a lot of libraries, tools and documentation which makes it a good choice when looking for a reliable messaging system. RabbitMQ is not as focused on performance as Kafka and instead provides additional functionality that Kafka does not. It features more routing capabilities and is highly configurable. Qpid Qpid is an open-source messaging system from Apache that, similar to RabbitMQ, implements AMQP[34].
It provides brokers in Java and C++ and client libraries in many languages such as Java, C++, Python and Ruby. Qpid contains many features for managing transactions, queueing messages and distributing brokers. It is very configurable and uses a pluggable layer to provide additional functionality such as persistence and automatic client fail-over. Qpid does not feature as much functionality for a distributed solution as Kafka and RabbitMQ, and clustering is only used to mirror all messages between brokers. Qpid is, similar to RabbitMQ, highly configurable and uses AMQP which makes it a good choice when replacing some other messaging system. Qpid also provides a JMS API, which may make it easier to adopt for systems already using JMS. 4 Advanced Message Queuing Protocol, an open standard application layer protocol for message-oriented middleware. ActiveMQ ActiveMQ[43] is an open-source message broker from Apache that implements JMS. ActiveMQ supports many languages such as Java, C/C++, Perl, Python and Ruby and a wide variety of protocols including AMQP. ActiveMQ supports a lot of clustering options such as a shared database, a shared file system or a master-slave setup. It also supports

four different message stores that can be used for message persistence. ActiveMQ can be deployed using peer-to-peer topologies instead of a broker architecture. ActiveMQ is highly configurable and contains a lot of additional functionality for setting up any type of messaging system. One disadvantage of ActiveMQ is its performance compared to other systems: it has been shown to be slower than other systems, and its performance when message stores are used is very low. Scaling an ActiveMQ broker system has also shown some inconsistencies in terms of performance and stability. 3.3 Data Processing This section describes two data processing models, namely batch data processing and stream data processing. Instead of taking a general approach to these models, this section focuses on one implementation of each model, named Hadoop and Storm. These two implementations are currently the most popular in their problem domains and this section addresses how they work and are used. Batch Data Processing Batch data processing refers to the processing of very large collections of data that are mostly already gathered at the start of the procedure[44]. It uses a cluster of worker nodes where the work is divided between worker nodes that cooperate to compute a result. This is typically used when there exists a large amount of data that is to be analyzed and reduced to some quantifiable result, such as statistics from a database or a log file. MapReduce[8] is a programming model for batch data processing that uses a master node and a number of worker nodes. It divides the procedure into three steps: the master maps input to sub-problems and assigns them to specific worker nodes, the worker nodes compute answers for each sub-problem, and the master then collects and combines the answers into a result.
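The three steps above can be sketched as a minimal, single-process word count. Note that this collapses the distributed aspects into plain in-memory functions; a real MapReduce system runs the map and reduce phases on separate worker nodes.

```python
# Minimal single-process sketch of the MapReduce steps described above:
# map input to key/value pairs, group values by key, then reduce each
# group to a result. Distribution across workers is omitted.
from collections import defaultdict

def map_phase(documents):
    # Each "worker" emits (word, 1) pairs for its part of the input.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Group values by key, then reduce each group with a sum.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["a b a", "b c"]
result = reduce_phase(map_phase(docs))   # -> {'a': 2, 'b': 2, 'c': 1}
```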
The first and the third step can also be distributed as long as the different input data and the different answers from the worker nodes have an appropriate level of independence. MapReduce is also the name of an implementation of this type of batch processing made by Google. Hadoop Hadoop[18] is a software framework that enables distributed processing by using a programming model that was derived from Google's MapReduce. It is open-source and licensed under the Apache v2 license. It is made to provide a data-intensive processing solution that can be run on commodity hardware, and instead of relying on hardware it focuses on delivering a high-availability system that can detect and handle failures at the application level. To be able to handle large files with sequential read/write operations, Hadoop makes use of its own distributed file system called the Hadoop Distributed File System (HDFS), derived from the Google File System (GFS). HDFS splits a file into multiple chunks and distributes them over multiple data nodes[4]. A master node is used to keep track of the files that have been distributed over the data nodes and can tell where a certain chunk of a file is located. The master node also supervises writes to a file on the HDFS. This is done by pushing the changes from the client to all relevant data nodes and then using the master node to designate one of the data nodes as primary and send a commit to this node. The primary node decides on an update order, synchronises and updates together with all other data nodes, and then sends a commit response back to the client.
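The chunk bookkeeping described above can be modelled as a toy sketch: a master splits a file into fixed-size chunks and records which data node holds each chunk. All names, the chunk size and the round-robin placement are illustrative; real HDFS uses much larger blocks and replicates each chunk.

```python
# Toy model of HDFS-style chunk tracking: the master splits a file into
# fixed-size chunks and remembers which data node stores each chunk.
# Names and placement policy here are illustrative only.

CHUNK_SIZE = 4  # bytes per chunk; real HDFS blocks are 64 MB or larger

class MasterNode:
    def __init__(self, data_nodes):
        self.data_nodes = data_nodes
        self.chunk_map = {}     # (filename, chunk index) -> (node, data)

    def store(self, filename, data):
        chunks = [data[i:i + CHUNK_SIZE]
                  for i in range(0, len(data), CHUNK_SIZE)]
        for index, chunk in enumerate(chunks):
            # Round-robin placement instead of real replication logic.
            node = self.data_nodes[index % len(self.data_nodes)]
            self.chunk_map[(filename, index)] = (node, chunk)

    def locate(self, filename, index):
        # Answer the question "where is chunk N of this file?"
        return self.chunk_map[(filename, index)][0]

master = MasterNode(["node-1", "node-2"])
master.store("log.txt", "abcdefgh")
master.locate("log.txt", 1)   # -> "node-2"
```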

Hadoop uses a job tracker to submit processing jobs[60], which takes a job configuration that describes a map, a combine and a reduce function together with the way input and output should be managed. The job tracker delegates the work in the map phase to the worker nodes, which are also called task trackers. Each task tracker is responsible for extracting and applying the map function on its own part of the input. Applying the map function on the input results in a number of key/value pairs which are stored in a memory buffer and periodically sorted using the combine function to group the resulting data with respect to the corresponding task trackers that will perform the reduce operation. When a task tracker is done it notifies the job tracker, and when all are done the job tracker starts the reduce phase. A task tracker in the reduce phase then applies the reduce function on its reduce data and outputs the results. The job tracker periodically checks the health of the task trackers to see, for example, if a tracker has crashed, in which case the job tracker will issue the map or reduce work to another task tracker. Stream Data Processing Stream data processing refers to data-intensive processing where, in contrast to batch data processing, not all data is known at the beginning of the procedure. This model focuses on use cases where data is continuously being generated, and instead of computing over a large set of static data, data is computed as it is introduced to the system[27]. This can be seen as a complement to batch data processing, as stream data processing focuses on small batches of rapid data that must be processed in real-time whereas batch processing focuses on a set of data as a whole and typically expects some delay between reading input and returning a result. Storm Storm is an open-source software for distributed real-time data processing[46].
It focuses on computation over smaller data sets that are continuously introduced to the system and can be used for real-time analytics and online machine learning, among other things. Its inherent parallelism gives it very high performance in terms of message throughput and it has been benchmarked at processing one million 100-byte messages per second per node on two Intel E5645 processors at 2.4 GHz with 24 GB of memory[47]. In contrast to Hadoop, Storm is not expected to terminate and is instead meant to be up and running for as long as it is allowed to. Storm uses a topology of spouts and bolts in a directed acyclic graph, where spouts are processing nodes that introduce data and bolts are the execution points in the system[50]. The spouts are the sources from which data is either generated or fetched from somewhere else. A spout can be connected to any number of bolts to which it can send its data. Bolts are the computational units that can receive data from any number of spouts and bolts and transmit data on to any number of bolts. In addition to defining the spouts and bolts that are needed for a system, a topology also needs to be specified that describes how the spouts and bolts should be connected. Thrift[52] is a software framework for scalable services development that supports a wide range of programming languages. It is used for data serialisation with a focus on efficiency and seamless use between languages. Storm uses Thrift to define topologies and a JSON-based protocol over stdin/stdout for the communication between spouts and bolts. This makes Storm easier to use as its functionality can be implemented in many different languages. Storm consists of a master node that runs a daemon called Nimbus and a number of worker nodes where each worker node runs a daemon called a supervisor. Nimbus has responsibilities similar to Hadoop's job tracker, where tasks are distributed among worker

nodes and the health of worker nodes is monitored[21]. A supervisor waits for tasks to be assigned by Nimbus and is responsible for managing the compute processes on the node. The interaction between Nimbus and the supervisors of Storm is done through Zookeeper, see section 3.4. This separates the coordination of the Storm cluster from the Storm functionality and creates a more distinct definition of what Storm exclusively does. 3.4 Distributed Coordination with Zookeeper This section focuses on an intermediate service that has been prevalent throughout the whole assessment for this chapter, namely distributed coordination with Zookeeper. This section tries to describe the fundamentals of using Zookeeper in a distributed system. Zookeeper is a centralised coordination service for distributed solutions that is intended for high performance[16]. It tries to take over the responsibilities of distributing an application from the application itself to improve manageability and make an application easier to implement. It provides services for maintaining configuration information, synchronising distributed processes and managing groups in a distributed application. It enables consensus, leader election, synchronisation and a naming registry among many other distributed functionalities. Zookeeper was initially made for Hadoop before it was turned into a stand-alone project by Apache. Hadoop used Zookeeper to coordinate its nodes so that distribution functionality could be offloaded from a Hadoop application to Zookeeper. Zookeeper is used by Storm[21] to coordinate the master and the worker nodes to make it easier to deploy and manage a Storm cluster. It is also used by Kafka to manage a cluster of brokers. Zookeeper uses a shared hierarchical name space of data registers called znodes to coordinate distributed processes[62].
These znodes are provided with high throughput, low latency, high availability and strictly ordered access. The name space is very similar to a regular file system where every znode is identified by a sequence of path elements separated by a slash (/). This file system was made to store small pieces of coordination information and is not meant to be used as a large data store. The data tree is also stored in-memory, which further limits the service to small data sets. The service is replicated over a set of machines and a client may connect to any of these replicas. One of the replicas is assigned as a master which controls the way data can be written to the file system so that writes are guaranteed to be persisted in order. It is also responsible for updating the other replicas with the new data. Reads are concurrent and, as any replica can be used by a client, reads are eventually consistent. This makes Zookeeper less appropriate when facing a very write-heavy workload as guaranteeing ordered writes may take some time.
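The znode name space described above can be pictured with a toy in-memory model: small data registers addressed by slash-separated paths, with children listed per parent. This only models the hierarchy; the real service adds replication, watches and the ordering guarantees discussed above.

```python
# Toy model of Zookeeper's hierarchical znode name space: small data
# registers addressed by slash-separated paths. Replication, watches
# and ordering guarantees are not modelled here.

class ZnodeTree:
    def __init__(self):
        self.nodes = {"/": b""}     # path -> data register

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent znode does not exist: " + parent)
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def children(self, path):
        # Direct children only: one more path element below `path`.
        prefix = path.rstrip("/") + "/"
        return [p for p in self.nodes
                if p.startswith(prefix) and "/" not in p[len(prefix):]]

tree = ZnodeTree()
tree.create("/brokers")
tree.create("/brokers/0", b"host-a:9092")
tree.children("/brokers")   # -> ["/brokers/0"]
```

A coordination client such as Kafka or Storm uses paths like these to register brokers or workers and to discover the current members of a cluster.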

Chapter 4 Evaluation of Reference System This chapter describes an evaluation of the current solution that this thesis aims to improve. It tries to explain how the provided service is used in section 4.1 so that later suggested improvements can be more easily understood. Section 4.2 contains an overview of the solution including how the system components are implemented and how they interact with each other. Section 4.3 includes a discussion about the motivation and drawbacks of the different components focused on the domains from chapter 3. The last section lists identified problems or issues revolving around the immediate goal of this thesis, namely to improve the solution with regards to making it a more distinct product. 4.1 Use Case The system is currently used to provide voting functionality on a website, either for rating subjects or for answering categorical questions. The functionality is provided during events where the customer wants to ask questions related to the event. There are two ways a user may interact with the system: either as a regular client that can vote on questions and view results, or as an administrator that can create and supervise questions. These two types of interactions are completely separate from each other on the client-side and their functionality is accessed through completely different web pages. The back end is responsible for receiving votes, processing them and returning a result back to the clients. This is done using an asynchronous connection between the web client on the front end and the web server on the back end. The admin interface uses the back end in a RESTful[23] way to update questions and to control which one is active. A client gets exposed to the service by visiting a web page that has incorporated the front end of the service into its context, which is currently done using an Iframe, see Figure 4.1.
When a question is started, the back end notifies the front end by sending a message, whereupon the front end updates its content so that the client can vote. When a client votes, the front end sends the vote to the back end, which processes votes and periodically sends the current result back to all clients. The result basically consists of the current rating for a rating question or the number of votes for each choice for a categorical question. The way an administrator gets access to the system is also via a web page, but the content is provided directly as a standalone page and not in an Iframe.
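The two result types described above can be sketched as small aggregators: a running mean for rating questions and per-choice counts for categorical questions. The class and message shapes are illustrative, not the system's actual data model.

```python
# Sketch of the two result types described above: a running mean for
# rating questions and per-choice counts for categorical questions.
# The classes and message shapes are illustrative assumptions.

class RatingQuestion:
    def __init__(self):
        self.total = 0
        self.count = 0

    def vote(self, rating):
        self.total += rating
        self.count += 1

    def result(self):
        # Current mean rating, or None before any vote has arrived.
        return self.total / self.count if self.count else None

class CategoricalQuestion:
    def __init__(self, choices):
        self.counts = {choice: 0 for choice in choices}

    def vote(self, choice):
        self.counts[choice] += 1

    def result(self):
        return dict(self.counts)

rating = RatingQuestion()
rating.vote(4)
rating.vote(2)
rating.result()        # -> 3.0

poll = CategoricalQuestion(["yes", "no"])
poll.vote("yes")
poll.vote("yes")
poll.vote("no")
poll.result()          # -> {"yes": 2, "no": 1}
```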

Figure 4.1: A representation of how a client currently is exposed to the service. It is exposed to the service on a web page where the static content, such as HTML and CSS, is downloaded from a static file store. This static content also contains logic for communicating with the web server that contains the voting functionality. 4.2 System Overview This section describes the overall architecture of the current solution and includes a list of addressed components describing what they do, how they are implemented and how they interact with each other. Front End The front end consists of a web page containing both regular web content as well as functionality to support asynchronous communication with the back end. It can either consist of a page providing the regular use case for the service, i.e. voting functionality, or of a page providing administrative functionality. The page for the regular use case is incorporated into another service's web page using an Iframe. The front end communicates with the back end using the API from Socket.IO and JavaScript. Messages are sent as plain text using JavaScript Object Notation (JSON). Load Balancer When a client first connects, it speaks with a load balancer that keeps track of all the running instances of the web servers and routes the client to one of them. The load balancer only routes the client when it is first connecting; after the client has been assigned a web server, the client speaks directly to the web server. The load balancer makes sure that a previously connected client always gets the same web server to talk to when it is reconnecting to the

service. This is done using the load balancer HAProxy on an AWS Elastic Compute Cloud (EC2) instance. Figure 4.2: Overarching components in the reference solution. A web client fetches the static content from a static file store, which also contains the logic to communicate with the web server that contains the voting functionality. A web client connects to a web server using a load balancer. Any votes are put on a processing queue so that votes are buffered to a data processing component that calculates a result and returns it using a notification system. The service also uses a database to store the global state of all events. Web Server The web server consists of a cluster of AWS EC2 instances that are running Tornadio2, which is a Python implementation of Socket.IO on top of the Tornado web server framework. Socket.IO is used to provide many ways of communicating asynchronously with a client, with

fallbacks when one does not work because of client limitations, such as web browser support for the current techniques. The server instances are horizontally scaled up or down using AWS Auto Scaling depending on usage statistics. Static File Store The service uses a file store to host the static web content that is fetched by connecting clients. The file store functions as a regular web server for static web pages and a client fetches the page directly from the file store using point-to-point file transfer without talking to the web servers of the back end. The currently used file store is the key/value store Simple Storage Service, or S3, from AWS which can also function as a web server. Client Validation Cache The web servers use a remote cache to store client validation information. AWS ElastiCache is used to store this data outside of a specific web server. When a client connects to a web server, the server first checks with the validation cache to see if there is already some data on that client. Data Processing Queue When a web server receives a vote from a client it puts the message on a queue that connects to the data processing system. This is used to buffer the messages that are sent to the processing system and to decouple the web servers and the processing system in space and time. The current system uses AWS Simple Queue Service, or SQS, to provide this functionality. Data Processing The processing of messages that are sent in from the clients is offloaded to a stream data processing system called Storm. This component receives votes from the processing queue, computes a result and sends it to all of the web servers using a notification system. Storm is also responsible for storing the result and other statistics in a database. The results that are computed are typically the mean value for a rating question or the vote count for a categorical question. Storm is deployed as a cluster on a number of EC2 instances.
Storm uses a spout for the interaction with the Simple Queue Service (SQS), which listens for and receives messages from the queue. It uses a bolt for the Simple Notification Service (SNS) to send results to the notification system and a bolt for the Relational Database Service (RDS) to update the database. Notification System The notification system is responsible for relaying messages between the web servers and the data processing component. When Storm computes a new result it transmits the result to all web servers using the notification system. The notification system is also used by the administrators of the service to relay, to the other web servers that the administrator is not connected to, when a question has been started or stopped. AWS Simple Notification Service, or SNS, is used as the notification system.
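The spout-and-bolt structure described above can be pictured with a simplified in-process sketch: a queue spout feeds a counting bolt, which feeds a notification bolt standing in for the SNS bolt. This mimics the wiring only; the real Storm API (and the SQS/SNS integrations) are different.

```python
# Simplified sketch of the topology described above: a queue spout
# feeds a counting bolt, which feeds a notification bolt. The class
# names mimic the spout/bolt structure only; the real Storm API and
# the AWS integrations are different.

class QueueSpout:
    """Emits messages pulled from a queue (here, a plain list)."""
    def __init__(self, queue, targets):
        self.queue, self.targets = queue, targets

    def run(self):
        for message in self.queue:
            for bolt in self.targets:
                bolt.process(message)

class CountBolt:
    """Counts votes per choice and forwards the running result."""
    def __init__(self, targets):
        self.counts, self.targets = {}, targets

    def process(self, vote):
        self.counts[vote] = self.counts.get(vote, 0) + 1
        for bolt in self.targets:
            bolt.process(dict(self.counts))

class NotifyBolt:
    """Stands in for the SNS bolt: records the latest result."""
    def __init__(self):
        self.latest = None

    def process(self, result):
        self.latest = result

notify = NotifyBolt()
spout = QueueSpout(["a", "b", "a"], [CountBolt([notify])])
spout.run()
notify.latest   # -> {"a": 2, "b": 1}
```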

Database The reference system uses a database to store the current state as well as log statistics and other data. When a web server is added to the back end cluster, the web server fetches the current state from the database and transmits it to the client. The current solution uses AWS Relational Database Service, or RDS, as the service's database. 4.3 Discussion This section focuses on the reference project in the context of the previously assessed domains: the web communication between front end and back end, the internal messaging between back-end components and the data processing on the back end. It includes a discussion for each domain of how the current solution handles the problems relevant to the domain. Web Communication The service is using the Socket.IO API on both the front end and the back end to communicate between them. Using a higher-level interface makes it easier to develop the service as abstracting non-core functionality lets developers focus on the use-case-specific parts. Socket.IO abstracts the specific communication protocols and also supports fallbacks to other protocols when one fails. This is a fast way of enabling support for as many different types of clients as possible. Using the same type of API on both sides minimizes the overhead in time needed to create two frameworks using different APIs. As for using Python and the Tornado web server on the server-side, it has proven to be a very fast solution[33][37]. One drawback is that the two frameworks are not implemented in the same language, which may heighten the impedance as code cannot be reused between the two frameworks and the developer needs to be familiar with two languages instead of one. The load balancer uses HAProxy on an AWS EC2 instance instead of using AWS's own Elastic Load Balancing (ELB).
This is done because ELB does not currently support the Websocket protocol with sticky sessions, or server affinity, where a client is routed to the same web server it was previously connected to. The downside is that HAProxy and EC2 require more maintenance and do not provide as many additional services, such as auto scaling. ELB would also work well with other components in the system as many are also AWS services. Making clients only talk to the load balancer at the beginning of a connection and then speak directly to the web server is good from a performance point of view as it eliminates another step between the end points of the communication. The file store that serves the static web content is also only used in the beginning of a client's connection with the service. This means that the file transfers do not go through the web server, which can focus on the exchange of messages after the initial part of the connection. This solution does not take any bandwidth or performance from the web servers, which makes the load balancing and auto scaling elements simpler to implement. As the static content is only downloaded in the beginning, before exchanging messages, the performance for retrieving static content is less of an issue. Using AWS S3 makes it easy to scale the storage needed and keeps the maintenance down to a minimum.
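The sticky-session routing discussed above could, for example, be expressed with cookie-based server affinity in an HAProxy configuration along these lines. This is a hypothetical fragment for illustration; the backend name, server names and addresses are made up, and the reference system's actual configuration may differ.

```
# Hypothetical HAProxy fragment illustrating cookie-based sticky
# sessions: each client is pinned to the web server it first reached.
backend websockets
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server web1 10.0.0.1:8000 check cookie web1
    server web2 10.0.0.2:8000 check cookie web2
```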

Internal Messaging The internal messaging in the reference solution is done using two AWS services, namely SQS and SNS. These two services are used to decouple the web servers and the processing system in space and time. This makes components more independent from each other and easier to replace as well as to update. Using a queue also offloads responsibility for messaging functionality from the two components and creates components with more distinct logical purposes. SQS and SNS are also used for buffering messages between components so that not all components fall behind when one does. Using a queue for the incoming messages to the processing system makes sense since it may be viewed as a contention point, as all (or the majority of) messages received by all web servers are sent to the processing system. All messages that are sent to the processing system are generally of the same type so there is no need for any more advanced messaging paradigm. No messages need any advanced routing and the processing system does not need to express different types of data it is interested in. SQS needs less maintenance and can scale automatically. It also provides some security and reliable forwarding, where SQS guarantees that a message will be delivered to at least one recipient. One of the disadvantages of SQS is that it is slower than most of the non-cloud messaging systems and can also be more expensive than a messaging implementation placed on EC2[61]. The current notification system can be used to broadcast messages on a specific topic and listen for notifications on a specific topic. SNS has the same benefits as any Amazon Web Service and provides a simple publish-subscribe implementation with built-in scalability and availability.
SNS features a way to be instantly notified when a message of interest is being broadcast, which means that participants can listen for notifications without needing to poll the messaging system. This reduces overhead in the listening participants as well as network contention. Data Processing The processing of votes is handled using stream data processing, where messages or votes are processed as they come in to the system. This suits the use case well since votes are not all made at the same time and are instead asynchronously sent to the service with no respect to each other in time. A batch data processing model may not be appropriate since results can be updated based on the new data and the previous results, and there is no need to look at the data as a whole. A batch data processing system may be more suitable for analyzing the statistics that are put in the database by the web servers and the stream data processing system. Storm is run as a cluster on a number of AWS EC2 instances, which makes further use of the AWS cloud on which the overall system is deployed. It uses Zookeeper as a distributed coordinator to synchronize the data between the Nimbus and supervisor nodes. Offloading the coordination onto another system further separates the core functionality of the processing system from miscellaneous responsibilities. Storm guarantees that every message introduced to a topology of the system will be processed, even if a machine goes down and the messages it was processing get dropped. This, together with Zookeeper, makes Storm a very reliable distributed solution for stream data processing.

4.4 Assessment of Issues This section addresses implementation issues of the service, focusing on the goal of making a more complete service which Dohi Sweden may provide as a stand-alone product. Inseparable Use Case The main concern for this thesis is that the current solution was built for a specific use case and was not meant for any other type of functionality. This limits the service greatly in terms of extendability and modification as there is no distinct separation between the use case and the underlying system. It also makes it difficult to present what the service can offer apart from the already existing use case, since other types of functionality need to be developed with the entire system, and consequently the current use case, in mind. The service basically provides two types of questions, rating questions and categorical questions, and if the client wants any other type of question or functionality, it needs to be implemented by the developers at Dohi Sweden. All components are tightly coupled to the current use case; the web server, for example, handles only messages related to the current question types. This is also true for the data processing component that only processes votes for these two question types. The data model also only stores data specific to the voting functionality. Functionality Not Defined When trying to improve an implementation so that the core can be separated from the use case, it is important to know what it should actually be used for. The behaviour of the solution must be generalised and core functionality needs to be identified. Without defining what the system should do, it is hard to extract or focus on the parts that are of importance as well as identify problems that are relevant for this evaluation.
This is also a problem for this evaluation, as this work needs to identify what exactly constitutes the core of the system. Required Maintenance The system uses a number of cloud services, which include, for instance, implemented components deployed on AWS EC2. These need to be supervised and maintained by the developers at Dohi and require that resources are set aside for this. Some of these are also set to scale automatically when facing heavy load or other problems concerning demands on resources. The scaling is not completely automatic and many services need to be shut down between events to minimize the resources used and consequently keep the cost down. This can be problematic as the time when an event will occur may not be known long beforehand. Single Messaging Channel The system was built focusing on a single event and did not address multiple and concurrent events. Even if the system may handle multiple events, it does so only because of the statelessness of the system components. Web servers, for example, do not distinguish between votes from different questions and just forward all of them to the data processing component. Web servers also forward questions to all clients, which means that if two events were occurring at the same time, all clients would get questions from both of them even if they

are only participating in one of the events. Instead, there exist client-side safety checks to ensure that only questions from the correct event are handled.

Complete Replica of Global State

When a web server is added to the system, it fetches information about all of the events that are present in the system and keeps a local replica of the complete global state. This global state is also transferred to every client, even if a client is not interested in the state of all events. This poses scalability issues for both the web servers and the web clients, as they must store and handle every event that is present in the system. The communication between a client and a web server must also scale with the complete global state, since the web server must push everything to the client.

Intra-dependencies

The system uses many services from AWS that fall under the cloud service model Infrastructure as a Service (IaaS). These services have separate APIs and protocols, which bind the system to them, increasing intra-dependencies and decreasing modularity. While the use of these cloud services simplifies the development process, it makes the system harder to move, and without interfaces between the different components the system is much less versatile when it comes to replacing a component and, consequently, modifying and extending the system.

Simple Messaging

One very central part of the system is the messaging between the participants of the service. It currently does not provide any kind of message ordering between web clients and web servers, and clients may receive messages in different orders. This introduces a race condition that may lead to divergent behaviour because of the differing message orders.
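The ordering race can be illustrated by applying the same two messages in different orders: with a last-write-wins update such as "set current question", the clients' replicas diverge. The message shapes below are invented for illustration and are not the system's actual format.

```python
# Illustration: two clients receive the same messages in different orders.
# With a last-write-wins update such as "set current question", their
# final states diverge.

def apply_messages(messages):
    state = {}
    for msg in messages:
        if msg["kind"] == "set_question":
            state["current_question"] = msg["question_id"]
    return state

m1 = {"kind": "set_question", "question_id": "q1"}
m2 = {"kind": "set_question", "question_id": "q2"}

client_a = apply_messages([m1, m2])  # receives m1 first
client_b = apply_messages([m2, m1])  # receives m2 first

print(client_a["current_question"])  # q2
print(client_b["current_question"])  # q1 -- divergent state
```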
The client has been given safety checks to ensure that this divergent behaviour is limited, and as long as events are run one at a time under manual supervision it may not be a practical problem, but it still imposes restrictions on any additional use or development of the service.

State Synchronisation

When a web server is added to the system, it first fetches the state of the system from the database and then uses this internally as a local replica of the state. This may lead to an inconsistent state between web servers, and consequently between clients, as the state can change during synchronisation. This means that when a client connects to an event it can, for example, receive a question that becomes outdated during the establishment of the connection.

4.5 System Definition

A definition of what constitutes the system has been made in order to better explain and motivate suggestions for different implementation changes. The definition includes a description of the core parts that the system consists of, as well as the core behaviour of the system. Most of the further suggestions about the system are based on or influenced by this

definition. The definition, as well as the further suggestions, assumes that the already existing use case must remain fully functional under any suggested change. One aspect of the system that is prevalent throughout the whole solution is its focus on messages. Apart from the HTTP GET request at the beginning, when a user connects to the service, a client relies solely on sending and receiving messages to interact with the back end, using them to vote on questions and to receive the results for questions. The web servers, as well as the data processing layer, also use an internal messaging system to communicate with each other. Messaging enables scalability in the system, as it further separates its different parts and decouples them in terms of space, time and synchronisation. Focusing on the messages in the solution may lead to a better evaluation of the system, as a messaging model fits this solution well. Besides providing messaging functionality for the client, the system also provides the static web content that the user receives upon connecting. This content does not necessarily need to be provided directly by the system; it could be separately designed and created by the customer, as long as it features a messaging channel to the service. This definition emphasises the complete independence between the static web content and the messaging functionality, but does not exclude the static content from the core functionality that the service should be able to provide. This relates to the assumption that the service must support the already existing use case in its entirety. It does not exclude use cases where only the messaging functionality is desired, but it requires the service to be able to provide both messaging functionality and static content.
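The two core responsibilities identified here, an asynchronous messaging interface and a static file store, could be captured as two small, independent interfaces. A hypothetical sketch follows; all names are invented for illustration, and the in-memory implementations stand in for real transports and storage.

```python
# Hypothetical core interfaces for the service: asynchronous messaging
# plus a static file store, kept completely independent of each other.
from abc import ABC, abstractmethod

class MessagingInterface(ABC):
    @abstractmethod
    def send(self, channel: str, message: dict) -> None: ...
    @abstractmethod
    def subscribe(self, channel: str, callback) -> None: ...

class FileStore(ABC):
    @abstractmethod
    def get(self, path: str) -> bytes: ...

# Trivial in-memory implementations, useful for tests.
class InMemoryMessaging(MessagingInterface):
    def __init__(self):
        self.callbacks = {}
    def subscribe(self, channel, callback):
        self.callbacks.setdefault(channel, []).append(callback)
    def send(self, channel, message):
        for cb in self.callbacks.get(channel, []):
            cb(message)

class InMemoryFileStore(FileStore):
    def __init__(self, files):
        self.files = files
    def get(self, path):
        return self.files[path]
```

A front end built against these two interfaces would not care whether the static content and the messaging channel are served by the same back end or by separate services.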
To summarise, the service directly provides two core functionalities for the front end: an asynchronous messaging interface for web communication and a regular file store serving static web content. The front end is a web front end that uses the messaging interface to communicate with the service and may also use the file store to fetch static content such as web pages. The idea behind the file store is to group a front-end view together with the communication layer, so that it is not necessary to fetch these through two separate services. Besides the messaging bus and the file store, the back end also requires some additional functionality to support the existing use case. This kind of functionality can be thought of as internal, with its effects exposed on the front end through the messaging functionality. Examples of this kind of functionality in the current solution are the data processing layer and the data model. Such functionality is not considered part of the core of the service, but it is nonetheless required to be operable for the existing use case.

4.6 Proposal of Improvements

This section presents a proposal of improvements, taking the previously described issues into account. Instead of solving each problem separately, it focuses on two parts: the overall system architecture and the messaging channels for events. The section describes ways to alter these aspects to create a solution that may better address the listed problems.

System Architecture

Using the previous definition of what constitutes the current service, the alterations of the software architecture focus on the messaging system as a central part of the

solution. This section describes how components may be grouped together and how different components may interact with each other.

Service Bus

One way to view each component of the system is as a separate service that provides some kind of functionality to the system without being directly tied to any other service or part of the system. This puts a messaging bus at the centre of the architecture, responsible for providing each service with communication to the services it needs. This type of software architecture, called a service bus architecture, focuses on the complete separation of components, or services: each one interfaces only with the central messaging bus, using it to discover and access other services in the system while offering its own functionality over the bus to anyone interested. Introducing this type of architecture in the current system means that all components would be viewed as separate services that communicate directly only with the messaging bus, and a service would only know about the services it needs to provide its own functionality. An overview of this setup can be seen in Figure 4.3. In this architecture, the client validation cache would be a separate service that provides its functionality, storing and retrieving client validation tokens, over the messaging bus. Another service would be the front end together with the web server cluster, representing the user interface of the system and introducing user input into the system. The use-case-specific parts of the voting functionality would be a standalone service connecting the user voting input to other services in the system. The database would be yet another service, providing the functionality of storing and retrieving data.
The last service would be the data processing functionality, providing ways to process retrieved data and publish the results back on the messaging bus. The service bus would create a very decoupled system, where additional functionality for the current use case, or a new use case, could be implemented without any need to modify non-relevant parts of the system and without any possibility of non-relevant parts failing or otherwise changing their behaviour because of the introduced functionality. It would also put the definition of the system at a very low level, where almost any functionality could be provided as long as it is offered over the messaging bus. Possible use cases would be those that can use messages to communicate between internal components. This type of architecture would add extra work for maintaining the messaging bus and for implementing components that operate only over the messaging bus, completely separate from other services. Defining each service or component would be easier, as each would have a very distinct function in the overall solution. This solution decouples the different functionality of the system and creates a more scalable system, at the price of a more complex overall solution with a higher end-to-end latency. Since components need to use the message bus to communicate, it is not possible to interact with other components except through the message bus interface. This interface adds overhead when consuming the functionality of components: the database service, for example, cannot expose its already well-defined interface directly without some kind of adapter between the messaging bus and the internal interface.

Two-tier Messaging Bus

The two-tier messaging bus is another software architecture with a messaging bus at its centre. This architecture focuses on the separation of user input from use-case-specific logic.
It defines two types of components, a client handler and a use case handler, which

are all connected using a messaging bus. The messaging bus is responsible for forwarding messages between a client handler and a use case handler. It is used to separate input from logic, so that other types of input can be added to a use case, or so that other use cases relying on the same type of user input can be added. An overview of the two-tier architecture, including a layout of the components of the current system, is presented in Figure 4.4.

Figure 4.3: An overview of the service bus architecture, describing the way components interact with each other. The solution proposes the use of a message bus which all components use to communicate with each other.

This architecture groups the current functionality into two parts, where one part handles the web communication with the front end and the other handles the voting use case functionality, including processing votes, fetching and updating state in the database and validating client tokens. These two groups interact with each other by sending messages over a message bus. With this type of solution it is easier to add other types of clients without modifying existing code, as the client input functionality is completely separated from all other functionality. Adding another use case to the system is done by adding another use case handler and modifying the client input so that the relevant messages are routed to that handler. A use case handler is completely separated from every other use case handler, except for an implicit relation through the possible sharing of a client input handler. Like the service bus architecture, this architecture also requires the different services, or handlers in this

case, to communicate using the interface of the message bus. This requirement is more relaxed in the two-tier architecture, as the components inside a use case handler may interact with each other in their own way.

Figure 4.4: An overview of the two-tier architecture presenting the layout of the components of the system. This solution divides the service into two parts that communicate over a message bus between them. On one side is the client functionality and on the other side is the use case functionality.

One drawback of this type of architecture is that if the design is followed strictly, then, to keep different use case handlers separated, they cannot share any resources or internal components. This means that implementing a new use case requires a completely new set of internal components, even if some are identical to those of other use case handlers.

This introduces overhead in the form of the extra maintenance needed when managing multiple logical components with the same or similar functionality.

Messaging Channels

One of the main concerns of this evaluation is the separation of the data transmitted for different events. Being able to distinguish data between different events addresses many of the previously listed concerns, such as complete state replication in the local memory of all web servers and the broadcasting of all messages to all clients regardless of whether a client is interested in them. This section describes two parts of the system that are of interest when trying to create a distinction between events, namely the logic in the web server and the messaging channels of the internal messaging system.

Separation of Logic in Web Servers

The web servers of the solution are among the core components on which the overall performance of the system depends heavily. The web servers currently cannot separate data from different events and treat every client as interested in every aspect of every event. This could be solved by introducing two layers, or levels, in the web server, where the most basic level, level 1, addresses the separation of events and level 2 consists of the use-case-specific logic for an event; see Figure 4.5. Defining an event as the messaging channel it needs in order to communicate with the client and the other back-end components would effectively mean that the separation of events also separates the different use cases the system can support. Level 1 would communicate directly with the connected clients and, by observing which events each client is interested in, would group clients and expose them, together with functionality for communicating with them, to the corresponding use case logic on level 2.
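The two-level split could look roughly as follows, with level 1 grouping clients per event and level 2 holding per-event use case modules. All names are hypothetical; this is a sketch of the proposal, not the system's actual code.

```python
# Hypothetical sketch of the layered web server: level 1 tracks which
# clients belong to which event and routes messages; level 2 modules
# hold the per-event, use-case-specific state and logic.
class EventModule:               # level 2: use case logic for one event
    def __init__(self, event_id):
        self.event_id = event_id
        self.state = {"votes": 0}
        self.clients = set()

    def handle(self, client, message):
        if message.get("kind") == "vote":
            self.state["votes"] += 1

class WebServerLevel1:           # level 1: event separation and routing
    def __init__(self):
        self.events = {}

    def register(self, client, event_id):
        module = self.events.setdefault(event_id, EventModule(event_id))
        module.clients.add(client)

    def on_message(self, client, event_id, message):
        # Only the module for this event sees the message; no broadcast.
        self.events[event_id].handle(client, message)

server = WebServerLevel1()
server.register("alice", "event-1")
server.register("bob", "event-2")
server.on_message("alice", "event-1", {"kind": "vote"})
print(server.events["event-1"].state)  # {'votes': 1}
print(server.events["event-2"].state)  # {'votes': 0}
```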
When a message is sent from a client, level 1 of the web server forwards the message to the corresponding event logic on the second level. This would solve the problem of broadcasting every message to everyone, as the event logic would be given the means to communicate with exactly those clients that are interested in the corresponding event. The state of a use case or an event would be stored and managed by the corresponding event logic module on level 2. This divides the local state of a web server into smaller, separate parts that relate directly to the event logic modules present on that web server. Choosing which event modules should be present on a web server thus limits the local state to the events that the web server actually serves. Separating the logic of the web server into two levels would create a base functionality that operates the same regardless of event or use case. This could be used to define exactly what kind of functionality the web server can provide in the system and what kinds of use cases it supports. The second level of the web server decouples the logic for the different events, giving a higher level of abstraction in which specific use case modules can be implemented and deployed without the need to modify any other module.

Internal Messaging System

This section explains how separating the internal messaging channels for different events may help solve the identified problems of the solution. It focuses on the use of the publish-subscribe messaging pattern to separate different events as well as to separate unrelated internal components in the system. The publish-subscribe pattern is addressed both in

chapter 3 and chapter 5, so this section will instead focus on the explicit use and effect of introducing such a pattern in the current service.

Figure 4.5: Outline of the layered web server solution. It introduces two levels in the web server, where the higher level contains use-case-specific logic modules that are separate from each other, while the lower level manages these use cases.

The currently used internal messaging system is an implementation of the publish-subscribe pattern, but it is used for nothing more than to create a single messaging channel on which all messages are sent. To add support for multiple channels, the setup and teardown of a channel must be moved from the global scope of the entire web server to the event functionality that the channel corresponds to. This couples a message channel with its event, so that the channel can be viewed as a separate part belonging only to that event. By creating a messaging system in which components express their interest in different types of messages, components are decoupled in space and do not have to manage any messages other than those they want. This greatly increases the scalability of the network and decouples unrelated components from each other, so that components may be added, removed or modified in the system without any unrelated component changing its behaviour.
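Moving from one global channel to per-event channels can be sketched as follows. The broker and channel names are invented for illustration; the point is that a channel's lifetime follows its event, and subscribers never see unrelated traffic.

```python
# Hypothetical sketch: instead of one global channel carrying every
# message, each event owns its own channel, created and torn down with
# the event itself.
class ChannelBroker:
    def __init__(self):
        self.channels = {}

    def open_channel(self, name):
        self.channels[name] = []

    def close_channel(self, name):
        del self.channels[name]

    def subscribe(self, name, handler):
        self.channels[name].append(handler)

    def publish(self, name, message):
        for handler in self.channels[name]:
            handler(message)

broker = ChannelBroker()
received = {"event-1": [], "event-2": []}

# Each event sets up its own channel; components subscribe only to the
# events they care about.
for event in ("event-1", "event-2"):
    broker.open_channel(event)
    broker.subscribe(event, received[event].append)

broker.publish("event-1", {"question": "q1"})
print(received["event-1"])  # [{'question': 'q1'}]
print(received["event-2"])  # []
```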

Chapter 5

Publish-Subscribe Pattern

This report documents a thesis project performed at Dohi Sweden that consists of creating and evaluating a large-scale back-end solution for asynchronous communication between a web client and a web server. The publish-subscribe pattern has been around for a long time, and many different known uses of it exist today, as well as a fair amount of research in the area. This chapter aims to summarise the publish-subscribe paradigm, focusing on an intermediate goal of the main study, namely the separation of use cases. The area of publish-subscribe is of great interest for this work, as the overall communication of the targeted solution fits this type of pattern.

5.1 Introduction

The publish-subscribe pattern is a messaging pattern that is of great use for message-oriented solutions, as it loosely connects messaging participants based on the type of content they are interested in[51]. The pattern defines two types of participants: the publisher, which produces content, and the subscriber, which consumes content. The greatest benefit of this pattern is its loose coupling between participants: publishers and subscribers do not need to know each other, they do not have to participate at the same time, and messages are sent and received asynchronously. This is a widely used messaging pattern on the Internet and in other large networks, as it enables highly scalable distributed messaging. An overview of the publish-subscribe pattern has been made in order to more easily address the relevant aspects of the pattern. This section explains how the pattern commonly works and describes a general framework of an implemented publish-subscribe system.

Overview

A publish-subscribe solution is usually referred to as a notification service, where the publishers and the subscribers are the clients of the service[1].
For the interaction between the components of a notification service, three terms are commonly used: notification, event and publication. A publisher produces an event, often also called a publication, for the notification service, while a subscriber receives and consumes a notification from the service. This thesis will use notification as a general term when addressing any of them later on. This thesis will also use the expressions of a subscription request and an

advertisement request to denote when a subscriber expresses its interest in receiving certain content and when a publisher expresses its interest in publishing certain content. Furthermore, the type of content that a subscriber expresses its interest in receiving is called a pattern, and the type of content that a publisher expresses its interest in publishing is called a filter. In the context of notification passing, the nodes that lie in between and forward notifications between a sender and a receiver are commonly referred to as brokers, or broker nodes. Brokers can be, for example, intermediate forwarding systems, network routers or other peers, depending on the implemented notification service. Besides forming a centralised broker architecture, brokers can be distributed and managed in many ways[20][42][11], which is one of the reasons publish-subscribe is such a widely used pattern for scalable messaging.

Subscription Models

One aspect of a notification service that needs to be determined is how subscribers express their interest in content. Choosing a more expressive subscription model often makes the service more complex and harder to realise, and deciding on this trade-off is important, as different subscription models result in different solutions for the underlying system. There are three commonly known subscription models, topic-, content- and type-based, and these are described in this section.

Topic-based

In a topic-based notification service, a publisher pairs a publication with a topic, and all subscribers that have expressed their interest in that topic will receive the notification. This can be seen as having a separate messaging channel for each topic, unrelated to any other channel. A topic-based notification service has the least expressiveness of the three subscription models addressed here, but it is also the easiest to implement.
Topic-based notifications can be viewed as the equivalent of broadcasting in group communication, as the subscribers of a topic are implicitly static: the recipients do not need to be calculated when a publication is published.

Content-based

Content-based publish-subscribe is a more expressive subscription model than topic-based publish-subscribe. Instead of connecting publishers and subscribers through an external property such as a topic, content-based publish-subscribe routes messages that match a specific pattern that a subscriber has specified. A pattern consists of a number of conditions on the properties of the notification itself, and these conditions are used to filter out any notification that does not satisfy them. An example of a pattern Θ(S) that subscriber S expresses its interest in could be:

Θ(S) = (x > 5) ∨ (y == 0)

which means that S wants to receive a notification if the property x of the notification is larger than 5 or if the property y is zero. This gives a lot of expressiveness, as subscriptions can be based on complex combinations of constraints on the internal properties of the notifications.
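The pattern Θ(S) above can be evaluated directly against a notification's properties; a small sketch of content-based matching, with invented names, follows.

```python
# Sketch of content-based matching: a subscription is a predicate over
# the notification's own properties, here Θ(S) = (x > 5) ∨ (y == 0).
def theta_s(notification):
    return notification.get("x", 0) > 5 or notification.get("y") == 0

subscriptions = [("S", theta_s)]

def publish(notification):
    """Return the subscribers whose pattern matches the notification."""
    return [name for name, pattern in subscriptions if pattern(notification)]

print(publish({"x": 7, "y": 3}))   # ['S']  (x > 5 holds)
print(publish({"x": 1, "y": 0}))   # ['S']  (y == 0 holds)
print(publish({"x": 1, "y": 3}))   # []     (neither condition holds)
```

The filtering step runs for every publication, which is exactly the extra dissemination cost that topic-based services avoid by resolving recipients statically.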

Type-based

In a type-based notification service, a subscriber expresses its interest in a specific type, which is a unique identifier just like a topic in topic-based publish-subscribe. Type-based publish-subscribe, on the other hand, uses an object-oriented approach, where a notification can be seen as an object with a specific object type that may in turn be derived from another object type. Figure 5.1 describes three types, where Sports News can be seen as a base type with two subtypes: Football News and Golf News. Subscribers could express their interest in, for example, Golf News, in which case they would only receive notifications of type Golf News. Subscribers could also subscribe to Sports News, in which case they would receive notifications of all three types. This leads to a higher expressiveness than topic-based publish-subscribe, together with type-safety checks of notifications at compile time.

Figure 5.1: A description of how types in a type-based publish-subscribe solution may relate to each other. A subscriber may subscribe to the type sports news, in which case it will receive all sports news, both golf news and football news. A subscriber may also subscribe to, for example, golf news, in which case it will only receive golf news and not football news.

5.2 Problems

The most common problems related to notification services are the following:

Expressiveness vs Realisability. The underlying system of a notification service depends heavily on the expressiveness of the service. A content-based subscription model, for example, requires a filtering process during dissemination, compared to a topic-based model, where the subscribers for a notification are statically retrieved. These models use very different system architectures, and the general consensus is that higher expressiveness also raises the complexity of the implementation.
This poses a problem, as a more general solution that should fit more use cases typically also requires higher expressiveness.

Throughput vs Reconfigurability. The time it takes for a notification service to disseminate a notification to its subscribers varies with the implementation. One aspect that may hinder the performance of a notification service in terms of throughput and latency is the reconfigurability of the service. If all subscribers and publishers were decided beforehand, there would be no need to support any reconfigurability, as the exact routing of notifications would already be known. Raising the level of reconfigurability to, for example, handle dynamic publishers and subscribers lowers the performance of the overall system.

Decentralisation vs Manageability. A common problem for all solutions that are decentralised for scalability purposes is their decreased manageability. This is even more problematic for solutions, such as notification services, that may not have a clear way

of decentralisation. When going from a single broker to multiple brokers, questions arise as to how these should function, since it may be difficult to scale them horizontally.

Security. Early publish-subscribe implementations were used for mail and news feeds, where security was not a high priority. Because of the decoupling nature of the pattern, security concerns such as content authentication, confidentiality and integrity are problematic. Since publishers and subscribers do not know each other, the notification service itself must make sure that these concerns are tended to.

Reliability and Persistence. Message reliability and persistence are two additional concerns that are generally more complex in a notification service because of the decoupling nature of the publish-subscribe pattern. There is often more than one step between a publisher and a subscriber in a notification service, and every step needs to guarantee reliability and persistence. This is not an easy task since, as stated in the introduction, publishers and subscribers do not need to know each other, they do not have to participate at the same time, and messages are sent and received asynchronously.

5.3 Measurements of Results

Common quantitative measurements for messaging systems are throughput, in terms of the number of messages disseminated, and latency, in terms of the time between sending and receiving. These measurements will be considered in this study and will be weighed against each other, as one affects the other. Aspects of these measurements that will be noted are the size of the messages sent and whether there are situations where performance varies greatly. Throughput and latency will also be addressed in the context of scalability, where the performance of these two aspects is measured against the level of scaling used.
This is expressed as measuring the increase or decrease in throughput and latency when varying the number of broker nodes or other quantitative scaling variables. The rate of failures when disseminating a notification is also of interest, as it relates to the reliability and persistence of a notification service. This aspect can be expressed as the number of messages that are dropped or otherwise lost on the way from a publisher to a subscriber.

5.4 Classification of Methods

There are many existing techniques and methods that address the problems of interest in this study. The ones considered here have been grouped into three categories, in order to more easily focus on and understand the problems of this study. These categories do not cover all scientific work within this research area, but represent the work deemed most interesting for this specific study. The categories are also not necessarily completely separate from each other and may overlap in terms of practical use, but the distinct focus of each category is described in this section.

1. Decentralising brokers. This category focuses on the distribution of managed brokers to better ensure different qualities of service, such as availability and responsiveness. Distributing managed brokers is probably the most common way of scaling a notification service, as it follows the widely known and used client-server model. It applies when there is a separate layer of brokers between the publishers and

subscribers, owned and handled manually by the notification service. Depending on the desired expressiveness, different broker topologies may be suitable. These topologies may also work at different levels of the network infrastructure, where, for example, topologies operating at a lower level use functions of the physical network to disseminate a notification. Different topologies may also use different routing algorithms.

2. Peer-to-peer methods. Notification services that use only the clients of the service have attracted interest lately because of the growth of smartphones and other highly mobile, lightweight units. This category refers to services without intermediary brokers, where the problem of disseminating a notification lies either partially on a coordinator or entirely on the clients. Interesting aspects of peer-to-peer solutions that will be addressed are how a peer-to-peer topology relates to the underlying physical network topology and how the different routing algorithms for these topologies work.

3. Self-managed, self-optimised methods. This category focuses on the properties of self-managed notification services. Instead of using a broker topology that is statically and manually specified beforehand, self-managed services are able to dynamically modify their architecture based on how the service is used. This also relates to the capability of ensuring robustness and availability automatically, without the need for, for example, a manual restart when a node crashes. Being self-optimised refers to the possibility of a notification service automatically adjusting itself to improve performance.

5.5 Research Summary

Having introduced the subject, listed recurring problems and classified the methods, it is time to present the gathered scientific progress, both successful and unsuccessful.
One method will be presented for each method category (decentralising brokers, peer-to-peer methods and self-managed methods), clearly describing the proposed idea and identifying the problem(s) addressed, while highlighting possible limitations and/or assumptions made. Each method, and the selected summarised work, will be concluded by stating the results, followed by a discussion. The last section of this chapter concludes the in-depth study with a more detailed comparative discussion of the advantages and disadvantages of the different approaches, as well as indications of where future research may allow for improvements, generalisations and extensions of the proposed ideas.

Decentralising Brokers

Decentralised brokers rely on managed nodes in the publish-subscribe network to route notifications. A notification service that uses this kind of solution puts the dissemination functionality in a layer separate from its clients, where the client-side parts of the service feature only simple communication with that layer. This type of messaging system is probably the most used, as it follows the common client-server architecture, in which the messaging functionality can be viewed as a separate system that is completely managed by its maker. Siena[6] is a notification service that makes use of distributed brokers and has been around for a long time. It features very extensive theoretical documentation of the publish-subscribe paradigm, which is used further in this summary to describe a typical implementation of a publish-subscribe system. The fundamentals of Siena are studied together with a

system architecture for publish-subscribe called EventGuard[45]. This system architecture addresses security concerns such as confidentiality, authentication and fail-overs for an existing publish-subscribe service. The authentication part from EventGuard will be studied for the general solution that was extracted from Siena.

Two broker topologies, or architectures, that will be addressed in this thesis are hierarchical and peer-to-peer topologies. The hierarchical topology connects a broker with a single higher-level broker (master) and multiple lower-level nodes, which could be both brokers and clients. A master can receive notifications from all its clients but will only forward notifications to a client that needs them. This makes the master comparable to a gatekeeper, keeping unwanted traffic off the clients. The peer-to-peer topology is a more general topology where brokers communicate with neighbours bidirectionally, without a hierarchy.

There are two commonly known types of routing algorithms: subscription forwarding and advertisement forwarding. The former propagates subscription requests through the network to create the routing paths for notifications. A routing path is then used to route a message through the reverse path from which the subscription request came. The latter broadcasts advertisements first, creating a tree of brokers where every broker has information about the advertisers. Subscriptions are then sent along the reverse path back to relevant advertisers, marking the paths that notifications for a subscription will take when published. The publish-subscribe implementation is, in general terms, pruning spanning trees over a network of brokers to minimize communication, storage and computation costs. A subscription request is only propagated along the paths that a previous request has not already covered.
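The covering check that drives this pruning can be illustrated with a small sketch. The filter model below (a dict of per-attribute numeric intervals) and the function names are simplifications chosen for illustration, not Siena's actual filter language:

```python
# Hypothetical filter model: {attribute: (low, high)} interval constraints.
# f1 covers f2 when every notification matching f2 also matches f1, i.e.
# each constraint in f1 appears in f2 with an interval contained in f1's.

def covers(f1, f2):
    """Return True if filter f1 covers filter f2."""
    for attr, (lo1, hi1) in f1.items():
        if attr not in f2:
            return False            # f2 is less constrained on this attribute
        lo2, hi2 = f2[attr]
        if lo2 < lo1 or hi2 > hi1:
            return False            # f2 admits values outside f1's interval
    return True

def should_forward(new_filter, already_forwarded):
    """A broker only forwards a subscription along a path if no previously
    forwarded subscription already covers it (the pruning described above)."""
    return not any(covers(f, new_filter) for f in already_forwarded)

broad  = {"price": (0, 100)}
narrow = {"price": (10, 20), "volume": (0, 50)}
assert covers(broad, narrow) and not covers(narrow, broad)
assert not should_forward(narrow, [broad])   # already covered: pruned
```

The asymmetry of `covers` is what turns the set of filters into a partial order, which the poset structure described next makes explicit.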
This study uses the expression X ⊒_S Y to denote a relation between subscriptions where subscription X covers subscription Y. The expression X ⊒_A Y is used to denote the equivalent relation between advertisements. To keep track of the relations between subscriptions, a partially ordered set, or poset, of filters can be used. Siena denotes P_S as a poset ordered by ⊒_S and P_A as a poset ordered by ⊒_A. In a poset of subscriptions P_S, a filter f_1 is an immediate predecessor of another filter f_2, and f_2 an immediate successor of f_1, if f_1 ⊒_S f_2 and there is no other filter f_3 such that f_1 ⊒_S f_3 ⊒_S f_2. The filters that do not have any successors in the poset are called roots, and these are the ones that will produce network traffic.

A Hierarchical Architecture

A broker in a hierarchical topology contains a poset P_S where the subscribers of filters in the poset represent lower-level brokers and clients. This poset is used to route notifications to subscribers in the lower level whose subscription filters cover the classification of the notification. Because of this, only subscription forwarding is considered for hierarchical architectures, as masters would not be able to respond to an advertisement with the corresponding subscriptions. Algorithms 1 and 2 express the way the poset is used in a broker when it receives a subscription and a notification. These algorithms use the expression S(f) to denote the set of subscribers to a filter f and the expressions succ(f) and pred(f) to denote the sets

containing the immediate successors and the immediate predecessors, respectively.

Algorithm 1: Subscription request management in a hierarchical broker topology
Data: a poset P_S for the current broker
Data: a subscriber X that issued the subscription request
Data: a filter f that corresponds to the requested subscription
if ∃f' : X ∈ S(f') ∧ f' ⊒_S f then
    terminate
else
    if f ∉ P_S then
        insert f into P_S
    end
    insert X into S(f)
    if succ(f) = ∅ then
        forward subscription to master
    end
    if pred(f) ≠ ∅ then
        remove X from filters f' in all predecessors of f where f ⊒_S f'
    end
end

Algorithm 2: Notification propagation in a hierarchical broker topology
Data: a queue Q containing the root subscriptions of the current broker's poset
Data: a notification n that should be routed to matching subscriptions
if this broker has a master and the master did not send n then
    send n to master
end
foreach element s in Q that has not yet been visited do
    visit s
    if n matches s then
        append all predecessors of s that have not been visited to Q
    else
        remove s from Q
    end
end
foreach element s in Q do
    send n to s
end

When a broker receives a subscription request, it handles it in one of three ways:

1. If there already exists a filter that the subscriber subscribes to and that covers the new filter, the broker does nothing, since notifications corresponding to the new subscription will already be routed to the subscriber.
2. If the filter already exists in the poset, the subscriber is added to the subscribers list for that filter.
3. If the filter does not exist in the poset, the filter is inserted and the subscriber is added to the inserted filter.

For the last two cases, where a subscription is inserted into a filter f, the broker needs to remove any previous subscriptions from the same subscriber that the new filter covers.

This is done using a breadth-first search starting in f and removing any subscriptions with the same subscriber as the inserted one. If the filter that the subscription is inserted into is a root, i.e. it does not have a successor in the poset, the subscription is sent to the master as well.

An Acyclic Peer-to-peer Architecture

The peer-to-peer solution covered in this segment is very similar to the hierarchical topology that was previously described. Instead of keeping track of a master and a set of client nodes, a broker in the peer-to-peer topology uses only a set of neighbours denoting the nodes connected to that broker. It uses a poset P_S just like in the hierarchical topology, but this poset also keeps, for each subscription, a set of forwards containing the neighbours that the subscription has already been forwarded to. The acyclic peer-to-peer architecture propagates notifications exactly the same way as the hierarchical architecture. The way a subscription request works is also similar, but in the latter two cases, where covered subscriptions need to be removed, the peer-to-peer architecture uses a different procedure for forwarding the subscription to its neighbours. The set forwards(f) of a subscription f can be defined as:

forwards(f) = neighbours − NST(f) − ⋃ { forwards(f') : f' ∈ P_S ∧ f' ⊒_S f }     (5.1)

This means that f is forwarded to all neighbours except those not downstream from the server along any spanning tree rooted at an original subscriber of f, and those to which subscriptions f' covering f have already been forwarded by this server.

Authentication

To ensure authentication, EventGuard uses a key management system called Meta Service (MS). When a subscriber S wants to subscribe to a topic w, it first sends the request to the MS.
The MS then authorises the subscriber using some central authorisation service and accepts or denies the request based on what the subscriber wants to subscribe to and what it is allowed to subscribe to. If the subscriber is authorised by the MS, it receives a subscription permit from the MS consisting of a key K(w), a token T(w) and a signature sig_MS(T(w)). The key K(w) is used to decrypt received notifications for the topic w. The token T(w) is a one-way hash of w and is used as a private identifier of the topic w. The signature sig_MS(T(w)) is ElGamal-encrypted and is used to check the validity of the subscriber on the publish-subscribe nodes. When a subscriber receives a subscription permit from the MS, it formally subscribes to the publish-subscribe service using T(w) as topic instead of w. Notifications for the topic w will then be routed in the publish-subscribe topology to all subscribers of T(w). When a publisher P wants to send an advertisement to be a publisher for a topic w, it first contacts the MS using a public key pk(P). The MS then authorises the publisher and sends an advertisement permit to the publisher containing a key K(w), a token T(w) and a signature sig_MS(T(w), P, pk(P)). The key and the token are the same key and token that subscribers receive. Besides the token, the signature sig_MS(T(w), P, pk(P)) also includes an identifier for the publisher and the publisher's public key. When a publisher receives an advertisement permit, it submits an advertisement to the publish-subscribe service on topic

T(w). Subscribers of T(w) will receive the publisher's signature, which includes the public key for that publisher. When an authorised publisher wants to send a notification under the topics w_1, w_2, ..., w_m, it first makes a request to the MS, which returns a random key K_r. The publisher then encrypts the content of the notification with K_r and publishes the encrypted content together with an encryption of the random key, E_K(w_i)(K_r), for each topic, using the key for the corresponding topic. Subscribers of a topic w_i can then use the key K(w_i) to decrypt K_r and then use K_r to decrypt the notification content.

Results

The report containing an overview of Siena includes an evaluation of the fundamentals and tries to test the framework in terms of relative performance, expressiveness and scalability. The tests varied the number of interested objects and parties while keeping the network sites constant. Siena tests four different topologies: centralised, hierarchical, acyclic peer-to-peer and general peer-to-peer. It also includes evaluations of the total cost as well as of different parts of the system, such as the cost per subscription and per notification. The total cost refers to the message traffic between all sites, and the test results showed that when there were more than 100 parties, the total cost was essentially constant. This was referred to as the saturation point, as there was a high chance that there existed a party at every node using every message channel. When the number of interested parties was below the saturation point, all of the tested topologies scaled sublinearly, as it was very likely that an object of interest and an interested party were not on the same site. The hierarchical topology performed worse than the acyclic peer-to-peer solution, as the hierarchical topology is forced to propagate notifications to the root whether or not it is necessary.
The main difference between architectures in terms of cost per subscription and per notification lies in the way each architecture forwards subscriptions. In a network of N sites, the acyclic peer-to-peer architecture must propagate a subscription using O(N) hops through the network, while the hierarchical architecture only needs to forward subscriptions up to the root, with O(log N) hops. The cost for propagating notifications is, on the other hand, in favour of the acyclic peer-to-peer architecture, as this kind of propagation keeps a consistently lower cost than the hierarchical solution for a varying number of interested parties.

The documentation of EventGuard includes a tested and evaluated implementation on top of a Siena core. These tests showed that EventGuard could be successfully deployed on a Siena system without the need to modify Siena's routing or matching functionality. The solution was tested on a binary tree-based hierarchical topology with a varying number of nodes advertising, subscribing and publishing. The tests were used to compare the performance of EventGuard, in terms of throughput and latency, against the performance of the Siena base without EventGuard. Throughput was measured as the maximum number of notifications that the solutions can handle per second. Latency was measured as the amount of time it takes for a notification to propagate to a subscriber at maximum throughput. These tests showed that Siena with an incorporated EventGuard scales with the same order of magnitude in terms of throughput as the Siena base, with only a small constant decrease. Latency testing showed similar results, with only a small constant decrease between EventGuard and Siena and a latency difference that was not higher than 4% at any given time.
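The key handling evaluated above (topic key K(w), token T(w) as a one-way hash of the topic name, and a fresh random key K_r per notification) can be sketched as follows. This is an illustrative reconstruction, not EventGuard's actual code: SHA-256 stands in for the unspecified hash, and the XOR keystream cipher is a toy replacement for a real symmetric cipher.

```python
import hashlib
import os

def topic_token(w):
    """T(w): one-way hash of the topic name, used as its public identifier."""
    return hashlib.sha256(b"topic:" + w.encode()).hexdigest()

def keystream_xor(key, data):
    """Toy symmetric cipher (XOR against a hash-derived keystream), standing
    in for a real cipher. XOR makes it its own inverse. Illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

def publish(topic_key, payload):
    """Encrypt payload with a fresh K_r; wrap K_r under the topic key K(w)."""
    k_r = os.urandom(32)
    return keystream_xor(k_r, payload), keystream_xor(topic_key, k_r)

def deliver(topic_key, ciphertext, wrapped_key):
    """Unwrap K_r with K(w), then decrypt the notification content."""
    k_r = keystream_xor(topic_key, wrapped_key)
    return keystream_xor(k_r, ciphertext)

k_w = os.urandom(32)                 # K(w), handed out by the Meta Service
ct, wk = publish(k_w, b"price=42")
assert deliver(k_w, ct, wk) == b"price=42"
```

Subscribing under `topic_token(w)` rather than `w` itself is what keeps the topic name private from the routing layer, as described in the Authentication section above.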

Discussion

Distributing managed brokers is a good way of scaling the messaging system while keeping control of the system. It fits well into the common client-server model and does not require much from its messaging clients. This section makes it clear how brokers can be distributed using a hierarchical and a peer-to-peer architecture to scale the performance of the messaging system.

EventGuard shows how authentication functionality can be incorporated in a publish-subscribe system without changing the fundamentals of the system. This makes it easier to test the differences between a regular messaging system and a system with a security layer. It also makes it easier to modify the publish-subscribe system without the need to modify everything; a system can much more easily be replaced while still keeping the security features. The tests show that the authentication solution presented by EventGuard is practical, as an implementation does not decrease performance or scalability. Siena was made to handle content-based publish-subscribe functionality, which gives a highly expressive messaging system that can be used for many different types of use cases. This shows that a messaging solution can be implemented with scalability features as well as high expressiveness.

Peer-to-peer Methods

This section addresses publish-subscribe solutions that do not use any managed brokers but instead rely solely on the peers of the system. Peers can be used to create a publish-subscribe solution on top of the application infrastructure, building an application-level network of peers as brokers. This can be used very effectively to create large-scale information dissemination that is reliable and cheap, as there are no components other than the clients.

Pastry

Pastry is an object location and routing substrate for large-scale distributed peer-to-peer applications.
Pastry makes use of peer-to-peer communication to create an application-level topology over the network that can be used to implement a wide range of functionality such as global data storage, group communication and naming. It is used in a notification infrastructure, called Scribe, to provide a peer-to-peer publish-subscribe solution.

Pastry uses a 128-bit value to identify each node in its overlay topology. These node ids are assigned randomly to nodes, and it is assumed that the generation of ids is uniformly distributed in the 128-bit space. An id could be generated using a hash function on a node's public key or IP address, which creates a high probability that neighbouring nodes, those with adjacent ids, are diverse in, for example, geography and network attachment. Messages contain a key K which is the node id of the recipient. Node ids and message keys are seen as sequences of digits with base 2^b. This sequence is used by Pastry to route a message to the node with a node id that is numerically closest to the message key. A Pastry node maintains a leaf set and a routing table that it uses to route messages in the system. The leaf set L contains the |L|/2 nodes with the numerically closest smaller node ids and the |L|/2 nodes with the numerically closest larger node ids. |L| is typically 2^b or 2·2^b. When a node receives a message, it first checks if the message key is within range of the leaf set and, if that is the case, routes it to the numerically closest node in the leaf set. The routing table consists of ⌈log_{2^b} N⌉ rows with 2^b − 1 entries in each row, where each entry in row n shares the node's id in the first n

digits but does not share the digit at position n + 1. Each entry in the routing table contains the IP address of one of the nodes that fits this prefix. See Table 5.1 for an example of a routing table for a node i with id K_i.

Table 5.1: Routing table for a node i in a Pastry overlay network.

When a message M with key K is received at a node A, and the key is not within range of the leaf set, the node checks the routing table and forwards the message to a node that shares a common prefix with the key by at least one more digit. Pseudo code for the core routing algorithm in Pastry is shown in Algorithm 3.

Algorithm 3: Routing algorithm for a node in a Pastry overlay network.
Data: a message M with key K that has arrived at the node with node id A
Data: the entry R^i_l at column i, 0 ≤ i < 2^b, and row l, 0 ≤ l < ⌈128/b⌉, in the routing table R
Data: the i-th closest node id L_i in the leaf set L, −|L|/2 ≤ i ≤ |L|/2
Data: the value of the digit K_l at position l in the key K
Data: shl(A, B): the length of the prefix shared among A and B, in digits
if L_{−|L|/2} ≤ K ≤ L_{|L|/2} then
    forward M to the L_i such that |K − L_i| is minimal
else
    let l = shl(K, A)
    forward M to R^{K_l}_l
end

Scribe

Scribe[39] is a peer-to-peer topic-based publish-subscribe solution that is based on Pastry to create a fully decentralised application-level network overlay topology. It sets a rendezvous point for each topic and uses that to build a multicast tree by joining the Pastry routes from each subscriber up to the rendezvous point. Scribe consists of a Pastry network of peers where peers have equal responsibilities. Scribe adds two more types of functionality to each node, namely the forward and deliver methods. The deliver method is invoked when a message arrives at a node with a node id numerically closest to the key of the message, or when a message was sent to the node with Pastry's send operation.
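Algorithm 3 can be approximated in executable form. The sketch below simplifies heavily: node ids are 8 hexadecimal digits (so b = 4 over a toy 32-bit id space) and the leaf-set branch is reduced to picking the numerically closest known node; `route`, `shl` and the node list are illustrative names, not Pastry's API.

```python
ID_DIGITS = 8                      # toy 32-bit id space, hex digits (b = 4)

def shl(a, b):
    """Length of the prefix shared by ids a and b, in digits."""
    n = 0
    while n < ID_DIGITS and a[n] == b[n]:
        n += 1
    return n

def route(key, node_id, known_nodes):
    """Choose the next hop for a message key at node node_id: prefer a node
    sharing a strictly longer prefix with the key (routing-table step), else
    fall back to the numerically closest node (stand-in for the leaf set)."""
    l = shl(key, node_id)
    better = [n for n in known_nodes if shl(key, n) > l]
    if better:
        return max(better, key=lambda n: shl(key, n))
    return min(known_nodes + [node_id],
               key=lambda n: abs(int(n, 16) - int(key, 16)))

nodes = ["a1b2c3d4", "a1b29999", "a1b2c000", "ff000000"]
assert route("a1b2c3d0", "00000000", nodes) == "a1b2c3d4"
```

Each hop extends the shared prefix by at least one digit, which is why the hop count is bounded by the number of digits, i.e. O(log_{2^b} N) in the full protocol.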
If a received message should not be delivered but instead forwarded, the node invokes the forward method. These methods carry out a specific task depending on the message type, which can be: CREATE, SUBSCRIBE, PUBLISH and UNSUBSCRIBE. The forward and deliver methods are described in Algorithm 4 and Algorithm 5 respectively. Each topic in Scribe has a unique topic id in the same format as a node id and a message key. The Scribe node that is numerically closest to the topic id acts as the rendezvous point for the topic and forms the root of the topic's multicast tree. To create a topic, a

Algorithm 4: Forwarding algorithm for a Scribe node.
forward(msg, key, nextId):
switch msg.type do
    case SUBSCRIBE
        if msg.topic ∉ topics then
            add msg.topic to topics
            msg.source = thisNodeId
            route(msg, msg.topic)
        end
        add msg.source to topics[msg.topic].children
        nextId = null
    endsw
endsw

Algorithm 5: Deliver algorithm for a Scribe node.
deliver(msg, key):
switch msg.type do
    case CREATE
        add msg.topic to topics
    endsw
    case SUBSCRIBE
        add msg.source to topics[msg.topic].children
    endsw
    case PUBLISH
        foreach node in topics[msg.topic].children do
            send(msg, node)
        end
        if subscribedTo(msg.topic) then
            invokeEventHandler(msg.topic, msg)
        end
    endsw
    case UNSUBSCRIBE
        remove msg.source from topics[msg.topic].children
        if topics[msg.topic].children = ∅ then
            invokeEventHandler(msg.topic, msg)
            msg.source = thisNodeId
            send(msg, topics[msg.topic].parent)
        end
    endsw
endsw

Scribe node uses Pastry to route a message with message type CREATE and the topic id as the message key. The numerically closest node then adds the topic to the list of topics it already knows about, using the deliver method. The id of a topic is hashed so that topic ids, and consequently rendezvous points, are uniformly distributed over the nodes of the Pastry network.

The multicast tree of a topic is built by joining the Pastry routes from the subscribers to the rendezvous point; for each node on the way, the forward method is invoked. If a forwarding node is not already a member of the multicast tree of the topic, it sets itself as a forwarder of the tree and routes the message onward to a closer node. The forwarder then adds the sender to its children for that topic. When a publisher publishes a notification, it first checks whether it knows the IP address of the rendezvous point, in which case the publisher just sends the PUBLISH message directly to it. If the publisher does not know the IP address, it uses Pastry to route a message to the rendezvous point, asking for its IP address. When a rendezvous point receives a PUBLISH message, it disseminates the notification using the constructed multicast tree for that topic.

Authentication

The multicast trees that are built when providing publish-subscribe functionality in an application-level overlay network such as Pastry can be further secured by using a Merkle tree[64]. A Merkle tree is a binary tree composed of cryptographic hash values, where the leaves contain cryptographic hash values of data blocks, the internal nodes contain the hash of the concatenation of their children's values and the root contains the content public key. Publishers create a set of private keys and generate a hash tree of these keys, where the paths to the top hash, called the public key or root key, are used as authentication paths.
Leaves contain the hash values of the private keys, and the nodes between the leaves and the root contain the hash of the concatenation of their two children. When a publisher wants to publish a notification, it first chooses one of the private keys and signs the notification with it. It then calculates the authentication path, which is the list of hash values needed to reach the top hash, or public key. This can later be used by a subscriber to verify the notification. This type of security model makes use of a separate authentication service to which publishers and subscribers of a topic must first authenticate themselves. This service stores information about the topic that is needed by the publishers and subscribers to be able to securely send and receive notifications. For each topic it stores the following:

1. Spread function. A mathematical function that lists the sequence of data block identifiers forming the content. It is used to avoid storing all data block IDs in the authentication service.
2. Root hash. The root hash of the Merkle tree used to authenticate the content.
3. Public key. The public key of the content provider.
4. Signature. The signature of the content.

Results

The documentation of Pastry includes an evaluation of an implemented version of the solution. It was written in Java and uses a network emulation environment to be able to test it with up to Pastry nodes. Each Pastry node was randomly assigned a location

on a plane in the emulated environment before the Pastry system was tested for different performance aspects, such as routing and locating a close node. In this documentation, the overall routing performance was measured as the number of routing hops between two random Pastry nodes, using 1000 to number of nodes in the network, where b = 4 and |L| = 16. The trials showed that the maximum number of hops required to route in a network of N nodes was, as expected, log_{2^b} N. They further showed that the number of route hops scales with the size of the network as predicted.

Another performance aspect that was evaluated in Pastry was the ability to locate one of the 5 closest nodes near the client. It was tested in an environment of nodes with b = 3 and |L| = 8, where a randomly selected Pastry node sent a message to a randomly selected key. The test recorded the first 5 numerically closest nodes to the key that were reached along the route. The results showed that Pastry is able to locate the closest node 68% of the time, and one of the top two nodes 87% of the time.

An experimental evaluation of Scribe[7] presented results and conclusions about the performance of an implementation of Scribe. Three metrics were used to measure the performance of Scribe, namely the delay to deliver notifications to group members, the stress on each node and the stress on each physical network link. These were tested using a simulation of a network with 5050 routers and end nodes that were randomly assigned to the routers. Multiple test runs were used with a varying number of groups and group sizes. The delay when disseminating notifications to a group using Pastry was tested and compared against the delay of regular IP multicast.
The relative delay penalty (RDP) for Scribe against IP multicast showed a mean value of 1.81, and more than 80% had an RDP less than . The stress on a node was measured by counting the number of groups that had non-empty children tables and the number of entries in the children tables of each node. Using 1500 groups and nodes showed a mean number of non-empty children tables per node of 2.4 and a mean number of entries in all the children tables of any node of 6.2.

Discussion

Pastry shows a very cheap and effective way of creating an application-level overlay network for large peer-to-peer solutions. It scales well with the network without significantly increasing latency, as it does not require more than log_{2^b} N routing hops for a network with N nodes. This would be a very viable solution when an application cannot use a regular client-server model with an intermediate layer of managed components.

Scribe shows that Pastry is a powerful tool that can be used to create highly scalable communication solutions. Scribe uses Pastry to maintain groups and group membership and creates a very effective and scalable publish-subscribe solution that relies only on the peers of the system. Scribe can concurrently support many different types of applications, as it can efficiently handle large numbers of nodes, groups and group sizes. Scribe could be used to leverage a client-server model where clients may want to set up a communication channel between themselves that does not need much supervision or participation by the back end.

A publish-subscribe system using Pastry could successfully be used with an authentication solution that does not require different responsibilities of nodes and that keeps the nodes equal. Authenticating the communication channel of a topic makes sure that the content delivered in the group is introduced according to the rules of a creator or manager of the group, and not just by anyone.

Self-managed Self-optimized Methods

A publish-subscribe solution which uses brokers to disseminate notifications often relies on a broker topology at a higher level than that of the physical network. This may lead to non-optimal performance, as the system does not make use of the inherent locality of brokers that handle similar types of subscriptions. If the locality of brokers is not considered when creating an overlay topology of brokers, the propagation of a notification may take longer routes, which increases the number of TCP hops and the time it takes for the solution to disseminate the notification. What follows is a description of a self-organising algorithm which tries to dynamically arrange TCP connections between pairs of brokers using only the routing tables of the brokers[1]. This preserves the scalability properties of a regular publish-subscribe solution, as the algorithm only relies on local knowledge provided by the brokers. The goal of the algorithm is to improve the performance and scalability of a publish-subscribe solution by reducing the number of TCP hops for a notification dissemination. The study focuses on reducing the number of hops by grouping brokers that manage similar subscriptions, and does not address grouping brokers that are physically close to each other on the underlying network.

The algorithm defines a measure of subscription similarity called associativity. This metric describes the intersection of two brokers' zones of interest. A notification matches a subscription if the point it represents falls inside the geometrical zone identified by the subscription. It follows that the larger the intersection between two brokers' zones of interest, the more notifications received by one broker will also be of interest to the other one.
The associativity of a zone Z with respect to another zone Z' is defined as:

AS_Z(Z') = |Z ∩ Z'| / |Z|

Let B_i and B_j be two brokers whose zones of interest are Z_i and Z_j; then the associativity of B_i with respect to B_j is defined as AS_i(B_j) = AS_{Z_i}(Z_j). From this, the associativity of a broker B_i with respect to its neighbours N_i is defined as:

AS(B_i) = Σ_{B_j ∈ N_i} AS_i(B_j)

The associativity of the whole publish-subscribe system can similarly be defined as:

AS = ( Σ_{B_i ∈ {B_1,...,B_N}} AS(B_i) ) / N
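These definitions can be computed directly when zones are modelled as axis-aligned rectangles over the two-attribute space. Note that the formula is read here as AS_Z(Z') = |Z ∩ Z'| / |Z|; that denominator, like the rectangle model and the function names, is an assumption of this sketch.

```python
# Zone: axis-aligned rectangle ((x1, x2), (y1, y2)) over the attribute space.

def area(z):
    (x1, x2), (y1, y2) = z
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection(z, zp):
    (ax1, ax2), (ay1, ay2) = z
    (bx1, bx2), (by1, by2) = zp
    return ((max(ax1, bx1), min(ax2, bx2)), (max(ay1, by1), min(ay2, by2)))

def assoc(z, zp):
    """AS_Z(Z'): share of Z's zone of interest that Z' overlaps
    (assumed |Z ∩ Z'| / |Z| reading of the definition)."""
    return area(intersection(z, zp)) / area(z) if area(z) else 0.0

def broker_assoc(zone, neighbour_zones):
    """AS(B_i): summed associativity towards all neighbours."""
    return sum(assoc(zone, nz) for nz in neighbour_zones)

z_i, z_j = ((0, 10), (0, 10)), ((5, 15), (0, 10))
assert assoc(z_i, z_j) == 0.5        # half of B_i's zone overlaps B_j's
```

Because the denominator is the broker's own zone, associativity is not symmetric: AS_i(B_j) and AS_j(B_i) can differ, which is why the link weight defined below sums both directions.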

The algorithm is realised the same way in every broker, where the goal, for each broker, is to detect a possible rearrangement of the TCP links that increases its associativity while not decreasing the overall associativity AS of the system. A possible rearrangement is one where two brokers, B and B', may connect directly to each other with a TCP link instead of using a path of brokers between them. If a broker B finds a rearrangement that will increase its associativity with another broker B' while not decreasing the overall associativity AS, the algorithm proceeds by connecting B and B' and deciding on a link in the path between B and B' that must be torn down in order to ensure no cycles.

The algorithm assumes that each broker can open a bounded number F of TCP links at the same time. The number of links available at broker B is defined as al_B, where 0 ≤ al_B < F. Each link l_{i,j} between two brokers B_i and B_j is associated with a weight w that reflects the associativity between the brokers and is used to measure the associativity that two brokers gain or lose when a link is created or torn down. The weight is a measure of the number of notifications that will pass through that link that are of interest to both brokers. The weight w_{i,j} of a link l_{i,j} is defined as:

w(l_{i,j}) = w_{i,j} = AS_i(B_j) + AS_j(B_i)

Using the definition of weights for a link, a hop sequence can be defined as an ordered set of pairs of broker ids and weights, denoted as:

HS(B_0, B_l) = {(B_0, 0), (B_1, w_{0,1}), ..., (B_i, w_{i−1,i}), ..., (B_l, w_{l−1,l})}

which represents the path between B_0 and B_l with the associated weights. The algorithm consists of four phases: triggering, tear-up link discovery, tear-down link selection and reconfiguration. What follows is a description of these phases.

Triggering

Triggering refers to when a broker B detects a possibility of increasing its associativity.
Let Z_i be the zone of broker B_i before the arrival of a new subscription S, and Z'_i = Z_i ∪ S be B_i's new zone of interest. The algorithm is triggered if the following predicate, the Activation Predicate (AP), is satisfied:

AP : Z'_i ≠ Z_i ∧ ∃ l_{i,j} : AS_{Z'_i}(Z_{i,j}) > AS_{Z_i}(Z_{i,j})

where Z_{i,j} denotes the zone of interest reachable through the link l_{i,j}. For each l_{i,j} satisfying this predicate, a tear-up discovery procedure is invoked along that link, as B_i suspects that behind it could be a broker which can increase its associativity.
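The triggering step can be sketched in one dimension, with zones and subscriptions reduced to intervals and the per-link zones Z_{i,j} given as a plain dict. All names here are illustrative, and the AS_Z(Z') = |Z ∩ Z'| / |Z| reading of associativity is an assumed reconstruction of the definition given earlier.

```python
# 1-D simplification: zones and per-link zones are (low, high) intervals.

def overlap(a, b):
    """Length of the intersection of two intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def assoc(z, zp):
    """AS_Z(Z') under the assumed |Z ∩ Z'| / |Z| reading."""
    return overlap(z, zp) / (z[1] - z[0])

def triggered(zone, new_zone, link_zones):
    """AP: the zone changed and, for some link, associativity towards the
    zone behind it strictly increased. Returns the links along which a
    tear-up discovery procedure should be started."""
    if new_zone == zone:
        return []
    return [l for l, lz in link_zones.items()
            if assoc(new_zone, lz) > assoc(zone, lz)]

zone, new_zone = (0, 10), (0, 20)    # a new subscription widened the zone
links = {"l_ij": (12, 25), "l_ik": (2, 4)}
assert triggered(zone, new_zone, links) == ["l_ij"]
```

In the example, the widened zone newly overlaps the zone behind l_ij, so discovery starts only on that link; the overlap with l_ik actually shrinks relative to the larger zone.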

Tear-Up Link Discovery

When a tear-up discovery procedure is triggered for a broker B_i, B_i sends a request message along the link l_{i,j} with the following information: Z'_i, the new subscription S, and the hop sequence HS, initialised to (B_i, 0). When a broker B_j receives a request message, it computes the associativity between itself and B_i, updates HS by adding (B_j, w_l) and evaluates the following Forwarding Predicate, FP:

FP : ∃ l_{j,h} ≠ l : AS_{Z'_i}(Z_{j,h}) > AS_{Z'_i}(Z_j)

The predicate indicates whether there is a possibility that a higher associativity can be discovered with a broker behind l_{j,h}: if no links exist such that FP is satisfied, then a reply message is sent back to B_i along the path stored in HS. For each l_{j,h} satisfying FP, B_j forwards the request and then waits for the corresponding reply. When B_j has received a reply from each link, it computes the maximum among all the values, including its own, and sends back a reply to B_i using the reply message from the broker with the calculated maximum.

Tear-Down Link Selection

This phase focuses on identifying and selecting a link that has to be torn down during the reconfiguration phase. It is started every time a broker B_i receives a reply for a tear-up discovery procedure that it started itself. The reply contains the hop path HS and the identifier B_l of the broker behind the link l that the request was sent on. The link to potentially tear up between B_i and B_l is denoted l_tu. If al_l = 0 ∧ al_i = 0, the link l_tu cannot be created and thus no links exist that can be torn down, so l_td = NULL. Otherwise there are two cases:

1. al_i > 0 ∧ al_l > 0: in this case both B_i and B_l have available connections and thus they can establish the link l_tu without removing one of their existing links.
2. al_l = 0, al_i > 0 (resp. al_i = 0, al_l > 0): in this case l_td must be one of B_l's (resp. B_i's) links.
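The case analysis above can be condensed into a small sketch. The (name, weight) link representation and the rule for choosing which of the saturated endpoint's links to drop (here: the lowest-weight one) are assumptions made for illustration; the actual selection criterion is not reproduced here.

```python
def select_tear_down(al_i, al_l, links_i, links_l):
    """Return (can_tear_up, l_td) following the cases above.
    al_i / al_l: available link slots at B_i and B_l (0 <= al < F);
    links_i / links_l: existing links of each endpoint as (name, weight)."""
    if al_i == 0 and al_l == 0:
        return False, None              # l_tu cannot be created at all
    if al_i > 0 and al_l > 0:
        return True, None               # both have free slots: nothing to drop
    # exactly one endpoint is saturated: drop one of its existing links
    candidates = links_l if al_l == 0 else links_i
    return True, min(candidates, key=lambda link: link[1])  # assumed: lightest

assert select_tear_down(0, 0, [], []) == (False, None)
assert select_tear_down(2, 1, [("a", 3)], [("b", 1)]) == (True, None)
assert select_tear_down(1, 0, [("a", 3)], [("b", 1), ("c", 2)]) == (True, ("b", 1))
```

Preferring the lowest-weight link is a natural heuristic given that weights measure the associativity carried by a link, but it is a design choice of this sketch, not a statement about the original algorithm.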
Reconfiguration

This phase tears down the link selected by the tear-down link selection phase. As the tear-down procedure must avoid partitioning the broker network, it introduces a locking mechanism that ensures only one tear-down happens at a time along the path from B_i to B_l. The locking mechanism works by first sending a lock message along the path to B_l, where each broker B between the endpoints executes the following algorithm. When receiving a lock message: if B is involved in another concurrent reconfiguration phase on the same link it received the lock message from, it replies with a NACK message. Otherwise there are two cases:

1. if B = B_l, B sends an ACK message to the next node towards B_i
2. if B ≠ B_l, B sends a LOCK message to the next node towards B_l

When receiving an ACK message: B forwards the ACK message to the next node towards B_i. When receiving a NACK message: B forwards the NACK message to the next node towards B_i and removes the lock. Once the path is locked, B_i sends a close message to B_j, which tears the link down. B_j then sends back an unlock message, after which the routing tables of all brokers on the path are updated.

Results

A report containing an implementation of this study [1] uses Siena as a base for the publish-subscribe functionality and Java with J-Sim to simulate a real-time network that the prototype was run on top of. In total, a network of 100 nodes was simulated with one broker running on top of each network node. Simulation scenarios were sequences of subscription changes and notification publications over a bi-dimensional space with two numerical attributes. Notifications were generated using a uniform distribution over the space. One aspect that was tested in this implementation was the routing performance, measured as the number of TCP hops per notification required for the dissemination. This aspect was tested and compared using the messaging system with and without the self-organising algorithm. The test showed a reduction of forwards by 70%, which is close to the minimum. Another aspect that was tested was the overhead of the self-organising algorithm: better routing performance comes at the price of additional network traffic introduced by the algorithm itself. The overhead was measured as the average number of messages per operation against the pub/sub ratio R. These messages include all the messages generated by the system for notification diffusion, subscription routing and self-organisation. For a ratio as low as R = 10:1, the self-organisation cost is not outweighed by its benefits.
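The per-broker lock handling above could be sketched as follows. The message shapes and the lockedFor field are illustrative assumptions; the thesis describes the rules only in prose.

```javascript
// Handle a LOCK message at one broker on the path from B_i to B_l.
// `path` is the ordered list of broker ids from B_i to B_l.
function handleLock(broker, path) {
  if (broker.lockedFor) {
    return { type: 'NACK' };               // already in a concurrent reconfiguration
  }
  broker.lockedFor = path;                 // take part in this lock
  if (broker.id === path[path.length - 1]) {
    return { type: 'ACK' };                // B = B_l: start ACKing back towards B_i
  }
  // B ≠ B_l: forward the LOCK to the next node towards B_l.
  return { type: 'LOCK', forwardTo: path[path.indexOf(broker.id) + 1] };
}

// A NACK travelling back towards B_i removes the lock as it passes.
function handleNack(broker) {
  broker.lockedFor = null;
  return { type: 'NACK' };
}
```

A broker that receives an ACK simply forwards it towards B_i, so no state change is needed there in this sketch.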
As the ratio increases, the cost decreases by about 30% with respect to the case when no self-organisation is performed.

Discussion

The material covered in this section shows that a setup of decentralised brokers can incorporate a self-managing algorithm so that the brokers may autonomously organise themselves without any manual intervention. It serves as a good example of how a decentralised messaging solution can still be manageable while providing high expressiveness and throughput. The test results show better routing performance when using the algorithm, which highlights the problem of using brokers that are not aware of the underlying network. The addressed self-organising algorithm is simple, consisting of a few distinct phases. This bodes well for the continued development and introduction of such an algorithm, which can be implemented by following the steps previously described.

5.6 Conclusion and discussion

Having presented the workings of publish-subscribe for each category, this section concludes the in-depth study chapter of this thesis report with a comparative discussion. The individually summarised methods have been reviewed, but this section inspects them in regard to each other by comparing their advantages and disadvantages and highlighting their possible applicability in the current solution. The conclusion also includes a discussion about the future of the addressed methods.

Decentralising Brokers

Decentralised brokers are probably the most common way of scaling this type of notification system. The approach scales well and is completely managed and controlled by the owners. It is also the way the current solution is handled and scaled out, except for a slight variation in the explicit control of the notification component, as it is a cloud service owned and provided by AWS. This notification service, called SNS, is a topic-based publish-subscribe solution that uses intermediate brokers on the AWS cloud to disseminate notifications. Being part of the AWS cloud services, it comes with a lot of extra functionality such as scaling and security. Using a cloud service reduces the need for maintenance while providing many simple ways to monitor and analyze a wide range of information about the service. One way to further develop the messaging channels in the service is to upgrade the topic-based routing to content-based or type-based notification routing. This would make it possible for the different components of the system to interact with each other in more intricate ways. A type-based solution could, for example, be used to create sub-channels with the possibility to express interest in parent channels, to which messages from both channels would be routed.
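A hypothetical sketch of such a path-style hierarchy: a subscriber of a parent channel such as /voting also receives messages published on its sub-channels. The matching rule below is an illustrative assumption, not part of the current SNS-based solution.

```javascript
// True when a message on `published` should reach a subscriber of
// `subscribed`, i.e. when `subscribed` is the channel itself or one of
// its parents in the path hierarchy.
function isSelfOrParent(subscribed, published) {
  return published === subscribed || published.startsWith(subscribed + '/');
}

// Return the subscriptions that a message on `published` is routed to.
function route(subscriptions, published) {
  return subscriptions.filter((sub) => isSelfOrParent(sub, published));
}
```

With this rule, a logging tool subscribed to /voting would receive traffic from both /voting/client and /voting/admin without subscribing to each sub-channel individually.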
This would make it possible to, for example, create one sub-channel for the voting functionality and one sub-channel for the admin functionality, while components like logging tools could listen to their parent channel and receive messages from both of the sub-channels. Type-based notification systems would also be able to simulate the same kind of hierarchy used by the path component of an HTTP request, where messaging channels could be distinguished using the path component. For example, /voting/client/event1 could denote a channel that has two parent channels, /voting/client and /voting. This would combine RESTful functionality with the type-based notification functionality so that the interface between the two could be kept as minimal as possible.

Peer-to-peer Methods

Pastry and Scribe show that there exist viable solutions for creating scalable peer-to-peer publish-subscribe systems on an application-level network. Such a solution does, however, not give as much maintainability, and other types of notification services, especially those on the cloud, outperform it in many ways. It is instead meant for use cases where the functionality cannot be provided by a centralised broker system, or where a central messaging system is not needed or wanted and the clients of the system should interact directly with each other. Peer-to-peer solutions could be used in the current system to offload the central parts of the back end and instead partially move messaging functionality to the peers. This could be used to create a more scalable way of pushing out data to the clients, as a web

server does not need to keep track of every client and does not need to publish messages to every client. This would mean that the quality of the provided service depends on the clients' capabilities to route messages. Such a peer-to-peer network could be used to implement additional functionality that does not need the back end for any heavy data processing. This type of functionality may relate to dynamic handling of events where clients, for example, can create their own events. Offloading these types of events onto the clients makes it possible for the system to handle more concurrent events. This type of notification system also shows that there exist viable security solutions, which may further motivate the creation of dynamic events by the clients and possibilities to create security policies for events so that, for example, only friends may connect to a channel.

Self-managed Self-optimised Methods

The addressed self-managing algorithm shows a simple and easy to understand procedure that can be used to add some automation to a messaging system. As the reference project uses a cloud based service for its internal messages, it is impossible to configure the brokers of the notification system. The algorithm may therefore not have much value for the current system unless the messaging system were replaced with an alternative that could support these methods. If a non-cloud based messaging solution were implemented for the current system, this algorithm could be a good way to improve performance, as the system would make use of the locality of the different clients. It could be implemented for many types of messaging systems, as seen in its test cases on top of Siena, a very generic and well used base for publish-subscribe systems.
This fits nicely with the use of peer-to-peer publish-subscribe solutions if the messaging were offloaded onto the clients, as locality plays a much larger part for peer-to-peer solutions where clients' positions may move much more.

The Future

The use of cloud based services is a great way to skip development of redundant and unnecessary code, as cloud services may provide the needed functionality out of the box with already present and well-tested features for, for example, scaling and monitoring. These cloud services, on the other hand, often cannot be tailored specifically to the needs of the developed application and cannot, because of that, truly compete with proprietary solutions when it comes to performance; if the messaging system were replaced by a non-cloud based messaging system, the methods addressed in this chapter would come to great use. While these methods focus on the internal messaging system, one way to extend the current use case while offloading the internal system would be to introduce dynamic events that could be created and managed by the users. This added functionality could be used to create an event that is only relevant for the client and, for example, its friends. The addressed methods are especially of interest for this kind of messaging, as they cover peer-to-peer and self-managed solutions with security in mind, where authentication could use any publicly known and used procedure such as authentication with Facebook or Google. This in-depth study can also serve as documentation of said methods, where any implementation of a publish-subscribe system would benefit from addressing the issues in this chapter. Siena, for example, is a well-known base for publish-subscribe systems that could work as a reference solution when designing any type of publish-subscribe functionality where a messaging solution is needed that decouples the sender and the receiver.

Chapter 6

Implementation of Improvements

This chapter documents the implementation that was made during this thesis. It focuses on applying one of the proposed main improvements to the current system, namely the separation of messaging channels in the web server. It is not an implementation from scratch but a modification of the current system, which applies the improvement while changing as little as possible of the behaviour and inner workings of the current service. This implementation can then be seen as a good base or reference if this type of functionality should be incorporated into the existing system later on. This chapter presents the goals for this implementation in section 6.1 and describes how they will be measured in section 6.2. It includes an overview of the implementation details in section 6.3 and concludes with a presentation of the results in section 6.4.

6.1 Goals

One of the main goals for this implementation is the modification of the web server so that it is able to distinguish between clients for different events and is able to multicast messages to the clients of an event. The proposed improvement defined two levels in the web server, where the first level addresses the generic functionality of the message channels and the second level addresses the event logic where use case specific functionality resides. These levels will be implemented by including a way to separate the logic for each event inside the web server, so that any message that is sent for an event is handled by its respective event logic on the second level. This improvement of the current solution directly addresses two issues, namely sending everything to everyone and completely replicating the global state in both clients and web servers. By implementing the two levels in the web server, the web server will be able to distinguish between clients and multicast messages to the clients of an event.
The second issue, on the other hand, requires use case specific changes to the solution so that the logic for each event can be used separately. This means that the use case implementation needs to be modified so that it uses this separation of event logic and does not, for example, fetch the whole global state, but instead only fetches the state for a specific event. The current solution must also be changed in the same way on the client side, so that a client only stores and handles the state of those events that it is interested in. The goal to solve this

second issue can then be broken down to making sure that the web server only holds the event logic for those events that it needs, namely those that connected clients are interested in. The modifications of the current system must be explicitly documented with regard to which parts of the system they change and what they change. This is to make it clear how these changes can be applied when or if the stakeholders of Dohi Sweden decide to incorporate them into the system. The implementation must also be able to demonstrate the behaviour of the current use case by incorporating it into the modified solution. Reaching these goals would not pose any challenges if the performance aspects of the solution were not considered. One of the overall goals of this implementation, as well as of this thesis, is to come up with improvements for the existing service that may better fit the needs of the stakeholders and that may later be used by the existing service. This implementation will trade usability and performance, in terms of client-server delay, for scalability, generality and complexity, and these trade-offs must be documented so that readers may get a good idea of what differs between the two systems.

6.2 Measurements

Demonstrating that the implemented solution only sends messages to the clients of the corresponding event can easily be done by creating two events, connecting a client to each event and then sending some messages over the events. Showing that the client and the web server only keep track of relevant events can be done by logging the data that is stored in both components. The web browser Google Chrome supports ways to log data that is transmitted over asynchronous web connections, and this can be used to show that a message also only contains data for the corresponding event.
The differences between how the solutions proceed when a client connects are measured using the latency perceived by the client, the number of messages sent and the size of the messages sent. The phase for when a client connects is defined as the steps between the first communication with the web server and the consecutive steps until the client has everything it needs to be able to send and receive messages for the event it is interested in. In the reference solution, this means that the initialization phase consists of fetching the bootstrap from the web server, fetching the token from the web server and then connecting to the web server for the asynchronous communication link. The initialization phase for the modified solution consists of connecting to the web server, receiving a connection response, sending a subscription request and finally receiving a subscription response. The special quick subscribe method of the modified solution has also been tested; it consists of the steps where the client connects to a special URL and then receives a combined connect and subscribe response. How the modified solution behaves with its lazy initialization is tested by doing ten consecutive initializations with ten clients, where the previously connected clients stay connected. The latency perceived by the connecting client is then noted down. The bootstrap of both solutions is also tested. The bootstrap is the initial message that is sent when a client connects and contains information specific to a channel or, for the reference solution, information about all channels. This information is used by the client to set up the necessary parts of an event so that it can be presented the correct way.
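A minimal sketch of how the perceived initialization latency could be measured on the client: time the whole phase from the first contact with the server until the client can send and receive for its event. The step functions are placeholders for the real AJAX/WebSocket calls, not the actual measurement code used in the thesis.

```javascript
// Run the initialization steps sequentially (each step must finish
// before the next starts, as in the phases described above) and return
// the total elapsed time as the perceived latency in milliseconds.
async function measureInit(steps) {
  const start = Date.now();
  for (const step of steps) {
    await step();
  }
  return Date.now() - start;
}
```

For the reference solution the steps would be fetch bootstrap, fetch token and open the communication link; for the quick subscribe method there would be a single combined step.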

6.3 Implementation Details

This section describes the modifications that were made to the current system by addressing the components that have been changed and how they differ from the existing solution. It presents the separation between the general functionality and the use case specific logic and outlines how it works.

Channel Definition

The proposed improvements define a channel as a separated messaging channel for a single use case or, in the case of the reference use case, a single event. Each event or use case gets its own channel over which it sends and receives its messages. These channels represent the core functionality of the generalization that this implementation introduces. The idea is that clients subscribe to channels instead of just connecting to the web server, where a client that is a subscriber of a channel can send and receive messages on that channel. The channel functionality is present in three components: the client interface, the web server and the client validation cache; see Figure 6.1 for an overview of all components of the solution with the addressed components highlighted. The channel id, or reference, is for the voting use case intended to be similar to the path component of a URL. A channel reference is also meant to be a composition of an account and the name of an event, such as /dohi/event1. This is used by the request handlers and the channel manager in the web server to group channel specific logic and to further minimize overhead by, for example, letting a client subscribe to a channel when connecting to the server using a certain URL.

Components

There are four components that have been modified to some extent; how each has been changed is described in the following subsections.
The overall architecture has not been changed from the existing solution, and components still only interact with those they do in the reference system; see Figure 6.2 for an overview of the system architecture where modified components have been highlighted with a wider and rounded border.

Web Server

The web server has been completely reworked from the ground up to accommodate the core functionality of the channels. It consists of two levels, where the first level contains the core functionality and the second level contains use case specific logic. When a client first connects to the web server, the web server directly sends a response to the client containing the validation token that the connected client must use in every message to validate itself. A connected client can then send subscription requests to the server which, depending on whether there is any channel module that validates the request, sets the client as a subscriber to the channel and sends a response back that may contain use case specific information such as initial state. The web server holds channel modules which are bound by a mapping expression for how an id or reference to the channel could look. The web server uses these modules to perform lazy initialization of channels, triggered when the first client subscribes to a channel. See Algorithm 6 for a step by step description of the subscription procedure used in the web server. The initialization of a channel includes setting up a topic on SNS for the channel and calling the channel module's initialization method. The SNS topic is used for channel messaging

between internal components of the system, and the module initialization call lets the use case or event specific logic perform its own initialization. When the last client unsubscribes from an initialized channel on a web server, the web server tears the channel down by unsubscribing from the SNS topic and calling a destructor method on the channel module. A channel module can hold regular handlers for HTTP request/response logic which do not go through the subscription procedure but are still exposed to the channel module and the messaging functionality for a channel, and can as such still interact with the use case

Figure 6.1: An overview of the components in the solution which highlights the general functionality of the messaging channels that is not tied to any use case. It separates the web server into two levels: a higher level which contains use case specific functionality in separate modules and a lower level that manages these modules. It introduces a web client interface that the clients use to express their interest in events or channels. The client validation cache is also used directly by the lower level of the web servers. The load balancer and the notification system are also viewed as part of the basic functionality.

logic.

Figure 6.2: Overarching components in the solution with modified components highlighted. Those that were modified during the implementation are the web client, web server, notification system and database. Other components were used as they were.

Algorithm 6: Subscription procedure used in the web server.
Data: The set of channel modules P : id → module, for possible channels
Data: The set of active channels A : id → channel
Data: The set of channels S that the client is subscribing to
Data: The channel c that the client is trying to subscribe to
if c ∉ S then
    if c ∈ A and A(c) validates the subscription then
        add c to S and send OK response
    else if c matches any channel module in P and P(c) validates the subscription then
        set up topic for c on SNS
        set up channel, add it to A and call init method on channel
        add c to S and send OK response with the initial state
    end
end
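The subscription procedure of Algorithm 6 could be sketched in JavaScript as follows. The names (channelModules, activeChannels, createTopic) are assumptions; createTopic stands in for the real SNS call, and each channel module is assumed to supply a pattern, a validate method and an init method.

```javascript
// Subscribe `client` to channel reference `c` on `server`, lazily
// initializing the channel when the first subscriber arrives.
function subscribe(server, client, c) {
  if (client.subscriptions.has(c)) return 'ALREADY_SUBSCRIBED';
  const active = server.activeChannels.get(c);
  if (active) {
    // Channel already initialized: only validate and register the client.
    if (!active.module.validate(c)) return 'REJECTED';
    client.subscriptions.add(c);
    return 'OK';
  }
  // Lazy initialization: find a module whose mapping expression matches.
  const module = server.channelModules.find((m) => m.pattern.test(c));
  if (!module || !module.validate(c)) return 'REJECTED';
  server.createTopic(c);                     // topic for the channel (SNS stand-in)
  server.activeChannels.set(c, { module, state: module.init(c) });
  client.subscriptions.add(c);
  return 'OK';
}
```

Because the topic setup and module init only run on the first subscription, later subscribers to the same channel skip straight to validation, which matches the lazy initialization behaviour described in the text.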

The event module for the current use case is similar to the use case functionality in the reference system, except that it does not operate on the complete global state but only on its own channel state. When the channel manager in level 1 asks the event module to validate a channel, it checks if the channel exists in the database and accepts the validation if it does. When the logic for an event initializes, it fetches the state of the channel from the database, stores it locally and then hands it to the channel manager whenever the channel manager asks for a channel subscription response. The admin interface handling is done exactly the same way, using HTTP request/response handlers and directly querying the database. The data model has been changed, though, from storing happenings to storing channels, and because of that the admin handlers have been modified to support the new data model. To further minimize the number of steps that need to be taken in the starting phase when a client connects, a client can, instead of connecting to the root of the server address, connect to the server using a path component that starts with /channel/. The web server will interpret this as a connection with a single subscription request to a channel whose reference is represented by the remaining part of the path component.

Notification System

The notification system is still SNS and the internal components still communicate using a single topic, but the web server now also creates a topic for each channel that can be used to send messages over its own channel and further separate use cases or events.

Database

The data model has been modified to accommodate the new separation of data with channels. When a channel for an event is initialized, the event logic needs to be able to fetch the state of that single event.
Instead of using happenings, surveys, questions and options, the database now stores channels as a substitution for happenings and surveys grouped together, where each channel in the database represents a channel or event for the current use case. A user is no longer tied to a happening but can instead be tied to a number of accounts, where an account represents a group of channels.

Front end

The front end has been divided into two parts, where a client interface has been extracted from the use case specific web client. The client interface is a Javascript library that provides functionality to connect to a server, disconnect from a server, subscribe to a channel, unsubscribe from a channel and send and receive messages for a channel. This interface hides the low level communication with the web server, such as token handling and sending subscribe messages, and provides more abstract functionality where clients can be notified about channel specific information by registering callbacks. The current voting client has been modified so that it uses the new client interface to connect to the server and subscribe to a channel. The data model handler has also been changed to support the new data model, where only the specific event data is stored locally. The client implementation has been limited to the rating functionality, where a user can rate a subject and then receive a result, in the form of a sliding window mean value of votes, which it displays in a heatometer.
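A minimal sketch of what such a client interface could look like. All names are assumptions based on the functionality listed above, not the actual library API, and the transport is stubbed out in place of the real WebSocket and token handling.

```javascript
// Wrap a low-level transport in a channel-oriented interface: the
// application subscribes with a callback per channel and only messages
// for subscribed channels reach it.
function createClientInterface(transport) {
  const callbacks = new Map();             // channel -> onMessage callback
  transport.onIncoming((channel, msg) => {
    const cb = callbacks.get(channel);
    if (cb) cb(msg);                       // drop messages for other channels
  });
  return {
    subscribe(channel, onMessage) {
      callbacks.set(channel, onMessage);
      transport.send({ type: 'subscribe', channel });
    },
    unsubscribe(channel) {
      callbacks.delete(channel);
      transport.send({ type: 'unsubscribe', channel });
    },
    send(channel, body) {
      transport.send({ type: 'publish', channel, body });
    },
  };
}
```

The voting client would then register a callback for its single event channel and feed received results into the heatometer rendering.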

6.4 Results

The implementation was tested using the reference client, the reference admin interface and a test client consisting of a web page with a text area that displayed received messages and a text field that could be used to send messages. In addition to the reference channel module, another module was added to the web server that accepts every subscriber and broadcasts any message it receives. The test clients were run on the same machine, running Windows 7 with processing power close to 2.7GHz and 8GB of memory. All of the services used run either as an AWS service or on an AWS EC2 micro instance. See section 4.2 for a complete overview of how every component of the service is implemented. To show that messages are only sent over the corresponding channel, three instances of the test client were started where, in the improved solution, the first and second clients were connected to one channel and the third client was connected to another channel. The first client then sent 1000 messages aimed only at the clients of its corresponding channel. As seen in Figure 6.3, all three clients received every message when tested with the reference solution, while only the first two clients received the messages in the modified solution.

Figure 6.3: Number of messages received by three clients where the first client is publishing and subscribing to channel 1, the second client is subscribing to channel 1 and the third client is subscribing to channel 2.

The time to connect to the server and initialize a subscription is shown in Figure 6.4. It presents the time for three versions: the reference solution, the general method for the modified solution and a special method for the modified solution. The time of the reference solution refers to the time it takes for a client to first fetch the bootstrap using an AJAX call, then fetch the token using an AJAX call and lastly connect to the bi-directional communication link.
The general method for the modified solution refers to the time it takes for a client to connect to the server and then, after it has received a connection response, make a subscription request and receive a subscription response. The special, quick subscribe method refers to the time it takes for a client to connect to the server using a special URL,

which the server also sees as a subscription request, and then receive a combined connection and subscription response. The times shown are the mean values of 1000 tries for each method and show that the special method for the modified solution is slightly faster than the reference solution, whereas the general method for the modified solution is considerably slower than the other two.

Figure 6.4: The perceived latency for a client when initializing a connection to the service with one subscription.

Figure 6.5 presents how the three methods perform when ten consecutive clients connect to the server and subscribe to a channel. It presents the latency of the initialization for the nth connecting client when n − 1 clients are already connected to the server. These times represent the mean values of 1000 tries. They show that the two methods for the modified solution take longer when the first client connects. Figure 6.6 shows the size of the bootstrap for the reference use case of both solutions when consecutively adding channels, each with a bootstrap of 1000 bytes. The client of the modified solution is only subscribing to one channel. The bootstrap of the reference solution scales linearly with the number of channels added to the system, while the size of the bootstrap sent by the modified solution is constant no matter the number of existing channels.

6.5 Results Analysis

This section analyses the results from the previous section and compares them between measurements and between the different trade-offs of the services.

Separation of Messaging Data

The response time to be fully connected to a web server of the reference solution is defined as the sequence of connecting to the web server, fetching the token and then fetching the


More information

Web Interface using HTML5 for Interaction between Mobile Device & Cloud- Services

Web Interface using HTML5 for Interaction between Mobile Device & Cloud- Services Web Interface using HTML5 for Interaction between Mobile Device & Cloud- Services Nimit S Modi,Proff.Yask Patel Computer Department,PIET, Baroda. IT Department,PIET,Baroda Abstract Mobile cloud computing

More information

A Versatile and Scalable Testing Solution

A Versatile and Scalable Testing Solution A Versatile and Scalable Testing Solution to Meet the Challenges of Testing WebRTC Services By: Chao Zhao Software Engineer at Valid8.com 1 The Nature of WebRTC How WebRTC integrates into IMS Four Stages

More information

Lightweight Service-Based Software Architecture

Lightweight Service-Based Software Architecture Lightweight Service-Based Software Architecture Mikko Polojärvi and Jukka Riekki Intelligent Systems Group and Infotech Oulu University of Oulu, Oulu, Finland {mikko.polojarvi,jukka.riekki}@ee.oulu.fi

More information

Monitoring Infrastructure (MIS) Software Architecture Document. Version 1.1

Monitoring Infrastructure (MIS) Software Architecture Document. Version 1.1 Monitoring Infrastructure (MIS) Software Architecture Document Version 1.1 Revision History Date Version Description Author 28-9-2004 1.0 Created Peter Fennema 8-10-2004 1.1 Processed review comments Peter

More information

FRAFOS GmbH Windscheidstr. 18 Ahoi 10627 Berlin Germany info@frafos.com www.frafos.com

FRAFOS GmbH Windscheidstr. 18 Ahoi 10627 Berlin Germany info@frafos.com www.frafos.com WebRTC for the Enterprise FRAFOS GmbH FRAFOS GmbH Windscheidstr. 18 Ahoi 10627 Berlin Germany info@frafos.com www.frafos.com This document is copyright of FRAFOS GmbH. Duplication or propagation or extracts

More information

All You Can Eat Realtime

All You Can Eat Realtime HTML5 WebSocket: All You Can Eat Realtime By Peter Lubbers, Kaazing May 14, 2010 1 About Peter Lubbers Director of Documentation and Training, Kaazing Co-Founder San Francisco HTML5 User Group http://www.sfhtml5.org/

More information

Research of Web Real-Time Communication Based on Web Socket

Research of Web Real-Time Communication Based on Web Socket Int. J. Communications, Network and System Sciences, 2012, 5, 797-801 http://dx.doi.org/10.4236/ijcns.2012.512083 Published Online December 2012 (http://www.scirp.org/journal/ijcns) Research of Web Real-Time

More information

D. SamKnows Methodology 20 Each deployed Whitebox performs the following tests: Primary measure(s)

D. SamKnows Methodology 20 Each deployed Whitebox performs the following tests: Primary measure(s) v. Test Node Selection Having a geographically diverse set of test nodes would be of little use if the Whiteboxes running the test did not have a suitable mechanism to determine which node was the best

More information

WebRTC: Why and How? FRAFOS GmbH. FRAFOS GmbH Windscheidstr. 18 Ahoi 10627 Berlin Germany info@frafos.com www.frafos.com

WebRTC: Why and How? FRAFOS GmbH. FRAFOS GmbH Windscheidstr. 18 Ahoi 10627 Berlin Germany info@frafos.com www.frafos.com WebRTC: Why and How? FRAFOS GmbH FRAFOS GmbH Windscheidstr. 18 Ahoi 10627 Berlin Germany info@frafos.com www.frafos.com This docume nt is copyright of FRAFOS GmbH. Duplication or propagation or e xtracts

More information

Enterprise Mobile Application Development: Native or Hybrid?

Enterprise Mobile Application Development: Native or Hybrid? Enterprise Mobile Application Development: Native or Hybrid? Enterprise Mobile Application Development: Native or Hybrid? SevenTablets 855-285-2322 Contact@SevenTablets.com http://www.seventablets.com

More information

Responsive, resilient, elastic and message driven system

Responsive, resilient, elastic and message driven system Responsive, resilient, elastic and message driven system solving scalability problems of course registrations Janina Mincer-Daszkiewicz, University of Warsaw jmd@mimuw.edu.pl Dundee, 2015-06-14 Agenda

More information

Integrating Mobile apps with your Enterprise

Integrating Mobile apps with your Enterprise Integrating Mobile apps with your Enterprise Jonathan Marshall marshalj@uk.ibm.com @jmarshall1 Agenda Mobile apps and the enterprise Integrating mobile apps with Enterprise Applications Mobile apps and

More information

SOA @ ebay : How is it a hit

SOA @ ebay : How is it a hit SOA @ ebay : How is it a hit Sastry Malladi Distinguished Architect. ebay, Inc. Agenda The context : SOA @ebay Brief recap of SOA concepts and benefits Challenges encountered in large scale SOA deployments

More information

Front-End Performance Testing and Optimization

Front-End Performance Testing and Optimization Front-End Performance Testing and Optimization Abstract Today, web user turnaround starts from more than 3 seconds of response time. This demands performance optimization on all application levels. Client

More information

A Case for SIP in JavaScript

A Case for SIP in JavaScript Copyright IEEE, 2013. This is the author's copy of a paper that appears in IEEE Communications Magazine. Please cite as follows: K.Singh and V.Krishnaswamy, "A case for in JavaScript", IEEE Communications

More information

MAGENTO HOSTING Progressive Server Performance Improvements

MAGENTO HOSTING Progressive Server Performance Improvements MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents

More information

Industrial Network Security and Connectivity. Tunneling Process Data Securely Through Firewalls. A Solution To OPC - DCOM Connectivity

Industrial Network Security and Connectivity. Tunneling Process Data Securely Through Firewalls. A Solution To OPC - DCOM Connectivity Industrial Network Security and Connectivity Tunneling Process Data Securely Through Firewalls A Solution To OPC - DCOM Connectivity Manufacturing companies have invested billions of dollars in industrial

More information

: Application Layer. Factor the Content. Bernd Paysan. EuroForth 2011, Vienna

: Application Layer. Factor the Content. Bernd Paysan. EuroForth 2011, Vienna net : Application Layer Factor the Content Bernd Paysan EuroForth 2011, Vienna Outline Motivation Requirements Solutions Some Basic Insights Factor Data Distribute the Code net2o Recap: Lower Level Parts

More information

Web Technologies for the Internet of Things

Web Technologies for the Internet of Things Aalto University School of Science Degree Programme of Computer Science and Engineering HUANG, Fuguo Web Technologies for the Internet of Things Master s Thesis Espoo, July 7, 2013 Supervisors: Instructor:

More information

Multi-Channel Clustered Web Application Servers

Multi-Channel Clustered Web Application Servers THE AMERICAN UNIVERSITY IN CAIRO SCHOOL OF SCIENCES AND ENGINEERING Multi-Channel Clustered Web Application Servers A Masters Thesis Department of Computer Science and Engineering Status Report Seminar

More information

Curl Building RIA Beyond AJAX

Curl Building RIA Beyond AJAX Rich Internet Applications for the Enterprise The Web has brought about an unprecedented level of connectivity and has put more data at our fingertips than ever before, transforming how we access information

More information

Study of HTML5 WebSocket for a Multimedia Communication

Study of HTML5 WebSocket for a Multimedia Communication , pp.61-72 http://dx.doi.org/10.14257/ijmue.2014.9.7.06 Study of HTML5 WebSocket for a Multimedia Communication Jin-tae Park 1, Hyun-seo Hwang 1, Jun-soo Yun 1 and Il-young Moon 1 1 School of Computer

More information

SOFT 437. Software Performance Analysis. Ch 5:Web Applications and Other Distributed Systems

SOFT 437. Software Performance Analysis. Ch 5:Web Applications and Other Distributed Systems SOFT 437 Software Performance Analysis Ch 5:Web Applications and Other Distributed Systems Outline Overview of Web applications, distributed object technologies, and the important considerations for SPE

More information

An introduction to creating Web 2.0 applications in Rational Application Developer Version 8.0

An introduction to creating Web 2.0 applications in Rational Application Developer Version 8.0 An introduction to creating Web 2.0 applications in Rational Application Developer Version 8.0 September 2010 Copyright IBM Corporation 2010. 1 Overview Rational Application Developer, Version 8.0, contains

More information

Resource Utilization of Middleware Components in Embedded Systems

Resource Utilization of Middleware Components in Embedded Systems Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system

More information

TeamCompanion Solution Overview. Visual Studio

TeamCompanion Solution Overview. Visual Studio TeamCompanion Solution Overview Visual Studio Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the example

More information

Software Life-Cycle Management

Software Life-Cycle Management Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema

More information

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How

More information

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL This chapter is to introduce the client-server model and its role in the development of distributed network systems. The chapter

More information

TERMS OF REFERENCE. Revamping of GSS Website. GSS Information Technology Directorate Application and Database Section

TERMS OF REFERENCE. Revamping of GSS Website. GSS Information Technology Directorate Application and Database Section TERMS OF REFERENCE Revamping of GSS Website GSS Information Technology Directorate Application and Database Section Tel: Accra 0302 682656 Cables: GHANASTATS In case of reply the number and date of this

More information

Key Components of WAN Optimization Controller Functionality

Key Components of WAN Optimization Controller Functionality Key Components of WAN Optimization Controller Functionality Introduction and Goals One of the key challenges facing IT organizations relative to application and service delivery is ensuring that the applications

More information

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390 The Role and uses of Peer-to-Peer in file-sharing Computer Communication & Distributed Systems EDA 390 Jenny Bengtsson Prarthanaa Khokar jenben@dtek.chalmers.se prarthan@dtek.chalmers.se Gothenburg, May

More information

AJAX: Highly Interactive Web Applications. Jason Giglio. jgiglio@netmar.com

AJAX: Highly Interactive Web Applications. Jason Giglio. jgiglio@netmar.com AJAX 1 Running head: AJAX AJAX: Highly Interactive Web Applications Jason Giglio jgiglio@netmar.com AJAX 2 Abstract AJAX stands for Asynchronous JavaScript and XML. AJAX has recently been gaining attention

More information

Serving Media with NGINX Plus

Serving Media with NGINX Plus Serving Media with NGINX Plus Published June 11, 2015 NGINX, Inc. Table of Contents 3 About NGINX Plus 3 Using this Guide 4 Prerequisites and System Requirements 5 Serving Media with NGINX Plus 9 NGINX

More information

Napster and Gnutella: a Comparison of two Popular Peer-to-Peer Protocols. Anthony J. Howe Supervisor: Dr. Mantis Cheng University of Victoria

Napster and Gnutella: a Comparison of two Popular Peer-to-Peer Protocols. Anthony J. Howe Supervisor: Dr. Mantis Cheng University of Victoria Napster and Gnutella: a Comparison of two Popular Peer-to-Peer Protocols Anthony J Howe Supervisor: Dr Mantis Cheng University of Victoria February 28, 2002 Abstract This article presents the reverse engineered

More information

Bayeux Protocol: la nuova frontiera della comunicazione a portata di mano. Relatore Nino Guarnacci

Bayeux Protocol: la nuova frontiera della comunicazione a portata di mano. Relatore Nino Guarnacci Bayeux Protocol: la nuova frontiera della comunicazione a portata di mano Relatore Nino Guarnacci to understand the phenomenon of Comet and Reverse AJAX, we need to consider why there is a need for it

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

4. H.323 Components. VOIP, Version 1.6e T.O.P. BusinessInteractive GmbH Page 1 of 19

4. H.323 Components. VOIP, Version 1.6e T.O.P. BusinessInteractive GmbH Page 1 of 19 4. H.323 Components VOIP, Version 1.6e T.O.P. BusinessInteractive GmbH Page 1 of 19 4.1 H.323 Terminals (1/2)...3 4.1 H.323 Terminals (2/2)...4 4.1.1 The software IP phone (1/2)...5 4.1.1 The software

More information

A Survey Study on Monitoring Service for Grid

A Survey Study on Monitoring Service for Grid A Survey Study on Monitoring Service for Grid Erkang You erkyou@indiana.edu ABSTRACT Grid is a distributed system that integrates heterogeneous systems into a single transparent computer, aiming to provide

More information

Abstract. Description

Abstract. Description Project title: Bloodhound: Dynamic client-side autocompletion features for the Apache Bloodhound ticket system Name: Sifa Sensay Student e-mail: sifasensay@gmail.com Student Major: Software Engineering

More information

Is Liferay Right for Your Organization? Seven Things to Consider When Choosing a Portal Platform

Is Liferay Right for Your Organization? Seven Things to Consider When Choosing a Portal Platform Is Liferay Right for Your Organization? Seven Things to Consider When Choosing a Portal Platform BY DAN LILIEDAHL, CTO, TANDEMSEVEN The outcome of your portal initiative and its success is directly related

More information

Capacity Planning Guide for Adobe LiveCycle Data Services 2.6

Capacity Planning Guide for Adobe LiveCycle Data Services 2.6 White Paper Capacity Planning Guide for Adobe LiveCycle Data Services 2.6 Create applications that can deliver thousands of messages per second to thousands of end users simultaneously Table of contents

More information

Techniques for Scaling Components of Web Application

Techniques for Scaling Components of Web Application , March 12-14, 2014, Hong Kong Techniques for Scaling Components of Web Application Ademola Adenubi, Olanrewaju Lewis, Bolanle Abimbola Abstract Every organisation is exploring the enormous benefits of

More information

Realtime Web @HuffingtonPost

Realtime Web @HuffingtonPost Realtime Web @HuffingtonPost Websockets, SockJS and RabbitMQ Adam Denenberg VP Engineering @denen adam.denenberg@huffingtonpost.com 1 Huffington Post 500 MM PVs/week 12 MM UVs/week 200MM+ Comments, 2MM

More information

CONTENT of this CHAPTER

CONTENT of this CHAPTER CONTENT of this CHAPTER v DNS v HTTP and WWW v EMAIL v SNMP 3.2.1 WWW and HTTP: Basic Concepts With a browser you can request for remote resource (e.g. an HTML file) Web server replies to queries (e.g.

More information

Integrating Long Polling with an MVC Web Framework

Integrating Long Polling with an MVC Web Framework Integrating Long Polling with an MVC Web Framework Eric Stratmann, John Ousterhout, and Sameer Madan Department of Computer Science Stanford University {estrat,ouster,sameer27}@cs.stanford.edu Abstract

More information

http://webrtcbook.com

http://webrtcbook.com ! This is a sample chapter of WebRTC: APIs and RTCWEB Protocols of the HTML5 Real-Time Web by Alan B. Johnston and Daniel C. Burnett, Second Edition. For more information or to buy the paperback or ebook

More information

Framework as a master tool in modern web development

Framework as a master tool in modern web development Framework as a master tool in modern web development PETR DO, VOJTECH ONDRYHAL Communication and Information Systems Department University of Defence Kounicova 65, Brno, 662 10 CZECH REPUBLIC petr.do@unob.cz,

More information

A distributed system is defined as

A distributed system is defined as A distributed system is defined as A collection of independent computers that appears to its users as a single coherent system CS550: Advanced Operating Systems 2 Resource sharing Openness Concurrency

More information

Middleware- Driven Mobile Applications

Middleware- Driven Mobile Applications Middleware- Driven Mobile Applications A motwin White Paper When Launching New Mobile Services, Middleware Offers the Fastest, Most Flexible Development Path for Sophisticated Apps 1 Executive Summary

More information

Distributing education services to personal and institutional systems using Widgets

Distributing education services to personal and institutional systems using Widgets 25 Distributing education services to personal and institutional systems using Widgets Scott Wilson, Paul Sharples, and Dai Griffiths University of Bolton Abstract. One of the issues for the Personal Learning

More information

1. Introduction. 2. Web Application. 3. Components. 4. Common Vulnerabilities. 5. Improving security in Web applications

1. Introduction. 2. Web Application. 3. Components. 4. Common Vulnerabilities. 5. Improving security in Web applications 1. Introduction 2. Web Application 3. Components 4. Common Vulnerabilities 5. Improving security in Web applications 2 What does World Wide Web security mean? Webmasters=> confidence that their site won

More information

Marratech Technology Whitepaper

Marratech Technology Whitepaper Marratech Technology Whitepaper Marratech s technology builds on many years of focused R&D and key reference deployments. It has evolved into a market leading platform for Real Time Collaboration (RTC)

More information

socketio Documentation

socketio Documentation socketio Documentation Release 0.1 Miguel Grinberg January 17, 2016 Contents 1 What is Socket.IO? 3 2 Getting Started 5 3 Rooms 7 4 Responses 9 5 Callbacks 11 6 Namespaces 13 7 Using a Message Queue 15

More information

Some Issues on Ajax Invocation

Some Issues on Ajax Invocation Some Issues on Ajax Invocation I. Introduction AJAX is a set of technologies that together a website to be -or appear to be- highly responsive. This is achievable due to the following natures of AJAX[1]:

More information

Web Development. Owen Sacco. ICS2205/ICS2230 Web Intelligence

Web Development. Owen Sacco. ICS2205/ICS2230 Web Intelligence Web Development Owen Sacco ICS2205/ICS2230 Web Intelligence Brief Course Overview An introduction to Web development Server-side Scripting Web Servers PHP Client-side Scripting HTML & CSS JavaScript &

More information

Modern Web Development From Angle Brackets to Web Sockets

Modern Web Development From Angle Brackets to Web Sockets Modern Web Development From Angle Brackets to Web Sockets Pete Snyder Outline (or, what am i going to be going on about ) 1.What is the Web? 2.Why the web matters 3.What s unique about

More information

Application Performance Testing Basics

Application Performance Testing Basics Application Performance Testing Basics ABSTRACT Todays the web is playing a critical role in all the business domains such as entertainment, finance, healthcare etc. It is much important to ensure hassle-free

More information

An Introduction to VoIP Protocols

An Introduction to VoIP Protocols An Introduction to VoIP Protocols www.netqos.com Voice over IP (VoIP) offers the vision of a converged network carrying multiple types of traffic (voice, video, and data, to name a few). To carry out this

More information

A Monitored Student Testing Application Using Cloud Computing

A Monitored Student Testing Application Using Cloud Computing A Monitored Student Testing Application Using Cloud Computing R. Mullapudi and G. Hsieh Department of Computer Science, Norfolk State University, Norfolk, Virginia, USA r.mullapudi@spartans.nsu.edu, ghsieh@nsu.edu

More information

Evaluation of Nagios for Real-time Cloud Virtual Machine Monitoring

Evaluation of Nagios for Real-time Cloud Virtual Machine Monitoring University of Victoria Faculty of Engineering Fall 2009 Work Term Report Evaluation of Nagios for Real-time Cloud Virtual Machine Monitoring Department of Physics University of Victoria Victoria, BC Michael

More information

Web Cloud Architecture

Web Cloud Architecture Web Cloud Architecture Introduction to Software Architecture Jay Urbain, Ph.D. urbain@msoe.edu Credits: Ganesh Prasad, Rajat Taneja, Vikrant Todankar, How to Build Application Front-ends in a Service-Oriented

More information

Smartphone Enterprise Application Integration

Smartphone Enterprise Application Integration WHITE PAPER MARCH 2011 Smartphone Enterprise Application Integration Rhomobile - Mobilize Your Enterprise Overview For more information on optimal smartphone development please see the Rhomobile White

More information

Design and Functional Specification

Design and Functional Specification 2010 Design and Functional Specification Corpus eready Solutions pvt. Ltd. 3/17/2010 1. Introduction 1.1 Purpose This document records functional specifications for Science Technology English Math (STEM)

More information

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Service Oriented Architecture SOA and Web Services John O Brien President and Executive Architect Zukeran Technologies

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

WHAT WE NEED TO START THE PERFORMANCE TESTING?

WHAT WE NEED TO START THE PERFORMANCE TESTING? ABSTRACT Crystal clear requirements before starting an activity are always helpful in achieving the desired goals. Achieving desired results are quite difficult when there is vague or incomplete information

More information

Oct 15, 2004 www.dcs.bbk.ac.uk/~gmagoulas/teaching.html 3. Internet : the vast collection of interconnected networks that all use the TCP/IP protocols

Oct 15, 2004 www.dcs.bbk.ac.uk/~gmagoulas/teaching.html 3. Internet : the vast collection of interconnected networks that all use the TCP/IP protocols E-Commerce Infrastructure II: the World Wide Web The Internet and the World Wide Web are two separate but related things Oct 15, 2004 www.dcs.bbk.ac.uk/~gmagoulas/teaching.html 1 Outline The Internet and

More information

Business Activity Monitoring and SOA Environments

Business Activity Monitoring and SOA Environments Business Activity Monitoring and SOA Environments A Case Study Frank Greco fgreco@javasig.com 1 SOA Environments Topics Business Activity Monitoring (BAM) and Operational Business Intelligence (BI) Case

More information

Espial IPTV Middleware. Evo Solution Whitepaper. <Title> Delivering Interactive, Personalized 3-Screen Services

Espial IPTV Middleware. Evo Solution Whitepaper. <Title> Delivering Interactive, Personalized 3-Screen Services Espial IPTV Middleware Evo Solution Whitepaper Delivering Interactive, Personalized 3-Screen Services April 2010 Espial Group 1997-2010. All rights reserved The 3-Screen Challenge Differentiate

More information

CiscoWorks Internetwork Performance Monitor 4.0

CiscoWorks Internetwork Performance Monitor 4.0 CiscoWorks Internetwork Performance Monitor 4.0 Product Overview The CiscoWorks Internetwork Performance Monitor (IPM) is a network response-time and availability troubleshooting application. Included

More information

Following statistics will show you the importance of mobile applications in this smart era,

Following statistics will show you the importance of mobile applications in this smart era, www.agileload.com There is no second thought about the exponential increase in importance and usage of mobile applications. Simultaneously better user experience will remain most important factor to attract

More information

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? Ashutosh Shinde Performance Architect ashutosh_shinde@hotmail.com Validating if the workload generated by the load generating tools is applied

More information

HTML5 the new. standard for Interactive Web

HTML5 the new. standard for Interactive Web WHITE PAPER HTML the new standard for Interactive Web by Gokul Seenivasan, Aspire Systems HTML is everywhere these days. Whether desktop or mobile, windows or Mac, or just about any other modern form factor

More information

From Desktop to Browser Platform: Office Application Suite with Ajax

From Desktop to Browser Platform: Office Application Suite with Ajax From Desktop to Browser Platform: Office Application Suite with Ajax Mika Salminen Helsinki University of Technology mjsalmi2@cc.hut.fi Abstract Web applications have usually been less responsive and provided

More information

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB Executive Summary Oracle Berkeley DB is used in a wide variety of carrier-grade mobile infrastructure systems. Berkeley DB provides

More information

point to point and point to multi point calls over IP

point to point and point to multi point calls over IP Helsinki University of Technology Department of Electrical and Communications Engineering Jarkko Kneckt point to point and point to multi point calls over IP Helsinki 27.11.2001 Supervisor: Instructor:

More information

SiteCelerate white paper

SiteCelerate white paper SiteCelerate white paper Arahe Solutions SITECELERATE OVERVIEW As enterprises increases their investment in Web applications, Portal and websites and as usage of these applications increase, performance

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

How To Build A Web App

How To Build A Web App UNCLASSIFIED Next Gen Web Architecture for the Cloud Era Chief Scientist, Raytheon Saturn 2013 28 Apr - 3 May Copyright (2013) Raytheon Agenda Existing Web Application Architecture SOFEA Lessons learned

More information

#define. What is #define

#define. What is #define #define What is #define #define is CGI s production system for Application Management and Development, offered in the context of software as a service. It is a project management tool, configuration management

More information

Base One's Rich Client Architecture

Base One's Rich Client Architecture Base One's Rich Client Architecture Base One provides a unique approach for developing Internet-enabled applications, combining both efficiency and ease of programming through its "Rich Client" architecture.

More information

This course provides students with the knowledge and skills to develop ASP.NET MVC 4 web applications.

This course provides students with the knowledge and skills to develop ASP.NET MVC 4 web applications. 20486B: Developing ASP.NET MVC 4 Web Applications Course Overview This course provides students with the knowledge and skills to develop ASP.NET MVC 4 web applications. Course Introduction Course Introduction

More information

Institutionen för datavetenskap

Institutionen för datavetenskap Institutionen för datavetenskap Department of Computer and Information Science Final thesis Server-side design and implementation of a web-based streaming platform by FREDRIK ROSENQVIST LIU-IDA 2015-12-01

More information

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance M. Rangarajan, A. Bohra, K. Banerjee, E.V. Carrera, R. Bianchini, L. Iftode, W. Zwaenepoel. Presented

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/60933

More information

Network Scanning: A New Feature for Digital Copiers

Network Scanning: A New Feature for Digital Copiers Network Scanning: A New Feature for Digital Copiers Abstract Introduction The method of implementing electronic document capture and distribution, known as network scanning, into the traditional copier/printer

More information

Building Software in an Agile Manner

Building Software in an Agile Manner Building Software in an Agile Manner Abstract The technology industry continues to evolve with new products and category innovations defining and then redefining this sector's shifting landscape. Over

More information

A Metadata Model for Peer-to-Peer Media Distribution

A Metadata Model for Peer-to-Peer Media Distribution A Metadata Model for Peer-to-Peer Media Distribution Christian Timmerer 1, Michael Eberhard 1, Michael Grafl 1, Keith Mitchell 2, Sam Dutton 3, and Hermann Hellwagner 1 1 Klagenfurt University, Multimedia

More information