Faculty of Computer Science Institute of Systems Architecture Diploma Thesis DEVELOPMENT OF AN AUTONOMOUS BUILD AND DISTRIBUTION SYSTEM FOR SOFTWARE WITH HYBRID DEVELOPMENT MODELS Tino Breddin Mat.-Nr.: 326234 Supervised by: Prof. Dr. Alexander Schill, Dipl. Inf. Josef Spillner Submitted on August 10th, 2010
ABSTRACT The Open-Source Software business model has helped Open-Source Software (OSS) gain acceptance within the software industry as viable software which can handle mission-critical tasks. This model has also introduced a new way to develop Open-Source Software: instead of being developed entirely by an open community of developers, a company primarily supports and leads the development. Moreover, OSS projects in general benefit from the increasing interest, with more developers helping such projects. The distributed nature of OSS development teams requires new ways to manage projects in order to achieve good productivity. Concepts like Continuous Integration provide such methods, focusing on improving information sharing and instant feedback on changes within a development process. Continuous build systems are a central part of any infrastructure which implements Continuous Integration methods and as such play an important role in improving software quality. Good software quality includes robustness with regard to the deployment environment of any given software. Users perceive better quality if they don't run into problems when setting up and using the software. Modern continuous build systems don't help development teams as much as desirable in such cases. Old concepts which form the foundation of such systems prevent them from being enhanced to be of better use. Therefore the system which is developed in this thesis, named Swarm, is based on different concepts which are meant to support software development teams in an improved manner. Swarm's core is built upon the use of many so-called production environments, the term Continuous Integration uses for the environments in which software should be built and tested. To integrate well with development processes based on Distributed Version Control Systems, the features of such systems are used to provide feedback on the various changes to a software project through various communication channels. The prototypical implementation of the system presented in this thesis shows how software quality can be boosted further than existing continuous build systems allow. Moreover, Swarm can be adapted to allow even more improvements, which is reserved for future extensions. 3
Dedication For my great-grandma, Gertrud. She will always be remembered as the kind woman she was. For my wife, Marie-Kristin, who always finds ways to make me smile. For my family, thanks for the unconditional support throughout my studies. 5
CONFIRMATION I confirm that I independently prepared the thesis and that I used only the references and auxiliary means indicated in the thesis. Dresden, August 10th, 2010 7
CONTENTS 1 Introduction 13 1.1 Motivation......................................... 13 1.2 Idea............................................ 14 1.3 Structure of the Thesis.................................. 14 1.4 Summary......................................... 14 2 Background 15 2.1 Continuous Integration.................................. 15 2.1.1 Concepts..................................... 15 2.1.2 System: CruiseControl.............................. 16 2.1.3 System: Hudson................................. 17 2.1.4 Summary..................................... 17 2.2 Source Code Management............................... 17 2.2.1 History of Version Control............................ 17 2.2.2 Centralized Version Control........................... 18 2.2.3 Distributed Version Control........................... 18 2.2.4 Git......................................... 19 2.2.5 Summary..................................... 21 2.3 Collaborative Software Development.......................... 21 9
2.3.1 Sourceforge.................................... 21 2.3.2 Github....................................... 21 2.3.3 Summary..................................... 21 2.4 Erlang/OTP Project.................................... 22 2.4.1 Language Characteristics............................ 22 2.4.2 Open Telecom Platform............................. 24 2.4.3 Development Process.............................. 24 2.4.4 Summary..................................... 24 2.5 Summary......................................... 25 3 Swarm A Distributed Build System 27 3.1 Requirements Analysis.................................. 27 3.1.1 Continuous Integration.............................. 27 3.1.2 Distributed Version Control Systems...................... 28 3.1.3 System Integration................................ 28 3.1.4 Summary..................................... 29 3.2 System Principles.................................... 29 3.2.1 Single Project Support.............................. 30 3.2.2 Deferred Authentication............................. 31 3.2.3 Git SCM Support................................. 31 3.2.4 Summary..................................... 32 3.3 System Design...................................... 32 3.3.1 Architecture.................................... 33 3.3.2 Subsystem: Platform Management...................... 33 3.3.3 Subsystem: VCS Interface............................ 39 3.3.4 Subsystem: Data Storage............................ 43 3.3.5 Subsystem: Job Processor........................... 49 3.3.6 Subsystem: Operation and Management................... 52 3.3.7 System Configuration.............................. 57 10 Contents
3.3.8 Summary..................................... 58 3.4 System Integration.................................... 58 3.4.1 Event Hooks................................... 59 3.4.2 Hook Scripts................................... 59 3.4.3 Summary..................................... 60 3.5 Workflows........................................ 60 3.5.1 Latest Version Change.............................. 60 3.5.2 Result Propagation................................ 61 3.5.3 New Change................................... 63 3.5.4 Summary..................................... 63 3.6 Summary......................................... 63 4 Evaluation 65 4.1 Component-based Development............................ 65 4.2 Usage of Erlang/OTP................................... 65 4.3 Test Run.......................................... 66 4.4 Test Deployment..................................... 67 4.5 Summary......................................... 67 5 Conclusion 69 5.1 Plugin Support...................................... 69 5.2 Restful HTTP API..................................... 69 5.3 Native VCS Support................................... 70 5.4 Inter-Instance Communication.............................. 70 5.5 Summary......................................... 71 List of Figures 72 List of Tables 74 Glossary 77 Acronyms 79 Bibliography 80 Contents 11
1 INTRODUCTION The topic of this thesis is introduced in this chapter by giving a detailed explanation of the initial motivation in section 1.1. The following section 1.2 concisely summarizes the idea which builds the foundation for this work. Lastly the structure of this thesis is presented in section 1.3 and the foundations for this thesis are summarized in section 1.4. 1.1 MOTIVATION In recent years OSS has emerged as a business opportunity which is now well known as the Open-Source Software business model. While some companies make their software available to the public to drive adoption, others specialize in supporting already established OSS projects. While it becomes easier for all participants to use Open-Source Software, this is not necessarily the case when it comes to driving the development forward. Companies often have well-established internal development procedures which they don't want to expose to the public. This makes it hard for external developers to contribute to a project. Furthermore, in-house development driven by customer projects might not be integrated with the OSS version of the software. In parallel to the development of the OSS business model, the development models being used have changed as well. This led to the adoption of continuous build systems to improve software quality and feedback about changes throughout the complete development cycle of a software project. This class of software already provides tremendous benefits for any software project. Still, these build systems have been designed with other requirements in mind than those modern OSS projects are presented with. Software is expected to support many operating systems and integrate with an ever-changing and diverse set of third-party software. If the gap between in-house development at companies and the efforts of OSS developers could be bridged to connect the different workflows without ongoing manual effort, contributions would be much easier to integrate than is the case at the moment. Such changes would enable collaboration between OSS projects and corporations in new ways when it comes to software development. In addition, the possibility to extend operating system support more easily would further the overall goal of improving software quality, which itself helps OSS projects drive adoption. 13
1.2 IDEA The concept of continuous build systems needs to be reconsidered based on the requirements of modern OSS projects. The core functionality of build systems should be modified to provide the ability to build software on many different systems by default. Furthermore the outcome of builds should be easily shareable with other tools to enable the creation of sophisticated development processes. As a Proof of Concept (PoC) for these core functionalities a continuous build system should be developed which demonstrates the aforementioned benefits while taking a simplistic approach in regard to other aspects of modern build systems. 1.3 STRUCTURE OF THE THESIS Chapter 1 opens this thesis by introducing the underlying motivation and the general idea to solve the described problems. Subsequently relevant topics, such as Continuous Integration (CI) and Source Code Management (SCM), are explored in detail in chapter 2 to provide the necessary background for further discussions. The system itself is developed in chapter 3. This includes the requirements analysis as well as the system design. The results are evaluated in chapter 4 in regard to the initial assumptions and gathered requirements. Finally this thesis concludes by discussing shortcomings and future improvements in chapter 5. 1.4 SUMMARY OSS development has led to rapid changes both in software development practices and software business models. Unfortunately the available software development tools don't allow OSS projects to fully exploit collaboration between all contributors and drive user adoption, as explained in further detail in section 1.1. Instead it is suggested in section 1.2 that a novel continuous build system should serve as a showcase of how new core functionalities can further enhance the software quality of OSS projects. Lastly section 1.3 describes the structure of this thesis. 14 Chapter 1 Introduction
2 BACKGROUND The system which is developed in this thesis is based on the achievements made in different areas of software development to address the issues presented in section 1.1. As the thesis title implies, the developed system is a continuous build system; thus the theory behind Continuous Integration is explained in section 2.1, which also highlights the need for such systems. The various emerging Distributed Version Control Systems (Distributed VCSs), e.g. Git [git] and Mercurial [mercurial], have changed the way developers share code and contribute to OSS projects. Section 2.2 briefly describes the history and basics of these systems. The following section 2.3 gives examples of platforms which support collaborative software development within OSS projects. Subsequently the basics of Erlang, the programming language used for the implementation of the system developed in this thesis, are explained in section 2.4. Section 2.5 summarizes the presented background while pointing out important aspects. 2.1 CONTINUOUS INTEGRATION The term Continuous Integration was first defined by [Beck2000] as part of a set of Extreme Programming (XP) best practices. Building and testing a software product every time a development task is completed was considered CI. [Beck2000] argues that this regular cycle instantly shows whether new changes break the software, so that problems can be addressed much earlier than in traditional development models such as the Waterfall Model [Royce1987]. This simple concept has evolved from a suggested XP practice to a software development methodology [Duvall2007] which is based on several practices, of which the most important are briefly explained in the following sections. 2.1.1 Concepts Build Automation Building the current version of a software should be possible by executing a single command. The assumption is that the more steps a build process involves, the more unlikely it becomes that developers actually build the software during development. This might not hold true for small projects, but the more complexity and components are added to a piece of software, the more difficult the build process becomes. Keeping it simple to build the software during development is a crucial part of Continuous Integration. 15
Build Fast During the development of software it is almost inevitable that the time which is needed to run a build increases. Building complex software can take several hours to finish, which increases the time it takes until a developer receives feedback from a build. This leads to a state where nobody can tell whether recent changes have affected the system behavior or even broken the system entirely. To alleviate this situation one should focus on keeping the build process fast so that developers can actually build the software and test it after having applied changes. This can be achieved by creating components which are independent enough that it makes sense to build and test them individually after recent changes, while building and testing the whole system separately. Progress Must Be Visible Extreme Programming focuses on the interaction within a team instead of following strict processes. This creates a need for providing as much information as possible about the state of the development to any team member at any time. It should be visible how much work a team member was able to finish, which changes broke the system at which time by whom, how much attention has been put on individual components and whether a team member got stuck at a certain task. The goal is to keep all team members informed so that the most critical tasks can be tackled at any time. Each Commit triggers Builds A way to implement the principle described in the previous section 2.1.1 is to build the complete system every time a developer commits to the code base. In a software development team one measure of progress is committed changes, thus it is feasible to automatically provide feedback for any commit which is visible to all team members. It is essential that developers are relieved of the burden of starting this process manually, since this should be done behind the scenes. Clone Production Environment While developers tend to have a development environment which is specifically geared towards their personal needs, this too often isn't the case for the build and test environments. To eliminate the risk of leaving issues undetected when running the developed system on the target platform, it is necessary to create an environment for the automatic builds and tests which resembles the target environment. This affects not only the chosen operating system but also the complete hardware and software stack of the target system. The better the CI environment is modeled after the target environment, the more confident the development team can be that they didn't miss critical bugs before deploying the software. 2.1.2 System: CruiseControl As one of the first successful OSS continuous build systems CruiseControl [cruisecontrol] has served as an example for many systems of that kind thereafter. Initially developed within ThoughtWorks, the system was later published as OSS and found wide adoption. This is underlined by the subsequent offerings of commercial products by ThoughtWorks based on 16 Chapter 2 Background
CruiseControl to meet corporate users' requirements. Technically CruiseControl is a traditional continuous build system which uses the server it is hosted on to run builds. Its flexible plugin architecture has helped to extend the base framework with various additional functionality, which helped to spread adoption of the system even further. CruiseControl is still in active use and development, with various spin-offs, written in other programming languages, being available. 2.1.3 System: Hudson The OSS continuous build system Hudson [hudson] has become a very popular alternative to CruiseControl due to its extensive SCM support and plugin architecture. Its development has been sponsored by Sun Microsystems, thus attracting many users early on. Although it was initially used as a continuous build system for Java projects, support for other kinds of projects has been added through the help of many additional plugins. The project keeps growing in popularity and has lately received various OSS awards. It is nevertheless based upon the very same principles as CruiseControl, such as using the local server for running builds. 2.1.4 Summary Continuous Integration comprises various smaller concepts which are individually referred to as agile practices and as a whole help improve software development by sharing information. Notably, making any progress visible to all members of a team as well as getting progress information as fast as possible support the idea of CI. Many OSS continuous build systems have picked up these concepts and help shape the software build process to fit such agile practices. The OSS systems CruiseControl and Hudson stand out because of their wide user adoption, which is influenced by the versatility of both tools. 2.2 SOURCE CODE MANAGEMENT When a team of software developers works on the same software it becomes increasingly important to be able to see who made which changes when. This need for organization led to the development of Version Control Systems (VCSs). Since version control is an integral part of any Continuous Integration environment it is necessary to highlight how version control started and how it has evolved until today. Section 2.2.1 covers the roots of Version Control Systems and details the important milestones in the development of the state-of-the-art systems. The class of Centralized Version Control Systems is covered in section 2.2.2. Furthermore section 2.2.3 focuses on the recent developments in the area of Distributed Version Control Systems. Finally the takeaways of Source Code Management are presented in section 2.2.5. 2.2.1 History of Version Control The development of special-purpose systems for controlling source code first became publicly noticed through [Rochkind1975] and [Glasser1978], who introduced the concept of using an SCM system and recent developments in this area. [Tichy1982] showed the stages and pitfalls involved in working on such systems. The first widely used OSS SCM, the Revision Control System (RCS) introduced by [Tichy1985], was an important milestone because it fostered the development of many different approaches in the area of SCM as summarized by [Royce1987]. Further development has focused on Centralized Version Control Systems (Centralized VCSs), which are presented in detail in the following section 2.2.2. A recent conceptual change has been introduced by Distributed Version Control Systems, covered in section 2.2.3. 2.2 Source Code Management 17
Figure 2.1: A centralized version control workflow. 2.2.2 Centralized Version Control The first group of SCM systems which received wide adoption were Centralized Version Control Systems. These systems rely on a central location to store all versioned data as well as the version history. Such a location is referred to as the central repository. Figure 2.1 depicts a typical Centralized VCS setup with one central repository and several developers. These users can only check out a single version of the data in the central repository. No further versioning information is kept within a developer's copy of the data other than the version identifier itself. Thus all versioning knowledge is kept within the central repository. Furthermore developers are only allowed to push changes to the central repository. Those changes need to be based on the latest version of the data as well. This leads to a very inflexible process of working on the source code. However, having this central authority can fit well into other parts of a software development process. Concurrent Versions System (CVS) [cvs] is a notable implementation of such systems, as it gained popularity after its initial release in 1990 and is still in active development. Further improvements are provided by the OSS Centralized VCS Subversion as described by [Pilato2004]. 2.2.3 Distributed Version Control While Centralized VCSs employ a strict workflow based on a central repository, the newer concept of Distributed Version Control Systems tries to support flexible workflows as much as possible by eliminating the need for central repositories. All copies of a repository are considered full-fledged repositories themselves because all versioning metadata is kept within a copy as well. Therefore developers are given the freedom to share data more easily as shown in figure 2.2. This unconstrained flexibility allows teams to create workflows which are specifically tailored towards their needs by creating roles and logical contracts for repositories. Figure 2.3 presents a simple workflow which contains a single shared repository, which can only be 18 Chapter 2 Background
Figure 2.2: A distributed version control setup. pushed to by a single maintainer. Developers can read from the shared repository and share data between each other. But ultimately final changes need to be sent to the maintainer for inclusion in the team's shared repository. This example shows how versatile Distributed Version Control Systems are in terms of workflow adaptation. While there are many Distributed VCSs available, two systems in particular have established themselves as the state of the art. Mercurial [mercurial] is a system implemented in Python which focuses on providing an easy user interface; there is always only one way to do a certain task. Another approach is taken by Git [git], which is implemented in C and often provides the developer many options to accomplish a certain task. In general both systems are very similar regarding their features and performance. 2.2.4 Git Git was initially developed by the Linux Kernel Development Team to support their complex requirements for SCM. Since then it has become very popular, especially in the OSS community. Git provides many ways to organize and change the source code one is controlling with it. But the system developed in this thesis only needs a small subset of the available features, thus the important features and their usefulness are discussed in this section. Branching Creating branches of the source code is an essential tool to separate different development paths with the same base source code. The concept is that one can work on a separate branch for a set of features and, once all changes have been committed, these are added to the main branch again. Creating a new branch used to be a time-consuming operation in Centralized VCSs, often involving the locking of the main branch and thus blocking anybody from committing changes. 2.2 Source Code Management 19
Figure 2.3: A development workflow employing a maintainer to approve changes to the shared repository. Git performs branching operations very fast because it is able to operate locally instead of having to synchronize with a remote server. This allows for a more flexible usage of branches, such as having a separate branch for each known bug or feature which somebody is working on. Switching a branch is fast as well, so that one can change between so-called Topic Branches easily if necessary, without mixing fixes for a bug with the implementation of a new feature. Developers are encouraged to use many branches instead of mostly working on the main branch. Merging When making extensive use of branches, the actual merging of changes in a branch with another branch is very important. Just as with branching, the merge operation itself is very fast, which enables developers to merge often to check that recent changes work well with the rest of the project. Furthermore Git uses a multi-stage merging algorithm which is supposed to work very well without much manual interaction. Rebasing The rebase operation is used to align the changes in a branch to an upstream branch. It rolls back the local branch to the last commit it has in common with an upstream branch. Then it will apply the new changes from the upstream branch to the local branch. Finally the local changes will be re-applied on top of the upstream changes. If Git experiences merging problems the developer will have to resolve the conflicts manually. The operation is supposed to be used to apply local changes on top of the upstream changes without actually merging them from the beginning. Therefore the upstream changes become the base for any local changes, thus one will rather edit the local changes than the ones from the upstream branch, which are supposed to be working and well tested. 20 Chapter 2 Background
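The branch, merge and rebase operations described above map directly onto Git's command line interface. The following minimal sketch is not taken from the thesis implementation; the repository path, the branch names and the use of os:cmd/1 are assumptions chosen for illustration. It shows how these operations could be driven from Erlang, similar to what a VCS interface built on top of Git might do.

%% Sketch: driving the Git operations described above from Erlang.
%% Repository path and branch names are examples only.
-module(git_example).
-export([topic_branch_demo/1]).

%% Run a git command inside RepoDir and return its textual output.
git(RepoDir, Args) ->
    os:cmd("cd " ++ RepoDir ++ " && git " ++ Args).

topic_branch_demo(RepoDir) ->
    %% create and switch to a topic branch
    git(RepoDir, "checkout -b bugfix-123"),
    %% ... work would be committed on the topic branch here ...
    %% re-apply the topic branch on top of the upstream changes
    git(RepoDir, "rebase origin/master"),
    %% merge the finished topic branch back into the main branch
    git(RepoDir, "checkout master"),
    git(RepoDir, "merge bugfix-123").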
2.2.5 Summary Source Code Management has become an integral part of any software development process over the last decades. In this section an overview of the history of version control has been given (section 2.2.1), beginning with RCS, the first widely used OSS Version Control System, up to the point where more recent systems, such as Mercurial, gained traction. The two dominant approaches for SCM, Centralized VCS and Distributed VCS, mostly differ in how flexible the workflows are which they enforce, as described in detail in sections 2.2.2 and 2.2.3. Git, a popular Distributed VCS, puts an emphasis on the speed of any operation it performs and is used for the system developed in this thesis. 2.3 COLLABORATIVE SOFTWARE DEVELOPMENT Open-Source Software has led to the assembly of many teams which work in separate physical locations around the planet. Inevitably, traditional software development tools which are used in corporations can't provide sufficient support for such flexible teams. This has led to the development of new tools and platforms which help developers to collaborate on projects productively while being physically separated. Two popular web platforms for collaborative development are presented in sections 2.3.1 and 2.3.2. Lastly section 2.3.3 summarizes the benefit for development teams. 2.3.1 Sourceforge As one of the first project hosting platforms SourceForge became a hub for OSS projects. It started out in 1999 with source code hosting and added more advanced features over time to become a full-fledged project hosting platform. These features include bug tracking, wikis, forums and mailing lists. The platform also expanded to offering commercial plans for closed projects, which essentially receive the same features. 2.3.2 Github Many project hosting platforms provide various options for the different features which are available, while some niche platforms focus on one option for each feature instead. Github is one such niche platform which started as a source code hosting platform for Git repositories. It has later become a viable platform for complete projects, though, while still focusing on giving developers only one option for any feature such as code reviewing or wikis. The platform has found wide adoption especially because of its early support for Git hosting. The common workflow for code sharing on Github is depicted in figure 2.4. Any developer can create a clone of an existing public repository, which is then tied to his own Github account. This repository can be used to share code changes with others since it is publicly available. Other developers can then incorporate these changes into their own copies of the original repository. 2.3.3 Summary Teams which are physically separated not only need a way to share information but also need to be able to set up processes which support their efforts. Because of ubiquitous internet access, web-based tools in particular have been adopted for such purposes. Both SourceForge and Github, which were described in sections 2.3.1 and 2.3.2, are popular among the large set of such platforms because of their good feature set. 2.3 Collaborative Software Development 21
Figure 2.4: A common workflow for repositories hosted on Github. 2.4 ERLANG/OTP PROJECT The Erlang/OTP project comprises the development of the programming language Erlang and a set of well-supported applications referred to as the Open Telecom Platform (OTP). The language itself was developed at the Ericsson CSLab from the 1980s until it was made available as OSS in 1998 [?]. The language and libraries are covered by the Erlang Public License [epl]. Since then Ericsson has continued driving the development of the language, especially because it's still used for internal product development. Erlang/OTP is already widely used for telecom and messaging applications, while its adoption for systems which require easy scalability and fault tolerance has been increasing steadily over the last years. This section introduces Erlang/OTP by giving a dense overview of the language itself in section 2.4.1 and its libraries in section 2.4.2. Section 2.4.3 focuses on the current development process which is used by the Erlang community. Finally section 2.4.4 summarizes the findings and briefly discusses the use of Erlang in this thesis. 2.4.1 Language Characteristics Erlang is often referred to as a concurrent functional programming language. It has several features which are meant to make programming concurrent applications easier. The most important of those are briefly described in this section in addition to some language basics. Virtual Machine Erlang source files (.erl) are compiled to byte-code (.beam). This byte-code is then interpreted at runtime by the Erlang Virtual Machine, called BEAM. This gives Erlang the advantage of good portability, since the compiled code is platform independent. The virtual machine itself is optimized for multi-core processors, which improves its performance on modern hardware tremendously. 22 Chapter 2 Background
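As a brief illustration of this compile-and-load cycle, the following sketch compiles a source file to byte-code and loads it into the running virtual machine; the file name hello.erl is an assumption, not part of the thesis.

%% Compile hello.erl to platform-independent byte-code (hello.beam)
%% and load the resulting module into the running VM.
{ok, hello} = compile:file("hello.erl"),
{module, hello} = code:load_abs("hello").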
Lightweight Processes Processes are a fundamental language construct. A process is some code which is executed sequentially. It has a very small memory footprint as well as fast creation and destruction performance. The Erlang Virtual Machine can manage millions of processes while only being memory bound. Processes can be executed in parallel, which makes running chunks of code concurrently trivial. Listing 2.1 shows how a process can be started, which is called spawning in Erlang terminology.

spawn(fun() -> erlang:display(hello_world) end).

Listing 2.1: Spawning a process which prints a message to standard output. Message Passing Erlang processes don't share any memory. Thus messages are used to send data between processes. Any process can send any type of data to any other process in the system. At a receiving process all incoming messages are added to a receive queue from which the process can pull new messages. Therefore the language provides the receive keyword, which makes a process block and pull the oldest message from the queue. Listing 2.2 shows how messages are sent and received.

% sending a message to myself (the running process)
self() ! {display, hello_world}.

% receive that specific message
receive
    {display, Msg} -> erlang:display(Msg)
end.

Listing 2.2: Sending and receiving messages with one process. Hot Code Swapping The virtual machine provides the ability to swap Erlang byte-code at runtime, allowing systems to be upgraded without any downtime. Therefore two versions of any given module are kept in memory, the current version V and the older version V-1. If a new version of a module is loaded it becomes the current version V, while the former current version becomes V-1. All new processes will use the current version of the module, whereas already running processes will continue to use the older version. Yet the running processes can choose to upgrade to the current version at any time, making it possible to upgrade whole systems without having to restart running processes. Single Variable Assignment As mentioned earlier, Erlang doesn't have any shared memory. This concept has been taken to the variable level by making variables immutable after being assigned a value. This allows for easier tracing of errors because the assignment of a value to a variable is explicit. However, this concept can lead to messy code in practice if not used properly. Listing 2.3 shows how variables are assigned properly.

% proper assignment of a new variable
MyVar = 1.

% failing assignment, because the variable is already assigned
MyVar = MyVar + 2. % returns: exception error: no match of right hand side value 3

% proper assignment
MyVar2 = MyVar + 2.

Listing 2.3: Correct variable assignment and common failures. 2.4 Erlang/OTP Project 23
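To make the hot code swapping mechanism described above more concrete, the following sketch (module, function and message names are examples, not code from the thesis) shows the common pattern of a long-running process that switches to the newest loaded version of its module by calling itself through a fully qualified function call.

-module(counter).
-export([start/0, loop/1]).

%% Spawn a counting process.
start() ->
    spawn(fun() -> loop(0) end).

loop(N) ->
    receive
        {inc, From} ->
            From ! {count, N + 1},
            %% The fully qualified call ?MODULE:loop/1 always enters the
            %% currently loaded version of the module, so reloading the
            %% module upgrades this process at its next message.
            ?MODULE:loop(N + 1)
    end.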
Distributed Erlang Message passing and lightweight processes give programmers an easy abstraction from multi-core processors, allowing them to make use of all available resources without having to worry about multi-threading. Distributed Erlang provides the same level of abstraction on a system level. A single running Erlang virtual machine is called a node. Erlang nodes can be connected in the sense that processes from each node can communicate through messages with each other. This makes it trivial to run any Erlang program on more than one Erlang node. The real advantage is that these nodes can reside on different physical machines, which allows programmers to run a distributed system on top of Distributed Erlang. 2.4.2 Open Telecom Platform Erlang is shipped with a set of applications which act as the standard library for the language. These applications are referred to as OTP, named after their initial development as part of telephony development projects. These applications are known to be well tested and in production use for many critical systems, thus the Erlang/OTP team encourages users of Erlang to utilize the OTP as much as possible. The applications are constantly improved to ensure that they present a viable solution for common problems. 2.4.3 Development Process After Erlang/OTP was made available to the public as Open-Source Software in 1998, the Erlang/OTP team at Ericsson continued to manage the development of the language. New releases were published on a regular basis, but the development wasn't visible to the public. Furthermore the direction of development was largely based on customers' needs. The interaction with the Erlang community was mainly based on three mailing lists, one for bug reports, one for patches and one intended for general discussion. EEPs In order to get involved in the development of the language itself one would have had to write an Erlang Enhancement Proposal (EEP) [eep]. Those proposals were meant as a base for discussing important language changes before any work is done. This process worked well until 2008, when the Erlang community started to grow and it became apparent that Erlang/OTP would benefit from better community involvement in the development process. In 2009 the Erlang/OTP team started to open up their development process by publishing the main development branch of Erlang/OTP on Github [erlangongithub]. Now developers are able to contribute changes to the development version by sending standardized patches to the patches mailing list. Those patches will then be reviewed and potentially added to the development branch. Thus the open source community is now more involved in the development, which has led to an increased interest in Erlang/OTP, judging from the number of patches submitted to the mailing list. 2.4.4 Summary Erlang is a mature programming language which is often used for programming highly concurrent applications. Together with its standard library OTP it allows developers to create scalable and fault-tolerant software with less effort than would be necessary with other programming languages such as C. Erlang is also a functional programming language which uses concepts such as pattern matching, no shared data and single-assignment variables to allow programmers to express difficult problems more easily. The language is Open-Source Software and available from its main source code repository on Github [erlangongithub]. 24 Chapter 2 Background
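As a small illustration of Distributed Erlang as described in section 2.4.1, the following sketch spawns a process on a remote node and receives a message back from it. The node name and module are examples only, and both nodes are assumed to have been started with the same cookie.

-module(dist_example).
-export([remote_hello/1]).

%% Node is the name of a remote node, e.g. 'swarm@otherhost'.
remote_hello(Node) ->
    pong = net_adm:ping(Node),                 % connect the two nodes
    Self = self(),
    %% spawn a process on the remote node which reports back
    spawn(Node, fun() -> Self ! {hello_from, node()} end),
    receive
        {hello_from, RemoteNode} -> RemoteNode
    after 5000 ->
        timeout
    end.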
2.5 SUMMARY Continuous Integration is a software development methodology which emphasizes sharing of information within a project. The concepts behind CI, and continuous build systems which actually implement these concepts, were described in section 2.1. Section 2.2 presented the history and modern tools of Source Code Management, which became increasingly important as software projects became more ambitious and complex. It focused on Git as it is used within the system which is developed in this thesis. Subsequently the importance of project hosting platforms for collaborative software development was explained in section 2.3, giving Github as an example of such platforms. Lastly section 2.4 gives a short introduction to the programming language Erlang, which is used for the prototypical implementation of the system developed in this thesis. Its strengths in concurrent programming are especially useful given the concurrent nature of this system. 2.5 Summary 25
3 SWARM A DISTRIBUTED BUILD SYSTEM Continuous build systems have been gaining popularity ever since Continuous Integration was adopted by more and more software development teams. As a result those systems have matured to a point where there are many available systems which provide very similar capabilities. The system which is proposed in this chapter takes a different approach in regard to some concepts, as indicated in section 1.1. Section 3.1 creates the foundation for this proposal by specifying the high-level requirements for the system. Three important system principles are defined in section 3.2 before the system design is presented in elaborate detail in section 3.3. Subsequently the capabilities for integration with third-party systems are outlined in section 3.4 before some core workflows of the system are explained in section 3.5. Lastly this chapter is concluded by a summary of the proposed system in section 3.6. 3.1 REQUIREMENTS ANALYSIS This section presents the requirements for Swarm. Section 3.1.1 outlines the requirements which are defined by existing continuous build systems and the concept of Continuous Integration. Furthermore the popularity of Distributed Version Control Systems requires some novel functionality as well, which is explained in section 3.1.2. In addition section 3.1.3 describes ideas from general systems integration which relate to continuous build systems. To conclude, all requirements are summarized again in section 3.1.4. 3.1.1 Continuous Integration Continuous Integration as a development methodology already defines a set of requirements which continuous build systems need to adhere to. These can also be identified by looking at some core features of existing continuous build systems. Project Repository Monitoring As part of a project which is set up within the continuous build system, the respective source code repository should be monitored automatically. This allows the continuous build system to notice when the latest version within the repository changes. 27
Build Latest Version Furthermore the continuous build system should build and test the latest version of the source code once it has changed. The delay between the time when the change was published and the start of the build should be as small as possible. Make Results Publicly Available Continuous Integration stresses the fact that any information about the state of the source code should be shared within a team. Thus the results of builds of a project need to be presented in a publicly readable form once available. That means even results from ongoing builds should be presented. Build in Production Environment The motivation of this thesis presented in section 1.1 states that builds need to be run on many systems to improve software quality rather than sticking to one particular environment. This requirement especially holds true for OSS projects. Therefore a continuous build system needs to be able to build the source code on as many platforms as possible, which are distinct from the one it is running on. This requirement also marks a conceptual change from existing continuous build systems, which focus on using the platform they are running on as the reference environment. 3.1.2 Distributed Version Control Systems The paradigm change within Source Code Management systems has allowed development teams to customize their development workflow towards their needs. In addition, developers are presented with many methods to control their source code, such as branching, which are now effective. Continuous build systems need to adapt to this change to be able to keep up with the changing pace of development. Build Topic Branches Globally shared repositories are not the first destination of changes within custom development workflows anymore. Teams might decide to only submit verified changes to their central repository, while changes which still need to be verified are kept in the developers' repositories, some other shared repository or within a code review system. Either way, part of the verification should be the building of changes as soon as developers have finished working on them. A continuous build system should build such changes whether they live in a repository as topic branches, are kept as patches in a code review system or are sent as an email to a mailing list. The same benefits of building the latest version of some source code apply to building potential changes to the latest version, which is the next step of Continuous Integration. This requirement marks another conceptual change from existing continuous build systems, since these only take the latest version in a repository into account when looking for changes. 3.1.3 System Integration A continuous build system is usually part of a larger combination of systems which support the development process of a team. Within this process each system has its dedicated purposes and brings distinct advantages. Just like other systems, a continuous build system needs to be properly integrated into such an environment to fully exploit the advantages of Continuous Integration. 28 Chapter 3 Swarm A Distributed Build System
Share Results An important part of the integration is that the results of builds for any change are shared with other systems within the build environment. This allows the setup of sophisticated processes which propagate the data to the correct communication channels. Thus a continuous build system should provide the foundation for sharing results with other communication channels. Share Results with other System Instances OSS projects are usually developed by many participants, which can be both individuals and corporations, as observed in the motivation of this thesis in section 1.1. Thus the development might not be centralized and based on a single defined process. Instead the different parties might have separate processes and development environments, but are still working on the same project. In such cases receiving information about changes which have been made within another environment can be as important as changes from within one's own environment. Thus a continuous build system should be able to share both changes and their results with other instances of the system. This requirement defines another conceptual change from existing continuous build systems, because these have always been regarded as stand-alone systems. This sharing of information would truly federate information between development environments. 3.1.4 Summary In this section the following set of core functionalities has been defined which should be provided by a next-generation continuous build system.

- Project Repository Monitoring
- Build Latest Version
- Make Results Publicly Available
- Build in Production Environment
- Build Topic Branches
- Share Results
- Share Results with other System Instances

They are influenced by the concepts behind Continuous Integration, Distributed Version Control Systems and system integration. Each of these requirements will be further discussed as part of the system design in section 3.3. 3.2 SYSTEM PRINCIPLES This section explains the principles which the proposed system follows in addition to implementing the requirements gathered in section 3.1. These principles allow the system design, which is covered in detail in section 3.3, to be very precise without focusing on extensibility where it isn't needed. 3.2 System Principles 29
The following principles will be presented in detail in the following sections.

- Single project support
- Deferred authentication
- Git-only SCM support

A summary of the applied system principles is given in section 3.2.4. 3.2.1 Single Project Support Popular continuous build systems often provide support for multiple projects at a time. This feature is required due to the nature of the usage of such systems and the groups of users. The monitored software tends to be project-related, thus running for a specified amount of time during which the feedback from Continuous Integration is most important. After the project has finished the software might still be monitored, but since no changes will be added the continuous build system won't add any benefit thereafter. In the case of such time-delimited projects the administrative overhead should be kept to a minimum. Having one instance of a continuous build system being able to monitor many projects eliminates the need to set up an instance per project. This saves the system administrators time as well as resources, since no additional hardware is needed. Swarm is meant to be used by projects with different characteristics. Such projects should be ongoing rather than time-delimited. Many OSS projects don't have a pre-defined set of requirements which defines the scope of a project. Instead they are work-in-progress where new ideas and features are added continuously. Furthermore the targeted projects are generic platforms rather than specific instances of a software. Whereas in client projects the deployment environment is well known and teams can work towards that environment, OSS projects can't assume in all cases what the environment into which the software is deployed looks like. Thus OSS projects need to support as many combinations of hardware, operating system and software as possible. This ensures that the software can be spread among as many user groups as possible. Based on these assumptions Swarm supports only one project per system instance. An administrator needs to set up an instance of Swarm per project which should be monitored. This design decision allows for a simple administration of an instance. Nevertheless, the underlying architecture as explained in section 3.3.1 is not completely tied to this concept. If necessary it can be extended to support the monitoring of multiple projects, which would also require the adaptation of other components such as the Operation, Administration and Maintenance (OAM) subsystem presented in section 3.3.6. 30 Chapter 3 Swarm A Distributed Build System
Figure 3.1: A web server providing authentication through an LDAP service. 3.2.2 Deferred Authentication Swarm is a continuous build system which acts autonomously after an administrator has set up the project correctly. Thus it doesn't require complex user management or flexible authentication support, because only a few changes need to be made thereafter. To keep the system simple and focused on its core functionality it won't attempt to provide any sort of authentication. Instead other means of security should be used to provide the right users access to the configuration frontend as well as giving all users access to the main web frontend. In this respect Swarm defers the authentication support to external systems which already provide a mature and flexible security model. This limitation, which is considered to be a feature, will also allow Swarm to be deployed in internal networks with custom security measures already in place. Since Swarm doesn't enforce any security model, the employed security system simply needs to be made aware of a Swarm instance. All data available to the security system can be used as in any other case. A simple scenario for deferred authentication is the usage of web servers and the Lightweight Directory Access Protocol (LDAP) to limit the access of users to certain websites, as illustrated in figure 3.1. Since the Swarm configuration frontend is available via a unique Uniform Resource Locator (URL), the web server can be configured to provide only system administrators access to that particular URL and allow all users to view all other URLs served by Swarm. This kind of configuration is a routine operation and often used to secure certain URLs. 3.2.3 Git SCM Support As described in section 2.2.3, various OSS Distributed Version Control Systems are available and often provide the same core set of functionality. To emphasize simplicity Swarm will only 3.2 System Principles 31
Figure 3.2: SCM support realized through an abstraction layer over specific systems. support the use of Git for monitored projects. This limitation, once again a feature, allows the system to leverage Git as much as possible. Although only Git will be supported, the system will abstract from the underlying SCM as shown in figure 3.2. This abstraction will provide as much extensibility as necessary to be able to support other SCM systems in the future. 3.2.4 Summary The principles presented in this section do limit the system's capabilities, but they also allow the system design to focus on simplicity and core functionality. Deferred authentication delegates any authentication to third-party systems, while Git SCM support and single project support help to focus the design on other areas of interest. Nevertheless, these limitations can be lifted in the future if the system is supposed to provide the skipped functionality. 3.3 SYSTEM DESIGN The system design of Swarm is covered in broad detail in the following sections. Section 3.3.1 introduces the overall architecture, which is based upon easily exchangeable components. Further, each component is presented individually, including its internal design as well as the Application Programming Interface (API) it offers to other components and its prototypical implementation. Section 3.3.2 covers the platform management subsystem, which manages the usage of external servers within Swarm. Subsequently the VCS interface, which manages any source code repository access, is explained in section 3.3.3. All system data is stored, and access to it managed, by the data storage subsystem detailed in section 3.3.4. Further 32 Chapter 3 Swarm A Distributed Build System
Figure 3.3: System architecture showing the component communication channels. section 3.3.5 focuses on the job processor, which is the most active component in the system because of its continuous management of running tasks. The user front-end and its foundation provided by the OAM subsystem are described in section 3.3.6, followed by a description of the standard system configuration in section 3.3.7. The system design is finally summarized again in section 3.3.8. 3.3.1 Architecture The goal of Swarm's architecture is to use a component-based design in order to be able to use external components for generic tasks. Furthermore such a design allows the individual parts to be self-contained, with few well-defined dependencies which need to be provided by other components. Figure 3.3 depicts the various components as well as the external systems which Swarm depends on. The arrows indicate directed communication channels. As an example, the OAM component, which is presented in detail in section 3.3.6, only calls other components actively and is never called by them. Thus all other components act independently of the OAM component, which could very well be removed from the system if the functionality it provides is not needed anymore. Each component is individually described in the following sections. The few external dependencies of this architecture are access to the filesystem, to store data and source code repositories, as well as external servers which can be used to run builds on. Other than these dependencies Swarm is self-contained, which adds to the flexibility of the system. 3.3.2 Subsystem: Platform Management The need for multi-platform support has been described in detail in section 3.1. Such support is not a novel idea. Other continuous build systems, such as [hudson], allow users to add multiple platforms if the required plugins are installed. This approach works sufficiently but is far 3.3 System Design 33
from ideal, since the computational model of the core build system is targeted towards the use of a single platform. The objective of this subsystem is defined initially, followed by important terminology definitions. Then the API functions which are provided by this component are listed. Further, the component's design is detailed, with a subsequent description of its implementation. Lastly the component's role is summarized, which concludes this section. Objective The platform management subsystem needs to provide access to multiple platforms for other system components. Furthermore this access should be available through an API which abstracts from the subsystem's internal data model. State information about the platforms should be kept internally. Definition Platform A platform is a combination of specific hardware, the operating system and the installed software. Often multi-platform support refers to supporting multiple processor architectures, which is not the case with this extended definition. The processor architecture is as much a part of a platform as the respective version of an operating system or user libraries. This definition is required because software projects often rely on third-party components, and even on specific versions of these. Figure 3.4 shows how two processor architectures, two versions of the same operating system and two versions of a specific library can be combined into eight platforms. Figure 3.4: Definition of platforms based on hardware, operating system and software. 34 Chapter 3 Swarm A Distributed Build System
Definition Platform Provider Platforms need to be available through some means of communication. A system which hosts platforms is called a platform provider. Providers are used to get access to a platform via standard communication mechanisms such as Secure Shell (SSH). The Host Machine The machine on which a continuous build system is running is referred to as its host. Most continuous build systems use the host machine to execute builds on it, thus it often closely resembles the expected production environment of the software which is developed. Dedicated Servers The simplest form of platform providers are dedicated servers. Such servers provide one particular platform, a combination of hardware and software. Both physical and virtual machines belong to this category, because the means of communication with the servers doesn't change. Since the setup of dedicated servers is a common administrative task, these platform providers are mostly used by continuous build systems to improve multi-platform support. Cloud Computing Platforms An abstraction over virtual servers is provided by cloud computing platforms. Servers can be created and released within minutes, which makes such clouds ideal for rather dynamic computation needs. The platforms which are available on a cloud can be adapted within certain parameters. Often the chosen operating system needs to support certain virtualization functionality, while any additional software can be arbitrarily chosen. This platform versatility makes cloud computing platforms very powerful platform providers. Despite these benefits other continuous build systems don't make use of them yet. API Other components should be allowed to get information about available platforms as well as access to such platforms through a simple API. Thus only the few functions listed in table 3.1 need to be exposed externally.

list: takes no parameters. Returns a list of the definitions of all available platforms.
get: takes Id, the id of a platform definition. Returns an access token if the platform could be provided, or false if the platform couldn't be provided.
release: takes Token, the access token for a provided platform. Returns true; the token will be invalidated and the corresponding platform will be made available again.

Table 3.1: API functions of the platform management subsystem. 3.3 System Design 35
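The following sketch illustrates what the API from table 3.1 could look like in Erlang. It is not the thesis implementation; the module name, the state layout and the use of a gen_server process are assumptions made for illustration only, and get/1 returns {ok, Token} rather than a bare token.

-module(platform_mgr).
-behaviour(gen_server).
%% get/1 intentionally shadows the auto-imported BIF of the same name.
-compile({no_auto_import, [get/1]}).

%% API functions from table 3.1
-export([start_link/1, list/0, get/1, release/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Platforms is a list of {Id, Definition} tuples.
start_link(Platforms) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Platforms, []).

list()          -> gen_server:call(?MODULE, list).
get(PlatformId) -> gen_server:call(?MODULE, {get, PlatformId}).
release(Token)  -> gen_server:call(?MODULE, {release, Token}).

%% State: {AllPlatforms, InUse} where InUse maps access tokens to the
%% ids of platforms that are currently blocked.
init(Platforms) ->
    {ok, {Platforms, []}}.

handle_call(list, _From, {Platforms, _} = State) ->
    {reply, Platforms, State};
handle_call({get, Id}, _From, {Platforms, InUse} = State) ->
    case lists:keymember(Id, 1, Platforms) andalso
         not lists:keymember(Id, 2, InUse) of
        false ->
            {reply, false, State};              % unknown or already in use
        true ->
            Token = make_ref(),                 % one token per platform
            {reply, {ok, Token}, {Platforms, [{Token, Id} | InUse]}}
    end;
handle_call({release, Token}, _From, {Platforms, InUse}) ->
    {reply, true, {Platforms, lists:keydelete(Token, 1, InUse)}}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.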
Design While the API described in section 3.3.2 is very simple, the underlying mechanism to manage the various platforms requires a lot of flexibility in order to support a range of platform providers. The design is based on two user contracts. These contracts are discussed next before the resulting component design is presented in detail.

Single User Policy The workflow of the component is based upon a simple yet important contract: a platform may only be used by one user at a time. This policy simplifies the internal workflow since the component only needs to keep track of whether a platform is in use or not. Furthermore the user of a platform can be sure that no other users on a provided platform will interfere with potentially sensitive tasks.

User Reliability Because platforms are blocked while used by a user, it needs to be ensured that they are released eventually. Therefore two assumptions need to be discussed before settling on a policy.

Unreliable users User programs which have blocked platforms can crash or experience data corruption and thus forget about the occupied platforms. In such cases the platform management would need to employ token timeouts to be able to release platforms dedicated to faulty clients. But such timeouts also add complexity to the component's API as well as to the clients' interaction with the component. A client would need to report back to the component periodically to signal that it is still running and using a platform.

Reliable users In contrast, user programs can be regarded as reliable. It is assumed that they employ their own measures to ensure that no data loss occurs, and that in case of failures the used platforms are either released or their usage is continued once the program has recovered.

Both presented assumptions add complexity to user programs, either in the form of timeout prevention or failure recovery. But while the assumption that user programs are unreliable requires the platform management to keep track of token timeouts internally, reliable user programs allow for a simpler component design. No additional client communication is enforced and the internal state is simplified. Based on the discussed assumptions the following policy shapes the platform management's design further: user programs are considered to be reliable and must ensure that platforms are released eventually.
Figure 3.5: Architecture of the platform management subsystem.

Component Blocks The platform management component consists of the following set of building blocks, which make up the component design in combination with the previously presented user contracts.

API
Management Interface
Data and State Storage
Platform Provider Abstraction
Platform Provider Adapters

The connections between those building blocks are roughly depicted in figure 3.5.

API The API, as mentioned in section 3.3.2, provides only a small set of functions. It is stateless and very simple. All state information is kept within the data and state storage, while the API masks access to this information to the outside.

Management Interface While the API provides functions to access platforms, the management interface allows user programs to add, update and remove platforms. The interface is stateless and only operates on the information kept in the data and state storage.
Data and State Storage All information which is available to user programs as well as all component state information needs to be stored persistently. In case of system failures or restarts, the platform information should be durable. Information about which platforms are in use needs to be consistent at all times. The data and state storage provides the required criteria for all internal data storage. Access is available for component-internal use only. This allows the overall component to be used as a black box without outside dependencies.

Platform Provider Abstraction When user programs request access to a platform, the component will try to provide a platform by using the respective platform provider. Although the set of platform providers is small initially, as discussed in section 3.3.2, it might expand in the future. Thus the platform provider support should be flexible enough to allow easy extension. Therefore the platform provider abstraction is an internal interface to specific platform provider adapters. Based on the type of platform, the abstraction will call the corresponding adapter, which will then perform the required work to enable access to a platform.

Platform Provider Adapters Support for a provider is given through platform provider adapters. Such an adapter encapsulates the provider-specific knowledge for managing platforms. All specific information is kept inside the adapter, which makes the process of adding a new adapter to the component as easy as adding the adapter code to the system. No further configuration or adaptation of other blocks of the component is necessary.

Implementation Since the platform management doesn't depend on other components, as dictated by the component design in section 3.3.2, its implementation is relatively simple compared to other system components. Figure 3.6 shows the supervision tree of the component, which consists of the main application supervisor and a single worker process. The worker process is started once the component is initiated and handles all communication with the platform management component. Although the communication doesn't require a dedicated process, this approach has been chosen to simplify access patterns to platforms. In case of multiple requests for a single platform, only the first user program's request will be served. Since requests are handled sequentially, the following requests will not be provided with access to the requested platform. Thus requests are served in First-In-First-Out (FIFO) order. One can be assured that no race condition will ever occur because of two user programs trying to get the same platform simultaneously.

API and Management Interface Both interfaces are provided by a single interface module. This module implements the functions presented in the component design in section 3.3.2. All function calls are translated into messages and sent in a synchronous call to the worker process.

Worker Process Exclusive access to the component's internal data and state is given to the worker process. Both are kept persistently in two Disk-based Erlang Term Storage (DETS) tables, one for individual platform information and one for any additional platform provider data. The tables are stored on the local filesystem. Except for actions triggered by API calls, the process doesn't do any further work on the stored data.
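As an illustration of the adapter contract described above, a dedicated-server adapter could look roughly like the following sketch. The module name, function names and data layout are assumptions made for this example and are not prescribed by the design.

% Hypothetical adapter for dedicated servers. The platform provider
% abstraction would dispatch to this module based on the provider type.
-module(pm_dedicated_server).
-export([acquire/1, release/1]).

% A dedicated server is permanently available; acquiring it simply returns
% the stored connection details (e.g. an SSH endpoint) as the access data.
acquire(ProviderData) ->
    Host = proplists:get_value(host, ProviderData),
    User = proplists:get_value(user, ProviderData),
    {ok, [{host, Host}, {user, User}]}.

% Nothing has to be torn down for a dedicated server.
release(_AccessData) ->
    ok.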
Figure 3.6: Supervision tree of the platform management subsystem. (The PM Supervisor manages a single PM Worker process.)

Platform Provider Adapters These adapters are implemented as separate modules which provide the required interface functions. Thus the worker process only looks at a platform's provider type and calls the corresponding adapter's interface function. Initially two adapters are available, an adapter for dedicated servers and an adapter for the Amazon EC2 cloud. In order to be able to use the Amazon EC2 cloud adapter, the EC2 API access parameters need to be provided to the component, so that it will be able to use the API successfully.

Amazon EC2 Support The required support for the Amazon EC2 adapter, as well as for the similar Eucalyptus Cloud project, is implemented as a separate interface component. The access logic could have been implemented as a module within the platform management component as well. But because other projects can benefit from such a library, it has been implemented as its own component which can be embedded more easily. The component's API resembles the official Amazon EC2 API so that users will be able to use it easily without the need to study new interface functions.

Summary The platform management subsystem provides a simple API to manage platforms. The implementation provides access to individual servers as well as to computing clouds such as Amazon EC2. This allows user programs to use a vast number of platform combinations which wouldn't be possible with dedicated servers alone.

3.3.3 Subsystem: VCS Interface

Many Version Control Systems are available and used by OSS projects. Because continuous build systems need to be able to access source code repositories, support for such access
needs to be provided by this subsystem. The unified API of the component is outlined, followed by the subsystem design. The actual implementation is detailed thereafter. Finally the component's advantages are summarized to conclude this section.

Figure 3.7: Architecture of the version control interface subsystem. (The API sits on top of a Version Control System abstraction, which dispatches to VCS adapters, each wrapping one Version Control System.)

Objective The VCS interface component provides access to various Version Control Systems through a single public API. Further it should ensure that source code repositories are not corrupted by parallel access.

API A user program should be presented with a unified API for the various Version Control Systems which the component supports; the functions of this API are defined in table 3.2. Although most VCSs share common functions, Centralized VCSs and Distributed VCSs in particular, as discussed in section 2.2, are built upon different paradigms which don't easily lend themselves to a common set of API functions. Thus the component's public API is a combination of both, with some functions not being available if they are specific to another VCS.

Design Since the version control interface component is independent from other system components, its internal architecture is simplistic as well. Figure 3.7 illustrates the component's design.
API The version control interface is not meant to hold copies of repository state information, since the underlying VCS takes care of such repository management already. Thus the component's API is stateless, implying that user programs always need to refer to the source code repository on which a certain action should be executed.

VCS Integration All Version Control Systems which are of interest for Swarm, as discussed in section 2.2, are implemented in programming languages other than Erlang. Thus one can't simply use them as libraries as is done with other components. Instead the command-line front-ends of these VCSs will be used for communication between Swarm and the VCS. This approach lacks the flexibility of a native interface, but it also builds upon the strong foundations of those established systems.

VCS Abstraction Although Swarm will only provide support for the VCS Git initially, its internal architecture should provide the flexibility to extend VCS support without changing the architecture. Therefore the component abstracts from the actual VCS adapter which is used and calls an abstraction layer instead of accessing adapters directly. Adapters are expected to implement those of the abstraction's interface functions which are supported by the VCS.

VCS Adapter Adapters are modules which provide support for particular Version Control Systems. They are self-contained and modeled after the VCS abstraction API. State information about repositories should not be exposed to the outside. Support for a new VCS can be added by adding the respective VCS adapter to the version control interface component.

Implementation Just like the platform management component, the version control interface is implemented as a simple single-worker-process component. The process supervision tree is depicted in figure 3.8, showing that the component supervisor manages only one worker process at all times. Once started, the worker process acts as a queue to ensure that API calls which address the same repository don't access it simultaneously. Although the use of a single worker process might slow down operations which address different repositories, this situation won't occur often within Swarm, since it manages only a single project, which is also associated with a single repository. Thus most API calls to the version control interface will address the same repository.

API The API is provided by a separate module which transforms all function calls into synchronous calls to the worker process. The worker will call the respective function of the VCS abstraction layer. Which VCS adapter will be used is defined by an application configuration parameter. VCS adapters are modules which implement those abstraction API functions which are supported by the underlying Version Control System.

Git Adapter As defined in section 3.2, Swarm only supports Git from the beginning. The respective adapter implements all API functions so that a repository can be fully controlled through the version control interface subsystem.
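As an example of how an adapter can wrap a command-line front-end, the lastcommit function from table 3.2 could be implemented roughly as follows. The module name and the exact return handling are assumptions for this sketch, not Swarm's actual code.

% Hypothetical excerpt of a Git adapter that shells out to the git
% command-line client instead of using a native library binding.
-module(vcs_git).
-export([lastcommit/2]).

% Return the hash of the newest commit on Branch in the repository at Path.
lastcommit(Path, Branch) ->
    Cmd = io_lib:format("git --git-dir=~s/.git rev-parse ~s", [Path, Branch]),
    Output = string:strip(os:cmd(lists:flatten(Cmd)), right, $\n),
    case Output of
        "fatal" ++ _ -> error;          % git reported a failure
        Hash         -> Hash            % the commit hash as a string
    end.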
Function: initialize
  Parameters: Path (location of the new repository)
  Return Values: ok if the repository was created successfully; error when failed

Function: clone
  Parameters: Source (path to the original repository), Destination (path to the new repository)
  Return Values: ok if the cloning was successful; error when failed

Function: pull
  Parameters: Path (path to the repository), Branch (the branch which should be pulled)
  Return Values: ok if the pull was successful; error when failed

Function: push
  Parameters: Source (path to the repository to be pushed from), Destination (path to the repository to be pushed to)
  Return Values: ok if the changes were pushed successfully; error when failed

Function: checkout
  Parameters: Path (path to the repository), Target (object which should be checked out)
  Return Values: ok if checkout was successful; error when failed

Function: get
  Parameters: Parameter (name of the attribute which should be read), Path (path to the repository), Commit (unique reference of a commit)
  Return Values: Value (the parameter's value); error when failed

Function: lastcommit
  Parameters: Path (path to the repository), Branch (the branch from which the commit should be read)
  Return Values: Commit (hash of the last commit); error when failed

Table 3.2: API functions of the VCS interface subsystem.
Figure 3.8: Supervision tree of the VCS interface subsystem. (The VCSI Supervisor manages a single VCSI Worker process.)

Summary The version control interface acts as an abstraction over source code repositories for other system components. It is extensible, but initially only provides support for Git repositories, as explained in section 3.3.3. The subsystem only processes user calls, without performing any further active operations in between.

3.3.4 Subsystem: Data Storage

The database is the heart of many systems because it houses all information which is required or has been gathered. Therefore the role of the data storage subsystem is very important, because it manages both the storage engine and access to it. The concise data model it provides and the tight integration with the storage engine are described in this section.

Objective The data storage subsystem is responsible for storing all system data persistently to disk. Furthermore it defines a system-wide data model and provides access functions to those models. It needs to hide details of the chosen storage engine from other system components.

Data Model The data model for Swarm is defined within the data storage subsystem and used throughout the rest of the system. It only covers shared data. As described in sections 3.3.2 and 3.3.3
individual components might use additional types of data internally, which are not incorporated by the data model presented here. Because the system principles, as defined in section 3.2, put non-core functionality out of the scope of the system, the data model can be very concise. All data entities and their associated attributes are listed in table 3.3.

Entity: project
  Primary Key(s): id
  Attributes: name, branch, vcs_url, build_steps, test_steps, last_commit

Entity: patch
  Primary Key(s): id
  Attributes: author, description, name, merged

Entity: patch_state
  Primary Key(s): patch_id & platform
  Attributes: status, logs

Entity: job
  Primary Key(s): id
  Secondary Key(s): patch_id
  Attributes: platforms, status

Entity: package
  Primary Key(s): id
  Secondary Key(s): package_spec_id
  Attributes: url

Entity: package_spec
  Primary Key(s): id
  Attributes: name, description, interval, build_steps

Table 3.3: System's data model entities including attributes.

Project The central entity in Swarm is the project which is managed. In this sense a project represents a source code repository which Swarm uses as a reference for changes. The project's unique id is used to identify it within the system, whereas its name is used to refer to the project within a user context. A project is always tied to a repository URL which is stored as the vcs_url. In case the repository is using a Distributed VCS, the reference branch is represented by the project's branch attribute. Swarm distinguishes between two types of commands which should be run to verify the correctness of a source code change: build_steps and test_steps. Both are lists of individual steps, but they differ in how Swarm treats their execution. These semantics will be explained later. Finally a project entity has a reference to the last known commit in last_commit to ensure that Swarm uses the latest state of the reference repository.

Patch Changes to the monitored source code are represented as patches within Swarm. A patch has a unique id which is usually equivalent to the unique identifier of the patch within the repository. Furthermore the author and description of a patch are stored in separate attributes for convenient access from a user context. A patch is also given a name which is by default the shortened description or defined by a user. Lastly Swarm tracks whether a patch is already part of the monitored repository through the merged attribute.

Patch State Because Swarm supports building source code on multiple platforms, the state of such builds needs to be kept easily accessible. A patch state represents a build of a particular patch on a particular platform. Therefore the patch state has a composite primary key based on
the patch_id and the platform being used. Furthermore the current state of the build is represented by the status. All output which is directed to stdio while the commands of the build are running is stored in the logs attribute.

Job Swarm tries to run as many builds concurrently as possible. As a logical wrapper for any build which needs to be run, the job entity holds all necessary information regarding a certain build. The unique id is random, while the secondary key patch_id references the patch which is the subject of a job. Furthermore the list platforms holds identifiers for all platforms a job needs to run a build on. Lastly status indicates whether the job is still running. Only when all builds have finished, either succeeded or failed, the status will be finished.

Package Spec As blueprints for packages Swarm uses package specs. A package spec is identified by a unique id. For better understanding of a spec it has a name and description as well. The trigger dictates when a package is to be assembled. The commands which need to be executed in order to build a package are stored in a list as build_steps.

Package The result of building a package spec is a package itself. It is a single file which is made available by the system through a url, which also contains the package's unique id. For later tracking the package holds a reference to the package spec it is based on in package_spec_id.

Design The data storage subsystem is composed of two abstraction layers which are added on top of the general storage engine which it uses. Figure 3.9 depicts the layered design in which the data abstraction layer acts as an API to other components. Access to the storage engine is managed by the transactional data access layer.

Storage Engine The core of the data storage subsystem is the storage engine which is used to store all data. The objectives for this component already impose an important requirement on the storage engine: all data needs to be stored persistently on disk. The engine itself is a separate component which can be either an external storage engine, such as the popular open-source RDBMS MySQL [mysql], or an embedded engine providing data storage functionality; the choice highly depends on the implementation of the data storage component, which is presented in section 3.3.4. Moreover the storage engine needs to preserve data consistency in the event of failures.

Transactional Data Access Since Swarm is composed of many processes which are running in parallel, concurrent data access is a common pattern for this kind of application. Thus access to the same data needs to be coordinated such that each access works on fresh data. Such transaction support is provided by a layer on top of the core storage engine. The engine itself might provide such a layer already; otherwise it needs to be implemented to support the aforementioned access pattern accordingly.
Figure 3.9: Architecture of the data storage subsystem. (The data access abstraction with its entity models sits on top of the transactional data access layer, which in turn wraps the storage engine.)

Data Access Abstraction The outside-facing layer of the data storage subsystem provides access functions to easily read and write data entities. It is composed of models for all data entities described in section 3.3.4. A model encapsulates the knowledge for one particular data entity. It provides convenience functions to create new instances and work with existing ones. Furthermore the access to the storage engine is masked by these models, which lowers the complexity of data access for other components, since no knowledge of the storage engine in use is required. While this doesn't prevent other components from accessing the storage engine directly, it is strongly preferred to use the data access abstraction layer for all data access. Both the maintainability and the extensibility of the data storage subsystem benefit from this streamlined approach.

Implementation Because most of the work of the data storage subsystem is performed by the actual storage engine, the implementation of the remaining two abstraction layers is relatively simple. Thus this section will compare storage engines with regard to the requirements of the subsystem and describe the engine which is used for the prototype. Then the implementation of both abstraction layers, based on the chosen storage engine, is presented.

Storage Engine Comparison The storage engine which should be used can be picked from a long list of candidates because many are available as Open-Source Software. Thus this comparison, see table 3.4 for an overview, will only consider a small set which represents the different kinds of engines available. Both the Erlang Term Storage (ETS) and DETS are key-value stores which are part of Erlang/OTP. The main difference between both engines is that ETS stores data only in memory while DETS is entirely disk-based. Both provide the same functionality in most other aspects. Every data access is atomic, but transactions are not supported. Both can be used very easily from within
Erlang since they are part of Erlang/OTP. Nevertheless both don't satisfy the component's requirements.

Name          Transaction Support   Data Persistency      Interface
ETS           not supported         in-memory             native (part of OTP)
DETS          not supported         on-disk               native (part of OTP)
Mnesia        full support          in-memory & on-disk   native (part of OTP)
MySQL         full support          on-disk               Erlang ODBC driver
Tokyo Tyrant  full support          in-memory & on-disk   Erlang driver

Table 3.4: Comparison of storage engines for the data storage subsystem.

Mnesia is a distributed Database Management System (DBMS) entirely written in Erlang. It combines the data persistency features of ETS and DETS to provide both in-memory and on-disk storage of data. Furthermore it offers full support for transactions, including nested transactions. Since it is implemented in Erlang it can be easily integrated into Erlang applications.

Another storage engine which could be used is MySQL. It differs from the other engines in that it runs as a separate server which is accessed through a TCP/IP-based protocol. Erlang/OTP provides a generic Open Database Connectivity (ODBC) application which can be used to connect to MySQL and access databases. Among its many features are extensive transaction support and persistent on-disk storage for all data.

Lastly, Tokyo Tyrant is a key-value store which runs as a separate server as well. It can be accessed using an OSS Erlang client [tora], which implements the required TCP/IP protocol. It provides rich support for transactions and the ability to store data either in-memory or on-disk, which differentiates it from MySQL's persistency capabilities.

Out of this list of storage engines ETS can't be used because it doesn't support storing data on disk, which is a persistency requirement. Furthermore DETS can't be used because of its fairly primitive support for failure scenarios. In case of such failures the database files might be corrupted to a point which requires manual inspection and manipulation before those files can be used again. This is not desirable for an autonomous system, therefore DETS can be dismissed as a candidate. This leaves Mnesia, MySQL and Tokyo Tyrant as potential storage engines. All three fulfill the necessary requirements and provide many useful additional features. Nevertheless they differ in that MySQL and Tokyo Tyrant are separate servers which can only be accessed through TCP/IP. This not only adds latency to every data access but also increases the complexity of a system deployment, because the DBMS needs to be set up correctly and administered before Swarm can be run. In contrast, Mnesia can be integrated within Swarm, enabling a fully transparent setup of the DBMS as part of the actual system. This difference improves the overall maintainability of the system, which is an increasingly important aspect of modern software. Thus Mnesia will be used for the further implementation of the data storage subsystem.

Mnesia As a distributed DBMS Mnesia provides rich support for failure scenarios. Although Mnesia won't be used in distributed mode, the data storage subsystem still benefits from its rich feature set and robustness. Setting up a new Mnesia database as well as using an existing database is a short and easy task which should be done upon system startup. Thus the subsystem ensures that the database exists and starts up the Mnesia application which provides access to the database. Once started, the application will ensure consistency of the data and perform necessary repairs in case the system crashed in unexpected ways.
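A minimal sketch of this startup step, assuming the schema is kept on the local node only; the function name is illustrative and not part of Swarm's actual code.

% Create the on-disk schema on the very first start, then start Mnesia.
% Table definitions are omitted from this sketch.
ensure_database() ->
    case mnesia:create_schema([node()]) of
        ok ->
            ok;                                        % first start: schema created
        {error, {_Node, {already_exists, _Node2}}} ->
            ok                                         % schema already present on disk
    end,
    ok = mnesia:start().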
The Mnesia application is an example of a more complex Erlang component, as the multi-tier supervision tree shown in figure 3.10 indicates. Two processes are particularly important with regard to the required functionality: the Mnesia Transaction Manager and the Mnesia Locker. The Mnesia Locker can lock a single table row or entire tables and give exclusive access rights to a single process, which is the foundation for Mnesia's transaction support.
Figure 3.10: Supervision tree of the Mnesia application.

The locking can be adjusted by the user program, allowing it to exploit knowledge of data access patterns within the system. The Transaction Manager provides the transaction functionality on top of Mnesia's basic atomic read/write data access. Thus the transactional data access is handled by Mnesia itself. Therefore the models, as part of the data access abstraction, consistently use Mnesia's transaction API.

Data Entity Models Swarm uses a small set of data entities which have been described in section 3.3.4. For each of these entities the data storage component provides a model, which implements data access functions for the given entity. The model itself will access the storage engine to perform any necessary reads or writes, hiding the complexity of any data access function from user programs. Furthermore each entity has its own named data structure, which is an Erlang record. This allows access to entity attributes through their name. The notion of a record is similar to that of an object in object-oriented programming, with the difference that a record is only data and thus doesn't have a lifecycle like an object does. As an example, listing 3.1 shows the record definition of the job data entity in Erlang. For this record, listing 3.2 shows a simple model implementation. The only function, new, creates a new job record, stores it in the database and finally returns the new job record. As intended, the caller won't see any of the interaction between the model and the storage engine, but is provided with a job record which represents a persistently stored data entity.

-record(job, {
    id,
    patch_id,
    platforms,
    status = new
}).

Listing 3.1: Record definition of the job data entity
-module(job).

-export([new/2]).

new(Patch, Platforms) ->
    % get current timestamp
    {H, M, S} = now(),
    % combine timestamp and patch id to a unique job id
    Id = string:join(
        [Patch, integer_to_list(H), integer_to_list(M), integer_to_list(S)], " "),
    % instantiate new job record instance
    Job = #job{id = Id, patch_id = Patch, platforms = Platforms},
    % store new job in database
    datastore:transaction(fun() -> mnesia:write(jobs, Job, write) end),
    % return job record to caller
    Job.

Listing 3.2: Simple model for the job data entity

Other data access functions which would be provided by the job model are the ones required for deleting, updating and reading data entities.

Summary The data storage subsystem is entirely self-contained because it uses the embedded DBMS Mnesia as its storage engine. Therefore it benefits from the robustness of Mnesia itself. The component further masks all access to the data by providing models for all data entities in the system. This provides a clean abstraction, which improves the maintainability of the whole system.

3.3.5 Subsystem: Job Processor

Jobs are the core computational work which is managed by Swarm, thus proper management of them is provided by the job processor subsystem. The distributed design of the component is presented in this section, followed by an explanation of its robust implementation.

Objective Jobs need to be executed in parallel on remote machines, while ensuring that no obsolete work is performed. Furthermore the results of jobs need to be gathered and saved continuously until they've finished. Finished jobs need to be terminated properly, while crashed jobs need to be re-executed eventually.

Definition: Job A job is the unit of work within Swarm. It consists of the source code which is subject to the build process and a set of platforms it is supposed to be built on. Only when the builds on all of these platforms have either finished successfully or terminated because of crashes will the job be marked as finished.
Figure 3.11: Architecture of the job processor subsystem. (The API sits on top of the job coordination layer, which manages job execution workers that in turn run builds on remote machines.)

Design Like other system components, the job processing subsystem follows a layered design which is composed of four tiers, as shown in figure 3.11. The outside-facing API masks the core job coordination layer, which directly instructs the job execution layer. Lastly, jobs are actually run on remote machines.

API Other components need to be able to start jobs and enquire about the status of already running jobs. The API provides these user functions. Calls to them are forwarded to the job coordination layer, which will perform the necessary actions and reply to the original caller. Thus the API is a thin layer on top of the job coordination which simplifies user access further.

Job Coordination The core of the job processor component is the job coordination layer. Its main objective is to instruct the job execution layer to start working on new jobs, which have been submitted through the API and have been found to be correct, or otherwise adapted so that no duplicate work is performed. Furthermore it keeps track of running jobs and ensures that no zombie jobs exist by periodically receiving status information from running jobs. Lastly the job coordination receives output updates from running jobs, which are added to the output archive of the respective patch. Thus it acts as a proxy between the job execution layer and the data storage subsystem.

Job Execution The actual build execution is done within the job execution layer. It consists of workers, one for each job which is processed. The workers are started by the job coordination layer for each new job. They run independently from each other, thus problems experienced during a particular build process don't influence other job executions. A worker manages the execution of individual builds on remote machines. Output which is received from these executions is forwarded to the job coordination layer. Once all builds of a single job have been completed, the respective worker terminates itself.
Figure 3.12: Supervision tree of the job processor subsystem. (The JP Supervisor manages the JP Manager and a JP Worker supervisor; each JP Worker controls several JP Runner processes on remote nodes.)

Remote Machines Access to remote machines is available through the platform management component. A worker acquires access to a remote machine for each platform it needs to build on. It will execute commands on those machines, and release the access rights once the overall job has been finished.

Implementation The implementation of the job processor subsystem is rather complex compared to other system components because it needs to connect to other servers, transfer data and execute code on them. Figure 3.12 depicts the subsystem's final supervision tree, which already shows that some processes are run on remote Erlang nodes.

Remote Nodes Since much of the work of this component needs to be done on other servers, it becomes important how those servers are accessed. Since Erlang provides very good support for running distributed over several servers, the most convenient and robust solution is to run a separate Erlang node on each remote machine which is connected to the main Swarm Erlang node. This approach allows Swarm to execute arbitrary Erlang code on such remote nodes, making it possible to use all features of Erlang/OTP to run builds. This also enhances the robustness of job execution, because one can react to failing processes on such nodes, or even failing nodes, more easily based on the built-in robustness of Erlang. In order to start a remote node and connect to it, the job processing component simply needs password-less SSH access to the remote machine itself. Through this it will be able to start a new Erlang node on the remote machine.
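A minimal sketch of this step, assuming ssh is configured as Erlang's remote shell command; the node name, cookie and build command shown are illustrative only.

% Start an Erlang node on a build platform over password-less SSH and run a
% command there. slave:start/3 boots the remote node via the configured
% remote shell and links it to the local Swarm node.
start_build_node(Host) ->
    {ok, Node} = slave:start(Host, swarm_runner, "-setcookie swarm"),
    % arbitrary Erlang code can now be executed on the remote node
    Output = rpc:call(Node, os, cmd, ["make test"]),
    {Node, Output}.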
Job Processing Runner Running a build on a single platform is performed by a process called the job processing runner. When initialized, it is given the build instructions which should be used in the process. Then it will wait for the source code to be sent to it as an archive. Once that has been received, the process will wait for a signal to start the build execution. All output from the actual build is forwarded to the master Swarm node. Once all build instructions have been performed, the process terminates itself, sending a final exit signal to its master process.

Job Processing Worker The initialization, setup and execution of job processing runner processes for each job is handled by the job processing worker. It is instantiated for each new job by the job processing manager. Once started, the worker process identifies all platforms which are associated with the job and tries to acquire access to each of them. Then a remote Erlang node is started on each platform, followed by the creation of a job processing runner process on each platform. Subsequently the worker will provide each runner process with an archive containing the relevant source code for the build and trigger its execution. Afterwards the worker waits until all runners have finished their tasks and terminated. Finally it will report back to the job processing manager, signaling the completion of the job, and terminate itself.

Job Processing Manager The job coordination is handled by a single management process, allowing simple storage of temporary job-related data within that process. All coordination decisions are made by this process. Once a new job can be started, the job processing supervisor is instructed to start a new job processing worker to handle the job execution. From there on the manager will only ensure that the job is still being executed, but leave all further execution details to the job processing worker.

Summary The job processor subsystem uses the powerful Erlang distribution features to execute builds on remote machines. Many jobs are processed in parallel and coordinated so that no duplicate execution happens. The individual processes of the component are in constant contact and share information about the state of running jobs. All important data is stored persistently using the data storage subsystem.

3.3.6 Subsystem: Operation and Management

This section presents the design of the OAM subsystem. Its objectives are defined initially, followed by the simplistic design of the component, which uses a 3-tier architecture. Subsequently a detailed description of how the design is prototypically implemented is given. Finally a summary of the OAM subsystem design concludes this section.

Objective The subsystem allows users to get information about the system state and the results of finished and ongoing builds through a webpage. Furthermore the system configuration can be changed through a separate administration webpage.
Design The architecture of the OAM subsystem is similar to that of many web applications. Its difference is that it is embedded into the continuous build system to provide the user with a web interface to internal data and functionality, just as many other systems do. It uses a 3-tier architecture which consists of the web server, the web application framework and an internal abstraction layer which is used to interface with other system components. Figure 3.13 outlines this standard architecture.

Figure 3.13: Architecture of the operation and maintenance subsystem. (The dashboard and administration webpages are served by the webserver, which sits on top of the web application framework and the internal abstraction layer.)

Internal Abstraction Layer This base tier of the component's architecture takes the role of a database within a common web application architecture. Because the component is embedded within Swarm, it needs to use the data storage subsystem to read or write data, and other subsystems for any other actions. Thus this layer provides an API for the web application framework to use during the processing of user requests. This allows for a better separation of web request processing and internal component calls.

Web Application Framework Part of any web application is the handling of Hypertext Transfer Protocol (HTTP) requests, which involves the following steps.

Reading the HTTP request
Routing the request to the appropriate handler
Executing the handler
Retrieving results from the handler
Building the HTTP response
Returning the HTTP response
All of these steps are mandatory for each request and very generic. The specific parts for each request are provided by the request handlers which are executed internally. In order to avoid having to implement the generic parts of HTTP request handling, the component uses an off-the-shelf framework which performs all of the above generic tasks. It needs to provide enough flexibility to customize its behavior and allow the assignment of URLs to handlers as well as the definition of custom request handlers.

Web Server The topmost tier of the architecture is the web server, which listens on a Transmission Control Protocol (TCP) socket for incoming requests and forwards proper HTTP requests to the underlying web application framework. Furthermore it keeps connections to the clients alive and sends responses which are returned by the web application framework back to the original client, ensuring that the response is a proper HTTP response. All of these tasks are very generic and should be performed with low resource usage, since the amount of traffic which needs to be handled by the subsystem can be expected to be rather low. The web server doesn't have to provide advanced features of full-blown HTTP servers, such as authentication support.

Dashboard Webpage The main page for information retrieval by users is the dashboard. It should give an overview of all changes which are subject to builds within Swarm. Furthermore the state of each build should be separated by the platform it is running on. Thus one way to visualize such an overview is to use a matrix as shown in figure 3.14. Moreover the user should be able to drill down into each cell, which represents the state of a build on a certain platform, to see further detailed output information for the respective build. This allows the user to get an overview as well as detailed information if desired.

                Platform A    Platform B    Platform C
Latest Version  OK            OK            OK
Change A        OK            OK            FAILED
Change B        OK            FAILED        OK
Change C        OK            FAILED        OK
Change D        OK            in progress   OK
Change E        in progress   in progress   OK

Figure 3.14: Example of a matrix showing the state of each build.
Administration Webpage In order to be able to set up a project in Swarm as well as change project information and various general system settings, the OAM subsystem provides a separate webpage. The following actions should be performed through this administration front-end by the system's administrator.

Change project name
Change project repository
Specify build steps
Specify test steps

As explained in section 3.2.2 the OAM subsystem doesn't provide any authentication for this front-end, therefore the system administrator needs to ensure that proper security measures are in place to prevent users from accessing the administrative functions.

Implementation Since the OAM subsystem uses a very generic web application architecture, it actually uses many third-party components which implement the generic HTTP handling. Thus the integration of those components as well as the web pages, custom request handlers and the internal abstraction layer make up the implementation of the OAM subsystem.

Mochiweb There are a couple of embeddable components available in Erlang which provide HTTP handling capabilities, since this type of functionality is very common. Mochiweb is such a component; it has very limited functionality because it focuses solely on managing HTTP connections. Because of this simplicity it is easily embeddable and uses a tiny amount of resources. When idle, the component only uses one Erlang process, which listens for new incoming connections. For each new connection a new process is dispatched which handles that connection. Once the connection has been closed, the process managing it will terminate as well. Because one bare Erlang process only occupies 335 bytes of memory on a 64-bit system, Mochiweb doesn't consume a lot of resources when idle or under low load. When started, Mochiweb is provided with a callback module which is used for every incoming HTTP request.

Nitrogen The component which handles the generic HTTP processing should be easily embeddable as well and only provide the necessary functionality which was specified in the design of this subsystem. Nitrogen is an HTTP framework which provides basic processing features while relying on other components to manage HTTP connections and provide the request handling logic. Thus it works well together with Mochiweb. When started, it also runs only a single Erlang process, which keeps track of requests which are being processed. The framework itself will not incur more memory usage at any time. It is configured by specifying a list of routes, which determine which handler is called for which HTTP request. No more information is needed for the framework to process requests. For any incoming request Nitrogen will transform the raw HTTP data into an Erlang format, then pass it to the appropriate handler. The result received from the handler is transformed into HTTP format and returned to Mochiweb.
Figure 3.15: Supervision tree of the operation and maintenance subsystem. (The OAM Supervisor manages the Nitrogen and Mochiweb processes.)

Handlers Because the web front-end is very simple, only a few request handlers are required to provide the desired functionality. The dashboard handler returns an overview of all builds within the system upon every request. Therefore it needs to call the underlying internal abstraction layer to retrieve the latest data from the data store. The build handler provides the complete output for a particular build. Furthermore the admin handler processes any requests associated with the administration functions and also provides an overview of them. Lastly a generic handler is needed which serves the static files that are referenced within the web pages, such as Cascading Style Sheets (CSS) and JavaScript files.

Integration As already mentioned, the integration of the above components is relatively simple because of their embeddable nature. Figure 3.15 shows the simple process hierarchy of the subsystem. When the subsystem is started, it first starts Nitrogen, which is supplied with a list of routes as outlined in listing 3.3. Subsequently Nitrogen will start Mochiweb and register itself as the callback for any incoming HTTP request. Once Mochiweb has started listening on the TCP port, which is specified as a system configuration parameter, it will return, leaving Nitrogen to finish its startup procedure so that the subsystem is ready to serve user requests.

[
  {"/", dashboard_handler},
  {"/build", build_handler},
  {"/admin", admin_handler},
  {"/public", static_file}  % generic handler provided by Nitrogen
]

Listing 3.3: List of route definitions for Nitrogen.

Summary The OAM subsystem uses a common 3-tier web application architecture to provide a web front-end for information retrieval and administrative functions. It uses generic subcomponents
to manage HTTP connections and process HTTP requests. Further, custom request handlers implement the functionality needed to provide the desired information to users on web pages. Lastly, all of these components are tightly integrated and use very minimal resources, which benefits the overall system performance.

3.3.7 System Configuration

The configuration of a software system is always an important aspect of system deployment because it influences how much time and effort system administrators have to spend to perform a system setup. This section describes how Swarm can be configured. First the individual properties are outlined, followed by a description of the bootstrapping process for new system setups. Finally a summary of Swarm's configuration concludes this section.

Parameters Swarm requires only a few pieces of information to run, as the system works fairly stand-alone. The configuration parameters for the system are stored in a configuration file called app.config. The location of that file can be specified as a start parameter upon system startup. Listing 3.4 shows a correct example configuration. The configuration file contains an Erlang list of Erlang tuples of the form {Application, ListOfParameters}. A ListOfParameters contains Erlang tuples of the form {Parameter, Value}. Through this schema one can specify parameters for the various Erlang applications which are used within Swarm.

[
  {swarm, [
    {repo_base_dir, "/var/www/swarm"},
    {vcs, "git"}
  ]},
  {mnesia, [
    {dir, "/var/data/swarm"}
  ]},
  {sasl, [
    {error_logger_mf_dir, "/var/logs/swarm"}
  ]}
]

Listing 3.4: An example configuration for Swarm.

Repositories Directory Swarm manages various local clones of the project repository which it monitors, which allows it to make full use of the underlying Version Control System for change management. These clones are stored in a main directory, which can be specified through the parameter repo_base_dir for the application swarm.

Version Control System Although Swarm only supports Git as the VCS backend, it uses an abstraction internally as explained in section 3.3.3. Thus the backend needs to be specified through the parameter vcs for the application swarm.

Database Directory As described in section 3.3.4, the Erlang DBMS Mnesia is used as the storage engine for all system data. Mnesia stores all data files in a single directory which needs to be specified through the parameter dir for the application mnesia.
Log Directory Swarm's system logging is based on an Erlang/OTP application called System Application Support Libraries (SASL). All logging data is written to a single directory which is defined through the parameter error_logger_mf_dir for the application sasl.

Bootstrapping Because Swarm is a stand-alone application which doesn't depend on other systems, not even on the monitored source code repository initially, its setup is very simple. Given the source directory of Swarm, the system can be started for the first time by issuing the commands shown in listing 3.5 sequentially. Swarm will automatically set up its embedded database if none is found. The system can be restarted using the command shown in listing 3.6 within the release directory.

make rel
cd rel/swarm
./bin/swarm start

Listing 3.5: The command sequence for starting Swarm the first time.

./bin/swarm restart

Listing 3.6: The command for restarting a running Swarm instance.

Summary Swarm is an easy-to-deploy application which only needs the few configuration parameters outlined in this section to be specified before one can start the system for the first time. Furthermore its setup and startup procedure doesn't involve human interaction. Instead the system provisions its embedded database automatically. These simple deployment characteristics allow system administrators to set up Swarm in a matter of minutes.

3.3.8 Summary

Swarm's system design, which has been presented in this section, is based on best practices for highly concurrent server applications. It is split into various small components which are largely self-contained, which helps to design each component to exploit parallel task execution as much as possible. Each component provides a defined interface for communication with other components. Each component in the system can easily be exchanged for another one which provides similar functionality and the same interfaces. This makes the prototypical implementation which was described for each subsystem flexible, because it can be adapted by using a different implementation for some small set of functionality without changing major parts of the system. The design has been kept simple where possible by removing unnecessary functionality from the list of requirements while still keeping the system extensible through future changes.

3.4 SYSTEM INTEGRATION

Integrating with third-party systems is an essential feature of a modern continuous build system because it allows teams to adapt the information flow to their development workflow, making it possible to exploit Continuous Integration even further. Swarm's simple approach to this requirement is presented in section 3.4.1, followed by an example of how to use it in section 3.4.2. Lastly this support is briefly summarized in section 3.4.3.
3.4.1 Event Hooks

The integration with other systems is based on event hooks which can be used to execute custom code which sends data to other systems. This allows Swarm's internal data representation to remain unaffected while relying on proven technologies. Furthermore it allows system administrators to use available third-party tools to communicate data instead of having to rely on a limited set of provided actions.

Events Swarm uses the events listed in table 3.5 internally, each of which triggers an associated event hook which is described in the table as well.

Event: Latest Version Changed
  Description: The latest version of the source code in the monitored project repository has changed
  Hook: latest-version-changed commit NewCommit

Event: New Change
  Description: A new change has been added
  Hook: new-change commit Commit author Author description Description

Event: Build Finished
  Description: A build has finished successfully
  Hook: build-finished commit Commit platform PlatformId url LogDetailUrl

Event: Build Failed
  Description: A build has failed
  Hook: build-failed commit Commit platform PlatformId url LogDetailUrl

Event: Change Ok
  Description: A change has been built on all platforms successfully
  Hook: change-ok commit Commit

Table 3.5: List of event hooks.

3.4.2 Hook Scripts

When an event is triggered, Swarm simply executes the associated event hook within the system's hooks directory, which is expected to be a subdirectory of the main Swarm release directory. Thus, in order to be able to communicate data to other systems upon an event, one just needs to provide an executable script with the appropriate event hook name, which is then executed and provided with the arguments detailed in table 3.5. Listing 3.7 shows such an Erlang script which is named change-ok and would thus be executed when this event occurs.

#!/usr/bin/env escript

main(["commit", Commit]) ->
    % send a GET request to another system on the same server, indicating that the change is ok
    os:cmd("curl http://127.0.0.1/change/ok/" ++ Commit).

Listing 3.7: Example of an event hook script for change-ok.
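On Swarm's side, invoking such a hook can be as simple as spawning the script with the event's arguments. The following sketch assumes a hooks directory below the release root and a helper named run_hook; both names are illustrative rather than part of Swarm's actual code.

% Execute the hook script associated with an event, passing the arguments
% from table 3.5 on the command line. Missing hooks are silently ignored.
run_hook(Hook, Args) ->
    Script = filename:join("hooks", Hook),
    case filelib:is_file(Script) of
        true  -> os:cmd(string:join([Script | Args], " "));
        false -> ignored
    end.

For example, run_hook("change-ok", ["commit", Commit]) would execute the script from listing 3.7.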
3.4.3 Summary

The support for notifying other systems of changes within Swarm uses event hooks which allow the system administrator to create scripts that are executed when an event occurs. This rather limited approach nevertheless allows the use of any tool to continue the communication of information after such an event, thus providing a powerful foundation. Further, a hook is limited to one script, therefore one would need to contact several systems within that one script if required.

3.5 WORKFLOWS

Swarm is a highly concurrent system which executes many builds at the same time. Such an execution involves starting processes, data transfer, retrieving and storing data as well as the cleanup of a build. To further illustrate the interaction between components within Swarm when system events occur, this section describes a few standard workflows in detail. The handling of the basic event when the latest version of the monitored project changes is explained in section 3.5.1. Subsequently, section 3.5.2 details how results are gathered for running builds. The more generic event handling for other source code changes is outlined in section 3.5.3. Lastly the system workflows are summarized in section 3.5.4.

3.5.1 Latest Version Change

Any continuous build system uses the base concept of monitoring a source code repository for changes, which is therefore noted in section 3.1.1 as a primary requirement. Swarm fulfills this requirement by monitoring a specified project's repository. Because of this base concept the most common event within the system occurs when the latest version in that monitored repository changes. In traditional systems, such as Hudson, this is the event which triggers a new build. In Swarm this event leads to a set of builds, depending on the number of platforms which are defined for a project. Internally the aforementioned event leads to a series of process calls and the creation of new processes. Figure 3.16 roughly depicts how the involved processes interact until the builds are started. The steps can be split into the following logical parts.

Polling Last Version
Propagating Event
Starting Job
Starting Builds

Polling Last Version The system event is generated by the Project Worker process. This process maintains the local clone of the monitored source code repository. When initially started, it will create the clone and subsequently update it with the latest changes from the original repository. Because one can't make the assumption that Swarm's system administrator has access to the monitored repository's filesystem, Swarm can't rely on receiving change notifications from the underlying VCS. Therefore the Project Worker needs to poll the monitored repository for new changes periodically using the VCS interface subsystem, which was presented in section 3.3.3. After each polling operation the process also retrieves the hash of the latest commit within the local repository clone. If this hash is different from the one stored in memory as the latest, the process will issue the system event and update the cached hash value with the new commit hash.
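A minimal sketch of this polling step, assuming the VCS interface module is called vcsi and exposes the pull and lastcommit functions from table 3.2, and that notify/1 raises the internal system event; all of these names are illustrative.

% Pull the monitored repository and compare the newest commit hash with the
% cached one; issue the system event only when the hash has changed.
poll(RepoPath, Branch, CachedCommit) ->
    ok = vcsi:pull(RepoPath, Branch),
    case vcsi:lastcommit(RepoPath, Branch) of
        error ->
            CachedCommit;                              % polling failed, keep cached hash
        CachedCommit ->
            CachedCommit;                              % nothing new
        NewCommit ->
            notify({latest_version_changed, NewCommit}),
            NewCommit                                  % becomes the new cached hash
    end.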
Figure 3.16: Activity diagram outlining the system behavior when the latest version of the monitored project changes.

Propagating Event

The system event is sent to the Job Manager process, which handles the coordination of all jobs within the system. The definition of a job is given in section 3.3.5. Because the subject of this job is a completely new change, it will be built on all available platforms at once. Because of the nature of this event, no duplicate checks need to be performed, which leads to a direct start of the job.

Starting Job

A job is executed by a dedicated Job Worker process. For each valid job the Job Manager will start such a Job Worker and assign the job to it as part of the initialization. The new process will then inspect its assigned job and start processing it right away.

Starting Builds

For each platform within a job, the corresponding Job Worker starts a Job Runner process on that platform. Before doing so it will already start an Erlang VM on that platform and connect to it. The Job Runner is given the build instructions which it needs to run as well as the source code, which already includes the latest change. Once all the data has been transferred to the platform, the Job Worker triggers the execution of the build by the Job Runner. These steps are performed by the Job Worker for each Job Runner simultaneously.

3.5.2 Result Propagation

Once a job has been started, the respective Job Worker process enters a stage of passive operation during which it acts as a proxy for all log messages, as outlined in figure 3.17. During the execution of a build the Job Runner process gathers the output on stdio and sends it line by line to its responsible Job Worker. When receiving such a message, the Job Worker simply stores it persistently as part of the patch details in the database. Furthermore, the following three steps shape the result propagation activity.
Figure 3.17: Activity diagram showing how build results are consolidated during runtime.

Build Complete
Job Complete
Job Finalizing

Build Complete

A Job Runner process will keep sending log data to the Job Worker as long as the build steps are executed. Once these have finished, the process will terminate normally, which triggers a notification being sent to the Job Worker. This notification is used to update the status of the build to finished as part of the job.

Job Complete

A job consists of many builds. Only when all of these have either finished or crashed is the job considered complete. In such a case the Job Worker will clean up its state, which includes tearing down leftover Erlang VMs on the used platforms. Finally it will terminate normally, which triggers a notification being sent to the Job Manager.

Job Finalizing

Upon receiving the notification that a Job Worker has terminated normally, the Job Manager will remove it from its set of monitored processes. Furthermore, it will update the job data within the database to signal that the job was finished successfully. In case of a system restart this indicates that the job does not need to be run again. At this stage all processes which have been spawned to perform the job have terminated, leaving no zombie processes within the system.
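The interplay between starting builds on remote platforms (section 3.5.1) and detecting their completion can be summarized in a short sketch using Distributed Erlang. It is an illustration only; the host list, the node name builder and the use of a plain shell command for the build steps are assumptions and do not reflect the actual Job Worker implementation.

%% Sketch: start one build per platform on a freshly started remote
%% Erlang VM and wait for all of them to finish. The node name, the
%% shell command and the completion handling are assumptions made for
%% illustration, not Swarm's actual Job Worker code.
-module(job_worker_sketch).
-export([run_job/2]).

run_job(Hosts, BuildCmd) ->
    [start_build(Host, BuildCmd) || Host <- Hosts],
    wait_for(length(Hosts)).

start_build(Host, BuildCmd) ->
    %% Start a new Erlang VM on the platform and run the build there.
    {ok, Node} = slave:start_link(Host, builder),
    Pid = spawn(Node, os, cmd, [BuildCmd]),
    %% The monitor delivers a 'DOWN' message once the build terminates.
    erlang:monitor(process, Pid).

wait_for(0) ->
    ok;
wait_for(Remaining) ->
    receive
        {'DOWN', _Ref, process, _Pid, normal} ->
            %% Normal termination marks a finished build.
            wait_for(Remaining - 1);
        {'DOWN', _Ref, process, _Pid, _Reason} ->
            %% Any other exit reason is treated as a crashed build.
            wait_for(Remaining - 1)
    end.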
3.5.3 New Change

When other changes need to be built which are not yet added to the monitored repository, Swarm's internal workflow is very similar to the one described in section 3.5.1. The Project Worker gets notified about a change, which then leads to the creation of a new job. This job will cover all available platforms because the change is new. Furthermore, the source code which is used within that job is the latest version within the monitored project including the new change. This assumes that the change can be applied without a manual merge. Once the job has been created it is sent to the Job Manager, which will then perform all necessary actions and execute all builds. These steps are the same as in the workflow for a latest version change. The result propagation is similar as well, since each job is treated the same internally. This generic execution of jobs allows one to adapt the way changes are combined with the latest version of the source code based on findings in production. No modifications would need to be applied to the job handling, only to the way changes are introduced into the system and used to create new jobs.

3.5.4 Summary

Swarm's internal workflows are defined by the parallel execution of builds on remote platforms. The main work is done as part of the job handling and result consolidation, which can be used to build more sophisticated change graphs if desired. Initially only the change of the latest version within the monitored repository and other simple changes are used to create new jobs within the system. This provides the functionality which is needed to give users ongoing feedback about the state of the monitored source code.

3.6 SUMMARY

Swarm is a full-fledged continuous build system in the sense that it supports monitoring a project's source code repository to recognize changes and run builds for every version of the software. It provides the necessary feedback to users as required for proper Continuous Integration support. However, the system takes these concepts a step further by allowing software to be built on many platforms simultaneously. This provides projects with a higher level of feedback about the state of the software, allowing teams to support more platforms than previously possible. Furthermore, the advantages of Distributed Version Control Systems are taken into account to allow the building of changes which have not been applied to a project's main repository. This concept brings the flexibility of such Distributed VCSs to development process management.

The system design adheres to component-based design principles, defining individual functional blocks which can be exchanged depending on the requirements of the implementation. This flexibility comes with a high level of maintainability because each component follows a simple internal design. The system is mostly self-contained with few outside dependencies, such as external servers to run builds on, which further improves the system's maintainability. Furthermore, it supports the propagation of internal information to third-party systems through a simple, yet powerful hook mechanism.
4 EVALUATION

The evaluation of the work presented in this thesis is divided into two logical parts. First the system design is briefly recalled in section 4.1, followed by a detailed explanation of the advantages of using Erlang for the prototypical implementation in section 4.2. Subsequently the performed system tests are described in section 4.3 and the planned deployment of Swarm is briefly presented in section 4.4. Finally the evaluation is summarized in section 4.5.

4.1 COMPONENT-BASED DEVELOPMENT

Swarm's design relies heavily on self-contained components, which have been described in much detail in section 3.3. This encapsulation of logic into components also helps to work on them in isolation. One can easily modify a component, test it thoroughly and be sure that it will continue to work with other components as long as the interface does not change. This cycle also allows the use of test-driven development practices when working on individual components. Overall, the encapsulation improves maintainability and eases further code modifications.

4.2 USAGE OF ERLANG/OTP

Erlang/OTP has a long history of being used for distributed, fault-tolerant systems and, more recently, for highly concurrent applications. Because of its language features and the proven Open Telecom Platform it has been chosen for the prototypical implementation of the system developed in this thesis. During development and testing some aspects turned out to be especially valuable for this kind of system; these are described in further detail here.

Virtual Machine

As already outlined in section 2.4.1, Erlang bytecode is executed in a VM which supports many operating systems and processor architectures. This allows one to focus on the actual implementation of the prototype rather than having to worry about the final production environment. The resulting system is highly portable, an advantage which has long been advertised for Java applications.
Distributed Erlang

Swarm is a distributed system in the sense that it executes builds on many other systems simultaneously. To do so it starts a new Erlang VM on such a system and connects it to its own node. This creates a cluster of nodes which uses Distributed Erlang to address member nodes easily. Furthermore, this cluster expands and shrinks continuously as jobs are executed, because builds are started and finished. This level of controlled, flexible distribution would be hard to achieve with other systems programming languages, but it is provided as core functionality of Erlang/OTP.

Mnesia

The choice of the data store used in the prototype is backed by a thorough comparison of a set of candidates, which was conducted in section 3.3.4. Mnesia offers many sophisticated features and provides advantages over the other candidates, such as the embedded nature of the database. Even more compelling is that it is provided as part of the Open Telecom Platform, allowing one to use it as a database right from the start. Furthermore, no additional software requirements need to be satisfied when developing, testing or running Swarm.

SMP Support

The Erlang VM has outstanding Symmetric Multiprocessing (SMP) support, which enables Erlang applications to utilize many Central Processing Unit (CPU) cores transparently. The application needs to make sure it distributes its work over many processes, while the VM takes care of scheduling those processes to run on the available processors. Because Swarm uses a separate process for each unit of work within the system, either a job or a build, it benefits from the SMP support and can utilize a multi-core system without further modifications.

4.3 TEST RUN

During development the system was continuously tested using only a single remote system to run builds on. To further test the execution of builds on multiple remote machines, a more complex setup was chosen which better reflects a potential production environment. The test environment consisted of three dedicated servers, one running Swarm itself and two used to run builds. In addition, Swarm was given access to eight Amazon EC2 images on which it should run builds. This leads to a total of ten platforms on which each change should be built. This test provided interesting results, which are explained in detail here.

Resource Consumption

Swarm is a lightweight server which consumes little memory and CPU when idle. During the tests the memory consumption increased only by a few MegaBytes (MBs) while builds were executed, which can be traced back to the additional processes which are used for managing and running builds. Not much additional data is kept in memory. The CPU utilization increased from almost none to 1% on a 4-core system, which reflects the lightweight nature of Swarm. Because all the work is done on the remote systems, the server is left with management tasks and thus requires only a few additional resources even when work is in progress.
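For completeness, comparable figures can be inspected from the shell of a running node with standard Erlang/OTP calls such as the ones below. They are shown only as an illustration of how such observations can be made and are not necessarily the instrumentation used for the measurements above.

%% Standard Erlang/OTP shell calls for inspecting a running node; shown
%% for illustration, not the exact instrumentation used in this test.
erlang:memory(total) div (1024 * 1024).   % total memory usage in MB
erlang:system_info(process_count).        % number of Erlang processes
erlang:system_info(schedulers_online).    % scheduler threads (SMP)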
Build Latency

In terms of Continuous Integration it is important that builds are executed as soon as possible in order to provide developers with feedback when it matters. The tests showed a delay between the time changes occurred and the time the results for builds were available. This delay is largely defined by two factors. First, Swarm uses a polling mechanism to check for changes in the monitored repository. If a change occurs right after such a request, Swarm will only notice it when sending the next request. The interval is defined as part of the system configuration, so it can be adjusted to fit the needs of the development process. Second, Swarm needs to transfer data to the remote systems before these can start any builds. This is mostly the source code which is subject to those builds. Depending on the size of the project, builds are delayed for the time the system needs to transfer the data to the target system.

Network Bandwidth

Because Swarm is a distributed system, one needs to observe how much data is exchanged between all participating systems and whether or not there is a bottleneck. The tests showed that the ongoing communication between Swarm and all build platforms did not incur much network overhead. All messages are compact and very small. Furthermore, their frequency is comparatively low. Instead, the transfer of the source code package before starting a build is the single bottleneck, because it is a blocking operation for that build. Because Swarm performed ten of these transfers simultaneously, its network bandwidth was the limiting factor in this case. Thus it is beneficial to have a good network connection to all remote systems.

4.4 TEST DEPLOYMENT

The system has only been tested under lab conditions, as explained in section 4.3. To get better feedback about its performance for an active project it will be deployed to monitor the development of Erlang/OTP itself. Unfortunately this could not be finished as part of this thesis because of the high effort involved in such a deployment. Erlang/OTP is a large project which is not only actively developed but also features a unique development process, which was described in section 2.4.3. The integration of Swarm into this process is still progressing and should yield results which will help to understand how the system should be improved going forward.

4.5 SUMMARY

Swarm is a self-contained, lightweight continuous build system which uses many systems to actually run builds. It utilizes Erlang/OTP to distribute work over several machines, to use multiple processor cores for executing code and to run an embedded database, thus making the most of what Erlang/OTP offers. Furthermore, initial tests have shown that the distribution of builds works as expected while increasing the overall resource usage only slightly. The system is meant to be deployed for Erlang/OTP itself to provide further feedback on its performance in a production setting.
5 CONCLUSION

The system presented in this thesis aims to provide software projects with the capabilities to use modern software such as Computing Clouds and Distributed Version Control Systems to improve the quality of and support for the software being developed. As a continuous build system, Swarm can be used in its current state to automate the building and testing of software. However, it is only meant as a starting point for further innovation in the area of Continuous Integration. A few such potential concepts and improvements are presented in this chapter. Finally, they are summarized in section 5.5.

5.1 PLUGIN SUPPORT

Swarm in its current state cannot be extended easily without modifying the core of the system. Other continuous build systems [hudson] show that a flexible plugin architecture allows users to further adapt the system to their development process. This enhances the system and increases its audience through support for rather exotic features which would never be provided as part of the core system.

Because of its internal event-based nature, Swarm provides a solid foundation for a flexible plugin architecture. Plugins could be given the ability to subscribe to the system events which are already used internally. Furthermore, plugins could be added to the system supervision tree, allowing a plugin to contain active logic because it runs within Swarm. This is made easy by the use of Erlang/OTP for the system implementation. Lastly, plugins would need a separate API to access the system datastore, so that the core system data is shielded from custom plugin data. Still, plugins would benefit from the advantages provided by the system's internal data storage.

5.2 RESTFUL HTTP API

Using a RESTful HTTP API is a common way of exposing internal system functionality to external components. The ease of HTTP handling allows other systems to use such an interface more easily than is the case with more traditional interfaces, e.g. raw TCP/IP. Swarm already includes a web-stack as part of the OAM component presented in section 3.3.6. This stack could easily be re-used to serve other functionality.
Such an API is a good example of a potential plugin. Because of the passive nature of an API, the plugin would not run active processes in the system, but it would still use the web-stack to serve HTTP requests. It would need access to the internal data, which is provided by the data storage subsystem. Furthermore, the API could allow other systems to submit new changes over HTTP, which would enable even better system integration.

5.3 NATIVE VCS SUPPORT

Swarm uses a separate component to interface with existing Version Control Systems while providing an abstraction to other internal components, as described in section 3.3.3. This approach is sufficient, but requires constant maintenance since the VCS command-line interface might change between versions. Many other systems interface with VCSs in exactly the same way, though. However, the underlying VCS is one of the few third-party dependencies Swarm relies on in order to operate. One could improve the maintainability of the system even further by using a native implementation of the VCS in use, which can be embedded within Swarm.

Such native support would provide many advantages to the overall system. Other systems [gerrit] already gain great flexibility by using a native VCS implementation. Not only could the data be kept in memory and stored within the system's own datastore instead of the filesystem; the tight integration would also allow easier event management and faster handling of VCS events. This is currently impossible because the VCS and Swarm are disconnected, which requires a polling approach. A native implementation of such Version Control Systems would not only allow Swarm to work with the monitored source code more flexibly. As separate components, such implementations could easily be used by other systems as well, which could lead to innovative uses of source code repositories and changes.

5.4 INTER-INSTANCE COMMUNICATION

Originally this thesis intended to add peer-to-peer-like communication mechanisms to the system to allow instances of Swarm to share patches and their metadata. This functionality requires a flexible and stable foundation, which has been the focus of this work instead. Therefore the communication between instances is proposed as a future enhancement of Swarm. Now that the foundation is in place, such novel features can be developed more easily than would have been possible before. Nevertheless, some thoughts about such functionality are already provided here.

HTTP

The ubiquitous communication protocol today is HTTP. Firewalls often prohibit any communication to the outside using protocols such as raw TCP/IP; often only HTTP can be used because it is required for web browsing. Swarm is meant to be used both for OSS projects and within corporate networks. If instances are to communicate in such scenarios, it is almost inevitable that HTTP is used for communication in order to circumvent restrictions enforced by strict firewalls.
Contracts

Swarm instances need to establish a specification which governs which data they actually share. Such a specification may be called a contract, which is always unidirectional. If an instance is interested in data from another instance, it can provide a contract which uses black- and whitelisting to filter the data. This data, in the case of a change, is then sent to the requesting instance over HTTP to a common interface.

Admin Approval

Since such contracts must be permitted before they are actually used, the system administrator of a Swarm instance needs to review contracts which have been initiated by other instances. Each of these can then be activated if desired, which leads to the execution of the contract whenever changes occur.

5.5 SUMMARY

Although Swarm implements basic Continuous Integration principles in addition to more novel concepts such as multi-platform support, it lacks many features which make established continuous build systems suitable for use in active projects. Extensive plugin support as described in section 5.1 is one such feature which would greatly help to turn Swarm into a mature continuous build system. Furthermore, the concept of inter-instance communication, which is not covered in this work, is proposed as the next extension to the core of the system. Nevertheless, Swarm provides a lot of value to software projects while being a solid platform for prototyping new ideas thanks to its fresh code-base and component-based design.
LIST OF FIGURES

2.1 A centralized version control workflow.......................... 18
2.2 A distributed version control setup............................ 19
2.3 A development workflow employing a maintainer to approve changes to the shared repository.......................................... 20
2.4 A common workflow for repositories hosted on Github................ 22
3.1 A web server providing authentication through an LDAP service........... 31
3.2 SCM support realized through an abstraction layer over specific systems...... 32
3.3 System architecture showing the component communication channels....... 33
3.4 Definition of platforms based on hardware, operating system and software..... 34
3.5 Architecture of the platform management subsystem................. 37
3.6 Supervision tree of the platform management subsystem............... 39
3.7 Architecture of the version control interface subsystem................ 40
3.8 Supervision tree of the VCS interface subsystem.................... 43
3.9 Architecture of the data storage subsystem....................... 46
3.10 Supervision tree of the Mnesia application....................... 48
3.11 Architecture of the job processor subsystem...................... 50
3.12 Supervision tree of the job processor subsystem.................... 51
3.13 Architecture of the operation and maintenance subsystem.............. 53
3.14 Example of a matrix showing the state of each build.................. 54
3.15 Supervision tree of the operation and maintenance subsystem............ 56
3.16 Activity diagram outlining the system behavior when the latest version of the monitored project changes................................... 61
3.17 Activity diagram showing how build results are consolidated during runtime.... 62

LIST OF TABLES

3.1 API functions of the platform management subsystem................ 35
3.2 API functions of the VCS interface subsystem..................... 42
3.3 System's data model entities including attributes.................... 44
3.4 Comparison of storage engines for the data storage subsystem........... 47
3.5 List of event hooks.................................... 59
GLOSSARY

Byte  Unit of digital information. 55
continuous build system  A system which implements CI concepts for software projects. 3, 13-17, 25, 27-31, 33, 35, 39, 53, 58, 60, 63, 67, 69, 71
CruiseControl  A Java-based framework which allows the setup of a continuous build process. 16, 17
Erlang Public License  A derivative work of the Mozilla Public License (MPL) which is used by Erlang/OTP. 22
Erlang/OTP  The programming language Erlang including the Open Telecom Platform, its set of standard libraries. 22, 24, 46, 47, 51, 58, 65-67, 69
Git  A Distributed Version Control System which emphasizes speed. 15, 19-21, 25, 32, 41, 43, 57
Github  Project hosting platform specialized in Git hosting. 21, 24, 25
Hudson  An extensible continuous integration server. 17, 60
Java  Very popular programming language originally developed at Sun Microsystems. 17, 65
JavaScript  A scripting language which is very popular for client-side web programming. 56
latency  A measure of time delay experienced in a system. 47
Mercurial  A Distributed Version Control System which provides a consistent user interface. 15, 19, 21
Mnesia  A distributed DBMS which is part of Erlang/OTP. 47-49, 57, 66
Mochiweb  An Erlang library for building lightweight HTTP servers. 55, 56
MySQL  A Relational Database Management System (RDBMS) that runs as a server providing multi-user access to a number of databases. 47
Nitrogen  An Erlang web application framework. 55, 56
operating system  System software which manages the ways a user controls the computer. 13, 16, 30, 34, 35, 65
proxy  A class functioning as an interface to something else. 50
SourceForge  Project hosting platform offering free accounts for OSS projects. 21
Sun Microsystems  A global IT company which sponsored the development of Hudson. 17
Swarm  An autonomous build and distribution system which is developed in this thesis. 3, 27, 30-33, 41, 43-45, 47-49, 51-55, 57-60, 63, 65-67, 69-71
TCP/IP  A communication protocol which provides reliable, ordered delivery of a stream of bytes from a program on one computer to another program on another computer. 47, 69, 70
ThoughtWorks  A global IT consultancy which initially developed the continuous build system CruiseControl. 16
Tokyo Tyrant  A server which provides remote access to Tokyo Cabinet. 47
ACRONYMS

API  Application Programming Interface. 32, 34-41, 45, 48, 50, 53, 69, 70
Centralized VCS  Centralized Version Control System. 17-19, 21, 40
CI  Continuous Integration. 3, 14-17, 25, 27-30, 58, 63, 67, 69, 71
CPU  Central Processing Unit. 66
CSS  Cascading Style Sheets. 56
CVS  Concurrent Versions System. 18
DBMS  Database Management System. 47, 49, 57
DETS  Disk-based Erlang Term Storage. 38, 46, 47
Distributed VCS  Distributed Version Control System. 3, 15, 17-19, 21, 27, 29, 31, 40, 44, 63, 69
EEP  Erlang Enhancement Proposal. 24
ETS  Erlang Term Storage. 46, 47
FIFO  First-In-First-Out. 38
HTTP  Hypertext Transfer Protocol. 53-57, 69-71
LDAP  Lightweight Directory Access Protocol. 31
MB  MegaByte. 66
OAM  Operation, Administration and Maintenance. 30, 33, 52, 53, 55, 56, 69
ODBC  Open Database Connectivity. 47
OSS  Open-Source Software. 3, 13-19, 21, 22, 24, 28-31, 39, 46, 47, 70
OTP  Open Telecom Platform. 22, 24, 65, 66
PoC  Proof of Concept. 14
RCS  Revision Control System. 17, 21
SASL  System Architecture Support Libraries. 58
SCM  Source Code Management. 14, 17-19, 21, 25, 28, 30, 32
SMP  Symmetric Multiprocessing. 66
SSH  Secure Shell. 35, 51
TCP  Transmission Control Protocol. 54, 56
URL  Uniform Resource Locator. 31, 54
VCS  Version Control System. 17, 21, 32, 39-41, 57, 60, 70
XP  Extreme Programming. 15, 16
BIBLIOGRAPHY

[Beck2000] Kent Beck. Extreme programming explained: embrace change. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000. ISBN 0-201-61641-6.
[mysql] Oracle Corp. MySQL DB, 2000. URL http://www.mysql.com/.
[Däcker2000] Bjarne Däcker. Concurrent functional programming for telecommunications: A case study of technology introduction. Master's thesis, KTH Royal Institute of Technology, Stockholm, 2000.
[Duvall2007] Paul Duvall, Steve Matyas, and Andrew Glover. Continuous integration: improving software quality and reducing risk. Addison-Wesley Professional, 2007. ISBN 9780321336385.
[Glasser1978] Alan L. Glasser. The evolution of a source code control system. SIGSOFT Softw. Eng. Notes, 3(5):122-125, 1978. ISSN 0163-5948. doi: http://doi.acm.org/10.1145/953579.811111.
[Pilato2004] Michael Pilato. Version Control With Subversion. O'Reilly & Associates, Inc., Sebastopol, CA, USA, 2004. ISBN 0596004486.
[Rochkind1975] M. J. Rochkind. The source code control system. In IEEE Transactions on Software Engineering, SE-1(4), pages 364-370. IEEE Computer Society Press, 1975.
[Royce1987] W. W. Royce. Managing the development of large software systems: concepts and techniques. In ICSE '87: Proceedings of the 9th International Conference on Software Engineering, pages 328-338, Los Alamitos, CA, USA, 1987. IEEE Computer Society Press. ISBN 0-89791-216-0.
[cruisecontrol] Open Source. CruiseControl. URL http://cruisecontrol.sourceforge.net/.
[cvs] Open Source. Concurrent Versions System. URL http://www.nongnu.org/cvs/.
[eep] Open Source. Erlang Enhancement Proposals. URL http://www.erlang.org/eeps.
[epl] Open Source. Erlang Public License. URL http://www.erlang.org/eplicense.
[erlangongithub] Open Source. Erlang Source Code Repository. URL http://github.com/erlang/otp.
[gerrit] Open Source. Gerrit Code Review. URL http://code.google.com/p/gerrit/.
[git] Open Source. Git - the fast version control system. URL http://git-scm.com/.
[hudson] Open Source. Hudson - extensible continuous integration server. URL http://hudson-ci.org/.
[mercurial] Open Source. Mercurial. URL http://mercurial.selenic.com/.
[tora] Open Source. Tora. URL http://github.com/mallipeddi/tora.
[Tichy1982] Walter F. Tichy. Design, implementation, and evaluation of a revision control system. In ICSE '82: Proceedings of the 6th International Conference on Software Engineering, pages 58-67, Los Alamitos, CA, USA, 1982. IEEE Computer Society Press.
[Tichy1985] Walter F. Tichy. RCS - A System for Version Control. Softw. Pract. Exper., 15(7):637-654, 1985. ISSN 0038-0644. doi: http://dx.doi.org/10.1002/spe.4380150703.