HUT / SOBERIT 2003 T-76.651 DISTRIBUTED DEVELOPMENT SEMINAR

Comparison of Open Source Software Configuration Management Tools

Tero Kojo
44809J
Tero.Kojo@hut.fi

Abstract

Software Configuration Management is an essential area for distributed software development because of its central role in cooperative work. Three open source software configuration management tools, CVS, Subversion and Arch, are reviewed based on the software configuration management capabilities they implement. A comparison to current commercial and academic tools is made based on an existing taxonomy. The differences between the tools are discussed and conclusions are drawn. The primary finding of the comparison is that while the open source tools provide only basic functionality, it is sufficient for extremely distributed open source software development projects. A secondary finding is that the taxonomy used in the comparison does not separate the tools sufficiently to make a good comparison; additional information is needed.

Index Terms: software configuration management, SCM, tools, open source software, OSS, comparison

I. INTRODUCTION

Software configuration management (SCM) deals with the development-time evolution of software products. This evolution is the result of multiple developers providing changes to a piece of software over time. The developers need not be in the same place or work at the same time. Several aspects of SCM systems are designed to deal with parallel, geographically distributed work [Estublier, 2000]. According to current understanding, open source software (OSS) development relies on SCM tools not only for their version management capabilities, but also as communication tools. Some studies have been made to understand how different tools are used in the OSS communities [Reis and Fortes, 2002], [Shaikh and Cornford, 2003], but discussion of the individual tools has not been as active as could be expected.
There seems to be a research gap regarding the capabilities and use of OSS software engineering tools. SCM tools have been compared previously, but not the tools from OSS development [Conradi and Westfechtel, 1996]. Given the free nature of OSS tools, there might even be commercial interest in finding out what capabilities OSS tools currently provide. If the tools provide a sufficient level of functionality, they might be used by companies in software development projects. This paper aims to describe and compare the capabilities of three OSS SCM tools, namely CVS, Subversion and Arch. A secondary goal is to find out the current status of OSS SCM tools.

A. History

Open Source Software (OSS), or Free Software (FS) as it is also known, is a term referring to software developed by volunteers and distributed under a certain type of licensing scheme. OSS licenses aim to free software and make sure that the original and all derivative work remain open source for everybody to use [Stallman, 2003]. Open source software development has its roots in the late 1960s. The introduction of ARPANET, the predecessor of the modern Internet, fuelled the potential to exchange pieces of software code between researchers. At the same time two major innovations took place that changed the history of computing: the C programming language and the UNIX operating system were created. Based on these three cornerstones, open source software development could begin its evolution [OReilly, 2000].

B. Objectives and Method

The objective of this paper is to compare three different open source SCM tools, namely CVS [CVS, 2003], Subversion [Subversion, 2003] and Arch [Arch, 2003]. These SCM tools were chosen as they are currently the most used of the OSS SCM tools available. The other SCM tools found were still so heavily under development that they cannot be used as stable tools for software development.
The tools will be reviewed based on which SCM capabilities they provide and how they address the parallel distributed development problem. SCM capabilities describe the technical solutions that enable the tool to support users with the problems associated with parallel distributed development. The contribution of this paper is the evaluation of the current state of open source SCM tools according to the taxonomy presented by Conradi and Westfechtel [Conradi and Westfechtel, 1996]. Conclusions are also drawn on the needs of open source software developers and commercial developers for SCM tools. This paper is based on a review of the three SCM tools and a literature survey. Literature was gathered by searching scientific publishers' databases and software engineering literature databases. Information about the tools was gathered from their web sites and analysed in order to answer the questions the taxonomy poses regarding the technical aspects of the tools. The same information was also analysed to understand the differences between the tools. The analysis was not made according
to a formal method, but based on the writer's own experience of SCM tools and a review of the available literature.

The rest of this paper is structured as follows. Section II briefly presents open source software development. Section III describes the different SCM models. Section IV describes the three open source SCM tools used in this paper. Section V compares the open source tools based on the taxonomy presented by Conradi and Westfechtel [Conradi and Westfechtel, 1996]. Finally, the findings of this study are discussed and conclusions are drawn.

II. OPEN SOURCE DEVELOPMENT

OSS development is organised loosely around small ad-hoc communities consisting of contributors from all around the world. These contributors rarely, if ever, meet face-to-face and yet share a strong sense of commitment to the project they work on [Kim, 2003]. This strong commitment has forced researchers and corporations to reconsider the economic aspects of software production [Judge, 2002]. Lately the central role of communities in OSS development has been brought up. An OSS community is a group of people who share an interest in related systems. The definition of a community is not a clear one, but Scacchi [Scacchi, 2003] lists at least four OSS communities: the game community, the Web infrastructure community, the software design community and the astronomy community. These four are mentioned as they are the target of active research. In addition to these, one can expect to find communities centred around, for instance, the Linux kernel and web browsers [Reis and Fortes, 2002], [Shaikh and Cornford, 2003]. One of the things that physical separation in OSS communities has brought is the clear modularity of the software produced by these communities. Work has had to be divided into smaller parts in order to successfully create large software systems.
This same modularity is mirrored in the project management and tools of OSS communities [Narduzzo and Rossi, 2001], [Reis and Fortes, 2002]. The tools of a single OSS project, the Mozilla web browser, are described by Reis and Fortes [Reis and Fortes, 2002]. They state clearly that face-to-face communication should not be necessary for the development of Mozilla. This means that the project is an extreme case of distributed software development. Mozilla uses the CVS SCM system [CVS, 2003] as its centralised code base, to which developers can submit their changes and from which they can fetch changes made by others [Reis and Fortes, 2002]. CVS is also used as an important communication tool between the core developers and as a learning tool for new developers. When submitting changes to an SCM system it is usual that a note describing the change is attached to it. These notes are then reviewed by other developers. Sometimes there are even discussions in these notes, and they are a good way of learning about the development project. Shaikh and Cornford [Shaikh and Cornford, 2003] state that version management (a part of SCM), trust and respect are the essential requisites for OSS development. Their statement is backed by a study of the problems encountered in the Linux kernel community concerning the selection of an SCM system for the kernel files. The problems encountered by the kernel community were largely about the suitability of SCM systems for use in such a large and distributed, yet strictly managed, project as the Linux kernel.

III. SCM MODELS AND TOOLS

A software configuration management tool is a system consisting of a repository server and client programs. The repository server contains the files that the developers work on in the project. The server also contains the full history of all the changes made to the files, and additionally the server has all the possible configurations, i.e.
all possible combinations of the files, of the software that is being developed. The clients are used by the developers to access the server. The developers can fetch files and the changes made to them by other developers, and submit the changes they have made themselves so that others can use them. The activity through which the server handles these changes is called version management. In addition to version management, the SCM tool may be able to handle change requests, software builds, document management or other tasks related to the software development process. These additional tasks are not the main functionality, but are sometimes implemented in SCM tools to provide users with a more complete and centralised set of tools.

The role of SCM is especially important in distributed development projects because of the lack of face-to-face communication. The SCM tool can handle multiple people working on the same file at the same time. Without an SCM tool this type of situation would almost certainly lead to lost or duplicated work. Open source projects are extremely distributed and development is performed in a parallel fashion. Because of this distribution, development also takes place across different time zones. Therefore OSS projects need a highly reliable SCM system to support their development tasks.

For historical reasons SCM tools themselves have evolved into certain categories according to the functionality they provide [Estublier, 2000]. This evolution has taken place over the last two decades. A distinction between four main models has been made by Feiler [Feiler, 1991]. These four models are: checkin / checkout, composition, long transaction and change set. The four models are described here only briefly; for more details please see Feiler's report.

The checkin / checkout model focuses on versioning of product components.
It provides the low-level primitives, such as branches, revisions and variants, necessary for the control of evolution and distributed development. With these primitives a system that provides users with basic version management capabilities can be built. This type of system could allow users to fetch files and submit changes, but not much more. The checkin / checkout model is rarely used on its own, and is more of a historical model. It is, however, the basis for all the other models.

The composition model operates on whole configurations, rather than individual components. The main concepts are
the system model, which describes the whole software product, and version selection rules, which are used to specify the versions included in a configuration. The composition model is useful in that it allows developers to choose which part of the software project they wish to edit. The positive side of this model is that it views the software under SCM control as a single unit, not just a collection of files. It is also easy to implement this model as part of the other models.

The long transaction model is the most suited for parallel distributed development. The central concept of a workspace provides a method for isolating changes made by developers. Workspaces can be shared between developers and even organised hierarchically. The long transaction model also provides the concept of concurrency control schemes, which are either optimistic or pessimistic. In optimistic schemes parallel changes are allowed, and in pessimistic schemes they are disallowed by locking a file which someone is editing. The use of workspaces and concurrency control schemes allows this model to be used in many different ways. For example, if an optimistic concurrency control scheme is used without workspace sharing, the result is the basic checkin / checkout model.

The change set model supports change requests in a natural way. A change set is a group of changes that, when applied to a baseline configuration, provide a bug fix or a single logical change. A baseline is a specific configuration that has been named, such as a release of the software product. The change set model is ideal for feature-driven development, in which a product has stabilised so that no big changes are implemented and only new, smaller features are added. Another use for this model is in the maintenance of a software product once it has been released, and bug fix and change requests start coming from end users.
Then a single change set can easily be related to a single bug fix or change request.

Lately the division into four models has become harder to maintain, as system vendors have adopted the best parts of all the models into their products. This has led to a situation in which most commercial tools implement the checkin / checkout model enhanced with the long transaction model, and some tools also support the change set model. The biggest differences from the end user perspective seem to be in the usability of the tool and its integration with other software engineering tools.

Another division, which can be seen as an abstraction of the division into four, was made by Conradi and Westfechtel [Conradi and Westfechtel, 1996]. They divide tools into two categories according to how the SCM tool constructs a configuration. A configuration is an object representing a consistent and complete piece of software. Configurations can be described as version oriented or change oriented. This division roughly corresponds to the change set model versus the other models in Feiler's classification [Feiler, 1991]. Conradi and Westfechtel define a taxonomy of configuration tools, which is briefly presented in Section V. For a more detailed description please refer to the original paper by Conradi and Westfechtel.

From the point of view of distributed development, tools implementing these configuration models provide fairly equal functionality. The differences arise mostly from what kind of software development process is used. The different models support different styles of software development. For instance, the change set model is the most suited for change request driven development, in which change requests are implemented as they arrive from customers. The plain checkin / checkout model is best suited for slow-moving or small projects, in which there is little parallel activity. The long transaction model provides the best support for using distributed source code repositories.
All of the models rely on a repository server as the place where the source files are kept. The repository server can be centralised, as is usually the case, or distributed. A distributed server saves time in file transfer, but is technically more complex to implement. The decision to use distributed repositories should be made on a case-by-case basis.

IV. OPEN SOURCE SCM TOOLS

This section presents the three SCM tools, CVS, Subversion and Arch, chosen for this study. These SCM tools were chosen as they are currently the most used of the OSS SCM tools available. The other SCM tools found were still so heavily under development that they cannot be used as stable tools for software development. These OSS SCM tools are developed by volunteers working on the projects in their free time. The tools have been developed to aid the development of software. As a piece of software grows, it becomes harder to control the parallel changes made by developers. Therefore SCM systems are used to maintain control over the files that make up the piece of software. To do this the SCM tool provides a repository in which the history of the files is stored. An SCM tool can also be used as a communication tool. The change notes written by a developer when submitting changes to project files are a good source of information about the project. Figure 1 presents the use context of the SCM tool.

OSS SCM tools are used mostly by other OSS projects. OSS projects are very transparent in the sense that anyone can look at the source code, suggest changes and sometimes make them. This transparency places some requirements on the tool and process used in development [Asklund and Bendix, 2002]:
- The tool should be easy to learn and use.
- The long transaction model should be supported (or at least concurrency control schemes).
- SCM client software must exist for many different platforms.
- The history information of the source code should be easily browsable.
Of these requirements the first and last are directly related to communication. For communication to take place the tool should be easy to use. The history information of the source code is a good place to learn about the project and see its status. Thus the SCM tool is a communication tool as well as a source code repository. OSS SCM tools, especially CVS, are also used by commercial companies. The author is aware of at least three Finnish companies, producing commercial or embedded software, who
actively use CVS in production. The major benefit of OSS tools is that they are free for anyone to use. The only costs are the server and the work hours needed to make the system operational, and these costs would also be incurred if a commercial SCM system were used.

[Fig. 1. The basic usage of an SCM tool: users submit files to, and fetch files from, the repository server.]

A. CVS

CVS, or actually CVS/RCS, is currently the most used of the open source configuration management tools, and therefore an example and a target for other OSS SCM tools. The CVS project was originally started in 1986 as a set of shell scripts. The first big release came in 1989, when the system was ported to the C programming language. CVS is distributed under the GNU GPL license [CVS, 2003]. The main features of CVS are:
- Based on the checkin / checkout model, which provides basic SCM functionality.
- Client-server model, implemented with RCS. RCS is one of the oldest and most stable SCM servers.
- Support for fine-grained project management. This means that CVS makes it possible to follow changes made to the repository on a very detailed level. It is also possible to have e-mail notifications of changes.
- Interfaces to external systems. CVS can be connected to other systems such as the Apache web server. The connection to Apache provides a way for developers to look at changes with their web browsers.

CVS is the established de facto standard of software configuration management, with probably millions of users. Sourceforge.net [SourceForge, 2003] alone hosts about 80 000 projects, with 716 225 registered users, all using CVS. This gives CVS a clear position as the market leader, if such a thing is relevant or possible in OSS.
Because of this position and the fact that CVS is based on 20-year-old technology, it has good user manuals, lists of frequently asked questions, web pages covering common problems and an active community that can answer questions and solve problems new users may encounter.

B. Subversion

Subversion is designed to be a replacement for CVS in open source projects. The Subversion project was started in 2000 and first released in 2001. Subversion has about ten core developers. Subversion is distributed under an Apache/BSD-style license [Subversion, 2003]. The main features of Subversion are:
- Basic features similar to those of CVS. However, Subversion has been implemented from scratch, thereby avoiding the possible mistakes made by the CVS developers.
- Versioning of directories, which provides the possibility to create different variants on a higher level than with other SCM tools. This feature is missing from CVS.
- Native client-server architecture.
- Integration with the Apache web server. By using the Apache web server it is possible to natively use all the services provided by Apache.

Many people view CVS as old and inflexible. Subversion has been described as the successor of CVS. Building Subversion has taken quite a while, as the developers see the need to provide all the features of CVS and more [Subversion, 2003]. This does not change the fact that Subversion actually provides many new and useful features, especially in the area of network functionality. A native client-server model and the integration with the Apache web server are features that improve network performance and are something CVS does not provide.

C. Arch

Arch is a version control system suitable for widely distributed open source projects. The Arch project was started in 2001 and the first release came in the same year. Arch has been designed to be small and simple while providing the essential SCM features for OSS projects. Arch is intended to replace CVS. Arch is distributed under the GNU GPL license.
[Arch, 2003] The main features of Arch are:
- Distributed repositories, which make it possible for multiple developers to have repositories. This increases the reliability of the repository, as it has been replicated to multiple sites.
- Advanced merging capabilities, made necessary by the distributed repository structure. Each repository can have different concurrency schemes and submission permissions for different developers. This also makes project management easier, as different developers can use different repositories and the work products can be merged between the repositories when the project deems it necessary.
- Efficient handling of revision trees. As a new tool, Arch has implemented new methods for handling file revisions. Developers can browse revision trees as if they were part of the normal file system.

Arch is the youngest of the three OSS SCM tools compared here. GNU arch has some features that make it particularly useful for public free software projects: It is easy to learn. It is a distributed system, so there's no need to give write permission to every project participant. Project participants can submit their work to some repository, from where a senior project member can merge the changes to other repositories as needed. It has excellent support for different kinds of branching and merging. [Arch, 2003]

V. COMPARISON OF OPEN SOURCE TO COMMERCIAL AND ACADEMIC SCM TOOLS

Conradi and Westfechtel [1996] present a comparison of SCM tools. On a high level SCM tools can be divided into version oriented or change oriented, based on how they consider a software product.

Version oriented models describe configurations in terms of the components they contain, and more specifically which versions of the components. A configuration is constructed according to version rules, which select the desired component versions. The differences between component versions are described with deltas, short pieces of text that are applied to the current version to generate the previous version of the component.

Change oriented models describe configurations in terms of changes relative to a base configuration. The base configuration is called a baseline. A baseline is a specific configuration that has been named, such as a release of the software product. All changes, or patches as they are sometimes called, are applied to some named baseline configuration. In change oriented models configurations are constructed by applying different changes to the baseline.
These changes differ from deltas, the difference between two revisions of the same file, in several ways: they are named individually, they comprise a single logical change in functionality and they may contain changes to multiple files. In contrast, a delta is simply the difference between the two versions of a file created by a developer modifying it.

All of the OSS SCM tools presented here are basically version oriented. This is due to the fact that many early SCM tools were version oriented, and the OSS SCM tools have their historical roots in these early tools. This simplifies the comparison, as it is not necessary to consider the differences between different orientations. Arch provides some concepts from the change oriented models, namely the possibility to use change sets, but it is implemented in a version oriented fashion. If the reader is interested in the exact details, please refer to the article by Conradi and Westfechtel [Conradi and Westfechtel, 1996]. In addition to the high-level division into two orientations, a conceptual taxonomy is also provided by Conradi and Westfechtel. The following presents the concepts of the taxonomy briefly.

A. Taxonomy

From the point of view of SCM, a configuration description consists of two parts. The product part describes which components are used in the product. The version part describes which version selection rules are applied to the components to create a single configuration out of all the possible configurations. These parts can be thought of as being in a versioned database, as both can change over time and their histories have to be preserved. Of course current SCM tools do not all use a database for their operations, but it is a suitable abstraction for describing how the product is configured. The actual configuration task can be seen as a query into the deductive database, with the query parameters describing how the configuration is constructed.
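The idea of configuration as a query can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed tools: the names `Version`, `PRODUCT`, `HISTORY` and `configure`, the sample file names and the tag-based selection rule are all assumptions made for illustration. The product part is a list of components, the version part is the stored history of each component, and the query parameter is an optional tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One stored revision of a component (hypothetical structure)."""
    number: int
    tags: frozenset = frozenset()


# Product part: which components make up the product (sample data).
PRODUCT = ["main.c", "util.c"]

# Version part: the stored history of every component (sample data).
HISTORY = {
    "main.c": [Version(1, frozenset({"release-1"})), Version(2), Version(3)],
    "util.c": [Version(1), Version(2, frozenset({"release-1"}))],
}


def configure(product, history, tag=None):
    """Query the versioned database for a single configuration.

    If a tag is given, select the version of each component carrying
    that tag; otherwise select the latest version of each component.
    """
    configuration = {}
    for component in product:
        versions = history[component]
        if tag is not None:
            matching = [v for v in versions if tag in v.tags]
            if not matching:
                # No version satisfies the rule: no configuration exists.
                raise LookupError(f"{component} has no version tagged {tag!r}")
            chosen = matching[-1]
        else:
            chosen = max(versions, key=lambda v: v.number)
        configuration[component] = chosen.number
    return configuration
```

With this sample data, `configure(PRODUCT, HISTORY)` selects the latest versions (main.c 3, util.c 2), while `configure(PRODUCT, HISTORY, tag="release-1")` selects main.c 1 and util.c 2.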
Figure 2 presents the concepts of the taxonomy in graphical format.

Versioned database: contains the versioned products to be configured. It consists of the product part, containing the components and their relations, and the version part, containing the version selection rules and histories of the components.
- Selection order: The configuration action is either product first or version first, according to which is chosen first. This determines whether the product structure is defined first, or only after the versions of the components for the product are chosen. An intertwined option is also possible, meaning that the versions are selected for one sub-part of the product at a time.
- Product space: is composed of the stored objects and characterised by relations such as has-part. The product space contains all the components that can be part of the final configuration.
- Version space: is composed of version structures and rules characterising the objects stored in the database. The version space provides a means to select the specific versions of components to construct a configuration.

Configuration description: specifies the configuration to be constructed. The configuration description is an abstract description of the actual configuration of the software.
- Formalism: describes the method by which the configuration is constructed. For instance, boolean expressions or SQL statements can be used to select specific component versions.
- Rule classes: give a possibility to define different types of rules on which to construct the configuration. The possibilities here are: constraint, a mandatory rule which must be satisfied at any cost (if it is not satisfied, a configuration cannot be constructed); preference, an optional rule that may be broken, but will be enforced if no explicit reason for breaking it exists; and default, also optional but even weaker than a preference, used only if no other selection is specified. These rules are then used in
the construction of a configuration by applying them to the deductive database.

Fig. 2. Concepts of the taxonomy [Conradi and Westfechtel, 1996]

Rule base: may contain further information regarding the building of a configuration, such as preferences. This category is present, as some SCM tools have separated the concepts of configuration description and rule base.
- Formalism: is similar to the formalism in the configuration description.
- Rule classes: are similar to the rule classes in the configuration description.

Configurator: is the tool that performs the actual building of a configuration. For instance, all the SCM tools described in this paper contain a configurator. Configurators can be built in many ways, but the comparison to an abstract machine making the query in the deductive database is a suitable analogy for comparing different SCM tools.
- Binding modes: Whether the configurator in question creates the configuration statically or dynamically. In static binding the configuration is constructed completely when the user asks for a configuration; in dynamic binding the parts that the user accesses are generated only at the time of access. Dynamic binding eases the computation load and does not access the database as much as static binding.
- Degree of automation: How much support the configurator provides in the configuration task. An automatic configurator makes all the necessary selections by itself based on the initial input of the user. This requires that defaults are available for all of the version selection rules; otherwise it is impossible to deduce which version of a component should be chosen if the user has not made a selection. An interactive configurator requires user interaction to construct a configuration.
- Backtracking: Whether the user can go back and change selections in case a selected configuration is not valid.
This feature requires that the configurator can somehow identify invalid configurations.

B. Comparison According to the Taxonomy

Figure 3 presents the OSS SCM tools described in this paper according to the taxonomy presented above. From the table we can see that the tools are almost identical in this taxonomy. In the versioned database category all tools have a selection order of product first. This means that the product structure is determined before the specific component versions. Subversion and Arch provide means to version not only files, but also the folders in which files are stored. This is the difference among the tools in the product space. This feature could theoretically be used to provide a version first selection order, but it seems that this is not currently the case for either tool. All tools support the idea of version graphs, which describe revisions and variants in a tree-like structure. The version graph approach is the simplest method for describing revisions and variants. Arch also has the capability to operate on change sets, or patch-sets as they are called. This capability is useful
in distributed development, as changes to components can be gathered to make up logical changes to the software.

                           CVS                       Subversion                 Arch
Versioned database
  Selection order          Product first             Product first              Product first
  Product space            File hierarchy            File and folder hierarchy  File and folder hierarchy
  Version space            Version graphs            Version graphs             Version graphs and change sets
Configuration description
  Formalism                Options                   Options                    Options
  Rule classes             Preferences and defaults  Preferences and defaults   Preferences and defaults
Rule base
  Formalism                -                         -                          -
  Rule classes             -                         -                          -
Configurator
  Binding modes            Static                    Static                     Static
  Degree of automation     Automatic                 Automatic                  Automatic
  Backtracking             -                         -                          -

Fig. 3. Comparison of open source software configuration management tools

The configuration description category is similar for all tools. All use options provided to the configurator in a command as the formalism of choice. The options are given as command line parameters if the tools are used without graphical user interfaces. All the tools have optional graphical user interfaces available, but in their basic set-up they only provide command line utilities. All tools have the same rule classes, preferences and defaults. The preferences are given in the command to the configurator and the defaults can be defined in the repository server. A common default is that the latest version of a component satisfying the user's preferences is chosen. This makes sure that a developer always gets the latest files when fetching files from the server. None of the tools support an additional rule base, and therefore this category is empty. All the configurators use static binding. This means that when the user selects a configuration it is constructed immediately, if possible. This is a good thing, as OSS projects are highly distributed and many developers are not continually connected to the SCM server.
When a configuration is statically bound there is no need to communicate with the SCM server until a checkin or update is needed. All the tools are automatic, that is, they construct a configuration on their own and the user is not able to intervene in the process. Constructing a correct configuration is not an easy task, however, so some interaction on the part of the configurator would be useful. None of the tools support backtracking, as they are all automatic.

C. Comparison to Commercial and Academic SCM Tools

When compared to the commercial and academic SCM tools presented by Conradi and Westfechtel [Conradi and Westfechtel, 1996], it is clear that the tools reviewed here can only compete with the simplest tools. Commercial tools provide better support in almost all categories. This can be expected, as commercial tools cost money to buy and upgrade. OSS tools are free to use and upgrades are provided by the OSS development project, which is a major competitive advantage. The commercial tools, however, provide more user-friendly features, such as dynamic binding and version-first selection, which are seen as beneficial in commercial software engineering.

The experimental or academic SCM tools reviewed provide significantly better support in some areas of the taxonomy, but may be on a similar level to the OSS tools in other areas. This is also expected, as academic tools are built to prove an aspect of SCM theory, or to provide a testing environment for a research hypothesis. These tools need not be as finalised as those used in active software development.

It should be noted that this taxonomy only considers the technical aspects of the tools. For instance, ease of use is not addressed. However, if we look at the tools from the viewpoint of a developer doing distributed development, the taxonomy seems to be sufficient. The only thing missing from the taxonomy is how the tools handle parallel changes.
This is important for OSS development as it is extremely distributed. Parallel changes are handled by optimistic or pessimistic concurrency control schemes [Feiler, 1991]. An optimistic concurrency control scheme allows many people to work on the same part of the product in parallel. A pessimistic concurrency control scheme locks a part that someone is working on, so that no one else can access it. Currently it is standard procedure in SCM tools to provide both concurrency control schemes. This is the case with all the tools reviewed here, so concurrency control does not differentiate them in the comparison.

When compared to commercial tools it would be interesting to know whether developers find the additional features of commercial tools necessary. This is an interesting question, as OSS does not cost anything to use while commercial SCM tools usually have fairly expensive licences. Can the additional cost of commercial SCM tools be recovered in increased productivity?
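The two concurrency control schemes described above can be illustrated with a small sketch. The Python code below is only an illustration under stated assumptions: the class Repository and its methods checkout, lock, unlock and checkin are hypothetical names chosen for this example and do not correspond to the commands or internals of CVS, Subversion or Arch. The pessimistic scheme is modelled as an explicit lock held by one user, and the optimistic scheme as a checkin that is rejected when the developer's base revision is no longer the latest one.

```python
class Repository:
    """Toy in-memory repository (hypothetical, for illustration only)."""

    def __init__(self):
        self.files = {}  # path -> (revision number, content)
        self.locks = {}  # path -> user holding the pessimistic lock

    def checkout(self, path):
        """Return (revision, content); an unknown file is at revision 0."""
        return self.files.get(path, (0, ""))

    def lock(self, path, user):
        """Pessimistic scheme: reserve a file so that no one else can change it."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            return False  # someone else is already working on this file
        self.locks[path] = user
        return True

    def unlock(self, path, user):
        """Release the lock, but only if this user holds it."""
        if self.locks.get(path) == user:
            del self.locks[path]

    def checkin(self, path, user, base_revision, content):
        """Optimistic scheme: accept the change only if it was made against
        the latest revision; otherwise the developer has to merge first."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            return "locked"
        current_revision, _ = self.checkout(path)
        if base_revision != current_revision:
            return "conflict"
        self.files[path] = (current_revision + 1, content)
        return "ok"
```

In this sketch, two developers who both check out revision 0 of a file can edit it in parallel. The first checkin is accepted and creates revision 1, while the second is reported as a conflict and must be merged against revision 1 before it can be checked in. If one developer takes the lock first, any checkin by another developer is refused until the lock is released.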
VI. DISCUSSION

The three OSS SCM tools are all in different phases of their life-cycles. CVS is the oldest and most stable of the tools, Subversion is still in development but nearing stable status, and Arch has just reached stable status.

When the Linux kernel community discussed the possible SCM solutions for maintaining the kernel, Arch was the only viable OSS option [Shaikh and Cornford, 2003]. In the end Arch was not selected (Linus Torvalds opted for the commercial tool BitKeeper), because at the time it was still heavily under development. The Linux kernel is an archetype of a centrally led OSS project, with a small team of core developers accepting changes from the community and maintaining the actual product themselves. The choice of a closed source tool gives a good idea of the current situation in OSS SCM tools: CVS was seen as being too open for changes and Arch as too immature for heavy use, while Subversion was not available at the time.

All the OSS SCM tools reviewed here provide the necessary basic functionality for distributed development. All the tools are based on the checkin/checkout model, but provide limited functionality from the other SCM models. This comes from their RCS heritage [Conradi and Westfechtel, 1996]. All have some features of the long transaction model. Arch also provides limited change set functionality.

Even with these additions the OSS SCM tools are on a simple level when compared to commercial and experimental tools. However, it seems that the developers have not seen a need to implement the more advanced SCM models. Had such a need been perceived, some OSS SCM project would surely have tried to implement them. As the developers in an OSS SCM tool project are themselves using an SCM tool at the same time, they know the problems associated with SCM tools.
CVS is currently the most widely used of the tools. It has a large and steady user base, and its documentation is comparable to that of commercial tools. Based on this it is safe to say that CVS provides all the functionality an SCM tool needs to provide. Based on personal experience and discussions with developers in academic and industrial settings, it is interesting to note that far from all of the functionality of CVS is used in a normal project. The most commonly used features are version control, including merging, branching of development, and making baselines with the tagging function. These are the most rudimentary operations available in SCM tools, yet they seem sufficient.

The Subversion project aims to produce a replacement for CVS. As of version 1.0 it provides almost all the functionality of CVS and, in addition, advanced distribution mechanisms based on the WebDAV protocol. This is provided through a native interface to the Apache web server, which also makes it possible to add other protocols easily later.

Arch has taken a minimalistic approach to the SCM problem. It provides only the basic functionality of CVS, together with good distribution mechanisms and limited change set functionality. Arch has been implemented in an amazingly compact style: the main server is just 30 KLOC of C++ code. This makes for good maintainability when compared to the over 1 MLOC in CVS and Subversion.

The OSS way of making software is based on modularised design. In SCM this means that the different parts of a full SCM system are implemented as separate tools. These tools include Make or Ant for build management and Bugzilla for change management. In commercial tools these functions are usually all present in the same package. With OSS tools the developer is responsible for gathering a suitable tool set. This also makes it possible to change to another tool if for some reason the current one is not good enough.

VII. CONCLUSION

SCM tools are essential for distributed software development, because of the cooperative work possibilities they provide. When compared to commercial and academic SCM tools, it is clear that the ones built by the open source community have the simplest features. (Academic SCM tools are those designed and implemented by academic researchers to prove some aspect of SCM tools feasible.) One can even say that the OSS tools provide only the most common functionality necessary for performing SCM actions. However, the CVS system is the most widely used SCM system currently in existence, with SourceForge.net [SourceForge, 2003] alone hosting nearly 80 000 projects using CVS. From this one could deduce that developers need only minimal SCM system functionality. Of course this might only be true for open source development, and research should be done to see whether it holds in corporate environments.

When the three SCM tools, CVS, Subversion and Arch, were compared using the taxonomy presented by Conradi and Westfechtel [Conradi and Westfechtel, 1996], they proved to be almost identical in their features. As the tools have been designed separately, it seems that either the tools are very similar or the taxonomy does not provide the means to distinguish the tools sufficiently. Both options seem to be correct. The tools are similar in basic design, but are implemented in different ways and provide different capabilities. For instance, the tools all use different communication protocols and repository structures. The taxonomy is designed to separate tools on a conceptual level. However, comparing tools on that level will not bring out the differences that create the distinct user experiences.

In conclusion, it is safe to say that the meagre functionality provided by the current SCM tools is sufficient and essential for all the open source communities.
At least there are no new OSS projects underway to implement new SCM systems that would provide more functionality. Without SCM tools there would be very little distributed open source software development.

The contribution of this paper is the detailed description and comparison of the OSS SCM tools. All the tools provide features that are sufficient for doing parallel distributed development. When compared to commercial tools the OSS tools provide meagre functionality. Based on this it is questionable how useful the additional features provided by commercial tools are.
Based on these conclusions it would seem that a more thorough comparison of SCM tools is needed, one that would take into account more than the technical concepts of the tools. A study of how SCM tools are used for communication in an OSS development project would also be beneficial. Such a study is currently missing and could provide more insight into how OSS projects are managed.

REFERENCES

[Arch, 2003] Various, The Arch Version Control System Website. http://gnuarch.org/ Last visited 10.11.2003

[Asklund and Bendix, 2002] Asklund, U. and Bendix, L., Software Configuration Management in Open Source. In Proc. 1st Workshop on Open Source Software Engineering, Toronto, Ontario, Canada, May 15, 2001

[Conradi and Westfechtel, 1996] Conradi, R. and Westfechtel, B., Configuring Versioned Software Products. In ICSE '96, Proceedings, LNCS, Vol. 1167, May 1996, Springer-Verlag, pp. 88-109

[CVS, 2003] Various, The CVS Version Control System Website. www.cvshome.org Last visited 10.11.2003

[Estublier, 2000] Estublier, J., Software Configuration Management: A Roadmap. In ICSE 2000 - Future of SE Track, Ireland, May 2000. IEEE Computer Society Press, pp. 281-289

[Feiler, 1991] Feiler, P. H., Configuration management models in commercial environments. Technical Report CMU/SEI-91-TR-7, Carnegie-Mellon University, Software Engineering Institute, March 1991

[Judge, 2002] Judge, P., Ballmer: United, we'll stomp on Linux. CNET News.com, 24 September 2002. http://news.com.com/2100-1001-959165.html Last visited 10.11.2003

[Kim, 2003] Kim, E. E., An Introduction to Open Source Communities. Blueoxen Research Report. http://www.blueoxen.org/research/00007/index.html Last visited 10.11.2003

[Narduzzo and Rossi, 2001] Narduzzo, A. and Rossi, A., Modularity in Action: GNU/Linux and Free/Open Source Software Development Model Unleashed. In: http://rock.cs.unitn.it/dett anno papers.php?anno=2003 Working paper, May 2003. Last visited 10.11.2003

[OReilly, 2000] Raymond, E. S., in O'Reilly, T. editor, Open Sources. Voices From the Open Source Revolution. O'Reilly, 2000, pp. 15-20

[Reis and Fortes, 2002] Reis, C. R. and de Mattos Fortes, R. P., An Overview of the Software Engineering Process and Tools in the Mozilla Project. In Workshop on Open Source Software Development, Newcastle UK, February 2002

[Scacchi, 2003] Scacchi, W., Free/Open Source Software Development Practises in the Computer Game Community. Working Paper, Institute for Software Research, UC Irvine, April 2003

[Shaikh and Cornford, 2003] Shaikh, M. and Cornford, T., Version Management Tools: CVS to BK in the Linux Kernel. In ICSE '03 - The 3rd Workshop on OSS Engineering, Portland, Oregon, May 2003

[SourceForge, 2003] Various, SourceForge.net. http://sourceforge.net/ Last visited 10.11.2003

[Subversion, 2003] Various, The Subversion Version Control System Website. http://subversion.tigris.org/ Last visited 10.11.2003

[Stallman, 2003] Stallman, R., The GNU Website. http://www.gnu.org/ Last visited 10.11.2003