Open Source Software Project Success Factors / Modularity As a Success Factor

Niko Haapaviita, Jere Jokelainen, Pekka Reijonen & Juuso Haikonen

Abstract
1. Introduction
2. Modularity
2.1 Modularity in open source software development
3. Distributed development
3.1 Open source software architecture
3.2 Releasing in branches
3.3 Open source development models
3.4 Parallel development
3.5 Reusability
3.6 Reuse in Organizations and Companies
3.7 Cost-efficiency in knowledge reuse
3.8 Innovation
4. Conclusion
5. Discussion
References

Abstract

Open source software projects consist of several parts; this research focuses on modularity and the parts related to it. Due to the nature of the open source community, projects are comparatively easy to take part in, because access to the source code and to the different release versions has been made rather easy. Whether an open source software project becomes successful depends on more than modularity alone: for modularity itself to be beneficial, management, architectural decisions, principles of code contribution, choices of distribution, version control and reusability all have to work together seamlessly. This presents challenges in such an accessible environment. The research reviewed in this article shows that modularity-related success factors have received relatively little attention as a whole. It is arguably still difficult to make a firm claim when so many interdependent features contribute to a project's success. However, this research reveals particular key points that have a positive effect on achieving this goal. These key factors are listed in the conclusion section.

1. Introduction

Open source software development is an alternative to traditional closed source development. It differs from closed source development in many respects, although those differences do not by themselves appear to give development projects a higher success rate. Issues such as management, licenses, distribution of the code, version control and testing require a somewhat different approach because of the nature of the requirements in the development process. This article looks at open source software (OSS) development from the point of view of modularity: it first explains what role modularity plays in OSS projects and what it consists of, and then describes the benefits it offers an open source software project that aims to succeed. We finish the article with conclusions and a discussion of key points and of questions about modularity in open source software development that remain open for research.

2. Modularity

Modularity can be described in many different ways. Huang and Kusiak (1998) define modularity as the use of common units to create product variants.
The goal is to identify independent, standardized or interchangeable units that satisfy a variety of functions. Gershenson, Prasad & Zhang (2010), in turn, found only one point of consensus in definitions of modularity: a modular product is made up of modules that act as building blocks. The idea is that grouping more components into modules, instead of leaving them independent, makes the project itself more modular. A third definition comes from Schilling (2000), who describes modularity as a general systems concept describing the degree to which a system's components can be separated and recombined.
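
The definitions above can be illustrated with a minimal sketch. The `Encoder` interface, the two module classes and `build_product` are hypothetical names invented for this illustration and do not come from any of the cited works; the sketch only shows interchangeable units being separated and recombined into product variants.

```python
from typing import Callable, List, Protocol


class Encoder(Protocol):
    """Hypothetical common interface that every module agrees on."""

    def encode(self, text: str) -> str: ...


class UpperEncoder:
    """One interchangeable unit: upper-cases its input."""

    def encode(self, text: str) -> str:
        return text.upper()


class ReverseEncoder:
    """Another interchangeable unit: reverses its input."""

    def encode(self, text: str) -> str:
        return text[::-1]


def build_product(modules: List[Encoder]) -> Callable[[str], str]:
    """Recombine independent modules into one product variant."""

    def product(text: str) -> str:
        for module in modules:
            text = module.encode(text)
        return text

    return product


# Two product variants built from the same pool of modules.
variant_a = build_product([UpperEncoder()])
variant_b = build_product([ReverseEncoder(), UpperEncoder()])
```

Because each module depends only on the shared interface, a module can be replaced or reordered without touching the others.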

2.1 Modularity in open source software development

Peng, Geng and Lin (2012), citing Parnas (1972), describe a modular design as one in which the project's code base is divided into modules that communicate with each other through a common interface. One benefit of modular design in OSS development is that decision making over a module can be delegated to the individuals who have expertise in that particular code (Mockus, Fielding & Herbsleb, 2002). Other benefits are parallel development, resistance to interruptions in the production process, reduced communication costs, recombination of source code, reuse of code and increased quality (Peng, Geng & Lin, 2012). However, Peng, Geng and Lin (2012) note that, as far as they know, the effect of modular architecture on OSS development has not been examined in earlier research.

A related problem in open source is free riding, or inequity: many OSS projects attract a large number of programmers, but often only a small number of them actually develop the system. Free riding can potentially derail the development of the whole project and endanger the application of the OSS model to other areas. Modular software development can alleviate free riding, because the project's code base is divided into many loosely coupled segments, or modules, that communicate with each other through a common interface. Once the project is divided into modules, responsibilities can be divided and delegated to specific individuals, which leads to more equal participation. (Peng, Geng, & Lin, 2012).

3. Distributed development

Software development can be compared to repairing a highway.
It can be done in two different ways: either close the whole road and give the team complete access for the repairs, or keep the road open and fix it a couple of lanes at a time. Open source development resembles the second way: several lines of development are being repaired and maintained simultaneously. Some lanes are always closed and under development, and the lanes do not depend on each other's work. (Fogel, 2005).

Open source development is almost always distributed and collaborative, even though distance creates difficulties. Despite these difficulties, open source projects have successfully produced very large and very complex software. Open source projects have developers from many different countries who rarely meet in person. (Gutwin, Penner & Schneider, 2004). According to Crowston et al. (2007), distributed development also requires more integration effort when teams are distributed and unfamiliar with each other's work. This can delay project releases in comparison with face-to-face teams. While the literature highlights these problems, open source projects provide a counterexample, because there are thousands of successful open source projects ranging from very small to large. (Crowston et al., 2007).

Because the development process is distributed across separate development teams, roles can be delegated, for example for testing, merging, reviewing and integrating changes from new developers. In centralized version control systems only core developers can commit. In distributed version control systems, by contrast, changes and contributions made in intermediate repositories are kept in the historical record. In practice this means that access to artifacts is easier, new users keep authorship of their contributions, the history of intermediate branches can be tracked, participation by non-core developers is easier, and teams can implement new workflows. (Rodriguez Bustos & Aponte, 2012). These findings strongly suggest that a distributed version control system is a more feasible choice than a centralized one for keeping an open source project's development environment alive.

3.1 Open source software architecture

Because software development nowadays relies on distributed and outsourced development, software engineers need to understand, develop and extend product family architectures in such a way that the architecture can accommodate all the elements of distributed development.
To make that possible, engineers need expertise in integration techniques such as connectors, adapters and wrappers, so that they can create an architecture that handles the complexity generated by distributed development teams. Engineers use this expertise to create building blocks for architectures that control the complexity caused by components designed and developed in parallel by diverse teams. To reduce complexity further and to enhance architectural consistency, it is important to know goal-based or prescription-based architectures. (Hawthorne & Perry, 2006).

A plug-in architecture helps new developers join a project, because they can develop a new part of the software without knowing the whole system. Instead, they can focus on developing one part (a plug-in). People can therefore join the development process with only a limited understanding of the project beforehand. This is, for example, how the GIMP project gained new developers, who can choose the part of the software that interests them most. (Ye, Kishida, 2003).
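
As a rough illustration of why a plug-in architecture lowers the entry barrier, the sketch below shows a tiny plug-in registry. It is not the GIMP plug-in API; the names `PLUGINS`, `register` and `apply_plugin` and the pixel-list data model are assumptions made for this example. A new contributor only needs to understand the one-line contract (a function from a list of pixel values to a list of pixel values), not the host program.

```python
from typing import Callable, Dict, List

# Hypothetical registry kept by the host program: plug-in name -> function.
PLUGINS: Dict[str, Callable[[List[int]], List[int]]] = {}


def register(name: str):
    """Decorator a contributor uses to add a plug-in under a given name."""

    def decorator(func: Callable[[List[int]], List[int]]):
        PLUGINS[name] = func
        return func

    return decorator


@register("invert")
def invert(pixels: List[int]) -> List[int]:
    # Written without any knowledge of the rest of the system.
    return [255 - p for p in pixels]


@register("brighten")
def brighten(pixels: List[int]) -> List[int]:
    # Adds a fixed amount and clamps at the maximum 8-bit value.
    return [min(255, p + 10) for p in pixels]


def apply_plugin(name: str, pixels: List[int]) -> List[int]:
    """Host-side entry point: look the plug-in up by name and run it."""
    return PLUGINS[name](pixels)
```

The host never imports a plug-in by name, so adding a new one does not require changes to existing code.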

Distributed software architecture can be used both in closed source projects, such as the nine Siemens projects studied by Herbsleb, Paulish and Bass (2005), and in open source projects such as Mozilla and Linux. However, we think it is crucial that global open source software has a distributed architecture style, because there should at least be an opportunity to develop software modules simultaneously regardless of the developers' locations. In the case of Linux, the architecture is built on subsystems that grow in jumps at certain releases, when new revisions are added, followed by periods of relative stability. This is a typical growth pattern for such subsystems. Since open source software architecture is distributed, development teams can develop and maintain architecture support separately from main release development. (Godfrey & Tu, 2000).

3.2 Releasing in branches

From the developers' point of view, free software is always in a state of continuous release. Developers usually run the newest version, because they want to track down bugs and keep up with the newest features. A problem often emerges in deciding when the software is ready for release: in open source projects there can be newly started features that are still in progress and that make the software unstable and not ready for a formal release. The solution is to release software in branches, so that there is always a branch that contains a stable, release-ready version of the software, isolated from the other versions. (Fogel, 2005).

Branching makes it possible for developers to work in separate branches. The benefits of branching are that it prevents workflow disruption and reduces overall development cycle times. Cycle times are reduced because teams can stabilize their work in isolation before integrating it with the rest of their organization.
Integration includes merging code, resolving merge conflicts, and compiling and testing the merged code. (Phillips, Ruhe & Sillito, 2012).

Version control systems play a very important role, especially in open source projects under distributed development, which may have several branches. If something fails, the history is available: every step has been saved, so developers can step back in the development history. Version control systems are also important because they let developers see the whole life cycle of the project's branches. (Rodriguez Bustos, Aponte, 2012).
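
The merge step described above can be sketched as a toy three-way merge. This is a simplified model, assuming that each branch is a dictionary from file name to file content and that whole files are compared; real version control systems merge at the level of lines or hunks. The function name `three_way_merge` is hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two branches against their common ancestor.

    Each argument maps file name -> content. Returns (merged, conflicts),
    where conflicts lists files that both branches changed differently.
    Conflicting files are left out of merged until someone resolves them.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both branches agree (or both deleted the file)
            result = o
        elif o == b:      # only the other branch changed the file
            result = t
        elif t == b:      # only our branch changed the file
            result = o
        else:             # both changed it in different ways
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
stable = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
feature = {"a.txt": "1", "b.txt": "3", "c.txt": "3"}
merged, conflicts = three_way_merge(base, stable, feature)
# merged == {"a.txt": "2", "b.txt": "3"}; conflicts == ["c.txt"]
```

Changes that touch different files combine automatically, which is why work in isolated branches or modules rarely blocks other developers; only `c.txt`, edited on both branches, needs manual resolution.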

3.3 Open source development models

A few different models describe open source software development. One is the Johnson model, which is formulated as a static Bayesian game of code contribution. In the Johnson model a programmer can choose to code either one module or, under a monolithic architecture, the whole project. Johnson notes that project size is the key factor in determining whether modular design encourages contribution. The basic idea is that when there are enough programmers, modular design outperforms monolithic design. In short, the Johnson model shows that modular design has the potential to attract contributions in larger projects. (Peng, Geng & Lin, 2012).

In Baldwin and Clark's open source software development model, a programmer chooses to work on one and only one module. Programmers choose either to code or not to code, so the choice is binary: yes or no, with nothing in between. In a modular design, programmers assign themselves to equivalent modules. In this way the workload is distributed evenly between the programmers and the opportunity for free riding is reduced. (Peng, Geng & Lin, 2012).

Peng, Geng and Lin (2012) argue that while these two models are path-breaking, the work of Johnson and of Baldwin and Clark still leaves some questions unanswered. They say that a study of code contribution should include some important aspects of open source software development. Their first point is that contributors' different motivations and resource endowments play a critical role when they react to the design of the architecture. Their second point is that code contribution is a collective job. A model should be flexible enough to allow parallel and different kinds of contributions, instead of assuming that one programmer codes a single module at a time. Their final point is that the two concepts of code participation and code contribution should be separated from each other.
Whereas code participation is a yes-or-no matter, code contribution is a continuous variable that measures either how many lines of code a programmer has created or how much effort has gone into the code base. That is why code contribution is a more accurate measure of free riding than code participation. (Peng, Geng & Lin, 2012).

The findings above suggest that distributing the workload by assigning programmers to equivalent modules requires developers whose skills allow them to be directed to such modules. Whether such developers are available is therefore a success factor in problem solving and, to some extent, a decisive factor in schedule accuracy.
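
The difference between participation and contribution can be made concrete with a short sketch. The developer names and line counts are made-up illustrative data, and the Gini coefficient is used here only as one common inequality measure; it is an assumption of this example, not necessarily the measure used by Peng, Geng and Lin (2012).

```python
def participation(lines_by_dev):
    """Binary view: did each developer contribute any code at all?"""
    return {dev: lines > 0 for dev, lines in lines_by_dev.items()}


def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means perfectly equal contribution; values near 1.0 mean that
    a few developers wrote almost everything.
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical project: lines of code written per developer.
lines_by_dev = {"ann": 900, "bob": 50, "cy": 50, "dee": 0}

print(participation(lines_by_dev))   # three of four developers participated
print(gini(lines_by_dev.values()))   # about 0.675: contribution is very unequal
```

By the binary measure three out of four developers took part, which hides the fact that one developer wrote 90 percent of the code; the continuous measure exposes that imbalance.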

3.4 Parallel development

Modular design makes it possible to divide design tasks for parallel development. The overall design task needs to be divided into smaller tasks, and the interfaces between them have to be suitably defined. After this division it is possible to reuse existing designs within a modular architecture, so upgrading an existing product takes less time and effort. (Sosale et al., 1997). In addition, Huang and Kusiak (1998) list the following benefits of sharing modules across product families: economies of scale in product development, increased feasibility of component or product change, increased product variety and decoupling of risk.

A modularized software architecture makes it easier for new developers to join an open source project, because tasks have been divided into smaller pieces. It also gives developers more options, so they can pick a module they find pleasant, and it encourages them to move on to less pleasant areas once they have gained experience from an earlier module. Modules can be closely related to each other and can be graded by difficulty, which supports progressive learning. (Ye, Kishida, 2003). The modular structure of open source software supports learning very well, and the barrier for new developers is low, because they can focus on a particular module without needing to know the whole system (Fitzgerald, 2006). The developers' learning process, like the software, consists of small phases: they need to learn only a small section at a time.

3.5 Reusability

The benefit of modularized programming is that modules can be reassembled and replaced separately, without reassembling the whole system. (Morris, 1972). From the open source perspective, modularity is widely used for reuse.
There are open source projects that are designed specifically to be reused in other projects or to provide functionality for them. For example, LAME is a music encoder that is designed to be built into other programs to give them the ability to create MP3 files. Reusing code is not always simple, because there are limitations, such as those imposed by licenses like the GPL, which permit the use of components only within the limits of the license. (Haefliger, Von Krogh, & Spaeth, 2008).

In open source software development, you should expect open source developers to build on each other's work. (Haefliger, Von Krogh, & Spaeth, 2008). Even though you should expect that someone may use your work, contributions to open source projects are not a pure public good: although the work is freely revealed, these developers and innovators have significant private considerations, such as their own use of the software, reputation gain or career concerns. (Peng, Geng and Lin 2012).

For knowledge reuse it is important that there are few restrictions on the distribution of intellectual property, which has been one of the main problems in closed software projects. Open source projects can guarantee that every developer has access to the software libraries. The main practical challenge is that searching for reusable modules can take a lot of time, and the modules found may be difficult to link to the existing parts of a program. (Von Krogh, Spaeth, & Haefliger, 2005).

In open source software development, licenses grant a developer the basic rights to retrieve the code, explore it and modify it. Licenses also permit the distribution of modified or unmodified versions of the project to other people. Although licenses that grant these permissions can lead to reuse by others, a license should be chosen so that it still motivates the project's own developers to contribute.

3.6 Reuse in Organizations and Companies

Reusability has improved with the development of generic software architectures and of modular software architectures. Haefliger et al. (2008), citing Banker and Kaufman (1991), note that modular designs generally require substantial investments that pay off only over a long period, when development effort is saved through reuse in software projects inside organizations. Once created, modules and components can be reused, which significantly reduces the deployment time of new software projects (Von Krogh, Spaeth, & Haefliger, 2005). As Knight and Dunn stated in 1998, reusing code and components from libraries increases quality, because it makes use of fully tested and debugged software. (Haefliger, Von Krogh, & Spaeth, 2008). In contrast to these reported benefits, several studies have found that code reuse in software development within organizations is problematic and that the success of corporate reuse programs depends on organizational factors rather than technical factors.
(Haefliger, Von Krogh, & Spaeth, 2008). Banker et al. (1993) noted that the success of code reuse in a company depends on whether the cost for a developer of searching for and finding code that can be integrated is lower than the cost of writing it from scratch. (Haefliger, Von Krogh, & Spaeth, 2008). According to Haefliger et al. (2008), this can be achieved by creating standards and tools that facilitate searching for code and integrating it into the system.
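
The two cost arguments in this section can be written down as simple arithmetic. The sketch below is only an illustration under stated assumptions: the function names, the cost units (for instance person-hours) and the idea of a fixed saving per reuse are hypothetical simplifications, not formulas taken from the cited studies.

```python
import math


def reuse_is_cheaper(search_cost, integration_cost, scratch_cost):
    """Reuse pays off for one task when finding and integrating existing
    code costs less than writing the same functionality from scratch."""
    return search_cost + integration_cost < scratch_cost


def break_even_reuses(extra_investment, saving_per_reuse):
    """Number of reuses needed before the extra up-front investment in a
    modular, reusable design has been recovered."""
    if saving_per_reuse <= 0:
        raise ValueError("each reuse must save a positive amount")
    return math.ceil(extra_investment / saving_per_reuse)


# Hypothetical figures in person-hours.
print(reuse_is_cheaper(search_cost=4, integration_cost=6, scratch_cost=20))    # True
print(reuse_is_cheaper(search_cost=10, integration_cost=15, scratch_cost=20))  # False
print(break_even_reuses(extra_investment=100, saving_per_reuse=30))            # 4
```

Standards and search tools act on the first function by lowering the search and integration terms, while the second function reflects why a modular design only pays off over a longer period.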

3.7 Cost-efficiency in knowledge reuse

Barns and Bollinger (1991) discuss reuse in terms of human problem solving, by which they mean the nontrivial aspects of software development and maintenance that cannot easily be formalized or automated with the current level of expertise. According to Barns and Bollinger (1991), from the point of view of cost efficiency this resource is comparable to precious metals in the metal industry: it must be used carefully, replaced by less expensive resources when possible, and recovered for further use when practical.

According to Barns and Bollinger (1991), human problem solving can be viewed from a narrow and a broad point of view. For the narrow view, three factors are proposed. First, good planning reduces the loss of human problem solving: it means enhancing the communication of solutions among developers and helping development groups select environments that support worker productivity. Second, tool building is the central process of automation, in which well-understood activities or tasks, such as the conversion of formulas into assembly code, are replaced with less costly automated tools such as compilers. Third, code reuse acts as an effectiveness multiplier for human problem solving, by ensuring that extensive work or special knowledge used to solve a specific development problem is transferred to as many similar problems as possible. Reuse can also amplify the effect of formally defined work activities. These are arguably the most essential factors for modularity to make a project's workflow practically feasible.

While the narrow view of human problem solving focuses on reusing source code, the broad view extends reuse to requirements, designs, code modules, documentation, test data and customized tools. The broad view is especially important, since it has the potential to reduce costs significantly (Barns & Bollinger 1991).
This suggests that modularity factors also play an important role in both parallel development and reusability. In Barns and Bollinger's example, reusing code modules from a custom database system can reduce costs, because the system's overall functional specification could lead to the reuse of the entire set of designs, code modules, documentation, test data and associated user experience that were developed from that specification. Such reuse is cost effective because the adaptation required for integration is comparatively small.

Whether reuse of code is a positive factor for modularity does not depend only on availability and extent of use. Whether reuse is cost effective depends on the investment in the reuse process. Barns and Bollinger (1991) state that reuse investment costs depend on the technologies used in the project. Comparing technologies by how well they let a developer make components readily available for reuse helps to rule out the cases where aiming for reuse would become too expensive. This suggests that such a comparison helps to determine the factors behind the success of modularity in open source software projects, since it can save on total costs.

3.8 Innovation

In open source development, innovations can easily be distributed and reused in different kinds of software. As Peng, Geng, & Lin (2012) say, developers can share their private knowledge in the project, and it then becomes available in a common pool. We therefore think that developers can, in a way, reuse modules that have already been invented and share innovations. In addition, developers can easily create new innovations by combining existing modules with their own creations. They can use existing modules in a different context, so that earlier innovations add value to new ones simply by being combined or used in a new context. From a business point of view, this kind of cooperation is tempting where innovation plays an important role. (Peng, Geng & Lin, 2012).

A modular architecture works as an innovation bank, which attracts potential business partners to join the project. A modular structure supports the successful use of open innovation, which makes the project look more attractive to other business contributors. (Peng, Geng & Lin, 2012).

4. Conclusion

Modular design is a vital part of the nature of an open source project and has been defined from several viewpoints by various authors. In practice, flexible distribution of work makes it more likely that problems are solved in a shorter time within a development cycle. Also, because modules are separate, developers can continue their work uninterrupted even when another module has run into obstacles. However, because it is relatively convenient to join a project, new developers present a risk of derailing it through free riding. Modular architecture reduces this risk.
It is debatable whether this is a success factor, but at least it opens the door to creativity by keeping the participation threshold low. Ease of access promotes global development. It is again debatable whether this makes distributed development successful: people will most likely need to familiarize themselves with other developers' work, and if discussion boards and forums are the only means of communication for clearing up confusing parts of the work, the project risks being delayed. Another factor is a decentralized version control system, which eases access to the release history. This removes roadblocks in the development process, which improves the chances of project completion.

Probably one of the strongest success factors in modularity is a modularized software architecture, which encourages progressive learning and thereby lowers the learning curve towards more challenging work. As a result, the project is more likely to get contributions from progressively more advanced developers, which in turn raises the chances that the project is not only completed but completed successfully.

Modularity is a very important area of open source software development, and it includes many aspects that have to be taken into account. Modularity improves the overall development process. It enables, for example, more efficient development by distributing the work between developers, which also improves the overall quality of the project.

Modularity also improves the reusability of code, which can turn out well or badly. On the positive side, if projects can use ready-made modules from other projects, development becomes faster and development times shorter. On the negative side, the original project's developers may perceive the reuse negatively, and if modules need more maintenance than initially expected, reuse may backfire in working hours and increase costs rather than reduce them. The viability of reuse is determined by whether it becomes cost efficient. This has to be assessed for each project individually. The choice of technologies has a direct impact, positive or negative, on the total cost benefits of each project. Some technologies work well for some projects and not at all for others. Based on this finding, project success depends not only on quality but is naturally a financial question as well.

In addition, architectural design is a strong factor in the success of modularity, because the possibility of combining modules from the common knowledge pool, or of using them in different contexts, attracts new investors and contributors. A larger investment capacity gives more options in choosing the technologies needed for reusability in particular projects.

As stated in the research discussed above, there seems to be no valid proof that modular architecture has either a positive or a negative effect on open source projects. With approximately one year between that research and ours, we ended up with the same result. This is another feature that cannot be left out, and it weighs on the assessment of modularity as a success factor.

Modularity is a difficult topic to handle because it has not been studied much, especially from this perspective. This was a major limitation for our study, because little relevant information was available. It leaves much room for further research, which could be divided among the several features that affect the success of modularity as a whole.

5. Discussion

Modularity is only one part of open source projects, but it is a critical factor that consists of several smaller elements typical of the open source development ethic. One debatable factor is flexibility. What does it extend to, and what is it limited to? We believe that it makes developers more willing to contribute to a project, because they can choose modules based on their motivation, expertise and available time. On the other hand, flexibility increases the opportunity for free riding, because developers are responsible for their own work. In the worst case, free riding slows down the development of modules or might even lead to the failure of the whole project. We believe that free riding increases the risk of project failure when the project is relatively small.

Many studies mention project success or use the word benefit in connection with open source software development. Still, the arguments in which those words appeared generally gave the impression that the authors could not claim that these subjects are success factors each and every time. Hence there is a need to mention not only the advantages but also the disadvantages of the various parts of modularity in open source development.

Open source projects may present some risk when development is divided into several branches, because modularity makes it easier for developers to create new branches, which can evolve into a new software project that may become a competitor in the future. Developing in several branches is quite effective, but from a business perspective this kind of problem can always occur. From the open source perspective, however, this way of developing software can be a good thing, since it creates new innovations for general use and thereby benefits the software industry overall.

We believe that reusing code or even whole modules can lead to substantial savings in both time and money. Open source enables companies to use almost ready-made software modules in their projects. From a technology point of view, you need to make sure that the module communicates with the project's other modules through a common interface.
In addition, you have to make sure that you comply with the reused module's license terms and other prerequisites. Problems may occur when companies try to fit their competitors' modules into their own projects. At first, modules may fit easily into other projects, but as development goes on, your interface may no longer be able to communicate with the newly evolved modules. Additionally, by following modular distributed development, individuals can improve their skills and expertise in object-oriented programming.

Because our team had rather limited experience with the subject, we had to put effort into understanding the role of modularity in open source software development. The research papers that we collected and included in this article suggest that there is still a lot to cover regarding the success factors that affect open source projects from the modularity viewpoint.

References

Barns, B. H., & Bollinger, T. B. (1991). Making reuse cost effective. IEEE Software, 8(1), 13-24.
Crowston, K., Li, Q., Wei, K., Eseryel, U. Y., & Howison, J. (2007). Self-organization of teams for free/libre open source software development. Information and Software Technology, 49(6), 564-575.
Fitzgerald, B. (2006). The transformation of open source software. MIS Quarterly, 587-598.
Fogel, K. (2005). Producing open source software: How to run a successful free software project. O'Reilly Media, Inc.
Godfrey, M. W., & Tu, Q. (2000). Evolution in open source software: A case study. In Software Maintenance, 2000. Proceedings. International Conference on (pp. 131-142). IEEE.
Gutwin, C., Penner, R., & Schneider, K. (2004, November). Group awareness in distributed software development. In Proceedings of the 2004 ACM conference on Computer supported cooperative work (pp. 72-81). ACM.
Hawthorne, M. J., & Perry, D. E. (2006). Software engineering education in the era of outsourcing, distributed development, and open source software: challenges and opportunities. In Software Engineering Education in the Modern Age (pp. 166-185). Springer Berlin Heidelberg.
Herbsleb, J. D., Paulish, D. J., & Bass, M. (2005, May). Global software development at Siemens: experience from nine projects. In Software Engineering, 2005. ICSE 2005. Proceedings. 27th International Conference on (pp. 524-533). IEEE.
Johnson, R. E. (1997). Frameworks = (components + patterns). Communications of the ACM, 40(10), 39-42.
Mockus, A., Fielding, R. T., & Herbsleb, J. D. (2002). Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology (TOSEM), 11(3), 309-346.
Parnas, D. L. (1972). On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15(12), 1053-1058.
Peng, G., Geng, X., & Lin, L. (2012, January). Modularity and Inequality of Code Contribution in Open Source Software Development. In System Science (HICSS), 2012 45th Hawaii International Conference on (pp. 4505-4514). IEEE.
Rodriguez Bustos, C., & Aponte, J. (2012, June). How Distributed Version Control Systems impact open source software projects. In Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on (pp. 36-39). IEEE.
Schilling, M. A. (2000). Toward a general modular systems theory and its application to interfirm product modularity. Academy of Management Review, 25(2), 312-334.
Von Krogh, G., Spaeth, S., & Haefliger, S. (2005, January). Knowledge reuse in open source software: An exploratory study of 15 open source projects. In System Sciences, 2005. HICSS'05. Proceedings of the 38th Annual Hawaii International Conference on (pp. 198b-198b). IEEE.
Ye, Y., & Kishida, K. (2003, May). Toward an understanding of the motivation of open source software developers. In Software Engineering, 2003. Proceedings. 25th International Conference on (pp. 419-429). IEEE.