Can We Predict the Generation of Bugs? Software Architecture and Quality in Open-Source Development

Manuel Sosa, Technology and Operations Management Area, INSEAD, Fontainebleau, France
Jürgen Mihm, Technology and Operations Management Area, INSEAD, Fontainebleau, France
Tyson Browning, Department of Information System and Supply Chain Management, Neeley School of Business, Texas Christian University, Fort Worth, Texas, US

Abstract

We study how software architecture relates to quality. Based on a software architecture representation that accounts for not only the hierarchical arrangement of its subsystems and components but also their dependency structure, we formally define the notion of system cyclicality. System cyclicality is an architectural property that captures the fraction of mutually interdependent components in a system. By examining multiple versions of 20 open-source, Java-based applications (126 total) developed by the Apache Foundation, we empirically analyze the relationship between software architecture characteristics and the creation of bugs. Our results suggest that, while controlling for various system characteristics, system cyclicality is a key determinant of bug creation. Interestingly, we found evidence that it is not just the cycles themselves, but how hidden they are, that drive the effects of system cyclicality on bug creation. From an academic viewpoint, this work provides a theoretical and empirical basis for a causal link between the architecture of a complex system and its quality. This has important implications for the management of complex system design and development in fast-paced industries such as software. Our results suggest that managers could benefit from proactively examining the architecture of the system they develop and monitoring its cyclicality as one of their strategies to mitigate the creation of defects.

1 Introduction

Previous work has studied the implications of system architecture decisions for various aspects of the firm (e.g., Baldwin and Clark 2000; Ulrich 1995; Yassine and Wissmann 2007). However, little attention has been devoted to understanding the link between product architecture characteristics and product performance. How does the architecture of a system influence its (conformance) quality? More specifically, to which features of the system architecture should managers pay attention during the development process to minimize the emergence of defects? To study the relationship between system architecture and quality, we examine the development of software applications for several reasons: they are complex, they exhibit fast change rates (like fruit flies in studies of biological evolution), and they offer (through their source code) an efficient, reliable, and standardized medium to capture the architecture of their design. In addition, software applications typically have centralized repositories to reliably track the quality issues associated with each release.

Limited research in both the engineering design and computer science communities has addressed the link between system architecture and quality. On the engineering design side, researchers have studied the architecture of complex products to explore how the direct and indirect connections among product components influence the propagation of design changes. This line of research has suggested that change propagation can cause rework and long development times due to the unpredictable nature of design changes propagating not only between directly connected components but also indirectly through intermediary components (Clarkson et al. 2004; Eckert et al. 2004; Sosa et al. 2007b). More recently, Gokpinar et al. (2007) found that the connectivity of an automobile's subsystems and the extent to which their interfaces are managed significantly affected the subsystems' conformance quality.
The literature on computer science and information systems has also examined the structure of software systems and related it to performance issues such as the time required to implement changes (Cataldo et al. 2006), the difference between open-source and closed-source development (MacCormack et al. 2006), and the factors that lead to refactoring of the source code (see Mens and Tourwé 2004 for a review). However, previous work has not yet clearly linked the features of a system's architecture to the generation of defects in the system. That is the focus of this paper. By integrating methods used in engineering design to analyze product and process architectures with methods from information systems on the determinants of defect proneness, we identify architectural properties that are likely to be associated with the risk of creating defects in software applications.

From a user's perspective, software applications provide certain functionality and capability. As long as the application provides these reliably and efficiently, the user is generally satisfied. However, that is rarely the case. Users typically uncover bugs by testing and using the software application. From a designer's standpoint, there are many alternative ways for the software to provide the specified functionality. Designers or architects must determine how to allocate the software's functions to its various components or groups thereof, called subsystems (or modules). Architects must also determine how the software system will be organized in terms of command and control, utilities, and other supporting infrastructure of components and subsystems. These choices determine the nature and extent of the relationships between the components and subsystems of any version of the software application, and they affect not only the ease with which the components and subsystems can be successfully modified in successive versions (MacCormack et al. 2006; Parnas 1972; Sosa et al. 2007a) but also the risk of introducing undesired functionality into the system (Jones 2000; Koru and Tian 2004).

A system's conformance quality is defined by its ability to meet its design specifications. Hence, system defects (called bugs in software applications) are identified when the system does not perform as specified. To meet system requirements a hardware product is designed and then produced.
In software development, this is equivalent to architecting and writing the source code and then compiling it to transform it into an executable application that will ultimately deliver the specified functionality. Conformance quality is measured over the final product. Hence, defects or bugs may be uncovered by testers prior to the application's release and by users post-release. Here it is important to note that many bugs are probably uncovered and corrected during the development phase (most likely during the compiling of code), similar to design rework in the latest phases of product development, in which the product is tested and prepared for production ramp-up. Although these pre-release bugs are important to manage due to the amount of design rework they typically involve, the focus of this paper is on the bugs that elude the development team and get released with the application to be uncovered later by the users.

In software development, two basic concepts characterize a good design structure (Stevens et al. 1974): coupling and cohesion. Cohesion refers to the internal consistency of each software component, while coupling pertains to the strength of the connections between components. A good source code design maximizes cohesion and minimizes coupling. This principle suggests that increased coupling (greater connectivity among components) is likely to have a negative impact on the quality of a system (Koru et al. 2007; MacCormack et al. 2006). In this paper, we argue that it is not the amount of coupling but rather the fraction of components involved in cyclical dependencies that is positively associated with the generation of bugs. To test this proposition, we examined 126 releases representing multiple generations of 20 distinct open-source applications developed by the Apache Foundation.

This paper is structured as follows. Section 2 discusses the software architecture representation schemes and metrics that facilitate our study, along with the theoretical framework and hypotheses. Section 3 describes the empirical study carried out to test our hypotheses. Section 4 describes the analysis and results. Section 5 concludes by discussing the academic and managerial implications of this work.

2 Linking Software Architecture and the Creation of Bugs

2.1 The Architecture of Complex Systems

The architecture of a designed system (either hardware or software) is determined during its design process through both decomposition and integration (Alexander 1964; Simon 1981).
Establishing the architecture of a (hardware) system includes breaking it down into functional and physical elements, mapping the functional elements onto the physical elements, and specifying the interfaces among the elements (Ulrich 1995; Ulrich and Eppinger 2007). Simon (1981) suggested that complex systems should be designed as hierarchical structures consisting of nearly decomposable systems, with strong interfaces within subsystems and weak interfaces across subsystems. This is consistent with the independence axiom of axiomatic design, which suggests the decoupling of functional and physical elements of a product (Suh 2001), as well as with the notion of modularity, which suggests the creation of options that enable the evolution of designs (Baldwin and Clark 2000). Hence, designers typically decompose complex hardware and software systems into subsystems and components to facilitate their design. Yet, such subsystems and components must then be integrated to ensure that the product functions as a whole.

Previous research in engineering design has developed methods to analyze the architecture of complex products by studying how their components interact to provide system functionality. More specifically, this stream of research has modeled products as collections of interdependent components and has developed methods to cluster components with similar dependencies into subsystems (modules) (Browning 2001; Lai and Gershenson 2006; Pimmler and Eppinger 1994) and analyzed how component connectivity patterns relate to organizational and design decisions (Sosa et al. 2003; Sosa et al. 2004; Sosa et al. 2007b).

Similarly, to analyze the architecture of a software system, we examine its source code because it codifies the design of the system. Analogous to hardware products, the architecture of a software application is the scheme by which its functional elements are codified into objects in the source code and the way in which these objects interact and are grouped into subsystems and layers (e.g., Parnas 1972; Parnas 1979; Shaw and Garlan 1996). The dependency structure of the source code (i.e., the way in which its components exchange information) specifies the system's functionality precisely. In that sense, the source code captures the process (or "recipe") that determines how the system works.
Recognizing this is important because it leads us to depart from the methods used to analyze hardware product architectures. Contrary to previous work focused on analyzing the architecture of complex software applications (MacCormack et al. 2006), we consider the software architecture as a process that specifies precisely how the objects comprising the system will interact over time to provide its functionality. In contrast, hardware product architectures have traditionally been analyzed by considering them as a collection of static physical elements and dependencies (Pimmler and Eppinger 1994; Sosa et al. 2007b). Because software architectures describe the process by which the components of the system product interact to provide its functionality, we use alternative analytical techniques that have traditionally been used to analyze iterative processes such as new product development (see Browning 2001 for a review). A process architecture adds a time dimension to the elements and relationships in a system (Browning and Eppinger 2002). While software components execute much more quickly than a project's activity network, they nevertheless execute in finite run time, with the dependencies between the elements of the source code determining the order of the actions performed by the components defined by the source code.

2.2 Architectural Representations of Software Systems

The source code of a software application consists of a collection of connected components organized into subsystems, which are in turn grouped into levels and layers (Sangal et al. 2005; Shaw and Garlan 1996). To explore the features and effects of a system's architecture, we need to understand these arrangements more specifically. It will clarify our discussion to refer to an example, one of the software applications we studied, version 1.3 of Ant.

Traditionally, designers have represented the architecture of their source codes with block diagrams such as Figure 1, which depicts the system's decomposition into subsystems. We define a subsystem as a set of components and/or other subsystems, where the presence of a block inside another represents a parent-to-child relationship in the decomposition. For example, at the first level of decomposition, Ant version 1.3 consists of four subsystems: taskdefs, types, util, and * (where * stands for a group of miscellaneous components). The subsystem taskdefs in turn breaks down into two lower-level subsystems, as does the subsystem util.
Each of the subsystems shown in Figure 1 is ultimately a collection of components (which are not shown individually). Thus, the z-axis of the figure (coming out of the page) relates to the level (or depth) of decomposition. For example, the first level is formed by the four subsystems immediately below the root node (level 0), and the architecture has two levels because two of the subsystems in the first level, taskdefs and util, contain other subsystems (compilers and regexp, respectively).
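The decomposition just described can be written down as a small tree. The following is only an illustrative sketch: the nested-dictionary encoding and the function names are assumptions made here, and the name of the second child of util is a placeholder, since the text names only regexp.

```python
# Hypothetical nested-dict model of the Ant 1.3 decomposition.
# Keys are subsystem names; values are their child subsystems.
ANT_1_3 = {
    "taskdefs": {"compilers": {}, "*": {}},
    "types": {},
    "util": {"regexp": {}, "*": {}},  # second child name is an assumption
    "*": {},
}


def depth(tree):
    """Number of decomposition levels below this node (the root is level 0)."""
    if not tree:
        return 0
    return 1 + max(depth(child) for child in tree.values())


def subsystems_at_level(tree, level):
    """Names of the subsystems located exactly `level` steps below the root."""
    if level == 1:
        return list(tree)
    return [name
            for child in tree.values()
            for name in subsystems_at_level(child, level - 1)]


print(depth(ANT_1_3))                   # number of levels of decomposition
print(subsystems_at_level(ANT_1_3, 1))  # first-level subsystems
```

Under this encoding, the first level holds the four subsystems listed above and the tree has two levels, matching the description of Figure 1.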

Figure 1: Architecture block diagram for version 1.3 of the Apache Ant application

The vertical layout (y-axis) of the subsystems and components in the block diagram shown in Figure 1 is also meaningful. Subsystems and components located at the bottom of the diagram are intended to serve the subsystems above. These layers are defined by the system architect's design rules (Baldwin and Clark 2000; Sangal et al. 2005). Software is often architected in layers to provide a coherent command and control structure, such that components in higher layers can call (depend on) components in lower layers. The opposite situation, where lower-layer components depend on higher-layer ones, is possible but undesirable, for reasons we will discuss below. Thus, the block diagram conveys information about both the levels of decomposition (which relate to the physical structure of the code) and the layers of intended dependencies (which relate to the process structure of the code).

Note that where one chooses to end the decomposition and declare the lowest level is the modeler's choice. In our analysis, we stop at the class level,¹ although we could go down further to the level of methods and data members and eventually even to lines of code. However, three main arguments led to our choice to decompose to the class level. First, classes tend to provide a set of common functionality (e.g., a set of low-level mathematical functions) that is maintained as one cohesive piece of software, often in a single source file by a single author. Second, the main attributes of the architecture become apparent by the class level, so further decomposition would only obscure these insights. Third, this level of decomposition is consistent with previous work focused on representing software architectures (e.g., MacCormack et al. 2006; Sangal et al. 2005).
Thus, for the purposes of our analysis, we treat each Java class as an atomic component of the software architecture.

¹ Our dataset contains only Java applications, wherein files and classes are typically the same, except for inner classes (classes within classes), which we do not consider explicitly.

Although block diagram representations capture the hierarchical organization of components and subsystems in layers, they do not show the dependencies between components. However, as we will discuss below, determining the architectural properties that influence bug creation makes it imperative to consider both the components' dependencies and their hierarchical organization. Dependencies among software components are formed by the calls made by one component to another.² To represent these, both within and across subsystems and layers, we use a design structure matrix (DSM) representation (Browning 2001). A DSM is a square matrix of size n (where n is the number of elements) whose diagonal cells represent system elements and whose off-diagonal cells indicate dependencies between those elements. The use of DSMs to study the structure of development processes (by mapping out how information flows between activities) led to structured approaches to identify subsets of activities involved in design iterations (Browning and Eppinger 2002; Smith and Eppinger 1997a; Steward 1981).

Several researchers have used the DSM representation to capture the architecture of complex products and to analyze patterns of interactions among the components, both for general products (e.g., Sharman and Yassine 2004; Sosa et al. 2003; Sosa et al. 2007b) and for software systems (Cataldo et al. 2006; MacCormack et al. 2006; Sangal et al. 2005; Sosa 2008; Sullivan et al. 2001). Similar to this latter stream of research, we also capture the dependency structure of the source code of a software application in a DSM representation. However, as we mentioned above and will further discuss below, we analyze our software architecture DSMs by considering the process-like nature of the source code they represent.
Figure 2 shows a flat DSM representation of Ant 1.3, where the term "flat" signifies its agnosticism towards hierarchical levels. Hence, Figure 2 shows the 117 components of Ant 1.3 and the 463 dependencies between them. We use the convention where an off-diagonal mark in the DSM represents the dependency of the column component on the row component. Thus, a mark in cell (i,j) indicates that the object (a Java class) labeling column j depends on the object labeling row i.

² Specifically, we include the following types of dependencies: invocations (static, virtual, and interface), inheritances (extensions and implementations), data member references, and constructs (both with and without arguments).
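The marking convention can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the three class names and the helper build_dsm are hypothetical and do not come from the Ant source code. Only the convention itself (a mark in cell (i,j) means that column j depends on row i) is taken from the text.

```python
def build_dsm(components, dependencies):
    """Return an n x n list-of-lists DSM.

    `dependencies` holds (dependent, provider) pairs. A mark in cell (i, j)
    means the column component j depends on the row component i, so each
    pair is marked at row = provider, column = dependent.
    """
    index = {name: k for k, name in enumerate(components)}
    n = len(components)
    dsm = [[0] * n for _ in range(n)]
    for dependent, provider in dependencies:
        dsm[index[provider]][index[dependent]] = 1
    return dsm


# Hypothetical classes: Main calls Parser and Util; Parser calls Util.
components = ["Main", "Parser", "Util"]
dependencies = [("Main", "Parser"), ("Main", "Util"), ("Parser", "Util")]

dsm = build_dsm(components, dependencies)
n_dependencies = sum(map(sum, dsm))  # number of off-diagonal marks

for name, row in zip(components, dsm):
    print(f"{name:>6}", row)
```

Counting the marks in this way is how a figure such as the 463 dependencies among 117 components would be read off a flat DSM.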

Figure 2: A flat DSM representation of Ant 1.3

To account for the organization of components into subsystems and layers, we supplement the flat DSM with a hierarchical DSM representation. The z-axis (coming out of the page) of the DSM in Figure 3 shows the nested levels of decomposition and the components' membership in subsystems. The y-axis of the DSM shows the ordering of the subsystems in layers. Thus, Figure 3 combines many of the visual benefits of Figures 1 and 2. This representation allows us to distinguish inter- and intra-subsystem component dependencies.

Figure 3: A hierarchical DSM representation of Ant 1.3
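The distinction between inter- and intra-subsystem dependencies amounts to comparing the subsystem membership of the two components in each dependency. The sketch below assumes a hypothetical mapping from class names to subsystems; the names are placeholders rather than actual Ant classes.

```python
def classify_dependencies(dependencies, subsystem_of):
    """Split (dependent, provider) pairs into intra- and inter-subsystem lists."""
    intra, inter = [], []
    for dependent, provider in dependencies:
        if subsystem_of[dependent] == subsystem_of[provider]:
            intra.append((dependent, provider))
        else:
            inter.append((dependent, provider))
    return intra, inter


# Hypothetical membership of four classes in two subsystems.
subsystem_of = {
    "CopyTask": "taskdefs",
    "DeleteTask": "taskdefs",
    "FileSet": "types",
    "Path": "types",
}
dependencies = [
    ("CopyTask", "DeleteTask"),  # both in taskdefs
    ("CopyTask", "FileSet"),     # taskdefs -> types
    ("FileSet", "Path"),         # both in types
]

intra, inter = classify_dependencies(dependencies, subsystem_of)
print("intra-subsystem:", intra)
print("inter-subsystem:", inter)
```

In a hierarchical DSM, the intra-subsystem marks fall inside the blocks along the diagonal and the inter-subsystem marks fall outside them.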

2.3 Identifying Component Loops in Software Architectures

A key strength of the process DSM representation and sequencing analysis is their ability to highlight component loops. We define a component loop as a subset of components whose dependencies form a complete circuit. To understand a component loop, consider the various types of dependencies that can exist among several components in a system. Figure 4 exemplifies three types of dependencies (or lack thereof) between components. In case (a), the three components are independent. Thus, procedures and data processing done by any of the components are independent of the other components (assuming sufficient processing resources). In case (b), component C provides data services to components A and B. Similarly, component B provides data to component A. As a result, there is a serial order (C, B, and A) in which these three components must be executed to ensure data availability. In cases (c) and (d), components A, B, and C are involved in a circuit or loop because they depend on each other in a cyclical manner. Procedures of component A depend on data processing performed by component B, which depends on data provided by component C, which in turn depends on data provided by component A.

Considering an information-processing view of a system, cases (a), (b), and (c) represent the three fundamental types of dependencies between elements in a product, process, or organizational system (Eppinger et al. 1994; Thompson 1967). These three cases, however, assume that the components all belong to the same group. This assumption breaks down when considering the membership of the components in different subsystems. Case (d) represents group membership by shading components A and C differently from B and (the newly added) D. Subsystem membership has two important effects on coupled dependencies.
First, it may increase the size of the loop by involving additional components (such as component D) that otherwise would not be part of the loop. (Any change in data provided by component B might also affect component D, and since the loop could cause several changes in B, component D might receive several change signals.) Second, it might hide the intrinsic loop formed by A, B, and C. Because the loop crosses group membership boundaries and adds additional components due to group membership, the intrinsic loop formed by A, B, and C can get hidden. We argue that the influence of component D on the intrinsic loop is determined not only by the dependency between component B and D but also by the fact that the group membership of these components increases the likelihood of being considered as a bundle of components.

(a) Independence (b) Serial dependence (c) Coupled dependence (d) Extended coupled dependence
Figure 4: Types of relationship patterns between components

The concept of loops or cycles (also called iterations) is not new in the process analysis literature, where DSMs have been used to identify subsets of activities that drive iterations (Browning and Eppinger 2002; Meier et al. 2007; Smith and Eppinger 1997a; Smith and Eppinger 1997b; Steward 1981).³ However, what is new in our conceptualization of component loops is twofold: First, we distinguish component loops in the presence of the levels and layers in which the system's components are organized. Second, we relate the presence of component loops to an important measure of product (not process) quality such as bug creation. To do so, we define system cyclicality, an architectural property of the system, as the fraction of the system that involves components embedded in component loops.

Methods exist to determine the sequence of components in a process DSM that highlights the minimal subsets of coupled components (Meier et al. 2007; Steward 1981; Warfield 1973). (Our use of sequencing to identify coupling distinguishes our approach from previous work, in both the hardware and software product domains (e.g., MacCormack et al. 2006; Pimmler and Eppinger 1994), which has not differentiated between feed-forward and feedback interactions and has instead used clustering algorithms to group components.⁴) Basic sequencing orders the DSM to minimize the number of super-diagonal marks and their distance from the diagonal. A lower-triangular matrix implies a sequence of execution that maximizes the availability of data to all components. A mark (i, j) below the diagonal indicates a feed-forward dependency where component i provides data to component j (i < j), while a super-diagonal mark indicates a feedback dependency in which component j provides data to component i that has been previously executed (since i < j). Since feedback dependencies spawn loops, feedback marks are generally undesirable in process architectures.

Considering the flat DSM in Figure 2, one can identify the intrinsic component loops in Ant 1.3 by sequencing the DSM to minimize the super-diagonal marks. Figure 5 shows this result and highlights the two intrinsic component loops in the shaded blocks along the diagonal. Ant 1.3 has seven feedback marks that cause the two component loops, which respectively contain seven and 14 interdependent components. Since 21 out of the 117 components of Ant 1.3 are involved in intrinsic component loops, there is an 18% probability that a randomly chosen component is involved in an intrinsic component loop.

Figure 5: Sequenced flat DSM of Ant 1.3

We refer to the component loops shown in a sequenced flat DSM (Figure 5) as intrinsic because they are identified without any constraints to the sequencing algorithm imposed by the hierarchical way in which the components are organized into subsystems (i.e., the levels of decomposition). Another perspective on the architecture can be obtained by applying a constrained form of sequencing to the hierarchical DSM (Figure 3), where we recursively sequence the subsystems internally at each level, from the top (root) level and then down. This approach constrains the sequencing within each subsystem and highlights connections that traverse the subsystems and layers laid out by the system architects. Since the levels and layers influence the way the developers work and the associations they realize, the number of interdependently coupled components in the hierarchical DSM captures an alternative and potentially important characteristic of the architecture.

Figure 6 shows a sequenced hierarchical DSM of Ant 1.3 (from Figure 3), which also contains two component loops. However, since the sequencing of the DSM is constrained by the need to keep each component within its subsystem, the resulting loops are much larger. By examining the blocks formed along the diagonal by enclosing all of the components involved in the two realized component loops, we find that they contain 88 components. (The algorithm used to determine a realized component loop in a sequenced hierarchical DSM is described in the Appendix.) The first component loop includes 50 components across two subsystems (compilers and *) which form the high-level subsystem taskdefs. The second component loop contains 38 components across four subsystems (types, the two subsystems that comprise the subsystem util, and the high-level subsystem *). The 88 total components involved in realized component loops imply a probability of 75% that a randomly chosen component is involved in a realized component loop. Since the sequencing algorithm on the hierarchical DSM is constrained by the actual hierarchy of the software architecture, the number of components involved in realized loops will always be greater than or equal to the number of components involved in intrinsic loops.⁵

³ Although component loops in the system (or product) domain may cause design iterations in the process (or work) domain, they are conceptually different. The use of the term component to characterize loops helps us emphasize that we are concerned with the loops present in the system/product domain.

⁴ For further discussion of the differences between sequencing (also called partitioning) and clustering algorithms, see Browning (2001).

⁵ Because many of the components in the realized component loops are not dependent on other components within the realized component loops, we also consider the size of the realized component loops minus these unconnected components. (These are the components with empty rows and columns in sub-matrices along the diagonal that define the two realized component loops of Ant 1.3.) We take this distinction into account in our analysis that relates component loops and bugs.
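The notion of an intrinsic component loop, and the cyclicality fraction derived from it, can be sketched on the small cases of Figure 4. Note the assumptions: this sketch groups components by mutual reachability (the strongly connected components of the dependency graph), a standard graph technique swapped in here for illustration; it is not the sequencing algorithm used in the paper, and it does not attempt the realized-loop algorithm given in the Appendix. All function names are hypothetical.

```python
def reachable_from(graph, start):
    """All nodes reachable from `start` by following one or more dependencies."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def component_loops(graph):
    """Sets of two or more components that all depend on each other cyclically."""
    reach = {v: reachable_from(graph, v) for v in graph}
    loops, assigned = [], set()
    for v in graph:
        if v in assigned or v not in reach[v]:
            continue  # already grouped, or v does not lie on any cycle
        loop = {w for w in reach[v] if v in reach.get(w, ())}
        if len(loop) > 1:  # ignore a component that only references itself
            loops.append(loop)
            assigned |= loop
    return loops


def cyclicality(graph):
    """Fraction of components that belong to some component loop."""
    if not graph:
        return 0.0
    in_loops = set().union(*component_loops(graph))
    return len(in_loops) / len(graph)


# Each key depends on the components in its value set.
serial = {"A": {"B", "C"}, "B": {"C"}, "C": set()}           # case (b)
coupled = {"A": {"B"}, "B": {"C"}, "C": {"A"}}               # case (c)
extended = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"B"}}  # case (d), D depends on B

print(cyclicality(serial), cyclicality(coupled), cyclicality(extended))
```

In case (d) the intrinsic loop still contains only A, B, and C, so three of the four components are in a loop. The percentages reported for Ant 1.3 follow the same arithmetic: 21/117 is roughly 0.18 for intrinsic loops and 88/117 is roughly 0.75 for realized loops.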

Figure 6: Sequenced hierarchical DSM of Ant 1.3

Next, we develop a theoretical argument for how component loops lead to greater bug creation. Then, in Section 3, we empirically test such a hypothesis by using the views of component loops presented here.

2.4 Hypotheses: The Effects of Component Loops on the Creation of Bugs

This paper argues that certain architectural patterns of a system can significantly impact its number of defects. Although many bugs are uncovered and fixed during the development and testing of a software application, many bugs get shipped with the system and are uncovered by its users. We focus on this latter type of defects. In general, bugs represent undesired behaviors of software systems. Based on findings from the process system literature, we would expect that an important source of bug creation would be the presence of component loops, since they are likely to trigger iterative problem solving (Roberts et al. 2006). Iterative problem solving typically corresponds to difficult and recursive problems that require making assumptions, iterating, and/or compromising, a process which may not converge easily and therefore carries a higher risk of residual errors than serial or parallel problem solving (Eppinger et al. 1994; Krishnan et al. 1997; Terwiesch et al. 2002). Moreover, as more components are involved in such iterative problems, the probability of convergence on a feasible solution decreases (Mihm et al. 2003; Smith and Eppinger 1997a), which can increase the risk of embedding bugs into the system. Hence, we hypothesize that:

H1: The larger the fraction of components of version s involved in component loops, the greater the number of bugs associated with version s.

Our first hypothesis conjectures that the presence of component loops will increase the risk of having bugs in the system. However, as discussed in the previous sub-sections, there are various types of component loops. Intrinsic component loops involve the minimum set of components with coupled dependencies, assuming that they can be developed together without any hierarchical constraints. However, because source code is organized into modules and subsystems, intrinsic component loops are typically augmented by other components that share subsystem membership. Hence, realized component loops could provide a more realistic indication of the effects of loops as perceived by the developers. In other words, the difference between realized and intrinsic component loops is the addition of components to the intrinsic component loops due to the hierarchies of the architecture.

The addition of extra components to intrinsic component loops to form the realized loops has two important effects on the risk of introducing bugs. First, it increases the size of the component loop and therefore (artificially) increases the size of the iterative problem to be solved, which in turn could increase the risk of creating bugs. Second, and more importantly, the additional components could introduce noise into the component loops that could increase the distance between the components involved in the intrinsic loops (the potential root cause of the bugs). This not only makes the iterative problem more difficult due to lack of precision and stability of the information exchanged but also makes it less visible to the developers (Terwiesch et al. 2002).
Iterative problem solving is even more problematic when it is not foreseen by the developers (Pich et al. 2002; Sommer and Loch 2004). Hence, we argue that realized component loops can lead to a higher number of bugs, because they are more likely to hide and disaggregate the intrinsic component loops, which otherwise could receive greater focus from the developers. The more extra components involved in realized component loops, the greater this effect, and the higher the risk of creating more bugs. This leads to our second hypothesis: 14

H2: Realized component loops have a stronger positive effect on bug creation than intrinsic component loops.
3 Empirical Study: The Apache Open Source Foundation
To test our hypotheses, we study readily accessible, open-source, Java-based software applications from the Apache Foundation, one of the largest, best-established, and most widely studied open source communities of developers and users who share values and a standard development process (Roberts et al. 2006). The Apache Foundation has a desire to create high quality software that leads the way in its field. We examined all the Java-based applications developed by Apache, focusing on Java because (1) it is one of the most widely used and open object-oriented programming languages, and (2) it captures components and their dependencies in a structured and explicit manner in its source code. This minimizes the risk of having components or dependencies being masked in the source code and only appearing later at the time of compilation. In total, we identified 69 Java-based development projects at the Apache Foundation in mid This provided our initial database. To effectively examine a causal relationship between architecture characteristics and quality, we needed to obtain a longitudinal dataset, so we down-selected to the 37 applications for which we could access data for successive major releases. That is, we discarded 32 projects because they had a limited history of only one or a few minor releases. From the 37, we selected the applications for which we could access, for successive major releases, their precompiled ("pre-built") source code (to codify product architecture features), their bug reports (to determine number of bugs), and their release notes (to determine the innovative features and other control variables). After data purification, we compiled a set of 126 releases representing 20 applications with an average of 6 major releases (or versions) each.
We used three different sources of data: bug tracking systems, precompiled .jar files, and release notes. (Jar files contain all the Java class specifications, including the dependencies among them, for a given Java-based software application.) First, we examined the Bugzilla and Jira bug tracking systems of the Apache Foundation to obtain all the data for the bugs associated with each release. Each of these systems

allows users and developers to enter bug reports, which are classified by their potential severity and processed by the development team in a structured way. All bugs that are not fixed by a developer during the writing of the source code, and therefore get released with the application, go through this process. These databases thus record the status and closure of each bug associated with any release. We developed a web-crawler to automate the gathering of the bug data. Second, we downloaded the precompiled versions (as signified by an existing .jar file) of each application available from the Apache archives and/or the application's website, selecting the versions considered major releases. We did not normally use minor releases since these typically involve relatively small changes. We used a commercially available software application developed by Lattix to translate the structure of the source code captured in the .jar file into a matrix representation such as the ones shown in Figures 5 and 6. Finally, we consulted the release notes of each version of all the applications in our sample to find data on newness, age, and other important controls.
3.1 Dependent variable
Number of bugs associated with version s of application i ($y_{is}$). Our main dependent variable counts all the bugs that have been formally identified and attributed to version s of application i. The identification of a bug is carried out by developers or users (with confirmation by developers) after the release. Hence, this variable does not measure the capability of the development organization to discover bugs. Rather, it is a proxy for the number of actual defects embedded in version s of application i. As mentioned, we used the Bugzilla and Jira bug tracking systems as the data sources to quantify this variable.
Out of the complete list of bugs entered into these systems, we discarded any items that could not be verified as actual bugs by the developers (Classification: WORKS_FOR_ME or INVALID for Bugzilla and Cannot Reproduce or Not A Problem for Jira). We also discarded any bugs that the developers considered duplicates of bugs already registered in the system (Classification: DUPLICATE for both Bugzilla and Jira). Attribution of a bug to a code version was primarily determined by the classification in the system (according to the data field Affected Versions). If no version was explicitly given in the bug description, we assumed the bug

belonged to the most recently released version with respect to the bug entry date.
3.2 Independent variables
Our key predictor variable is the extent to which version s of application i contains the various types of component loops (as discussed in section 2) in its source code. Because we can identify component loops in the presence or absence of the constraints imposed by the hierarchical assignment of components to subsystems, we define three types of component loop measures:
Intrinsic cyclicality ($P_{I,is}$) is the probability that a randomly chosen component in version s of application i belongs to an intrinsic component loop. (Recall that intrinsic component loops are defined by the set of components that share coupled dependencies in a sequenced flat DSM such as the one shown in Figure 5.) To determine $P_{I,is}$ we count the number of components involved in loops in the flat DSM of version s of application i ($C_{I,is}$) and divide it by the total number of components ($N_{is}$). Hence, $P_{I,is} = C_{I,is} / N_{is}$.
Realized cyclicality ($P_{R,is}$) is the probability that a randomly chosen component of version s of application i belongs to a realized component loop. This measure is a function of the number of components that are involved in component loops determined while maintaining the constraints of the subsystems and layers used by programmers to organize their code ($C_{R,is}$). To identify $C_{R,is}$ we count the number of components in loops in the sequenced hierarchical DSM of version s of application i, such as the one shown in Figure 6. Hence, $P_{R,is} = C_{R,is} / N_{is}$.
Reduced realized cyclicality ($P_{RR,is}$) is the probability that a randomly chosen component in version s of application i belongs to a reduced realized component loop. This measure is similar to $P_{R,is}$ but subtracts from its numerator the number of components that do not depend on any other component within the loop.
3.3 Control variables
We include two sets of control variables.
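As an aside on the cyclicality measures of section 3.2, the ratio $P_{I,is} = C_{I,is} / N_{is}$ can be illustrated with a minimal sketch. It assumes that an intrinsic component loop corresponds to a set of at least two components that can all reach one another through dependencies, and it takes the dependency structure as a plain mapping rather than a Lattix DSM. The function names and the toy data are hypothetical and introduced only for illustration; this is not the tooling used in the study, and it does not cover realized cyclicality, which additionally requires the subsystem hierarchy.

```python
from typing import Dict, Iterable, List, Set


def reachable_from(start: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Return every component reachable from `start` via one or more dependencies."""
    seen: Set[str] = set()
    stack: List[str] = list(deps.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, ()))
    return seen


def intrinsic_cyclicality(components: Iterable[str], deps: Dict[str, Set[str]]) -> float:
    """Fraction of components that are mutually reachable with at least one other component."""
    comps = list(components)
    if not comps:
        return 0.0
    reach = {c: reachable_from(c, deps) for c in comps}
    # A component counts toward the numerator (assumed C_I) if some *other*
    # component is reachable from it and can reach it back.
    in_loop = [
        c for c in comps
        if any(o != c and c in reach.get(o, ()) for o in reach[c])
    ]
    return len(in_loop) / len(comps)


if __name__ == "__main__":
    # Hypothetical toy system: A -> B -> C -> A form a loop; D and E do not.
    toy_deps = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}, "E": set()}
    print(intrinsic_cyclicality(["A", "B", "C", "D", "E"], toy_deps))  # 0.6
```

In the toy system, three of the five components sit in a loop, so the sketch returns 3/5. Component D depends on the loop but is not part of it, which is why it does not count toward the numerator.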
First, we control for exogenous, non-architectural

features of the application that are likely to affect the creation of bugs. Second, we control for architectural characteristics that relate to the direct and indirect connectivity among the components of the application so as to test more precisely whether and how system cyclicality might influence bug creation.
Non-architectural Controls
Age of application at version s. This is measured by the number of days since the first release of the application. This assumes that the application is officially born on the date of the first major release and ages with successive releases. The cumulative time between releases is likely to increase both the complexity of and knowledge about the architecture. Since these factors are likely to affect bug creation, it is important to control for the age of the application.
Days since last release. The time between successive releases varies within and across applications, so it is important to control for the time span between the previous release and version s. The longer this time, the higher the probability that more changes will have been introduced into the application, which could ultimately affect bug creation.
Newness of application at version s. New features and incremental changes to existing features add uncertainty and complexity to the structure of the application. Implementing these types of changes not only consumes development resources but also is likely to introduce unforeseeable perturbations to existing features. Hence, the number of new features and incremental changes in an application is likely to affect the creation of bugs. Using the information from the release notes, we capture both the number of new features and incremental changes associated with each release. New features add functionality, while incremental changes modify existing functions.
We measure the newness of version s with two control variables that count the numbers of new features and incremental changes, respectively.
Architectural Controls
The following variables are measured for version s of application i:

Size of jar file. The overall complexity of a system is a function of the amount of information it carries. We expect more complex software systems to generate a larger number of bugs. We use the size of the jar file (in kilobytes) as a proxy for the raw complexity of the source code. This variable measures the volume of information associated with the software architecture, but it does not capture how such information is broken down into components and how these components interact.
Number of nominal subsystems. The application source codes in our data set are complex systems formed by interrelated components. To manage the complexity, developers group the components (Java classes) into subsystems. Typically, subsystems group components that collectively perform certain functions. Such a grouping is likely to affect the cognitive ability of the team to understand the architecture of the source code, and therefore it may influence their propensity to create bugs. Because the assignment of each component to a subsystem is well codified by the naming convention, we are able to count the number of distinct subsystems. Note that this measure counts only the number of component-based subsystems, not any higher-level subsystems that group together only other subsystems.
Number of components ($N_{is}$). The number of components into which the source code has been decomposed is a basic dimension of system complexity that conditions the architecture of the system and for which we must therefore control (Kauffman and Levin 1987).
Internal system connectivity. We use two measures to control for the direct and indirect connectivity among components:
o Direct connectivity ($K_{is}$) measures the number of direct connections among components (Kauffman and Levin 1987).
o Indirect connectivity measures the number of non-zero cells of the binary visibility matrix of the system after subtracting the system's DSM.
The visibility matrix (V) of a system is a square matrix (similar to the DSM) whose non-zero cells ($v_{ij}$) indicate that component i is connected to component j via a finite number of intermediary components. The

visibility matrix is obtained by raising the DSM (D) to successively higher powers via Boolean multiplication until the number of empty cells in the resultant matrix stabilizes (MacCormack et al. 2006; Sharman and Yassine 2004; Warfield 2000). Hence, to measure indirect connectivity we count the number of non-zero cells in V-D. Note that because V captures both direct and indirect connectivity we must subtract D from V to control for these effects separately.
Number of component loops. Because our key independent variables do not explicitly control for the number of component loops present in the source code, we include a control for it whose value depends on whether we are considering intrinsic or realized component loops.
Table A (in the Appendix) shows descriptive statistics and correlations between the variables included in our analysis. There were, on average, 101 bugs, 8 new features, and 23 incremental changes associated with each release.
4 Analysis and Results
Our dependent variable is the number of bugs. Several features of our data make statistical analysis a non-trivial task. Because our dependent variable is a count with a skewed distribution (taking non-negative values only), standard ordinary least-squares regressions can lead to inefficient and biased estimates. To deal with this issue, statisticians recommend using Poisson-like regression models developed explicitly to model the count nature of the dependent variable (Cameron and Trivedi 1998). Because the variance is significantly larger than the mean of our dependent variable, negative binomial regression models provide a more accurate estimate of the standard errors of the coefficient estimates of our regression models (Cameron and Trivedi 1998; Hausman et al. 1984). We estimate a model of the form (Cameron and Trivedi 1998, p. 279):
$E[y_{is} \mid x_{is}, \alpha_i] = \alpha_i \exp(x_{is}\beta)$
That is, our regression models predict that the expected number of bugs of version s of application i depends exponentially on a set of linearly independent regressors ($x_{is}$). The exponential form of our model ensures that the predicted number of bugs is always greater than zero. The β coefficients shown in

Table 1 are estimated by fitting the model to data. The coefficient $\beta_j$ equals the proportionate change in the expected mean if the j-th regressor changes by one unit. A significantly positive $\beta_j$ coefficient indicates that, all else being equal, an increase in regressor j increases the expected number of bugs, whereas a significantly negative $\beta_j$ coefficient indicates that, all else being equal, an increase in regressor j decreases the expected number of bugs. Of particular interest are the β coefficients for our key independent variables. A coefficient $\beta_{iteration\_propensity}$ significantly greater than zero would indicate that the greater the iteration propensity in version s of application i, the greater the expected number of bugs. This would be in line with hypothesis H1. The $\alpha_i$ are application-specific effects, which can be either fixed or random. These effects permit observations of the same application to be correlated across versions, thereby building serial correlation directly into the model. In a fixed-effects model, the $\alpha_i$ absorb time-invariant, unobserved, application-specific features. By doing this we effectively control for any unobserved factors such as the culture of the development team associated with each application, since these are likely to differ across applications but much less likely to change for the same application over successive releases. For the random-effects model, the $\alpha_i$ are iid random variables which can be estimated by assuming a distribution for $\alpha_i$ (typically a gamma distribution). We report estimates based on the fixed-effects model, which are consistent with the random-effects estimates of those models that pass the Hausman specification test to use random effects (Hausman et al. 1984).
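The interpretation of the coefficients given above can be made explicit with a short derivation. It is a sketch that assumes the exponential conditional mean of our model; the symbol $x_{is,j}$ for the j-th regressor is notation introduced here for illustration only.

```latex
\begin{align*}
E[y_{is} \mid x_{is}, \alpha_i] &= \alpha_i \exp(x_{is}\beta) \\
\frac{\partial E[y_{is} \mid x_{is}, \alpha_i]}{\partial x_{is,j}}
  &= \beta_j \, E[y_{is} \mid x_{is}, \alpha_i] \\
\frac{E[y_{is} \mid x_{is,j} + 1, \alpha_i]}{E[y_{is} \mid x_{is,j}, \alpha_i]}
  &= \exp(\beta_j) \approx 1 + \beta_j \quad \text{for small } |\beta_j|
\end{align*}
```

The second line shows why $\beta_j$ is read as a proportionate change in the expected number of bugs. The third line shows that a one-unit increase in regressor j multiplies the expected count by $\exp(\beta_j)$, and that the application-specific effect $\alpha_i$ cancels out of this ratio.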
Finally, because software development technologies may change significantly from year to year and such developments might affect bug creation across all of the applications, we include indicator variables associated with the year of each release. Table 1 provides the coefficient estimates of the models predicting the expected number of bugs. Model 1 includes a first set of control variables. This model shows that the effect of time since last release is positive and significant, indicating that the longer the time between releases, the greater the likelihood of introducing a larger number of bugs. Model 2, which includes the rest of the control variables, suggests that neither the number of components (N) nor the number of direct connections among them (K) is a significant determinant of bug creation. However, the significant, negative coefficient of indirect connectivity suggests that the propagation of information through intermediary

components is likely to reduce the number of bugs. To understand this further, we estimated two additional models (not shown in Table 1) in which we distinguish feed-forward and feedback indirect connectivity (in both flat and hierarchical sequenced DSMs). The results of these alternative models indicate that it is feed-forward indirect connectivity, not feedback indirect connectivity, that is significantly and negatively associated with the number of bugs. Models 3, 4, and 5 include our three measures of system cyclicality, respectively. These models also control for the number of component loops. Model 3 shows a positive (yet not significant) coefficient estimate for intrinsic cyclicality, whereas Model 4 shows a positive and significant coefficient estimate for realized cyclicality. Finally, Model 5 shows that the effect of reduced realized cyclicality is positive but not significant. Hence, Model 4 offers the strongest empirical evidence to support H1. That is, the greater the probability that a randomly chosen component belongs to a realized component loop, the larger the expected number of bugs associated with such a version of the application. The fact that Model 4 (and not Models 3 or 5) shows the largest and only significant effect of cyclicality on the number of bugs provides empirical support for H2. Based on a test of means, the coefficient estimate of realized cyclicality (Model 4) is significantly larger than those of both intrinsic and reduced realized cyclicality shown in Models 3 and 5, respectively. Hence, it is not only the presence of intrinsic component loops that may increase the risk of creating bugs, but also the fact that such cycles may be hidden from the developers by the presence of other components in the source code.
Our results suggest that increasing the size of the subsystems whose components are involved in loops (even if they are not connected to the other components within the realized component loop) increases the risk of masking the design cycle itself and therefore the risk of creating bugs in the system.
4.1 Bug Fixing
To gain further insight into the relationship between system cyclicality and quality, we also examine the determinants of bug fixes. Bug fixes are measured by the number of bugs associated with version s of application i that have been fixed by the developers, as reported by the bug tracking systems. Analyzing the determinants of bug fixing is particularly challenging because it depends on