Can We Predict the Generation of Bugs? Software Architecture and Quality in Open-Source Development

Manuel Sosa
Technology and Operations Management Area, INSEAD, Fontainebleau, France

Jürgen Mihm
Technology and Operations Management Area, INSEAD, Fontainebleau, France

Tyson Browning
Department of Information System and Supply Chain Management, Neeley School of Business, Texas Christian University, Fort Worth, Texas, US

Abstract

We study how software architecture relates to quality. Based on a software architecture representation that accounts for not only the hierarchical arrangement of its subsystems and components but also their dependency structure, we formally define the notion of system cyclicality. System cyclicality is an architectural property that captures the fraction of mutually interdependent components in a system. By examining multiple versions (126 in total) of 20 open-source, Java-based applications developed by the Apache Foundation, we empirically analyze the relationship between software architecture characteristics and the creation of bugs. Our results suggest that, while controlling for various system characteristics, system cyclicality is a key determinant of bug creation. Interestingly, we found evidence that it is not just the cycles themselves, but how hidden they are, that drives the effect of system cyclicality on bug creation. From an academic viewpoint, this work provides a theoretical and empirical basis for a causal link between the architecture of a complex system and its quality. This has important implications for the management of complex system design and development in fast-paced industries such as software. Our results suggest that managers could benefit from proactively examining the architecture of the system they develop and monitoring its cyclicality as one of their strategies to mitigate the creation of defects.

1 Introduction

Previous work has studied the implications of system architecture decisions for various aspects of the firm (e.g., Baldwin and Clark 2000; Ulrich 1995; Yassine and Wissmann 2007). However, little attention has been devoted to understanding the link between product architecture characteristics and product performance. How does the architecture of a system influence its (conformance) quality? More specifically, to which features of the system architecture should managers pay attention during the development process to minimize the emergence of defects? To study the relationship between system architecture and quality, we examine the development of software applications for several reasons: they are complex, they exhibit fast change rates (like fruit flies in studies of biological evolution), and they offer (through their source code) an efficient, reliable, and standardized medium to capture the architecture of their design. In addition, software applications typically have centralized repositories to reliably track the quality issues associated with each release.

Limited research in both the engineering design and computer science communities has addressed the link between system architecture and quality. On the engineering design side, researchers have studied the architecture of complex products to explore how the direct and indirect connections among product components influence the propagation of design changes. This line of research has suggested that change propagation can cause rework and long development times due to the unpredictable nature of design changes propagating not only between directly connected components but also indirectly through intermediary components (Clarkson et al. 2004; Eckert et al. 2004; Sosa et al. 2007b). More recently, Gokpinar et al. (2007) found that the connectivity of an automobile's subsystems and the extent to which their interfaces are managed significantly affected the subsystems' conformance quality.
The literature on computer science and information systems has also examined the structure of software systems and related it to performance issues such as the time required to implement changes (Cataldo et al. 2006), the difference between open-source and closed-source development (MacCormack et al. 2006), and the factors that lead to refactoring of the source code (see Mens and Tourwé 2004 for a review). However, previous work has not yet clearly linked the features of a system's architecture to the generation of defects in the system. That is the focus of this paper. By integrating methods used in engineering design to analyze product and process architectures with methods from information systems on the determinants of defect proneness, we identify architectural properties that are likely to be associated with the risk of creating defects in software applications.

From a user's perspective, software applications provide certain functionality and capability. As long as the application provides these reliably and efficiently, the user is generally satisfied. However, that is rarely the case. Users typically uncover bugs by testing and using the software application. From a designer's standpoint, there are many alternative ways for the software to provide the specified functionality. Designers or architects must determine how to allocate the software's functions to its various components or groups thereof, called subsystems (or modules). Architects must also determine how the software system will be organized in terms of command and control, utilities, and other supporting infrastructure of components and subsystems. These choices determine the nature and extent of the relationships between the components and subsystems of any version of the software application, and they affect not only the ease with which the components and subsystems can be successfully modified in successive versions (MacCormack et al. 2006; Parnas 1972; Sosa et al. 2007a) but also the risk of introducing undesired functionality into the system (Jones 2000; Koru and Tian 2004).

A system's conformance quality is defined by its ability to meet its design specifications. Hence, system defects (called bugs in software applications) are identified when the system does not perform as specified. To meet system requirements, a hardware product is designed and then produced.
In software development, this is equivalent to architecting and writing the source code and then compiling it to transform it into an executable application that will ultimately deliver the specified functionality. Conformance quality is measured over the final product. Hence, defects or bugs may be uncovered by testers prior to the application's release and by users post-release. It is important to note here that many bugs are probably uncovered and corrected during the development phase (most likely during the compiling of code), similar to design rework in the latest phases of product development, in which the product is tested and prepared for production ramp-up. Although these pre-release bugs are important to manage due to the amount of design rework they typically involve, the focus of this paper is on the bugs that elude the development team and get released with the application, to be uncovered later by the users.

In software development, two basic concepts characterize a good design structure (Stevens et al. 1974): coupling and cohesion. Cohesion refers to the internal consistency of each software component, while coupling pertains to the strength of the connections between components. A good source code design maximizes cohesion and minimizes coupling. This principle suggests that increased coupling (greater connectivity among components) is likely to have a negative impact on the quality of a system (Koru et al. 2007; MacCormack et al. 2006). In this paper, we argue that it is not the amount of coupling but rather the fraction of components involved in cyclical dependencies that is positively associated with the generation of bugs. To test this proposition, we examined 126 releases representing multiple generations of 20 distinct open-source applications developed by the Apache Foundation.

This paper is structured as follows. Section 2 discusses the software architecture representation schemes and metrics that facilitate our study, along with the theoretical framework and hypotheses. Section 3 describes the empirical study carried out to test our hypotheses. Section 4 describes the analysis and results. Section 5 concludes by discussing the academic and managerial implications of this work.

2 Linking Software Architecture and the Creation of Bugs

2.1 The Architecture of Complex Systems

The architecture of a designed system (either hardware or software) is determined during its design process through both decomposition and integration (Alexander 1964; Simon 1981).
Establishing the architecture of a (hardware) system includes breaking it down into functional and physical elements, mapping the functional elements onto the physical elements, and specifying the interfaces among the elements (Ulrich 1995; Ulrich and Eppinger 2007). Simon (1981) suggested that complex systems should be designed as hierarchical structures consisting of nearly decomposable systems, with strong interfaces within subsystems and weak interfaces across subsystems. This is consistent with the independence axiom of axiomatic design, which suggests the decoupling of functional and physical elements of a product (Suh 2001), as well as with the notion of modularity, which suggests the creation of options that enable the evolution of designs (Baldwin and Clark 2000). Hence, designers typically decompose complex hardware and software systems into subsystems and components to facilitate their design. Yet, such subsystems and components must then be integrated to ensure that the product functions as a whole.

Previous research in engineering design has developed methods to analyze the architecture of complex products by studying how their components interact to provide system functionality. More specifically, this stream of research has modeled products as collections of interdependent components, developed methods to cluster components with similar dependencies into subsystems (modules) (Browning 2001; Lai and Gershenson 2006; Pimmler and Eppinger 1994), and analyzed how component connectivity patterns relate to organizational and design decisions (Sosa et al. 2003; Sosa et al. 2004; Sosa et al. 2007b).

Similarly, to analyze the architecture of a software system, we examine its source code because it codifies the design of the system. Analogous to hardware products, the architecture of a software application is the scheme by which its functional elements are codified into objects in the source code and the way in which these objects interact and are grouped into subsystems and layers (e.g., Parnas 1972; Parnas 1979; Shaw and Garlan 1996). The dependency structure of the source code (i.e., the way in which its components exchange information) specifies the system's functionality precisely. In that sense, the source code captures the process (or "recipe") that determines how the system works.
Recognizing this is important because it leads us to depart from the methods used to analyze hardware product architectures. Contrary to previous work focused on analyzing the architecture of complex software applications (MacCormack et al. 2006), we consider the software architecture as a process that specifies precisely how the objects comprising the system will interact over time to provide its functionality. In contrast, hardware product architectures have traditionally been analyzed by considering them as a collection of static physical elements and dependencies (Pimmler and Eppinger 1994; Sosa et al. 2007b). Because software architectures describe the process by which the components of the system interact to provide its functionality, we use alternative analytical techniques that have traditionally been used to analyze iterative processes such as new product development (see Browning 2001 for a review). A process architecture adds a time dimension to the elements and relationships in a system (Browning and Eppinger 2002). While software components execute much more quickly than a project's activity network, they nevertheless execute in finite run time, with the dependencies between the elements of the source code determining the order of the actions performed by the components defined by the source code.

2.2 Architectural Representations of Software Systems

The source code of a software application consists of a collection of connected components organized into subsystems, which are in turn grouped into levels and layers (Sangal et al. 2005; Shaw and Garlan 1996). To explore the features and effects of a system's architecture, we need to understand these arrangements more specifically. To clarify our discussion, we refer to an example drawn from the software applications we studied: version 1.3 of Ant. Traditionally, designers have represented the architecture of their source code with block diagrams such as Figure 1, which depicts the system's decomposition into subsystems. We define a subsystem as a set of components and/or other subsystems, where the presence of a block inside another represents a parent-to-child relationship in the decomposition. For example, at the first level of decomposition, Ant version 1.3 consists of four subsystems: taskdefs, types, util, and * (where * stands for a group of miscellaneous components). The subsystem taskdefs in turn breaks down into two lower-level subsystems, as does the subsystem util.
Each of the subsystems shown in Figure 1 is ultimately a collection of components (which are not shown individually). Thus, the z-axis of the figure (coming out of the page) relates to the level (or depth) of decomposition. For example, the first level is formed by the four subsystems immediately below the root node (level 0), and the architecture has two levels because two of the subsystems in the first level, taskdefs and util, contain other subsystems ("compilers" and "regexp", respectively).
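The decomposition just described can be sketched as a small tree. This is only an illustration under stated assumptions, not the authors' tooling: the nested-dictionary layout and the helper names `levels` and `subsystems_at` are hypothetical, and the name of util's second child is assumed to be `*` by analogy with taskdefs (the text names only regexp).

```python
# Hypothetical nested-dict model of the Ant 1.3 decomposition.
# Keys are subsystem names; a leaf subsystem maps to an empty dict because
# its individual components (classes) are not listed in the text.
ANT_1_3 = {
    "taskdefs": {"compilers": {}, "*": {}},
    "types": {},
    "util": {"regexp": {}, "*": {}},   # second child name is an assumption
    "*": {},
}


def levels(tree):
    """Number of decomposition levels below the root (the root is level 0)."""
    if not tree:
        return 0
    return 1 + max(levels(child) for child in tree.values())


def subsystems_at(tree, level):
    """Sorted names of the subsystems at a given level (1 = directly below the root)."""
    if level == 1:
        return sorted(tree)
    names = []
    for child in tree.values():
        names.extend(subsystems_at(child, level - 1))
    return sorted(names)


depth = levels(ANT_1_3)                   # two levels, as stated for Ant 1.3
first_level = subsystems_at(ANT_1_3, 1)   # the four first-level subsystems
```

Keeping the hierarchy separate from the dependency data mirrors the distinction the paper draws between levels of decomposition and dependencies among components.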

Figure 1: Architecture block diagram for version 1.3 of the Apache Ant application

The vertical layout (y-axis) of the subsystems and components in the block diagram shown in Figure 1 is also meaningful. Subsystems and components located at the bottom of the diagram are intended to serve the subsystems above. These layers are defined by the system architect's design rules (Baldwin and Clark 2000; Sangal et al. 2005). Software is often architected in layers to provide a coherent command and control structure, such that components in higher layers can call (depend on) components in lower layers. The opposite situation, where lower-layer components depend on higher-layer ones, is possible but undesirable, for reasons we will discuss below. Thus, the block diagram conveys information about both the levels of decomposition (which relate to the physical structure of the code) and the layers of intended dependencies (which relate to the process structure of the code).

Note that where one chooses to end the decomposition and declare the lowest level is the modeler's choice. In our analysis, we stop at the class level,[1] although we could go down further to the level of methods and data members and eventually even to lines of code. Three main arguments led to our choice to decompose to the class level. First, classes tend to provide a set of common functionality (e.g., a set of low-level mathematical functions) that is maintained as one cohesive piece of software, often in a single source file by a single author. Second, the main attributes of the architecture become apparent by the class level, so further decomposition would only obscure these insights. Third, this level of decomposition is consistent with previous work focused on representing software architectures (e.g., MacCormack et al. 2006; Sangal et al. 2005).
Thus, for the purposes of our analysis, we treat each Java class as an atomic component of the software architecture.

[1] Our dataset contains only Java applications, wherein files and classes are typically the same, except for inner classes (classes within classes), which we do not consider explicitly.

Although block diagram representations capture the hierarchical organization of components and subsystems in layers, they do not show the dependencies between components. However, as we will discuss below, determining the architectural properties that influence bug creation makes it imperative to consider both the components' dependencies and their hierarchical organization. Dependencies among software components are formed by the calls made by one component to another.[2] To represent these, both within and across subsystems and layers, we use a design structure matrix (DSM) representation (Browning 2001). A DSM is a square matrix of size n (where n is the number of elements) whose diagonal cells represent system elements and whose off-diagonal cells indicate dependencies between those elements. The use of DSMs to study the structure of development processes (by mapping out how information flows between activities) led to structured approaches to identify subsets of activities involved in design iterations (Browning and Eppinger 2002; Smith and Eppinger 1997a; Steward 1981). Several researchers have used the DSM representation to capture the architecture of complex products and to analyze patterns of interactions among the components, both for general products (e.g., Sharman and Yassine 2004; Sosa et al. 2003; Sosa et al. 2007b) and for software systems (Cataldo et al. 2006; MacCormack et al. 2006; Sangal et al. 2005; Sosa 2008; Sullivan et al. 2001). Similar to this latter stream of research, we also capture the dependency structure of the source code of a software application in a DSM representation. However, as we mentioned above and will further discuss below, we analyze our software architecture DSMs by considering the process-like nature of the source code they represent.
Figure 2 shows a flat DSM representation of Ant 1.3, where the term "flat" signifies its agnosticism towards hierarchical levels. It shows the 117 components of Ant 1.3 and the 463 dependencies between them. We use the convention where an off-diagonal mark in the DSM represents the dependency of the column component on the row component. Thus, a mark in cell (i,j) indicates that the object (a Java class) labeling column j depends on the object labeling row i.

[2] Specifically, we include the following types of dependencies: invocations (static, virtual, and interface), inheritances (extensions and implementations), data member references, and constructs (both with and without arguments).
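The marking convention above can be made concrete with a short sketch. It assumes dependencies are already available as (dependent, provider) pairs; the function names `build_dsm` and `count_dependencies` and the three-component example are hypothetical and are not taken from the Ant data.

```python
def build_dsm(components, dependencies):
    """Build a flat DSM as an n x n list of 0/1 values.

    Convention from the text: a mark in cell (i, j) means that the component
    labeling column j depends on the component labeling row i.
    `dependencies` is an iterable of (dependent, provider) name pairs.
    """
    index = {name: k for k, name in enumerate(components)}
    n = len(components)
    dsm = [[0] * n for _ in range(n)]
    for dependent, provider in dependencies:
        if dependent != provider:  # diagonal cells stand for the elements themselves
            dsm[index[provider]][index[dependent]] = 1
    return dsm


def count_dependencies(dsm):
    """Number of off-diagonal marks in the DSM."""
    return sum(sum(row) for row in dsm)


# Toy example: A uses B and C, and B uses C.
components = ["A", "B", "C"]
dependencies = [("A", "B"), ("A", "C"), ("B", "C")]
dsm = build_dsm(components, dependencies)
marks = count_dependencies(dsm)
```

Applied to a full application, the same counting would be expected to yield the totals reported for a release, such as the 117 components and 463 dependencies of Ant 1.3.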

Figure 2: A flat DSM representation of Ant 1.3

To account for the organization of components into subsystems and layers, we supplement the flat DSM with a hierarchical DSM representation. The z-axis (coming out of the page) of the DSM in Figure 3 shows the nested levels of decomposition and the components' membership in subsystems. The y-axis of the DSM shows the ordering of the subsystems in layers. Thus, Figure 3 combines many of the visual benefits of Figures 1 and 2. This representation allows us to distinguish inter- and intra-subsystem component dependencies.

Figure 3: A hierarchical DSM representation of Ant 1.3

2.3 Identifying Component Loops in Software Architectures

A key strength of the process DSM representation and sequencing analysis is their ability to highlight component loops. We define a component loop as a subset of components whose dependencies form a complete circuit. To understand a component loop, consider the various types of dependencies that can exist among several components in a system. Figure 4 exemplifies three types of dependencies (or lack thereof) between components. In case (a), the three components are independent. Thus, procedures and data processing done by any of the components are independent of the other components (assuming sufficient processing resources). In case (b), component C provides data services to components A and B. Similarly, component B provides data to component A. As a result, there is a serial order (C, B, and A) in which these three components must be executed to ensure data availability. In cases (c) and (d), components A, B, and C are involved in a circuit or loop because they depend on each other in a cyclical manner. Procedures of component A depend on data processing performed by component B, which depends on data provided by component C, which in turn depends on data provided by component A. Considering an information-processing view of a system, cases (a), (b), and (c) represent the three fundamental types of dependencies between elements in a product, process, or organizational system (Eppinger et al. 1994; Thompson 1967). These three cases, however, assume that the components all belong to the same group. This assumption breaks down when considering the membership of the components in different subsystems. Case (d) represents group membership by shading components A and C differently from B and (the newly added) D. Subsystem membership has two important effects on coupled dependencies.
First, it may increase the size of the loop by involving additional components (such as component D) that otherwise would not be part of the loop. (Any change in data provided by component B might also affect component D, and since the loop could cause several changes in B, component D might receive several change signals.) Second, it might hide the intrinsic loop formed by A, B, and C. Because the loop crosses group membership boundaries and adds additional components due to group membership, the intrinsic loop formed by A, B, and C can get hidden. We argue that the influence of component D on the intrinsic loop is determined not only by the dependency between components B and D but also by the fact that the group membership of these components increases the likelihood of their being considered as a bundle of components.

Figure 4: Types of relationship patterns between components: (a) independence, (b) serial dependence, (c) coupled dependence, (d) extended coupled dependence

The concept of loops or cycles (also called iterations) is not new in the process analysis literature, where DSMs have been used to identify subsets of activities that drive iterations (Browning and Eppinger 2002; Meier et al. 2007; Smith and Eppinger 1997a; Smith and Eppinger 1997b; Steward 1981).[3] However, what is new in our conceptualization of component loops is twofold. First, we distinguish component loops in the presence of the levels and layers in which the system's components are organized. Second, we relate the presence of component loops to an important measure of product (not process) quality such as bug creation. To do so, we define system cyclicality, an architectural property of the system, as the fraction of the system that involves components embedded in component loops.

[3] Although component loops in the system (or product) domain may cause design iterations in the process (or work) domain, they are conceptually different. The use of the term "component" to characterize loops helps us emphasize that we are concerned with the loops present in the system/product domain.

Methods exist to determine the sequence of components in a process DSM that highlights the minimal subsets of coupled components (Meier et al. 2007; Steward 1981; Warfield 1973). (Our use of sequencing to identify coupling distinguishes our approach from previous work, in both the hardware and software product domains (e.g., MacCormack et al. 2006; Pimmler and Eppinger 1994), which has not differentiated between feed-forward and feedback interactions and has instead used clustering algorithms to group components.[4]) Basic sequencing orders the DSM to minimize the number of super-diagonal marks and their distance from the diagonal. A lower-triangular matrix implies a sequence of execution that maximizes the availability of data to all components. A mark (i, j) below the diagonal indicates a feed-forward dependency where component i provides data to component j (i < j), while a super-diagonal mark indicates a feedback dependency in which component j provides data to component i that has been previously executed (since i < j). Since feedback dependencies spawn loops, feedback marks are generally undesirable in process architectures.

[4] For further discussion of the differences between sequencing (also called partitioning) and clustering algorithms, see Browning (2001).

Considering the flat DSM in Figure 2, one can identify the intrinsic component loops in Ant 1.3 by sequencing the DSM to minimize the super-diagonal marks. Figure 5 shows this result and highlights the two intrinsic component loops in the shaded blocks along the diagonal. Ant 1.3 has seven feedback marks that cause the two component loops, which respectively contain seven and 14 interdependent components. Since 21 out of the 117 components of Ant 1.3 are involved in intrinsic component loops, there is an 18% probability that a randomly chosen component is involved in an intrinsic component loop.

Figure 5: Sequenced flat DSM of Ant 1.3

We refer to the component loops shown in a sequenced flat DSM (Figure 5) as intrinsic because they are identified without any constraints on the sequencing algorithm imposed by the hierarchical way in which the components are organized into subsystems (i.e., the levels of decomposition). Another perspective on the architecture can be obtained by applying a constrained form of sequencing to the hierarchical DSM (Figure 3), where we recursively sequence the subsystems internally at each level, starting at the top (root) level and working down. This approach constrains the sequencing within each subsystem and highlights connections that traverse the subsystems and layers laid out by the system architects. Since the levels and layers influence the way the developers work and the associations they realize, the number of interdependently coupled components in the hierarchical DSM captures an alternative and potentially important characteristic of the architecture.

Figure 6 shows a sequenced hierarchical DSM of Ant 1.3 (from Figure 3), which also contains two component loops. However, since the sequencing of the DSM is constrained by the need to keep each component within its subsystem, the resulting loops are much larger. By examining the blocks formed along the diagonal by enclosing all of the components involved in the two realized component loops, we find that they contain 88 components. (The algorithm used to determine a realized component loop in a sequenced hierarchical DSM is described in the Appendix.) The first component loop includes 50 components across two subsystems ("compilers" and "*"), which form the high-level subsystem taskdefs. The second component loop contains 38 components across four subsystems ("types", the two subsystems that comprise the subsystem util, and the high-level subsystem "*"). The 88 total components involved in realized component loops imply a probability of 75% that a randomly chosen component is involved in a realized component loop. Since the sequencing algorithm on the hierarchical DSM is constrained by the actual hierarchy of the software architecture, the number of components involved in realized loops will always be greater than or equal to the number of components involved in intrinsic loops.[5]

[5] Because many of the components in the realized component loops are not dependent on other components within the realized component loops, we also consider the size of the realized component loops minus these unconnected components. (These are the components with empty rows and columns in the sub-matrices along the diagonal that define the two realized component loops of Ant 1.3.) We take this distinction into account in our analysis that relates component loops and bugs.
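One way to see what intrinsic component loops and system cyclicality measure is to compute them on a toy dependency graph. The sketch below is an assumption-laden illustration rather than the authors' sequencing procedure: it swaps in Tarjan's strongly connected components algorithm, which identifies the same kind of minimal mutually dependent groups, and it does not attempt the realized-loop algorithm from the Appendix. The function names and the five-component graph (Figure 4's A to D plus a hypothetical independent component E) are illustrative only.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps each component to the components it depends on."""
    index_of, low, on_stack = {}, {}, set()
    stack, result = [], []
    counter = [0]

    def visit(v):
        index_of[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index_of[w])
        if low[v] == index_of[v]:          # v is the root of a component
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            result.append(scc)

    for v in graph:
        if v not in index_of:
            visit(v)
    return result


def intrinsic_cyclicality(graph):
    """Fraction of components that sit in a loop of two or more components."""
    loops = [scc for scc in strongly_connected_components(graph) if len(scc) > 1]
    in_loops = sum(len(scc) for scc in loops)
    return in_loops / len(graph), loops


# A depends on B, B on C, C on A (a loop); D depends on B; E is independent.
graph = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["B"], "E": []}
fraction, loops = intrinsic_cyclicality(graph)
```

In this toy graph only A, B, and C form a loop, so three of the five components count toward cyclicality; D depends on the loop without belonging to it. The same ratio applied to the counts reported for Ant 1.3 gives 21/117 (about 18%) for intrinsic loops and 88/117 (about 75%) for realized loops.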

Figure 6: Sequenced hierarchical DSM of Ant 1.3

Next, we develop a theoretical argument for how component loops lead to greater bug creation. Then, in Section 3, we empirically test the resulting hypotheses by using the views of component loops presented here.

2.4 Hypotheses: The Effects of Component Loops on the Creation of Bugs

This paper argues that certain architectural patterns of a system can significantly impact its number of defects. Although many bugs are uncovered and fixed during the development and testing of a software application, many bugs get shipped with the system and are uncovered by its users. We focus on this latter type of defect. In general, bugs represent undesired behaviors of software systems. Based on findings from the process system literature, we would expect that an important source of bug creation would be the presence of component loops, since they are likely to trigger iterative problem solving (Roberts et al. 2006). Iterative problem solving typically corresponds to difficult and recursive problems that require making assumptions, iterating, and/or compromising, a process which may not converge easily and therefore carries a higher risk of residual errors than serial or parallel problem solving (Eppinger et al. 1994; Krishnan et al. 1997; Terwiesch et al. 2002). Moreover, as more components are involved in such iterative problems, the probability of convergence on a feasible solution decreases (Mihm et al. 2003; Smith and Eppinger 1997a), which can increase the risk of embedding bugs into the system. Hence, we hypothesize that:

H1: The larger the fraction of components of version s involved in component loops, the greater the number of bugs associated with version s.

Our first hypothesis conjectures that the presence of component loops will increase the risk of having bugs in the system. However, as discussed in the previous sub-sections, there are various types of component loops. Intrinsic component loops involve the minimum set of components with coupled dependencies, assuming that they can be developed together without any hierarchical constraints. However, because source code is organized into modules and subsystems, intrinsic component loops are typically augmented by other components that share subsystem membership. Hence, realized component loops could provide a more realistic indication of the effects of loops as perceived by the developers. The difference between realized and intrinsic component loops is thus the addition of components to the intrinsic component loops due to the hierarchies of the architecture. The addition of extra components to intrinsic component loops to form the realized loops has two important effects on the risk of introducing bugs. First, it increases the size of the component loop and therefore (artificially) increases the size of the iterative problem to be solved, which in turn could increase the risk of creating bugs. Second, and more importantly, the additional components could introduce noise into the component loops that could increase the distance between the components involved in the intrinsic loops (the potential root cause of the bugs). This not only makes the iterative problem more difficult due to lack of precision and stability of the information exchanged but also makes it less visible to the developers (Terwiesch et al. 2002).
Iterative problem solving is even more problematic when it is not foreseen by the developers (Pich et al. 2002; Sommer and Loch 2004). Hence, we argue that realized component loops can lead to a higher number of bugs, because they are more likely to hide and disaggregate the intrinsic component loops, which otherwise could receive greater focus from the developers. The more extra components involved in realized component loops, the greater this effect, and the higher the risk of creating more bugs. This leads to our second hypothesis:

H2: Realized component loops have a stronger positive effect on bug creation than intrinsic component loops.

3 Empirical Study: The Apache Open Source Foundation

To test our hypotheses, we study readily accessible, open-source, Java-based software applications from the Apache Foundation, one of the largest, best-established, and most widely studied open source communities of developers and users who share values and a standard development process (Roberts et al. 2006). The Apache Foundation aims to create high-quality software that leads the way in its field. We examined all the Java-based applications developed by Apache, focusing on Java because (1) it is one of the most widely used and open object-oriented programming languages, and (2) it captures components and their dependencies in a structured and explicit manner in its source code. This minimizes the risk of having components or dependencies being masked in the source code and only appearing later at the time of compilation.

In total, we identified 69 Java-based development projects at the Apache Foundation in mid This provided our initial database. To effectively examine a causal relationship between architecture characteristics and quality, we needed to obtain a longitudinal dataset, so we down-selected to the 37 applications for which we could access data for successive major releases. That is, we discarded 32 projects because they had a limited history of only one or a few minor releases. From the 37, we selected the applications for which we could access, for successive major releases, their precompiled ("pre-built") source code (to codify product architecture features), their bug reports (to determine the number of bugs), and their release notes (to determine the innovative features and other control variables). After data purification, we compiled a set of 126 releases representing 20 applications with an average of 6 major releases (or versions) each.
We used three different sources of data: bug tracking systems, precompiled .jar files[6], and release notes. First, we examined the Bugzilla and Jira bug tracking systems of the Apache Foundation to obtain all the data for the bugs associated with each release. Each of these systems allows users and developers to enter bug reports, which are classified by their potential severity and processed by the development team in a structured way. All bugs that are not fixed by a developer during the writing of the source code, and therefore get released with the application, go through this process. These databases thus record the status and closure of each bug associated with any release. We developed a web crawler to automate the gathering of the bug data. Second, we downloaded the precompiled versions (as signified by an existing .jar file) of each application available from the Apache archives and/or the application's website, selecting the versions considered major releases. We did not normally use minor releases since these typically involve relatively small changes. We used a commercially available software application developed by Lattix to translate the structure of the source code captured in the .jar file into a matrix representation such as the ones shown in Figures 5 and 6. Finally, we consulted the release notes of each version of all the applications in our sample to find data on newness, age, and other important controls.

Footnote 6: Jar files contain all the Java class specifications (including the dependencies among them) for a given Java-based software application.

3.1 Dependent variable

Number of bugs associated with version s of application i ($y_{is}$). Our main dependent variable counts all the bugs that have been formally identified and attributed to version s of application i. The identification of a bug is carried out by developers or users (with confirmation by developers) after the release. Hence, this variable does not measure the capability of the development organization to discover bugs. Rather, it is a proxy for the number of actual defects embedded in version s of application i. As mentioned, we used the Bugzilla and Jira bug tracking systems as the data sources to quantify this variable.
Out of the complete list of bugs entered into these systems, we discarded any items that could not be verified as actual bugs by the developers (Classification: WORKS_FOR_ME or INVALID for Bugzilla and "Cannot Reproduce" or "Not A Problem" for Jira). We also discarded any bugs that the developers considered duplicates of bugs already registered in the system (Classification: DUPLICATE for both Bugzilla and Jira). Attribution of a bug to a code version was primarily determined by the classification in the system (according to the data field "Affected Versions"). If no version was explicitly given in the bug description, we assumed the bug belonged to the most recently released version with respect to the bug entry date.

3.2 Independent variables

Our key predictor variable is the extent to which version s of application i contains the various types of component loops (as discussed in section 2) in its source code. Because we can identify component loops in the presence or absence of the constraints imposed by the hierarchical assignment of components to subsystems, we define three types of component loop measures:

Intrinsic cyclicality ($P_{I,is}$) is the probability that a randomly chosen component in version s of application i belongs to an intrinsic component loop. (Recall that intrinsic component loops are defined by the set of components that share coupled dependencies in a sequenced flat DSM such as the one shown in Figure 5.) To determine $P_{I,is}$ we divide the number of components involved in loops in the flat DSM of version s of application i ($C_{I,is}$) by its total number of components ($N_{is}$). Hence, $P_{I,is} = C_{I,is} / N_{is}$.

Realized cyclicality ($P_{R,is}$) is the probability that a randomly chosen component of version s of application i belongs to a realized component loop. This measure is a function of the number of components that are involved in component loops determined while maintaining the constraints of the subsystems and layers used by programmers to organize their code ($C_{R,is}$). To identify $C_{R,is}$ we count the number of components in loops in the sequenced hierarchical DSM of version s of application i, such as the one shown in Figure 6. Hence, $P_{R,is} = C_{R,is} / N_{is}$.

Reduced realized cyclicality ($P_{RR,is}$) is the probability that a randomly chosen component in version s of application i belongs to a reduced realized component loop. This measure is similar to $P_{R,is}$ but subtracts from its numerator the number of components that do not depend on any other component within the loop.

3.3 Control variables

We include two sets of control variables.
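Returning to the cyclicality measures of Section 3.2, the ratio $P_{I,is} = C_{I,is} / N_{is}$ can be illustrated with a minimal sketch. It is not the procedure used in the study, which derives loops from sequenced DSMs produced by the Lattix tool. Instead it assumes that a component counts as being in an intrinsic loop whenever it can reach itself through at least one other component, and it finds such components with Warshall's reachability algorithm. The function names, the convention that `dsm[i][j] == 1` means component i depends on component j, and the five-component example are all hypothetical. Realized and reduced realized cyclicality are not covered, since they require the subsystem hierarchy.

```python
from typing import List


def transitive_closure(dsm: List[List[int]]) -> List[List[bool]]:
    """Boolean reachability (Warshall's algorithm).

    reach[i][j] is True when component i depends on component j through
    one or more dependency steps. Self-dependencies on the diagonal of
    the input are ignored (an assumption of this sketch).
    """
    n = len(dsm)
    reach = [[bool(dsm[i][j]) and i != j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def intrinsic_cyclicality(dsm: List[List[int]]) -> float:
    """Fraction of components that lie on at least one dependency cycle."""
    n = len(dsm)
    if n == 0:
        return 0.0
    reach = transitive_closure(dsm)
    components_in_loops = sum(1 for i in range(n) if reach[i][i])
    return components_in_loops / n


# Hypothetical flat DSM with five components A..E (indices 0..4):
# A -> B -> C -> A form a loop, D depends on A, E is isolated.
example_dsm = [
    [0, 1, 0, 0, 0],  # A depends on B
    [0, 0, 1, 0, 0],  # B depends on C
    [1, 0, 0, 0, 0],  # C depends on A
    [1, 0, 0, 0, 0],  # D depends on A
    [0, 0, 0, 0, 0],  # E has no dependencies
]
p_intrinsic = intrinsic_cyclicality(example_dsm)  # 3 of 5 components are in a loop
```

In this example D depends on a loop member but is not itself in a loop, so it does not count toward the numerator. The cubic running time of Warshall's algorithm is acceptable for a sketch; a strongly-connected-components algorithm would scale better on large applications.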
First, we control for exogenous, non-architectural features of the application that are likely to affect the creation of bugs. Second, we control for architectural characteristics that relate to the direct and indirect connectivity among the components of the application so as to test more precisely whether and how system cyclicality might influence bug creation.

Non-architectural Controls

Age of application at version s. This is measured by the number of days since the first release of the application. This assumes that the application is officially born on the date of the first major release and ages with successive releases. The cumulative time between releases is likely to increase both the complexity of and knowledge about the architecture. Since these factors are likely to affect bug creation, it is important to control for the age of the application.

Days since last release. The time between successive releases varies within and across applications, so it is important to control for the time span between the previous release and version s. The longer this time, the higher the probability that more changes will have been introduced into the application, which could ultimately affect bug creation.

Newness of application at version s. New features and incremental changes to existing features add uncertainty and complexity to the structure of the application. Implementing these types of changes not only consumes development resources but also is likely to introduce unforeseeable perturbations to existing features. Hence, the number of new features and incremental changes in an application is likely to affect the creation of bugs. Using the information from the release notes, we capture both the number of new features and incremental changes associated with each release. New features add functionality, while incremental changes modify existing functions.
We measure the newness of version s with two control variables that count the numbers of new features and incremental changes, respectively.

Architectural Controls

The following variables are measured for version s of application i:

Size of jar file. The overall complexity of a system is a function of the amount of information it carries. We expect more complex software systems to generate a larger number of bugs. We use the size of the jar file (in kilobytes) as a proxy of the raw complexity of the source code. This variable measures the volume of information associated with the software architecture, but it does not capture how such information is broken down into components and how these components interact.

Number of nominal subsystems. The application source codes in our data set are complex systems formed by interrelated components. To manage the complexity, developers group the components (Java classes) into subsystems. Typically, subsystems group components that collectively perform certain functions. Such a grouping is likely to affect the cognitive ability of the team to understand the architecture of the source code, and therefore it may influence their propensity to create bugs. Because the assignment of each component to a subsystem is well codified by the naming convention, we are able to count the number of distinct subsystems. Note that this measure counts only the number of component-based subsystems, not any higher-level subsystems that group together only other subsystems.

Number of components ($N_{is}$). The number of components into which the source code has been decomposed is a basic dimension of system complexity that conditions the architecture of the system and for which we must therefore control (Kauffman and Levin 1987).

Internal system connectivity. We use two measures to control for the direct and indirect connectivity among components:
o Direct connectivity ($K_{is}$) measures the number of direct connections among components (Kauffman and Levin 1987).
o Indirect connectivity measures the number of non-zero cells of the binary visibility matrix of the system after subtracting the system's DSM.
The visibility matrix (V) of a system is a square matrix (similar to the DSM) whose non-zero cells ($v_{ij}$) indicate that component i is connected to component j via a finite number of intermediary components. The visibility matrix is obtained by raising the DSM (D) to successively higher powers via Boolean multiplication until the number of empty cells in the resultant matrix stabilizes (MacCormack et al. 2006; Sharman and Yassine 2004; Warfield 2000). Hence, to measure indirect connectivity we count the number of non-zero cells in V - D. Note that because V captures both direct and indirect connectivity we must subtract D from V to control for these effects separately.

Number of component loops. Because our key independent variables do not explicitly control for the number of component loops present in the source code, we include a control for it whose value depends on whether we are considering intrinsic or realized component loops.

Table A (in the Appendix) shows descriptive statistics and correlations between the variables included in our analysis. There were, on average, 101 bugs, 8 new features, and 23 incremental changes associated with each release.

4 Analysis and Results

Our dependent variable is the number of bugs. Several features of our data make statistical analysis a non-trivial task. Because our dependent variable is a skewed count variable (taking non-negative values only), standard ordinary least-squares regressions can lead to inefficient and biased estimates. To deal with this issue, statisticians recommend using Poisson-like regression models developed explicitly to model the count nature of the dependent variable (Cameron and Trivedi 1998). Because the variance is significantly larger than the mean of our dependent variable, negative binomial regression models provide a more accurate estimate of the standard errors of the coefficient estimates of our regression models (Cameron and Trivedi 1998; Hausman et al. 1984). We estimate a model of the form (Cameron and Trivedi 1998, p. 279):

$$E[y_{is} \mid x_{is}, \alpha_i] = \alpha_i \exp(x_{is}'\beta)$$

That is, our regression models predict that the expected number of bugs of version s of application i depends exponentially on a set of linearly independent regressors ($x_{is}$). The exponential form of our model ensures that the predicted expected number of bugs is always greater than zero. The $\beta$ coefficients shown in Table 1 are estimated by fitting the model to data. The coefficient $\beta_j$ gives the proportionate change in the expected mean if the j-th regressor changes by one unit (strictly, a one-unit increase multiplies the expected mean by $\exp(\beta_j)$, which is close to $1 + \beta_j$ for small coefficients). A significantly positive $\beta_j$ coefficient indicates that, all else being equal, an increase in regressor j increases the expected number of bugs, whereas a significantly negative $\beta_j$ coefficient indicates that, all else being equal, an increase in regressor j decreases the expected number of bugs. Of particular interest are the $\beta$ coefficients for our key independent variables. A coefficient $\beta_{\text{iteration\_propensity}}$ significantly greater than zero would indicate that the greater the iteration propensity in version s of application i, the greater the expected number of bugs. This would be in line with hypothesis H1.

The $\alpha_i$ are application-specific effects, which can be either fixed or random. These effects permit observations of the same application to be correlated across versions, thereby building serial correlation directly into the model. In a fixed-effects model, the $\alpha_i$ absorb time-invariant, unobserved, application-specific features. By doing this we effectively control for any unobserved factors such as the culture of the development team associated with each application, since these are likely to differ across applications but much less likely to change for the same application over successive releases. For the random-effects model, the $\alpha_i$ are iid random variables which can be estimated by assuming a distribution for $\alpha_i$ (typically a gamma distribution). We report estimates based on the fixed-effects model, which are consistent with the random-effects estimates of those models that pass the Hausman specification test to use random effects (Hausman et al. 1984).
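The multiplicative form of the conditional mean can be illustrated with a short sketch. It only evaluates $\alpha_i \exp(x_{is}'\beta)$ for given inputs; it does not estimate a negative binomial model. The function name and every numeric value are hypothetical and are not taken from Table 1.

```python
import math
from typing import Sequence


def expected_bugs(alpha_i: float, x_is: Sequence[float], beta: Sequence[float]) -> float:
    """Conditional mean alpha_i * exp(x_is' beta) of the count model."""
    if len(x_is) != len(beta):
        raise ValueError("x_is and beta must have the same length")
    linear_index = sum(x * b for x, b in zip(x_is, beta))
    return alpha_i * math.exp(linear_index)


# Hypothetical application effect, coefficients, and regressor values.
alpha = 2.0
beta = [0.5, -0.1]
x_base = [0.25, 3.0]
x_plus_one = [1.25, 3.0]  # first regressor increased by one unit

base = expected_bugs(alpha, x_base, beta)
bumped = expected_bugs(alpha, x_plus_one, beta)

# The ratio depends only on beta[0], not on alpha or on the other regressor.
ratio = bumped / base
```

Because the application effect enters multiplicatively, it cancels in the ratio, which is why a coefficient can be read as a proportionate effect that is the same for every application.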
Finally, because software development technologies may change significantly from year to year and such developments might affect bug creation across all of the applications, we include indicator variables associated with the year of each release.

Table 1 provides the coefficient estimates of the models predicting the expected number of bugs. Model 1 includes a first set of control variables. This model shows that the effect of time since last release is positive and significant, indicating that the longer the time between releases, the greater the likelihood of introducing a larger number of bugs. Model 2, which includes the rest of the control variables, suggests that neither the number of components (N) nor the number of direct connections among them (K) is a significant determinant of bug creation. However, the significant, negative coefficient of indirect connectivity suggests that the propagation of information through intermediary components is likely to reduce the number of bugs. To understand this further, we estimated two additional models (not shown in Table 1) in which we distinguish feed-forward and feedback indirect connectivity (in both flat and hierarchical sequenced DSMs). The results of these alternative models indicate that it is feed-forward indirect connectivity, not feedback indirect connectivity, that is significantly and negatively associated with the number of bugs.

Models 3, 4, and 5 include our three measures of system cyclicality, respectively. These models also control for the number of component loops. Model 3 shows a positive (yet not significant) coefficient estimate for intrinsic cyclicality, whereas Model 4 shows a positive and significant coefficient estimate for realized cyclicality. Finally, Model 5 shows that the effect of reduced realized cyclicality is positive but not significant. Hence, Model 4 offers the strongest empirical evidence to support H1. That is, the greater the probability that a randomly chosen component belongs to a realized component loop, the larger the expected number of bugs associated with such a version of the application. The fact that Model 4 (and not Models 3 or 5) shows the largest and only significant effect of cyclicality on the number of bugs provides empirical support for H2. Based on a test of means, the coefficient estimate of realized cyclicality (Model 4) is significantly larger than those of both intrinsic and reduced realized cyclicality shown in Models 3 and 5, respectively. Hence, it is not only the presence of intrinsic component loops that may increase the risk of creating bugs, but also the fact that such cycles may be hidden from the developers by the presence of other components in the source code.
Our results suggest that increasing the size of the subsystems whose components are involved in loops (even if they are not connected to the other components within the realized component loop) increases the risk of masking the design cycle itself and therefore the risk of creating bugs in the system.

4.1 Bug Fixing

To gain further insight into the relationship between system cyclicality and quality, we also examine the determinants of bug fixes. Bug fixes are measured by the number of bugs associated with version s of application i that have been fixed by the developers, as reported by the bug tracking systems. Analyzing the determinants of bug fixing is particularly challenging because it depends on


More information

Requirements engineering and quality attributes

Requirements engineering and quality attributes Open Learning Universiteit Unit 2 Learning Unit 2 Requirements engineering and quality attributes Contents Introduction............................................... 21 2.1 Important concepts........................................

More information

LECTURE 11: PROCESS MODELING

LECTURE 11: PROCESS MODELING LECTURE 11: PROCESS MODELING Outline Logical modeling of processes Data Flow Diagram Elements Functional decomposition Data Flows Rules and Guidelines Structured Analysis with Use Cases Learning Objectives

More information
