UNIVERSITÀ DEGLI STUDI DEL SANNIO, DIPARTIMENTO DI INGEGNERIA


PhD Programme in Information Engineering (Dottorato di Ricerca in Ingegneria dell'Informazione), Cycle XVII

Doctoral Thesis
Empirical Validation of Pair Programming (Validazione Empirica del Pair Programming)

Supervisors: Prof. Aniello Cimitile, Prof. Gerardo Canfora
PhD Candidate: Ing. Corrado Aaron Visaggio
Coordinator: Prof. Michele Di Santo
Academic Year

Acknowledgements

I'd like to thank Prof. Aniello Cimitile and Prof. Gerardo Canfora for their helpful guidance throughout the three years of my study, teaching me how to study, how to carry out research, and how to evaluate my work. I hope they will give me the opportunity to follow in their steps forever.

I'd like to thank my father and my mother for staying at my side all the way and giving me the advice and the strength to reach my final goal; they taught me to reject the fear of falling down and always to have the will to stand up and go on.

I'd also like to thank Gioia for her understanding, patience, and help during the hardest moments of weakness. Without her smile nothing could have been done. I'd also like to thank Gloria for being there every time I needed to talk with her.

I'd like to thank Prof. Mario Piattini, Felix Garcia, and Marcela Genero for their fruitful collaboration. I appreciate them on both the human and the professional side. The deep relationship linking me to them is based on pure esteem and respect. I'd like to thank Prof. Giuseppe di Lucca and Emilio Bellini for their collaboration in the research.

I'd like to thank all the researchers and administrative staff of the Research Centre of Software Technology (RCOST) for giving me all the organizational and human support to accomplish my research work in a very pleasant way. I'd like to thank the University of Sannio for giving me the opportunity to take part in the PhD program.

Benevento, July
Corrado Aaron Visaggio

ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER I: INTRODUCTION
  THE PLAN DRIVEN APPROACH TO SOFTWARE DEVELOPMENT: SOME LIMITS EMERGE
  THE AGILE METHODS: MOTIVATIONS AND PURPOSE
  THE INVESTIGATION: PAIR PROGRAMMING UNDER THE LENS
  THE OVERALL ORGANIZATION OF THE THESIS
  BIBLIOGRAPHY
CHAPTER II: THE DEBATE AROUND AGILE METHODS
  THE ENTHUSIASM
  THE SKEPTICISM
  WHAT IS MISSING IN ORDER TO COMPLETE THE PICTURE?
  THE STATE OF AGILE METHODS
  BIBLIOGRAPHY
CHAPTER III: THE STATE OF THE ART
  EMPIRICAL SOFTWARE ENGINEERING: THE IMPORTANCE OF THE EVIDENCE
  EVIDENCE ON PAIR PROGRAMMING
  FURTHER INVESTIGATIONS
  THE RESEARCH PLAN
  BIBLIOGRAPHY
CHAPTER IV: PRODUCTIVITY OF PAIR PROGRAMMING
  THE EXPERIMENT ON PRODUCTIVITY
  EXPERIMENT'S CHARACTERIZATION
  ANALYSIS OF DATA
  STATISTICAL TESTS
  THE EXPERIMENT ON STABILITY OF PRODUCTIVITY
  ANALYSIS OF DATA
  STATISTICAL TESTS
  CONCLUSIONS
CHAPTER V: PAIR PROGRAMMING AND KNOWLEDGE LEVERAGING
  THE PROBLEM: WHEN COULD PAIR DESIGNING BE THE PROPER SOLUTION?
  THE EXPLORATORY EXPERIMENT
  EXPERIMENT DESCRIPTION
  THE RESULTS
  THREATS TO VALIDITY
  PRELIMINARY CONCLUSIONS
  THE FOCALIZED EXPERIMENTS
  INTRODUCTION
  THE EXPERIMENTS
  THE REPLICA IN SPAIN
  ANALYSIS OF DATA
  THE KNOWLEDGE DIFFUSION
  THE KNOWLEDGE ENFORCEMENT
  COMPARING KNOWLEDGE DIFFUSION AND ENFORCEMENT
  EXPERIMENTAL THREATS
  THE INDIVIDUAL BACKGROUND AS FACTOR OF SUCCESS OF PAIR DESIGNING
  DESIGN AND RESULTS
  CONCLUSIONS
  BIBLIOGRAPHY
CHAPTER VI: PAIR PROGRAMMING AND DISTRIBUTED SOFTWARE DEVELOPMENT
  INTRODUCTION
  THE EXPERIMENTS
  THE FIRST EXPERIMENT
  QUALITY'S RESULTS
  EXPERIMENT'S REPLICA
  THE DISMISSAL PHENOMENON: CAUSES AND REMEDIES
  EXPERIMENTAL VALIDITY
  CONCLUSIONS
  BIBLIOGRAPHY
CHAPTER VII: CONCLUSIONS
  THE MAIN CONCERN ABOUT AGILITY
  PRODUCTIVITY AND STABILITY OF PAIR PROGRAMMING THROUGHPUT
  KNOWLEDGE TRANSFER WITH PAIR PROGRAMMING
  DISTRIBUTION CAN AFFECT PAIR PROGRAMMING
  LIMITS OF THE EXPERIMENTATION
  FUTURE WORK
APPENDIX A
  FORM
  FORM
  FORM
APPENDIX B
  USER_BRENCH REGISTRATION
  BOOK SEARCH AND SELLING TRANSACTION
  CLASS DIAGRAM 1: ENTITY CLASSES
  CLASS DIAGRAM 2: CONTROL CLASSES
  CLASS DIAGRAM 3: PRESENTATION CLASSES
  QUESTIONNAIRE QA
  QUESTIONNAIRE QB
APPENDIX C
  EXTREME PROGRAMMING
  SCRUM
  CRYSTAL FAMILY OF METHODOLOGIES
  DYNAMIC SYSTEMS DEVELOPMENT METHOD
  FEATURE DRIVEN DEVELOPMENT

Abstract

In the last decade, interest in agile methods has grown steadily and several companies have adopted them in software production. However, the agile approach to the software process seems to deny and contradict some founding rules and good practices of software engineering. This has led researchers and practitioners to take three different positions with regard to the agile approach. The first includes the advocates of the agile approach, who think that it is adequate for any context of software production. The second includes those who think that the agile approach is not suitable for software development at all. The third holds that the agile approach has both advantages and limitations, and that both must be precisely identified.

The main concerns regard the following issues:

Without the definition of a plan to execute and an accurate phase of analysis and design, the performance of the project could deteriorate. The question is: do the costs (and the risks) of the process increase?
Without proper documentation, is the knowledge about the product and the process transferable, understandable, and maintainable?
Is the agile approach suitable for all software development contexts, or are there conditions in which it is better to avoid it?

The body of knowledge about the agile approach lacks empirical studies that demonstrate, through quantitative analysis, which positive aspects to preserve and which drawbacks to avoid. This thesis aims to provide empirical evidence about the agile approach with regard to these aspects. Studying all the agile practices was not feasible within the scope of a thesis; thus, one agile practice was selected: pair programming. This practice was chosen because it seemed to exemplify the more general concerns about agile methods. The research on pair programming was developed along three directions:

Is pair programming convenient in terms of the cost/benefit ratio?
Is pair programming effective for managing knowledge?
Is pair programming suitable for every context?

The research was conducted through an empirical investigation and produced the following results:

Pair programming may lead to higher productivity than solo programming. Furthermore, the throughput of the practice shows greater stability and predictability than that of solo programming.
Pair programming has a specific benefit: it supports the diffusion and enforcement of tacit knowledge among the project's members. It cannot take the place of documentation as a source of knowledge, but it can effectively improve its usage.
Pair programming needs tight collaboration and fluent communication. Without these two conditions the advantages of pair programming are likely to be lost. This is a limitation of the practice and hinders its successful application in every context. Distributed software development is a case where pair programming did not produce the expected benefits, because collaboration and communication are usually poorer than when the pair's members are co-located.

Chapter I: Introduction

For a long time the plan-driven approach to software development was considered the appropriate way to execute a software process. As a matter of fact, this approach brings many advantages, such as ensuring high levels of maturity, obtaining quality products, and controlling and improving the process. Practice has shown, however, that plan-driven approaches are not successful in every context: they can fail in environments characterized by frequent and unpredictable variability. Agile methods were introduced to satisfy the need for a proper method of producing software in these contexts. The aim of this thesis is to validate the effectiveness of agile methods. This chapter discusses the goal of the research, the method followed to achieve it, and the motivations which led to investigating the effectiveness of agile methods as an alternative to plan-driven processes for software development.

The Plan Driven Approach to Software Development: some limits emerge

The term plan-driven refers to a process completely described by a comprehensive definition, including the details of the process model, the procedures to be executed, and the precise responsibilities assigned to the process roles involved. The plan-driven approach to developing software is widely considered a synonym for rigor and discipline. The exemplary instance of a plan-driven process is the Waterfall Process Model [10], where the flow of the process is sequential and the execution of each activity depends on the completion of the previous ones. The basic idea of the plan-driven approach is to identify and clearly distinguish the different phases of the process, such as requirements definition, analysis, design, coding, testing, and installation.
The concept of an organization's maturity, popularized by the family of Capability Maturity Models (CMM) [7] from the Carnegie Mellon Software Engineering Institute (SEI), emphasizes some aspects of process quality, such as comprehensive definition, exact repeatability, continuous monitoring, and evaluation of performance through metrics. Consequently, on the one hand the quality of a process is considered tightly dependent on these aspects and thus linked to the organization's maturity; on the other hand, respecting these constraints entails a certain stiffness in software production: it is mandatory to execute precise routines to meet certain goals and to produce specific documentation within defined tasks, and each deviation from the plan is usually considered a threat to process quality.

This approach keeps processes under control and makes them repeatable and measurable, but it also gives them a certain rigidity. Consequently, the organization may not be flexible enough to sustain changes in the availability of resources, in time pressure, in the stability of requirements, and so forth. Researchers and practitioners have attempted to define methods and process models with a twofold purpose: assuring high levels of quality, as the plan-driven approach does, and enabling the organization to respond properly to changes in some variables of the operative context. The continuous proposal of new process models, such as Boehm's spiral model [3], Rapid Application Development (RAD), the Rational Unified Process [6], and Prototyping, demonstrated the urgency of finding a trade-off between quality and flexibility. The founding principles and values of the Agile Movement were introduced in the Agile Software Development Manifesto [1], written in 2001 by a group of practitioners and researchers who had supported agile methods since the early nineties. The Agile Manifesto, discussed in the following section, proposes a radical change of direction with respect to plan-driven development, in order to achieve flexibility without deteriorating the quality of either processes or products. The main concern is that the Agile Manifesto (apparently) contradicts some fundamental propositions of process quality as intended by the plan-driven approach.

The Agile Methods: Motivations and Purpose

The agile movement was motivated by the need to cope with the turmoil typical of certain software development environments without either overrunning the schedule or increasing the budget; the constraints imposed by these situations usually hinder the successful application of plan-driven approaches.
The term agility refers to the capability to face a frequently and rapidly changing operative context without significantly affecting the costs and quality of the product. The Manifesto indicates a set of principles, named values, to be followed in order to perform an agile software development process. In the following, the agile values are discussed, highlighting the differences from the plan-driven definition of quality.

Individuals and interactions over processes and tools. In the plan-driven approach, knowledge about the process is managed through process definitions, while collaboration and team coordination are enabled by proper tools, according to precise routines and procedures. In the agile philosophy, the knowledge that each team member can offer to solve problems should be available to the whole team; the purpose is to share knowledge quickly in order to overcome pitfalls and remove obstacles as they arise during the project. Tight collaboration among people is strongly recommended, relying on the effectiveness of face-to-face communication in terms of the bandwidth of the information flow and knowledge transfer. The underlying idea is to gather the best people, who know what to do and how, rather than letting everyone execute a process while relying on a precise and accurate process definition and on a wide availability of proper tools to accomplish tasks. The loss of a process definition makes processes unrepeatable: an agile process is a set of practices, well known to the members of the team, executed in a very simple sequence. The idea of process stability needs to be rethought and tailored to agile processes.

Working software over comprehensive documentation. The plan-driven approach requires the production of bulky documentation (e.g. requirements definition, design, test cases, test reports). Furthermore, the implementation of the system begins after a series of activities which do not produce any code: eliciting and formalizing requirements, defining the architecture to be developed, and making rigorous estimates of process performance. This is not negative per se, but in certain situations, characterized by great time pressure and scarce resources, managers and team members prefer to give priority to the production of working software. Working software becomes an effective indicator for evaluating the progress of the project in terms of cost; for increasing the confidence of developers, managers, and customers about the goals already reached and those still to be reached; and for identifying problems or defects as soon as possible in order to reduce their impact on scope, schedule, and cost. Documentation, however, usually helps in understanding the process and the product which the team is developing. What, then, are the effects of limited documentation on the maintainability and transferability of the process and the product?

Customer collaboration over contract negotiation. The plan-driven approach fits very well those contexts where the requirements are completely defined at the project's kick-off. The customer is able to sign a contract which precisely describes the product the software organization is going to release.
Otherwise, producing an up-front design to be respected throughout the project is really hard. In such a case, the agile approach prescribes that the customer take part in the development process; the customer should assess the versions produced at each iteration and provide timely, useful feedback for properly adjusting the design if the product is not satisfactory. This principle does not focus on analyzing the requirements and establishing a high-level architecture, but lets the architecture emerge from the implementation of the code and the resolution of problems at each iteration. The lack of a top-down view of the product could affect its overall quality.

Responding to change over following a plan. When the availability of resources, the requirements, or the schedule change suddenly or frequently, or when resources are particularly poor, following a well-defined process model may be infeasible or risky. As a matter of fact, the process definition provides a general model of the project, without dealing with the limitations of the actual context in which the project is going to be executed. The agile approach suggests adopting very lightweight processes, which do not have a rigorous definition but leave the project team free to make different decisions during the project according to the stimuli and needs coming from the external environment. This characteristic is intended to allow the team and the project to adapt successfully to unexpected limitations or resource scarcity. This principle is a threat to the capability of making dependable estimates about the project and hinders project management in several respects: the project leader forecasts costs and time relying exclusively on her own experience and evaluation.

Appendix C offers an overview of some of the most widespread agile methods.

The investigation: Pair programming under the lens

The agile approach does not indicate a precise process definition for developing software, but focuses on the methods to be used in the process. It provides guidelines about the practices to be executed and leaves organizations free to adopt these practices in accordance with their own culture, capabilities, business goals, and limitations. The commonalities among the different agile methods are: cycling through small iterations, each of which produces working, even if incomplete, software; requiring the customer to collaborate actively by providing timely feedback on the product under construction; and giving priority to the software to be released over all the other process products. The main concern with the agile approach is the following: even if it promises to give great flexibility to the project team in order to face a dynamic environment, it also seems to contradict and deny a number of rules and good practices of software engineering. Some questions arise: to what extent can agile methods be adopted in industrial settings? What are their benefits? What are their drawbacks? The set of agile practices is too large to study all of them within a thesis: one practice was selected and put under the lens, with the aim of studying as many of its facets as possible. The selected practice is pair programming. Pair programming consists of two developers working on the same code at the same machine: one develops the code, the other reviews it.
The two roles can be switched during the programming session, and the switch usually occurs when one developer cannot proceed. The idea of pair programming is that two phases, coding and reviewing, which are usually separate and far from each other in the process, are overlapped. The underlying hypothesis is that the instantaneous review makes it possible to detect and remove defects right when they are injected into the system. Currently, the debate around agile methods focuses on three main concerns, which are discussed at length in Chapter II:

Q1: Without proper estimates of the process and the product, the return on investment of agile processes can be seriously affected. What is the cost/benefit ratio of agile methods?

Q2: Considering all the points of weakness of agile processes, are there specific benefits of agile methods?

Q3: Without an appropriate process definition, agile methods can fail in certain contexts. Which contexts are suitable for the adoption of agile methods?

Pair programming was selected because it can be seen as an exemplification of the concerns expressed by Q1, Q2, and Q3, which relate to the overall set of agile practices.

Fig. 1. The three dimensions of investigation about Pair Programming. [Figure labels: Pair Programming; Costs/Benefits; Specific Benefits; Suitable Contexts.]

The purpose of the thesis is to clarify the position of agile methods with respect to the plan-driven approach by focusing on the practice of pair programming. In particular, pair programming will be analyzed from the three viewpoints shown in Fig. 1.

Q1: A claimed benefit of pair programming is an increase in quality and productivity. This is due to the continuous review of the code, which lets the pair remove defects from the code when they are injected. Moreover, role switching between the pair's members reduces latency times. The conjecture is that pair programming should ensure high levels of productivity and of stability of effort.

Q2: Pair programming is supposed to leverage knowledge sharing among team members. This benefit is specific to the practice and could be the reason why managers decide to adopt it in other kinds of software process as well.

Q3: Pair programming needs tight collaboration and fluid communication; these are the necessary conditions for performing the practice successfully. This suggests that if these conditions are missing, the benefits of the practice could be lost. Consequently, operative contexts which do not allow tight collaboration and fluent communication could undermine the success of the practice. An example is distributed software development. The conjecture is that the practice may not be adaptable to every context.

The conjecture underlying the research of this thesis is the following: agile methods are not supposed to be better than the plan-driven approach in all cases, but can be a valid alternative in those contexts where the plan-driven approach fails. The purpose is to understand the actual limits and the actual benefits of agile practices and, in particular, of pair programming.

The overall Organization of the thesis

The thesis is organized in six further chapters, as follows.

Chapter II provides an overview of the three major positions of practitioners and researchers on agile methods. There is no common agreement about agile methods, because the state of the art is not mature enough and the greater part of the literature consists of either anecdotal experience or qualitative studies. The first position considers the agile approach a suitable solution for certain operative situations: its holders are the advocates of the agile movement. The second position rejects agile methods altogether, because its supporters consider agility to be too much in contrast with the good practices of software engineering. The third position suggests investigating in depth all aspects of agile methods, in order to preserve the positive aspects and avoid the drawbacks. In the second part of the chapter the most relevant knowledge about agile methods is collected and organized as pros, cons, and open issues.

Chapter III focuses on the current body of knowledge about pair programming and illustrates the research plan followed in the thesis. The first part provides an accurate examination of the most important studies carried out on the practice. From this analysis the motivations for the investigation discussed in this thesis emerge: basically, the state of the art offers poor evidence of the expected benefits of the practice. The second part describes the research plan executed in the thesis.
Chapter IV discusses the experimentation on the productivity and quality of pair programming. One concern about pair programming is the cost of the practice: the organization has to pay two developers for the work of one, at least apparently. The advocates of the practice think that pair programming does not yield the work of one developer and, moreover, that it yields the work of more than a pair of developers. Pair programming should produce better programs, because the review phase occurs at the same time as development; it should shorten the time for delivering the software, because it reduces the amount of rework; and, finally, the switching of roles within the pair helps avoid the latency due to a problem which one developer does not know how to solve.

Chapter V investigates the relationship between pair programming and one specific benefit of the practice: knowledge transfer. Working side by side on the same piece of code, discussing solutions and alternatives, should foster knowledge sharing between the two developers. One of the phases which could particularly benefit from effective knowledge sharing is the design of a software system. As a matter of fact, the different levels of abstraction to be mastered when dealing with software design make experience and good strategies necessary in order to operate successfully. For this reason, the concept of pair programming was applied to the design phase and named pair designing. Thus, the chapter investigates to what extent pair designing enhances knowledge transfer between the members of the pair.

Chapter VI studies the practice in a particular software development context, namely distributed teams. In the last decade the distribution of software processes has become very widespread, for a number of reasons: the need to extend the production cycle up to 24 hours a day, the dispersion of the kinds of competence required to accomplish the same tasks, projects shared by different organizations around the globe, and so forth. Pair programming requires close collaboration and frequent communication, which are usually poor in distributed teams. Consequently, adopting pair programming in a distributed process could deteriorate the performance of the practice. This chapter investigates how distribution affects the outcomes of pair programming.

Chapter VII discusses the conclusions on the three research questions, according to the results of the experimentation carried out. Furthermore, it offers an overview of the open questions left by this research and the consequent future directions of investigation.
Bibliography

[1] Beck K., Beedle M., van Bennekum A., Cockburn A., Cunningham W., Fowler M., Grenning J., Highsmith J., Hunt A., Jeffries R., Kern J., Marick B., Martin R., Mellor S., Schwaber K., Sutherland J., and Thomas D. Manifesto for Agile Software Development, 2001 (accessed on the 23rd of June 2005).
[2] Beck K. Extreme Programming Explained: Embrace Change. Addison-Wesley: Reading, Massachusetts, 1999.
[3] Boehm B.W. "A Spiral Model of Software Development and Enhancement". Computer, 11(4), 1988, IEEE CS Press.
[4] Cockburn A. Writing Effective Use Cases. The Crystal Collection for Software Professionals. Addison-Wesley Professional: Reading, Massachusetts.
[5] DSDM Consortium. Dynamic Systems Development Methods, version 3. DSDM Consortium, Ashford, Eng
[6] Kruchten P. The Rational Unified Process: An Introduction (2nd Edition). Addison-Wesley Professional: Reading, Massachusetts.
[7] Humphrey W.S. "Characterizing the Software Process". IEEE Software, 5(2), 1988, IEEE CS Press.
[8] Palmer S.R. and Felsing J.M. A Practical Guide to Feature-Driven Development. Prentice Hall: Upper Saddle River, NJ.
[9] Rising L. and Janoff N.S. "The Scrum Software Development Process for Small Teams". IEEE Software, 17(4), 2000, IEEE CS Press.
[10] Royce W. "Managing Development of Large Scale Software Systems". Proc. of IEEE WESCON, August 1970.

Chapter II: The Debate around Agile Methods

This chapter discusses the different positions of researchers and practitioners on agile methods, with the aim of identifying the most urgent open issues and indicating the consolidated body of knowledge. The first section presents the standpoints in favor of agile methods; the second discusses the points of weakness; and, finally, the third enumerates the suggestions for successful adoption in real contexts. At the end of the chapter a synoptic table summarizes the three viewpoints in terms of pros, cons, and open issues about agile methodologies. This provides the context and the motivation for the research discussed in this thesis.

The enthusiasm

Some authors believe that agile approaches are suitable for every context of software development. This section offers a comprehensive view of the most representative motivations for such a position. Agile methods are often considered to be in contrast with the practices used in plan-driven processes: this is one of the reasons why many researchers and some practitioners do not support them. Barry Boehm [12] proposes synthesizing the two approaches, agile and plan-driven, rather than considering them polar opposites. The author proposes the planning spectrum illustrated in Fig. 2: unplanned and undisciplined methods occupy the extreme left, whereas micro-milestone planning methods sit at the extreme right. The author compares agile and plan-driven approaches in five key areas:

Developers. Agility is obtained by relying much more on the tacit knowledge embodied in the team than on writing the knowledge down in plans. When the tacit knowledge is sufficient for the life-cycle needs and is well communicated or transferred, things work fine. But there is also the risk that the team makes irrecoverable architectural mistakes because of unrecognized shortfalls in its tacit knowledge.
Plan-driven methods reduce this risk by investing in life-cycle architecture and plans, and using these to facilitate external expert reviews. In this case a different risk must be accepted: when changes are so rapid that the plans cannot keep pace with them, the plans become obsolete or expensive to keep up to date.

Customers. Agile methods are successful when the customers operate in dedicated mode with the development team and their tacit knowledge is sufficient for the full span of the application. These methods risk tacit knowledge shortfalls that plan-driven approaches could avoid by producing documentation and using review boards.

Requirements. Plan-driven methods work best when requirements can be defined in advance and remain relatively stable, with change rates on the order of one percent per month. In today's changing environments, the traditional emphasis on having complete, consistent, precise, testable, and traceable requirements encounters difficult to insurmountable requirements-update problems.

Architecture. As with requirements, plan-driven methods are preferred if the heavyweight architecture can anticipate and accommodate requirement changes.

Size. Plan-driven methods scale better to large projects. The management cost required for large teams may not be justified for small projects.

The author concludes that hybrid approaches combining both methods are feasible and necessary for projects showing a mix of agile and plan-driven home ground characteristics. Agile approaches help to deal with the turmoil in current technology trends, while plan-driven ones provide the dependability that is strongly requested in many marketplace contexts. The best solution lies in the word balance.

Fig. 2. Boehm's Model. [Figure: planning spectrum, listed from left to right: Hackers; XP; Adaptive SW development; Milestone Risk-Driven models; Milestone Plan-Driven models; Inch-pebble ironbound contract. Further labels: Agile methods; Software CMM; CMM.]

In [18] Cockburn and Highsmith recall that two dominant ideas drive agile development: reducing the cost of moving information between people, and reducing the elapsed time between making a decision and seeing its consequences. The first goal is met by: (i) placing people physically closer, (ii) replacing documents with talking in person and at whiteboards, and (iii) improving the team's amicability so that people are more inclined to release and spread valuable information quickly.
The second goal is reached by: (i) making user experts available to the team or, even better, part of the team, and (ii) working incrementally. Since agile methods must respond to turbulent environments, agile teams require responsive people and organizations: in other words, agile teams focus on individual competency as a critical factor in project success.

People who work together with good communication and interaction can operate at noticeably higher levels than when they rely on their individual talents alone. For this reason, agile project teams should focus on increasing both individual competencies and collaboration levels. The authors also remark on the difference between process and competence: process can provide a useful framework for groups of individuals to work together, but process per se cannot overcome a lack of competency, while competency can surely overcome the vagaries of a process: people trump process.

Agile teams are characterized by self-organization and intense collaboration, within and across organizational boundaries. Self-organizing teams can organize themselves again and again as new challenges arise. The project should become an ecosystem: the team has to compensate by itself, in terms of both interaction among people and communication. The agile approach is not meant for everyone: imposing it on a process-centric and non-collaborative organization is likely to fail. Agile teams excel in exploratory problem domains and operate better in a people-centered and collaborative organizational culture.

The same authors, in [16], claim that among the business goals of organizations that develop software, conforming to plans does not have the highest priority: satisfying the customer is much more important. Traditional process management strove to drive variations out of the process, treating them as the result of errors. Today, the external environment can bring about changes: in this case they should not be eliminated but properly handled. The organization should embrace changes without either decreasing quality or increasing costs.
This goal can be achieved by: delivering the product in the first weeks, in order to gather rapid feedback or confirmation; inventing simple solutions that help to prevent costly implementations of changes; continually improving the design; and testing continually in order to remove defects in the early phases.

Agile methods stress two concepts: the unforgiving honesty of working code and the effectiveness of people working together with goodwill. Working code is a real reference for improvements, whereas having effective people gives the team maneuverability, speed, and cost savings. Agile methods consider organizations as complex adaptive systems. Such systems can be briefly described as follows: decisions are decentralized, and independent individuals interact in self-organizing ways, guided by a set of simple, generative rules, in order to create innovative emergent results. Generative rules are a set of procedures executed in all situations in order to generate appropriate practices for special situations. Such behavior calls for people who are highly independent of each other and who make intensive use of their own creativity. The agile approach recommends short iterations in the two-to-six-week range, during which the team makes constant trade-off decisions and adjusts to new information.

These short iterations are combined with feature planning and dynamic prioritization. Development is guided by the features to be integrated into the system, that is, by something the customer can readily evaluate; on the basis of what the customer wishes, the priorities can be modified by adding new features or deleting existing ones. Using agile development methods requires close customer partnership. If the customer, whether internal department representatives or marketing product managers, does not have a good sense of direction and wanders around in strange patterns, agile developers will follow (with occasional admonitions).

Reifer [37] proposes some further interesting observations, based on a survey of the experience that software engineers gathered in deploying agile methods. The author surveyed ten industry segments and 14 firms. The first relevant piece of information is that the firms turned to agile methods mainly when their projects were not being delivered on time and within budget, and that the projects usually lasted one year or less. Products under development were mostly quick-to-market applications, generally web-based and client-server oriented. Project leaders reported that these projects involved all the stakeholders of the process, including the customers, even if the engineers disagreed about how deep the involvement of all the stakeholders was. An interesting finding is that most of the organizations were at level 2 or higher of the SEI Software Capability Maturity Model.
Five of the 14 organizations used benchmarks in order to capture hard data on cost, productivity and quality, with the following results:
productivity: 15 to 23 percent average gain based on published industry benchmarks;
cost reduction: 5 to 7 percent on average based on published industry benchmarks;
time-to-market compression: 25 to 50 percent less time compared to previous projects in participating firms; and
quality improvement: five firms had data showing that their defect rates were on par with those of their other products when products or applications were released.

Most used some form of survey to capture stakeholder opinions, and all used recruitment, morale, and other intangibles to build a case for trying and retaining agile methods. The author concludes with some recommendations. When adopting agile methods, it is important to recognize that the organization is going to change its traditional way of doing business. The organization should be given appropriate support to accomplish the transition towards agile methods successfully. Support should include startup guidelines, how-to checklists, and measurement wizards; a knowledge base of past experience accessible by all; and education and training, including distance education and self-study courses.

Supporters of agile methods can also be found in specific sectors of industrial software production: Greene [24] reports a case of applying agile methodologies in the embedded software industry, in particular in developing the firmware for the Intel Itanium processor family.

Firmware development can be very volatile, with changing requirements and hardware dependencies that force changes without warning. Engineers in the embedded industry are often averse to oppressive processes and can benefit from the collaboration and team interaction that agile approaches emphasize. Agile methods were applied to projects with unique characteristics as well as commonalities with embedded development in other environments. Hardware design is in constant flux as trade-offs are made among silicon area (on microelectronic devices), performance and functionality; the attitude of embracing change is well suited to accommodating this. In the project, the external product requirements were rigorously defined, but the available hardware interface changed. The challenge was to bridge the gap between the architecture and the processor hardware implementation. A key difference between embedded design and pure software development is that processor design domain knowledge, not programming, is the key skill. Pair programming was used for knowledge sharing and code quality, but the team had to acknowledge that individual expertise in the group is a valuable asset. Unfortunately, it was not possible to verify the effects of test-first in a rigorous way: as a qualitative consideration, the practice helped reinforce the importance of testing in the developers' individual work. The increased collaboration that the agile approach fosters and encourages is sometimes positive and at other times a drawback. Some people may refuse to work closely with colleagues, while others may desire closer collaboration, but this concerns general team management: it is not specific to embedded software development.

Efforts have been made to tailor agile methods, like XP, for large, complex projects to achieve faster development cycle times [22], [39]. Reference [14] identifies agile practices which are suitable for large-scale, complex software development.
The characteristics of large and complex projects make agile methods difficult to use; three major issues [20] were identified: (i) the slow spread of application domain knowledge: much effort must be spent in order to understand both the application domain and how the system should perform; (ii) fluctuating and conflicting requirements: they are caused mainly by variations in the policies and business goals of the organization; and (iii) communication and coordination breakdowns: a large number of groups have to coordinate their activities and share information during software development.

Other experiences in the same direction have been reported by further authors: Elssamadisy [22] and Reifer [38] report on large projects where the managers tried to adopt XP. Some practices, such as iterative development, frequent testing and feedback, small releases, and refactoring, are suitable for large projects; some others, such as standing meetings and the use of metaphors, are not. Reference [22] proposes a methodology to tailor agile practices: practice 1: the use of flexible architectures and design patterns is important; up-front stable architectural design results in a strong backbone that

could support several services built on top of it. Up-front architectural design reduces the development time for new features and helps to control the cost of change; furthermore, it provides the developers with a clearer understanding of the entire system and helps them understand how several services can fit into the backbone architecture. The authors suggest accompanying this practice with pair programming, refactoring and short releases; practice 2: deliver end-to-end features in each short iteration, so that the system can continuously accommodate requirement changes; practice 3: use product managers as surrogates for the customer in the development cycle; practice 4: developers should be paired in analysis, design and testing, while coding should be done solo; practice 5: choose the right people for the team and create a collaborative environment to support teamwork; and practice 6: refactoring should be used as a technique to enhance reuse.

The skepticism

There are authors who are not convinced of the appropriateness of agile methods and of their supposed benefits: this section discusses some of the most relevant works supporting this position. Steven Rakitin [35] strongly disagrees with the advocates of agile practices and levels some heavy accusations at the agile movement. He states that scientists and the software industry waited a long time for a disciplined and respected software engineering process. Nowadays, software engineering is becoming predictable and is starting to satisfy the customer's needs: a successful business delivers what the customer expects in the indicated timeframe. According to the author, agile methods encourage undisciplined approaches, which he calls a "hacker's approach": a hacker prefers to write code and to turn to the customer only when she does not know how to solve a problem.
The author provides an interpretation of the agile manifesto from this perspective:
"individuals and interactions over processes and tools" means: talking to people instead of using a process gives the developers the freedom to behave as they wish;
"working software over comprehensive documentation" means: the developers want to spend all their time coding. A very common idea is that real programmers do not write documentation;
"customer collaboration over contract negotiation" means: haggling over details is merely a distraction from the real work of coding. The work on the details has to follow the delivery of the product; and
"responding to change over following a plan" means: following a plan implies that developers have to think about a problem and its likely solution. Developers prefer coding to any other task.

Skowronski [42] claims that agile methods could hinder the work of the best programmers, because they lead to an environment that does not let them isolate themselves from the surrounding world and propose their best solution to a problem. The author supports his thesis by referring to the four phases of the problem-solving process as defined by psychologists: (i) preparation: the problem solver gathers all the information about the problem and starts to prepare possible solutions; (ii) incubation: the problem solver consciously works at the problem; (iii) illumination: this is less a phase than the moment when the solution appears as a flash of insight; and (iv) verification: the flash of insight expands into the complete solution and the problem solver tests it in reality.

The author claims that the agile approach does not help any of these phases. During preparation, the problem solver needs to search for information, which usually means leaving the team and working either alone or outside the team's workspace. Agile methods emphasize the production of code above all, and gathering information or thinking about solutions is not writing code. As a consequence, the preparation phase does not produce anything that can be shown to the customer or to management. The incubation phase can be even more difficult for problem solvers than the preparation phase. After having spent time gathering information, the problem solvers start to think. Oral communication distracts from the work of incubation and, conversely, incubation does not produce anything tangible. The illumination and verification phases could take place in agile teams, but if the members reject the proposed solution, or do not understand or agree with it, they will tend to brainstorm all together in the agile style, contradicting the problem-solving process.
The author concludes: "Programmers with poorer analytical skills could be better suited to this work [with a lot of known problem areas and interactions with customers] than highly skilled problem solvers who might find such work boring. [...] If an application has one or more unsolved problem areas, however, agile methods could be inappropriate. This situation requires good problem solvers, and people skills will be less relevant."

Manhart and Schneider [32] discuss their experience at DaimlerChrysler in introducing agile practices for developing embedded software. They initially proposed integrating two agile practices, Unit Test and Test First, into the traditional software process, in order to build trust in agile practices among managers and developers, who appeared reluctant. A critical point the development unit had to face was the timely delivery of customer-specific functions. Many reasons can bring about a late delivery, such as:
unrealistic planning: development, quality assurance, integration and release cannot be shortened below certain thresholds;
expert unavailable: writing embedded software requires experts with special domain knowledge, who are not frequently available;
changing priority: priorities may be unclear or ignored; and
requirements misunderstood or changed: a software developer or a sales representative with vague ideas about the desired functionality may cause several changes in iteration cycles to clarify requirements.

The business goals to meet were: (i) to increase the amount of customer-specific software-based functions, and (ii) to achieve a zero-bugs policy: buses, as public transportation systems, are safety-critical products. In applying agile practices, some technical challenges were identified:
1. Agile methods cannot be applied as is.
2. Compared with other technologies used in embedded software development, agile techniques are in their infancy. The community does not wish to revolutionize too much at a time.
3. A slow, step-by-step introduction should be integrated with a traditional measurement framework. This will not only foster acceptance by management and engineering staff, but will also provide a sounder basis for judging agile merits.
4. The embedded software community needs reliable data soon, and from many sources. Measurement must be applied to agile elements in embedded software process improvement.

At the First Invited Canadian Workshop on Scaling Agile Methods, held in February 2003 in Banff, interesting highlights emerged [38]. Difficulty arises where teams of teams work together: agile and plan-driven approaches should be used together. The main question is: to what extent can agile methods scale without sacrificing agile principles? Agile practices should be augmented in order to fit large projects. Some researchers claim that the concept of a daily team meeting should scale to a daily intrateam meeting. Other authors discussed the issues associated with properly fitting requirements engineering concepts. Most applications exploit existing architectures and make extensive use of legacy software, commercial off-the-shelf software, and components. Coordinating the use of standard frameworks and components across teams of teams, when agile methods and collective ownership principles are used, was another relevant issue. Others focused on the relationship of the agile approach with legacy systems, product families and product lines.
Some thought that systematically generalizing and aggregating user stories across similar applications into factored-out framework stories could help in managing requirements. Handling distributed development in an agile project is another issue. Synchronizing the beats of the individual teams is a concern, as is the possibility that the slowest or the weakest team would determine the project's overall pace or constitute the major risk. In a large project, assuring sufficient customer involvement is unrealistic. The main problems concern how to mechanize the agile approach without unduly burdening the customer community.

Some lessons learned emerged: instead of talking about what the product functionality would be, developers could show customers products which are potentially shippable; a rough architectural development was needed before pushing ahead with iterations. This could be done quickly by using an architectural team. A stable architecture could provide the team with the context for decision making. This approach poses a threat to agility because it might tip the scale in favor of up-front planning rather than letting the architecture emerge naturally;

packaging components with wrappers that communicate the essentials about their features to those unfamiliar with them would minimize integration problems on large projects; and large teams tend to be less tolerant of change. Adopters of agile methods should therefore be extremely conservative when setting expectations for change.

What is missing in order to complete the picture?

A number of authors think that agile methodologies are neither good nor bad in themselves, but that further studies are needed to understand their real advantages and limitations. The authors of [27] investigated quality-centric supporting software processes by comparing the waterfall model and the agile methodology. They focused on two processes: Software Quality Assurance (SQA), which aims at governing the procedures for building products with the desired quality, and Validation and Verification (V&V), which aims more directly at product quality, including that of the intermediate products. The authors analyze (and propose) the implementation of SQA in an agile process:
system metaphor is used instead of software architecture: it presents a simple story describing how the system works; usually this story is a handful of classes and patterns. The metaphor aims at helping team members communicate with each other and with the customer, and at improving the system architecture;
on-site customer prescribes involving the customer in the development process. This agile practice takes the place of the milestone reviews of waterfall development and is also heavier. The customers should correct and refine requirements;
pair programming is intended to improve design quality and reduce defects through continual code and design review;
refactoring helps restructure an existing piece of code, altering its internal structure without changing its external behavior.
Because each refactoring is small, the system is kept functional during this activity, with the advantage of providing a continuous improvement of the code;
continuous integration allows integration bugs to be detected and removed in the early phases of the process. In the waterfall process integration is performed in the later phases, causing high defect-removal costs; and
acceptance testing occurs much earlier and more frequently in agile development: it is not done only once.

In summary, three main points emerge: (i) many of the agile quality activities occur much earlier than they do in waterfall development; (ii) the frequency of these activities is much greater than in the waterfall model: most of these activities are included in each iteration; and (iii) agile development has fewer static quality assurance techniques.

With the agile approach, developers are more responsible for quality because they are directly involved, rather than relying on a separate team in charge of these activities. The authors conclude that it is very difficult to compare fully the quality produced by the two kinds of processes, because the starting conditions, as well as the costs, are usually different.

Abrahamsson et al. [11] attempt to organize, analyze, and make sense of the wide field of agile software development methods. The authors use five analytical lenses to address the research purposes: software development life-cycle, project management, abstract principles vs. concrete guidance, universally predefined vs. situation appropriate, and empirical evidence.

XP has recently been supplemented with extensions for project management, but they do not offer a comprehensive view. Some authors suggest complementing Scrum with other methods specifically intended for project management. The DSDM approach to project management consists of facilitating the work of development teams with daily tracking of the project's progress. Crystal's solution focuses on increasing the ability to choose the correct method for the purpose.

With regard to the software development life-cycle, DSDM provides development teams with complete life-cycle support, but only at the managerial level. The Crystal family covers the phases from design to integration test. XP, FDD and Scrum focus on requirements specification, design, implementation, and testing up to the system test.

Guidance without practices is useless, and practices need guidance in order to be implemented well. ASD is more focused on concepts and culture than on software practice. Crystal does not provide any practice for realizing the large number of principles it includes. DSDM does not indicate any principle, because each organization is different.
AM and XP were derived from bundles of practices, with the aim of collecting them into the software development process. Scrum defines the practices and offers guidance for the requirements specification phase as well as for integration testing. Implementation is not part of the method.

A methodology usually needs to be adapted to the context: in this sense the authors define the attribute of situation appropriateness. The analysis showed that Crystal and FDD were the least flexible and situation-appropriate methods.

The last viewpoint explores whether the methods rely on any empirical support. ASD, Crystal, FDD, and Scrum are derived from subjective practical experience, but they were developed without relying on any systematic research. Parts of XP have been studied empirically, although it too was created from individual experience. DSDM was developed by a dedicated consortium, which claims that empirical evidence exists in the form of experience reports.

Some relevant conclusions were drawn: (i) project management should be addressed explicitly because it helps to link developers with the project management; (ii) more work is needed in order to determine how the practices should be realized in different organizations; (iii) further studies should explore how each method should be adapted to different situations: this includes the capability to identify and define the set of different situations; (iv) empirical studies are needed in order to obtain

dependable quantitative analysis of agile methods; finally, (v) the methods which cover too much ground are too general or shallow to be used, whereas the methods which cover too little are too restricted or lack a connection to other methods.

An interesting debate is the one held between Tom DeMarco and Barry Boehm on the limits of agile methods, published in [13]. Barry Boehm, as he had already discussed in previous publications, notes that the point of the discussion is to find the right balance between armor and discipline on one side and mobility and agility on the other. The first issue raised is that, in order to adopt agile methods successfully, a trade-off must be reached between the speed gained by saving time on documentation and the increased risk that the same saving entails. The other point is the central role of individual capability. Superb training is necessary in order to put team members in a position to implement the agile practices while minimizing the risk due to the lack of a well-defined process leading the project. Boehm suggests that at least one person with deep domain knowledge, in addition to technical skills, is necessary: the problem is that domain knowledge is not easy to acquire. Organizations prefer to establish a set of rules and a team spirit, rather than to invest in building individual (superb) technical skills.

The last point of discussion concerns the current state of documentation practices in many organizations. Many organizations are so document-centric that the result is a documentary bloat. As a consequence, each new change in the process tends to be additive: a new component is added to the process and is imposed on all projects. Agile methods are a kind of backlash against this widely understood flaw in the resultant fixed process.

Agile methods have been proposed to deal with today's turbulent business and technological environment.
The goal is to enable an organization to deliver and change quickly. However, the applicability of agile approaches is constrained by several factors, such as project size and type, the experience level of project personnel, and access to committed customers. Boehm [12] argues that agile methods are difficult to scale up to large projects because of the lack of sufficient architecture planning, over-focusing on early results, and low levels of test coverage. It is also recommended that agile methods not be used in mission-critical software development. However, in the current dynamic business environments, agility is also needed for large projects that face the same issues addressed by agile methodologies, such as a changing environment, ambiguous user requirements, and time pressure. Agile methods cannot be adopted directly for large, complex projects due to the lack of up-front design and documentation [36].

Most experts agree that the agile and the traditional plan-driven approaches are philosophically compatible [39]. For example, XP practices have been mapped to the Sw-CMM, which is usually considered appropriate for organizations running large-scale projects [34]. The author, Mark Paulk, shows how XP can help organizations realize the Sw-CMM goals. The Sw-CMM focuses both on the management issues involved in implementing effective and efficient processes and on systematic process improvement. XP, on the other hand, is a specific set of practices that is effective in the context of small, colocated teams with rapidly changing requirements.

The author notes that XP generally focuses on technical work, whereas the Sw-CMM focuses on management issues: both methods are concerned with culture. The element that XP lacks and that is crucial for the Sw-CMM is the concept of "institutionalization": XP largely ignores the infrastructure that the Sw-CMM identifies as key to institutionalizing good engineering and management practices. As the system grows, XP's practices become more difficult to implement: XP is targeted toward small teams working on small-to-medium-size projects. The main objection to using XP for process improvement is that it barely touches the management and organizational issues which, on the contrary, the Sw-CMM emphasizes. The XP process is itself clearly defined. The author concludes that XP and the Sw-CMM can be considered complementary: the Sw-CMM describes the objectives (what to do) and XP focuses on the practices (how-to information) for reaching the stated goals.

Cohn and Ford [19] identified several approaches for successfully introducing agile processes in an organization. A positive outcome is that programmers were found to enjoy producing non-code artifacts; a drawback is the so-called "analysis paralysis": they seek opportunities to introduce formalized tasks into agile processes. Agile processes favor frequent meetings with managers, so that they are perceived as a form of micromanagement. Whereas meetings with managers occur once a week or less in a plan-driven process, in an agile process the meeting with the manager is more frequent, up to once a day. This could help developers to remove problems and to reformulate deadlines and constraints; but it could also be perceived as a way to put greater pressure on the team. In order to solve the problem, managers should refrain from continually complaining to team members and instead actively help them to solve problems. Some developers prefer a heavyweight plan-driven process because they believe it looks better on a resume.
Some teams, when first introduced to a scrum (a phase of the SCRUM process), are overwhelmed to the point of inaction by the freedom of not having a day-by-day Gantt chart directing their work. The authors suggest some activities that should help solve this kind of problem: prototyping, requirement capture, analysis and design, implementation, and stabilization. Agile processes do not have separate coding and testing phases; code written during an iteration must be tested and debugged during that iteration. Testers and programmers work together more closely and earlier in an agile process than in other processes. Testers tend to engage in improper activities, such as using early iterations to sneak in coding, and writing unit tests for programmers. In the former case it is better to bring the tester into the development team if he is able to code; in the latter case, it is the author of a program who can write good white-box tests.

A further experience from the field is discussed in [21] by Gery Derbier, concerning the relationship between agile development and the old economy. The project objective was to deliver an automated hub for the French postal operator LPCI, that is, the integration and ramp-up in a large facility of: (i) a complex assembly of automation equipment provided by several companies and sub-contractors;

(ii) a feature-rich information system supporting a complex process with multiple intricate sub-processes and exceptions. Some additional pieces of information can help to understand the project's scope: the facility employed 700 people; 200 international flight departures are handled every day; more than 150 truck arrivals and departures every day; about 30 different process flows are supported; about 80 different workplaces equipped with one of the 20 different applications developed; deployment of the solution within 5 months; and more than 250 tables in the database, about 2000 C++ classes, and executable code lines.

The paper discusses how agile methods helped to face the following challenges: the project was business critical for the customer; it was the first time the customer had used an integrator to provide a turn-key system; and it was a fixed-price and fixed-time contract with a short time frame. Agile methods were chosen because: (i) the overall vision and mission were clearly articulated; (ii) the business was complex to understand, and a significant amount of tacit knowledge was involved; and (iii) there was a huge quantity of new information to acquire. The team used the following planning techniques: one-month structured iterations; items planned in each iteration were use cases of 5-10 days; and incremental specification with use cases, according to the suggestions of Cockburn [15]. The development practices used were: unit testing, frequent integration, continuous testing, and automated regression testing.

The paper also discusses the results of applying agile practices: as a manager, the author writes that the agile approach was a way to make the team learn quickly and keep it constantly alert. However, not all people feel comfortable with uncertainty and agility; it is not easy for people to discuss what went wrong with their own work.
The first solution is to address the emotional side by giving each participant the opportunity to express how he feels about the situation; the second is to try to state the problem and find the solution; automating the tests can be a way to put pressure on the team to deliver correct, working code; estimation processes and techniques are not well known among developers. For a team to be self-organizing, it must truly share the estimation process, which is the root of risk analysis and commitment taking; and it is useful to have in place some techniques for scenario-based evaluation of the software architecture.

The goal of the NSF-sponsored Center for Empirically Based Software Engineering (CeBASE) is to collect, analyze, document, and disseminate knowledge on software engineering gained from experiments, case studies, observations, interviews, expert discussions, and real-world projects. A central activity toward achieving this goal has been the running of e-workshops [28] for capturing expert knowledge in order to formulate heuristics on a particular software engineering topic. The rise of agile methods provides a fruitful area for such empirical research. The e-workshop organizers planned to discuss each of these issues: people, team, size,

design simplicity, and applicability to high-assurance systems, outlined in Boehm's article in relation to the Agile Manifesto. The main findings were:

(i) The most important factor that determines when agility is applicable is probably the project size. There is plenty of experience with teams of up to 12 people; some descriptions of teams of around 25; a few data points on teams of up to 100 people, as described in [17]; and, finally, isolated descriptions of teams larger than 100 people, as in [26]. On one occasion, an 800-person team was organized using a scrum of scrums [41]. There is no common agreement about the influence of team size on a project's agility. According to Cockburn, size is an issue but, according to other authors such as [26], strategies for coping with larger teams exist and can be adopted.

(ii) There is an ongoing debate about whether or not an agile process requires good people to be effective. Participants agreed that a certain percentage of experienced people is needed for a successful agile project. There was some consensus that 25%-33% of the project personnel must be competent and experienced. The required share of experienced people might even be as low as 10% if the teams practice pair programming [44].

(iii) One of the most widespread criticisms of agile methods is that they do not work for systems that have criticality, reliability, and safety requirements. Some participants felt that agile methods work if performance requirements are made explicit early and if proper levels of testing can be planned for. A consensus seemed to form that the agile emphasis on testing is the key factor in applying agility to critical systems.

(iv) An important issue is how to introduce agile methods in an organization and how much formal training is required before a team can start using them. A majority of the participants felt that agile methods require less formal training than traditional methods.
The emphasis is on skill development rather than on learning agile methods.

(v) Being agile is a cultural attitude. If the culture is not right, the organization cannot be agile. In addition, teams need some amount of local control: they must be able to adapt working practices as they see fit. The culture must also be supportive of negotiation, as negotiation is a big part of the agile culture. Participants concluded that agile methods are more appropriate when requirements are emergent and rapidly changing.

(vi) Participants also concluded that the daily meetings provide a useful way of measuring problems.

(vii) Product and project documentation is a topic that has drawn much attention in discussions about agile methods. There is no complete agreement on it: Boehm mentioned that a documented project makes it easier for an outside expert to diagnose a problem; Kent Beck claims that an outside expert's diagnosis should not require documentation but technical details; Bill Kleb said that documentation has a cost and its extent should be decided by the customer.

Agile methods were also used in maintenance projects, as reported in [33] and [40]. The first experience is reported by Poole and Huisman and concerns the use of XP. The authors worked at an enterprise that showed several problems concerning testing, visibility, morale, and personal work practices. From a management standpoint, they wanted to do more with less: higher productivity, coupled with increased quality, decreased team size, and improved customer satisfaction. They started to look at XP and learned that many of its elements come naturally to teams working on maintenance. They seeded their process with XP's practices. Some interesting outcomes deserve attention: metrics seem to show some quantitative improvements in productivity during the use of pair programming when addressing specific deliverables or fixing bugs; one of the greatest success stories concerns the improvement in visibility, which was the greatest benefit for the team. Having a storyboard on which to prioritize tasks and discuss progress daily encourages best practices, lets people see what others are doing, and lets management gauge the project; and metrics are critical to the replanning game, but getting engineers to contribute is difficult. The engineers must be able to produce dependable estimates of their work, in order to plan how long it will take to complete a story.

Peter Schuh [40] described the experience of using a process similar to XP under very stressful conditions. First of all, the impending deadline did not allow XP to be adopted in full. The fully adopted practices were simple design, refactoring, collective ownership, and continuous integration, whereas the partially adopted ones were planning game, pair programming, on-site customer, and coding standards. Finally, the remaining ones were not adopted at all.
Some lessons learned emerged from the experience: if a project has a simple build process and a serviceable test suite, the next step is to combine these and tackle continuous integration, and there are two ways of doing it. The first is the standard serialized XP practice based on a build machine or build token, where developers queue up one at a time to build, test, and integrate their changes. The second is for developers to build and test on their own machines before checking in; the practice of coding standards is easy to encourage when you have an in-house or open standard that you can simply pass around and then occasionally refer to during discussions; and the database is an essential component of every business application. Nothing good ever comes of a database architected without a mind to conversion or reporting. In lieu of an object model or other documentation, a data model can provide an extremely handy overview of the application.

Lycett et al. [31] propose a framework that allows mature organizations to migrate from a plan-driven development philosophy to a more agile one. They state that, for the approach to be truly successful, organizations must grasp the opportunity to reintegrate software development management, theory, and practice. The effects of their disjunction have been evident long enough.

Lippert et al. [30] discuss some extensions that adapt XP for large-scale or long-term projects: they offer a high degree of security and reliability without

limiting the advantages of agile software development. Such extensions focus on the planning and controlling aspects of development, demonstrating that, contrary to the opinion of many IT managers, a suitably adapted agile development process is ideal for long-term projects and the development of large systems.

Grenning [25] describes the start of a project using an adaptation of XP in a company with a large formal software development process. The project was an embedded-systems application running on Windows NT and was part of a network of machines that had to collaborate to provide services. The team and the managers were impressed with the results in terms of productivity and quality. The author provides some suggestions. The managers should: (i) try XP on a team with open-minded leaders; (ii) encourage the XP practices; (iii) recruit people who want to try XP rather than forcing people who do not; (iv) focus on coaching; and (v) be prepared to receive and process frequent feedback. The engineers should: (i) identify problems that can be solved; (ii) identify the risks and the benefits; and (iii) keep iterations short.

Lindvall et al. [29] aim to analyze under which conditions and in which environments agile practices work well; this is needed before employing them on a large scale. To carry out the analysis, several Software Experience Center (SEC) member companies initiated a series of activities to discover whether agile practices match their organizations' needs. The data were collected on pilot projects carried out by the companies in XP style; most of the data were qualitative rather than quantitative, which is a limitation of the research. The overall experience was very successful for most of the survey participants. ABB found that the quality of code increased, that pair programming helped in sharing knowledge, and that agility actually sustained the feasibility of rapid changes.
DaimlerChrysler's experience demonstrated that using agile methods combined with constant testing and other classical QA techniques produced high-quality software throughout the projects. Furthermore, they discovered that XP cut costs while achieving high levels of quality and customer satisfaction. The projects executed at Motorola achieved levels of quality comparable to or better than the division average. Most of the experiences demonstrated the need for tailoring XP: without adapting XP to the organization's own processes and technology, the adoption of agile practices is unfeasible. As a matter of fact, the problem does not lie in the agile processes themselves, but in their interaction with the other processes of the organization. This is felt especially seriously in large organizations, where procedures, platforms, responsibilities, and documentation follow general policies. Large enterprises work with teams distributed over the globe, which usually have to communicate or collaborate because they work on the same macro-project. XP teams rely on close collaboration, and they usually find it hard to collaborate with other teams, especially if these have a different culture or adopt processes different from the agile ones. In such cases the architecture suffers from this kind of misalignment, with dangerous consequences for the success of the project. Motorola suggests overcoming this obstacle by conducting cross-team workshops, but there is no evidence of the results of this solution. Refactoring seems to clash with ABB's desire to minimize changes to the code base and with the policy "if it is not broken, do not fix it".

Traditional scope and delivery planning clashes with XP's core ideas, because it does not involve the team and tends to produce documentation that defines the scope and the delivery time in advance. The requirements did not take the form of user stories, had not been defined with the involvement of the developers, and were seldom accurate by the time development actually started. XP expects acceptance tests to be run continuously during the iterative production cycles. Acceptance tests were executed twice because the customer and the quality systems often prescribe running them separately. Quality systems generally clash with XP or require additional work, which is apparently useless. Pair programming should take the place of formal reviews, but in large organizations a pair is not able to master all the aspects of the system. The conclusion of the work is that agile practices could give more flexibility to large organizations, but further studies are necessary in order to use them properly.

The state of agile methods

The previous sections provide a view of the state of agile methods in terms of possible advantages and limitations, aspects which deserve further investigation, and points of weakness. Table 1 summarizes the main issues of the chapter, marked as cons, pros, and open issues.

Table 1. Pros, Cons, and Open Issues of Agile Methods.

| Context | Contribution |
|---|---|
| Quality of the practice | Agile methods encourage the hacker's approach (con). |
| Appropriateness of agile methodologies to human work | Agile methods are not in accordance with the common problem-solving process that humans use (con). |
| Combine agile approaches with plan-driven ones | Hybrid approaches that combine agile and plan-driven methods are feasible and necessary for projects whose context is unstable and which require strong dependability (pro). |
| Context of application for agile methodologies | Agile teams excel in exploratory problems and operate better in a people-centered and collaborative culture (pro). |
| Context of application for agile methodologies | Agile methods aim at mastering changes without increasing costs while keeping the quality of products high (pro). |
| Comparing Software Quality Assurance between agile and waterfall processes | (i) Many of the agile quality activities occur much earlier than they do in waterfall development; (ii) the frequency of these activities is much greater than in the waterfall model, since most of them are included in each iteration; and (iii) agile development has fewer static quality assurance techniques (open issues). |
| Agile methodologies for developing embedded software | Agile methodologies are not yet mature enough for this kind of industry; a measurement framework is necessary for a well-suited introduction; and, finally, actual data are required before making any decision about the adoption of any agile methodology (con). |
| Agile methodologies for developing embedded software | Agile methods seem to fit embedded software production well, especially concerning knowledge sharing and the strong emphasis on intra-team communication, which are key factors for the success of this kind of project (pro). |
| Scaling agile methodologies | Agile practices should be augmented in order to fit large projects: (i) the practice of collective ownership can be unfeasible; (ii) assuming customer involvement is unrealistic; (iii) governing the collaboration of many teams, sometimes dispersed, in the agile way is difficult (con). |
| General observations from the field | When adopting agile methods, pay attention to changing the way the enterprise does business. The transition to agile methods needs support. Use agile methods in turbulent environments, but not in mission-critical ones. The size of the project is a discriminating factor in deciding whether or not to use agile methods. About 25% of the personnel should be well skilled and experienced. Agile methodologies do not require training but individual skills. Documentation should have an additional cost, sustained by the customer. Agile methods need extensions or customization in order to be successful (open issues). |
| The relation between SW-CMM and XP | They are complementary: the standards describe the organizational view, whereas the agile approaches describe the practices to achieve the goals (open issues). |
| Agile Practices in Large Organizations | They give flexibility and improve quality without affecting the project schedule (pros). It is necessary to tailor XP and to understand how it can interact with existing processes and teams in the organization (open issues). Pair programming cannot substitute for formal reviews, and refactoring clashes with ABB's goals (cons). |

The analysis conducted shows that research on agile methods is still at its beginning, and a large number of aspects remain to be investigated in depth. Considering the open issues previously described, the research of the thesis deals with three problematic areas:

The cost/benefit ratio. The thesis investigates whether the agile approach degrades performance on projects.

Specific benefits of the practice. Agile methods do not foster the production of documentation and process definition. The thesis studies how agile methods face the concerns related to the management of process and product knowledge. The question is: are product and process understandable and transferable?

The suitable contexts. Agile methods seem not to be suitable for every context of software production. It is necessary to understand to what extent the practice fits specific contexts of production.

As discussed in Chapter I, these issues will be analyzed with regard to pair programming.

Bibliography

[11] Abrahamsson P., Warsta J., Siponen M.T., and Ronkainen J. New Directions on Agile Methods: A Comparative Analysis, Proc. of the 25th International Conference on Software Engineering (ICSE 03), Portland, Oregon, USA, 2003, IEEE CS Press.
[12] Boehm B. Get Ready for Agile Methods, with Care, Computer 35(1), 2002, IEEE CS Press.
[13] Boehm B. and DeMarco T. The Agile Methods Fray, Computer 35(6), 2002, IEEE CS Press.
[14] Cao L., Mohan K., Xu P., and Ramesh B. How Extreme Does Extreme Programming Have to Be? Adapting XP Practices to Large-Scale Projects, Proc. of the 37th Hawaii International Conference on System Sciences, Honolulu, 2004, IEEE CS Press.
[15] Cockburn A. Writing Effective Use Cases, Addison-Wesley, Reading, Massachusetts.
[16] Cockburn A. and Highsmith J. Agile Software Development: The Business of Innovation, Computer 34(9), 2001, IEEE CS Press.
[17] Cockburn A. Agile Software Development, The Agile Software Development Series, ed. A. Cockburn and J. Highsmith, 2001, Addison-Wesley, Reading, Massachusetts.
[18] Cockburn A. and Highsmith J. Agile Software Development: The People Factor, Computer 34(11), 2001, IEEE CS Press.
[19] Cohn M. and Ford D. Introducing an Agile Process to an Organization, Computer 36(6), 2003, IEEE CS Press.
[20] Curtis B., Krasner H., and Iscoe N. A Field Study of the Software Design Process for Large Systems, Communications of the ACM 31(11), 1988, ACM.
[21] Derbier G. Agile Development in the Old Economy, Proc. of the Agile Development Conference (ADC 2003), Salt Lake City, Utah, USA, 2003, IEEE CS Press.
[22] Elssamadisy A. XP on a Large Project: A Developer's View, Proc. of XP/Agile Universe, Raleigh, NC, USA.
[23] Forrester J.W. Industrial Dynamics, Productivity Press, Cambridge, Massachusetts.
[24] Greene B. Agile Methods Applied to Embedded Firmware Development, Proc. of the Agile Development Conference 2003, Salt Lake City, Utah, USA, 2003, IEEE CS Press.

[25] Grenning J. Launching Extreme Programming at a Process-Intensive Company, IEEE Software 18(6), 2001, IEEE CS Press.
[26] Highsmith J. Agile Software Development Ecosystems, The Agile Software Development Series, ed. A. Cockburn and J. Highsmith, 2002, Addison-Wesley, Boston, MA.
[27] Huo M., Verner J., Zhu L., and Ali Babar M. Software Quality and Agile Methods, Proc. of the 28th Annual International Computer Software and Applications Conference (COMPSAC 2004), Hong Kong, China, IEEE CS Press.
[28] Lindvall M., Basili V., Boehm B., Costa P., Dangle K., Shull F., Tesoriero R., Williams L., and Zelkowitz M. Empirical Findings in Agile Methods, Proc. of Extreme Programming and Agile Methods - XP/Agile Universe 2002, LNCS, Springer-Verlag, Chicago, Illinois, USA.
[29] Lindvall M., Muthig D., Dagnino A., Wallin C., Stupperich M., Kiefer D., May J., and Kähkönen T. Agile Software Development in Large Organizations, Computer 37(12), 2004, IEEE CS Press.
[30] Lippert M., Becker-Pechau P., Breitling H., Kornstädt A., Schmolitzky A., Wolf H., and Züllighoven H. Developing Complex Projects Using XP with Extensions, Computer 36(6), 2003, IEEE CS Press.
[31] Lycett M., Macredie R.D., Patel C., and Paul R.J. Migrating Agile Methods to Standardized Development Practices, Computer 36(6), 2003, IEEE CS Press.
[32] Manhart P. and Schneider K. Breaking the Ice for Agile Development of Embedded Software: An Industry Experience Report, Proc. of the 26th International Conference on Software Engineering (ICSE 04), Edinburgh, Scotland, United Kingdom, 2004, IEEE CS Press.
[33] Poole C. and Huisman W. Using Extreme Programming in a Maintenance Environment, IEEE Software 18(6), 2001, IEEE CS Press.
[34] Paulk M.C. Extreme Programming from a CMM Perspective, IEEE Software 18(6), 2001, IEEE CS Press.
[35] Rakitin S.R. Manifesto Elicits Cynicism, Computer 34(12), 2001, IEEE Computer Society, p. 4.
[36] Rasmusson J. Introducing XP into Greenfield Projects: Lessons Learned, IEEE Software 20(3), 2003, IEEE CS Press, pp. 21-28.
[37] Reifer D.J. How Good Are Agile Methods?, IEEE Software 19(4), 2002, IEEE CS Press.
[38] Reifer D.J., Maurer F., and Erdogmus H. Scaling Agile Methods, IEEE Software 20(4), 2003, IEEE CS Press.
[39] Reifer D.J. XP and the CMM, IEEE Software 20(3), 2003, IEEE CS Press.
[40] Schuh P. Recovery, Redemption, and Extreme Programming, IEEE Software 18(6), 2001, IEEE CS Press.
[41] Schwaber K. and Beedle M. Agile Software Development with SCRUM, Prentice Hall, Upper Saddle River, NJ, 2002.
[42] Skowronski V. Do Agile Methods Marginalize Problem Solvers?, Computer 37(10), 2004, IEEE CS Press.
[43] Wernick P. and Hall T. The Impact of Using Pair Programming on System Evolution: A Simulation-Based Study, Proc. of the 20th International Conference on Software Maintenance (ICSM 04), Chicago, Illinois, USA, 2004, IEEE CS Press.
[44] Williams L., et al. Strengthening the Case for Pair Programming, IEEE Software 17(4), 2000, IEEE CS Press.

Chapter III: The State of the Art

This chapter discusses the studies carried out on pair programming and consists of two parts. The first provides an overview of the method of investigation adopted for the research, pointing out the limitations, the recommended guidelines, and the strengths of empirical software engineering. The second analyses the state of the art of the experiments on pair programming.

Empirical Software Engineering: the importance of the evidence

"What distinguishes the science from the art is the way in which we as managers and practitioners make decisions, by forming rational arguments from the evidence we have, evidence that comes both from our experience and from related research." These words of Shari Pfleeger [45] open the discussion around the role of evidential force in Empirical Software Engineering. Science in general, not only software engineering, proceeds by following the scientific method, which consists of collecting data from a particular experience and striving to make the outcomes as general as possible. Usually the evidence brought by the literature is not enough: the final aim is to build arguments and make cases about what to do and what to avoid. A case for action has three parts: one or more claims that some set of properties is satisfied; a body of evidence supporting the claims; and a set of arguments which link the claims to the evidence. Software engineers tend to focus on the evidence rather than on the arguments; evidence can be used for generating hypotheses and for testing them, in order to better understand the technology to be used:

1. What does it mean that a technology works? The usage of the technology should be evaluated quantitatively.
2. What kind of evidence is needed to demonstrate that a technology works?
3. Who provides the evidence and who vouches for it?
4. If a technology works in one domain, what happens in other domains?
5. How can the evidence inform our thinking about the social, economic, and political tradeoffs of using an imperfect technology?

There are different kinds of evidence, catalogued by David Schum [71]:

1. Tangible evidence, which can be examined directly to see what it reveals, and consists of objects, documents, measurements, and charts.
2. Testimonial evidence, which is delivered by a person who reports on what transpired; it is divided into unequivocal testimonial evidence, involving hearsay or observations, and equivocal testimonial evidence, which is probabilistic.
3. Missing evidence, which could be testimonial or tangible and could help to provide arguments.
4. Accepted facts, which are authoritative records, such as trouble tickets or acceptance test results.

A fundamental attribute of the evidence is its evidential force: the degree to which each piece of evidence contributes to or detracts from an argument. Two pieces of evidence together could reinforce an argument or they could cancel one another. The process of determining the evidential force can be summarized in three steps: first, the kind of evidence which can be collected has to be established; second, the source and the process of the evidence's creation must be identified; and, finally, the degree of objectivity and veracity has to be understood. Once the evidence has been provided, the force of the arguments should be established. Schum suggests ways to evaluate it, such as evidence marshalling and chains of reasoning, in order to understand how much it is possible to rely on the evidence. These methods aim at determining the probability that the hypothesis is true, given the evidence.

Bloomfield and Littlewood [45] consider evidential force in the nature of each piece of evidence. They note that arguments with diversity in evidence are stronger than those with replications of the same evidence. They propose the multi-legged argument, where each leg brings a different type of evidence; for example, one leg could use statistical hypothesis testing and the other could use an inferential claim based on logical proof.
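The notion of evidential force, and the remark that two pieces of evidence can reinforce or cancel one another, can be illustrated numerically. The sketch below is not Schum's method, which is considerably richer; it is a minimal illustration based on Bayes' rule in odds form, assuming that the pieces of evidence are conditionally independent given the hypothesis. The function name `posterior_probability` and all the numbers are hypothetical and chosen only for this example.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Update P(H) with several pieces of evidence (Bayes' rule, odds form).

    prior: P(H) before any evidence is seen, strictly between 0 and 1.
    likelihood_ratios: one value P(E | H) / P(E | not H) per piece of
        evidence; values above 1 support H, values below 1 detract from it.
    Assumption: the pieces of evidence are conditionally independent.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical evidential forces, for illustration only.
supporting = 4.0   # evidence four times more likely if H is true
detracting = 0.25  # evidence four times more likely if H is false

print(posterior_probability(0.5, [supporting]))
print(posterior_probability(0.5, [supporting, supporting]))
print(posterior_probability(0.5, [supporting, detracting]))
```

In this simplified model, two supporting pieces of evidence raise the probability of the hypothesis more than one alone, while a supporting and an equally strong detracting piece cancel out and leave the probability at its prior value.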
Pfleeger makes some useful observations on this issue. First, techniques exist for combining data from imperfect studies: if experiments cannot be replicated, all is not lost. Second, including uncertainty in the decision-making process helps the evaluation of risks. Third, empirical investigation is a method, not an end: as more studies are performed, the body of knowledge must be revisited in order to see whether the conclusions still hold. Fourth, using these approaches may improve the sophistication of software engineering investigation, culture, and experience.

Kitchenham et al. [52] state that evidence-based software engineering (EBSE) should be considered as a mechanism to support and improve decisions about the adoption of technology. EBSE aims at improving decision making related to software development and maintenance by integrating the current best evidence from research with practical experience and human values. A technology is not expected to be universally good or bad, but more appropriate in some circumstances and less in others. This is the case of

investigation on pair programming, discussed in the thesis: as a matter of fact, the purpose is to understand the points of weakness and strength of the practice. EBSE can be divided into five steps, as Fig. 3 shows.

The first step is asking an answerable question, and it consists of defining the problem to be solved (usually the Goal Question Metric approach is used) and then deriving the questions to answer. Well-formulated questions usually have three components: the main intervention or action of interest; the context or specific situation of interest; and the main outcomes or effects of interest. The question, in the practitioner's context, should include the standpoint of the customer, the identification of the situation, the constraints and assumptions of the business strategy, and the time required for answering.

The second step is finding the best evidence; this step involves the search for sources of dependable information. The main source is the set of scientific journals, but it is not the only valid one. Other sources, provided by the customers or found over the Internet, can be useful, but two properties must be taken into account and carefully evaluated. The first is how up to date the information is: in the technology field change is very fast, and using out-of-date information could entail heavy losses. The second is the value of the evidence, a concern that has been discussed at length above.

The third step is critically appraising the evidence, and it deals with appraising the quality of the research. Scientific journals use a review process to assess the rigor of an experiment. This is not always a guarantee of the dependability of the results: one important component of dependability is the number of replications of the experiment itself, and replications are often scarce or unfeasible.

The fourth step is applying the evidence, and it is not straightforward.
Applying a specific experience in a practical context is very different from the usual transmission of knowledge through handbooks, lectures, books, and tutorials. The ease of applying evidence depends on the type of technology being evaluated. Some technologies apply at the level of the individual developer (a testing technique), others at project level (a process for managing the software configuration).

The fifth step is evaluating the performance, in which the application of the evidence must be evaluated. The aim of this activity is to understand, first, whether the evidence has been applied in the right way and, second, what kind of evidence and lessons learned the experience produced and, obviously, the validity of these results in general terms.

Zelkowitz and Wallace [72] propose a reasoned taxonomy of the different experimental models for validating technology. The first distinction concerns the methods to collect data:

Observational: consists of collecting relevant data as the project develops.
Historical: consists of collecting data from projects which have already been completed, using existing data.

Controlled: provides for multiple instances of an observation in order to ensure the statistical validity of the results.

[Fig. 3. The Process of EBSE: asking an answerable question; finding the best evidence (sources of dependable information); appraising the evidence (evaluating the quality of research); applying the evidence; evaluating the performance (evaluating the application of the evidence).]

Table 2 groups the methods of investigation according to the three categories previously mentioned, and shows the points of strength and weakness of each one. The research of this thesis uses controlled experiments, both synthetic and replicated.

Kitchenham et al. [57] provide useful guidelines for conducting empirical research in Software Engineering, which are summarized in the following. The first issue to deal with is the experimental context, which consists of three main elements: background information about the industrial circumstances in which an empirical study takes place or in which a new software engineering technique is developed; discussion of the research hypotheses and how they were derived; and information about related research. It is important to define the experimental context properly, because the objectives of the research have to be clearly presented and because the description of the research can be used by others, researchers or practitioners. These suggestions were taken into account when designing the experiments discussed in the thesis.

Table 2. Strengths and Weaknesses of Validation Models.

| Validation method | Category | Weakness | Strength |
|---|---|---|---|
| Project monitoring | Observational | No specific goals | Provides baseline for future; inexpensive |
| Case study | Observational | Poor controls for later replication | Can constrain one factor at low cost |
| Assertion | Observational | Insufficient validation | Can serve as a basis for future experiments |
| Field study | Observational | Treatment differs among studies | Inexpensive form of replication |
| Literature search | Historical | Selection bias; treatments differ | Large available database; inexpensive |
| Legacy | Historical | Cannot constrain factors; data limited | Combines multiple studies; inexpensive |
| Lessons learned | Historical | No quantitative data; cannot constrain factors | Determines trends; inexpensive |
| Static analysis | Historical | Not related to development method | Can be automated; applies to tools |
| Replicated | Controlled | Very expensive | Can control factors for all the treatments |
| Synthetic | Controlled | Scaling up; interactions among multiple factors | Can control individual factors; moderate costs |
| Dynamic analysis | Controlled | Not related to development environment | Can be automated; can be applied to tools |
| Simulation | Controlled | Data may not represent reality; not related to development environment | Can be automated; can be applied to tools; evaluation in safe environment |

Kitchenham et al. [56] discuss many of the problems related to data collection: the lack of standardization makes it difficult to replicate studies or to carry out meta-analyses of studies of the same phenomenon. When the measures are quantitative and objective, it is necessary to define the collection process and the attributes of the measure, such as the kind of scale, the measurement error, and so forth. When the measure is subjective, things become more complicated: the skill and bias of the person determining the measure can affect the results.
For data collected with questionnaires, it is necessary to report measures of validity and reliability, and any other attributes affecting the conclusions [74]. Such information is provided for each experiment presented in the following chapters. Quality control procedures for assuring the dependability of the measures must be put in place. Often, subjects who begin a study drop out before the study is complete.

It is important to be sure that the dropouts do not affect the outcomes of the experiment. One way is to compare the subjects who abandoned the study with those who remained in terms of skills, age, and experience. For the analysis, two methods can be used: classical analysis, based on the testing of hypotheses, and Bayesian analysis, based on the use of prior information. The nature of the data itself must be considered carefully: a step to take before starting the hypothesis tests is to establish whether some data points have an unreasonable influence on the results and whether some conclusions are due to outliers. The assumptions made about the data can invalidate the experimental outcome: a typical example is that some statistical tests require a specific distribution of the sample's data. These analyses of the data are discussed for each experiment carried out.

Evidence on Pair Programming

This section analyses some of the most representative empirical works on pair programming found in the literature. Table 3 summarizes the data of the experiments, which are discussed in detail in the following.

Table 3. Experiments' data.

Source: Building Pair Programming Knowledge through a Family of Experiments
Experimentation's details: Williams et al. evaluated the effects of pair programming on students' learning. Assignment NCSU: 3 projects. Assignment UCSC: 5 projects. Time: 3 months. Team NCSU: 660 students. Team UCSC: 555 students.
Place: University of California Santa Cruz and North Carolina State University
Results: Quantitative: students' final marks; students' final projects; enjoyment of the practice (through questionnaires).

Source: The Case of Collaborative Programming
Experimentation's details: Nosek executed an experiment for evaluating pair programming in an industrial setting. Time: 45 min. Team: 15 professionals.
Place: -
Results: Quantitative: readability, functionality. Qualitative: morale of programmers.

Table 3 (continued).

Source: Experimental Evaluation of Pair Programming
Experimentation's details: An experiment compares the Personal Software Process (Humphrey) with pair programming. It involved students. Time: 7 hours. Team: 21 students. Written code: about 600 LOC.
Place: Poznan University of Technology, Poland
Results: Quantitative: number of submissions; LOC (standard deviation); LOC (average); LOC per hour (average); time of total development (average, standard deviation).

Source: The Effects of Pair Programming on Performance in Introductory Programming Course
Experimentation's details: The experiment compares pair programming with solo programming. Time: during lab sessions. Team: 600 students.
Place: University of California Santa Cruz
Results: Quantitative: students' final marks; lab programs' final marks.

Source: Distributed Pair Programming: an empirical study
Experimentation's details: The experiment compares distributed pair programming supported by a tool with distributed pair programming without the tool. Time: not available. Subjects: 76 students.
Place: University of California
Results: Qualitative: statistical analysis of the number of responses to questionnaires. Quantitative: final exam grades.

Source: Strengthening the Case for Pair Programming
Experimentation's details: The experiment compares pair programming to traditional solo programming. Time: not available. Subjects: 15 students (control group) + 28 students (experimental group).
Place: University of Utah
Results: Qualitative: enjoyment and satisfaction of the pairs. Quantitative: tests passed, time for completing the task.

Table 3 (continued).

Source: Exploring the efficacy of distributed pair programming
Experimentation's details: The experiment compares distributed and collocated pair programming to traditional solo programming. Time: 5 weeks. Subjects: 132 students.
Place: North Carolina State University
Results: The metrics used for the analysis were productivity, in terms of lines of code per hour, and quality, in terms of the grades obtained by the students for the projects. Additionally, the students filled in a survey regarding their experiences while working in a particular category, the difficulties they faced, and the things they liked about their work arrangement.

Source: When does pair programming outperform two individuals?
Experimentation's details: Pair and solo groups of industrial programmers are asked to complete algorithm-style aptitude tests, so as to observe the capability of solving algorithms individually and in pairs.
Place: -
Results: Productivity and quality.

Source: On understanding the compatibility of student pair programmers
Experimentation's details: The experiment aims at understanding which factors increase the compatibility among students in pair programming. Time: two semesters. Subjects: 564 students.
Place: North Carolina State University
Results: Quantitative: skill levels; Myers-Briggs personality indicators.

In [58] Brian Hanks observes that there are many reasons that motivate the distribution of teams. A growing number of workers telecommute, and many organizations have offices in multiple locations and geographically distributed project teams. This trend conflicts with the collocation requirement of pair programming. The author developed a tool that enables distributed pair programming, and investigated its effectiveness with a controlled experiment. The tool was based on the open source tool Virtual Network Computing (VNC), which was used to share the screen of the pair's computer. The author modified the tool in order to provide a second cursor driven by one of the pair's members, the navigator; in this way the navigator can point at areas on the driver's screen without affecting the state of the other programmer (the driver).

The experiment involved 76 students, although they were neither volunteers nor paid for taking part in it. The experiment consisted of two sections with different programming assignments. Volunteer pairs were randomly assigned to one of two groups: one group was required to work collocated and the other distributed. The volunteer students also filled out surveys at the beginning and at the end of the experiment; the surveys collected data about the students' age, experience, and opinions about pair programming. The students also answered other questions when they turned in their programming assignments, about how much time they spent driving, navigating, and working alone. When turning in programming tasks, each student was asked to respond to the following question: "On a scale from 0 (not at all confident) to 100 (very confident), how confident are you in your solution to this assignment?" There were no statistically significant differences in student confidence in either section on any of the programming assignments, either before or after the students began using the tool.
Moreover, students in the control and experimental groups performed equally well on the final exam. Although the difference is not statistically significant, students who used the tool performed better on the exam than the students in the control group. Students in both groups were also equally confident in their programming solutions. The performance was evaluated in terms of final exam scores. The students' grades on their programming assignments would provide a better indication of the impact of distributed pair programming on student performance. This analysis remains to be done.

Nosek in [66] investigates the effects of collaborative programming on the work of professionals, in terms of effectiveness of problem solving and confidence in the solutions. Collaborative programming requires that two developers work jointly on the same algorithm or piece of code. The author evaluated the performance of problem solving in terms of the readability of the proposed solution, that is, the degree to which the problem-solving strategy could be determined from the subject's work. Readability is a component of the overall score, since it is possible for a subject to use a reasonable strategy and to use programming language structures appropriately and yet fail to solve the problem in terms of correct output.

The second variable was the functionality of the proposed solution, that is, the degree to which the strategy accomplishes the objectives stated in the problem description. The author also makes four predictions:
1. programmers working in pairs will produce more readable and functional solutions to a programming problem than will programmers working alone;
2. groups will take less time on average to solve the problem than individuals working alone;
3. programmers working in pairs will express higher levels of confidence about their work and enjoyment of the process; and
4. programmers with more years of experience will perform better than programmers with fewer years of experience.

The subjects were 15 full-time system programmers from a program trading firm, working on the system maintenance of three Unix networks and a large database running Sybase. They programmed in the C language. The subjects were asked to write a script that performs a database consistency check, with the error output written to a file. To evaluate the readability variable, the subjects were asked to properly comment each of the processes within the script they were programming. All the subjects were given 45 minutes to solve the problem. A stopwatch was used to time each group and individual; each set of materials was evaluated on the degree to which it solved the problem and on the readability of the solution.

The results provide additional evidence that collaboration improves the problem-solving process. The qualitative data also provide some interesting insights. The majority of programmers were somewhat skeptical of the value of collaborating on the same algorithm or program module, and thought that the process would not be enjoyable. Collaboration changed their minds on this issue and also improved their productivity. Their programming solutions were better than previous scripts written for the company.
It is costly to run these scripts, and efficiently written scripts are considered so difficult to create that the company hires expert outside consultants to write them. However, the scripts written by this programming team were twice as efficient as previously purchased scripts.

In [65] the authors describe an experiment executed at the Poznan University of Technology. The aim was to evaluate the efficiency of pair programming by comparing it with individual programming in the process proposed by Humphrey and known as the Personal Software Process (PSP). The authors decided to use a set of assignments proposed by Humphrey and to measure both the development time and the number of defects. The experiment took place during the laboratory classes in the 4th year of the computer science degree. The experiment in fact included three processes: a baseline personal process as introduced by Humphrey, an XP process with pair programming, and a variation of the XP process with individual programming. The subjects, 21 students, were divided into three groups, according to the three processes.

They formed the groups so as to have a similar grade point average in each group. The students could choose the programming language, and most of them chose C++. There was almost no difference in development time between the two versions of XP: this suggests that pair programming is a rather expensive technology, and it contrasts with the results of Williams and Nosek discussed in this section. The authors note that PSP takes a longer time than both versions of XP and the waterfall approach: in this context it seems to be the least efficient. They analyzed the standard deviation of development times and program sizes for XP with pair programming: it is smaller than for XP without pair programming and for PSP, and consequently pair programming is more predictable than individual programming. Another finding concerns efficiency: XP without pair programming is better, while PSP and traditional XP are comparable. This can be attributed to the flexible process used by XP without pair programming. The pair programming used in XP helped reduce rework compared with the other two processes.

Laurie Williams carried out a valuable experiment on pair programming, providing evidence that has become a reference for the specialized literature. Williams [75] refers to the outcomes of Nosek, which are discussed above, and to the work of Larry Constantine, who observed pair programmers producing code faster and freer of bugs than ever before. Moreover, Williams claims that the XP founders credit much of the success of XP to the use of pair programming, both for expert and for novice programmers. Pair programming is perceived by programmers as successful because of pair analysis and pair design: pairs can consider many more possible solutions to a problem and converge more quickly on which solution to implement. With partners reviewing and questioning decisions, the exploration of design alternatives increases, and a huge number of defects are prevented right from the start.
It seems that the pair usually splits up for running tests and that its members join again when defects are found, in order to remove them in the best way. At the University of Utah, senior software engineering students participated in a structured experiment. The purpose was to validate quantitatively the anecdotal and qualitative pair programming results observed in industry. All students attended the same classes, received the same instruction, and participated in class discussion on the pros and cons of pair programming. Fifteen students formed the control group, in which all worked individually on all assignments. Twenty-eight students formed the experimental group, in which all worked in two-person collaborative teams on the same assignments as the individuals. The pairs' programs passed more of the automated test cases, and their results were more consistent; some individuals did not hand in a program or handed it in late, whereas pairs handed in their assignments on time. The author identified three benefits of pair programming: a greater rate of defect removal, a clearer design, and effective training. The author also measured the time required to accomplish the task, obtaining this result: the individuals needed a longer time than that spent by the pairs for

accomplishing the task. By working in tandem, the pairs completed their assignments 40% to 50% faster. The authors highlight a further advantage of the technique: unlike many other techniques for improving productivity and quality, pair programming is one that programmers actually enjoy. Pair programming improves job satisfaction and the programmers' overall confidence in their work. A problem of the technique lies in the personalities of the pair's members: this is the hardest hurdle to overcome.

Baheti et al. [46] investigate the relationship between pair programming and distribution. By distributed pair programming, the authors mean that two members of the team collaborate synchronously on the same design or code, but from different locations. They work on the same code together and at the same time by sharing the desktop, and each can view the changes made by the other on his or her own screen. The experiment was conducted in a graduate class, Object-Oriented Languages and Systems, taught by Edward Gehringer at North Carolina State University. At the end of the semester, all students participated in a 5-week-long team project, such as updating a GUI to use JSP, implementing a dynamic reviewer-mapping algorithm for peer review, simulating the LC-2 architecture, or building a GUI for DC motor control. There were 132 students. Four kinds of team were formed: collocated team without pairs; collocated team with pairs; distributed team without pairs; and distributed team with pairs. During the experiment the students used a web-based tool for collecting development metrics; for communication they shared the display with NetMeeting, PCAnywhere and VNC. They also used Yahoo Messenger. The metrics used for the analysis were productivity, in terms of lines of code per hour, and quality, in terms of the grades obtained by the students for the projects.
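As a rough illustration of the productivity metric just described, the following Python sketch computes lines of code per hour for each team arrangement and summarises it with a mean and a sample standard deviation. The record layout, the function name `productivity_by_group`, and all the numbers are hypothetical; they are not data from the experiment in [46].

```python
from collections import defaultdict
from statistics import mean, stdev


def productivity_by_group(records):
    """Return {group: (mean LOC/hour, sample std dev)} for each group.

    Each record is a (group, lines_of_code, hours) tuple. This layout is an
    assumption made for the example, not the format used in the study.
    """
    rates = defaultdict(list)
    for group, loc, hours in records:
        if hours <= 0:
            raise ValueError("hours must be positive")
        rates[group].append(loc / hours)
    # stdev needs at least two observations; report 0.0 otherwise.
    return {
        group: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for group, values in rates.items()
    }


# Hypothetical observations, for illustration only.
sample = [
    ("collocated pairs", 1200, 60),
    ("collocated pairs", 900, 50),
    ("distributed pairs", 1100, 55),
    ("distributed pairs", 800, 40),
]

for group, (avg, sd) in productivity_by_group(sample).items():
    print(f"{group}: {avg:.1f} LOC/hour (sd {sd:.1f})")
```

A difference between group means computed this way says nothing by itself about statistical significance; a suitable test would still be needed before comparing the arrangements.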
Additionally, the students filled in a survey regarding their experiences while working in a particular category, the difficulties they faced, and the things they liked about their work arrangement. The results indicate the following. Distributed pair programming in virtual teams is a feasible way of developing object-oriented software. Software development involving distributed pair programming is comparable to that carried out using collocated pair programming or virtual teams without distributed pair programming. The two metrics used for this comparison were productivity (in terms of lines of code per hour) and quality (in terms of grades obtained). Collocated teams did not achieve statistically significantly better results than distributed teams. Feedback from the students indicates that distributed pair programming fosters teamwork and communication within a virtual team.

The authors in [62] note that some works in the literature emphasize the benefits of collaboration in programming tasks. Basically, the discussions about the different perspectives from which the two programmers analyze the problems, and the continuous comparison of their two diverse and personal experiences, should result in a greater capability of solving problems.

The main consideration at the basis of their research is the following: knowledge is commonly socially constructed, through collaborative efforts toward shared objectives or through dialogues and challenges brought about by differences in persons' perspectives [68]. The authors also observe that anecdotal evidence from industry demonstrates the benefits of collaboration in programming. Conversely, academic experiments aim to support the benefits of collaborative programming with respect to several relevant aspects: knowledge transfer, defect removal rate, enjoyment, and team building. The authors wish to reinforce this stream of research and carried out a research program, funded by the National Science Foundation, to assess the effectiveness of pair programming on the performance and retention of women in computer science and related fields. The authors expected that programmers who worked in pairs would produce better programs than those who worked independently. During the academic year, data was gathered from approximately 600 students enrolled in four sections of an introductory programming course at the University of California Santa Cruz designed for CS, ISM, and CE majors. The results were: pair programming helps students obtain the highest scores in the programming assignments and a higher score in the final exam. The authors conclude that pair programming could be used effectively in an introductory programming class.

Williams et al. [76] compared two experiments run in two different universities and involving about 1200 students overall. Educators at the University of California Santa Cruz (UCSC) and North Carolina State University (NCSU) have experimented with pair programming in introductory undergraduate programming courses. The hypotheses tested were the following:
H1: An equal or higher percentage of students in paired labs will complete the class with a grade of C or better compared to solo programmers.
H2: Students who work in pairs will earn exam scores equal to or higher than solo programming students.
H3: Students who complete programming projects using pair programming produce better programs than students working alone.
H4: Students in paired labs enjoy pair programming and will have a positive attitude towards collaborative programming settings.
H5: The use of pair programming in an introductory computer science course does not hamper student performance in future solo programming courses.
H6: Students participating in pair programming will be significantly more likely than solo programmers to pursue computer science-related majors one year later.
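A hypothesis about completion rates, such as the first one above, compares two proportions. The Python sketch below shows one conventional way of examining such a comparison, a one-sided two-proportion z-test with a pooled proportion, assuming large and independent samples. It is not necessarily the analysis used in [76], and the function name and the counts are hypothetical. Note that the test only checks whether the paired proportion is significantly higher; supporting "equal or higher" in a strict sense would require a non-inferiority design.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """One-sided z-test of the alternative p_a > p_b (pooled proportion).

    Returns (z, p_value). Assumes large, independent samples and a pooled
    proportion strictly between 0 and 1.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: students finishing with a grade of C or better
# in paired (a) and solo (b) sections.
z, p = two_proportion_z_test(success_a=80, n_a=100, success_b=70, n_b=100)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

The pooled standard error is the usual choice when the null hypothesis states that the two proportions are equal.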

The following table summarizes the main differences between the two experiments:

Table 4. Comparison between the two experiments by Williams.

NCSU | UCSC
15 week semesters | 10 week quarters
Two 50 minute lectures per week. Lecture size during study: | Two 105 minute or three 70 minute lectures per week. Lecture size during study:
One three-hour closed lab per week. Specific closed-lab assignment completed weekly. Mandatory pairing for programming project began during lab. | One 90 minute open lab per week. No specific lab assignment. Students work on programming projects.
Three programming projects (four in Spring 2002) completed outside of lab. | Four or five programming projects completed outside of lab (and worked on during open lab).
Only freshmen and sophomores included in study data. | All enrolled students included in study data.
660 students in the study | 555 students in the study
Exams were completed individually. | Exams were completed individually.
Grade based on two midterms, a final, lab assignments, and programming projects. | Grade based on four biweekly quizzes, a final, and programming projects.
Pairing and non-pairing sections during same term. Two instructors taught at least one paired and one solo section each term and one instructor taught only a solo section. | Pairing and non-pairing sections in different terms. One instructor taught a paired section one term and a solo section in a different term. Two different instructors taught the other two pair sections.

Table 4 (continued).

NCSU | UCSC
Partners randomly assigned and changed every 2-3 weeks. | Partners assigned from student preference list of three and remained the same all quarter.

The experiments brought about the following results. An equal or higher percentage of pair programming students completed the CS1 class with a grade of C or better when compared with solo programmers. Student participation in pair programming leads to at least similar performance on the exams, when compared with solo programming students. Students who use pair programming on programming projects produce better programs than solo programming students. If pair programming is required only for a closed lab, there is no discernible impact on programming projects produced outside of the closed lab. Students in paired labs have a positive attitude towards collaborative programming settings. Students who use pair programming in an introductory Computer Science course are not hampered in future solo programming courses. Students who used pair programming in an introductory programming course are significantly more likely than solo programming students to pursue Computer Science related majors one year later.

Recently, the focus of pair programming investigation has been turning to learning and knowledge transfer. Williams and Kessler [77] found that pair programming fosters knowledge leveraging between the two programmers, particularly of tacit knowledge. Williams and Upchurch [78] examine the ways pair programming may enhance teaching and learning in computer science education. Students were able to complete programming assignments faster and with higher quality, and appeared to learn faster. McDowell et al. [62] investigate the effects of pair programming on student performance in an introductory programming class. The results show that students who worked in pairs produced better programs. The research concludes that students perceived pair programming as valuable to their learning.
Lui and Chan [61] conducted an experiment in order to understand when pair programming outperforms traditional solo programming when working on computer algorithms in terms of quality and productivity. Pair programming excels in procedural problems and deduction questions, which are key elements in programming algorithms. The authors conclude that pair programming achieves higher productivity when a pair writes a more challenging program that demands more time spent on design.

This finding explains why it is effective to write a program in pairs under rapidly changing requirements: such work demands that programmers concentrate on changing the design. In addition, this paper confirms Williams's results. The novelty of this work is to point out when a pair outperforms two individuals. It seems that if the problem is new for the developers, pair programming can assure a higher productivity. Pair programming is very helpful for designing algorithms faster and better through reasoning together about the new problem.

Katira et al. [55] were interested in seeing whether the compatibility of pair programming students could be improved. They executed an experiment involving 564 undergraduate and graduate students at North Carolina State University. They aimed at understanding which factors affected compatibility within the pairs. The research was performed during the following courses: Introduction to Programming-Java, undergraduate Software Engineering, and graduate Object-Oriented. Students were required to work in pairs, with a different assigned partner for each of the tasks. The tasks lasted two to three weeks each throughout the semester. The students had to change partner for each of the programs. Only the OO students had the option to work in a pair or solo. Pairing students were assigned their partner by a teaching assistant after indicating their preference on a message board. The authors found that 90% of pairs reported that they and their partners worked compatibly. The authors concluded that the pairs seem to work better if they are formed randomly, without considering personality type, skill level, or self-esteem. The most relevant result is that students prefer to pair with someone they perceive to be of similar technical competence. However, educators cannot predict this perception, nor can pairs be formed based on this fact.
The data suggest that pair compatibility in beginning courses may increase if pairs are formed by joining students of dissimilar personality type. Graduate student pairs could be formed by grouping together students of similar actual skill level, measured for example by midterm scores.

Further investigations

This section discusses further research on pair programming that does not apply empirical methods. Simulation studies are often used for understanding the long-term implications of a change in the technology, in the process, or in the method for developing software. Wernick and Hall [73] used System Dynamics [53] to foresee the effects on software system growth of adding pair programming to a traditional software process. The model tracks changes in the number of requirements met over time by a software product, comparing trends with and without pair programming.

The simulation-based approach offers the advantage of calculating the possible effects of introducing a new practice into the process without waiting for the end of the process. The software development process is viewed as a mechanism to convert requirements which need to be met into requirements which have been met and fielded to users. The model shows an increase in costs due to the introduction of the pair, but the overall gain seems to outweigh the loss. The process provides an opportunity for greater long-term system growth and thus system longevity. An additional benefit is that the functionality can be delivered sooner to the customer.

Brooks's law [50] asserts that "adding manpower to a late software project makes it later": this statement discourages project managers, who usually enlarge the project team when a project is late. The reason for the failure is that adding people means increasing training costs, assimilation time and intercommunication overhead. Organizations have shared anecdotal experiences of contradicting Brooks's law through pair programming. The authors of [79] used the mathematical model of Stutzke to study the effect of pairing on Brooks's law. This model analyzes the process and costs of assimilating new team members, including the costs associated with the diversion of their mentors from the project task itself. Using pair programming when a project is late could help to avoid further overruns when manpower is added. The authors also analyzed some surveys and obtained evidence that organizations usually perform pair rotation. This can ease the training and mentoring burden and can allow new team members to learn more about the overall project.

In [63] the authors study to what degree the performance of a pair is correlated with the programming experience of the pair and with the pair "feelgood" factor. They found that pair performance is not correlated with a pair's programming experience.
They also found that the pair feelgood factor is a candidate driver for pair performance. The authors caution that the findings are only preliminary, given the small size of the data set, and suggest that it would be helpful to investigate the feelgood factor further.

Jensen [60] reports that pair programmers can work together effectively and efficiently to produce a quality product. Prior programming experience does not affect the results of pair programming tasks. There are initial situations, especially with a team of equal experience and ego, where disagreements arise over who will be the driver. Those situations are generally transient. The benefits listed in the results section outweighed any personality issues that arose. The second major benefit demonstrated in this experiment is also very important. Repairing defects after development is much more expensive than uncovering and fixing the defects where and when they occur. The benefits of developing and delivering a stable product faster, reducing maintenance costs, and gaining customer satisfaction certainly minimize the risk of using pair programming teams.

In industry, software developers generally spend 30% of their time working alone, 50% of their time working with one other person, and 20% of their time working with two or more people. Conversely, programmers learn to work alone in academic courses. In addition, some studies show that cooperative and collaborative pedagogy are beneficial for students. Research results indicate that pair programmers produce higher quality code in about half the time when compared with solo programmers. Nagappan et al. [64] carried out a study providing the following findings. Pair programming helps in the retention of more students in the introductory computer science stream. Students in paired labs have a more positive attitude toward working in collaborative environments; this should ultimately help the student in his or her professional life. Pair programming in an academic environment reduces the burden. From the results obtained regarding the tests and the projects, the authors conclude that pair programming among students is in no way a deterrent to student performance.

In [70] the authors analyze the job satisfaction produced by pair programming. They think that this could be a factor of success, because satisfied workers are more productive and build better systems. However, establishing a relationship between pair programming and job satisfaction does not imply any relationship between pair programming and any of its other expected effects, especially quality and productivity. The authors state that this should be kept well in mind, in order to avoid both wrong scientific conclusions and inducing companies to waste resources. In order to reach their research goal, the experimenters used a questionnaire administered via the Internet over a period of six months. 108 answers were analyzed, 54 from developers using pair programming and 54 from developers not using pair programming.
The investigation showed that pair programming has a significant, positive influence on the satisfaction of developers. This comes with increased communication between developers, faster communication of design changes, and better organization of meetings.

In industry, pair programming has been used sporadically for decades. Only in recent years have educators begun encouraging the use of pair programming in the classroom. The term pair rotation denotes the practice in which students pair with different classmates throughout the semester. The authors of [69] applied pair rotation in two undergraduate computer science courses. At the end of the semester, the students shared their perceptions via a survey. Additionally, the teachers, who are also the authors of the paper, shared their observations via structured meetings. The authors found that pair rotation is beneficial to both teachers and students. Students have the opportunity to learn from several partners and to change partners in situations where the current pairing is ineffective. Teachers benefit from pair rotation because it helps them deal effectively with dysfunctional pairs and inactive students via peer evaluation.

The research plan

The main concerns emerging from the state of the art are the following: the analyses accomplished are mainly qualitative rather than quantitative; and the practice has not been studied in its entire complexity, e.g. its application to different phases of the software process, to different aspects of the quality of the produced software, and to different production contexts (embedded software, product lines, component-based engineering, rapid development, prototyping, and distributed processes). To the best of the author's knowledge, there is not yet a mature body of knowledge validating the expected benefits of pair programming by empirical analyses, although some valuable studies have been accomplished in this direction. As some authors remark [45], [54], [76], there is a need for industrial case studies and field experiments in order to make the state of the art mature.

The research plan aims to understand, by empirical investigation, the three aspects of pair programming discussed in Chapter I. Economics: does pair programming provide a good cost/benefit ratio? Knowledge management: does it sustain a continuous transfer of (process and product) knowledge among the team's members? Scalability of the practice: to what extent can pair programming fit specific development contexts that are currently spreading in the software industry?

A research methodology consisting of four steps was adopted:

1. Define research questions: to identify the candidate quality factors to be evaluated and, consequently, how to measure them. Not every factor can be quantitatively analyzed, because either objective metrics do not exist or the measuring process is unfeasible.

2. Controlled experiments in university classes: to detect the defects in the experimental design and remove them.
On the one hand, this kind of study has many limitations [48], [59]: the samples are often scarcely representative of the population; the experimental package does not completely reflect real-world systems; and, finally, observations are made on a temporal window instead of on the overall process. On the other hand, when an experiment is conceived it is usually not possible to foresee every flaw to be removed or every feature to be investigated in depth, since some of them emerge only during execution.

3. Controlled experiments with professionals: to make the efforts of industry and researchers converge toward the same goals. Working with professionals offers two advantages: firstly, by convincing them of the importance of experimental analysis of their work, their commitment and support in the research can be leveraged; and secondly, understanding the actual interests of the enterprise, which are sometimes hidden from the researcher's perspective, can help to design experiments that are more realistic with respect to the goals of the investigation and the context.

4. Field experiments: to obtain dependable results by collecting data from the real world.

The research plan of this thesis covers the first two points, in order to debug and refine the experimental design and material so that the following two steps, which will make up the post-doc research plan, can be executed properly.

Fig. 4. Research Methodology. [Flow: Research Questions (factors of success); Controlled Experiments with Students (detect and remove bugs, repeated while bugs are found); Controlled Experiment with Professionals (enterprise's confidence); Field Experiments (dependable results transfer). The first two steps belong to the PhD thesis, the last two to the post-doc plan.]

The following chapters describe the experiments carried out:

The cost/benefit ratio. Two experiments were dedicated to this purpose and are discussed in Chapter IV. The first one studied the productivity of a system developed individually during a programming course. The second was larger in scope and in time spent: it consisted of developing a system for requirements traceability over a period of three months. It made it possible to collect different measures of the effectiveness and efficiency of pair programming and, in addition, useful qualitative observations from the subjects.

Specific benefits of the practice. Chapter V investigates how pair programming can sustain the transfer of knowledge among the team members. This experimentation has another characteristic: the practice of pair programming was applied to the design phase (and it was named pair

designing). This choice is motivated by the fact that the design phase requires handling knowledge at different levels of abstraction, which increases the need for effective knowledge management when dealing with the design. The conjecture studied is that pair designing could be considered a means for effectively diffusing and reinforcing knowledge.

The suitable contexts. In Chapter VI an experiment and a replica are discussed. The experimentation aimed at analyzing which effects the distribution of the pair's members produces on the outcomes of the practice.

Bibliography

[45] Abrahamsson P. and Koskela J. Extreme Programming: A Survey of Empirical Data from a Controlled Case Study. Proc. of International Symposium on Empirical Software Engineering, Redondo Beach, CA, IEEE CS Press.
[46] Baheti P., Gehringer E., and Stotts D. Exploring the efficacy of distributed pair programming, Proc. of the Second XP Universe and First Agile Universe Conference on Extreme Programming and Agile Methods - XP/Agile Universe, Chicago, IL, Springer LNCS.
[47] Basili V.R. Software modeling and measurement: The Goal/Question/Metric paradigm. Technical Report CS-TR-2956, Department of Computer Science, University of Maryland, College Park, MD 20742.
[48] Basili V.R. and Lanubile F. Building Knowledge through family of experiments, IEEE Transactions on Software Engineering, 25(4), 1999, IEEE CS Press.
[49] Bloomfield R. and Littlewood B. Multi-legged Argument: the Impact of Diversity upon Confidence in Dependability Arguments, Proc. of the Int'l Conference on Dependable Systems and Networks (DSN'03), IEEE CS Press.
[50] Brooks F.P. The Mythical Man-Month. Addison-Wesley Publishing Company, Reading, Massachusetts.
[51] Canfora G., Cimitile A., and Visaggio C.A. Lessons learned about Distributed Pair programming: what are the knowledge needs to address?, Proc. of Knowledge Management of Distributed Agile Process-WETICE, Linz, Austria, IEEE CS Press, 2003.

[52] Dybå T., Kitchenham B.A., and Jørgensen M. Evidence-Based Software Engineering for Practitioners, IEEE Software, 22(1), IEEE CS Press, pp. 58-65.
[53] Forrester J.W. Industrial Dynamics, Productivity Press, Cambridge, MA.
[54] Gallis H., Arisholm E., and Dybå T. An Initial Framework for Research on Pair Programming. Proc. of International Symposium on Experimental Software Engineering, Rome, Italy, IEEE CS Press.
[55] Katira N., Williams L., Wiebe E., Miller C., Balik S., and Gehringer E. On understanding compatibility of student pair programmers. Proc. of the 35th SIGCSE technical symposium on Computer science education, Norfolk, Virginia, USA, ACM.
[56] Kitchenham B., Hughes R.T., and Linkman S.G. Modeling Software Measurement Data, IEEE Transactions on Software Engineering, 27(9), IEEE CS Press, 2001.
[57] Kitchenham B., Pfleeger S.L., Hoaglin D.C., and Rosenberg J. Preliminary guidelines for Empirical Research in Software Engineering, IEEE Transactions on Software Engineering, 28(8), IEEE CS Press, 2002.
[58] Hanks B.F. Distributed Pair Programming: An Empirical Study, Proc. of Extreme Programming and Agile Methods - XP/Agile Universe 2004: 4th Conference on Extreme Programming and Agile Methods, Calgary, Canada, August 15-18, LNCS Springer-Verlag.
[59] Höst M., Regnell B., and Wohlin C. Using Students as Subjects: A Comparative Study of Students & Professionals in Lead-Time Impact Assessment, Proc. of 4th Conference on Empirical Assessment & Evaluation in Software Engineering (EASE), Staffordshire, UK.
[60] Jensen R.W. A Pair Programming Experience, The Journal of Defense Software Engineering, March.
[61] Lui K.M. and Chan K.C.C. When Does a Pair Outperform Two Individuals?, Proc. of XP 2003, LNCS Springer-Verlag.
[62] McDowell C., Werner L., Bullock H., and Fernald J. The Effects of Pair-Programming on Performance in an Introductory Programming Course, Proc. of the 33rd SIGCSE technical symposium on Computer science education, Cincinnati, Kentucky, USA, ACM.
[63] Muller M.M. and Padberg F. An Empirical Study about the Feelgood Factor in Pair Programming, 10th International Symposium on Software Metrics (METRICS'04), Chicago, Illinois, USA, IEEE CS Press, 2004.

[64] Nagappan N., Williams L., Ferzli M., Wiebe E., Yang K., Miller C., and Balik S. Improving the CS1 Experience with Pair Programming, Proc. of SIGCSE 2003, ACM.
[65] Nawrocki J. and Wojciechowski A. Experimental Evaluation of Pair Programming, Proc. of ESCOM.
[66] Nosek J.T. The case for collaborative programming, Communications of the ACM, 41(3), 1998, ACM.
[67] Pfleeger S.L. Soup or Art? The Role of Evidential Force in Empirical Software Engineering, IEEE Software, 22(1), IEEE CS Press, 2005.
[68] Salomon G. Distributed Cognitions: Psychological and Educational Considerations. Cambridge Press, Cambridge.
[69] Srikanth H., Williams L., Wiebe E., Miller C., and Balik S. On Pair Rotation in the Computer Science Course, Proc. of Conference on Software Engineering Education and Training 2004, Norfolk, Virginia, USA, IEEE CS Press.
[70] Succi G., Marchesi M., Pedrycz W., and Williams L. Preliminary Analysis of the Effects of Pair Programming on Job Satisfaction, Proc. of Fourth International Conference on eXtreme Programming and Agile Processes in Software Engineering (XP2002), Alghero, Sardinia, Italy, LNCS Springer-Verlag.
[71] Schum D.A. Evidential Foundation of Probabilistic Reasoning, Wiley-Interscience, Reading, Massachusetts.
[72] Zelkowitz M.V. and Wallace D.R. Experimental Models for Validating Technology, Computer 31(5), 1998, IEEE CS Press.
[73] Wernick P. and Hall T. The Impact of Using Pair Programming on System Evolution: a Simulation-Based Study, Proc. of 20th IEEE Int'l Conference on Software Maintenance (ICSM'04), Chicago, Illinois, USA, 2004, IEEE CS Press.
[74] Wilkinson L. and Task Force on Statistical Inference. Statistical Methods in Psychology Journals: Guidelines and Explanations, Am. Psychologist 54(8), 1999.
[75] Williams L., Kessler R.R., Cunningham W., and Jeffries R. Strengthening the Case for Pair Programming, IEEE Software 17(4), 2000, IEEE CS Press.

[76] Williams L., McDowell C., Nagappan N., Fernald J., and Werner L. Building Pair Programming Knowledge through a Family of Experiments, Proc. of the Int'l Symposium on Empirical Software Engineering, Rome, Italy, 2003, IEEE CS Press.
[77] Williams L. and Kessler B. The Effects of "Pair-Pressure" and "Pair-Learning", Proc. of 13th Conference on Software Engineering Education and Training, Austin, Texas, USA, 2000, IEEE CS Press.
[78] Williams L. and Upchurch R.L. In Support of Student Pair-Programming, Proc. of the thirty-second SIGCSE Technical Symposium on Computer Science Education, Charlotte, NC, USA, 2001, ACM.
[79] Williams L., Shukla A., and Antón A.I. An Initial Exploration of the Relationship Between Pair Programming and Brooks' Law, Proc. of Agile Development Conference (ADC'04), Salt Lake City, Utah, USA, 2004, IEEE CS Press.

Chapter IV: Productivity of Pair Programming

One of the main concerns with agile methods is that the lack of a detailed plan and of a precise definition of a process model can affect the performance of projects. In the pair programming practice the two members of the pair usually switch roles when necessary: the driver becomes the observer and the observer becomes the driver. The role switch should occur freely when the driver does not know how to proceed: usually, when the observer takes control of the keyboard, she has a clear idea of the strategy to adopt for overcoming the problem. This is due to the activity of the observer, who has to continuously detect tactical and strategic defects in order to remove them; this continuous review puts her in a position to find the way out of the pitfall into which the driver fell. As a result, the delays that a solo programmer incurs in order to overcome a problem should be reduced, or even disappear, in the case of the pair. The conjecture is that pair programming reduces the discontinuities of work in the tasks and stabilizes the throughput of every pair. Two experiments were carried out at the University of Sannio; their characterization, results, and conclusions are discussed in the following sections.

The Experiment on Productivity

It is natural to wonder whether pair programming means sustaining the cost of two developers for the same work that one developer could perform. Some papers (see Chapter III) discuss this concern and report that pair programming decreases the effort with respect to solo programming, with the additional benefit of increasing the quality of the code produced. The novelty of this study lies in the method of investigation adopted. The effects of pair programming on each single programmer were analyzed by comparing the performance of the same developer when programming in a pair and when programming solo.
Conversely, the literature reports studies in which the outcomes of pairs are compared with those of solo programmers but, to the best of my knowledge, there are no consolidated results about the effect of pair programming on the performance of an individual programmer. An experiment was carried out in order to address the research goal, which can be formulated as follows: analyze pair programming sessions for the purpose of evaluating them with respect to their capability of reducing the development time of each single programmer, from the point of view of the researchers, in the context of groups of students with different degrees.

Experiment's Characterization

This section illustrates the experiment carried out in order to achieve the research goal.

Definition

The experiment was executed with the purpose of answering the research question:

RQ1 Does pair programming increase the productivity of the developer?

The following null hypothesis was tested:

H0: pair programming does not affect the development time spent by each programmer with respect to solo programming.

$$\mu_{pair\_time} = \mu_{solo\_time}$$

The alternative hypothesis is:

H1: pair programming affects the development time spent by each programmer with respect to solo programming.

$$\mu_{pair\_time} \neq \mu_{solo\_time}$$

Characterization

Subjects. The experiment was executed with the voluntary collaboration of the students of the Master of Technologies of Software (MUTS), a higher-education university course for post-graduates at the University of Sannio. Students of MUTS hold a scientific degree (engineering, mathematics, physics). The course provides basic education in computer engineering (operating systems, programming languages, networks, databases, and software engineering), and the students attend theoretical classes and lab sessions; they develop a large and complex project in connection with an enterprise, participate in seminars given by international experts, and complete a three-month internship in a software company. Each subject performed both pair programming and solo programming alternately.

Variables. The dependent variable is time, which was measured with a time sheet that the subjects filled in during the experimental runs.

Rationale for the sampling from the population. Students of the MUTS course are suitable for such an experiment because they were attending the course on object-oriented programming.

Assignment. The students were required to develop two applications, one for each run.

The Process. The students formed the pairs by themselves and the pairs remained the same in both runs. In each run the students performed pair programming as well as solo programming, working alternately on two different applications. The students used ECLIPSE as the development environment because they had been trained to use it in their courses. Table 5 shows the experimental design.

Table 5. Experimental Design.

Subject        1st Run           2nd Run
               Pair    Solo      Pair    Solo
S_j.1          1.A     1.B       2.A     2.B
S_(j+1).1      1.B     1.A       2.B     2.A
S_j.2          1.A     1.B       2.A     2.B
S_(j+1).2      1.B     1.A       2.B     2.A

Table 5 shows that the subjects S_j.1 and S_j.2 make up the j-th pair: they had the same assignment as a pair (1.A in the first run and 2.A in the second run) and as solos (1.B in the first run and 2.B in the second run). The index j ranges from 1 to the total number of pairs, which is 12, for a total of 24 subjects.

Analysis of data

Table 6 reports the main descriptive statistics, in hours. Mode stands for the most frequent value in the sample, and the last row gives the ratio between the standard deviation and the mean, which indicates the normalized interval of variation of the values in the sample. The descriptive statistics show that the subjects performed better when working in pairs than when working solo, in each run. This is evident from the mean values: in the first run the subjects working solo spent 61% more time than the subjects working in pairs, whereas in the second run the increment was 3%. The large difference between the two runs (61% versus 3%) can be explained as follows: the assignment of the second run was more complex than that of the first run, so the subjects spent more time in the second run establishing the strategies to follow. The other values make the difference between the effort spent in pairs and the effort spent solo more evident. The most frequent value for the solos in the first run is twice the value for the subjects in pairs, and in the second run the ratio is 2,31. A similar ratio holds for the maximum value. The standard deviation is smaller for the pair samples than for the solo samples: this suggests that working in pairs tends to confine the effort to a certain interval, that is, the effort of pairs is much more stable and predictable than that of solos. In both runs the ratio between the standard deviation and the mean is smaller for pairs than for solos.
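The percentages and ratios quoted above can be recomputed from the means and standard deviations of Table 6. The following is a minimal sketch, not part of the original analysis; the function names are illustrative, and the values are copied from Table 6 with decimal commas rewritten as decimal points.

```python
# Means and standard deviations (hours) copied from Table 6.
TABLE_6 = {
    "pair_run1": {"mean": 1.88, "std": 0.31},
    "solo_run1": {"mean": 3.03, "std": 1.17},
    "pair_run2": {"mean": 2.52, "std": 0.97},
    "solo_run2": {"mean": 2.60, "std": 1.26},
}


def extra_time_percent(solo_mean, pair_mean):
    """Percentage by which the solo mean exceeds the pair mean."""
    return (solo_mean / pair_mean - 1.0) * 100.0


def std_over_mean(std, mean):
    """Ratio between standard deviation and mean (last row of Table 6)."""
    return std / mean


increase_run1 = extra_time_percent(
    TABLE_6["solo_run1"]["mean"], TABLE_6["pair_run1"]["mean"]
)
increase_run2 = extra_time_percent(
    TABLE_6["solo_run2"]["mean"], TABLE_6["pair_run2"]["mean"]
)
ratios = {
    name: round(std_over_mean(row["std"], row["mean"]), 2)
    for name, row in TABLE_6.items()
}

print(round(increase_run1), round(increase_run2))
print(ratios)
```

Rounded to whole percentages, the two increments agree with the 61% and 3% quoted in the text, and the rounded ratios agree with the Std Dev/Mean row of Table 6.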

Table 6. Descriptive Statistics of the Experiment.

Statistic        Pair 1st run   Solo 1st run   Pair 2nd run   Solo 2nd run
Mean             1,88           3,03           2,52           2,60
Max              2, ,1
Min              1,3            1,4            1,3            1
Std Dev          0,31           1,17           0,97           1,26
Mode             2              4              1,3            3
Std Dev/Mean     0,16           0,39           0,38           0,48

Figure 5 compares some of the most relevant indicators.

Fig. 5. Main Statistics of the Experiment. [Chart of mean, standard deviation, and mode for each treatment: pair 1st run, solo 1st run, pair 2nd run, solo 2nd run.]

Another interesting indicator is the difference between the time each subject spent working solo and the time the same subject spent working in a pair; the descriptive statistics for this indicator are reported in Table 7. More specifically, the indicator is $t_{solo,i,k} - t_{pair,i,k}$, where $t_{solo,i,k}$ is the time the i-th subject spent working solo in the k-th run and $t_{pair,i,k}$ is the time the i-th subject spent working in a pair in the k-th run. The Negatives row gives the percentage of negative values, that is, the share of subjects who spent a longer time when working in a pair. The percentage is relatively low; moreover, the minimum is -0,85 hours for the first run and -1,6 hours for the second run.

It is interesting to notice that there are cases in which the maximum value of the difference is three hours: considering that the maximum value for solos in both runs is about 5 hours, this means that some subjects reduced their effort by about 50% when working in pairs. In any case the mean is positive, which indicates that working in pairs does not require more development time and in some cases halves the time required by the solo.

Table 7. Difference of the Efforts Spent by Subjects between the two Treatments.

Statistic     1st Run    2nd Run
Mean          0,69       0,58
Max           3          3
Min           -0,85      -1,6
Mode          0          0
Std Dev       1,08       1,19
Negatives     12,5%      16,67%

Another interesting note is that the most frequent value is zero: some subjects (three in the first run and six in the second run) spent the same time under the pair and the solo treatment. Figure 6 shows the graph of the values.

Fig. 6. Descriptive Statistics for each Subject when Performing Pair and Solo Programming. [Chart of the mean, max, and min of the difference of effort per subject, for the 1st and 2nd run.]
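The rows of Table 7 all derive from the per-subject difference between solo time and pair time. The sketch below shows one way to compute them; the time-sheet values are hypothetical (the individual measurements are not reported in the thesis) and the function name is an illustrative assumption.

```python
from statistics import mean, multimode


def difference_summary(solo_hours, pair_hours):
    """Summarize t_solo - t_pair per subject, as in the rows of Table 7."""
    diffs = [round(s - p, 2) for s, p in zip(solo_hours, pair_hours)]
    negatives = sum(1 for d in diffs if d < 0)
    return {
        "mean": mean(diffs),
        "max": max(diffs),
        "min": min(diffs),
        # multimode returns all most-frequent values; take the first one.
        "mode": multimode(diffs)[0],
        "negatives_percent": 100.0 * negatives / len(diffs),
    }


# Hypothetical time sheets (hours) for eight subjects in one run.
solo = [3.0, 2.0, 4.0, 1.5, 2.0, 5.0, 2.5, 2.0]
pair = [2.0, 2.0, 1.0, 2.0, 2.0, 2.0, 2.5, 1.5]

summary = difference_summary(solo, pair)
print(summary)
```

A negative difference marks a subject who took longer in a pair than alone, so `negatives_percent` corresponds to the Negatives row; with the 24 subjects of the experiment, 12,5% corresponds to three subjects.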

Statistical Tests

Table 8 reports the outcomes of the statistical tests used. A Mann-Whitney test was performed, with a fixed p-level, because the sample data did not have a normal distribution. The first and the second row provide empirical evidence that the differences between the subjects' performance in pairs and as solos are statistically significant, both in the first run and in the second run. The third and the fourth row show that there is no empirical evidence that the differences between the performance in the two runs are significant (for pairs and for solos, respectively): this indicates that the maturation threat did not affect the results of the experiment.

Table 8. Results of the Statistical Tests.

Test between                                                              Rank sum (α)   Rank sum (β)   p-level
Subjects in pair (α), subjects as solo (β), in the 1st run                274,           ,000           0,04427
Subjects in pair (α), subjects as solo (β), in the 2nd run                487,           ,000           0,03512
Subjects in pair in the 1st run (α), subjects in pair in the 2nd run (β)  410,00         410,000        1,00000
Subjects as solos in the 1st run (α), subjects as solos in the 2nd run (β) 572,          ,000           0,94537

The next sections discuss the experiment on the stability of the pairs' productivity.

The Experiment on Stability of Productivity

Pair programming may produce a certain stability in the pair's productivity. This should be due to the switch between observer and driver: the switch occurs when the driver does not know how to go on, while the observer should know how to overcome the obstacle, because the observer's role consists of reviewing the code and of assessing and correcting the strategies taken.

Definition

This second experiment aims at answering the following research question:

RQ1 Can pair programming ensure a greater stability of productivity than solo programming?

The experiment aims to reject the following null hypothesis:

H0: the stability in the throughput of pair programming is similar to that produced by solo programming.

$$\mu_{pair\_timestab} = \mu_{solo\_timestab}$$

The alternative hypothesis is:

H1: pair programming produces a higher stability than solo programming.

$$\mu_{pair\_timestab} > \mu_{solo\_timestab}$$

Characterization

Subjects. The experiment was executed with the collaboration of the students of the course on Management of Software Systems, in the third and final year of the Bachelor of Science in Computer Engineering. The course teaches students the technological instruments for managing software projects. They studied software metrics, software processes, quality models (CMM and ISO), and software configuration management. To attend the course, the students had to have passed the exams of Object Oriented Programming Language and Software Design. In the latter they learnt UML and design patterns; in the former they learnt Java and its main packages and technologies.

Sampling of the population. The subjects were divided into eight teams of four people. The teams were formed on the basis of the students' average curriculum grade: the average grade of the students of each team had to fall in the interval between 24/30 and 26/30. This ensured that the teams were well balanced in terms of the skills and abilities of their members. The subjects were asked to group themselves according to their preferences, while respecting the constraints (the number of team members and the average grade). Some teams performed pair programming and other teams performed solo programming. The subjects were free to choose, because forcing people to perform a practice they do not like could affect the results of the experiment.

Assignment. The teams were required to develop a system incrementally. At each iteration each team received the assignment for that iteration.
The goals of each iteration were: to develop the new requirements; to integrate the new features with the existing system; and to fix bugs due to integration or left over from the previous iteration. The three assignments are shown in Appendix A. Each team was responsible for analyzing the requirements, writing the related SRS, producing the high-level and detailed design, writing the code, and writing and executing the test cases. The subjects were asked to perform pair (or solo) programming only during the coding phase. The other phases had to be carried out by the whole team together.

Variables. The variable under analysis is the productivity per requirement. The times were measured with a time sheet (as shown in Appendix A). At the end of each task, the subjects had to record the time needed to develop each assigned requirement.

Process. The teams were responsible for the whole production lifecycle, for configuration management, and for incrementing their system at each iteration. The overall process consisted of three iterations. At each iteration the subjects received the related macro-requirements to develop. At the end of the process the subjects had to demonstrate that their product actually fulfilled the requirements they had received. At each iteration the subjects released a version of the software, complete with SRS and design documentation.

Analysis of data

Figure 7 shows the average values of the time spent by the pair groups for each task.

Fig. 7. Average Values for Groups. [Chart of the average times per group (Group 1 to Group 5) for Task 1, Task 2, and Task 3.]

It is evident that the trend of each team remains similar across the three tasks: the only exception is Group 4 in the first task. Apart from that, the paired teams produced very close values for each group. Table 9 shows the corresponding values.

Table 9. Average values of Groups per Task.

Columns: Task 1, Task 2, Task 3
Group ,5 18
Group ,5
Group 3   0      5,75   7
Group ,7 18,3
Group 5   10,5   10,5   13,5

Table 10 shows the standard deviation of each group per task.

Table 10. Standard Deviation of Each Group per task.

Columns: Task 1, Task 2, Task 3
Group 1   17, ,85641
Group 2   2, ,
Group
Group
Group

The standard deviation indicates the dispersion of the times around the mean for each group. Where the value is exactly zero, the interpretation leads to only one conclusion: the different pairs worked together. The pairs within the same team tended to work synchronously, implementing the agile practice which requires the team to advance on the tasks in a very united way. There are no monads working alone. This suggests that teams formed by pairs improve their capability to manage and lead themselves. Figure 8 shows the standard deviation of each group's times among the three tasks.

Fig. 8. Standard deviations of the times among the tasks. [Chart for Group 1 to Group 5.]

Table 11 shows the values of the standard deviation of the groups among the tasks.

Table 11. The standard deviation of Group between Tasks.

Group 1   3,
Group 2   3,
Group 3   3,
Group 4   24,36072
Group 5   1,732051
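The between-task values of Table 11 can be reproduced from the task averages of Table 9 for the rows that survive intact (Group 3 and Group 5). The sketch below assumes that the sample standard deviation (n - 1 in the denominator) was used; this assumption is supported by the fact that it reproduces the Group 5 value, but the thesis does not state the formula.

```python
from statistics import stdev

# Average hours per task (Task 1, Task 2, Task 3), copied from the
# complete rows of Table 9; decimal commas rewritten as decimal points.
task_averages = {
    "Group 3": [0.0, 5.75, 7.0],
    "Group 5": [10.5, 10.5, 13.5],
}

# Sample standard deviation of each group's averages across the tasks.
between_task_std = {
    group: stdev(values) for group, values in task_averages.items()
}

for group, value in between_task_std.items():
    print(group, round(value, 6))
```

For Group 5 the result rounds to 1.732051, the value printed in Table 11; for Group 3 it is about 3.73, consistent with the truncated entry "3,".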

The first observation concerns the absolute values. Except for Group 4, it is evident that the values are very low. This means that the pairs were able to keep the time spent within certain limits. The second observation concerns the variations among the values. The standard deviations seem to be very similar. This suggests that the variations of stability among the paired teams can also be considered very low.

The results of the solo teams suggest that pair work is more suitable in terms of team cohesion and stability. Figure 9 shows the average times of each group per task.

Fig. 9. Times of Solo Groups. [Chart for Group 6, Group 7, and Group 8 over Task 1, Task 2, and Task 3.]

Table 12 shows the related values.

Table 12. Times of Solo Groups.

Columns: Task 1, Task 2, Task 3
Group 6   3,
Group ,5
Group 8   0   6,5   6

Figure 10 shows that the dispersion of values among the tasks for solo teams is higher than the one produced by the paired teams. This result appears more evident from Table 13.

Table 13. Standard deviations of groups among the tasks.

           Task 1   Task 2   Task 3
Group 6    0,25     0        3,
Group 7    2,       ,        ,57735
Group 8    0        1,       ,

Table 13 contains fewer zeros than the corresponding table for the paired teams. This means that, for each task, there was not a strong cohesion in the solo teams. The team members proceeded in different directions, decreasing the coordination and the orchestration of the team's activities.

Fig. 10. Standard deviation among the tasks for solo teams. [Chart for Group 6, Group 7, and Group 8 over Task 1, Task 2, and Task 3.]

Table 14 shows that the variation in stability for the solo teams is similar to that of the paired teams. This could be due to the reduced size of the solo sample. Indeed, discussions with the subjects suggest that the variation in stability should be higher for the subjects who worked solo.

Table 14. Standard deviations of solo teams between tasks.

Group 6   12,82108
Group 7   1,
Group 8   3,

Statistical Tests

Mann-Whitney tests were performed in order to obtain evidence for the results.
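The rank sums reported for the Mann-Whitney tests in this chapter are obtained by ranking the pooled observations of the two samples and summing the ranks of each sample. The following is a minimal sketch of that step with hypothetical data; it uses average ranks for ties and does not compute the p-level, for which a statistical package would normally be used.

```python
def rank_sums(sample_a, sample_b):
    """Return (rank sum of a, rank sum of b), averaging ranks over ties."""
    pooled = sorted(
        [(value, "a") for value in sample_a]
        + [(value, "b") for value in sample_b]
    )
    sums = {"a": 0.0, "b": 0.0}
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each receives their mean.
        average_rank = (i + 1 + j) / 2.0
        for _, label in pooled[i:j]:
            sums[label] += average_rank
        i = j
    return sums["a"], sums["b"]


def mann_whitney_u(sample_a, sample_b):
    """Smaller of the two U statistics derived from the rank sums."""
    rank_a, rank_b = rank_sums(sample_a, sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_a - n_a * (n_a + 1) / 2.0
    u_b = rank_b - n_b * (n_b + 1) / 2.0
    return min(u_a, u_b)


# Hypothetical standard deviations for five paired and three solo teams.
paired = [1.0, 2.0, 2.0, 5.0, 7.0]
solo = [2.0, 3.0, 9.0]

print(rank_sums(paired, solo))
print(mann_whitney_u(paired, solo))
```

A useful sanity check is that the two rank sums always add up to N(N+1)/2, where N is the total number of observations; for eight teams this is 36.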

73 73 Table 15. Statistical Tests. Test Between Rank Sum α Rank Sum β p-level Standard deviation of Paired teams α - Solo teams β 25,000 11,000 0,4533 Standard deviation of Task 1 Paired teams α - Solo teams β 21, ,5000 0,7507 Standard deviation of Task 2 Paired teams α - Solo teams β 17,500 18,500 0,00509 Standard deviation of Task 3 Paired teams α - Solo teams β 20,000 16,000 0,44500 The statistical tests show that no empirical evidence was obtained; the reason may stand in the small size of the samples. This suggests that further replicas are necessary in order to proof the conjecture of the hypothesis. Conclusions The first experiment provides empirical evidence that the same developer decreases the time for developing a task when moves to pair programming from solo programming. Two runs were executed producing the same result. The novelty of this work stands in the research method: the effect of pair programming on the each programmer was investigated, comparing the results when the developer moves from solo programming to pair programming. The second experiment provided the following results: pair programming assures a greater stability in the productivity. This suggests that the practice can be useful to make estimation of project s duration with a certain degree of dependability; pair programming induces the team to work with a major cohesion. This could be due to the pair pressure and the search for a greater rigor in the proper own work when working with a colleague at the same task. Unfortunately, the second experiment did not produce empirical evidence of the results. This may be due to the small size of the sample. In order to achieve a stronger evidence it is necessary to replicate the experiment with a bigger number of subjects. The experimentation suggests that pair programming produces a convenient ratio costs/benefits. 
In particular, it is possible to conclude that: (i) pair working increases the productivity of the single programmer; (ii) pair programming ensures greater stability in throughput, and may thus let project managers make more dependable estimates of project effort; and, finally, (iii) pair programming fosters greater cohesion and cooperation of the team in achieving its goals.

Chapter V: Pair Programming and Knowledge Leveraging

This chapter investigates the specific benefits of pair programming. The conjecture is that pair programming can help to increase knowledge transfer among team members. The design phase could particularly benefit from effective knowledge transfer: this is the reason why pair programming was applied to the design phase. The chapter consists of three parts: in the first, a preliminary experiment is discussed, suggesting that pair designing can be an effective means to build design knowledge in a project team; in the second, a group of more focused experiments is described, in order to investigate specific aspects of knowledge building; and, finally, the third part investigates individual background as a success factor for the practice.

The problem: when could pair designing be the proper solution?

The knowledge to be applied when developing software includes explicit and tacit forms [98]. Software processes usually involve capabilities that are not only technical, but also related to personal attitudes and professional history, including creativity and experience. Creativity is difficult to assess in both qualitative and quantitative terms. Experience, however, can more readily be described by metrics, and the current literature reports some attempts in this direction [104].

Explicit knowledge consists of concepts and practices that can be formalized and transferred through handbooks, scientific publications, documentation, tutorials, rules, and procedures. Tacit knowledge consists of the individual capability of solving problems, and it is built essentially by doing: applying explicit knowledge, registering personal observations, and making personal models for retaining it, namely interiorization in Naci Model [25]. This kind of knowledge cannot be easily formalized and transferred, except through dialogue.
Software design requires tacit knowledge or, generally speaking, experience: as a matter of fact, a software programmer becomes a software designer only after some years of practice. Documentation is the major source of information for successfully accomplishing software evolution tasks. Unfortunately, in certain cases reading documentation is not enough to understand all the aspects of a software system. The evolution of software design can particularly benefit from a practice that fosters and speeds up knowledge transfer among team members, for several reasons:

Software evolution, like many other software engineering activities, is a cooperative effort, and the executed process is not the linear one of changing the design first, then documenting the changes, writing the code, and finally running the regression tests. The team usually follows a form of interleaved process: making decisions about small changes, coding and checking whether the changes make sense and work, confirming the design, documenting the modifications, and finally completing and testing the implementation. The decisions made are the result of a collective activity of trials and tests; as a consequence, the path of reasoning that led to a decision is not formalized and remains embedded in the minds of the people who decided and approved the modifications.

The turnover of maintenance personnel causes a serious loss of experience and knowledge that is difficult to replace; it is more efficient and effective to ask people who know the system to explain the tips and the traps of the architecture than to attempt to extract them from (huge and sometimes out-of-date) documentation.

Software design requires the capability to deal with different levels of abstraction: implementation, database, business logic, presentation, deployment, interaction with other systems, and communication protocols. Mastering all these aspects and their integration is not easy and, usually, the documentation does not address them explicitly.

Documentation has a low communication bandwidth: acquiring information through documentation requires time, and in some situations time is a very scarce resource. "Face-to-face channels offer the prospect of richer communication because of the ability to transmit multiple cues [...] important when there are high levels of equivocality (ambiguity) and uncertainty" [92].
A strategy for diffusing the knowledge built by individual engineers during their daily work, and for consolidating it at team or corporate level, could be helpful; such a method is not intended as an alternative to documentation, but as a means to complement and improve the use of documents as a source of information by fostering the sense-making process.

A claimed benefit of pair programming [88] is that it fosters knowledge leveraging between the two programmers, particularly of tacit knowledge. It is claimed that this is due to discussing strategies and matters as they arise. The conjecture of the authors is that applying the practice of working in pairs to the design phase can facilitate the process of building design knowledge, compared with working as singletons. Similarly to pair programming, pair designing indicates the practice in which two designers work side by side on the same design document; one of the two actively edits the document whereas the other performs a continuous review. The two roles can be switched during the task whenever necessary.

To validate the conjecture, a set of experiments was executed: a preliminary investigation explored whether a relationship between pair designing and knowledge building actually exists; on the basis of the positive outcomes of the first experiment, a more focused investigation was carried out, studying the potential of pair designing to enforce and diffuse design knowledge in project teams; finally, a third experiment analyzed how a candidate success factor for the practice could magnify (or reduce) the advantages of pair designing.

The exploratory experiment

The purpose of the exploratory experiment was to meet the following research goal: to analyze a design task with the purpose of evaluating how pair designing increases knowledge building with respect to solo designing, from the viewpoint of the designer, in the context of an actual student project.

Experiment Description

The experiment focused on student learning, rather than on comprehension in practice.

Definition

The research question to investigate is the following:

RQ: Is pair designing an effective means to build design knowledge?

In order to answer the research question, the null hypothesis to be tested is:

H_0: There is no significant difference between the level of knowledge building in individual design and in pair designing.
$\mu_{\mathrm{pair\_KnowBuilt}} = \mu_{\mathrm{solo\_KnowBuilt}}$

The alternative hypothesis is:

H_1: Knowledge building in designing as singletons is different from knowledge building in pair designing.
$\mu_{\mathrm{pair\_KnowBuilt}} \neq \mu_{\mathrm{solo\_KnowBuilt}}$

Characterization

Assignment. The subjects were required to produce the design of a real system. The system, named BooksOverTheGlobe, concerned the automation of selling and buying second-hand books. The business functions to be implemented were: managing registration of users (vendors and buyers); managing the selling transactions between

two users, also located in different countries; supporting the search for rare books or books with particular requirements, such as language, publishing date, or publisher. The subjects were required to develop only a part of the design documentation, including: use case diagrams, class diagrams, and interaction diagrams. The requirements, in the form of a textual description with the list of business functions, were handed out to the subjects. Table 16 reports a summary of the requirements to be implemented.

Subjects. The subjects were 45 students of the 1st Level Software Engineering course (third year) of the BA in Computer Engineering at the University of Sannio, Italy. Before the experiment, the students learned design principles in lectures and in tutored lab sessions.

Process. The subjects were not informed that they were taking part in an experiment, in order to avoid biasing the experimental results. They were told that the aim of the work was to train them in applying UML and the design principles learned in the theoretical lessons. All the subjects were randomly assigned either to a pair or to work as singletons. The pairs did not change during the experiment. All the subjects had to develop the same system, starting from the same requirements specification. The number of pairs was equal to the number of singletons, namely 15. Each pair had an id starting with a p followed by an ordinal number, e.g. p1, p4, p13; each singleton had an id starting with an s followed by an ordinal number, e.g. s1, s3, s12.

Table 16. Experiment Assignments.
R1. Users Registration
R1.1 Registration of the branch to the headquarters, via Internet
R1.2 Registration of users (vendors/buyers) to branch
R1.3 Registration of users (vendors/buyers) to branch, via Internet
R2. Book request
R2.1 Search of books, through a parameter or a combination of them
R2.2 Search of books, through a parameter or a combination of them, via Internet
R2.3 Selling of a book
R3. Data update
R3.1 Update data of branch
R3.2 Update data of buyers
R3.3 Update data of vendors

The Use Case Diagrams, Class Diagrams, and Interaction Diagrams of the system were also developed by the experimenters before executing the experiment; the relevant data are shown in Table 17.

Table 17. Number of entities in the design.
Use Cases | 9
Classes | 13
Interaction Diagrams | 8

A training session preceded the experiment. The training session consisted of an introductory seminar, a trial run, and a reinforcement seminar. During the introductory seminar, the practice of pair programming was explained, illustrating its claimed benefits for knowledge leverage. The trial run helped to train the subjects in pair designing: the subjects designed a system for managing car sharing from a set of requirements defined by the experimenters. The activity was performed in pairs, and produced documentation including use case and class diagrams. In the reinforcement seminar, the subjects discussed their doubts or concerns about the practice with the experimenters.

The experiment was executed in three runs. Each run aimed at producing one deliverable of the design phase: use case diagrams, class diagrams, and interaction diagrams. At the beginning and at the end of each run, the subjects answered two questionnaires in order to evaluate, respectively, their own initial level of knowledge and the additional knowledge achieved as an effect of working in the run either as a solo or as a paired designer.

The process of the overall experiment is depicted in Fig. 11. In the picture, the circles are the runs to be executed, and the rectangles are the artifacts, either used or produced, during the runs.

Fig. 11. Experimental Design of the Exploratory Experiment. (Three runs, Run 1 to Run 3, with the artifacts Req Spec, quest_0 to quest_5, A_quest_0 to A_quest_5, U_C_diag, Class_diag and Inter_diag.)

In order to facilitate the reading of Fig. 11, the acronyms are explained as follows:
Req Spec: Requirement Specifications.
quest_i: questionnaire number i, e.g. quest_0 means questionnaire number zero.
A_quest_i: answers to quest_i.
U_C diag: use case diagrams.
Class_diag: class diagrams.
Inter_diag: interaction diagrams.

Questionnaires. Closed-response questionnaires were used, mainly focused on issues of system architecture and design techniques. The questionnaires were anonymous; the subjects were only asked to indicate the identification id of their pair or singleton.

Variable. The knowledge built was the dependent variable, and it was measured by grading the questionnaires. From the questionnaires, the knowledge level for each subject was derived as follows: each answer was assigned a score of 2 if it was right, 1 if it was missing, and 0 if it was not correct. This scheme was intended to encourage the subjects to answer, so that they were not tempted to leave questions unsolved, which sometimes happens out of laziness or because people want to finish as soon as possible. The score of each questionnaire was calculated automatically by an application written by the experimenters.

As input for each run, the subjects were provided with the requirements specifications and the documentation they had produced themselves in the previous runs. Table 18 summarizes the experimental design.
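The grading rule described above can be illustrated with a minimal sketch. The thesis does not show the experimenters' grading application, so the function name `grade_questionnaire`, the use of `None` for a missing answer, and the sample answers are all assumptions made for illustration.

```python
# Hypothetical sketch of the grading rule: 2 = right, 1 = missing, 0 = not correct.
SCORE_RIGHT = 2
SCORE_MISSING = 1
SCORE_WRONG = 0


def grade_questionnaire(answers, answer_key):
    """Return the knowledge level for one questionnaire.

    `answers` holds the subject's choices, with None standing for a
    missing answer (an assumption of this sketch); `answer_key` holds
    the correct choices in the same order.
    """
    total = 0
    for given, correct in zip(answers, answer_key):
        if given is None:
            total += SCORE_MISSING
        elif given == correct:
            total += SCORE_RIGHT
        else:
            total += SCORE_WRONG
    return total


# Made-up example: one right, one missing, one wrong answer.
example_score = grade_questionnaire(["a", None, "c"], ["a", "b", "d"])
```

With these made-up answers the subject obtains 2 + 1 + 0 points.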

Table 18. Preliminary results.
Subjects | Treatment | Phase | Input | Output
Group A | Pair designing | Run 1 | Req Spec; quest_0; quest_1 | U_C diag; A_quest_0; A_quest_1
Group A | Pair designing | Run 2 | Req Spec; quest_2; quest_3; U_C diag | Class_diag; A_quest_2; A_quest_3
Group A | Pair designing | Run 3 | Req Spec; quest_4; quest_5; U_C diag; Class_diag | Inter_diag; A_quest_4; A_quest_5
Group B | Singleton designing | Run 1 | Req Spec; quest_0; quest_1 | U_C diag; A_quest_0; A_quest_1
Group B | Singleton designing | Run 2 | Req Spec; quest_2; quest_3; U_C diag | Class_diag; A_quest_2; A_quest_3
Group B | Singleton designing | Run 3 | Req Spec; quest_4; quest_5; U_C diag; Class_diag | Inter_diag; A_quest_4; A_quest_5

The Results

The null hypothesis has been tested with the Mann-Whitney U test, because the data were not normally distributed. The hypothesis has been tested for the metric knowledge building, defined as follows:

$KB_i = KL_{ai} - KL_{bi}$, for $i = 1, \dots, 3$,

where $KL_{ai}$ is the knowledge level of the subject after the execution of run $i$ and $KL_{bi}$ is the knowledge level of the subject before the execution of run $i$. In Table 19 the results of the statistical tests are reported.
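As a small illustration of the knowledge building metric defined above, the following sketch computes the per-run difference between the knowledge level after and before each run, and the mean over a group. The function names and the sample scores are hypothetical; they are not data from the experiment.

```python
from statistics import mean


def knowledge_building(levels_before, levels_after):
    """Per-run knowledge building: level after run i minus level before run i."""
    return [after - before for before, after in zip(levels_before, levels_after)]


def mean_knowledge_building(subjects, run_index):
    """Mean knowledge building of a group for one run (0-based index).

    `subjects` is a list of (levels_before, levels_after) tuples,
    one tuple per subject; this layout is an assumption of the sketch.
    """
    return mean(
        knowledge_building(before, after)[run_index] for before, after in subjects
    )


# Made-up scores for one subject over the three runs.
kb_example = knowledge_building([10, 12, 14], [12, 15, 13])
```

A negative value, as in the third run of the made-up example, means that the questionnaire score after the run was lower than before it.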

Table 19. Results for the statistical Test.
Run | Paired KB Mean | Singleton KB Mean | p-level
I: Use Case | 1,75 | -0,25 | 0,019
II: Class Diagram | 2,27 | 1,36 | 0,53
III: Interaction diagram | 1,86 | -0,5 | 0,26

The paired KB mean is higher than the singleton KB mean in each run, and the results of run I are statistically significant (p-level < 0.05). The results of the other two runs are not statistically significant. This is primarily due to two threats: selection and history.

Concerning selection, the subjects were not paired according to criteria such as differences in experience and capability, but randomly. It seems that pairing subjects with different capabilities (given that their experience is expected to be similar) could improve the design of the experiment. To accomplish this, before forming pairs the subjects should be divided into groups according to their capability; afterward, pairs should be formed by randomly selecting people from different groups.

The second threat was due to the fact that the students did not show a strong commitment to adhering to the overall experiment process. The first run was executed by the designated pairs, whereas in the other two runs the number of subjects decreased. A number of subjects abandoned run II and then run III because they considered the experiment a training activity for the course and perceived it as unnecessary.

Fig. 12 shows the trend of knowledge building over the runs. Qualitative analysis shows that knowledge building is always higher for pairs than for singletons.

Fig. 12. Knowledge Building Trends. (Knowledge building per run, for pairs and singletons.)

The trend is, interestingly, the same for the pairs and for the singletons: increasing from run I to run II and decreasing from run II to run III. This shows that the practice affects knowledge building mainly when producing the Class Diagrams rather than

Use Cases or Interaction diagrams. Concerning Use Cases and Interaction diagrams, the practice did not add much knowledge to that gathered in the theoretical lessons.

Figure 13 shows that the standard deviation of pairs is always lower than or comparable to that of singletons, across all the runs. This indicates that knowledge building is more stable, and thus more predictable and repeatable, for pairs than for singletons.

Fig. 13. Standard deviation trend. (Standard deviation per run, for pairs and singletons.)

The analysis of the knowledge level values suggests other observations. The average score of quest_0 was 15,5; it is the starting knowledge level of all the subjects before the application of the pair designing practice. In Figure 14 the trends of the scores of the five questionnaires are reported. Figure 14 shows that: (i) the initial knowledge level is comparable between pairs and singletons; (ii) throughout the experiment the pairs keep a knowledge level higher than that of the singletons; (iii) the difference tends to increase for the last questionnaires. This qualitatively confirms that, with pair designing, subjects achieve a higher level of knowledge faster.

Fig. 14. Knowledge level trend. (Knowledge level per A_quest_#, for pairs and singletons.)

Statistical tests were performed in order to evaluate the maturation threat. The results are reported in Table 20.

Table 20. Results of statistical Tests.
Maturity test between | p-level
Run I: 0,78 - Run II: 1,81 | 0,15
Run I: 0,78 - Run III: 1 | 0,49
Run II: 1,81 - Run III: 1 | 0,61

Table 20 shows that a maturation of the subjects took place during the experiment: as a matter of fact, run I presented values of knowledge building smaller than those of run II and run III. On the other hand, the maturation is not statistically significant.

Threats to validity

Threats to construct validity.

Dependent Variable. To be precise, the experiment focuses on student learning, rather than on comprehension-in-practice. Comprehension in practice means the knowledge that practitioners build by performing their daily work. Both are forms of tacit knowledge building, and both are intended to occur when people, either students or practitioners, perform development activities. However, on closer analysis, differences can be identified. Students are committed to learning, and whenever they work during classes or labs, and also during the experiment, their efforts are directed toward it. Practitioners do not necessarily concentrate their efforts on learning during their daily work. Comprehension in practice is a kind of unconscious learning, and it occurs spontaneously when people face new problems; in this case, practitioners direct their efforts only at the work, and learning comes as a side effect. In both cases the observation concerns how knowledge is built by doing, but a difference exists: for students, learning is the primary goal; for practitioners, the only aim is to accomplish their tasks successfully, and learning, when it occurs, is not a primary and voluntarily pursued goal.

Variable Measurement. A questionnaire is not a precise means to capture the tacit knowledge level and, thus, the knowledge building.
Tacit knowledge, by its intrinsic nature, is hard to capture completely, and it cannot be formalized. Evaluating it by using questionnaires about the design of the system and design techniques is an acceptable approximation. More sophisticated instruments could be more effective, but in that case the experimentation would have to become multidisciplinary; it should include psychology research, at least.

Threats to internal validity

Learning effects. Due to the experimental design, learning effects among the tasks could occur. Such effects are eliminated because the observation points were

coincident with the days of task execution: pair and solo outcomes of the same tasks were compared.

Persistence effects. In order to avoid this effect, the experiment was carried out with subjects who had not taken part in similar experiments before.

Subjects' motivation. The experiment was part of the usual training of the students. In order to obtain a good mark, they were motivated to do their best.

Fatigue effects. The experiment's runs lasted the time of a regular lab session. The subjects were used to working on that kind of task for that length of time.

Threats to external validity.

Materials and tasks. The subjects used UML as the modeling language, together with design techniques traditionally taught in BA courses. It would be interesting to evaluate the learning of junior designers when faced with the concerns of real-world projects, including quality procedures, special architectural styles, and frameworks for complex systems, such as distributed, mobile, and service-based applications.

Preliminary conclusions

The preliminary experiment aimed at understanding whether pair designing could be a proper means to build a better knowledge of software design. The null hypothesis can be rejected with statistical significance only for the first run, whereas for the second and third runs it cannot be rejected. This can be due to two experimental threats: selection and history.

Qualitative analysis gives results that encourage continuing the experimentation. First, those who worked in pairs showed greater knowledge building than those who worked as singletons, throughout the experiment. Second, knowledge building was more stable for pairs than for singletons: the knowledge growth of pairs can be predictable and repeatable within certain limits.

Finally, three limitations of the experiment can be recognized. The experimental objects were UML diagrams and basic design concepts.
It will be interesting to evaluate knowledge building when dealing with real-world concerns, such as quality procedures, special architectural styles, and frameworks for complex systems.

The second limitation regards the instrument used for capturing the level of tacit knowledge. Due to its intrinsic nature, tacit knowledge cannot be formally described and structured. Consequently, questionnaire scores can be considered only an approximation of such a measurement. A more complete set of metrics could be developed by working in a multidisciplinary research group, including psychologists and education scientists.

The third limitation regards the construct. Student learning, rather than comprehension-in-practice, was the object of study. Although both are forms of

tacit knowledge building, the latter should be more useful for designing software processes oriented to improving developers' knowledge by doing.

Despite these limitations, the preliminary investigation suggested that a more focused study could be carried out to better understand which aspects of knowledge building are affected by pair designing.

The focalized experiments

The preliminary experiment suggested that pair designing could be used as a means for enforcing knowledge, with special regard to tacit knowledge. This motivated an experimentation with a stronger focus on certain aspects of knowledge management. An experiment in Italy and a replica in Spain were carried out.

Introduction

Two phases of the knowledge management process were chosen for investigation: knowledge diffusion and knowledge enforcement.

Diffusing knowledge concerns the dissemination of knowledge among the team members. Thus, it is essentially a process of sharing knowledge by socialization. This may be useful in the initial phases of the process, when the team is required to reach a certain level of knowledge about the system in order to properly accomplish maintenance tasks.

Enforcing knowledge means increasing the overall knowledge of the team members by combining individual pieces of knowledge. It may be useful when the team is already familiar with the system, but some specific features and characteristics are still unclear. The enforcement of knowledge is required after the knowledge has been diffused; it is intended as a means for reducing rework and latency times due to a poor mastery of the overall software design.

An experiment and a replica were carried out with the aim of answering the following research questions:

RQ1: Is pair designing effective for enforcing the design knowledge of the project team's members during design evolution tasks?

RQ2: Is pair designing effective for diffusing the design knowledge of the project team's members during design evolution tasks?

The Experiments

This section illustrates the experiments carried out and discusses the outcomes with the aim of answering the research questions stated above.

Definition

The experiments were executed with the purpose of testing the following null hypotheses:

H_{0a}: pair designing does not affect the diffusion of design knowledge when performing evolution tasks.
$\mu_{\mathrm{Pair\_Know\_level}} = \mu_{\mathrm{Solo\_Know\_level}}$

H_{0b}: pair designing does not affect the enforcement of design knowledge when performing evolution tasks.
$\mu_{\mathrm{Pair\_Know\_enf}} = \mu_{\mathrm{Solo\_Know\_enf}}$

The alternative hypotheses are:

H_{1a}: pair designing affects the diffusion of design knowledge when performing evolution tasks.
$\mu_{\mathrm{Pair\_Know\_level}} \neq \mu_{\mathrm{Solo\_Know\_level}}$

H_{1b}: pair designing affects the enforcement of design knowledge when performing evolution tasks.
$\mu_{\mathrm{Pair\_Know\_enf}} \neq \mu_{\mathrm{Solo\_Know\_enf}}$

Characterization

Subjects. The experiment in Italy was executed with the collaboration of the students of the Master of Technologies of Software (MUTS) and the Master of Management and Technologies of Software (MUTEGS), higher-education university courses for postgraduate students at the University of Sannio. Students of MUTS hold a scientific degree (engineering, mathematics, physics), whereas students of MUTEGS hold a degree in economics or the humanities (economics, philology, literature, philosophy). Both courses provide the same basic education in computer engineering (operating systems, programming languages, networks, databases, and software engineering), but MUTS students are educated for developing and maintaining software systems, whereas MUTEGS students are educated for dealing with the economic and organizational issues of the software lifecycle. The two Master courses are held at the same time and both last one year, during which students attend theoretical classes and lab sessions, with the same professors and lecturers, develop a large and complex project in connection with

87 87 an enterprise, participate to seminaries from international experts, perform a three month stage in software companies. The subjects were organized as follows: 5 pairs with two MUTS students; 5 pairs with two MUTEGS students; the other 16 subjects, MUTS and MUTEGS, worked as solo designers. All the groups were formed randomly. Questionnaires. Two questionnaires, QA and QB, were handed out to subjects, in order to measure the dependent variables. Both the questionnaires were distributed as entry and exit questionnaire, so that each subject had randomly QA (or QB) at entry and, conversely, QB (or QA) at exit. This avoided that the results depended on the questionnaire itself. The questions concerned architectural and functional aspects of the system. Variables. One dependent variable was the knowledge enforcement that captures the improvement of design knowledge achieved by the developers while evolving the design. The subjects studied the design of the system before starting the maintenance tasks as pairs or as solos. The questionnaires were answered by the subjects before and after having performed evolution tasks on the system. The variable was calculated as the difference between the exit questionnaire s grade and the entry questionnaire s grade. The questionnaires were evaluated in this way: each correct answer was evaluated 1; each incorrect answer was evaluated 0. The other dependent variable was the knowledge diffusion that captures the level of knowledge achieved by the team s members. This was measured by calculating the grades of the exit questionnaires. Rationale for the sampling of the population. Students of Software Engineering courses are suitable for such an experiment because they study software architecture and software system design. Furthermore, they usually are employed as software architects or designers after the graduation. 
MUTS and MUTEGS students are a suitable sample of the population, considering that they gained experience on an actual project during the master course, which is organized together with the enterprises funding the courses. Since the students have comparable curricula, there is no relevant bias in the samples of pairs and solos; however, the statistical tests that ascertain that the randomization of the pairs' and solos' samples was carried out well are discussed below, in the analysis of data.

Assignment. In order to evaluate the knowledge built by doing while evolving the system design, the assignment for the subjects consisted of improving the design of the system. The design of the system was formalized in UML and included: a textual specification of the system's requirements, two use case diagrams, and two class diagrams (for a total of 15 classes). The design was developed by the experimenters. Given the time available, bulky documentation was avoided. The maintenance tasks were basically two: (i) reduce complexity, by erasing entities, or relationships between entities, not fundamental for understanding; (ii) improve readability, by changing existing entities (use cases, actors, classes, methods) or adding new ones.
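The two dependent variables of the focused experiments can be sketched in a few lines. This is only an illustration of the definitions given in the text (1 point per correct answer, 0 per incorrect answer; enforcement as exit grade minus entry grade; diffusion as exit grade); the function names and the sample answers are hypothetical.

```python
def grade(answers, answer_key):
    """Questionnaire grade: 1 point per correct answer, 0 per incorrect answer."""
    return sum(1 for given, correct in zip(answers, answer_key) if given == correct)


def knowledge_enforcement(entry_grade, exit_grade):
    """Improvement of design knowledge: exit grade minus entry grade."""
    return exit_grade - entry_grade


def knowledge_diffusion(exit_grade):
    """Level of knowledge achieved: the grade of the exit questionnaire."""
    return exit_grade


# Made-up answers of one subject to the entry and exit questionnaires.
entry = grade(["a", "b", "c", "d"], ["a", "b", "x", "y"])  # two correct answers
leave = grade(["a", "b", "c", "d"], ["a", "b", "c", "y"])  # three correct answers
enforcement = knowledge_enforcement(entry, leave)
diffusion = knowledge_diffusion(leave)
```

Note that the two variables share the exit grade: diffusion looks at the level reached, whereas enforcement looks at the gain with respect to the baseline established by the entry questionnaire.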

This kind of assignment was aimed at maximizing the knowledge built by doing; as a matter of fact, evolving a software system requires the programmer to analyze the system in depth. The system design was conceived to be compatible with the skills and experience of the subjects, with the aim of making the objects representative of the population.

The Process. The process of the experimental run was the following: each subject studied the documentation for 30 minutes, individually; each subject answered an entry questionnaire, individually, for about 15 minutes, in order to establish the baseline, i.e. the level of knowledge of the system before working on it, built only by reading the documentation; the pairs and the solo designers performed the maintenance tasks for 2 hours; each subject answered an exit questionnaire, individually, in order to capture the knowledge built by modifying the system according to the two different styles, pair and solo.

Before running the experiments, the subjects participated in lab sessions to train them in pair designing. Although a CASE tool such as Rational Rose [113] or ArgoUML [112] could have been used, the final decision was to use only pen and paper. The reason was that some subjects could be more familiar than others with this kind of tool, and this could introduce bias in the results; as a consequence, more preparation time would have been needed in order to level the subjects' ability to work with the tools. Appendix B shows an excerpt of the experimental material. Table 21 provides the experimental design.

Table 21. Experimental Design.
Subjects | Treatment
5 MUTEGS + 5 MUTEGS | Paired MUTEGS-MUTEGS
5 MUTS + 5 MUTS | Paired MUTS-MUTS
8 MUTS | Solo
8 MUTEGS | Solo
Input (all subjects): Requirement Specification; Use case Diagram; Class Diagram; Entry questionnaire QA (or QB); Exit questionnaire QB (or QA).
Output (all subjects): Modifications to Use Case Diagram and Class Diagram; Answered entry questionnaire QA (or QB); Answered exit questionnaire QB (or QA).
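The counterbalanced use of the two questionnaires (QA at entry and QB at exit, or the reverse, chosen at random for each subject) can be sketched as follows. The thesis does not describe how the random assignment was actually carried out, so the function name, the use of a seeded generator, and the subject ids are assumptions of this illustration.

```python
import random


def assign_questionnaires(subject_ids, seed=None):
    """Randomly give each subject QA or QB at entry and the other one at exit."""
    rng = random.Random(seed)  # a seed makes the hypothetical assignment repeatable
    plan = {}
    for subject in subject_ids:
        entry = rng.choice(["QA", "QB"])
        exit_questionnaire = "QB" if entry == "QA" else "QA"
        plan[subject] = (entry, exit_questionnaire)
    return plan


# Hypothetical subject ids.
plan = assign_questionnaires(["s1", "s2", "s3", "s4"], seed=1)
```

Whatever the random draw, every subject answers both questionnaires exactly once, so differences between entry and exit grades cannot be attributed to one questionnaire being systematically used at entry.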

The Replica in Spain

The subjects were students enrolled at the Department of Computer Science at the University of Castilla-La Mancha in Spain. The first group was composed of 42 students enrolled in the final (third) year of the Computer Science BSc with the specialization in Management (named 3BScMngmt in the following); the second group was composed of 39 students enrolled in the final (third) year of the Computer Science BSc with the specialization in Systems (named 3BScSys in the following); and, finally, the third group consisted of 12 students enrolled in the final (fifth) year of the MSc (named 5MSc in the following).

The experimental design of the replica is shown in Table 22: 32 randomized pairs were formed and 32 subjects were left working as solo designers.

Table 22. Replica's Experimental Design.
Subjects | Treatment
64 students (3BScMngmt, 3BScSys, 5MSc) | Paired: 3BScMngmt-3BScMngmt, 3BScSys-3BScSys, 5MSc-5MSc
32 students (3BScMngmt, 3BScSys, 5MSc) | Solo
Input (all subjects): Requirement Specification; Use case Diagram; Class Diagram; Entry questionnaire QA (or QB); Exit questionnaire QB (or QA).
Output (all subjects): Modifications to Use Case Diagram and Class Diagram; Answered entry questionnaire QA (or QB); Answered exit questionnaire QB (or QA).

The dependent and independent variables, the process, the assignment, and the questionnaires remained the same, with two exceptions: the overall experiment lasted two hours, and the experimental object was translated into Spanish by the authors who are native Spanish speakers.

Analysis of data

In order to accept the outcomes of the experiments as valid, it was necessary to rule out relevant differences between the samples to be compared, that of solos and that of pairs, for both populations (MUTS and MUTEGS). If differences in the entry questionnaires are detected, the randomization was not accomplished correctly.

Table 23 shows this analysis for the experiment and Table 24 for the replica. The Mann-Whitney test was used in all the tests because the sample data were not normally distributed, and the p-level threshold was fixed at 5%.

Table 23. Tests for Validating the Randomization in the Experiment.
Test between entry questionnaires of:
- Subjects of MUTS Pairs sample (α) and Subjects of MUTS Solos sample (β)
- Subjects of MUTEGS Pairs sample (α) and Subjects of MUTEGS Solos sample (β)
Rank Sum α, Rank Sum β, p-level: 171,000 39,000 0, ,000 59,000 0,

The tests in Table 23 show that the MUTS subjects working as solos and those working in pairs did not present significant differences in the entry questionnaire; similarly, the MUTEGS subjects of the solos set and those of the pairs set did not present significant differences. It is possible to conclude that the randomization was carried out correctly. Table 24 shows the tests for validating the randomization in the samples of the replica: also in this case there are no significant differences between the subjects performing solo and pair designing.

Table 24. Tests for Validating the Randomization in the Replica's Samples.
Test between entry questionnaires of:
- Solos of the 3BScSys sample (α) and Pairs of the 3BScSys sample (β)
- Solos of the 5MSc sample (α) and Pairs of the 5MSc sample (β)
- Solos of the 3BScMngmt sample (α) and Pairs of the 3BScMngmt sample (β)
Rank Sum α, Rank Sum β, p-level: 425, ,000 0, ,500 46,5000 0, , ,000 0,321966
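The rank sums and p-levels reported in the tables come from the Mann-Whitney test. As a minimal sketch of how such values can be obtained, the following Python code computes the two rank sums (with average ranks for ties) and a two-sided p-level through the normal approximation. The scores are invented for illustration and are not the thesis data; the absence of a tie correction and the use of the normal approximation instead of exact small-sample tables are simplifying assumptions, so the output of a statistical package may differ slightly.

```python
import math


def rank_sums(alpha, beta):
    """Return (rank sum of alpha, rank sum of beta); ties get the average rank."""
    pooled = sorted((value, group)
                    for group, sample in enumerate((alpha, beta))
                    for value in sample)
    sums = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        j = i
        # Find the run of tied values starting at position i.
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            sums[pooled[k][1]] += average_rank
        i = j
    return sums[0], sums[1]


def mann_whitney_p(alpha, beta):
    """Two-sided p-level via the normal approximation (no tie correction)."""
    n1, n2 = len(alpha), len(beta)
    r1, _ = rank_sums(alpha, beta)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical entry-questionnaire scores (NOT the data of the experiment).
pairs_entry = [4, 5, 3, 6, 5, 4, 7, 5, 4, 6]
solos_entry = [5, 4, 4, 6, 3, 5, 6, 4]

r_alpha, r_beta = rank_sums(pairs_entry, solos_entry)
p_level = mann_whitney_p(pairs_entry, solos_entry)
print(r_alpha, r_beta, p_level)
print("significant difference" if p_level < 0.05 else "no significant difference")
```

A quick sanity check on any such table is that the two rank sums must add up to N(N+1)/2, where N is the total number of observations in the two samples.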

The Knowledge diffusion

Table 25 reports the results of the statistical tests for rejecting the null hypothesis H0a. The Mann-Whitney test was used because the sample data were not normally distributed, with the p-level threshold again fixed at 5%.

Table 25. Results of Statistical Tests.
Italian Experiment, test between:
- MUTS Pairs (α) and MUTS Solos (β)
- MUTEGS Pairs (α) and MUTEGS Solos (β)
- MUTS Pairs (α) and MUTEGS Pairs (β)
Spanish Experiment, test between:
- Pairs 5MSc (α) and Solos 5MSc (β)
- Pairs 3BScSys (α) and Solos 3BScSys (β)
- Pairs 3BScMngmt (α) and Solos 3BScMngmt (β)
Rank Sum α, Rank Sum β, p-level: 116,500 54,50 0,049 78,50 57,50 0, ,00 75,00 0,023 51,500 26,500 0, , ,000 0, , ,00 0,00000

The null hypothesis can be rejected in the case of the MUTS sample, but not in the case of the MUTEGS sample. This is probably due to the fact that the two samples included students with very different skills. MUTS students come from scientific studies, whereas MUTEGS students come from other kinds of studies. MUTS students are supposed to be more familiar with algorithms and deductive reasoning than MUTEGS students. The conjecture is that the effectiveness of the practice could be affected by the subjects' ability to perform pair design. As a matter of fact, there is statistical evidence that the performance of the two populations in terms of knowledge is different, as the third row shows. From the first experiment the following conclusions can be drawn: i) pair design can diffuse knowledge better than solo design; and ii) the effectiveness of pair design could be affected by the kind of population. The replica carried out in Spain produced empirical evidence that pair designing can help the diffusion of design knowledge in all three samples. The replica reinforces the finding of the experiment run in Italy. Table 26 shows the descriptive statistics of the data samples.

Table 26. Descriptive Statistics of the Experiment.
Sample               Std Dev.   Average   Max    Min
Italian Experiment
MUTS Pairs           1,75       5,8       9      4
MUTEGS Pairs         1,60       3,9       7      1
MUTS Solos           1,03       4,25      6,00   3,00
MUTEGS Solos         1,55       5,13      7,00   3,00
Spanish Experiment
Pairs 3BScSys        1,02       6,00      7,00   3,00
Solos 3BScSys        1,26       4,44      6,00   3,00
Pairs 5MSc           0,98       6,17      7,00   5,00
Solos 5MSc           0,82       5,33      7,00   5,00
Pairs 3BScMngmt      0,73       6,30      8,00   5,00
Solos 3BScMngmt      0,94       4,21      5,00   1,00

Fig. 15 shows the descriptive statistics of the MUTS sample.

Fig. 15. Descriptive Statistics of MUTS Sample (Std Dev., Average, Max and Min for MUTS Pairs and MUTS Solos).

For the MUTS sample the pairs outperformed the solos in terms of knowledge built, as the values of average, maximum and minimum demonstrate. This does not happen with MUTEGS students: as previously mentioned, this could be due to the different skills that MUTS and MUTEGS students have. As a matter of fact, there is empirical evidence that the MUTEGS population is different from the MUTS population.
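The four statistics used throughout this analysis (standard deviation, average, maximum and minimum) can be computed as in the following sketch. The scores are invented for illustration, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def describe(scores):
    """Return the four descriptive statistics reported for each sample."""
    return {
        "std_dev": statistics.stdev(scores),  # sample std dev (assumption)
        "average": statistics.mean(scores),
        "max": max(scores),
        "min": min(scores),
    }


# Hypothetical exit-questionnaire scores (NOT the data of the experiment).
hypothetical_pairs = [6, 5, 7, 4, 9, 5, 6, 4, 7, 5]
hypothetical_solos = [4, 3, 5, 4, 6, 4, 5, 3]

for label, scores in (("pairs", hypothetical_pairs), ("solos", hypothetical_solos)):
    print(label, describe(scores))
```

Comparing the two dictionaries side by side reproduces the kind of reading made on Table 26: a higher average, maximum and minimum for one treatment indicates that it outperformed the other on that sample.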

Fig. 16. Descriptive Statistics of MUTEGS Sample (Std Dev., Average, Max and Min for MUTEGS Pairs and MUTEGS Solos).

Fig. 17. Descriptive Statistics of 3BScMngmt Sample (Std Dev., Average, Max and Min for Spanish Pairs 3BScMngmt and Spanish Solos 3BScMngmt).

Fig. 18. Descriptive Statistics of 3BScSys Sample (Std Dev., Average, Max and Min for Spanish Pairs 3BScSys and Spanish Solos 3BScSys).

Fig. 19. Descriptive Statistics of 5MSc Sample (Std Dev., Average, Max and Min for Spanish Pairs 5MSc and Spanish Solos 5MSc).

The Spanish samples show that pairs achieved higher values of knowledge diffusion than solos.

The Knowledge Enforcement

The H0b hypothesis was tested with the Mann-Whitney test and the p-level was fixed at 0,05: Table 27 reports the results of the tests.

Table 27. Statistical Tests for the Experiment.
Test between:
- MUTS Pairs (α) and MUTS Solos (β)
- MUTEGS Pairs (α) and MUTEGS Solos (β)
- MUTS Pairs (α) and MUTEGS Pairs (β)
Rank Sum α, Rank Sum β, p-level: 123,500 47,500 0, ,500 66,500 0, ,500 42,500 0,0428

Table 27 shows statistical evidence that pair design helps improve knowledge for the MUTS sample, whereas the results of the MUTEGS subjects did not provide such evidence. This confirms the conclusions of the analysis of knowledge diffusion: the kind of educational background affects the improvement of knowledge obtained with pair designing. As a matter of fact, the third row demonstrates that the two samples obtained statistically different improvements of knowledge after having performed pair design. Table 28 shows the results of the hypothesis testing in the replica.

Table 28. Results of Statistical Tests for the Replica.
Test between:
- Pairs 3BScSys (α) and Solos 3BScSys (β)
- Pairs 5MSc (α) and Solos 5MSc (β)
- Pairs 3BScMngmt (α) and Solos 3BScMngmt (β)
Rank Sum α, Rank Sum β, p-level: 49,500 28,500 0, , , ,500 26,500 0,

The replica basically confirms the findings of the experiment: the rows show a p-level under the threshold of 0,05, with the exception of the first row, which reports a greater value that is nevertheless relatively close to 0,05. The descriptive statistics, reported in Table 29, suggest further observations.

Table 29. Descriptive Statistics of the Experiment.
Parameters   MUTS Pairs   MUTS Solos   MUTEGS Pairs   MUTEGS Solos
Average      2,000        -1,400       -0,800         -0,750
Max          5,000        2,000        1,000          1,000
Min          -1,000       -3,000       -3,000         -2,000
Std dev      1,915        2,074        1,643          1,500

As Fig. 20 shows, the standard deviation of pairs is smaller than the standard deviation of solos for the MUTS sample, but this does not occur in the MUTEGS sample: this points out that the improvement of knowledge in the (MUTS) pairs is more predictable than in the solos. Pair designing can therefore be used for planning the individual growth of the team's members.

Fig. 20. Standard Deviation and Minimum values for the Italian Experiment (MUTS Pairs, MUTS Solos, MUTEGS Pairs, MUTEGS Solos).

The effectiveness of the practice could depend on the educational background of the pair's members, as already observed. Fig. 21 shows the average and the maximum values of the experiment's samples.

Fig. 21. Average and Maximum values for the Italian Experiment (MUTS Pairs, MUTS Solos, MUTEGS Pairs, MUTEGS Solos).

Table 30. Descriptive Statistics of the Spanish Replica.
Statistical Parameter   5MSc Pairs   5MSc Solos   3BScSys Pairs   3BScSys Solos   3BScMngmt Pairs   3BScMngmt Solos
average                 1,167        -0,500       1,714           -0,579          1,111             -1,036
max                     3,000        3,000        4,000           3,000           3,000             2,000
min                     -1,000       -2,000       -1,000          -4,000          -1,000            -5,000
std dev                 1,722        1,871        1,736           1,865           1,278             1,856

The replica's results confirm the results of the experiment, as the graph in Fig. 22 demonstrates: the pairs outperformed the solos in every sample involved in the replica.

Fig. 22. Average and Maximum values for the Replica (5MSc, 3BScSys and 3BScMngmt, Pairs and Solos).

Fig. 23 shows that the standard deviation of the pairs in each sample is always smaller than the corresponding value for the solos: this suggests that the improvement in the pairs is more predictable than in the solos.

Fig. 23. The Minimum and the Standard Deviation Values (5MSc, 3BScSys and 3BScMngmt, Pairs and Solos).

Comparing knowledge diffusion and enforcement

The results for diffusion and enforcement lead to similar conclusions:
- pair designing is helpful both for diffusing knowledge within the project team, when a great number of team members are not familiar with the software design, and for improving the knowledge of each designer once they have built a preliminary idea of the software;
- the skill and the individual ability of the subjects could seriously affect the effectiveness of the practice; this entails that if the project manager plans to use pair design for improving design knowledge, an assessment of the team's members is required; and
- the enforcement of knowledge is more predictable when using pair designing than when using traditional designing.

Experimental threats

Threats to Construct Validity. The dependent variables aim at capturing knowledge. Questionnaire grading was used, which certainly cannot capture all the aspects of the object to be measured. Tacit knowledge, by its intrinsic nature, is hard to describe formally and to quantify. What was measured was an approximation of what was intended to be measured.

Threats to Internal Validity. The following issues have been dealt with:
- Differences among subjects. Using a within-subjects design, error variance due to differences among subjects is reduced. In this experiment, students had a good level of experience in using UML, which is one of the main topics of their curriculum.
- Learning effects. The subjects were required to deal with only one run with only one assignment, so the learning threat was cancelled.
- Fatigue effects. The experiment was short enough that fatigue was not very relevant. As a confirmation, the students asked for more time to accomplish the assignment better.
- Persistence effects. In order to avoid persistence effects, the experiment was run with subjects who had never done a similar experiment.
- Subject motivation. The participants were volunteers who agreed to help us in our research. Students were motivated to participate in the experiment because they were learning a practice that should be useful in their professional career.
- The experimental package. Both in the experiment and in the replica the results of the run were independent of the experimental package, as shown in Table 31. The Mann-Whitney test was used with the p-level fixed at 0.05, and there is no evidence that the differences due to the questionnaires were statistically significant.

Table 31. Results of the tests on the questionnaires used in both the experiments.
Test between:
- Questionnaire A (α) and Questionnaire B (β) in the experiment
- Questionnaire A (α) and Questionnaire B (β) in the replica
Rank Sum α, Rank Sum β, p-level: 540,00 406,00 0, ,00 677,00 0,2068

Threats to External Validity. Two threats to validity have been identified which limit the possibility of generalizing the results:
- Materials and tasks used. The design documentation of a system was used in the experiment. The system showed a fair degree of complexity, because it describes an existing system.
- Subjects. Students play a very important role in experimentation in software engineering, as pointed out in [81][89]. In situations in which the tasks to perform do not require industrial experience, experimentation with students is viable [90].

The individual Background as factor of success of Pair designing

The previous experiments demonstrated that pair designing can help enforce and diffuse design knowledge. In particular, pair design can help enforce tacit knowledge, such as approaches, strategies, rationales, and so forth. The building of tacit knowledge through the practice is affected by a large number of factors that are difficult to capture and measure, such as personal attitude, capability and experience. One of these factors was chosen: the individual educational background, in terms of the kind of degree. The overall research goal is: to analyze a design task with the purpose of evaluating how individual background affects the knowledge built with pair designing, from the viewpoint of the designer, in the context of an actual student project. In order to reach the stated research goal, the data from the previous experiment are analyzed as described in the following.

Design And Results

The research question to answer is the following:

RQ: Does the educational background affect the knowledge building due to pair designing?

The following null hypothesis was tested:
H0: the difference in education between the pair's components does not affect the building of system knowledge, $\mu_{Know\_Back1} = \mu_{Know\_Back2}$.
The alternative hypothesis is:
H1: the difference in education between the pair's components affects the building of system knowledge, $\mu_{Know\_Back1} \neq \mu_{Know\_Back2}$.

Table 32 shows the experimental design.

Table 32. Experimental Design.
Subjects                 Treatment
4 MUTS, 4 MUTEGS         Paired (MUTS-MUTEGS)
5 MUTEGS, 5 MUTEGS       Paired (MUTEGS-MUTEGS)
5 MUTS, 5 MUTS           Paired (MUTS-MUTS)
8 MUTS                   Solo
8 MUTEGS                 Solo
Input: Requirement Specification; Use Case Diagram; Class Diagram; Entry questionnaire QA (or QB); Exit questionnaire QB (or QA).
Output: Modifications to Use Case Diagram and Class Diagram; Answered entry questionnaire QA (or QB); Answered exit questionnaire QB (or QA).

Table 33 reports the results of the statistical tests.

Table 33. Results of the Statistical Tests.
Tests:
- MUTS-MUTS (a) and MUTS-MUTEGS (b)
- MUTS-MUTS (a) and MUTEGS-MUTEGS (b)
- MUTS-MUTEGS (a) and MUTEGS-MUTEGS (b)
- MUTS (a) and MUTS-MUTS (b)
- MUTEGS (a) and MUTEGS-MUTEGS (b)
- Questionnaire A and Questionnaire B
Rank Sum (a), Rank Sum (b), p-level, p-level 1 tail: 121,00 50,00 0,020 0, ,00 75,00 0, ,00 75,00 0,020 0,023 54,50 116,50 0,049 0,054 57,50 78,50 0,270 0, ,00 406,00 0,161 0,179

The following conclusions have been drawn: forming pairs of individuals with the same educational background (either scientific with scientific or non-scientific with non-scientific) emphasizes the expected benefits of pair designing. Coupling a person with a scientific background and one with a non-scientific background does not seem to improve the performance of the latter, but rather to worsen that of the former. From the statistical tests the following outcomes can be derived:
- there is empirical evidence that the individual background affects the success of pair designing in building knowledge, as the first three rows demonstrate;
- the null hypothesis cannot be rejected only when comparing MUTEGS pairs and MUTEGS solos. On the contrary, there is evidence that the MUTS pairs outperformed the MUTS solos. This suggests that the individual background can also deteriorate the capability of the practice itself with respect to solo designing;
- the difference between the results of the questionnaires is not significant, thus the questionnaires do not affect the outcomes of the experiment.

Conclusions

Evolving a software design requires that all the members of the team acquire a deep and complete knowledge of the domain, of the architectural components and of their integration. A weak knowledge of these issues can lead to wrong choices and frequent rework during evolution tasks on the software design. One of the expected benefits of pair programming is the enforcement of knowledge of the software within the project team. The paradigm of pair programming was applied to the design phase and named pair designing. A group of experiments was carried out in order to investigate the relationship between pair designing and knowledge building. The following conclusions can be drawn:
- There is empirical evidence that pair designing helps enforce design knowledge when evolving systems. This suggests that the practice can be used as a means for increasing the knowledge of a system in critical situations, for instance when the project is going to run out.
- Working on a task (in this case maintenance of the software design) does not guarantee that the designer improves his or her knowledge significantly. The standard deviations of the solos samples, both in the experiment and in the replica, were high. This entails that a strategy to make designers learn from their work is necessary, and pair designing could be a suitable candidate.
- The individual background is relevant for the effectiveness of the practice. Team managers should pay attention when choosing the people who compose the pairs. This issue deserves further and more focused investigation.

In terms of specific benefits of the practice, pair programming seems to be a good means to enforce and diffuse knowledge among software teams. It could support the use of documentation as the official source of knowledge about the system to develop and maintain.

Bibliography

[80] Abrahamsson P. and Koskela J. Extreme Programming: A Survey of Empirical Data from a Controlled Case Study, Proc. of International Symposium on Empirical Software Engineering, Redondo Beach, CA, USA, 2004, IEEE CS Press.
[81] Basili V., Shull F., and Lanubile F. Building Knowledge Through Families of Experiments, IEEE Transactions on Software Engineering, 25(4), IEEE CS Press, 1999.
[82] Beck K. Extreme Programming Explained: Embrace Change, Addison-Wesley, Reading, Massachusetts, 2000.
[83] Brooks F. The Mythical Man-Month, Addison Wesley Publishing Company, Reading, Massachusetts.
[84] Canfora G., Cimitile A., and Visaggio C.A. Working in pairs as a means for design knowledge building: an empirical study, IEEE International Workshop on Program Comprehension, Bari, Italy, 2004, IEEE CS Press.
[85] Canfora G., Cimitile A., and Visaggio C.A. Lessons learned about Distributed Pair programming: what are the knowledge needs to address?, Proc. of Knowledge Management of Distributed Agile Process-WETICE, Linz, Austria, 2003, IEEE CS Press.
[86] Carver J., Jaccheri L., Morasca S., and Shull F. Using Empirical Studies during Software Courses, Experimental Software Engineering Research Network, LNCS.
[87] Cockburn A. Characterizing People as Non-Linear, First-Order Components in Software Development, Proc. of 4th International Multi-Conference on Systems, Cybernetics and Informatics, Orlando, Florida, USA, 2000, IEEE CS Press.
[88] Foresythe G.B., Hedlund J., Snook S., Horvath J.A., Williams W.M., Bullis R.C., Dennis M., and Sternberg R. Construct validation of tacit Knowledge for military Leadership, Annual Meeting of the American Education Research Association.
[89] Höst M., Regnell B., and Wohlin C. Using Students as Subjects - A Comparative Study of Students & Professionals in Lead-Time Impact Assessment, Proc. of 4th Conference on Empirical Assessment & Evaluation in Software Engineering (EASE).
[90] Kitchenham B., Pfleeger S., Pickard L., Jones P., Hoaglin D., El Emam K., and Rosenberg J. Preliminary Guidelines for Empirical Research in Software Engineering, IEEE Transactions on Software Engineering, 28(8), 2002, IEEE CS Press.
[91] McDowell C., Werner L., Bullock H., and Fernald J. The Effects of Pair Programming on Performance in an introductory Programming Course, Proc. of the 33rd Technical Symposium on Computer Science Education, Northern Kentucky - The Southern Side of Cincinnati, USA, 2002, ACM.
[92] Melnik G. and Maurer F. Direct Verbal Communication as a Catalyst of Agile Knowledge Sharing, Proc. of the Agile Development Conference 2004, Salt Lake City, USA, 2004, IEEE CS Press.
[93] Nawrocki J. and Wojciechowski A. Experimental Evaluation of Pair Programming, Proc. of European Software Control and Metrics.
[94] VanDerGrift T. Coupling Pair Programming and Writing: Learning About Students' Perceptions and Processes, Proc. of 35th SIGCSE Technical Symposium on Computer Science Education, Norfolk, Virginia, USA, 2004, ACM.
[95] Williams L. and Kessler R. The Effects of "Pair-Pressure" and "Pair-Learning", Proc. of 13th Conference on Software Engineering Education and Training, Austin, Texas, USA, 2000, IEEE CS Press.
[96] Blackler F. Knowledge, Knowledge Work and Organizations: An Overview and Interpretation, Organization Studies, 16(6).
[97] Canfora G., Cimitile A., Garcia F., Piattini M., and Visaggio C.A. Confirming the influence on educational background in pair-design knowledge through experiments, Proc. of the 20th Annual ACM Symposium on Applied Computing, Santa Fe, New Mexico, USA, 2005, ACM.
[98] Choo C.W. The Knowing Organization, Oxford University Press, Oxford, UK.
[99] Cockburn A. Agile Software Development, Addison-Wesley Pub Co, Reading, MA.
[100] Dybå T., Kitchenham B.A., and Jorgensen M. Evidence-Based Software Engineering for Practitioners, IEEE Software, 22(1), 2005, IEEE CS Press.
[101] Jiawei H., Bailey A., and Sutcliffe A. Visualisation Design Knowledge Reuse, Proc. of the Eighth International Conference on Information Visualization (IV 04), London, England, UK, 2004, IEEE CS Press.
[102] Li Y., Yang H., and Chu W. Generating Linkage between Source Code and Evolvable Domain Knowledge for the Ease of Software Evolution, Proc. of the International Symposium on Principles of Software Evolution (ISPSE 00), Kanazawa, Japan, 2000, IEEE CS Press.
[103] McDowell C., Werner L., Bullock H., and Fernald J. The Effects of Pair Programming on Performance in an introductory Programming Course, Proc. of the 33rd Technical Symposium on Computer Science Education, Northern Kentucky - The Southern Side of Cincinnati, USA, 2002, ACM.
[104] Nonaka I. A dynamic theory of organizational knowledge creation, Organization Science, 5, 1994.
[105] Nosek J.T. The case for collaborative programming, Communications of the ACM, 41(3), 1998, ACM.
[106] Ran A. and Kuusela J. Design Decision Trees, Proc. of the 8th International Workshop on Software Specification and Design (IWSSD 96), Paderborn, Germany, 1996, IEEE CS Press.
[107] Sommerville I. and Rodden T. Environments for cooperative system development, Proc. of Software Engineering Environments, Reading, UK.
[108] Williams L., Cunningham W., Jeffries R., and Kessler R.R. Strengthening the case for pair programming, IEEE Software, 17(4), 2000, IEEE CS Press.
[109] Williams L. and Upchurch R.L. In Support of Student Pair-Programming, Proc. of the Thirty-Second SIGCSE Technical Symposium on Computer Science Education, Charlotte, NC, USA, 2001, ACM.
[110] Williams L., Krebs W., Layman L., and Antón A. Toward a Framework for Evaluating Extreme Programming, Empirical Assessment in Software Engineering (EASE).
[111] Williams L., McDowell C., Fernald J., Werner L., and Nagappan N. Building Pair Programming Knowledge Through a Family of Experiments, IEEE International Symposium on Empirical Software Engineering (ISESE), Rome, Italy, 2003, IEEE CS Press.
[112] (web site) (accessed on the 23rd of June 2005).
[113] (web site) www-306.ibm.com/software/rational/ (accessed on the 23rd of June 2005).

Chapter VI: Pair Programming and Distributed Software Development

Pair programming needs tight collaboration and fluid communication. This could limit the successful application of the practice in some specific operative contexts. Distributed processes are a clear instance of this situation. As a matter of fact, communication and collaboration are two serious problems to face when executing distributed and synchronous processes. This chapter investigates how distribution deteriorates the quality of pair programming.

Introduction

Recently, the distribution of software processes has become very widespread in industry, and recommended practices are emerging [116], [118], [122]. These practices have been collected within a body of knowledge under the name of Global Software Development (GSD) [119][121]. An economic motivation for GSD is that large organizations tend to acquire smaller companies, with the aim of achieving a competitive advantage by strengthening their workforce, or of penetrating new market segments. Instead of one single large organization, a structure where many organizations are connected among themselves is becoming more and more widespread [123]. In such a configuration, named net enterprise or virtual enterprise, the inter-connected organizations are often scattered in different places, but they share processes at any level of detail, down to individual tasks. Consequently they also share the practices adopted to perform activities, such as pair programming. In such cases, organizations may need to distribute pair programming. Distributed pair programming can be considered as a variation of pair programming where developers are geographically distributed and connected through technological means, rather than sitting in front of the same computer. Issues of communication and collaboration become primarily relevant when distributing software processes [117]. Distance could have negative effects on communication-intensive tasks and on spontaneous conversation [120]. A group of experiments was carried out in order to investigate to what extent distribution may deteriorate the recognized benefits of pair programming, in terms of quality and productivity. The formulation of the research goal follows: to analyse the effectiveness and efficacy of distributed pair programming with the purpose of evaluating how distribution deteriorates the benefits of the practice, from the viewpoint of the developer, in the context of a software maintenance student project.

The Experiments

In order to reach the research goal, an experiment was carried out at the University of Sannio, Italy. Afterwards, a replica was carried out at the University of Naples Federico II, Italy, to confirm the findings of the first experiment. Both experiments were accompanied by a qualitative analysis, performed through a questionnaire-guided discussion with the experimental subjects.

The first experiment

This section describes the experiment carried out at the University of Sannio in terms of definition of the hypotheses and metrics, characterization of the context, and operation.

Definition

Two main concerns are critical for the distribution of pair programming tasks: collaboration and communication. If the technological platform does not address these two issues adequately, activities like code reviews, switches of roles, and decision making can be obstructed to the point of deteriorating the practice itself. There is evidence that pair programming can decrease development time and increase the quality of work; it is reasonable to believe that distribution can cause the loss of such advantages. The research questions investigated in the experiment are the following:
RQ1: Are there significant differences in effort when the pair's components are distributed, with respect to co-located pair's components?
RQ2: Are there significant differences in quality when the pair's components are distributed, with respect to co-located pair's components?

From here on, distributed pair indicates a pair whose components are distributed; co-located pair indicates a pair whose components are co-located. The experiment investigated RQ1 and RQ2 for maintenance tasks. The null hypotheses were:
H0RQ1: There is no significant difference in the effort required for implementing modifications between distributed pair programming and co-located pair programming, $\mu_{distr\_time} = \mu_{co\text{-}loc\_time}$.

H0RQ2: There is no significant difference in the quality of the maintenance performed between distributed pair programming and co-located pair programming, $\mu_{distr\_quality} = \mu_{co\text{-}loc\_quality}$.

The alternative hypotheses were:
H1RQ1: There is a significant difference in the effort required for implementing modifications between distributed pair programming and co-located pair programming, $\mu_{distr\_time} \neq \mu_{co\text{-}loc\_time}$.
H1RQ2: There is a significant difference between the quality of distributed pairs and that of co-located pairs, $\mu_{distr\_quality} \neq \mu_{co\text{-}loc\_quality}$.

The following metrics were used to measure effort and quality:
- Effort spent: measured as the difference between the end time and the start time of the maintenance tasks; ratio scale. Time was taken from the time sheets filled in by the subjects.
- Quality of the maintenance performed: $f_{qual}$, an ordinal scale. The quality was evaluated on the basis of black box testing. Test cases were written and executed by the experimenters and were hidden from the subjects.

$f_{qual} = \sum_{i} (bin_i \cdot over_i)$, with $i = 1, 2, 3$ indexing the maintenance requests, where:
- $bin_i$ is 1 if the maintainers completed maintenance request $i$, 0 otherwise;
- $over_i$ is 3 if the modified programs passed the tests successfully (at least 80% of tests), 2 if the modified programs passed the tests with partial success (< 80% AND > 20% of tests), 1 if the modified programs did not pass the tests (< 20% of tests).

Black box testing was used to evaluate quality mainly for two reasons. Firstly, the test driven development practice [114] is used in order to build working code, according to extreme programming; secondly, the purpose of each iteration in extreme programming is the production of a system feature valuable to the customer.
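The two metrics defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a pass rate of exactly 20% (which the definition leaves unspecified) is scored as 1, a pass rate of exactly 80% is scored as 3, and the time sheet is assumed to record start and end times as HH:MM on the same day.

```python
from datetime import datetime


def over_score(pass_rate):
    """Map the fraction of passed black box tests to the over_i score."""
    if pass_rate >= 0.80:   # tests passed successfully
        return 3
    if pass_rate > 0.20:    # partial success
        return 2
    return 1                # tests not passed (exactly 20% lands here: assumption)


def f_qual(requests):
    """Sum bin_i * over_i over the maintenance requests.

    `requests` is a list of (completed, pass_rate) tuples, one per request.
    """
    return sum((1 if completed else 0) * over_score(pass_rate)
               for completed, pass_rate in requests)


def effort_minutes(start, end):
    """Effort as end time minus start time, both given as 'HH:MM' (assumption)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60.0


# Hypothetical pair: two requests completed, the third one not completed.
example_requests = [(True, 0.90), (True, 0.50), (False, 0.00)]
print(f_qual(example_requests))          # 3 + 2 + 0 = 5
print(effort_minutes("09:15", "11:00"))  # 105.0
```

With three maintenance requests the score ranges from 0 (no request completed) to 9 (all requests completed and passing at least 80% of the tests).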

Characterization

Subjects. The experimental subjects were volunteer students of Software Engineering II, a course in the fifth (and final) year of the laurea degree in Computer Engineering at the University of Sannio, Benevento.

Process. Before running the experiment, the subjects were trained in pair programming. The training consisted of 4 hours of seminars about agile methods and extreme programming, with a special focus on the pair programming practice. Afterwards, the students spent 2 hours in the laboratory practising pair programming by developing some Java programs, and 2 more hours in the laboratory on a training round. During that period the students used the protocol outlined in Figure 24 and the experiment technological platform described in Table 34. After the training round, they had the opportunity to consolidate their knowledge of the experiment's tasks and execution by discussing their doubts with the experimenters.

Driver: write the code; listen to the observer's suggestions and ideas; leave the keyboard to the observer when needed.

Observer: continuously check the actions of the driver; take control of the keyboard, but only after common agreement with the driver; achieve off-line tasks; optimise parts of the code.

Fig. 24. Pair programming instructions.

A randomized design was used with one factor (placement of the pair's members) and two treatments (co-located and distributed). Sixteen subjects took part in the experiment, forming eight pairs organized in two groups (A and B) of four pairs. The experiment consisted of two rounds. During the first round the pairs of group A were co-located and those of group B were distributed; they had to modify program P1 according to three perfective maintenance requests. During the second round the pairs of group A were distributed and those of group B were co-located; they had to modify program P2, again according to three perfective maintenance requests, different from those of round I. In both rounds the pairs were formed randomly within the groups. The design of the experiment is illustrated in Table 35.
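The two-round crossover design described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually followed: the subject identifiers, the function names and the random split of the subjects into the two groups are assumptions; only the treatment and program assigned to each group in each round, and the random pairing within groups, come from the text.

```python
import random


def make_pairs(group, rng):
    """Form pairs at random within a group (assumes an even group size)."""
    shuffled = group[:]
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)]


def plan_experiment(subjects, seed=0):
    """Build the two-round plan: group A is co-located in round I and
    distributed in round II; group B receives the opposite order."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)  # assumption: groups are also formed at random
    half = len(pool) // 2
    groups = {"A": pool[:half], "B": pool[half:]}
    treatment = {
        ("I", "A"): "co-located", ("I", "B"): "distributed",
        ("II", "A"): "distributed", ("II", "B"): "co-located",
    }
    program = {"I": "P1", "II": "P2"}
    plan = []
    for rnd in ("I", "II"):
        for name, members in groups.items():
            # Pairs are re-formed at random within the group in each round.
            for pair in make_pairs(members, rng):
                plan.append({"round": rnd, "group": name, "pair": pair,
                             "treatment": treatment[(rnd, name)],
                             "program": program[rnd]})
    return plan


subjects = [f"S{i:02d}" for i in range(1, 17)]  # hypothetical identifiers
plan = plan_experiment(subjects)
for row in plan[:4]:
    print(row)
```

In this layout every subject experiences both treatments, one per round, while each program is modified under both treatments by different groups.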

Table 34. Experiment technological platform.

| Tools | Function | Purpose | Motivation |
|---|---|---|---|
| VNC | Share the desktop: it lets the remote control of a PC. | Collaboration | The experimenters had experience in using it in previous projects; Open Source. |
| NetMeeting | Text chat. | Communication | Its usage was well known to all the experimental subjects. |
| JBuilder | IDE for Java programs. | Programming | Subjects had experience in using it in previous projects. |

Table 35. Experimental design.

| Subjects | Round I | Round II |
|---|---|---|
| Group A (8 units) | Co-located, P1 | Distributed, P2 |
| Group B (8 units) | Distributed, P1 | Co-located, P2 |

The distribution was achieved by placing the two members of each distributed pair in two different laboratories of the University buildings. Both programs to be modified during the experiment were written in Java; Table 36 shows information about the two programs.

Assignment. The students were provided with the following documentation: 1) listings of the programs; 2) textual description of the maintenance tasks; 3) time sheet to fill in; 4) description of the correct execution of the pair programming roles; and 5) questionnaire to be filled in at the end of the experiment.

Table 36. Experiment's classes and related LOCs. Program LOCs # Classes Functional Description MultisalaMngmt (P1) ProjectMngmt (P2) 95 4 A booking system for a cinema with many projection rooms. A system for managing the attributes of a project's activities.

Operation

The whole experimentation lasted 6 hours. Time was measured with a time sheet filled in by each pair participating in the experiment and checked by one of the experimenters. This helped strengthen the reliability of the results. The function f_qual was evaluated by the experimenters, who executed the black box tests on the modified programs of each pair.

As shown in Figure 25, the Mann-Whitney U test was applied between the co-located and the distributed pairs of the same round, because the data were not normally distributed. This test evaluates whether the differences in performance and quality between co-located and distributed pairs are significant.

Fig. 25. Tests used. (Round I: Group A co-located versus Group B distributed, Mann-Whitney; Round II: Group A distributed versus Group B co-located, Mann-Whitney.)

Analysis of data

Table 37 reports the results of the statistical tests on the effort and quality data: both hypotheses were tested by fixing the p-level threshold at 5%.

Table 37. Statistical test results.

| Test | p-level | Description |
|---|---|---|
| Effort round I | | Mann-Whitney test on effort data between Group A (co-located) and Group B (distributed) in round I. |
| Effort round II | | Mann-Whitney test on effort data between Group A (distributed) and Group B (co-located) in round II. |
| Quality round I | | Mann-Whitney test on quality data between Group A (co-located) and Group B (distributed) in round I. |
| Quality round II | | Mann-Whitney test on quality data between Group A (distributed) and Group B (co-located) in round II. |

With one exception, the differences in the response variables are not statistically significant; relying on the experiment outcomes, it cannot be claimed that pair programming efficiency is affected by distribution. Only the quality results of round II are statistically significant. Descriptive statistics of the sample are provided in Table 38.

Table 38. Statistical test results. (Columns: Dev. Stand., Avrg, Max, Min and Moda, for co-located and for distributed pairs; rows: quality and effort, for round I and for round II.)

To make the information in Table 38 more readable, the meanings of the abbreviations are explained here. Dev. Stand.: standard deviation; Avrg: average; Max: maximum value; Min: minimum value; Moda: the most frequent value of the sample (mode); Na: not available.

Figure 26 shows the box plots of effort in both rounds. The median values are very close; the 25th percentiles, on the contrary, differ markedly. This value is an indicator of the performance of the fastest pairs: the best distributed pairs took less time than the best co-located pairs. This suggests that the co-located pairs spent additional time, probably discussing common policies and strategies to follow when accomplishing the task. We believe that the distributed pairs, after an initial period in which they attempted to collaborate and communicate, broke up the pair work and behaved as solo programmers. As a matter of fact, the best times of distributed pairs suggest that the subjects did not negotiate strategies with their companion.

Fig. 26. Effort for co-located (Var 25) and distributed (Var 26) pairs in round I; effort for co-located (Var 22) and distributed (Var 23) pairs in round II. (Box plots showing median, 25%-75% interval and non-outlier range.)

The worst times of distributed pairs are due to people who kept trying to collaborate although they had communication problems. In other words, the members of distributed pairs tend to disengage from each other after an initial period of collaboration. It should also be noticed that the dispersion of the effort values of distributed pairs is broader than that of co-located pairs in both box plots. Collaboration within the co-located pairs levels the upper and lower values of the effort interval. This is due to a phenomenon of performance compensation: when the driver slows down the rhythm of work, the observer takes control of the keyboard and continues the work. Assuming that the distributed pairs' data reflect the behavior of a solo programmer, the graphs become meaningful. When slowed down, the solo programmer preferred to forgo the help of the observer rather than deal with the communication and collaboration issues involved in maintaining the pairing.

Round II suggests the breaking of the distributed pairs even more clearly: both the median and the 25th percentile are greater for co-located pairs, and the dispersion of values is once again broader for distributed pairs. The most remarkable difference from the results of round I is the interval in which the co-located values vary, which is tighter than in round I. This difference is probably due to the fact that the subjects learned to work better in pairs after the round I experience. This conclusion is also supported by Figure 27, where the quality of co-located pairs is significantly better than that of distributed pairs in the second round.

Fig. 27. Quality for co-located (Var 18) and distributed (Var 19) pairs in round I; quality for co-located (Var 21) and distributed (Var 22) pairs in round II. (Box plots showing median, 25%-75% interval and non-outlier range.)

In summary, the analysis of data from the first experiment suggests that, without adequate means of communication and collaboration, the pairs tend to break down (the dismissal conjecture). This can be an important risk factor when implementing distributed pair programming. The dismissal occurs mainly under two conditions:

The absence of adequate communication support: contemporary review is one of the aspects that make pair programming advantageous. Contemporary review requires fluent communication in order to be decisive for the effectiveness of pair programming. A textual chat, for instance, obstructs the pair; operating the chat disturbs the continuity of work.

The absence of adequate collaboration support: the switch of roles avoids interrupting the rhythm of work. It requires that the observer can take control of the workstation whenever the driver cannot go on coding. Some desktop sharing tools suffer from technological limitations that make switching roles annoying.

The pair dismissal conjecture was confirmed by the subjects in a questionnaire-guided post-experiment assessment: while in the co-located round most of the pairs worked together for the whole task, in the distributed round, after an initial period during which the pairs tried to settle on a common strategy of action, several of them tended to work as single developers. The initial roles became frozen, switching was increasingly disregarded, and finally only the driver developed the code while the observer watched the companion working. Sometimes the observer attempted observations and suggestions, which were neglected or often misunderstood by the driver.

Quality Results

The experiment did not produce empirical evidence that either the expended effort increased or the quality of code decreased. The analysis of quality performed did not cover all aspects of code quality: quality was evaluated by testing the modified code with a suite of functional tests. Such a measure gave us no information about how the work of co-located and distributed pairs differed in terms of internal attributes of the code. In order to gain a clearer and more complete understanding of the impact of distribution on the quality of modified code in pair programming, some metrics of code complexity were gathered and analyzed. This further analysis was needed in order to identify more deeply the effect of distributed pair programming on the evolution of software. Four metrics were selected for analysis, listed in Table 39. Measures were collected with the Panorama tool [125].
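Cyclomatic complexity, one of the selected metrics, is commonly computed on a control flow graph (CFG) as e - n + 2, where e is the number of edges and n the number of nodes. The following Python sketch shows the computation on two small hypothetical graphs; the graph representation and the function name are assumptions made for illustration and do not describe how the Panorama tool works.

```python
def cyclomatic_complexity(cfg):
    """Return e - n + 2 for a connected, single-entry, single-exit CFG.

    `cfg` maps each node name to the list of its successor nodes.
    """
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2


# Hypothetical CFG of a method containing a single if/else.
if_else = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}

# The same method with the if/else wrapped in a loop (back edge join -> cond).
loop = {**if_else, "join": ["exit", "cond"]}

print(cyclomatic_complexity(if_else))  # 6 edges - 6 nodes + 2 = 2
print(cyclomatic_complexity(loop))     # 7 edges - 6 nodes + 2 = 3
```

Each additional decision point adds one edge without adding a node, which is why the metric grows with the branching of the code.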

Table 39. Complexity metrics.

| Metrics | Definition |
|---|---|
| Cyclomatic Complexity | e-n+2, where n and e are the nodes and edges in the CFG |
| J-Complexity | The minimum number of instrumentation points required for recording all block test coverage data. |
| Methods per class | Total number of methods in a class |
| Methods users per class | The total number of methods that use the methods of the class |

Experimental Results

The results of the Mann-Whitney test are summarized in Table 40.

Table 40. Mann-Whitney tests on complexity.

| Metric | Rank Sum Distr | Rank Sum Co-loc | U | Z | p-level |
|---|---|---|---|---|---|
| Cyclomatic Complexity | 45,00 | 60,00 | 17,00 | -0,96 | 0,338 |
| J-Complexity | 36,00 | 69,00 | 8,00 | -2,11 | 0,035 |
| Methods per Class | 45,00 | 60,00 | 17,00 | -0,96 | 0,338 |
| Method Users / class | 43,00 | 62,00 | 15,00 | -1,21 | 0,225 |

The null hypothesis can be rejected at the chosen level of significance (p-level < 0,05) only for the J-Complexity metric. For the other complexity metrics, there is no empirical evidence that the null hypothesis can be rejected. Given this result for J-Complexity, a one-sided Mann-Whitney test was needed in order to reject the null hypothesis in favor of a specific direction. This further test gave interesting results, reported in Table 41.

Table 41. Mann-Whitney one-sided test on J-Complexity.

| Metric | Rank Sum Distr | Rank Sum Co-loc | U | Z | p-level |
|---|---|---|---|---|---|
| J-Complexity | 36,00 | 69,00 | 8,00 | -2,11 | 0,038 |
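The relation between the rank sums, U, Z and the two-sided p-level in Table 40 can be retraced with a short Python sketch. It rests on assumptions that are not stated in the text: seven observations per treatment (suggested by the rank sums adding up to 105 = 14 x 15 / 2), the normal approximation of the U distribution, and no continuity or tie correction. The function name is hypothetical.

```python
import math


def mann_whitney_from_rank_sum(rank_sum, n1, n2):
    """Derive U, Z and the two-sided p-level from the rank sum of sample 1.

    Uses the normal approximation without continuity or tie correction
    (an assumption; the statistical package actually used is not stated).
    """
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p_two_sided


# Rank sums of the distributed pairs from Table 40, assuming n1 = n2 = 7.
for metric, rank_sum in [("Cyclomatic Complexity", 45),
                         ("J-Complexity", 36),
                         ("Method Users / class", 43)]:
    u, z, p = mann_whitney_from_rank_sum(rank_sum, 7, 7)
    print(f"{metric}: U={u:.2f} Z={z:.2f} p={p:.3f}")
```

Under these assumptions the sketch yields U = 17, 8 and 15, with Z and p-level values that round to those listed in Table 40.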

The null hypothesis concerning J-Complexity can be rejected with significant evidence. The percentage variations of J-Complexity are smaller in distributed than in co-located pair programming maintenance tasks. It seems odd that the null hypothesis for J-Complexity can be rejected in favor of distributed pairs whereas for cyclomatic complexity it cannot. When comparing the mean values of the normalized variations, they are smaller for distributed than for co-located pairs, for each metric. The values are shown in Table 42.

Table 42. Mean values for the complexity metrics.

| Metric | Distributed | Co-located |
|---|---|---|
| Cyclomatic Complexity | 0,38 | 0,66 |
| J-Complexity | 0,28 | 0,72 |
| Methods per Class | 0,38 | 0,47 |
| Method Users / class | 0,49 | 0,63 |

Cyclomatic complexity and J-Complexity should be strongly related to each other, so that if one of the two varies, the other is expected to vary too, with a similar trend. It is therefore surprising that, although the mean values are close for the two metrics, only the null hypothesis for J-Complexity can be rejected. This presumably depends on the specific values that make up the samples. The analysis of the box plots provides confirmation of the dismissal conjecture, as discussed in the next section.

Interpretations

The interpretation of the results of the complexity analysis of the code reinforces the pair dismissal hypothesis and enriches it with further observations.

Figure 28 and Figure 29 present the box plots of cyclomatic complexity and of J-Complexity. It can be noticed that both the cyclomatic and the J-Complexity are affected by distribution. More precisely, the difference in complexity is higher when pairs are co-located. Here, difference means the difference between the value of a complexity metric before the intervention and its value after the intervention. Assuming that the pair dismissal conjecture is true, this can be clearly justified. When co-located, the pair members reduce the complexity of the code more than when distributed, because they can discuss the architecture and the implementation of the system to be developed. As a matter of fact, the co-located pairs produce better code from the modularization and cohesion viewpoints. This suggests that in co-located pairs the observers carried out reviews and the resulting improvements were applied to the code. When distributed, pair members tend to work alone: review did not support the improvement of the code in either complexity or cohesion.

Fig. 28. The box plots of normalized variations of cyclomatic complexity for distributed and for co-located pairs.

Fig. 29. The box plots of normalized variations of J-complexity for distributed and for co-located pairs.

In Figure 30 and Figure 31 the box plots of methods per class and method users per class are reported.

Fig. 30. The box plots of normalized variations of methods per class for distributed and for co-located pairs.

Fig. 31. The box plots of normalized variations of methods users per class for distributed and for co-located pairs.

Experiment's replica

The replica was aimed at testing the same hypotheses as the first experiment, while minimizing the occurrence of the pair dismissal phenomenon. In order to limit the dismissal, two main policies were followed. Firstly, the students received more intensive and focused training: in addition to seminars and lab exercises, they were trained in working together and in making faster decisions. Secondly, the time for performing the tasks was considerably reduced, from 180 minutes per round to 90 minutes per round. From the assessment discussion of the first experiment it emerged that in the first period the distributed pairs strove to work together. Then, given that the rhythm of work slowed down too much and that communication and collaboration became too difficult to sustain, they started to work alone. The idea was to reduce the total amount of time available to the subjects, so as to gather data while the distributed pairs were still trying to work together.

The definition of the replica is the same as that of the first experiment, previously discussed; therefore, only the characterization and operation are discussed here. Table 43 shows the experimental design.

Table 43. Experimental Design.

| Subjects | Round I | Round II |
|---|---|---|
| Group A (4 units) | Co-located, P1 | Distributed, P2 |
| Group B (4 units) | Distributed, P1 | Co-located, P2 |

Characterization and operation

Subjects. The subjects were volunteer students of the Software Engineering course in the fourth year of the laurea degree in Computer Engineering at the University of Naples Federico II. They had to implement three maintenance requests on a C++ program. The first maintenance request was corrective; the other two were perfective.

Process. Four pairs were involved in the replica, organized in two groups (A and B). The experiment consisted of two rounds. In round I the pairs of group A were co-located and the pairs of group B were distributed, and they had to implement three maintenance requests on the program AreaCalculating. In round II the pairs of group A were distributed and the pairs of group B were co-located, and they had to implement three other maintenance requests on the program AverageNumber. Information about the programs is reported in Table 44.

Table 44. Information on replica's programs.

| Program | LOCs | # Classes | Description |
|---|---|---|---|
| AreaCalculating (P1) | 76 | 2 | This program calculates the areas of plane geometry figures. |
| AverageNumber (P2) | 67 | 3 | This program calculates some statistical values on a sample of numbers. |

Table 45 reports the results of the Mann-Whitney U tests on quality and effort data; this test was used because the data were not normally distributed. The hypotheses were tested by fixing the p-level threshold at 5%.

Table 45. Mann-Whitney U tests on effort and quality data for the second experiment.

| Test | p-level | Description |
|---|---|---|
| Effort | | Mann-Whitney tests on effort data between co-located and distributed pairs |
| Quality | | Mann-Whitney tests on quality data between co-located and distributed pairs |

There is empirical evidence that the quality of pair programming is affected by distribution (p < 0.05), whereas there is no empirical evidence that distribution affects effort. Descriptive statistics of the sample are provided in Table 46.

Table 46. Descriptive statistics of the replica's sample. (Columns: Dev. Stand., Avrg, Max, Min and Moda, for co-located and for distributed pairs; rows: quality and effort, for round I and for round II.)

Figure 32 illustrates the box plots for effort and quality. The box plots of effort show that the performance of distributed pairs is worse than that of co-located pairs: the worst co-located pairs reached the same level as the best distributed ones. This result seems to contradict the dismissal conjecture. Actually, the replica was planned in order to reduce the dismissal phenomenon within the pair. In fact, the time available for accomplishing the tasks was reduced: during the observation time, the subjects of distributed pairs worked as pairs while dealing with collaboration and communication problems. Given these considerations, the overall degradation of performance under distribution seems to be due to technological issues. The dismissal phenomenon had not yet emerged in the replica, but the negative side effects of inadequate means of communication and collaboration had.

This becomes clear when analyzing the quality data illustrated in Figure 32. The median value is greater for co-located than for distributed pairs. Moreover, it should be observed that the worst quality level of co-located pairs is better than the best level of distributed ones. The difference between the quality obtained by co-located and by distributed pairs is greater than in the first experiment. This, too, was probably due to the reduction of the available time. In the first experiment people tried to collaborate in the initial phase, but then started to work alone, improving the quality of the work because they were relieved of the continuous effort of establishing a pair style of work. During the replica, people worked as pairs while facing the problems connected with the platform: the reviews were affected by such problems and consequently the quality was very low.

Fig. 32. Effort for co-located (Var 14) and distributed (Var 15) pairs; quality for co-located (Var 9) and distributed (Var 10) pairs. (Box plots showing median, 25%-75% interval and non-outlier range.)

The dismissal phenomenon: causes and remedies

After the experiment and its replica, a questionnaire-guided discussion with the subjects was conducted in order to carry out a qualitative investigation of the experiment outcomes. The most relevant result of the discussion was the confirmation of the dismissal conjecture, also reported in reference [115]. Together with the subjects, two main candidate causes of the phenomenon were identified: the communication limit (which we named the faulty phone cause) and the divergence of approaches (which we named the two-minds cause). They are described in the following.

The faulty phone cause. As previously discussed, communication is a critical issue for successfully implementing distributed pair programming. Communication is important for performing contemporary review and decision making within the pair. If the supporting technology does not completely satisfy the need for comfortable communication, it can obstruct the driver while typing and the observer while inspecting the code. In this experience the text chat was not completely adequate to support distributed pair programming, as it forced the subjects to pay continuous attention to the chat window in order to catch the companion's interventions, so that they frequently had to take their eyes off the code. After a while, people felt uncomfortable with this way of communicating and switched to working alone on the code, ignoring messages from their companion. This contributed to breaking the distributed pair.

The two-minds cause. Each member of the pair brought his or her own idea of the strategies for meeting the goals. Having the companion far away discourages people from arguing for their ideas. The pair members tend to behave anarchically and the roles are performed chaotically: control of the machine is taken without the consent of the companion and reviews are neglected. In distributed pair programming, the collaborative work of the pair needs to be more disciplined than in co-located pair programming. It is necessary to train the subjects adequately.

The assessment also suggested solutions for managing such issues adequately.

Behavioral protocol. Pair programming forces two people to share working tools that they are used to considering strictly personal. This makes people reluctant to fully assume either the observer or the driver behavior, mainly because they do not know exactly which kinds of tasks belong to the observer role and which to the driver role. Researchers and practitioners suggest switching the two roles when necessary. The problem is that switching is less spontaneous in distributed settings if people are not adequately experienced with agile methods. In this case people tend to work mainly asynchronously on different tasks rather than as a pair on the same task. People need adequate training to apply pair programming properly: the duties of the observer and those of the driver must be clearly distinguished. A behavioral protocol can be a useful help for people with little experience of pair programming.

Communication enabler. In order to support distributed pair work, the means of communication must implement a metaphor of the actual world. Effective platforms for distance communication must enable some peculiar sociological aspects of real-life communication. In distributed pair programming, people need a means of communication that has at least two features: vocal communication and a blackboard. Vocal dialogue lets people communicate and keep working on their task at the same time. Vocal dialogue helps people collaborate in a more realistic way than text-based chat or instant messaging. The latter force people to assume an unnatural behavior, and this obstructs the continuity of work. Defective communication is one of the candidate causes of the pair dismissal (the faulty phone cause). An interesting piece of advice came from the subjects: video-chat tools are neither required nor considered useful. The blackboard can be exploited as a means to transmit graphs, algorithm drafts and pictures as hints of pieces of design documentation.

Different experiences and capabilities. Our experimental subjects had a similar academic background. Differences among them consisted mainly of their academic curricula. Some of them had short experience as developers in firms, but not long enough to determine a well-defined gap with the others. The gaps in capabilities within the pairs were too small to be useful, but large enough to lead the pair away from co-ordination. The pair needs a member with more experience, who can act as a leader for the pair. Distribution magnifies critical situations: the figure of a leader becomes fundamental for successful task completion.

Change tracing and highlighting. The environment supporting distributed pair programming must keep track of the modifications made by the two developers. It seems to be important for co-operative work to have an immediate idea of the place and the author of a modification. Distribution emphasizes this need because the pair members do not share a physical space; for example, they cannot use their fingers to point at the code under review. The platform can use different colors to refer to the different programmers. It would also be useful to keep track of the time of modifications. A source of overhead was the need to keep in mind who had modified which part of the code, and when.

Awareness of the project. Distributed pairs tend to reduce discussions; as a consequence, the pair members do not develop a common vision of the project: they consider different priorities for the goals of the project, different strategies to be adopted, and different approaches to solving problems. Frequent rounds for knowledge leveraging must be properly planned during all the phases of the project; they are recommended especially at the start-up of the project. Some subjects proposed that one member of the pair should have a greater awareness of the project than the other. Such a person should mainly assume the role of the observer during the development phases.

Experimental validity

This section presents a discussion of the threats considered the most relevant for the design of the experiment, referring to the classification in reference [124].

Internal validity

Maturation. The subjects were students not experienced with pair programming. In both experiments, during the first round the subjects acquired competence in the new practice, which they then exploited in the second round. The Wilcoxon test was applied between the rounds of the same group, in order to evaluate the significance of differences in performance and quality due to the maturation of the groups between the two rounds.

Fig. 33. Test for maturation. (Wilcoxon test between round I and round II for each group: Group A co-located in round I and distributed in round II; Group B distributed in round I and co-located in round II.)

Figure 33 illustrates the design of the tests and Table 47 reports the results.

Table 47. Wilcoxon tests.

| Test | p-level | Description |
|---|---|---|
| Effort Group A | | Wilcoxon test on effort data of Group A between round I and II |
| Effort Group B | | Wilcoxon test on effort data of Group B between round I and II |
| Quality Group A | | Wilcoxon test on quality data of Group A between round I and II |
| Quality Group B | | Wilcoxon test on quality data of Group B between round I and II |

There is no empirical evidence that maturation affects the differences in each group's performance and quality between the two rounds (p > 0.05).

Construct validity

Mono-operation bias. The subjects were required to modify one program in each round, according to three specific maintenance requests. The difference in assignment between the rounds can affect the final results. In order to evaluate whether such differences between rounds are statistically significant, we used Wilcoxon tests. The tests were carried out for both the first experiment and the replica (Figure 34).

Fig. 34. Design of statistical test for maturity threat. (Wilcoxon test between round I, with Group A co-located and Group B distributed, and round II, with Group A distributed and Group B co-located.)

Table 48. Wilcoxon test results in order to evaluate the mono-bias threat.

| Test | p-level | Description |
|---|---|---|
| Effort first experiment | | Wilcoxon test on effort data between round I and round II in the first experiment |
| Quality first experiment | | Wilcoxon test on quality data between round I and round II in the first experiment |
| Effort replica | | Wilcoxon test on effort data between round I and round II in the replica |
| Quality replica | | Wilcoxon test on quality data between round I and round II in the replica |
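The Wilcoxon comparisons between rounds can be illustrated with a small Python sketch of the signed-rank statistic and its exact two-sided p-level. This is only a sketch under stated assumptions: the effort values are hypothetical (they are not the experiment's data), the observations of the two rounds are assumed to be matched unit by unit, and the function name is invented for the example; the statistical package actually used is not stated in the text.

```python
from itertools import product


def wilcoxon_signed_rank_exact(round_1, round_2):
    """Return (W, exact two-sided p) for matched samples.

    W is the smaller of the positive and negative rank sums; the p-level
    is the share of the 2^n equally likely sign patterns that give a
    statistic at least as extreme. Zero differences are dropped.
    """
    diffs = [b - a for a, b in zip(round_1, round_2) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    w = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            extreme += 1
    return w, extreme / 2 ** len(ranks)


# Hypothetical effort values (minutes) for four matched units.
effort_round_1 = [100, 120, 90, 110]
effort_round_2 = [110, 115, 110, 125]
w, p = wilcoxon_signed_rank_exact(effort_round_1, effort_round_2)
print(w, p)  # 1.0 0.25
```

The enumeration over sign patterns is practical only for very small samples such as those of this experiment; for larger samples a normal approximation is normally used instead.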

128 128 Conclusions This chapter aimed to explore the third dimension of investigation: is pair programming suitable for all contexts? The conjecture is that pair programming is not suitable for all the contexts of software development, but it is successful only if some characteristics are present. Several experiments have demonstrated the benefits of pair programming in terms of performances and quality. Such benefits are claimed to be due to a tight collaboration and a fluid communication. As a consequence, if these conditions are missing, the practice can be affected. Distributed software development is an example of this kind of operative context: collaboration and communication are two issues to address when distributing processes. The distribution of software processes and teams is increasing within industry. An experiment and a replica were realized in order to evaluate the impact of distribution on pair programming. Both the first experiment and the replica have produced empirical evidence that quality of pair programming is affected by distribution. In the first experiment, the dismissal phenomenon emerged: if the technological platform does not support adequately communication and collaboration, the distributed pair working gets interrupted and just one of the pair s components keeps on the control of the workstation, neglecting the review and the switch requests from the remote companion. This entails the lost of benefits in terms of performance and quality, proper of the pair programming. Such phenomenon is a factor of risk and should be properly managed when planning the implementation of distributed pair programming within a process activity. The replica was planned in order to minimize the pair dismissal phenomenon. The data collected from the replica reflected actually the behavior of distributed pairs while facing communication and collaboration problems and strove working together. 
In the first experiment the dismissal phenomenon played a central role in the definition of the final results. In the replica, the dismissal was limited. The replica s results about quality make us believe that the support platform is the candidate factor for maintaining unchanged quality and performance when distributing pair programming. Both the replica and the experiment offered the following main outcomes. There was empirical evidence that the distribution affects pair programming quality. Some factors of distribution settings make the quality of the pair programming decrease. The quality assessment of the experiment suggested that such factors have to be searched in the infrastructure of collaboration. Communication has to be fluent and neither obstructive for the driver nor the observer. On the contrary,

129 129 reviewing code and discussing a common strategy require additional effort to be accomplished successfully. There was empirical evidence that the percentile variation of J-Complexity decreases when distributing pair programming. There was no empirical evidence that the distribution affects the other metrics. The interpretation of experimental data leads to a confirmation of the pair dismissal hypothesis. This means that distributed pair programming produces worst results than co-located pair programming. The conjecture is that this is due to not appropriate training of subjects and not appropriate communication means. There was no empirical evidence that effort increases when distributing pair programming. Although the dismissal phenomenon favored a higher expense of time in co-located tasks, the differences are not statistically significant. The qualitative analysis confirmed that the time can be reduced with distribution. The motivation for that is not encouraging: this is due to the breaking down of collaboration. Finally, it is only a waste of resources: two programmers are paid whereas only one works, without benefits of contemporary reviews. Some candidate factors determining the success of the pair programming. We have identified them in the selection of an appropriate communication and collaboration support. The experiment was executed in academic setting. Such kind of experiments helps to fix bugs within the experimental design, before executing it in industrial setting. As matter of fact, the phenomenon of dismissal was noticed only after the first experiment s round, and not foreseen during the design of the experiment. Furthermore, experiments with students help to point out which are the likely findings that can be interesting for industry, in order to propose appealing investigations and gain the maximum collaboration from professionals. 
A strong limitation of the experiment is its scale: the samples are small, the observation time is short, and the size of the problem is hardly significant when compared with marketplace applications. Such limitations can be accepted by considering the experiment as a preliminary investigation of distributed pair programming. The aim is to define the most suitable design for executing the experiment in an industrial setting. In summary, the investigation led to the conclusion that pair programming is not suitable for all operative contexts. The benefits of pair programming depend on the process in which the practice is used.

From the experiments the following research questions emerged:

Does an appropriate platform let distributed pair programming remain as beneficial as co-located pair programming? From the post-experiment assessment discussion, one major reason for the dismissal of the pair and, consequently, for the deterioration of pair programming effectiveness is the lack of an appropriate platform. Such a platform should comprise at least: an audio channel to support communication, a system to exchange and share images and drafts, and a version control system to assist continuous reviews. This suggests that an ad-hoc system for distributed pair programming would be helpful.

What is the best combination of the pairs in terms of competence, experience, and character profile in distribution? It seems that the knowledge and the behavioral aspects of individuals are critical for the success of pair programming. All the subjects highlighted that those aspects have a great impact on the practice. It would be very interesting to have empirical evidence of such a relationship. Moreover, it would be useful to understand how to properly manage such factors when forming the pair. If this issue is important for co-located pair programming, it becomes critical for distributed pair programming, where the implementation of the practice is obstructed by other kinds of problems. For instance, differences of culture and habits can become further hurdles to the success of pair programming.

Is distributed pair programming only a need, or can it fit certain business targets better than co-located pair programming? Until now distribution has been considered a need, arising from the widespread diffusion of pair programming and of global software development. However, keeping the members of the pair physically separated can be beneficial for pair programming in specific contexts. The switch of roles should happen in a more disciplined manner. The pair can exploit resources that are placed in two different organizations, and govern them directly. Pair programming can be used for bringing together people with highly complementary competencies who are located in two different places. Investigating when and how distribution can improve the practice of pair programming should yield interesting findings.

Chapter VII: Conclusions

In the last decades a new approach to managing software production arose, namely the Agile Methodology. The official foundation of the so-called Agile Movement occurred when a group of researchers and practitioners wrote a document, named the Agile Manifesto, in which the founding values and principles of the Agile model of software processes were discussed. The motivation behind the rise of this new approach to software development was that in certain situations the known and widespread process models used so far failed, driving projects to exceed the budget, to disappoint the customer's expectations, and to overrun. There is not yet a broad consensus about the effectiveness and the applicability of agile methods; there are mainly two contrasting positions in the research community. One claims that agile methods appear too extreme and deny some important rules and good practices of Software Engineering. The other claims that agile methods can be successful in those contexts where plan-driven approaches fail. There is not enough empirical evidence to fully support either of the two positions: the thesis is intended to contribute to building such a body of knowledge.

The main concern about agility

The main concerns deriving from the comparison between the Software Engineering body of knowledge and the values of the Agile Manifesto have been discussed at length in Chapter II; Table 49 provides a synoptic overview of this analysis.

Table 49. Problems with agile processes.

1. Value: Individuals and interaction over processes and tools
   Problem: Without a process definition, is the process still repeatable, stable, predictable?
2. Value: Working software over comprehensive documentation
   Problem: Without documentation, is the product understandable, maintainable, transferable?
3. Value: Customer collaboration over contract negotiation
   Problem: Without an appropriate definition and analysis of requirements, how is the quality of the product achieved?
4. Value: Responding to change over following a plan
   Problem: Without a plan, is it possible to make estimations and measurements on the projects in order to discover points of cost dispersion?

The problems highlighted suggest that the adoption of agile methods in actual industrial contexts could bring about serious drawbacks for organizations. The following research question arises: what are the consequences of applying agile methods in an actual context of software production? There is no consolidated and mature body of empirical evidence that allows the research community to answer the question definitively. The thesis aims at gathering empirical evidence about some aspects of agile methods, and in particular investigates the benefits and limitations of one agile practice, namely pair programming, addressing three research questions:

1. Does pair programming help to keep the productivity of project teams stable and high?
2. Does pair programming support knowledge leveraging among the team's members?
3. Is pair programming affected by distributing the members of the pair?

For each of the three research questions an empirical investigation has been carried out. The main results are discussed in the following subsections.

Productivity and stability of pair programming throughput

Pair programming seems to be a proper candidate for improving the performance of the team: there is empirical evidence that pair programming increases the productivity of the team. A likely explanation is that the pair continuously discusses the problems and the solutions to adopt: this reinforces the commitment to the problem, and the sharing of the best strategies to solve it among the team's members. Moreover, role switching helps reduce the latency times that are typical of solo programming. Pair programming also appears to produce greater stability in the throughput of the teams. This should help to produce better estimations of the effort required and of the completion of the tasks. Unfortunately, the experiments did not produce enough empirical evidence about this aspect.
Qualitative results suggest that pair programming ensures greater cohesion in the team: the team tends to coordinate itself better, to check constantly the achievement of internal and intermediate goals, and to make the team pace as homogeneous as possible. This could be due to pair pressure, which requires people to adopt greater rigor in individual and cooperative work.

Knowledge transfer with pair programming

The experiment showed evidence that pair programming is significantly effective for leveraging knowledge within project teams. In particular, the following conclusions can be drawn:

- There is empirical evidence that pair programming helps diffuse and improve design knowledge when evolving systems. This suggests that the practice can serve a two-fold aim: (i) at the kick-off of a project, to let the team quickly acquire a certain level of knowledge of the system; (ii) in the advanced phases of the project, to increase the knowledge of a system in critical situations, for instance when the project is about to run out of time.
- Working on a task does not guarantee that the designer improves her knowledge significantly. This entails that a strategy to make the designer learn from her work is necessary, and pair programming could be an appropriate one.
- The individual background is relevant for the effectiveness of the practice. Team managers should pay attention when choosing the people who compose the pairs. This issue deserves further and more focused investigation.
- Pair programming may compensate for a lack of documentation in a software team, or for scarce and infrequent access to the documentation. The problem is that the knowledge is transferred from one person to another: whoever holds the knowledge must take part in the project in order to share it with the other people involved. This is the strongest limitation of the practice from the perspective of knowledge sharing.
- Pair programming shows a property that documentation does not have: it could be effective for transferring tacit knowledge, which cannot be captured by documentation.

Distribution can affect pair programming

Several experiments have demonstrated the benefits of pair programming in terms of performance and quality.
However, pair programming might not scale, in the sense that in some contexts it could lose its benefits. The distribution of software processes is an exemplary situation. As a matter of fact, distribution hinders communication and collaboration between the members of the pair, and this could deteriorate its quality. An experiment and a replica were carried out in order to evaluate the impact of distribution on pair programming. Both the first experiment and the replica produced empirical evidence that the quality of pair programming is affected by distribution. The following results were obtained:

- Empirical evidence that distribution affects pair programming quality. Some factors of the distributed setting make the quality of pair programming decrease. The quality assessment of the experiment suggested that such factors are to be sought in the collaboration infrastructure. Communication has to be fluent and must obstruct neither the driver nor the observer. Otherwise, reviewing code and discussing a common strategy require additional effort to be accomplished successfully.
- Empirical evidence that the percentile variation of J-Complexity decreases when distributing pair programming. There is no empirical evidence that distribution affects the other metrics. The interpretation of the experimental data leads to a confirmation of the pair dismissal hypothesis. This means that distributed pair programming produces worse results than co-located pair programming. This could be due to inadequate training of the subjects and inadequate means of communication.
- No empirical evidence that effort increases when distributing pair programming. Although the dismissal phenomenon favored a higher expense of time in co-located tasks, the differences are not statistically significant. The qualitative analysis confirmed that time can be reduced with distribution. The reason is not encouraging: the reduction is due to the breakdown of collaboration. In the end, it is only a waste of resources: two programmers are paid whereas only one works, without the benefits of simultaneous review.
- Some candidate factors determining the success of pair programming: appropriate communication and collaboration support.

Even if some studies claim that pair programming can increase the productivity of teams and the quality of the products, this benefit depends on how the members of the pair work. If collaboration and communication are not adequately supported, these benefits can be lost. This is the case with distribution. The conclusion is that pair programming cannot assure high productivity and quality in every situation.

Limits of the experimentation

The results of the thesis show some limitations, which have been discussed in detail in the respective chapters.
The limitations mainly regard the internal and external validity of the outcomes; in particular, three main concerns must be addressed:

1. Kinds of projects. The projects are usually exemplar projects, suitable for running an experiment with students in a limited time frame. In a real setting, the projects are usually more complex than the ones used in the experiments, and the time frame spans a longer period.
2. Representative subjects. Even if the subjects had all the necessary competence to accomplish the assigned tasks, professionals usually have many more years of experience than students. Moreover, students have general knowledge about the main technologies. Professionals usually acquire specific knowledge about certain domains or specific technologies, which introduces other aspects to investigate.
3. Proxy measures. The measurement processes were not always able to capture the objectives of measurement completely. An immediate example is knowledge. By its intrinsic nature, tacit knowledge is very difficult to quantify with a set of metrics. Questionnaire grading was used for this purpose, but it can be considered an indicator of the knowledge to be measured rather than an exact measure. This limit of the current research is intended to be overcome in future work, as explained in the next section.

Future work

The work to be accomplished in order to continue the research of the current thesis will be aimed at overcoming the limitations of the thesis and exploring further aspects of pair programming. More precisely:

1. Realize experiments in vivo. According to the research plan illustrated in Figure 4 of Chapter III, the controlled experiments carried out in an academic environment and discussed here will be replicated in real industrial contexts. The experimental designs will be evolved by removing the defects detected during the execution of these controlled experiments with students. This should strengthen the external validity.
2. Use a more precise measurement process. The measurement process will be improved and refined in order to collect metrics that are more meaningful with respect to the object of measurement. The main concerns regard the measurement of knowledge: more complex systems to capture it and represent it in quantitative terms will be analyzed.
3. Investigate impacts on maintainability. A novel issue to investigate is the relationship between pair programming and maintainability. The aspects to be studied are basically two: how pair programming can facilitate and support software maintenance tasks, and how maintainable the code produced by pair programming is.

Appendix A

Form 1

This is the time sheet with the requirements of the first iteration.

Check the type of development: Pair Programming / Solo Programming
Team:
Name(1): Surname(1):
[only if in pair:] Name(2): Surname(2):

Task: Realize a system to trace software requirements for a software organization. Traceability means the ability to capture and represent the dependencies among the requirements: the internal ones, the external ones, and the process ones. Each team must detail the SRS, the high level design, and the detailed design, and has to write the code implementing the project. The current task concerns the realization of functional area number 1:

Area 1: Requirement Definition.
Req1.1: The system must permit the formalization of a requirement through a dialogue mask, according to the following schema:
  Requirement id
  Description
  Users
  Input
  Output
  Elaboration
  Scenario
  Alternatives
Req1.2: All the information will be saved in a database, properly designed by the student. Flat files are allowed.
Req1.3: The system should permit the reading of the existing requirements, by properly filling in the fields of the dialogue mask.

During the development it is necessary to complete the following form:

Requirement # | Start (hh/mm) | End (hh/mm)

Form 2

This is the time sheet with the requirements of the second (and first) iteration.

Check the type of development: Pair Programming / Solo Programming
Team:
Name(1): Surname(1):
[only if in pair:] Name(2): Surname(2):

Task: Realize a system to trace software requirements for a software organization. Traceability means the ability to capture and represent the dependencies among the requirements: the internal ones, the external ones, and the process ones. Each team must detail the SRS, the high level design, and the detailed design, and has to write the code implementing the project. The current task concerns the realization of functional area number 2:

Area 1: Requirement Definition.
Req1.1: The system must permit the formalization of a requirement through a dialogue mask, according to the following schema:
  Requirement id
  Description
  Users
  Input
  Output
  Elaboration
  Scenario
  Alternatives
Req1.2: All the information will be saved in a database, properly designed by the student. Flat files are allowed.
Req1.3: The system should permit the reading of the existing requirements, by properly filling in the fields of the dialogue mask.

Area 2: Definition of internal and external dependencies.
R2.1: The system must permit the definition of temporal and logical dependencies, indicating the kind of dependency and the id of the related requirements.
R2.2: The system must permit linking each requirement to an external document, defining the description of the document, the name of the document, and its physical location.

During the development it is necessary to complete the following form:

Requirement # | Start (hh/mm) | End (hh/mm)

Form 3

This is the time sheet with the requirements of the third (second and first) iteration.

Check the type of development: Pair Programming / Solo Programming
Team:
Name(1): Surname(1):
[only if in pair:] Name(2): Surname(2):

Task: Realize a system to trace software requirements for a software organization. Traceability means the ability to capture and represent the dependencies among the requirements: the internal ones, the external ones, and the process ones. Each team must detail the SRS, the high level design, and the detailed design, and has to write the code implementing the project. The current task concerns the realization of functional area number 3:

Area 1: Requirement Definition.
Req1.1: The system must permit the formalization of a requirement through a dialogue mask, according to the following schema:
  Requirement id
  Description
  Users
  Input
  Output
  Elaboration
  Scenario
  Alternatives
Req1.2: All the information will be saved in a database, properly designed by the student. Flat files are allowed.
Req1.3: The system should permit the reading of the existing requirements, by properly filling in the fields of the dialogue mask.

Area 2: Definition of internal and external dependencies.
R2.1: The system must permit the definition of temporal and logical dependencies, indicating the kind of dependency and the id of the related requirements.
R2.2: The system must permit linking each requirement to an external document, defining the description of the document, the name of the document, and its physical location.

Area 3: Definition of process and conflict/cooperation dependencies.
R3.1: The system must make it possible to identify, for each functional requirement, which other requirements it conflicts or cooperates with (-/+) and which quality attribute is affected.
R3.2: The system must permit linking each requirement to:
  project documentation (use case, class diagram, collaboration diagram, deployment diagram, sequence diagram), with description and path;
  code files, with description and path;
  test files, with description and path.

During the development it is necessary to complete the following form:

Requirement # | Start (hh/mm) | End (hh/mm)
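The time sheet that closes each form records a start and an end time per requirement. A minimal sketch of how such rows could be turned into effort figures follows. The `TimeEntry` type, the helper names, and the sample rows are illustrative assumptions and not part of the experimental material; every entry is assumed to start and end on the same day.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeEntry:
    """One row of the time sheet (hypothetical representation)."""
    requirement: str  # e.g. "Req1.1"
    start: str        # "hh/mm", as written on the form
    end: str          # "hh/mm", as written on the form


def to_minutes(hhmm: str) -> int:
    """Convert an 'hh/mm' string into minutes since midnight."""
    hours, minutes = hhmm.split("/")
    return int(hours) * 60 + int(minutes)


def effort_minutes(entries: list[TimeEntry]) -> dict[str, int]:
    """Sum the elapsed minutes per requirement (same-day entries assumed)."""
    totals: dict[str, int] = {}
    for entry in entries:
        elapsed = to_minutes(entry.end) - to_minutes(entry.start)
        if elapsed < 0:
            raise ValueError(f"end before start for {entry.requirement}")
        totals[entry.requirement] = totals.get(entry.requirement, 0) + elapsed
    return totals


# Invented sample rows, for illustration only.
sheet = [
    TimeEntry("Req1.1", "09/00", "10/30"),
    TimeEntry("Req1.2", "10/30", "11/15"),
    TimeEntry("Req1.1", "14/00", "14/20"),
]
print(effort_minutes(sheet))
```

Summing per requirement rather than per row allows a requirement to be resumed in a later session, which the form permits because it does not limit the number of rows.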


Appendix B

[Use case diagram: User_Brench Registration. Actors: User, Brench Operator, HeadQuarterSystem, BrenchSystem, Checking Tesaurus. Use cases: Send User Registration, Send Brench Registration, Update User Remote Registred, Send User Remote Registration, Check Correctness/Completeness. Relationships: three include relationships.]

Actors and their specification

User: The ordinary user, who is either a vendor (seller) or a buyer (purchaser).
BrenchOperator: The operator of the branch.
HeadQuarterSystem: The IT platform of the Headquarters; it includes a Web Server and a DataBase Server.
Checking Tesaurus: Contains the correctness and completeness rules.
BrenchSystem: The IT platform of the branch; it includes a Web Server and a DataBase Server.

Use Case: Send User Registration
Description: The branch operator enters in the user registration form the data supplied by the user. Once the form has been filled in, and before the data are sent to the Headquarters, the application that verifies that the form is filled in correctly and completely is started.
Exceptions: The form is filled in incorrectly or incompletely. The sending of the data to the Headquarters fails.
Actors: BrenchOperator, HeadQuarterSystem.
Use Case Extends: None
Use Case Uses: Check Correctness/Completeness
Use Case Inputs: Personal data, residence, list of offered books (if the user is a vendor) with specifications: book title, author, publisher, language, year of publication, ISBN, physical details of the publication, specification of the negotiation.
Use Case Outputs: Storage of the data of the newly registered user.
Acceptance Criterion: The data of the new user are loaded into the database of the Local Branch and into that of the Headquarters.
Related Expectations: Population of the database (Local Branch and Headquarters). Correctness and completeness checks on the fields of the registration form. Sending of the data to the central system.
Related Requirements or Use Cases: Check Correctness/Completeness

Use Case: Send Brench Registration
Description: The branch operator enters in the branch registration form the data concerning the registration of the branch. Once the form has been filled in, and before the data are sent to the Headquarters, the application that verifies that the form is filled in correctly and completely is started.
Exceptions: The form is filled in incorrectly or incompletely. The sending of the data to the Headquarters fails.
Actors: BrenchOperator, BrenchSystem.
Use Case Extends: None
Use Case Uses: Check Correctness/Completeness
Use Case Inputs: Location of the branch, manager, list of the operators with their respective [...], tax data of the branch, telephone and fax numbers.
Use Case Outputs: Storage of the branch data.
Acceptance Criterion: The data of the new branch are loaded into the database of the Headquarters.
Related Expectations: Population of the database (Headquarters). Correctness and completeness checks on the fields of the registration form. Sending of the data to the central system.
Related Requirements or Use Cases: Check Correctness/Completeness.

Use Case: Update User Remote Registered
Description: The branch operator retrieves from the local database all the users registered locally via Web. The registration form filled in by the user must be sent to the Headquarters together with the information about the Local Branch.
Exceptions: The query fails. The sending of the data to the Headquarters fails.
Actors: BrenchOperator, HeadQuarterSystem.
Use Case Extends: None
Use Case Uses: -
Use Case Inputs: Formulation of the search query for the users registered at the Local Branch via Web and not registered at the Headquarters.
Use Case Outputs: Storage of the data at the Headquarters.
Acceptance Criterion: The data of the new user are loaded into the database of the Headquarters.
Related Expectations: Database management functions (Brench System and HeadQuarter). Procedures for checking correctness/completeness. Sending of the data to the Central System.
Related Requirements or Use Cases: -

Use Case: Send User Remote Registration
Description: The user connects via Web to the registration form of the Local Branch at which he or she wants to register. After the form has been filled in, and immediately before the data are sent to the Local Branch, a check on the correctness and completeness of the form is performed.
Exceptions: The form is filled in incorrectly or incompletely. The sending of the data to the Local Branch fails.
Actors: User, BrenchSystem.
Use Case Extends: None
Use Case Uses: Check Correctness/Completeness
Use Case Inputs: Personal data, residence, list of offered books with specifications: book title, author, publisher, language, year of publication, ISBN, physical details of the publication, specification of the negotiation.
Use Case Outputs: Storage of the data of the newly registered user in the database of the Local Branch.
Acceptance Criterion: The data of the new user are loaded into the database of the Local Branch.
Related Expectations: Population of the database (Local Branch). Correctness and completeness checks on the fields of the registration form. Sending of the data to the Local System.
Related Requirements or Use Cases: Check Correctness/Completeness.

Use Case: Check Correctness/Completeness
Description: With reference to the domain description and to the validity rules contained in the Checking Tesaurus, this application determines whether the form contains fields that are incomplete or incorrectly filled in.
Exceptions: The Checking Tesaurus is not accessible.
Actors: Checking Tesaurus.
Use Case Extends: None
Use Case Uses: -
Use Case Inputs: Filled-in registration form.
Use Case Outputs: Checked registration form with the outcome of the check (indication of the incorrect and incomplete fields).
Acceptance Criterion: All the fields of the form have been verified.
Related Expectations: Accessibility of the Checking Tesaurus. The Checking Tesaurus is kept up to date.
Related Requirements or Use Cases: Send User Registration. Send Brench Registration. Send User Remote Registration.
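The check described in this use case can be illustrated with a short sketch. It is only an assumption about how such a check might look: the `RULES` dictionary is a hypothetical stand-in for the Checking Tesaurus, the validity predicates are invented for the example, and only the field names `nome`, `numtelefono` and `ISBN` come from the class diagrams of this appendix.

```python
from typing import Callable

# Hypothetical rule set standing in for the Checking Tesaurus:
# field name -> predicate returning True when the value is valid.
RULES: dict[str, Callable[[str], bool]] = {
    "nome": lambda v: v.replace(" ", "").isalpha(),
    "numtelefono": lambda v: v.isdigit(),
    "ISBN": lambda v: len(v.replace("-", "")) in (10, 13),
}


def check_form(form: dict[str, str]) -> dict[str, list[str]]:
    """Return the incomplete (missing or empty) and the incorrect fields."""
    incomplete = [f for f in RULES if not form.get(f, "").strip()]
    incorrect = [
        f for f, is_valid in RULES.items()
        if f not in incomplete and not is_valid(form[f].strip())
    ]
    return {"incomplete": incomplete, "incorrect": incorrect}


# Invented sample form: the phone number is malformed and the ISBN is empty.
result = check_form({"nome": "Mario Rossi", "numtelefono": "08x4", "ISBN": ""})
print(result)
```

Iterating over every rule, instead of stopping at the first failure, mirrors the acceptance criterion that all the fields of the form must be verified.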

[Use case diagram: Book Search and Selling transaction. Actors: HeadQuarterSystem, Brench Operator, BrenchSystem, Buyer, Vendor. Use cases: Update Database, Search Local Book, Forward Book Search, Search Remote Book, Notify Transaction To Buyer, Notify Transaction To Brench. Relationships: one extend relationship.]

Use Case: Update Database
Description: After the conclusion of a sale, the branch operator fills in a form specifying the book that has been purchased, which is deleted from the database of the Headquarters and from that of the Local Branch.
Exceptions: The database update is not completed successfully. The session drops before the operation is completed.
Actors: Brench Operator, HeadQuarterSystem, Brench System.
Use Case Extends: -
Use Case Uses: -
Use Case Inputs: Book title, author, publisher, language, year of publication, ISBN, physical details of the publication.
Use Case Outputs: Outcome of the update.
Acceptance Criterion: The record of the sold book has been removed from the Central and Local databases.
Related Expectations: Accessibility of the Central System. Database management functions.
Related Requirements or Use Cases: -

Use Case: Search Local Book
Description: The branch operator enters in the search form the specifications of the desired book (book title, author, publisher, language, year of publication, ISBN, physical details of the publication). The search is run on the Local database. If the book is not found, the search is forwarded to another Local Branch selected by the operator (Forward Book Search).
Exceptions: The access to the Local database is not completed successfully. The session drops before the operation is completed.
Actors: Brench Operator, Brench System.
Use Case Extends: -
Use Case Uses: -
Use Case Inputs: Book title, author, publisher, language, year of publication, ISBN, physical details of the publication.
Use Case Outputs: Outcome of the search.
Acceptance Criterion: The book that matches the specifications is returned if present in the database.
Related Expectations: Database management functions.
Related Requirements or Use Cases: Forward Book Search

Use Case: Forward Book Search
Description: The branch operator enters in the search form the specifications of the desired book (book title, author, publisher, language, year of publication, ISBN, physical details of the publication) and the specifications of another Local Branch to which the search is to be forwarded.
Exceptions: The access to the Local database is not completed successfully. The session drops before the operation is completed.
Actors: Brench Operator, Brench System.
Use Case Extends: Local Book Search
Use Case Uses: -
Use Case Inputs: Book title, author, publisher, language, year of publication, ISBN, physical details of the publication, specifications of the Local Branch.
Use Case Outputs: Outcome of the search.
Acceptance Criterion: The book that matches the specifications is returned if present in the database.
Related Expectations: Database management functions. Network functions.
Related Requirements or Use Cases: -

Use Case: Notify Transaction To Buyer
Description: If the sale negotiation is concluded successfully, the operator fills in the Sale Completed form, which generates the e-mail notifying the buyer that the transaction has been concluded.
Exceptions: The e-mail is not sent.
Actors: Brench Operator, Buyer.
Use Case Extends: -
Use Case Uses: -
Use Case Inputs: Book title, author, publisher, language, year of publication, ISBN, physical details of the publication, specifications of the transaction.
Use Case Outputs: E-mail, ready to be sent, complete with the information about the successfully concluded transaction.
Acceptance Criterion: The e-mail is forwarded to the buyer complete and correct.
Related Expectations: Network functions.
Related Requirements or Use Cases: -

Use Case: Notify Transaction To Brench
Description: If the sale negotiation is concluded successfully, the vendor forwards the e-mail notifying the successfully concluded transaction to the branch operator.
Exceptions: The e-mail is not sent.
Actors: Brench Operator, Vendor.
Use Case Extends: -
Use Case Uses: -
Use Case Inputs: Book title, author, publisher, language, year of publication, ISBN, physical details of the publication, specifications of the transaction.
Use Case Outputs: E-mail, ready to be sent, complete with the information about the successfully concluded transaction.
Acceptance Criterion: The e-mail is forwarded to the branch operator complete and correct.
Related Expectations: Network functions.
Related Requirements or Use Cases: -

Use Case: Search Remote Book
Description: The buyer can search via Web for the book he or she is interested in.
Exceptions: The central database is not accessible. The session expires before the operation is completed.
Actors: HeadQuarterSystem, Buyer.
Use Case Extends: -
Use Case Uses: -
Use Case Inputs: Book title, author, publisher, language, year of publication, ISBN, physical details of the publication.
Use Case Outputs: Result of the search.
Acceptance Criterion: The searched book is returned if present in the database.
Related Expectations: Database management functions. Network functions.
Related Requirements or Use Cases: -
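The interplay between Search Local Book and Forward Book Search described above can be sketched as follows. This is a simplified illustration under stated assumptions: the in-memory `CATALOGUES` dictionary replaces the local databases, the branch names and book records are invented, and the function names are hypothetical rather than taken from the specification.

```python
from typing import Optional

# Hypothetical in-memory catalogues, keyed by branch name (invented data).
CATALOGUES: dict[str, list[dict[str, str]]] = {
    "BranchA": [{"ISBN": "111", "nomelibro": "Libro A"}],
    "BranchB": [{"ISBN": "222", "nomelibro": "Libro B"}],
}


def local_search(branch: str, spec: dict[str, str]) -> Optional[dict[str, str]]:
    """Return the first book of `branch` that matches every field in `spec`."""
    for book in CATALOGUES.get(branch, []):
        if all(book.get(key) == value for key, value in spec.items()):
            return book
    return None


def search_with_forward(
    branch: str, spec: dict[str, str], forward_to: Optional[str] = None
) -> Optional[dict[str, str]]:
    """Search locally; if nothing is found, forward to the branch chosen by the operator."""
    found = local_search(branch, spec)
    if found is None and forward_to is not None:
        found = local_search(forward_to, spec)
    return found


# The book is not in BranchA, so the search is forwarded to BranchB.
print(search_with_forward("BranchA", {"ISBN": "222"}, forward_to="BranchB"))
```

The forwarding target is an explicit argument because, in the use case, the operator selects the other Local Branch; the sketch does not try to discover a suitable branch automatically.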

Class Diagram 1: Entity Classes

User
Attributes: nome, idirizzo, città, numtelefono, ruolo, spectransaction
Operations: get(), set(), addbook(), deletebook()

Book
Attributes: nomelibro, autore, casaeditrice, lingua, annopubblicazione, ISBN, dettaglifisici, prezzo, formato
Operations: get(), set()

Transaction
Attributes: id, modalità, data, esito
Operations: set(), get()

Brench
Attributes: nomeresponsabile, nomeagenzia, Indirizzo, città, numtelefono, numfax, datifiscali
Operations: get(), set(), adduser(), deleteuser()

BrenchOperator
Attributes: nome, funzione, numtelefono
Operations: get(), set()

BrenchRegistration
Attributes: id, data, dettaglioperazione
Operations: get(), set()

Associations shown in the diagram: buy, sold by, registered in, executes, work in, does once, with multiplicities 1, 0..n and 1..n.

User: Data structure defining the user, vendor and buyer.
Brench: Data structure defining a Local Office. adduser() and deleteuser() respectively add a user to and remove a user from the Local Office.
BrenchRegistration: Data structure defining a branch at the Central Office.
Book: Data structure defining a book.
Transaction: Data structure defining the Transaction.
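As an illustration only, the entity classes described above could be rendered as plain data structures. The sketch below keeps the attribute and operation names of the diagram but includes only a few attributes per class; the books and users lists are assumptions introduced to give addbook(), deletebook(), adduser() and deleteuser() something to act on, and the role values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    nomelibro: str
    autore: str
    ISBN: str
    prezzo: float = 0.0


@dataclass
class User:
    nome: str
    ruolo: str  # for example "vendor" or "buyer" (values are an assumption)
    books: list = field(default_factory=list)  # assumed container, not in the diagram

    def addbook(self, book: Book) -> None:
        self.books.append(book)

    def deletebook(self, isbn: str) -> None:
        self.books = [b for b in self.books if b.ISBN != isbn]


@dataclass
class Brench:
    nomeagenzia: str
    users: list = field(default_factory=list)  # assumed container, not in the diagram

    def adduser(self, user: User) -> None:
        """Add a user to the Local Office."""
        self.users.append(user)

    def deleteuser(self, nome: str) -> None:
        """Remove from the Local Office every user with the given name."""
        self.users = [u for u in self.users if u.nome != nome]
```

A Local Office is then populated with `sede = Brench("Benevento"); sede.adduser(User("Anna", "vendor"))`, and a vendor with no books to sell is simply a User whose books list is empty.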

Class Diagram 2: Control Classes

Transaction
Attributes: id, modalità, data, esito
Operations: set(), get()

Registration
Operations: reglocal(), regremote(), check()

Checker
Operations: completecheck(), correctcheck()

MailComposer
Attributes: to, object, cc, ccn, from
Operations: send(), get(), set()

DataHandler
Attributes: query, connectiondata
Operations: set(), get(), submitquery()

Book
Attributes: nomelibro, autore, casaeditrice, lingua, annopubblicazione, ISBN, dettaglifisici
Operations: get(), set()

User
Attributes: nome, idirizzo, città, numtelefono, ruolo, spectransaction
Operations: get(), set(), addbook(), deletebook()

BookSearch
Operations: composequery(), check(), localsearch()

FowardSearch
Operations: compos (), searchBrench()

Brench
Attributes: nomeresponsabile, nomeagenzia, Indirizzo, città, numtelefono, numfax, datifiscali
Operations: get(), set(), adduser(), deleteuser()

BookSearch: Handles the logic for the local search of texts (localsearch, composequery). This class also verifies the completeness and correctness of the forms (check).
Checker: Performs the completeness (complcheck) and correctness (correctcheck) checks on the forms.
Registration: Handles the local (reglocal) and remote (regremote) registration of users. This class also verifies the completeness and correctness of the forms (check).
ForwardSearch: Handles the forwarding of a book search to another Local Office. It composes the mail detailing the particulars of the search (compos ) and searches for the Local Office that could satisfy the request (searchbrench).
DataHandler: Allows the local and central databases to be queried according to the different searches (Local Office, book) through submitquery().
MailComposer: Performs the automatic composition of mails.
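The interplay between Checker and Registration described above can be sketched as follows. This is a hypothetical illustration, not the system's code: the list of mandatory fields, the rule that a telephone number contains only digits and spaces, and the in-memory registered list are all assumptions; only the operation names completecheck, correctcheck, check and reglocal come from the diagram.

```python
REQUIRED_FIELDS = ("nome", "città", "numtelefono")  # assumed mandatory fields


class Checker:
    """Completeness and correctness checks on a form given as a dict."""

    def completecheck(self, form: dict) -> list:
        """Return the mandatory fields that are missing or blank."""
        missing = []
        for name in REQUIRED_FIELDS:
            value = form.get(name)
            if value is None or not str(value).strip():
                missing.append(name)
        return missing

    def correctcheck(self, form: dict) -> list:
        """Return the fields whose value is malformed (assumed rule:
        a telephone number contains only digits and spaces)."""
        phone = str(form.get("numtelefono") or "").replace(" ", "")
        return ["numtelefono"] if phone and not phone.isdigit() else []


class Registration:
    """Registers users locally after delegating the form checks to Checker."""

    def __init__(self, checker: Checker):
        self.checker = checker
        self.registered = []  # assumed in-memory store instead of a database

    def check(self, form: dict) -> bool:
        return not self.checker.completecheck(form) and not self.checker.correctcheck(form)

    def reglocal(self, form: dict) -> bool:
        """Store the form only if it is complete and correct."""
        if not self.check(form):
            return False
        self.registered.append(form)
        return True
```

Keeping the two checks in a separate class lets BookSearch and Registration share the same validation logic through their own check() operations, which matches the way both descriptions refer to completeness and correctness.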

Class Diagram 3: Presentation Classes

LocalSystem
Operations: UserReg(), BrenchReg(), RemoteRegUser(), interfacecompose(), Localsearch(), TransactionNotification(), ForwardBookSearch()

RemoteSystem
Operations: interfacecompose(), UserRegistration(), RemoteBookSearch(), TransactionNotification()

Control classes used by the presentation classes:

FowardSearch
Operations: compos (), searchBrench()

Registration
Operations: reglocal(), regremote(), check()

BookSearch
Operations: composequery(), check(), localsearch()

MailComposer
Attributes: to, object, cc, ccn, from
Operations: send(), get(), set()

LocalSystem: Presents the functions that the Local Office operator can use, invoking the appropriate functions of the control layer:
UserReg: user registration.
BrenchReg: registration of the Local Office with the central system.
RemoteRegUser: registration at the Central Office of the users registered remotely.
InterfaceCompose: composition of the interface.
LocalSearch: search for the book in the local database.
TransactionNotification: notification to the buyer that the transaction has taken place.
ForwardBookSearch: forwarding of a book search to another Local Office.

RemoteSystem: Presents the functions available to the remote user:
InterfaceCompose: composition of the interface.
UserRegistration: user registration.
RemoteBookSearch: remote search for a book.
TransactionNotification: notification to the Local Office that the transaction has taken place.

Questionnaire QA

1. Could the Branch Operator use the user's Remote Registration Module?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

2. Does the Branch Registration require a Correctness/Completeness checking module different from that of the User Registration?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

3. The Forward Book Search use case has a dependency relationship with the Brench System actor; is this physically the same actor that has the dependency relationship with the Local Book Search use case?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

4. Could the Forward Book Search use case be a stand-alone use case instead of extending the Local Book Search use case?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

5. Does the Transaction class contain the data on the user who wants to buy the book and on the book itself?
a. Yes. b. No, and it would make no sense. c. Also on the user who sells the book.

6. Does the BrenchRegistration class contain information about the BrenchOperators?
a. Yes. b. No. c. It depends on the implementation.

7. A Vendor could be registered and yet have no Books to sell.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

8. The function of the DataHandler is to modify the format of the data.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

9. The Checker can be invoked directly by the user.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

10. There is no presentation class that builds the user interface.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

Questionnaire QB

1. Could the remote User Registration (User Remote Registration Sending) extend the local user registration (User registration Sending)?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

2. Does the remote updating of the registered user's data (User Remote Registred Updating) require the correctness and completeness check (Correctness/Completeness Checker)?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

3. Could the use cases Notification of Transaction To Buyer and Notification of Transaction To brench be merged into a single use case?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

4. Could the Update Database use case extend the Local Book Search use case?
a. Yes. b. No, and it would make no sense. c. Possibly, with suitable changes to the diagram.

5. Given a transaction, is the information about the vendor obtained through the book?
a. Yes. b. No. c. Not only.

6. A BrenchOperator must have executed at least one Transaction, otherwise it does not exist in the system.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

7. I can trace the list of the Users registered at a Local Office through the data contained in a Brench object, and vice versa.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

8. The function of the DataHandler is to query the database.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

9. The Checker verifies whether all the fields of the form are filled in.
a. TRUE. b. FALSE. c. It is not the only check it performs.

10. The user interface is provided only for the remote system.
a. TRUE. b. FALSE. c. This information cannot be inferred from the documentation.

Appendix C

This section offers an overview of the most widespread agile processes.

Extreme Programming

Extreme Programming (XP) [2] was theorized by Kent Beck after he had experienced some successes with his process. XP is not tied to an exact and detailed process definition; it is a collection of practices. The practices should be applied according to the ability and competence of the team: in many cases only a subset of the proposed practices is adopted.

Small release. The XP process proceeds as a series of iterations, each producing working software, even if incomplete. The iterations stop when the product is considered ready for delivery. Versions can be released frequently, even daily.

Planning game. Each day the requirements to be developed are assigned to the team members, who estimate the time needed to develop them.

Metaphor. Complex design concepts can be explained with a metaphor in order to foster fast understanding.

Simple design. Unnecessary complexity and extra code are removed. Simple solutions are preferred because they are easier to maintain.

Common ownership. The versions of the product are stored in a common code base. Everyone can work on any part of the code whenever it is needed.

Testing. Development follows the test-first practice. Because of common ownership and the iterative development cycle, the most important condition is that the code operates as expected.

Refactoring. This kind of process may introduce duplicated code or other defects that do not affect the quality perceived by the customer or user, but that can seriously hinder the maintainability of the system. The code is the main documentation of the system and must therefore communicate clearly.

Pair programming. Pair programming is intended as a continuous review of the code at the very moment the code is produced.

Continuous integration. Each new piece of code is integrated into the system as soon as it is produced. This may require many builds a day.

40-hour week. The team members must move forward together. Everyone has to know about the improvements to the system and the problems to be solved.

On-site customer. The customer is involved in the production of the software by providing continuous feedback on the quality of the product at each iteration.

Coding standards. Respecting standards when coding is very important in order to obtain highly readable code.

Scrum

The term scrum originally derives from a strategy in the game of rugby, where it denotes getting an out-of-play ball back into the game through teamwork. Scrum [9] does not propose any novel software development technique: it focuses on the management of the team in order to obtain system flexibility in a constantly changing environment. The unpredictability is determined by a large set of variables, such as time, resources, technology, and requirements. The main idea in Scrum is to identify the deficiencies or hurdles that make it difficult to reach the project goals. The Scrum process includes the following practices:

Product backlog. In this phase the business and technical goals to be achieved with the project are established and prioritized. The items of the backlog can include features, functions, bug fixes, defects, requested enhancements, and technology upgrades.

Effort estimation. In this phase effort is estimated through an iterative process. Each backlog item is examined in greater detail than in the previous phase.

Sprint. This phase aims at adapting the product to the changing environmental variables, such as requirements, time, resources, and technology. The product of the phase is an executable. The sprint lasts 30 days.

Sprint planning meeting. This meeting plans the products of the next sprint. It usually consists of two phases: in the first, customers, users, and management decide the goals of the next sprint; in the second, the team plans how to implement the sprint plan.

Sprint backlog. In this phase the items to be implemented in the next sprint are selected, in accordance with the plan established in the previous phases.

Daily scrum meeting. These meetings are held to keep track of the progress of the team.

Sprint review meeting. In this phase the management, the customer, the user, and the Product Owner (one of the responsibilities defined by Scrum) assess the product increment and decide on further activities. Backlog items and new directions are defined.

Crystal family of methodologies

The Crystal family of methodologies [4] includes a set of methodologies suited to different kinds of project. The appropriate methodology is chosen according to the project. Crystal also provides methods for tailoring the methodology when it does not fit the project.

Each methodology is marked with a color that indicates its heaviness and the rigor required by the corresponding project. Larger projects are assumed to require a greater coordination effort and closer adherence to a plan-driven approach. Cockburn identifies four categories for the size of the project: clear, yellow, orange, and red, where red indicates the largest projects and clear the smallest. Criticality refers to the rigor the project needs and is classified as Comfort (C), Discretionary (D), Essential money (E), and Life (L). Some aspects are common to all the methodologies: the development cycles are incremental, with a maximum length of four months; Cockburn emphasizes communication and collaboration as the key success factors; the Crystal family allows and fosters the use of other methods such as XP and Scrum; and one of the objectives is to reduce intermediate products. The Crystal family includes the following practices:

Staging. It concerns the planning of the next increment of the system, which should be scheduled so as to produce a release every three or four months.

Revision and review. Each increment includes several iterations, each consisting of several activities: construction, demonstration, and review of the goals of the increment.

Monitoring. This phase is essential for the quality of the project. It is recommended to monitor the progress and the stability of each iteration constantly.

Parallelism and flux. This phase can occur only if the monitoring phase reports positive results on the monitored progress and stability. Multiple teams can then proceed in parallel.

Holistic diversity and strategy. Crystal allows teams to be split into smaller cross-functional groups characterized by strong specialties. These groups own the know-how needed to provide effective support in a very narrow problem area.

Methodology tuning technique. Through project interviews and team workshops, this phase provides continuous suggestions for improving the development process.

User viewing. User viewings are organized three times for each increment.

Reflection workshops. Each team should hold pre- and post-reflection workshops.

Dynamic Systems Development Method

The Dynamic Systems Development Method (DSDM) [5] was developed in 1994 and has become one of the most widespread frameworks for rapid development in the UK. The main idea behind DSDM is that, instead of fixing the amount of functionality in a product and then adjusting time and resources to reach that functionality, it is preferable to fix time and resources and then adjust the amount of functionality accordingly. DSDM consists of five phases:

Feasibility study and business study. These two phases are executed only once and are sequential. During them the management decides on the applicability of DSDM. The technical possibilities and the risks related to the adoption of DSDM are analyzed. Furthermore, the business and technological aspects are studied and discussed in workshops. The customer's experts are involved in this phase.

Functional model iteration, design and build iteration. These three phases make up the iteration and are dedicated to the development of the product. Development consists of building prototypes of the subsystem at each iteration and analyzing which parts of the prototypes can be integrated into the final system, which need improvement, and which have to be discarded. Design and build are iterative.

DSDM includes the following practices:

Active user involvement. A small number of expert users has to take part in the development in order to provide timely feedback.

DSDM teams must be empowered to make decisions. A lengthy decision-making process is not tolerated: users should support the quick identification of the direction to take.

Frequent delivery. The delivery cycle should be short so that mistakes and misunderstandings can be corrected promptly.

Fitness for business purpose is the essential criterion for acceptance of deliverables. The priority is on validation rather than verification.

Iterative and incremental development is necessary to converge on an accurate business solution. Errors should be corrected early in the process, since the requirements seldom remain stable.

All changes during development are reversible. Short iterations make it possible to back out of wrong directions and repair the resulting mistakes.

Requirements are baselined at a high level. The core requirements should be frozen at a high level so that the detailed requirements can change as needed. As the development progresses, more of the requirements should be frozen as they become clear and are agreed upon.

Testing is integrated throughout the lifecycle. Testing tends to be neglected when time pressure increases. Therefore each component must be tested by the developers as it is produced. Regression testing must be emphasized because of the incremental cycle.

A collaborative and cooperative approach shared by all the stakeholders is essential. The organization must ensure a strong commitment to the DSDM process. Furthermore, the product delivered at any iteration is always a compromise among the different stakeholders involved: a common and clear agreement is needed.

Feature Driven Development

Feature Driven Development (FDD) [8] does not cover the overall development process but focuses on the design and build phases; it does not require any specific process model to be used. The main characteristics of this process are iterative development cycles, emphasis on tangible deliverables during the process, and continuous and accurate monitoring of the project. FDD consists of five sequential processes during which the system is designed and built. The implementation of a feature should take one to three weeks of work. The first stage of the process concerns the definition of requirements as use cases and functional specifications. The domain experts present the goals to be achieved in a walkthrough attended by the chief architect and the developers. A list of the features to be developed is written down; the features are grouped into areas of interest, and the value for the client is identified for each of them. The development plan is organized on the basis of the priority assigned to the groups of features. The classes identified in the previous phases are also assigned to the developers. The system is designed by groups of features; consequently, design inspection, coding, unit testing, integration, and building concern the group of features under development. FDD involves the following practices:

Domain object modeling: formalization of the problem domain in terms of classes.

Developing by features: developing a set of features, which are the basis for tracking the progress of the project.

Individual code ownership: each class is assigned to an individual developer, who is responsible for it.

Feature teams: teams are small and formed dynamically.

Inspection: mechanisms for detecting defects are used intensively.

Regular builds: ensuring that a running, demonstrable system is always available.

Configuration management: historical tracking of the different versions produced is mandatory.

Progress reporting: the project must be continuously monitored.
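The FDD planning step described above, in which features are grouped by area of interest and the plan follows the priority of each group, can be sketched in a few lines. This is an illustrative sketch only: the function name plan_by_feature_group, the tuple layout, and in particular the rule that a group's priority equals the total client value of its features are assumptions, since the text does not say how priorities are assigned.

```python
from collections import defaultdict


def plan_by_feature_group(features):
    """Order feature groups for development.

    features: iterable of (name, area, client_value) tuples.
    Returns a list of (area, [feature names]) with the highest-priority
    group first and, inside each group, the most valuable feature first.
    """
    groups = defaultdict(list)
    for name, area, value in features:
        groups[area].append((name, value))

    # Assumption: the priority of a group is the sum of its features' client value.
    ranked = sorted(groups.items(),
                    key=lambda item: sum(value for _, value in item[1]),
                    reverse=True)
    return [(area, [name for name, _ in sorted(items, key=lambda f: f[1], reverse=True)])
            for area, items in ranked]
```

Called with features such as `("search book", "catalogue", 5)` and `("notify buyer", "sales", 4)`, the function returns the areas in the order in which their feature groups would be designed and built.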

Chapter VII: Conclusion