Extracting new metrics from Version Control System for the comparison of software developers
Marcello Moura 1, Hugo Nascimento 2 and Thierson Rosa 2
Centro de Recursos Computacionais 1, Instituto de Informática 2
Universidade Federal de Goiás (UFG), Caixa Postal 131, 74.001-970, Goiânia, GO, Brazil
marcello@ufg.br, {hadn,thierson}@inf.ufg.br
Goiânia, September 21, 2014
Moura, Nascimento e Rosa Extracting new metrics from VCS... 1 / 48
Summary
1. Introduction
2. Extracting fine-grain operations from VCS
3. Metrics for the developers
4. Comparison of the developers
5. The case study
6. Conclusion
Introduction
Version Control Systems (VCSs), such as Subversion and Git, store revisions of the files of a software development project, recording its historical evolution.
VCSs have been used for:
- Helping to understand the software development process: Lopez-Fernandez et al. [2004], Huang and Liu [2005], Girba et al. [2005], Voinea and Telea [2006] and Voinea et al. [2007].
- Helping to know more about the developers: Gilbert and Karahalios [2007], Jermakovics et al. [2011], Mockus and Herbsleb [2002], Minto and Murphy [2007], Schuler and Zimmermann [2008], Zhang et al. [2008a,b] and Di Bella et al. [2013].
Our work focuses on understanding the developers through the analysis of their work.
1. We identify and count finer-grain operations, at line and file levels, that can be extracted from a VCS, such as additions, deletions and modifications. This yields much more detailed and richer information about the work performed by the developers.
2. We calculate a new set of formally defined metrics.
3. Developers are characterized by comparing each of them against the others. Two comparison approaches for this purpose are described.
Note:
- VCS data cannot be taken as a full and precise description of the software development process. It is incomplete and may lead to distinct interpretations (e.g. Negara et al. [2012]).
- Information extracted from a VCS has to be revalidated by the project managers and complemented with their own knowledge.
Extracting fine-grain operations from VCS
Basic notation:
- P: a software project in a VCS.
- D: the set of developers that worked on P.
- A: the set of all files created during the development of P.
- A_r ⊆ A: the set of files that were removed from P (i.e. that did not reach its final version).
We mine the VCS for three types of operations: additions, deletions and modifications of files and lines of code.
[Figure: Project History]
Metrics for the developers
Aspects defined for consideration:
1. Effort: the total number of operations of a given type performed by a developer.
2. Code-survival: the number of operations of a given type performed by a developer and not changed later by anyone.
A. Metrics for evaluating developers individually
\[
\mathit{Effo\_Add}(d) = \sum_{a \in A} \sum_{i=1}^{H^{a}}
\begin{cases}
1 & \text{if } o^{a,i}_{1}.\mathit{devel} = d; \\
0 & \text{otherwise.}
\end{cases}
\]
\[
\mathit{Effo\_Mod}(d) = \sum_{a \in A} \sum_{i=1}^{H^{a}} \sum_{j=1}^{hl^{a}_{i}}
\begin{cases}
1 & \text{if } o^{a,i}_{j}.\mathit{devel} = d \text{ and } o^{a,i}_{j}.\mathit{type} = \mathrm{MOD}; \\
0 & \text{otherwise.}
\end{cases}
\]
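As an illustration, the two effort sums can be computed directly from per-line operation histories. The Python sketch below assumes a hypothetical in-memory layout that is not part of the talk: `histories` maps each file a to a list of line histories, and each line history is the ordered list of operations o_1, o_2, ... on that line, with fields `devel` and `type`. File and developer names are made up.

```python
from collections import namedtuple

# Hypothetical record for one operation in a line history (an assumption).
Op = namedtuple("Op", ["devel", "type"])  # type is "ADD", "MOD" or "DEL"


def effo_add(histories, d):
    """Count line histories whose first operation was performed by d."""
    return sum(
        1
        for lines in histories.values()
        for ops in lines
        if ops and ops[0].devel == d
    )


def effo_mod(histories, d):
    """Count every MOD operation performed by d, over all lines of all files."""
    return sum(
        1
        for lines in histories.values()
        for ops in lines
        for op in ops
        if op.devel == d and op.type == "MOD"
    )


# Toy data: two files, three line histories in total.
histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD"), Op("d1", "MOD")],
        [Op("d1", "ADD")],
    ],
    "util.rb": [
        [Op("d2", "ADD"), Op("d2", "MOD"), Op("d1", "DEL")],
    ],
}

print(effo_add(histories, "d1"), effo_mod(histories, "d1"))
print(effo_add(histories, "d2"), effo_mod(histories, "d2"))
```

Both functions walk every line of every file, mirroring the nested sums over a, i and j.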
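\[
\mathit{Surv\_Add}(d) = \sum_{a \in (A \setminus A_r)} \sum_{i=1}^{H^{a}}
\begin{cases}
1 & \text{if } o^{a,i}_{1}.\mathit{devel} = d \text{ and } \forall\, o^{a,i}_{s} \text{ with } s > 1,\ (o^{a,i}_{s}.\mathit{type} = \mathrm{MOD} \text{ and } o^{a,i}_{s}.\mathit{devel} = d); \\
0 & \text{otherwise.}
\end{cases}
\]
\[
\mathit{Surv\_Mod}(d) = \sum_{a \in (A \setminus A_r)} \sum_{i=1}^{H^{a}}
\begin{cases}
1 & \text{if } o^{a,i}_{end}.\mathit{type} = \mathrm{MOD} \text{ and } o^{a,i}_{end}.\mathit{devel} = d \text{ and } \exists\, w,\ 1 \le w < hl^{a}_{i}, \text{ such that } o^{a,i}_{w}.\mathit{devel} \ne d; \\
0 & \text{otherwise.}
\end{cases}
\]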
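A minimal Python sketch of the two survival counts follows. It rests on one particular reading of the definitions, which is an assumption: an addition by d survives if every later operation on the line is a modification by d, and a modification by d survives if it is the last operation on the line and some earlier operation was made by another developer. The data layout (`histories`, `removed`, the `Op` record) and all names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one operation in a line history (an assumption).
Op = namedtuple("Op", ["devel", "type"])  # type is "ADD", "MOD" or "DEL"


def surv_add(histories, removed, d):
    """Lines first written by d and afterwards touched only by d's own MODs."""
    count = 0
    for path, lines in histories.items():
        if path in removed:  # only files in A \ A_r are considered
            continue
        for ops in lines:
            if ops and ops[0].devel == d and all(
                op.type == "MOD" and op.devel == d for op in ops[1:]
            ):
                count += 1
    return count


def surv_mod(histories, removed, d):
    """Lines whose last operation is a MOD by d, with an earlier op by someone else."""
    count = 0
    for path, lines in histories.items():
        if path in removed:
            continue
        for ops in lines:
            if not ops:
                continue
            last = ops[-1]
            if last.type == "MOD" and last.devel == d and any(
                op.devel != d for op in ops[:-1]
            ):
                count += 1
    return count


histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD")],  # d1's line was changed by d2
        [Op("d1", "ADD"), Op("d1", "MOD")],  # only self-modified
        [Op("d1", "ADD")],                   # never touched again
    ],
    "old.rb": [[Op("d1", "ADD")]],           # file later removed from the project
}
removed = {"old.rb"}

print(surv_add(histories, removed, "d1"), surv_mod(histories, removed, "d2"))
```

Under this reading, lines in removed files never count as surviving, whoever wrote them.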
\[
\mathit{Surv\_Add\_Div\_Effo\_Add}(d) = \frac{\mathit{Surv\_Add}(d)}{\mathit{Effo\_Add}(d)}
\]
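To make the ratio concrete, the Python sketch below applies it to a few (Surv_Add, Effo_Add) pairs copied from the case-study tables of this talk. The helper names are hypothetical. The tabulated ratios appear to be cut to three decimals rather than rounded; this is an observation from the numbers, not something the talk states, so the `truncated` helper is an assumption.

```python
# (Surv_Add, Effo_Add) pairs copied from the case-study tables.
COUNTS = {
    "d1": (102_817, 110_204),
    "d4": (41_929, 44_013),
    "d6": (50_630, 51_673),
    "d8": (83_409, 85_686),
}


def surv_add_div_effo_add(surv, effo):
    """Fraction of a developer's added lines that survived (0.0 if none were added)."""
    return surv / effo if effo else 0.0


def truncated(surv, effo, digits=3):
    """Same ratio cut (not rounded) to `digits` decimals, using integer arithmetic."""
    scale = 10 ** digits
    return (surv * scale // effo) / scale if effo else 0.0


for dev, (surv, effo) in COUNTS.items():
    print(dev, surv_add_div_effo_add(surv, effo), truncated(surv, effo))
```

The guard for a zero denominator is a choice made for this sketch; the talk does not say how a developer with no additions is handled.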
B. Uncovering and measuring relationships between developers
Pairs of consecutive operations considered: ADD→MOD (defined next) and, likewise, ADD→DEL, MOD→MOD and MOD→DEL.
\[
\mathit{Line\_Add\_Mod}(x, y) = \sum_{a \in A} \sum_{i=1}^{H^{a}}
\begin{cases}
1 & \text{if } hl^{a}_{i} > 1 \text{ and } o^{a,i}_{1}.\mathit{devel} = x \text{ and } o^{a,i}_{1}.\mathit{type} = \mathrm{ADD} \text{ and } o^{a,i}_{2}.\mathit{devel} = y \text{ and } o^{a,i}_{2}.\mathit{type} = \mathrm{MOD}; \\
0 & \text{otherwise.}
\end{cases}
\]
\[
\mathit{Line\_Add\_\Sigma Mod}(d) = \sum_{y \in D \setminus \{d\}} \mathit{Line\_Add\_Mod}(d, y)
\]
\[
\mathit{Line\_\Sigma Add\_Mod}(d) = \sum_{x \in D \setminus \{d\}} \mathit{Line\_Add\_Mod}(x, d)
\]
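The pairwise line metric and its two aggregated forms can be sketched in Python as follows. The data layout is a hypothetical one chosen for illustration: each line history is an ordered list of `Op` records, and the pair counts are kept in a `Counter` keyed by (x, y). Developer and file names are made up.

```python
from collections import Counter, namedtuple

# Hypothetical record for one operation in a line history (an assumption).
Op = namedtuple("Op", ["devel", "type"])  # type is "ADD", "MOD" or "DEL"


def line_add_mod(histories):
    """Map (x, y) to the number of lines added by x and then first modified by y."""
    pairs = Counter()
    for lines in histories.values():
        for ops in lines:
            if len(ops) > 1 and ops[0].type == "ADD" and ops[1].type == "MOD":
                pairs[(ops[0].devel, ops[1].devel)] += 1
    return pairs


def line_add_sigma_mod(pairs, d):
    """Lines added by d whose next operation is a MOD by another developer."""
    return sum(n for (x, y), n in pairs.items() if x == d and y != d)


def line_sigma_add_mod(pairs, d):
    """Lines added by another developer whose next operation is a MOD by d."""
    return sum(n for (x, y), n in pairs.items() if y == d and x != d)


histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD")],
        [Op("d1", "ADD"), Op("d3", "MOD")],
        [Op("d2", "ADD"), Op("d1", "MOD")],
        [Op("d1", "ADD"), Op("d1", "MOD")],  # self-modification, excluded from the sums
        [Op("d3", "ADD")],                   # no second operation
    ],
}

pairs = line_add_mod(histories)
print(line_add_sigma_mod(pairs, "d1"), line_sigma_add_mod(pairs, "d1"))
```

Keeping the full (x, y) table makes both directions of the relationship available: whose code d1 changes, and who changes d1's code.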
C. Extending the metrics for the file level
A project revision is a triple (r, d, L), where:
- r is the label of the revision,
- d is an identifier of the developer who made the revision, with d ∈ D, and
- L is a list of pairs (a, t), where a is a file and t ∈ {A, M, D} describes the operation.
A project revision sequence is a sequence S = ⟨(r_1, d_1, L_1), (r_2, d_2, L_2), ..., (r_m, d_m, L_m)⟩ of project revisions that represents the history of changes made to the files of P, without going into detail about the changes made to their individual lines.
\[
\mathit{File\_Add\_Mod}(x, y) = \sum_{a \in A}
\begin{cases}
1 & \text{if there are triples } (r_i, d_i, L_i) \text{ and } (r_j, d_j, L_j) \text{ in } S, \text{ with } i < j, \text{ such that } d_i = x,\ d_j = y, \\
  & (a, A) \in L_i \text{ and } (a, M) \in L_j, \text{ and for which there is no triple } (r_k, d_k, L_k) \\
  & \text{with } i < k < j \text{ such that } (a, t) \in L_k \text{ for any operation of type } t; \\
0 & \text{otherwise.}
\end{cases}
\]
\[
\mathit{File\_Add\_\Sigma Mod}(d) = \sum_{y \in D \setminus \{d\}} \mathit{File\_Add\_Mod}(d, y)
\]
\[
\mathit{File\_\Sigma Add\_Mod}(d) = \sum_{x \in D \setminus \{d\}} \mathit{File\_Add\_Mod}(x, d)
\]
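The file-level definition can be sketched in Python by grouping the revision sequence per file and looking at consecutive touches of the same file. This is a sketch under assumptions: revisions are plain tuples (label, developer, changes), `changes` is a list of (file, operation) pairs with operations "A", "M" or "D", and each file appears at most once per revision. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def file_add_mod(revisions):
    """Map (x, y) to the number of files added by x whose next touch is a MOD by y."""
    touches = defaultdict(list)  # file -> [(developer, operation)] in revision order
    for _label, devel, changes in revisions:
        for path, op in changes:
            touches[path].append((devel, op))
    result = Counter()
    for ops in touches.values():
        # A file contributes at most once to each (x, y) pair.
        found = {
            (x, y)
            for (x, t1), (y, t2) in zip(ops, ops[1:])
            if t1 == "A" and t2 == "M"
        }
        result.update(found)
    return result


def file_add_sigma_mod(fam, d):
    """Files added by d and next modified by another developer."""
    return sum(n for (x, y), n in fam.items() if x == d and y != d)


def file_sigma_add_mod(fam, d):
    """Files added by another developer and next modified by d."""
    return sum(n for (x, y), n in fam.items() if y == d and x != d)


revisions = [
    ("r1", "d1", [("a.rb", "A"), ("b.rb", "A")]),
    ("r2", "d2", [("a.rb", "M")]),
    ("r3", "d1", [("b.rb", "M")]),
    ("r4", "d3", [("a.rb", "M")]),
]

fam = file_add_mod(revisions)
print(fam, file_add_sigma_mod(fam, "d1"), file_sigma_add_mod(fam, "d2"))
```

Pairing each touch only with the next touch of the same file enforces the condition that no intermediate revision k operates on the file.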
D. Metrics regarding commits
\[
\mathit{Commits}(x, y) = \sum_{i=1}^{|S|-1}
\begin{cases}
1 & \text{if triples } (r_i, d_i, L_i) \text{ and } (r_{i+1}, d_{i+1}, L_{i+1}) \text{ are such that } d_i = x \text{ and } d_{i+1} = y; \\
0 & \text{otherwise.}
\end{cases}
\]
\[
\mathit{\Sigma Commits}(d) = \sum_{i=1}^{|S|}
\begin{cases}
1 & \text{if triple } (r_i, d_i, L_i) \text{ is such that } d_i = d; \\
0 & \text{otherwise.}
\end{cases}
\]
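Both commit metrics depend only on the order of the developers in the revision sequence, so a short Python sketch can work on the list of authors d_1, ..., d_m alone. The function names and the toy author list are hypothetical.

```python
from collections import Counter


def commits_pairs(authors):
    """Map (x, y) to the number of times a commit by x is directly followed by one by y."""
    return Counter(zip(authors, authors[1:]))


def sigma_commits(authors):
    """Map each developer to the total number of commits he or she made."""
    return Counter(authors)


# Authors of five consecutive revisions (toy data).
authors = ["d1", "d1", "d2", "d1", "d3"]

pairs = commits_pairs(authors)
totals = sigma_commits(authors)
print(pairs[("d1", "d2")], pairs[("d2", "d1")], totals["d1"])
```

A sequence of m revisions yields m - 1 consecutive pairs, which matches the upper limit |S| - 1 of the first sum.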
\[
\mathit{Metric\_Rel}(d) = \frac{\mathit{Metric}(d)}{\sum_{x \in D} \mathit{Metric}(x)}
\]
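As a worked example of the relative form, the Python sketch below normalizes the per-developer commit counts reported in the case-study table of this talk. The function name `metric_rel` and the zero-total guard are choices made for this sketch.

```python
# Commits per developer, copied from the case-study table.
COMMITS = {
    "d1": 474, "d2": 159, "d3": 2, "d4": 170, "d5": 30, "d6": 99,
    "d7": 61, "d8": 183, "d9": 20, "d10": 24, "d11": 72,
}


def metric_rel(values):
    """Divide each developer's value by the sum over all developers."""
    total = sum(values.values())
    if total == 0:  # guard chosen for this sketch; not discussed in the talk
        return {d: 0.0 for d in values}
    return {d: v / total for d, v in values.items()}


rel = metric_rel(COMMITS)
for dev, share in sorted(rel.items(), key=lambda item: -item[1]):
    print(dev, round(share, 3))
```

The relative values sum to 1, which makes metrics with very different magnitudes (for example commits and added lines) comparable across developers.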
Comparison of the developers
A. Performance-based hierarchy
- All metrics should have the same orientation (e.g. a higher value always meaning a better performance).
B. Similarity comparison
The case study
Evaluation of the metrics and of the comparison approaches, with qualitative assessment, on a real software-development project.
The software: Weby
- A content management system built by UFG.
- Hosts more than 400 internal web sites [1].
- Period considered: 1 year and 7 months.
- Eleven (11) developers contributed to the evolution of the source code.
- One developer was also the project manager.
- 1,294 code revisions in the VCS (Subversion) of UFG.
[1] Available at https://github.com/cercomp/weby.
D.      Commits  Files Add.  Files Mod.  Files Del.  Lines Add.  Lines Mod.  Lines Del.
d1          474         482       1,807          64     110,204       7,026      54,710
d2          159          47         453           4       4,340       1,531       1,587
d3            2           0           6           0          26          31         165
d4          170         314         585          12      44,013       1,577       1,224
d5           30          43          78           1       1,736         142         205
d6           99         333         367          17      51,673       1,548       3,220
d7           61          12         379          15       1,116         923       1,214
d8          183         848         783          29      85,686       4,688       5,289
d9           20           1          34           0         102         398          15
d10          24           8          74           5         542         196         476
d11          72           7         199           4       1,190         489         308
Total     1,294       2,095       4,765         151     300,628      18,549      68,413
The evaluation was conducted through two assessments, each involving four steps:
1. Calculation of the values of a set of metrics for all developers.
2. Computation of the hierarchy of classes and of the MDS visualization.
3. Interview with the project manager, aiming to verify whether the classes and the visualization produced by the comparison approaches match his/her perception of the developers.
4. Analysis and interpretation of the results obtained from the interview.
Interview form
Interviewee name:
Project name:
Position:
Education:
Place and date:
1. Explain the existing data and the metrics. (Explain what the developed system does.)
2. Present the classification by dominance class. (Explain the meaning of each class.)
3. Questions about the dominance classes.
   a) Does this separation make sense to you?
   b) If you were to choose one or more developers for a future project, would this classification help? Why? Which developers would you choose?
   c) Would you classify the developers in this same way? Why? If not, what would your classification be?
   d) Is there any developer who you think was classified wrongly?
4. Present the MDS visualization. (Explain what the distance between two developers means.)
5. Questions about the MDS visualization.
   e) Are the developers who appear close to each other in fact similar in their technical production? Do they produce similar results?
   f) How would you label (name, based on some similarity characteristic) the groups of people who are visibly close?
   g) Is there any discrepancy or similarity between the results of the dominance classes presented earlier and the current MDS visualization?
6. Questions about the full set of metrics.
   h) Do you agree that the higher the value obtained in each of these 4 metrics, the better the developer's performance was? Why?
   i) Which other metrics (from the complete spreadsheet) do you find interesting or useful for evaluating the developers? Why?
A. Metrics and comparisons computed in the first assessment
D.     Surv_Add  Surv_Mod  Surv_Add_Div_Effo_Add  Surv_Mod_Div_Effo_Dist_Mod
d1      102,817       539                  0.932                       0.253
d2        3,188       294                 *0.734                      *0.609
d3            0         0                  0.000                       0.000
d4       41,929       410                  0.952                       0.455
d5        1,185        21                 *0.682                      *0.437
d6       50,630       479                  0.979                      *0.807
d7          483       163                 *0.432                      *0.612
d8       83,409     1,302                  0.973                       0.632
d9           55       211                 *0.539                      *0.875
d10         225        43                 *0.415                      *0.605
d11       1,053       315                 *0.884                      *0.734
Equivalence class  Developers
1                  d1, d6, d8
2                  d4
3                  d2, d11
4                  d5, d7, d9
5                  d10
6                  d3
Equivalence class  Developers [first]  Developers [second]
1                  d1, d6, d8          d1, d6, d4, d8
2                  d4                  d2, d11
3                  d2, d11             d5, d7, d9
4                  d5, d7, d9          d10
5                  d10                 d3
6                  d3
Conclusion
- We presented new formal definitions and metrics that allow the extraction of basic but important information from projects hosted in VCSs.
- We considered measures of effort and code-survival.
- Two approaches were suggested for comparing the developers.
- A case study with a real software project was carried out.
- The results showed the usefulness of the metrics and of the comparison approaches.
- The new metrics may help to unveil interesting facts.
- However, there are limitations in the use of VCS data: the logs are in general incomplete and can lead to ambiguous interpretations.
- We tried to compensate for this weakness by involving the project manager.
Future Work
Future investigations include:
- formulating new metrics;
- using other techniques to compare the developers;
- improving the diff analysis for detecting other types of operation;
- exploring more sources of data.
Questions?
References
Enrico Di Bella, Alberto Sillitti, and Giancarlo Succi. A multivariate classification of open source developers. Information Sciences, 221(0):72-83, February 2013. ISSN 0020-0255. doi: 10.1016/j.ins.2012.09.031.
Eric Gilbert and Karrie Karahalios. Codesaw: A social visualization of distributed software development. In Proceedings of the 11th IFIP TC 13 International Conference on Human-computer Interaction - Volume Part II, INTERACT '07, pages 303-316, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-74799-0, 978-3-540-74799-4.
Tudor Girba, Adrian Kuhn, Mauricio Seeberger, and Stéphane Ducasse. How Developers Drive Software Evolution. In Proceedings of the Eighth International Workshop on Principles of Software Evolution, IWPSE '05, pages 113-122, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2349-8. doi: 10.1109/IWPSE.2005.21.
Shih-Kun Huang and Kang-min Liu. Mining version histories to verify the learning process of legitimate peripheral participants. SIGSOFT Software Engineering Notes, 30(4):1-5, May 2005. ISSN 0163-5948. doi: 10.1145/1082983.1083158.
Andrejs Jermakovics, Alberto Sillitti, and Giancarlo Succi. Mining and visualizing developer networks from version control systems. In Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE '11, pages 24-31, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0576-1. doi: 10.1145/1984642.1984647.
Luis Lopez-Fernandez, Gregorio Robles, and Jesus M. Gonzalez-Barahona. Applying Social Network Analysis to the Information in CVS Repositories. In First International Workshop on Mining Software Repositories, pages 101-105, 2004.
Shawn Minto and Gail C. Murphy. Recommending emergent teams. In Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR '07, page 5, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2950-X. doi: 10.1109/MSR.2007.27.
Audris Mockus and James D. Herbsleb. Expertise browser: A quantitative approach to identifying expertise. In Proceedings of the 24th International Conference on Software Engineering, ICSE '02, pages 503-512, New York, NY, USA, 2002. ACM. ISBN 1-58113-472-X. doi: 10.1145/581339.581401.
Stas Negara, Mohsen Vakilian, Nicholas Chen, Ralph E. Johnson, and Danny Dig. Is It Dangerous to Use Version Control Histories to Study Source Code Evolution? In James Noble, editor, ECOOP 2012 - Object-Oriented Programming, volume 7313 of Lecture Notes in Computer Science, pages 79-103. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-31056-0. doi: 10.1007/978-3-642-31057-7_5.
David Schuler and Thomas Zimmermann. Mining usage expertise from version archives. In Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR '08, pages 121-124, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-024-1. doi: 10.1145/1370750.1370779.
L. Voinea, J. Lukkien, and A. Telea. Visual Assessment of Software Evolution. Science of Computer Programming, 65(3):222-248, April 2007. ISSN 0167-6423.
Lucian Voinea and Alexandru Telea. An Open Framework for CVS Repository Querying, Analysis and Visualization. In Proceedings of the 2006 International Workshop on Mining Software Repositories, MSR '06, pages 33-39, New York, NY, USA, May 20-28 2006. ACM Press. ISBN 1595933972. doi: 10.1145/1137983.1137993.
Shen Zhang, Yongji Wang, and Junchao Xiao. Mining Individual Performance Indicators in Collaborative Development Using Software Repositories. In Software Engineering Conference, 2008. APSEC '08. 15th Asia-Pacific, pages 247-254, December 2008a. doi: 10.1109/APSEC.2008.12.
Shen Zhang, Yongji Wang, Ye Yang, and Junchao Xiao. Capability assessment of individual software development processes using software repositories and DEA. In Proceedings of the Software Process, 2008 International Conference on Making Globally Distributed Software Development a Success Story, ICSP '08, pages 147-159, Berlin, Heidelberg, 2008b. Springer-Verlag. ISBN 3-540-79587-1, 978-3-540-79587-2.