CVSscan: Visualization of Code Evolution




Lucian Voinea
Technische Universiteit Eindhoven, Wiskunde & Informatica
PO Box 53, 56 MB Eindhoven
+3-44748
lvoinea@win.tue.nl

Alex Telea
Technische Universiteit Eindhoven, Wiskunde & Informatica
PO Box 53, 56 MB Eindhoven
+3-44758
alext@win.tue.nl

Jarke J. van Wijk
Technische Universiteit Eindhoven, Wiskunde & Informatica
PO Box 53, 56 MB Eindhoven
+3-4474579
vanwijk@win.tue.nl

ABSTRACT
During the life cycle of a software system, the source code is changed many times. We study how developers can be enabled to get insight in these changes, in order to understand the status, history and structure better, as well as for instance the roles played by various contributors. We present CVSscan, an integrated multiview environment for this. Central is a line-oriented display of the changing code, where each version is represented by a column, and where the horizontal direction is used for time. Separate linked displays show various metrics, as well as the source code itself. A large variety of options is provided to visualize a number of different aspects. Informal user studies demonstrate the efficiency of this approach for real world use cases.

Categories and Subject Descriptors
D.2.2 [Software Engineering]: Design Tools and Techniques; D.2.7 [Software Engineering]: Maintenance, Enhancement; H.5.2 [User Interfaces]: Evaluation, Methodology

General Terms
Management, Documentation, Design, Experimentation.

Keywords
Software evolution, Software visualization.

1. INTRODUCTION
Since its beginning, software visualization has proved to be an efficient tool for supporting the software engineering process. The ever-increasing complexity of software systems together with the advent of lightweight development methodologies, such as extreme programming [], tends to shift development costs from early stages, such as architecture and design, towards later stages, such as maintenance. Industry surveys show that, in the last decade, maintenance and evolution exceeded 9% of the total software development costs [6], a problem referred to as the legacy crisis [4].
This challenge is addressed on two fronts. The preventive approach tries to improve the reliability of a system at design time. Many visual tools and techniques exist to improve the expressiveness of UML and visually assess design-time quality attributes [5], [9]. The corrective approach aims to facilitate the maintenance phase, and is supported by program and process understanding and fault localization tools, e.g. SeeSoft [4], Aspect Browser [8], or Tarantula []. With over 5 billion code lines in maintenance in [6], we position our work in this second area of interest.

Program and process understanding is an important aspect of software maintenance. Current industrial projects are often based on collaborative development of millions of code lines. Industry practice studies have shown that maintainers spend 5% of their time on understanding this code [7]. Many software visualization tools have been designed to help reveal the structure of software systems starting from the source code (e.g. [4], [8], [9], []). Most of these tools focus on visualizing high-level system abstractions, such as classes, modules, and packages, usually extracted from source code in a reverse engineering process. However, these tools do not show lower-level system changes, such as the many, minute source code edits done during debugging. Moreover, the focus is on a fixed system structural view that does not show all changes the code has undergone in time. Various graph drawing techniques, such as the one proposed by Collberg et al. [], tried to overcome this limitation by showing the temporal dimension of the evolution of software structures and mechanisms. However, their approach, which is still to be validated, does not seem to scale well on real-life data sets. At the other end of the granularity spectrum, the SeeSoft tool of Eick et al. [4] uses a line-based approach: source files are seen as a set of code lines, each of which is drawn as a pixel line. This allows visualizing many thousands of lines on a single screen.
Several similar techniques and tools have been proposed (Aspect Browser [8], Bee/Hive [3], sv3d [], Augur [7]). While these approaches succeed in revealing structure and change dependencies between code fragments, they only offer snapshots in time, and do not reveal changes in the global context of an entire project life span.

In this paper, we propose a new technique for visualizing the evolution of line-based software structures, semantics and attributes using space-filling displays. We use dense pixel displays to show the overall evolution, and integrate them in an orchestrated environment of correlated views to offer details-on-demand. We also introduce a novel concept, the bi-level code display, that gives a detailed, yet intuitive, view of both the contents of a code fragment and its evolution in time. We validate our approach by analyzing the evolution of files spanning
thousands of lines along tens of versions, using data from real-life, industry-size CVS repositories.

The structure of this paper is as follows. In Section 2, we briefly review line-based visualization tools for software evolution and their challenges. In Section 3, we introduce CVSscan, a tool we developed to test and validate the visualization techniques we propose. Section 4 presents results of two case studies we performed. These studies show how our approach can be successfully used to investigate the evolution of files from real-life software projects. Section 5 summarizes the novel contribution we bring to software evolution visualization and outlines future directions of research.

2. RELATED WORK
We define the challenge of line-based software-evolution visualization using the five dimensions proposed by Maletic et al. []: task, audience, target, medium, and representation.

The main task is to gain insight in the structure and operation of a software system by studying the evolution of changes in its source code organization, semantics and attributes.

The intended audience is mainly composed of developers and maintainers. These usually face software in the late stages of its development process, and need to get an understanding of it, often with no other support than the code itself. However, our audience includes other roles too: project managers can get an overview of source code producing activities, testers can identify the regression tests required at system change, new team members can get familiar with the software and set up their social network based on relevant technical issues, and eventually architects can identify subsystems needing redesign.

The target of line-based software-evolution visualization is the collection of source code files maintained by version control management (VCM) systems, such as CVS, Subversion, or Microsoft's SourceSafe. Such systems maintain an archive of all intermediate versions of files and thus give access to a line-based history of changes.
The intended medium for visualization is the standard PC graphics display used for most software development environments. Finally, the representation is formed by line-oriented, dense-pixel displays.

Line-based software visualization has been addressed in a number of tools. SeeSoft, already introduced in Section 1, is the first tool we are aware of that proposes a direct code line-to-pixel line visual mapping [4]. Color is used to show code fragments that correspond to a given modification request. The Aspect Browser [8] uses regular expressions to locate specific artifacts (e.g. key words) and then visualizes their distribution. Tarantula [] uses color and a line-oriented display to represent the degree of success with which a fragment of code passed a number of tests. Bee/Hive [3] and sv3d [] use a 3D line-based code display. The z axis shows additional attributes (Bee/Hive) or is used to pack the line-based visualization more compactly (sv3d). Augur [7], a recent effort in the area, combines within one visual frame information about both artifacts and the activities of a software project at a given moment. Finally, UNIX's gdiff and its Windows version WinDiff visualize code differences between two versions of a given file by depicting the line insertions, deletions, and modifications, as computed by the diff utility. Although efficient for comparing pairs of files, these tools cannot deal with real-life file evolutions that often have hundreds of versions.

The above tools are successful in revealing the line-based structure of software systems, and uncover change dependencies at given moments in time. However, they do not provide insight into the code attributes and structure changes made throughout an entire project duration. The approach we present here attempts to give a detailed overview of such an evolution in the context of source code maintained in a VCM system. In the next section, we detail our approach, and we introduce CVSscan, a tool we developed to validate the proposed visualization techniques.

3. METHODS AND TOOL DESCRIPTION
CVSscan is a visual tool we developed to support program and process understanding for the maintenance of large software projects. Similarly to other line-based software visualization tools, CVSscan builds on the assumption that developers are comfortable with visualizations that present the code in the same spatial context in which they construct, i.e. write, it [4]. Since software maintenance is mainly done at code level, we decided to use a line-based approach to visualize the software.

In order to understand the software, developers can benefit from additional information regarding its evolution, such as time and authors of code changes. Such information facilitates team communication in collaborative projects, and also places investigations in the context of an entire project evolution, such as discovering that problems in a specific part of the code appear after another part was changed. Such insight is easier to get when visualizing the context of an entire project evolution. In contrast, intensive debugging and runtime analysis is needed to get it from a single code snapshot. Hence, we visualize in CVSscan the evolution of source-code structure and attributes across an entire project life span. Typical questions we try to answer with this are:
- What code lines were added, removed, or altered and when?
- Who performed these modifications of the code?
- Which parts of the code are unstable?
- How are changes correlated?
- How are the development tasks distributed?
- What is the context in which a piece of code appeared?

We next detail the structure of the data we visualize (Section 3.1) and the visual mappings used to display it in our tool (Section 3.2).

3.1 Data Model
Our data comes from the CVS version control management (VCM) system. To decouple CVS from the visualization itself, data extraction is done by a separate tool: CVSgrab (Figure 1). In this way one can use our visualization tool with any VCM, once a suitable data extractor is implemented. The central element of a VCM system is a repository that stores all versions of a given file.
A repository $R$ is a set of $NF$ files:

$R = \{ F_i \mid i = 1..NF \}$

Each file $F_i$ is defined as a set of $NV_i$ versions:

$F_i = \{ V_{i,j} \mid j = 1..NV_i \}$

[Figure 1: Software-evolution visualization tool chain: version control management system (CVS), data extractor (CVSgrab), visualization (CVSscan)]

Each version is a tuple containing the unique ID of the version, the author that contributed (committed) it to the repository, the time when it was committed, and its source code:

$V_{i,j} = \langle id, author, date, code \rangle$

Our visualizations will consider the files $F_i$ separately, so we drop the file index $i$ in the following. To compare the source code $code(V_j)$ and $code(V_{j+1})$ of two consecutive versions $V_j$ and $V_{j+1}$, we use a tool like UNIX's diff, which reports the inserted and deleted lines in $V_{j+1}$ with respect to $V_j$. All lines not deleted or inserted in $V_{j+1}$ are defined as constant (not modified). Finally, lines reported to be both deleted and inserted in some version are defined as modified (edited). We denote by $l_i$ the $i$-th line of the version we talk about in some given context. Using diff, we can also find which lines in $V_{j+1}$ match constant (or modified) lines in $V_j$. For one such line, we call the complete set of matching occurrences in all versions (i.e. the transitive closure of the above match relation) a global line $\bar{l}$. For every $l_i$, $L(l_i)$ denotes the global line associated with $l_i$.

From these data, we build several functional characterizations for the source code evolution at line level. The most important is the global line position:

$G(j, l_i): \mathbb{N} \times \mathbb{N} \to \mathbb{N}$

We can explain $G(j, l_i)$ by a graph analogy. For every global line $\bar{l}$, we build a graph node $N(\bar{l})$. Nodes are created by scanning versions $V_j$ in increasing order of $j$, and lines $l_i$ in each version in increasing order of $i$. If lines $l_i$ and $l_{i+1}$ are consecutive in a given version, we set a directed arc from $N(L(l_i))$ to $N(L(l_{i+1}))$. Finally, when a node $N$ is inserted between two other nodes $N_A$ and $N_B$, we set an arc from any already existing node between $N_A$ and $N_B$ to $N$. Figure 2 shows three versions of a file and their corresponding graph.
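As a rough illustration of the comparison step described above, the following Python sketch classifies the lines of two consecutive versions as constant, modified, inserted, or deleted. It is not the implementation used by CVSscan or CVSgrab: Python's `difflib.SequenceMatcher` stands in for UNIX diff, the `Version` class and the `classify` function are hypothetical names, and pairing the lines of a replaced block one-to-one is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher
from typing import List


@dataclass
class Version:
    """One version tuple <id, author, date, code> (field names are assumptions)."""
    id: str
    author: str
    date: datetime
    code: List[str]


def classify(old, new):
    """Compare the lines of two consecutive versions.

    Returns (status, deleted): `status` maps each line index of `new` to
    (label, matching index in `old` or None); `deleted` lists the indices
    of `old` that have no counterpart in `new`.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    status, deleted = {}, []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for i, j in zip(range(i1, i2), range(j1, j2)):
                status[j] = ("constant", i)
        elif tag == "replace":
            # Assumption: pair replaced lines one-to-one as "modified";
            # any surplus on either side counts as deleted or inserted.
            paired = min(i2 - i1, j2 - j1)
            for k in range(paired):
                status[j1 + k] = ("modified", i1 + k)
            deleted.extend(range(i1 + paired, i2))
            for j in range(j1 + paired, j2):
                status[j] = ("inserted", None)
        elif tag == "delete":
            deleted.extend(range(i1, i2))
        elif tag == "insert":
            for j in range(j1, j2):
                status[j] = ("inserted", None)
    return status, deleted
```

Following the match links (`constant` and `modified` entries) from version to version gives the chains of occurrences that the text calls global lines.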
This graph is directed and acyclic, and gives a total order relation between all code lines. The node corresponding to the global line before which no other line existed during the whole project is the only one having only outgoing arcs.

[Figure 2: Global line position and corresponding graph analogy]

We label this start node (e.g. node i in Figure 2) with zero and all other nodes with the maximal path length (defined as number of arcs) to the start node, e.g. by doing a topological sort of the graph (see [3]). We obtain then, for every line $l_i$ in every version $V_j$, that $G(j, l_i) = label(N(\bar{l}))$, where $\bar{l} = L(l_i)$. This gives a unique label to all code lines written during development, keeps the partial line orders implied by the different versions in the project, and ensures that lines in different versions identified by diff as instances of the same global line have the same label.

Next, we introduce the line status $S(j, i): \mathbb{N} \times \mathbb{N} \to STATES$, which characterizes the global position $i$ in version $V_j$. $S$ is computed by comparing the current line $l_C$ at global position $i$ in version $V_j$ with the lines $l_P$ and $l_N$ having the same global position in the previous and next versions $V_{j-1}$ and $V_{j+1}$ respectively. The status can be one of the following:
- constant: $l_P$ exists in $V_{j-1}$ and is identical with $l_C$
- modified: $l_P$ exists in $V_{j-1}$ or $l_N$ exists in $V_{j+1}$, but differs from $l_C$
- deleted: $l_P$ exists in $V_{j-1}$ and $l_C$ does not exist in $V_j$, or $S(j-1, i)$ = deleted
- inserted: $l_N$ exists in $V_{j+1}$ and $l_C$ does not exist in $V_j$, or $S(j+1, i)$ = inserted
- modified by deletion: $l_C$ is modified, and $S(j, i+1)$ = deleted OR modified by deletion
- modified by insertion: $l_N$ is modified, and $S(j, i+1)$ = inserted OR modified by insertion

Further information can be extracted from the source code. CVSscan uses a fuzzy parser with a customizable grammar to
extract information such as blocks, comments, preprocessor macros, and so on. This produces the construct attribute

$C(j, l_i): \mathbb{N} \times \mathbb{N} \to Grammar$

which describes, for every line $l_i$ in every version $V_j$, the grammar construct that line belongs to. We use this information to visualize the structure of a given version (Section 3.2.1). We next present the techniques we used to map these characterizations to visual elements.

3.2 Visual Mapping
Our main focus is to allow the user to easily perform his investigations by minimizing the cognitive overhead of multiple representations for the same data. For this, CVSscan uses a single-screen display of a file's entire evolution.

3.2.1 Dimensions
Similarly to previous line-based software representations ([4], [8], [7]), we represent every line of code as a pixel line on the screen. For CVSscan, we took the decision to use a 2D representation. Our need to visualize many attributes together may first suggest using a 3D view. However, we chose 2D in order to have a simple user interface, no occlusion problems, and a visual layout perceived as simple by code developers. The main questions we next had to answer were how to lay out the line representations in a plane, and how to use color for encoding attributes.

Our layout approach is different in two main aspects from previous line-based layouts. First, we do not use indentation and line length to suggest code structure, but use a fixed-length pixel line for all code lines and color to encode structure (Figure 3).

[Figure 3: Line layout: a) SeeSoft; b) CVSscan]

Secondly, we visualize on the same screen all versions that a file has during its evolution, instead of all files in a project at a given time (Figure 4). The horizontal axis thus represents evolution in time and the vertical one the line position $l_i$. Each version is shown as a vertical stripe composed of horizontal pixel bars depicting lines of code (Figure 3). Finally, while other tools use line color to represent only one data attribute (e.g.
line age in [4], [7]), we use it to encode the author, construct, and line status attributes defined in Section 3.1 (Figure 5). Overall, our approach gives up showing the length of code lines in exchange for a space-efficient filling that shows files and their structure. This allows us to visualize more source code on the same screen. Secondly, we focus on one file at a time, in order to deliver a comprehensive view of its evolution, enabling users to make correlations between modifications in time.

[Figure 4: Use of the horizontal axis in line-based visualizations: a) files, in SeeSoft; b) time, in CVSscan. The vertical axis is the line position in the file.]

[Figure 5: Attribute color encoding: a) construct; b) line status; c) author. Legend entries: file reference, block and comment (by nesting level); constant, to be inserted, modified, deleted; authors A, B, C.]

For the vertical layout of lines within one version strip, we propose two approaches. The first one, called file-based layout, uses as y coordinate the local line position $l_i$ (Figure 6.a). This layout offers an intuitive classical view on file organization and size evolution, similar to [4]. The second approach, called line-based layout, uses as y coordinate the global line position $G(j, l_i)$ (Figure 6.b). While this preserves the order of lines of the same version, it introduces empty spaces where lines have been previously deleted or will be inserted in a future version. In this layout, each global line has a fixed y position throughout the whole visualization. This allows easy identification of code blocks that stay constant in time, or get inserted or deleted.

To show various attributes, CVSscan offers alternate color encodings of the author, construct and line status functional characterizations of a version. We use a fixed set of perceptually different colors to encode the authors (Figure 5a). For constructs (i.e. blocks, comments and references) we use a customizable color map, and modulate luminance to encode the block nesting level (Figure 5b). Finally, we use a customizable color map to indicate the status of lines in a given version (Figure 5c).
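The two layouts differ only in the y coordinate assigned to a line. The Python sketch below is a simplified illustration, not the tool's code. It assumes every line already carries a global-line identifier (as obtained from diff matching), and it derives global positions by merging the versions into one ordered list instead of labelling the line graph by maximal path length; that merge is an assumed shortcut. The names `global_positions` and `layout` are hypothetical.

```python
def global_positions(versions):
    """Assign one fixed position to every global line.

    `versions` is a list of versions in commit order; each version is the
    list of global-line identifiers it contains, in file order.
    """
    order = []  # all global lines seen so far, in display order
    for lines in versions:
        known = set(order)
        pos = 0        # search start: just after the last known line visited
        pending = []   # new lines waiting for the next known line
        for gid in lines:
            if gid in known:
                idx = order.index(gid, pos)
                # New lines go right before the next known line, hence
                # after any lines deleted earlier at the same spot.
                order[idx:idx] = pending
                pos = idx + len(pending) + 1
                pending = []
            else:
                pending.append(gid)
        order.extend(pending)  # new lines at the end of the file
    return {gid: position for position, gid in enumerate(order)}


def layout(versions, mode="line"):
    """Return (version index, y, global line) cells for either layout."""
    positions = global_positions(versions)
    cells = []
    for j, lines in enumerate(versions):
        for i, gid in enumerate(lines):
            # File-based: local line index. Line-based: global position.
            y = i if mode == "file" else positions[gid]
            cells.append((j, y, gid))
    return cells
```

With versions `[["a", "b", "c"], ["a", "d", "c"]]`, line `b` is deleted and `d` inserted in its place: the file-based layout draws `d` at y = 1, while the line-based layout keeps y = 1 empty in the second version and draws `d` at y = 2.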
At each moment, one color scheme is active, such that the user can study the time evolution of its corresponding data attribute. When interesting patterns are spotted, one can switch to another scheme to get more detailed insight in the matter.
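A minimal sketch of how one active color scheme at a time could be applied to a line is given below. All concrete colors, the dictionary-based line record, and the function names are assumptions for illustration only; the tool's color maps are customizable, and whether deeper nesting is drawn darker or lighter is not stated, so darkening is an arbitrary choice here.

```python
import colorsys

# Hypothetical color maps (RGB components in 0..1).
CONSTRUCT_COLORS = {
    "block": (0.2, 0.4, 0.9),
    "comment": (0.1, 0.6, 0.2),
    "reference": (0.9, 0.5, 0.1),
}
STATUS_COLORS = {
    "constant": (0.0, 0.8, 0.0),
    "modified": (1.0, 1.0, 0.0),
    "inserted": (0.8, 0.8, 0.8),
    "deleted": (0.8, 0.8, 0.8),
}
AUTHOR_PALETTE = [(0.6, 0.2, 0.8), (0.9, 0.3, 0.3), (0.2, 0.7, 0.9)]


def shade_by_nesting(rgb, level, step=0.1):
    """Lower the lightness of a color once per nesting level (assumed direction)."""
    hue, lightness, saturation = colorsys.rgb_to_hls(*rgb)
    lightness = max(0.0, lightness - step * level)
    return colorsys.hls_to_rgb(hue, lightness, saturation)


def line_color(line, scheme, authors):
    """Color of one line under the currently active scheme."""
    if scheme == "author":
        index = authors.index(line["author"]) % len(AUTHOR_PALETTE)
        return AUTHOR_PALETTE[index]
    if scheme == "construct":
        return shade_by_nesting(CONSTRUCT_COLORS[line["construct"]], line["nesting"])
    if scheme == "status":
        return STATUS_COLORS[line["status"]]
    raise ValueError(f"unknown color scheme: {scheme}")
```

Switching schemes then amounts to redrawing the same lines with a different `scheme` argument, which matches the idea of spotting a pattern in one encoding and inspecting it in another.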

[Figure 6: Line layout in CVSscan: a) file-based (vertical axis: local line position); b) line-based (vertical axis: global line position). The horizontal axis is discrete time (versions). Legend: constant line, lines to be inserted, new lines, deleted lines.]

Figure 7 shows the CVSscan visualization of a file evolution through 65 versions. Color encodes line status: green denotes constant, yellow modified, red modified by deletion, and light blue modified by insertion respectively. Additionally, in the line-based layout (bottom), light gray shows inserted and deleted lines. The file-based layout (top) clearly shows the file size evolution and allows spotting the stabilization phase occurring in the last third of the project. Here, the file size has a small decrease corresponding to code cleanup, followed by a relatively stable evolution corresponding to testing and debugging. Yellow fragments correspond to areas that need reworking during the debugging phase.

[Figure 7: Line status visualization. File-based (top) and line-based (bottom) layouts; the stabilization phase is marked.]

Figure 8 illustrates different color encodings on a zoom-in of the line-based layout in Figure 7 (bottom). In Figure 8.a, we use yellow to encode lines that undergo modifications when passing from one version to another, as shown in the highlight. Since the modification relation is symmetric (see Section 3.1), yellow lines always appear in pairs. Switching to the color scheme that encodes the construct attribute (Figure 8.b) enables the user to discover that the modified piece of code is in a comment, encoded by the dark green color. This means the modification does not actually alter the code functionality. Finally, the author attribute (Figure 8.c) shows the developer that performed the modification, e.g. the purple one in our highlight.

[Figure 8: Attribute encoding: a) line status; b) construct; c) author]

3.2.2 Multiple Views
A key factor in understanding the patterns revealed by evolution visualization is to correlate them with other information about the program. Besides the line-based visualization of code evolution we presented so far, CVSscan offers two additional metric views and a novel text view on selected code fragments (Figure 9).

[Figure 9: Multiple code views in CVSscan: two metric views and the code view]

The metric views encode per-version and per-global-line data and show these with horizontal and vertical color bars, respectively, to complement the evolution visualization. Different metrics are available. For example, two proposed horizontal metrics show, for each version, its number of lines or its author (Figure 10). A useful vertical metric shows the lifetime of a code line for a given global line position.

[Figure 10: Metric views: a) version size; b) version author. The horizontal axis is discrete time (versions).]

The code view offers a text look at the code. Users can select the code to be displayed by sweeping the mouse in the evolution view. Vertical brushing in the code evolution area scrolls through a version's code, whereas horizontal brushing over the line-based layout (Section 3.2.1) goes through a given line's evolution.

An important issue we address in the design of CVSscan is how to correlate the code and evolution views, when the latter uses the line-based layout. The question is what to display when the user
brushes over an empty space in the evolution view. This space corresponds to deleted or inserted line status values, i.e. the code at the mouse position was deleted in a previous version or will be inserted in a future version (see e.g. the light gray areas in Figure 7). Freezing the code display would create a sensation of scrolling disruption, as the mouse moves but the text doesn't change. Displaying code from a different version than the one specified by the mouse position would have a negative impact on the context.

We solve this problem by a new type of code display. We use two text layers to display the code around the brushed global line position both from the version under the mouse and from versions in which this position does not refer to an empty space (Figure 11).

[Figure 11: Two-layered code view: mouse position in the evolution view, layer A, layer B]

While the first layer (A) freezes when the user brushes over an empty region in the evolution view, the second layer (B) pops up, and scrolls through the code that has been deleted, or will be later inserted at the mouse location. This creates a smooth feeling of scrolling continuity during brushing. At the same time, it preserves the context of the selected version (layer A) and also gives a detailed, text-level peek at the code evolution (layer B). The three motions (mouse, layer A scroll, layer B scroll) are also shown by the numbers 1, 2, and 3 in Figure 14.

[Figure 12: Code view, layer B. The lifetimes of two consecutive lines are shown between the first and the last version: one line is deleted before the other appears (i.e. they do not coexist).]

We must now consider how to assess the code evolution shown by layer B. The problem is that lines of code located at consecutive global positions might not coexist in the same version. In other words, layer B consecutively displays code lines that may not belong to one single version. We need a way to correlate this code with the evolution view. We achieve this by showing the lines' lifetimes as dark background areas in layer B (Figure 12).
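The decision of which layer reacts while brushing can be sketched in Python as follows. This is an illustrative reading of the two-layer idea, not the tool's code: `order` (the global lines sorted by global position), the `radius` of the displayed window, the use of `None` to mean "layer left unchanged or hidden", and all function names are assumptions.

```python
def lifetimes(versions):
    """Map each global line to the (first, last) version index in which it occurs."""
    life = {}
    for j, lines in enumerate(versions):
        for gid in lines:
            first, _ = life.get(gid, (j, j))
            life[gid] = (first, j)
    return life


def brush(versions, order, j, g, radius=1):
    """Content of the two layers for the mouse at version j, global position g.

    Returns (layer_a, layer_b); None means that layer is not updated.
    """
    life = lifetimes(versions)
    low, high = max(0, g - radius), min(len(order), g + radius + 1)
    window = order[low:high]
    empty = order[g] not in versions[j]  # brushed spot has no line in version j
    if empty:
        # Layer A freezes; layer B lists nearby lines from any version,
        # each with its lifetime so that non-coexisting lines can be told apart.
        return None, [(gid, life[gid]) for gid in window]
    # Layer A scrolls through the code of the version under the mouse.
    return [gid for gid in window if gid in versions[j]], None
```

For `versions = [["a", "b", "c"], ["a", "d", "c"]]` and `order = ["a", "b", "d", "c"]`, brushing version 1 at global position 1 hits an empty spot: layer B then lists `b` with lifetime (0, 0) next to `d` with lifetime (1, 1), two lines that never coexist.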
Finally, we indicate the author of each line by colored bars near the vertical borders of the code view (Figure ). Summarizing, the code view offers a detailed look at a specific global position in a selected version, including information about its evolution and the developers that made it happen.

3.2.3 Visual Improvements
Real-life software projects contain large files of thousands of lines. The resolution of commodity graphic displays is not sufficient to fit the entire file evolution on one screen, unless several lines share the same physical screen pixels. This raises the question of how to represent code lines that share pixels such that the user gets a consistent, comprehensible and complete image of the file evolution. We address this issue in CVSscan by a position-based antialiasing algorithm. Antialiasing is used when the total number of lines to be displayed is larger than the available resolution. The algorithm computes the screen color of a number of overlapping lines by averaging their colors and weighting them according to their degree of overlap. That is, lines that fit inside one pixel location have a full weight, and lines that spread over more locations have a weight that equals the line percentage covered by the pixel location (Figure 13).

[Figure 13: CVSscan antialiasing algorithm. Lines 1 and 2 fall entirely inside the first pixel (weight 1.0 each); line 3 straddles two pixels (weight 0.5).]

An alternative would be to compute the line weight based on attribute values. While this would help emphasize lines based on their attributes, it may introduce structure inconsistencies when using different display magnification levels, so more research is needed to find out whether and how well this alternative would work.

3.3 User Interaction
In addition to the visualization techniques described in Section 3.2, CVSscan offers a wide range of interaction means to facilitate the navigation of data. We describe below, using Shneiderman's perspective [5], the repertoire of interactive exploration instruments we provide. All instruments are designed to use a point-and-click approach, making the entire exploration possible by the use of a mouse alone.
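The position-based antialiasing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CVSscan's rendering code: lines are taken to have equal height, colors are RGB triples, and the function name `antialias` is hypothetical.

```python
def antialias(colors, pixels):
    """Blend the colors of len(colors) stacked lines into `pixels` screen rows.

    Each row color is the average of the overlapping line colors, weighted
    by the fraction of each line that the row covers.
    """
    n = len(colors)
    if n == 0:
        return []
    rows = []
    for p in range(pixels):
        total = 0.0
        acc = [0.0, 0.0, 0.0]
        for i, color in enumerate(colors):
            # Work in units of 1/n pixel so that overlaps are exact integers:
            # line i spans [i*pixels, (i+1)*pixels), row p spans [p*n, (p+1)*n).
            overlap = min((i + 1) * pixels, (p + 1) * n) - max(i * pixels, p * n)
            if overlap <= 0:
                continue
            weight = overlap / pixels  # fraction of the line covered by this row
            total += weight
            for c in range(3):
                acc[c] += weight * color[c]
        rows.append(tuple(a / total for a in acc))
    return rows
```

With five lines drawn on two pixel rows, the first row receives lines 1 and 2 with full weight and line 3 with weight 0.5, so two red lines followed by a blue one blend to 80% red and 20% blue.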
A tool snapshot illustrating these mechanisms is shown in Figure 4.

As explained so far, CVSscan offers an intuitive overview of the evolution of a program file in a single 2D image, even for files whose number of lines exceeds the available screen resolution (Section 3.2.3). To get more detailed insight into a specific region of the evolution, CVSscan offers zoom and panning facilities. This enables the user to drill down to more detailed representations, in which the evolution of each line of code may be assessed. The tool also offers two preset zoom levels that act as shortcuts to the global overview (fit all code to window size) and to the one-pixel-per-code-line level.

In order to support the file evolution analysis from the perspective of one given version, CVSscan offers a filtering mechanism by means of which all lines that are not relevant are removed from the visualization, i.e. lines that will be inserted after the selected version, or lines that have been deleted before the selected version. Filtering enables the user to assess a version, selected by clicking on it, by clearly identifying its lines that are not useful and will eventually be deleted, and the lines that have been inserted into it since the beginning of the project. In other words, filtering provides a version-centric visualization of code evolution.

Additionally, the tool makes it possible to extract and select only a desired interval to study the file evolution. This mechanism is controlled by two sliders (shown in Figure 4, top) similar to the page margin selectors in word processors. By choosing the starting and finishing version, one can remove from the visualization the code that is not relevant, i.e. code deleted before the starting version, or code inserted after the finishing one. This mechanism proved to be useful in projects with a long lifetime (e.g. over 5 versions) in which one usually identifies distinct evolution phases that should be analyzed separately.

CVSscan enables the user to correlate information about the software evolution with specific details of the source code and overall statistical information. By means of metric views, users can visually obtain statistical information about lines, e.g. the lifetime of a line at a given global position, or versions, e.g. a version's author or size. The bi-level code view (Section 3..) offers details-on-demand about a code fragment: the text body, the line authors and the text evolution. The user can select the fragment of interest by simply brushing the file evolution area.

Figure 4: CVSscan tool overview. The file version and line number under the mouse (1) is shown in detail in the text views (2, 3). (labels: left interval selector, right interval selector, presets, version-centric filter, zoom controls, evolution overview, code view main layer, code view second layer)
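The version-centric filter and the interval selectors described above can be expressed as two small predicates over line lifetimes. This is a sketch under assumptions, not the tool's code: each line is assumed to carry a lifetime [born, died) in version numbers, and the names Line, version_centric_filter and interval_filter are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Line:
    """A global line position with its lifetime (hypothetical model)."""
    text: str
    born: int  # first version that contains the line
    died: int  # first version that no longer contains it


def version_centric_filter(lines: List[Line], version: int) -> List[Line]:
    """Keep only lines present in `version`: drops lines inserted after it
    and lines deleted before it."""
    return [ln for ln in lines if ln.born <= version < ln.died]


def interval_filter(lines: List[Line], start: int, end: int) -> List[Line]:
    """Keep lines that exist in at least one version of start..end
    (inclusive): drops code deleted before `start` or inserted after `end`."""
    return [ln for ln in lines if ln.died > start and ln.born <= end]


history = [Line("a", 0, 5), Line("b", 0, 2), Line("c", 3, 5), Line("d", 4, 6)]
kept_in_v2 = [ln.text for ln in version_centric_filter(history, 2)]
kept_in_2_to_3 = [ln.text for ln in interval_filter(history, 2, 3)]
```

With this model the version-centric filter is the special case of the interval filter in which the starting and finishing versions coincide.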

Although CVSscan is an exploration tool that does not alter the data it visualizes, it maintains a collection of state variables that may be externalized. This enables users to keep a history of their actions and to recover and reuse a specific visualization setting at a later time. In this direction, a simple extension that our users have suggested is an annotation facility by which developers can add their own comments, and visualize added comments, for a given version or line position.

In the following section we present the results of two informal studies that show how the interaction mechanisms presented above and the visualization techniques described in Section 3. can be successfully used to investigate the evolution of files from real-life software systems.

4. USE-CASES AND VALIDATION

The main target audience of the CVSscan tool is the maintenance community. Maintainers perform their tasks outside the primary development context of a project, and most of the time long after the initial development has ended. Therefore, the main activities a maintainer performs are related to context recovery, such as program understanding and team network building. CVSscan facilitates this process by visualizing file evolution from the perspective of different attributes and features, such as file structure, modifications, and authors.

In order to validate the visualization techniques and methods in CVSscan, we organized a number of informal studies. The aim was to record and analyze the experiences of software maintainers when they investigate completely new programs, i.e. programs in whose development they did not participate, with no other support than CVSscan itself. We present below the outcome of two such studies from the larger set we organized. In both cases, the users first participated in a 5 minutes training session. During the session, the tool's functionality was demonstrated on a particular example file. After that, each user was given a file for analysis, but no information whatsoever about its contents. A silent observer recorded both user actions and findings.
Case study 1: analysis of a Perl script file

In the first case, the user was given a script file from the FreeBSD distribution, containing 457 global line positions and spanning 65 versions. The user was familiar with scripting languages, but had no advanced knowledge about any of them. The user started CVSscan using the default file-based layout to visualize the evolution of file structure.

Figure 5: Case study 1 - Analysis of a Perl script (panels a, b, c)

The user first brushed over the green areas in the evolution view: "These are comments, right? Let's see first what they say." He started to brush from the beginning of the file, choosing first the comments that spanned the entire evolution. At the same time he read the code fragments displayed in the code view: "This is Perl. All Perl scripts have this path on the first line. This one looks like a file description. It reads that this script handles pre-commits of files." Then, while brushing over the comment fragments (Figure 5.a, top to bottom): "These are annotated textual dividers: Configurable options, Constants, Error messages, Subroutines, Main body. I use these too in my programs. Here are also some annotations."

Further on, the user also investigated the large comment fragments that did not span the whole evolution: "It looks like the implementation was either not completed or the developers left a lot of garbage. There are some code fragments over here that are commented out."

The user next selected the last version and brushed over the Subroutines area: "It looks like these lines do not belong to any block. Here is a blank line before the write_line procedure. Here a blank line before exclude_file. So there are white lines before every procedure? Yes, indeed: check_version, fix_up_file. So there are four procedures. It seems exclude_file is the most complex one as it has the highest nesting level."

At this point, the user had a high-level understanding of the file structure. He started to make inquiries about the developers that had worked on the file.
For that, he switched back and forth between the construct and author attributes using shortcut buttons: "The yellow developer, Dawes, did most of the work. However, the orange one, Robin, wrote that complex exclude_file procedure. He did that towards the end of the project, so probably that adds some extra functionality to the core. I see also that the cyan developer, Eich, did some significant work towards the end in the check_version procedure (Figure 5.b, top to bottom). It seems that his concern was to rule out files containing DOS line breaks. So this script doesn't handle DOS files?"

The user then dismissed the authors that had only small contributions and switched to the line status visualization: "Apparently a major change took place in the middle of the project. It mainly affected the check_version procedure." Then, selecting the version that followed the "modified by insertion" lines of the major change, the user started to concentrate on the areas where modifications took place: "I see a number of modifications between these two versions (Figure 5.c, top to bottom). The first one replaces a file reference with a fully qualified name; the second does the same, the third too, the fourth, the fifth. Oh, they should have kept that file name in a separate variable! Here they tuned the regular expressions. Here they replaced a constant string with a variable."

The user continued to brush all areas where modifications appeared and tried to correlate them with the code and the authors that committed them. We interrupted the experiment after 5 minutes. At the end of the exercise, the user was familiar with the overall organization of the file, the focus of each individual contributor, the places that had gone through important modifications, and what these modifications referred to.

Case study 2: analysis of a C code file

In the second case, an experienced C developer was asked to analyze a file containing the socket implementation of the X Transport service layer in the FreeBSD distribution. The file had 9 global line positions and spanned across 6 versions. We provided the user with a CVSscan version able to highlight C grammar constructs, such as #define, #ifndef, etc. (see Section 3.).

The second user started the tool in the default mode too, and first tried to look for commented fragments: "This is the copyright header, pretty standard. It says this is the implementation of the X Transport protocol, pretty heavy stuff. It seems they explain in these comments the implementation procedure." The user next switched his attention to the compiler directives: "A lot of compiler directives. Quite complex code, this is supposed to be portable on a lot of platforms. Oh, even Windows."

Next, the user started to evaluate the inserted and deleted blocks: "This file was clearly not written from scratch, most of its contents has been in there since the first version. Must be some legacy code. I see major additions done in the beginning of the project that have been removed soon after that. They tried to alter some function calls for Posix thread safe functions (Figure 6.a, top to bottom). I see major additions also towards the end of the project. A high nesting level, could be something complex. It looks like code required to support IPv6. I wonder who did that?"

Figure 6: Case study 2 - Analysis of a C code file (panels a, b, c)

The user then switched to the author visualization: "It seems the purple user, Tsi, did that" (Figure 6.b, top to bottom).
"But a large part of his code was replaced in the final version by Daniel. This guy committed a lot in the final version. And everything seems to be required to support IPv6. The green user, Eich, had some contribution too. Well, he mainly prints error messages."

Eventually, the user switched to the evolution of line status and used the predefined "Fit to line" setting to zoom in: "Indeed, most work was done at the end. Still, I see some major changes in the beginning throughout the file. Ah, they changed the memory manager. They stepped to one specific to the X environment, I assume. All memory management calls are now preceded by x (Figure 6.c, top to bottom). And here they seem to have given up the TRANS macro."

The user spent the rest of the exercise assessing the modifications and the authors that committed them. We interrupted the experiment after 5 minutes. At the end, the user did not have a very clear image of the file's evolution. However, he concluded that the file represented a piece of legacy code adapted mainly by two users to support the IPv6 network protocol. He also pointed out a major modification: the change of the memory manager.

5. CONCLUSIONS

In this paper, we present a new approach for the visualization of software evolution using line-oriented displays, and we introduce CVSscan, a tool we developed to validate the proposed techniques. The main audience we target with our work is the software maintenance community. The goal is to provide them with support for program and process understanding.

Our novel approach uses multiple correlated views on the evolution of a software project. We use dense pixel displays to show the overall evolution of code structure, semantics and attributes, and we integrate them in an orchestrated environment to offer details-on-demand. We also introduce a novel type of code text display that gives a detailed, yet intuitive, view of both the composition of a fragment of code and its evolution in time.
We also present in this paper the typical outcome of a number of user studies we did to validate our approach on data from real-life CVS repositories. Although informal, the studies show that the line-based evolution visualization of code supports a quick assessment of the important activities and artifacts produced during development, even for users that had not taken part in any way in developing the examined code. Our tool and the datasets used in the two discussed case studies are available for download at http://www.win.tue.nl/~lvoinea/soft/cvsscan_setup.exe.

So far, we only focused on the evolution of individual files. As a future direction of research, we would like to extend our approach with higher-level overviews, such as whole-project evolution visualizations, to enable evolution analyses on entire systems. Finally, our aim is to integrate CVSscan in a toolset for code visualization and analysis in order to make it effectively and efficiently available to the software development process.

6. ACKNOWLEDGEMENTS

This research was part of the ITEA project Space4U, whose aim is to define a component-based framework for the middleware layer of high-volume embedded appliances (http://www.win.tue.nl/space4u).

7. REFERENCES

[1] Beck, K., Andres, C. Extreme Programming Explained: Embrace Change (2nd Edition). Addison-Wesley, 2004.
[2] Collberg, C., Kobourov, S., Nagra, J., Pitts, J., Wampler, K. A System for Graph-Based Visualization of the Evolution of Software. Proc. ACM SoftVis '03, ACM Press, NY, USA, 2003, 77-86.
[3] Cormen, T., Leiserson, C., Rivest, R. Introduction to Algorithms, 6th edition. MIT Press, 1996.
[4] Eick, S. G., Steffen, J. L., Sumner, E. E. SeeSoft - A Tool for Visualizing Line Oriented Software Statistics. IEEE Trans. on Software Engineering, 18(11), 1992, 957-968.
[5] Eiglsperger, M., Kaufmann, M., Siebenhaller, M. A Topology-Shape-Metrics Approach for the Automatic Layout of UML Class Diagrams. In Proc. ACM SoftVis '03, ACM Press, NY, USA, 2003, 189-198.
[6] Erlikh, L. Leveraging Legacy System Dollars for E-business. (IEEE) IT Pro, May-June 2000, 17-23.
[7] Froehlich, J., Dourish, P. Unifying Artifacts and Activities in a Visual Tool for Distributed Software Development Teams. In Proc. ICSE '04, IEEE CS Press, Washington DC, USA, 2004, 387-396.
[8] Griswold, W.G., Yuan, J.J., Kato, Y. Exploiting the Map Metaphor in a Tool for Software Evolution. Proc. ICSE 2001, IEEE CS Press, Washington DC, USA, 2001, 265-274.
[9] Gutwenger, C., Junger, M., Klein, K., Kupke, J., Leipert, S., Mutzel, P. A New Approach for Visualizing UML Class Diagrams. Proc. ACM SoftVis '03, ACM Press, NY, USA, 2003, 179-188.
[10] Jones, J.A., Harrold, M.J., Stasko, J. Visualization of Test Information to Assist Fault Localization. Proc. ICSE 2002, ACM Press, NY, USA, 2002, 467-477.
[11] Maletic, J.I., Marcus, A., Collard, M.L. A Task Oriented View of Software Visualization. Proc. IEEE VISSOFT 2002, IEEE CS Press, Washington DC, USA, 32-40.
[12] Marcus, A., Feng, L., Maletic, J.I. 3D Representations for Software Visualization. In Proc. ACM SoftVis '03, ACM Press, NY, USA, 2003, 27-36.
[13] Reiss, S.P. Bee/Hive: A Software Visualization Back End. In Proc. of the Workshop on Software Visualization, ICSE 2001, 44-48.
[14] Seacord, R. C., Plakosh, D., Lewis, G. A. Modernizing Legacy Systems: Software Technologies, Engineering Process, and Business Practices. Addison-Wesley, SEI Series in Software Engineering, 2003.
[15] Shneiderman, B. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualization. Proc. IEEE Symp. on Visual Languages (VL '96), IEEE CS Press, Washington DC, USA, 1996, 336-343.
[16] Sommerville, I. Software Engineering (6th edition). Addison-Wesley.
[17] Standish, T.A. An Essay on Software Reuse. IEEE Trans. on Software Engineering, 10(5), Sep. 1984, 494-497.
[18] Storey, M.A., Best, C., Michaud, J., Rayside, D., Litoiu, M., Musen, M. SHriMP Views: an Interactive Environment for Information Visualization and Navigation. Proc. CHI 2002, ACM Press, NY, 520-521.
[19] Telea, A., Maccari, A., Riva, C. An Open Toolkit for Prototyping Reverse Engineering Visualization. In Proc. IEEE VisSym '02, The Eurographics Association, Aire-la-Ville, Switzerland, 2002, 4 5.
[20] Tilley, S.R., Wong, K., Storey, M.A.D., Muller, H.A. Rigi: A visual tool for understanding legacy systems. In International Journal of Software Engineering and Knowledge Engineering, December 1994.