Program Understanding with Code Visualization

Arif Iftikhar
Department of Computer Science
National University of Computer and Emerging Sciences
852-B Faisal Town, Lahore, Pakistan
l060802@lhr.nu.edu.pk

Abstract - Software is invisible, disappearing into files on disks. This invisibility lowers programmer productivity by hiding system complexity, particularly in large team-oriented projects. The data about the components of a software system comprise a large amount of information, such as version history, number of lines, defect density, and complexity measures. The ability to quickly grasp a comprehensive view of the evolution and dependencies of this information is key to making informed decisions about the future development of the system. Code visualization provides techniques and tools for understanding a software system and its complexity, and it also helps to identify design defects that may be present in the software. In this paper, we present a study we conducted to understand different techniques that aim to ease the task of code understanding. We apply these techniques to a case study system and compare the visualization techniques studied.

Keywords: Object-oriented programming, software visualization, reverse engineering, visual patterns.

I. INTRODUCTION

The ever-increasing complexity of software systems, together with the advent of lightweight development methodologies such as extreme programming [8], tends to shift development costs from early stages, such as architecture and design, towards later stages. Today's industrial software systems are tremendously complex, with sizes counted in millions of lines of code. One way to cope with complexity is to represent information hierarchically at several levels of abstraction. Fortunately, software systems are often structured hierarchically as systems, subsystems, and components located in packages and/or directories.
However, this hierarchy says little about the complexity of the contained elements. Visual representation is an appealing way to present a lot of information simultaneously. Information visualization is a recent and emerging field that adapts graphics techniques to visualize abstract entities that have no concrete shape. It has proven to be an effective way to understand and uncover information embedded in large datasets. Many software visualization tools have been designed to help reveal the structure of software systems by extracting information from their source code.

In this paper, we present a study we conducted to understand different techniques that aim to ease the task of understanding object-oriented code. These techniques extract information about classes, methods, attributes, and inheritance links from the code and represent it as graphical entities. We studied these techniques on object-oriented systems and compared their effectiveness for code understanding.

The paper is organized as follows. Section II presents the techniques used for code visualization. Section III presents a case study visualized with the techniques of Section II and compares the techniques used in it. Section IV concludes and outlines future work.

II. VISUALIZATION TECHNIQUES

A. Software Metrics: Merging Boxes & Classes

Software metrics are commonly used by researchers to analyze code, and reviewing and understanding code are known to be difficult tasks. One technique visualizes object-oriented metrics [7] by mapping them onto 3D boxes [1], with the aim of showing the cohesion, coupling, and number of methods of each class. A box is used because its visual characteristics, such as its height, are easy to perceive. Metrics are mapped to the color, twist, and size of the box. In one study, three metrics are considered [7]: Coupling Between Objects (CBO), Lack of Cohesion in Methods (LCOM5), and Weighted Methods per Class (WMC). Size complexity is measured by WMC and is mapped to the size of the box. Coupling is measured by CBO and is mapped to the color of the box, from blue to red, with red indicating high coupling. Finally, cohesion is depicted by the twist of the box: more twist indicates a greater lack of cohesion.

Figure 1(a) shows a box representing a class that is loosely coupled, highly cohesive, and has a low WMC. Figure 1(b) shows a class that is somewhat more coupled, less cohesive, and has a higher WMC than the class in Figure 1(a). Figure 1(c) depicts an extreme case: a class that is highly coupled, completely lacks cohesion, and has a high WMC.

Figure 1(a)   Figure 1(b)   Figure 1(c)

B. Evolutionary Software Visualization: The Evolution Matrix

This technique shows the historical evolution of the software, which can help considerably in the maintenance and support of large software systems. Many researchers have studied how to obtain knowledge about the evolution of software systems. The Evolution Matrix supports code understanding by depicting the evolution of the classes within an object-oriented software system as well as the evolution of the system itself [2]. It also reveals specific situations that occur during system evolution, such as pulsating classes that grow and shrink during the lifetime of the system. With the Evolution Matrix view we want to achieve the following evolutionary reverse engineering goals: (1) understand the evolution of object-oriented software systems in terms of size and growth rate; (2) understand at which point in time classes were introduced into a system and at which point they were removed; and (3) see whether there are patterns in the evolution of classes. Such patterns help to understand the condition of a class from a time perspective: for example, how resistant a class is to software evolution, whether it changes with every release of the system, or whether some classes are virtually immune to software evolution.

Figure 2(a): System-level evolution aspects using the Evolution Matrix.
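
To make the three goals and the row/column layout of the Evolution Matrix concrete, the following Python sketch builds the matrix from a list of versions and labels each row with a simple pattern. It is a minimal illustration under stated assumptions, not the implementation used in Moose: each version is a hypothetical dictionary from class name to a size metric (here, number of methods), classes appearing for the first time are appended below the existing rows in alphabetical order, and the Persistent, Dayfly, and Pulsar thresholds are our own illustrative choices.

```python
from typing import Dict, List, Optional, Tuple

# One column per version: a hypothetical map from class name to a size
# metric (e.g. number of methods). None in a row means "class absent".
Version = Dict[str, int]
Row = List[Optional[int]]


def evolution_matrix(versions: List[Version]) -> Tuple[List[str], List[Row]]:
    """Return (row labels, rows); rows are classes, columns are versions.

    Classes are matched across versions by name. A class keeps its row once
    it has appeared; classes that appear for the first time are appended
    below the existing rows in alphabetical order (a layout assumption).
    """
    order: List[str] = []
    for version in versions:
        order.extend(sorted(name for name in version if name not in order))
    rows = [[version.get(name) for version in versions] for name in order]
    return order, rows


def classify(row: Row) -> str:
    """Label one class row; the thresholds are illustrative assumptions."""
    sizes = [size for size in row if size is not None]
    # Count direction changes in size (grow -> shrink or shrink -> grow).
    deltas = [b - a for a, b in zip(sizes, sizes[1:]) if b != a]
    turns = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if (d1 > 0) != (d2 > 0))
    if turns >= 2:
        return "Pulsar"  # repeatedly grows and shrinks
    if len(sizes) == len(row):
        return "Persistent"  # present in every version
    if len(sizes) <= 2:
        return "Dayfly"  # lives for at most two versions
    return "Other"


if __name__ == "__main__":
    versions = [
        {"Parser": 5, "Scanner": 3},
        {"Parser": 9, "Scanner": 3, "Cache": 2},
        {"Parser": 4, "Scanner": 4},
        {"Parser": 8, "Scanner": 4, "Report": 1},
    ]
    names, rows = evolution_matrix(versions)
    print("system size per version:", [len(v) for v in versions])
    for name, row in zip(names, rows):
        print(f"{name:8} {row} {classify(row)}")
```

In this toy history, Parser grows and shrinks repeatedly and is labeled a Pulsar, Scanner survives every version unchanged in shape and is Persistent, and Cache and Report each live for a single version and are Dayflies.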

Figure 2(a) shows an Evolution Matrix, which displays the evolution of the classes of a software system. It has the following properties. Each column of the matrix represents a version of the software. Each row represents the different versions of the same class; two classes in two different versions are considered the same if they have the same name. Within a column, classes that appear for the first time in the system are sorted alphabetically; otherwise, they are placed at the same vertical position as their predecessors. This ordering is important because it represents the continuous development of existing classes and emphasizes the development of new ones.

Figure 2(b) shows the Evolution Matrix of MooseFinder. The first version, on the left, has a small number of classes, and only a few of those survive until the last version, i.e., they are Persistent classes. We can also see two major leaps and one long phase of stabilization. Note that the second leap is in fact a case of massive class renaming: many classes were removed in the previous version and appear as added classes in the next version. There is also a version with a few Dayfly classes. The classes themselves rarely change in size, except for the class annotated as a renamed Pulsar class, which at first sight seems to be one of the central classes in the system.

C. Coarse Grained AgeMap

When visualizing the code of a software system, it is helpful to know which portions evolved later and which were part of the original implementation. The coarse-grained AgeMap is aimed at getting started with the evolution analysis of the code [5]. Its purpose is to give an overall feel for how the system evolved and how it changed over time. Figure 3 shows the coarse-grained AgeMap for ArgoUML. Newly born classes are shown in yellow, whereas older ones are shown in dark blue. So, at a glance, we get an idea of how and when the system evolved. The main shortcoming of the AgeMap is that it shows only a single, current snapshot. To fully comprehend the evolution process, one needs to travel in time as the system evolved, which requires many intermediate AgeMap visualizations rather than only one.

D. System Complexity View

Class hierarchy diagrams are a preferred way of getting a quick overview of how a system is designed. This view not only shows the hierarchy but also shows the attributes, methods, and lines of code of each class, which further supports the analysis of large software systems. It is therefore named the System Complexity View, as it provides insight into the complexity of the system. It is a polymetric view that shows the classes of the system organized in inheritance hierarchies [6]. Each class is represented by a node whose width is the number of attributes of the class, whose height is the number of methods, and whose color is the number of lines of code. Figure 4 shows a System Complexity View. Darker nodes have more lines of code, whereas lighter nodes have fewer. Similarly, taller nodes have more methods than shorter ones, and wider nodes have more attributes.

E. Distribution Map

The Distribution Map is an eclectic approach that combines an intuitive visualization and straightforward metrics into a generic and easy-to-use tool [4]. This combination makes the Distribution Map an effective starting point for reasoning about the results of automated algorithms, and as such it should belong to any reverse engineering toolkit. Figure 5 shows a Distribution Map in which each color depicts a different kind of class, and Table 1 lists the characteristic that each color represents. For example, the orange squares in the lower right corner are classes that share the same property.
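
A Distribution Map needs only two facts per element: the part that contains it (for example, its package) and the property that colors it. The Python sketch below groups classes accordingly, renders each part as a row of one-letter color codes with equal colors kept together, and computes spread and focus, which is our reading of the straightforward metrics proposed with the Distribution Map [4]: spread counts the parts a property touches, and focus approaches 1 when a property coincides with a single part. The package names, property names, and the text rendering (standing in for the colored rectangles of a real map) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical input: class name -> (containing part, property/color).
Elements = Dict[str, Tuple[str, str]]


def group(elements: Elements) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (part -> classes, property -> classes)."""
    parts: Dict[str, Set[str]] = defaultdict(set)
    props: Dict[str, Set[str]] = defaultdict(set)
    for name, (part, prop) in elements.items():
        parts[part].add(name)
        props[prop].add(name)
    return dict(parts), dict(props)


def render(elements: Elements) -> Dict[str, str]:
    """One string per part: first letter of each class's property, same colors adjacent."""
    parts, _ = group(elements)
    return {
        part: "".join(
            elements[name][1][0].upper()
            for name in sorted(members, key=lambda n: (elements[n][1], n))
        )
        for part, members in parts.items()
    }


def spread(prop: Set[str], parts: Dict[str, Set[str]]) -> int:
    """Number of parts containing at least one element with the property."""
    return sum(1 for members in parts.values() if members & prop)


def focus(prop: Set[str], parts: Dict[str, Set[str]]) -> float:
    """Sum over parts p of |p & q|/|p| * |p & q|/|q|; 1.0 when q equals one part."""
    return sum(
        (len(members & prop) / len(members)) * (len(members & prop) / len(prop))
        for members in parts.values()
    )


if __name__ == "__main__":
    elements = {
        "Button": ("ui", "red"), "Menu": ("ui", "red"), "Canvas": ("ui", "blue"),
        "Shape": ("model", "blue"), "Line": ("model", "blue"),
    }
    parts, props = group(elements)
    print(render(elements))
    for prop, members in props.items():
        print(prop, spread(members, parts), round(focus(members, parts), 3))
```

Here the "red" property sits entirely inside the ui package (spread 1), while "blue" touches both packages (spread 2) but is still concentrated mostly in model.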

Figure 2(b): The Evolution Matrix of MooseFinder
Figure 3: Coarse Grained AgeMap for ArgoUML
Figure 4: System Complexity Diagram
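
The node mapping of the System Complexity View (Section II.D) can be sketched in a few lines of Python. This is a minimal illustration, not the Moose implementation: the ClassInfo record, the scale of 4 units per attribute or method, the minimum size of one unit, the linear grey scale (black for the class with the most lines of code), and the class names in the example are all our assumptions. The hierarchy is printed as an indented text tree instead of being drawn.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ClassInfo:
    name: str
    parent: Optional[str]  # superclass name, None for a root
    attributes: int
    methods: int
    loc: int


def node_shape(c: ClassInfo, max_loc: int, scale: int = 4) -> Tuple[int, int, int]:
    """(width, height, grey): width ~ attributes, height ~ methods, darker = more LOC."""
    width = max(1, c.attributes) * scale  # keep classes without attributes visible
    height = max(1, c.methods) * scale
    grey = 255 - round(255 * c.loc / max_loc) if max_loc else 255
    return width, height, grey


def render_tree(classes: List[ClassInfo]) -> List[str]:
    """Indented inheritance tree, one line per node."""
    max_loc = max((c.loc for c in classes), default=0)
    names = {c.name for c in classes}
    children: Dict[Optional[str], List[ClassInfo]] = defaultdict(list)
    for c in classes:
        # Superclasses outside the analysed set are treated as roots.
        children[c.parent if c.parent in names else None].append(c)
    lines: List[str] = []

    def visit(c: ClassInfo, depth: int) -> None:
        w, h, g = node_shape(c, max_loc)
        lines.append(f"{'  ' * depth}{c.name} w={w} h={h} grey={g}")
        for child in sorted(children[c.name], key=lambda k: k.name):
            visit(child, depth + 1)

    for root in sorted(children[None], key=lambda k: k.name):
        visit(root, 0)
    return lines


if __name__ == "__main__":
    classes = [
        ClassInfo("Figure", None, 2, 10, 400),
        ClassInfo("RectFigure", "Figure", 1, 4, 100),
        ClassInfo("TextFigure", "Figure", 3, 12, 200),
    ]
    print("\n".join(render_tree(classes)))
```

Normalizing the grey level by the largest class keeps the darkest node meaningful within one system, at the cost of making colors incomparable across systems.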

Figure 5: Distribution Map
Table 1: Color coding for the Distribution Map
Figure 6: Fine Grained AgeMap
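
Both AgeMaps reduce to coloring an element by the version in which it was born: classes for the coarse-grained AgeMap of Section II.C, and methods for the fine-grained AgeMap of Section II.F. The Python sketch below interpolates linearly from dark blue (oldest version) to yellow (current version) and, for the fine-grained view, measures the share of a class's methods born in the current version. The RGB values, the linear interpolation, the version numbering, and the example class and method names are our assumptions, not those of the original tool.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]
DARK_BLUE: RGB = (0, 0, 139)  # oldest elements (assumed color value)
YELLOW: RGB = (255, 255, 0)   # newly born elements (assumed color value)


def age_color(born: int, current: int, first: int = 0) -> RGB:
    """Coarse-grained AgeMap: blend from dark blue (first) to yellow (current)."""
    span = current - first
    t = 1.0 if span == 0 else (born - first) / span  # 0.0 = oldest, 1.0 = newest
    r, g, b = (round(old + (new - old) * t) for old, new in zip(DARK_BLUE, YELLOW))
    return (r, g, b)


def new_method_share(method_birth: Dict[str, int], current: int) -> float:
    """Fine-grained AgeMap: fraction of a class's methods born in `current`."""
    if not method_birth:
        return 0.0
    born_now = sum(1 for version in method_birth.values() if version == current)
    return born_now / len(method_birth)


if __name__ == "__main__":
    # Hypothetical class births (versions 0..3) and one class's method births.
    class_birth = {"Editor": 0, "Toolbar": 2, "Exporter": 3}
    for name, born in class_birth.items():
        print(name, age_color(born, current=3))
    print(new_method_share({"draw": 1, "move": 1, "resize": 3, "rotate": 3}, current=3))
```

A class with a high share of newly born methods is one that keeps being altered, which is the signal the fine-grained AgeMap is meant to expose.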

F. Fine Grained AgeMap

This is a finer, more in-depth view of the system. It addresses the problem of viewing the evolution of classes: how they grew in size, methods, and attributes over time. This technique highlights classes that have been extended or changed over the years, as opposed to classes created in one shot. The rationale is that a class created in one shot tends to be more robust and loosely coupled than one that had to be altered over time: if a class is highly coupled, changes to the classes it depends on are likely to require changes in the class itself. In Figure 6, newly born methods are shown in yellow, so classes with yellow spots are likely to be the highly coupled ones, while those that are entirely blue are likely to be loosely coupled. Because incremented parts of classes are also shown in yellow, the view also reveals which classes have been altered most over time.

G. Class Blueprint

a) Description: The Class Blueprint is a very effective way of segmenting the methods of a class. It gives information about the role each method plays in the class, distinguishing core-functionality methods, interface methods, and methods that only access the variables of the class. One of the main aims in supporting legacy systems is to get to the core of a class, which is what this approach addresses. It is a visual way of understanding classes. The blueprint is structured in layers that group methods and attributes. The nodes representing methods and attributes are colored according to their semantic role, and node sizes correspond to source code metrics. Figure 7(a) shows the layered structure. Initialization layer methods create and initialize objects. Interface layer methods interact with other classes. Implementation layer methods contain the core functionality of the class. The accessor layer contains accessor methods, i.e., getters and setters for attributes. Finally, the attribute layer contains the attributes.
b) Class Blueprint Patterns

Class blueprints vary with respect to size, layer distribution, and other factors. For our study we focused on size and distribution. Figure 7(b) shows a class with only one method. Such classes are difficult to find, and they often represent dead code. Figure 7(c), on the other hand, represents a large class: its implementation layer contains many nodes and probably has several sub-layers.

Figure 7(a): Layered Structure of Class Blueprint
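
The layer assignment of the Class Blueprint can be approximated from a class's internal call graph. The following Python sketch is one possible heuristic, not the algorithm of the original tool: constructors go to the initialization layer, get/set/is methods named after an attribute go to the accessor layer, methods called by other methods of the same class go to the implementation layer, and the remaining methods, which are entry points from outside, go to the interface layer. The Group class in the example and its call graph are hypothetical.

```python
from typing import Dict, List, Set

LAYERS = ("Initialization", "Interface", "Implementation", "Accessor", "Attribute")


def is_accessor(method: str, attributes: Set[str]) -> bool:
    """True for getX/setX/isX where x is an attribute name (naming heuristic)."""
    for prefix in ("get", "set", "is"):
        rest = method[len(prefix):]
        if method.startswith(prefix) and rest:
            if rest[0].lower() + rest[1:] in attributes:
                return True
    return False


def blueprint(
    calls: Dict[str, Set[str]],  # method -> methods of the same class it calls
    attributes: Set[str],
    constructors: Set[str],
) -> Dict[str, List[str]]:
    called_inside = set().union(*calls.values())
    layers: Dict[str, List[str]] = {layer: [] for layer in LAYERS}
    for method in sorted(calls):
        if method in constructors:
            layer = "Initialization"
        elif is_accessor(method, attributes):
            layer = "Accessor"
        elif method in called_inside:
            layer = "Implementation"  # called by other methods of this class
        else:
            layer = "Interface"  # entry point: no other method here calls it
        layers[layer].append(method)
    layers["Attribute"] = sorted(attributes)
    return layers


if __name__ == "__main__":
    calls = {
        "Group": set(),
        "add": {"basicAdd", "invalidate"},
        "basicAdd": set(),
        "invalidate": set(),
        "draw": {"getChildren"},
        "getChildren": set(),
    }
    for layer, members in blueprint(calls, {"children"}, {"Group"}).items():
        print(f"{layer:15} {members}")
```

Checking accessors before internal calls keeps getters in their own layer even when other methods use them, mirroring the separation of accessors from implementation methods described above.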

Figure 7(b): Class with One Method
Figure 7(c): Large Implementation

III. CASE STUDY

A. Description

JHotDraw 7 is a Java framework for structured drawing editors and document-oriented applications. The framework for structured drawing editors can be used to build drawing editors for sketches, diagrams, and artistic drawings. Drawings can be animated and interactive. A drawing can be backed by a data model, which allows a structured drawing editor to be used as a user interface for a data model.

B. System Complexity View

We generated a System Complexity View for the draw module of JHotDraw 7, shown in Figure 8(a). Five classes have more lines of code, as shown by their dark grey color, and these are also the ones with the most methods. The inheritance hierarchy is also visible in the diagram. There are also many small nodes, which represent short classes; they are the leaf nodes of the tree, which suggests that they are loosely coupled.

Figure 8(a): System Complexity View
Figure 8(b): 3D Box View
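
The 3D Box view of JHotDraw 7 discussed next relies on the metric boxes of Section II.A, so the following Python sketch shows one way to compute LCOM5 and map WMC, CBO, and LCOM5 onto a box. LCOM5 is computed with the Henderson-Sellers formula, LCOM5 = (m - (1/a) * sum of mu(A_j)) / (m - 1), where m is the number of methods, a the number of attributes, and mu(A_j) the number of methods that access attribute A_j; it is 0 when every method uses every attribute and 1 when each attribute is used by exactly one method. The normalization by the largest WMC and CBO in the system, the straight RGB blend from blue to red, the 90-degree maximum twist, and the example class are our assumptions, not VERSO's actual settings.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Box(NamedTuple):
    size: float                  # WMC, normalized to [0, 1]
    color: Tuple[int, int, int]  # CBO: blue (low) to red (high)
    twist_degrees: float         # LCOM5: more twist = less cohesion


def lcom5(accesses: Dict[str, Set[str]], attributes: Set[str]) -> float:
    """Henderson-Sellers LCOM5 from method -> accessed-attribute sets."""
    m, a = len(accesses), len(attributes)
    if m < 2 or a == 0:
        return 0.0  # undefined for these cases; treated as cohesive (assumption)
    mean_mu = sum(
        sum(1 for used in accesses.values() if attr in used) for attr in attributes
    ) / a
    return (m - mean_mu) / (m - 1)


def to_box(wmc: float, cbo: float, lcom: float,
           max_wmc: float, max_cbo: float, max_twist: float = 90.0) -> Box:
    size = wmc / max_wmc if max_wmc else 0.0
    t = min(cbo / max_cbo, 1.0) if max_cbo else 0.0
    color = (round(255 * t), 0, round(255 * (1 - t)))
    twist = min(max(lcom, 0.0), 1.0) * max_twist  # LCOM5 can exceed 1; clamp
    return Box(size, color, twist)


if __name__ == "__main__":
    # Hypothetical class: two methods share w and h, one method uses only name.
    accesses = {"area": {"w", "h"}, "scale": {"w", "h"}, "label": {"name"}}
    cohesion = lcom5(accesses, {"w", "h", "name"})
    print(round(cohesion, 3), to_box(12, 3, cohesion, max_wmc=24, max_cbo=6))
```

With a straight RGB blend, a class of moderate coupling comes out purple (here (128, 0, 128)), between the blue of loosely coupled classes and the red of highly coupled ones.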

C. Merging Boxes & Classes

The second view for JHotDraw 7 was generated with the VERSO tool and is shown in Figure 8(b). It is the box view based on the three metrics WMC, CBO, and LCOM5. All of the modules are twisted, which shows that they lack cohesion. On the positive side, most boxes are blue, which shows that most of the application is loosely coupled. The two purple boxes are more tightly coupled, and their size shows that these are the classes with the most methods. They are also twisted, which indicates that they lack cohesion as well.

D. Code City

Figure 8(c) shows a code city for JHotDraw 7. Classes are represented by orange blocks at the bottom of yellow pillars, which represent the methods of each class. We colored methods yellow because we want to see which classes have more methods. Packages are colored purple.

Figure 8(c): Fine Grained Code City for JHotDraw

E. Class Blueprint Visualization

Figure 8(d) shows the class blueprint of the AbstractCompositeFigure class in JHotDraw 7. We can see four layers: from the left, interface methods, implementation methods, accessors, and the attributes of the class. The class has a long list of interface methods. It appears to be the parent of several classes, and its children presumably implement these interface methods. Cyan links connect interface methods and attributes, whereas dark blue links show calls from interface methods to accessor and implementation methods.

Figure 8(d): Class Blueprint of AbstractCompositeFigure

F. Comparison

a) Lack of Cohesion, which is evident in the 3D Box View, is not evident in the System Complexity View. The System Complexity View does show which classes have more lines of code, but that does not always mean that they are less cohesive. Cohesion is also not visible in the Code City view, which is a major shortcoming given that it is a fine-grained visualization. The Class Blueprint cannot effectively show lack of cohesion either, because it focuses only on the calls inside the class. If we know upfront which classes lack cohesion, we know that these classes may require change and attention.

b) Inheritance Hierarchy, which is presented by the System Complexity View, is not evident in the 3D Box View, so the latter lacks the inheritance structure. Inheritance is an important aspect that should be shown in every visualization. Code City does not provide a view of inheritance either, and neither does the Class Blueprint. Inheritance is another aspect that is fundamental when refactoring code: if we know the inheritance hierarchies, we know that a change in a parent class may affect its children.

c) Number of Methods is a measure present in all of the views. The Class Blueprint is particularly helpful here, because it shows not only the number of methods but also how they interact within the system. In our experience, the number of methods alone does not help much in refactoring code or understanding the software: a utility class may have many methods, yet they do not help in understanding the overall flow of a use case.

d) Coupling is present in both the System Complexity View and the 3D Box View; however, it is not very evident in the System Complexity View, whereas the 3D Box View gives a straightforward view of it. Code City does not provide any information on coupling. Knowing that a class is highly coupled tells us to be very careful when altering it, because it is likely to be referenced in many places.

e) System Flow is covered only by the Class Blueprint view. It helps considerably in understanding and refactoring code.

f) Design Patterns are not addressed by any of the views. For instance, if an application is designed using the MVC pattern, none of the views discussed so far presents it in terms of that pattern. Developing a tool that addresses this is future work.
g) Scattered Code is not pointed out by any of the approaches. Sometimes a core class contains functionality that is not related to it. For example, consider a class implementing a grading functionality that requires basic computations such as rounding off results. Implementing the rounding together with the main algorithm is poor design; such rounding should be handled in utility classes.

h) Method Segmentation of a class really helps to comprehend its functionality. If we can easily see which methods implement the core functionality, we are halfway to comprehending the code. The Class Blueprint provides an excellent visualization in this case.

IV. CONCLUSION & FUTURE WORK

In this paper, we have presented the most common code visualization techniques used in the past. We have also studied three different tools, namely Moose, Code City, and Metric Visualization. The study also covers MSE file generation for Moose. Many other views can be generated by further scripting in Moose; one direction for future work is to gain more insight into Moose scripting and generate other valuable views. The Metric Visualization tool effectively displays metrics graphically as 3D boxes. The features of each box explicitly depict the metric values, giving an idea of the code at a glance. However, much more remains to be done in this direction.

V. ACKNOWLEDGMENT

We would like to express our gratitude to Dr. Usman Bhatti, National University of Computer and Emerging Sciences, Lahore, Pakistan, for his valuable insight and ideas, for his guidance in every phase of our study, and for helping us whenever we were stuck on problems.

REFERENCES

[1] Langelier, G., Sahraoui, H., and Poulin, P. Visualization-based analysis of quality for large-scale software systems. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. ACM, New York, NY, 214-223.
[2] Lanza, M. Object-Oriented Reverse Engineering. University of Bern, Switzerland, 2003.
[3] Bergel, A. Mondrian Manual: Tutorial for Moose.
[4] Ducasse, S., Girba, T., and Kuhn, A. Distribution Map. In Proceedings of the 22nd IEEE International Conference on Software Maintenance (ICSM) (September 24-27, 2006). IEEE Computer Society, Washington, DC, 203-212.
[5] Wettel, R. and Lanza, M. CodeCity: 3D visualization of large-scale software. In Companion of the 30th International Conference on Software Engineering (ICSE Companion '08). ACM, New York, NY, 921-922.
[6] Lanza, M. and Ducasse, S. Polymetric Views: A Lightweight Visual Approach to Reverse Engineering. IEEE Trans. Softw. Eng. 29, 9 (Sep. 2003), 782-795.
[7] Fenton, N. E. and Neil, M. Software metrics: roadmap. In Proceedings of the Conference on the Future of Software Engineering (Limerick, Ireland, June 04-11, 2000). ICSE '00. ACM, New York, NY, 357-370.
[8] Beck, K. and Andres, C. Extreme Programming Explained: Embrace Change, Second Edition. Addison-Wesley.