An Open Framework for Reverse Engineering Graph Data Visualization

Alexandru C. Telea
Eindhoven University of Technology, The Netherlands

Overview
- Reverse engineering (RE) overview
- Limitations of current RE tools
- New RE / graph viz tool architecture
- Applications and evaluation
- Comparison
- Conclusions

Reverse Engineering

Goal: program understanding
- identify software artifacts and relationships
- understand structure and semantics
- pattern matching at different abstraction levels
- top-down (what -> how) and/or bottom-up (how -> what)

Information sources:
- source code (the thing itself)
- documentation
- information maintained by code development and code analysis tools

Reverse Engineering

Reverse engineering is the part of program understanding that focuses on:
- extraction of low-level information
- presentation at the right abstraction level

Reverse Engineering Overview

[Figure: the five RE tasks placed along an automation level axis and an abstraction level axis]
1. Program analysis
2. Plan recognition
3. Concept assignment
4. Redocumentation
5. Architecture recovery

Reverse Engineering Overview

What do the 5 RE tasks have in common?
- views
- queries
- domain models

"Provide different views of the same data, on which queries, supported by a domain model, can be made."

Do the above in a reusable, retargetable, extensible, and easy-to-use RE tool.

Reverse Engineering Problems

Main problem:
- we need computer tools for RE
- most such tools are limited in:
  - functionality (data extraction & presentation)
  - genericity (supported data & operations)
  - extensibility (adding new operations)
  - retargeting (different domain models)

Our Goals
- provide the desired RE tool architecture
- compare with existing tools
- since RE data is graph data, use the tool beyond RE, too
- towards a generic InfoVis/GraphVis toolkit architecture?

RE Scenario

Operations pipeline: extract -> aggregate -> measure -> select -> visualize

RE Operations are Graph Operations

RE operation    Graph operation
extract         graph creation
aggregate       graph clustering
measure         graph attribute computation
select          graph filtering/simplification
visualize       graph drawing

Proposed Tool Architecture

[Figure: operation pipeline]

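The correspondence between RE operations and graph operations can be sketched as a chain of functions over one shared graph. This is a minimal Python illustration, not the toolkit's C++/Tcl API: every function name, the dictionary-based graph, the `fan_in` metric, and the sample data are assumptions made for the example.

```python
# Hypothetical sketch: all names and data below are illustrative assumptions.

def extract(source):
    """Graph creation: one node per function, one 'calls' edge per call."""
    nodes = {name: {"module": module} for name, module in source["functions"]}
    edges = [(a, b, {"kind": "calls"}) for a, b in source["calls"]]
    return {"nodes": nodes, "edges": edges}

def aggregate(graph):
    """Graph clustering: add one node per module plus 'contains' edges."""
    for name, attrs in list(graph["nodes"].items()):
        module = attrs.get("module")
        if module is not None:
            graph["nodes"].setdefault(module, {"type": "module"})
            graph["edges"].append((module, name, {"kind": "contains"}))
    return graph

def measure(graph):
    """Graph attribute computation: store each node's number of callers."""
    for attrs in graph["nodes"].values():
        attrs["fan_in"] = 0
    for _, dst, attrs in graph["edges"]:
        if attrs["kind"] == "calls":
            graph["nodes"][dst]["fan_in"] += 1
    return graph

def select(graph, min_fan_in):
    """Graph filtering: names of nodes with at least min_fan_in callers."""
    return {n for n, a in graph["nodes"].items() if a["fan_in"] >= min_fan_in}

def visualize(graph, selection):
    """Graph drawing stand-in: one text line per selected node."""
    return [f"{n} (fan_in={graph['nodes'][n]['fan_in']})" for n in sorted(selection)]

source = {
    "functions": [("parse", "front"), ("lex", "front"), ("emit", "back")],
    "calls": [("parse", "lex"), ("emit", "lex"), ("emit", "parse")],
}
graph = measure(aggregate(extract(source)))
hot = select(graph, min_fan_in=2)
lines = visualize(graph, hot)
```

Each stage only reads or annotates the same graph, which is what lets the stages be reordered or repeated in a pipeline.
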
Core Architecture

Core data:
- graph data: structure, selections, attributes
- visual data: glyphs, images

Operations:
- selection
- editing
- mapping
- viewing

Core Architecture: What is RE data?

Structure:
- basic data (variables, functions, classes...)
- derived data (patterns, packages, annotations...)
- relationships (uses, is a, includes...)

Attributes:
- number of LOC, bugs, modification time...
- RE metrics (common clients/suppliers...)

Data Model

RE data is attributed graph data.

Structure:
- nodes model software artifacts
- edges (directed) model relationships
- no explicit hierarchical structure is used! (edge attributes tell whether an edge is, e.g., containment or aggregation)
- no explicit restrictions on graph topology (any number of incoming/outgoing edges allowed)

[Figure: a typical RE graph]

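A minimal Python sketch of such an attributed graph, assuming a dictionary-backed representation. The class `Graph`, its methods, the `kind` attribute key, and the sample artifacts are hypothetical names chosen for illustration; the slides do not show the toolkit's actual data structures.

```python
class Graph:
    """Minimal attributed directed graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}        # node id -> attribute dict
        self.edges = {}        # edge id -> (source, target, attribute dict)
        self._next_edge = 0

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = dict(attrs)

    def add_edge(self, src, dst, **attrs):
        edge_id = self._next_edge
        self._next_edge += 1
        self.edges[edge_id] = (src, dst, dict(attrs))
        return edge_id

    def children(self, node_id, kind="contains"):
        """Recover hierarchy on demand by following edges of one kind."""
        return sorted(dst for src, dst, a in self.edges.values()
                      if src == node_id and a.get("kind") == kind)

g = Graph()
g.add_node("pkg.io", type="package")
g.add_node("Reader", type="class")
g.add_node("Buffer", type="class")
g.add_edge("pkg.io", "Reader", kind="contains")   # hierarchy is just an edge attribute
g.add_edge("pkg.io", "Buffer", kind="contains")
g.add_edge("Reader", "Buffer", kind="uses")
g.add_edge("Buffer", "Reader", kind="uses")       # cycles are allowed: no topology restriction
```

Because containment is only an edge attribute, the same graph can carry several overlapping hierarchies without changing the data structure.
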
Data Model

Attributes:
- unrestricted key-value pairs per node/edge
- type, number, and value of attributes can change freely
- copes naturally with missing values
- attribute planes are defined implicitly as all attribute values of some given nodes/edges for a given key
- different from the SciViz attribute model

Selections:
- named sets of nodes and edges
- are the datasets in our toolkit
- unrestricted collections (simple interface)!
- read/written by operations
- similar to the subgraph concept in GVF

[Figure: selection]

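The implicit definition of an attribute plane can be illustrated with a short Python sketch. The function `attribute_plane`, the selection names, and the sample file attributes are assumptions for the example, not part of the toolkit.

```python
# Hypothetical node attributes: each node carries its own free-form key-value pairs.
nodes = {
    "main.c": {"loc": 120, "bugs": 3},
    "util.c": {"loc": 45},            # no "bugs" value: the key is simply absent
    "README": {"type": "doc"},        # neither "loc" nor "bugs"
}

# Selections are named, unrestricted sets of node ids.
selections = {
    "sources": {"main.c", "util.c"},
    "all": set(nodes),
}

def attribute_plane(items, selection, key):
    """All values of `key` over the selected items; items lacking the key are skipped."""
    return {i: items[i][key] for i in selection if key in items[i]}

loc_plane = attribute_plane(nodes, selections["sources"], "loc")
bug_plane = attribute_plane(nodes, selections["all"], "bugs")
```

Nothing has to be declared up front: a plane exists as soon as at least one selected item has a value for the key, and missing values need no placeholder.
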
Data Model

[Figure: selected data and selections]

Operations

Generic operation model (simple interface!)

[Figure: an operation takes an input selection, specific parameters, and attribute plane names, and produces an output selection]

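One way to read the generic operation model is as a single call signature shared by all operations. The Python sketch below is an assumption about how such an interface could look; the classes `Operation`, `Threshold`, and `Scale`, their parameters, and the `loc`/`kloc` keys are all hypothetical.

```python
class Operation:
    """Hypothetical uniform interface: selection in, selection out."""

    def __init__(self, **params):
        self.params = params              # operation-specific parameters

    def __call__(self, graph, selection, planes):
        raise NotImplementedError

class Threshold(Operation):
    """Keeps the nodes whose value in the first named plane is >= params['min']."""

    def __call__(self, graph, selection, planes):
        key = planes[0]
        return {n for n in selection if graph[n].get(key, 0) >= self.params["min"]}

class Scale(Operation):
    """Writes planes[1] = factor * planes[0]; returns the selection unchanged."""

    def __call__(self, graph, selection, planes):
        src, dst = planes
        for n in selection:
            if src in graph[n]:
                graph[n][dst] = graph[n][src] * self.params["factor"]
        return set(selection)

# Node id -> attribute dict; "c" has no "loc" value at all.
graph = {"a": {"loc": 10}, "b": {"loc": 200}, "c": {}}

big = Threshold(min=100)(graph, set(graph), ["loc"])
Scale(factor=0.001)(graph, big, ["loc", "kloc"])
```

Since every operation consumes and produces a selection, the output of one call can be passed straight to the next.
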
Operations

We distinguish 4 operation types, depending on the data access (R = read, W = write):
- selection
- editing
- mapping
- viewing

[Figure: access of each operation type to the graph, selections, and visual data. Labels as extracted: selection, editing: R, R/W, W, R; mapping, viewing: R, R, W, R]

Selection Operations

Create selection objects by applying various algorithms/criteria:
- tree selections (vertical slices)
- level selections (horizontal slices)
- conditional selections (attribute/topology based)
- boolean selections (combine basic selections)

Typical scenario: select -> select -> select ...

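The four kinds of selection operations listed above can be sketched on a small containment hierarchy. This Python example is illustrative only: the function names, the `children` and `attrs` tables, and the `loc` values are assumptions.

```python
# Hypothetical containment hierarchy (parent -> children) and node attributes.
children = {"sys": ["ui", "core"], "ui": ["Button"], "core": ["Parser", "Lexer"],
            "Button": [], "Parser": [], "Lexer": []}
attrs = {"sys": {}, "ui": {}, "core": {}, "Button": {"loc": 500},
         "Parser": {"loc": 900}, "Lexer": {"loc": 100}}

def tree_selection(root):
    """Vertical slice: root and all of its descendants."""
    out, todo = set(), [root]
    while todo:
        n = todo.pop()
        if n not in out:
            out.add(n)
            todo.extend(children[n])
    return out

def level_selection(root, depth):
    """Horizontal slice: all nodes exactly `depth` steps below root."""
    level = [root]
    for _ in range(depth):
        level = [c for n in level for c in children[n]]
    return set(level)

def conditional_selection(candidates, predicate):
    """Keep the candidates whose attributes satisfy the predicate."""
    return {n for n in candidates if predicate(attrs[n])}

core = tree_selection("core")
leaves = level_selection("sys", 2)
large = conditional_selection(leaves, lambda a: a.get("loc", 0) > 200)

# Boolean selections are plain set algebra on the results.
large_core = core & large
small_or_ui = (leaves - large) | tree_selection("ui")
```

The last two lines show the "select, select, select" scenario: each step narrows or combines the selections produced before it.
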
Editing Operations

Modify the graph structure and/or attributes.

Structure editing operations:
- import graph data (GraphEd, GraphViz, Rigi)
- cluster graphs (simplification)
- all editing operations use only the basic node/edge addition/removal interface
- graph data is still accessed indirectly, via selection objects

Attribute editing operations:
- RE metrics (common providers, suppliers, etc.)
- layouts, fisheye, zoom are also attribute editing operations (in contrast to other architectures)
- attributes are created/modified/removed dynamically
- attribute plane names specify the attribute keys the operation works on

[Figure: an operation with its input selection, output selection, specific parameters, and attribute plane names]

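An RE metric implemented as an attribute editing operation might look like the Python sketch below. The metric definition used here (number of suppliers a node shares with a reference node), the function names, and the sample dependency data are assumptions for illustration; the slides do not define the metric precisely.

```python
def suppliers(edges, node):
    """Nodes that `node` depends on: targets of its outgoing 'uses' edges."""
    return {dst for src, dst, kind in edges if src == node and kind == "uses"}

def common_suppliers_metric(nodes, edges, selection, reference, out_key):
    """Attribute editing: for each selected node, store under `out_key` how many
    suppliers it shares with `reference`. The graph structure is not modified."""
    ref = suppliers(edges, reference)
    for n in selection:
        nodes[n][out_key] = len(suppliers(edges, n) & ref)
    return set(selection)

# Hypothetical dependency data.
nodes = {n: {} for n in ["A", "B", "C", "lib1", "lib2", "lib3"]}
edges = [("A", "lib1", "uses"), ("A", "lib2", "uses"),
         ("B", "lib2", "uses"), ("B", "lib3", "uses"),
         ("C", "lib1", "uses"), ("C", "lib2", "uses")]

result = common_suppliers_metric(nodes, edges, {"B", "C"}, "A", "shared_with_A")
```

The output key is passed in by the caller, which mirrors the role of attribute plane names: the operation itself does not hard-code where its result is stored.
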
Layout Operations

Implemented layouts:
- tree (Sugiyama)
- spring embedders (GEM, neato)
- grid, random
- 3D stacked
- 2D nested layout

[Figure: a layout operation takes the nodes to lay out, specific parameters, and position/dimension attribute names, and outputs the nodes with position/dimension attributes]

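Treating a layout as an operation that only writes position attributes can be sketched with the simplest layout in the list, a grid. The Python function below is a hypothetical illustration; its name, parameters, the row-major ordering, and the `grid_pos` key are assumptions.

```python
import math

def grid_layout(nodes, selection, pos_key="pos", spacing=1.0):
    """Place the selected nodes on a roughly square grid, in sorted order,
    writing each position under `pos_key`. Returns the laid-out selection."""
    order = sorted(selection)
    cols = max(1, math.ceil(math.sqrt(len(order))))
    for i, n in enumerate(order):
        row, col = divmod(i, cols)
        nodes[n][pos_key] = (col * spacing, row * spacing)
    return set(order)

nodes = {name: {} for name in "abcde"}
laid_out = grid_layout(nodes, {"a", "b", "c", "d", "e"},
                       pos_key="grid_pos", spacing=10.0)
```

Because the position key is a parameter, two layouts of the same nodes can coexist under different attribute names.
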
Layout Operations

[Figure: software artifacts (900 nodes, 2000 edges). Layout 1 (neato); Layout 2 (GEM)]

[Figure: software artifacts (600 nodes, 900 edges). System core (high coupling) extracted]

Layout Operations
- the nested layout is implemented on top of a spring embedder
- any layout that accepts node dimensions could be substituted directly

Layouts and Arrangements

Arrangements:
- typically used to refine the layout of a subgraph within an existing layout
- implemented by cascading a scale + translate operation after a typical layout operation

[Figure: arrangement operation = layout -> scale -> translate]

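The cascade of layout, scale, and translate can be sketched in Python as three small attribute editing functions composed into one. All names here are hypothetical, and `line_layout` is only a stand-in for a real layout such as a tree layout.

```python
def line_layout(nodes, selection, key):
    """Stand-in for any layout operation: nodes on the x axis, one unit apart."""
    for i, n in enumerate(sorted(selection)):
        nodes[n][key] = (float(i), 0.0)

def scale(nodes, selection, key, factor):
    """Multiply the stored positions of the selected nodes by `factor`."""
    for n in selection:
        x, y = nodes[n][key]
        nodes[n][key] = (x * factor, y * factor)

def translate(nodes, selection, key, dx, dy):
    """Shift the stored positions of the selected nodes by (dx, dy)."""
    for n in selection:
        x, y = nodes[n][key]
        nodes[n][key] = (x + dx, y + dy)

def arrange(nodes, selection, key, layout, factor, offset):
    """Arrangement: lay out a subgraph, then fit it into a region of the full layout."""
    layout(nodes, selection, key)
    scale(nodes, selection, key, factor)
    translate(nodes, selection, key, *offset)

# Positions from a previous full layout (made-up values).
nodes = {"a": {"pos": (0.0, 0.0)}, "b": {"pos": (5.0, 5.0)},
         "c": {"pos": (9.0, 1.0)}, "d": {"pos": (2.0, 8.0)}}

arrange(nodes, {"b", "c", "d"}, "pos", line_layout, factor=0.5, offset=(4.0, 4.0))
```

Only the selected nodes move; node `a` keeps the position it had in the full layout.
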
Layouts and Arrangements

[Figure: four steps of arranging a selection]
1. Full layout
2. Select subset
3. Adjust position of selection
4. Arrange selection with a tree layout

Layouts and Arrangements

Advantages of the chosen architecture:
- decouples layout code from RE code
- decouples layout from drawing and mapping (!)
- easy to add new layouts (minimal interface)
- easy to cascade layouts (refinement, fisheye, arrange)
- easy to apply different layouts to different subgraphs (selections)
- simplifies implementation (layouts work on normal graph attributes)

Mapping and Visualization

Map abstract graph data to concrete visual form.

[Figure: mapping and visualization pipeline]

Mapping and Visualization

Basic mapping: graph data -> mapper -> glyphs -> viewer (the mapper uses a glyph factory)

Component         Role
mappers           data -> 2D/3D geometries
viewers           geometries -> display
glyphs            parameters -> geometries
glyph factories   attributes -> parameters

Glyphs:
- similar to the SciVis glyphs
- 2D/3D parametrizable graphical objects
- implemented as (small) Inventor scene graphs

Glyph factories:
- called by mappers for each node/edge to map
- written as (small) Tcl scripts, thus very easy to customize
- selectable/editable at run time to map data in various ways

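The division of labor between mapper, glyph factory, glyph, and viewer can be sketched in Python. This is not the toolkit's Tcl or Inventor interface: glyphs are plain dictionaries here, and every function name, the `folder.iv` file name, and the sizing rule are assumptions for illustration.

```python
def cube_glyph(size, color):
    """Glyph: parameters -> geometry (here just a description of a cube)."""
    return {"shape": "cube", "size": size, "color": color}

def icon_glyph(file, scale):
    """Glyph: parameters -> geometry (here just a reference to an icon file)."""
    return {"shape": "icon", "file": file, "scale": scale}

def make_glyph(attrs):
    """Glyph factory: node attributes -> glyph parameters."""
    if attrs.get("type") == "package":
        return icon_glyph("folder.iv", scale=3)      # hypothetical file name
    return cube_glyph(size=1 + attrs.get("loc", 0) / 100, color=(0, 1, 0))

def glyph_mapper(nodes, selection, factory):
    """Mapper: calls the factory once for every selected node."""
    return {n: factory(nodes[n]) for n in selection}

def text_viewer(glyphs):
    """Viewer stand-in: renders each glyph as one line of text."""
    return [f"{n}: {g['shape']}" for n, g in sorted(glyphs.items())]

nodes = {"io": {"type": "package"}, "Reader": {"type": "class", "loc": 200}}
glyphs = glyph_mapper(nodes, set(nodes), make_glyph)
listing = text_viewer(glyphs)
```

Swapping `make_glyph` for another factory changes the picture without touching the mapper or the viewer, which is the point of keeping the factory a separate, small script.
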
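Mapping and Visualization

Node glyph factory example:

    proc make_glyph {node} {
        # Read the node's "id" attribute
        set id [pr_node $node id]
        if {$id < 15} {
            # Nodes with a small id: glyph loaded from a file, scaled by 3
            return [pr_make_file "icon.iv" scale 3]
        } else {
            # All other nodes: a cube whose second color component depends on the id
            # (computed with expr, since Tcl does not evaluate $id/10 on its own)
            return [pr_make_cube scale 5 color 0 [expr {$id / 10.0}] 0]
        }
    }
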
Mapping and Visualization

[Figure: point glyphs (default)]

[Figure: folder glyphs colored by component type]

Mapping and Visualization

Graph splatting mapper [van Liere, 1999]:
- another mapper type, which produces an image instead of a set of node/edge glyphs
- useful for visualizing large graphs (>1000 nodes), for which glyph mappers produce cluttered views

Graph Splatting Examples

[Figure: NOKIA software (2000 artifacts, 5000 relations). Panels: components using String class; package requirements]

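The idea of a mapper that produces an image rather than glyphs can be sketched as a density field accumulated from node positions. The Python example below assumes one Gaussian kernel per node on a small pixel grid; the kernel choice, the function name `splat`, and the sample positions are assumptions, and the cited splatting method may differ in its details.

```python
import math

def splat(positions, width, height, sigma=1.0):
    """Return a height x width density image: the sum of one Gaussian per node."""
    image = [[0.0] * width for _ in range(height)]
    for px, py in positions:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                image[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    return image

# Made-up layout: a cluster of three nodes and one isolated node.
positions = [(2, 2), (2, 3), (3, 2), (7, 7)]
image = splat(positions, width=10, height=10, sigma=1.0)
```

Dense regions of the layout add up to high values and isolated nodes to low ones, so the image stays readable where individual glyphs would overlap.
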
Graph Splatting Examples

[Figure: components using String class; package requirements]

Mapping and Visualization

Advantages of the chosen architecture:
- easy to produce different mappings on the fly (average Tcl glyph factory < 15 lines of code)
- flexible (control mapping at node/edge level)
- simple to implement (2 mappers vs >20 in SciViz)
- more complex mappers could be added, e.g. to produce UML-like diagrams automatically

Comparison

                 Rigi      VANISH    GVF       GDT       *
main focus       RE        RE/IV     GV        GV        RE/GV
interactivity    + + ++ + + +++  [per-tool column split not recoverable]
implementation   C/Tcl     C++/own   Java      C++       C++/Tcl
scripting        yes       yes       no        no        yes
min/mapper op    n/a       5..10     n/a       ?         +...++
min/graph op     5..20     n/a       30..60    30..60    5..60
size (LOC)       50000     35000     >70000    >55000    10000
LOC/mapper       n/a       5..50     n/a       ?         5..30
LOC/graph op     50..300   n/a       >100      >100      20..300
min/new app      5..30     20..60    >30       >45       5..30

* our toolkit

Future Extensions
- experiment with new RE-specific layouts (UML)
- test on large graphs (NOKIA: 10000-45000 nodes)
- add clustering + metrics from literature!
- experiment with skeletons of splatted fields (?)