Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data



Similar documents
Cabinet Tree: an orthogonal enclosure approach to visualizing and exploring big data

A Survey on Cabinet Tree Based Approach to Approach Visualizing and Exploring Big Data

Agenda. TreeMaps. What is a Treemap? Basics

Cascaded Treemaps: Examining the Visibility and Stability of Structure in Treemaps

HierarchyMap: A Novel Approach to Treemap Visualization of Hierarchical Data

Hierarchical Data Visualization

Voronoi Treemaps in D3

Hierarchy and Tree Visualization

Squarified Treemaps. Mark Bruls, Kees Huizing, and Jarke J. van Wijk

Hierarchical Data Visualization. Ai Nakatani IAT 814 February 21, 2007

Ordered Treemap Layouts

Treemaps for Search-Tree Visualization

7. Hierarchies & Trees Visualizing topological relations

Space-filling Techniques in Visualizing Output from Computer Based Economic Models

Visualizing Large Graphs with Compound-Fisheye Views and Treemaps

Interactive Visual Analysis of the NSF Funding Information

TECHNICAL REPORT. Extending Tree-Maps to Three Dimensions A Comparative Study. Thomas Bladh, David A. Carr, Jeremiah Scholl.

Visualization of Software Metrics Marlena Compton Software Metrics SWE 6763 April 22, 2009

Rendering Hierarchical Data

Improvements of Space-Optimized Tree for Visualizing and Manipulating Very Large Hierarchies

VISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc.,

Regular TreeMap Layouts for Visual Analysis of Hierarchical Data

A Self-adaptive Treemap-based Technique for Visualizing Hierarchical Data in 3D

Evaluating the Effectiveness of Tree Visualization Systems for Knowledge Discovery

Visualization Techniques in Data Mining

TOP-DOWN DATA ANALYSIS WITH TREEMAPS

A Note on Space-Filling Visualizations and Space-Filling Curves

Elastic Hierarchies: Combining Treemaps and Node-Link Diagrams

Introduction of Information Visualization and Visual Analytics. Chapter 7. Trees and Graphs Visualization

Presented by Peiqun (Anthony) Yu

Submission to 2003 National Conference on Digital Government Research

TREE-MAP: A VISUALIZATION TOOL FOR LARGE DATA

Tree Visualization with Tree-Maps: 2-d Space-Filling Approach

An Interactive Visualization Tool for the Analysis of Multi-Objective Embedded Systems Design Space Exploration

Visualizing the Top 400 Universities

VISUALIZATION. Improving the Computer Forensic Analysis Process through

TEXT-FILLED STACKED AREA GRAPHS Martin Kraus

Interaction and Visualization Techniques for Programming

Treemaps with bounded aspect ratio

Information Visualization of Attributed Relational Data

A Comparison of 2-D Visualizations of Hierarchies

Component visualization methods for large legacy software in C/C++

Voronoi Treemaps for the Visualization of Software Metrics

Pocket PhotoMesa: A Zooming Image Browser for the Pocket PC

THE rapid increase of automatically collected data has led

VisCG: Creating an Eclipse Call Graph Visualization Plug-in. Kenta Hasui, Undergraduate Student at Vassar College Class of 2015

AN ABSTRACT OF THE THESIS OF. 20, Title: DiskGrapher: A Different Approach to Hard Drive Visualization for Mac

Visualization of Corpus Data by a Dual Hierarchical Data Visualization Technique

Visual Data Mining with Pixel-oriented Visualization Techniques

Visibility optimization for data visualization: A Survey of Issues and Techniques

Cushion Treemaps: Visualization of Hierarchical Information

Visualization of Streaming Data: Observing Change and Context in Information Visualization Techniques

Treemap Presentation as a Corporate Dashboard Larry Day & Richard Dickinson Corporate Performance Measures BNSF Railway

Project Portfolio Earned Value Management Using Treemaps. University of Maryland, College Park, Maryland. 1 Introduction

Application of Information Visualization to the Analysis of Software Release History

Graphical representation of the comprehensive patient flow through the Hospital

The course: An Introduction to Information Visualization Techniques for Exploring Large Database

3 Information Visualization

DATA LAYOUT AND LEVEL-OF-DETAIL CONTROL FOR FLOOD DATA VISUALIZATION

Clustering & Visualization

Understanding Cultural Variations of E-Commerce Websites in A Global Framework

BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou

an introduction to VISUALIZING DATA by joel laumans

EdgeLap: Identifying and discovering features from overlapping sets in networks

Choosing Colors for Data Visualization Maureen Stone January 17, 2006

NakeDB: Database Schema Visualization

Visually Encoding Program Test Information to Find Faults in Software

Visualizing Software Projects in JavaScript

Generalized Automatic Color Selection for Visualization

6.3 Treemaps: a space-filling approach to the visualization of hierarchical information structures

Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets-

GosperMap: Using a Gosper Curve for Laying out Hierarchical Data

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

MultiExperiment Viewer Quickstart Guide

Effective Visualization of File System Access-Control

Information Visualization Multivariate Data Visualization Krešimir Matković

Understanding Data: A Comparison of Information Visualization Tools and Techniques

Expert Color Choices for Presenting Data

An evaluation of space-filling information visualizations for depicting hierarchical structures

Self-adaptive e-learning Website for Mathematics

SuperViz: An Interactive Visualization of Super-Peer P2P Network

Package treemap. February 15, 2013

Visualization of Varying Hierarchies by Stable Layout of Voronoi Treemaps

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Template-based treemaps to preserve spatial constraints

Cartesian vs. Radial A Comparative Evaluation of Two Visualization Tools

Visualizing Changes of Hierarchical Data using Treemaps

Interactive Information Visualization of Trend Information

Visualization of Phylogenetic Trees and Metadata

VISUALIZATION APPROACH FOR SOFTWARE PROJECTS

Hyperbolic Tree for Effective Visualization of Large Extensible Data Standards

Visualizing Data from Government Census and Surveys: Plans for the Future

Visualization methods for patent data

The Effect of Animated Transitions on User Navigation in 3D Tree-Maps

Sitecore InDesign Connector 1.1

Information Visualization. Ronald Peikert SciVis Information Visualization 10-1

Visualizing Data: Scalable Interactivity

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

Visualizing Multidimensional Data Through Time Stephen Few July 2005

Time Management II. June 5, Copyright 2008, Jason Paul Kazarian. All rights reserved.

Transcription:

Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data Yalong Yang 1 Kang Zhang 1,2 Quang Vinh Nguyen 3 Jianrong Wang 4 1 School of Computer Software, Tianjin University, Tianjin, China 2 Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083, USA 3 MARCS Institute & School of Computing, Eng. and Maths, University of Western Sydney, Sydney, Australia 4 School of Computer Science and Technology, Tianjin University, Tianjin, China ABSTRACT Treemaps are well-known for visualizing hierarchical data. Most approaches have been focused on layout algorithms and paid little attention to other graphic properties. Furthermore, the structural information in Treemaps is too implicit for viewers to perceive. This paper presents Cabinet Tree, an approach that: i) draws branches explicitly to show relational structures, ii) adapts a space-optimized layout for leaves and maximizes the space utilization, iii) uses coloring strategies to clearly reveal patterns and contrast different attributes intuitively. Our quantitative evaluations demonstrate that Cabinet Tree achieves good scalability for increased resolutions and datasets, high proximity of sibling nodes and real-time performance. Keywords: orthogonal enclosure, tree drawing, hierarchical visualization, cabinet tree, big data. Index Terms: H.5.2 [Information Interfaces and Presentation]: User Interfaces Graphical user interfaces (GUI), screen design; I.3.6 [Computer Graphics]: Methodology and Techniques Graphics data structures and data types; I.3.8 [Computer Graphics]: Applications 1 INTRODUCTION Much of data we use today has a hierarchical structure. Examples of hierarchical structures include university-department structure, family tree, library catalogues and so on. Such structures not only play significant roles in their own right, but also provide means for representing a complex domain in a manageable form. Current GUI tools, such as traditional node-link diagrams or file browsers, are an effective means for users to locate information, but information embedded in large hierarchies can be harder to find unless the nodes/branches are systematically laid out [1]. Unfortunately in the real world, hierarchical structures are often very large with thousands or even millions of elements and relationships. Therefore, a capability of visualizing the entire structure while supporting deep exploration at different levels of granularity is urgently needed for effective knowledge discovery [2]. Enclosure or space-filling visualization, such as Treemaps techniques [1][3][4], visualizes large tree structures through enclosure partitioning for space utilization. Treemaps have been considered a successful method for visualizing large hierarchical data sets with attributed properties. The Treemap algorithm works by dividing the display area into a nested sequence of rectangles e-mail:yalongyang@tju.edu.cn e-mail:kzhang@{tju.edu.cn, utdallas.edu} e-mail:q.nguyen@uws.edu.au e-mail:wjr@tju.edu.cn whose areas correspond to an attribute of the dataset, effectively combining aspects of a Venn diagram and a pie chart [3]. The set of rectangles forms the layout of a Treemap. Voronoi Treemap [5][6] shows a new way to visualize things in arbitrary polygons rather than rectangles, but only with simple relationships between items. Interactive control allows users to specify the presentation of both structural and content information [7]. This is in contrast to traditional static methods of displaying hierarchically structured information, which generally makes either poor use of display space or hide much information useful to users. Originally designed to visualize files on a hard drive [8], Treemaps have been applied to a wide variety of areas ranging from financial analysis, sport reporting [9], image browsing [10] and software and file system analysis [11]. Although space-filling is useful for visualizing large hierarchies, most of the Treemap techniques do not show the relational structure explicitly. It requires extra cognitive effort for viewers to perceive and understand the relational structures that are implicit in the enclosure [12]. This paper presents a space-filling technique, called Cabinet Tree, for visualizing big hierarchical data. Our contributions include the following aspects in the visualization design of Cabinet Tree: Orthogonal and explicit drawing of branches and space-optimized layout for leaves, generating a highly compact view; Scalable visualization suitable for big data (including hundreds of thousands of nodes) and increased resolutions, measured in terms of visible nodes; A contrast-enhanced color strategy and color-coded sorting of leaves to reveal visual patterns; Quantitative evaluations to support the design choices for the above aspects and to compare with similar Treemap approaches. The remainder of the paper is organized as follows. Section 2 discusses related work of other hierarchical visualization and exploration techniques. Section 3 gives a detailed description of Cabinet Tree. Section 4 presents case studies that use Cabinet Tree to visualize file systems. Section 5 discusses the quantitative evaluation with comparison between our method and other hierarchical visualization techniques. The final section concludes our work as well as future directions. 2 RELATED WORK The best-known enclosure approach is Treemap, first proposed by Johnson and Shneiderman in 1991, called Slice and Dice Treemap (S&D Treemap) [4]. Variations of Treemap include Squarified Treemap [1], Ordered Treemap [13], Quantum Treemap [3] and etc. A brief and concise definition of treemap-like layout is [14]: nodes are represented as nested rectangles and their layout reveals the structure of the tree. The size of such a rectangle can be used to represent a quantitative attribute. A node s color or

other graphical properties could also be used to represent other attributes of the node. Treemap-like visualizations are capable of showing a large number of nodes because they can mostly ensure 100% use of the space. However, they show the relationships (or structures) implicitly which would cost viewers extra cognitive effort to understand the relations between the nodes. Treemap [22] takes advantages of 2.5D techniques to enhance the presentation of containment relationships. As an important application issue, scalability refers to the capability of effectively displaying large amounts of data [23]. Screen resolutions become the limiting factor for scalable visualizations. Larger displays with higher resolutions are being developed for visualization [24] (e.g. the large wall at the AT&T Global Network Operations Center [25]). Therefore, scalability for high resolutions and large data sets become crucial for visualizing big data. Our previous work on Drawer Tree [26] aims at combining the connection and enclosure techniques to achieve a good presentation of the structural information. Drawer Tree uses lines to present leaves, which are hard to perceive. Having much waste between branches, its space utilization is not optimal. Figure 1: An example of enclosure approaches (S&D Treemap [4]) There are also approaches that aim at combining both the node-link style and enclosure concept into one visualization. Sunburst conveys both area and structure using a circular or radial space-filling layout methodology [15], but much space is still wasted The Space-Optimized Tree uses an explicit node-link style while each non-leaf node subdivides its area for its child nodes [12], however, the varied shaped polygons and their relationships are hard to perceive. Elastic Hierarchies [16] combines node-link style and Treemap in a straightforward manner, but space utilization remains an issue. Figure 2: Sunburst (top-left) [15], Space-Optimized Tree (top-right) [12] and Elastic Hierarchies (bottom) [16] Apart from the layout algorithm, the use of graphical attributes on hierarchical visualizations, or also called attributed visualizations, has been investigated. In 1992, Shneiderman proposed color coding in Treemap could present different types of files, owners of programs, frequency of use or the age of the file [8] and a serious of colors could be used for continuous valued attributes [17]. Cushion Treemap [18] uses intuitive shading to provide insights in hierarchical structures. Labeling is an important aspect of tree visualization for effective applications. For example, in a labeled Treemap, parent nodes make room for their names and borders to allow users to quickly find the data they are interested in, improving over the conventional Treemap [19]. Meanwhile, labels guide the navigation in typical Treemaps [14]. A 3D version of Treemap could be made 3-dimensional where child nodes are colored according to their last update time and given heights according to their access frequencies [20]. Fractal Tree [21] uses different colors to present the depths of nodes. Cascaded Figure 3: Drawer Tree visualization of Eclipse folder [26] 3 CABINET TREE To support effective visualization of big hierarchical data, we consider the following design criteria: i Balanced trade-off between space utilization and clarity of relational structures, suitable for the rectangular displays of both mobile and desktop screens; ii Scalable for increased resolutions and data sizes; iii High proximity of sibling nodes for quick visual grouping; iv Fast layout algorithm. We call our approach Cabinet Tree which resembles objects stacked in a large cabinet. Branches of the tree partition the cabinet while leaves fill the remaining partitioning space. The node-link style displays the relationships of information explicitly, but most of the pixels are wasted as background. While Treemaps ensure 100% use of the space, the implicit view costs more cognitive effort to understand their structures. Cabinet Tree aims at balancing space utilization and clarify of structural information. Our design follows the enclosure concept to maximize space utilization and the 2D orthogonal concept of S&D Treemap to enable a fast cognitive process [3]. It draws branches explicitly to reveal their hierarchical relationships. Moreover, a space-optimized layout algorithm is applied to leaves to obtain a compact view and high proximity of sibling nodes. Cabinet Tree is generally scalable for increased resolutions and data sizes as demonstrated by our experiments. Comparison of attributes within a hierarchical structure is crucial for many applications, however, rectangles in S&D Treemap are difficult to compare [1]. In Cabinet Tree, a contract-enhanced color strategy is applied to the mapping of attributes for viewers to compare them easily. We apply color-coded sorting to data items to reveal visual patterns among groups of related nodes. The remaining part of this section addresses the design rationale and realization of Cabinet Tree. 3.1 Interleaved Horizontal-Vertical Partitioning The drawing starts from the bottom horizontal line that represents the root of the tree. Level-1 branches are drawn as vertical lines partitioning the space above the root. Level-2 branches

partition the space horizontally between two neighboring level-1 lines, representing the children of the vertical line on the left. Similarly, level-3 branches partition the space vertically between two neighboring level-2 lines, the partitioning process continues down to the lowest level, which can be leaves or empty branches. Leaves occupy the remaining space within the surrounding level lines. In the following discussion, we will generally call a branch or a leaf as a node. The partitioning process is demonstrated in Figure 4. The top figure shows a tree in the node-link style where rectangles are branches and circles are leaves. Each node is labeled uniquely by a letter followed by its weight. The bottom figure demonstrates the same tree drawn as a Cabinet Tree. In typical hierarchical visualization, the type of a node is usually determined by whether it has children. The nodes with children are considered parent nodes and those without as child nodes. There are, however, applications that need to visualize nodes with no children, for instance, empty folders in a file system, and catalogues with no products in an e-business system. Cabinet Trees allow branches to have no leaves, as noted in Figure 4 where G1, I1, J1, and K1 are leaf-less branches. of clicks on a web link, size of a folder, and etc. We combine a node s depth in the hierarchy with one of the node s quantitative attributes as its weight (w) using Formula (1). w = Attr C l (1) where C determines the trend of weight in terms of the depth level (l). 3.3 Branches Branches are orthogonally drawn with decreasing thicknesses from the root to the lowest level. The exact thickness of a branch is determined by the available space to the branch. Once the space for a branch is allocated, its thickness is calculated according to the space allocated for the branch with a minimal and maximum limits. All the branches are expanding upward vertically and rightward horizontally. The visual comparison between implicit branches and explicit branches in Figure 5 demonstrates that explicit branches make relational structures easier for the viewer to perceive. Figure 4: Interleaved horizontal-vertical partitioning The layout algorithm is outlined in Algorithm 1, that is conceptually recursive but implemented iteratively: Algorithm 1 Partitioning 1: procedure PARTITION BRANCH 2: Calculate thickness for B 3: For each sub branch B-1: 4: Calculate space for B-1 according to its weight 5: Partition branch(b-1) 6: Divide the remaining space into slices of equal widths enough for all leaves in B 7: For each leaf L of B: 8: Calculate for the space of L according to its weight 9: Fill L into the next slice(s) available Figure 5: Implicit branches (top) and explicit branches (bottom) with 4245 nodes in Eclipse application folder To evaluate whether the strategy of decreasing thicknesses would significantly impact on the number of visible nodes, i.e. the scalability, we set all the branches at a constant thickness of 1 pixel. Figure 6 shows the number of visible nodes for varied and fixed (1 pixel) branch thicknesses on various screen resolutions for the same dataset containing 530,290 nodes. As Figure 6 demonstrates, the difference is insignificant, we therefore decide to use the varied and decreasing thicknesses, which provides a better visualization. The algorithm follows the partitioning concept of S&D Treemap for branches and adopts a space-optimized approach to allocate space for leaves. The time complexity of this algorithm is linear (O(n)) where n is the number of nodes, since it calculates the layout for each node only once. It is therefore suitable for real-time visualization of big hierarchical data. 3.2 Weight In an enclosure approach, the weight (w) of a node may encode a quantitative attribute represented by the node, such as the number Figure 6: Number of visible nodes with increasing resolutions for different thickness strategies 3.4 Leaves The space allocated for each leaf is calculated according to its weight and the space available from its branch. Each node is placed next to its siblings to achieve a high space utilization and also high proximity of sibling nodes (to be evaluated in Section 5.1).

Interleaved horizontal-vertical partitioning may result in either vertical or horizontal leaves. The strategies for allocating both horizontal and vertical leaves are illustrated in Figure 6, where Leaf1.1.1, Leaf1.1.2, Leaf1.1.3 and Leaf1.1.4 are leaves under Branch1.1 (left), and similarly Leaf3.1.1, Leaf3.1.2, Leaf3.1.3 and Leaf3.1.4 are under Branch3.1 (right). mapping. The process is demonstrated in Figure 10. Figure 10a shows the original color distribution among the nodes. We first set a threshold, and any node whose intensity less than the threshold will be recorded and removed as illustrated in Figure 10b. We then spread the remaining intensity values to fit the entire range proportionally, as shown in Figure 10c. Finally, the removed nodes are added back to the distribution, as in Figure 10d. This process generates much smoother color distribution for leaves than the linear mapping. Figure 7: Allocation of leaves 3.5 Color-Coded Sorting With sorted data, viewers can locate the needed items and see the quantitative differences easily. The attribute to be sorted could be either discrete (e.g. file type, owners) or continuous (e.g. size, time). Figure 8 compares the method of separate coloring and sorting with the one that combines coloring and sorting. We combine color-coding and sorting on a given attribute to reinforce the perception of grouped leaves of similar values. Figure 10: Process of contrast-enhanced color strategy We use HSI color space to present the color of each branch, the deeper of the level is, the lighter the intensity. The visual comparison between the linear mapping and contrast-enhanced color strategies is demonstrated in Figure 11. Figure 8: Separate coloring and sorting (top) and combined coloring and sorting (bottom) with 2,211 leaves in a file system 3.6 Contrast-Enhanced Color Strategy When assigning colors to continuous valued attributes of leaves, the conventional method is linear mapping. However, if attribute values are concentrated on two ends, this method generates a poor view with only the lightest and darkest colors as illustrated in Figure 9. Figure 9: A case not suitable for linear color mapping To address the problem of polarizing attribute values, we use a contrast-enhanced strategy to post-process the results of linear Figure 11: Visualization of 77,402 files and 7,850 directories in MiKTex folder. Linear color mapping (top) and contrast-enhanced color strategy (bottom) with color-coding on modified times, the darker the older the file is 3.7 Labeling Text labels are not pre-attentive but are important to provide the context in which visualized data appear. Labeling every item statically on a dense visualization is unrealistic [17], we therefore label only the nodes with enough spaces. To suit easy reading, a node s label is horizontally placed within the node if there is an enough space and the label s size is made proportional to the node s size. For a vertical node that has no space to fit its label horizontally, the label is then placed vertically. The color of a text label is crucial for the label to stand out from the background. We use YUV color space to generate the color for labels, because YUV works well for optimal foreground and background detection [27][28][29]. Assuming that Y represents a node s overall brightness or luminance, we compare Y with a threshold, and label it black or white depending on whether Y is less or larger than the threshold. This strategy ensures a high color contrast for labeling.

4 CASE STUDIES We use Cabinet Tree to visualize file systems to illustrate its effectiveness and scalability. File types are color-coded as shown in Figure 13. Figure 13: Color coding for different file types Cabinet Tree first visualizes the file system of CodeBlocks Application which contains 2,673 files and 262 directories, as shown in Figure 14. Figure 14a uses the file count as the weight, clearly showing that CodeBlocks Application consists of 4 major parts: Binary files (in Moderate Pink) and source code files (in Soft Red), Media files (in Soft Orange), Small folders (bundled in Green) and files with unspecified types (in Vivid Yellow), and Compressed files (in Light Yellow). Figure 14b uses the file size as the weight, and reveals that binary files (in Moderate Pink) takes the largest space. Another example shown in Figure 12 is Visual Studio 2012 Windows Application folder with 36,722 files and 7,399 directories (including 16 empty ones), weighted on file counts. Figure 14 clearly shows that several folders share similar Figure 14: Cabinet Tree visualizing Codeblocks, (a) node sizes weighted on file counts and (b) weighted on file sizes structures in this dataset. For example, ProjectTemplates and ProjectTemplatesCache subfolders have similar structures; ItemTemplates and ItemTemplatesCache subfolders share similar structures. By manually inspecting these folders, we find that ProjectTemplatesCache is the backup folder for ProjectTemplates and ItemTemplatesCache is the backup folder for ItemTemplates. Original and backup folders should of course be similar. Cabinet Tree is able to handle big data with hundreds of thousands of nodes. For example, Figure 15 presents a C drive contents on a personal computer with 455,940 files and 74,350 directories (including 3,520 empty directories). Figure 12: Cabinet Tree visualizing 36,722 files and 7,399 directories in Visual Studio 2012 Folder

Figure 15: Cabinet Tree visualizing 455,940 files and 74,350 directories in C Drive 5 EVALUATION AND DISCUSSION Having implemented Cabinet Tree using JavaScript, we compare it with Squarified Treemap and S&D Treemap both built on Data-Driven Documents (D3js) [30] which also uses JavaScript. The round configuration of D3js Treemap was set to true to ensure every node to occupy an integer number of pixels. All the experiments ran on Google Chrome Browser Version 36.0.1985.125m with 1600 900 resolution using a PC with 2.9 Ghz and 8GB RAM. The data sets experimented are summarized in Table 1. Table 1: Data sets and counts of nodes Data Set Max level Total # of files # of directories # of empty directories IE 11 3 142 136 6 0 Java 1.8.0 10 1,662 1,530 132 3 Office 2013 7 5,092 4,797 295 4 Hadoop 19 14,688 12,711 1,977 60 Ctex 2.9.2 11 90,981 82,901 8,080 364 D Drive 21 105,302 91,183 14,119 549 Qt 5.3.0 18 120,458 109,220 11,238 91 C Drive 1 20 337,953 290,737 47,580 1,954 C Drive 2 26 530,290 455,940 74,350 3,520 5.1 Scalability This section evaluates the scalability measured in terms of the total number of visible nodes (i.e. each occupying at least one pixel), with increasing resolutions and increasing sizes of data. To evaluate scalability with increasing resolutions, we use two large datasets (D Drive and C Drive 1 in Table 1) to visualize on 7 wide screen resolutions. In this evaluation, we wish to find the trend of the percentage of visible nodes (among all the nodes) and the layout time (excluding the file reading, rendering and displaying time) with increasing resolutions. Figure 16 shows that the percentage of visible nodes raises with the gradually increasing resolutions for both data sets. Figure 17 shows that as the resolution grows the layout time for both data sets is linearly increasing. In summary, Cabinet Tree is highly scalable with increased resolutions in both the speed and percentage of visible nodes. Figure 16: Percentage of number of visible nodes within the total number of nodes in different resolutions Figure 17: Percentage of number of visible nodes within the total number of nodes in different resolutions To evaluate the scalability of Cabinet Tree against increased data sizes, we tested it using the data sets in Table 1 with the results shown in Table 2. Table 2: Number of visible nodes with different data sets Data Set Total # of visible nodes # of visible files # of visible directories IE 11 142 136 6 Java 1.8.0 1,662 1,530 132 Office 2013 5,064 4,771 293 Hadoop 13,519 11,739 1,780 Ctex 2.9.2 62,466 60,016 2,450 D Drive 67,760 58,707 9,053 Qt 5.3.0 95,569 87,690 7,879 C Drive 1 190,035 179,258 10,777 C Drive 2 262,381 244,063 18,318

We compare Cabinet Tree with Squarified Treemap and S&D Treemap for their scalability with increased data sizes, as in Figure 18. Figure 21: Comparison of average pixels for leaves Figure 18: Comparison of visible numbers among three approaches Figure 19 compares the percentages of visible nodes within the total number of nodes for the three approaches with increased data sizes. It is clear that Cabinet Tree outperforms S&D Treemap in all datasets. For some smaller datasets, Cabinet Tree is better than Squarified Treemap because the former visualizes branches (directories) explicitly while the latter only shows leaves. Squarified Treemap demonstrates a better scalability than Cabinet Tree in some bigger data sets because the drawing of branches in Cabinet Tree costs much space, particularly for deep hierarchies and large number of branches. The comparison of space utilization on different data sets is illustrated in Figure 20. Squarified Treemap in scalability. However, given a space, Cabinet Tree can fill more leaves than S&D Treemap and Squarified Treemap since the average number of pixels per leaf is less than the latter two methods (Figure 21). 5.2 Proximity of Sibling Nodes Another aesthetic measurement is the proximity of sibling nodes. It states that [31] logical neighbors should be placed close in the visualization. The distance between two neighboring sibling nodes is measured by the distance between their centers. Smaller distances are desirable. Figure 22 suggests while S&D Treemap produces the best average distance between sibling nodes, Cabinet Tree achieves a very comparable result, which is much better than Squarified Treemap. Figure 19: Comparison of three approaches for visible nodes Figure 22: Comparison of average pixels for leaves 5.3 Running Time Figure 23 shows the computational time for the layout algorithm (excluding file reading and rendering) on the same data sets. Figure 20: Space utlization by Cabinet Tree for increased data sizes Figure 20 shows that when the data sets get bigger, the space utilization for leaves by Cabinet Tree is decreasing, and drops dramatically from Office 2013 (5,064 total number) to Hadoop (13,519 total number). Comparing the average number of pixels for leaves among three different methods, Figure 21 shows that Cabinet Tree uses the least for all the data sets. Cabinet Tree therefore delivers the most compact visualization among the three methods. In summary, Cabinet Tree is more scalable than S&D Treemap in all the cases we experimented. Due to its explicit visualization of branches that take spaces, Cabinet Tree performs poorer than Figure 23: Comparison of average pixels for leaves Figure 23 shows that Cabinet Tree is comparable with S&D Treemap which takes least execution time. The complexity of the Cabinet Tree layout is linear, so is S&D Treemap. Since Cabinet Tree draws more nodes than S&D Treemap as shown in Figure 18, it takes more computational time. Squarified Treemap has worst performance among the three approaches.

6 CONCLUSION AND FUTURE WORK This paper has presented a 2D approach for visualizing large hierarchical data, called Cabinet Tree. Using the enclosure and orthogonal drawing methods, Cabinet Tree performs a space-optimized layout for leaves and explicit branches with carefully designed color schemes for aesthetic and clear visualization. Quantitative evaluations have indicated that Cabinet Tree is capable of visualizing large hierarchical datasets. Being scalable for increased resolutions and data sizes, with high proximity of sibling nodes and high real-time layout speed, Cabinet Tree can be considered an effective tool for large hierarchy visualization in a wider range of applications. We are planning to apply an interaction framework into Cabinet Tree to facilitate effective exploration of large hierarchical structures, such as node selection and zoom-in. We will conduct a formal usability study to evaluate the effectiveness of our approach. REFERENCES [1] M. Bruls, K. Huizing, and J. J. V. Wijk, Squarified treemaps, in Proceedings of the Joint Eurographics and IEEE TCVG Symposium on Visualization, pp. 33 42, 1999. [2] W. Huang, P. Eades, S.-H. Hong, and C.-C. Lin, Improving multiple aesthetics produces better graph drawings, Journal of Visual Languages & Computing, vol. 24, pp. 262 272, Aug. 2013. [3] B. B. Bederson, B. Shneiderman, and M. M. Wattenberg, Ordered and quantum treemaps: Making effective use of 2d space to display hierarchies, ACM Transactions on Graphics, vol. 21, pp. 833 854, Oct. 2002. [4] B. Johnson and B. Shneiderman, Tree-maps: a space-filling approach to the visualization of hierarchical information structures, in Proceeding of Visualization 91, pp. 284 291, IEEE Comput. Soc. Press, 1991. [5] M. Balzer, O. Deussen, and C. Lewerentz, Voronoi treemaps for the visualization of software metrics, in Proceedings of the 2005 ACM symposium on Software visualization - SoftVis 05, p. 165, ACM Press, May 2005. [6] M. Balzer and O. Deussen, Voronoi treemaps, in IEEE Symposium on Information Visualization, 2005. INFOVIS 2005., pp. 49 56, IEEE, 2005. [7] B. Shneiderman, The eyes have it: a task by data type taxonomy for information visualizations, in Proceedings 1996 IEEE Symposium on Visual Languages, pp. 336 343, IEEE Comput. Soc. Press, 1996. [8] B. Shneiderman, Tree visualization with tree-maps: 2-d space-filling approach, ACM Transactions on Graphics, vol. 11, pp. 92 99, Jan. 1992. [9] D. Banks, TennisViewer: a browser for competition trees, IEEE Computer Graphics and Applications, vol. 17, no. 4, pp. 63 65, 1997. [10] B. B. Bederson, Quantum Treemaps and Bubblemaps for a Zoomable Image Browser, Proc. User Interface Systems and Technology, pp. 71 80, 2001. [11] M. J. Baker and S. G. Eick, Space-filling Software Visualization, Journal of Visual Languages & Computing, vol. 6, pp. 119 133, June 1995. [12] Q. V. Nguyen and M. L. Huang, Space-optimized tree: a connection+enclosure approach for the visualization of large hierarchies, Information Visualization, vol. 2, pp. 3 15, Mar. 2003. [13] B. Shneiderman and M. M. Wattenberg, Ordered treemap layouts, in IEEE Symposium on Information Visualization, 2001. INFOVIS 2001., pp. 73 78, IEEE, 2001. [14] R. Blanch and E. Lecolinet, Browsing zoomable treemaps: structure-aware multi-scale navigation techniques., IEEE transactions on visualization and computer graphics, vol. 13, pp. 1248 53, Jan. 2007. [15] J. STASKO, R. CATRAMBONE, M. GUZDIAL, and K. MCDONALD, An evaluation of space-filling information visualizations for depicting hierarchical structures, International Journal of Human-Computer Studies, vol. 53, pp. 663 694, Nov. 2000. [16] M. McGuffin and M. Chignell, Elastic hierarchies: combining treemaps and node-link diagrams, in IEEE Symposium on Information Visualization, 2005. INFOVIS 2005., pp. 57 64, IEEE, 2005. [17] J.-D. Fekete and C. Plaisant, Interactive information visualization of a million items, in IEEE Symposium on Information Visualization, 2002. INFOVIS 2002., pp. 117 124, IEEE Comput. Soc, 2002. [18] J. Van Wijk and H. Van de Wetering, Cushion treemaps: visualization of hierarchical information, in Proceedings 1999 IEEE Symposium on Information Visualization (InfoVis 99), pp. 73 78, IEEE Comput. Soc, 1999. [19] H. Lu, S. X. Liu, M. M. Wattenberg, and J. Ma XI, Constructing a labeled treemap with balanced layout, Apr. 2007. [20] T. Itoh, Y. Yamaguchi, Y. Ikehata, and Y. Kajinaga, Hierarchical data visualization using a fast rectangle-packing algorithm., IEEE transactions on visualization and computer graphics, vol. 10, pp. 302 13, Jan. 2004. [21] J. Leggett, Visualizing Hierarchies and Collection Structures with Fractal Trees, in Sixth Mexican International Conference on Computer Science (ENC 05), pp. 31 40, IEEE, 2005. [22] H. Lü and J. Fogarty, Cascaded treemaps: examining the visibility and stability of structure in treemaps, in GI 08 Proceedings of Graphics Interface 2008, pp. 259 266, Canadian Information Processing Society, May 2008. [23] S. G. Eick and A. F. Karr, Visual Scalability, Journal of Computational and Graphical Statistics, vol. 11, pp. 22 43, Mar. 2002. [24] B. Yost and C. North, The perceptual scalability of visualization., IEEE transactions on visualization and computer graphics, vol. 12, no. 5, pp. 837 44, 2006. [25] B. Wei, C. Silva, E. Koutsofios, S. Krishnan, and S. North, Visualization research with large displays, IEEE Computer Graphics and Applications, vol. 20, no. 4, pp. 50 54, 2000. [26] Y. Yang, N. Dou, S. Zhao, Z. Yang, K. Zhang, and Q. V. Nguyen, Visualizing large hierarchies with drawer trees, in Proceedings of the 29th Annual ACM Symposium on Applied Computing - SAC 14, pp. 951 956, ACM Press, Mar. 2014. [27] S. Ji and H. Park, Image segmentation of color image based on region coherency, in Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269), vol. 1, pp. 80 83, IEEE Comput. Soc, 1998. [28] Y. Ivanov, A. Bobick, and J. Liu, Fast Lighting Independent Background Subtraction, International Journal of Computer Vision, vol. 37, pp. 199 207, June 2000. [29] J. Fan, D. Y. Yau, A. K. Elmagarmid, and W. G. Aref, Automatic image segmentation by integrating color-edge extraction and seeded region growing., IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, vol. 10, pp. 1454 66, Jan. 2001. [30] M. Bostock, V. Ogievetsky, and J. Heer, D: Data-Driven Documents., IEEE transactions on visualization and computer graphics, vol. 17, pp. 2301 9, Dec. 2011. [31] J. Liang, S. Simoff, Q. V. Nguyen, and M. L. Huang, Visualizing large trees with divide & conquer partition, in Proceedings of the 6th International Symposium on Visual Information Communication and Interaction - VINCI 13, p. 79, ACM Press, Aug. 2013.