Weighted Maps: treemap visualization of geolocated quantitative data

Mohammad Ghoniem, Maël Cornil, Bertjan Broeksema, Mickaël Stefas and Benoît Otjacques
CRP Gabriel Lippmann, 41 rue du Brill, L-4422 Belvaux, Luxembourg

ABSTRACT

A wealth of census data relative to hierarchical administrative subdivisions are now available. It is therefore desirable for hierarchical data visualization techniques to offer a spatially consistent representation of such data. This paper focuses on a widely used technique for hierarchical data, namely treemaps, with a particular emphasis on a specific family of treemaps designed to take into account spatial constraints in the layout, called Spatially Dependent Treemaps (SDT). The contributions of this paper are threefold. First, we present the "Weighted Maps", a novel SDT layout algorithm, and discuss the algorithmic differences with the other state-of-the-art SDT algorithms. Second, we present the quantitative results and analyses of a number of metrics that were used to assess the quality of the resulting layouts. The analyses are illustrated with figures generated from various datasets. Third, we show that the Weighted Maps algorithm offers a significant advantage for the layout of large flat cartograms and multilevel hierarchies having a large branching factor.

Keywords: Treemaps, spatial consistency, tree visualization, cartograms

1. INTRODUCTION

Broadly speaking, two main approaches are used to visualize hierarchical data: node-link diagrams and recursive enclosure of shapes (see the Treevis.net project [1] for a comprehensive list of techniques). This work falls in the second category, where Treemaps [2] are the best-known technique. Our purpose is to contribute to the research work that considers spatial properties of nodes in treemap layouts. We introduce a new treemap layout algorithm that takes into account spatial properties of nodes.
Next, we present the results of an extensive quantitative study in which we compare our treemap layout with two existing ones. We discuss these results to make clear how different approaches lead to different trade-offs in various metrics.

To illustrate the problem, we consider the recursive split of a territory into administrative subdivisions. Such hierarchies can easily be represented by a treemap where the size of rectangles may encode any additive quantitative region attribute, such as population. However, with standard treemap layout algorithms, node positions are not correlated with their known geolocations. This undermines the ability of culturally trained map users to get a meaningful overview of the whole territory or to rapidly locate specific regions using their prior knowledge of geography.

For geographically trained users, spatial consistency is of prime importance when visualizing geolocated data. Finding geographic entities roughly where the user expects them to be, based on prior knowledge, is crucial to the external anchoring process described by Liu and Stasko [3], whereby the user can couple internal and external representations of the data and the world. In turn, successful external anchoring is conducive to memory offloading, since the user does not have to memorize the onscreen locations of the geolocated entities. When considering multivariate data including spatial properties, spatially consistent visualizations are expected to allow the user to answer questions related to abstract data trends and spatial data distribution, as well as specific attribute values attached to certain locations or regions. For instance, in a visual representation of the USA it is better to have the city of Seattle located at the top left and Houston at the bottom center, if the user is interested in examining their population or employment figures.
Likewise, it is helpful to find the counties of California grouped at the bottom left corner of the treemap representation of the USA, if one is interested in how schools are distributed across counties in that state.

In this paper, we group all treemap algorithms designed to include spatial constraints into the layout under the generic designation of Spatially Dependent Treemaps (SDT). When appropriate, we use specific algorithm names.

Further author information: (Send correspondence to Mohammad Ghoniem)

2. RELATED WORK

Trying to visualize the value of abstract attributes related to certain geographic subdivisions, within a two-dimensional space, is not new. Geographers have faced this challenge for centuries. The inclusion of spatial constraints into treemaps may therefore be placed, to some extent, within the substantial body of research on cartograms. Tobler's review [4] of cartograms summarizes the work in this domain until 2004 (mainly carried out by geographers). Most of the time, geographic maps are the starting point of the thinking about cartograms. Maps may be distorted [5] and/or graphically enriched to convey more than purely spatial information.

The information visualization community has also studied this issue. In this case, however, maps are not necessarily the conceptual starting point. Various visualization techniques (e.g. graphs, treemaps) may be the artifact to be enhanced to display both spatial and non-spatial information. Recent research work combining information visualization and geovisualization includes choropleth maps coordinated with squarified treemaps [6], Ring maps [7], and the overlay of information visualization elements on maps, such as Necklace maps [8].

Our work belongs to the infovis approach, with treemaps as the specific technique to study and to improve. As pointed out by Baudel and Broeksema [9], treemaps can be characterized as the application of a space-filling rectangular layout to each level of the hierarchy. As such, our approach compares to cartograms only as far as the latter result in rectangular space-filling layouts.

The study of treemap techniques reveals that many of them are motivated by a specific central idea that is used both to design the algorithm generating the treemap and to define the appropriate metrics to assess the quality of the suggested improvements. For example, trying to keep the aspect ratio close to 1 gave birth to the squarified treemaps.
It was found to lead to a more accurate assessment of quantitative data in rectangular layouts than alternate slicing-based treemap algorithms [10]. Therefore, it appears to be a suitable metric to compare alternate layouts of geolocated quantitative data. The study of previous Spatially Dependent Treemap (SDT) techniques [11, 12] shows a similar methodological approach. Besides the typical aspect ratio metric, geographically related measures, such as geolocation of nodes and spatial discontinuities, are included to inform the construction and the evaluation of a new SDT algorithm. We have followed a similar methodological approach and extended it with ideas from other fields. This related work section is therefore organized with this in mind.

Limiting spatial discontinuities in treemap layouts is a first objective that has been pursued. Wood and Dykes [11] highlight a specific challenge of treemaps which influences their spatial consistency: they represent one-dimensional sequence data on a two-dimensional display space, e.g. when nodes are sorted according to their weight. If not properly considered, this may generate treemaps where the distance between rectangles is not consistently related to their order in the sequence. If the node order is meaningful, any discrepancy in the rectangle order in the treemap layout may prevent the user from understanding the data effectively.

Further developing this idea, Wood and Dykes [11] propose the Spatially Ordered Treemaps technique (SOT), which takes into consideration the geolocation of nodes. They modify the squarified treemaps algorithm [13] so that nodes are placed based on their distance to the enclosing rectangle to be filled, rather than in the weight sequence order. In order to visually assess the potential spatial distortion generated by this approach, they overlay the treemap with Bézier arrows linking the original geolocation of each node to the rectangle representing it.
They also conduct a computational study where SOT are compared to squarified treemaps and to HistoMaps [12] with respect to their average aspect ratio, average distance displacement and average angular displacement. Initially designed for flat cartogram generation, the SOT algorithm has also been extended to handle multilevel hierarchies for various application domains [14].

First introduced by Keim et al., HistoMaps (HM) have been used to analyze the provenance of traffic by integrating the geographic location of message sources into a treemap layout [15]. The geolocations are then structured into a continent/country hierarchy which can be visualized as a treemap. More formally, the HistoMaps layout may be seen as a variation of the pivot-by-middle treemap layout [16] where nodes are sorted according to their latitude or longitude, depending on the layout direction, rather than by their weight. The nodes are split into chunks at the middle of the related longitude/latitude range. Another difference is that the point set is split into two chunks, rather than three (left, right and middle).

In contrast, the Weighted Maps approach (WM) may be considered as a variation of the pivot-by-split-size treemap layout [16] where nodes are also sorted according to their latitude or longitude, depending on the layout direction. As opposed to pivot-by-middle, pivot-by-split-size chunks elements for further recursive layout based on the weight or size of the chunk, i.e. each chunk represents roughly equal weight. In the Weighted Maps algorithm, the point set is split into k equally weighted chunks, rather than three as in the original pivot-by-split-size layout. The value of k is determined so that it leads to the most square top-level chunks, depending on the aspect ratio of the root node. It defaults to 2 on most computer monitors.
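The difference between the two split rules can be illustrated with a minimal Python sketch. It is not taken from either paper: the function names, the one-dimensional point representation and the midpoint-based chunk assignment are assumptions made for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (coordinate along the split axis, weight)


def total_weight(points: List[Point]) -> float:
    return sum(w for _, w in points)


def split_by_middle(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """HistoMaps-like rule: cut at the middle of the coordinate range."""
    pts = sorted(points)
    mid = (pts[0][0] + pts[-1][0]) / 2.0
    left = [p for p in pts if p[0] <= mid]
    right = [p for p in pts if p[0] > mid]
    return left, right


def split_by_weight(points: List[Point], k: int = 2) -> List[List[Point]]:
    """Weighted-Maps-like rule: k contiguous chunks of roughly equal weight.

    Each point goes to the chunk in which the midpoint of its cumulative
    weight interval falls (an illustrative choice; chunks may end up empty
    for very skewed weights).
    """
    pts = sorted(points)
    target = total_weight(pts) / k
    chunks: List[List[Point]] = [[] for _ in range(k)]
    acc = 0.0
    for x, w in pts:
        index = min(k - 1, int((acc + w / 2.0) / target))
        chunks[index].append((x, w))
        acc += w
    return chunks


points = [(0, 3), (1, 3), (2, 2), (3, 2), (10, 2)]
left, right = split_by_middle(points)
print(total_weight(left), total_weight(right))            # 10 2
print([total_weight(c) for c in split_by_weight(points)])  # [6, 6]
```

With an outlying point at coordinate 10, the middle-of-range cut produces two chunks of very different weight, whereas the weight-based cut balances them, which is what keeps the resulting rectangles closer to squares.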

Our aim to include spatial constraints into treemaps relates to the substantial amount of research about cartograms. Tobler's extensive review of cartograms [4] points out adjacency preservation as another potentially useful quality criterion. That is, how many neighbors of a given area A on the geographic map are still neighbors of A in the treemap representation? Focusing on adjacency preservation, Buchin et al. [17] produce adjacency preserving treemaps. They use a graph structure to formulate the cartogram generation problem as a graph optimization problem. They define three types of adjacencies in two-level hierarchies: top-level adjacencies, internal bottom-level adjacencies and external bottom-level adjacencies. Their approach preserves the first two types of adjacencies, but compromises on the proportionality between node weights and rectangle areas in this process (this compromise is also known as cartographic error).

Another closely related approach from the cartogram domain is rectangular cartograms, proposed by van Kreveld and Speckmann [18, 19]. A generalization of rectangular cartograms is rectilinear cartograms, for which an algorithm is proposed by de Berg et al. [20]. Buchin et al. [21] propose further optimization strategies for rectangular cartograms, resulting in cartograms with low error rates with respect to cartographic error and adjacency preservation.

Taking a pure treemap perspective, all three SDT approaches introduced earlier, and in particular the Weighted Maps, preserve weight-area proportionality, at the cost of some relative/absolute positional anomalies and some adjacency loss. Beyond flat cartograms, the SDT techniques, and in particular the Weighted Maps, can be applied recursively to handle multilevel hierarchies. Also, these cartogram approaches assign some of the layout space to seas and oceans in order to improve preserved adjacencies; the resulting layouts are therefore not strictly space-filling.
The rectilinear approach, in addition, allows for non-rectangular shapes, which puts it even further away from the treemap domain. Moreover, one of the open problems mentioned is the computational complexity of these approaches. Their examples seem to illustrate this, since they use relatively small datasets (up to a couple of hundred nodes). Buchin et al. [21], for example, report running times as long as 207 minutes for a cartogram of the world, which consists of only about 200 nodes. This seems to make the approach unsuitable for interactive exploration of large datasets such as the French communes (roughly 36,000 nodes).

We, on the other hand, try to optimize treemap layouts, or rectangular space-filling layouts in general, while taking into account some, but not all, geographic constraints. Furthermore, the interactive exploration of the data is of high importance to us. The need to generate such large cartograms in interactive time, and even larger ones, may arise in a variety of situations where fine-grained geolocated data is available, e.g., representing the power consumption of all city blocks in a country will result in millions of geolocated nodes. This means that the user should be able to switch between different variables in real time and that maps can be updated on the fly as new data comes in. Still, we draw further inspiration from the cartogram line of work for the validation of our results.

Other recent approaches [22, 23] explore the use of grid layouts of geographic subdivisions. For instance, Eppstein et al. [23] model the problem as a point set matching optimization problem. Their solution produces uniformly spaced grids that are suitable for displaying unweighted nodes. The general location of nodes with respect to the entire map (top, bottom, left, right), their pairwise adjacency, as well as their relative orientation are identified as relevant evaluation criteria.
In contrast, the Weighted Maps are designed for weighted geolocated data and preserve weight-area proportionality. By sorting the data points by latitude or longitude, the Weighted Maps manage, to some extent, to preserve the overall location of nodes on the map and their relative locations while keeping the average aspect ratio of rectangles at a minimum.

3. THE WEIGHTED MAPS ALGORITHM

Like HistoMaps, the Weighted Maps algorithm is inspired by the pivot variants introduced by Bederson et al. [16]. HistoMaps can be considered as a variation of the pivot-by-middle layout, where the pivot is not based on an element but on the middle value of the longitude or latitude range. Weighted Maps, on the other hand, can be considered as a variation of the pivot-by-split-size treemap algorithm insofar as the point set is split into bins of equivalent weight. We adapt the original algorithm by sorting the point set at each recursion step according to the longitude or latitude attribute, depending on the layout direction. We further adapt the algorithm by choosing the number of bins based on the aspect ratio of the enclosing area. The Weighted Maps algorithm calculates the weight that would result in the squarest chunk, and adds items until the closest approximation of this weight is reached. This is unlike HistoMaps, which uses a fixed 2-bin split. Additionally, like HistoMaps, the Weighted Maps layout does not have an explicit pivot element.

Intuitively, if the display area is split into squarish chunks, this may improve the average aspect ratio (which is conducive to a more accurate assessment of quantitative data [10]). The number of chunks depends on the global aspect ratio of the display area. A display space of aspect ratio R will generate k = ⌊R⌋ chunks (or k = ⌈R⌉, whichever is closer to R) at the

top level, each being further split in two chunks at the next recursion levels.

    // The main layout algorithm
    function layout(t, from, to)
        availablespace = new Rect(0, 0, state.width, state.height)
        currentchunk = new Chunk(phrase(null), availablespace)
        currentfrom = from
        result = [currentchunk]
        T = order(t, from, to)
        prevscore = -inf
        for (i = from; i < to; ++i)
            itemsize = size(t[i])
            curscore = score(currentchunk, itemsize)
            if (curscore < prevscore)
                currentchunk.reduce(availablespace)
                if (recurse(currentchunk))
                    recursivechunks = layout(t, currentfrom, to)
                    if (!recursivechunks.isempty())
                        result.pop()
                        result.append(recursivechunks)
                currentchunk = new Chunk(phrase(currentchunk), availablespace)
                result.append(currentchunk)
                currentfrom = i
                prevscore = score(currentchunk, itemsize)
            else
                prevscore = curscore
            currentchunk.additem(itemsize)
        if (currentfrom != from && recurse(currentchunk))
            recursivechunks = layout(t, currentfrom, to)
            if (!recursivechunks.isempty())
                result.pop()
                result.append(recursivechunks)
        return result

    // WM configuration
    function size(ti)
        return ti.population

    function order(p)
        if (state.a.w > state.a.h)
            return (order P by x coord.)
        else
            return (order P by y coord.)

    function score(chunk, itemsize)
        nbchunks = round(max(state.a.w, state.a.h) / min(state.a.w, state.a.h))
        nbchunks = (nbchunks < 2) ? 2 : nbchunks
        if (state.chunks.length == nbchunks)
            return MAX_SCORE
        else
            Cpref = state.a.w * state.a.h / nbchunks
            Cnew = chunk.area + itemsize
            return Cpref - Cnew

    function phrase(chunk)
        if (state.a.w > state.a.h)
            return (Left, Left_to_Right)
        else
            return (Top, Top_to_Bottom)

    function recurse(chunk)
        return chunk.itemcount > 1

Listing 1: The generic layout algorithm by Baudel and Broeksema [9] (left), and the WM configuration for it (right). Configuration points of the generic algorithm are marked in red. State is considered to be a global variable that is updated by the generic algorithm as required.
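The splitting strategy described in this section can also be sketched as a short, self-contained Python function. This is a simplified reading of the text rather than the implementation behind Listing 1: the tuple-based `Node` and `Rect` types, the helper `split_equal_weight`, and the assumption that x and y grow in the same direction in geographic and screen space are all illustrative choices.

```python
from typing import List, Tuple

Node = Tuple[str, float, float, float]    # (name, x, y, weight), weight > 0
Rect = Tuple[float, float, float, float]  # (left, top, width, height)


def split_equal_weight(items: List[Node], k: int) -> List[List[Node]]:
    """Cut an ordered list into k contiguous, non-empty chunks of similar weight."""
    k = min(k, len(items))
    total = sum(n[3] for n in items)
    chunks, start, acc = [], 0, 0.0
    for c in range(1, k):
        target = total * c / k      # cumulative weight aimed at after chunk c
        acc += items[start][3]      # every chunk takes at least one item
        end = start + 1
        # Keep adding items while that brings the running weight closer to the
        # target and still leaves one item for each remaining chunk.
        while (end < len(items) - (k - c)
               and abs(acc + items[end][3] - target) <= abs(acc - target)):
            acc += items[end][3]
            end += 1
        chunks.append(items[start:end])
        start = end
    chunks.append(items[start:])
    return chunks


def weighted_map(nodes: List[Node], rect: Rect,
                 top_level: bool = True) -> List[Tuple[str, Rect]]:
    """Recursively assign each node a rectangle whose area is proportional to its weight."""
    if len(nodes) == 1:
        return [(nodes[0][0], rect)]
    left, top, width, height = rect
    horizontal = width >= height
    # Sort by x when cutting along the width, by y otherwise.
    ordered = sorted(nodes, key=lambda n: n[1] if horizontal else n[2])
    # k follows the aspect ratio at the top level, then binary splits.
    k = max(2, round(max(width, height) / min(width, height))) if top_level else 2
    total = sum(n[3] for n in ordered)
    placed, offset = [], 0.0
    for chunk in split_equal_weight(ordered, k):
        share = sum(n[3] for n in chunk) / total
        if horizontal:
            sub = (left + offset * width, top, share * width, height)
        else:
            sub = (left, top + offset * height, width, share * height)
        offset += share
        placed.extend(weighted_map(chunk, sub, top_level=False))
    return placed


nodes = [("A", 0.0, 0.0, 4.0), ("B", 1.0, 0.2, 2.0),
         ("C", 2.0, 1.0, 1.0), ("D", 3.0, 0.8, 1.0)]
layout = dict(weighted_map(nodes, (0.0, 0.0, 200.0, 100.0)))
```

Because each chunk receives a slice of its parent rectangle proportional to its share of the parent weight, weight-area proportionality holds at every level, while the coordinate sort keeps western nodes on the left and, under the orientation assumption above, lower-y nodes at the top.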
These screen-space chunks correspond to bins including the closest points preserving weight-area proportionality. On most computer monitors, a full-screen WM display will result in a binary space partitioning tree. However, in the case of elongated countries like Argentina or Portugal, the layout may be deemed more plausible if it has the same global aspect ratio as the map representation of the country.

For the sake of reproducibility, we express the WM algorithm as a configuration of the generic treemap algorithm by Baudel and Broeksema [9], which we also used to implement our versions of HistoMaps and SOT. In Listing 1, we give the pseudocode of the generic algorithm and of the five functional dimensions of the generic algorithm: order, size, chunk (implemented by means of score), phrase and recurse. In the case of multilevel hierarchies, the algorithm is applied level by level to lay out top-level nodes first, then to lay out children nodes further down in the hierarchy in their parent space.

4. EVALUATION

In this section, we present a computational study comparing the Weighted Maps algorithm to two other spatially dependent treemap algorithms, namely the Spatially Ordered Treemaps (SOT) by Wood and Dykes [11] and the HistoMaps (HM) by Mansmann et al. [12]. In the body of visualization evaluation work, this study belongs to the Algorithm Performance (AP) category of Isenberg et al.'s taxonomy [24]. As a preliminary step, we reimplemented both the SOT and HM algorithms as described in their respective papers [11, 12]. In particular, the HM implementation evaluated in this work uses a two-bin partitioning schema, as in the original work by Mansmann et al.

4.1 Metrics

Based on the previously discussed related work, we use the following metrics for our experimental study:

1. average aspect ratio (less is better) defined as: $\bar{r} = \frac{1}{n} \sum_{i=1}^{n} r_i$, where $n$ is the number of leaf nodes and $r_i$ the aspect ratio of the $i$th leaf node;
2. average distance displacement (less is better) defined as: $\bar{d} = \frac{1}{n \sqrt{A_{root}}} \sum_{i=1}^{n} d_i$, where $n$ is the number of nodes, $A_{root}$ is the area of the root node, and $d_i$ is the Euclidean distance between each node's treemap centroid and its affine transformed geographic location;
3. average angular displacement (less is better) defined as: $\bar{\theta} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \arccos\left( \frac{u_{ij}}{\|u_{ij}\|} \cdot \frac{v_{ij}}{\|v_{ij}\|} \right)$, where $n$ is the number of nodes, $u_{ij}$ is the vector between each leaf node and each of its sibling leaves in treemap space, and $v_{ij}$ is the same vector in geographic space;
4. average adjacency preservation (more is better) defined as: $\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i$, where $n$ is the number of leaf nodes and $a_i$ the ratio of preserved neighbors of the $i$th leaf node;
5. average fragmentation (less is better) defined as: $\bar{f} = \frac{1}{n_p} \sum_{i=1}^{n_p} f_i$, where $n_p$ is the number of nodes at the parent level and $f_i$ is the number of fragments of the $i$th node at the parent level.

The first three metrics are used by Wood and Dykes to compare SOT to HM. Hence, our results can be compared to theirs directly. The fourth metric allows us to compare the three algorithms with respect to adjacency preservation and to existing adjacency optimized techniques. Adjacency preservation is used by van Kreveld and Speckmann [19] to validate their rectangular cartogram approach. Even though the Weighted Maps (or any of the SDTs for that matter) are oblivious of adjacency information, adjacency preservation is still included as a quality metric in our computational study. This gives us an additional angle to compare SDT approaches with respect to their cartographic properties. In addition, it lets us compare SDT approaches, to some extent, with cartogram approaches.
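The first three metrics can be computed with a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, geographic positions are assumed to be already transformed into treemap space, dividing the distance by the square root of the root area is an assumption about how the root area enters the normalization, and pairs with a zero-length vector (including i = j) are counted as zero angle.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def average_aspect_ratio(rects: List[Tuple[float, float]]) -> float:
    """rects holds (width, height); the ratio is long side over short side."""
    return sum(max(w, h) / min(w, h) for w, h in rects) / len(rects)


def average_distance_displacement(treemap_pts: List[Point],
                                  geo_pts: List[Point],
                                  root_area: float) -> float:
    """Mean centroid-to-geolocation distance, divided by sqrt(root_area)."""
    total = sum(math.dist(p, q) for p, q in zip(treemap_pts, geo_pts))
    return total / (len(treemap_pts) * math.sqrt(root_area))


def average_angular_displacement(treemap_pts: List[Point],
                                 geo_pts: List[Point]) -> float:
    """Mean angle (radians) between pairwise vectors in the two spaces."""
    n = len(treemap_pts)
    total = 0.0
    for i in range(n):
        for j in range(n):
            u = (treemap_pts[j][0] - treemap_pts[i][0],
                 treemap_pts[j][1] - treemap_pts[i][1])
            v = (geo_pts[j][0] - geo_pts[i][0],
                 geo_pts[j][1] - geo_pts[i][1])
            nu, nv = math.hypot(*u), math.hypot(*v)
            if nu == 0.0 or nv == 0.0:
                continue  # undefined direction, counted as zero angle
            cosine = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
            total += math.acos(max(-1.0, min(1.0, cosine)))
    return total / (n * n)


print(average_aspect_ratio([(2, 1), (1, 1), (1, 4)]))                     # 7/3
print(average_distance_displacement([(0, 0), (3, 4)], [(0, 0), (0, 0)], 100.0))  # 0.25
```

Clamping the cosine to [-1, 1] guards against floating-point round-off that would otherwise make `math.acos` raise a `ValueError` for nearly parallel vectors.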
Lastly, the fifth metric measures the fragmentation produced at the parent node level when a flat layout is computed at the leaf level, discarding any knowledge of the hierarchy. For example, in the case of the USA, the fragmentation rate is computed at the state level when the layout algorithm is applied on the flat county level. Such fragmentation can be regarded as an extreme manifestation of adjacency loss. We are not aware of other work using this metric.

4.2 Use Cases

In order to compare WM to SOT and HM, we used 18 real datasets (10 flat and 8 nested) concerning the USA and France, ranging from tens of nodes to tens of thousands. These datasets have significant differences both in how the points are geographically spread and in the distribution of weights. This is illustrated in Figure 1, which shows the point distribution for each of the point sets we used. We clearly see differences in how points are spatially distributed. For example, note how counties in the USA are almost normally distributed with respect to the Y-location, while in France at the canton level, the spread is more uniform, with a peak at It also shows two weight distributions for the USA at the county level. As with the point sets, the weight distributions are also significantly different. We observed similar significant differences for weight distributions at other levels as well, but lack the space to display all plots.

(a) Distribution of the different used point sets (b) Weight distributions on the USA county level
Figure 1: Various distributions related to the datasets used for evaluation.

For each dataset, the aspect ratio of the root node was set to match that of the bounding rectangle of the point set. The results obtained for the five metrics regarding these datasets are reported in Tables 1 to 6. A total of 54 table rows report the results of 54 distinct experimental settings identified by a configuration number in the leftmost column. Table rows have been grouped three at a time as they involve a common dataset (i.e. a common combination of point set and weighting attribute), while alternating through the three layout algorithms at hand. For statistical reliability purposes, we also report the standard deviation associated with all reported averages. For every dataset, we ran the paired two-tailed Student's t-test to assess the statistical significance of pair-wise differences between the three layout algorithms with p-value We ran this test with respect to each evaluation metric. When statistical significance could be ascertained at this level of confidence, we reported the best score in bold. When the best score and the first runner-up could not be distinguished with the required confidence, they were both reported in bold, provided that both of them could individually be distinguished reliably from the last. In all other cases, no emphasis was put on the scores.

USA

First, we examined the contiguous USA population and land area at the state level and at the county level. State and county population statistics were extracted from the geonames.org website [25]. The adjacency preservation metric was computed using the county adjacency graph provided by the US Census Bureau [26]. The resulting flat treemaps have either 49 nodes in the case of states (48 states + District of Columbia), or 3,109 nodes in the case of counties. They can be seen in Figures 2 and 3 respectively. The corresponding statistics appear in the top 6 rows of Tables 1 and 2.
The three layout algorithms can also be applied recursively to construct a nested treemap for the two-level hierarchy made of 49 top-level nodes and 3,109 leaves. The corresponding nested treemap snapshots can be seen in Figure 4. The related statistics appear in rows 7 to 9 of Table 1 and rows 16 to 18 of Table 2, where 3,109/49 in the second column indicates the tree structure, starting with the number of leaves, followed by the number of nodes at their parent level, and so forth up to the root level.

USA Population in 2010 (49 states and 3,109 counties)
Columns: #, Tree Structure, Layout, Aspect Ratio (mean, stdev), Linear Displ. (mean, stdev), Angular Displ. (mean, stdev), Adjacency (mean, stdev), Fragmentation (mean, stdev)
1 HM % 8.6% % 24.4%
/1 WM % 9.3% % 23.6%
SOT % 9.7% % 31.4%
HM % 7.5% % 22.2%
,109/1 WM % 7.3% % 20.8%
SOT % 10.5% % 16.9%
HM % 9.8% % 22.0%
,109/49/1 WM % 9.4% % 19.9%
SOT % 9.6% % 20.1% 1 0
Table 1: This table compares the flat and nested versions of the HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) on USA population data. In the nested case (bottom 3 rows), fragmentation is nonexistent by construction.

USA Land Area (49 states and 3,109 counties)
Columns: #, Tree Structure, Layout, Aspect Ratio (mean, stdev), Linear Displ. (mean, stdev), Angular Displ. (mean, stdev), Adjacency (mean, stdev), Fragmentation (mean, stdev)
10 HM % 5.7% % 20.7%
/1 WM % 7.7% % 16.8%
SOT % 9.2% % 27.7%
HM % 5.2% % 19.4%
,109/1 WM % 5.4% % 18.4%
SOT % 9.6% % 18.2%
HM % 6.7% % 21.1%
,109/49/1 WM % 7.6% % 19.1%
SOT % 9.7% % 20.6% 1 0
Table 2: This table compares the flat and nested versions of the HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) on USA land area data. In the nested case (bottom 3 rows), fragmentation is nonexistent by construction.

(a) HistoMaps (b) Weighted Maps (c) Spatially Ordered Treemap
Figure 2: The USA population in 49 states according to (a) the HistoMaps layout, (b) the Weighted Maps layout and (c) the Spatially Ordered Treemaps layout. The three subfigures correspond to the top three rows in Table 1 respectively.

(a) HistoMaps (b) Weighted Maps (c) Spatially Ordered Treemap
Figure 3: The USA population in 3,109 counties according to (a) the HistoMaps layout, (b) the Weighted Maps layout and (c) the Spatially Ordered Treemaps layout. Colors encode state membership consistently with Figure 2. The SOT layout creates strip patterns and severe state fragmentation. See rows 4 to 6 in Table 1 for the related statistics.

(a) Weighted Maps (b) Spatially Ordered Treemap
Figure 4: The USA population in 3,109 counties according to the nested version of (a) the WM layout and (b) the SOT layout. The state-level rectangles are the same as in Figure 2. See rows 8 and 9 in Table 1 for the related statistics.

A close inspection of Tables 1 and 2 reveals that SOT never outperforms HM and WM except once: with respect to the aspect ratio metric in the nested treemap case regarding the land area dataset (see the number in bold in row 18 of Table 2). With respect to small flat treemaps, either the results lack the statistical significance for a winner to clearly stand out (see the top three rows in Table 1), or there are ties between HM and WM on various metrics, shown by multiple values in bold in the same group (see the top three rows in Table 2). For large flat treemaps (see the middle three rows in Tables 1 and 2), WM ranks first for aspect ratio and linear displacement, while HM ranks first for adjacency preservation. Ties occur between HM and WM, as well as occasional wins for HM, regarding angular displacement and parent-level fragmentation.

Looking closely at Figure 2, one can see the underlying properties and flaws of the three layout approaches. For instance, the SOT layout (Figure 2 (c)) is characterized by strip patterns due to the stacking strategy common to all squarified treemap based approaches. Hence, vertical strips occupy the left half of the representation, then horizontal and vertical strips alternate in the right half. Some positional anomalies can be found in the SOT representation, such as New Mexico being placed to the North of Colorado, or having Rhode Island and Maine in the bottom right corner next to Florida. Similarly, taking a close look at the HM layout (Figure 2 (a)), one can see the top-level split in the middle of the longitude range as a vertical divide to the right of Texas.
A second-level split can be seen vertically to the right of Florida on the right-hand side and horizontally above Texas and New Mexico on the left-hand side. In the HM representation, examples of positional flaws include New Mexico being placed to the North of Arizona, or Oregon being placed to the West of Washington state. Similar remarks can be made concerning the WM layout (Figure 2 (b)): one can see a vertical top-level divide in the middle of the representation to the right of Wisconsin and Illinois, followed by horizontal second-level splits above California on the left-hand side and above Indiana and Florida on the right-hand side. In the WM representation, positional flaws include the fact that Alabama is placed to the Northwest of Tennessee.

In Figure 3, flat treemap representations of the 3,109 US counties are displayed as generated by the three layout algorithms. The strip patterns of SOT (Figure 3 (c)) are even more obvious and result in severely ragged state contours and high state fragmentation (13.5 fragments per state on average, as reported in row 6 of Table 1). In the HM layout (Figure 3 (a)) and WM layout (Figure 3 (b)), state fragmentation is also visible, but is rather mild (2.2 and 3.0 fragments per state on average respectively, as reported in rows 4 and 5 of Table 1). Previous remarks on positional anomalies still apply.

In Figure 4, we show the 2-level treemap representation of the USA county population as generated by the WM and SOT layouts. At the state level, the space is subdivided exactly as in Figure 2. Further down, the leaf/county nodes are laid out using an extra recursion of the same layout algorithms. By keeping the branching factor of each subtree in the order of a hundred nodes, the nested version of SOT is much more readable than the flat version in Figure 3 (c).

France

With 35,955 communes, metropolitan France is by far the first European country by the number of communes.
Hence, it qualifies as a good benchmark for spatial layout algorithms. The number of nodes at the commune level is between one and two orders of magnitude greater than in the USA states and counties datasets respectively. At the upper levels of its administrative subdivisions, metropolitan France is divided into 21 regions, not including Corsica, which are in turn subdivided

in 94 departments. Hence, the French departments data has the same scale as the USA data at the state level. Further, metropolitan France is subdivided into 3,666 cantons, which is quite comparable to the 3,109 USA counties scale-wise. We considered the 2012 population statistics according to the most recent data published by the IGN [27], as well as the land area of the communes. Upper-level population and land area data have been computed by bottom-up aggregation of the communes data. The aspect ratio of the root node was set to match the bounding rectangle of the point set at hand.

Figure 5: The 2012 population in 35,955 French communes according to the flat WM layout. The black contours delimit the 21 top-level regions, making region-level fragmentation visible. Within a region, different colors encode different departments. See row 26 in Table 3 for the related statistics.

The results of the flat layouts are summarized in Tables 3 and 4. Similar to the USA use case, the results lack statistical significance to ascertain the superiority of any of the three algorithms for the layout of the 94 French departments. HM and WM tie concerning linear and angular displacement at the department level using land area. The SOT layout ranks last for larger flat treemaps at the canton and commune levels, using both the population and the land area for node weighting. At the canton level, WM ranks first with respect to average aspect ratio and average linear displacement using both population and land area for weighting, while HM ranks first regarding adjacency preservation. Ties occur between WM and HM with respect to angular displacement and parent-level fragmentation. At the commune level, WM also ranks first with respect to average aspect ratio; it ranks first with respect to average linear displacement using population data and

Figure 6: The 2012 population in 35,955 French communes according to the flat SOT (left) and HM (right) layouts. The black contours delimit the 21 top-level regions. Within a region, different colors encode different departments. See rows 25 and 27 in Table 3 respectively for the related statistics.

Flat Treemap Layouts of the French Population in 2012
#   Tree Structure   Layout   Aspect Ratio (mean, stdev)   Linear Displ. (mean, stdev)   Angular Displ. (mean, stdev)   Adjacency (mean, stdev)   Fragmentation (mean, stdev)
19  94/1             HM       % 7.1% % 19.2%
20  94/1             WM       % 6.6% % 21.7%
21  94/1             SOT      % 8.6% % 24.0%
22  3,665/1          HM       % 6.5% % 22.8%
23  3,665/1          WM       % 9.4% % 22.3%
24  3,665/1          SOT      % 10.3% % 17.4%
25  35,955/1         HM       % 6.9% % 27.7%
26  35,955/1         WM       % 9.5% % 26.7%
27  35,955/1         SOT      % 11.9% % 13.5%
Table 3: This table compares HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) for the generation of flat cartograms on the French metropolitan population in 2012 within the 94 departments, the 3,665 cantons and the 35,955 communes respectively.

Flat Treemap Layouts of the French Land Area
#   Tree Structure   Layout   Aspect Ratio (mean, stdev)   Linear Displ. (mean, stdev)   Angular Displ. (mean, stdev)   Adjacency (mean, stdev)   Fragmentation (mean, stdev)
28  94/1             HM       % 4.9% % 18.6%
29  94/1             WM       % 5.3% % 18.9%
30  94/1             SOT      % 6.0% % 20.4%
31  3,665/1          HM       % 6.5% % 20.3%
32  3,665/1          WM       % 6.4% % 19.1%
33  3,665/1          SOT      % 10.5% % 18.2%
34  35,955/1         HM       % 6.5% % 30.8%
35  35,955/1         WM       % 6.3% % 29.2%
36  35,955/1         SOT      % 11.2% % 14.3%
Table 4: This table compares HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) for the generation of flat cartograms on the French land area within the 94 departments, the 3,665 cantons and the 35,955 communes respectively.

ranks second in the case of land area data. HM ranks first with respect to angular displacement, adjacency preservation and parent-level fragmentation using both the population and the land area data. Figure 5 shows the flat treemap generated by the Weighted Maps layout for the French communes. Figure 6 shows the corresponding flat treemap layouts generated by SOT (left) and HM (right). The SOT treemap is severely affected by thin strip patterns and parent-level fragmentation. In contrast to the WM layout in Figure 5, HM seems to generate elongated rectangles in Figure 6 around large cities such as Paris, Marseille and Toulouse, but both layouts look quite similar overall.

The results of the nested layouts are summarized in Tables 5 and 6. The number of leaves and parent nodes in each experimental setting is indicated in the second column. For France, we study the performance of all three layouts for 2-, 3- and 4-level hierarchies. It is worth noting that the average aspect ratio of the SOT layout becomes comparable to that of HM and WM. Occasionally, nested SOT ranks first for average aspect ratio, but only when dealing with land area distribution (see the last row in Tables 2 and 6). We do not show the nested layouts of France in this paper for the sake of space.

Nested Treemap Layouts of the French Population in 2012
#   Tree Structure       Layout   Aspect Ratio (mean, stdev)   Linear Displ. (mean, stdev)   Angular Displ. (mean, stdev)   Adjacency (mean, stdev)   Fragmentation (mean, stdev)
37  94/21                HM       % 7.2% % 21.7%
38  94/21                WM       % 9.4% % 22.1%
39  94/21                SOT      % 15.3% % 22.6%
40  3,665/94/21          HM       % 8.0% % 21.9%
41  3,665/94/21          WM       % 10.1% % 21.3%
42  3,665/94/21          SOT      % 13.6% % 21.4%
43  35,955/3,665/94/21   HM       % 7.8% % 26.3%
44  35,955/3,665/94/21   WM       % 9.8% % 25.8%
45  35,955/3,665/94/21   SOT      % 14.0% % 25.0% 1 0
Table 5: This table compares the nested versions of the HistoMaps, Weighted Maps and Spatially Ordered Treemaps algorithms on multi-level hierarchies showing the French population within the 21 regions, 94 departments, the 3,665 cantons and the 35,955 communes respectively.
Fragmentation is nonexistent by construction.

Nested Treemap Layouts of the French Land Area
#   Tree Structure       Layout   Aspect Ratio (mean, stdev)   Linear Displ. (mean, stdev)   Angular Displ. (mean, stdev)   Adjacency (mean, stdev)   Fragmentation (mean, stdev)
46  94/21                HM       % 6.3% % 20.6%
47  94/21                WM       % 6.2% % 19.4%
48  94/21                SOT      % 9.1% % 22.0%
49  3,665/94/21          HM       % 7.4% % 21.4%
50  3,665/94/21          WM       % 7.5% % 20.2%
51  3,665/94/21          SOT      % 9.9% % 20.6%
52  35,955/3,665/94/21   HM       % 7.5% % 27.1%
53  35,955/3,665/94/21   WM       % 7.6% % 26.4%
54  35,955/3,665/94/21   SOT      % 10.0% % 25.3% 1 0
Table 6: This table compares the nested versions of the HistoMaps, Weighted Maps and Spatially Ordered Treemaps algorithms on multi-level hierarchies showing the French land area within the 21 regions, 94 departments, the 3,665 cantons and the 35,955 communes respectively. Fragmentation is nonexistent by construction.

5. DISCUSSION

We place our work in the line of treemap algorithms; thus, the algorithm does not allow for any cartographic error. That is, the area of each rectangle in the final layout is exactly proportional to the weight it represents. As such, we cannot expect results for geographic metrics as good as those of approaches based on or inspired by rectangular cartograms. 18

Bederson et al. observed 16 two problems with cluster and squarified treemap layouts: changes in data can cause dramatic discontinuities in the produced layouts, and these algorithms do not take into account explicit order information that is part of the data. To address these problems, they propose several layout algorithms, among which are the pivot-based layouts (i.e. pivot-by-size, pivot-by-middle, and pivot-by-split-size). Given that these algorithms explicitly address the order of data, it is not surprising that they have inspired the creation of both the Weighted Maps algorithm and the HistoMaps algorithm.

We started our work with two hypotheses:

1. Using land area as weight leads to minimal error with respect to geographic metrics (displacement, adjacency preservation, and fragmentation).
2. Degradation of adjacency preservation is minimized when overall relative location constraints (linear and angular displacement) are met.

Unsurprisingly, our first hypothesis seems to be correct given the data we have tested. When looking at the USA data in Tables 1 and 2, we see that all algorithms perform better with respect to linear and angular displacement, adjacency preservation, and fragmentation in the land area case. One exception is SOT, for which angular displacement degrades in the States and the States/Counties cases. We see similar results for the French datasets.

For the second hypothesis, the evidence is not strong enough to support firm claims. When we look at each case with the highest adjacency preservation, we see that it typically also has the lowest values for linear displacement and angular displacement. For the USA data, these lowest values are significant in 75% of the cases. For the France data, we see a similar pattern, though the lowest values account for 58% of the cases.

The results concerning the USA population and the land distribution of the French departments (see the top 3 rows of Tables 1 and 4) are consistent with those published by Wood and Dykes 11 with respect to the average aspect ratio, average linear displacement and average angular displacement. This confirms that our implementation of SOT is correct, and that our implementation of HM is also comparable to theirs. Hence, the comparison of the Weighted Maps algorithm to both the SOT and HM algorithms regarding other (larger) datasets and additional quality metrics, namely adjacency preservation and fragmentation, can complement, and be interpreted in the light of, the previous study by Wood and Dykes.
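To make the displacement metrics discussed above concrete, the following Python sketch computes an average linear displacement and an average angular displacement between geographic anchor points and treemap rectangle centers. It is only an illustration under stated assumptions: both point sets are rescaled to the unit square, linear displacement is normalized by the unit-square diagonal, and angular displacement is the mean absolute difference between pairwise bearings. The function names and these normalization choices are hypothetical and may differ from the definitions used to produce the tables.

```python
import math
from itertools import combinations


def normalize(points):
    """Rescale points into the unit square using their bounding box (assumed convention)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    width = (max(xs) - min_x) or 1.0   # avoid division by zero for degenerate sets
    height = (max(ys) - min_y) or 1.0
    return [((x - min_x) / width, (y - min_y) / height) for x, y in points]


def linear_displacement(geo, layout):
    """Mean distance between matching points, as a fraction of the unit-square diagonal."""
    g, l = normalize(geo), normalize(layout)
    diagonal = math.sqrt(2.0)
    return sum(math.dist(a, b) for a, b in zip(g, l)) / (len(g) * diagonal)


def angular_displacement(geo, layout):
    """Mean absolute bearing difference, in degrees, over all pairs of nodes."""
    total, count = 0.0, 0
    for i, j in combinations(range(len(geo)), 2):
        a = math.atan2(geo[j][1] - geo[i][1], geo[j][0] - geo[i][0])
        b = math.atan2(layout[j][1] - layout[i][1], layout[j][0] - layout[i][0])
        diff = abs(a - b) % (2.0 * math.pi)
        diff = min(diff, 2.0 * math.pi - diff)  # fold into [0, pi]
        total += math.degrees(diff)
        count += 1
    return total / count


if __name__ == "__main__":
    geo = [(0, 0), (1, 0), (0, 1), (1, 1)]
    mirrored = [(1, 0), (0, 0), (1, 1), (0, 1)]  # layout flipped left to right
    print(linear_displacement(geo, mirrored), angular_displacement(geo, mirrored))
```

A layout identical to the geographic point set yields zero for both measures, while a mirrored layout is penalized on both, which is the behavior the second hypothesis relies on.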
The results also show that the gap between SOT, HM and WM is rather small for small one-level hierarchies having up to a hundred nodes, all metrics considered. This is very much the case with the 2010 population in 49 USA states. The resulting treemaps can be seen in Figure 2. A careful inspection reveals that all algorithms have some localized flaws. For example, SOT places New Mexico to the north of Colorado, while WM places Utah to the north of Idaho. WM and HM may seem closer to reality than SOT regarding northeastern states (Vermont, Maine, New Hampshire, Massachusetts, New York), while SOT may be better with northwestern states. HM performs very similarly to WM. The treemaps produced at this level by SOT, HM and WM are still visually plausible. Subtle discrepancies in relative rectangle positions may be accounted for by the choice of anchor/reference points on the map representation. More precisely, the question is: what is the geolocation of a top-level administrative entity, such as a state? Is it the centroid of the geolocations of its sub-entities, or some other location such as the state capital, which may be far away from the centroid? Depending on the choice of anchor points, the resulting relative rectangle locations in the treemap layout may vary. Regarding small one-level hierarchies, there are only small differences on average between SOT, HM, and WM, however statistically significant they may be. For example, the difference in average angular displacement between the three algorithms does not exceed 12 degrees, which is objectively small. Also, the differences in normalized average distance displacement never exceed 10%. The average aspect ratio seems to resist this conclusion concerning small hierarchies in one instance: weighting the USA states with land area figures significantly degrades the average aspect ratio produced by both the WM and SOT layouts (see Table 2).
Only HM is not affected as strongly with respect to aspect ratio in this case. For both SOT and WM, the standard deviation figures show very high dispersion, indicating the presence of large outliers and that the average is not a reliable statistical measure in this case. The average values reported in this case are affected by the presence of a single outlier value of 184 for WM, and two outlier values at 21.8 and 68.5 for SOT.

Like HM, WM becomes consistently much more competitive than SOT for the flat layout of thousands of nodes (the average aspect ratio improvement ranges between 2x and 15x). This is due to the fact that SOT is based on the squarified treemap algorithm, which places nodes in vertical or horizontal strips. These strips become very thin when individual node weights become small compared to the total sum of weights at the parent/root level. This issue appears clearly in Figure 3 regarding the 2010 American population laid out at the county level (3,109 leaf nodes), all five metrics considered. It is even more severe in Figure 6 (left), regarding the 2012 French population laid out at the commune level (35,955 leaf nodes). In this last case, the aspect ratio standard deviation indicates a lot of dispersion. We have attempted to improve the SOT algorithm by optimizing the average aspect ratio per strip rather than that of the smallest item, as in the original algorithm. While this strategy improves the average aspect ratio of SOT, it does not mitigate the fundamental weakness of this algorithm due to strip construction: the improved SOT still lags behind WM in terms of average aspect ratio when dealing with large one-level hierarchies.
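The effect of strip construction on aspect ratio can be illustrated with a simplified model of a single strip. The sketch below assumes a strip that spans the full width of its container and whose thickness is proportional to the share of the total weight it holds. It is not the SOT implementation, and the function name and parameters are hypothetical.

```python
def strip_aspect_ratios(strip_weights, total_weight, width=1.0, height=1.0):
    """Aspect ratios (>= 1) of the rectangles in one full-width strip.

    Assumption: the container is width x height and its whole area is shared
    among nodes in proportion to their weights, so a strip holding a weight
    sum s has thickness height * s / total_weight.
    """
    strip_sum = sum(strip_weights)
    thickness = height * strip_sum / total_weight
    ratios = []
    for weight in strip_weights:
        # Each node takes a share of the strip length proportional to its weight.
        length = width * weight / strip_sum
        ratios.append(max(length / thickness, thickness / length))
    return ratios


if __name__ == "__main__":
    # Two small nodes alone in a strip: the strip is thin, the nodes are slivers.
    print(strip_aspect_ratios([1, 1], total_weight=100))
    # Ten such nodes in one strip: the strip is thicker, the nodes are squares.
    print(strip_aspect_ratios([1] * 10, total_weight=100))
```

For instance, in a unit square with a total weight of 100, a strip holding only two nodes of weight 1 is 0.02 thick while each node is 0.5 long, giving an aspect ratio of 25, whereas a strip holding ten such nodes yields squares. The smaller the node weights relative to the total, the more nodes a strip needs before its rectangles stop being slivers.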

In the case of multilevel hierarchies, nested SOT has a much smaller gap with HM and WM since, at each recursion, the number of nodes to be laid out remains close to the comfort zone of SOT. This can be seen in most use cases in the bottom three rows of Tables 1, 2 and 5, where nested WM, nested HM, and nested SOT are rather close, except for the case of the USA states 2010 population anomaly explained earlier. As the branching factor increases, both nested HM and nested WM tend to stand ahead of nested SOT, as with the 3,666 French cantons and the 35,955 French communes.

Following Eppstein et al., 23 we also evaluated HM, WM and SOT with respect to adjacency preservation, even though they were not designed with this goal in mind. As reported in the tables, WM achieves adjacency preservation values between 40% and 65%, mostly greater than 50%, while SOT scores range between 8% and 52%, with values mostly less than 50%. It is important to note that WM scores are rather stable, close to 50% regardless of the dataset at hand, while SOT performance degrades heavily as the scale increases, reaching a median value of 0% adjacency preservation for the French communes (i.e. half the nodes have no adjacencies preserved at all). However, when it comes to adjacency preservation, HM outperforms WM in most cases. Unsurprisingly, all three algorithms fall behind the adjacency-optimized approach of Eppstein et al., which achieves 75% adjacency preservation on small hierarchies. In their work, Eppstein et al. showcase their approach on the 49 contiguous USA states and on the French departments. Based on the high time complexity reported in their paper, the layout of bigger hierarchies like the French communes seems intractable.

Concerning large one-level hierarchies, such as the US counties and the French communes, we measured the fragmentation rate of top-level groups (e.g.
USA states and French departments) that results from applying the layout algorithm on the leaf level directly. The less fragmentation, the better the algorithm. In this regard, WM appears to be always much better than SOT (3x to 10x less fragmentation) and is similar to HM. Once again, parent-level fragmentation is aggravated by the strip construction strategy underlying SOT, whereas the recursive space partitioning strategy of WM increases the likelihood that nodes allocated to different partitions remain contiguous when their respective geolocations are close.

Finally, with respect to geographic features, HistoMaps outperform Weighted Maps most of the time. The differences are mostly not very large, but often statistically significant for larger datasets. This raises the question of whether Weighted Maps is an improvement at all. In retrospect, we have found that, in some sense, we have reproduced results from the Ordered and Quantum treemaps paper. 16 When we compare HistoMaps and Weighted Maps, we see that the latter always gives better aspect ratios. Recall that both of these algorithms have drawn inspiration from the pivot layouts: HistoMaps from pivot-by-middle and Weighted Maps from pivot-by-split-size. Looking at the results for the pivot layouts, we also see that, in general, pivot-by-split-size gives better aspect ratios than pivot-by-middle. This difference could be an explanation for the better performance of HistoMaps with respect to geographical features: elongated rectangles allow for more neighbors. The trade-off that has been shown in Bederson et al.'s paper is between aspect ratio and change (pivot-by-split-size is more sensitive to changes in the data). So we could expect a similar trade-off between HistoMaps and Weighted Maps as well, though we have not tested this.

6. CONCLUSION

In this work, we presented the Weighted Maps, a spatially aware treemap algorithm.
It gives consistently better aspect ratios than existing algorithms for datasets that consist of large flat hierarchies, and behaves equally well otherwise. Moreover, the output of Weighted Maps tends to be aesthetically pleasing for large flat hierarchies, while other existing algorithms, e.g. Spatially Ordered Treemaps, are undermined by inherent strip patterns. Weighted Maps can be considered an alternative to HistoMaps, where the trade-off is between aspect ratio and geographic correctness. Future work includes resolving the fragmentation problem observed earlier and further assessment of Weighted Maps through user studies and simulated weight distributions.

REFERENCES

[1] Schulz, H.-J., Treevis.net: A tree visualization reference, IEEE Computer Graphics and Applications 31(6) (2011).
[2] Shneiderman, B., Tree visualization with tree-maps: 2-d space-filling approach, ACM Trans. Graph. 11 (Jan. 1992).
[3] Liu, Z. and Stasko, J., Mental models, visual reasoning and interaction in information visualization: A top-down perspective, Visualization and Computer Graphics, IEEE Transactions on 16(6) (2010).

[4] Tobler, W., Thirty five years of computer cartograms, Annals of the Association of American Geographers 94(1) (2004).
[5] Dorling, D., Barford, A., and Newman, M., Worldmapper: The world as you've never seen it before, Visualization and Computer Graphics, IEEE Transactions on 12(5) (2006).
[6] Jern, M., Rogstadius, J., and Astrom, T., Treemaps and choropleth maps applied to regional hierarchical statistical data, in [Information Visualisation, th International Conference] (2009).
[7] Zhao, J., Forer, P., and Harvey, A. S., Activities, ringmaps and geovisualization of large human movement fields, Information Visualization 7(3-4) (2008).
[8] Speckmann, B. and Verbeek, K., Necklace maps, Visualization and Computer Graphics, IEEE Transactions on 16(6) (2010).
[9] Baudel, T. and Broeksema, B., Capturing the design space of sequential space-filling layouts, Visualization and Computer Graphics, IEEE Transactions on 18(12) (2012).
[10] Kong, N., Heer, J., and Agrawala, M., Perceptual guidelines for creating rectangular treemaps, Visualization and Computer Graphics, IEEE Transactions on 16 (Nov 2010).
[11] Wood, J. and Dykes, J., Spatially ordered treemaps, Visualization and Computer Graphics, IEEE Transactions on 14(6) (2008).
[12] Mansmann, F., Keim, D., North, S., Rexroad, B., and Sheleheda, D., Visual analysis of network traffic for resource planning, interactive monitoring, and interpretation of security threats, Visualization and Computer Graphics, IEEE Transactions on 13(6) (2007).
[13] Bruls, M., Huizing, K., and Wijk, J., Squarified treemaps, in [Data Visualization 2000], Leeuw, W. and Liere, R., eds., Eurographics, 33-42, Springer Vienna (2000).
[14] Slingsby, A., Dykes, J., and Wood, J., Rectangular hierarchical cartograms for socio-economic data, Journal of Maps 6(1) (2010).
[15] Keim, D. A., Mansmann, F., Panse, C., Schneidewind, J., and Sips, M., Mail explorer - spatial and temporal exploration of electronic mail, in [Proceedings of the Seventh Joint Eurographics / IEEE VGTC Conference on Visualization], EUROVIS'05, Eurographics Association, Aire-la-Ville, Switzerland (2005).
[16] Bederson, B. B., Shneiderman, B., and Wattenberg, M., Ordered and quantum treemaps: Making effective use of 2d space to display hierarchies, ACM Trans. Graph. 21 (Oct. 2002).
[17] Buchin, K., Eppstein, D., Löffler, M., Nöllenburg, M., and Silveira, R. I., Adjacency-preserving spatial treemaps, in [Algorithms and Data Structures], Dehne, F., Iacono, J., and Sack, J.-R., eds., Lecture Notes in Computer Science 6844, Springer Berlin Heidelberg (2011).
[18] Speckmann, B., Kreveld, M. V., and Florisson, S., A linear programming approach to rectangular cartograms, in [12th International Symposium on Spatial Data Handling], Riedl, A., Kainz, W., and Elmes, G. A., eds., Springer Berlin Heidelberg (2006).
[19] van Kreveld, M. and Speckmann, B., On rectangular cartograms, Computational Geometry 37 (Aug. 2007).
[20] De Berg, M., Mumford, E., and Speckmann, B., Optimal bsps and rectilinear cartograms, International Journal of Computational Geometry & Applications 20(02) (2010).
[21] Buchin, K., Speckmann, B., and Verdonschot, S., Evolution strategies for optimizing rectangular cartograms, Geographic Information Science 7478(639) (2012).
[22] Wood, J., Badawood, D., Dykes, J., and Slingsby, A., Ballotmaps: Detecting name bias in alphabetically ordered ballot papers, Visualization and Computer Graphics, IEEE Transactions on 17(12) (2011).
[23] Eppstein, D., van Kreveld, M., Speckmann, B., and Staals, F., Improved grid map layout by point set matching, in [Visualization Symposium (PacificVis), 2013 IEEE Pacific] (2013).
[24] Isenberg, T., Isenberg, P., Chen, J., Sedlmair, M., and Moller, T., A systematic review on the practice of evaluating visualization, Visualization and Computer Graphics, IEEE Transactions on 19(12) (2013).
[25] Geonames, The geonames geographic database. (2012). Accessed:
[26] US Census Bureau, County adjacency file. (2013). Accessed:
[27] Institut National de l'Information Géographique et Forestière, GEOFLA®. (2012). Accessed: