Weighted Maps: treemap visualization of geolocated quantitative data
Mohammad Ghoniem, Maël Cornil, Bertjan Broeksema, Mickaël Stefas and Benoît Otjacques
CRP Gabriel Lippmann, 41 rue du Brill, L-4422 Belvaux, Luxembourg

ABSTRACT

A wealth of census data relating to hierarchical administrative subdivisions is now available. It is therefore desirable for hierarchical data visualization techniques to offer a spatially consistent representation of such data. This paper focuses on a widely used technique for hierarchical data, namely treemaps, with particular emphasis on a family of treemaps designed to take spatial constraints into account in the layout, called Spatially Dependent Treemaps (SDT). The contributions of this paper are threefold. First, we present Weighted Maps, a novel SDT layout algorithm, and discuss its algorithmic differences from other state-of-the-art SDT algorithms. Second, we present quantitative results and analyses for a number of metrics used to assess the quality of the resulting layouts. The analyses are illustrated with figures generated from various datasets. Third, we show that the Weighted Maps algorithm offers a significant advantage for the layout of large flat cartograms and of multilevel hierarchies with a large branching factor.

Keywords: Treemaps, spatial consistency, tree visualization, cartograms

1. INTRODUCTION

Broadly speaking, two main approaches are used to visualize hierarchical data: node-link diagrams and recursive enclosure of shapes (see the Treevis.net project [1] for a comprehensive list of techniques). This work falls into the second category, where treemaps [2] are the best-known technique. Our purpose is to contribute to research on treemap layouts that take the spatial properties of nodes into account. We introduce a new treemap layout algorithm that does so.
Next, we present the results of an extensive quantitative study in which we compare our treemap layout with two existing ones. We discuss these results to show how different approaches lead to different trade-offs across various metrics.

To illustrate the problem, consider the recursive split of a territory into administrative subdivisions. Such hierarchies can easily be represented by a treemap in which the size of each rectangle encodes an additive quantitative attribute of the region, such as population. However, with standard treemap layout algorithms, node positions are not correlated with the nodes' known geolocations. This undermines the ability of users who are accustomed to maps to get a meaningful overview of the whole territory, or to rapidly locate specific regions using their prior knowledge of geography.

For geographically trained users, spatial consistency is of prime importance when visualizing geolocated data. Finding geographic entities roughly where the user expects them to be, based on prior knowledge, is crucial to the external anchoring process described by Liu and Stasko [3], whereby the user couples internal and external representations of the data and the world. In turn, successful external anchoring is conducive to memory offloading, since the user does not have to memorize the on-screen locations of the geolocated entities. When considering multivariate data that include spatial properties, spatially consistent visualizations are expected to let the user answer questions about abstract data trends and spatial data distribution, as well as about specific attribute values attached to certain locations or regions. For instance, in a visual representation of the USA, it is better to have Seattle located at the top left and Houston at the bottom center if the user is interested in examining their population or employment figures.
Likewise, it is helpful to find the counties of California grouped in the bottom left corner of a treemap representation of the USA if one is interested in how schools are distributed across counties in that state.

In this paper, we group all treemap algorithms designed to include spatial constraints in the layout under the generic designation of Spatially Dependent Treemaps (SDT). When appropriate, we use specific algorithm names.

Further author information: send correspondence to Mohammad Ghoniem.
2. RELATED WORK

Visualizing the values of abstract attributes attached to geographic subdivisions within a two-dimensional space is not new: geographers have faced this challenge for centuries. The inclusion of spatial constraints into treemaps can therefore be situated, to some extent, within the substantial body of research on cartograms. Tobler's review [4] of cartograms summarizes the work in this domain up to 2004, mainly carried out by geographers. In most cases, geographic maps are the starting point for work on cartograms: maps may be distorted [5] and/or graphically enriched to convey more than purely spatial information.

The information visualization community has also studied this issue. In this case, however, maps are not necessarily the starting point. Various visualization techniques (e.g., graphs, treemaps) may be the artifact that is enhanced to display both spatial and non-spatial information. Recent work combining information visualization and geovisualization includes choropleth maps coordinated with squarified treemaps [6], Ring maps [7], and the overlay of information visualization elements on maps, such as Necklace maps [8]. Our work follows the infovis approach, with treemaps as the specific technique to study and improve. As pointed out by Baudel and Broeksema [9], treemaps can be characterized as the application of a space-filling rectangular layout to each level of the hierarchy. As such, our approach is comparable to cartograms only insofar as the latter produce rectangular space-filling layouts.

The study of treemap techniques reveals that many of them are motivated by a specific central idea that is used both to design the algorithm generating the treemap and to define the metrics appropriate for assessing the quality of the suggested improvements. For example, the goal of keeping the aspect ratio close to 1 gave birth to squarified treemaps.
Squarified layouts were found to lead to a more accurate assessment of quantitative data in rectangular layouts than alternative slicing-based treemap algorithms [10]. The aspect ratio therefore appears to be a suitable metric for comparing alternative layouts of geolocated quantitative data. The study of previous Spatially Dependent Treemap (SDT) techniques [11, 12] shows a similar methodological approach: besides the typical aspect ratio metric, geographically related measures, such as the geolocation of nodes and spatial discontinuities, are used to inform both the construction and the evaluation of a new SDT algorithm. We have followed the same methodological approach and extended it with ideas from other fields; this related work section is organized accordingly.

Limiting spatial discontinuities in treemap layouts is a first objective that has been pursued. Wood and Dykes [11] highlight a specific challenge of treemaps that affects their spatial consistency: they represent one-dimensional sequence data in a two-dimensional display space, e.g., when nodes are sorted according to their weight. If this is not properly considered, the resulting treemaps may place rectangles at distances that are not consistently related to their order in the sequence. If the node order is meaningful, any discrepancy in the rectangle order in the treemap layout may prevent the user from understanding the data effectively. Further developing this idea, Wood and Dykes [11] propose the Spatially Ordered Treemaps technique (SOT), which takes the geolocation of nodes into account. They modify the squarified treemaps algorithm [13] so that nodes are placed according to their distance to the enclosing rectangle to be filled, rather than in weight order. To visually assess the potential spatial distortion generated by this approach, they overlay the treemap with Bézier arrows linking the original geolocation of each node to the rectangle representing it.
They also conduct a computational study in which SOT is compared to squarified treemaps and to HistoMaps [12] with respect to average aspect ratio, average distance displacement and average angular displacement. Initially designed for flat cartogram generation, the SOT algorithm has also been extended to handle multilevel hierarchies in various application domains [14].

First introduced by Keim et al., HistoMaps (HM) have been used to analyze the provenance of traffic by integrating the geographic location of message sources into a treemap layout [15]. The geolocations are structured into a continent/country hierarchy, which can be visualized as a treemap. More formally, the HistoMaps layout may be seen as a variation of the pivot-by-middle treemap layout [16] in which nodes are sorted by latitude or longitude, depending on the layout direction, rather than by weight. The nodes are then split into chunks at the middle of the corresponding longitude/latitude range. Another difference is that the point set is split into two chunks rather than three (left, right and middle).

In contrast, the Weighted Maps approach (WM) may be considered a variation of the pivot-by-split-size treemap layout [16], in which nodes are also sorted by latitude or longitude, depending on the layout direction. As opposed to pivot-by-middle, pivot-by-split-size groups elements into chunks for further recursive layout based on the weight or size of each chunk, i.e., each chunk represents roughly equal weight. In the Weighted Maps algorithm, the point set is split into k equally weighted chunks, rather than three as in the original pivot-by-split-size layout. The value of k is chosen so that it yields the most nearly square top-level chunks, given the aspect ratio of the root node. It defaults to 2 on most computer monitors.
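To make the contrast between the two splitting rules concrete, the following minimal Python sketch applies each of them to a one-dimensional point set given as hypothetical (coordinate, weight) pairs. It is an illustration under our own assumptions (a greedy closest-fit rule for forming the equal-weight chunks, no recursion and no 2D geometry), not code from the HistoMaps or Weighted Maps implementations.

```python
from typing import List, Tuple

# Hypothetical 1D representation: (coordinate, weight), where the coordinate is
# the longitude or latitude used for the current layout direction.
Point = Tuple[float, float]


def histomaps_split(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Two chunks split at the middle of the coordinate range (pivot-by-middle style)."""
    lo = min(c for c, _ in points)
    hi = max(c for c, _ in points)
    mid = (lo + hi) / 2.0
    left = [p for p in points if p[0] <= mid]
    right = [p for p in points if p[0] > mid]
    return left, right


def weighted_maps_split(points: List[Point], k: int) -> List[List[Point]]:
    """k consecutive chunks of roughly equal weight (pivot-by-split-size style)."""
    ordered = sorted(points, key=lambda p: p[0])
    target = sum(w for _, w in ordered) / k
    chunks: List[List[Point]] = [[]]
    acc = 0.0  # weight accumulated in the current chunk
    for c, w in ordered:
        # Start a new chunk when adding this point would move the current
        # chunk's weight further away from the target (greedy closest fit).
        if chunks[-1] and len(chunks) < k and abs(acc + w - target) > abs(acc - target):
            chunks.append([])
            acc = 0.0
        chunks[-1].append((c, w))
        acc += w
    return chunks


pts = [(0.0, 4.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (10.0, 1.0)]
left, right = histomaps_split(pts)
print(sum(w for _, w in left), sum(w for _, w in right))              # 7.0 1.0
print([sum(w for _, w in ch) for ch in weighted_maps_split(pts, 2)])  # [4.0, 4.0]
```

On this skewed example, the middle-of-range split yields chunk weights 7 and 1, whereas the equal-weight split yields 4 and 4. Balanced chunk weights are what allow the screen-space chunks to be cut with balanced proportions.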
Our aim of including spatial constraints in treemaps relates to the substantial body of research on cartograms. Tobler's extensive review of cartograms [4] points out adjacency preservation as another potentially useful quality criterion: how many neighbors of a given area A on the geographic map are still neighbors of A in the treemap representation? Focusing on adjacency preservation, Buchin et al. [17] produce adjacency-preserving treemaps. They use a graph structure to formulate cartogram generation as a graph optimization problem. They define three types of adjacencies in two-level hierarchies: top-level adjacencies, internal bottom-level adjacencies and external bottom-level adjacencies. Their approach preserves the first two types, but compromises on the proportionality between node weights and rectangle areas in the process (this compromise is also known as cartographic error).

Another closely related approach from the cartogram domain is the rectangular cartogram, proposed by van Kreveld and Speckmann [18, 19]. Rectilinear cartograms generalize rectangular cartograms; an algorithm for them is proposed by de Berg et al. [20]. Buchin et al. [21] propose further optimization strategies for rectangular cartograms, resulting in cartograms with low error rates in terms of both cartographic error and adjacency preservation. Taking a pure treemap perspective, all three SDT approaches introduced earlier, and in particular Weighted Maps, preserve weight-area proportionality at the cost of some relative/absolute positional anomalies and some adjacency loss. Beyond flat cartograms, the SDT techniques, and in particular Weighted Maps, can be applied recursively to handle multilevel hierarchies. Also, the cartogram approaches above assign some of the layout space to seas and oceans in order to preserve more adjacencies, so strictly speaking the resulting layouts are not space-filling.
In addition, the rectilinear approach allows non-rectangular shapes, which moves it even further away from the treemap domain. Moreover, one of the open problems mentioned for these approaches is their computational complexity. The published examples seem to illustrate this, since they use relatively small datasets (up to a couple of hundred nodes). Buchin et al. [21], for example, report running times as long as 207 minutes for a cartogram of the world, which consists of only about 200 nodes. This seems to make the approach unsuitable for the interactive exploration of large datasets such as the French communes (roughly 36,000 nodes).

We, on the other hand, try to optimize treemap layouts, or rectangular space-filling layouts in general, while taking into account some, but not all, geographic constraints. Furthermore, interactive exploration of the data is of high importance to us: the user should be able to switch between different variables in real time, and maps should be updated on the fly as new data comes in. The need to generate such large cartograms, and even larger ones, in interactive time may arise in a variety of situations where fine-grained geolocated data is available; e.g., representing the power consumption of all city blocks in a country would result in millions of geolocated nodes. Still, we draw further inspiration from the cartogram line of work for the validation of our results.

Other recent approaches [22, 23] explore the use of grid layouts of geographic subdivisions. For instance, Eppstein et al. [23] model the problem as a point set matching optimization problem. Their solution produces uniformly spaced grids that are suitable for displaying unweighted nodes. The general location of nodes with respect to the entire map (top, bottom, left, right), their pairwise adjacency, and their relative orientation are identified as relevant evaluation criteria.
In contrast, Weighted Maps are designed for weighted geolocated data and preserve weight-area proportionality. By sorting the data points by latitude or longitude, Weighted Maps manage, to some extent, to preserve the overall location of nodes on the map and their relative locations, while keeping the average aspect ratio of rectangles low.

3. THE WEIGHTED MAPS ALGORITHM

Like HistoMaps, the Weighted Maps algorithm is inspired by the pivot variants introduced by Bederson et al. [16] HistoMaps can be considered a variation of the pivot-by-middle layout, where the pivot is not an element but the middle value of the longitude or latitude range. Weighted Maps, on the other hand, can be considered a variation of the pivot-by-split-size treemap algorithm, insofar as the point set is split into bins of equal weight. We adapt the original algorithm by sorting the point set at each recursion step by longitude or latitude, depending on the layout direction. We further adapt it by choosing the number of bins based on the aspect ratio of the enclosing area. The Weighted Maps algorithm computes the weight that would result in the squarest chunk, and adds items until the closest approximation of this weight is reached. This is unlike HistoMaps, which uses a fixed 2-bin split. Like HistoMaps, however, the Weighted Maps layout has no explicit pivot element.

Intuitively, splitting the display area into squarish chunks may improve the average aspect ratio (which is conducive to a more accurate assessment of quantitative data [10]). The number of chunks depends on the global aspect ratio of the display area: a display space of aspect ratio $R \geq 1$ generates $k = \lfloor R \rfloor$ chunks (or $k = \lceil R \rceil$, whichever is closer to $R$) at the top level, each of which is further split into two chunks at the next recursion levels.

```
// The main (generic) layout algorithm
function layout(t, from, to)
    availablespace = new Rect(0, 0, state.width, state.height)
    currentchunk = new Chunk(phrase(null), availablespace)
    currentfrom = from
    result = [currentchunk]
    T = order(t, from, to)
    prevscore = -inf
    for (i = from; i < to; ++i)
        itemsize = size(t[i])
        curscore = score(currentchunk, itemsize)
        if (curscore < prevscore)           // adding t[i] would worsen the chunk: close it
            currentchunk.reduce(availablespace)
            if (recurse(currentchunk))
                recursivechunks = layout(t, currentfrom, to)
                if (!recursivechunks.isempty())
                    result.pop()
                    result.append(recursivechunks)
            currentchunk = new Chunk(phrase(currentchunk), availablespace)
            result.append(currentchunk)
            currentfrom = i
            prevscore = score(currentchunk, itemsize)
        else
            prevscore = curscore
        currentchunk.additem(itemsize)
    if (currentfrom != from && recurse(currentchunk))
        recursivechunks = layout(t, currentfrom, to)
        if (!recursivechunks.isempty())
            result.pop()
            result.append(recursivechunks)
    return result
```

```
// WM configuration
function size(ti)
    return ti.population

function order(P)
    if (state.a.w > state.a.h)
        return (order P by x coord.)
    else
        return (order P by y coord.)

function score(chunk, itemsize)
    nbchunks = round(max(state.a.w, state.a.h) / min(state.a.w, state.a.h))
    nbchunks = (nbchunks < 2) ? 2 : nbchunks
    if (state.chunks.length == nbchunks)
        return MAX_SCORE                    // last chunk: absorb all remaining items
    else
        Cpref = state.a.w * state.a.h / nbchunks
        Cnew = chunk.area + itemsize
        return -abs(Cpref - Cnew)           // higher is better: closest to the preferred area

function phrase(chunk)
    if (state.a.w > state.a.h)
        return (Left, Left_to_Right)
    else
        return (Top, Top_to_Bottom)

function recurse(chunk)
    return chunk.itemcount > 1
```

Listing 1: The generic layout algorithm by Baudel and Broeksema [9] (first block) and the WM configuration for it (second block). The configuration points of the generic algorithm are the calls to order, size, score, phrase and recurse. State is considered to be a global variable that is updated by the generic algorithm as required.
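Listing 1 depends on the generic framework and its global state. As a stand-alone illustration only, the following Python sketch re-expresses the behavior described in this section: nodes are sorted by x or y depending on the longer side of the enclosing rectangle, k is the rounded aspect ratio (at least 2), consecutive nodes are grouped greedily into k chunks whose weights approach the target weight, the rectangle is sliced in proportion to chunk weights, and the procedure recurses; hierarchies are handled level by level. This is our own simplification, not the authors' implementation: the `Node` and `Rect` classes and their fields are hypothetical, coordinates are assumed to be already projected to screen space (y growing downward), and all leaf weights are assumed to be positive, which guarantees that every recursion step produces at least two chunks.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Hypothetical node type: x, y are the node's location already projected
    # to screen coordinates (y grows downward); weight is used for leaves only.
    name: str
    x: float
    y: float
    weight: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        if not self.children:
            return self.weight
        return sum(c.total() for c in self.children)


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def wm_layout(nodes: List[Node], rect: Rect, out: Dict[str, Rect]) -> None:
    """Flat Weighted Maps sketch: fill `rect` with one rectangle per node."""
    if len(nodes) == 1:
        out[nodes[0].name] = rect
        return
    horizontal = rect.w >= rect.h  # cut across the longer side
    ordered = sorted(nodes, key=(lambda n: n.x) if horizontal else (lambda n: n.y))
    # Number of chunks: rounded aspect ratio, at least 2 (cf. score in Listing 1).
    k = max(2, min(round(max(rect.w, rect.h) / min(rect.w, rect.h)), len(ordered)))
    total = sum(n.total() for n in ordered)
    target = total / k
    chunks: List[List[Node]] = [[]]
    acc = 0.0
    for n in ordered:
        w = n.total()
        # Close the chunk once adding n would move it away from the target weight.
        if chunks[-1] and len(chunks) < k and abs(acc + w - target) > abs(acc - target):
            chunks.append([])
            acc = 0.0
        chunks[-1].append(n)
        acc += w
    # Slice the rectangle in proportion to chunk weights and recurse into each slice.
    offset = 0.0
    for chunk in chunks:
        frac = sum(n.total() for n in chunk) / total
        if horizontal:
            sub = Rect(rect.x + offset * rect.w, rect.y, frac * rect.w, rect.h)
        else:
            sub = Rect(rect.x, rect.y + offset * rect.h, rect.w, frac * rect.h)
        offset += frac
        wm_layout(chunk, sub, out)


def wm_nested(root: Node, rect: Rect) -> Dict[str, Rect]:
    """Apply the flat layout level by level, each level inside its parent's rectangle."""
    out: Dict[str, Rect] = {root.name: rect}

    def visit(node: Node) -> None:
        if node.children:
            wm_layout(node.children, out[node.name], out)
            for child in node.children:
                visit(child)

    visit(root)
    return out
```

Slicing along the longer side and deriving k from the rounded aspect ratio keeps each slice roughly square, which mirrors the intuition stated above. Note that Python's `round` rounds halves to even, so an aspect ratio of exactly 2.5 yields k = 2 in this sketch.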
These screen-space chunks correspond to bins of nearby points, and weight-area proportionality is preserved. On most computer monitors, a full-screen WM display results in a binary space partitioning tree. However, for elongated countries such as Argentina or Portugal, the layout may be deemed more plausible if it has the same global aspect ratio as the map representation of the country.

For the sake of reproducibility, we express the WM algorithm as a configuration of the generic treemap algorithm by Baudel and Broeksema [9], which we also used to implement our versions of HistoMaps and SOT. Listing 1 gives the pseudocode of the generic algorithm and of its five functional dimensions: order, size, chunk (implemented by means of score), phrase and recurse. For multilevel hierarchies, the algorithm is applied level by level: top-level nodes are laid out first, then child nodes further down in the hierarchy are laid out within their parent's space.

4. EVALUATION

In this section, we present a computational study comparing the Weighted Maps algorithm to two other spatially dependent treemap algorithms, namely the Spatially Ordered Treemaps (SOT) by Wood and Dykes [11] and the HistoMaps (HM) by Mansmann et al. [12] Within the body of visualization evaluation work, this study belongs to the Algorithm Performance (AP) category of Isenberg et al.'s taxonomy [24]. As a preliminary step, we reimplemented both the SOT and HM algorithms as described in their respective papers [11, 12]. In particular, the HM implementation evaluated in this work uses a two-bin partitioning scheme, as in the original work by Mansmann et al.

4.1 Metrics

Based on the related work discussed above, we use the following metrics in our experimental study:
1. average aspect ratio (less is better), defined as $\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i$, where $n$ is the number of leaf nodes and $r_i$ is the aspect ratio of the $i$th leaf node;
2. average distance displacement (less is better), defined as $\bar{d} = \frac{1}{n\sqrt{A_{root}}}\sum_{i=1}^{n} d_i$, where $n$ is the number of nodes, $A_{root}$ is the area of the root node, and $d_i$ is the Euclidean distance between the centroid of node $i$ in the treemap and its affine-transformed geographic location;
3. average angular displacement (less is better), defined as $\bar{\theta} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\arccos\left(\frac{u_{ij}}{\lVert u_{ij}\rVert}\cdot\frac{v_{ij}}{\lVert v_{ij}\rVert}\right)$, where $n$ is the number of nodes, $u_{ij}$ is the vector from leaf node $i$ to its sibling leaf $j$ in treemap space, and $v_{ij}$ is the same vector in geographic space;
4. average adjacency preservation (more is better), defined as $\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i$, where $n$ is the number of leaf nodes and $a_i$ is the ratio of preserved neighbors of the $i$th leaf node;
5. average fragmentation (less is better), defined as $\bar{f} = \frac{1}{n_p}\sum_{i=1}^{n_p} f_i$, where $n_p$ is the number of nodes at the parent level and $f_i$ is the number of fragments of the $i$th node at the parent level.

The first three metrics are used by Wood and Dykes to compare SOT to HM; hence, our results can be compared directly to theirs. The fourth metric allows us to compare the three algorithms with respect to adjacency preservation, and to compare them with existing adjacency-optimized techniques: adjacency preservation is used by van Kreveld and Speckmann [19] to validate their rectangular cartogram approach. Even though Weighted Maps (or any of the SDTs, for that matter) do not use adjacency information, adjacency preservation is still included as a quality metric in our computational study. This gives us an additional angle for comparing SDT approaches with respect to their cartographic properties, and also lets us compare SDT approaches, to some extent, with cartogram approaches.
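The five metrics can be computed along the following lines. This is a minimal Python sketch under our own assumptions, not the evaluation code used in this study: rectangles are hypothetical (x, y, w, h) tuples, geographic locations are assumed to be already mapped into treemap space by the affine transformation, two rectangles count as neighbors when they share an edge segment of positive length (corner contacts do not count), angular displacement is returned in radians and treats all nodes as siblings (the flat case), and the $i = j$ terms are skipped while the $1/n^2$ normalization of the formula is kept.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h) in treemap space
Point = Tuple[float, float]               # location after the affine geo-to-treemap map


def centroid(r: Rect) -> Point:
    x, y, w, h = r
    return (x + w / 2.0, y + h / 2.0)


def average_aspect_ratio(rects: Sequence[Rect]) -> float:
    # r_i = longer side / shorter side, so r_i >= 1 and 1 means a square.
    return sum(max(w / h, h / w) for _, _, w, h in rects) / len(rects)


def average_distance_displacement(rects: Sequence[Rect], geo: Sequence[Point],
                                  root_area: float) -> float:
    total = sum(math.dist(centroid(r), g) for r, g in zip(rects, geo))
    return total / (len(rects) * math.sqrt(root_area))


def average_angular_displacement(rects: Sequence[Rect], geo: Sequence[Point]) -> float:
    """Mean angle in radians between treemap and geographic vectors of node pairs."""
    cs = [centroid(r) for r in rects]
    n = len(cs)
    acc = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the i == j term is a zero vector with no direction
            u = (cs[j][0] - cs[i][0], cs[j][1] - cs[i][1])
            v = (geo[j][0] - geo[i][0], geo[j][1] - geo[i][1])
            cos = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
            acc += math.acos(max(-1.0, min(1.0, cos)))  # clamp rounding noise
    return acc / (n * n)


def touching(a: Rect, b: Rect, eps: float = 1e-9) -> bool:
    """True if a and b share a boundary segment of positive length."""
    ox = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])  # overlap along x
    oy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])  # overlap along y
    return (abs(ox) <= eps and oy > eps) or (abs(oy) <= eps and ox > eps)


def treemap_neighbors(rects: Dict[str, Rect]) -> Dict[str, Set[str]]:
    names = list(rects)
    nbrs: Dict[str, Set[str]] = {name: set() for name in names}
    for i, a in enumerate(names):  # O(n^2) pairwise test, fine for a sketch
        for b in names[i + 1:]:
            if touching(rects[a], rects[b]):
                nbrs[a].add(b)
                nbrs[b].add(a)
    return nbrs


def average_adjacency_preservation(geo_nbrs: Dict[str, Set[str]],
                                   rects: Dict[str, Rect]) -> float:
    tm = treemap_neighbors(rects)
    # a_i = share of node i's map neighbors that are still neighbors in the treemap;
    # nodes without map neighbors are skipped (our assumption).
    ratios = [len(nb & tm[name]) / len(nb) for name, nb in geo_nbrs.items() if nb]
    return sum(ratios) / len(ratios)


def average_fragmentation(rects: Dict[str, Rect], parent: Dict[str, str]) -> float:
    """Mean number of connected pieces per parent node in a flat leaf layout."""
    nbrs = treemap_neighbors(rects)
    seen: Set[str] = set()
    fragments: Dict[str, int] = defaultdict(int)
    for leaf in rects:
        if leaf in seen:
            continue
        fragments[parent[leaf]] += 1  # a new fragment of this parent starts here
        seen.add(leaf)
        stack = [leaf]
        while stack:  # flood-fill touching leaves that share the same parent
            cur = stack.pop()
            for nb in nbrs[cur]:
                if nb not in seen and parent[nb] == parent[cur]:
                    seen.add(nb)
                    stack.append(nb)
    return sum(fragments.values()) / len(fragments)
```

Fragmentation is computed here as the number of connected components formed by each parent's leaves, which is one plausible reading of "number of fragments"; with this reading, a parent whose leaves form a single contiguous region scores 1.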
Lastly, the fifth metric measures the fragmentation produced at the parent node level when a flat layout is computed at the leaf level, discarding any knowledge of the hierarchy; e.g., for the USA, the fragmentation rate is computed at the state level when the layout algorithm is applied to the flat county level. Such fragmentation can be regarded as an extreme manifestation of adjacency loss. We are not aware of other work using this metric.

4.2 Use Cases

To compare WM to SOT and HM, we used 18 real datasets (10 flat and 8 nested) concerning the USA and France, ranging from tens of nodes to tens of thousands. These datasets differ significantly both in how the points are geographically spread and in the distribution of weights. This is illustrated in Figure 1, which shows the point distribution for each of the point sets we used and clearly reveals differences in how points are spatially distributed. For example, note how counties in the USA are almost normally distributed with respect to the Y-location, while in France, at the canton level, the spread is more uniform, with a peak at …. Figure 1 also shows two weight distributions for the USA at the county level. Like the point sets, the weight distributions differ significantly. We observed similarly significant differences in weight distributions at other levels as well, but lack the space to display all plots.

Figure 1: Various distributions related to the datasets used for evaluation: (a) distribution of the different point sets used; (b) weight distributions at the USA county level.
For each dataset, the aspect ratio of the root node was set to match that of the bounding rectangle of the point set. The results obtained for the five metrics on these datasets are reported in Tables 1 to 6. A total of 54 table rows report the results of 54 distinct experimental settings, each identified by a configuration number in the leftmost column. Table rows are grouped three at a time, since each group involves a common dataset (i.e., a common combination of point set and weighting attribute), while alternating through the three layout algorithms at hand. For statistical reliability, we also report the standard deviation associated with all reported averages. For every dataset, we ran a paired two-tailed Student's t-test to assess the statistical significance of pairwise differences between the three layout algorithms with p-value …. We ran this test for each evaluation metric. When statistical significance could be ascertained at this level of confidence, we reported the best score in bold. When the best score and the first runner-up could not be distinguished with the required confidence, they were both reported in bold, provided that each of them could be reliably distinguished from the last. In all other cases, no emphasis was put on the scores.

USA

First, we examined the population and land area of the contiguous USA at the state level and at the county level. State and county population statistics were extracted from the geonames.org website [25]. The adjacency preservation metric was computed using the county adjacency graph provided by the US Census Bureau [26]. The resulting flat treemaps have either 49 nodes in the case of states (48 states plus the District of Columbia) or 3,109 nodes in the case of counties. They can be seen in Figures 2 and 3, respectively. The corresponding statistics appear in the top 6 rows of Tables 1 and 2.
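The paired significance test described above can be sketched as follows. This minimal Python example only computes the paired t statistic; what exactly is paired (here assumed to be the per-node values of one metric for the same nodes under two layouts) is our assumption, as is the example data. The two-tailed p-value is then $2\,(1 - F(|t|))$, where $F$ is the CDF of Student's t-distribution with $n - 1$ degrees of freedom; in practice, `scipy.stats.ttest_rel` computes both the statistic and the two-sided p-value.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Student's paired t statistic, with n - 1 degrees of freedom."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    d = [x - y for x, y in zip(a, b)]
    # stdev uses the n - 1 (sample) denominator; if all differences are equal,
    # stdev is 0 and the statistic is undefined (ZeroDivisionError here).
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical per-node aspect ratios of the same four nodes under two layouts.
t = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])
```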
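The three layout algorithms can also be applied recursively to construct a nested treemap for the two-level hierarchy made of 49 top-level nodes and 3,109 leaves. Snapshots of the corresponding nested treemaps can be seen in Figure 4. The related statistics appear in rows 7 to 9 of Table 1 and rows 16 to 18 of Table 2, where 3,109/49/1 in the second column indicates the tree structure: the number of leaves, followed by the number of nodes at their parent level, and so forth up to the root level.

USA Population in 2010 (49 states and 3,109 counties)

| # | Tree structure | Layout | Aspect ratio mean | Aspect ratio stdev | Linear displ. mean | Linear displ. stdev | Angular displ. mean | Angular displ. stdev | Adjacency mean | Adjacency stdev | Fragmentation mean | Fragmentation stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 49/1 | HM | … | … | … | 8.6% | … | … | … | 24.4% | … | … |
| 2 | 49/1 | WM | … | … | … | 9.3% | … | … | … | 23.6% | … | … |
| 3 | 49/1 | SOT | … | … | … | 9.7% | … | … | … | 31.4% | … | … |
| 4 | 3,109/1 | HM | … | … | … | 7.5% | … | … | … | 22.2% | 2.2 | … |
| 5 | 3,109/1 | WM | … | … | … | 7.3% | … | … | … | 20.8% | 3.0 | … |
| 6 | 3,109/1 | SOT | … | … | … | 10.5% | … | … | … | 16.9% | 13.5 | … |
| 7 | 3,109/49/1 | HM | … | … | … | 9.8% | … | … | … | 22.0% | 1 | 0 |
| 8 | 3,109/49/1 | WM | … | … | … | 9.4% | … | … | … | 19.9% | 1 | 0 |
| 9 | 3,109/49/1 | SOT | … | … | … | 9.6% | … | … | … | 20.1% | 1 | 0 |

Table 1: This table compares the flat and nested versions of HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) on USA population data. In the nested case (bottom 3 rows), fragmentation is nonexistent by construction.

USA Land Area (49 states and 3,109 counties)

| # | Tree structure | Layout | Aspect ratio mean | Aspect ratio stdev | Linear displ. mean | Linear displ. stdev | Angular displ. mean | Angular displ. stdev | Adjacency mean | Adjacency stdev | Fragmentation mean | Fragmentation stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 49/1 | HM | … | … | … | 5.7% | … | … | … | 20.7% | … | … |
| 11 | 49/1 | WM | … | … | … | 7.7% | … | … | … | 16.8% | … | … |
| 12 | 49/1 | SOT | … | … | … | 9.2% | … | … | … | 27.7% | … | … |
| 13 | 3,109/1 | HM | … | … | … | 5.2% | … | … | … | 19.4% | … | … |
| 14 | 3,109/1 | WM | … | … | … | 5.4% | … | … | … | 18.4% | … | … |
| 15 | 3,109/1 | SOT | … | … | … | 9.6% | … | … | … | 18.2% | … | … |
| 16 | 3,109/49/1 | HM | … | … | … | 6.7% | … | … | … | 21.1% | 1 | 0 |
| 17 | 3,109/49/1 | WM | … | … | … | 7.6% | … | … | … | 19.1% | 1 | 0 |
| 18 | 3,109/49/1 | SOT | … | … | … | 9.7% | … | … | … | 20.6% | 1 | 0 |

Table 2: This table compares the flat and nested versions of HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) on USA land area data. In the nested case (bottom 3 rows), fragmentation is nonexistent by construction.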
Figure 2: The USA population in 49 states according to (a) the HistoMaps layout, (b) the Weighted Maps layout and (c) the Spatially Ordered Treemaps layout. The three subfigures correspond to the top three rows of Table 1, respectively.
Figure 3: The USA population in 3,109 counties according to (a) the HistoMaps layout, (b) the Weighted Maps layout and (c) the Spatially Ordered Treemaps layout. Colors encode state membership consistently with Figure 2. The SOT layout creates strip patterns and severe state fragmentation. See rows 4 to 6 of Table 1 for the related statistics.
Figure 4: The USA population in 3,109 counties according to the nested version of (a) the WM layout and (b) the SOT layout. The state-level rectangles are the same as in Figure 2. See rows 8 and 9 of Table 1 for the related statistics.

A close inspection of Tables 1 and 2 reveals that SOT never outperforms HM and WM, with one exception: the aspect ratio metric for the nested treemap on the land area dataset (see the number in bold in row 18 of Table 2). For small flat treemaps, either the results lack the statistical significance needed for a clear winner to stand out (see the top three rows of Table 1), or HM and WM tie on various metrics, as shown by multiple bold values within the same group (see the top three rows of Table 2). For large flat treemaps (see the middle three rows of Tables 1 and 2), WM ranks first for aspect ratio and linear displacement, while HM ranks first for adjacency preservation. For angular displacement and parent-level fragmentation, HM and WM either tie or HM occasionally wins.

Looking closely at Figure 2, one can see the underlying properties and flaws of the three layout approaches. For instance, the SOT layout (Figure 2(c)) is characterized by strip patterns, due to the stacking strategy common to all squarified-treemap-based approaches: vertical strips occupy the left half of the representation, then horizontal and vertical strips alternate in the right half. Some positional anomalies can be found in the SOT representation, such as New Mexico being placed north of Colorado, or Rhode Island and Maine being placed in the bottom right corner next to Florida. Similarly, in the HM layout (Figure 2(a)), one can see the top-level split at the middle of the longitude range as a vertical divide to the right of Texas.
A second-level split can be seen vertically to the right of Florida on the right-hand side, and horizontally above Texas and New Mexico on the left-hand side. In the HM representation, examples of positional flaws include New Mexico being placed north of Arizona, or Oregon being placed west of Washington state. Similar remarks can be made about the WM layout (Figure 2(b)): one can see a vertical top-level divide in the middle of the representation, to the right of Wisconsin and Illinois, followed by horizontal second-level splits above California on the left-hand side and above Indiana and Florida on the right-hand side. In the WM representation, positional flaws include Alabama being placed northwest of Tennessee.

In Figure 3, flat treemap representations of the 3,109 US counties are displayed as generated by the three layout algorithms. The strip patterns of SOT (Figure 3(c)) are even more obvious and result in severely ragged state contours and high state fragmentation (13.5 fragments per state on average, as reported in row 6 of Table 1). In the HM layout (Figure 3(a)) and the WM layout (Figure 3(b)), state fragmentation is also visible, but rather mild (2.2 and 3.0 fragments per state on average, respectively, as reported in rows 4 and 5 of Table 1). The previous remarks on positional anomalies still apply.

In Figure 4, we show the two-level treemap representation of the USA county population as generated by the WM and SOT layouts. At the state level, the space is subdivided exactly as in Figure 2. Further down, the leaf (county) nodes are laid out using an extra recursion of the same layout algorithms. Because the branching factor of each subtree stays on the order of a hundred nodes, the nested version of SOT is much more readable than the flat version in Figure 3(c).

France

With 35,955 communes, metropolitan France has by far the largest number of communes of any European country.
Hence, it qualifies as a good benchmark for spatial layout algorithms. The number of nodes at the commune level is roughly one order of magnitude greater than in the USA county dataset (3,109 nodes) and nearly three orders of magnitude greater than in the USA state dataset (49 nodes). At the upper levels of its administrative subdivisions, metropolitan France is divided into 21 regions, not including Corsica, which are in turn subdivided into 94 departments. Hence, the French department data has the same scale as the USA data at the state level. Further, metropolitan France is subdivided into 3,666 cantons, which is quite comparable in scale to the 3,109 USA counties. We considered the 2012 population statistics, according to the most recent data published by the IGN [27], as well as the land area of the communes. Upper-level population and land area data were computed by simple bottom-up aggregation of the commune data. The aspect ratio of the root node was set to match the bounding rectangle of the point set at hand.

The results of the flat layouts are summarized in Tables 3 and 4. As in the USA use case, the results lack the statistical significance needed to establish the superiority of any of the three algorithms for the layout of the 94 French departments. HM and WM tie on linear and angular displacement at the department level using land area. The SOT layout ranks last for the larger flat treemaps at the canton and commune levels, with both population and land area as node weights. At the canton level, WM ranks first with respect to average aspect ratio and average linear displacement for both population and land area weighting, while HM ranks first regarding adjacency preservation. Ties occur between WM and HM with respect to angular displacement and parent-level fragmentation. At the commune level, WM also ranks first with respect to average aspect ratio; it ranks first with respect to average linear displacement using population data, and second using land area data. HM ranks first with respect to angular displacement, adjacency preservation and parent-level fragmentation, using both the population and the land area data.

Figure 5 shows the flat treemap generated by the Weighted Maps layout for the French communes. Figure 6 shows the corresponding flat treemap layouts generated by SOT (left) and HM (right). The SOT treemap is severely affected by thin strip patterns and parent-level fragmentation. In contrast to the WM layout in Figure 5, HM seems to generate elongated rectangles around large cities such as Paris, Marseille and Toulouse in Figure 6, but both layouts look quite similar overall.

The results of the nested layouts are summarized in Tables 5 and 6. The number of leaves and parent nodes in each experimental setting is indicated in the second column. For France, we study the performance of all three layouts for 2-, 3- and 4-level hierarchies. It is worth noting that, in the nested case, the average aspect ratio of the SOT layout becomes comparable to that of HM and WM. Occasionally, nested SOT ranks first for average aspect ratio, but only with land area weighting (see the last row of Tables 2 and 6). For the sake of space, we do not show the nested layouts of France in this paper.

Figure 5: The 2012 population in 35,955 French communes according to the flat WM layout. The black contours delimit the 21 top-level regions, making region-level fragmentation visible. Within a region, different colors encode different departments. See row 26 of Table 3 for the related statistics.

Figure 6: The 2012 population in 35,955 French communes according to the flat SOT (left) and HM (right) layouts. The black contours delimit the 21 top-level regions. Within a region, different colors encode different departments. See rows 27 and 25 of Table 3, respectively, for the related statistics.

Flat Treemap Layouts of the French Population in 2012

| # | Tree structure | Layout | Aspect ratio mean | Aspect ratio stdev | Linear displ. mean | Linear displ. stdev | Angular displ. mean | Angular displ. stdev | Adjacency mean | Adjacency stdev | Fragmentation mean | Fragmentation stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 94/1 | HM | … | … | … | 7.1% | … | … | … | 19.2% | … | … |
| 20 | 94/1 | WM | … | … | … | 6.6% | … | … | … | 21.7% | … | … |
| 21 | 94/1 | SOT | … | … | … | 8.6% | … | … | … | 24.0% | … | … |
| 22 | 3,665/1 | HM | … | … | … | 6.5% | … | … | … | 22.8% | … | … |
| 23 | 3,665/1 | WM | … | … | … | 9.4% | … | … | … | 22.3% | … | … |
| 24 | 3,665/1 | SOT | … | … | … | 10.3% | … | … | … | 17.4% | … | … |
| 25 | 35,955/1 | HM | … | … | … | 6.9% | … | … | … | 27.7% | … | … |
| 26 | 35,955/1 | WM | … | … | … | 9.5% | … | … | … | 26.7% | … | … |
| 27 | 35,955/1 | SOT | … | … | … | 11.9% | … | … | … | 13.5% | … | … |

Table 3: This table compares HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) for the generation of flat cartograms of the French metropolitan population in 2012 within the 94 departments, the 3,665 cantons and the 35,955 communes, respectively.

Flat Treemap Layouts of the French Land Area

| # | Tree structure | Layout | Aspect ratio mean | Aspect ratio stdev | Linear displ. mean | Linear displ. stdev | Angular displ. mean | Angular displ. stdev | Adjacency mean | Adjacency stdev | Fragmentation mean | Fragmentation stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | 94/1 | HM | … | … | … | 4.9% | … | … | … | 18.6% | … | … |
| 29 | 94/1 | WM | … | … | … | 5.3% | … | … | … | 18.9% | … | … |
| 30 | 94/1 | SOT | … | … | … | 6.0% | … | … | … | 20.4% | … | … |
| 31 | 3,665/1 | HM | … | … | … | 6.5% | … | … | … | 20.3% | … | … |
| 32 | 3,665/1 | WM | … | … | … | 6.4% | … | … | … | 19.1% | … | … |
| 33 | 3,665/1 | SOT | … | … | … | 10.5% | … | … | … | 18.2% | … | … |
| 34 | 35,955/1 | HM | … | … | … | 6.5% | … | … | … | 30.8% | … | … |
| 35 | 35,955/1 | WM | … | … | … | 6.3% | … | … | … | 29.2% | … | … |
| 36 | 35,955/1 | SOT | … | … | … | 11.2% | … | … | … | 14.3% | … | … |

Table 4: This table compares HistoMaps (HM), Weighted Maps (WM) and Spatially Ordered Treemaps (SOT) for the generation of flat cartograms of the French land area within the 94 departments, the 3,665 cantons and the 35,955 communes, respectively.

Nested Treemap Layouts of the French Population in 2012

| # | Tree structure | Layout | Aspect ratio mean | Aspect ratio stdev | Linear displ. mean | Linear displ. stdev | Angular displ. mean | Angular displ. stdev | Adjacency mean | Adjacency stdev | Fragmentation mean | Fragmentation stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | 94/21 | HM | … | … | … | 7.2% | … | … | … | 21.7% | 1 | 0 |
| 38 | 94/21 | WM | … | … | … | 9.4% | … | … | … | 22.1% | 1 | 0 |
| 39 | 94/21 | SOT | … | … | … | 15.3% | … | … | … | 22.6% | 1 | 0 |
| 40 | 3,665/94/21 | HM | … | … | … | 8.0% | … | … | … | 21.9% | 1 | 0 |
| 41 | 3,665/94/21 | WM | … | … | … | 10.1% | … | … | … | 21.3% | 1 | 0 |
| 42 | 3,665/94/21 | SOT | … | … | … | 13.6% | … | … | … | 21.4% | 1 | 0 |
| 43 | 35,955/3,665/94/21 | HM | … | … | … | 7.8% | … | … | … | 26.3% | 1 | 0 |
| 44 | 35,955/3,665/94/21 | WM | … | … | … | 9.8% | … | … | … | 25.8% | 1 | 0 |
| 45 | 35,955/3,665/94/21 | SOT | … | … | … | 14.0% | … | … | … | 25.0% | 1 | 0 |

Table 5: This table compares the nested versions of the HistoMaps, Weighted Maps and Spatially Ordered Treemaps algorithms on multilevel hierarchies showing the French population within the 21 regions, the 94 departments, the 3,665 cantons and the 35,955 communes, respectively.
Fragmentation is non-existent by construction.

Nested Treemap Layouts of the French Land Area
#    Tree structure         Layout   Linear displ. stdev   Adjacency stdev
46   94/21                  HM       6.3%                  20.6%
47   94/21                  WM       6.2%                  19.4%
48   94/21                  SOT      9.1%                  22.0%
49   3,665/94/21            HM       7.4%                  21.4%
50   3,665/94/21            WM       7.5%                  20.2%
51   3,665/94/21            SOT      9.9%                  20.6%
52   35,955/3,665/94/21     HM       7.5%                  27.1%
53   35,955/3,665/94/21     WM       7.6%                  26.4%
54   35,955/3,665/94/21     SOT      10.0%                 25.3%
Table 6: This table compares the nested versions of the HistoMaps, Weighted Maps and Spatially Ordered Treemaps algorithms on multi-level hierarchies showing the French land area within the 21 regions, the 94 departments, the 3,665 cantons and the 35,955 communes respectively. Fragmentation is non-existent by construction.

5. DISCUSSION
We place our work in the line of treemap algorithms; thus the algorithm does not allow for any cartographic error. That is, the area of each rectangle in the final layout is exactly proportional to the weight it represents. As a consequence, we cannot expect results on geographic metrics as good as those of approaches based on, or inspired by, rectangular cartograms. 18

Bederson et al. 16 observed two problems with cluster and squarified treemap layouts: changes in data can cause dramatic discontinuities in the produced layouts, and these algorithms do not take into account explicit order information that is part of the data. To address these problems, they proposed several layout algorithms, among them the pivot-based layouts (i.e. pivot-by-size, pivot-by-middle, and pivot-by-split-size). Given that these algorithms explicitly address the order of the data, it is not surprising that they inspired the creation of both the Weighted Maps algorithm and the HistoMaps algorithm.

We started our work with two hypotheses:
1. Using land area as weight leads to minimal error with respect to geographic metrics (displacement, adjacency preservation, and fragmentation).
2. Degradation of adjacency preservation is minimized when overall relative location constraints (linear and angular displacement) are met.

Unsurprisingly, our first hypothesis seems to be correct given the data we have tested. When looking at the USA data in Tables 1 and 2, we see that all algorithms perform better with respect to linear and angular displacement, adjacency preservation, and fragmentation in the land area case. One exception is SOT, for which angular displacement degrades in the States and the States/Counties cases. We see similar results for the French datasets.

For the second hypothesis, the evidence is not strong enough to support strong claims. When we look at the case with the highest adjacency preservation in each setting, we see that it typically also has the lowest values for linear and angular displacement. For the USA data, these lowest values are significant in 75% of the cases. For the France data, we see a similar pattern, though the lowest values account for only 58% of the cases.

The results concerning the USA population and the land distribution of the French departments (see the top 3 rows of Tables 1 and 4) are consistent with those published by Wood and Dykes 11 with respect to the average aspect ratio, average linear displacement and average angular displacement. This confirms that our implementation of SOT is correct, and that our implementation of HM is also comparable to theirs. Hence, the comparison of the Weighted Maps algorithm with both the SOT and HM algorithms on other (larger) datasets and on additional quality metrics, namely adjacency preservation and fragmentation, complements and can be interpreted in the light of the previous study by Wood and Dykes.
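To make the geographic metrics discussed above concrete, here is a minimal sketch. The definitions are assumptions for illustration and may differ from those used in the paper. Geolocations and rectangle centroids are each rescaled into the unit square by their bounding boxes. Linear displacement is the mean distance between a node's rescaled geolocation and its rescaled centroid, divided by the unit-square diagonal. Angular displacement is the mean absolute difference between the direction from node i to node j on the map and in the treemap, taken over all pairs. Adjacency preservation is, per node, the share of its geographic neighbours whose rectangles still share a boundary segment. All function names are hypothetical.

```python
import math
from statistics import mean, median

# Hypothetical sketch of layout-quality metrics; the paper's exact definitions may differ.
# A rectangle is (x, y, w, h) with (x, y) its lower-left corner, in a y-up frame like the map.
# A geolocation is a planar (x, y) point.


def aspect_ratio(rect):
    """Aspect ratio (>= 1) of an axis-aligned rectangle."""
    _, _, w, h = rect
    return max(w / h, h / w)


def centroid(rect):
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def to_unit_square(points):
    """Rescale points into [0, 1] x [0, 1] using their bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    sx = (max(xs) - x0) or 1.0  # avoid dividing by zero for degenerate point sets
    sy = (max(ys) - y0) or 1.0
    return [((x - x0) / sx, (y - y0) / sy) for x, y in points]


def linear_displacement(geo, rects):
    """Mean map-to-treemap distance, as a fraction of the unit-square diagonal (assumed)."""
    g = to_unit_square(geo)
    t = to_unit_square([centroid(r) for r in rects])
    return mean(math.dist(a, b) / math.sqrt(2) for a, b in zip(g, t))


def angular_displacement(geo, rects):
    """Mean absolute angle (degrees) between pairwise directions on the map and in the treemap."""
    c = [centroid(r) for r in rects]
    diffs = []
    for i in range(len(geo)):
        for j in range(i + 1, len(geo)):
            a_map = math.atan2(geo[j][1] - geo[i][1], geo[j][0] - geo[i][0])
            a_tm = math.atan2(c[j][1] - c[i][1], c[j][0] - c[i][0])
            d = abs(math.degrees(a_map - a_tm)) % 360
            diffs.append(min(d, 360 - d))  # fold the difference into [0, 180]
    return mean(diffs)


def share_boundary(r1, r2, eps=1e-9):
    """True if two rectangles share a boundary segment of positive length."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    x_overlap = min(x1 + w1, x2 + w2) - max(x1, x2)
    y_overlap = min(y1 + h1, y2 + h2) - max(y1, y2)
    side_by_side = abs(x1 + w1 - x2) < eps or abs(x2 + w2 - x1) < eps
    stacked = abs(y1 + h1 - y2) < eps or abs(y2 + h2 - y1) < eps
    return (side_by_side and y_overlap > eps) or (stacked and x_overlap > eps)


def adjacency_preservation(geo_neighbours, rects):
    """Per-node share of geographic neighbours that still touch in the treemap."""
    return [
        sum(share_boundary(rects[i], rects[j]) for j in nbrs) / len(nbrs)
        for i, nbrs in geo_neighbours.items()
        if nbrs
    ]


# Toy example: four nodes on a 2 x 2 grid, laid out as the four quadrants of a unit square.
geo = [(0, 0), (1, 0), (0, 1), (1, 1)]
rects = [(0, 0, .5, .5), (.5, 0, .5, .5), (0, .5, .5, .5), (.5, .5, .5, .5)]
neighbours = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(linear_displacement(geo, rects), angular_displacement(geo, rects))
print(median(adjacency_preservation(neighbours, rects)))
```

In this toy layout every rectangle sits exactly where its point lies, so both displacements are zero and every geographic adjacency is kept. Swapping two rectangles raises both displacements and leaves some neighbours touching only at a corner, which this sketch does not count as adjacent. A median of 0 from `adjacency_preservation` would mean that at least half of the nodes keep none of their neighbours.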
The results also show that the gap between SOT, HM and WM is rather small for small one-level hierarchies with up to a hundred nodes, all metrics considered. This is very much the case with the 2010 population in 49 USA states; the resulting treemaps can be seen in Figure 2. A careful inspection reveals that all algorithms have some localized flaws. For example, SOT places New Mexico to the north of Colorado, while WM places Utah to the north of Idaho. WM and HM may seem closer to reality than SOT regarding the north-eastern states (Vermont, Maine, New Hampshire, Massachusetts, New York), while SOT may be better with the north-western states. HM performs very similarly to WM. At this level, the treemaps produced by SOT, HM and WM are all still visually plausible.

Subtle discrepancies in relative rectangle positions may be explained by the choice of anchor (reference) points on the map. More precisely, the question is: what is the geolocation of a top-level administrative entity, such as a state? Is it the centroid of the geolocations of its sub-entities, or some other location such as the state capital, which may be far away from that centroid? Depending on the choice of anchor points, the resulting relative rectangle locations in the treemap layout may vary.

For small one-level hierarchies, the average differences between SOT, HM, and WM are small, even where they are statistically significant. For example, the difference in average angular displacement between the three algorithms does not exceed 12 degrees, and the differences in normalized average distance displacement never exceed 10%. The average aspect ratio is the one exception to this conclusion for small hierarchies: weighting the USA states with land area figures significantly degrades the average aspect ratio produced by both the WM and SOT layouts (see Table 2). Only HM is not as strongly affected with respect to aspect ratio in this case. For both SOT and WM, the standard deviation figures show very high dispersion, indicating the presence of large outliers; the average is therefore not a reliable statistical measure in this case. The reported averages are affected by a single outlier value of 184 for WM, and by two outlier values of 21.8 and 68.5 for SOT.

Like HM, WM becomes consistently much more competitive than SOT for the flat layout of thousands of nodes (the average aspect ratio improvement ranges between 2x and 15x). This is because SOT is based on the squarified treemap algorithm, which places nodes in vertical or horizontal strips. These strips become very thin when individual node weights become small compared to the total sum of weights at the parent/root level. This issue is clearly visible, across all five metrics, in Figure 3 for the 2010 American population laid out at the county level (3,109 leaf nodes). It is even more severe in Figure 6 (left) for the 2012 French population laid out at the commune level (35,955 leaf nodes). In this last case, the aspect ratio standard deviation indicates a lot of dispersion. We attempted to improve the SOT algorithm by optimizing the average aspect ratio per strip rather than that of the smallest item, as in the original algorithm. While this strategy improves the average aspect ratio of SOT, it does not mitigate the fundamental weakness that the algorithm inherits from strip construction: the improved SOT still lags behind WM in terms of average aspect ratio when dealing with large one-level hierarchies.
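The strip argument above can be checked with simple arithmetic. Assume a strip that spans a full side of length L and holds items whose areas equal their weights (in units of L²). The strip thickness is then the sum of its weights divided by L, and each item's length along the strip is its weight divided by that thickness. The worst ratio of such a strip is the criterion used by the squarified algorithm of Bruls et al. 13 The sketch makes three simplifying assumptions: the leaves have equal weights, the root is a unit square, and a single strip spans the whole side. It does not model SOT's spatial ordering, which decides which leaves share a strip. For k equal leaves of weight 1/n, the worst ratio is max(n/k², k²/n), so about √n leaves per strip are needed for near-square cells. The function names are hypothetical.

```python
import math


def strip_aspect_ratios(weights, side):
    """Aspect ratio of each item in one strip spanning a side of length `side`.

    Hypothetical sketch: item areas equal their weights, so the strip's
    thickness is sum(weights) / side and each item's length is w / thickness.
    """
    thickness = sum(weights) / side
    ratios = []
    for w in weights:
        length = w / thickness
        ratios.append(max(length / thickness, thickness / length))
    return ratios


def worst(weights, side):
    """Closed form of the strip's worst aspect ratio (squarified criterion)."""
    s = sum(weights)
    return max(side**2 * max(weights) / s**2, s**2 / (side**2 * min(weights)))


# Assumed scenario: 35,955 equal-weight leaves in a unit square (total area 1).
n = 35_955
leaf = 1 / n
for k in (10, 100, round(math.sqrt(n))):  # sqrt(35,955) is about 189.6
    row = [leaf] * k
    print(k, round(worst(row, 1.0), 2), round(max(strip_aspect_ratios(row, 1.0)), 2))
```

With 10 leaves per strip, each cell is about 360 times longer than it is thick. With 100 leaves the ratio drops to about 3.6, and only near 190 leaves do the cells approach squares. Any constraint that keeps a strip short while its leaves are tiny relative to the root therefore produces slivers.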
In the case of multilevel hierarchies, the gap between nested SOT and HM or WM is much smaller, since at each recursion the number of nodes to be laid out remains close to the comfort zone of SOT. This can be seen in most use cases in the bottom three rows of Tables 1, 2 and 5, where nested WM, nested HM, and nested SOT are rather close, except for the USA states 2010 population anomaly explained earlier. As the branching factor increases, both nested HM and nested WM tend to pull ahead of nested SOT, as with the 3,666 French cantons and the 35,955 French communes. Following Eppstein et al., 23 we also evaluated HM, WM and SOT with respect to adjacency preservation, even though they were not designed with this goal in mind. As reported in the tables, WM achieves adjacency preservation values between 40% and 65%, mostly greater than 50%, while SOT scores range between 8% and 52%, mostly below 50%. It is important to note that WM scores remain rather stable, close to 50%, regardless of the dataset at hand, while SOT performance degrades heavily as the scale increases, reaching a median value of 0% adjacency preservation for the French communes (i.e. at least half of the nodes have no adjacencies preserved at all). However, when it comes to adjacency preservation, HM outperforms WM in most cases. Unsurprisingly, all three algorithms fall behind the adjacency-optimized approach of Eppstein et al., which achieves 75% adjacency preservation on small hierarchies. In their work, Eppstein et al. showcase their approach on the 49 contiguous USA states and on the French departments. Given the high time complexity reported in their paper, the layout of bigger hierarchies such as the French communes seems intractable with their approach. Concerning large one-level hierarchies, such as the US counties and the French communes, we measured the fragmentation rate of top-level groups (e.g.
USA states and French departments) that results from applying the layout algorithm directly to the leaf level. The less fragmentation, the better the algorithm. In this regard, WM always appears to be much better than SOT (3x to 10x less fragmentation) and similar to HM. Once again, parent-level fragmentation is aggravated by the strip construction strategy underlying SOT, whereas the recursive space partitioning strategy of WM increases the likelihood that nodes allocated to different partitions remain contiguous when their respective geolocations are close.

Finally, with respect to geographic features, HistoMaps outperforms Weighted Maps most of the time. The differences are mostly not very large, but they are often statistically significant for larger datasets. This raises the question of whether Weighted Maps is an improvement at all. In retrospect, we have found that, in some sense, we have reproduced results from the Ordered and Quantum Treemaps paper. 16 When we compare HistoMaps and Weighted Maps, we see that the latter always gives better aspect ratios. Recall that both algorithms drew inspiration from the pivot layouts: HistoMaps from pivot-by-middle and Weighted Maps from pivot-by-split-size. Looking at the results for the pivot layouts, we also see that, in general, pivot-by-split-size gives better aspect ratios than pivot-by-middle. This difference could explain the better performance of HistoMaps with respect to geographic features: elongated rectangles allow for more neighbors. The trade-off shown in Bederson et al.'s paper is between aspect ratio and change (pivot-by-split-size is more sensitive to changes in the data). We could therefore expect a similar trade-off between HistoMaps and Weighted Maps, though we have not tested this.

6. CONCLUSION
In this work, we presented Weighted Maps, a spatially aware treemap algorithm.
It gives consistently better aspect ratios than existing algorithms for datasets that consist of large flat hierarchies, and behaves equally well otherwise. Moreover, the output of Weighted Maps tends to be aesthetically pleasing for large flat hierarchies, while other existing algorithms, e.g. Spatially Ordered Treemaps, are undermined by inherent strip patterns. Weighted Maps can be considered a trade-off alternative to HistoMaps, where the trade-off is between aspect ratio and geographic correctness. Future work includes resolving the fragmentation problem observed earlier and further assessing Weighted Maps through user studies and simulated weight distributions.

REFERENCES
[1] Schulz, H.-J., Treevis.net: A tree visualization reference, IEEE Computer Graphics and Applications 31(6), (2011).
[2] Shneiderman, B., Tree visualization with tree-maps: 2-d space-filling approach, ACM Trans. Graph. 11, (Jan. 1992).
[3] Liu, Z. and Stasko, J., Mental models, visual reasoning and interaction in information visualization: A top-down perspective, Visualization and Computer Graphics, IEEE Transactions on 16(6), (2010).
[4] Tobler, W., Thirty five years of computer cartograms, Annals of the Association of American Geographers 94(1), (2004).
[5] Dorling, D., Barford, A., and Newman, M., Worldmapper: The world as you've never seen it before, Visualization and Computer Graphics, IEEE Transactions on 12(5), (2006).
[6] Jern, M., Rogstadius, J., and Astrom, T., Treemaps and choropleth maps applied to regional hierarchical statistical data, in [Information Visualisation, th International Conference], (2009).
[7] Zhao, J., Forer, P., and Harvey, A. S., Activities, ringmaps and geovisualization of large human movement fields, Information Visualization 7(3-4), (2008).
[8] Speckmann, B. and Verbeek, K., Necklace maps, Visualization and Computer Graphics, IEEE Transactions on 16(6), (2010).
[9] Baudel, T. and Broeksema, B., Capturing the design space of sequential space-filling layouts, Visualization and Computer Graphics, IEEE Transactions on 18(12), (2012).
[10] Kong, N., Heer, J., and Agrawala, M., Perceptual guidelines for creating rectangular treemaps, Visualization and Computer Graphics, IEEE Transactions on 16, (Nov 2010).
[11] Wood, J. and Dykes, J., Spatially ordered treemaps, Visualization and Computer Graphics, IEEE Transactions on 14(6), (2008).
[12] Mansmann, F., Keim, D., North, S., Rexroad, B., and Sheleheda, D., Visual analysis of network traffic for resource planning, interactive monitoring, and interpretation of security threats, Visualization and Computer Graphics, IEEE Transactions on 13(6), (2007).
[13] Bruls, M., Huizing, K., and Wijk, J., Squarified treemaps, in [Data Visualization 2000], Leeuw, W. and Liere, R., eds., Eurographics, 33-42, Springer Vienna (2000).
[14] Slingsby, A., Dykes, J., and Wood, J., Rectangular hierarchical cartograms for socio-economic data, Journal of Maps 6(1), (2010).
[15] Keim, D. A., Mansmann, F., Panse, C., Schneidewind, J., and Sips, M., Mail explorer - spatial and temporal exploration of electronic mail, in [Proceedings of the Seventh Joint Eurographics / IEEE VGTC Conference on Visualization], EUROVIS'05, Eurographics Association, Aire-la-Ville, Switzerland (2005).
[16] Bederson, B. B., Shneiderman, B., and Wattenberg, M., Ordered and quantum treemaps: Making effective use of 2d space to display hierarchies, ACM Trans. Graph. 21, (Oct. 2002).
[17] Buchin, K., Eppstein, D., Löffler, M., Nöllenburg, M., and Silveira, R. I., Adjacency-preserving spatial treemaps, in [Algorithms and Data Structures], Dehne, F., Iacono, J., and Sack, J.-R., eds., Lecture Notes in Computer Science 6844, Springer Berlin Heidelberg (2011).
[18] Speckmann, B., Kreveld, M. V., and Florisson, S., A linear programming approach to rectangular cartograms, in [12th International Symposium on Spatial Data Handling], Riedl, A., Kainz, W., and Elmes, G. A., eds., Springer Berlin Heidelberg (2006).
[19] van Kreveld, M. and Speckmann, B., On rectangular cartograms, Computational Geometry 37, (Aug. 2007).
[20] De Berg, M., Mumford, E., and Speckmann, B., Optimal bsps and rectilinear cartograms, International Journal of Computational Geometry & Applications 20(02), (2010).
[21] Buchin, K., Speckmann, B., and Verdonschot, S., Evolution strategies for optimizing rectangular cartograms, Geographic Information Science 7478(639), (2012).
[22] Wood, J., Badawood, D., Dykes, J., and Slingsby, A., Ballotmaps: Detecting name bias in alphabetically ordered ballot papers, Visualization and Computer Graphics, IEEE Transactions on 17(12), (2011).
[23] Eppstein, D., van Kreveld, M., Speckmann, B., and Staals, F., Improved grid map layout by point set matching, in [Visualization Symposium (PacificVis), 2013 IEEE Pacific], (2013).
[24] Isenberg, T., Isenberg, P., Chen, J., Sedlmair, M., and Moller, T., A systematic review on the practice of evaluating visualization, Visualization and Computer Graphics, IEEE Transactions on 19(12), (2013).
[25] Geonames, The geonames geographic database. (2012). Accessed:
[26] US Census Bureau, County adjacency file. (2013). Accessed:
[27] Institut National de l'Information Géographique et Forestière, GEOFLA®. (2012). Accessed:
Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015
Principles of Data Visualization for Exploratory Data Analysis Renee M. P. Teate SYS 6023 Cognitive Systems Engineering April 28, 2015 Introduction Exploratory Data Analysis (EDA) is the phase of analysis
Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results
Evaluation of a New Method for Measuring the Internet Distribution: Simulation Results Christophe Crespelle and Fabien Tarissan LIP6 CNRS and Université Pierre et Marie Curie Paris 6 4 avenue du président
Big Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs [email protected] Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
Introduction to Exploratory Data Analysis
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
Grade 5 Math Content 1
Grade 5 Math Content 1 Number and Operations: Whole Numbers Multiplication and Division In Grade 5, students consolidate their understanding of the computational strategies they use for multiplication.
IST 557 Final Project
George Slota DataMaster 5000 IST 557 Final Project Abstract As part of a competition hosted by the website Kaggle, a statistical model was developed for prediction of United States Census 2010 mailing
BUSINESS DEVELOPMENT OUTCOMES
BUSINESS DEVELOPMENT OUTCOMES Small Business Ownership Description Total number of employer firms and self-employment in the state per 100 people in the labor force, 2003. Explanation Business ownership
Decision Trees What Are They?
Decision Trees What Are They? Introduction...1 Using Decision Trees with Other Modeling Approaches...5 Why Are Decision Trees So Useful?...8 Level of Measurement... 11 Introduction Decision trees are a
Profile of IEEE Consultants, 2004 Prepared by R.H. Gauger, P.E. December 2004
Profile of IEEE Consultants, 24 Prepared by R.H. Gauger, P.E. December 24 Introduction to a Consultant s Profile As a consultant is preparing a proposal or negotiating a contract, one of the ongoing concerns
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Component visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University [email protected]
Changes in the Cost of Medicare Prescription Drug Plans, 2007-2008
Issue Brief November 2007 Changes in the Cost of Medicare Prescription Drug Plans, 2007-2008 BY JOSHUA LANIER AND DEAN BAKER* The average premium for Medicare Part D prescription drug plans rose by 24.5
Visual Structure Analysis of Flow Charts in Patent Images
Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information
ECS 235A Project - NVD Visualization Using TreeMaps
ECS 235A Project - NVD Visualization Using TreeMaps Kevin Griffin Email: [email protected] December 12, 2013 1 Introduction The National Vulnerability Database (NVD) is a continuously updated United
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
How To Calculate College Enrollment In The United States
EDUCATION POLICY BRIEF May 2008 The Nelson A. Rockefeller Institute of Government The public policy research arm of the State University of New York The States and Their Community Colleges Every state
Topic Maps Visualization
Topic Maps Visualization Bénédicte Le Grand, Laboratoire d'informatique de Paris 6 Introduction Topic maps provide a bridge between the domains of knowledge representation and information management. Topics
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Demographics of Atlanta, Georgia:
Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From
Cash Rents Methodology and Quality Measures
ISSN: 2167-129X Cash Rents Methodology and Quality Measures Released August 1, 2014, by the National Agricultural Statistics Service (NASS), Agricultural Statistics Board, United States Department of Agriculture
Bernice E. Rogowitz and Holly E. Rushmeier IBM TJ Watson Research Center, P.O. Box 704, Yorktown Heights, NY USA
Are Image Quality Metrics Adequate to Evaluate the Quality of Geometric Objects? Bernice E. Rogowitz and Holly E. Rushmeier IBM TJ Watson Research Center, P.O. Box 704, Yorktown Heights, NY USA ABSTRACT
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Three Effective Top-Down Clustering Algorithms for Location Database Systems
Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr
How To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
Representing Geography
3 Representing Geography OVERVIEW This chapter introduces the concept of representation, or the construction of a digital model of some aspect of the Earth s surface. The geographic world is extremely
Interaction and Visualization Techniques for Programming
Interaction and Visualization Techniques for Programming Mikkel Rønne Jakobsen Dept. of Computing, University of Copenhagen Copenhagen, Denmark [email protected] Abstract. Programmers spend much of their
Chapter 111. Texas Essential Knowledge and Skills for Mathematics. Subchapter B. Middle School
Middle School 111.B. Chapter 111. Texas Essential Knowledge and Skills for Mathematics Subchapter B. Middle School Statutory Authority: The provisions of this Subchapter B issued under the Texas Education
SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING
SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING WELCOME TO SAS VISUAL ANALYTICS SAS Visual Analytics is a high-performance, in-memory solution for exploring massive amounts
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
For example, estimate the population of the United States as 3 times 10⁸ and the
CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number
How To Develop Software
Software Engineering Prof. N.L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture-4 Overview of Phases (Part - II) We studied the problem definition phase, with which
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Risk pricing for Australian Motor Insurance
Risk pricing for Australian Motor Insurance Dr Richard Brookes November 2012 Contents 1. Background Scope How many models? 2. Approach Data Variable filtering GLM Interactions Credibility overlay 3. Model
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon ABSTRACT Effective business development strategies often begin with market segmentation,
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
