TEXT-FILLED STACKED AREA GRAPHS

Martin Kraus

Text can add a significant amount of detail and value to an information visualization. In particular, it can integrate more of the data that a visualization is based on, and it can also integrate information that is personally relevant to readers of a visualization. This may lead readers to consider a visualization a detailed enrichment of their personal experience instead of an abstract representation of anonymous numbers. However, integrating textual detail into a visualization is often very challenging. This work discusses one particular approach to this problem, namely text-filled stacked area graphs; i.e., graphs that feature stacked areas filled with small-typed text. Since the text layout of these graphs can be computed automatically, it is possible to include large amounts of textual detail with very little effort. We discuss the most important challenges and some solutions for the design of text-filled stacked area graphs with the help of an exemplary visualization of the genres, publication years, and titles of a database of several thousand PC games.

INTRODUCTION

Ben Shneiderman's visual information-seeking mantra, "overview first, zoom and filter, then details-on-demand" (Shneiderman, 1996), is often recommended as a design guideline for interactive, computer-based information systems. However, it can also be applied to the design of static visualizations since the human visual system supports all elements: overview is achieved by saccadic eye movements, zoom corresponds to visual fixation, pre-attentive processing can act as a filter, and details on demand can be provided by small-typed text that requires fixation and reading. In fact, traditional geographic maps are often used in this way: readers scan the map to find the position they are interested in, fixate on this position, filter elements such as cities or streets, and then read detail information such as street names.
In addition to maps, Tufte (1990) presents further examples of the successful integration of an abundance of detail in static visualizations (see also Section "Micro-Macro Readings") and summarizes the design strategy this way: "to clarify, add detail" (Tufte, 1990, p. 37). The benefits of visually integrating a large amount of detail include more potentially useful information (in addition to the often cumulative overview); an overview of this additional information (which is difficult to achieve in interactive systems that present detail only on demand); more credibility (for example, because the details include the data that the overview is based on); and details that are personally relevant to the reader and, therefore, have the power to appeal to the reader on the reflective level described by Norman (2003).

From a technical point of view, the main benefit is that a highly detailed, static visualization can be printed on paper and serve as a robust, non-electronic display of a large amount of information. It can also be presented on computer displays with standard document viewers that allow for zooming and panning without requiring more specific software. In fact, most document viewers also allow for searching text in such visualizations and, therefore, allow readers to quickly locate particular textual information.

The main challenge of the approach, however, is the integration of detail without cluttering the overview. The problem is even more severe when the detail information has to be added automatically, for example, because the amount of detail is too large for manual integration. The solution proposed in this work can be described as text-filled stacked area graphs; i.e., stacked area graphs (Harris, 1999) that use textual detail to automatically fill the stacked areas. The basic concept, challenges, and some solutions are discussed with the help of an example in Section "Design of Text-Filled Stacked Area Graphs".
First, however, related work on micro-macro readings is reviewed.

MICRO-MACRO READINGS

According to Card et al. (1999, p. 307), micro-macro readings are one of several focus+context techniques, along with filtering, selective aggregation, highlighting, and distortion. In contrast to overview and detail techniques, which show overview information and detail information in two displays, one of the premises of focus+context techniques is that these two types of information can be combined within a single (dynamic) display, much as in human vision (Card et al., 1999, p. 307). In micro-macro readings this is achieved as "detail cumulates into larger coherent structures" (Tufte, 1990, p. 37). The main advantage of micro-macro readings in comparison to most focus+context techniques is their applicability to static information graphics.

Tufte (1990) reviews several examples of micro-macro readings, among them Maya Lin's design of the Vietnam Veterans Memorial in Washington, D.C. Abramson (1996) discusses this design and Lin's designs of the Civil Rights Memorial in Montgomery, Alabama, and of the Women's Table in New Haven, Connecticut, in more detail. The common feature of these monuments is the use of a timeline, which combines annal-like factual reality, chronicle-like narrativity, and most important, its own vivid, graphic structure (Abramson, 1996, p. 699). It is particularly notable that in all three designs, a larger graphic structure is created exclusively by text without other graphical elements.

Another example of micro-macro readings, which is reproduced by Tufte (1997), is a graph of the market share of pop/rock music, which was designed by Reebee Garofalo. An updated version that covers the years 1955 to 1978 is available on Garofalo's web page (Garofalo, 2011). This graph resembles a stacked area graph, although it goes beyond the traditional stacked area graph and visualizes a graph data structure with nodes and edges, which in fact includes cycles.
Most notable is the concept of filling the stacked areas with textual labels of styles and artists.

For the proposed text-filled stacked area graphs, we draw inspiration from Lin's and Garofalo's work and combine them in a way that allows for automatic text layout in order to dramatically reduce the cost of including large amounts of textual detail. In particular, we choose a stacked area graph as the larger structure, similar to Garofalo's design. However, instead of just labeling areas between lines with text, we rely on the textual detail to form the larger structure, analogously to the monument designs by Lin mentioned above. The result is in fact similar to stem-and-leaf plots by Tukey and the histogram of American divisions in France during World War I by Ayres, which are discussed by Tufte as examples of data-built data measures (Tufte, 2001, pp. 139-144). Many more examples of more general text-filled shapes are discussed in the literature, ranging from medieval manuscripts (Tufte, 2006, pp. 84-87) to tag clouds in contemporary web design (Halvey and Keane, 2007).

DESIGN OF TEXT-FILLED STACKED AREA GRAPHS

According to Harris (1999, p. 10), area graphs are generated by filling the areas between lines generated on other types of graphs. Note that it is not necessary to display these lines in an area graph as the edges of the filled areas are sufficient to show the data curve (Harris, 1999, p. 20). Stacked area graphs are characterized by multiple data series positioned on top of one another (Harris, 1999, p. 14). Since one can think of a stacked area graph as a stacked column graph with the areas between the columns filled (Harris, 1999, p. 14), we will refer to stacked column graphs and linked stacked column graphs as variants of stacked area graphs.

In this work, we propose text-filled stacked area graphs, which are characterized by using small-typed text to fill areas of stacked area graphs. In addition to this textual fill, another fill (e.g., a color) can be applied to the areas as a background to the text. In order to further explain the technique and to illustrate some design choices, a particular example is discussed, namely a visualization of the publicly available PC game titles in the All Game Guide database by All Media Guide (2011). Figs. 3 and 4 show the visualization, which is designed as two A0-size posters. Of course, an electronic version can also be explored with any document viewer that allows zooming and panning.

The visualization consists of stacked area graphs with the total height of each column corresponding to the number of game titles of a particular year in the database. Therefore, the y axis is labeled "titles per year" and the x axis specifies the year. Each column is broken into several stacked sections corresponding to 13 genres that are used in the database. (The genres "home" and "compilation" were not included.) The sections are filled with the names of the games of the specific genre and year. Analogously to the total height, the height of each genre section corresponds to the number of game titles in the section. Grid lines, axis lines, and tick marks are avoided to obtain a minimal design and maximize the data-ink ratio (Tufte, 2001, ch. 4). Genres are labeled directly with layered text (Tufte, 1990, ch. 3) such that the labels are more easily readable when the reader zooms into the visualization.

One of the main concerns in the design of text-filled area graphs is the readability of the textual detail. Thus, text should never be rotated by more than 90 degrees and should only be scaled uniformly in both dimensions, which also helps to preserve the quality of the typographic design of the employed font. Of course, textual detail should not be too small. It turned out that the number of game titles in the whole database was too large to visualize them in a single A0-size poster with readable text; thus, the visualization is limited to PC games and the data is split into decades using a technique of small multiples (Tufte, 1990, ch. 4).

In order to maintain the readability of the textual detail, it is also advisable to avoid any overlapping text. While the genre labels violate this rule, they do not compromise the readability of the textual detail, which is considerably darker, smaller, and printed on top of the genre labels. However, not all of the genre labels are easily readable.

Equally important as the readability of the textual detail is its meaning. If not all textual detail is meaningful (for example, in the case of incomprehensible abbreviations), readers are likely to be frustrated. In our example, all textual detail consists of the full game titles as specified in the underlying database. In this sense, all text is meaningful. In order to help readers make the most sense of the textual detail and to enable them to quickly navigate to specific detail information, the text should be positioned in a systematic, understandable way within the graph. In the example, the titles are sorted alphabetically within each section for a specific year and genre, which should become clear to readers once they read any section that includes several titles.

Fig. 1. Two alternatives for a potentially automatic layout of textual detail: horizontal text in a linked stacked column graph (left) and horizontal text in a stacked area graph (right).

Fig. 2. The chosen automatic layout of textual detail: tilted text lines in a stacked area graph.

AUTOMATIC TEXT LAYOUT

Considering the integration of large amounts of textual detail, it is important to lay out the text automatically in order not to limit the amount of detail that can be included without excessive costs. Three possible text layouts for text-filled stacked area graphs are illustrated in Figs. 1 and 2. In our example, tilted text lines as shown in Fig. 2 were chosen because they communicate the slope of the data curve very well without requiring additional graphic elements apart from the text.
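As an illustration of the tilted layout, the following minimal Python sketch computes the baselines of tilted text lines inside a single genre segment of one column. It is not the Mathematica implementation used for the posters: the function name `tilted_baselines`, its parameters, and the even interpolation between the lower and upper boundary of the segment are assumptions made for this example. The rotation angle is only meaningful if both axes are given in the same units.

```python
import math


def tilted_baselines(x_left, x_right, bottom, top, n_lines):
    """Return (y_left, y_right, angle_deg) for n_lines tilted text lines.

    bottom and top are (left, right) pairs holding the heights of the
    segment's lower and upper boundary at the two column edges.
    All names and the interpolation scheme are illustrative assumptions.
    """
    lines = []
    for k in range(n_lines):
        # Fraction of the way from the lower to the upper boundary,
        # placing each line in the middle of its own band.
        f = (k + 0.5) / n_lines
        y_left = bottom[0] + f * (top[0] - bottom[0])
        y_right = bottom[1] + f * (top[1] - bottom[1])
        # The tilt of the line follows the local slope of the data curve.
        angle = math.degrees(math.atan2(y_right - y_left, x_right - x_left))
        lines.append((y_left, y_right, angle))
    return lines


if __name__ == "__main__":
    # A segment that rises by one unit over a column of width one.
    for y0, y1, angle in tilted_baselines(0.0, 1.0, (0.0, 1.0), (2.0, 3.0), 2):
        print(f"from y={y0:.2f} to y={y1:.2f}, rotated by {angle:.1f} degrees")
```

Because every line interpolates between the two boundaries, the lines fan out or converge when a genre grows or shrinks from one year to the next, so the text itself traces the slope of the data curve.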

The automatic text layout was implemented in the computer algebra system Mathematica and processes the columns of the graph from left to right. Each column i is segmented into stacked sections in several steps. Note that the process has to look ahead one column in order to guarantee a consistent segmentation into genre segments for adjacent columns.

The left and right edges of column i are segmented based on the number of titles in each genre for the year corresponding to the column and the following year, respectively.

For each genre segment of column i from bottom to top, the number of text lines that can fit into the segment is computed based on the smaller of the heights at the left and right edge of the genre segment. Furthermore, the positions of preliminary tilted lines are computed according to the preliminary geometry of the segment. The segment's text (i.e., an alphabetically sorted list of all game titles of a specific genre and year) is then word-wrapped to fill these lines. If the number of lines is not sufficient for the amount of text, the height of the segment is increased until the whole text can be placed into the segment. Thus, some segments (in particular those with a very small number of game titles) will be larger than they should be according to the vertical scale. The algorithm tries to compensate for this increased size by reducing the size of the segments on top of the current segment.

The process is repeated for the next column i+1 to the right, with the segmentation of the left edge (i.e., the edge between column i and column i+1) being initialized by the result of the previous step. This ensures that the segmentation of the edge between columns i and i+1 allows for enough space for all segments of column i+1.

The new segmentation of the edge between columns i and i+1 is used for the final layout of column i. In principle, it is possible that some segments are too small and have to be enlarged.
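The per-segment steps described above can be sketched in Python as follows. This is a simplified, hypothetical sketch and not the original Mathematica implementation: the names `wrap_words` and `layout_column`, the fixed number of characters per line, the comma separator between titles, and the use of a single height per segment (the smaller of its two edge heights) are all assumptions. The look-ahead to the next column is represented only by the right-edge heights that are passed in, and a segment is assumed to shrink only as far as its own text still fits.

```python
def wrap_words(words, chars_per_line):
    """Greedy word wrap; an overlong word is placed on a line of its own."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= chars_per_line or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def layout_column(segments, line_height, chars_per_line):
    """Return the final heights of the segments of one column, bottom to top.

    segments is a list of (titles, height_left, height_right) tuples.
    A segment is enlarged if its text needs more lines than fit; the
    extra height ("debt") is taken from the segments above where possible.
    """
    debt = 0.0
    heights = []
    for titles, height_left, height_right in segments:
        # The smaller edge height limits how many lines fit.
        nominal = min(height_left, height_right)
        # Titles are sorted alphabetically before being word-wrapped.
        words = ", ".join(sorted(titles)).split()
        needed = len(wrap_words(words, chars_per_line)) * line_height
        # Try to repay earlier enlargements by shrinking this segment,
        # but never below the height its own text requires.
        target = nominal - debt
        final = max(target, needed)
        debt = final - target
        heights.append(final)
    return heights


if __name__ == "__main__":
    column = [
        (["aaaa bbbb", "cccc dddd"], 1.0, 3.0),  # needs two lines, has room for one
        (["eeee"], 5.0, 6.0),                    # has room to spare
    ]
    print(layout_column(column, line_height=1.0, chars_per_line=10))
```

In this example the lower segment is enlarged from one line to two, and the segment above gives up one line in return, so the total height of the column is unchanged.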
Enlarging such segments should not be compensated by reducing the heights of other segments, in order to guarantee that the layout for column i+1 is not compromised. However, this case was not relevant in our example.

For the layout of the following column i+1, the segmentation of the left edge between column i and column i+1 is now considered fixed as it has been taken into account when the layout of column i was computed.

It turned out that it is crucial to compute the vertical distance between lines according to the slope of the lines. Furthermore, the distance between lines was varied by at most 25% to reflect an increase or decrease in the number of titles of a particular genre from one year to the next.

While the automatic layout of the textual detail worked well, the genre labels were not positioned automatically. This was an unfortunate decision since it discouraged experimentation with alternative designs.

CONCLUSIONS

This work introduces text-filled stacked area graphs, which are based on an automatic layout of a large amount of meaningful textual detail in stacked area graphs. However, text-filled stacked area graphs are certainly not a universal solution to the general problem of integrating textual detail in information visualizations. Nonetheless, they are applicable to a wide range of data sets; examples in an academic environment include databases of personnel, students, publications, events, etc. The two main benefits of the proposed inclusion of a large amount of detail are that more information is available to readers and that readers might be able to personally relate to some of the details, whereas the cumulative overview information alone would hardly provide more than anonymous numbers. Thus, the proposed technique will hopefully result in visualizations that are not only more useful but also more attractive.

FUTURE WORK

Future work should focus on a better design for labels of genres (or corresponding categories) and on user studies to evaluate the proposed visualization. Based on this evaluation, the concept of text-filled shapes should be applied to further area graphs, including pie charts and flow maps, which presumably could also benefit from the inclusion of more textual detail.

REFERENCES

Abramson, Daniel (1996). Maya Lin and the 1960s: Monuments, Time Lines, and Minimalism, Critical Inquiry, 22(4):679-709.

All Media Guide (2011). All Game Guide, web page: http://www.allgame.com; last visited: May 9, 2011.

Card, Stuart K.; Mackinlay, Jock D.; and Shneiderman, Ben (1999). Focus + Context, in Stuart K. Card, Jock D. Mackinlay, and Ben Shneiderman (eds.), Readings in Information Visualization, Morgan Kaufmann Publishers, pp. 306-309.

Garofalo, Reebee (2011). The Genealogy of Pop/Rock Music, author's web page: http://reebee.net; last visited: May 9, 2011.

Halvey, Martin J. and Keane, Mark T. (2007). An Assessment of Tag Presentation Techniques, in Proceedings of the 16th International Conference on World Wide Web, WWW '07, ACM, pp. 1313-1314.

Harris, Robert L. (1999). Information Graphics, Management Graphics.

Norman, Donald A. (2003). Why We Love (or Hate) Everyday Things, Basic Books.

Shneiderman, Ben (1996). The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations, in Proceedings of the 1996 IEEE Symposium on Visual Languages, IEEE Computer Society, pp. 336-343.

Tufte, Edward R. (1990). Envisioning Information, Graphics Press.

Tufte, Edward R. (1997). Visual Explanations: Images and Quantities, Evidence and Narrative, Graphics Press.

Tufte, Edward R. (2001). The Visual Display of Quantitative Information, 2nd edition, Graphics Press.

Tufte, Edward R. (2006). Beautiful Evidence, Graphics Press.

Fig. 3 (next page). Two text-filled stacked area graphs covering in total 4896 PC game titles from the 1980s and 1990s in the All Game Guide (All Media Guide, 2011). Titles without genre information and titles in the "home" and "compilation" genres were not included. When printed on an A0 poster, the size of the small-typed text is about 1 mm; thus, the text is still readable.

Fig. 4 (next but one page). A text-filled stacked area graph covering 9463 PC game titles from the 2000s in the All Game Guide (All Media Guide, 2011). (See also Fig. 3.)

Martin Kraus: Journal of Computer and Information Technology