Submission ID 238 / Wordonoi: Visualizing the Structure and Textual Contents of Knowledge Networks

Volume 32 (2013), Number 3
Eurographics Conference on Visualization (EuroVis) 2013
B. Preim, P. Rheingans, and H. Theisel (Guest Editors)

Wordonoi: Visualizing the Structure and Textual Contents of Knowledge Networks

Submission ID: 238

Figure 1: The top 26 topics in a Wordonoi for a large research organization for homeland security consisting of 2,000 nodes and 5,000 edges. The cells have been colored by category to distinguish between different, adjacent topics.

Abstract

Datasets with both relationships and textual content are becoming increasingly common; examples include hypertext documents, rich social networks, and scientific authorship. We call such datasets knowledge networks and present Wordonoi, a novel interactive visualization technique for them. Wordonoi visualizes both the textual and relational components of knowledge networks by spatializing them into a multi-scale 2D visualization using a Voronoi tessellation and then mapping keywords onto the resulting cells. Because knowledge networks are often large, we also provide aggregation mechanisms for summarization. We explore and implement several interactions, such as interactive coloring, semantic zooming, and searching. We also validate the technique with three examples: a research organization's structure, a hypertext network, and NSF funding data.

Categories and Subject Descriptors (according to ACM CCS): H.5.1 [Information Systems]: Multimedia Information Systems; H.5.2 [Information Systems]: User Interfaces

1. Introduction

Text is one of the most important and common types of data in the world today [Shn96], and a multitude of tools (e.g., [DZG*07, vHWV09, VWF09]) exist for visualizing such data. However, as we go beyond simple text corpora to more complex datasets, one particular class of data emerges that combines textual labels with the relationships among them (e.g., in the form of graphs).
We call such datasets knowledge networks because they exhibit a graph structure with textual data attached to the nodes and links. Examples include dictionaries, where each word has a definition and relationships to synonyms, antonyms, or modifiers; the web, where each webpage consists of text and hyperlinks to other pages; and research funding networks, where text describes projects and relationships capture investigators, institutions, and program officers. While several techniques exist for visualizing graphs or text alone, visualizing their combination is challenging [KKEE11]. Furthermore, in such graphs the textual content is sometimes more important, and sometimes the relationship structure is. For example, consider crime network data consisting of crime reports and their relationships. Here, the user may want to focus either on how the various crimes occurred (i.e., the contents of the reports) or on associations between particular crimes and individuals (i.e., the relationships between reports). Some tools focus only on textual content and make little use of connections (e.g., [CVW09]), and vice versa (e.g., [vHWV09]). Only a handful of tools [KKEE11, SGL08] visualize both features simultaneously.

Another problem with knowledge networks is their size. Not only are the graphs themselves generally large, but so is their textual content. Therefore, summarization techniques [EF10] are required to provide an effective overview. Existing techniques [DZG*07, VWF09] visualize large text collections by extracting important tags and patterns and displaying those tags. However, these tools do not show the overall landscape, or an effective summary of the whole dataset, on a single screen. In our case, we need to summarize both the networks and their textual content.

In this work, drawing on previous work such as self-organizing maps (SOMs) [Koh82], WordBridge [KKEE11], and Wordle [VWF09], we propose a visualization technique called Wordonoi for visualizing both the relations and the textual content of knowledge networks. Our technique is a multi-scale, space-filling 2D text visualization that supports hierarchical aggregation [EF10] to allow the user to interactively explore the knowledge network. The contribution of this work is the ability to show a summary of an entire textual and relational dataset on a single screen.
We have implemented a Wordonoi prototype that accepts knowledge networks as input and renders an interactive visualization. Our framework first calculates node positions using a graph layout algorithm and then uses these positions to compute a Voronoi tessellation of the space. Each Voronoi cell represents a node, and the text associated with the node is shown inside the cell as tags extracted from that text. In our implementation, we explore several aspects of the Wordonoi design space, including aggregation, text visualization in Voronoi cells, coloring schemes, and interaction techniques such as semantic zooming, panning, and querying.

We validate our technique by applying it to three examples. The first is a large research organization with relationships between persons, projects, institutions, and centers, where each node contains details about the research. The second is a hypertext network where the relationships are links and the text is the document content. In the last example, we study a knowledge network of NSF funding data that contains relationships between PIs and projects, and where the text consists of the project descriptions. All three examples show that our technique works well for practical knowledge networks, summarizing their textual content while simultaneously taking their relationship structure into account.

2. Related Work

Digital technology has made text data ubiquitous. However, staying abreast of the onslaught of textual streams such as news articles, academic papers, and crime reports is impossible [Hea09]. Text visualization uses interactive visual representations to summarize, highlight, and characterize the contents of textual data [DZG*07]. However, this task is complicated by the fact that text data is categorical, unstructured, and high-dimensional [SWL*10].
Below we outline the general approaches in the literature.

2.1. Frequency-based Visualization

Extracting important words from text according to their frequency and visualizing that metric is a common text visualization technique [Hea09]. The most famous of these techniques is the tag cloud (or word cloud) [HR08], which is commonly used by Web 2.0 and social media websites. Despite their popularity, tag clouds have several problems, such as drawing too much attention to longer words [VWF09] and not making efficient use of space. The Wordle technique [VWF09] overcomes many of these problems and produces highly aesthetic, compact clouds. ManiWordle [KLKS10] proposes several improvements and allows the user to interactively control the cloud layout. Finally, a technique called clustered word clouds [Cla] uses word relatedness to control positioning in a tag cloud layout, thus displaying co-occurring and related words in close proximity.

2.2. Visual Concordances

A concordance is an alphabetical index of all the words in a text together with their context [Hea09]. Several text visualizations have been designed to visualize texts in this way. SeeSoft [Eic94] represents each document as a vertical column and each line of text in it as a color-coded row of pixels. Similarly, TextArc [Pal02] displays the lines of text in an elliptical layout with frequently occurring words placed in the center; selecting a central word displays its connections to the lines of text containing it. Finally, DocuBurst [CCP09] and WordTree [WV08] are examples of document concordances built using hierarchies.

2.3. Combining Text with Other Visualizations

Some existing work blends text visualization with other visualization techniques to show patterns in the text. ThemeRiver [HHWN02] shows thematic variations over time by visualizing the frequencies of topics extracted from the text.
NameVoyager [Wat06] uses stacked area graphs to show the frequencies of baby names over time. TIARA [WLS*10] integrates trend graphs into tag clouds to show important patterns over time. Similarly, SparkClouds [LRKC10] integrates sparklines into tag clouds to show the trend of each word over time.

2.4. Visualizing Relationships in Text

Visualizing the relationships between the words in a text collection is becoming increasingly important, and numerous visualizations have been proposed to show the structure of text. Arc Diagrams [Wat02] display patterns of repetition in string data. IN-SPIRE [WTP*95] shows the relationships between the documents in a corpus using a themescape, a topic-based projection of document concepts and keywords onto 2D space. The Word Tree [WV08], a hierarchical document concordance, shows the context of each word by displaying the phrases that commonly follow or precede a given word. Phrase Nets [vHWV09] show a graph of related words based on a user-specified relation. Themail [VGD06] parses email conversations and portrays relationships between individuals by extracting and analyzing keywords. Parallel Tag Clouds (PTCs) [CVW09] extend Themail by combining parallel coordinates [Ins85] with tag clouds, enabling faceted browsing of textual documents and comparisons across facets.

Another common approach is to use node-link diagrams to show relations in the text. Wong et al. [WMP*05] describe a method for displaying dynamic text in place of the links in a node-link diagram. TextArc [Pal02] uses node-link diagrams to show all the contexts in which a word appears. WordBridge [KKEE11] replaces the nodes and links in a graph with node and link tag clouds that convey not just connectivity but also the content of the relations. FacetAtlas [CSL*10] combines node-link diagrams with density maps to visualize entity relationships in text. Finally, Gansner et al. [GHK10a, GHK10b, GHKV09, GHN12] combine geographic maps with node-link diagrams to increase the visual appeal of a graph. These works are closely related to ours from a visual design point of view, but have a very different data model and approach.
Wordonoi visualizes the bodies of text associated with a node cluster in the area assigned to the cluster's cell, while the work of Gansner et al. focuses on clustering nodes into larger regions and rendering their node labels. In other words, Wordonoi primarily visualizes knowledge networks with textual data, whereas Gansner et al. use the geographic map approach to increase the visual appeal of the node-link diagram. All of these techniques focus on showing relationships or structure within a text corpus. In contrast, Wordonoi shows the relationships between nodes that each carry textual content in a knowledge network. In the following sections, we discuss the impact this focus has on the technique.

3. Generating Knowledge Networks

We define knowledge networks simply as graphs with associated textual content for their nodes and edges, where the textual data gives some form of semantic meaning to the graph structure. In such networks, it is important to understand not just the relationships between entities, but also the semantics of their connections. For example, a standard citation network shows whether particular authors have collaborated with or cited each other, whereas a knowledge network constructed from the same data may also tell us the nature of those collaborations or citations. Here we propose a mechanism for constructing knowledge networks from standard graph and textual datasets.

The most straightforward way to generate knowledge networks is to derive multidimensional graphs from tabular data, e.g., as described by Liu et al. [LNS11]. The process is heavily dependent on the application domain, but often involves recovering an entity-relationship (E-R) model [Che76] from the data before extracting the knowledge network. In such a model, each entity (a person, publication, organization, physical object, or concept) becomes a node in the network, and the relationships between entity types are used to generate the links between nodes.
For example, relational databases are built from E-R models, so extracting the model from tables and their keys is relatively straightforward (although the user still has to select which entities to include). In other cases, the E-R model must be explicitly specified by the user; for example, in a collection of crime reports, we may first have to identify entities in the text (similar to how Jigsaw [SGL08] extracts entities) and then decide how to generate the relationships (co-occurrence, distance in text, or semantic meaning).

Given a basic E-R model and network data extracted from the original dataset, we must then augment the network with textual information describing the semantics of the entities and their relationships. Again, this process is application-specific and depends on which semantics should be highlighted. A common approach is to summarize all of the textual information available in the original database and integrate it into the knowledge network. This can be achieved using text mining techniques such as counting word frequencies, calculating tf-idf [Jon72] and related metrics, or automatically extracting text summaries [Hea99]. The extracted information, ranging from keywords and phrases to entire texts, is then stored as node and edge attributes.

4. Wordonoi: Design Space

Wordonoi is an interactive visual representation for knowledge networks that combines both relational and textual content. Figure 2 depicts the Wordonoi pipeline, which takes a knowledge network as input, processes it in stages, and yields an interactive visualization. Below we describe these stages and explore the Wordonoi design space.

Figure 2: From knowledge network to visualization.
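To make the tf-idf summarization step of Section 3 concrete, the following is a minimal sketch, assuming each entity's descriptive text is already available as a plain string. The `docs` dictionary, the crude tokenizer, and the `top_keywords` helper are hypothetical illustrations rather than the prototype's code; the scoring uses raw term counts times idf = log(N/df), which is only one of several common tf-idf variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (deliberately crude)."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, k=3, stopwords=frozenset()):
    """Return the k highest-scoring tf-idf terms for each entity.

    docs: hypothetical input mapping an entity (node) id to its descriptive text.
    Score = raw term count * log(N / df), where N is the number of entities
    and df is the number of entities whose text contains the term.
    """
    tokens = {node: [t for t in tokenize(text) if t not in stopwords]
              for node, text in docs.items()}
    n_docs = len(tokens)

    # Document frequency: count each term at most once per entity.
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    summary = {}
    for node, toks in tokens.items():
        tf = Counter(toks)
        scores = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        # Highest score first; ties broken alphabetically so the output is stable.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        summary[node] = [term for term, _ in ranked[:k]]
    return summary


docs = {
    "p1": "food safety food inspection",
    "p2": "rail transportation safety",
    "p3": "disease outbreak disease model",
}
print(top_keywords(docs, k=2))
# {'p1': ['food', 'inspection'], 'p2': ['rail', 'transportation'], 'p3': ['disease', 'model']}
```

A term shared by every entity gets an idf of zero, so generic words such as "safety" above sink to the bottom of each list; the resulting keyword lists are what would be stored as node attributes.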

Figure 3: Spatializing knowledge networks using Wordonoi: (a) graph layout; (b) Voronoi tessellation; and (c) cell generation.

4.1. Spatialization

The first step in visualizing knowledge networks is to assign space on the 2D visual substrate to each entity (node) in the knowledge network. The intuition is to project the high-dimensional network structure onto 2D space so that the textual content associated with each node can be displayed. This spatialization should fill the available space in the viewport by allocating a disjoint 2D region to each network node based on the graph structure. To achieve this, we abandon the explicit display of the graph structure itself in favor of conveying the textual content of the knowledge network, while still reflecting connectivity through the placement of the regions. Figure 3 illustrates the three parts of the spatialization process: (a) generating a graph layout of the network; (b) tessellating the space into disjoint cells; and (c) allocating a cell to each node.

Graphs are high-dimensional datasets, and graph layout algorithms are concerned with projecting such datasets into 2D (or 3D) space by calculating the position of each node in a way that optimizes some metric (typically readability). However, the primary purpose of the layout for the Wordonoi technique is to find a 2D mapping that mimics the structure of the underlying graph, i.e., that places highly connected nodes in close proximity. Any layout algorithm can be used; we prefer Noack's lin-log algorithm [Noa05] because it clusters nodes based on connectivity. Given a graph layout, we then convert the nodes into 2D regions of the viewport where textual content can be visualized. For this purpose, we use a Voronoi tessellation, which subdivides the space into disjoint cells based on the node positions: each cell consists of all points that are closer to its node than to any other node.
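The tessellation step can be illustrated with a minimal sketch, assuming node positions from some layout are already available as (x, y) pairs. The prototype computes exact cell polygons with a standard Voronoi implementation (see the preprocessing description in Section 5); this sketch instead approximates a discrete Voronoi tessellation on a pixel grid by assigning every pixel to its nearest node. The `raster_voronoi` and `cell_adjacency` helpers are hypothetical. Pixel counts give each cell's area, and touching pixels give cell adjacency.

```python
def raster_voronoi(positions, width, height):
    """Approximate a Voronoi tessellation on a width x height pixel grid.

    positions: mapping node -> (x, y) layout coordinates (from any layout algorithm).
    Returns a mapping node -> list of (x, y) pixels whose nearest node it is.
    Ties go to the node listed first in `positions`, keeping the result deterministic.
    """
    nodes = list(positions)
    cells = {node: [] for node in nodes}
    for y in range(height):
        for x in range(width):
            nearest = min(
                nodes,
                key=lambda n: (positions[n][0] - x) ** 2 + (positions[n][1] - y) ** 2,
            )
            cells[nearest].append((x, y))
    return cells


def cell_adjacency(cells):
    """Two cells are adjacent if they own horizontally or vertically touching pixels."""
    owner = {pixel: node for node, pixels in cells.items() for pixel in pixels}
    adjacency = {node: set() for node in cells}
    for (x, y), node in owner.items():
        for neighbor_pixel in ((x + 1, y), (x, y + 1)):
            other = owner.get(neighbor_pixel)
            if other is not None and other != node:
                adjacency[node].add(other)
                adjacency[other].add(node)
    return adjacency


layout = {"a": (2, 2), "b": (7, 2), "c": (5, 8)}  # hypothetical layout output
cells = raster_voronoi(layout, width=10, height=10)
areas = {node: len(pixels) for node, pixels in cells.items()}
print(areas, cell_adjacency(cells))
```

The brute-force nearest-node search costs O(pixels x nodes), which is fine for illustration but is exactly why an exact Voronoi implementation is preferable at the scale of the examples discussed later.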
Each cell has an associated node, and its area can be used for the node's textual content.

Several design decisions were made in arriving at this spatialization approach. The main tradeoff is that we sacrifice some of the graph structure of a traditional node-link diagram in order to convey more textual content in the visualization. This differs from techniques such as WordBridge [KKEE11] and Phrase Nets [vHWV09], which also combine graph and text visualization but retain more of the relationship structure in the representation. The drawback of such approaches is that less space is available for visualizing textual content, and, as with any node-link diagram, they do not scale well with graph size. As we shall see in Section 4.5, the Wordonoi space-filling representation not only allows devoting virtually the entire space to textual content, but is also highly amenable to hierarchical aggregation to manage scale.

4.2. Text Visualization

Spatialization subdivides the viewport into cells based on network topology, yielding one cell per node in the knowledge network. The next step is to use the 2D cells to visualize the textual content associated with each node. We consider three strategies:

Most important keyword: Scale the most important keyword (e.g., the most frequent) to fit the cell as a single label.
Repeat keyword: Again use the most important keyword, but fill the cell completely by repeating it.
Word cloud: Draw a word cloud of the tags belonging to the cell, sized by the global frequency of each keyword.

We use all three strategies depending on the size of the cell on the screen: for small regions, only the most important tag is displayed; for medium-sized regions, the tag is repeated; and for large regions, a word cloud is displayed. This choice of visual representation changes as the user zooms in and out, giving rise to a form of semantic zooming [BGM04].
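The size-dependent choice among the three strategies can be sketched as a small dispatch function. This is a minimal sketch under stated assumptions: the two area thresholds are hypothetical placeholders (no values are given in the text), and `TextStrategy` and `choose_strategy` are illustrative names. The zoom factor is treated as a linear scale, so on-screen area grows with its square.

```python
from enum import Enum


class TextStrategy(Enum):
    SINGLE_KEYWORD = "most important keyword"
    REPEAT_KEYWORD = "repeat keyword"
    WORD_CLOUD = "word cloud"


# Hypothetical thresholds in square pixels; not values taken from the prototype.
SMALL_MAX_AREA = 5_000
MEDIUM_MAX_AREA = 40_000


def choose_strategy(world_area, zoom=1.0):
    """Pick a text representation from a cell's current on-screen area.

    world_area: cell area at zoom 1.0 (e.g., a pixel count from the tessellation).
    zoom: linear scale factor of the view; on-screen area grows with zoom squared.
    """
    screen_area = world_area * zoom ** 2
    if screen_area <= SMALL_MAX_AREA:
        return TextStrategy.SINGLE_KEYWORD
    if screen_area <= MEDIUM_MAX_AREA:
        return TextStrategy.REPEAT_KEYWORD
    return TextStrategy.WORD_CLOUD


for zoom in (1.0, 2.0, 4.0):
    print(zoom, choose_strategy(4_000, zoom).value)
```

For a cell of 4,000 square pixels, zooming by 2 and then 4 moves it from a single label to a repeated keyword to a full word cloud, which is the semantic-zooming behavior described above.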
Most word cloud layout algorithms are designed for rectangular spaces, so an irregular cell (Voronoi cells are convex polygons, but aggregated cells may be non-convex) may cause parts of keywords to fall outside the cell polygon. We therefore provide an interaction where users can drill down into any region to see its text without clipping.

4.3. Utilizing Color

Color is a free parameter in our design space and can be used to convey features such as topology, categories, and affinity:

Random assignment: Cells are assigned random colors to allow for differentiation.
Graph coloring: A graph coloring algorithm is used to color cells such that no two adjacent cells have the same color.
Categories: Node type or textual content can be used to categorize nodes (and their cells), and cells can then be assigned colors based on their category.
Color scale: A color scale (such as a gray scale or heat scale) can be used to show quantitative information about each cell, such as node centrality, cohesiveness, or connectedness.
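For the graph-coloring option, any proper coloring of the cell adjacency graph works. The sketch below uses a Welsh-Powell-style greedy heuristic, one common choice; it is an illustrative assumption, not necessarily the algorithm used in the prototype. Adjacency here means cells that share a border on screen, given as a hypothetical `adjacency` mapping.

```python
def greedy_coloring(adjacency, palette):
    """Assign colors so that no two adjacent cells share one.

    adjacency: mapping cell -> set of adjacent cells (assumed symmetric).
    palette: ordered list of colors; max degree + 1 colors always suffice.
    Cells are visited in order of decreasing degree (Welsh-Powell ordering),
    and each takes the first palette color not used by an already-colored neighbor.
    """
    order = sorted(adjacency, key=lambda cell: (-len(adjacency[cell]), str(cell)))
    colors = {}
    for cell in order:
        used = {colors[n] for n in adjacency[cell] if n in colors}
        free = [color for color in palette if color not in used]
        if not free:
            raise ValueError(f"palette too small to color cell {cell!r}")
        colors[cell] = free[0]
    return colors


adjacency = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(greedy_coloring(adjacency, ["red", "green", "blue"]))
# {'c': 'red', 'a': 'green', 'b': 'blue', 'd': 'green'}
```

Greedy coloring is not guaranteed to use the minimum number of colors, but for maps of cells a small categorical palette is usually enough, and the deterministic ordering keeps colors stable between redraws.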

In Figure 6, a green color scale has been used to color cells by cohesion (the ratio of internal edges to total edges associated with a cell). Thus, light green cells represent cells with few internal connections, whereas dark green cells are more internally connected. This can be used to find communities in the data.

4.4. Recovering Network Topology

Even though the spatialization is based on the topology of the underlying knowledge network, the Wordonoi representation still sacrifices some of the graph structure in favor of the semantic content of the network. We provide two ways for users to recover topology information:

Cell border visualization: A Wordonoi visualization no longer displays the edges of the original node-link diagram that formed its basic structure, but nothing prevents us from decorating the cell representation with this information. More specifically, we can use the borders between adjacent cells to convey information about the connectivity of those cells, for example, by varying the thickness or color of the border.
Interactive color diffusion: We can also use color to explicitly show the network topology through an interactive paint metaphor, where users assign a color to a cell and the color is then diffused through the representation based on topology: (1) to all adjacent neighbors of the colored cell; or (2) to all cells that are connected to it in the network. The diffusion proceeds in a breadth-first fashion with a diminishing amount of color at each step. The amount of color also depends on the relational strength between the source and destination cells: the stronger the relation (i.e., the more edges between the cells), the more color is diffused. We use an alpha blending model where all regions are initially white and are iteratively blended with other colors.
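Both measures in this subsection, cohesion and color diffusion, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the per-step `decay` factor, the `max_steps` cutoff, and scaling by the heaviest link are hypothetical choices, since only the qualitative behavior (breadth-first, diminishing, stronger for more edges, alpha-blended onto white) is specified above. All function names are illustrative.

```python
from collections import deque

WHITE = (255, 255, 255)


def blend(under, over, alpha):
    """Alpha-blend RGB color `over` onto `under` (alpha in [0, 1])."""
    return tuple(round(alpha * o + (1 - alpha) * u) for u, o in zip(under, over))


def diffuse(colors, weights, source, color, decay=0.5, max_steps=3):
    """Spread `color` breadth-first from `source`, blending into `colors` in place.

    weights: mapping cell -> {neighbor: number of edges between the two cells}.
    The source receives full strength (alpha 1.0); each BFS step multiplies alpha
    by `decay` and by the edge weight relative to the heaviest link, so strongly
    connected neighbors receive more color. Uncolored cells count as white.
    """
    max_weight = max((w for nbrs in weights.values() for w in nbrs.values()), default=1)
    alpha = {source: 1.0}
    queue = deque([(source, 0)])
    while queue:
        cell, step = queue.popleft()
        colors[cell] = blend(colors.get(cell, WHITE), color, alpha[cell])
        if step == max_steps:
            continue
        for nbr, w in weights.get(cell, {}).items():
            if nbr not in alpha:  # visit each cell once, at its BFS distance
                alpha[nbr] = alpha[cell] * decay * (w / max_weight)
                queue.append((nbr, step + 1))
    return colors


def cohesion(cell_nodes, edges):
    """Ratio of edges inside a (possibly aggregated) cell to all edges touching it."""
    internal = touching = 0
    for u, v in edges:
        inside_u, inside_v = u in cell_nodes, v in cell_nodes
        if inside_u or inside_v:
            touching += 1
            if inside_u and inside_v:
                internal += 1
    return internal / touching if touching else 0.0


weights = {"a": {"b": 2, "c": 1}, "b": {"a": 2}, "c": {"a": 1}}
colors = diffuse({}, weights, "a", (255, 0, 0))
print(colors)  # {'a': (255, 0, 0), 'b': (255, 128, 128), 'c': (255, 191, 191)}
```

Calling `diffuse` again with another color and source blends on top of the existing `colors` mapping, so several user-chosen colors can accumulate across the representation.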
This technique is incremental, i.e., different colors can be assigned to different cells, each blending with the existing colors across the representation (Figure 5).

4.5. Aggregation

Knowledge networks are often large and therefore require summarization. Popular solutions include clustering, filtering, or sampling the graph, but none of these approaches is well suited for combination with textual data. For the Wordonoi technique, we instead take advantage of the space-filling representation of disjoint cells, one per node, produced by the spatialization stage, and design a hierarchical aggregation technique [EF10] that incrementally agglomerates adjacent cells until only a single cell, the union of all cells, remains. For our Voronoi cells, agglomerating two cells simply means taking the union of their 2D regions and combining the textual content of the corresponding nodes. This results in a binary clustering tree that can be expanded to any level chosen by the user. Choosing a good distance metric is key to any hierarchical aggregation [EF10]; possible metrics are based on network topology or on layout geometry.

4.5.1. Topology-based Distance Metrics

Distance metrics based on network topology use the graph structure to determine the order in which cells are agglomerated:

Degree: Merge the regions whose associated nodes combine to yield the highest (or lowest) degree.
Edge weights: Merge the regions for the nodes that are connected by the edges with the highest (or lowest) weight.
Cohesion: We define the cohesion (clustering affinity) of a region with other regions as the ratio of the number of edges shared between these regions to the region's degree. This metric merges the regions that are most (or least) cohesive.

4.5.2. Spatialization-based Distance Metrics

These metrics use the spatialization data to define distances between cells.
While they do not depend on the graph topology, they optimize properties of the visual representation itself:

Minimum area: Merge the two adjacent regions with the minimum combined area. This tends to equalize cell sizes at each level of the aggregation.
Maximum area: Merge the two adjacent regions with the maximum combined area. This preserves small cells, which for many graph layout algorithms correspond to highly connected nodes, as well as central nodes near the center of the space.
Rectangle completion: To avoid irregular polygons, use a distance metric based on how closely two cells form a complete rectangle (calculated as the ratio between their combined area and the area of their bounding box).

5. Wordonoi: Implementation

We implemented a Wordonoi prototype consisting of two components: a preprocessor that performs off-line spatialization, and an interactive tool that displays the visualization.

5.1. Preprocessing

The preprocessing tool loads knowledge networks in GraphML format, generated as described in Section 3, and performs the spatialization process presented in Section 4.1. Our implementation uses Noack's lin-log graph layout [Noa05] to find a 2D layout for the nodes, and then tessellates the space using a standard Voronoi implementation. We also preserve the textual content from the GraphML input file (extracted while generating the knowledge network) in the representation. Finally, the tool computes the complete aggregation hierarchy (several distance metrics are available) and saves the cell shapes, the text summaries, and the aggregation hierarchy in a custom XML format. Typical computing time for the preprocessor on a network of approximately 2,000 nodes and 5,000 edges is on the order of 5 seconds. The resulting XML file, which includes the aggregation hierarchy and the Voronoi tessellation at all levels, is approximately 5 MB in size for this dataset.

Many stages of our implementation are computationally expensive. By precalculating the aggregation hierarchy, we avoid long run times and can quickly render all the components at a particular hierarchy level without new computation. The result is a smooth, interactive visualization.

Figure 4: Interactions in our Wordonoi prototype implementation: (a) search and highlighting; (b) text visualization without clipping; (c) underlying node-link diagram; (d) child cells.

5.2. Visualization

The interactive tool is built using the Piccolo [BGM04] toolkit for 2D vector graphics. The tool loads the preprocessed data, including the 2D cell shape vectors and the aggregation hierarchy, and visualizes it as an interactive application. Inside each cell, the tool visualizes the textual content of the cell (and any aggregated children) using a method that depends on the screen space allocated to the cell (Section 4.2). We use a deterministic Wordle layout [KKEE11] to avoid the random and unstable layouts of the original Wordle [VWF09].

5.3. Interactions

Users can interact with the Wordonoi prototype as follows:

Search: The user can type in a query, and matching cells are highlighted while the others are dimmed (Figure 4(a)).
Pan & zoom: Users can pan and zoom in the visualization. Semantic zooming changes the visual representation of text depending on the screen size of each cell.
Aggregation: A slider (or the mouse wheel) controls how many cells to display. Changing this setting dynamically drills down or rolls up the visual aggregation.
Show text: Disable clipping of the text visualization and show the full contents of the current cell (Figure 4(b)).
Show node-link: Display the node-link diagram for the current cell and its neighbors (Figure 4(c)).
Show children: Cells may be aggregates of multiple cells. This mode shows all of the child cells of the cell under the mouse cursor (Figure 4(d)).
Interactive coloring: Two interactive coloring options are available to recover network topology:
Hover diffusion mode: Color is dynamically assigned to the clicked cell and diffused to its neighbors.
Full color diffusion mode: When this mode starts, all cells become white. Users can then iteratively select a color from a palette and assign it to a cell, and the color is diffused across the representation (Figure 5).
Cohesion color mapping: This mode switches from the default categorical color coding to coloring cells based on their cohesion, i.e., the ratio of internal edges to the total number of edges for the cell (Figure 6). This helps in finding communities in the knowledge network.
Reset: A reset option reverts the visualization to its original appearance.

Figure 5: Interactive color diffusion where cells have been colored in the order red, blue, light green, dark green, and cyan, and the colors then diffused based on connectivity.

6. Examples

We showcase Wordonoi by applying it to three examples: a research organization, hypertext documents, and NSF funding data. Below we explain these examples in detail.

6.1. Research Organization

The motivation for this example is to support policymakers and researchers alike in understanding the size, scope, and research topics of a research organization for homeland security. The original dataset is an SQL database consisting of a network of research centers, the associated faculty and students, their institutions, their publications, and the projects they work on. Tasks that a program manager might want to perform include the following:

T1: What research is happening in the organization?
T2: What research is done at particular centers?
T3: What research is done by particular researchers?
T4: How well are the partners collaborating?
T5: Which reports deal with specific keywords?

Several full texts that characterize the network are available, including project descriptions, paper abstracts, center mission statements, and investigator websites. In a manner similar to that described by Liu et al. [LNS11], we extract a knowledge network by mapping the tabular data to an entity-relationship model and summarizing the descriptive text for each entity using tf-idf. The resulting GraphML file consists of approximately 2,000 nodes and 5,000 edges and is then used as input to the Wordonoi pipeline.

Figure 1 shows a screenshot of the top 26 nodes from the interactive visualization of this knowledge network. It serves as an overview of the research done in the organization, showing that the focus is on topics such as food protection, transportation safety, health and disease management, communication, and training (T1). Users can navigate, drill down, and explore this dataset further using the interactions described in the previous section. This allows them to see the research landscape at all levels of scale, identify gaps, and find commonalities between projects. For example, if a program manager searches for a particular center, the summary of topics in the resulting regions gives an idea of the research done at that center (T2). For task T3, the user can combine the search and show node-link interactions. Interactive coloring and show node-link reveal collaborations, and cohesion color mapping indicates the amount of collaboration (T4). After searching for specific keywords, the papers or project reports corresponding to the resulting cells can be opened (T5).

Figure 6: Color scale example showing regions with higher cohesion (more internal edges) in darker green.

6.2. Hypertext Documents

Hypertext document networks are extremely common, with the web being the canonical hypertext collection, but visualizing Internet content is also notoriously difficult [Car96].

While we make no claims as to the utility of Wordonoi for hypertext documents in comparison to other web visualization techniques (surveyed by Card [Car96]), we do think that the technique provides some very useful new perspectives on hypertext collections. For example, Wordonoi could be applied to web browsing history data to see a summary of visited websites and their relationships.

We use a straightforward approach to generate knowledge networks from the Internet: a simple web crawler takes the URL of a webpage as a starting point and processes all of the outgoing links from that webpage in a breadth-first fashion. Each processed webpage is added as a connected node, with edges for all of its hyperlinks, and we extract the text on the page from its title, metadata, and document content (the latter summarized using tf-idf). The crawl stops at a given size (we use 500 documents in these examples), and the resulting data is stored as a GraphML file that can later be visualized.

Some of the tasks the user might want to perform on such knowledge networks include the following:
T1: Summarize a given webpage and its neighbors.
T2: Summarize all webpages related to a keyword.
T3: Find webpages with few or many links to other pages.
T4: Given a keyword, find related keywords.
T5: Open all webpages related to a keyword.

Figure 7: Top 50 nodes for a network starting from the Wikipedia article on visualization.

Figure 7 shows a Wordonoi using the Wikipedia article on visualization as root. The top 50 cells shown here convey the gist of the knowledge network through concepts such as graphics, maps, graph, data, crime, theory, and charts. Beyond an overview of the whole dataset, the Wordonoi technique lets users see details and search for a particular page or word to see a summary of the related text (T1 & T2). They can also use interactions such as show node-link or interactive coloring to see relationships. The cohesion color mapping helps users answer T3, and aggregation and search help answer T4. For example, in our network, the fields related to visualization include graphics, data, crime, police, and maps. Searching for a specific keyword and opening all webpages associated with the resulting regions answers T5.

NSF Funding Data

Federal funding agencies around the world typically make their funding data publicly available, and the U.S. National Science Foundation is no exception; the NSF award search website provides fully searchable funding information on more than 300,000 awards from 1976 to today. Analyzing and understanding this portfolio of funded projects has many and diverse applications: for an investigator, this data may yield information on important topics and previous work; for a program officer, coverage and gaps in their funding portfolio; and for a policy maker, the scope of research being funded. Some tasks involving such knowledge networks include the following:
T1: What are the major research areas funded?
T2: Find the areas in which a person is getting funding.
T3: Open all grant proposals related to a keyword.

To apply our Wordonoi visualization to this data, we first downloaded the full set of current awards for our own university. Following our generation process (Section 3), we mapped this tabular data to an entity-relationship model consisting of projects, investigators, co-investigators, directorates, and program officers. Project descriptions were used as the textual data characterizing the network, and we summarized them using standard tf-idf.
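The tf-idf summarization of project descriptions, and keyword co-occurrence edges built from those summaries, can be illustrated with a small, dependency-free Python sketch. The paper specifies only "standard tf-idf", so the weighting below (term frequency normalized by document length, times log(N/df)), the naive tokenizer and tiny stop-word list, the `top_k` cutoff, and the rule of weighting an edge by the number of shared summary terms are all our assumptions rather than the authors' implementation.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Assumed stop-word list; a real pipeline would use a much larger one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens with stop words removed (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_summaries(docs: Dict[str, str], top_k: int = 5) -> Dict[str, List[str]]:
    """Summarize each document by its top_k terms under tf * log(N / df)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter()
    for words in tokens.values():
        df.update(set(words))  # document frequency: each term counted once per doc
    n_docs = len(docs)
    summaries = {}
    for doc_id, words in tokens.items():
        tf = Counter(words)
        scores = {w: (c / len(words)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Terms found in every document get idf = 0 and are dropped.
        summaries[doc_id] = [w for w in ranked[:top_k] if scores[w] > 0]
    return summaries


def cooccurrence_edges(summaries: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Link two documents whose summaries share terms; weight = number shared."""
    edges = []
    for a, b in combinations(sorted(summaries), 2):
        shared = len(set(summaries[a]) & set(summaries[b]))
        if shared:
            edges.append((a, b, shared))
    return edges


projects = {
    "p1": "Earthquake sensors for bridge safety",
    "p2": "Earthquake modeling of soil",
    "p3": "Soil chemistry and crop yield",
}
summaries = tfidf_summaries(projects)
print(summaries["p2"])                # ['modeling', 'earthquake', 'soil']
print(cooccurrence_edges(summaries))  # [('p1', 'p2', 1), ('p2', 'p3', 1)]
```

Raising `top_k` links more projects but admits weaker terms; the paper also uses keywords specified for each project, which this sketch does not model.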
We further generated edges between projects based on the co-occurrence of keywords specified for each project, as well as co-occurring concepts derived from the tf-idf process. The size of the network is approximately 1,000 nodes and 5,000 edges.

Figure 8: NSF funding example with rectangle aggregation.

Figure 8 shows the Wordonoi for this NSF funding network for our university. The aggregation level is again chosen to show about 50 nodes to yield a high-level overview. It is clear from the visualization that most of the grants are related to engineering; this is consistent with the size and prominence of the college of engineering at the university. Other research topics include chemistry, earthquakes, and agriculture, which are also accurate. Another trend is the prominence of workshops and conferences, presumably reflecting awards used to fund such scientific meetings.

7. Discussion and Limitations

Our aim in this work is to visualize both the textual and structural content of knowledge networks. One potential limitation of the space-filling Wordonoi visualization is that the technique does not explicitly maintain relationships in the visual representation. On the other hand, the tessellation itself is derived from the network topology, and we have also presented an array of interactive and visual methods for recovering topology while benefiting from all of the advantages afforded by the space-filling representation.

With this in mind, it is clear that the choice of graph layout algorithm has considerable impact on the resulting Wordonoi tessellation. A layout algorithm that clusters highly connected nodes in close proximity will result in a more cohesive representation, meaning that the generated cells at all levels of aggregation will be more faithful to the network topology. The lin-log layout we use has this property, but one of its drawbacks is that it is non-deterministic, i.e., it generates a different layout for each invocation. This may be detrimental for users trying to maintain a mental map of the Wordonoi representation across spatializations.

Our choice of text visualization is based solely on tag clouds, and there may be other, more efficient text visualization techniques to use for this purpose. Nevertheless, we think that any text visualization will benefit from having the maximum available space inside the Wordonoi cells, and it should therefore be possible to simply plug it into the existing layout. Similarly, our use of tf-idf for text mining and extraction should not be seen as indicative of limitations in the Wordonoi technique itself; we certainly think more advanced text analytics algorithms can be used in its stead.

It should also be noted that we provide distance metrics for aggregation based both on spatial information and on network topology. Different metrics have different strengths and weaknesses. While topology-based metrics are clearly the most faithful to the original network structure, they may yield aggregations that are difficult to use efficiently for text visualization. Furthermore, even spatially based metrics may yield less-than-ideal layouts. These effects were the reason we devised the rectangular completion distance metric, where cell aggregates tend towards a rectangular and thus more efficient shape.

Another limitation with our implementation, although not necessarily with the general technique, is that we do not group or even link keywords that refer to the same concept. For example, the keywords "first responder" and "emergency personnel" are almost synonyms, but our naïve co-occurrence mapping would not capture this relationship. A more sophisticated approach, left for future work, would use a word ontology to make such connections.

Finally, there exist several additional examples where the Wordonoi technique can be applied. For example, social media data from sites such as Facebook, Twitter, and MySpace contain both relations and large amounts of textual data, and are therefore potential applications for the technique. Another example could be crime and incident reports, to summarize types of crime, relations between different incidents, and the crime trend in a particular area.

8. Conclusion and Future Work

We have presented a novel visualization technique called Wordonoi that combines both the structure and textual content of knowledge networks. While this approach sacrifices some of the structure of the original network in favor of textual content, it is highly amenable to hierarchical aggregation to combat large scale, and we also provide multiple interactive and visual methods for recovering this lost structure. We have also demonstrated the utility of the Wordonoi technique in three examples of knowledge networks.

Several potential future directions exist. We plan to deploy and evaluate the system in a large research organization.
We will bring in more text mining and analytics, such as topic modeling and word ontologies, to improve the text visualization component.

References

[BGM04] BEDERSON B. B., GROSJEAN J., MEYER J.: Toolkit design for interactive structured graphics. IEEE Transactions on Software Engineering 30, 8 (2004).
[Car96] CARD S. K.: Visualizing retrieved information: A survey. IEEE Computer Graphics and Applications 16, 2 (Mar. 1996).
[CCP09] COLLINS C., CARPENDALE M. S. T., PENN G.: DocuBurst: Visualizing document content using language structure. Computer Graphics Forum 28, 3 (2009).
[Che76] CHEN P. P.-S.: The entity-relationship model: toward a unified view of data. ACM Transactions on Database Systems 1, 1 (1976).
[Cla] CLARK J.: Clustered word clouds. http://neoformix.com/2008/clusteredwordclouds.html. Oct.
[CSL∗10] CAO N., SUN J., LIN Y., GOTZ D., LIU S., QU H.: FacetAtlas: Multifaceted visualization for rich text corpora. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010).
[CVW09] COLLINS C., VIÉGAS F. B., WATTENBERG M.: Parallel tag clouds to explore faceted text corpora. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (2009).
[DZG∗07] DON A., ZHELEVA E., GREGORY M., TARKAN S., AUVIL L., CLEMENT T., SHNEIDERMAN B., PLAISANT C.: Discovering interesting usage patterns in text collections: integrating text mining with visualization. In Proceedings of the ACM Conference on Information and Knowledge Management (2007).
[EF10] ELMQVIST N., FEKETE J.-D.: Hierarchical aggregation for information visualization: Overview, techniques and design guidelines. IEEE Transactions on Visualization and Computer Graphics 16, 3 (2010).
[Eic94] EICK S. G.: Graphically displaying text. Journal of Computational and Graphical Statistics 3, 2 (1994).
[GHK10a] GANSNER E., HU Y., KOBOUROV S.: GMap: Visualizing graphs and clusters as maps. In Proceedings of the IEEE Pacific Visualization Symposium (2010).
[GHK10b] GANSNER E., HU Y., KOBOUROV S.: Visualizing graphs and clusters as maps. Computer Graphics and Applications 30, 6 (2010).
[GHKV09] GANSNER E., HU Y., KOBOUROV S., VOLINSKY C.: Putting recommendations on the map: visualizing clusters and relations. In Proceedings of the ACM Conference on Recommender Systems (2009).
[GHN12] GANSNER E. R., HU Y., NORTH S. C.: Visualizing streaming text data with dynamic maps. CoRR abs/ (2012).
[Hea99] HEARST M. A.: Untangling text data mining. In Proceedings of the Annual Meeting of the Association for Computational Linguistics on Computational Linguistics (1999).
[Hea09] HEARST M.: Search user interfaces. Cambridge University Press, 2009.
[HHWN02] HAVRE S., HETZLER E., WHITNEY P., NOWELL L.: ThemeRiver: Visualizing thematic changes in large document collections. IEEE Transactions on Visualization and Computer Graphics 8, 1 (Jan. 2002).
[HR08] HEARST M. A., ROSNER D. K.: Tag clouds: Data analysis tool or social signaller? In Proceedings of the Hawaii International Conference on System Sciences (2008).
[Ins85] INSELBERG A.: The plane with parallel coordinates. The Visual Computer 1, 2 (1985).
[Jon72] JONES S. K.: A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28 (1972).
[KKEE11] KIM K., KO S., ELMQVIST N., EBERT D. S.: WordBridge: using composite tag clouds in node-link diagrams for visualizing content and relations in text corpora. In Proceedings of the Hawaii International Conference on System Sciences (2011).
[KLKS10] KOH K., LEE B., KIM B., SEO J.: ManiWordle: Providing flexible control over Wordle. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010).
[Koh82] KOHONEN T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 1 (1982).
[LNS11] LIU Z., NAVATHE S. B., STASKO J. T.: Network-based visual analysis of tabular data. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (2011).
[LRKC10] LEE B., RICHE N. H., KARLSON A. K., CARPENDALE S.: SparkClouds: visualizing trends in tag clouds. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010).
[Noa05] NOACK A.: Energy-based clustering of graphs with nonuniform degrees. In Proceedings of the International Symposium on Graph Drawing (2005).
[Pal02] PALEY W. B.: TextArc: Showing word frequency and distribution in text. In Poster Proceedings of the IEEE Symposium on Information Visualization (2002).
[SGL08] STASKO J. T., GÖRG C., LIU Z.: Jigsaw: supporting investigative analysis through interactive visualization. Information Visualization 7, 2 (2008).
[Shn96] SHNEIDERMAN B.: The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages (1996).
[SWL∗10] SHI L., WEI F., LIU S., TAN L., LIAN X., ZHOU M. X.: Understanding text corpora with multiple facets. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (2010).
[VGD06] VIÉGAS F. B., GOLDER S., DONATH J.: Visualizing content: portraying relationships from conversational histories. In Proceedings of the ACM Conference on Human Factors in Computing Systems (2006).
[vHWV09] VAN HAM F., WATTENBERG M., VIÉGAS F. B.: Mapping text with phrase nets. IEEE Transactions on Visualization and Computer Graphics 15, 6 (2009).
[VWF09] VIÉGAS F. B., WATTENBERG M., FEINBERG J.: Participatory visualization with Wordle. IEEE Transactions on Visualization and Computer Graphics 15, 6 (2009).
[Wat02] WATTENBERG M.: Arc diagrams: Visualizing structure in strings. In Proceedings of the IEEE Symposium on Information Visualization (2002).
[Wat06] WATTENBERG M.: Visual exploration of multivariate graphs. In Proceedings of the ACM Conference on Human Factors in Computing Systems (2006).
[WLS∗10] WEI F., LIU S., SONG Y., PAN S., ZHOU M., QIAN W., SHI L., TAN L., ZHANG Q.: TIARA: a visual exploratory text analytic system. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (2010).
[WMP∗05] WONG P. C., MACKEY P., PERRINE K., EAGAN J., FOOTE H., THOMAS J.: Dynamic visualization of graphs with extended labels. In Proceedings of the IEEE Symposium on Information Visualization (2005).
[WTP∗95] WISE J. A., THOMAS J. J., PENNOCK K., LANTRIP D., POTTIER M., SCHUR A., CROW V.: Visualizing the nonvisual: Spatial analysis and interaction with information from text documents. In Proceedings of the IEEE Symposium on Information Visualization (1995).
[WV08] WATTENBERG M., VIÉGAS F. B.: The word tree, an interactive visual concordance. IEEE Transactions on Visualization and Computer Graphics 14, 6 (Nov./Dec. 2008).


Visualizing e-government Portal and Its Performance in WEBVS Visualizing e-government Portal and Its Performance in WEBVS Ho Si Meng, Simon Fong Department of Computer and Information Science University of Macau, Macau SAR ccfong@umac.mo Abstract An e-government

More information

Self Organizing Maps for Visualization of Categories

Self Organizing Maps for Visualization of Categories Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl

More information

DataPA OpenAnalytics End User Training

DataPA OpenAnalytics End User Training DataPA OpenAnalytics End User Training DataPA End User Training Lesson 1 Course Overview DataPA Chapter 1 Course Overview Introduction This course covers the skills required to use DataPA OpenAnalytics

More information

Topological Tree Clustering of Social Network Search Results

Topological Tree Clustering of Social Network Search Results Topological Tree Clustering of Social Network Search Results Richard T. Freeman Capgemini, FS Business Information Management No. 1 Forge End, Woking, Surrey, GU21 6DB United Kingdom richard.freeman@capgemini.com

More information

Technical Report. The KNIME Text Processing Feature:

Technical Report. The KNIME Text Processing Feature: Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG

More information

IC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com>

IC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com> IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration

More information

Newdle: Interactive Visual Exploration of Large Online News Collections

Newdle: Interactive Visual Exploration of Large Online News Collections 1 Newdle: Interactive Visual Exploration of Large Online News Collections Jing Yang, Dongning Luo, and Yujie Liu Dept of Computer Science University of North Carolina at Charlotte jyang13, dluo2, yliu39@uncc.edu

More information

Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis

Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis M. Kreuseler, T. Nocke, H. Schumann, Institute of Computer Graphics University of Rostock, D-18059 Rostock, Germany

More information

4.2. Topic Maps, RDF and Ontologies Basic Concepts

4.2. Topic Maps, RDF and Ontologies Basic Concepts Topic Maps, RDF Graphs and Ontologies Visualization Bénédicte Le Grand, Michel Soto, Laboratoire d'informatique de Paris 6 (LIP6) 4.1. Introduction Information retrieval in current information systems

More information

Scholarly Use of Web Archives

Scholarly Use of Web Archives Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 February 2013 Web Archiving initiatives worldwide http://en.wikipedia.org/wiki/file:map_of_web_archiving_initiatives_worldwide.png

More information

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).

More information

Visualizing Large Graphs with Compound-Fisheye Views and Treemaps

Visualizing Large Graphs with Compound-Fisheye Views and Treemaps Visualizing Large Graphs with Compound-Fisheye Views and Treemaps James Abello 1, Stephen G. Kobourov 2, and Roman Yusufov 2 1 DIMACS Center Rutgers University {abello}@dimacs.rutgers.edu 2 Department

More information

Hierarchical Data Visualization. Ai Nakatani IAT 814 February 21, 2007

Hierarchical Data Visualization. Ai Nakatani IAT 814 February 21, 2007 Hierarchical Data Visualization Ai Nakatani IAT 814 February 21, 2007 Introduction Hierarchical Data Directory structure Genealogy trees Biological taxonomy Business structure Project structure Challenges

More information

Submission to 2003 National Conference on Digital Government Research

Submission to 2003 National Conference on Digital Government Research Submission to 2003 National Conference on Digital Government Research Title: Data Exploration with Paired Hierarchical Visualizations: Initial Designs of PairTrees Authors: Bill Kules, Ben Shneiderman

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

Text Mining - Scope and Applications

Text Mining - Scope and Applications Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

A Survey on Web Mining From Web Server Log

A Survey on Web Mining From Web Server Log A Survey on Web Mining From Web Server Log Ripal Patel 1, Mr. Krunal Panchal 2, Mr. Dushyantsinh Rathod 3 1 M.E., 2,3 Assistant Professor, 1,2,3 computer Engineering Department, 1,2 L J Institute of Engineering

More information

Web Analysis Visualization Spreadsheet

Web Analysis Visualization Spreadsheet Web Analysis Visualization Spreadsheet Ed Huai-hsin Chi Xerox Palo Alto Researh Center 3333 Coyote Hill Road Palo Alto, CA 94304 chi@acm.org Abstract In this paper, we present methods of information visualization

More information

Component visualization methods for large legacy software in C/C++

Component visualization methods for large legacy software in C/C++ Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

CHAPTER-24 Mining Spatial Databases

CHAPTER-24 Mining Spatial Databases CHAPTER-24 Mining Spatial Databases 24.1 Introduction 24.2 Spatial Data Cube Construction and Spatial OLAP 24.3 Spatial Association Analysis 24.4 Spatial Clustering Methods 24.5 Spatial Classification

More information

Process Mining by Measuring Process Block Similarity

Process Mining by Measuring Process Block Similarity Process Mining by Measuring Process Block Similarity Joonsoo Bae, James Caverlee 2, Ling Liu 2, Bill Rouse 2, Hua Yan 2 Dept of Industrial & Sys Eng, Chonbuk National Univ, South Korea jsbae@chonbukackr

More information

Information & Data Visualization. Yasufumi TAKAMA Tokyo Metropolitan University, JAPAN ytakama@sd.tmu.ac.jp

Information & Data Visualization. Yasufumi TAKAMA Tokyo Metropolitan University, JAPAN ytakama@sd.tmu.ac.jp Information & Data Visualization Yasufumi TAKAMA Tokyo Metropolitan University, JAPAN ytakama@sd.tmu.ac.jp 1 Introduction Contents Self introduction & Research purpose Social Data Analysis Related Works

More information

Data Visualization Techniques

Data Visualization Techniques Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

More information

Instagram Post Data Analysis

Instagram Post Data Analysis Instagram Post Data Analysis Yanling He Xin Yang Xiaoyi Zhang Abstract Because of the spread of the Internet, social platforms become big data pools. From there we can learn about the trends, culture and

More information

Using Visual Analytics to Enhance Data Exploration and Knowledge Discovery in Financial Systemic Risk Analysis: The Multivariate Density Estimator

Using Visual Analytics to Enhance Data Exploration and Knowledge Discovery in Financial Systemic Risk Analysis: The Multivariate Density Estimator Using Visual Analytics to Enhance Data Exploration and Knowledge Discovery in Financial Systemic Risk Analysis: The Multivariate Density Estimator Victoria L. Lemieux 1,2, Benjamin W.K. Shieh 2, David

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS. PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

More information

Personalized Information Management for Web Intelligence

Personalized Information Management for Web Intelligence Personalized Information Management for Web Intelligence Ah-Hwee Tan Kent Ridge Digital Labs 21 Heng Mui Keng Terrace, Singapore 119613 Email: ahhwee@krdl.org.sg Abstract Web intelligence can be defined

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

María Elena Alvarado gnoss.com* elenaalvarado@gnoss.com Susana López-Sola gnoss.com* susanalopez@gnoss.com

María Elena Alvarado gnoss.com* elenaalvarado@gnoss.com Susana López-Sola gnoss.com* susanalopez@gnoss.com Linked Data based applications for Learning Analytics Research: faceted searches, enriched contexts, graph browsing and dynamic graphic visualisation of data Ricardo Alonso Maturana gnoss.com *Piqueras

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

an introduction to VISUALIZING DATA by joel laumans

an introduction to VISUALIZING DATA by joel laumans an introduction to VISUALIZING DATA by joel laumans an introduction to VISUALIZING DATA iii AN INTRODUCTION TO VISUALIZING DATA by Joel Laumans Table of Contents 1 Introduction 1 Definition Purpose 2 Data

More information

Utilizing spatial information systems for non-spatial-data analysis

Utilizing spatial information systems for non-spatial-data analysis Jointly published by Akadémiai Kiadó, Budapest Scientometrics, and Kluwer Academic Publishers, Dordrecht Vol. 51, No. 3 (2001) 563 571 Utilizing spatial information systems for non-spatial-data analysis

More information

White Paper. Data Visualization Techniques. From Basics to Big Data With SAS Visual Analytics

White Paper. Data Visualization Techniques. From Basics to Big Data With SAS Visual Analytics White Paper Data Visualization Techniques From Basics to Big Data With SAS Visual Analytics Contents Introduction... 1 Tips to Get Started... 1 The Basics: Charting 101... 2 Line Graphs...2 Bar Charts...3

More information