Scatterplot Layout for High-dimensional Data Visualization
|
|
|
- Buddy Briggs
- 10 years ago
- Views:
Transcription
1 Noname manuscript No. (will be inserted by the editor) Scatterplot Layout for High-dimensional Data Visualization Yunzhu Zheng Haruka Suematsu Takayuki Itoh Ryohei Fujimaki Satoshi Morinaga Yoshinobu Kawahara Received: date / Accepted: date Abstract Multi-dimensional data visualization is an important research topic that has been receiving increasing attention. Several techniques that apply scatterplot matrices have been proposed to represent multi-dimensional data as a collection of two-dimensional data visualization spaces. Typically, when using the scatterplot-based approach it is easier to understand relations between particular pairs of dimensions, but it often requires too large display spaces to display all possible scatterplots. This paper presents a technique to display meaningful sets of scatterplots generated from high-dimensional datasets. Our technique first evaluates all possible scatterplots generated from high-dimensional datasets, and selects meaningful sets. It then calculates the similarity between arbitrary pairs of the selected scatterplots, and places relevant scatterplots closer together in the display space while they never overlap each other. This design policy makes users easier to visually compare relevant sets of scatterplots. This paper presents algorithms to place the scatterplots by the combination of ideal position calculation and rectangle packing algorithms, and two examples demonstrating the effectiveness of the presented technique. Keywords Visualization Scatterplot Isomap Force-directed graph layout Rectangle packing 1 Introduction Multi-dimensional data visualization is an important and active research field. According to survey papers in the field, several techniques have been proposed [Wong 97,Grinstein 01]. The authors of [Wong 97] divided the available multidimensional data visualization techniques into three categories: two-variate displays, multivariate displays, and animation. Yunzhu Zheng, Haruka Suematsu, Takayuki Itoh Department of Information Sciences, Ochanomizu University Otsuka, Bunkyo-ku, Tokyo Japan {yunzhu, gomiai, itot
2 2 Yunzhu Zheng et al. The typical two-variate display technique is the scatterplot, which assigns two of the dimensions onto the X- and Y-axis of the two-dimensional display space. Because historically scatterplots have been widely used and are currently implemented in commercial spreadsheet software packages, they are particularly popular and users are familiar with them. The scatterplot matrix, which consists of multiple adjacent scatterplots, has also been widely used to represent the dimensions of high-dimensional datasets. Each scatterplot is identified by its row and column index. Even though ordinary users are familiar with scatterplot matrix, they have two major drawbacks. First, if the number of dimensions in a given dataset is very large, the individual scatterplots in a display space may be very small. Second, it is difficult to visually compare arbitrary pairs of scatterplots that are distantly placed in the display space. Multivariate display techniques attempt to represent the distribution of all the dimensions in a given dataset on a single display space. Several multivariate display techniques are available, including icon- and glyph-based techniques, such as hierarchical axis [Mihalisin 90], worlds within worlds [Feiner 90], parallel coordinates [Inselberg 90], VisDB system [Keim 94], and XmdvTool [Ward 94]. Recently, the parallel coordinates technique has been the subject of many studies and is widely used. However, this technique has several drawbacks. First, when the number of dimensions is very large it may require a large horizontal display space. Second, it is difficult to represent the correlation of a particular dimension with three or more dimensions. In this paper, we present a technique to represent high-dimensional spaces using multiple scatterplots. Our proposed technique overcomes the drawbacks of both the two-variate and multi-variate data visualization techniques. The proposed technique selectively displays meaningful sets of scatterplots. Our technique first selects a pre-defined number of scatterplots based on a particular conditions. Here, we think one of the meaningfulness of the scatterplots can be determined based on separateness of particular classes, and therefore the separateness is applied as the condition in this paper. Then, it defines the distances or connectivity between all the pairs of scatterplots. Next, it computes the ideal positions of the scatterplots based on their distance or connectivity values. Finally, it applies a rectangle packing algorithm and places the scatterplots [Itoh 06] [Itoh 09] based on their ideal positions. By the combination of ideal position calculation and rectangle packing algorithms, our technique places similarly looking scatterplots closer together in the display spaces, while they never overlap each other. This paper presents visualization results two example datasets demonstrating the effectiveness of the presented technique. 2 Related Work 2.1 Scaterplot As discussed in Section 1, many scatterplot-based techniques use dimension reduction or the scatterplot matrix. Dimension reduction is useful in representing in a single display space, the overall distribution of vectors in multidimensional spaces. The effectiveness of dimension reduction depends strongly on the projection scheme selected. Leban et al. [Leban 05] introduced VisRank, which supports
3 Scatterplot Layout for High-dimensional Data Visualization 3 users in obtaining meaningful projections. Scaterplot matrix is often preferable because it directly assigns dimensions to the axes of scatterplots. However, these techniques have the drawback that the size of the scatterplots displayed may be very small. To address the problem, Wilkinson et al. [Wilkinson 05] and Sips et al. [Sips 09] presented techniques emphasizing on displaying meaningful scatterplots. However, because these techniques are still based on matrix-based layouts of scatterplots, it is often difficult to effectively represent correlations among scatterplots. Developing interactive mechanism is another approach for exploration of high-dimensional spaces. Recently, various interactive techniques, such as rolling the dice [Elmqvist 08], have been introduced to assist users in exploring highdimensional spaces. 2.2 Rectangle Packing Our proposed technique applies a rectangle packing algorithm to visualize hierarchical data [Itoh 06]. The rectangle packing algorithm represents a hierarchy as nested rectangles and leaf-nodes as painted icons. Moreover, it mostly satisfies the following conditions: Condition 1: It never overlap the leaf-nodes and branch-nodes in a single hierarchy of other nodes. Condition 2: It minimizes the display area. Condition 3: It minimizes the aspect ratio and area of the rectangular subspaces. Condition 4: It minimizes the distances between the actual and ideal positions of the rectangular subspaces (when the ideal positions of the rectangular subspaces are provided). First, the rectangle-packing algorithm specifies the order of the placement of rectangles. Then, it identifies several candidate positions that satisfy condition 1, for placing a rectangle. Next, for each candidate position, it computes penalty values, which represent to what extent each position satisfies conditions 2, 3, and 4, respectively. Finally, it places the rectangle at the best candidate position. This rectangle packing algorithm has also been applied to a graph visualization technique [Itoh 09]. The technique first applies hierarchical clustering to a given graph, and generates clusters of nodes based on both their assigned categories and connectivity. Then, it visualizes the hierarchy by applying a hybrid force-directed and space-filling layout. The force-directed layout minimizes distances between connected or similarly categorized nodes. Conversely, the space-filling layout applies the rectangle-packing algorithm to minimize the cluttering of nodes and maximize the utility of the display. Hence, the hybrid layout for graph visualization realizes simultaneously both features. 3 Scatterplot Packing for High-Dimensional Data Visualization This section firstly defines the input datasets and outputs, and briefly describes the processing flow of the scatterplot packing. It then presents in detail how we implement high-dimensional data visualization by packing a set of scatterplots.
4 4 Yunzhu Zheng et al. Our implementation firstly selects a set of meaningful scatterplots. Next, it computes their ideal positions based on their similarity values. Finally, it applies the rectangle packing algorithm to adjust the positions of the meaningful scatterplots. 3.1 Data structure In this paper, an n-dimensional input dataset Ds is defined as follows: Ds = {x 1,..., x N }, (1) where x i is the i-th vector, and N is the number of vectors. Our technique first forms pairs of the n dimensions: Gp = {g 1,..., g G },g i = {d i1,d i2 }, (2) where g i is the i-th pair of the dimensions, G = n(n 1)/2 is the number of pairs, and d ij is the j-th dimension of g i. 3.2 Processing flow Our proposed technique first constructs a set of two-dimensional datasets from the high-dimensional input dataset. Let S i = {x d i 1,..., xd i N } be a set of scalar values, where x d i j is the value of the d i -th dimension of the j-th vector. Our technique compares arbitrary pairs of values S i and S j, and if they are similar or correlative, categorizes the i-th and j-th dimensions into the same group. Next, it computes the similarities or distances between arbitrary pairs of values of the datasets Dp i and Dp j. Here, the dataset Dp i = {x 1,..., x N} is a dataset consisting of twodimensional vectors constructed by extracting an arbitrary pair of dimensions g i from Ds. Finally, it calculates the positions of the two-dimensional datasets and places similar ones closer together in the display space. Our technique first calculates the ideal positions of the scatterplots, and then applies the rectangle packing algorithm to adjust their positions. This two-step technique satisfies the following requirements: (1) it places similar scatterplots closer together in the display space, and (2) it avoids overlaps, thus reducing wasted space in the display regions. Our current implementation of computing ideal positions supports the following two methods: Dimension reduction based on the similarity distances between scatterplots. Graph layout, where scatterplots are connected based on their similarity measures. 3.3 Selection of Scatterplots As mentioned above, our approach first selects a set of meaningful scatterplots. We assume that one of the classes, {Y 1,..., Y C }, is assigned exclusively to the vectors of a multidimensional dataset. Here, C is the number of classes, and Y j is the j-th constant value which denotes a particular class. Our current implementation selects
5 Scatterplot Layout for High-dimensional Data Visualization 5 a set of scatterplots which have the particular features that vectors of particular classes are well-separated from other vectors. This section proposes formula for determining class separation and an algorithm to select the scatterplots. First, we compute the entropy H all for all pairs of dimensions H all (g k )= 1/N N C n=1 c=1 p(y n = c x g k n )logp(y n = c x g k n ) (3) where x i is the i-th vector, y i is the class assigned to the i-th vector, and p(y n = c x g k n ) is the probability that the c-th class Y c is assigned to the n-th vector x n. Moreover, x g j i is a two-dimensional vector containing the k-th pair of the dimensions of x i. This value represents the separation of classes in a scatterplot generated by the k-th pair of the dimensions. Furthermore, we compute the entropy H c with the c-th class, for all classes (1 c C). H c (g k )= 1/N N n=1 p(y n = c x g k n )logp(y n = c x g k n ) +p(y n c x g k n )logp(y n c x g k n ) (4) This value represents the separation of the c-th class from other classes in a scatterplot generated by the k-th pair of the dimensions. Our implementation first computes H all and H c for all pairs of dimensions as shown in Figure 1(1). Next, it selects a predefined number of pairs of dimensions which have the smaller H all and/or H c values. Values H all and H c are also used in Section 3.4 to place similarly looking scatterplots closer together in the display space. (1) Entropy calculation d 1 d n d 1 Dimension pairs selection (2) Distance matrix of scatterpots S 1 S 1 d n Calculate (H all, H 1,, H C ) for all pairs of dimensions Calculate distances between pairs of the vectors (H all, H 1,, H C ) (4) Scatterplot packing (3) Ideal position calculation S 1 S 2 S 1 S 2 S 3 S 4 S 3 S 4 By dimension reduction or force-directed graph layout Fig. 1 Illustration of selecting and placing scatterplots. Though this section introduces a scatterplot selection algorithm based on class separation, there are other meaningful metrics. For example, centrality [Correa 12] may be a good criterion for scatterplot selection.
6 6 Yunzhu Zheng et al. 3.4 Placement of Selected Scatterplots In this subsection, we describe how the selected set of scatterplots is placed onto the display space. Our technique first computes the ideal positions of the scatterplots on the display space based on their similarity, and then adjusts their positions by a rectangle packing algorithm. We prepare two algorithms for the ideal position computation which are alternatively applied Dimension reduction for computing ideal positions The entropy values H all,h 1,..., H C are arranged in a C + 1 dimensional vector, termed the entropy vector of the section. Here, the section denotes the number of scatterplots M, the number of dimensions of the entropy vectors dim = C +1, and the entropy vector of the j-th scatterplot S j. Next, we calculate the similarity distances between the a-th and b-th scatterplots, and generate an M M matrix containing the distances of all possible pairs of scatterplots, as shown in Figure 1(2). Then, we apply to the matrix the Isomap scheme and compute the twodimensional positions of the scatterplots Graph layout for computing ideal positions Instead of generating a distance matrix, we generate a graph structure by connecting pairs of scatterplots that are sufficiently similar. Next, we apply a forcedirected graph layout technique and compute the two-dimensional positions of the scatterplots. Our implementation simply sets the stable lengths of edges as the above mentioned similarity distance values before applying the force-directed layout, while it does not consider the sizes of nodes. This process can be improved by applying more sophisticated techniques [Harel 02] [Lin 09] Rectangle packing After calculating the ideal positions of the scatterplots by selectively applying dimension reduction or graph layout techniques, as shown in Figure 1(3), we compute the final positions of the scatterplots by applying a rectangle packing algorithm, as shown in Figure 1(4). 4Example In this section, we present examples of visualizing datasets using our technique described above. We implemented in Python 2.7 the scatterplot selection (Section 3.3), and dimension reduction (Sections 3.4.1) algorithms. Moreover, we implemented the force-directed layout (Sections 3.4.2) and rectangle packing (Sections 3.4.3) algorithms using the Java Development Kit (JDK) We executed the implementation on Lenovo ThinkPad T420s (Intel Core CPU 2.70GHz and 8GB RAM) with 64 bit version of Windows 7 Service Pack 1. In our experiments, we observed large differences in the quality and computation time between approaches using dimension reduction and force-directed graph
7 Scatterplot Layout for High-dimensional Data Visualization 7 layout. Dimension reduction required approximately ten minutes of computation time, while force-directed graph layout required only several seconds. Conversely, the visualization quality using dimension reduction was subjectively much better. In the next subsection, we present visualization results obtained using dimension reduction. 4.1 Example 1: Image segmentation data In this subsection, we present the visualization results obtained by applying our technique to the Segmentation dataset published at the UCI machine learning repository [Data-seg]. This dataset contains 18 feature values of 210 blocks of images. Here, the blocks are generated by dividing each of 7 images into 30 blocks. We treated images as classes, feature values as dimensions, and blocks as vectors. Hence, the 18-dimensional dataset is displayed as 210 vectors in 7 colors. In Figure 2, the technique displays 68 scatterplots resulting from our experiment. The figure shows many of scatterplots selected by the presented technique well separate particular classes. Also, the figure shows that the layout of scatterplots never overlap them while not causing large empty spaces, by the effort of the rectangle packing algorithm. We subjectively defined two groups of similarly looking scatterplots in this figure. This section calls them Group 1 and Group 2. Group 1 denotes a group of scatterplots for which the red vectors are well separated from vectors with other colors. Group 2 denotes another group of scatterplots for which the black vectors are moderately separated from vectors with other colors, in addition to the yellow and red vectors. This result demonstrates the presented technique well gathers scatterplots which the same classes are well separated closer in the display space. Figure 3 shows a close-up view of the six scatterplots belonging to Group 2. In addition, this figure demonstrates that our technique performs well in grouping similarly looking scatterplots closer together in the display space. In Figure 3, the two-dimensional values displayed above the scatterplots indicate the IDs of the dimensions assigned to the two axes of the scatterplots. The second values of the six pairs of dimensions indicate the following [Data-seg]: 9: Average of (R + G + B)/3 value over the region. 10: Average of the R value over the region. 15: Average of the (2B - (G + R)) value over the region. 16: 3D nonlinear transformation of the RGB values using the algorithm of Foley and VanDam. Here, R, G, and B are red, green, and blue components of the pixel values. These results demonstrate that the variables of the three images (visualized as yellow, red, and black vectors) are characteristic and similarly distributed. The vectors displayed in Figure 4 (Left) represent the ideal positions of the scatterplots calculated using the Isomap scheme. Similarly, the vectors displayed in Figure 4 (Right) represent the final positions of the scatterplots shown in Figure 2. We subjectively made seven groups of vectors indicated as (1) to (7) to observe their movements by the rectangle packing process. Figure 4 demonstrates that vectors in these groups remain adjacent. Moreover, the figure indicates that the scatterplots displayed in Figure 2 do not overlap and there are no unnecessary gaps in the display space.
8 8 Yunzhu Zheng et al. Group 1 Group 2 Fig. 2 Visualization of an image segmentation dataset as 68 scatterplots. Fig. 3 Close-up view of the six scatterplots belonging to Group 2 shown in Figure Example 2: Vehicle specification data In this subsection, we present visualization results obtained by applying our approach to a vehicle specification dataset constructed from websites containing on-line vehicle catalogs [Data-veh]. This dataset contains 70 feature values and various classes for 429 vehicles. Hence, the 70-dimensional dataset is displayed as 429 vectors. Here, the feature values include sales-related values such as price, performance-related values such as displacement and fuel efficiency, and sizerelated values such as width and length. In addition, this dataset contains various average values of 5-point subjective user evaluations, including appearance, equipment, interior, engine power, cost performance, and the total. We divided the vehicles according to the range of one of the average evaluation values listed above, and colored the vectors according to the corresponding range of evaluation values. In Figure 5, we present the set of scatterplots obtained by applying our technique to the vehicle specification dataset. Here, the colors of the vectors denote ranges of appearance evaluation values. Vehicles with the highest appearance eval-
9 Scatterplot Layout for High-dimensional Data Visualization 9 (1) (2) (1) (2) (3) (3) (4) (4) (5) (5) (6) (7) (6) (7) Fig. 4 (Left) Ideal positions of scatterplots calculated by the Isomap scheme. (Right) Final positions of scatterplots shown in Figure 2. Fig. 5 Visualization of a vehicle specification dataset using scatterplots. Colors of vectors denote appearance evaluation. uation values are drawn in pink. We use red rectangles to indicate scatterplots containing pink vectors that are well-separated from vectors with other colors. Several of these scatterplots assign multiple common variables to their axes, such as height of floor, evaluation of equipment, interior, cost performance, and total. This result suggests that the height of floor is one of the most important specification values correlated to the impression of appearance of vehicles. In addition, it suggests that appearance evaluation is well correlated to other evaluation values. Several of the scatterplots that are not well separated from the pink vectors are indicated by a blue rectangle. Several of these scatterplots assign multiple common variables to their axes, such as outer length, outer height, and outer width. An unexpected result obtained is that the values of the outer size of a vehicle did not have a major correlation to its appearance evaluation. In Figure 6, we present another visualization example using a set of scatterplots. Here, the colors of vectors denote ranges of values of cost performance evaluation. In several scatterplots, vehicles those cost performance are highly evaluated
10 10 Yunzhu Zheng et al. Fig. 6 Visualization of a vehicle specification dataset using scatterplot. Colors of vectors denote cost performance evaluation. (drawn in pink) are well-separated from vehicles those cost performance are poorly evaluated (drawn in blue). Several of the scatterplots assign multiple common variables to their axes, such as appearance and total evaluations. This result suggests that the evaluation of appearance strongly correlates to the cost performance and total evaluation. The two figure show that the layout of scatterplots never overlap them while not causing large empty spaces, by the effort of the rectangle packing algorithm. 5Conclusion In this paper, we presented a technique for visualizing high-dimensional data as a set of well-selected and arranged scatterplots. Our technique first selects meaningful pairs of dimensions, and computes the similarity between arbitrary pairs of the dimension pairs. It then calculates the ideal positions of scatterplots representing the dimension pairs based on their similarity distances, and finally adjusts their positionsby a rectangle packing algorithm. We presented examples of applying our proposed technique to visualize image segmentation and vehicle specification datasets. As a future work, we would like to perform additional experiments in various applications. Specifically, we would like to apply our technique to larger datasets containing hundreds of dimensions and tens of thousands of vectors to demonstrate its scalability. In addition, we would like to apply our technique to various fields of datasets to demonstrate its usability as a generic visual analytics tool. Meanwhile, we also would like to numerically and subjectively evaluate our technique. For example, we would like to compare the distances among the scatterplots on the display space with their similarity distances.
11 Scatterplot Layout for High-dimensional Data Visualization 11 References [Correa 12] C. D. Correa, T. Crnovrsanin, K.-L. Ma, Visual Reasoning about Social Networks Using Centrality Sensitivity. IEEE Transactions on Visualization and Computer Graphics, 18(1), , [Elmqvist 08] N. Elmqvist, P. Dragicevic, J. Fekete, Rolling the Dice: Multidimensional Visual Exploration using Scatterplot Matrix Navigation, IEEE transactions on Visualization and Computer Graphics, 14(6), , [Feiner 90] S. Feiner, C.Beschers, Worlds within Worlds: Metaphors for Exploring n- dimensional Virtual Worlds, ACM Symposium on User Interface Software and Technology (UIST 90), 76-83, [Grinstein 01] G. Grinstein, M. Trutschl, U. Cvek, High-Dimensional Visualizations, KDD Workshop on Visual Data Mining, [Harel 02] D. Harel, Y. Koren, Drawing Graphs with Nonuniform Vertices, Advanced Visual Interfaces (AVI) 2002, , [Inselberg 90] A. Inselberg, B. Dimsdale: Parallel Coordinate: A Tool for Visualizing Multi- Dimensional Geometry, IEEE Visualization, , [Itoh 06] T. Itoh, H. Takakura, A. Sawada, K. Koyamada, Hierarchical Visualization of Network Intrusion Detection Data in the IP Address Space, IEEE Computer Graphics and Applications, 26(2), 40-47, [Itoh 09] T. Itoh, C. Muelder, K. Ma, J. Sese, A Hybrid Space-Filling and Force-Directed Layout Method for Visualizing Multiple-Category Graphs, IEEE Pacific Visualization Symposium, , [Keim 94] D. A. Keim, H.-P. Kriegel, VisDB: Database Eploration Usng Multidimensional Visualization, IEEE Computer Graphics and Applications, 14(5), 40-49, [Leban 05] G. Leban, I. Bratko, U. Petrovics, T. Curk, B. Zupan, VizRank: Finding Informative Data Projections in Functional Genomics by Machine Learning, Bioinformatics Applications Note, 21(3), , [Lin 09] C. Lin, H. Yen, J. Chuang, Drawing Graphs with Nonuniform Nodes Using Potential Fields, Journal of Visual Languages and Computing, 20(6), , [Mihalisin 90] T. Mihalisin, E. Gawlinski, J. Timlin, J. Schwegler, Visualizing a Scalar Field on an n-dimensional Lattice, IEEE Visualization, , [Sips 09] M. Sips, Selecting Good Views of High-Dimensional Data Using Class Consistency?C Eurographics/IEEE-VGTC Symposium on Visualization, , [Ward 94] M. O. Ward, XmdvTool: Integrating Multiple Methods for Visualizing Multivariate Data, IEEE Visualization, , [Wilkinson 05] L. Wilkinson, A. Anand, R. Grossman, Graph-Theoretic Scagnostics, IEEE Symposium on Information Visualization, , [Wong 97] P. C. Wong, R. D. Bergeron, 30 Years of Multidimensional Multivariate Visualization, Scientific Visualization: Overviews Methodologies and Techniques, IEEE Computer Society Press, 3-33, [Data-seg] UCI Machine Learning Repository, Image Segmentation Data Set, [Data-veh] YAHOO! JAPAN Automobile,
A Visualization Technique for Monitoring of Network Flow Data
A Visualization Technique for Monitoring of Network Flow Data Manami KIKUCHI Ochanomizu University Graduate School of Humanitics and Sciences Otsuka 2-1-1, Bunkyo-ku, Tokyo, JAPAPN [email protected]
Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz
Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz Luigi Di Caro 1, Vanessa Frias-Martinez 2, and Enrique Frias-Martinez 2 1 Department of Computer Science, Universita di Torino,
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Visualization Techniques in Data Mining
Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Visualization Techniques in Data Mining Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo di Milano
The Value of Visualization 2
The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate
Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets-
Progress in NUCLEAR SCIENCE and TECHNOLOGY, Vol. 2, pp.603-608 (2011) ARTICLE Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets- Hiroko Nakamura MIYAMURA 1,*, Sachiko
Big Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs [email protected] Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
Visualization of Multivariate Data. Dr. Yan Liu Department of Biomedical, Industrial and Human Factors Engineering Wright State University
Visualization of Multivariate Data Dr. Yan Liu Department of Biomedical, Industrial and Human Factors Engineering Wright State University Introduction Multivariate (Multidimensional) Visualization Visualization
Visual quality metrics and human perception: an initial study on 2D projections of large multidimensional data.
Visual quality metrics and human perception: an initial study on 2D projections of large multidimensional data. Andrada Tatu Institute for Computer and Information Science University of Konstanz Peter
HDDVis: An Interactive Tool for High Dimensional Data Visualization
HDDVis: An Interactive Tool for High Dimensional Data Visualization Mingyue Tan Department of Computer Science University of British Columbia [email protected] ABSTRACT Current high dimensional data visualization
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
Visual Data Mining with Pixel-oriented Visualization Techniques
Visual Data Mining with Pixel-oriented Visualization Techniques Mihael Ankerst The Boeing Company P.O. Box 3707 MC 7L-70, Seattle, WA 98124 [email protected] Abstract Pixel-oriented visualization
BiCluster Viewer: A Visualization Tool for Analyzing Gene Expression Data
BiCluster Viewer: A Visualization Tool for Analyzing Gene Expression Data Julian Heinrich, Robert Seifert, Michael Burch, Daniel Weiskopf VISUS, University of Stuttgart Abstract. Exploring data sets by
Voronoi Treemaps in D3
Voronoi Treemaps in D3 Peter Henry University of Washington [email protected] Paul Vines University of Washington [email protected] ABSTRACT Voronoi treemaps are an alternative to traditional rectangular
Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis
Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis M. Kreuseler, T. Nocke, H. Schumann, Institute of Computer Graphics University of Rostock, D-18059 Rostock, Germany
Hierarchical Data Visualization
Hierarchical Data Visualization 1 Hierarchical Data Hierarchical data emphasize the subordinate or membership relations between data items. Organizational Chart Classifications / Taxonomies (Species and
Information Visualization Multivariate Data Visualization Krešimir Matković
Information Visualization Multivariate Data Visualization Krešimir Matković Vienna University of Technology, VRVis Research Center, Vienna Multivariable >3D Data Tables have so many variables that orthogonal
BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou
BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let s begin with a story Let s explore Yahoo s data! Dora the Data Explorer has a new
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
Multi-Dimensional Data Visualization. Slides courtesy of Chris North
Multi-Dimensional Data Visualization Slides courtesy of Chris North What is the Cleveland s ranking for quantitative data among the visual variables: Angle, area, length, position, color Where are we?!
Visual Mining of E-Customer Behavior Using Pixel Bar Charts
Visual Mining of E-Customer Behavior Using Pixel Bar Charts Ming C. Hao, Julian Ladisch*, Umeshwar Dayal, Meichun Hsu, Adrian Krug Hewlett Packard Research Laboratories, Palo Alto, CA. (ming_hao, dayal)@hpl.hp.com;
Enhancing the Visual Clustering of Query-dependent Database Visualization Techniques using Screen-Filling Curves
Enhancing the Visual Clustering of Query-dependent Database Visualization Techniques using Screen-Filling Curves Extended Abstract Daniel A. Keim Institute for Computer Science, University of Munich Leopoldstr.
HierarchyMap: A Novel Approach to Treemap Visualization of Hierarchical Data
P a g e 77 Vol. 9 Issue 5 (Ver 2.0), January 2010 Global Journal of Computer Science and Technology HierarchyMap: A Novel Approach to Treemap Visualization of Hierarchical Data Abstract- The HierarchyMap
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary
Visualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
RnavGraph: A visualization tool for navigating through high-dimensional data
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session IPS117) p.1852 RnavGraph: A visualization tool for navigating through high-dimensional data Waddell, Adrian University
TIBCO Spotfire Network Analytics 1.1. User s Manual
TIBCO Spotfire Network Analytics 1.1 User s Manual Revision date: 26 January 2009 Important Information SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO
Topic Maps Visualization
Topic Maps Visualization Bénédicte Le Grand, Laboratoire d'informatique de Paris 6 Introduction Topic maps provide a bridge between the domains of knowledge representation and information management. Topics
Visualizing Large Graphs with Compound-Fisheye Views and Treemaps
Visualizing Large Graphs with Compound-Fisheye Views and Treemaps James Abello 1, Stephen G. Kobourov 2, and Roman Yusufov 2 1 DIMACS Center Rutgers University {abello}@dimacs.rutgers.edu 2 Department
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Data representation and analysis in Excel
Page 1 Data representation and analysis in Excel Let s Get Started! This course will teach you how to analyze data and make charts in Excel so that the data may be represented in a visual way that reflects
9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08
9. Text & Documents Visualizing and Searching Documents Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08 Slide 1 / 37 Outline Characteristics of text data Detecting patterns SeeSoft
Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode
Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data
Cours de Visualisation d'information InfoVis Lecture. Multivariate Data Sets
Cours de Visualisation d'information InfoVis Lecture Multivariate Data Sets Frédéric Vernier Maître de conférence / Lecturer Univ. Paris Sud Inspired from CS 7450 - John Stasko CS 5764 - Chris North Data
Interactive Data Mining and Visualization
Interactive Data Mining and Visualization Zhitao Qiu Abstract: Interactive analysis introduces dynamic changes in Visualization. On another hand, advanced visualization can provide different perspectives
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
Specific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
Section 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
Data Visualization - A Very Rough Guide
Data Visualization - A Very Rough Guide Ken Brodlie University of Leeds 1 What is This Thing Called Visualization? Visualization Use of computersupported, interactive, visual representations of data to
A Color Placement Support System for Visualization Designs Based on Subjective Color Balance
A Color Placement Support System for Visualization Designs Based on Subjective Color Balance Eric Cooper and Katsuari Kamei College of Information Science and Engineering Ritsumeikan University Abstract:
COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3
COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping
Visualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
Visual Discovery in Multivariate Binary Data
Visual Discovery in Multivariate Binary Data Boris Kovalerchuk a*, Florian Delizy a, Logan Riggs a, Evgenii Vityaev b a Dept. of Computer Science, Central Washington University, Ellensburg, WA, 9896-7520,
Pushing the limit in Visual Data Exploration: Techniques and Applications
Pushing the limit in Visual Data Exploration: Techniques and Applications Daniel A. Keim 1, Christian Panse 1, Jörn Schneidewind 1, Mike Sips 1, Ming C. Hao 2, and Umeshwar Dayal 2 1 University of Konstanz,
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Load balancing in a heterogeneous computer system by self-organizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
NeuroVis: exploring interaction techniques by combining dimensional stacking and pixelization to visualize multidimensional multivariate data
NeuroVis: exploring interaction techniques by combining dimensional stacking and pixelization to visualize multidimensional multivariate data Category: Research ABSTRACT Applying a dimensional stacking
Space-filling Techniques in Visualizing Output from Computer Based Economic Models
Space-filling Techniques in Visualizing Output from Computer Based Economic Models Richard Webber a, Ric D. Herbert b and Wei Jiang bc a National ICT Australia Limited, Locked Bag 9013, Alexandria, NSW
MicroStrategy Desktop
MicroStrategy Desktop Quick Start Guide MicroStrategy Desktop is designed to enable business professionals like you to explore data, simply and without needing direct support from IT. 1 Import data from
Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension. reordering.
Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering Wei Peng, Matthew O. Ward and Elke A. Rundensteiner Computer Science Department Worcester Polytechnic Institute Worcester,
Graphical Representation of Multivariate Data
Graphical Representation of Multivariate Data One difficulty with multivariate data is their visualization, in particular when p > 3. At the very least, we can construct pairwise scatter plots of variables.
Tree Visualization with Tree-Maps: 2-d Space-Filling Approach
I THE INTERACTION TECHNIQUE NOTEBOOK I Tree Visualization with Tree-Maps: 2-d Space-Filling Approach Ben Shneiderman University of Maryland Introduction. The traditional approach to representing tree structures
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Interactive Exploration of Decision Tree Results
Interactive Exploration of Decision Tree Results 1 IRISA Campus de Beaulieu F35042 Rennes Cedex, France (email: pnguyenk,[email protected]) 2 INRIA Futurs L.R.I., University Paris-Sud F91405 ORSAY Cedex,
Test Data Sets for Evaluating Data Visualization Techniques
Test Data Sets for Evaluating Data Visualization Techniques Daniel A. Keim Institute for Computer Science, University of Munich, Leopoldstr. 11B, 80802 Munich, Germany, [email protected]
MicroStrategy Analytics Express User Guide
MicroStrategy Analytics Express User Guide Analyzing Data with MicroStrategy Analytics Express Version: 4.0 Document Number: 09770040 CONTENTS 1. Getting Started with MicroStrategy Analytics Express Introduction...
Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015
Principles of Data Visualization for Exploratory Data Analysis Renee M. P. Teate SYS 6023 Cognitive Systems Engineering April 28, 2015 Introduction Exploratory Data Analysis (EDA) is the phase of analysis
How To Make Visual Analytics With Big Data Visual
Big-Data Visualization Customizing Computational Methods for Visual Analytics with Big Data Jaegul Choo and Haesun Park Georgia Tech O wing to the complexities and obscurities in large-scale datasets (
Chapter 3 - Multidimensional Information Visualization II
Chapter 3 - Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data Vorlesung Informationsvisualisierung Prof. Dr. Florian Alt, WS 2013/14 Konzept und Folien
High Dimensional Data Visualization
High Dimensional Data Visualization Sándor Kromesch, Sándor Juhász Department of Automation and Applied Informatics, Budapest University of Technology and Economics, Budapest, Hungary Tel.: +36-1-463-3969;
Time series clustering and the analysis of film style
Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such
Visualization of textual data: unfolding the Kohonen maps.
Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: [email protected]) Ludovic Lebart Abstract. The Kohonen self organizing
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Forschungskolleg Data Analytics Methods and Techniques
Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!
Impact of Boolean factorization as preprocessing methods for classification of Boolean data
Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,
Visualizing an Auto-Generated Topic Map
Visualizing an Auto-Generated Topic Map Nadine Amende 1, Stefan Groschupf 2 1 University Halle-Wittenberg, information manegement technology [email protected] 2 media style labs Halle Germany [email protected]
TOP-DOWN DATA ANALYSIS WITH TREEMAPS
TOP-DOWN DATA ANALYSIS WITH TREEMAPS Martijn Tennekes, Edwin de Jonge Statistics Netherlands (CBS), P.0.Box 4481, 6401 CZ Heerlen, The Netherlands [email protected], [email protected] Keywords: Abstract:
Analytics with Excel and ARQUERY for Oracle OLAP
Analytics with Excel and ARQUERY for Oracle OLAP Data analytics gives you a powerful advantage in the business industry. Companies use expensive and complex Business Intelligence tools to analyze their
Similarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
Ordered Treemap Layouts
Ordered Treemap Layouts Ben Shneiderman Department of Computer Science, Human-Computer Interaction Lab, Insitute for Advanced Computer Studies & Institute for Systems Research University of Maryland [email protected]
An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values
Information Visualization & Visual Analytics Jack van Wijk Technische Universiteit Eindhoven An example y 30 items, 30 x 3 values I-science for Astronomy, October 13-17, 2008 Lorentz center, Leiden x An
VISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc., http://willsfamily.org/gwills
VISUALIZING HIERARCHICAL DATA Graham Wills SPSS Inc., http://willsfamily.org/gwills SYNONYMS Hierarchical Graph Layout, Visualizing Trees, Tree Drawing, Information Visualization on Hierarchies; Hierarchical
TEXT-FILLED STACKED AREA GRAPHS Martin Kraus
Martin Kraus Text can add a significant amount of detail and value to an information visualization. In particular, it can integrate more of the data that a visualization is based on, and it can also integrate
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
A Survey, Taxonomy, and Analysis of Network Security Visualization Techniques
Georgia State University ScholarWorks @ Georgia State University Computer Science Theses Department of Computer Science 1-12-2006 A Survey, Taxonomy, and Analysis of Network Security Visualization Techniques
How To Use Trackeye
Product information Image Systems AB Main office: Ågatan 40, SE-582 22 Linköping Phone +46 13 200 100, fax +46 13 200 150 [email protected], Introduction TrackEye is the world leading system for motion
Graphical Web based Tool for Generating Query from Star Schema
Graphical Web based Tool for Generating Query from Star Schema Mohammed Anbar a, Ku Ruhana Ku-Mahamud b a College of Arts and Sciences Universiti Utara Malaysia, 0600 Sintok, Kedah, Malaysia Tel: 604-2449604
A Survey on Multivariate Data Visualization
A Survey on Multivariate Data Visualization Winnie Wing-Yi Chan Department of Computer Science and Engineering Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong June 2006
