Visualizing Large, Complex Data
Outline Visualizing Large Scientific Simulation Data Importance-driven visualization Multidimensional filtering Visualizing Large Networks A layout method Filtering methods
Supernova
Supernova
The Large Data Problem Shared resources Supercomputer Storage Visualization Machine
A Turbulent Lifted Autoignitive Ethylene/air Jet Flame HO 2 and mixture fraction isosurface HO 2 and OH
Complex, Multi-Scale Nature of Turbulent Flow Small eddies are hidden in the multi-layer flow
Feature-directed Data Reduction and Visualization
Feature-directed Data Reduction and Visualization Achieved over 80% saving In situ data reduction and triage can facilitate following data analysis and visualization!
In Situ Methods Enables Seeing all the data and capturing transient events at highest possible detail More effective data reduction More efficient postprocessing analysis and viz Monitoring/debugging of the simulation (ensuring the calculation is running well) Steering the simulation and driving the simulation with interactive analysis Tuning and optimizing the performance of the simulation/machine
Fusion
Multidimensional Particles Filtering
Multidimensional Particles Filtering Trapped particles that change direction frequently
Visualizing Large, Complex Networks Data are created and collected for a variety of purposes Internet is a source of massive data Cyber security Homeland security Business transactions Mobile device user data Health care data Email Relations in these data are often represented with graph/ networks for analysis To visualize a network, we need to lay it out
The Graph Layout Problem The cost of displaying a graph The hairball problem of large graph layouts Large, dense graphs become a mess Inefficient use of space Details cluttered Solutions Filtering Clustering Abstraction Focus+context California data 6,107 nodes 15,160 edges High dimensional embedding method
Space Partitioning Based Layout California data 6,107 nodes 15,160 edges Treemap Hibert curve Linlog Method 10,737s Radial Treemap Gosper curve
Space Filling Curve Based Layout Layout defined by clustering Space filing Interaction is very fast O( V ) Scales to large graphs Effective for Focus+Context Guaranteed aspect ratios Nodes don t become colinear Rendering is slower than layout A protein homology graph. Color corresponds to depth in the clustering hierarchy. V = 28,854, E = 1,180,816
Visualizing Internet Connectivity
Centrality Sensitivity Centralities (degree, between-ness, closeness, eigenvector, Markov, ) indicate how important a node is in a network. Studying the sensitivity and stability of a network in terms of different metrics for centrality allow us to Filter the network Search and explore in the network Obtain an overview of the network Compute sensitivity as the derivative of the centrality function, approximate derivatives of centrality using finite difference, and validate by computing the mean square error of the linear fit between the approximated and analytical values
Centrality Sensitivity Minimum spanning tree as the core network with centrality derivatives as edge weights Central nodes remain central Network of protein-protein interaction (~1500 nodes)
Overview of Sensitivity Friendster social network Links exhibit negative sensitivity (red) between cluster centers Astrophysics co-author network One competitive network (red) and one collaborative network (blue)
Summary Visualization, as a tool complementing conventional data analysis and mining methods, enhances our ability to utilize and communicate with data and knowledge In situ visualization and data reduction/triage is the most plausible solution to extreme scale scientific supercomputing Visual-based network analysis has become an essential tool, and interactivity is the key to understanding complex networks. The 1 st IEEE Symposium on Large Data Analysis and Visualization (LDAV), Oct 23-24, Providence, RI The 6 th Ultrascale Visualization Workshop, Supercomputing Conference (SC11), November 13, Seattle, WA