Data analysis and visualization topics

Sergei MAURITS, ARSC HPC Specialist
maurits@arsc.edu

Content: Day 1
- Visualization of 3-D data
- basic concepts
- packages
- static graphics formats and compression
- animation formats and compression (codecs)
- practical approach for animation
- examples
- Hands-on training: compositing animation from frames with Shake

Content: Hands-on Day 2
- X Windows connection to the ARSC workstations (http://www.arsc.edu/arsc/support/howtos/usingws/)
- basic concepts of ParaView
- animation with ParaView
- making frames with ParaView
- moving files (frames) from the Linux realm to Mac
- using Shake to edit the frames
- preparing a stand-alone animation with QuickTime 7
All in hands-on training mode!

3-D Data Visualization
- isosurfaces (close to natural appearance)
- cross-sections (3-D ==> 2-D)
- probes/soundings (convenience tools)
- vectors, streamlines, particle paths
- volume rendering (mix of all of the above)
- time dependence (animation)
- multi-variable sets (clutter problem)
- miscellaneous contexts (CFD objects, terrain, geographical maps, etc.)
- discrete (molecules) vs. continuous fields
==> NO UNIVERSAL SOLUTION

A Few Popular Visualization Packages
- AVS & AVS Express
- IDL & IDL iTools
- Matlab
- TecPlot
- NCAR Graphics + NCL (semi-free)
- ParaView/VTK (free)
- IDV (free)
- Vis5D (free)

Vis5D
- Vis5D (3D + time + multi-variable = 5D) is a full suite of volumetric visualization tools, created originally for meteorological data (which somewhat defines the Vis5D context)
- multiple scalars + two (at a time) vector fields, + streamlines, trajectories, + customizable map & terrain
- exists as a stand-alone GUI and/or API suite (advanced)
- some customization (although limited in terms of new graphics) is possible through Tcl scripting (screen shots, spinning, etc.) and custom user functions
- new variables: (array syntax) and user functions

Vis5D (cont.)
- data I/O: Fortran or C conversion codes write the data in Vis5D format (.v5d); commented templates are available at /import/projects/classes/vis5d/linux/arsc_vis5d_help/convert/
- .v5d: cross-platform, effectively compressed (up to 1 byte/node/variable) 3-D format of semi-standard status
- upon conversion, .v5d files can be rendered immediately
- /Vis5D/doc and /Vis5D/man: on-line documentation in PDF, SGML, HTML (in the ARSC Linux environment, search for INDEX.HTM at /usr/local/pkg/vis5d.../doc)

Vis5D: Hands-on Tutorial
1. Find the directory /projects/classes/vis5d/linux/arsc_vis5d_help
2. Review the README files (README.first: general; README.environment: Vis5D environment at ARSC; README.tutorial: hands-on step-by-step tutorial)
3. Copy the entire content of the directory above to the local HD under a unique name (/scratch/vis5d_youruid) (see README.tutorial for details)
4. Proceed with the step-by-step tutorial from Section 2 in README.tutorial

ParaView (http://www.paraview.org/)
ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView's batch processing capabilities. ParaView was developed to analyze extremely large datasets using distributed-memory computing resources. It can be run on supercomputers to analyze terascale datasets as well as on laptops for smaller data.
The ParaView tutorial can be found at http://www.paraview.org/wiki/the_paraview_tutorial

Raster image
- consists of pixels, usually square
- resolution: number of pixels, N x M
- color = R,G,B [+ Alpha]
- Brute-force approach: you need N*M*3 (or, if transparency is involved, N*M*4) numbers to store a graphical image: a lot!
- in the majority of applications space saving is desired (or highly desired)

Savings are possible either
1. by limiting the number of colors (discretization of R,G,B space to 256 or even fewer colors and introduction of color tables: R,G,B => color #n: 0-1, 0-3, ..., 0-255), or
2. by limiting the precision of shapes, thus departing from the simple but costly N x M model, or
3. no savings at all, for the sake of quality

1. Compression by limiting colors to 2, 4, 8, 16, ..., 128, 256 colors (so-called pseudocolor)
- GIF (also PNG-8)
- small file size, limited color space
- applications in Web, science (few colors are used)
- supports transparency, animation (animated GIF)
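As a rough check of the brute-force estimate above, the following Python sketch computes the uncompressed size of an N x M image and the size of the same image stored as palette indices. The 1700 x 1000 resolution and all function names are illustrative assumptions. Real GIF or PNG-8 files also apply lossless coding on top of the palette, so actual files are usually smaller than these figures.

```python
def truecolor_bytes(width, height, with_alpha=False):
    """Brute-force storage: one byte per channel (R, G, B, optional alpha) per pixel."""
    channels = 4 if with_alpha else 3
    return width * height * channels


def bits_per_index(n_colors):
    """Smallest number of bits that can address n_colors palette entries."""
    return max(1, (n_colors - 1).bit_length())


def indexed_bytes(width, height, n_colors):
    """Pseudocolor storage: packed palette indices plus an RGB color table."""
    index_bits = width * height * bits_per_index(n_colors)
    palette = n_colors * 3              # one R,G,B triple per palette entry
    return (index_bits + 7) // 8 + palette


if __name__ == "__main__":
    w, h = 1700, 1000                   # assumed example resolution
    print("RGB  :", truecolor_bytes(w, h), "bytes")
    print("RGBA :", truecolor_bytes(w, h, with_alpha=True), "bytes")
    for n in (256, 16, 2):
        print(f"{n:3d} colors:", indexed_bytes(w, h, n), "bytes")
```

Halving the number of index bits halves the pixel data, which is why a graph drawn with a handful of colors is a natural fit for pseudocolor formats.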

2. Compression by simplifying shapes
- JPEG, JPEG2000
- flexible compression level ==> variable file size (10% is frequently OK, 30% is good for almost everything)
- lossy compression: loss of graphics quality, artefacts, especially at shape edges
- widely used in photo and video (TTL, Through the Lens) applications

3. No compression at all
- TIFF, PNG-24 (lossless compression only), XWD, BMP, ...
- highest image quality, LARGEST file sizes
- formats of choice in professional graphics, printing (polygraphy), satellite imagery, archiving

(cont.)
4. Both types of compression (color, shape) are irreversible (keeping an uncompressed archival copy is a reasonable approach).
5. Worst-case scenario: a second pass of compression on an already compressed JPEG. This way compression artefacts will be preserved (costing file size) and magnified (costing quality).
6. Better way: keep an uncompressed archival copy and compress it to different levels of quality.
7. Surprise: 30% quality for JPEG compression is not so bad; sometimes (photos for the Web, for instance) 10% is quite sufficient. The Photoshop Save for Web utility is very useful for visually determining the necessary compression level.
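The difference between points 5 and 6 can be illustrated with a toy model. The sketch below is not JPEG: it assumes a much simpler lossy step, snapping each sample to the nearest multiple of a step size, and the step sizes 10 and 7 are arbitrary. It only shows the general effect that compressing an already compressed copy accumulates error, while compressing the archival copy directly does not.

```python
def quantize(values, step):
    """Toy lossy 'compression': snap each value to the nearest multiple of step."""
    return [round(v / step) * step for v in values]


def max_error(original, degraded):
    """Largest absolute deviation from the original samples."""
    return max(abs(o - d) for o, d in zip(original, degraded))


original = list(range(256))             # stand-in for 8-bit pixel values

# Better way: derive each quality level directly from the archival copy.
direct = quantize(original, 7)

# Worst case: compress once (step 10), then compress the result again (step 7).
second_pass = quantize(quantize(original, 10), 7)

if __name__ == "__main__":
    print("direct from archive :", max_error(original, direct))
    print("second pass         :", max_error(original, second_pass))
```

In this toy, the direct result never deviates by more than half a step, whereas the second pass inherits the error of the first pass and adds its own.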

10/24/13

Example of compression levels (using Save for Web in Photoshop) for a scientific graph with a uniform background at 1700x1000 resolution (close to HDTV, 1080p), drawn with just 16 colors

Format   File size                    Setting        Result
TIFF     5 150 000 B (5.15 MB)        -              no compression
JPEG     314 100 B                    max = 100%     subjectively, no artefacts
JPEG     129 700 B                    mid = 50%      subjectively, minor artefacts
JPEG     70 400 B                     min = 10%      subjectively, a lot of artefacts
GIF      133 200 B                    256 colors     no artefacts, adequate colors
GIF      113 000 B                    64 colors      no artefacts, adequate colors
GIF      85 360 B                     16 colors      no artefacts, adequate colors
GIF      76 760 B                     8 colors       color distortion starts here (the graph uses 16 colors)

Animation
From Wikipedia: animation is the rapid display of a sequence of images of 2-D or 3-D artwork or model positions in order to create an illusion of movement. It is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in a number of ways.

The standard frequency of TV-based animation is 30 frames per second (fps). Software interfaces commonly adopt it as a base rate. By simple repetition of your frames you can depart from this rate, but not dramatically: 8-10 fps is the practical lower limit. This means you can repeat each frame 3-4 (maybe 5) times, but usually not much more.
(Duration example: 45 frames x 3 = 135 frames; 135/30 = 4.5 sec)

From the compression standpoint, an animation frequency of 30 fps means that your graphics compression problem is 30 times worse than in the case of static graphics.

From http://www.apple.com/quicktime/technologies/h264/
Full HD uncompressed (4:4:4): 1920 W x 1080 H x 24 bit x 30 fps, or about 2 MP x 24 bit x 30 fps, = 1424 Mbps
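The two pieces of arithmetic above (raw Full HD bitrate and the duration of a repeated-frame sequence) can be checked with a few lines of Python. This is only a sketch with illustrative function names. It assumes that the 1424 Mbps figure counts one megabit as 2**20 bits, since that is the convention under which the product rounds to 1424; with 10**6 bits per megabit the same product is about 1493 Mbps.

```python
def uncompressed_bitrate(width, height, bits_per_pixel, fps):
    """Raw video bitrate in bits per second."""
    return width * height * bits_per_pixel * fps


def duration_seconds(n_frames, repeats, fps=30):
    """Playback time when every frame is shown `repeats` times at `fps`."""
    return n_frames * repeats / fps


if __name__ == "__main__":
    bps = uncompressed_bitrate(1920, 1080, 24, 30)
    print(bps, "bit/s")
    print(round(bps / 2**20), "Mbps (1 Mb taken as 2**20 bits)")
    print(round(bps / 10**6), "Mbps (1 Mb taken as 10**6 bits)")
    print(duration_seconds(45, 3), "s")
```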

The good news: an animated stream has a lot of redundancy, so its compression can be dramatically more effective than compression of static graphics.

Earlier coding techniques used singly predicted (P) frames, which depend only on a previous I or P frame (I frames are coded independently), and bipredicted (B) frames, which depend on a past and a future I or P frame. The current advanced codecs (H.264) are much more flexible, which improves quality and decreases bitrate for a given resolution, or increases resolution for a given bitrate.

Compression ratio: 1:200
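To make the redundancy argument concrete, here is a deliberately simplified Python sketch of inter-frame coding. It is not H.264 or any real codec: the first frame is stored whole (playing the role of an I frame), and every later frame is stored only as the list of pixels that changed relative to its predecessor (a crude stand-in for a P frame). The moving-dot test data and all names are hypothetical.

```python
def encode(frames):
    """Store the first frame whole (like an I frame) and every later frame
    as a list of (pixel index, new value) changes against its predecessor
    (a much simplified stand-in for a P frame)."""
    stream = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        changes = [(i, c) for i, (p, c) in enumerate(zip(prev, cur)) if p != c]
        stream.append(("P", changes))
    return stream


def decode(stream):
    """Rebuild all frames by applying each change list to the previous frame."""
    frames = []
    for kind, payload in stream:
        if kind == "I":
            frame = list(payload)
        else:
            frame = list(frames[-1])
            for i, value in payload:
                frame[i] = value
        frames.append(frame)
    return frames


def stored_numbers(stream):
    """Count the numbers kept in the stream (two per recorded change)."""
    return sum(len(p) if kind == "I" else 2 * len(p) for kind, p in stream)


# Hypothetical test data: a bright dot moving across a dark 1 x 100 "image".
frames = [[255 if x == t else 0 for x in range(100)] for t in range(30)]
stream = encode(frames)

if __name__ == "__main__":
    print("raw numbers    :", sum(len(f) for f in frames))
    print("stored numbers :", stored_numbers(stream))
```

Because only two pixels change between consecutive frames, the stream is far smaller than the raw frames, and decoding still reproduces them exactly. Real codecs add motion prediction and lossy coding on top of this idea.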

Practical approach
- Use all ARSC resources (supercomputers, Linux boxes with 3-D graphics subsystems, fine Mac OS X software, etc.)
- Make frames in sufficient quantities (remember 30 fps) with any capable viz package; use large fonts and thick lines
- To make the sequence UNIX-friendly for scripting, number it without leading zeros if possible (f_1001.png rather than f_0001.png)
- Linux: the ImageMagick utilities animate + convert can crop and change formats & quality; they give fast results but are quite limited in scope (high-end codecs are commercial products)
- Mac OS (ARSC-supported)
  - QuickTime 7 (compression, timing, outputs for various media)
  - QuickTime Player (screen recording)
  - Final Cut Pro (industry standard)
  - Shake (advanced compositing)
- Windows: DivX Pro (was AVI, uses the same codec, H.264); basically the same functionality as in QuickTime 7. A free DivX player is available for Mac, while QuickTime 7 is available for Windows
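The numbering and frame-repetition advice can be combined in a small planning script. The Python sketch below only builds a list of (output name, source name) pairs and touches no files; the function names, the f_ prefix handling and the start value 1001 are illustrative assumptions. One likely reason for avoiding leading zeros is that shell arithmetic reads a number such as 0010 as octal.

```python
def frame_name(number, prefix="f_", ext=".png"):
    """File name for a frame number, e.g. 1001 -> 'f_1001.png'."""
    return f"{prefix}{number}{ext}"


def repeated_sequence(n_frames, repeats, first=1001):
    """Map each output frame name to the source frame it should copy.

    Numbering starts at `first` (1001 here), so all names have the same
    width without leading zeros as long as the numbers stay below 10000.
    """
    plan = []
    out = first
    for i in range(n_frames):
        source = frame_name(first + i)
        for _ in range(repeats):
            plan.append((frame_name(out), source))
            out += 1
    return plan


if __name__ == "__main__":
    plan = repeated_sequence(n_frames=45, repeats=3)
    print(len(plan), "output frames,", len(plan) / 30, "s at 30 fps")
    for target, source in plan[:4]:
        print(target, "<-", source)
```

Since output and source names overlap, the copies should go to a separate directory. With this numbering a plain alphabetical listing of the files is already in playback order.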