Visualisation of Large Datasets with Houdini
Ben Simons
Data Arena Lead Developer
University of Technology, Sydney
ben.simons@uts.edu.au
bsimons@acm.org
New UTS Broadway Building
UTS Data Arena ~ April 2014
Today's Outline - Big Data
1. Some strategies used in Film Visual FX
2. Visualisation Techniques in Houdini
3. VFX Data Formats & Disk Systems
Happy Feet 2
2 Petabytes (2,000,000 GB)
3D Stereo HD images
Render: 18,000 CPU cores
Parallel access to data
HDF5 data on BlueArc & Isilon NAS Disk Systems
Linux software: Maya, Houdini, Naiad, Nuke, 3Delight
Entirely made at Carriageworks in Sydney at Dr D Studios
Resident Evil 3 Extinction
The Desert Undead: 18-layer images (RenderMan AOVs)
Each single image frame was split into 96 tiles
Rendered on 96 machines, then each frame was tile-joined
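The tile split and join described above can be sketched in a few lines of Python. This is only an illustration, not the production pipeline: the 12 x 8 grid layout, the small frame size, and the `render_tile` stand-in are all assumptions (the slide only states that there were 96 tiles on 96 machines).

```python
def tile_bounds(width, height, cols, rows):
    """Yield (x0, y0, x1, y1) pixel bounds for a cols x rows grid of tiles."""
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            yield (x0, y0, x1, y1)


def render_tile(bounds):
    """Hypothetical stand-in for a renderer: each pixel stores its own (x, y)."""
    x0, y0, x1, y1 = bounds
    return [[(x, y) for x in range(x0, x1)] for y in range(y0, y1)]


def join_tiles(width, height, tiles):
    """Paste independently rendered tiles back into one full frame."""
    frame = [[None] * width for _ in range(height)]
    for (x0, y0, x1, y1), pixels in tiles:
        for dy, row in enumerate(pixels):
            frame[y0 + dy][x0:x1] = row
    return frame


WIDTH, HEIGHT = 192, 80  # assumed small frame, for illustration only
bounds = list(tile_bounds(WIDTH, HEIGHT, cols=12, rows=8))  # 12 * 8 = 96 tiles
tiles = [(b, render_tile(b)) for b in bounds]  # each entry could run on its own machine
frame = join_tiles(WIDTH, HEIGHT, tiles)
```

Because no tile depends on any other, the list comprehension that builds `tiles` is the part that would be farmed out to separate machines.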
Houdini www.sidefx.com
Houdini across 2 screens
Houdini Object Nodes
Houdini Procedural Network
Houdini Parameters
Houdini CHOPs
A Channel is a column of data
Plain text files are OK: separate columns with tabs
Interactive Channel graph (zoom in)
Visual programming
Filtering, sampling, shading, instancing, and rendering
Hands-on tomorrow will be CHOPs & VOPs
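To make the "channel is a column" idea concrete, here is a minimal Python sketch that reads tab-separated text into named channels and resamples one of them. It is not Houdini code; the sample data, the assumption that the first row holds channel names, and the helper names `load_channels` and `resample` are all hypothetical.

```python
import csv
import io


def load_channels(text):
    """Read tab-separated text into {channel name: list of float samples}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    names = next(reader)  # assumption: the first row names the channels
    channels = {name: [] for name in names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for name, value in zip(names, row):
            channels[name].append(float(value))
    return channels


def resample(channel, step):
    """Keep every step-th sample of a channel (simple decimation)."""
    return channel[::step]


SAMPLE = (
    "time\ttemp\tpressure\n"
    "0\t20.5\t101.3\n"
    "1\t21.0\t101.1\n"
    "2\t21.4\t100.9\n"
    "3\t21.9\t100.8\n"
)
channels = load_channels(SAMPLE)
print(resample(channels["temp"], 2))
```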
Spitzer Glimpse Dataset http://data.spitzer.caltech.edu/popular/glimpse/20070416_enhanced_v2/source_lists/south/
Spitzer Space Telescope GLIMPSE Dataset
South: ~300 files, 78 different Channels, 145K rows
gzipped .tbl data loaded into Houdini
Houdini CHOPs used to filter & calculate 'colours'
Show difference of infra-red magnitude bands
Point colours and scales calculated by VOPs SIMD Shaders
Houdini Movie Rendered (Mantra PBR)
36M points, filtered to <12M
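An astronomical 'colour' is the difference between the magnitudes measured in two bands. The Python sketch below shows that calculation and a filter step under stated assumptions: the column names `mag3_6` and `mag8_0`, the `MISSING` sentinel value, the colour ramp, and the three sample rows are all illustrative, not taken from the dataset or from the Houdini network used in the talk.

```python
MISSING = 99.999  # assumed sentinel for "no measurement in this band"


def colour_index(mag_short, mag_long):
    """Colour = magnitude in the shorter band minus magnitude in the longer band."""
    return mag_short - mag_long


def to_rgb(index, lo=0.0, hi=2.0):
    """Map a colour index onto a simple blue-to-red ramp, clamped to [lo, hi]."""
    t = min(max((index - lo) / (hi - lo), 0.0), 1.0)
    return (t, 0.2, 1.0 - t)


# Hypothetical rows; real source lists have many more columns.
rows = [
    {"mag3_6": 12.0, "mag8_0": 11.5},
    {"mag3_6": 10.0, "mag8_0": 8.0},
    {"mag3_6": MISSING, "mag8_0": 9.0},
]

points = []
for row in rows:
    if MISSING in (row["mag3_6"], row["mag8_0"]):
        continue  # filter: drop sources missing either band
    idx = colour_index(row["mag3_6"], row["mag8_0"])
    points.append((idx, to_rgb(idx)))
```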
Shading & VOPs
A shader is a mini-program which makes data
It can be better to generate data than load it
Shaders allow an additional level of management
Geom shaders on HF2 generated 1 billion snow particles per image frame (impossible to load)
Houdini VOPs are SIMD
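The "generate rather than load" idea can be illustrated with a Python generator that produces particles on demand from a seed, so nothing is stored on disk or held in memory all at once. This is a sketch only, not a shader: the function names, the bounding box, and the uniform distribution are assumptions.

```python
import random


def snow_particles(count, seed, bounds=(100.0, 50.0, 100.0)):
    """Lazily yield `count` particle positions; the same seed repeats the same stream."""
    rng = random.Random(seed)
    for _ in range(count):
        yield (
            rng.uniform(0.0, bounds[0]),
            rng.uniform(0.0, bounds[1]),
            rng.uniform(0.0, bounds[2]),
        )


def highest_particle(count, seed):
    """Consume the stream one particle at a time; memory use stays constant."""
    return max(snow_particles(count, seed), key=lambda p: p[1])


print(highest_particle(100_000, seed=7))
```

Only the seed and the count need to be stored per frame; the particle data itself is regenerated whenever it is needed.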
Houdini VOP Network
Instancing
Saves Memory & I/O by re-using geometry
Copies generated at render time
Each Instance can be varied based on point attributes
Referencing one instance object provides a massive data reduction
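A rough Python sketch of why instancing reduces data: one shared piece of geometry is stored once, and each instance is just a point with a few attributes. The `Instance` class, the tetrahedron, and the float-count comparison are illustrative assumptions, not how Houdini or Mantra store instances internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    position: tuple  # point attribute: where to place the copy
    scale: float     # point attribute: per-instance variation


def expand(instance, shared_vertices):
    """Build the instance's vertices on demand (as a renderer would at render time)."""
    px, py, pz = instance.position
    s = instance.scale
    return [(px + s * x, py + s * y, pz + s * z) for x, y, z in shared_vertices]


# One shared piece of geometry (a tetrahedron), stored once.
tetra = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

# 1000 instances, each only a position and a scale.
instances = [Instance((float(i), 0.0, 0.0), 1.0 + 0.5 * i) for i in range(1000)]

stored_floats_instanced = len(tetra) * 3 + len(instances) * 4
stored_floats_copied = len(instances) * len(tetra) * 3
print(stored_floats_instanced, stored_floats_copied)
```

With only four vertices the saving is modest; the ratio grows with the complexity of the shared geometry.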
Adaptive Meshes, LOD, Caching & Filtering
Data reduction techniques:
  Level of Detail (distance from camera)
  Adaptive Meshes
  Cache common files locally
  Filter textures (images) - Mipmapping
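Mipmapping and distance-based level of detail can be sketched together in Python: build a chain of progressively halved images, then pick a level from the camera distance. The sketch assumes a square, power-of-two, single-channel image and a simple "one level per doubling of distance" rule; both are simplifications chosen for illustration.

```python
def downsample(image):
    """Halve an image by averaging each 2x2 block (box filter)."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def build_mipmaps(image):
    """Return [full-res, half-res, ...] down to 1x1. Assumes square power-of-two input."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


def lod_for_distance(distance, base_distance, max_level):
    """Assumed rule: drop one level each time the distance doubles past base_distance."""
    level = 0
    while level < max_level and distance >= 2 * base_distance:
        distance /= 2.0
        level += 1
    return level


image = [[float(x + y) for x in range(8)] for y in range(8)]
levels = build_mipmaps(image)
level = lod_for_distance(45.0, base_distance=10.0, max_level=len(levels) - 1)
texture = levels[level]  # far objects read a much smaller image
```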
Other tricks
Baked Lighting & Shadows
  Pre-calculate lighting & shadows
  Bake new textures & reapply onto geom
Sydney Harbour Multi-Beam Sonar Survey, 30cm data
  Interactive 3D Flythrough
Know your Limits: Memory & I/O
I/O will Bottleneck - Partition the problem & then scale it up
  Split job across many independent machines (eg. render)
  Segment data access for each machine (eg. HDF5)
Alternate memory hardware
  Vector (array) processor - SIMD, as in Cray, now Intel SSE/MMX and Nvidia GPU
  IBM Cell Processor has a Vector Processor
  Content-Addressable Memory: associative arrays are used by Network Routers
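Partitioning data access per machine often comes down to giving each worker its own contiguous, non-overlapping range of rows. A minimal Python sketch follows; the `partition` helper and the example sizes are hypothetical.

```python
def partition(total, workers):
    """Split range(total) into `workers` contiguous (start, stop) ranges of near-equal size."""
    base, extra = divmod(total, workers)
    ranges = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # spread the remainder over the first workers
        ranges.append((start, start + size))
        start += size
    return ranges


# Example: 10 rows over 3 machines; each machine reads only its own slice.
for machine, (start, stop) in enumerate(partition(10, 3)):
    print(f"machine {machine}: rows [{start}, {stop})")
```

Each machine can then read only its own slice of a dataset, so no two machines compete for the same data.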
Types of System Memory
Virtual Memory
  Swapping is good, thrashing is bad
SMP vs MPI
SMP (Symmetric Multiprocessing)
  Multiple CPUs with common/shared memory
  Multi-threaded apps
  eg. Intel Xeon, Core 2 Duo are SMP
  Cache coherency, snooping bus (on distributed shared memory: ccNUMA)
MPI (Message Passing)
  PVM Clusters, Beowulf, etc (Memory not shared)
Data Formats
HDF5 Hierarchical Data Format www.hdfgroup.org
  Browsable container of data (HDFView)
  Has groups & datasets, like dirs & files
  Data stored in B-Trees
  Can also store Binary Data
HDF5 for Python www.h5py.org
  Operate on HDF5 data via Python dictionaries & NumPy arrays - www.numpy.org
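A short h5py sketch of the group/dataset model, assuming `h5py` and `numpy` are installed. The file name, the group name `frame_0001`, the dataset name `positions`, and the `units` attribute are all made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Write to a temporary directory so the example leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    grp = f.create_group("frame_0001")  # a group behaves like a directory
    data = np.arange(30, dtype="f4").reshape(10, 3)
    grp.create_dataset("positions", data=data)  # a dataset behaves like a file
    grp.attrs["units"] = "metres"  # small metadata attached to the group

with h5py.File(path, "r") as f:
    names = list(f["frame_0001"].keys())  # dictionary-style access
    first_rows = f["frame_0001/positions"][:2]  # reads only the requested slice
    units = f["frame_0001"].attrs["units"]

print(names, first_rows.shape, units)
```

Slicing the dataset reads just the requested rows from disk, which is what makes per-machine segmented access practical.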
Disk Systems
Network Attached Storage (NAS)
  BlueArc (now Hitachi): implemented via FPGA
  Isilon (now EMC): clustered filesystem, 100GB/s
Lustre Filesystem
  Multiple SSD nodes & maintains global file coherency
Experimental
  Parallel distributed filesystem can have multiple copies of a file, one master
  Venti (Bell Labs Plan 9 & Inferno): WORM Archive. Shares Blocks by secure SHA-1 Hash
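The Venti idea of sharing blocks by their SHA-1 hash can be sketched as a tiny in-memory content-addressed store in Python. This is not Venti's protocol or on-disk format; the `BlockStore` class, the `store_file` helper, and the 4-byte block size are assumptions chosen to keep the example small.

```python
import hashlib


class BlockStore:
    """Write-once store: a block's address is the SHA-1 hash of its contents."""

    def __init__(self):
        self._blocks = {}

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        self._blocks.setdefault(key, data)  # identical blocks are stored only once
        return key

    def get(self, key):
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)


def store_file(store, data, block_size=4):
    """Split data into fixed-size blocks and return the list of block addresses."""
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]


store = BlockStore()
file_a = b"AAAABBBBCCCC"
file_b = b"AAAADDDDCCCC"  # shares its first and last block with file_a
keys_a = store_file(store, file_a)
keys_b = store_file(store, file_b)
print(len(store))  # number of unique blocks actually stored
```

Two files of three blocks each end up occupying four stored blocks, because the blocks they have in common hash to the same address.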
Data Formats 2
OpenVDB www.openvdb.org
  Hierarchical structure for volumetric data (clouds)
  Good for sparse, time-varying volumetric data
  Fast access (constant-time) to voxels
  Large set of operators (Level Set tools, filters, transforms & morphological operators)
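To show why a hierarchical structure suits sparse volumes, here is a deliberately simplified two-level grid in Python: a top-level table of leaf blocks, each holding only the voxels that were actually written. It is not the OpenVDB data structure or API; `SparseGrid`, the leaf size of 8, and the background value are illustrative assumptions.

```python
LEAF = 8  # assumed leaf block size: 8 x 8 x 8 voxels


class SparseGrid:
    """Two-level sparse voxel grid: empty space costs nothing to store."""

    def __init__(self, background=0.0):
        self.background = background
        self.leaves = {}  # leaf coordinate -> {local voxel coordinate: value}

    def set(self, x, y, z, value):
        key = (x // LEAF, y // LEAF, z // LEAF)
        leaf = self.leaves.setdefault(key, {})
        leaf[(x % LEAF, y % LEAF, z % LEAF)] = value

    def get(self, x, y, z):
        """Two hash lookups regardless of how large the grid's extent is."""
        leaf = self.leaves.get((x // LEAF, y // LEAF, z // LEAF))
        if leaf is None:
            return self.background
        return leaf.get((x % LEAF, y % LEAF, z % LEAF), self.background)


grid = SparseGrid()
grid.set(1000, 2000, 3000, 0.7)  # one voxel far from the origin
grid.set(1001, 2000, 3000, 0.9)  # a neighbour in the same leaf block
print(grid.get(1000, 2000, 3000), grid.get(0, 0, 0), len(grid.leaves))
```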
Data Formats 3
Disney Ptex http://ptex.us/
  Eliminates uv texture assignment: no (u,v)'s required!
  No seams visible
  Works on sub-d/poly faces
  Stores face adjacency data & filters
  Efficiently stores 106 mipmapped texture files
  Multi-channels, compressed separately
  Used in Disney's Bolt
D3 Data-Driven Documents
D3: An amazing data visualisation web framework (JavaScript) http://d3js.org
See: https://github.com/mbostock/d3/wiki/gallery
Offers Parallel Coordinates
Demo? Nutrient Contents - An interactive visualization of the USDA Nutrient Database. http://exposedata.com/parallel/
Parallel Co-ordinates
Axes: protein, calcium, sodium, fibre, vitamin c, potassium, carbohydrate, sugar, fat, water, calories, saturated,...
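A parallel coordinates plot draws one vertical axis per column and one polyline per row, with each value normalised to its own axis. The Python sketch below shows only that mapping, not D3 and not the demo's code; the `parallel_coordinates` helper and the three sample rows with their values are invented for illustration.

```python
def parallel_coordinates(rows, axes):
    """Return one polyline per row as [(axis index, height in 0..1), ...]."""
    lo = {a: min(r[a] for r in rows) for a in axes}
    hi = {a: max(r[a] for r in rows) for a in axes}
    lines = []
    for r in rows:
        line = []
        for i, a in enumerate(axes):
            span = hi[a] - lo[a]
            # A constant column has no range, so place it mid-axis.
            t = (r[a] - lo[a]) / span if span else 0.5
            line.append((i, t))
        lines.append(line)
    return lines


# Hypothetical values, not taken from the USDA database.
foods = [
    {"protein": 0.0, "calcium": 10.0, "sodium": 5.0},
    {"protein": 20.0, "calcium": 30.0, "sodium": 5.0},
    {"protein": 10.0, "calcium": 20.0, "sodium": 5.0},
]
lines = parallel_coordinates(foods, ["protein", "calcium", "sodium"])
for line in lines:
    print(line)
```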