Visualization Environment Issues for Terascale Data



Similar documents
Large Scale Data Visualization and Rendering: Scalable Rendering

Petascale Visualization: Approaches and Initial Results

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

LA-UR- Title: Author(s): Intended for: Approved for public release; distribution is unlimited.

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Who is Dale? Today s Topics. Vis Basics Big Data. Vis Basics The Four Paradigms. Ancient History IRIX-based Vis Systems

Computer Graphics Hardware An Overview

SUN. Linda Fellingham, Ph. D Manager, Visualization and Graphics Sun Microsystems

Lecture Notes, CEng 477

Introduction to GPGPU. Tiziano Diamanti

Quantum StorNext. Product Brief: Distributed LAN Client

Advanced Rendering for Engineering & Styling

Parallel Programming Survey

Stream Processing on GPUs Using Distributed Multimedia Middleware

NVIDIA GeForce GTX 580 GPU Datasheet

Introduction to Cloud Computing

Parallel Large-Scale Visualization

CHAPTER FIVE RESULT ANALYSIS

Writing Applications for the GPU Using the RapidMind Development Platform

Lustre Networking BY PETER J. BRAAM

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Facts about Visualization Pipelines, applicable to VisIt and ParaView

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

How To Teach Computer Graphics

L20: GPU Architecture and Models

Cluster Implementation and Management; Scheduling

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

Optimizing AAA Games for Mobile Platforms

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Parallel Analysis and Visualization on Cray Compute Node Linux

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Parallel Visualization for GIS Applications

Introduction to Computer Graphics

Touchstone -A Fresh Approach to Multimedia for the PC

Architectures and Platforms

NVIDIA VIDEO ENCODER 5.0

GPU Architecture. Michael Doggett ATI

Performance Optimization and Debug Tools for mobile games with PlayCanvas

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

Bringing Big Data Modelling into the Hands of Domain Experts

Getting Started with RemoteFX in Windows Embedded Compact 7

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Deploying and managing a Visualization Onera

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Big Fast Data Hadoop acceleration with Flash. June 2013

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

Data Centric Systems (DCS)

A Hybrid Visualization System for Molecular Models

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

Low power GPUs a view from the industry. Edvard Sørgård

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Interactive Level-Set Deformation On the GPU

Developer Tools. Tim Purcell NVIDIA

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Large-Data Software Defined Visualization on CPUs

NVIDIA IndeX. Whitepaper. Document version June 2013

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB _v01

Multiresolution 3D Rendering on Mobile Devices

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Violin: A Framework for Extensible Block-level Storage

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

The Collaboratorium & Remote Visualization at SARA. Tijs de Kler SARA Visualization Group (tijs.dekler@sara.nl)

Interactive Rendering In The Post-GPU Era. Matt Pharr Graphics Hardware 2006

GPU Renderfarm with Integrated Asset Management & Production System (AMPS)

Using Linux Clusters as VoD Servers

Packet-based Network Traffic Monitoring and Analysis with GPUs

IRS: Implicit Radiation Solver Version 1.0 Benchmark Runs

GTC Presentation March 19, Copyright 2012 Penguin Computing, Inc. All rights reserved

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

GPGPU Computing. Yong Cao

1. INTRODUCTION Graphics 2

High Performance Computing OpenStack Options. September 22, 2015

OctaVis: A Simple and Efficient Multi-View Rendering System

Performance Monitoring of Parallel Scientific Applications

Remote Visualization and Collaborative Design for CAE Applications

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

HIGH PERFORMANCE VIDEO ENCODING WITH NVIDIA GPUS

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Building CHAOS: an Operating System for Livermore Linux Clusters

Operating Systems 4 th Class

Recommended hardware system configurations for ANSYS users

DATA VISUALIZATION OF THE GRAPHICS PIPELINE: TRACKING STATE WITH THE STATEVIEWER

Radeon HD 2900 and Geometry Generation. Michael Doggett

Kriterien für ein PetaFlop System

Remote Graphical Visualization of Large Interactive Spatial Data

Application. EDIUS and Intel s Sandy Bridge Technology

IBM Deep Computing Visualization Offering

Cloud Gaming & Application Delivery with NVIDIA GRID Technologies. Franck DIARD, Ph.D. GRID Architect, NVIDIA

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Scalability and Classifications

HP SkyRoom Frequently Asked Questions

Faculty of Computer Science Computer Graphics Group. Final Diploma Examination

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

HP Workstations graphics card options

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

Transcription:

Visualization Environment Issues for Terascale Data Randall Frank Lawrence Livermore National Laboratory UCRL-PRES PRES-154718 This work was performed under the auspices of the U.S. Department t of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W-7405W 7405-Eng-48.

The Livermore Model: Visualization Archive Simulation Data Data Manipulation Engine Network Switch Visualization Engine Video Switch/ Delivery PowerWall Raw data on platform disks/archive systems Offices Data manipulation engine (direct access to raw data) Networking (data/primitives/images) Visualization/rendering engine Video and remotely rendered image delivery over distance Displays (office/powerwalls) Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 2

Why Distributed Viz Environment? The realities of extreme dataset sizes Stored with the compute platform Cannot afford to copy the data Visualization co-resident with compute platform Track compute platform trends Distributed infrastructure Commodity hardware trends Migration of graphics leadership to the Migration of graphics leadership to the In clusters, desktops or displays Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 3

What Makes Visualization Unique? Unique I/O requirements Access patterns/performance Generation of graphical primitives Graphics computation: primitive extraction/computation Dataset decomposition (e.g. slabs vs chunks) Rendering of primitives Aggregation of multiple rendering engines Video displays Routing of digital or video tiles to displays (over distance) Interactivity (not a render-farm!) Real-time imagery Interaction devices, human in the loop (latency, prediction issues) Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 4

Viz Engine: Compute Nodes Computational elements upon which the visualization application runs Largely the same as the compute platform Distributed nodes P4, Itanium, Opteron SGI SMP clusters Interconnects Quadrics, Myrinet, GigE I/O systems Lustre, NFS, PVFS, Key for effective, production, visualization Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 5 Network Switch HP Visualization Cluster Stanford Chromium Cluster

Viz Engine: Graphics Options Add rendering capability to the nodes Software OpenGL & Custom Mesa, qsplat, *-Ray Hardware SGI IR pipes COTS graphics AGP or I Express slot nvidia, ATI, Card device drivers OS, Graphics API, etc Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 6 Network Switch ATI FireGL X2 nvidia Quadro FX

Viz Engine: Displays Connect rendering capability to displays Desktop(s) Tiled displays PowerWalls, IBM T221 Video switching/routing Remote image delivery Analog, Digital Compositing hardware HP Sepia-2 & sv6 IBM SGE Stanford Lightning2 SGI DVI Network Switch Video Switch GigE Switch IBM T221 (3840x2400) Monitor Monitor Monitor Monitor Image delivery Monitor Monitor Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 7

The Graphics Revolution A disruptive technology shift Raw performance kings $200 GeForce4 vs $80k IR pipe? Interesting feature kings Creative bandwidth management Fully programmable architectures Quake III Arena However Certainly a focus on gaming and DCC markets Relatively crude video support: stereo, sync, etc Can they make it past the conceptual IR ceiling? Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 8

The Tiled Display PowerWall Many forms Stereo, Cubes, Front/Back projection, Non-planar, Edge-blended, CAVEs Multiple uses Collaborative environments, Theaters, Enhanced desktop/interactive use Increased pixel counts Matching higher fidelity data (2D vs 3D) Driving a PowerWall Extreme I/O requirements 2x2 needed 300MB/s Synchronization (at a distance?) Data flow and data staging Requires output scalable image generation via aggregation 6400x3072 (5x3) Flexible Screen 5120x2048 (4x2) Glass Screen 3840x2048 (3x2) Cubes Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 9

Image Aggregation Solutions Image compositing : Take the (digital) outputs of multiple graphics cards and combine them to form a single image. Multiple goals/dimensions of scaling via aggregation Output scaling: Large pixel counts (PowerWalls) Data scaling: High polygon/fill rates/data decomposition Interaction/Virtual reality: High frame rates Image quality: Anti-aliasing, data extremes Hardware acceleration is natural Efficient access to rendered imagery Provide for image fragment transport Flexible, pipelined merging operations Solutions balance speed, scale Image input/transport solutions Application transparency Parallel rendering models HP sv6 Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 10

Examples: Compositing Systems Image composition hardware Lightning-2/MetaBuffer (Stanford/UT) sv6/sv7 (HP)/SGI compositor DVI based tiling/compositing Sepia (HP/Compaq) Custom compositing (FPGA + NIC) Dedicated network (ServerNet II & IB) Scalable Graphics Engine (IBM) Remote framebuffer, gige/udp distance solution Image composition software PICA: Parallel Image Compositing API ICE-T: Integrated image/data manipulation (SNL) HP Sepia-2 Lightning-2 IBM SGE Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 11

Idealized Visualization Environment Compute Fabric Storage Fabric Composition Fabric User Connect Existing end user desktop nodes Graphics Existing desktop displays Rendering Engines Decompress Graphics Graphics Graphics Graphics Graphics Decompress 4 x 4 Tiled Power Wall Virtual Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 12

A Distributed Parallel API Stack Goal: Provide integrated, distributed parallel services for viz apps. Encourage new apps, increase portability & device transparency. Application Toolkits (e.g., OpenRM, VTK, etc) Chromium DMX Merlot PICA X11 OpenGL Applications: VisIt, ParaView, EnSight, Blockbuster, etc Toolkits Parallel visualization algorithms, scene graphs DMX Distributed windowing and input devices (X11) Chromium Parallel OpenGL rendering model PICA Parallel image compositing API Merlot Digital image delivery infrastructure Core vendor services - X11/OpenGL/compositors/NICs Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 13

OpenGL Drivers Application Toolkits (e.g., OpenRM, VTK, etc) Chromium DMX Merlot PICA X11 OpenGL Some early DirectX/OpenGL concerns Slow ARB, Lack of innovation, Questionable support Mostly addressed Welcome to the world of extensions and competition Linux/COTS OpenGL drivers are looking good Solid support from nvidia and ATI Drivers support recent ARB extensions Vertex & fragment programs, ARB_vertex_buffer_object, etc Complete buffer support ( float pbuffers, multi-head, stereo, etc) Excellent performance, rivaling Windows, but features can lag Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 14

Distributed GL: Chromium Application Toolkits (e.g., OpenRM, VTK, etc) Chromium DMX Merlot PICA Distributed OpenGL rendering pipeline. Provides a parallel OpenGL interface for an N to M rendering infrastructure App based on a graphics stream processing model. The Stream Processing Unit (SPU) SPU Filter view OpenGL SPU interface is the OpenGL API Encode/Xmit Render, modify, absorb Allows direct OpenGL rendering SPU SPU Supports SPU inheritance Application translucent Development: chromium.sourceforge.net RedHat/Tungsten Graphics ASCI PathForward Stanford, University of Virginia Stereo, Fragment/Vertex pgms, CRUT, dynamic caching OpenGL Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 15 X11 OpenGL OpenGL

Distributed GL: Chromium Application Toolkits (e.g., OpenRM, VTK, etc) Chromium DMX Merlot PICA Distributed OpenGL rendering pipeline. Provides a parallel OpenGL interface for an N to M rendering infrastructure App based on a graphics stream processing model. OpenGL API The Stream Processing Unit (SPU) State tracking Filter view OpenGL Tiling/Sorting SPU interface is the OpenGL API Encode/Xmit Render, modify, absorb Allows direct OpenGL rendering Decode Decode Supports SPU inheritance Parallel API Parallel API Application translucent Development: chromium.sourceforge.net RedHat/Tungsten Graphics ASCI PathForward Stanford, University of Virginia Stereo, Fragment/Vertex pgms, CRUT, dynamic caching OpenGL Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 16 X11 OpenGL OpenGL

Parallel X11 Server: DMX Application Toolkits (e.g., OpenRM, VTK, etc) Chromium DMX Merlot PICA Distributed multi-headed X server: DMX Aggregates X11 servers Server of servers for X11 Single X server interface Accelerated graphics 2D via accelerated X server Common extensions as well Back-side APIs for direct, local X11 server access OpenGL via ProxyGL/GLX (from SGI) or via Chromium SPU Development: dmx.sourceforge.net RedHat ASCI PathForward contract Integrated with XFree86 OpenGL Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 17 X11

Remote Delivery: Merlot Application Toolkits (e.g., OpenRM, VTK, etc) Chromium DMX Merlot PICA X11 OpenGL Merlot is a framework for digital image delivery Transport layer abstraction, Codec interfaces, Device transparency MIDAS: Merlot Image Delivery Application Service Indirect OpenGL rendering services for X11 environment Indirect window management X11 Server w/opengl Merlot MIDAS X11 Image stream transport Development: MIDAS App OpenGL X11 Server X11 To be released as OpenSource this month (SourceForge?) More apps and experimental hardware support Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 18

Applications Full-featured viz VisIt: www.llnl.gov/visit VTK, client-server model ParaView: www.paraview.org Parallel VTK viz tool Specialty applications Blockbuster: blockbuster.sourceforge.net Scalable animations, DMX aware TeraScale Browser/Xmovie/MIDAS www.llnl.gov/icc/sdd/img/viz.shtml X11 Application Toolkits (e.g., OpenRM, VTK, etc) Chromium DMX Merlot PICA OpenGL Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 19

Deployed Viz Environment: ASCI White Data manipulation 512p on White (32 nodes) 96p SGI Onyx3 Networking Multi-gigE, Jumbo frames Rendering engines 40p and 64p SGI Onyx2, 10 and 16 IR2 pipes Video delivery Lightwave switch/modems ~20 Desktops (1280x1024, 1920x1200, 2560x2048) PowerWalls (5x3: 19.6Mpixel, 4x2: 10Mpixel) Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 20

Deployed Viz Environment: MCR & PVC MCR: 1,116 P4 Compute Nodes PVC: 58 P4 Render nodes 6 Display nodes 1,152 Port QsNet Elan3 QsNet Elan3, 100BaseT Control 2 MetaData(fail-over) Servers 32 Gateway nodes @ 140 MB/s MDS delivered Lustre I/O over 2x1GbE 2 Login, 2 Service MDS GW GbEnetFederated Switch GW GW GW GW GW GW GW Aggregated s, single Lustre file system MDS MDS GW GW GW GW Digital Desktop Displays 128 Port QsNet Elan3 Video Switch 3x2 PowerWall 7.8Mpixel Advanced Display Environments Workshop, Aug 6-8, 2003 Analog Desktop Displays rjf, 21

Examples: Systems in use Today MCR+PVC science runs Custom renderers Hardware rendering Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 22

Examples: Systems in use Today SGE: digitally driven displays SNL 62Mpixel wall surfaces DMX: desktop integration Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 23

Current Issues Data access Cluster-wide parallel I/O systems can be fragile Often not optimized for access patterns (e.g. random reads) The impedance mismatch problem Smaller number of nodes generally for visualization Improper data decomposition Scheduling complexity Co-scheduling of multiple clusters Combinations of parallel clients, servers, services and displays Visualization/rendering algorithms issues Extreme dataset sizes and ranges Complexity of visuals from large datasets Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 24

The Road Ahead The arena is very dynamic Longevity challenge: software and hardware New graphics bus (I Express) Changes in video technologies DVI to 10gigE (TeraBurst) Next generation graphics cards New rendering abstractions (are polygons dead?) How to address current card bottlenecks (e.g. setup)? Big data and failing algorithms Scaling and representational issues The extreme FLOP approach Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 25

The NV30 and the Sony Playstation 3 Are graphics trends a glimpse of the future? The nvidia NV30 Architecture 256MB+ RAM, 96 32bit IEEE FP units @ 500Mhz Assembly language for custom operations Streaming internal infrastructure The PlayStation3 (patent application) Core component is a cell 1 Power CPU + 8 APUs ( vectorial processors) 4GHz, 128K RAM, 256GFLOP/cell Building block for multimedia framework Multiple cells Four cell architecture (1TFLOP) Central 64MB memory Switched 1024 bit bus, optical links? nvidia NV30 Sony PS2 Emotion Engine Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 26

The Streaming Programming model Streaming exposes concurrency and latency at the system level as part of the programming target Data moves through the system: exposed concurrency Avoid global communication: prefer implicit models (e.g. Cr) Memory model: exposed latency/bandwidth Scalable, must support very small footprints Distributed, implicit flow between each operation A working model: Computational elements + caching and bandwidth constraints External oracle for system characterization and realization Goals: Optimally trade off computation for critical bandwidth Leverage traditionally hidden programmable elements Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 27

Streaming Impacts on Software Design How does one target this model? Integrated data structure and algorithm design Run-time targets Algorithm remapping Intent expressive and architecture aware Abstract run-time compiled languages Memory design is key Small memory models, out-of-core design Cache oblivious data flow Implementations New languages: Cg, Brook, DSP-C, Stream-C Hidden beneath layers of API: OpenGL, Chromium, Lustre It sounds like a lot of work (it can be), is it worth it? Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 28

Multiresolution Array Access: VISUS Arbitrary, multi-resolution array data access Integrated algorithm/data structure design Data reordering on a spacefilling curve In-place transformation Reordering is a simple bit manipulation Cache oblivious Arbitrary blocking is supported Coupled asynchronous query system 1000.000 100.000 Parallel rendering and queries RAM used as a cache Example Slicing 8B cells 15MB RAM 250Mhz SGI Onyx (1024x1024) Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 29 Seconds 10.000 1.000 0.100 0.010 0.001 1 10 100 Resolution ARR-Rot BIT-Rot BLK-Rot ARR-Trans BIT-Trans BLK-Trans

Next Generation Streaming Cluster Computation and memory caches everywhere NICs, Drive controllers, Switches, TFLOP GPUs Add I Express and the GPU effectively becomes a DSP chip Utilizing them may require a disruptive programming shift Modified visualization algorithms Cache oblivious: local, at the expense of computation Non-graphical algorithms & a move away from polygon primitives Need to address data scaling and representation issues New languages with higher levels of abstractions Run-time realization, dynamic compilation and scheduling Glue languages: shader languages, graphics APIs themselves Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 30

Auspices: UCRL-PRES-154718 This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness,, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, ent, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes. This work was performed under the auspices of the U.S. Department t of Energy by University of California, Lawrence Livermore National Laboratory under Contract W-7405W 7405-Eng-48. Advanced Display Environments Workshop, Aug 6-8, 2003 rjf, 31