Petascale Visualization: Approaches and Initial Results

Similar documents
Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

The Design and Implement of Ultra-scale Data Parallel. In-situ Visualization System

Large-Data Software Defined Visualization on CPUs

Parallel Analysis and Visualization on Cray Compute Node Linux

Parallel Large-Scale Visualization

Introduction to Cloud Computing

Remote Graphical Visualization of Large Interactive Spatial Data

Introduction to GPGPU. Tiziano Diamanti

Data Centric Systems (DCS)

Trends in High-Performance Computing for Power Grid Applications

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Parallel Visualization for GIS Applications

1. INTRODUCTION Graphics 2

Retargeting PLAPACK to Clusters with Hardware Accelerators

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Parallel Programming Survey

Facts about Visualization Pipelines, applicable to VisIt and ParaView

Stream Processing on GPUs Using Distributed Multimedia Middleware

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Advanced Rendering for Engineering & Styling

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Fast Parallel Algorithms for Computational Bio-Medicine

HPC Wales Skills Academy Course Catalogue 2015

Sun Constellation System: The Open Petascale Computing Architecture

Introduction to Computer Graphics

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

Computer Graphics Hardware An Overview

IBM Deep Computing Visualization Offering

Low power GPUs a view from the industry. Edvard Sørgård

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Alan Kaminsky

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

Next Generation GPU Architecture Code-named Fermi

GPU for Scientific Computing. -Ali Saleh

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

Lecture 2 Parallel Programming Platforms

The Future Of Animation Is Games

NVIDIA IndeX. Whitepaper. Document version June 2013

OpenMP Programming on ScaleMP

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Alan Kaminsky


Clusters: Mainstream Technology for CAE

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

Multi-GPU Load Balancing for Simulation and Rendering

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

Optimizing AAA Games for Mobile Platforms

Computer Graphics. Computer graphics deals with all aspects of creating images with a computer

Lecture Notes, CEng 477

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Summit and Sierra Supercomputers:

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Remote Visualization and Collaborative Design for CAE Applications

How To Visualize At The Dlr

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

Turbomachinery CFD on many-core platforms experiences and strategies

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

Introduction to GPU Programming Languages

Visualization Infrastructure and Services at the MPCDF

Parallel Visualization of Petascale Simulation Results from GROMACS, NAMD and CP2K on IBM Blue Gene/P using VisIt Visualization Toolkit

Several tips on how to choose a suitable computer

Data Visualization in Parallel Environment Based on the OpenGL Standard

Parallel Firewalls on General-Purpose Graphics Processing Units

Adding Animation With Cinema 4D XL

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Cray: Enabling Real-Time Discovery in Big Data

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca

Data Centric Interactive Visualization of Very Large Data

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

GPU Parallel Computing Architecture and CUDA Programming Model

In-situ Visualization: State-of-the-art and Some Use Cases

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Performance of the JMA NWP models on the PC cluster TSUBAME.

Energy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez

Symmetric Multiprocessing

GeoImaging Accelerator Pansharp Test Results

ECLIPSE Performance Benchmarks and Profiling. January 2009

STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

Hardware design for ray tracing

High Performance Computing in CST STUDIO SUITE

Evaluation of CUDA Fortran for the CFD code Strukti

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

Visualization and Data Analysis

LS DYNA Performance Benchmarks and Profiling. January 2009

How to choose a suitable computer

On Demand Satellite Image Processing

Deploying and managing a Visualization Onera

OctaVis: A Simple and Efficient Multi-View Rendering System

A Hybrid Visualization System for Molecular Models

Transcription:

Petascale Visualization: Approaches and Initial Results James Ahrens Li-Ta Lo, Boonthanome Nouanesengsy, John Patchett, Allen McPherson Los Alamos National Laboratory LA-UR- 08-07337 Operated by Los Alamos National Security, LLC for DOE/NNSA

Questions about visualization in the petascale era What are our options for running our visualization software? Can we run our visualization software on the supercomputer? Do we need to a visualization cluster to support the supercomputer? Define supercomputer and visualization options Current approach and performance New approach Ray-tracing for rendering

Trends in petascale supercomputing Lots of compute cycles Multi-core revolution Increasing latency from processor to memory, disk and network Many memory-only simulation results Can compute significantly more data than can be saved to disk For example, on RR To disk: 1 Gbyte/sec Compute: 100 Gbytes on a triblade from Cells to Cell memory Very expensive

Supercomputing platforms Definition of supercomputing platform Type of node Co-processor architecture Example: Roadrunner Multi-core processor Example: 16-way CPU (4 x 4 quad Opteron)

Roadrunner architectural overview Connected Unit cluster 180 Triblade compute nodes w/ Cells 12 I/O nodes 6,480 dual-core Opterons 23.3 Tflop/s (DP) 12,960 Cell edp chips 1.3 Pflop/s (DP) c 18 clusters c 288-port IB 4x DDR 288-port IB 4x DDR 12 links per CU to each of 8 switches Eight 2 nd -stage 288-port IB 4X DDR switches Slide 5

Roadrunner is Cell-accelerated, not a cluster of Cells Cell-accelerated compute node Add Cells to each individual node Multi-socket multi-core Opteron cluster nodes (100 s of such cluster nodes) I/O gateway nodes Scalable Unit Cluster Interconnect Switch/Fabric Node-attached Cells is what makes Roadrunner different! Slide 6

IBM Cell processors powers the Playstation 12960 Cell chips in Roadrunner! In Playstation the Cell is used for physics processing e.g. Little Big Planet We plan to use the Cell for rendering Slide 7

Can we efficiently run our visualization/rendering software on the supercomputer? The data understanding process is composed of a number of activities: Analysis and statistics Visualization Rendering Map simulation data to a visual representation (i.e geometry) Map geometry to imagery on the screen Already runs on the supercomputer Analysis, statistics and visualization Issue is rendering Fast rendering for interactive exploration 5-10 fps minimum, 24-30 fps HDTV, 60 fps - stereo Typically provided by commodity graphics in a visualization cluster

Related Work Visualization hardware SGIs (late 1998) SGI shared memory machine Blue Mountain ran Linpack, one of the computer industry's standard speed tests for big computers, at a fast 1.6 trillion operations per second (teraops), giving it a claim to the coveted top spot on the TOP500 list, the supercomputer equivalent of the Indianapolis 500. Integrated Reality Engine graphics ($250K/each) Commodity clusters (2004) Leverage commodity technology to replace SGI infrastructure Game cards, PC-class nodes, Infiniband networks What is next? Slide 9

Analysis of tradeoffs Visualization/rendering on supercomputer or cluster Visualization/rendering on the supercomputer Disadvantages Cost to port rendering to the supercomputing platform Allocate portion of supercomputer to analysis and visualization Advantages Scalable to supercomputer size Access to all simulation results Visualization/rendering on cluster Disadvantages Cost of cluster and infrastructure to connect it Less access to data only data that is written to disk Advantages Independent resource devoted to visualization task Very fast especially on smaller datasets

Standard parallel rendering solution Sort-last parallel rendering of large data Sort-last parallel rendering algorithms have two stages: 1. Rendering stage The node renders its assigned geometry into a distance/depth buffer and image buffer 2. Networking / compositing stage These image buffers are composited together to create a complete result Given there are two stages the performance is limited by the slower stage Assuming pipelining of the stages

Performance study For real-world performance testing and to prepare for petascale visualization tasks Incorporate rendering approaches into vtk/paraview Vtk is open-source visualization library Paraview (PV) is open-source parallel large-data visualization tool Initially render on two types of nodes Multi-core node - 1, 2, 4, 8, 16 way Mesa using multiple processes via parallel vtk Data automatically partitioned and rendered by each process On-node compositing to create final image GPU Standard OpenGL driver

Vtk/PV rendering performance standard approach 1 Million polygons rendering to a 1Kx1K image Rendering Type Software Architecture Frames per second Scan conversion OpenGL Nvidia Quadro FX 5600 18.6 1. Vtk GPU hardware rendering performance could be improved. Rendering Type Scan conversion Frames per second for # of cores Software Architecture 1 2 4 8 16 Open GL Mesa Multi-core (4 quad opt.) 0.7 1.2 2.0 3.2 4.6

Networking IB-1, IB-2 compositing performance 50.00 50.00 45.00 45.00 40.00 40.00 Frames per second 35.00 30.00 25.00 20.00 Network only - Frames per second Frames per second 35.00 30.00 25.00 20.00 15.00 15.00 10.00 10.00 5.00 5.00 0.00 2 4 8 16 32 64 128 0.00 2 4 8 16 32 64 128 Number of processors Number of processors

Summary Rendering and networking performance 10-15 frames per second on IB GPU-based 20 frames per second CPU-based/supercomputer 5 frames per second with Mesa software rendering This seems to suggest that visualization clusters are the right approach Slide 15

Another type of rendering Scan conversion of polygons 1. OpenGL Software Mesa - open-source 2. OpenGL Hardware Graphics cards Nvidia Raytracing Fast multi-core ready implementations For RR - IBM s irt software Cell processor. University of Utah Manta software Multi-core optimized, open-source

Why ray tracing? Advanced rendering model More accurate lighting physics model Shadows, reflections, refractions Flexible software-based approach Ability to integrate compute, analysis & rendering Current SPaSM Rendering Images courtesy Christiaan Gribble, Grove City College, PA (done while at Univ. of Utah) Slide 17

Using raytracing for rendering in vtk/pv To be clear -- Raytracing as a scan conversion/opengl replacement for parallel rendering Why? Optimized multi-core implementations available for ray-tracing For this study, if there was an optimized multi-core OpenGL software we would use that: Aside - Tungsten Graphics is working on a Cell-based Mesa effort Part of Gallium3D architecture Their own rendering abstraction infrastructure Slide 18

Using Manta raytracing for rendering in vtk/pv Use vtk s rendering abstraction Technical challenges Data representation Mapping between representation of polygons in vtk and raytracer issues in 2D points, lines Scalar color mapping Synchronization of control Vtk runs in one thread and raytracer has many threads Vtk and raytracer have their own event loop Use callback mechanism in ray-tracer for synchronization Slide 19

VtkManta serial rendering Slide 20

Vtk/PV rendering performance 1 Million polygons rendering to a 1Kx1K image Rendering Type Software Architecture Frames per second Scan conversion OpenGL Nvidia Quadro FX 5600 18.6 Rendering Type Scan conversion Raytracing Frames per second for # of cores Software Architecture 1 2 4 8 16 Open GL Mesa Manta Multi-core (4 quad opt.) Multi-core (4 quad opt.) 0.7 1.2 2.0 3.2 4.6 1.6 2.8 5.6 10.9 19.4

Parallel rendering with raytracing Render with raytracer on each node Need a depth value for sort-last compositing Raytracer calculates distance from eye to polygon At each pixel can use this value as a depth value for sort-last compositing Slide 22

Parallel rendering with Manta in ParaView

Summary Rendering and networking performance 10-15 frames per second on IB GPU-based 20 frames per second CPU-based / on supercomputer 20 frames per second with Manta raytracing (16-way multicore node) This suggests trend is towards using the supercomputer and away from visualization cluster Slide 24

Future work irt interface to vtk/pv in the works 1 Million polygons rendering to a 1Kx1K image - preliminary Rendering Type Software Architecture Frames per second Raytracing irt Cell blade (16 SPUs) 42 Integration of IBM Cell-based ray-tracer into PV for visualization on RR platform Advanced ray-tracing

Conclusions This preliminary study suggests that: Multi-core processors are starting to serve some of roles of traditional GPUs such as parallel rendering Using fast software-based rendering methods may offer a path to utilizing our supercomputers for visualization