The Design and Implement of Ultra-scale Data Parallel. In-situ Visualization System



Similar documents
How To Share Rendering Load In A Computer Graphics System

Petascale Visualization: Approaches and Initial Results

Ultra-Scale Visualization: Research and Education

Parallel Visualization for GIS Applications

An Integrated Simulation and Visualization Framework for Tracking Cyclone Aila

Facts about Visualization Pipelines, applicable to VisIt and ParaView

A Steering Environment for Online Parallel Visualization of Legacy Parallel Simulations

Parallel Large-Scale Visualization

LA-UR- Title: Author(s): Intended for: Approved for public release; distribution is unlimited.

Parallel Analysis and Visualization on Cray Compute Node Linux

Stream Processing on GPUs Using Distributed Multimedia Middleware

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

Large Scale Data Visualization and Rendering: Scalable Rendering

ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI

Remote Graphical Visualization of Large Interactive Spatial Data

2 Paper 1022 / Dynamic load balancing for parallel volume rendering

Towards Scalable Visualization Plugins for Data Staging Workflows

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

NVIDIA IndeX Enabling Interactive and Scalable Visualization for Large Data Marc Nienhaus, NVIDIA IndeX Engineering Manager and Chief Architect

A Job Scheduling Design for Visualization Services using GPU Clusters

Visualization and Data Analysis

In Situ Visualization for LargeScale Combustion Simulations

In-Situ Processing and Visualization for Ultrascale Simulations

A Distributed Render Farm System for Animation Production

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Visualization Technology for the K computer

Large-Data Software Defined Visualization on CPUs

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Load Balancing on Cluster-Based Multi Projector Display Systems

GraySort on Apache Spark by Databricks

Image Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm

Data Visualization in Parallel Environment Based on the OpenGL Standard

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University

Using open source and commercial visualization packages for analysis and visualization of large simulation dataset

Parallel Programming Survey

A Chromium Based Viewer for CUMULVS

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

A Contract Based System For Large Data Visualization

L20: GPU Architecture and Models

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

An Analytical Framework for Particle and Volume Data of Large-Scale Combustion Simulations. Franz Sauer 1, Hongfeng Yu 2, Kwan-Liu Ma 1

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Kriterien für ein PetaFlop System

OpenMP Programming on ScaleMP

SURVEY ON SCIENTIFIC DATA MANAGEMENT USING HADOOP MAPREDUCE IN THE KEPLER SCIENTIFIC WORKFLOW SYSTEM

Parallel Visualization of Petascale Simulation Results from GROMACS, NAMD and CP2K on IBM Blue Gene/P using VisIt Visualization Toolkit

Remote Large Data Visualization in the ParaView Framework

NVIDIA GeForce GTX 580 GPU Datasheet

Elements of Scalable Data Analysis and Visualization

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

MapReduce and Hadoop Distributed File System

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

An Experimental Approach Towards Big Data for Analyzing Memory Utilization on a Hadoop cluster using HDFS and MapReduce.

A Hybrid Load Balancing Policy underlying Cloud Computing Environment

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

DIY Parallel Data Analysis

Multi-GPU Load Balancing for In-situ Visualization

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

Understanding the Benefits of IBM SPSS Statistics Server

Hadoop on a Low-Budget General Purpose HPC Cluster in Academia

Performance Analysis and Optimization Tool

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age

Jean-Pierre Panziera Teratec 2011

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

San Diego Supercomputer Center, UCSD. Institute for Digital Research and Education, UCLA

Rapid 3D Seismic Source Inversion Using Windows Azure and Amazon EC2

HiBench Introduction. Carson Wang Software & Services Group

A Framework for Performance Analysis and Tuning in Hadoop Based Clusters

Image Search by MapReduce

Online Failure Prediction in Cloud Datacenters

Cray: Enabling Real-Time Discovery in Big Data

Research of Railway Wagon Flow Forecast System Based on Hadoop-Hazelcast

Introduction to DISC and Hadoop

Understanding the Value of In-Memory in the IT Landscape

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Research and Application of Redundant Data Deleting Algorithm Based on the Cloud Storage Platform

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format

CHAPTER FIVE RESULT ANALYSIS

Hank Childs, University of Oregon

GPU Architecture. Michael Doggett ATI

Recommended hardware system configurations for ANSYS users

A Flexible Cluster Infrastructure for Systems Research and Software Development

Big Data Storage Architecture Design in Cloud Computing

Big Data with Rough Set Using Map- Reduce

Load Balancing Utilizing Data Redundancy in Distributed Volume Rendering

Advanced Rendering for Engineering & Styling

A Dynamic Load-Balancing Approach for Efficient Remote Interactive Visualization

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

Technical bulletin: Remote visualisation with VirtualGL

Multi-GPU Load Balancing for Simulation and Rendering

MayaVi: A free tool for CFD data visualization

A Survey Study on Monitoring Service for Grid

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article. An independence display platform using multiple media streams

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

Transcription:

The Design and Implement of Ultra-scale Data Parallel In-situ Visualization System Liu Ning liuning01@ict.ac.cn Gao Guoxian gaoguoxian@ict.ac.cn Zhang Yingping zhangyingping@ict.ac.cn Zhu Dengming mdzhu@ict.ac.cn Abstract GPU cluster enable people to model and visualize scientific phenomena at a stunning degree of precision and sophistication. But it is more and more impractical to store the data to disk in real time because of the scale of the output data, and thus the conventional post-processing based technique no longer applies. In response to this challenge,we design and implement an in-situ visualization system based on MPI and CUDA. In this system, the simulation code and visualization code are coupled together and the visualization is performed locally as soon as the data is generated using a GPU visualization pipeline. In addition, we adopt a display wall to show the rendered image, which is capable of showing very fine details of the simulation output. 1. Introduction Now the power of parallel computing system is growing dramatically, as a result, it is more and more impractical to store the output of scientific simulation to hard disk for the lack of sufficient disk space and network bandwidth. The conventional post-processing based visualization method usually requires the output to be dumped to the hard disk before it goes through further processing, so it is not appropriate to be adopted to visualize ultra-scale data generated by simulation program in real time. In this paper, we design and implement an ultra-scale data in-situ visualization system to tackle these obstacles. Our system can fully exploit the parallel computing power of the cluster or super computers. The MPI (Message Passing Interface) is a commonly used standard to write parallel programs running on distributed memory systems. The nodes of these systems can communicate with each other according to the MPI interface. This is a relatively coarser grain of parallelism. In contrast with MPI, CUDA (Compute Unified Device Architecture) accomplishes parallelism using modern GPU. There may exist hundreds of cores in one GPU, and each core can serve as a powerful computing unit. If used properly, the CUDA program can achieve one or two orders of magnitude compared to traditional CPU computing. This is a finer grain of parallelism. 2. Related works Molnar et al. [1] divide parallel rendering into three categories based on the mapping process from object space to screen space, that is, sort-first, sort-middle and sort-last. Sort-first is suitable for small data while sort-last scales well for large-scale data. If the data size is too large, sort-last is the only feasible option.

Ma [2] introduced Binary-Swap algorithm and it has been adopted in many applications because of its simplicity and good performance. But it restricts the number of nodes to be power of two, while in most cases the condition does not hold. Yu [3] extends this algorithm such that this restriction is released. Moreland et al. [4] proposed an image composition algorithm based on Binary-Swap that can support display wall. Ma [5] made a good outline of ultra-scale data visualization, including load balancing, in-situ visualization and intellectual feature tracking. When the scale of the data reaches a certain threshold, the conventional post-processing based method no longer applies and the in-situ visualization becomes a promising approach. Ma [6] gave some key points of in-situ visualization. 3. System Design The goal of the system is to visualize the simulation-generated data in real time and offer the users the ability to interact with the simulation and visualization process. The system consists of a master node, a simulation & visualization cluster and a display wall. The nodes within the cluster cooperate with each other through the MPI interface and the master node uses the socket to communicate with the cluster. Fig. 1 overall design of the system 3.1 Master Node(M-Node) Users interact with the system through a QT GUI running on the master node. It is equipped with a monitor that is responsible to show the thumbnail delivered back from the cluster. Users request some certain services and the cluster gives back the requested result. 3.2 Simulation and Rendering Node (SR Node) The simulation and rendering node performs scientific simulation and produces some elementary image results that are composed afterwards to get the whole image. The elementary results are also down-sampled and delivered to the master node such that the

master node can fetch a thumbnail from them. Part of the SR nodes are also responsible to drive the display wall and thus require DVI port to be installed. All the SR nodes need to support OpenGL in order to perform visualization. 4. Parallel Visualization Parallel visualization can be divided into four stages, that is, data distribution, local rendering, image composition and image output. We mainly focus on two problems: one is load balancing and the other is to minimize the communication delay within the cluster. 4.1 Data Distribution In the in-situ visualization mode, the scientific simulation usually adopts uniform data partition technique, as a result, the data distribution is suitable for visualization. We just follow the simulation data partition way for simplicity. If the data partition of the simulation does not adapt to the visualization calculation, the data must be re-distributed. 4.2 Local Rendering We use VTK(The Visualization Toolkit) to do local rendering. Because the simulation generated data reside in the GPU memory, but the conventional VTK visualization pipeline can only handle the data that are in the memory, so we design a new CUDA based pipeline and integrate it into the existing VTK framework. The source of the pipeline is simulation data and it goes through Filter and Mapper stage and finally is visualized by the Render. Fig.3 is the GPU visualization pipeline. Fig. 2 the GPU visualization pipeline 4.3 Image Composition Binary-Swap is a commonly used and has been proven to be the best image composition strategy if the number of nodes is power of two. Binary-Swap consists of logn stages and in each stage the nodes are divided into several pairs. A node only communicates with its pair in each stage. Finally, each node holds 1/N of the whole image and an image collection is performed to get the whole image. 4.4 Image Output The composited image is displayed during the image output stage. Because of the scale of the data, if we would like to see the details, we need the resolution of the image to be large high enough. Thus we use a display wall to show the rendered result. We utilize Moreland's algorithm to accomplish our goal. The algorithm is executed as follows: The nodes are divided into several groups according to the number of the tiles in the display wall, then each group is responsible for its corresponding tile. At first, the groups exchange image data with each other such that in each group the nodes only hold the data that will be displayed in its corresponding tile. After that, Binary-Swap is performed within each group. In

each group, there exists a drawing node that collects the composited image in that group and eventually shows the image on the display wall. 5. In-situ Visualization Conventional visualization technique is commonly based on post-processing model, that is, the simulation program stores the output data to file system and then some other dedicated visualization machines will fetch the data and do corresponding processing. The co-processing model steps a little forward. In this model, the data is transfered directly to the visualization machines after it is generated. If the visualization and the simulation calculations are done on the same machine, then we get the in-situ model. In-situ visualization can greatly reduce the demand for disk I/O and network bandwidth. Furthermore, because the result of in-situ visualization is continuous and so we can observe some information that is hard or impossible to get from post-processing visualization. In our system, the simulation code acts as a module of the visualization program and it uses the MPI communicator of the visualization side. The simulation side sends the generated data to the visualization side through a proxy. The design is quite flexible and all we need to do is to design a proxy and fill the data structure of the visualization side with the generated data from the simulation side. 6. Results We carried out an experiment on a GPU cluster consisting of 32 nodes, and the nodes use InfiniBand to communicate with each other. Each node is equipped with four GeForce GTX 295 video cards, one Xeon 5430 2.66GHz CPU, one 1.5TB hard disk. The final frame rate is about 12fps and in each frame about 30MB data are generated in each node, so the overall data that the system needs to process in one second is around 10GB. We showed the results in the 50 th, 100 th, 200 th and 300 th frame. Fig. 3 experiment results 7. Conclusion We design and implement an ultra-scale

in-situ visualization system in this paper. Our work mainly focuses on the design of in-situ visualization and the GPU visualization pipeline. The in-situ visualization technique can process the data as soon as they are generated and greatly reduce the demand for disk space and network bandwidth. The GPU visualization pipeline is capable of handling the data residing in the GPU memory, without the need of transferring the data between GPU memory and main memory. We just implement some basic visualization techniques, such as contour and volume rendering, so our future work will stress on the functions of system. large-data visualization and graphics, 2001 [5] Kwan-Liu Ma, "Ultra-Scale Visualization", Third International Symposium on ransdisciplinary Fluid Integration, 2006 [6] Kwan-Liu Ma, Chaoli Wang, Hongfeng Yu, Anna Tikhonova, "In-Situ Processing and Visualization for Ultrascale Simulations", Journal of Physics, 2007 [7] Hongfeng Yu, Chaoli Wang, Ray W. Grout, "A Study of In-Situ Visualization for Petascale Combustion Simulations", tech. report CSE-2009-9, Dept. of Computer Science, Univ. California, Davis, 2009 Acknowledgements This work is supported and funded by National Natural Science Foundation of China Grant No. 60703019, National Key Technology R&D Program 2008BAI50B07, and NSFC-Guangdong Joint Fund (U0935003). National 863 Foundation of China Grant No. 2009AA01Z312. References [1] Steven Molnar, Michael Cox, David Ellsworth, Henry Fuchs, "A Sorting Classification of Parallel Rendering", IEEE Computer Graphics and Applications, 1994 [2] Kwan Liu Ma, "Parallel Volume Rendering Using Binary Swap Image Composition", IEEE Computer Graphics and Applications, 1994 [8] Christiaan P. Gribble, Abraham J. Stephens, James E. Guilkey, Steven G. Parker, "Visualizing Particle-Based Simulation Datasets on the Desktop", British HCI 2006 Workshop on Combining Visualization and Interaction to Facilitate Scientific Exploration and Discovery, 2006 [9] J Meredith, KL Ma, "Multiresolution View-Dependent Splat Based Volume Rendering of Large Irregular Data, IEEE Symposium on Parallel and Large-Data Visualization and Graphics", 2001 [10] Kenneth Moreland, Lisa Avila, and Lee Ann Fisk, "Parallel Unstructured Volume Rendering in ParaView" In Visualization and Data Analysis 2007, Proceedings of SPIE-IS&T Electronic Imaging,2007 [3] H Yu, C Wang, KL Ma, "Massively Parallel Volume Rendering Using 2-3 Swap Image Compositing", ACM SIGGRAPH ASIA, 2008 [4] Kenneth Moreland, Brian Wylie, Constantine Pavlakos" Sort-Last Parallel Rendering for Viewing Extremely Large Data Sets on Tile Displays "Proceedings of the IEEE symposium on parallel and