A Domain-Specific Interpreter for Parallelising a Large Mixed-Language Visualisation Application

Karen Osmond, Olav Beckmann, Anthony J. Field and Paul H. J. Kelly
Department of Computing, 180 Queen's Gate, SW7 2AZ, United Kingdom
http://www.doc.ic.ac.uk/ ob3
Mixed-Language Visualisation Application 1/20

Visualising Large Ocean Current Simulations
- MayaVi: a modular visualisation environment
  - Graphical interface for composing analysis and rendering components
  - 22,000 LOC, Python + VTK
  - Open source, under active development
- Poor interactive performance limits its usefulness

Python/VTK Visualisation Software Architecture
- Visualisation typically involves a pipeline of feature-extraction operations.
- On extremely large datasets, response time for interactive parameterisation of the visualisation pipeline is poor.
- The challenge is to make visualisation of large datasets interactive by improving use of the memory hierarchy and by parallelisation.
Software stack (top to bottom):
- Application or script: written in Python, interpreted
- Python VTK bindings
- VTK: written in C++, compiled
- OpenGL / XGL etc.
Characteristics:
- Multi-language: Python, C++, C
- Component-based
- Actively changing code base, maintained by people who have no time for parallelisation
- Mixed dynamic / static
- Domain-specific semantics in DSL (VTK)

Object-Oriented Visualisation in VTK
- Graphics model: object-oriented representation of 3D computer graphics.
- Visualisation model: a model of data flow (the VTK visualisation pipeline).
- Capable of representing complex data-flow graphs: visualisation pipelines.
- Data-flow graphs can be executed in a demand-driven or data-driven manner.
- Surprisingly similar to high-level compositional programming models.

Domain-Specific Libraries: Typical Use
Diagram: the user program (... vtkContourFilter ... vtkPolyDataMapper ... vtkActor ... Render) calls directly into the domain-specific library.
- Program is compiled with a standard compiler (gcc, icc, ...) or interpreted with a standard interpreter (e.g. python).
- DSL code is mixed with other code.
- No domain-specific optimisation.
- Using such DSLs often dominates and constrains the way a software system is built just as much as a programming language does.
- In effect, we compile a quasi domain-specific language without a domain-specific compiler or optimiser.
- We typically miss out on cross-component optimisation opportunities that exploit the domain-specific semantics of the library.

Domain-Specific Interpreter Pattern
Diagram:
- User program: ... vtkContourFilter ... vtkPolyDataMapper ... vtkActor ... Render
- Capture: vtkContourFilter, vtkPolyDataMapper, vtkActor, Render
- Optimise: for [all processors, per chunk]: vtkContourFilter, vtkPolyDataMapper, vtkActor, Render; end
- Domain-specific library
Notes:
- The user program is unmodified and is compiled with, or interpreted by, an unmodified language compiler or interpreter.
- Capture all calls to methods from a DSL.
- Apply domain-specific optimisation, then call the underlying library.

Domain-Specific Interpreter Pattern
(Diagram as on the previous slide: user program, capture, optimise, domain-specific library.)
Applicability (requirements):
- Reliable capture
  - VTK/Python bindings
- Reliable capture of data flow through DSL routines
  - Opaque VTK data structures
Profitability:
- Domain-specific semantics
  - Piecewise evaluation valid
- Opportunities for optimisations across method calls
  - Size of intermediate data

Domain-Specific Interpreter for VTK in Python
Rename the original module (mv vtkpython.py vtkpython_real.py), then create a new vtkpython.py:

    import os

    if "vtkdsi" in os.environ:  # Control DS interpreter via environment
        import vtkpython_real  # Original vtkpython.py, renamed
        from vtkdsi import proxyobject
        for classname in dir(vtkpython_real):  # For all classes in this module
            exec("class " + classname + "(proxyobject): pass")  # class with no methods (yet)
    else:
        from vtkpython_real import *  # Fall through to original VTK Python

For all classes from vtkpython_real.py, create a class by the same name, with no methods, derived from proxyobject.
Explicit hooks for capturing all field and method accesses (cf. AOP):

    class proxyobject:
        # ... (rest of the class not shown on the slide)
        def __getattr__(self, callname):  # Catch all method calls
            return lambda *callargs: self.proxycall(callname, callargs)  # lambda call
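The capture idea on this slide can be tried out without VTK installed. The following is a minimal Python 3 sketch, not the authors' code: RealCone, ProxyObject, its log attribute and the proxies table are hypothetical stand-ins, and type() is used in place of the exec loop to create the empty proxy classes.

```python
class RealCone:
    """Hypothetical stand-in for a class from the real library module."""

    def __init__(self):
        self.radius = None

    def SetRadius(self, r):
        self.radius = r


class ProxyObject:
    """Records every method call instead of executing it."""

    log = []  # shared call log (assumption: one log for all proxies)

    def __getattr__(self, callname):
        # __getattr__ is only consulted for names that normal lookup
        # does not find, i.e. the library methods the proxy lacks.
        def record(*callargs):
            ProxyObject.log.append((type(self).__name__, callname, callargs))
        return record


# One empty proxy class per library class, with the same name.
real_classes = {"RealCone": RealCone}
proxies = {name: type(name, (ProxyObject,), {}) for name in real_classes}

cone = proxies["RealCone"]()   # looks like the library class to the caller
cone.SetRadius(0.2)            # captured, not executed
print(ProxyObject.log)         # [('RealCone', 'SetRadius', (0.2,))]
```

Because the proxy classes carry the same names as the originals, a user script that instantiates them needs no changes; only the module it imports differs.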

Visualisation Recipes
- The scheme shown on the previous slide works lazily for all calls through the VTK Python interface.
- We need to identify force points (i.e. Render()).
- Lazy indirection causes Python's reflection mechanism to break; therefore we actually use a more eager scheme.
- The proxy stores all calls made to VTK in a visualisation recipe.
- When a force point is reached, the recipes are evaluated.
Example recipe:

    [ construct, vtkConeSource, vtkConeSource_913 ]
    [ callmeth, vtkConeSource_913, return_926, SetRadius, 0.2 ]
    [ callmeth, vtkConeSource_913, return_927, GetOutput, ]
    [ callmeth, vtkTransformFilter_918, return_928, SetInput, "self.ids[ return_927 ]" ]
    [ callmeth, vtkTransformFilter_918, return_929, GetTransform, ]
    [ callmeth, return_929, return_930, Identity, ]
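To illustrate how a recipe of this shape might be replayed at a force point, here is a small sketch under stated assumptions. The Cone and Scale classes, the LIBRARY table, the evaluate function and the Ref marker are all hypothetical; in particular, the slide encodes references to earlier results as strings such as "self.ids[ return_927 ]", whereas this sketch uses a Ref tuple for simplicity.

```python
from collections import namedtuple

Ref = namedtuple("Ref", "name")  # marks an argument that names an earlier result


class Cone:  # hypothetical stand-in for a library class
    def __init__(self):
        self.radius = 1.0

    def SetRadius(self, r):
        self.radius = r

    def GetRadius(self):
        return self.radius


class Scale:  # hypothetical stand-in for a downstream filter
    def __init__(self):
        self.value = 0.0

    def SetInput(self, value):
        self.value = value

    def GetOutput(self):
        return self.value * 2


LIBRARY = {"Cone": Cone, "Scale": Scale}


def evaluate(recipe):
    """Replay a recipe and return the table mapping ids to live values."""
    ids = {}
    for entry in recipe:
        if entry[0] == "construct":
            _, classname, obj_id = entry
            ids[obj_id] = LIBRARY[classname]()
        elif entry[0] == "callmeth":
            _, obj_id, ret_id, method, args = entry
            resolved = [ids[a.name] if isinstance(a, Ref) else a for a in args]
            ids[ret_id] = getattr(ids[obj_id], method)(*resolved)
    return ids


recipe = [
    ("construct", "Cone", "Cone_1"),
    ("callmeth", "Cone_1", "return_2", "SetRadius", (0.2,)),
    ("callmeth", "Cone_1", "return_3", "GetRadius", ()),
    ("construct", "Scale", "Scale_4"),
    ("callmeth", "Scale_4", "return_5", "SetInput", (Ref("return_3"),)),
    ("callmeth", "Scale_4", "return_6", "GetOutput", ()),
]

ids = evaluate(recipe)  # the "force point": nothing ran before this call
```

Since every return value gets its own id, a later entry can call a method on an earlier result, as the last line of the slide's example does with return_929.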

Optimising VTK Visualisation Pipelines
- Simulations generating the datasets we are visualising are run in parallel, resulting in a parallel tetrahedral VTK dataset. This means an XML file giving the locations of the partitions.
- Normally, VTK fuses the partitions into one whole dataset.
- If a dataset has not been generated as a collection of partitions, we can use METIS to create a partitioned version.
- VTK does have parallel routines: data-parallel, using MPI.
- We are interested in a more dynamic scenario, steered from a client.
Diagram: SPMD data-parallel execution, with Render on each of P1 to P4, contrasted with Render issued by a client.

Coarse-Grained Tiling of VTK Visualisation Pipelines
Diagram: a client issuing Render for each partition in turn.
- Large intermediate data means that multi-stage visualisation pipelines make poor use of the memory hierarchy.
- Our domain-specific vtkpython interpreter builds a data structure representing the sequence of operations performed.
- When the user application calls Render(), we apply this sequence partition by partition to the dataset.
- The only difference is an environment variable.
- Domain-specific semantics determine the validity of this transformation.
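The tiling transformation can be illustrated with plain Python lists in place of VTK datasets. In this sketch the pipeline function and the partition contents are hypothetical. It only demonstrates the property the transformation relies on: when piecewise evaluation is valid, running the whole pipeline on one partition at a time gives the same result as running it on the fused dataset, while keeping each intermediate small.

```python
def pipeline(points, threshold):
    """Hypothetical two-stage pipeline with an intermediate as large as its input."""
    squared = [x * x for x in points]               # stage 1: intermediate data
    return [v for v in squared if v >= threshold]   # stage 2: what gets drawn


partitions = [[0, 1, 2], [3, 4], [5, 6, 7]]

# Untiled: fuse the partitions first, then run each stage over everything.
fused = [x for part in partitions for x in part]
whole = pipeline(fused, 10)

# Tiled: run the complete pipeline on one partition at a time.
tiled = []
for part in partitions:
    tiled.extend(pipeline(part, 10))

print(whole)   # [16, 25, 36, 49]
print(tiled)   # [16, 25, 36, 49]
```

An operation that needs neighbouring data across partition boundaries would not satisfy this property, which is why the validity of the transformation depends on domain-specific semantics.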

Coarse-Grained Tiling of VTK Visualisation Pipelines
Figure: calculating isosurfaces one partition at a time, showing outlines of partitions.

Shared-Memory Parallelisation
Diagram: Render, the Python Global Interpreter Lock, and processors P1 to P4.
- Plan: execute the visualisation pipelines for each tile in parallel on an SMP.
- The first obstacle is that the Python interpreter is not thread-safe! This can be overcome by manually releasing the GIL (global interpreter lock) on the C++ side.
- Some VTK routines are also not thread-safe, or do not have parallel semantics.
- Rendering via OpenGL is not thread-safe, so we do not release the GIL when calling C++-side rendering from Python.
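The structure of this plan, parallel per-tile computation followed by serialised rendering, can be sketched with Python's standard thread pool. This is an illustration only: compute_tile, render, process and the tile data are hypothetical, and because compute_tile here is pure Python it holds the GIL and gains no speedup. In the slides the per-tile work runs in C++ with the GIL released.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

render_lock = threading.Lock()
rendered = []


def compute_tile(tile):
    # Stands in for C++-side pipeline work that would release the GIL.
    return sum(x * x for x in tile)


def render(result):
    # Rendering is not thread-safe, so only one thread may do it at a time.
    with render_lock:
        rendered.append(result)


def process(tile):
    render(compute_tile(tile))


tiles = [[1, 2], [3, 4], [5, 6], [7, 8]]
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(process, tiles))   # consume the iterator so errors surface

print(sorted(rendered))   # [5, 25, 61, 113]
```

The results are sorted before printing because the order in which threads reach the render lock is not deterministic.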

Distributed Memory Parallelisation
- Use a cluster of machines to perform the calculation in parallel, then render on one client machine.
- We used the Python library Pyro to provide RMI-like features for Python.
- Pyro allows pickleable (serialisable) objects to be transferred over the network.
- Our recipes can be transferred to servers in the cluster in this way.
- Unfortunately, VTK objects cannot be serialised using the pickle mechanism, so we use a shared filesystem to transfer VTK objects.
- This is a dynamic, client-server model of distributed memory parallelisation, not data-parallel.
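The split between what travels over the network and what goes through the filesystem can be shown with the standard pickle module alone. This sketch does not use Pyro or VTK. OpaqueResult is a hypothetical stand-in for an object that cannot be pickled (it holds a lock to mimic a native handle), and a temporary directory stands in for the shared filesystem.

```python
import os
import pickle
import tempfile
import threading

# A recipe is plain data, so it survives a pickle round trip.
recipe = [
    ("construct", "vtkConeSource", "vtkConeSource_913"),
    ("callmeth", "vtkConeSource_913", "return_926", "SetRadius", (0.2,)),
]
wire = pickle.dumps(recipe)
restored = pickle.loads(wire)


class OpaqueResult:
    """Hypothetical result object that pickle cannot serialise."""

    def __init__(self, values):
        self.values = values
        self._handle = threading.Lock()   # unpicklable member

    def write(self, path):
        with open(path, "w") as f:
            f.write(" ".join(str(v) for v in self.values))


result = OpaqueResult([1.0, 2.5])
try:
    pickle.dumps(result)
    picklable = True
except TypeError:
    picklable = False

# Fallback: write the result to a shared location and pass only the path.
shared_dir = tempfile.mkdtemp()           # stand-in for a shared filesystem
path = os.path.join(shared_dir, "tile0.txt")
result.write(path)
with open(path) as f:
    received = [float(token) for token in f.read().split()]
```

Only the short path string needs to cross the network in the fallback case; the bulk data moves through the filesystem.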

Evaluation
Benchmark scenario:
- Open a dataset representing flow over a heated sphere
- Size: approx. 325 MB
- Plot seven isosurfaces at different values
Platforms:
- Athlon 1600+, dual SMP, 256 KB L2, 1 GB RAM, Linux 2.4
- Cluster of 4 Pentium 4 2.8 GHz, 512 KB L2, 1 GB RAM, Linux 2.4

Results for IsoSurface Benchmark
Figure: IsoSurface Benchmark, Athlon 1600+ 2-way SMP. Time in seconds against number of partitions; series: Base, Tiling, SMP.
Benchmark consists of loading a dataset representing flow over a heated sphere and calculating 7 isosurfaces at different values.

Results for IsoSurface Benchmark
Figure: IsoSurface Benchmark, cluster of Pentium 4 2.0 GHz. Time in seconds against number of partitions; series: Base, Tiling, DMP.
Benchmark consists of loading a dataset representing flow over a heated sphere and calculating 7 isosurfaces at different values.

Related Work
- Dynamic component assembly software architectures
  - SciRun / BioPSE / Uintah
  - Virtual data-grid projects
- Dynamic cross-component optimisation
  - Telescoping languages (Rice, LLNL)
  - Code generation approaches
- Kitware tool: ParaView
  - This makes use of VTK's data-parallel routines and relies on MPI.
- Grid workflow engines
  - Related in that we assemble a workflow at runtime, then execute it.
  - Our work illustrates an interesting pathway for enabling legacy code to operate in such an environment.

What's Next
- Use of metadata to carry additional domain-specific semantics
  - Parallel semantics, thread-safety on the C++ side, use-def equations
- Reducing the size of the polygon set before rendering
  - Cache full sets on servers, return decimated sets to the client (figure labels: full, 80%, 85%)
  - Level of detail (LOD) and region-of-interest (ROI) selection
- Multiple time steps
  - Speculatively applying the recipe to future timesteps
  - Aim to achieve smooth rendering of a series of timesteps

Conclusion
- We have parallelised a large, open-source visualisation application without changing a single line of code.
  - Entirely transparent to the application program, controlled via an environment variable.
  - Works for any Python visualisation script using VTK.
- Use Python to implement a domain-specific interpreter for a domain-specific library.
  - Facilitated by reliable capture of DSL calls and known data flow due to opaque objects on the C++ side.
- Optimisations
  - Coarse-grained tiling
  - SMP parallelisation
  - Distributed memory parallelisation
  - Dynamic, runtime parallelisation