C++ Class Library Data Management for Scientific Visualization




Al Globus, Computer Sciences Corporation (1)
globus@nas.nasa.gov

Abstract

Scientific visualization strives to convert large data sets into comprehensible pictures. Data sets generated by modern instruments and supercomputers are so large that visualization is often crucial to comprehension. Visualization involves a great deal of numerical programming: vector arithmetic, inverting matrices, solving ODEs, etc. Commercial C++ class libraries implement most of what's needed, but have a fatal memory management performance flaw: data to be operated on is always copied from memory or disk into newly allocated space. On large data sets, these copies kill performance. A small class hierarchy has been implemented to address this problem. The base class implements reference count memory management and contains a virtual destructor, but does no allocation or deallocation. Derived classes are added to implement different ways of allocating memory and importing data into the system. This solution has been tested in our own C++ numerical class library (the library we wish we didn't have to write).

I. Introduction

Data sets of current interest at our facility range in size from 600 megabytes to 160 gigabytes [Globus92]. To use a commercial C++ numerical class library for visualization purposes, our data must be passed to the library, typically by a constructor. Vector, matrix, and array class constructors in existing libraries either copy a memory buffer or read from a disk file [Dyad92, RW92]. Copies are expensive in physical memory, and disk IO is expensive in time. Unnecessary copies and/or disk IO are unacceptable for our work. Constructors are needed that can incorporate large blocks of data into an object without making a copy, and destructors are needed that delete the data when it is no longer needed. Allocation and deallocation must be done by executing application specific code.
This can be done with a class hierarchy based at the class tdata (2). tdata implements reference counting memory management and has a virtual destructor. Application programmers write derived classes implementing constructors and destructors that import (or allocate) and delete application specific data as desired.

In reference count memory management, each block of data is an object that keeps a count of the number of pointers referencing the block. When the reference count goes to 0, the data block is deallocated. Note that reference count memory management defeats normal C++ memory management of local variables and can cause problems. Reference count memory management is not central to the approach advocated by this paper. It is used here because the implementation is very simple and will not distract from our main point.

Note that if data initialized by the application is imported into a numerical library, the way the library expects data to be laid out in memory must be documented. E.g., for multi-dimensional arrays, the index increasing fastest with increasing memory address must be known.

(1) This work is supported through NASA contract NAS2-12961.
(2) In this paper, class names start with t, a habit picked up from MacApp. Also, code fragments will be in the courier font.

II. Problem

A description of the tdata class hierarchy follows, but first, consider two situations that arise in scientific visualization programming where the tdata memory management approach vastly improves performance over that achievable using current commercial C++ numerical class libraries. The performance improvements derive solely from eliminating memory copies and disk IO.

1. Extensible visualization systems such as AVS [Upson89], SGI Explorer [SGI92a, SGI92b], and FAST [Bancroft90] are customized by writing modules. These modules are separate UNIX processes that communicate locally via shared memory and with remote machines via sockets. This communication is handled by the visualization system. Module programmers need only write functions that input and output visualization system data types. Thus, a useful C++ numerical library must operate on data allocated by the visualization system, not by the C++ library.

2. Some important visualization techniques access a very small, but unpredictable, portion of a data set. If data is on disk, the access pattern's unpredictability prevents reading the right portion a priori. However, one can memory map the file [UNIX].
A memory mapped file is added to a process's virtual memory space without actually reading the file into physical memory. The virtual memory system will transparently read only those portions of the data actually referenced by subsequent code; i.e., physical memory acts as a cache for data on disk. Thus, code which accesses memory mapped files or in-core buffers is identical, although performance will vary. Memory mapping can reduce IO by orders of magnitude.

Consider following the path of a set of particles through a time dependent 3D vector field. Conceptually, this is the equivalent of releasing massless particles into a flow to see where they go. Mathematically, given the Eulerian vector field $v(x, t)$, find the particle path of $X$ by solving

$$\frac{dx}{dt} = v(x, t)$$

with $x = X$ at $t = t_0$.

In our work, $v(x, t)$ is usually sampled on a curvilinear 3D grid in space and regularly in time. The amount of data that must be accessed to solve $dx/dt = v(x, t)$ is O(number-of-particles). The size of $v(x, t)$ is O(number-of-grid-points). The number of particles per time step is typically ten thousand or so [Levit92], whereas the number of sample points per time step is frequently in the millions [Globus92]. Eight sample points are necessary to interpolate values for one particle. Thus, in the unlikely worst case that no two particles access the same sample points, following 10,000 particles in a 3,000,000 point spatial grid requires access to 10,000 x 8 = 80,000 sample points per time step, or less than 2.7% of the data. Of course, the virtual memory system will access data not actually needed since disk reads have a minimum size, but the IO savings are still substantial. In one case, the time to calculate the motion of 100 particles for one step in a 2.8 million point data set went from many minutes to four seconds per time step [Globus93]. A useful C++ numerical library needs to operate on memory mapped data, but the present packages cannot.

We have seen two situations where a numerical package must operate on memory not allocated by the package. Not all such situations can be anticipated, so a flexible mechanism is needed to import (or allocate) and deallocate data from different sources.

III. A Solution

To operate on different kinds of data transparently, a numerical class library need only differ in the way the data is imported and deallocated. C++ derived classes and virtual destructors can provide such a mechanism. In our C++ numerical class library, we implement the class tdata and derived classes to import/allocate and deallocate memory as appropriate. Consider the following cases:

- Memory is allocated by new. This can be used when the numerical class library will set initial values.
- Data is in memory allocated by a visualization system. In this case, the data is memory resident but has been allocated by a visualization system, not by the numerical class library.
- The data resides in a file, but it is not necessary to read the entire file into memory. In this case, memory mapping is used.
- A portion of the data in an existing tdata is needed. This occurs when taking a subset of an array or when a file is memory mapped and there is header information such as array dimensions.
- Data should not be deallocated by the numerical library under any circumstances. This can occur when you're just plain paranoid.

tdata is defined as follows (the code fragments here are modified for clarity; e.g., error checks have been removed):

    // Class that keeps reference counts of blocks of data for memory
    // management. When using these classes, always allocate with new
    // and never use the destructors (they're protected anyway).
    // Use ->unreference() when you're done with the data.
    class tdata {
    private:
        int ReferenceCount;
    protected:
        int Bytes;   // size in bytes, set by derived classes
        char *Data;  // set and (de)allocated by derived classes
        tdata() { ReferenceCount = 1; }
        virtual ~tdata() = 0;
    public:
        char* data() { return Data; }
        void reference() { ReferenceCount++; }
        void unreference() { if (!(--ReferenceCount)) delete this; }
        int bytes() { return Bytes; }
        int referencecount() { return ReferenceCount; }
    };

    // A pure virtual destructor still needs a definition, because every
    // derived class destructor calls it.
    inline tdata::~tdata() { }

When memory allocated by new is desired, use:

    class tnewdata : public tdata {
    public:
        tnewdata( int size ) { Bytes = size; Data = new char[Bytes]; }
    protected:
        ~tnewdata() { delete[] Data; }
    };

When the data is in a data structure controlled by a visualization system, use:

    class tvisualizationsystemdata : public tdata {
    public:
        // tell visualization system this data is referenced
        // and pull out a pointer to the actual data
        // (the code here is visualization system dependent)
        tvisualizationsystemdata( VisSystemDataType* );
    protected:
        ~tvisualizationsystemdata();
    };
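Since the bodies of tvisualizationsystemdata depend on the visualization system, the following is only a rough sketch of what they might look like. VisSystemDataType, visRetain, and visRelease here are hypothetical stand-ins invented for illustration, not the API of AVS, Explorer, or FAST; a real module would call whatever the host system provides to mark data as in use.

```cpp
#include <cassert>
#include <cstddef>

// Minimal copy of the base class so that the sketch compiles on its own.
class tdata {
    int ReferenceCount;
protected:
    int Bytes;
    char* Data;
    tdata() : ReferenceCount(1), Bytes(0), Data(nullptr) {}
    virtual ~tdata() = 0;
public:
    char* data() { return Data; }
    int bytes() { return Bytes; }
    void reference() { ++ReferenceCount; }
    void unreference() { if (--ReferenceCount == 0) delete this; }
};
inline tdata::~tdata() {}

// HYPOTHETICAL visualization system data type and retain/release calls.
struct VisSystemDataType {
    char* buffer;  // memory owned by the visualization system
    int size;      // size of the buffer in bytes
    int users;     // how many clients currently reference the buffer
};
inline void visRetain(VisSystemDataType* v) { ++v->users; }
inline void visRelease(VisSystemDataType* v) { --v->users; }

class tvisualizationsystemdata : public tdata {
    VisSystemDataType* Source;
public:
    // Tell the (hypothetical) system the data is referenced and
    // pull out a pointer to the actual data; nothing is copied.
    explicit tvisualizationsystemdata(VisSystemDataType* v) : Source(v) {
        visRetain(Source);
        Bytes = Source->size;
        Data = Source->buffer;
    }
protected:
    // Hand the buffer back instead of deleting it.
    ~tvisualizationsystemdata() { visRelease(Source); }
};

// Returns true if the import shares the buffer and releases it afterwards.
inline bool visImportSharesBuffer() {
    char storage[16] = {0};
    VisSystemDataType vis = { storage, 16, 0 };
    tdata* d = new tvisualizationsystemdata(&vis);
    bool shared = d->data() == storage && d->bytes() == 16 && vis.users == 1;
    d->unreference();  // count goes to 0, so the destructor releases the buffer
    return shared && vis.users == 0;
}
```

The point of the design is that the derived destructor notifies the owner rather than freeing memory, so the numerical library never deallocates a buffer it did not allocate.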

When the data is in a file that must be memory mapped:

    // The #include lines were left out of the original for brevity.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    class tfiledata : public tdata {
    public:
        tfiledata( const char* filename ) {
            int fd = open( filename, O_RDONLY );
            struct stat statbuf;
            fstat( fd, &statbuf );
            Bytes = (int)statbuf.st_size;
            Data = (char*)mmap( 0, Bytes, PROT_READ, MAP_SHARED, fd, 0 );
            close( fd );  // the mapping stays valid after the close
        }
    protected:
        ~tfiledata() { munmap( Data, Bytes ); }
    };

Since a file of data frequently contains header information, there is a need to access only the part of a file containing the data. To meet this need, use the following:

    class tpartofdata : public tdata {
        tdata *ReferencedData;
    public:
        tpartofdata( tdata* ref, int size, int start ) {
            ReferencedData = ref;
            Bytes = size;
            Data = ref->data() + start;
            ref->reference();  // keep the enclosing block alive
        }
    protected:
        ~tpartofdata() { ReferencedData->unreference(); }
    };

Sometimes there are blocks of data that should never be deallocated. For this case, use the following:

    class tdonttouchdata : public tdata {
    public:
        tdonttouchdata( int size, char* data ) { Bytes = size; Data = data; }
    protected:
        ~tdonttouchdata() { }
    };
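As an illustration of how tfiledata and tpartofdata can be combined to skip a file header, here is a minimal sketch. The file layout (one int holding an element count, followed by floats), the function name secondValueAfterHeader, and the added error checks are assumptions made for the example; they are not part of the library described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Minimal copy of the base class so that the sketch compiles on its own.
class tdata {
    int ReferenceCount;
protected:
    int Bytes;
    char* Data;
    tdata() : ReferenceCount(1), Bytes(0), Data(nullptr) {}
    virtual ~tdata() = 0;
public:
    char* data() { return Data; }
    int bytes() { return Bytes; }
    void reference() { ++ReferenceCount; }
    void unreference() { if (--ReferenceCount == 0) delete this; }
};
inline tdata::~tdata() {}

// Memory maps a whole file read-only; Data stays null if anything fails.
class tfiledata : public tdata {
public:
    explicit tfiledata(const char* filename) {
        int fd = open(filename, O_RDONLY);
        if (fd < 0) return;
        struct stat statbuf;
        if (fstat(fd, &statbuf) == 0 && statbuf.st_size > 0) {
            void* p = mmap(nullptr, (size_t)statbuf.st_size, PROT_READ,
                           MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) {
                Data = (char*)p;
                Bytes = (int)statbuf.st_size;
            }
        }
        close(fd);
    }
protected:
    ~tfiledata() { if (Data) munmap(Data, (size_t)Bytes); }
};

// A window onto another tdata; holds a reference to keep it alive.
class tpartofdata : public tdata {
    tdata* ReferencedData;
public:
    tpartofdata(tdata* ref, int size, int start) : ReferencedData(ref) {
        Bytes = size;
        Data = ref->data() + start;
        ref->reference();
    }
protected:
    ~tpartofdata() { ReferencedData->unreference(); }
};

// HYPOTHETICAL file layout: one int (element count) followed by floats.
// Writes such a file, maps it, and returns the second float of the payload,
// or -1 if the file could not be written or mapped.
inline float secondValueAfterHeader(const char* path) {
    const int count = 3;
    const float values[3] = {1.5f, 2.5f, 3.5f};
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return -1.0f;
    std::fwrite(&count, sizeof count, 1, f);
    std::fwrite(values, sizeof(float), 3, f);
    std::fclose(f);

    tdata* file = new tfiledata(path);       // reference count = 1
    float second = -1.0f;
    if (file->data()) {
        const int header = (int)sizeof(int);
        tdata* payload = new tpartofdata(file, file->bytes() - header, header);
        std::memcpy(&second, payload->data() + sizeof(float), sizeof second);
        payload->unreference();              // drops the payload's hold on file
    }
    file->unreference();                     // last reference: file is unmapped
    std::remove(path);
    return second;
}
```

On a POSIX system where the path is writable, secondValueAfterHeader("/tmp/tdata_header_demo.bin") returns 2.5. No byte of the payload is copied until the final memcpy of a single float.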

To see how the tdata class hierarchy is used, consider a vector operation on data in a file:

    {
        tdata* d = new tfiledata( "foo" ); // reference count = 1
        tvector v( d );                    // count = 2, see below
        d->unreference();                  // count goes to 1
        float f = v.product();             // multiply vector elements together
    }   // when v is destructed, d's reference count goes to 0
        // so d is also destroyed

The tvector constructor and destructor might be implemented as follows:

    class tvector {
        float* FloatData; // cache a raw pointer to vector
        tdata* Data;      // tdata that manages the vector's data
        int Length;       // length of the vector
    public:
        tvector( tdata* d ) {
            Data = d;
            Data->reference();
            Length = d->bytes() / sizeof(float);
            FloatData = (float*)(d->data());
        }
        ~tvector() { Data->unreference(); }
    };

IV. Future Work

The tdata class hierarchy works well for blocks of data in virtual memory; but if the data is in memory on a remote machine, the present tdata cannot help. However, if virtual operator[] and operator= were implemented, then a class derived from tdata should be able to get control when the data is accessed and handle the network access transparently (except for performance!).

V. Conclusion

To be useful to many scientific visualization application programmers, C++ class libraries must access data allocated outside the library without making copies. This can be accomplished by a class hierarchy where the derived classes handle application specific allocation and deallocation. We recommend that numerical class library developers provide a way for their classes to operate on memory allocated by application programmers, perhaps using a mechanism similar to the one presented here.

VI. Acknowledgments

D. Asimov, C. Levit, E. Raible, J. Krystynak, K. McCabe, and B. Klein reviewed this paper and made many excellent comments and corrections. I would like to thank the Applied Research Branch of the NAS Division of NASA Ames Research Center for the superb working environment I have become addicted to. This work is supported through NASA contract NAS2-12961.

VII. References

[Atwood92] C. A. Atwood and W. R. Van Dalsem, "Flowfield Simulation about the SOFIA Airborne Observatory," 30th AIAA Aerospace Sciences Meeting and Exhibit, January 1992, Reno, Nevada, AIAA 92-0656.

[Bancroft90] G. Bancroft, F. Merritt, T. Plessel, P. Kelaita, R. McCabe, and A. Globus, "FAST: A Multi-Processing Environment for Visualization of CFD," Proceedings Visualization '90, IEEE Computer Society, San Francisco, 1990.

[Dyad92] Manuals and personal communication, Dyad Software.

[Globus92] A. Globus, "A Software Model for Visualization of Time Dependent 3-D Computational Fluid Dynamics Results," NASA Ames Research Center, NAS Systems Division, Applied Research Branch technical report RNR-92-031, November 1992.

[Globus93] A. Globus, "Optimizing Time Dependent Particle Tracing," NASA Ames Research Center, NAS Systems Division, Applied Research Branch technical report RNR-93-0??, 1993. In progress, title subject to change.

[Levit92] C. Levit, personal communication.

[RW92] Manuals and personal communication, Rogue Wave.

[SGI92a] IRIS Explorer User's Guide, Silicon Graphics Computer Systems, Document 007-1371-010, 1992.

[SGI92b] IRIS Explorer Module Writer's Guide, Silicon Graphics Computer Systems, Document 007-1369-010, 1992.

[UNIX] UNIX mmap Man Page.

[Upson89] C. Upson, et al., "The Application Visualization System: A Computational Environment for Scientific Visualization," IEEE Computer Graphics and Applications, July 1989, Vol. 9, No. 4, pp. 30-42.