A Grid Service Module for Natural Resource Managers

Dali Wang (1), Eric Carr (1), Mark Palmer (1), Michael W. Berry (2) and Louis J. Gross (1)

(1) The Institute for Environmental Modeling, 569 Dabney Hall, University of Tennessee, Knoxville, TN 37996, [wang, carr, palmer, gross]@tiem.utk.edu
(2) Department of Computer Science, 203 Claxton Complex, University of Tennessee, Knoxville, TN 37996, berry@cs.utk.edu

Abstract: This article presents the motivation for developing a grid service module for natural resource managers. Built on grid middleware, the grid service module allows natural resource managers to use a high performance ecosystem modeling package (Across Trophic Level System Simulation) transparently, without requiring knowledge of the underlying computational issues. From a software development perspective, this novel grid service module can be deployed by researchers across multiple disciplines to allow decision makers or the public to exploit fully functional scientific computation on grids.

Key words and phrases: grid service module, grid computing, distributed computation, software architecture, ecosystem modeling.

1. Introduction

During the past two decades, a variety of ecological models have been developed as useful tools for natural resource management. Ecological models can summarize information on a natural resource, determine where gaps exist, extrapolate across those gaps, and simulate various scenarios to evaluate the outcomes of natural resource management decisions [1]. However, ecological models are not used effectively in natural resource management, owing to a lack of training and education, of integration of existing models, and of development of new models [2]. We argue that for new applications of modeling to be effective, new computational methodologies must be developed so that users can readily carry out complex simulations without extensive additional training. In practice, natural resource managers typically have very little experience with high performance computing. Therefore, as ecological models become more complex, their computational demands make it prudent to provide tools that carry out the simulations and associated visualization without requiring managers to have an extensive background in computational science. Fortunately, developments in high-performance networks, computers, and information services make it feasible to incorporate remote computing and information resources into local computational environments. Grid computing [3] has recently emerged as one of the most important new developments in building the infrastructure for computational science. In this article, we describe a grid service module developed to deliver high performance ecosystem modeling capabilities (the Across Trophic Level System Simulation, ATLSS [4]) to natural resource managers with only limited knowledge of computational science.

The Across Trophic Level System Simulation (ATLSS) is an ecosystem modeling package designed to assess the effects on key biota of alternative water management plans for the regulation of water flow across the Everglades landscape. The immediate objective of ATLSS is to provide quantitative, predictive modeling software for guiding the South and Central Florida restoration effort. The long-term goals are to aid in understanding how the biotic communities of South Florida are linked to the hydrologic regime and other abiotic factors, and to provide a predictive tool for both scientific research and ecosystem management.

2. Computational Platform and Grid Middleware

The computational platform used is the Scalable Intracampus Research Grid (SInRG) [5], supported by the National Science Foundation. The SInRG project deploys a research infrastructure on the University of Tennessee, Knoxville campus that mirrors the underlying technologies and interdisciplinary research collaborations characteristic of the emerging national technology grid. SInRG's primary purpose is to provide a technological and organizational microcosm in which key research challenges underlying grid-based computing can be addressed with better communication and control than wide-area environments usually permit. A variety of grid middleware has been installed on SInRG; of these, NetSolve [6] and the Internet Backplane Protocol (IBP) [7] were used to create a grid service module that delivers high performance ecosystem modeling capability to natural resource managers at several federal and state agencies in Florida.

NetSolve is a remote procedure call (RPC) based middleware system that allows users to access additional hardware and/or software resources remotely. A NetSolve system has three main components: the agent, the servers, and the remote users. NetSolve tracks which machines have computational servers running and which computational services (software) they provide. It also tracks the workload of each NetSolve server so that the best server can be chosen for a given job request; in other words, NetSolve takes care of the details of finding a machine on which to execute the computational task. NetSolve also provides extensible service creation via Problem Definition Files (PDFs), which generate wrappers for the user's code. After compilation, the codes defined by PDFs become NetSolve services that can be enabled by server instances.

IBP is middleware for managing and using remote storage. It uses small ASCII-based files (called exnodes) to support global scheduling and optimization of data movement and storage in distributed systems and applications. In an IBP system, a large data file can be split into multiple parts and stored on different IBP servers, with an exnode created for each data storage location. In addition, multiple copies of the data can be stored in the IBP system, so the integrity of the data and the reliability of data transmissions can be enhanced through a multi-threaded submission and retrieval process.
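To make the brokering role of the NetSolve agent concrete, the following sketch (in Python, with entirely hypothetical class, service, and host names; the real NetSolve components are implemented in C) shows how an agent-style broker might match a job request to the least-loaded registered server offering the requested service.

    # Toy illustration of NetSolve-style brokering; all names are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class Server:
        host: str
        services: set      # services (e.g., compiled ATLSS models) offered by this server
        load: float        # workload reported to the agent (lower is better)

    @dataclass
    class Agent:
        servers: list = field(default_factory=list)

        def register(self, server):
            # Servers register themselves and their services with the agent.
            self.servers.append(server)

        def choose_server(self, service):
            # Pick the least-loaded registered server that offers the service.
            candidates = [s for s in self.servers if service in s.services]
            if not candidates:
                raise LookupError("no server offers service '%s'" % service)
            return min(candidates, key=lambda s: s.load)

    agent = Agent()
    agent.register(Server("grid-node-1", {"atlss_sesi", "atlss_alfish"}, load=0.7))
    agent.register(Server("grid-node-2", {"atlss_sesi"}, load=0.2))
    print(agent.choose_server("atlss_sesi").host)   # -> grid-node-2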

3. Design and Implementation

Figure 1 presents a simplified view of the grid service module for natural resource managers based on NetSolve and IBP. It contains four major components: a dedicated web interface, a job scheduler, a simulation moderator, and a result repository.

Figure 1: Architecture of the Grid Service Module

The web interface provides a common gateway for users to specify simulation inputs and launch simulations; multiple tasks, potentially from different users, can be run at the same time. It also checks user identification and performs process authorization. The job scheduler, containing an IBP client and a NetSolve intelligent agent, accepts computation requests from users via the web interface and allocates appropriate ATLSS models and the necessary data storage for the (remote) job executions. The simulation moderator is built on IBP and NetSolve servers: the ATLSS models are configured as NetSolve services, and model input/output is managed by IBP servers. The result repository is a database that stores simulation results, which can be reviewed and reused by authorized users in future analyses.

3.1 Web Interface

Various stakeholders and agencies have expressed strong interest in a single web interface for accessing, running, and retrieving data from a variety of ecological models. The only requirements for natural resource managers to use the grid service module are Internet access and a web browser. In addition, the detailed information and model parameterizations associated with the complex ecological models make it necessary to expose only limited functionality through the web interface. Users are given, within limits, the choice of particular models and can parameterize them as appropriate based upon their experience and the requirements of their agency. This gives different users an intuitive process for applying ecological models to particular species, conditions, or spatial domains. Figure 2 illustrates the password-protected web interface used to access different ATLSS models. Each model allows users to vary simple simulation control parameters, such as simulation time, and input conditions, such as hydrological scenarios. Through this web interface, users can launch parallel jobs on SInRG resources. After a simulation (which may take hours of CPU time on a high-performance computer [8,9]) is complete, the web interface issues an email notification to the user. In addition, the web interface acts as the gateway for users to access a database (the result repository) in which results from previous simulations have been stored. Users also have access to a separate visualization and analysis tool [10] built within a geographic information system framework (ESRI ArcView).
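A minimal sketch of the gatekeeping logic described above is given below (Python; the account list, model identifiers, and function names are assumptions introduced only for illustration, since the actual interface is a password-protected web application sitting in front of the job scheduler).

    # Minimal, hypothetical sketch of the web interface's gatekeeping role.
    AUTHORIZED_USERS = {"manager1", "manager2"}        # assumed account list
    AVAILABLE_MODELS = {"atlss_sesi", "atlss_alfish"}  # assumed model identifiers

    job_queue = []   # stands in for the hand-off to the job scheduler

    def submit_job(user, model, params):
        # Check the user and the request, then pass the job on to the scheduler.
        if user not in AUTHORIZED_USERS:
            raise PermissionError("user '%s' is not authorized" % user)
        if model not in AVAILABLE_MODELS:
            raise ValueError("unknown model '%s'" % model)
        job_queue.append({"user": user, "model": model, "params": params})
        return "job accepted: %s for %s" % (model, user)

    print(submit_job("manager1", "atlss_sesi",
                     {"scenario": "hydro_plan_A", "years": 35}))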

Figure 2: Screenshot of the ATLSS web interface

3.2 Computational Resource Allocation and Remote Execution

Two main functionalities are implemented in the grid service module: networked computational resource allocation with remote execution, and network storage. This section focuses on computational resource allocation and remote execution based on NetSolve; networked storage is addressed in the following section.

In the grid service module, the job scheduler contains a NetSolve agent, and the simulation moderator incorporates the functions of NetSolve servers. The ecological models in ATLSS are preconfigured as NetSolve services at compile time (through Problem Definition Files), as shown in Figure 3. When NetSolve servers register themselves and their services with the NetSolve agent, the job scheduler obtains all necessary information on the available ATLSS simulation capability. An ecological model can be installed on multiple NetSolve servers to take account of different architectural features. Also, depending on the code, an ecological model can be compiled on Windows, Mac, and Unix based systems, all of which report to the NetSolve agent within the job scheduler. From this standpoint, the grid service module provides the ability to harness diverse machine architectures to work together on a single computational task (heterogeneous computing). Once the job scheduler accepts a job request from a user through the web interface, it uses its NetSolve agent to find the best NetSolve server within the simulation moderator based on service availability and machine workload. The job request is then shipped to that NetSolve server for processing. The simulation moderator contains a set of problem definition scripts that initialize the computational environment, prepare the model input, and launch the model on the computer where the NetSolve service resides. For example, if a NetSolve service must handle a model based on the message-passing interface (MPI) [11], a problem definition script is used to initialize the MPI environment and determine the number of processes for parallel execution.
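As an illustration of what such a wrapper does, the sketch below stages input, sizes the run, and launches an MPI executable. The paths, the executable name, its command-line flags, and the process-count heuristic are all assumptions; the real wrappers are NetSolve Problem Definition Files compiled into the server.

    # Hypothetical stand-in for a problem definition script driving an MPI-based model.
    import os
    import subprocess

    def run_mpi_model(executable, input_dir, output_dir, n_procs=0):
        # Prepare the output area, choose a process count, and launch via mpirun.
        os.makedirs(output_dir, exist_ok=True)
        if n_procs <= 0:
            n_procs = os.cpu_count() or 1      # naive sizing heuristic
        cmd = ["mpirun", "-np", str(n_procs), executable,
               "--input", input_dir, "--output", output_dir]
        return subprocess.run(cmd, check=True).returncode

    # Hypothetical usage on a compute node that hosts the model binary:
    # run_mpi_model("/opt/atlss/alfish", "/scratch/job42/in", "/scratch/job42/out")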

There are several advantages to this framework: 1) computational workload balancing is achieved across the entire NetSolve organization, since all jobs are scheduled through the centralized, intelligent job scheduler; 2) security is enhanced, since users are insulated from the actual software and data and are not given direct access to the high performance computing facilities; and 3) users can take advantage of high performance computing facilities without extensive knowledge of computational science.

Figure 3: ATLSS Installation in the NetSolve Organization

3.3 Network-Based Storage and Data Transmission

A difficulty in using NetSolve for ecological modeling is its limited capability for transporting large amounts of data over the network. For this reason, IBP was adopted to provide efficient data transport. In the grid service module, IBP exnodes are used as transfer keys passed through the NetSolve system, establishing a novel way to send and receive large files to and from remote computational facilities without requiring direct user access to those machines. IBP thus allows the simulation moderator to allocate and schedule storage resources as part of its resource brokering, which in turn improves performance and enables fault tolerance when resources fail or are revoked.
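The sketch below illustrates only the transfer-key idea; it is not the IBP client API, and the depot address and helper names are invented for the example. A small descriptor (standing in for an exnode) is returned by the upload and is the only thing that needs to travel through NetSolve, letting the remote service fetch the actual data itself.

    # Toy illustration of "exnode as transfer key"; not the real IBP interface.
    import json

    DEPOT = {}   # stands in for an IBP storage depot

    def ibp_style_upload(name, data):
        # Store the data and return a small ASCII descriptor (exnode-like).
        DEPOT[name] = data
        return json.dumps({"depot": "ibp://depot.example.org", "key": name,
                           "size": len(data)})

    def ibp_style_download(exnode_text):
        # Resolve the descriptor back to the stored data.
        return DEPOT[json.loads(exnode_text)["key"]]

    # The scheduler uploads model input, then ships only the descriptor onward;
    # the simulation moderator uses it to retrieve the input on the compute side.
    exnode = ibp_style_upload("job42_input", "landscape map + hydrology scenarios")
    print(ibp_style_download(exnode))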

As an example, we use a spatially explicit species index (SESI) model [12] to show the typical data flow in the grid service module (see Figure 4). The SESI model input includes a landscape map of South Florida at 500-m resolution, two hydrological scenarios spanning several decades, and a set of control parameters that specify model assumptions regarding the spatial pattern of wading bird foraging over the South Florida landscape.

Figure 4: Data Flow of an Ecological Model in the Grid Service Module (solid black arrows represent model data flow; dashed gray arrows represent the flow of exnodes)

Figure 5: Sample Model Output of the ATLSS SESI Model for Long-legged Wading Birds

Figure 4 shows the typical data flow in the grid service module. Once a user (a natural resource manager) enters control parameters (such as the scenario name and simulation time) and submits a job request, these parameters are sent to the job scheduler. The job scheduler then executes four tasks (the complete sequence is sketched in code below):

1. assembles all model input data (including the landscape map and 35 years of water depth distributions);
2. determines the locations of data storage (represented by two exnodes, one for model input and one for model output) and of the computational facilities (represented by a NetSolve server);
3. launches an IBP_upload operation to move the model input onto an IBP file server monitored by the simulation moderator; and
4. passes the exnode information to the simulation moderator via the connection between its NetSolve agent and the remote NetSolve server.

After receiving the job request from the job scheduler, the simulation moderator uses a problem definition script to download the model input from the IBP file server, initialize the computational environment, launch the computation, and upload the model result back to the IBP file server. Eventually, the job scheduler is notified of job completion, issues an IBP_download operation to deliver the model output to the result repository, and sends a notification to the user via email. The total size of the input files in this case is around 0.5 GB.
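The sketch below ties the four scheduler tasks and the moderator's processing into one sequence. Every grid operation is stubbed out, and all function, host, and scenario names are assumptions; the real module performs these steps with IBP_upload/IBP_download operations and a NetSolve agent/server pair.

    # Stubbed, hypothetical walk-through of the job life cycle described above.
    def assemble_input(params):
        # Task 1: gather the landscape map, water depth data, and control parameters.
        return {"params": params, "data": "landscape map + 35-year water depths"}

    def allocate_storage_and_server(model):
        # Task 2: pick exnodes for input/output and a NetSolve server for the run.
        return {"in_exnode": "exnode-in", "out_exnode": "exnode-out",
                "server": "grid-node-2"}

    def upload_input(exnode, payload):
        # Task 3: move the model input to an IBP file server (stub).
        print("uploaded input under %s" % exnode)

    def dispatch(server, model, in_exnode, out_exnode):
        # Task 4: pass the exnodes to the simulation moderator via NetSolve (stub).
        print("%s runs %s; reads %s, writes %s" % (server, model, in_exnode, out_exnode))

    def run_job(user, model, params):
        job_input = assemble_input(params)                       # task 1
        alloc = allocate_storage_and_server(model)               # task 2
        upload_input(alloc["in_exnode"], job_input)              # task 3
        dispatch(server=alloc["server"], model=model,            # task 4
                 in_exnode=alloc["in_exnode"], out_exnode=alloc["out_exnode"])
        # On completion: download results to the repository and notify the user (stub).
        print("results stored for %s; notification e-mail sent" % user)

    run_job("manager1", "atlss_sesi", {"scenario": "hydro_plan_A", "years": 35})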

An example output of the SESI model is shown in Figure 5, including a visual representation of the landscape with color-coded values assigned to each cell and a time series of the mean overall index attained each year under each hydrologic scenario. The index value (ranging from 0 to 1) reflects the relative potential for appropriate foraging conditions.

4. Summary

The grid service module presented in this article, built on the grid middleware NetSolve and IBP, is, to the best of our knowledge, the first application of a computational grid to a natural resource management problem. Natural resource agencies typically have very limited access to the computational facilities and associated expertise necessary for high performance computing. Projects such as the one described here offer resource managers the ability to apply the most scientifically defensible models, even when these involve intensive computation. Spatially explicit and temporally varying models, though realistic in that they account for what we know of environmental variation and its impacts on natural systems, present numerous computational challenges. We expect that grid service modules will provide feasible methods to address these problems as well as provide input to the decision support tools needed in natural resource management. From a broader perspective, we argue that grid service modules can have a positive impact on applied and scientific computation for two different audiences: i) for model developers, they provide a practical, explicit approach for utilizing remote high performance infrastructure and existing simulation packages (without code modification) to explore new frontiers in science and engineering; and ii) for decision makers and stakeholders, they create an intuitive way to launch models and analyze model results, using highly integrated simulation and modeling approaches, without concern for the underlying implementation.

Acknowledgements

This research has been supported by the National Science Foundation under grant No. DEB-0219269. This research used the resources of the Scalable Intracampus Research Grid (SInRG) Project at the University of Tennessee, supported by the National Science Foundation CISE Research Infrastructure Award EIA-9972889.

References

1. Dale, V.H., Opportunities for Using Ecological Models for Resource Management, in Dale, V.H. (ed.), Ecological Modeling for Resource Management, Springer-Verlag, New York, 2003.
2. Ginzburg, L., Akcakaya, H.R., Science and Management Investments Needed to Enhance the Use of Ecological Modeling in Decision Making, in Dale, V.H. (ed.), Ecological Modeling for Resource Management, Springer-Verlag, New York, 2003.
3. Grid Computing Info Centre (GRID Infoware), http://www.gridcomputing.com/.
4. ATLSS: Across Trophic Level System Simulation, http://www.atlss.org/.
5. SInRG: Scalable Intracampus Research Grid, Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, 2002, http://icl.cs.utk.edu/sinrg/.
6. NetSolve, Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, 2002, http://icl.cs.utk.edu/netsolve/.
7. IBP: Internet Backplane Protocol, Logistical Computing and Internetworking Laboratory, University of Tennessee, Knoxville, TN, 2002, http://loci.cs.utk.edu.
8. Wang, D., Berry, M.W., Carr, E., Gross, L.J., Parallel Landscape Fish Model for South Florida Ecosystem Simulation, Proceedings of Supercomputing 2003, Phoenix, AZ, 2003.
9. Wang, D., Carr, E.A., Berry, M.W., Gross, L.J., Parallel Fish Landscape Model for Ecosystem Modeling on a Computing Grid, Parallel and Distributed Computing Practices (in review).
10. ATLSS Data Viewer System, National Wetlands Research Center, USGS, 2002, http://sflwww.er.usgs.gov/projects/workplan03/atlss.html#task3.
11. MPI: Message Passing Interface Standard, http://www-unix.mcs.anl.gov/mpi/.
12. Curnutt, J.L., Comiskey, E.J., Nott, M.P., Gross, L.J., Landscape-based spatially explicit species index models for Everglades restoration, Ecological Applications 10:1849-1860, 2000.