A Grid-enabled Science Portal for Collaborative Coastal Modeling




A Grid-enabled Science Portal for Collaborative Coastal Modeling Master of Science, Systems Science Project Report submitted to Department of Computer Science, Louisiana State University Chongjie Zhang March 28, 2006 Department of Computer Science, Louisiana State University, Baton Rouge, LA 70803.

Abstract

The Southeastern United States is regularly impacted by severe ocean-driven events such as hurricanes that affect the lives of hundreds of thousands of citizens. It is urgent to improve our ability to predict critical coastal phenomena within the region. The SCOOP program is intended to create an open-access, distributed laboratory for scientific research and coastal operations by integrating the coastal data, computing resources, and research efforts of its partners. A Grid computing infrastructure is deployed to support researchers' work by allowing the sharing of expertise, software, and data across multiple institutions. This report presents the SCOOP portal, a Grid-enabled science portal developed for the coastal research community to reduce the complexity of high performance computing for end users and enable advanced application scenarios. Since Grid portal toolkits are increasingly being adopted as a means to speed up Grid portal development, we extensively investigate and compare two major Grid portal toolkits, OGCE and GridPortlets. Details about the SCOOP portal development are then described, including use cases, design, and implementation. The SCOOP portal, built with the GridSphere framework, currently integrates customized Grid portlet components for data access, job submission, resource management and notification.

Acknowledgements

I express my sincere gratitude to everyone in any way connected to the work presented in this report. First and foremost I would like to thank Gabrielle Allen for her constant support, guidance, encouragement and the invaluable time that she put in to help me make progress towards completion of this project. I also would like to thank the rest of my program committee, Jianhua Chen and Bert R. Boyce, for their helpful comments and guidance. My work benefited greatly from a friendly and productive cooperation with other members of the SCOOP team and the Grid research group at the Center for Computation and Technology. My report makes use of two papers for which I am the primary author, but which contain constructive comments and suggestions from other authors. I am especially grateful to Jon MacLaren for his effective management of the SCOOP project and his work on archive services; to Ian Kelley for his comments and suggestions on Grid portal development; and to Chirag Dekate for his ideas and work on coastal modeling scenarios. I would like to sincerely thank Michael Russell, Jason Novotny, and Oliver Wehrens for their support on GridSphere and GridPortlets during the project. Finally, I would like to thank Xiaoxi Xu for keeping me happy and sane throughout the process, my father and my mother for teaching me early on to love learning, and my brother, whose enthusiasm and support helped me to reach this point.

This project was carried out as a component of the SURA Coastal Ocean Observing and Prediction (SCOOP) Program, an initiative of the Southeastern Universities Research Association (SURA). Funding support for SCOOP has been provided by the Office of Naval Research, Award #N00014-04-1-0721, and by the National Oceanic and Atmospheric Administration's NOAA Ocean Service, Award #NA04NOS4730254. Additional support was provided by the Center for Computation & Technology at Louisiana State University.

Table of Contents

1 Introduction
  1.1 Major Contributions
  1.2 Outline
2 Grid Portal Development
  2.1 Grid Computing
  2.2 Grid Portal Toolkits
    2.2.1 GridPortlets
    2.2.2 OGCE
    2.2.3 Comparison between GridPortlets and OGCE
3 Collaborative Coastal Modeling
  3.1 Goals
  3.2 Science Scenarios
  3.3 CCT Role
  3.4 Infrastructure
4 SCOOP Portal
  4.1 Requirements
  4.2 Choice of Grid Portal Toolkits
  4.3 Design
  4.4 Implementation
    4.4.1 Archive
    4.4.2 SCOOP Job Management
5 Conclusion

List of Figures

2.1 GridPortlet Architecture
2.2 OGCE 2 Architecture
3.1 SCOOP Grid Infrastructure
4.1 SCOOP Portal architecture based on GridSphere
4.2 Archive portlet using metadata for querying and retrieving SCOOP data files from storage
4.3 Copy retrieved data files to a remote machine
4.4 Manage files through physical file management portlet
4.5 Job submission portlet for running ensembles of simulation models on different types of data
4.6 Job submission portlet for running ensembles of simulation models on different types of data
4.7 Show output of each sub-job run

List of Tables

2.1 Comparison between GridPortlets and OGCE 2

Chapter 1 Introduction

The Southeastern United States hosts roughly 100 million citizens and supports five naval bases, over a dozen major ports, essential commercial shipping and fishing enterprises, major oil and natural gas reserves, and thriving tourist enterprises. It is also a region that frequently suffers from severe ocean-driven events such as hurricanes that affect the lives of hundreds of thousands of citizens yearly. The recent devastation to coastal Louisiana by Hurricanes Katrina and Rita cost the lives of over 1000 people and severely damaged the economy and the environment. Hence there is an urgent need for accurate models of hurricanes and other severe weather events within the region. Such models are needed to predict the path and effect of impending hurricanes for evacuation and preparation, to design better coastal defense systems, and to understand the physics and trends of hurricanes.

In an effort to improve model fidelity, the Southeastern Universities Research Association, or SURA, has initiated and funded the SURA Coastal Ocean Observing Program (SCOOP) (1), partnering with ten institutions near the Gulf and Atlantic coasts, including Louisiana State University (LSU). SCOOP is representative of a growing class of geographically distributed collaborators who have realized the need for new infrastructures, such as Grid computing (2), to support their research work in today's world of complex applications which require sharing expertise, software, and data across multiple institutions. Building the necessary infrastructures for collaborative projects such as SCOOP involves integrating multiple Grid middleware packages to provide a holistic approach to collaborative problem solving.

Portals have become a popular way to integrate applications and content, providing groups of users (or virtual organizations) with a single entry point to interact with their applications, data, colleagues and services, all the while maintaining a consistent and uniform interface. As new applications and technologies, such as Grid computing, become increasingly complex and difficult to configure and use, Grid portals have come to be recognized as a useful tool to enable the work of scientists and engineers without burdening them with the low-level details of underlying technologies.

1.1 Major Contributions

The Coastal Studies Institute and the Center for Computation & Technology (CCT) represent LSU in the SCOOP program. As a research assistant at CCT, I worked closely with other members of the SCOOP team at LSU. I designed and built a Grid-enabled science portal for the coastal research community to reduce the complexity of Grid computing for end users and enable advanced application scenarios. The resulting SCOOP portal uses new collaborative tools to better access ocean data and computational resources. The development and deployment of the SCOOP portal also illustrate how portal technologies can complement Grid middleware to provide a community with an easy-to-use collaborative infrastructure that is tailored to their particular needs and has the ability to incrementally introduce and test new capabilities and services. My paper (3) based on this work won the Best Paper award at GCE05: Workshop on Grid Computing Portals, held at the Supercomputing 2005 conference.

Before designing the SCOOP portal, I performed an extensive investigation of the current state of integration of Grid technologies with portal frameworks, culminating in a review paper (4) which discusses two of the major Grid portal solutions, the Open Grid Computing Environments Collaboratory (OGCE) (5) and GridPortlets (6). That paper investigates and compares what each of these packages provides, discusses their advantages and disadvantages, and identifies missing features vital for Grid portal development. Our main purpose is to identify what current toolkits provide, reveal some of their limitations, and provide motivation for the evolution of Grid portal solutions. The paper can also serve as a reference for application groups choosing a Grid portal toolkit that fulfills the needs of their application.

1.2 Outline

The rest of this report is organized as follows. Chapter 2 introduces Grid computing technologies and Grid portal toolkits. Chapter 3 lists several requirements from coastal modeling researchers and describes the Grid infrastructure currently deployed to meet these needs. Chapter 4 elaborates use cases from the coastal community, discusses the choice of Grid portal toolkits and the design and architecture of the SCOOP portal, provides implementation details and information about the different services the portal provides, and also looks to future development. Finally, Chapter 5 presents the conclusions of this work.

Chapter 2 Grid Portal Development

The term "the Grid" was coined in the mid-1990s to denote a proposed distributed computing infrastructure to enable resource sharing within scientific collaborations. Much progress has since been made on the construction of such an infrastructure and its application to both scientific and industrial computing problems. However, current Grids are generally difficult to use, so the development and deployment of Grid portals has become a popular way to simplify the usage of Grid services. This chapter introduces basic concepts and technologies of Grid computing and the toolkits used to develop Grid portals.

2.1 Grid Computing

Grid computing is a form of distributed computing that addresses a need for coordinated resource sharing and problem solving across multiple dynamic and geographically dispersed organizations. Resources include not only data, computer cycles, applications, and networks, but also specialized scientific instruments, such as telescopes, ocean sensors, and earthquake shake tables. The sharing is highly controlled, with resource providers defining clearly and carefully what resources are shared, who can access those resources, and how those resources are used. Generally, Grids contain heterogeneous administrative domains with different operating systems and hardware architectures. One of the points distinguishing Grids from traditional distributed systems is that Grids aim to use standard, open, general-purpose protocols and interfaces.

The technologies used to construct Grids have evolved over time. The Globus Alliance (7) released the Globus Toolkit 2 (GT2) in 1998. The Globus Toolkit thereafter became the de facto standard for Grid computing. With the rapidly increasing uptake of Grid technologies, the first Global Grid Forum (GGF) (8) was held as a formal standards body in March 2001. Since then, GGF has produced numerous standards and specifications documents, including the Open Grid Services Architecture (OGSA) (9). OGSA is based on web services and provides a well-defined suite of standard interfaces and behaviours that serve as a common framework for all Grid-enabled systems and applications. The Globus Toolkit 4 (GT4) (10) has implemented OGSA and other GGF-defined protocols to provide functionality for resource management, information services, security services, and data management. In addition, a number of tools function along with the Globus Toolkit to make Grids a more robust platform, such as Gridbus (11), Grid portal toolkits, Condor (12; 13), MPICH-G2 (14), etc.

Grid technologies have been applied to a range of problems from both academia and industry. These applications cover compute-intensive, data-intensive, and equipment-centered scenarios. Locally, we can see such applications in projects at the Center for Computation & Technology at Louisiana State University. The UCoMS (15) project is using Grid technologies to advance reservoir simulation and drilling analysis studies, coordinating the use of large-scale compute and data resources at LSU and the University of Louisiana at Lafayette. The GridChem (16) project is building a production system based on Grid infrastructures for chemists to launch and monitor computational chemistry calculations on CCG supercomputers from remote sites. To support Grid applications in astrophysics, coastal modeling, and petroleum engineering, the Enlightened (17) project focuses on developing dynamic, adaptive, coordinated and optimized use of networks connecting geographically distributed high-end computing resources and specific scientific instrumentation.

2.2 Grid Portal Toolkits

The Web has proven to be an effective way of integrating and delivering content and applications. Web portals, like Yahoo or MSN, offer users a single point of access to various information, services, and applications. The Grid, by contrast, is still a complex distributed system that has, at its roots, many differing software packages that form the underlying Grid middleware and infrastructure. Grid portals are gaining momentum in the scientific research community as a way to expose Grid services and functionality in a friendly, easy-to-use, and collaborative interface.

When designing a Grid portal for a particular application domain, developers often have the choice of developing a new solution from the ground up or leveraging an existing toolkit. Although building a solution from scratch may prove successful in the short term, as technologies evolve and the demands on portal applications become more intense, more sophisticated solutions are needed. Grid portal toolkits are increasingly being adopted as a means to speed application development, since they can provide much of the high-level functionality that is needed to manage the multi-institutional and multi-virtual-organization issues that arise when designing and maintaining a production portal.

A defining moment in the evolution of portal development toolkits occurred in October 2003 with the introduction of JSR-168 (18), an industry-led effort to specify how components within Java-based portals, called portlets, should interact with their hosting environment, or container. By clearly defining the methods a portlet must implement to be standards compliant, JSR-168 is the first step towards allowing true interoperability between portlets developed in different portlet containers. A true JSR-168-compliant portlet that does not have any container-related dependencies will be able to run, without any modification, in any number of portal servers including IBM WebSphere, BEA WebLogic, uPortal, and GridSphere (19). In such a case, the container becomes irrelevant in choosing a Grid portal toolkit, so long as it is portable.

There are a number of Grid portal toolkits, including GridPortlets, the GridPort Toolkit (20), and OGCE. Since the GridPort Toolkit, initially started at TACC, is now gradually being integrated into the OGCE, I will mainly introduce GridPortlets and OGCE in the following subsections.

2.2.1 GridPortlets

GridPortlets is an open-source toolkit that was developed at the Albert Einstein Institute as part of the European GridLab (21) project's GridSphere work package. GridPortlets provides a high-level interface for portal developers to access a range of Grid services including resource information, credential management, job submission, and file browsing. Figure 2.1 shows the general architecture of GridPortlets, where a common and clearly defined API abstracts developers from underlying services and middleware. For example, to launch an executable using GridPortlets, a simple execute task is constructed that does not require any details about the underlying implementation, which may be the Globus Resource Allocation Manager (GRAM) (22) via the Java Commodity Grid Toolkit (Java CoG) (23) or an advanced resource brokering system (24) such as the GridLab Resource Management System (GRMS) (25).

Figure 2.1: GridPortlet Architecture

In addition to providing a high-level API for Grid operations, GridPortlets contains many reusable User Interface (UI) components that can easily be exploited to develop other portlet-based applications. These UI components allow developers to customize the generic JSP pages used by GridPortlets and incorporate them into their applications. GridPortlets itself reuses many of these components for its various portlets, including the file browsing dialog and resource information interfaces. Portal services in GridPortlets are managed through a Service Registry that allows developers to plug new implementations into the existing framework. Both services and portlets in GridPortlets can leverage other services to build more sophisticated applications and share data. For example, the Job Submission Portlet accesses the MyProxy (26) service to determine if a user has a valid credential, and if not, refers the user to the MyProxy portlet to retrieve a valid proxy.
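To illustrate the design idea behind GridPortlets' uniform service API and Service Registry, the following self-contained Java sketch shows how a single submission interface can hide middleware-specific backends. All class and method names here (JobSubmissionService, GramJobService, GrmsJobService, ServiceRegistryDemo) are hypothetical illustrations, not the actual GridPortlets API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the pattern described above: portlets code against one
// high-level interface, while middleware-specific implementations are plugged
// into a registry, so swapping GRAM for GRMS is a configuration change.
interface JobSubmissionService {
    String submit(String executable, String[] args);
}

// Toy stand-in for a GRAM-backed implementation (a real one would call GRAM).
class GramJobService implements JobSubmissionService {
    public String submit(String executable, String[] args) {
        return "gram-job:" + executable;
    }
}

// Toy stand-in for a GRMS-backed implementation.
class GrmsJobService implements JobSubmissionService {
    public String submit(String executable, String[] args) {
        return "grms-job:" + executable;
    }
}

public class ServiceRegistryDemo {
    private static final Map<String, JobSubmissionService> registry = new HashMap<>();

    public static void register(String name, JobSubmissionService svc) {
        registry.put(name, svc);
    }

    // The caller's code is identical regardless of which middleware is registered.
    public static String runThrough(String name, String executable) {
        return registry.get(name).submit(executable, new String[0]);
    }

    public static void main(String[] args) {
        register("gram", new GramJobService());
        register("grms", new GrmsJobService());
        System.out.println(runThrough("gram", "/bin/simulate"));
        System.out.println(runThrough("grms", "/bin/simulate"));
    }
}
```

The point of the sketch is that the portlet layer depends only on the interface; the choice of implementation can be deferred to portal configuration, exactly the property the uniform API is meant to provide.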

GridPortlets is packaged with five generic JSR-168 compliant portlets that can be used without any additional development:

- Resource Registry Portlet: Administrators can manage the set of resources their Grid portal makes available to its users.
- Resource Browser Portlet: Users can browse the resources to which they have access, including the services, job queues, and accounts that are available on remote computers.
- Credential Retrieval Portlet: Users can retrieve credentials from a MyProxy server and gain single sign-on access to both the portal and Grid resources.
- Job Submission Portlet: Users can submit and manage jobs on their resources using Globus. Resource brokering systems are also supported through the same API, although currently only GRMS has been implemented.
- File Browser Portlet: Users can browse and manage physical files on Grid resources. Logical files registered with a logical file service can be accessed if a service is available.

2.2.2 OGCE

The OGCE project was established in Fall 2003 with funding from the National Science Foundation Middleware Initiative. OGCE is an open-source collaborative project that leverages Grid portal research and development from the University of Chicago, Argonne National Laboratory, Indiana University, the University of Michigan, the National Center for Supercomputing Applications, San Diego State University, and the Texas Advanced Computing Center.

2.2. Grid Portal Toolkits 10 Figure 2.2: OGCE 2 Architecture The basis of the OGCE architecture, as shown in Figure 2.2, is pluggable components in the form of services and portlets. OGCE uses the Java CoG as its main service API for accessing Grid resources. GridPort s Grid Portal Information Repository (GPIR) is used to retrieve local resource information, such as machine loads and queue status. OGCE comes packaged with services to support Grid operations such as workflow and queue management with the Open GCE Runtime Engine (OGRE) (27) and the Berkeley- Illinois-Maryland Association (BIMA) QueueManager. OGCE provides container-independent mechanisms that allow portlets to share data. For example, the MyProxy Manager allows other portlets accessing Grid resources to use credentials retrieved by the MyProxy portlet. The OGCE team delivered OGCE release 2 (OGCE 2), whose portlets are compliant to JSR-168 and can currently be deployed into either GridSphere or uportal. The following are the core Grid portlets in OGCE 2: Proxy Management Portlet - Enables users to retrieve credential proxies from a MyProxy server specified by the user and allows them to remove retrieved credential

2.2. Grid Portal Toolkits 11 proxies. Job Submission Portlet - Provides a simple user interface for submitting and tracking jobs on a Globus-based Grid. File Transfer Portlet - Uses GridFTP (28) for managing files among Grid machines, supporting both uploading and downloading files. GPIR Browser Portlet - Allows users to browse Grid and portal-related resource information that has been aggregated and cached by the GPIR web-service. Resource data is categorized into compute resources, storage resources, and visualization resources. OGRE Events Viewer Portlet - Allows users to monitor OGRE events on a specified server. Using OGRE enables users to write workflow scripts to execute a flow of controlled tasks. BIMA Queue Viewer Portlet - Allows users to monitor queue status and status of jobs in queues on a specified server using BIMA. The BIMA QueueManager is a Java application that supports the execution of multi-stage jobs, where individual jobs may be dispatched to a Globus-based Grid through the Java CoG. Viscosity Portlet - Allows users to access the central data repository developed by the Network for Earthquake Engineering Simulation (NEES) to store or retrieve files and associated metadata. 2.2.3 Comparison between GridPortlets and OGCE Grid-middleware and portal technologies are constantly evolving as new technologies are developed and more stakeholders become involved. It is only natural in this evolution that

2.2. Grid Portal Toolkits 12 Table 2.1: Comparison between GridPortlets and OGCE 2 Features GridPortlets OGCE 2 Service API a uniform and consistent highlevel portal service API a heterogeneous set of service APIs Grid Middleware GT2, GT3, GT4, and GridLab GT2, GT3, and GT4 middleware Persistent Using hibernate to persisting information Not known Layer about resources and jobs Presentation JSP and a UI component model JSP and Velocity Layer Core Grid Portlets Portability Credential Management, Resource Information Provider, Job Management (Support GRAM and GRMS), File Management Compliant to JSR 168, but only ready for GridSphere Credential Management, Resource Information Provider, Job Management (Support GRAM and Condor), File Management Compliant to JSR 168 and ready for GridSphere and uportal the packages supporting the Grid also will be rapidly changing. Therefore, I focuses on the functionality provided by GridPortlets v1.0 and OGCE 2 RC 4. This section gives a relatively brief comparison between them and further details can be referred to our review paper. Table 2.2.3 summarizes the comparison of main features of Grid portal toolkits. Both GridPortlets and OGCE provide information services, job and file management services, and authentication services for portal developers. While overlapping in the basic functionalities of Grid services, GridPortlets and OGCE differ greatly in the service APIs they provide for developers. One key feature of GridPortlets is that it defines a single and consistent high-level API between portlets and underlying Grid services. This uniform API abstracts developers from underlying Grid technologies and infrastructure while supporting multiple implementations. The API can be implemented with Grid programming

2.2. Grid Portal Toolkits 13 frameworks, such as Java CoG and Java Grid Application Toolkit (GAT) (29), or with Grid middleware toolkits, such as Globus Toolkit and GridLab middleware services. Developers choose the implementations of particular services during portal configuration or in some cases users are given the choice of which service to use. The current implementation of GridPortlets API support GT2, GT3, and GT4 through Java CoG 1.1, and the Grid middleware services developed in the GridLab project. It uses the GridSphere persistence layer, an object/relational API that is implemented with Hibernate, to store job and resource information. By contrast, OGCE provides a heterogeneous set of service APIs, aggregated mainly from other projects, such as the Java CoG and GridPort. For job submission and physical file management, Java CoG provides a uniform API that abstracts developers from underlying implementations, including Condor, SSH, and different versions of the Globus Toolkit. GridPortlets and OGCE both use JavaServer Pages (JSP) for generating web presentation. Additionally, GridPortlets UI component model provides a number of reusable JSP components, including file browsing and job submission components. These reusable UI components facilitate developers in building interactive, friendly web interfaces. Although OGCE does not supply reusable JSP UI components, it provides tools to support for Velocity-based portlets, which may help developers to port Jetspeed-based portlets into JSR-168 containers. GridPortlets and OGCE both offer Grid-related portlets that can be integrated into Grid portals without additional programming, including credential management, resource information browser, job management, and file management. Their functionalities are similar and main differences are on user interface and some minor features. 
Since the GridPortlets services are based on the service framework of GridSphere, they are not completely container-independent; as a result, portlets based on GridPortlets, even though technically compliant with JSR-168, cannot easily be deployed into portlet containers other than GridSphere. By contrast, OGCE provides container-independent services, and its portlets can be deployed in either GridSphere or uPortal. In addition, OGCE includes a suite of HttpUnit tests for its Grid-related portlets, which can be used to verify an OGCE installation in a portlet container.

Chapter 3

Collaborative Coastal Modeling

3.1 Goals

SCOOP is working towards an integrated and coordinated coastal ocean observation and prediction system, leveraging emerging regional efforts and cutting-edge information technologies. It aims to integrate distributed real-time ocean observing stations and regional coastal modeling entities, to run ensembles of numerical hydrodynamic models for the prediction, verification and visualization of critical storm surge and wave behaviour during severe storms and hurricanes. SCOOP is addressing several needs in reaching these goals, including:

Ubiquitous and easy access to data of all types, including sensor, satellite, model and visualization data.

Automated deployment of models across heterogeneous resources, including complex workflows and ensembles.

Creation of data standards and interoperability of model codes.

Capabilities for coupled and multi-scale models.

Operational procedures which can provide GIS visualization and notification to emergency management personnel.

Building an infrastructure to meet these needs and supply timely information about severe events requires attention to reliability, fault tolerance, and scheduling, as well as end-user presentation and interaction.

3.2 Science Scenarios

The current SCOOP membership is a combination of research institutions, university programs and national agencies, including Louisiana State University, University of Alabama at Huntsville, Texas A&M University, GoMOOS, University of Florida, University of North Carolina at Chapel Hill, the National Oceanic and Atmospheric Administration (NOAA), University of Maryland, University of Miami, and Virginia Institute of Marine Science. This broad collaboration engages researchers with diverse skill sets and varying degrees of technical expertise. One motivating collaborative scenario (3) for SCOOP is the following: An evolving hurricane in the tracking region begins the complex process of predicting and validating a hurricane path and its corresponding hydrodynamic impacts. Feeds from the National Hurricane Center trigger analytical models at the University of Florida which generate appropriate wind fields. Additionally, atmospheric models such as NoGAPS, COAMPS, MM5, and NCEP-NAM, among others, are pushed by diverse modeling entities in a non-deterministic manner. A brief description of these models can be found at the National Hurricane Center page (30). These winds are pushed into a SCOOP storage

archive using automated mechanisms as they become available. Researchers from across the nation, alerted to the impending event by notifications sent via email, SMS and IM, authenticate to the SCOOP portal from wherever they are located. From an Ensemble Model interface, they query for all possible atmospheric models available for the corresponding temporal specifications, in addition to the analytical winds. Based on the query results, a matrix of possible hydrodynamic models vs. the available atmospheric datasets is generated. For datasets from atmospheric models that have yet to arrive in the archive, watchdogs are deployed, tagged with the associated coupled ocean modeling workflow. The portal interface provides researchers with the ability to configure models as needed, and to prioritize the order in which they should be run. The matrix of models and currently available datasets is then used to automatically stage data and schedule the ensemble of hydrodynamic models on the SCOOP Grid, comprised of machines across the region. As results become available, notifications are dispatched to collaborators and output data are pushed back into the archive. GIS-driven visualization services (31) allow the modelers and end users to analyze results using interfaces that provide an overlay of results obtained from all the models. Interfaces pinpoint the location of available sensors, allowing modelers to compare the model ensemble with real-time data from sensor stations. Such correlated comparison of results, which allows for model data validation and verification, leads to improved forecasts of storm surge and inundation, displayed using context-relevant maps, e.g. street overlays or artificial levee diversion structures.
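The model-vs-dataset matrix in this scenario amounts to a cross product of hydrodynamic models and the atmospheric datasets currently available in the archive. The sketch below, in plain Java with invented names rather than SCOOP production code, shows the candidate-run enumeration; in the real system each pair either runs immediately or is handed to a watchdog until its input arrives.

```java
// Illustrative sketch (invented names): the ensemble is the cross product
// of hydrodynamic models and the wind datasets available in the archive.
import java.util.ArrayList;
import java.util.List;

public class EnsembleMatrix {
    public static List<String> build(String[] models, String[] windDatasets) {
        List<String> runs = new ArrayList<>();
        for (String model : models) {
            for (String wind : windDatasets) {
                // Each pair is one candidate simulation; in SCOOP, pairs
                // whose input has not yet arrived get a watchdog instead.
                runs.add(model + " <- " + wind);
            }
        }
        return runs;
    }
}
```

Two models and two wind datasets thus yield four candidate simulations, matching the matrix presented to the researcher.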

3.3 CCT Role

The SCOOP group at CCT is contributing to the deployment of a cyberinfrastructure for SCOOP. Our role includes:

1. Surveying the coastal research community to identify hardware and software requirements.

2. Providing a data archive (32) for storing observational data from satellite and buoy sources and results from coastal and atmospheric model simulations.

3. Designing a Grid testbed, the SCOOP Grid, for model deployment and data archiving.

4. Building a Grid-enabled science portal to provide the coastal research community with easy access to data, models, and resources in the SCOOP Grid.

5. Providing a command-line toolkit (33) for accessing the data archive.

6. Showing prototype examples of ensemble scenarios where multiple wind input data can be automatically located and fed into multiple wave or surge models, with the resulting data staged to the data archive.

The CCT SCOOP team includes Gabrielle Allen (Lead), Jon MacLaren (Manager and Data Archive), Andrei Hutanu (Visualization), Ian Kelley (GridSphere), Chirag Dekate (Models and SCOOP Grid), Chongjie Zhang (SCOOP Portal), Dayong Huang (Data clients), Zhou Lei (Grid Application Toolkit), Archit Kulshrestha (Condor), Sasanka Madiraju (SCOOP Grid) and Edward Jerome Tate (Visualization).

3.4 Infrastructure

Currently, coastal researchers typically access data from multiple sources (e.g. wind fields from NCEP or USGODAE, hurricane tracks from NHC, observation data from coastal observatories like WAVCIS or SEACOOS) using HTTP, FTP or, more recently, the LDM (34) protocols. Operational workflows are deployed using cron-type scripts, which are hard to adapt to address unreliable file arrival or fault tolerance. Usually the models involved in SCOOP (e.g. ADCIRC, WWIII, SWAN, WAM, CH3D, ELCIRC) are run only at local sites, and may require many different configuration and input files. The various institutions deploy their own web servers, delivering results at different times in varying data formats and data descriptions. Activities in SCOOP and other projects are addressing the complexity of dealing with different data sources and formats. A prime need is to develop data standards to facilitate sharing and collaboration. In lieu of a formal standard, SCOOP has developed a file-naming convention, used throughout the project, that encodes enough information to serve as primary metadata. LSU has established an advanced Grid-enabled data storage archive service, providing essential features such as ingestion and retrieval of data via multiple protocols (including GridFTP and HTTP), a logical file catalog, and general event-based notification. Compute resources for researchers are available in the form of the SCOOP Grid, which comprises resources distributed across multiple institutions (Louisiana State University, University of North Carolina, MCNC, University of Florida, Virginia Institute of Marine Science). Basic Grid middleware such as Globus Toolkit 3 (GT3) is deployed across the resources. Condor is deployed across the LSU-SCOOP Grid for prototyping scheduling and job management scenarios. Figure 3.1 shows the current Grid infrastructure we deployed for SCOOP. An ongoing goal is to be able to coordinate the deployment

and scheduling of operational and research simulations across the SCOOP Grid.

Figure 3.1: SCOOP Grid Infrastructure
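Because the file-naming convention mentioned above makes file names self-describing, primary metadata can be recovered by simple string parsing. The field layout below (model, wind source, region, date) is an invented stand-in for the actual SCOOP convention, used only to illustrate the idea.

```java
// Illustrative parser for a SCOOP-style self-describing filename. The real
// SCOOP convention differs in detail; the assumed layout here is
// MODEL_WINDSOURCE_REGION_DATE.ext, purely to show how a naming
// convention can stand in for primary metadata.
import java.util.HashMap;
import java.util.Map;

public class FileNameMetadata {
    // e.g. "ADCIRC_NAM_GulfOfMexico_20050829.nc" -> four metadata fields
    public static Map<String, String> parse(String fileName) {
        String base = fileName.replaceFirst("\\.[^.]+$", ""); // drop extension
        String[] parts = base.split("_");
        if (parts.length != 4) {
            throw new IllegalArgumentException("unexpected name: " + fileName);
        }
        Map<String, String> meta = new HashMap<>();
        meta.put("model", parts[0]);
        meta.put("windSource", parts[1]);
        meta.put("region", parts[2]);
        meta.put("date", parts[3]);
        return meta;
    }
}
```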

Chapter 4

SCOOP Portal

The SCOOP Portal provides the SCOOP user community with a centralized gateway for submitting and managing coastal model simulations and keeping track of a large number of data files. To better understand portal requirements from the community, we worked closely with coastal researchers and developed use-case scenarios which have driven the design of the portal.

4.1 Requirements

After consulting and discussing with the SCOOP coastal researchers and modelers, we identified the requirements for the SCOOP portal development. One notable requirement was that although the scientists wanted to restrict access to data to those in the collaboration (for one reason, to address potential problems with casual interpretation of severe storm data), no finer-grained access control was required. Additionally, all machines in the SCOOP Grid are resources shared among the scientists, simplifying authorization needs considerably. In implementing this first version of the SCOOP portal, we concentrated on

the following two user scenarios.

Archive Access

The SCOOP project has set up an archive service to store the source atmospheric data, the wave/surge data generated by model simulations, and other observed data that might be used to verify model accuracy. The SCOOP Portal is required to provide functionality to help modelers and researchers query and retrieve datasets. The steps for accessing data files are as follows:

1. A user selects a class of data and specifies corresponding metadata to query a metadata catalog service to discover datafiles of interest, e.g. output datafiles from ADCIRC model simulations performed at Louisiana State University for the Gulf of Mexico region during August 2005. A list of Logical File Names (LFNs) is returned from the query to the user.

2. The user selects one or more LFNs of interest, and the portal then contacts the archive's logical file service to return the physical file locations to the user.

3. The user can choose either to download the data file to the local machine or to perform a third-party transfer via a range of protocols.

Model Simulations

One of the scientific objectives of SCOOP is to run an ensemble of hydrodynamic models driven by input conditions from a range of different atmospheric models. The steps for running hydrodynamic model simulations are as follows:

1. A user is required to retrieve a proxy credential to authenticate to Grid resources.

2. The user specifies metadata describing atmospheric data and a hydrodynamic model. The SCOOP Portal contacts a metadata catalog service with the specified metadata and the archive's logical file service to get the atmospheric data files of interest. Based on the data files, the SCOOP Portal constructs a list of possible simulations, depending on the available input files in the archive. Each of these simulations is then submitted to a job scheduler.

3. The user can then track the progress of each simulation via the SCOOP Portal or use the portal's notification services, which include AIM and email.

4. Upon successful completion of each simulation, the results are pushed into the archive for dissemination and further processing.

4.2 Choice of Grid Portal Toolkits

The requirements from the SCOOP community are still evolving. The use of a mature portal framework and a well-designed Grid portal toolkit was necessary to be able to focus on the business logic of the SCOOP use-case scenarios and to allow for extensibility to future requirements. Based on our comparative analysis of GridPortlets and OGCE, we chose GridSphere and GridPortlets as our main toolkits to speed up the process of developing and deploying an application portal for SCOOP modelers and researchers. GridSphere is a free, open-source portal framework developed by the European GridLab project, which focused on developing Grid application tools and middleware. GridSphere provides a well-documented set of functionality, including portlet management, user management, layout management, and role-based access control. Its portlet-based architecture offers flexibility and extensibility for portal development and facilitates software component sharing and code reuse. GridSphere is compliant with the JSR-168 portlet specification, which allows portlets to be developed independently of a specific portal framework. GridSphere's portlet service model provides developers with a way to encapsulate reusable business logic into services that may be shared between many portlets. The advantages of using GridSphere come not only from its core functionality, but also from its accompanying Grid portal toolkit, GridPortlets. GridPortlets abstracts the details of the underlying Grid technologies and offers a consistent, uniform high-level service API, enabling developers to easily create custom Grid portal web applications. The GridPortlets services provide functionality for managing proxy credentials, resources, jobs, and remote files, and support persisting information about credentials, resources, and jobs submitted by users. The GridPortlets service API currently supports both GT2 and GT3. In addition, GridPortlets delivers five well-designed, easy-to-use portlets: resource registry, resource browser, credential management, job submission, and file management.

4.3 Design

The architecture of the SCOOP Portal is based on the GridSphere framework. Figure 4.1 shows the SCOOP Portal software components and their interactions and relationships. From the simplified diagram, it can be seen that SCOOP portlets use SCOOP services for application-specific functionality and business logic. The SCOOP Portal services themselves use or extend services built into the GridSphere framework, the GridPortlets package, as well as some third-party packages. For example, the SCOOP portal services mainly use the GridPortlets service API to interact with Grid resources, such as submitting jobs and moving remote files. Most portlets are independent of one another; however, they

can communicate with each other via the service layer. For example, the credential portlet calls the credential service to retrieve proxy credentials from a MyProxy server, and later a job submission portlet can use the retrieved credentials to authenticate with Grid resources. This portlet-based, service-oriented architecture greatly speeds up portal development and exhibits high extensibility.

Figure 4.1: SCOOP Portal architecture based on GridSphere.

4.4 Implementation

The SCOOP Portal is implemented with GridSphere version 2.0.3 and GridPortlets version 1.0, which together provide a coherent core set of relevant functionality. When evaluating the SCOOP community requirements, we found several common functionalities had already been implemented in other Grid portal projects. To avoid reinventing these, the fully

deployed SCOOP Portal contains not only portlets developed specifically for SCOOP, but also shared portlets from GridSphere, GridPortlets, and the GridLab Testbed (35). The following list describes the functionality of the portlets and services that are specific to the SCOOP project:

Archive enables users to retrieve SCOOP data files from archive storage, either by HTTP to the local machine or by GridFTP to a remote machine. The interface provides queries using metadata, and custom interfaces to specific data formats such as OPeNDAP.

SCOOP Job Submission provides custom interfaces and options for users to launch coastal models. The interface matches models to available data files in the archive.

Simulation Tracking and Notification allows users to track the progress of active simulations via the SCOOP Portal and to receive notification via email or AIM.

Request Tracking coordinates team work by allowing users to manage and track the status of tasks and defects in various SCOOP sub-projects.

Download Tracking tracks downloads of software tools distributed through the portal.

The following list describes the Grid-related functionality of the portlets and services that are deployed in the SCOOP Portal but were developed by other Grid projects. All portlets are from the GridPortlets distribution except the Grid Resource Status Monitoring portlet, which was developed by the GridLab project.

Credential Management enables users to retrieve, renew, and delete proxy credentials from a MyProxy server.

Resource Registry enables portal administrators to register or unregister Grid resources, such as computing resources or services.

Resource Browser enables users to view available Grid resources, including information about hardware configuration, services, job queues, and accounts on remote machines.

Grid Resource Status Monitoring enables users to view information about whether particular services and software components are installed and available on each machine, and the possible reasons why a service is not available.

Physical File Management enables users to browse and manage files on remote machines.

4.4.1 Archive

The current archive storage contains three classes of data files: source atmospheric data, simulated wave/surge data, and other observed data used to verify model accuracy. The three classes of data are associated with class-specific metadata attributes and query interfaces. Figure 4.2 shows the archive portlet for querying for simulated wave/surge data files. The archive portlet gathers metadata information from user requests and retrieves a list of logical file names with matching metadata from the archive portal service. Since SCOOP does not currently have an appropriate metadata service, the logical file name used in queries is generated from the metadata information contained in the SCOOP file-naming convention. The logical file name may contain wildcard characters to accommodate unknown metadata or to select a range of files. The mappings between physical file names and logical file names can be obtained by querying the archive's logical file services, currently provided

Figure 4.2: Archive portlet using metadata for querying and retrieving SCOOP data files from storage.
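The wildcard support in logical file name queries can be illustrated with a small sketch: a shell-style pattern with * and ? is translated into a regular expression and applied to a set of LFNs. In the deployed portal the matching is performed server-side by the RLS; the class below, with invented names, only demonstrates the matching semantics.

```java
// Sketch of wildcard matching in LFN queries (invented names, not the RLS
// client API): '*' matches any run of characters, '?' matches exactly one.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LfnQuery {
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");          // '*' -> any run
            else if (c == '?') sb.append('.');      // '?' -> one character
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    public static List<String> query(String[] catalog, String pattern) {
        Pattern p = toRegex(pattern);
        List<String> hits = new ArrayList<>();
        for (String lfn : catalog) {
            if (p.matcher(lfn).matches()) hits.add(lfn);
        }
        return hits;
    }
}
```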

Figure 4.3: Copy retrieved data files to a remote machine.

by an instance of Globus Replica Location Service (RLS) version 2.2 (36). To provide performance scalability, the query results are shown using dynamic paging techniques, because the RLS API does not support returning only the size of the matched result set. Users can retrieve SCOOP data files via HTTPS to their local machine, or perform a third-party transfer via GridFTP. The transfer service is built on the GridPortlets file services and resource services for directory selection and file copy. Figure 4.3 shows the user interface that allows users to specify a location on a remote machine when copying retrieved data files. Users can manage copied files through the physical file management portlet, as shown in Figure 4.4.

Figure 4.4: Manage files through the physical file management portlet.

Currently the logical file entries in the RLS point to physical file entries available locally. Efforts are underway to incorporate distributed storage resources, including SCOOP data store instances at TAMU. The LSU SCOOP archive is also being expanded to leverage

SRB-based terabyte-scale storage at SDSC. Evolving versions of the SCOOP archive access portlet will address issues of accessing federated data stores.

4.4.2 SCOOP Job Management

From the user scenarios, SCOOP models can run on different input data types (e.g. using wind data generated either by analytic means or by other models). As shown in Figure 4.5, the SCOOP job submission portlet allows users to select multiple different types of wind data for a particular model. Using the specified metadata, the job submission portlet queries the logical file service, generates a file list for each selected data type, and constructs a list of such tasks. The SCOOP job service submits each task to Condor via Globus GRAM to run the model on each file list. Hence one SCOOP simulation job may contain several sub-job runs. The SCOOP job submission is mainly built on the GridPortlets service API. To store custom job information and the parent-child job hierarchy, we provide a persistence layer for SCOOP job management using Hibernate. Later, we can track the job information via this portlet, as shown in Figure 4.6. An active proxy credential is required for job submission. Users need to delegate their Grid credentials to the MyProxy server. The SCOOP Portal allows users to retrieve credentials from a MyProxy server via the GridPortlets credential management portlet. The job submission service will automatically use the retrieved credentials for authenticating with Grid resources. The SCOOP job portlet allows users to view the status and output of each sub-job run, as shown in Figure 4.7. Notification of the job status, currently by AIM or email, is implemented by another service: Simulation Tracking and Notification. Each sub-job registers itself with and continuously sends updated information to the notification service

Figure 4.5: Job submission portlet for running ensembles of simulation models on different types of data.

Figure 4.6: Tracking job information via the SCOOP job portlet.

Figure 4.7: Output of each sub-job run.
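The parent-child job structure persisted for SCOOP job management can be sketched as follows. Class and status names are invented for illustration; the real portal stores this hierarchy with Hibernate, and the status values come from the underlying middleware.

```java
// Sketch (invented names): one ensemble job owns several sub-job runs,
// and the parent's status is derived from its children. The deployed
// portal persists this hierarchy with Hibernate rather than in memory.
import java.util.ArrayList;
import java.util.List;

public class EnsembleJob {
    private final List<String> subJobStatus = new ArrayList<>();

    public void addSubJob(String status) {
        subJobStatus.add(status);
    }

    // FAILED if any sub-job failed; DONE only when every sub-job finished.
    public String status() {
        boolean allDone = true;
        for (String s : subJobStatus) {
            if (s.equals("FAILED")) return "FAILED";
            if (!s.equals("DONE")) allDone = false;
        }
        return allDone ? "DONE" : "RUNNING";
    }
}
```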

via XML-RPC. The notification service collects and sends out this information via email or AIM, depending on the user's preference. Currently, the functionality of the SCOOP job management interfaces is limited by the lack of interoperability of the underlying models and data sources. As ongoing work is completed, more complex workflows and ensemble runs will be implemented.
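The preference-based dispatch in the notification service can be sketched in plain Java. The transport calls are stubbed out and the names are invented; the deployed service receives sub-job updates over XML-RPC and uses real email and AIM gateways.

```java
// Sketch (invented names): status updates are forwarded by email or AIM
// according to the user's stored preference. Transports are stubbed; the
// methods return a description of the dispatch instead of sending anything.
import java.util.HashMap;
import java.util.Map;

public class Notifier {
    // user -> "email" or "aim"
    private final Map<String, String> preference = new HashMap<>();

    public void setPreference(String user, String channel) {
        preference.put(user, channel);
    }

    public String notify(String user, String message) {
        String channel = preference.getOrDefault(user, "email"); // default
        if (channel.equals("aim")) {
            return "AIM -> " + user + ": " + message;   // stub for AIM gateway
        }
        return "EMAIL -> " + user + ": " + message;     // stub for SMTP send
    }
}
```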

Chapter 5

Conclusion

This report has investigated state-of-the-art Grid portal toolkits, represented by GridPortlets and OGCE, and described the background, design and implementation of a Grid-enabled science portal for the SCOOP coastal ocean observing and modeling community. The SCOOP portal, built with the GridSphere framework, currently integrates customized Grid portlet components for data access, job submission, resource management and notification, and provides researchers and modelers with easy access to integrated data and compute resources. While the portal interfaces have thus far been well received by the SCOOP community, the challenge now is to make the portal an essential part of the scientists' usual working environment. This requires adding new scenario- and use-case-driven features to the portal. These enhancements will include:

advanced ensemble simulation interfaces to allow modelers to run a spectrum of hydrodynamic models, each with different input conditions

integration of GIS technologies into the portal to provide geo-referenced interactive