DataGrids 2.0: iRODS - A Second Generation Data Cyberinfrastructure
Arcot (RAJA) Rajasekar
DICE/SDSC/UCSD
What is SRB (Storage Resource Broker)?
  First-generation data grid middleware developed at the San Diego Supercomputer Center (SDSC)
  A distributed file system based on a client-server architecture
  Lets users access files seamlessly across a distributed environment, based on their attributes rather than just their names or physical locations
  Replicates, syncs, and archives data, connecting heterogeneous resources in a logical and abstracted manner
User Base & Diversity of Applications
  Collections at SDSC: one petabyte, 200+ million files
  Multi-disciplinary scientific data
    Astronomy, cosmology
    Neuroscience, cell signalling, and other biomedical informatics
    Environmental and ecological data
    Educational (web) and research data (Chem, Phys, ...)
    Archival and library collections
    Earthquake data, seismic simulations
    Real-time sensor data
  Growing at 1 TB a day
  Supporting large projects: TeraGrid, NVO, SCEC, SEEK/Kepler, GEON, ROADNet, JCSG, AfCS, SIO Explorer, SALK, PAT, UCSD Library, ...
BIRN: Biomedical Informatics Research Network
NOAO Zone Architecture
ROADNet Architecture
  (ISGC 2004, Taipei, Taiwan)
What is iRODS?
  Second-generation data grid
  It is a data grid system: data virtualization
    A distributed file system based on a client-server architecture
    Lets users access files seamlessly across a distributed environment, based on their attributes rather than just their names or physical locations
    Replicates, syncs, and archives data, connecting heterogeneous resources in a logical and abstracted manner
  It is a distributed workflow system: policy virtualization
    Policy is a first-class object to be managed
    Long-term policy captures provenance
    Policy can be declarative/descriptive as well as procedural/normative
    Policy can be a process enforced on new objects
    Policy can be a constraint whose integrity can be checked at any time
Some sample policies
  Every dataset should have two copies in two distributed locations
  Every data stream should have associated metadata in the catalog showing lat, long, elev, ...
  Data from an observatory is accessible only to users from that observatory for the first 3 months
  When a new thermal vent instrument comes online, send email to aaa@bbb
  When data stream X has not received any data packet in 2 days, send email to ccc@ddd
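The first sample policy can be read as a constraint that is checked against the catalog at any time. The following is a minimal sketch of that idea in Python, not iRODS code: the `Replica` record, the `under_replicated` function, and the dataset names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a dataset (hypothetical record, not an iRODS type)."""
    dataset: str
    location: str


def under_replicated(replicas, min_locations=2):
    """Return sorted names of datasets held at fewer than `min_locations` distinct locations."""
    locations = {}
    for replica in replicas:
        locations.setdefault(replica.dataset, set()).add(replica.location)
    return sorted(name for name, locs in locations.items() if len(locs) < min_locations)


# Hypothetical catalog contents, used only to exercise the check.
catalog = [
    Replica("survey-001", "SDSC"),
    Replica("survey-001", "MBARI"),
    Replica("vent-042", "WoodsHole"),
    Replica("vent-042", "WoodsHole"),  # two copies, but both in one location
]
violations = under_replicated(catalog)
print(violations)
```

Counting distinct locations rather than copies is what separates "two copies" from "two copies in two distributed locations": `vent-042` has two copies but still violates the policy.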
Data Virtualization with iRODS
  Logical name space
    Location-independent identifier
    Persistent identifier
  Collection-owned data
    Access controls
    Audit trails
    Checksums
  Descriptive metadata
    Common naming convention and set of attributes for describing digital entities
  Inter-realm authentication
    Single sign-on system
  [Diagram: a User Application reaching an Archive at SDSC, a Database at MBARI, and a File System at Woods Hole]
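A logical name space can be pictured as a catalog that maps one location-independent name to any number of physical replicas. The sketch below assumes a simple in-memory dictionary; the class `LogicalNamespace`, the resource names, and all paths are hypothetical and are not part of the iRODS API.

```python
class LogicalNamespace:
    """Maps location-independent logical names to physical replicas (illustrative only)."""

    def __init__(self):
        # logical name -> list of (resource, physical path) pairs
        self._replicas = {}

    def register(self, logical_name, resource, physical_path):
        """Record one more physical copy under an existing or new logical name."""
        self._replicas.setdefault(logical_name, []).append((resource, physical_path))

    def resolve(self, logical_name, preferred_resource=None):
        """Return a (resource, physical path) pair, favoring the preferred resource if present."""
        replicas = self._replicas.get(logical_name)
        if not replicas:
            raise KeyError(f"no replica registered for {logical_name}")
        for replica in replicas:
            if replica[0] == preferred_resource:
                return replica
        return replicas[0]


# Hypothetical names and paths.
ns = LogicalNamespace()
ns.register("/demo/vents/cast01.dat", "archive-sdsc", "/hpss/a1/0001")
ns.register("/demo/vents/cast01.dat", "fs-woodshole", "/data/vents/cast01.dat")
print(ns.resolve("/demo/vents/cast01.dat", preferred_resource="fs-woodshole"))
```

The application only ever uses the logical name, so a replica can move or be added without changing the identifier the user sees.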
Policy Virtualization with iRODS
  Micro-services
    Functions with well-defined semantics
    Transactional - recovery
    Context of application
    Message queues
  Rules
    Triggered by events
    Conditional execution of alternative rule declarations
    System constructs: loops, recursion, branching
  Workflows
    Distributed execution
    Immediate, deferred, periodic
  [Diagram: a User Application with execution at SIO, execution at MBARI, and execution at Woods Hole]
Data Management Virtualization
  [Diagram layers, top to bottom: Access Interface; Standard Access Actions; Data Grid Services; Standard Micro-services; Storage Protocol; Storage System]
  Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system
  Policies & Practices
Management Virtualization
  Examples of management policies
    Integrity
      Validation of checksums
      Synchronization of replicas
      Data distribution
      Data retention
      Access controls
    Authenticity
      Chain of custody - audit trails
      Track required preservation metadata - templates
      Generation of Archival Information Packages
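Validation of checksums, the first integrity policy above, amounts to recomputing each replica's digest and comparing it with the value recorded at ingestion. The sketch below makes two assumptions the slides do not state: the digest is SHA-256, and replica contents are held in memory as bytes. The function names and sample data are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Digest used for comparison (SHA-256 is an assumption, not taken from the slides)."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(registered_checksum, replicas):
    """Return sorted resource names whose replica no longer matches the registered checksum."""
    return sorted(
        name for name, data in replicas.items()
        if checksum(data) != registered_checksum
    )


# Hypothetical content: the Woods Hole copy differs by one character.
original = b"seismic trace 0001"
registered = checksum(original)
replicas = {
    "SDSC": original,
    "MBARI": original,
    "WoodsHole": b"seismic trace 0O01",
}
corrupt = find_corrupt_replicas(registered, replicas)
print(corrupt)
```

A replica flagged this way is the natural input to the next policy in the list, synchronization of replicas, which would overwrite it from a copy that still matches.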
iRODS Architecture
  [Architecture diagram. Labeled components: Client Interface; Admin Interface; Resources; Resource-based Services; Rule Invoker; Service Manager; Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Rule Engine; Micro Service Modules; Consistency Check Modules; Metadata-based Services; Current State; Rule Base; Confs; Metadata Base]
Distributed Management System
  Rule Engine: policy management
  Data Transport: virtualization
  Metadata Catalog: persistent state information
  Execution Engine: server-side workflow
  Messaging System
  Execution Control: scheduling
iRODS Rule
  Each rule defines
    Event
    Condition
    Action sets (micro-services and rules)
    Recovery sets (transaction oriented)
  Rule types
    Atomic: applied immediately
    Deferred: support deferred consistency constraints
    Periodic: typically used to validate assertions
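The four parts of a rule (event, condition, action set, recovery set) can be sketched as a small Python class. This is an illustration of the event-condition-action pattern with transactional recovery, not the iRODS rule engine. The `Rule` class and the step functions are hypothetical, and the sketch assumes that only steps that completed are undone, in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each step pairs an action with the recovery function that undoes it.
Step = Tuple[Callable[[dict], None], Callable[[dict], None]]


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    steps: List[Step]

    def fire(self, event: str, ctx: dict) -> bool:
        """Run the actions in order; on failure, undo completed steps in reverse."""
        if event != self.event or not self.condition(ctx):
            return False
        completed = []
        try:
            for action, recovery in self.steps:
                action(ctx)
                completed.append(recovery)
        except Exception:
            for recovery in reversed(completed):
                recovery(ctx)
            return False
        return True


# Hypothetical steps: the second one fails, so the first must be rolled back.
def create(ctx): ctx["log"].append("create")
def undo_create(ctx): ctx["log"].append("undo create")
def register(ctx): raise RuntimeError("catalog unavailable")
def undo_register(ctx): ctx["log"].append("undo register")


rule = Rule("ingest", lambda ctx: ctx["dept"] == "sdsc",
            [(create, undo_create), (register, undo_register)])
ctx = {"dept": "sdsc", "log": []}
ok = rule.fire("ingest", ctx)
print(ok, ctx["log"])
```

Pairing every action with its recovery is what makes the chain transaction-oriented: a failure partway through leaves no half-finished state behind.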
Sample Rules
  ingestobject(*f)
    $userdept == sdsc OR $userdept == sio
    createfile(*f), registerfile(*f), computechksum(*f), !, findbackuprsrc(*f, *R), replicatefile(*f, *R), computechecksum(*f, *R), comparechecksum(*f).
  ingestobject(*f)
    $userdept == nvo
    createfile(*f), registerfile(*f), extractfitsmetadata(*f).
  ingestobject(*f)
    createfile(*f), registerfile(*f).
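The three `ingestobject` declarations are alternatives for the same event, each guarded by its own condition. The sketch below shows one way such a selection could work, assuming rules are tried in declaration order and the first matching condition wins. It is not the iRODS rule engine: `select_actions` is a hypothetical helper, the action names are copied from the sample rules as plain strings, and the `!` marker is omitted because the slides do not define it.

```python
def select_actions(rules, event, session):
    """Return the action list of the first rule for `event` whose condition holds."""
    for rule_event, condition, actions in rules:
        if rule_event == event and condition(session):
            return actions
    return []


# Rules as (event, condition, actions); the last entry is the unconditional default.
RULES = [
    ("ingestobject",
     lambda s: s.get("userdept") in ("sdsc", "sio"),
     ["createfile", "registerfile", "computechksum", "findbackuprsrc",
      "replicatefile", "computechecksum", "comparechecksum"]),
    ("ingestobject",
     lambda s: s.get("userdept") == "nvo",
     ["createfile", "registerfile", "extractfitsmetadata"]),
    ("ingestobject",
     lambda s: True,
     ["createfile", "registerfile"]),
]

for dept in ("sio", "nvo", "other"):
    print(dept, select_actions(RULES, "ingestobject", {"userdept": dept}))
```

Under this reading, an NVO user gets FITS metadata extraction on ingest, SDSC and SIO users get a checksummed backup replica, and everyone else gets a plain create-and-register.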
Rule-based Data Management
  Administrator-controlled rules to implement management policies
    Administrative - adding / deleting users, resources
    Data ingestion - pre-processing, post-processing
    Data transport / deletion - parallel I/O streams, disposition
    Data retention policies - expiration, overwrites, versions
    Data reliability policies - copies, formats, migration, checking, ...
NVO Micro-Services
  msiobjbyname
    Accesses the NASA/IPAC Extragalactic Database (NED)
    Uses the web service provided by http://voservices.net/ned
    Given an object name, returns RA, Dec, and Type
  msisdssimgcutout_getjpeg
    Accesses SDSS
    Uses the web service provided by http://skyserver.sdss.org/
    Given RA, Dec, and a cutout size, returns an image cutout file in a buffer
NVO iRODS Rules
  getobjpositionbyname.ir
    Executes the micro-service msiobjbyname
    Shows RA, Dec, and Type as output
  getcutoutbyposition.ir
    Executes the micro-service msisdssimgcutout_getjpeg
    Stores the image cutout as a file in iRODS
    Uses other iRODS system micro-services
  getcutoutbybyobjname.ir
    Chains the two micro-services
    Chains two web services from two service providers
    Takes an object name and cutout parameters, and stores an image cutout in an iRODS file
  With other NVO and image-manipulation micro-services, with similar or different functionality, one can write complex (and alternative) iRODS rules.
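The chaining in the third rule follows a simple pattern: the output of the name lookup becomes the input of the cutout request, and the result is stored. The sketch below shows only that data flow. It makes no web-service calls: both functions are local stand-ins with hypothetical signatures, the lookup table holds placeholder values rather than real catalog data, and a dictionary stands in for iRODS storage.

```python
def obj_by_name(name, lookup):
    """Stand-in for msiobjbyname: return (ra, dec, obj_type) for an object name."""
    return lookup[name]


def img_cutout(ra, dec, size):
    """Stand-in for msisdssimgcutout_getjpeg: return placeholder bytes, not a real JPEG."""
    return f"cutout ra={ra} dec={dec} size={size}".encode()


def cutout_by_obj_name(name, size, lookup, store):
    """Chain the two stand-ins and store the result under a name derived from the object."""
    ra, dec, _obj_type = obj_by_name(name, lookup)
    target = f"{name}.jpg"
    store[target] = img_cutout(ra, dec, size)
    return target


# Placeholder coordinates for a made-up object; not real catalog data.
lookup = {"example-object": (10.0, 20.0, "G")}
store = {}
saved = cutout_by_obj_name("example-object", 64, lookup, store)
print(saved, store[saved])
```

Because each stand-in has a narrow input and output, either one could be swapped for a different provider without changing the chain itself, which is the point the last bullet on the slide makes about alternative rules.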
Conclusion
  More information: www.diceresearch.org
  Contact: sekar@sdsc.edu