INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)

Similar documents
Managing Next Generation Sequencing Data with irods

Data Management using irods

Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un

Data grid storage for digital libraries and archives using irods

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire 25th

Technical. Overview. ~ a ~ irods version 4.x

Automated and Scalable Data Management System for Genome Sequencing Data

The National Consortium for Data Science (NCDS)

RELATED WORK DATANET FEDERATION CONSORTIUM, IRODS,

DataGrids 2.0 irods - A Second Generation Data Cyberinfrastructure. Arcot (RAJA) Rajasekar DICE/SDSC/UCSD

Technology solutions for managing and computing on largescale biomedical data

Digital Preservation Lifecycle Management

irods Overview Intro to Data Grids and Policy-Driven Data Management!!Leesa Brieger, RENCI! Reagan Moore, DICE & RENCI!

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Conceptualizing Policy-Driven Repository Interoperability (PoDRI) Using irods and Fedora

Intro to Data Management. Chris Jordan Data Management and Collections Group Texas Advanced Computing Center

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Integrating Data Life Cycle into Mission Life Cycle. Arcot Rajasekar

Assessment of RLG Trusted Digital Repository Requirements

irods Policy-Driven Data Preservation Integrating Cloud Storage and Institutional Repositories

Integrated Rule-based Data Management System for Genome Sequencing Data

irods Overview Introduction to Data Grids, Policy-Driven Data Management, and Enterprise irods

integrated Rule-Oriented Data System Reference

Figure 1: MQSeries enabled TCL application in a heterogamous enterprise environment

9 ways to revolutionize HR with paperless productivity

WebDat: Bridging the Gap between Unstructured and Structured Data

Implementing an Electronic Document and Records Management System. Key Considerations

PoS(ISGC 2013)021. SCALA: A Framework for Graphical Operations for irods. Wataru Takase KEK wataru.takase@kek.jp

State of Michigan Records Management Services. Guide to E mail Storage Options

Using Databases to Manage State Information for. Globally Distributed Data

Abstract. 1. Introduction. irods White Paper 1

Cloud Archive & Long Term Preservation Challenges and Best Practices

Symantec Enterprise Vault.cloud Overview

ADDENDUM 5 TO APPENDIX 6 TO SCHEDULE 3.3

ECM Migration Without Disrupting Your Business: Seven Steps to Effectively Move Your Documents

Shibbolized irods (and why it matters)

Data Grids. Lidan Wang April 5, 2007

Transition Guidelines: Managing legacy data and information. November 2013 v.1.0

CAREER TRACKS PHASE 1 UCSD Information Technology Family Function and Job Function Summary

irods Technologies at UNC

Capture Your Assets (CYA) Data Retention Policies in E&P

PASIG May 12, Jacob Farmer, CTO Cambridge Computer

irods for Big Data Management in Research Driven Organizations Charles Schmitt CTO & Director of Informatics RENCI

Data Grid Landscape And Searching

Concepts in Distributed Data Management or History of the DICE Group

Managing and Maintaining Windows Server 2008 Servers

DATA ARCHIVING. The first Step toward Managing the Information Lifecycle. Best practices for SAP ILM to improve performance, compliance and cost

TERRITORY RECORDS OFFICE BUSINESS SYSTEMS AND DIGITAL RECORDKEEPING FUNCTIONALITY ASSESSMENT TOOL

EMC NETWORKER AND DATADOMAIN

Beyond the Data Lake

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data

Geospatial Data and Storage Resource Broker Online GIS Integration in ESRI Environments with SRB MapServer and Centera.

Cisco Discovery 3: Introducing Routing and Switching in the Enterprise hours teaching time

Nexus Professional Whitepaper. Repository Management: Stages of Adoption

Enterprise Information Management Services Managing Your Company Data Along Its Lifecycle

7Seven Things You Need to Know About Long-Term Document Storage and Compliance

How To Use The Hitachi Content Archive Platform

CIP s Open Data & Data Management Guidelines and Procedures

Collaborative SRB Data Federations

Cisco Process Orchestrator Adapter for Cisco UCS Manager: Automate Enterprise IT Workflows

Why enterprise data archiving is critical in a changing landscape

Pluggable Rule Engine

What we do? Our services include:

SOLUTION BRIEF KEY CONSIDERATIONS FOR BACKUP AND RECOVERY

InstaFile. Complete Document management System

Common Questions and Concerns About Documentum at NEF

Distributed File Systems An Overview. Nürnberg, Dr. Christian Boehme, GWDG

Symantec Enterprise Vault.cloud Overview

A Service for Data-Intensive Computations on Virtual Clusters

Course Overview. What You Will Learn

Data Management Resources at UNC: The Carolina Digital Repository and Dataverse Network

Archival of Digital Assets.

Data Services for Campus Researchers

Hitachi Cloud Service for Content Archiving. Delivered by Hitachi Data Systems

TRANSFORMING DATA PROTECTION

How To Manage Security On A Networked Computer System

Benefits of upgrading to Enterprise Vault

A 15-Minute Guide to 15-MINUTE GUIDE

Release & Deployment Management

Protecting Official Records as Evidence in the Cloud Environment. Anne Thurston

Harvard Library Preparing for a Trustworthy Repository Certification of Harvard Library s DRS.

web archives & research collections

SSM6437 DESIGNING A WINDOWS SERVER 2008 APPLICATIONS INFRASTRUCTURE

ILM et Archivage Les solutions IBM

Data Sheet: Archiving Symantec Enterprise Vault Discovery Accelerator Accelerate e-discovery and simplify review

OSG PUBLIC STORAGE. Tanya Levshina

Storage Virtualisation in the Cloud

irods at CC-IN2P3: managing petabytes of data

5 CMDB GOOD PRACTICES

MOVING TO THE NEXT-GENERATION MEDICAL INFORMATION CALL CENTER

Streamline Enterprise Records Management. Laserfiche Records Management Edition

MOVING THE CLINICAL ANALYTICAL ENVIRONMENT INTO THE CLOUD

Recordkeeping for Good Governance Toolkit. GUIDELINE 14: Digital Recordkeeping Choosing the Best Strategy

HP Records Manager. A single solution for enterprise-scalable document and records management

Symantec Backup Exec 12.5 for Windows Servers. Quick Installation Guide

Migrate from Exchange Public Folders to Business Productivity Online Standard Suite

The Benefits of Archiving and Seven Questions You Should Always Ask

Provide access control with innovative solutions from IBM.

Redefining Oracle Database Management

BEA AquaLogic Integrator Agile integration for the Enterprise Build, Connect, Re-use

Transcription:

INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS) Todd BenDor Associate Professor Dept. of City and Regional Planning UNC-Chapel Hill bendor@unc.edu http://irods.org/ SESYNC Model Integration Workshop

Important note: Most of this is blatantly ripped off from irods official documentation irods executive summary/introduction Reagan Moore, UNC SILS Adil Hasan, Liverpool

Punchline Model integration and flexibility needs sophisticated tools for collecting, curating, and updating databases. What happens when our model integration dreams come true? Maintaining data provenance and facilitating loose/tight coupling is key.

Overview What is irods? Who Uses irods? What Can irods Do?

What is irods? integrated Rule Oriented Data Management System. irods is open source data grid middleware that implements Data Virtualization Automation of Data Operations A Robust Metadata Catalog Data Management Policy Enforcement and Compliance Verification

irods? Based on considerable experience from Storage Resource Broker (SRB) developed by Data Intensive Cyber Environment (DICE) group at UNC, UCSD, SD Super Computer Center. Found many groups used SRB to store large quantities of data. A lot of server-side post-processing of the data is required (e.g. replicate files, convert to different format, checksum etc). Almost all management is policy driven.

Emergence of irods SRB experience motivated requirements for a new data management system Contained all SRB functionality Add work-flow to manage server-side post-processing Model coupling potential grows Configurable only include the 'services' you need Open-source SRB license imposed sever restrictions on the academic community

Claims to fame An infinitely configurable data janitor irods is the kind of technology you need to host everyone s unstructured data. A powerful data migration tool A data preservation technology A tool for providing fine-grained privacy and security controls Can be seen as a basis for a Digital Repository/Archive Digital Repository/Archive is a Policy Driven System

but wait, there s more! Extensible: irods has command-line clients, APIs for numerous programming languages, and web clients Data is accessed using familiar APIs Supports new plug-ins for storage resources, authentication mechanisms, microservices, and network protection irods lets system administrators roll out an extensible data grid without changing their infrastructure. OPEN SOURCE

How will models interface with large amounts of heterogeneous data? Microservices Authentication Mechanisms Integrated Custom Client Command Line Client Familiar APIs Storage Resources Network Services Web Client Coordinating Resources Databases

irods is Middleware mid dle ware `midl,we(ə)r noun software that acts as a bridge between an operating system or database and applications, especially on a network Applications/Users irods Data Grid Operating System Filesystem irods Middleware Layer - abstracts out the low-level I/O - provides a uniform interface to heterogeneous storage systems (Heterogeneous) Storage Systems

Data Virtualization across Grids Administrators control how the grid is presented to users Implement replication, load-distribution, and archiving policies that are completely transparent to the user. Independent grids can be federated with one another to allow controlled access to remote grids or grids operated by separate workgroups. irods Data Grid Federation irods Data Grid Boston Chicago RTP System 1 RTP System 2 London

Automation of Data Operations With irods, any agent can initiate any action upon any trigger. This powerful capability allows administrators to automate policies such as: Validating checksums every time a new file is placed in a folder. Backing up a set of files every second Thursday. Archiving data that hasn t been accessed in over 1 month. Logging each time a file is replicated or destroyed. Permitting a file to be accessed by multiple independently defined user groups. These operations can be distributed to the storage resource or client.

Policy Enforcement and Compliance Verifications Metadata + automation = infrastructure to enforce mandated data management policies Records retention and privacy protection requirements Audit trails generated by irods can be used to verify compliance with policy.

Who Uses irods? 15 iplant: 15,000 users on an irods data grid with 100 million files http://www.iplantcollaborative.org/ IN2P3 (French Nat. Inst. Nuclear and Particle Physics): over 6 PB of data managed by irods Sanger Institute (Genome Research): 20+ PB of irods data NASA Center for Climate Simulations: 300 million metadata attributes CineGRID (exchanging digital media): sites distributed across Japan- US-Europe

What Can irods Do? irods simplifies data grid management BnF (2008) Data on different storage devices at different locations can be centrally managed. In situ migration to new hardware can be managed by replicating the legacy resource before repurposing or decommissioning it. Backup and archiving are transparent and highly configurable using the automation and metadata capabilities in irods.

What Can irods Do? irods simplifies data discovery, data validation, and data processing. attribute: library attribute: total_reads attribute: attribute: library type attribute: attribute: total_reads lane attribute: attribute: type is_paired_read attribute: attribute: lane study_accession_number attribute: ute: library_id libra attribute: is_paired_read attribute: attribute: study_accession_number sample_accession_number attribute: ribute: sample_public_name total_ attribute: library_id attribute: attribute: sample_accession_number manual_qc attribute: ribute: tag type attribute: sample_public_name attribute: attribute: manual_qc sample_common_name attribute: ribute: md5lane attribute: tag attribute: attribute: sample_common_name tag_index attribute: ribute: study_title is_paire attribute: md5 attribute: attribute: tag_index study_id attribute: ribute: reference study_ attribute: study_title attribute: attribute: study_id sample attribute: attribute: ute: reference target li attribute: attribute: sample sample_id attribute: attribute: target id_run attribute: attribute: sample_id study attribute: attribute: id_run alignment attribute: study attribute: alignment attribute: library attribute: total_reads attribute: type attribute: lane attribute: is_paired_read attribute: study_accession_number attribute: library_id attribute: sample_accession_number attribute: sample_public_name attribute: manual_qc attribute: tag attribute: sample_common_name attribute: md5 attribute: tag_index attribute: study_title attribute: study_id attribute: reference attribute: sample attribute: target attribute: sample_id attribute: id_run attribute: study attribute: alignment User-defined and intrinsic metadata make stored data searchable. Validation and analytical tools can be automated to process incoming data. The results and process steps can be stored in the icat metadata catalog.

What Can irods Do? Distributed Data Management Data Preservation Digital Archives Data at scale Data Maintenance Automated Data Processing Data Curation Digital Libraries Large number of users Complex management tasks Critical policy enforcement Data Virtualization Data Protection and Security Data Sharing and Access Policy Enforcement

Where irods fits? Client interacts with digital repository to access data Access Digital Repository Storage IRODS Spans Digital Repository and Storage Domains

Rules Policies are implemented in irods as rules. Rule is a series of logically connected steps. Each step realised as a micro-service. IRODS rules fully featured: Contain loops and branches. Can have rules contained within rules. IRODS rules read from a rule-file (called core.irb by default).

Questions/ideas? Todd BenDor Department of City and Regional Planning bendor@unc.edu http://todd.bendor.org