Technical. Overview. ~ a ~ irods version 4.x

Similar documents
irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire 25th

Data Management using irods

IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, Integration Guide IBM

IBM Campaign and IBM Silverpop Engage Version 1 Release 2 August 31, Integration Guide IBM

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Symantec Enterprise Vault.cloud Overview

Migrating to vcloud Automation Center 6.1

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Managing and Maintaining Windows Server 2008 Servers

Alfresco Enterprise on AWS: Reference Architecture

MySQL Strategy. Morten Andersen, MySQL Enterprise Sales. Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Deploying a distributed data storage system on the UK National Grid Service using federated SRB

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

User-ID Best Practices

Alliance Key Manager Solution Brief

OSG PUBLIC STORAGE. Tanya Levshina

MySQL Security: Best Practices

owncloud Architecture Overview

Introduction to Directory Services

Creating the Conceptual Design by Gathering and Analyzing Business and Technical Requirements

EMC BACKUP-AS-A-SERVICE

owncloud Architecture Overview

Introduction to Endpoint Security

INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)

CA Single Sign-On r12.x (CA SiteMinder) Implementation Proven Professional Exam

NetIQ Identity Manager Setup Guide

New Features... 1 Installation... 3 Upgrade Changes... 3 Fixed Limitations... 4 Known Limitations... 5 Informatica Global Customer Support...

How to Backup and Restore a VM using Veeam

Using LDAP Authentication in a PowerCenter Domain

Connect for Dragon Medical 360 Network Edition. Administrator Guide

Connection Broker Managing User Connections to Workstations, Blades, VDI, and More. Quick Start with Microsoft Hyper-V

VMware vsphere Data Protection

Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud

Manage all your Office365 users and licenses

Group Management Server User Guide

Hypertable Architecture Overview

PRIVACY, SECURITY AND THE VOLLY SERVICE

Diagram 1: Islands of storage across a digital broadcast workflow

Web Application Hosting Cloud Architecture

EMC SYNCPLICITY FILE SYNC AND SHARE SOLUTION

TREENO ELECTRONIC DOCUMENT MANAGEMENT. Administration Guide

Technical Overview Simple, Scalable, Object Storage Software

Data Collection and Analysis: Get End-to-End Security with Cisco Connected Analytics for Network Deployment

Web Applications Access Control Single Sign On

Jitterbit Technical Overview : Microsoft Dynamics CRM

Configuring Failover

Using the vcenter Orchestrator Plug-In for Microsoft Active Directory

High Availability for Citrix XenApp

AVALANCHE MC 5.3 AND DATABASE MANAGEMENT SYSTEMS

Using EMC Documentum with Adobe LiveCycle ES

Symantec NetBackup OpenStorage Solutions Guide for Disk

SHARPCLOUD SECURITY STATEMENT

Author: Ryan J Adams. Overview. Policy Based Management. Terminology

Configuration Guide. BlackBerry Enterprise Service 12. Version 12.0

Basic & Advanced Administration for Citrix NetScaler 9.2

Active Directory Compatibility with ExtremeZ-IP. A Technical Best Practices Whitepaper

LDAP Directory Integration with Cisco Unity Connection

Features of AnyShare

Daymark DPS Enterprise - Agentless Cloud Backup and Recovery Software

Sisense. Product Highlights.

Windows Server on WAAS: Reduce Branch-Office Cost and Complexity with WAN Optimization and Secure, Reliable Local IT Services

The syslog-ng Store Box 3 F2

Pluggable Rule Engine

LISTSERV LDAP Documentation

VMware Mirage Web Manager Guide

White Paper. Anywhere, Any Device File Access with IT in Control. Enterprise File Serving 2.0

UIT USpace Flexible and Secure File Manager for Cloud Storage

High Availability Essentials

SharePoint 2010 Interview Questions-Architect

GRAVITYZONE HERE. Deployment Guide VLE Environment

Workflow Templates Library

The syslog-ng Store Box 3 LTS

Oracle Database Security and Audit

Gladinet Cloud Enterprise

FileMaker Security Guide The Key to Securing Your Apps

Synchronization Agent Configuration Guide

The Encryption Anywhere Data Protection Platform

SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX

Five Steps to Integrate SalesForce.com with 3 rd -Party Systems and Avoid Most Common Mistakes

Cloud Storage Backup for Storage as a Service with AT&T

Avatier Identity Management Suite

How To Install Powerpoint 6 On A Windows Server With A Powerpoint 2.5 (Powerpoint) And Powerpoint On A Microsoft Powerpoint 4.5 Powerpoint (Powerpoints) And A Powerpoints 2

Content Distribution Management

FINAL DoIT v.8 APPLICATION SECURITY PROCEDURE

DataGrids 2.0 irods - A Second Generation Data Cyberinfrastructure. Arcot (RAJA) Rajasekar DICE/SDSC/UCSD

Designing, Optimizing and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008

How To Take Advantage Of Active Directory Support In Groupwise 2014

Oracle Directory Services Integration with Database Enterprise User Security O R A C L E W H I T E P A P E R F E B R U A R Y

Installation and Setup: Setup Wizard Account Information

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Jitterbit Technical Overview : Salesforce

Course 2788A: Designing High Availability Database Solutions Using Microsoft SQL Server 2005

Cisco Application Networking Manager Version 2.0

DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández

Server Installation ZENworks Mobile Management 2.7.x August 2013

Release System Administrator s Guide

FileDrawer An Enterprise File Sharing and Synchronization (EFSS) solution.

ZooKeeper. Table of contents

Configuration Guide. BES12 Cloud

Transcription:

Technical Overview ~ a ~ irods version 4.x

The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number of storage systems located anywhere, while maintaining redundancy and security, and exercise precise control over their data with extensible rules that ensure the data is archived, described, and replicated in accordance with their needs and schedule. irods empowers users by supporting virtualization, which provides a one stop shop for all data regardless of the heterogeneity of storage devices, data discovery through the use of descriptive metadata, workflow automation, and data sharing between collaborating or distributed teams. This document provides a brief overview of the irods architecture to help you consider whether irods is the right solution for your data management needs. It will describe the irods Metadata Catalog (icat), a database that stores all the valuable information about your data, the icat-enabled Server (IES), and optional Resource Servers which facilitate data management and replication throughout your storage infrastructure. Then, the document will describe facets of the architecture that may be customized to precisely control and manage data. These facets include policies, rules, plugins, and Composable Resources. This document concludes with some questions that may be helpful for determining if irods will serve your needs, followed by pointers to other helpful information about irods. ~ 1 ~

AN irods ZONE Each irods deployment or Zone is composed of an irods Metadata Catalog (icat), an icat-enabled Server (IES), and optional Resource Servers. The icat is a relational database (i.e., a SQL database) that holds all the information about your data, users, and zone that the irods servers IES and any Resource Servers need to facilitate the management and sharing of your data. The icat contains the following information: Data object (i.e., file) location and access information A virtual file system (i.e., mapping of logical directory paths to actual physical storage paths), which provides a unified namespace across various underlying storage systems. Resource configuration information hostnames and credentials for accessing your data on physical storage. Metadata about individual data objects and entire data collections. For example, author names, file names, keywords, abstracts, etc. whatever descriptors or metadata schemes you choose can be applied to your data. User details such as userids, roles, passwords, and home directories. Zone details for sharing your zone s data with collaborators from other zones. These details include the name of your zone, the domain name of your IES, and the network port number of irods servers in your zone. Currently, PostgreSQL, MySQL, and Oracle are the database technologies that may be used to implement an icat database. The irods server software allows users to store and retrieve files on physical storage devices. To take full advantage of irods distributed data capabilities, additional Resource Servers may be added to your Zone s architecture. Using multiple Resource Servers can enhance the performance, security, and resilience of a Zone by providing redundancy, both within a single location and distributed geographically. All irods servers IES and Resource Servers run the same core code and are peers. Each server may have its own set of policies, rules, and plugins (ways to customize your irods Zone, discussed later in this document). However, to facilitate data management, Resource Servers must communicate with the IES, because it holds the connection to the icat. ~ 2 ~

irods ZONE icat Database Resource Server1 IES PHYSICAL STORAGE COMPOSABLE RESOURCES * Resource ServerN 10100101011101010100010111100101010101010101010101110101100 Coordinating Resources 11010101011101010010101110101010001011110010101010101010101 010111010110 01101010101110 10100101011101010100010111100101 010101010101010101110101100 1101010101110 101001010111010100 001011110010101010101010101010111010110 01101010101110 10100 10101110101010001011110010101010101010101010111010110011011 01010111010100101011101010100010111100101010101010101010101 Storage Resources 110101100 11010101011101010010101110101010001011110010101010 Storage Device1 Storage Device N *You may arrange composable resources according to your needs. ~3~

POLICIES Your organization has data management policies that specify processes for access control, backup, data migration, data preparation, metadata extraction, and more. These organizational policies can be implemented as irods policies, which can help you customize and automate these data management tasks. irods policies consist of a rule or set of rules that are triggered based on a condition or set of conditions. These conditions may pertain to timing, user role, physical location, etc. Rules Rules are written in irods domain-specific rule language, which uses many familiar programming constructs (e.g., loops, conditional statements), making it easy for your organization s developers to construct rules to satisfy your data needs. Rules are carried out by the irods rule engine a built-in interpreter for the irods rule language that governs the sequence of data management actions in your irods zone. The IES and each Resource Server run an instance of the rule engine. Out of the box, the rule engine comes loaded with an array of basic actions (e.g., error reporting, login protocols) to get you up and running. As you begin to dig into irods, you can add rules of your own to tailor a data management program that works for you. Rules usually call one or more microservices pre-packaged C++ programs designed to accomplish specific tasks. Over 350 microservices have been contributed to the irods codebase by the user community and development staff. Using irods microservices, rather than calling a standalone external program, enables enforcement of irods access control policies, code verification for auditing and scientific reproducibility, and coordinated code execution across multiple irods servers. You will notice in the example rule on the next page that one microservice (msidataobjrepl) is called. Microservices are denoted by the prefix msi. They are a specific class of plugins, discussed in the Plugins section of this document. rules enable Access Control Rules can specify who can access data, for what activity (i.e., read or write), and when they can access it. Backup and Data Migration Rules can manage file replication and transfer among storage devices or locales, depending on any parameter, such as date of creation or last access. Data Conversion Rules can convert files between file formats or different versions of the same format, such as when you need an Excel spreadsheet saved as XML. Data Preparation Rules can clean and prep your data prior to use; for example, eliminating duplicate data, removing null values, performing spell checks, indexing data, subsetting data into more manageable chunks, or any pre-processing that meets your needs. Metadata Extraction Rules can extract relevant metadata from files to improve search, so you are not stuck remembering file names. ~ 4 ~

An example rule ReplicateAllFilesWithExtension { *Query = SELECT COLL_NAME, DATA_NAME WHERE DATA_NAME LIKE "%.*Extension"; foreach(*row in *Query) { *SourceFile = *Row.COLL_NAME++"/"++*Row.DATA_NAME; msidataobjrepl(*sourcefile,"destrescname=*resource",*status); writeline("stdout","replicated *SourceFile to *Resource"); } } INPUT *Extension="txt", *Resource="backupResc" OUTPUT ruleexecout The example rule above locates every file with the.txt extension and creates a replica of each located file on the resource called backupresc. Policy Enforcement Points (PEPs) Rules are executed based on conditions or, in irods parlance, Policy Enforcement Points (PEPs). Consider, for example, a rule to transfer ownership of data objects to the project manager when a user is deleted; the trigger or PEP is the deletion of the user. Similarly, rules could be written to extract metadata or pre-process data whenever a file is uploaded to a storage device. Or, upon access to particular data objects, a rule can create a log of the event, send an email notification to the project manager, or perform some other task you need to occur as a result of the data s access. As illustrated in the diagram below, every client-side operation (e.g., storing a file or iput, retrieving a file or iget) activates a PEP which can trigger any action you need irods to take to manage your data and metadata. The chaining of rules and PEPs allows you to create powerful, customized workflows that help you automate, save time and energy, and prevent human error. INSIDE irods iput iget irepl API RULE ENGINE icat DATABASE User or time-initiated client-side operations (i.e., icommands). Policy Policy Enforcement Points ~ 5 ~

PLUGINS In addition to rules, an irods installation can be customized with plugins. Plugins are used to implement core irods functions, such as authentication, communication over the Internet, communication with storage devices, and more. The use of plugins allows administrators to tailor irods to their needs and to the existing infrastructure, without having to re-compile code. Plugins also make it possible to upgrade small portions of irods without interfering with core functions. Plugins interact with irods core code through six interfaces. You have already read about one pluggable interface, microservices. The other five are authentication, communication with the icat database, network transport, the application programming interface (API), and communication with storage devices. irods ships with a default set of plugins; others must be installed separately, if you choose to use them. Interface Authentication Database Network API Microservices Composable Resources Plugins Native irods password access OSAuth Pluggable Authentication Module (PAM) Grid Security Infrastructure (GSI)* LDAP via PAM Oracle* PostgreSQL* MySQL* Transmission Control Protocol (TCP) Secure Socket Layer (SSL) An irods application programming interface to enable storage and retrieval of data objects from another application. There are over 350 available, covering a variety of functions. There are two kinds of Composable Resources in irods: Coordinating and Storage. Both are discussed in more detail in the next section. *External plugins that may be installed ~ 6 ~

COMPOSABLE RESOURCES Composable Resources are plugins that allow you to create rich decision trees for managing storage and retrieval of data on storage devices. There are two types of Composable Resources: Coordinating and Storage. Coordinating Resources actively make decisions about which physical device will receive or serve up a data object. Storage Resources are the logical representations of or pointers to physical storage devices. They are composed of five parts: (a) the name you give the resource, (b) a host name (e.g., willow. irods.renci.org), (c) a directory path to the exact location on the storage device (e.g., /full/path/to/storage), (d) the resource type (e.g., Amazon S3), and (e) a plugin-specific context string (e.g., the name of the file containing access credentials or any persistent information the plugin may require). One way to think about Composable Resources is to consider Coordinating Resources as the branch nodes of your decision tree and Storage Resources as the leaves. (See the irods Zone figure.) Examples of Coordinating Resources include: Replication Round Robin Load Balanced Compound A replication resource keeps its children in sync with identical copies or replicas of data objects. A round robin resource will rotate through its children for each upload to the system. A load balanced resource attempts to balance storage and retrieval across children to avoid taxing the servers (e.g., available space, CPU usage, network traffic). A compound resource manages two children, in the roles of Cache and Archive. The Cache resource provides a standard UNIX file system interface (i.e., POSIX) to an Archive that may not natively support this type of access. ~ 7 ~

Examples of Storage Resources that may be used in an irods installation: Unix File System Amazon S3* Web Object Scalar (WOS)* Ceph RADOS* Direct Access Universal Mass Storage This type of storage resource communicates with a storage device through the standard POSIX interface. Amazon s cloud storage service. DataDirect Networks block storage appliance. Designed for efficient cacheless access to a Ceph RADOS block storage cluster. Designed for access to shared high performance computing (HPC) storage. For use with Compound resources, such as tape-based archives. *External plugins that may be installed IS irods RIGHT FOR ME? irods provides a customizable solution for your data management needs. Questions you may want to ask yourself when evaluating irods as a potential solution include: Do I need to share, back up, or migrate data between different types of storage devices and/or different geographic locations now or in the future? Do I need to move data easily between local and cloud-based storage devices, or file and block storage devices? Could my collaborators and I benefit from accessing and managing data in a unified namespace? Do I need to automate data management activities (e.g., data replication, load-balancing) and/or could I take advantage of data processing workflows to streamline data management? Do I need to implement and verify compliance with regulatory data retention and privacy policies (e.g., HIPAA, Sarbanes-Oxley, or other federal regulations)? Do I need to provide data description and annotation (e.g., metadata) for improved search, sharing, or data provenance tracking? irods can help you fulfill all of these data management needs. Also consider your future needs. Maybe you don t need to share data with collaborators or outside organizations yet; but is it possible that you might need to scale up at some point to do so? Maybe you will want to migrate storage hardware some day. irods can grow with your organization as your data needs grow. For more information: irods website: http://irods.org irods Consortium: http://irods.org/consortium irods documentation: http://irods.org/documentation irods code on GitHub: https://github.com/irods ~ 8 ~