Data Management using irods



Similar documents
INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)

Automated and Scalable Data Management System for Genome Sequencing Data

Technical. Overview. ~ a ~ irods version 4.x

irods Overview Intro to Data Grids and Policy-Driven Data Management!!Leesa Brieger, RENCI! Reagan Moore, DICE & RENCI!

irods Overview Introduction to Data Grids, Policy-Driven Data Management, and Enterprise irods

Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un

irods at CC-IN2P3: managing petabytes of data

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire 25th

The National Consortium for Data Science (NCDS)

Integrated Rule-based Data Management System for Genome Sequencing Data

Managing Next Generation Sequencing Data with irods

irods for Big Data Management in Research Driven Organizations Charles Schmitt CTO & Director of Informatics RENCI

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

irods Policy-Driven Data Preservation Integrating Cloud Storage and Institutional Repositories

Distributed File Systems An Overview. Nürnberg, Dr. Christian Boehme, GWDG

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Object storage in Cloud Computing and Embedded Processing

THE CCLRC DATA PORTAL

Deploying a distributed data storage system on the UK National Grid Service using federated SRB

Grid Sun Carlo Nardone. Technical Systems Ambassador GSO Client Solutions

Putting Genomes in the Cloud with WOS TM. ddn.com. DDN Whitepaper. Making data sharing faster, easier and more scalable

GridFTP: A Data Transfer Protocol for the Grid

Technology solutions for managing and computing on largescale biomedical data

PoS(ISGC 2013)021. SCALA: A Framework for Graphical Operations for irods. Wataru Takase KEK wataru.takase@kek.jp

EnduraData Cross Platform File Replication and Content Distribution (November 2010) A. A. El Haddi, Member IEEE, Zack Baani, MSU University

IRODS use case : Ciment, the Univ. Grenoble-Alpes HPC center. B.Bzeznik / X.Briand Irods users group meeting 11/06/2015

Diagram 1: Islands of storage across a digital broadcast workflow

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery

Data-Intensive Science and Scientific Data Infrastructure

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Conceptualizing Policy-Driven Repository Interoperability (PoDRI) Using irods and Fedora

Report of the DTL focus meeting on Life Science Data Repositories

How To Create A Large Enterprise Cloud Storage System From A Large Server (Cisco Mds 9000) Family 2 (Cio) 2 (Mds) 2) (Cisa) 2-Year-Old (Cica) 2.5

European Data Infrastructure - EUDAT Data Services & Tools

Protecting Official Records as Evidence in the Cloud Environment. Anne Thurston

Accelerating and Simplifying Apache

How To Manage Research Data At Columbia

XenData Archive Series Software Technical Overview

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data

Pluggable Rule Engine

WOS OBJECT STORAGE PRODUCT BROCHURE DDN.COM Full Spectrum Object Storage

T a c k l i ng Big Data w i th High-Performance

Open Directory. Apple s standards-based directory and network authentication services architecture. Features

ECMWF HPC Workshop: Accelerating Data Management

Data Grids. Lidan Wang April 5, 2007

UNISOL SysAdmin. SysAdmin helps systems administrators manage their UNIX systems and networks more effectively.

Data Management Resources at UNC: The Carolina Digital Repository and Dataverse Network

DDN updates object storage platform as it aims to break out of HPC niche

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers

The software platform for storing, preserving and sharing very large data sets.

The THREDDS Data Repository: for Long Term Data Storage and Access

Enterprise Digital Identity Architecture Roadmap

Long term retention and archiving the challenges and the solution

ETERNUS CS High End Unified Data Protection

EMC BACKUP MEETS BIG DATA

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)

Redefining Oracle Database Management

File Services. File Services at a Glance

WHAT S NEW WITH EMC NETWORKER

Globus and the Centralized Research Data Infrastructure at CU Boulder

A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise

WOS. High Performance Object Storage

UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure

Project Number: Project Title: Human Brain Project. HBP_SP13_EPFL_ _D13.3.2_Final.docx

Open Source Backup with Amanda

Cisco UCS Central Software

Transcription:

Data Management using irods Fundamentals of Data Management September 2014 Albert Heyrovsky Applications Developer, EPCC a.heyrovsky@epcc.ed.ac.uk

2 Course outline Why talk about irods? What is irods? The main features of irods What can irods do? Who uses irods? After completing this lesson, you should: Have an overview of irods Know what irods can be used for

3 Why talk about irods? It is a data management system widely used by many organizations worldwide (including EPCC) It is open source software It is being actively developed and supported It is free

4 What is irods? irods stands for Integrated Rule-Oriented Data System It is an open-source data grid middleware As per Wikipedia (http://en.wikipedia.org/wiki/data_grid): A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. It is developed and supported by the irods Consortium

5 The main features of irods Supports large numbers of users (1000s) and user groups in a single data grid Supports heterogeneous data storage resources, e.g.: Unix File Systems Amazon S3 buckets DataDirect Networks (DDN) Web Object Scaler (WOS) appliances High Performance Storage System (HPSS) data stores And other storage resources, more are being developed Files stored in these heterogeneous storage resources are exposed to users in a single unified namespace

The main features of irods 6 irods Unified Virtual Collection irods View of Distributed Data User Client User sees a single collection My Data: disk, filesystem, site- specific storage,... My Data: tape, database, filesystem,... Partner s Data remote disk, tape, filesystem, site- specific storage, irods installs over heterogeneous data resources Access and manage distributed data as a single collection

7 The main features of irods Handles big data (petabytes) A high-performance network data transfer protocol Parallel I/O for large files Comparable to GridFTP A metadata catalogue named icat Stores system metadata and user-defined metadata Manages access control Manages mappings between logical and physical name spaces And some other services Easy backup and replication to multiple storage devices and locations

8 The main features of irods Security - Authentication irods usernames / passwords Supports Pluggable Authentication Modules (PAM) can use an LDAP authentication server Grid Security Infrastructure (GSI) provides authentication using X.509 digital certificates Kerberos Shibboleth

9 The main features of irods A Rule Engine Enables automation of data operations, e.g. Validating file checksums, backing up files, archiving unused data, logging data operations, file access permissions, etc. Implements / enforces data management policies, e.g. Records retention and privacy protection policies Audit trails to verify compliance with policies Enables rule-based workflows Data grid federation Independent data grids can be federated with one another to allow controlled access to remote grids operated by separate workgroups

10 The main features of irods irods Client Applications and APIs More than 50 Command line clients (e.g. irods i-commands) Web clients (e.g. idrop Web) idrop Desktop a desktop GUI client PyRods a Python client API to irods Jargon a Java client API to irods Prods a PHP client API to irods Custom clients

A RENCI Data Grid 11

12 What can irods do? For Data Centre Managers it simplifies data grid management For Users it simplifies data discovery, data validation and data processing Data Preservation Digital Archives Data Maintenance Data Sharing and Access Policy Enforcement Data Protection and Security Data Curation Digital Libraries Automated Data Processing Distributed Data Management

13 Who uses irods? Science and Engineering Domains, e.g. Astrophysics Auger supernova search Atmospheric science NASA Langley Atmospheric Sciences Center Biology Phylogenetics at CC IN2P3 Climate NOAA National Climatic Data Center Cognitive Science Temporal Dynamics of Learning Center Computer Science GENI experimental network Cosmic Ray AMS experiment on the International Space Station Dark Matter Physics Edelweiss II Earth Science NASA Center for Climate Simulations Ecology CEED Caveat Emptor Ecological Data Engineering CIBER-U High Energy Physics BaBar / Stanford Linear Accelerator Hydrology Institute for the Environment, UNC-CH; Hydroshare Genomics Broad Institute, Wellcome Trust Sanger Institute, NGS Medicine Sick Kids Hospital Neuroscience International Neuroinformatics Coordinating Facility Neutrino Physics T2K and dchooz neutrino experiments Oceanography Ocean Observatories Initiative Optical Astronomy National Optical Astronomy Observatory Particle Physics Indra multi-detector collaboration at IN2P3 Plant genetics the iplant Collaborative

14 Who uses irods? Science and Engineering Domains, e.g. Quantum Chromodynamics IN2P3 Radio Astronomy Cyber Square Kilometer Array, TREND, BAOradio Seismology Southern California Earthquake Center Social Science Odum, TerraPop Arts and Humanities Domains, e.g. Digital Library French National Library, Texas Digital Libraries Indexing Cheshire Institutional repository Carolina Digital Repository Preservation Adonis Reference collections SILS LifeTime Library Commercial Users, e.g. DOW Chemical Beijing Genome Institute and many others, e.g. the cross-domain European Data Infrastructure (EUDAT) consortium

15 Summary irods is a data grid management system It is scalable It can manage millions of files and millions of metadata annotations totalling petabytes of data It can support thousands of users It is widely used by many organizations There are other data grid management systems with similar features, e.g. DSpace Fedora Commons

Acknowledgements 16 Thanks to the irods Consortium for providing materials for this lecture.