Data Management using iRODS
Fundamentals of Data Management, September 2014
Albert Heyrovsky, Applications Developer, EPCC

Course outline
- Why talk about iRODS?
- What is iRODS?
- The main features of iRODS
- What can iRODS do?
- Who uses iRODS?

After completing this lesson, you should:
- Have an overview of iRODS
- Know what iRODS can be used for

Why talk about iRODS?
- It is a data management system used by many organizations worldwide (including EPCC)
- It is open source software
- It is being actively developed and supported
- It is free

What is iRODS?
- iRODS stands for Integrated Rule-Oriented Data System
- It is open-source data grid middleware
- As per Wikipedia, a data grid is "an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes"
- It is developed and supported by the iRODS Consortium

The main features of iRODS
- Supports large numbers of users (1000s) and user groups in a single data grid
- Supports heterogeneous data storage resources, e.g.:
  - Unix file systems
  - Amazon S3 buckets
  - DataDirect Networks (DDN) Web Object Scaler (WOS) appliances
  - High Performance Storage System (HPSS) data stores
  - Other storage resources, with more being developed
- Files stored in these heterogeneous storage resources are exposed to users in a single unified namespace

The main features of iRODS
[Figure: iRODS Unified Virtual Collection. iRODS view of distributed data]
- The user, through a client, sees a single collection
- My Data: disk, filesystem, site-specific storage, ...
- My Data: tape, database, filesystem, ...
- Partner's Data: remote disk, tape, filesystem, site-specific storage, ...
- iRODS installs over heterogeneous data resources
- Access and manage distributed data as a single collection
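
To make the idea of a single unified namespace concrete, here is a minimal Python sketch. It is a toy model only, not iRODS code or its API: the class names, the zone name "demoZone", the resource names and the physical paths are all hypothetical and chosen purely for illustration. It shows one logical collection whose files actually live on different kinds of storage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a file (hypothetical model, not an iRODS type)."""
    resource: str       # illustrative resource name, e.g. "unix_fs_site_a"
    physical_path: str  # where the bytes live on that resource


@dataclass
class Namespace:
    """Toy catalogue mapping logical paths to their physical replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, replica):
        # A logical path may have several replicas on different resources.
        self.entries.setdefault(logical_path, []).append(replica)

    def list_collection(self, collection):
        # Users browse logical paths only; storage details stay hidden.
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self.entries if p.startswith(prefix))

    def resolve(self, logical_path):
        # The system (not the user) looks up where the data really is.
        return list(self.entries[logical_path])


ns = Namespace()
ns.register("/demoZone/home/alice/run1.dat",
            Replica("unix_fs_site_a", "/data/vault/a1/run1.dat"))
ns.register("/demoZone/home/alice/run1.dat",
            Replica("s3_bucket", "s3://example-bucket/run1.dat"))
ns.register("/demoZone/home/alice/notes.txt",
            Replica("tape_site_b", "/hpss/archive/0042/notes.txt"))

listing = ns.list_collection("/demoZone/home/alice")
replicas = ns.resolve("/demoZone/home/alice/run1.dat")
print(listing)
print([r.resource for r in replicas])
```

The listing contains only logical paths, even though the two files sit on a Unix file system, an object store and a tape archive. That separation is the point of the unified namespace.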

The main features of iRODS
- Handles big data (petabytes)
- A high-performance network data transfer protocol
  - Parallel I/O for large files
  - Comparable to GridFTP
- A metadata catalogue named iCAT
  - Stores system metadata and user-defined metadata
  - Manages access control
  - Manages mappings between logical and physical name spaces
  - And some other services
- Easy backup and replication to multiple storage devices and locations
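
The role of a metadata catalogue can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the iCAT schema or any iRODS interface: the `Catalogue` class, the permission names and the example paths are hypothetical. It shows user-defined metadata and access control stored side by side, so that a metadata search only returns what the requesting user is allowed to read.

```python
from collections import defaultdict

READ_LEVELS = ("read", "write", "own")  # illustrative permission names


class Catalogue:
    """Toy metadata catalogue: annotations plus per-user permissions."""

    def __init__(self):
        self.metadata = defaultdict(dict)  # logical path -> {attribute: value}
        self.acl = defaultdict(dict)       # logical path -> {user: permission}

    def annotate(self, path, attribute, value):
        self.metadata[path][attribute] = value

    def grant(self, path, user, permission):
        if permission not in READ_LEVELS:
            raise ValueError(f"unknown permission: {permission}")
        self.acl[path][user] = permission

    def can_read(self, path, user):
        return self.acl.get(path, {}).get(user) in READ_LEVELS

    def search(self, user, attribute, value):
        # Match on metadata first, then filter by access control.
        return sorted(
            path
            for path, annotations in self.metadata.items()
            if annotations.get(attribute) == value and self.can_read(path, user)
        )


cat = Catalogue()
alice_file = "/demoZone/home/alice/run1.dat"
bob_file = "/demoZone/home/bob/run2.dat"

cat.annotate(alice_file, "instrument", "telescope_a")
cat.annotate(bob_file, "instrument", "telescope_a")
cat.grant(alice_file, "alice", "own")
cat.grant(bob_file, "bob", "own")
cat.grant(bob_file, "alice", "read")  # bob shares his file with alice

alice_hits = cat.search("alice", "instrument", "telescope_a")
bob_hits = cat.search("bob", "instrument", "telescope_a")
print(alice_hits)
print(bob_hits)
```

Keeping permissions in the same catalogue as the metadata means a query never has to touch the storage resources to decide what a user may see.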

The main features of iRODS
Security - Authentication
- iRODS usernames / passwords
- Supports Pluggable Authentication Modules (PAM); can use an LDAP authentication server
- Grid Security Infrastructure (GSI); provides authentication using X.509 digital certificates
- Kerberos
- Shibboleth
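
Supporting several authentication schemes usually comes down to a common interface with interchangeable backends. The Python sketch below illustrates that pattern only. It does not reproduce how iRODS, PAM, GSI, Kerberos or Shibboleth work: the class names, scheme labels and the token-based stand-in for an external service are assumptions made for the example.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    # Salted, iterated hash so plain-text passwords are never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class PasswordBackend:
    """Locally stored usernames and passwords (hypothetical 'native' scheme)."""
    name = "native"

    def __init__(self):
        self._users = {}

    def add_user(self, user, password):
        salt = os.urandom(16)
        self._users[user] = (salt, hash_password(password, salt))

    def authenticate(self, user, secret):
        if user not in self._users:
            return False
        salt, expected = self._users[user]
        return hmac.compare_digest(expected, hash_password(secret, salt))


class StaticTokenBackend:
    """Stand-in for an external service such as a directory server."""
    name = "external"

    def __init__(self, tokens):
        self._tokens = dict(tokens)

    def authenticate(self, user, secret):
        return hmac.compare_digest(self._tokens.get(user, ""), secret)


def login(backends, scheme, user, secret):
    # Dispatch to whichever backend implements the requested scheme.
    for backend in backends:
        if backend.name == scheme:
            return backend.authenticate(user, secret)
    raise ValueError(f"unknown authentication scheme: {scheme}")


native = PasswordBackend()
native.add_user("alice", "s3cret")
backends = [native, StaticTokenBackend({"bob": "token-123"})]

print(login(backends, "native", "alice", "s3cret"))
print(login(backends, "native", "alice", "wrong"))
print(login(backends, "external", "bob", "token-123"))
```

Because `login` depends only on the `name` attribute and the `authenticate` method, a new scheme can be added without changing the calling code.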

The main features of iRODS
- A Rule Engine
  - Enables automation of data operations, e.g. validating file checksums, backing up files, archiving unused data, logging data operations, setting file access permissions, etc.
  - Implements / enforces data management policies, e.g.
    - Records retention and privacy protection policies
    - Audit trails to verify compliance with policies
  - Enables rule-based workflows
- Data grid federation
  - Independent data grids can be federated with one another to allow controlled access to remote grids operated by separate workgroups
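
The rule-engine idea, where policies are attached to events and run automatically, can be sketched in plain Python. This is a toy model under stated assumptions: it is neither the iRODS rule language nor its API, and the event names "after_put" and "before_get", the `RuleEngine` class and the example path are hypothetical. It shows two of the operations listed above, checksum validation and an audit trail.

```python
import hashlib
from collections import defaultdict


class RuleEngine:
    """Toy event-driven rule engine with an audit log."""

    def __init__(self):
        self.rules = defaultdict(list)  # event name -> list of rule functions
        self.audit_log = []             # (event, rule name, outcome) tuples

    def on(self, event):
        # Decorator that registers a function as a rule for an event.
        def decorator(func):
            self.rules[event].append(func)
            return func
        return decorator

    def fire(self, event, **context):
        # Run every rule attached to the event and record what happened.
        for rule in self.rules[event]:
            outcome = rule(**context)
            self.audit_log.append((event, rule.__name__, outcome))


engine = RuleEngine()
checksums = {}  # logical path -> checksum recorded at upload time


@engine.on("after_put")
def record_checksum(path, data):
    checksums[path] = hashlib.sha256(data).hexdigest()
    return "checksum recorded"


@engine.on("before_get")
def validate_checksum(path, data):
    matches = checksums.get(path) == hashlib.sha256(data).hexdigest()
    return "valid" if matches else "CORRUPT"


path = "/demoZone/home/alice/run1.dat"
engine.fire("after_put", path=path, data=b"payload")
engine.fire("before_get", path=path, data=b"payload")  # intact copy
engine.fire("before_get", path=path, data=b"pay1oad")  # damaged copy

for entry in engine.audit_log:
    print(entry)
```

The user never calls the checksum code directly. The policy runs because the event fired, and the audit log makes it possible to verify afterwards that it did.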

The main features of iRODS
- iRODS Client Applications and APIs
  - More than 50
  - Command line clients (e.g. iRODS i-commands)
  - Web clients (e.g. iDrop Web)
  - iDrop Desktop: a desktop GUI client
  - PyRods: a Python client API to iRODS
  - Jargon: a Java client API to iRODS
  - Prods: a PHP client API to iRODS
  - Custom clients

[Figure: A RENCI Data Grid]

What can iRODS do?
- For Data Centre Managers it simplifies data grid management
- For Users it simplifies data discovery, data validation and data processing

Application areas:
- Data Preservation
- Digital Archives
- Data Maintenance
- Data Sharing and Access
- Policy Enforcement
- Data Protection and Security
- Data Curation
- Digital Libraries
- Automated Data Processing
- Distributed Data Management

Who uses iRODS?
Science and Engineering Domains, e.g.
- Astrophysics: Auger supernova search
- Atmospheric science: NASA Langley Atmospheric Sciences Center
- Biology: Phylogenetics at CC IN2P3
- Climate: NOAA National Climatic Data Center
- Cognitive Science: Temporal Dynamics of Learning Center
- Computer Science: GENI experimental network
- Cosmic Ray: AMS experiment on the International Space Station
- Dark Matter Physics: Edelweiss II
- Earth Science: NASA Center for Climate Simulations
- Ecology: CEED Caveat Emptor Ecological Data
- Engineering: CIBER-U
- High Energy Physics: BaBar / Stanford Linear Accelerator
- Hydrology: Institute for the Environment, UNC-CH; Hydroshare
- Genomics: Broad Institute, Wellcome Trust Sanger Institute, NGS
- Medicine: Sick Kids Hospital
- Neuroscience: International Neuroinformatics Coordinating Facility
- Neutrino Physics: T2K and dChooz neutrino experiments
- Oceanography: Ocean Observatories Initiative
- Optical Astronomy: National Optical Astronomy Observatory
- Particle Physics: Indra multi-detector collaboration at IN2P3
- Plant genetics: the iPlant Collaborative

Who uses iRODS?
Science and Engineering Domains, e.g.
- Quantum Chromodynamics: IN2P3
- Radio Astronomy: Cyber Square Kilometer Array, TREND, BAOradio
- Seismology: Southern California Earthquake Center
- Social Science: Odum, TerraPop

Arts and Humanities Domains, e.g.
- Digital Library: French National Library, Texas Digital Libraries
- Indexing: Cheshire
- Institutional repository: Carolina Digital Repository
- Preservation: Adonis
- Reference collections: SILS LifeTime Library

Commercial Users, e.g.
- DOW Chemical
- Beijing Genome Institute

And many others, e.g. the cross-domain European Data Infrastructure (EUDAT) consortium

Summary
- iRODS is a data grid management system
- It is scalable
  - It can manage millions of files and millions of metadata annotations totalling petabytes of data
  - It can support thousands of users
- It is widely used by many organizations
- There are other data grid management systems with similar features, e.g.
  - DSpace
  - Fedora Commons

Acknowledgements
Thanks to the iRODS Consortium for providing materials for this lecture.
