iRODS Policy-Driven Data Preservation Integrating Cloud Storage and Institutional Repositories
1 iRODS Policy-Driven Data Preservation Integrating Cloud Storage and Institutional Repositories. Reagan W. Moore, Arcot Rajasekar, Mike Wan. http://irods.diceresearch.org
2 Abstract. The integrated Rule-Oriented Data System (iRODS) organizes distributed data into a sharable collection. The data may reside in cloud storage, in institutional repositories, in tape archives, or in laptop file systems. We will demonstrate the enforcement of management policies across the multiple storage locations, access mechanisms ranging from web browsers to Fedora to WebDAV to EnginFrame interfaces, and types of assertions that can be made on data in cloud storage.
3 Projects. TPAP (Transcontinental Persistent Archive Prototype): preservation research in collaboration with NARA. DCAPE (Distributed Custodial Archival Preservation Environment): collaboration with State Archives to build preservation policies. PoDRI (Policy Driven Repository Interoperability): IMLS collaboration with DuraSpace to integrate Fedora / iRODS. OOI (Ocean Observatories Initiative): NSF collaboration to archive sensor data. CDR (Carolina Digital Repository): UNC-CH collaboration to build an institutional repository. TUCASI (Triangle Universities Center for Advanced Studies, Inc.): collaboration to build cyberinfrastructure between UNC, Duke, NCSU, RENCI.
4 Overview of iRODS Architecture. Overview of iRODS Data System: user can search, access, add and manage data and metadata. iRODS Data Server: disk, tape, etc. iRODS Server. iRODS Rule Engine: track policies. iRODS Metadata Catalog: track information. *Access data with Web-based browser, iRODS GUI, or command-line clients.
5 Integrated Rule-Oriented Data System. Software to organize distributed data into a sharable collection, while enforcing management policies. Manage properties of the shared collection. Developed by DICE Center: University of North Carolina at Chapel Hill (6 staff, 4 students); University of California, San Diego (5 staff). Funded by: NSF OCI, NARA Transcontinental Persistent Archives Prototype; NSF SDCI, Data Grids for Community Driven Applications.
6 Preservation Concept. Maintain properties of records while utilizing new storage systems. Records are permanent. Federate storage across cloud, archives, file systems. Implications: infrastructure independence; active management of records; multiple indirection mechanisms to protect records from dependencies upon technology; validation of state information and parsing of audit trails to verify assertions about records.
7 Preservation Data Virtualization. Diagram labels: Access Interface; Standard Micro-services; Data Grid; Standard Operations; Storage Protocol; Storage System. Map from tasks requested by the access method to a standard set of micro-services. The standard micro-services are mapped to standard operations. Operations are mapped to the protocol of the storage system. View clients and storage as ephemeral.
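The three mappings on this slide can be sketched as lookup tables that are composed in order. This is a minimal illustration, not iRODS code: the task names, micro-service names, operation names, and protocol labels below are all hypothetical and were chosen only to show how a request passes through each layer before reaching a particular storage system.

```python
# Illustrative sketch only: every name below is hypothetical, not an iRODS API.

# Layer 1: a task requested by an access method maps to standard micro-services.
TASK_TO_MICROSERVICES = {
    "upload": ["create_object", "register_metadata"],
    "download": ["open_object", "read_object"],
}

# Layer 2: each micro-service maps to a sequence of standard operations.
MICROSERVICE_TO_OPERATIONS = {
    "create_object": ["create", "write", "close"],
    "register_metadata": [],  # catalog-only step, touches no storage
    "open_object": ["open"],
    "read_object": ["read", "close"],
}

# Layer 3: each storage system translates standard operations into its protocol.
# The protocol labels are placeholders, not real driver calls.
STORAGE_PROTOCOLS = {
    "unix_fs": {"create": "creat", "open": "open", "read": "read",
                "write": "write", "close": "close"},
    "cloud": {"create": "PUT", "open": "HEAD", "read": "GET",
              "write": "PUT", "close": "noop"},
}


def plan(task, storage):
    """Return the protocol-level calls a task would issue on one storage system."""
    protocol = STORAGE_PROTOCOLS[storage]
    calls = []
    for microservice in TASK_TO_MICROSERVICES[task]:
        for operation in MICROSERVICE_TO_OPERATIONS[microservice]:
            calls.append(protocol[operation])
    return calls
```

Because the client only names the task, swapping `"unix_fs"` for `"cloud"` changes the final protocol calls without changing anything the client sees, which is the sense in which clients and storage can both be treated as ephemeral.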
8 Policy-based Record Management. Express policies as computer actionable rules. Define explicit locations in data management framework where policies will be enforced. Express procedures as remotely executable micro-services. Create new preservation services by linking micro-services into a workflow. Manage state information to track the application of preservation procedures. Maintain audit trails to track evolution of policies and procedures.
9 iput With Replication. [Diagram labels: Client; iput; data; Institutional Repository (/<filesystem>); iCAT metadata; Rule Base; Cloud Storage with Cache and Rule Base.] Replication rule added to rule base.
10 Policies Implemented Outside of the (Cloud) Storage. Automation of preservation procedures: descriptive metadata extraction; creation of archival form (AIP); transformative migration. Automation of administrative functions: distribution, replication, retention, disposition. Report generation: usage, performance, error tracking. Periodic validation of assessment criteria: trustworthiness, integrity, authenticity, chain of custody. Parsing of audit trails for policy compliance over time.
11 Management Framework. Instrument data management infrastructure at all locations where policy should be enforced (65 hooks). File: create, open, read, write, delete. Collection: create, delete. User: create, modify, delete, group. Resource: create, modify, delete, group. Metadata: file modify, collection modify, descriptive. ACL: modify. Support pre-processing policy: authorization, selection, redirection. Support post-processing policy: audit trails, redaction, derived product generation.
12 iRODS Distributed Data Management
13 Types of Rules. Synchronous rules applied by the framework at management hook locations: stored in the rule base (core.irb file). Asynchronous rules that are queued for deferred or periodic execution: a batch system manages the queue. Interactively executed rules defined by a user: executed through the irule command.
14 iRODS Rules. Server-side workflows: Action | condition | workflow chain | recovery chain. Condition: test on any attribute (collection, file name, storage system, file type, user group, elapsed time, IRB approval flag, descriptive metadata). Workflow chain: micro-services / rules that are executed at the storage system. Recovery chain: micro-services / rules that are used to recover from errors.
15 Checksum Validation Rule
myChecksumRule {
  msiMakeQuery("DATA_NAME, COLL_NAME, DATA_CHECKSUM",*Condition,*Query);
  msiExecStrCondQuery(*Query,*B);
  assign(*A,0);
  forEachExec (*B) {
    msiGetValByKey(*B,COLL_NAME,*C);
    msiGetValByKey(*B,DATA_NAME,*D);
    msiGetValByKey(*B,DATA_CHECKSUM,*E);
    msiDataObjChksum(*B,*Operation,*F);
    ifExec (*E != *F) {
      writeLine(stdout,file *C/*D has registered checksum *E and computed checksum *F);
    }
    else {
      assign(*A,*A + 1);
    }
  }
  ifExec(*A > 0) {
    writeLine(stdout, have *A good files);
  }
}
*Condition can be COLL_NAME like /ils161/home/moore/genealogy/%
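The same validation loop can be sketched outside the rule language. The Python below is a stand-in for illustration, not a port of the micro-services: `registered` plays the role of the catalog query result, `read_object` is a hypothetical callable that returns an object's bytes, and SHA-256 is an assumed digest (the slide does not say which algorithm the catalog stores).

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def compute_checksum(data: bytes) -> str:
    """Hex digest of the object's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def validate_checksums(
    registered: Dict[str, str],
    read_object: Callable[[str], bytes],
) -> Tuple[int, List[str]]:
    """Compare each registered checksum with a freshly computed one.

    Returns the number of good files and one report line per mismatch,
    mirroring the counter and the writeLine message in the rule.
    """
    good = 0
    reports: List[str] = []
    for path, expected in registered.items():
        actual = compute_checksum(read_object(path))
        if actual != expected:
            reports.append(
                f"file {path} has registered checksum {expected} "
                f"and computed checksum {actual}"
            )
        else:
            good += 1
    return good, reports
```

A dictionary of path to bytes is enough to try it: pass `store.__getitem__` as `read_object` and a `registered` mapping with one deliberately wrong digest to see a mismatch reported.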
16 Highly Extensible. To integrate cloud storage: wrote an S3 driver to execute the cloud storage protocol; implemented cloud storage as a compound resource; added disk cache in front of the cloud storage to enable enforcement of policies across all cloud accesses. All iRODS clients can then be used to access data stored in the cloud: Web browser, WebDAV, FUSE, Kepler & Taverna workflows, Unix shell commands, C libraries, EnginFrame portal. Can write policies to distribute data between cloud, file systems, archives, laptops. Can federate other data grids with the data grid that uses the cloud storage.
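The compound-resource idea, a disk cache placed in front of cloud storage so that every access passes a policy enforcement point, can be sketched as follows. This is a toy model under loud assumptions: both tiers are in-memory dictionaries, `CompoundResource` and its methods are hypothetical names, and no real S3 or iRODS call is made.

```python
from typing import Callable, Dict, Iterable, List

# A policy receives the operation name and the object name (hypothetical shape).
Policy = Callable[[str, str], None]


class CompoundResource:
    """Toy model: a cache tier in front of a cloud tier, with policy hooks."""

    def __init__(self, policies: Iterable[Policy]) -> None:
        self.cache: Dict[str, bytes] = {}  # stands in for the disk cache
        self.cloud: Dict[str, bytes] = {}  # stands in for the cloud store
        self.policies: List[Policy] = list(policies)

    def _enforce(self, operation: str, name: str) -> None:
        # Every access goes through here, so no request bypasses policy.
        for policy in self.policies:
            policy(operation, name)

    def put(self, name: str, data: bytes) -> None:
        self._enforce("put", name)
        self.cache[name] = data  # write lands in the cache first
        self.cloud[name] = data  # then the cache copy is synced to the cloud

    def get(self, name: str) -> bytes:
        self._enforce("get", name)
        if name not in self.cache:
            self.cache[name] = self.cloud[name]  # stage from cloud into cache
        return self.cache[name]

    def evict(self, name: str) -> None:
        """Drop the cache copy; the cloud copy remains."""
        self.cache.pop(name, None)
```

With a policy that appends `(operation, name)` to a list, a `put`, `evict`, `get` sequence shows the object being staged back into the cache and both accesses appearing in the audit list.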
17 NARA Transcontinental Persistent Archive Prototype. Use data grid technology to build a preservation environment. Conduct research on preservation concepts: infrastructure independence; enforcement of preservation properties; automation of administrative preservation processes; validation of preservation assessment criteria. Demonstrate preservation on selected NARA digital holdings. Integration of generic infrastructure with preservation technologies (Cheshire, MVD, JHOVE, PRONOM, Fedora, DSpace).
18 National Archives and Records Administration Transcontinental Persistent Archive Prototype. Federation of Seven Independent Data Grids: NARA I, NARA II, Georgia Tech, Rocket Center, U NC, U Md, UCSD, each with its own MCAT. Extensible environment; can federate with additional research and education sites. Each data grid uses different vendor products.
19 iRODS is a "coordinated NSF/OCI Nat'l Archives research activity" under the auspices of the President's NITRD Program. Reagan W. Moore [email protected] http://irods.diceresearch.org. NSF OCI, NARA Transcontinental Persistent Archives Prototype; NSF SDCI, Data Grids for Community Driven Applications.
