iRODS in complying with Public Research Policy
1 iRODS User Group 2015: iRODS in complying with Public Research Policy
Vic Cornell, Senior Storage Consultant
2 Overview
- Compliance overview
- UK examples
- Imperial College MedBio
- Requirements
- Architecture
- iRODS integration
- iRODS capabilities
- Proposed workflow
- Challenges and unknowns
3 From: EPSRC Data Management Policy
"Research organisations will ensure that appropriately structured metadata describing the research data they hold is published (normally within 12 months of the data being generated) and made freely accessible on the internet. Where the research data referred to in the metadata is a digital object, it is expected that the metadata will include use of a robust digital object identifier (for example, as available through the DataCite organisation)."
"Research organisations will ensure that EPSRC-funded research data is securely preserved for a minimum of 10 years from the date that any researcher privileged access period expires or, if others have accessed the data, from last date on which access to the data was requested by a third party."
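The two time limits in the EPSRC text translate into concrete dates. The sketch below is a minimal illustration, assuming one literal reading of the policy: retention runs from the last third-party access request if there has been one, otherwise from the end of the privileged access period. The function names are hypothetical and are not part of iRODS or any EPSRC tooling.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February does not exist in the target year
        return d.replace(year=d.year + years, day=28)


def metadata_publication_deadline(generated: date) -> date:
    # "normally within 12 months of the data being generated"
    return add_years(generated, 1)


def retain_until(privileged_access_expires: date,
                 last_third_party_request: Optional[date] = None) -> date:
    # Assumed reading: minimum 10 years, counted from the last third-party
    # access request if there has been one, otherwise from the end of the
    # privileged access period.
    start = (last_third_party_request
             if last_third_party_request is not None
             else privileged_access_expires)
    return add_years(start, 10)
```

For example, privileged access ending on 30 June 2016 with no third-party requests gives a retain-until date of 30 June 2026; a third-party request on 1 March 2019 moves it to 1 March 2029.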
4 MRC (Medical Research Council, UK)
From the MRC research data policy:
- All research must have a Data Management Plan, which covers:
  - managing, storing and curating data
  - metadata standards and data documentation
  - data preservation strategy and standards
- These must all be specified and adhered to.
5 Imperial College MedBio
The Imperial College Bioinformatics Support is part of the Imperial College Centre for Integrative Systems Biology and Bioinformatics. The mission of the Imperial College Centre for Bioinformatics is to promote and co-ordinate world-class research and training in bioinformatics within Imperial College, and to provide state-of-the-art bioinformatics support to members of Imperial College for their research.
6 Imperial College MedBio
- Conduct a large number of studies in systems biology and bioinformatics.
- Range of data sources:
  - Internal
    - Next-generation sequencers
    - Very high-resolution microscopes
  - Big data
    - Phenome study systems produce 7 GB of data every 15 minutes.
    - They have 10 of them, and they run for 2 weeks/month.
    - Maybe ½ PB/year?
  - External datasets
- Staff!
  - Current staff are overloaded with IT tasks and don't have time to embrace new methods.
  - Too many workflows to follow.
  - More staff are being recruited, but it's hard to find people with the right skills.
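A back-of-the-envelope check of the phenome figures, assuming (our assumption, not the slide's) that all ten systems acquire continuously through their two run weeks and using decimal units, gives roughly 1.1 PB a year. The slide's ½ PB estimate would then correspond to the systems acquiring a little under half the time, or to some reduction the slide does not describe.

```python
# Inputs from the slide; continuous acquisition during run weeks is an assumption.
GB_PER_INTERVAL = 7                # GB produced by one system every 15 minutes
INTERVALS_PER_DAY = 24 * 60 // 15  # 96 fifteen-minute intervals per day
SYSTEMS = 10
RUN_DAYS_PER_MONTH = 14            # "2 weeks/month"
MONTHS_PER_YEAR = 12


def annual_volume_pb(duty_cycle: float = 1.0) -> float:
    """Yearly output in decimal petabytes (1 PB = 1,000,000 GB).

    duty_cycle is the fraction of each run day the systems actually
    acquire data; 1.0 means nonstop acquisition for the whole two weeks.
    """
    gb_per_year = (GB_PER_INTERVAL * INTERVALS_PER_DAY * SYSTEMS
                   * RUN_DAYS_PER_MONTH * MONTHS_PER_YEAR * duty_cycle)
    return gb_per_year / 1_000_000


if __name__ == "__main__":
    print(f"{annual_volume_pb():.2f} PB/year at full duty cycle")   # 1.13
    print(f"{annual_volume_pb(0.5):.2f} PB/year at 50% duty cycle")  # 0.56
```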
7 Storage Solution
South Kensington:
- DDN SFA 12K, … TB disks
- 7 x WOS, … TB disks
- 1 x tape library
- DDN WOS Bridge
- GridScaler (GPFS)
- GridNAS
- TSM with Space Manager for GPFS
Infinity, Slough:
- DDN SFA …, … TB disks
- 7 x WOS, … TB disks
- 1 x tape library
- DDN WOS Bridge
- GridScaler (GPFS)
- GridNAS
- TSM with Space Manager for GPFS
8 MedBio Architecture
[Figure: MedBio storage architecture. Components shown, in mirrored pairs: Tier 1 NAS pools (NFS/CIFS); Tier 2 WOS core object stores with WOS replication, WOS Bridge, GPFS filesystems and Tier 2 cache; client nodes; Tier 3 cache, TSM with its TSM database, block storage (GPFS), NAS and tape libraries.]
DataDirect Networks. All Rights Reserved.
9 MedBio POC iRODS Architecture
[Figure: Imperial MedBIO architecture as on slide 8, with an iRODS proof of concept added: iRODS servers on GPFS, iRODS databases and iRODS pools, placed alongside the Tier 3 block storage, TSM and tape components.]
10 Example Imperial MedBIO Workflow
1. Load data from the sequencer into Tier 2.
2. Associate metadata from the sequencer.
3. Make the data immutable.
4. Add in data from the LIMS (Laboratory Information Management System).
5. Register the data with iRODS.
6. Publish the data via AIMS (Academic Information Management System).
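The order of the steps matters: the content is frozen before the LIMS metadata arrives, so immutability has to apply to the data, not to its metadata. The sketch below models that ordering in plain Python. Every class and function name is hypothetical, nothing here calls iRODS, and the rule that only frozen data may be registered is our assumption rather than something the slide states.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RunDataSet:
    """Hypothetical in-memory stand-in for one sequencer run (not an iRODS object)."""
    tier2_path: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)
    immutable: bool = False
    irods_path: Optional[str] = None
    published: bool = False

    def overwrite(self, content: bytes) -> None:
        if self.immutable:
            raise PermissionError(f"{self.tier2_path} is immutable")
        self.content = content


def load_from_sequencer(tier2_path: str, content: bytes,
                        sequencer_metadata: Dict[str, str]) -> RunDataSet:
    ds = RunDataSet(tier2_path, content)    # step 1: land the run on Tier 2
    ds.metadata.update(sequencer_metadata)  # step 2: sequencer metadata
    return ds


def make_immutable(ds: RunDataSet) -> None:
    ds.immutable = True                     # step 3: content frozen from here on


def add_lims_metadata(ds: RunDataSet, lims_record: Dict[str, str]) -> None:
    # step 4: metadata can still grow after the content is frozen
    ds.metadata.update({f"lims.{k}": v for k, v in lims_record.items()})


def register(ds: RunDataSet, collection: str) -> str:
    # step 5 (assumed rule): only frozen data enters the catalogue
    if not ds.immutable:
        raise ValueError("register only immutable data")
    name = ds.tier2_path.rsplit("/", 1)[-1]
    ds.irods_path = f"{collection.rstrip('/')}/{name}"
    return ds.irods_path


def publish(ds: RunDataSet) -> Dict[str, str]:
    # step 6: hand a metadata record to the publishing system (AIMS on the slide)
    if ds.irods_path is None:
        raise ValueError("publish only registered data")
    ds.published = True
    return {"path": ds.irods_path, **ds.metadata}
```

In a real deployment, steps 4 and 5 would presumably map onto iRODS operations such as registering the file in place with the `ireg` icommand and attaching attribute-value-unit (AVU) metadata with `imeta`; the AIMS hand-off is left abstract because the slide does not say which system or interface is meant.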
11 iRODS for Compliance?
Good:
- Rule-based engine that associates metadata with data.
- Allows time-based policies for data retention.
- Can execute rules based on complex metadata queries to catch all required cases.
- Policy enforcement points can be used to implement data management and data retention policy.
- Opportunities for data harvesting.
- Once established, the rules can become a matter of record and boilerplate for subsequent projects.
- Still need resilient storage underneath.
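To make "policy enforcement point" and "time-based retention policy" concrete, the sketch below models a catalogue of objects carrying attribute-value metadata, a pre-delete hook that vetoes deletion until a `retain_until` date has passed, and a metadata query. It is plain Python under stated assumptions: in iRODS the equivalent logic would live in rules attached to the server's enforcement points, and the attribute names (`retain_until`, `project`) and helper functions are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List

Metadata = Dict[str, str]            # attribute -> value, loosely like iRODS AVUs
Catalog = Dict[str, Metadata]        # logical path -> metadata
Hook = Callable[[str, Metadata, date], None]

PRE_DELETE_HOOKS: List[Hook] = []


def pre_delete(hook: Hook) -> Hook:
    """Register a hook that runs before every delete; raising vetoes it."""
    PRE_DELETE_HOOKS.append(hook)
    return hook


@pre_delete
def enforce_retention(path: str, meta: Metadata, today: date) -> None:
    # Objects without a retain_until attribute are not protected by this hook.
    retain_until = meta.get("retain_until")
    if retain_until is not None and date.fromisoformat(retain_until) > today:
        raise PermissionError(f"{path}: retained until {retain_until}")


def delete(catalog: Catalog, path: str, today: date) -> None:
    for hook in PRE_DELETE_HOOKS:
        hook(path, catalog[path], today)  # any hook may veto by raising
    del catalog[path]


def query(catalog: Catalog, predicate: Callable[[Metadata], bool]) -> List[str]:
    """Return the sorted paths whose metadata satisfies the predicate."""
    return sorted(p for p, m in catalog.items() if predicate(m))
```

A periodic rule could then call `query` with a predicate such as "`retain_until` is on or before today" to find everything that has become eligible for disposal, while the hook guarantees that nothing earlier can be removed by accident.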
12 iRODS for Compliance?
Questions:
- Can iRODS make data *really* immutable? At what level is this best done?
- End-to-end metadata harvesting: can it manage an integration with the LIMS?
  - Chain of custody
  - Chain of provenance
- Publishing: integration with AIMS?
  - Which one?
- Scale: this is a 7 PB facility, with file counts to match. Can Imperial do this in one zone? If not, how will they manage federation so that it works?
- Database stability, availability and recoverability.
  - Who sets the standards?
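On the immutability and chain-of-custody questions: permissions can stop writes made through iRODS, but they cannot prove that the bytes on the underlying storage were never changed. A periodic fixity check against a checksum recorded at ingest can at least detect changes. The sketch assumes that iRODS stores SHA-256 checksums as `sha2:` followed by the base64-encoded digest; verify this against the deployed version before comparing with catalogue values. The function names are hypothetical.

```python
import base64
import hashlib
import io
from typing import BinaryIO


def irods_style_sha256(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a binary stream, formatted as 'sha2:' + base64 digest (assumed iRODS style)."""
    digest = hashlib.sha256()
    # Read in chunks so large sequencer or microscope files never sit in memory whole.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def fixity_ok(stream: BinaryIO, recorded_checksum: str) -> bool:
    """True if the content still matches the checksum recorded at ingest."""
    return irods_style_sha256(stream) == recorded_checksum


if __name__ == "__main__":
    recorded = irods_style_sha256(io.BytesIO(b"run data"))
    print(fixity_ok(io.BytesIO(b"run data"), recorded))  # True
    print(fixity_ok(io.BytesIO(b"run datA"), recorded))  # False
```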
13 Q&A
14 THANK YOU
15 Links
