Managing Next Generation Sequencing Data with irods
|
|
|
- Myles Campbell
- 10 years ago
- Views:
Transcription
1 Managing Next Generation Sequencing Data with irods Presented by Dan Bedard // at the 9 th International Conference on Genomics Shenzhen, China September 12, 2014 Managing NGS Data with irods // ICG-9 1
2 Background: Problem Statement Next Generation Sequencing (NGS) results in lots of data Several GB/genome, for each processing step. cp feels safer than mv We don t know what we don t know. Managing NGS Data with irods // ICG-9 2
3 Background: Problem Statement Next Generation Sequencing (NGS) results in lots of data Several GB/genome, for each processing step. cp feels safer than mv We don t know what we don t know. We need to store everything. We need to know where it came from. We need to be able to find it. Well we doesn t include everybody. We need to secure it but we still need to share it. Managing NGS Data with irods // ICG-9 3
4 Background: Problem Statement Next Generation Sequencing (NGS) results in lots of data Several GB/genome, for each processing step. cp feels safer than mv We don t know what we don t know. We need to store everything. We need to know where it came from. We need to be able to find it. Well we doesn t include everybody. We need to secure it but we still need to share it. And time is of the essence! Managing NGS Data with irods // ICG-9 4
5 Background How do the world s preeminent bioinformatics centers manage their data? Beijing Genomics Institute (BGI) The Wellcome Trust Sanger Institute (WTSI) The Broad Institute The International Neuroinformatics Coordinating Facility (INCF) The iplant Collaborative UNC Lineberger Comprehensive Cancer Center Uppsala Genome Center Public Health England Life Science Industrial Users Managing NGS Data with irods // ICG-9 5
6 Agenda What is irods? How are People Using It? Reference Implementation for NGS Managing NGS Data with irods // ICG-9 6
7 What is irods? irods is the underlying technology for the world s preeminent genomic research ins:tutes. irods is an infinitely configurable data janitor. irods is the kind of technology you need irods to is open host source everyone s data grid unstructured middleware for data. irods is a Data Discovery powerful data migra:on tool. irods is the technology that Workflow Automation underpins the iplant Data Store. irods is a data preserva:on Secure Collaboration technology. Data Virtualization irods is a fundamental technology for CineGrid. irods is a tool for providing fine- grained privacy and security controls. irods is extensible: irods has command- line clients, APIs for numerous programming languages, and web clients. irods supports new plug- ins for storage resources, authen:ca:on mechanisms, microservices, and network prot Managing NGS Data with irods // ICG-9 7
8
9 What is irods? irods is free the to use underlying technology for the world s free to modify sits between preeminent genomic research ins:tutes. irods is an infinitely free to contribute the files and the user configurable data janitor. irods is the kind of technology you need irods to is open host source everyone s data grid unstructured middleware for data. irods is a Data Discovery powerful data migra:on ß tool. metadata irods is the technology that Workflow Automation ß policies: any condition; any action underpins the iplant Data Store. irods is a data preserva:on Secure Collaboration ß sharing without losing control technology. Data Virtualization irods is a fundamental ß file system flexibility technology for CineGrid. irods is a tool for providing fine- grained privacy and security controls. irods is extensible: irods has command- line clients, APIs for numerous programming languages, and web clients. irods supports new plug- ins for storage resources, authen:ca:on mechanisms, microservices, and network prot ß ß Managing NGS Data with irods // ICG-9 9
10 irods: Ready for Enterprise Product of nearly 20 years of research and development, funded by DARPA, DOE, NASA, NSF, NARA, and NOAA. Starting with irods 4.0, the entire codebase has been reviewed and restructured for enterprise use. Each change is verified with a test case in a continuous integration suite Pre-compiled binary packages are available for several Linux distributions and multiple database management systems. Managing NGS Data with irods // ICG-9 10
11 irods: The irods Consortium Founded to ensure that irods continues to be free open source software. Four levels of membership, with increasing levels of involvement Participation in technical planning and governance Contact, co-marketing, sales support Discretionary staff hours Stakeholders who recognize the value of sustaining irods development. Currently: RENCI The DICE Center DataDirect Networks Seagate The Wellcome Trust Sanger Institute EMC Corporation Additionally, the irods Consortium provides professional integration services, training, and support on a contract basis to irods users. Learn more at irods.org/consortium or contact us at [email protected] Managing NGS Data with irods // ICG-9 11
12 Use Case: Wellcome Trust Sanger Institute Large Scale Genomics Research Sequenced 1/3 of the human genome (largest single contributor) Active cancer, malaria, pathogen, and genomic variation studies All data publicly available through websites, FTP, direct database access, programmatic APIs 2 PB of data managed by irods Managing NGS Data with irods // ICG-9 12
13 Use Case: Wellcome Trust Sanger Institute Using irods for... Data Discovery Metadata for tracking origin and processing history Example attribute fields Users query and access data largely from local compute clusters Users access irods locally via the CLI attribute: library attribute: total_reads attribute: type attribute: lane attribute: is_paired_read attribute: study_accession_number attribute: library_id attribute: sample_accession_number attribute: sample_public_name attribute: manual_qc attribute: tag attribute: sample_common_name attribute: md5 attribute: tag_index attribute: study_title attribute: study_id attribute: reference attribute: sample attribute: target attribute: sample_id attribute: id_run attribute: study attribute: alignment Copyright Wellcome Trust Sanger Ins:tute. Used with permission. Managing NGS Data with irods // ICG-9 13
14 Use Case: Wellcome Trust Sanger Institute Using irods for... Data Virtualiza1on with Workflow Automa1on Seamless data replica:on, automa:c checksumming, policy- based data resource selec:on Oracle RAC Cluster IRODS server IRES servers Data lands by preference on green room storage, when available. Replicated, with checksums, to red room storage. Read access served by both rooms. Integrity verified in flight. SAN attached luns from various vendors Copyright Wellcome Trust Sanger Ins:tute. Used with permission. Managing NGS Data with irods // ICG-9 14
15 Use Case: Wellcome Trust Sanger Institute Using irods for... Secure Collabora1on Selec:vely sharing data between workgroups; isola:on for maintenance opera:ons; op:ons for defining policy on a per- group basis Sanger 1 Portal zone (provides Kerberised access) Federation using head zone accounts /seq /uk10k /humgen /Archive Copyright Wellcome Trust Sanger Ins:tute. Used with permission. Managing NGS Data with irods // ICG-9 15
16 Use Case: The Broad Institute Harvard-MIT collaboration focused on cross-disciplinary challenges in biology and medicine. Small pilot program using irods to archive 9TB of data. Managing NGS Data with irods // ICG-9 16
17 Use Case: The Broad Institute Using irods for... Data Discovery and Workflow Automa1on Metadata automa:cally generated from original file system, used to enforce policy and verify integrity. Policy 1 Validate, checksum, replicate, compress Policy 2 Users cannot delete files Policy 3 Purge files by expira:on date Copyright Distributed Bio LLC. Used with permission. Managing NGS Data with irods // ICG-9 17
18 Use Case: UNC Lineberger Comprehensive Cancer Center One of the leading cancer centers in the US Highly collaborative research between UNC departments Managing NGS Data with irods // ICG-9 18
19 Use Case: UNC Lineberger Cancer Research Center Secure Medical Workspace Sequencing center Primary Processing, Storage, and Archive i Isilon UNC Kure HPC Using irods for... Data Virtualiza1on with Workflow Automa1on Automa:cally staging data for HPC and interpreta:on; using hardware from mul:ple vendors; complex access control Dell Genomic Sciences TopSail Hadoop i irods Grid Hadoop compu5ng, SAN scratch storage i RENCI Croatan Big Data DDN, Quantum Addi5onal Processing, Secondary Archive RENCI BlueRidge HPC Managing NGS Data with irods // ICG-9 19
20 Use Cases: The Upshot irods is finding a permanent home at NGS sites because of: Metadata! Vendor neutrality Not subject to storage vendor lock-in Mitigates risk of vendor termination Open source Mitigate risk of developer termination Flexibility Policy enforcement: any trigger, any action Storage virtualization: layers-deep replication; localß à cloud User permissions Sharing between workgroups Managing NGS Data with irods // ICG-9 20
21 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control Standard Analy:cal Processing Querying Interpreta:on Consulta:on Addi:onal Ac:on (ex. Treatment) Archive/Replica:on Managing NGS Data with irods // ICG-9 21
22 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control Standard Analy:cal Processing Querying Interpreta:on Consulta:on Addi:onal Ac:on (ex. Treatment) Archive/Replica:on irods will apply sample IDs and results (or links to results) of processing and interpreta:on Managing NGS Data with irods // ICG-9 22
23 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control Standard Analy:cal Processing Querying Interpreta:on Consulta:on Addi:onal Ac:on (ex. Treatment) Archive/Replica:on irods will apply sample IDs and results (or links to results) of automated processing irods will kick off each process in the pipeline, or launch a workflow engine for more complex tasks. Managing NGS Data with irods // ICG-9 23
24 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control irods will apply sample IDs and results (or links to results) of automated processing irods will kick off each process in the pipeline, or launch a workflow engine for more complex tasks. Standard Analy:cal Processing Querying irods will automa:cally compile reports upon schedule or request Interpreta:on Consulta:on Addi:onal Ac:on (ex. Treatment) Archive/Replica:on Managing NGS Data with irods // ICG-9 24
25 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control irods will apply sample IDs and results (or links to results) of automated processing irods will kick off each process in the pipeline, or launch a workflow engine for more complex tasks. Standard Analy:cal Processing Querying Interpreta:on Consulta:on irods will automa:cally compile reports upon schedule or request irods will stage files for processing, evalua:on on a secure workspace, and archiving Addi:onal Ac:on (ex. Treatment) Archive/Replica:on Managing NGS Data with irods // ICG-9 25
26 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control irods will apply sample IDs and results (or links to results) of automated processing irods will kick off each process in the pipeline, or launch a workflow engine for more complex tasks. Standard Analy:cal Processing Querying Interpreta:on Consulta:on Addi:onal Ac:on (ex. Treatment) irods will automa:cally compile reports upon schedule or request irods will stage files for processing, evalua:on on a secure workspace, and archiving irods will search on metadata Archive/Replica:on Managing NGS Data with irods // ICG-9 26
27 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control irods will apply sample IDs and results (or links to results) of automated processing irods will kick off each process in the pipeline, or launch a workflow engine for more complex tasks. Standard Analy:cal Processing Querying Interpreta:on Consulta:on Addi:onal Ac:on (ex. Treatment) Archive/Replica:on irods will automa:cally compile reports upon schedule or request irods will stage files for processing, evalua:on on a secure workspace, and archiving irods will search on metadata irods will manage complex, dynamic user permissions across mul:ple workgroups Managing NGS Data with irods // ICG-9 27
28 What s Next? Create the reference genomics implementation Document it in a cookbook, so other NGS centers can adapt and implement systems Examples: Replication, Policy-based storage selection, User interface API, Access control policies, Archiving policies Managing NGS Data with irods // ICG-9 28
29 What s Next? NGS Reference Implementation Ini:aliza:on Sequencing Forma`ng and Cleaning Quality Control irods will apply sample IDs and results (or links to results) of automated processing irods will kick off each process in the pipeline, or launch a workflow engine for more complex tasks. Standard Analy:cal Processing Querying Interpreta:on Consulta:on Addi:onal Ac:on (ex. Treatment) Archive/Replica:on irods will automa:cally compile reports upon schedule or request irods will stage files for processing, evalua:on on a secure workspace, and archiving irods will search on metadata irods will manage complex, dynamic user permissions across mul:ple workgroups Managing NGS Data with irods // ICG-9 29
30 What s Next? We need partners LIMS integration System integrators Data processing with native I/O Storage/computing appliance vendors Hosts for training Users to shape the problem space and evaluate our solution Get involved Contact [email protected] Follow us on Github: Read our blog: Join the conversation on irods Chat: Managing NGS Data with irods // ICG-9 30
31 Acknowledgments Thank you to: DARPA, DOE, NASA, NSF, NARA, and NOAA for supporting the creation of irods irods Consortium Members, for their continued support John Constable, WTSI Chris Smith, Distributed Bio Sai Balu, Lineberger Cancer Research Center Genomics researchers, for coming up with interesting problems Managing NGS Data with irods // ICG-9 31
32 Attribution and License Slide 26: The Joy of Cookbooks, Tom Taker, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Except where specified otherwise, this work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Managing NGS Data with irods // ICG-9 32
33 Thank you! Dan Bedard irods Market Development Manager Managing NGS Data with irods // ICG-9 33
The National Consortium for Data Science (NCDS)
The National Consortium for Data Science (NCDS) A Public-Private Partnership to Advance Data Science Ashok Krishnamurthy PhD Deputy Director, RENCI University of North Carolina, Chapel Hill What is NCDS?
INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)
INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS) Todd BenDor Associate Professor Dept. of City and Regional Planning UNC-Chapel Hill [email protected] http://irods.org/ SESYNC Model Integration Workshop Important
Technology solutions for managing and computing on largescale biomedical data
Technology solutions for managing and computing on largescale biomedical data Charles Schmitt CTO & Director of Informatics RENCI Brand Fortner Executive Director, irods Consortium Jason Coposky Chief
Data Management using irods
Data Management using irods Fundamentals of Data Management September 2014 Albert Heyrovsky Applications Developer, EPCC [email protected] 2 Course outline Why talk about irods? What is irods?
Automated and Scalable Data Management System for Genome Sequencing Data
Automated and Scalable Data Management System for Genome Sequencing Data Michael Mueller NIHR Imperial BRC Informatics Facility Faculty of Medicine Hammersmith Hospital Campus Continuously falling costs
irods for Big Data Management in Research Driven Organizations Charles Schmitt CTO & Director of Informatics RENCI
irods for Big Data Management in Research Driven Organizations Charles Schmitt CTO & Director of Informatics RENCI Acknowledgements Presented work funded in part by grants from NIH, NSF, NARA, DHS, as
Integrated Rule-based Data Management System for Genome Sequencing Data
Integrated Rule-based Data Management System for Genome Sequencing Data A Research Data Management (RDM) Green Shoots Pilots Project Report by Michael Mueller, Simon Burbidge, Steven Lawlor and Jorge Ferrer
Technical. Overview. ~ a ~ irods version 4.x
Technical Overview ~ a ~ irods version 4.x The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number
Object storage in Cloud Computing and Embedded Processing
Object storage in Cloud Computing and Embedded Processing Jan Jitze Krol Systems Engineer DDN We Accelerate Information Insight DDN is a Leader in Massively Scalable Platforms and Solutions for Big Data
Personalized Medicine and IT
Personalized Medicine and IT Data-driven Medicine in the Age of Genomics www.intel.com/healthcare/bigdata Ketan Paranjape General Manager, Life Sciences Intel Corp. @Portlandketan 1 The Central Dogma of
Balancing Big Data for Security, Collaboration and Performance
Balancing Big Data for Security, Collaboration and Performance Sai Balu Lineberger Cancer Center UNC Chapel Hill Oct 14, 2014 About UNC Oldest Public University -1793 Top 5 Public University. 46th World
Data management challenges in todays Healthcare and Life Sciences ecosystems
Data management challenges in todays Healthcare and Life Sciences ecosystems Jose L. Alvarez Principal Engineer, WW Director Life Sciences [email protected] Evolution of Data Sets in Healthcare
Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un marciano @un.
Policy-driven Distributed Data Management (irods) Richard Marciano [email protected] Professor @ SILS / Chief Scientist for Persistent Archives and Digital Preservation @ RENCI Director of the Sustainable
Seagate HPC /Big Data Business Tech Talk. December 2014
Seagate HPC /Big Data Business Tech Talk December 2014 Safe Harbor Statement This document contains forward-looking statements within the meaning of Section 27A of the Securities Act of 1933, and Section
irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire [email protected] 25th
irods and Metadata survey Version 0.1 Date 25th March Purpose Survey of Status Complete Author Abhijeet Kodgire [email protected] Table of Contents 1 Abstract... 3 2 Categories and Subject Descriptors...
MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering
MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation
irods Policy-Driven Data Preservation Integrating Cloud Storage and Institutional Repositories
irods Policy-Driven Data Preservation Integrating Cloud Storage and Institutional Repositories Reagan W. Moore Arcot Rajasekar Mike Wan {moore,sekar,mwan}@diceresearch.org h;p://irods.diceresearch.org
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture
Introduc)on of Pla/orm ISF. Weina Ma [email protected]
Introduc)on of Pla/orm ISF Weina Ma [email protected] Agenda Pla/orm ISF Product Overview Pla/orm ISF Concepts & Terminologies Self- Service Applica)on Management Applica)on Example Deployment Examples
irods Overview Intro to Data Grids and Policy-Driven Data Management!!Leesa Brieger, RENCI! Reagan Moore, DICE & RENCI!
irods Overview Intro to Data Grids and Policy-Driven Data Management!!Leesa Brieger, RENCI! Reagan Moore, DICE & RENCI! Renaissance Computing Institute (RENCI) A research unit of UNC Chapel Hill Current
WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief
DDN Solution Brief Personal Storage for the Enterprise WOS Cloud Secure, Shared Drop-in File Access for Enterprise Users, Anytime and Anywhere 2011 DataDirect Networks. All Rights Reserved DDN WOS Cloud
Practical Solutions for Big Data Analytics
Practical Solutions for Big Data Analytics Ravi Madduri Computation Institute ([email protected]) Paul Dave ([email protected]) Dinanath Sulakhe ([email protected]) Alex Rodriguez ([email protected])
Hadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction
Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT
Building Bioinformatics Capacity in Africa Nicky Mulder CBIO Group, UCT Outline What is bioinformatics? Why do we need IT infrastructure? What e-infrastructure does it require? How we are developing this
WHITEPAPER. Why Dependency Mapping is Critical for the Modern Data Center
WHITEPAPER Why Dependency Mapping is Critical for the Modern Data Center OVERVIEW The last decade has seen a profound shift in the way IT is delivered and consumed by organizations, triggered by new technologies
TRANSFORMING DATA PROTECTION
TRANSFORMING DATA PROTECTION Moving from Reactive to Proactive Mark Galpin 1 Our Protection Strategy: Best Of Breed Performance LEADER HIGH-END STORAGE VMAX Low Service Level LEADER SCALE-OUT NAS STORAGE
EMC IRODS RESOURCE DRIVERS
EMC IRODS RESOURCE DRIVERS PATRICK COMBES: PRINCIPAL SOLUTION ARCHITECT, LIFE SCIENCES 1 QUICK AGENDA Intro to Isilon (~2 hours) Isilon resource driver Intro to ECS (~1.5 hours) ECS Resource driver Possibilities
Texas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson
Texas Digital Government Summit Data Analysis Structured vs. Unstructured Data Presented By: Dave Larson Speaker Bio Dave Larson Solu6ons Architect with Freeit Data Solu6ons In the IT industry for over
Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data
Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data David Minor 1, Reagan Moore 2, Bing Zhu, Charles Cowart 4 1. (88)4-104 [email protected] San Diego Supercomputer Center
The software platform for storing, preserving and sharing very large data sets. www.active-circle.com
The software platform for storing, preserving and sharing very large data sets www.active-circle.com The easiest solution for storing and archiving very large data sets! ACTIVE CIRCLE HIGHLIGHTS Software-based
Globus Genomics Tutorial GlobusWorld 2014
Globus Genomics Tutorial GlobusWorld 2014 Agenda Overview of Globus Genomics Example Collaborations Demonstration Globus Genomics interface Globus Online integration Scenario 1: Using Globus Genomics for
A Survey of Shared File Systems
Technical Paper A Survey of Shared File Systems Determining the Best Choice for your Distributed Applications A Survey of Shared File Systems A Survey of Shared File Systems Table of Contents Introduction...
DDN updates object storage platform as it aims to break out of HPC niche
DDN updates object storage platform as it aims to break out of HPC niche Analyst: Simon Robinson 18 Oct, 2013 DataDirect Networks has refreshed its Web Object Scaler (WOS), the company's platform for efficiently
Data Pipeline with Kafka
Data Pipeline with Kafka Peerapat Asoktummarungsri AGODA Senior Software Engineer Agoda.com Contributor Thai Java User Group (THJUG.com) Contributor Agile66 AGENDA Big Data & Data Pipeline Kafka Introduction
irods in complying with Public Research Policy
irods User Group 2015 irods in complying with Public Research Policy Vic Cornell Senior Storage Consultant Overview Compliance overview UK examples Imperial College MedBio Requirements Architecture irods
Boas Betzler. Planet. Globally Distributed IaaS Platform Examples AWS and SoftLayer. November 9, 2015. 20014 IBM Corporation
Boas Betzler Cloud IBM Distinguished Computing Engineer for a Smarter Planet Globally Distributed IaaS Platform Examples AWS and SoftLayer November 9, 2015 20014 IBM Corporation Building Data Centers The
W H I T E P A P E R E X E C U T I V E S U M M AR Y S I T U AT I O N O V E R V I E W. Sponsored by: EMC Corporation. Laura DuBois May 2010
W H I T E P A P E R E n a b l i n g S h a r e P o i n t O p e r a t i o n a l E f f i c i e n c y a n d I n f o r m a t i o n G o v e r n a n c e w i t h E M C S o u r c e O n e Sponsored by: EMC Corporation
U"lizing the SDSC Cloud Storage Service
U"lizing the SDSC Cloud Storage Service PASIG Conference January 13, 2012 Richard L. Moore [email protected] San Diego Supercomputer Center University of California San Diego SAN DIEGO SUPERCOMPUTER CENTER
The BIG Data Era has. your storage! Bratislava, Slovakia, 21st March 2013
The BIG Data Era has arrived Re-invent your storage! Bratislava, Slovakia, 21st March 2013 Luka Topic Regional Manager East Europe EMC Isilon Storage Division [email protected] 1 What is Big Data? 2 EXABYTES
IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems
IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity
OpenCB a next generation big data analytics and visualisation platform for the Omics revolution
OpenCB a next generation big data analytics and visualisation platform for the Omics revolution Development at the University of Cambridge - Closing the Omics / Moore s law gap with Dell & Intel Ignacio
NERSC Archival Storage: Best Practices
NERSC Archival Storage: Best Practices Lisa Gerhardt! NERSC User Services! Nick Balthaser! NERSC Storage Systems! Joint Facilities User Forum on Data Intensive Computing! June 18, 2014 Agenda Introduc)on
Hadoop Distributed File System. Dhruba Borthakur June, 2007
Hadoop Distributed File System Dhruba Borthakur June, 2007 Goals of HDFS Very Large Distributed File System 10K nodes, 100 million files, 10 PB Assumes Commodity Hardware Files are replicated to handle
Unlocking Hadoop for Your Rela4onal DB. Kathleen Ting @kate_ting Technical Account Manager, Cloudera Sqoop PMC Member BigData.
Unlocking Hadoop for Your Rela4onal DB Kathleen Ting @kate_ting Technical Account Manager, Cloudera Sqoop PMC Member BigData.be April 4, 2014 Who Am I? Started 3 yr ago as 1 st Cloudera Support Eng Now
Distributed File Systems An Overview. Nürnberg, 30.04.2014 Dr. Christian Boehme, GWDG
Distributed File Systems An Overview Nürnberg, 30.04.2014 Dr. Christian Boehme, GWDG Introduction A distributed file system allows shared, file based access without sharing disks History starts in 1960s
How To Manage Research Data At Columbia
An experience/position paper for the Workshop on Research Data Management Implementations *, March 13-14, 2013, Arlington Rajendra Bose, Ph.D., Manager, CUIT Research Computing Services Amy Nurnberger,
LabArchives Electronic Lab Notebook:
Electronic Lab Notebook: Cloud platform to manage research workflow & data Support Data Management Plans Annotate and prove discovery Secure compliance Improve compliance with your data management plans,
Let s Talk Digital An Approach to Managing, Storing, and Preserving Time-Based Media Art Works. DAMS Branch Manager Smithsonian Institution, OCIO
Let s Talk Digital An Approach to Managing, Storing, and Preserving Time-Based Media Art Works DAMS Branch Manager Smithsonian Institution, OCIO 1 Love and Marriage Digital Asset Management?? Digital Diamonds
irods Overview Introduction to Data Grids, Policy-Driven Data Management, and Enterprise irods
irods Overview Introduction to Data Grids, Policy-Driven Data Management, and Enterprise irods Renaissance Computing Institute (RENCI) A research unit of UNC Chapel Hill Directed by Stan Ahalt, formerly
Managing a local Galaxy Instance. Anushka Brownley / Adam Kraut BioTeam Inc.
Managing a local Galaxy Instance Anushka Brownley / Adam Kraut BioTeam Inc. Agenda Who are we Why a local installation Local infrastructure Local installation Tips and Tricks SlipStream Appliance WHO ARE
Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix
Alternative Deployment Models for Cloud Computing in HPC Applications Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix The case for Cloud in HPC Build it in house Assemble in the cloud?
irods at CC-IN2P3: managing petabytes of data
Centre de Calcul de l Institut National de Physique Nucléaire et de Physique des Particules irods at CC-IN2P3: managing petabytes of data Jean-Yves Nief Pascal Calvat Yonny Cardenas Quentin Le Boulc h
1 P a g e Delivering Self -Service Cloud application service using Oracle Enterprise Manager 12c
Delivering Self-service Cloud application services using Oracle Enterprise Manager 12c Kai Yu, Senior Principal Engineer, Oracle Solutions Engineering, Dell Inc ABSTRACT Oracle Self-Service provisioning
Building and Managing
ORACLE Oracle Press' Building and Managing a Cloud Using Oracle Enterprise Manager 12c Madhup Gulati Adeesh Fulay Sudip Datta Mc Graw Hill Education New York Chicago San Francisco Lisbon London Madrid
Seagate Cloud Systems & Solutions
Seagate Cloud Systems & Solutions Market Challenge 3 Traditional Storage Is Not Viable Traditional Enterprise (EMC, NetApp, HP) DIY Public Cloud 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 4 Seagate
IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, 2016. Integration Guide IBM
IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, 2016 Integration Guide IBM Note Before using this information and the product it supports, read the information
Three data delivery cases for EMBL- EBI s Embassy. Guy Cochrane www.ebi.ac.uk
Three data delivery cases for EMBL- EBI s Embassy Guy Cochrane www.ebi.ac.uk EMBL European Bioinformatics Institute Genes, genomes & variation European Nucleotide Archive 1000 Genomes Ensembl Ensembl Genomes
Report of the DTL focus meeting on Life Science Data Repositories
Report of the DTL focus meeting on Life Science Data Repositories Goal The goal of the meeting was to inform and discuss research data repositories for life sciences. The big data era adds to the complexity
Big Data in BioMedical Sciences. Steven Newhouse, Head of Technical Services, EMBL-EBI
Big Data in BioMedical Sciences Steven Newhouse, Head of Technical Services, EMBL-EBI Big Data for BioMedical Sciences EMBL-EBI: What we do and why? Challenges & Opportunities Infrastructure Requirements
SIF 3: A NEW BEGINNING
SIF 3: A NEW BEGINNING The SIF Implementation Specification Defines common data formats and rules of interaction and architecture, and is made up of two parts: SIF Infrastructure Implementation Specification
INTRODUCTION APPLICATION DEPLOYMENT WITH ORACLE VIRTUAL ASSEMBLY
SIMPLIFYING APPLICATION DEPLOYMENT IN CLOUD USING VIRTUAL ASSEMBLIES AND EM 12C Kai Yu, Dell Inc. ABSTRACT Oracle virtual assemblies provide a great way to simply the deployment of enterprise-class multi-tier
Data Management & Storage for NGS
Data Management & Storage for NGS 2009 Pre-Conference Workshop Chris Dagdigian BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance
Object Storage, Cloud Storage, and High Capacity File Systems
A Primer on Object Storage, Cloud Storage, and High Capacity File Presented by: Chris Robertson Sr. Solution Architect Cambridge Computer Copyright 2010-2011, Cambridge Computer Services, Inc. All Rights
Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage
Clodoaldo Barrera Chief Technical Strategist IBM System Storage Making a successful transition to Software Defined Storage Open Server Summit Santa Clara Nov 2014 Data at the core of everything Data is
<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures
1 Refreshing Your Data Protection Environment with Next-Generation Architectures Dale Rhine, Principal Sales Consultant Kelly Boeckman, Product Marketing Analyst Program Agenda Storage
Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected]
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee [email protected] [email protected] Hadoop, Why? Need to process huge datasets on large clusters of computers
Apache Hadoop FileSystem and its Usage in Facebook
Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System [email protected] Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs
Hitachi Content Platform as a Continuous Integration Build Artifact Storage System
Hitachi Content Platform as a Continuous Integration Build Artifact Storage System A Hitachi Data Systems Case Study By Hitachi Data Systems March 2015 Contents Executive Summary... 2 Introduction... 3
With Red Hat Enterprise Virtualization, you can: Take advantage of existing people skills and investments
RED HAT ENTERPRISE VIRTUALIZATION DATASHEET RED HAT ENTERPRISE VIRTUALIZATION AT A GLANCE Provides a complete end-toend enterprise virtualization solution for servers and desktop Provides an on-ramp to
Best Practices Series: Cure Your E-mail Management Headaches in IBM Lotus Notes/Domino Environments
1 Best Practices Series: Cure Your E-mail Management Headaches in IBM Lotus Notes/Domino Environments Part 2: Managing E-mail Compliance and Discovery Challenges Kelly Ferguson Michael Brown 2 Agenda Compliance
High Performance Computing OpenStack Options. September 22, 2015
High Performance Computing OpenStack PRESENTATION TITLE GOES HERE Options September 22, 2015 Today s Presenters Glyn Bowden, SNIA Cloud Storage Initiative Board HP Helion Professional Services Alex McDonald,
Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation
Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely
Privileged Administra0on Best Prac0ces :: September 1, 2015
Privileged Administra0on Best Prac0ces :: September 1, 2015 Discussion Contents Privileged Access and Administra1on Best Prac1ces 1) Overview of Capabili0es Defini0on of Need 2) Preparing your PxM Program
Addressing Storage Management Challenges using Open Source SDS Controller
Addressing Storage Management Challenges using Open Source SDS Controller Anjaneya Reddy Chagam, Intel Chief SDS Architect, Data Center Group Shayne Huddleston, Oregon State University Infrastructure Architect,
An Oracle White Paper July 2014. Oracle ACFS
An Oracle White Paper July 2014 Oracle ACFS 1 Executive Overview As storage requirements double every 18 months, Oracle customers continue to deal with complex storage management challenges in their data
