Integrated Rule-based Data Management System for Genome Sequencing Data
|
|
- Nathan Carroll
- 3 years ago
- Views:
Transcription
1 Integrated Rule-based Data Management System for Genome Sequencing Data A Research Data Management (RDM) Green Shoots Pilots Project Report by Michael Mueller, Simon Burbidge, Steven Lawlor and Jorge Ferrer Imperial College London This project was funded as part of Imperial College s RDM Green Shoots Programme. In 2014, the Vice-Provost, Research, approved an allocation of 100K for academically-driven projects to identify and generate exemplars of best practice in RDM, specifically frameworks and prototypes that would comply with key funder RDM policies and the College position. The call for projects outlined that frameworks could be based either on original ideas or integrating existing solutions into the research process, improving its efficacy or the breadth of its usage. There was an expectation that solutions would support open access for data; solutions that supported Open Innovation were strongly encouraged. Six projects were funded, covering different disciplines, faculties and research areas. The projects ran for six months, finishing at the end of Project reports were made available in For more information on the programme and projects please visit: This work is licensed under a Creative Commons Attribution 4.0 International License.
2 Integrated Rule-based Data Management System for Genome Sequencing Data Michael Mueller, Simon Burbidge, Steven Lawlor, Jorge Ferrer; Imperial College London Background The advent and rapid evolution of next-generation DNA sequencing (NGS) technologies has revolutionised the field of genome research and has created new opportunities for the application of genomics in biomedical research and clinical practice. The ability to generate large amounts of information about individual genome sequences in a short amount of time at continuously falling costs has led to a widespread adoption of NGS technologies. Consequently, research institutes are now faced with increasingly large volumes of sequencing and associated data requiring more sophisticated approaches to data storage and management then commonly in place to date in order to ensure data integrity and security, avoid unnecessary data replication and facilitate data access for analysis and sharing. While the implementation of large-scale data management systems is a challenge biomedical research has been confronted with only recently, other data-intense domains such as physics have been developing solutions to problems associated with the storage, management and sharing of large data collections for more than two decades. More recently, the genomics research community has started to adopt existing technologies that originated in these fields. Leading this trend are large-scale sequencing centres such as the Wellcome Trust Sanger Institute in the UK or the Broad Institute in the US, where data grid middleware such as the integrated Rule-Oriented Data management System (irods) has been successfully used for storing and managing large distributed sequencing datasets and associated metadata. Requirements The newly established DNA sequencing service at the NIHR Imperial BRC Genomics Facility will generate large amounts of genome sequencing data for users across Imperial College. The Illumina HiSeq2500 DNA sequencer installed in the Facility will output around 10TB of raw data every two weeks amounting to a total data throughput of 260TB per year. Manual processing, analysis and dissemination of sequencing data make data management time consuming, pose risks to data integrity and may result in unnecessary data replication. To address these issues the Facility will set up a data management system for the new sequencing service that should integrate with existing HPC infrastructure for processing, analysis and dissemination of raw sequencing data and analysis results. The system should i) optimise data storage usage ii) facilitate data sharing within the college and with external collaborators iii) comply with College and funder data preservation and protection policies iv) allow for a high degree of automation and v) be scalable. Figure 1 shows an overview of the data life cycle within the proposed system.
3 irods The implementation of the system is based on the irods middleware. irods is a widely used, scalable, open source data grid system that has been successfully adopted for the management of next-generation sequencing data at other research institutions such as the Wellcome Trust Sanger Centre, the Broad Institute and the University of Uppsala. The irods platform implements a logical name space providing transparent access to distributed and disparate storage resources. irods features important for the implementation of the system are the rule engine and the metadata catalogue. The rule engine allows the invocation of pre-defined sequences of action (micro-services) in regular intervals or when particular events occur. State changes resulting from rule execution are stored and this information can be used to track and control rule execution. The rule engine will facilitate the effective management of data life cycles (file migration, file format conversion etc.), data preservation (consistency, replication and archiving) and data security (encryption). The metadata function allows to associate descriptive system and user defined metadata with files. This will enable search, management and tracking of data and data manipulation within the system. Figure 1 Data management system for genome sequencing data. Raw sequencing data is transferred from a local storage server at Hammersmith Campus to the HPC Service at South Kensington (1), read data is converted from the vendor specific BCL format to the platform independent fastq format (2), sequencing reads are mapped to a reference genome sequence (3), mapped read data is compressed further using reference based read compression (4), compressed reads data is enrypted (5), encrypted read data is archived on tape (6) and remotely at a public repository (7), aligned read data is disseminated locally to users (8). Implementation We have implemented a prototype of an irods-based data management system that comprises all steps of the data processing required to disseminate sequencing data to internal users of the facility (steps 1 4 and 8 in Figure 1): (1) upon completion of a sequencing run raw data required for the conversion of raw sequencing data from the vendor-specific BCL format to the platformindependent fastq format is transferred to a storage server at the HPC Service in South Kensington. (2) Reads are converted from BCL to fastq format using the vendor-supplied bcl2fastq software.
4 Splitting of read data by sample and project also occurs during this step. Subsequently a quality control report is generated with FastQC. (3) Reads in fastq format are then mapped to a reference genome sequence with BWA MEM which outputs mapping results in BAM format resulting in a >3- fold reduction in file size. (4) Further compression is achieved by applying a reference based compression algorithm using samtools v1.0 to convert BAM to CRAM files. (8) Aligned read data in BAM format is transferred to a webserver and made accessible for download through the web-based irods client idrop Web 2. The initial proposal envisaged management of the entire data processing workflow through the irods system. Figure 2A outlines the irods setup originally proposed with irods resource servers running on all storage resources. The webserver is also an icat enabled resource server that hosts the irods meta data catalogue. In this setup the entire data life cycle can be managed and tracked by irods. However, concerns were raised by the HPC service team regarding user authentication and file ownership management. The default irods setup requires irods user authentication separately from the system authentication. Apart from the additional administrative overhead the trustworthiness of the irods authentication system was questioned by the HPC service team and considered a potential security risk. With regard to file ownership management, the fact that all files that are put into irods will be owned by a single user (the user the irods service is running as) was considered a problem as this creates complete dependence on the irods file ownership management.
5 Figure 2 Proposed vs implemented setup of the data management system. A) Proposed irods setup in which all storage resources in the system are irods resource servers belonging to the same irods 'zone' (igfzone). B) Implemented irods setup. Only storage resources administrated by the Genomics Facility run irods resource servers. The storage resources administrated by the HPC service are outside the irods system. Firstly, to address the concerns regarding user authentication it was suggested to enable irods PAM authentication which allows the use of system passwords for irods authentication. Secondly, the issues relating to file ownership could be overcome by implementing the HPC irods resource as a 'direct access resource'. Direct access data resources are accessible both through irods and through the file system. irods acts as an "overlay" for the UNIX file system providing meta-data annotations for the files in the file system. However, in this operation mode irods runs as a root process which means that irods could potentially access/write any data on the system. The complete reliance on the authentication and access control in irods was not considered acceptable. Due to the concerns regarding data security and integrity the system was implemented without the integration of the HPC storage resource into irods (see Figure 2B). Instead data is fetched from irods via an irods client installed on the HPC system. After processing of the data on the cluster results and metadata are pushed back into irods. Conclusion We have successfully implemented a prototype of a data management system for sequencing data generated by the Imperial BRC Genomics Facility. The use of irods to manage and track the data life cycle within the system constitutes a compromise between the powerful functionality of irods and the sacrifice of file system-based control over data access and ownership. This becomes an issue when shared resources such as HPC systems need to be integrated where irods-based file management might conflict with the existing setup of the system. These issues would need to be addressed if the irods system was to be i) extended to manage data generated by secondary data analysis workflows and ii) used more widely across the college to manage sequencing and other research data.
Automated and Scalable Data Management System for Genome Sequencing Data
Automated and Scalable Data Management System for Genome Sequencing Data Michael Mueller NIHR Imperial BRC Informatics Facility Faculty of Medicine Hammersmith Hospital Campus Continuously falling costs
Data Management using irods
Data Management using irods Fundamentals of Data Management September 2014 Albert Heyrovsky Applications Developer, EPCC a.heyrovsky@epcc.ed.ac.uk 2 Course outline Why talk about irods? What is irods?
irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th
irods and Metadata survey Version 0.1 Date 25th March Purpose Survey of Status Complete Author Abhijeet Kodgire akodgire@indiana.edu Table of Contents 1 Abstract... 3 2 Categories and Subject Descriptors...
Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007
Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the
Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un marciano @un.
Policy-driven Distributed Data Management (irods) Richard Marciano marciano@unc.edu Professor @ SILS / Chief Scientist for Persistent Archives and Digital Preservation @ RENCI Director of the Sustainable
Technical. Overview. ~ a ~ irods version 4.x
Technical Overview ~ a ~ irods version 4.x The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number
Analysis of ChIP-seq data in Galaxy
Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers
Local Loading. The OCUL, Scholars Portal, and Publisher Relationship
Local Loading Scholars)Portal)has)successfully)maintained)relationships)with)publishers)for)over)a)decade)and)continues) to)attract)new)publishers)that)recognize)both)the)competitive)advantage)of)perpetual)access)through)
INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)
INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS) Todd BenDor Associate Professor Dept. of City and Regional Planning UNC-Chapel Hill bendor@unc.edu http://irods.org/ SESYNC Model Integration Workshop Important
Key Considerations for Managing Big Data in the Life Science Industry
Key Considerations for Managing Big Data in the Life Science Industry The Big Data Bottleneck In Life Science Faster, cheaper technology outpacing Moore s law Lower costs and increasing speeds leading
The National Consortium for Data Science (NCDS)
The National Consortium for Data Science (NCDS) A Public-Private Partnership to Advance Data Science Ashok Krishnamurthy PhD Deputy Director, RENCI University of North Carolina, Chapel Hill What is NCDS?
Deploying a distributed data storage system on the UK National Grid Service using federated SRB
Deploying a distributed data storage system on the UK National Grid Service using federated SRB Manandhar A.S., Kleese K., Berrisford P., Brown G.D. CCLRC e-science Center Abstract As Grid enabled applications
EnterpriseLink Benefits
EnterpriseLink Benefits GGY AXIS 5001 Yonge Street Suite 1300 Toronto, ON M2N 6P6 Phone: 416-250-6777 Toll free: 1-877-GGY-AXIS Fax: 416-250-6776 Email: axis@ggy.com Web: www.ggy.com Table of Contents
Large-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of Chicago @madduri
Large-scale Research Data Management and Analysis Using Globus Services Ravi Madduri Argonne National Lab University of Chicago @madduri Outline Who we are Challenges in Big Data Management and Analysis
Data grid storage for digital libraries and archives using irods
Data grid storage for digital libraries and archives using irods Mark Hedges, Centre for e-research, King s College London eresearch Australasia, Melbourne, 30 th Sept. 2008 Background: Project History
The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets
The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and
Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data
Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data Yi Wang, Gagan Agrawal, Gulcin Ozer and Kun Huang The Ohio State University HiCOMB 2014 May 19 th, Phoenix, Arizona 1 Outline
SURFsara Data Services
SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Mark van de Sanden The world of the many Many different users (well organised (international) user communities, research groups, universities,
Certificate Revocation List Validation Mechanisms in Public Key Infrastructure Using Map-Reduce Technique
Middle-East Journal of Scientific Research 24 (Special Issue on Innovations in Information, Embedded and Communication Systems): 132-137, 2016 ISSN 1990-9233; IDOSI Publications, 2016 DOI: 10.5829/idosi.mejsr.2016.24.IIECS.23151
Practical Solutions for Big Data Analytics
Practical Solutions for Big Data Analytics Ravi Madduri Computation Institute (madduri@anl.gov) Paul Dave (pdave@uchicago.edu) Dinanath Sulakhe (sulakhe@uchicago.edu) Alex Rodriguez (arodri7@uchicago.edu)
Managing Next Generation Sequencing Data with irods
Managing Next Generation Sequencing Data with irods Presented by Dan Bedard // danb@renci.org at the 9 th International Conference on Genomics Shenzhen, China September 12, 2014 Managing NGS Data with
An Introduction to Managing Research Data
An Introduction to Managing Research Data Author University of Bristol Research Data Service Date 1 August 2013 Version 3 Notes URI IPR data.bris.ac.uk Copyright 2013 University of Bristol Within the Research
Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix
Alternative Deployment Models for Cloud Computing in HPC Applications Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix The case for Cloud in HPC Build it in house Assemble in the cloud?
Computational Requirements
Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Computational Requirements Steve Sherry, Lisa Brooks, Paul Flicek, Anton Nekrutenko, Kenna Shaw, Heidi Sofia High-density
BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything
BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest
Why use workflow? In recent years, explosive amounts of biological information has been obtained and deposited in various databases.
Workflow technology It s a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services). It facilitates knowledge exchange within
Research Data Management Guide
Research Data Management Guide Research Data Management at Imperial WHAT IS RESEARCH DATA MANAGEMENT (RDM)? Research data management is the planning, organisation and preservation of the evidence that
Distributed File Systems An Overview. Nürnberg, 30.04.2014 Dr. Christian Boehme, GWDG
Distributed File Systems An Overview Nürnberg, 30.04.2014 Dr. Christian Boehme, GWDG Introduction A distributed file system allows shared, file based access without sharing disks History starts in 1960s
CROSS PLATFORM AUTOMATIC FILE REPLICATION AND SERVER TO SERVER FILE SYNCHRONIZATION
1 E N D U R A D A T A EDpCloud: A File Synchronization, Data Replication and Wide Area Data Distribution Solution CROSS PLATFORM AUTOMATIC FILE REPLICATION AND SERVER TO SERVER FILE SYNCHRONIZATION 2 Resilient
DataGrids 2.0 irods - A Second Generation Data Cyberinfrastructure. Arcot (RAJA) Rajasekar DICE/SDSC/UCSD
DataGrids 2.0 irods - A Second Generation Data Cyberinfrastructure Arcot (RAJA) Rajasekar DICE/SDSC/UCSD What is SRB? First Generation Data Grid middleware developed at the San Diego Supercomputer Center
Save Time and Money with Quantum s Integrated Archiving Solution
Case Study Forum WHITEPAPER Save Time and Money with Quantum s Integrated Archiving Solution TABLE OF CONTENTS Summary of Findings...3 The Challenge: How to Cost Effectively Archive Data...4 The Solution:
Keystones for supporting collaborative research using multiple data sets in the medical and bio-sciences
Keystones for supporting collaborative research using multiple data sets in the medical and bio-sciences David Fergusson Head of Scientific Computing The Francis Crick Institute The Francis Crick Institute
Analysis of NGS Data
Analysis of NGS Data Introduction and Basics Folie: 1 Overview of Analysis Workflow Images Basecalling Sequences denovo - Sequencing Assembly Annotation Resequencing Alignments Comparison to reference
How to Cost Effectively Retain Reference Data for Analytics and Big Data. Molly Rector, EVP Product Management & WW Marketing, Spectra Logic
How to Cost Effectively Retain Reference Data for Analytics and Big Data Molly Rector, EVP Product Management & WW Marketing, Spectra Logic SNIA Legal Notice The material contained in this tutorial is
Accelerate > Converged Storage Infrastructure. DDN Case Study. ddn.com. 2013 DataDirect Networks. All Rights Reserved
DDN Case Study Accelerate > Converged Storage Infrastructure 2013 DataDirect Networks. All Rights Reserved The University of Florida s (ICBR) offers access to cutting-edge technologies designed to enable
Research Data Understanding your choice for data placement
Research Data Understanding your choice for data placement V1.4 Draft Aslam Ghumra 30 October 2014 Executive Summary This document is an overview of the technical infrastructure and supporting processes
High Performance Compu2ng Facility
High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,
European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute
European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute Justin Paschall Team Leader Genetic Variation / EGA ! European Genome-phenome
irods Overview Introduction to Data Grids, Policy-Driven Data Management, and Enterprise irods
irods Overview Introduction to Data Grids, Policy-Driven Data Management, and Enterprise irods Renaissance Computing Institute (RENCI) A research unit of UNC Chapel Hill Directed by Stan Ahalt, formerly
Columbia s Evolving Research Data Storage Strategy
An experience/position paper for the Workshop on Research Data Management Implementations *, March 13-14, 2013, Arlington Rajendra Bose, Ph.D., Manager, CUIT Research Computing Services Amy Nurnberger,
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Globus Genomics Tutorial GlobusWorld 2014
Globus Genomics Tutorial GlobusWorld 2014 Agenda Overview of Globus Genomics Example Collaborations Demonstration Globus Genomics interface Globus Online integration Scenario 1: Using Globus Genomics for
irods Overview Intro to Data Grids and Policy-Driven Data Management!!Leesa Brieger, RENCI! Reagan Moore, DICE & RENCI!
irods Overview Intro to Data Grids and Policy-Driven Data Management!!Leesa Brieger, RENCI! Reagan Moore, DICE & RENCI! Renaissance Computing Institute (RENCI) A research unit of UNC Chapel Hill Current
Collaborative Research Infrastructure Deployments. ddn.com. Accelerate > DDN Case Study
DDN Case Study Accelerate > Collaborative Research Infrastructure Deployments University College London Transforms Research Collaboration and Data Preservation with Scalable Cloud Object Storage Appliance
The Trials and Tribulations and ultimate success of parallelisation using Hadoop within the SCAPE project
The Trials and Tribulations and ultimate success of parallelisation using Hadoop within the SCAPE project Alastair Duncan STFC Pre Coffee talk STFC July 2014 SCAPE Scalable Preservation Environments The
BioHPC Web Computing Resources at CBSU
BioHPC Web Computing Resources at CBSU 3CPG workshop Robert Bukowski Computational Biology Service Unit http://cbsu.tc.cornell.edu/lab/doc/biohpc_web_tutorial.pdf BioHPC infrastructure at CBSU BioHPC Web
Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment
Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249
Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT
Building Bioinformatics Capacity in Africa Nicky Mulder CBIO Group, UCT Outline What is bioinformatics? Why do we need IT infrastructure? What e-infrastructure does it require? How we are developing this
ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013
ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE October 2013 Introduction As sequencing technologies continue to evolve and genomic data makes its way into clinical use and
How to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility
Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility Report of a meeting organized by the Wellcome Trust and held on 14 15 January 2003 at Fort Lauderdale,
Data management plan
FACILITATE OPEN SCIENCE TRAINING FOR EUROPEAN RESEARCH 612425 Data management plan Course for Doctoral Students at ECPR Summer School 2015 Faculty of Social Sciences, University of Ljubljana, Slovenia
Personalized Medicine and IT
Personalized Medicine and IT Data-driven Medicine in the Age of Genomics www.intel.com/healthcare/bigdata Ketan Paranjape General Manager, Life Sciences Intel Corp. @Portlandketan 1 The Central Dogma of
Teleran PCI Customer Case Study
Teleran PCI Customer Case Study Written by Director of Credit Card Systems for Large Credit Card Issuer Customer Case Study Summary A large credit card issuer was engaged in a Payment Card Industry Data
RESEARCH DATA MANAGEMENT POLICY
Document Title Version 1.1 Document Review Date March 2016 Document Owner Revision Timetable / Process RESEARCH DATA MANAGEMENT POLICY RESEARCH DATA MANAGEMENT POLICY Director of the Research Office Regular
Object storage in Cloud Computing and Embedded Processing
Object storage in Cloud Computing and Embedded Processing Jan Jitze Krol Systems Engineer DDN We Accelerate Information Insight DDN is a Leader in Massively Scalable Platforms and Solutions for Big Data
Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA
Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow Barry Bolding Cray Inc Seattle, WA 1 CUG 2013 Paper Genomic Applications on Cray supercomputers: Next Generation Sequencing
OpenCB a next generation big data analytics and visualisation platform for the Omics revolution
OpenCB a next generation big data analytics and visualisation platform for the Omics revolution Development at the University of Cambridge - Closing the Omics / Moore s law gap with Dell & Intel Ignacio
Research Data Storage Facility Terms of Use
Research Data Storage Facility Terms of Use By signing up to these Terms of Use, you are agreeing to abide by the terms of the University Policy for the use of the Research Data Storage Facility. 1. Definition
Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences
Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.3 Selected Standards
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
Policy for the use of the Research Data Storage Facility
Introduction: Policy for the use of the Research Data Storage Facility The University s High Performance Computing (HPC) facility went live to users in May 2007. Access to this world-class HPC facility
Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup
Technical white paper Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup Table of contents Executive summary... 2 Introduction... 2 What is NDMP?... 2 Technology overview... 3 HP
Project management integrated into Outlook
Project management integrated into Outlook InLoox PM 7.x off-line operation An InLoox Whitepaper Published: October 2011 Copyright: 2011 InLoox GmbH. You can find up-to-date information at http://www.inloox.com
RingStor User Manual. Version 2.1 Last Update on September 17th, 2015. RingStor, Inc. 197 Route 18 South, Ste 3000 East Brunswick, NJ 08816.
RingStor User Manual Version 2.1 Last Update on September 17th, 2015 RingStor, Inc. 197 Route 18 South, Ste 3000 East Brunswick, NJ 08816 Page 1 Table of Contents 1 Overview... 5 1.1 RingStor Data Protection...
Case Study: Centre for e-research (CeRch), King s College London
Case Study: Centre for e-research (CeRch), King s College London Primary contact for case study: Gareth Knight, Digital Curation Specialist Additional input from: Richard Palmer, System Administrator 1.
Introduction to NGS data analysis
Introduction to NGS data analysis Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Sequencing Illumina platforms Characteristics: High
Office hours: by appointment. Practical Computing for Biologists. Steven H. D. Haddock & Casey Dunn (2011).
MCB 5429: Theory and Practice of High Throughput Sequence Analysis Bioinformatics analysis of data from next generation sequencing Mon, Wed. 1-2:15pm Beach Hall room 202 Spring 2016 Syllabus Instructor
Information Technology Branch Access Control Technical Standard
Information Technology Branch Access Control Technical Standard Information Management, Administrative Directive A1461 Cyber Security Technical Standard # 5 November 20, 2014 Approved: Date: November 20,
Active Directory Comapatibility with ExtremeZ-IP A Technical Best Practices Whitepaper
Active Directory Comapatibility with ExtremeZ-IP A Technical Best Practices Whitepaper About this Document The purpose of this technical paper is to discuss how ExtremeZ-IP supports Microsoft Active Directory.
Image Gateway for Apeos 2.0
IGA2.0 Image Gateway for Apeos 2.0 Business Process Solutions Business Process Optimisation Versatility of the multi-function device (MFD) is an often-touted benefit by all the major office equipment manufacturers,
OpenAIRE Research Data Management Briefing paper
OpenAIRE Research Data Management Briefing paper Understanding Research Data Management February 2016 H2020-EINFRA-2014-1 Topic: e-infrastructure for Open Access Research & Innovation action Grant Agreement
xms 1.1: Environment Provisioning and Clustering Setup for Documentum Products
White Paper xms 1.1: Environment Provisioning and Clustering Setup for Documentum Products Abstract This white paper provides a detailed overview of configurations required for provisioning a Cluster environment
Open science tools available for Finnish higher education
Open science tools available for Finnish higher education Open Science: Engaging Finland s Doctoral Schools, 20.10.2014 Outline Open Science and Research Handbook IDA - Storage service for research data
CAREER TRACKS PHASE 1 UCSD Information Technology Family Function and Job Function Summary
UCSD Applications Programming Involved in the development of server / OS / desktop / mobile applications and services including researching, designing, developing specifications for designing, writing,
THE UNIVERSITY OF LEEDS. Vice Chancellor s Executive Group Funding for Research Data Management: Interim
THE UNIVERSITY OF LEEDS VCEG/12/274 Vice Chancellor s Executive Group Funding for Research Data Management: Interim SOME CONTENT HAS BEEN REMOVED FROM THIS PAPER TO MAKE IT SUITABLE FOR PUBLIC DISSEMINATION
A Service for Data-Intensive Computations on Virtual Clusters
A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King rainer.schmidt@arcs.ac.at Planets Project Permanent
European Data Infrastructure - EUDAT Data Services & Tools
European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28
DocAve 6 Service Pack 1
DocAve 6 Service Pack 1 Installation Guide Revision C Issued September 2012 1 Table of Contents About the Installation Guide... 4 Submitting Documentation Feedback to AvePoint... 4 Before You Begin...
GC3 Use cases for the Cloud
GC3: Grid Computing Competence Center GC3 Use cases for the Cloud Some real world examples suited for cloud systems Antonio Messina Trieste, 24.10.2013 Who am I System Architect
MEETING THE CHALLENGES OF COMPLEXITY AND SCALE FOR MANUFACTURING WORKFLOWS
MEETING THE CHALLENGES OF COMPLEXITY AND SCALE FOR MANUFACTURING WORKFLOWS Michael Feldman White paper November 2014 MARKET DYNAMICS Modern manufacturing increasingly relies on advanced computing technologies
EGI Compute and Data Services for Open Access in H2020
EGI Compute and Data Services for Tiziana Ferrari Technical Director, EGI.eu tiziana.ferrari@egi.eu 25/02/14 1 EGI-InSPIRE RI-261323 Outline What is EGI? Services for Researchers Supporting Open Access
Data Grids. Lidan Wang April 5, 2007
Data Grids Lidan Wang April 5, 2007 Outline Data-intensive applications Challenges in data access, integration and management in Grid setting Grid services for these data-intensive application Architectural
Energy and Space Efficient Storage: Multi-tier Strategies for Protecting and Retaining Data
Energy and Space Efficient Storage: Multi-tier Strategies for Protecting and Retaining Data NOTICE This White Paper may contain proprietary information protected by copyright. Information in this White
NETWRIX EVENT LOG MANAGER
NETWRIX EVENT LOG MANAGER QUICK-START GUIDE FOR THE ENTERPRISE EDITION Product Version: 4.0 July/2012. Legal Notice The information in this publication is furnished for information use only, and does not
irods Technologies at UNC
irods Technologies at UNC E-iRODS: Enterprise irods at RENCI Presenter: Leesa Brieger leesa@renci.org SC12 irods Informational Reception 1! UNC Chapel Hill Investment in irods DICE and RENCI: research
1 About This Proposal
1 About This Proposal 1. This proposal describes a six-month pilot data-management project, entitled Sustainable Management of Digital Music Research Data, to run at the Centre for Digital Music (C4DM)
Vodacom Managed Hosted Backups
Vodacom Managed Hosted Backups Robust Data Protection for your Business Critical Data Enterprise class Backup and Recovery and Data Management on Diverse Platforms Vodacom s Managed Hosted Backup offers
Delivering the power of the world s most successful genomics platform
Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE
Data Management Planning
DIY Research Data Management Training Kit for Librarians Data Management Planning Kerry Miller Digital Curation Centre University of Edinburgh Kerry.miller@ed.ac.uk Running Order I. What is Research Data
ESPRIT 29938 ProCure. ICT at Work for the LSE ProCurement Chain:
ESPRIT 29938 ProCure ICT at Work for the LSE ProCurement Chain: Putting Information Technology & Communications Technology to work in the Construction Industry The EU-funded ProCure project was designed
Scheduling in SAS 9.4 Second Edition
Scheduling in SAS 9.4 Second Edition SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. Scheduling in SAS 9.4, Second Edition. Cary, NC: SAS Institute
SAS Integration Technologies Server Administrator s Guide, Third Edition
SAS 9.1.3 Integration Technologies Server Administrator s Guide, Third Edition The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS 9.1.3 Integration Technologies:
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Introducing Cloud Backup for MS SQL Server The Cloudberry Lab Whitepaper
Introducing Cloud Backup for MS SQL Server The Cloudberry Lab Whitepaper The need for backup By performing timely backups, your valuable data is reliably protected against possible loss, corruption and
Hadoopizer : a cloud environment for bioinformatics data analysis
Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) anthony.bretaudeau@irisa.fr, INRIA/Irisa, Campus de Beaulieu, 35042,
Vodafone Total Managed Mobility
Vodafone Total Managed Mobility More productivity, less complexity Vodafone Power to you What s inside? What you get see how your business benefits 4 In detail find out how it all works 5 Service lifecycle
Backup and Recovery for SAP Environments using EMC Avamar 7
White Paper Backup and Recovery for SAP Environments using EMC Avamar 7 Abstract This white paper highlights how IT environments deploying SAP can benefit from efficient backup with an EMC Avamar solution.
Work Package 13.5: Authors: Paul Flicek and Ilkka Lappalainen. 1. Introduction
Work Package 13.5: Report summarising the technical feasibility of the European Genotype Archive to collect, store, and use genotype data stored in European biobanks in a manner that complies with all
Top Ten Security and Privacy Challenges for Big Data and Smartgrids. Arnab Roy Fujitsu Laboratories of America
1 Top Ten Security and Privacy Challenges for Big Data and Smartgrids Arnab Roy Fujitsu Laboratories of America 2 User Roles and Security Concerns [SKCP11] Users and Security Concerns [SKCP10] Utilities: