Data advertising and management system for biobanks: a use case for the eGenVar data management system.




Sabry Razick (24 October 2014, ESBB), Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Norway. igr.medisin.ntnu.no/data/egenvar/ www.ntnu.no

The eGenVar Data Management System (EGDMS) provides solutions for managing data and metadata for internal and external reuse, and for advertising these data without exposing the data themselves. (Diagram: samples in the biobank, derived data, associated data.)

Data are not accessed or moved. EGDMS handles cataloguing and advertising, links LIMS records and derived data, uses tags to describe data, and establishes connections using shared tags: Data <-> Metadata, Genotype <-> Phenotype, Private <-> Public, Description <-> Tag, Processing <-> Relationships, Big data <-> Location.
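A small inverted index can illustrate how shared tags link records without the data ever being opened. This is a minimal Python sketch under stated assumptions, not the EGDMS implementation. The record IDs, the `kind` field and the tag values are hypothetical, and each record holds only a description of a data item.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical catalogue records: each one only *describes* a data item
# (its kind and tags); the underlying data are never read or moved.
records = {
    "geno_batch_1": {"kind": "genotype", "tags": {"HUNT", "DOID:1287"}},
    "pheno_table_7": {"kind": "phenotype", "tags": {"HUNT", "DOID:1287", "fasting"}},
    "protocol_3": {"kind": "metadata", "tags": {"fasting"}},
}


def build_tag_index(records):
    """Map each tag to the set of record IDs that carry it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for tag in rec["tags"]:
            index[tag].add(rec_id)
    return index


def shared_tag_links(records):
    """Return {(a, b): shared_tags} for every pair of records sharing a tag."""
    links = defaultdict(set)
    for tag, ids in build_tag_index(records).items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(tag)
    return dict(links)


for (a, b), tags in sorted(shared_tag_links(records).items()):
    print(a, "<->", b, sorted(tags))
# geno_batch_1 <-> pheno_table_7 ['DOID:1287', 'HUNT']
# pheno_table_7 <-> protocol_3 ['fasting']
```

The design keeps the index separate from the records. Adding a tag to one record therefore creates new links without touching any other record.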

Examples of information that can be attached as tags: what type of samples; how they were collected; primary/secondary; processing performed; how they are stored/preserved; protocols followed; techniques used; instruments used; phenotypes investigated; file type; parameters; software; types of processing; relevant publications.

Ontologies used as tag sources include the Experimental Factor Ontology (EFO), Ontology for Biomedical Investigations (OBI), Sample Processing and Separation Techniques Ontology (SEP), Research Resource Ontology, Human Physiology Simulation Ontology (HuPSON), Human Disease Ontology (DOID) and the National Cancer Institute Thesaurus.

Example tags across the stages Experiment, Raw data, Processed data and Interpretations: blood specimen (OBI:0000655), venous blood (SNOMEDCT:53130003), fasting (xco_0000102), 1M-Duo Infinium HD BeadChip (OBI_0002006), serum alanine aminotransferase (EFO_0004735), colorimetric detection (SEP:00165), HDL cholesterol (SNOMEDCT/102737005), GenomeStudio V2011.1 (ION:225), ANOVA (SWO:0000014), linear mixed model (SCAIVPH_00000687), cardiovascular system disease (DOID:1287), myocardial infarction (doid_5844).
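The example tags above spell ontology identifiers in several ways (`OBI:0000655`, `xco_0000102`, `SNOMEDCT/102737005`, `doid_5844`). Normalising them first can help when tags are compared across records. The sketch below assumes a canonical `PREFIX:LOCAL` form with an upper-case prefix. The helper `normalize_term_id` is hypothetical and not part of EGDMS.

```python
import re

# PREFIX (letters/digits, starting with a letter), a separator (":", "_" or "/"),
# then the local identifier.
_TERM_ID = re.compile(r"^([A-Za-z][A-Za-z0-9]*)[:_/](\w+)$")


def normalize_term_id(raw):
    """Return 'PREFIX:LOCAL' with an upper-case prefix, or None if unparseable."""
    match = _TERM_ID.match(raw.strip())
    if match is None:
        return None
    prefix, local = match.groups()
    return f"{prefix.upper()}:{local}"


raw_tags = ["OBI:0000655", "xco_0000102", "SNOMEDCT/102737005", "doid_5844"]
print([normalize_term_id(t) for t in raw_tags])
# ['OBI:0000655', 'XCO:0000102', 'SNOMEDCT:102737005', 'DOID:5844']
```

With this normalisation, `DOID:1287` and `doid_1287` compare equal. It does not check that a term actually exists in its ontology.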

Advertising

Public web interface (screenshot steps 1, 2, 3.A and 3.B): log in to see more details, or request more information using the link.
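The split between what the public interface shows and what a logged-in user sees can be sketched as a field projection. The entry fields (`tags`, `contact`, `people`, `file_paths`) and the two field lists are hypothetical illustrations, not the actual EGDMS schema.

```python
# Hypothetical catalogue entry; the field names are illustrative only.
ENTRY = {
    "tags": ["DOID:1287", "OBI:0000655"],
    "contact": "data-access@example.org",
    "people": ["Data owner A"],
    "file_paths": ["/secure/project/batch1_Grn.idat"],
}

# Assumed split between what an anonymous visitor and a logged-in user may see.
PUBLIC_FIELDS = ("tags", "contact")
PRIVATE_FIELDS = PUBLIC_FIELDS + ("people", "file_paths")


def view(entry, logged_in):
    """Project an entry onto the fields the current user is allowed to see."""
    allowed = PRIVATE_FIELDS if logged_in else PUBLIC_FIELDS
    return {key: entry[key] for key in allowed if key in entry}


print(view(ENTRY, logged_in=False))
# {'tags': ['DOID:1287', 'OBI:0000655'], 'contact': 'data-access@example.org'}
```

The public view advertises that the data exist and how to ask for them. It never reveals where the data are stored.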

Private web interface: use filters to further refine the results, e.g. get all locations of the files. Get more details, e.g. the people involved and groupings. Tags: which attributes are assigned using tags. Hierarchical information, i.e. parents and children.
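Hierarchical tags let a user filter on a broad term and still find entries tagged with narrower ones. This is a minimal sketch, assuming tags are stored as (child, parent) pairs. The edges reuse terms from the Disease Ontology path in the pre-eclampsia search example. The entry names and helper functions are hypothetical.

```python
from collections import defaultdict

# (child, parent) pairs taken from the Disease Ontology path in the search example.
EDGES = [
    ("vascular disease", "cardiovascular system disease"),
    ("artery disease", "vascular disease"),
    ("hypertension", "artery disease"),
    ("pre-eclampsia", "hypertension"),
]

# Hypothetical catalogue entries and their tags.
ENTRIES = {
    "cohort_a": {"pre-eclampsia"},
    "cohort_b": {"asthma"},
}


def children_map(edges):
    """Invert (child, parent) pairs into parent -> set of children."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def descendants(term, children):
    """Every term below `term`, found by iterative depth-first traversal."""
    seen, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def filter_by_tag(entries, term, children):
    """Names of entries tagged with `term` or any of its descendants."""
    wanted = {term} | descendants(term, children)
    return sorted(name for name, tags in entries.items() if tags & wanted)


tree = children_map(EDGES)
print(filter_by_tag(ENTRIES, "cardiovascular system disease", tree))  # ['cohort_a']
```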

One of many ways to navigate: the graph browser.

Data management

Search for a tag using the terminal:
egenv.sh -exact -search tag="pre-eclampsia"
#URL(DISEASE_ONTOLOGY_TAGSOURCE)=https://ans-180230.stolav.ntnu.no:8185/eGenVar_web/Search/SearchResults3?TABLETOUSE_DISEASE_ONTOLOGY_TAGSOURCE=572cbcd98dbe48fdcc68aea6861e24173d27d9ee
1) DISEASE_ONTOLOGY_TAGSOURCE.PARENT_ID=1005
DISEASE_ONTOLOGY_TAGSOURCE.NAME=pre-eclampsia
DISEASE_ONTOLOGY_TAGSOURCE.OBO_ID=DOID:10591
DISEASE_ONTOLOGY_TAGSOURCE.DEFINITION=\N
DISEASE_ONTOLOGY_TAGSOURCE.PATH=disease.obo>disease>disease of anatomical entity>cardiovascular system disease>vascular disease>artery disease>hypertension>pre-eclampsia(disease_ontology_tagsource=946)
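Output in the format above can be parsed for scripting. It has one `TABLE.COLUMN=value` pair per line, numbered results, and `\N` apparently marking a missing value. This Python sketch assumes exactly that format. `parse_results` and `path_terms` are hypothetical helpers, and real `egenv.sh` output may differ.

```python
import re

SAMPLE_OUTPUT = """\
1) DISEASE_ONTOLOGY_TAGSOURCE.PARENT_ID=1005
DISEASE_ONTOLOGY_TAGSOURCE.NAME=pre-eclampsia
DISEASE_ONTOLOGY_TAGSOURCE.OBO_ID=DOID:10591
DISEASE_ONTOLOGY_TAGSOURCE.DEFINITION=\\N
DISEASE_ONTOLOGY_TAGSOURCE.PATH=disease.obo>disease>disease of anatomical entity>cardiovascular system disease>vascular disease>artery disease>hypertension>pre-eclampsia(disease_ontology_tagsource=946)
"""

# "TABLE.COLUMN=value", optionally preceded by a result number such as "1) ".
FIELD = re.compile(r"^(?:\d+\)\s*)?[A-Z0-9_]+\.([A-Z0-9_]+)=(.*)$")
RESULT_START = re.compile(r"^\d+\)")


def parse_results(text):
    """Group KEY=VALUE lines into one dict per numbered result."""
    results = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = FIELD.match(line)
        if not match:
            continue  # skips "#URL(...)" lines and progress messages
        if RESULT_START.match(line) or not results:
            results.append({})
        column, value = match.groups()
        results[-1][column] = None if value == "\\N" else value
    return results


def path_terms(path):
    """Split PATH into ancestor terms, dropping the trailing '(table=id)' suffix."""
    return re.sub(r"\([^()]*\)$", "", path).split(">")


record = parse_results(SAMPLE_OUTPUT)[0]
print(record["OBO_ID"], path_terms(record["PATH"])[-3:])
# DOID:10591 ['artery disease', 'hypertension', 'pre-eclampsia']
```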

Retrieve file paths using sample-details tags:
egenv.sh -search tag="4535811005.r01c01" "sampledetails files2paths.filepath"
..waiting for the server....
#URL for this result:
#URL(FILES2PATH)=https://ans-180230.stolav.ntnu.no:8185/eGenVar_web/Search/SearchResults3?TABLETOUSE_FILES2PATH=e93e2fcda6979d2bb022403b6f4d78b2c8d5cb05
1) FILES2PATH.FILEPATH=/...nt_20100504.idats/4535811005/4535811005_R01C01_Grn.idat
2) FILES2PATH.FILEPATH=/...nt_20100504.idats/4535811005/4535811005_R01C01_Red.idat
Ended in 2 seconds
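The two files returned for sample 4535811005.r01c01 are the green and red channel IDAT files of one Illumina Infinium array position. A downstream script might group such paths by barcode and position and flag incomplete pairs. This sketch assumes the `<barcode>_<RxxCxx>_<Grn|Red>.idat` naming seen in the output. The function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# e.g. 4535811005_R01C01_Grn.idat -> (barcode, array position, colour channel)
IDAT_NAME = re.compile(r"^(\d+)_(R\d+C\d+)_(Grn|Red)\.idat$", re.IGNORECASE)


def pair_idat_files(paths):
    """Group IDAT paths as {(barcode, position): {channel: path}}."""
    pairs = defaultdict(dict)
    for path in paths:
        match = IDAT_NAME.match(PurePosixPath(path).name)
        if match:
            barcode, position, channel = match.groups()
            pairs[(barcode, position.upper())][channel.capitalize()] = path
    return dict(pairs)


def incomplete(pairs):
    """Array positions missing either the green or the red channel."""
    return [key for key, chans in pairs.items() if set(chans) != {"Grn", "Red"}]


# Paths copied from the example output above (elided prefix kept as-is).
paths = [
    "/...nt_20100504.idats/4535811005/4535811005_R01C01_Grn.idat",
    "/...nt_20100504.idats/4535811005/4535811005_R01C01_Red.idat",
]
print(incomplete(pair_idat_files(paths)))  # []
```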

Diagram: what EGDMS records about a data item. Content description (type); content generation details (phenotypes, instruments, protocols, sample information, people/experts, controls); relationships; location and ownership; provenance (track relocation, record changes and relocation); virtual rearrangements; information consolidation.
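Tracking relocation and recording changes can be modelled as an append-only history per data item. The current location is then always the last event, and earlier locations remain traceable. The `LocationHistory` class is a hypothetical illustration, not the EGDMS data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class LocationHistory:
    """Hypothetical append-only record of where one data item has been stored."""

    item_id: str
    # Each event is (timestamp, previous location or None, new location).
    events: List[Tuple[datetime, Optional[str], str]] = field(default_factory=list)

    def relocate(self, new_location: str, when: Optional[datetime] = None) -> None:
        """Log a move; no data are copied, only the change is recorded."""
        when = when or datetime.now(timezone.utc)
        self.events.append((when, self.current(), new_location))

    def current(self) -> Optional[str]:
        """Latest known location, or None if the item was never registered."""
        return self.events[-1][2] if self.events else None

    def locations(self) -> List[str]:
        """All locations the item has had, oldest first."""
        return [new for _, _, new in self.events]


history = LocationHistory("batch_1")
history.relocate("/storage/old")
history.relocate("/storage/new")
print(history.current(), history.locations())
# /storage/new ['/storage/old', '/storage/new']
```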

Acknowledgments: Pål Sætrom, Oddgeir Lingaas Holmen, Rok Mocnik, Laurent Thomas, Einar Ryeng, Finn Drabløs, HUNT biobank, Kristian Hveem, Research Council of Norway.
Email: sabryr@gmail.com
Project link: http://bigr.medisin.ntnu.no/data/egenvar/
Publication (PMID: 24682735): The eGenVar data management system: cataloguing and sharing sensitive data and metadata for the life sciences.

Figure: from donor to interpretations. Donor 1 provides an original sample, from which derived samples (A, B, B.1, B.2) are made. Experiments/instruments produce raw data (F1) -> post-processing -> groomed data (F2) -> analysis -> processed data (F3) -> filter -> filtered data (F4) -> annotate, using annotation resources (FA) -> interpretations, results (F5).
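The provenance chain in the figure can be represented as "derived from" links and walked backwards, so that any result can be traced to the original donor sample. In this sketch the item names are shortened from the figure. Two links are assumptions because the figure does not make them explicit: raw data (F1) are taken to come from sample B.1, and the annotation resources (FA) are taken to feed the interpretations (F5).

```python
from collections import deque

# Hypothetical "derived from" links mirroring the figure: item -> its inputs.
DERIVED_FROM = {
    "sample A": ["donor 1"],
    "sample B": ["donor 1"],
    "sample B.1": ["sample B"],
    "sample B.2": ["sample B"],
    "F1 raw data": ["sample B.1"],  # assumed source sample
    "F2 groomed data": ["F1 raw data"],
    "F3 processed data": ["F2 groomed data"],
    "F4 filtered data": ["F3 processed data"],
    "F5 results": ["F4 filtered data", "FA annotation resources"],  # assumed FA link
}


def lineage(item, derived_from):
    """Every upstream item of `item`, nearest first (breadth-first search)."""
    seen, order, queue = set(), [], deque([item])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(lineage("F5 results", DERIVED_FROM))
# ['F4 filtered data', 'FA annotation resources', 'F3 processed data',
#  'F2 groomed data', 'F1 raw data', 'sample B.1', 'sample B', 'donor 1']
```

Breadth-first order lists the nearest inputs first. The `seen` set keeps shared ancestors from being reported twice.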

Figure: data sharing strategies, ranging from email, compress/FTP, and SSH with ls/find/sed, through advanced file systems and federated systems, to user-interface tools with privilege management. Examples shown: Galaxy, EGDMS, PubMed, ArrayExpress, iRODS, ISA, GenBank, NCBI, Taverna, GEO (Gene Expression Omnibus), TwinNET, IntAct, Synapse, dbGaP (the database of Genotypes and Phenotypes), UCSC Genome Browser, Ensembl Genome Browser, ENCODE, 1000 Genomes, Dataverse, DRYAD, Nature Scientific Data, GigaScience, and biobanks in Finland and Graz (Austria). Steps shown: write a detailed description; create a summary or export to a schema; parse files/extract data; central/local repository; extract details by the host system (e.g. file system); workflow management; access management; collect metadata; record provenance with relationships; describe content; organise; describe using tags (internal information management system + metadata server).

eGenVar data management system (EGDMS). Sensitive data cannot be freely exchanged, hosted on public repositories, or transformed in the same ways as public data, so such data stay hidden and are not reused. Data advertising: advertise data using an extended set of metadata as tags, keeping the data where they are under their existing access restrictions and disclosing only their presence. Data management: local data management keeps track of what is advertised and makes it easier to locate data for internal and external reuse.