The open source ISA sooware suite and its internaqonal user community:



Similar documents
COPO: Collaborative Open Plant Omics. Rob Davey Data Infrastructure and Algorithms Group Leader

cheminformatics nomenclature activity binding based data sets knowledge thesauri article

Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

Case Study Life Sciences Data

In 2014, the Research Data Purdue University

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

BIOINFORMATICS Supporting competencies for the pharma industry

Data integration options using the Semantic Web. David Shotton

Report of the DTL focus meeting on Life Science Data Repositories

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Work Package 13.5: Authors: Paul Flicek and Ilkka Lappalainen. 1. Introduction

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

PeptidomicsDB: a new platform for sharing MS/MS data.

DATA MANAGEMENT PLAN IN THE REAL LIFE SCIENCES

Life as a scientific database curator

Semantic Workflows and the Wings Workflow System

<Insert Picture Here> The Evolution Of Clinical Data Warehousing

We have big data, but we need big knowledge

EUROPEAN COMMISSION Directorate-General for Research & Innovation. Guidelines on Data Management in Horizon 2020

OpenAIRE Research Data Management Briefing paper

How semantic technology can help you do more with production data. Doing more with production data

Gene Expression Macro Version 1.1

Software Description Technology

Presente e futuro del Web Semantico

Tools for Researchers

BYODs & FAIR Data Stewardship

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.

Open Access to Manuscripts, Open Science, and Big Data

RDF Dataset Management Framework for Data.go.th

The Development of the Clinical Trial Ontology to standardize dissemination of clinical trial data. Ravi Shankar

Open Source Software in Life Science Research. Woodhead Publishing Series in Biomedicine

OPEN SOURCE AND BOTTOM-UP VRE APPROACH IN WESTERN FRANCE

PODD. An Ontology Driven Architecture for Extensible Phenomics Data Management

A Streamlined Workflow for Untargeted Metabolomics

Big data, Genomics and Public Health: Big Data meets DNA

Rationale and vision for E2E data standards: the need for a MDR

Lecture 11 Data storage and LIMS solutions. Stéphane LE CROM

Susanna-Assunta Sansone, PhD. Metadata WG3 chair.

CIP s Open Data & Data Management Guidelines and Procedures

Evaluating Semantic Web Service Tools using the SEALS platform

Position Paper: Validation of Distributed Enterprise Data is Necessary, and RIF can Help

Describing Web Services for user-oriented retrieval

Pragmatic Web 4.0. Towards an active and interactive Semantic Media Web. Fachtagung Semantische Technologien September 2013 HU Berlin

Semantic Search in Portals using Ontologies

Research Data Management in Horizon 2020

arxiv: v1 [cs.dl] 20 May 2016

LJMU Research Data Policy: information and guidance

urika! Unlocking the Power of Big Data at PSC

Extracting and Preparing Metadata to Make Video Files Searchable

A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures

Three data delivery cases for EMBL- EBI s Embassy. Guy Cochrane

THE ONTOLOGY SUPPORTED INTELLIGENT SYSTEM FOR EXPERIMENT SEARCH IN THE SCIENTIFIC RESEARCH CENTER

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries

6 ELIXIR Domain Specific Services

technische universiteit eindhoven WIS & Engineering Geert-Jan Houben

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf

Data Management and Standardisation in Distributed Systems Biology Research

Semantic Interoperability

Integraion und Steuerung von verteilten Sensorsystemen. Grid-Package for a Robotic Telescope Network (RTN)

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Translational & Interoperable Health Infostructure - The Servant of Three Masters

> Semantic Web Use Cases and Case Studies

CGIAR Open Access and Data Management Implementation Guidelines

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

NCBI resources III: GEO and ftp site. Yanbin Yin Spring 2013

A pretty picture, or a measurement? Retinal Imaging

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Web services in corporate semantic Webs. On intranets and extranets too, a little semantics goes a long way. Fabien.Gandon@sophia.inria.

E-SCIENCE IN WESTERN FRANCE :

A Framework for Collaborative Project Planning Using Semantic Web Technology

Europeana Core Service Platform

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

Practical application of SAS Clinical Data Integration Server for conversion to SDTM data

Supplementary Materials and Methods (Metabolomics analysis)

Edinburgh Napier University. Research Data Management Policy

Integration of Polish National Bibliography within the repository platform for science and humanities

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013

Dr Alexander Henzing

An industry perspective on deployed semantic interoperability solutions

SEMANTIC INTEGRATION OF HETEROGENEOUS WEB DATA FOR TOURISM DOMAIN USING ONTOLOGY BASED RESOURCE DESCRIPTION LANGUAGE

Big Data: Challenges and Opportunities

School of Biosciences: MRC Phenome Centre-Birmingham

Introduction to XML Applications

Creating a Chemistry of Sciences with Big Data Building the Data Science Institute at Imperial College London

A Knowledge Base Approach for Genomics Data Analysis

Business rules and science

Data Management Plan. Name of Contractor. Name of project. Project Duration Start date : End: DMP Version. Date Amended, if any

An Interactive Workflow Visualization for Biomedical Processing Pipelines Integrated into the Refinery Platform

Open Ontology Repository Initiative

EMBL Identity & Access Management

Translational & Interoperable Health Infostructure - The Servant of Three Masters. Abridged Presentation

Big Data and Data Analysis for Personalized Medicine

Designing a Semantic Repository

Achille Felicetti" VAST-LAB, PIN S.c.R.L., Università degli Studi di Firenze!

Data integration is a feature that clearly expands the role of the GTL

Data Wrangling: From the Wild to the Lake

Semantic Information on Electronic Medical Records (EMRs) through Ontologies

Protein Protein Interactions (PPI) APID (Agile Protein Interaction DataAnalyzer)

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

Transcription:

The open source ISA sooware suite and its internaqonal user community: Knowledge management of experimental data Alejandra González- Beltrán Senior Software Engineer, ISATeam Oxford e- Research Centre, University of Oxford Oxford, UK NETTAB 2012 Integrated Bio- Search, Como, Italy, November 14-16

Outline Knowledge management of experimental data SeSng the scene The ecosystem: ISA- tab, tools, community Use case Latest addiqons Related projects & main points

SeSng the scene health agro env tox/pharma Source of the figure: EBI website Bioscience is mulq- domain

SeSng the scene health agro env tox/pharma Source of the figure: EBI website Bioscience is mulq- domain Petabytes of data

SeSng the scene health agro env tox/pharma Source of the figure: EBI website Bioscience is mulq- domain Petabytes of data Experimental metadata in Lab books

inves&ga&on study assay Assist in the annotaqon and management of experimental data at source Deal with data from high- throughput studies using one or a combinaqon of omics and other technologies Empower users to uptake community- defined checklists and ontologies Facilitate data sharing, reuse, comparison and reproducibility of experiments, submission to internaqonal public repositories

The ecosystem

The ecosystem ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level Rocca-Serra et al, 2010 Bioinformatics Towards interoperable bioscience data Sansone et al, 2012 Nature Genetics

General purpose & flexible format Domain agnosqc Captures metadata in omics experiments and tradiqonal experiments (e.g. clinical chemistry and histology)

faahko dataset Available in BioConductor Subset of the original data on global metabolite profiling Saghatlian et al. Biochemstry. 2004 LC/MS peaks from the spinal cords of 6 wild- type and 6 FAAH (facy acid amyde hydrolase) knockout mice

- Define key enqqes (e.g. factors, protocols, parameters) - Grouping of studies - Relate studies and assays faahko invesqgaqon

faahko study - Subjects studied: source(s), sampling methodology, characterisqcs - treatments/manipulaqons performed to prepare the specimens NEWT UniProt Taxonomy Database Mouse Genome InformaQcs

faahko study - Subjects studied: source(s), sampling methodology, characterisqcs - treatments/manipulaqons performed to prepare the specimens Mouse Adult Gross Anatomy

- - measurement type, e.g. metabolite profiling technology, e.g. mass spectrometry faahko assay

Report and edit the descripqon of the invesqgaqon using Google Spreadsheets. Use Google Spreadsheets in combinaqon with ISA- Tab templates (created through imporqng the Excel file from the ISAconfigurator) and OntoMaton (for ontology search and tagging support) to report an invesqgaqon.

- - - collaboraqve annotaqon distributed groups of users version control & history Ontology Search and Tagging in Google Spreadsheets

Create templates detailing the steps to be reported for different invesqgaqons, complying to community standards (listed at ), e.g. configuring fields to be (i) ontology terms, (ii) text (with/without regular expression tesqng), (iii) numbers etc.

From the ISA- Tab we can perform analysis, convert to RDF/OWL and other formats for submission/ sharing to local/remote repositories,

From the ISA- Tab we can perform analysis, convert to RDF/OWL and other formats for submission/ sharing to local/remote repositories, + VisualisaQon Methods

faahko Groups faahko Workflow Maguire E, Rocca- Serra P, Sansone SA, Davies J and Chen M. Taxonomy- based Glyph Design - - with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transac9ons on Visualiza9on and Computer Graphics, volume 18, 2012 (in press)

R package available in BioConductor 2.11 hcp://bioconductor.org/packages/release/bioc/html/risa.html ISAtab class Read ISAtab files into ISAtab objects and save ISAtab files Build xcmsset (xcms package) objects from mass spectrometry assays Augment the ISAtab dataset aoer analysis source & issues tracking hcps://github.com/isa- tools/risa

faahko package v. 2.12 contains ISAtab files describing the experiment faahkoisa = readisata(find.package("faahko")) assay.filename <- faahkoisa["assay.filenames"][[1]] xset = processassayxcmsset(faahkoisa, assay.filename) updateassaymetadata(faahkoisa, assay.filename,"derived Spectral Data File","faahkoDSDF.txt" ) MTBLS2 processing and analysis using Risa, xcms and CAMERA BioConductor packages Metabolights an open access general-purpose repository for metabolomics studies and associated meta-data Haug et al, 2012 Nucleic Acids Research

ISA syntax & Underlying Material/Data workflows Input Material or Data Node Output Material or Data Node Characteris9cs[ ] Factor Value[ ] Protocol REF Characteris9cs[ ] Factor Value[ ] Parameter Value [ ] 26

Make the semanqcs of ISAtab explicit, including materials & data enqqes & processes Exploit the semanqc annotaqons available in ISAtab datasets Augment ISA syntax with new elements (e.g. groups), facilitaqng the understanding & querying of experimental design Facilitate data integraqon & knowledge discovery/reasoning

ISAtab datasets as linked data Connect to the growing Linked Data universe RDF = Resource DescripQon Framework, OWL = Web Ontology Language CollaboraQons with Toxbank ( & W3C Health Care & Life Sciences Interest Group (HCLSIG) ) <subject, predicate, object> <lipoprotein> <parqcipates_in> <inflammatory response> <PRO:212342352> <BFO_0000056> <GO:0006954>

ISAtab dataset Parser ISAtab Graph Analysis ISA Mapping Parser

ISA- OBO- mapping

has specified input material enqty type Saghantelian_1 sample collecqon derives from has specified output processed material type KO1 derives from has specified input extracqon type material processing type KO1_extract has specified output has specified input type InformaQon content enqty derives from mass spectrometry has specified output type./cdf/ko/ko15.cdf

Increasing level of structure different target audiences Notes in Lab books (informaqon for humans) Spreadsheets & Tables (ISAtab metadata) Facts as RDF statements (informaqon for machines)

core organizaqon in the UK Node

Implementation at Harvard ISA hcp://discovery.hsci.harvard.edu/

Implementation at the EBI hcp://www.ebi.ac.uk/metabolights Metabolights an open access general-purpose repository for metabolomics studies and associated meta-data Haug et al, 2012 Nucleic Acids Research 35

The ecosystem

Isa- tools.org @isatools @biosharing isacommons.org biosharing.org