Data Management and Standardisation in Distributed Systems Biology Research Martin Golebiewski Heidelberg Institute for Theoretical Studies (HITS) Heidelberg, Germany BioMedBridges workshop "Data strategies for Research Infrastructures", Munich, 18 February 2015
Systems Biology Workflows [Figure: cycle linking Experimental Data, DB, Model and Simulation; steps: collect & integrate, hypothesize & verify, simulate; includes example model rate equations]
The Virtual Liver Mission To create: - Dynamic mathematical models that represent human liver physiology, morphology and function - Models that integrate quantitative data from all scales, from (sub-)cellular levels to the whole organ and the whole body -> True multi-scale models of the liver (modular, flexible and modifiable) - Models that have a specific focus on applications in medicine
Virtual Liver Network German Systems Biology Flagship Programme (largest in Europe) 5 years (started in 2010) Almost 50 Mio 69 Groups >200 Scientists 36 Institutions 44 Projects Experimentalists, Modelers, Clinicians and Industry Different scales: The Liver Cell - Metabolism - Signalling - Functions Beyond the Cell: - Intercellular - Liver lobule - Whole organ Integration and Translation: - Model integration - Data management - Clinical translation
Our Data Management Mission Feels like buddy to buddy Attached systems as needed
Data management
The Data Management Challenges Technical: Standardisation & Integration Protocols (SOPs) Data formats Model formats Metadata for data and models Interfaces Tools for integration -> multi-scale modelling Social: Communication Complex project organisation structure Geographically dispersed Diverse scientific background Managing expectations Benefits for the users? Internal communication Outreach: Informing the public
Data Management Advocates & Multipliers PALs (Project Area Liaisons): - Experts working within the project - Collect requirements from partners - Help to communicate the features
Challenges: Constructing Interfaces - Interfaces between data, workflows and models - Interfaces between modules of a model - Interfaces between different models - Interfaces between biological scales What information should be transmitted? (entities, parameters, metadata,...) Which format is suitable for the transmission? How should the information be integrated?
Challenges: Metadata Standards Metadata: Data describing data or models - Context & Environment - Entities & Elements! SEMANTICS Helps to connect the dots (data, models...)
SEEK Metadata: Studies & Assays Data Files Models SOPs Publications
SEEK: Data
Just Enough Results Model JERM: Data model for metadata and results "Just enough" vs "Just in case" - Consensus minimal description of data and metadata - Shared minimum formatting standards (e.g. spreadsheets) -> Extractors could be built to extract the data and context information
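The "just enough" idea can be illustrated with a small check that a metadata record carries a consensus minimal set of fields. This is only a sketch: the field names in REQUIRED_FIELDS and the helper missing_fields are hypothetical, since the slides do not list the actual JERM fields.

```python
# Hypothetical minimal field set; the actual JERM field list is not given in the slides.
REQUIRED_FIELDS = ("title", "organism", "assay_type", "unit")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example record with one field absent and one left blank.
record = {"title": "Glucose uptake", "organism": "Homo sapiens", "unit": ""}
print(missing_fields(record))
```

A record that passes such a check carries just enough context to be found and reused, without demanding every field that might be useful "just in case".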
Specimen and Sample Management Automated population of SEEK with specimen and sample descriptions from spreadsheets: Spreadsheet in the lab -> Parsing -> Specimen and Sample Descriptions
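The parsing step might look roughly like the sketch below, which groups spreadsheet rows into specimen descriptions with their samples. It is not SEEK's actual importer: a CSV export stands in for the lab spreadsheet, and the column names (specimen_id, organism, sample_id, tissue) and the function parse_samples are assumptions made for illustration.

```python
import csv
import io


def parse_samples(csv_text):
    """Group sample rows under their specimen, keyed by specimen ID."""
    specimens = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        spec_id = row["specimen_id"].strip()
        # Create the specimen entry on first sight, then reuse it.
        specimen = specimens.setdefault(
            spec_id, {"organism": row["organism"].strip(), "samples": []}
        )
        specimen["samples"].append(
            {"sample_id": row["sample_id"].strip(), "tissue": row["tissue"].strip()}
        )
    return specimens


# Hypothetical spreadsheet content exported as CSV.
SHEET = """specimen_id,organism,sample_id,tissue
SP1,Mus musculus,S1,liver lobule
SP1,Mus musculus,S2,hepatocyte
SP2,Homo sapiens,S3,liver
"""

result = parse_samples(SHEET)
for spec_id, specimen in result.items():
    print(spec_id, specimen["organism"], len(specimen["samples"]))
```

The resulting dictionaries could then be submitted to a data hub through whatever upload interface it offers; that step is left out here because the slides do not describe it.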
SEEK: Systems Biology Data Hub Yellow Pages: - People - Institutions - Projects - Events Asset Catalogue: - Investigations, Studies, Assays - Data - Models - Standardized workflows (SOPs) - Publications & Presentations - Biological samples
From Grassroots Initiatives to Approved Standards
Grassroots Standards in Biology & Biotechnology http://biosharing.org/ Source: Susanna-Assunta Sansone (University of Oxford, UK)
Manifold Exchange Formats for Biological Models
So many standards...
Building a Bridge Research Communities Develop and apply de facto community standards in grass-roots initiatives Industries Need standards to integrate data Help to distribute and promote standards
Standardisation Organisations: How do they work? Mirror committee Delegate Expert Working group Technical Committee National International
Biotechnology Committees at DIN and ISO
NormSys Standards for Systems Biology Transfer from grassroots to approved standards Identification and classification of existing initiatives Bring together all stakeholders - Standard developers providing the standards - Research initiatives and industries using and applying the standards - Publishers and Journals distributing (standardized) data and models - Funders having interest in sustainability and reuse of scientific results - Standardization bodies providing support for the standardization process How to publish and distribute the standards? How to convince the communities to apply standards? How to certify implementation of standards?