DATA MANAGEMENT PLAN IN THE REAL LIFE SCIENCES Yvan Le Bras Cyril Monjeaud Olivier Collin Jacques Nicolas CNRS UMR 6074 IRISA-INRIA
Context Now : Genomics : Next Generation Sequencing Now : Proteomics Next : Bio-imaging Kahn. On the future of genomic data. Science (2011) vol. 331 (6018) pp. 728-9 Digital data Huge amount Heterogeneous Critical situation for some laboratories
Context Exchange from one domain to another From ICT / IT to scientific domains Between scientific domains Life science integrators e-science integrators
E-BIOGENOUEST From the e-Biogenouest project to the first French e-Science center: CeSGO
E-Biogenouest Started in May 2012 for 3 years Funded by Brittany and Pays de la Loire E-science initiative for the Biogenouest network Test an e-science approach More than 120 scientists trained! 1669 meetings ;) Roadmap preparation -UEB C@mpus -CPER -FRM -INCa -H2020 Health 7 submitted publications Agro Environment IT More than 200 users! An innovative VRE concept -Mission interdisciplinarité CNRS -PIA -IFB -Fce Génomique -Rapsodyn -Sciences citoyennes
VRE: a tool for e-science application Virtual Research Environment Data User Web portal Collaboration software Community Processing resources
An innovative VRE approach Research Lifecycle Open source solutions Mutualise Don't reinvent the wheel Win-win Break down silos http://www.jisc.ac.uk/whatwedo/campaigns/res3/jischelp.aspx#simulate
Continuum HUBzero Galaxy EMME Community Continuum data management & analysis Collaborative environment Collaboration
HUBzero : Scientific collaborative platform ebgo HUB HUBzero to share knowledge and manage groups and projects Information 218 users 111 projects 53 groups 729 resources > 400 unique users per month Purdue University M. McLennan, R. Kennell. Comput Sci Eng, 12:48-53, 2010.
ISAtools : Experimental data management EMME ISAtools suite to store data & metadata Functionalities -based on biomedical ontologies -bridge between existing biomedical standards -formatting for publication submission -Pydio to upload data -biological investigation repository (data + metadata) Oxford e-Research Centre P. Rocca-Serra et al. Bioinformatics, 26(18):2354-6, 2010
Galaxy : Data analysis web platform GALAXY by GenOuest To analyse & share data as processes and tools Information 34917 jobs 150 users More than 800 tools Share - data - histories - workflows - tools Penn State University J. Goecks, A. Nekrutenko, J. Taylor, et al. Genome Biol, 25;11(8):R86, 2010
Pydio : File sharing platform Pydio by GenOuest To store & share data as links Information -Galaxy workspace -EMME workspace -INCa workspace Share - data via URI - control - safety - privacy Abstrium SAS Charles du Jeu, David Gillard et al.
What are our goals? For society Open Science and open data For end-user scientific communities Data management plan Preserve, access, share & visualise (data & analytics processes) Help for project management For ICT Facilitate the use of tools Research Service Accelerate the switch from development to production Optimise infrastructure use (storage, computing & network) Infrastructure for data Infrastructure of data
DMP ON THE LINE From data storage to publication
CeSGO : Data storage
Data storage
Data storage URL generation
Metadata management
Metadata management Configuration
Metadata management ISAcreator
Metadata management ISAcreator: GenomeSpace
Metadata management ISAcreator: local
Metadata management ISAcreator: choose a config
Metadata management ISAcreator: existing ISA-Tab
Metadata management ISAcreator: Investigation
Metadata management ISAcreator: Study
Metadata management ISAcreator: Study 1
Metadata management ISAcreator: Assay 1
Metadata management ISAcreator: Assay 1 / Data
Metadata management ISAcreator: Study
Metadata management ISAcreator: create an ISArchive
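The "create an ISArchive" step can be approximated outside ISAcreator. The sketch below is an illustration only, assuming that an ISArchive is a zip file bundling the ISA-Tab text files of one investigation, and relying on the ISA-Tab naming convention (i_ for investigation, s_ for study, a_ for assay). The function names, file names and file contents are hypothetical and are not taken from ISAcreator or CeSGO.

```python
import tempfile
import zipfile
from pathlib import Path

# ISA-Tab convention: investigation, study and assay tables are tab-delimited
# text files whose names start with i_, s_ and a_ respectively.
ISA_PREFIXES = ("i_", "s_", "a_")


def create_isarchive(isatab_dir: Path, archive_path: Path) -> list[str]:
    """Zip every ISA-Tab file found in isatab_dir and return the archived names."""
    members = sorted(
        p for p in isatab_dir.iterdir()
        if p.is_file() and p.name.startswith(ISA_PREFIXES) and p.suffix == ".txt"
    )
    # An archive without an investigation file cannot be interpreted later.
    if not any(p.name.startswith("i_") for p in members):
        raise ValueError("no investigation file (i_*.txt) found")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in members:
            zf.write(p, arcname=p.name)
    return [p.name for p in members]


def demo() -> list[str]:
    """Build a toy ISA-Tab folder (hypothetical content) and archive it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "isatab"
        src.mkdir()
        (src / "i_investigation.txt").write_text("Investigation Title\tToy example\n")
        (src / "s_study1.txt").write_text("Source Name\tSample Name\n")
        (src / "a_assay1.txt").write_text("Sample Name\tRaw Data File\n")
        (src / "notes.md").write_text("not an ISA-Tab file, so it is skipped\n")
        archive = Path(tmp) / "study1_isarchive.zip"
        create_isarchive(src, archive)
        with zipfile.ZipFile(archive) as zf:
            return sorted(zf.namelist())


if __name__ == "__main__":
    print(demo())
```

Filtering on the file-name prefix keeps stray files out of the archive. A real archive may also carry data files, which this sketch deliberately leaves out.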
Metadata management ISAcreator: Study
Data analysis Metadata & data analysis: Galaxy
Data analysis Metadata & data analysis: Galaxy / Import ISArchive
Data analysis Metadata & data analysis: Galaxy / Extract ISArchive
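To make the "Extract ISArchive" step concrete, here is a minimal sketch of what such an extraction might do. It assumes the archive is a zip of ISA-Tab files and that assay tables list their data locations in a "Raw Data File" column. It is not the Galaxy tool shown on the slides, and the helper names and example URL are hypothetical.

```python
import csv
import io
import zipfile

# ISA-Tab file-name prefixes mapped to the role of each table.
ROLES = {"i_": "investigation", "s_": "study", "a_": "assay"}


def classify_isarchive(archive) -> dict[str, list[str]]:
    """Group the members of an ISArchive (path or file object) by ISA role."""
    groups = {"investigation": [], "study": [], "assay": [], "other": []}
    with zipfile.ZipFile(archive) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):  # skip directory entries
                continue
            base = name.rsplit("/", 1)[-1]
            role = ROLES.get(base[:2], "other") if base.endswith(".txt") else "other"
            groups[role].append(name)
    return groups


def raw_data_files(archive, assay_member: str) -> list[str]:
    """Return the non-empty 'Raw Data File' values of one assay table."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(assay_member) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t")
            return [row["Raw Data File"] for row in reader if row.get("Raw Data File")]


def demo():
    """Build a toy in-memory ISArchive (hypothetical content) and inspect it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("i_investigation.txt", "Investigation Title\tToy example\n")
        zf.writestr("s_study1.txt", "Source Name\tSample Name\nsrc1\tsample1\n")
        zf.writestr(
            "a_assay1.txt",
            "Sample Name\tRaw Data File\nsample1\thttps://example.org/reads_1.fastq\n",
        )
    groups = classify_isarchive(buf)
    return groups, raw_data_files(buf, "a_assay1.txt")


if __name__ == "__main__":
    print(demo())
```

Once the data locations are known, downloading the raw data is a separate step, which is why the sketch stops at returning the listed locations.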
Data analysis Metadata & data analysis: Galaxy / Download data
Data analysis Metadata & data analysis: Galaxy / Download raw data
Data analysis Metadata & data analysis: Galaxy / Extract ISArchive
Metadata repository Metadata repository: BII
Metadata repository Metadata repository: BII 1 study
Metadata repository Metadata repository: BII Data via URL / Protocols
CeSGO & DMP Administrative data; Project name; Project description; Name / ID of the project lead; Funding agency; DMP version; Data policy applied; Responsibilities and resources; Data collection / creation; Dataset description; Protocol; Method; Equipment; Quality assurance applied; Documentation and metadata; BII repository; Metadata standard: ISA-TAB
CeSGO & DMP Data storage, backup and security; CeSGO datacenter for the duration of the project (max: 5 years); Ethics and legal framework; Protection of sensitive or personal data; CC version 4.0; Data sharing; Open or restricted access; Delay: at most 3 years after collection; Repositories (GEO, GenBank, SRA, UniProt, PRIDE, ...); Tools needed to reuse / validate the data; Data paper; Data selection and archiving
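The DMP items listed on these two slides can be captured as a small machine-checkable record. The sketch below is one possible encoding, not the CeSGO implementation. The field names and the `check_dmp` helper are hypothetical. Only the two limits (storage for at most 5 years, sharing at most 3 years after collection) and the ISA-TAB default come from the slides.

```python
from dataclasses import dataclass, field

MAX_STORAGE_YEARS = 5        # slide: CeSGO datacenter, project duration, max 5 years
MAX_SHARING_DELAY_YEARS = 3  # slide: sharing at most 3 years after collection

# Administrative fields that must not be left empty (hypothetical field names).
REQUIRED_TEXT_FIELDS = (
    "project_name", "project_description", "lead_id", "funding_agency", "dmp_version",
)


@dataclass
class DataManagementPlan:
    project_name: str
    project_description: str
    lead_id: str
    funding_agency: str
    dmp_version: str
    metadata_standard: str = "ISA-TAB"
    storage_years: float = 5
    sharing_delay_years: float = 3
    access: str = "open"  # "open" or "restricted"
    repositories: list[str] = field(default_factory=list)


def check_dmp(dmp: DataManagementPlan) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    for name in REQUIRED_TEXT_FIELDS:
        if not getattr(dmp, name).strip():
            problems.append(f"missing {name}")
    if dmp.storage_years > MAX_STORAGE_YEARS:
        problems.append(f"storage exceeds {MAX_STORAGE_YEARS} years")
    if dmp.sharing_delay_years > MAX_SHARING_DELAY_YEARS:
        problems.append(f"sharing delay exceeds {MAX_SHARING_DELAY_YEARS} years")
    if dmp.access not in ("open", "restricted"):
        problems.append("access must be 'open' or 'restricted'")
    return problems


if __name__ == "__main__":
    plan = DataManagementPlan(
        project_name="Toy project",
        project_description="Hypothetical sequencing study",
        lead_id="lead-001",
        funding_agency="",
        dmp_version="1.0",
        sharing_delay_years=4,
        repositories=["SRA"],
    )
    print(check_dmp(plan))
```

Returning a list of problems rather than raising on the first one lets a project lead see every gap in the plan at once.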
CESGO: 5 GOALS From Data Management to Accessibility
CeSGO : Western France e-science metadata Data management URI Life sciences protocols
CeSGO : Western France e-science New VREs! Open Data
CeSGO : Western France e-science New VREs! Connected using semantic web approaches Thanks to DOI attribution Linked Data
CeSGO : Western France e-science cloud Reproducibility Galaxy versioning docker
CeSGO : Western France e-science wiki Accessibility Analytics processes Public resources Experiments Publications
Thank you for your attention The GenOuest bioinformatics platform The Symbiose group IRISA/INRIA GenOuest-Dyliss-Genscale ebgo HUB (collaboration) Scitizen portal (citizen science) EMME portal (data management) Galaxy instance (data analysis) GO4Bioinformatics (education) http://www.e-biogenouest.org/ http://scitizen.genouest.org http://emme.genouest.org/ http://galaxy.genouest.org/ https://www.e-biogenouest.org/einfrastructure/education