Tools to catch (meta)data for phenotyping
Karin Köhl
DFG Informationsmanagement
MPI for Molecular Plant Physiology: Infrastructure group Plant cultivation/transformation, Dr. Karin Köhl

Information types in phenotyping
Phenotype
- Parameter (What?): length, color
- Method (How?): manual score, manual measurement, automatic
- Where? Youngest leaf, shoot
- When? After 28 and 55 days; in stage 50-55
Meta information
- Genotype: species, cultivar; origin of germplasm
- Treatment: experimental factor (stress treatment, ...); background factors (fertilization, plant protection)
- Environment: climate data, soil data
www.landwirtschaftsskammer.de

Information storage: (VALDIS) TROST project
Phenotype
- Parameter (What?): length, color
- Method (How?): manual score, manual measurement, automatic
- Where? Youngest leaf, shoot
- When? After 28 and 55 days; in stage 50-55
Meta information
- Genotype: species, cultivar; origin of germplasm; confidential?
- Treatment: experimental factor (stress treatment, ...); background factors (fertilization, plant protection)
- Environment: climate data, soil data
Storage locations: Phenotyper schemes, LIMS, Phenotyper results, Climate DB, Method DB

Plant database system at the MPI
- Greenhouse DB: protocols, management, materials
- LIMS plant database: genetics, epigenetics, location
- Phenotyper scheme composer: controlled vocabulary, phenotyping schemes
- Sampling
- LIMS websystem: user interface
- Phenotyper results: phenotyping results, treatment information
- Transformation DB: protocols, management, materials
- Climate DB
- Golm Metabolome Database (GMD)

The core: LIMS-based plant database at MPI-MP
- Founded 2002, productive 2005
- Information on genetic resources
  - What do we have? Who owns it? Where is it?
- Genetic and environmental data for data mining in profiling systems
  - Metabolomics, transcriptomics, proteomics
  - MIAME requirements
- Project management: available and completed work
  - Phenotyping
  - High-throughput propagating projects
  - Long-term research projects on multiple sites with multiple partners

Genetic resources
Plant lines at the MPI-MP
- Different species, cultivars, accessions
- Hybrid and inbred lines (RILs, NILs)
- Genetically modified organisms
- Rapid growth of total number
- MPI primary source for > 75 %
- High turnover of owners: ~ 30 % left the institute

[Figure: Sources of plant lines. Number of lines by source (Wildtype, Import, Transformat., Crossing, Generative, Vegetative), with the introduction of the database marked; status Aug 13]

Time        Total number   % imported
Mar 2008           17000           25
Aug 2013          108800           20

Researcher status   Number of researchers   Number of lines
active                                200             78756
left                                  345             30023

Collaborative use of plant material
MPI-MP researchers share plant lines:
- 20 standard lines everybody works on
  - Arabidopsis Col-0, tomato Moneymaker, tobacco Samsun
  - Central germplasm production and QC
- 120 lines studied by 5-10 groups
  - Arabidopsis accessions (110), rice, tobacco and tomato cultivars
- Qualitatively and quantitatively more data on a genotype
- Synergy effects, systems biology

[Figure: Number of collaboratively used lines (y-axis) against number of groups using a line (x-axis)]

Central plant database
- Relational database: Oracle
- LIMS (laboratory information management system): Nautilus
- Information on
  - Plasmids
  - Microorganisms and plants incl. pedigree
  - Plant cultivation history -> epigenetics
- User interface
  - PDA (XML)
  - Web interfaces (XML)
  - Reports (SQL)
- Linked databases
  - Germplasm database: Plantline ID
  - Plasmid database: Plasmid Id
  - Location database: Location Id
  - Plant cultivation database links plants to location and time

Plant line and plant object concept
- Plant line
  - Group of genetically identical organisms
  - Bag of seeds, group of vegetatively propagated plants
  - Potentially immortal
- Plant object
  - Offspring of plant line
  - Plant cultivated in greenhouse
  - Finite lifespan
- Unique names and ids of lines and objects
  - Line: [Nt.PH.n]/001
  - Objects: [Nt.PH.n].1, [Nt.PH.n].2, [Nt.PH.n].3, ... [Nt.PH.n].1024

Workflows create and link lines and objects
- Line [Nt.PH.n]/001
  - Object workflows: [Nt.PH.n].1, [Nt.PH.n].2, [Nt.PH.n].3, ... [Nt.PH.n].1024
- Line workflows: [300910][Nt.ph.n]./001, [300910][Nt.ph.n].1.2/001
  - Objects: [300910][Nt.PH.n].1.1, [300910][Nt.PH.n].1.2, [300910][Nt.PH.n].1.3

Workflow-created pedigree results in interpretable name
- [Nt.PH.n].1 -> Transformation (Plasmid ID 300749) -> [300749][Nt.PH.n].1.2
- [300749][Nt.PH.n].1.2 -> Propagation -> [300749][Nt.PH.n].1.2-3
- Reading a name such as [300749][St.D.n].1.2
  - [300749]: Plasmid 300749
  - [St.D.n]: Species.Subspecies.Mutant

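The naming convention above can be taken apart mechanically. The following is a minimal sketch, assuming that a name consists of an optional plasmid ID in brackets, a bracketed Species.Subspecies.Mutant triple, and a free-form pedigree suffix; the function and field names are hypothetical and are not taken from the LIMS.

```python
import re
from typing import NamedTuple, Optional


class PlantName(NamedTuple):
    plasmid_id: Optional[str]  # None for untransformed material
    species: str
    subspecies: str
    mutant: str
    pedigree: str              # everything after the bracketed parts


# Assumed pattern: "[plasmid]" is optional, "[Species.Subspecies.Mutant]" is required.
_NAME_RE = re.compile(
    r"^(?:\[(?P<plasmid>\d+)\]\s*)?"
    r"\[(?P<species>[^.\]]+)\.(?P<subspecies>[^.\]]+)\.(?P<mutant>[^.\]]+)\]"
    r"\.?(?P<pedigree>.*)$"
)


def parse_plant_name(name: str) -> PlantName:
    """Split a plant name into plasmid ID, genotype triple and pedigree suffix."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised plant name: {name!r}")
    return PlantName(
        match["plasmid"],
        match["species"],
        match["subspecies"],
        match["mutant"],
        match["pedigree"],
    )


transformed = parse_plant_name("[300749][Nt.PH.n].1.2-3")
wildtype = parse_plant_name("[Nt.PH.n].1")
```

Here `transformed.plasmid_id` is "300749" and `transformed.pedigree` is "1.2-3", while `wildtype.plasmid_id` is None.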
Plant cultivation module
- Retrieval of information from the germplasm module.
- Generation of a unique name and id for each plant object.
- Groups plant objects in handling groups, so-called cultures.
- Access to MIAME-relevant information on plants: climate data, pesticide treatments, plant age at time of sampling.
- The plant cultivation database links plants (germplasm database) to location (location database) and time.

LIMS cultivation module writes location history
- Data capture: barcode scanner, cradle, automatic data transfer, XML processor
- Unique plant names link to genetic information

History of plant in culture
Plant              Date         Old Culture   New Culture
[At.C24.n]1.1001   03.06.2004                 Koehl-030604-1
[At.C24.n]1.1002   03.06.2004                 Koehl-030604-1
[At.C24.n]1.1003   03.06.2004                 Koehl-030604-1
[At.C24.n]1.1004   03.06.2004                 Koehl-030604-1
[At.C24.n]1.1005   03.06.2004                 Koehl-030604-1
[At.C24.n]1.1006   03.06.2004                 Koehl-030604-1
[At.C24.n]1.1007   03.06.2004                 Koehl-030604-1
[At.C24.n]1.1008   03.06.2004                 Koehl-030604-1

History of culture in space
Culture          Date         Old location   New location
Koehl-030604-1   03.06.2004                  CGMPhy1001
Koehl-030604-1   15.06.2004   CGMPhy1001     FGH0401
Koehl-030604-1   05.07.2004   FGH0401        void13

Combining both tables gives the history of an individual plant in space.

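How the two histories combine into the location of a single plant can be sketched as follows. This is an illustration only, assuming each history is available as a list of (key, date, new value) records; the record layout and function names are hypothetical, and the data are the example rows from the slide.

```python
from datetime import date

# (plant, date, new culture): assumed record layout
plant_history = [
    ("[At.C24.n]1.1001", date(2004, 6, 3), "Koehl-030604-1"),
]

# (culture, date, new location): assumed record layout
culture_history = [
    ("Koehl-030604-1", date(2004, 6, 3), "CGMPhy1001"),
    ("Koehl-030604-1", date(2004, 6, 15), "FGH0401"),
    ("Koehl-030604-1", date(2004, 7, 5), "void13"),
]


def latest(records, key, day):
    """Most recent value recorded for key on or before day, or None."""
    candidates = [(d, value) for k, d, value in records if k == key and d <= day]
    return max(candidates)[1] if candidates else None


def location_of(plant, day):
    """Resolve plant -> culture -> location for the given day."""
    culture = latest(plant_history, plant, day)
    if culture is None:
        return None
    return latest(culture_history, culture, day)


where = location_of("[At.C24.n]1.1001", date(2004, 6, 20))
```

Because plants are moved as cultures, only the culture needs to be scanned when a tray changes place; the location of every plant in it follows from the lookup.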
Data entry and retrieval on web pages
- Central data exchange point for standard users
- Enter data once and for all users
- Authentication during login
  - Responsibility
  - Intellectual property rights
  - Access policy
- Expert's access
  - LIMS interface
  - SQL queries

Accessory databases
Greenhouse DB
- Access: internal use by greenteam
- Documentation of
  - Pesticide use: requirement, user alerts, metabolomics
  - Cultivation: materials and methods, schedules
- Work schedules
- Organizing teamwork

The phenotyper project (J. Gremmels)
Plant breeders and scientists need
- Plant description (phenotyping)
  - Fast and precise
  - Reproducible and independent of user
  - Affordable
- Standardised data storage for rapid evaluation
  - Controlled vocabulary
- Long-term data storage
  - Tamper-proof
  - Intellectual property rights guarded
Funding bodies (EU, DFG, BBSRC): data management enabling data sharing is a prerequisite for future funding.

Common situation: phenotyping light
- Data entry on paper
- Manual data transfer to computer
- Individual data storage solution

Obstacles for evaluation and long-term access
- Data spread over many (Excel) files (or worse: on paper)
- Varied data formats
  - Different names for the same value
  - Different units
  - Different classification schemes
- Ambiguous (if any) connection between data
- Data access unregulated or over-restrictive
- Slow access to original data
- Documentation of evaluation workflow???
Solution
- Central data storage
  - Database(s)
  - File server
  - Web server
- Data import into database
  - Web access
  - Ownership tag
- Defined variables
  - Controlled vocabulary
  - Standardized graphical user interface / data sheets

Phenotyper
Replacement of free-text lab book entries by standardized entries into a mobile terminal (lab book -> database)
- Plant description (phenotyping)
  - Fast and precise
  - Reproducible and independent of user
  - Affordable
- Standardised data storage for rapid evaluation
  - Controlled vocabulary
- Long-term data storage
  - Tamper-proof
  - Guard intellectual property rights

Phenotyping with Personal Digital Assistant (PDA)
- PDA on-site (field, greenhouse)
ww.m3mobile.net

Behind the scenes: Scheme composer data-base(d)
- Controlled vocabulary
  - Entities (e.g. organs or tissues)
  - Parameters
  - Exchange with vocabulary of PO consortium
  - Project-specific subsets
- User management
  - Access to web page
  - User-defined phenotyping schemes

Tailoring phenotyping scheme on scheme composer
- Web-based scheme composer
- Selection of entities and parameters from controlled vocabulary
- Scheme stored user-specific in database
- Export to terminal as XML file (copy and paste)

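What such an XML export might look like is sketched below. The slides do not show the actual file format, so the element and attribute names (`scheme`, `entity`, `parameter`, `type`) and the example user and scheme names are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def scheme_to_xml(user, scheme_name, entities):
    """Serialise a phenotyping scheme to XML text.

    entities maps an entity name to a list of (parameter name, kind) pairs,
    where kind is "numeric" or "class". All tag names are hypothetical.
    """
    root = ET.Element("scheme", {"name": scheme_name, "user": user})
    for entity_name, parameters in entities.items():
        entity = ET.SubElement(root, "entity", {"name": entity_name})
        for parameter_name, kind in parameters:
            ET.SubElement(entity, "parameter", {"name": parameter_name, "type": kind})
    return ET.tostring(root, encoding="unicode")


example_xml = scheme_to_xml(
    "demo_user",
    "demo_scheme",
    {
        "youngest leaf": [("length", "numeric"), ("color", "class")],
        "shoot": [("length", "numeric")],
    },
)
```

Keeping the scheme as a plain file is what makes the copy-and-paste transfer to the terminal possible without a network connection on site.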
PDA user interface for phenotyping
- Selection of entity
- Entry (numeric) or selection (class variable)

Data upload and download to/from phenotyper and support

Phenotyper result database: Entity-value concept
- Plant ids from LIMS-DB
- One table with all measured parameters for all plants

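The idea of one table for all measured parameters can be illustrated with a small sketch. It assumes one row per measurement with the columns plant id, entity, parameter and value; these column names and the example values are hypothetical, and the real table very likely carries more fields (date, user, scheme).

```python
from collections import defaultdict

# One row per measurement; new parameters need no new table columns.
results = [
    {"plant_id": "P1", "entity": "shoot", "parameter": "length", "value": 41.5},
    {"plant_id": "P1", "entity": "youngest leaf", "parameter": "color", "value": "green"},
    {"plant_id": "P2", "entity": "shoot", "parameter": "length", "value": 38.0},
]


def to_wide(rows):
    """Pivot the long result table into one record per plant for evaluation."""
    wide = defaultdict(dict)
    for row in rows:
        column = f"{row['entity']}.{row['parameter']}"
        wide[row["plant_id"]][column] = row["value"]
    return dict(wide)


per_plant = to_wide(results)
```

The long layout accepts any user-defined scheme without schema changes; the pivot to one record per plant is done only at evaluation time.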
Application in research: Projects TROST and VALDIS
TROST: identification and validation of markers for drought tolerance
- Treatments: control, drought
- BMEL funded
- Time frame: 2011-2017 (2020)
- Partners: 5 academia & 7 breeders
  - MPIMP (Hincha, Kopka, Walther, Köhl), JKI Groß Lüsewitz, LWK Niedersachsen, LMU München (Geigenberger), U Rostock (Horn)
  - German Private Breeders Association GFP (Lütke-Entrup, Strahwald, Hofferbert)

Breeding of tolerant genotypes
- Drought tolerance = stable yield at reduced water supply
  - depends on several genes
  - considerable environment x gene interaction (G x E)
- Classical breeding strategies are very time-consuming
  - Rare trait requires screening of many genotypes
  - Screening requires stress treatment
- Marker-assisted selection: enrichment for tolerant genotypes based on
  - DNA markers
  - Combined metabolite and transcript markers

Testing markers by cross-validation
- Find marker combination correlating with tolerance in the training population:
  $\text{Measured tolerance} \approx X_1 \cdot \text{Marker}_1 + X_2 \cdot \text{Marker}_2 + \dots + X_i \cdot \text{Marker}_i$
- Predict tolerance from markers in the test population and correlate with measured tolerance: significant?
  $\text{Predicted tolerance} = X_1 \cdot \text{Marker}_1 + X_2 \cdot \text{Marker}_2 + \dots + X_i \cdot \text{Marker}_i$
Schudoma et al. 2012 Meth Mol Biol 918

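The train-then-test logic can be sketched in a few lines. This is a deliberately reduced illustration, not the project's evaluation: it fits a single marker by ordinary least squares, whereas the model on the slide combines several markers, and the numbers are invented. It also stops at the correlation coefficient and does not compute significance.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def validate(train, test):
    """Fit on the training population, correlate prediction and measurement on the test population.

    train and test are lists of (marker value, measured tolerance) pairs.
    """
    slope, intercept = fit_line(*zip(*train))
    predicted = [slope * marker + intercept for marker, _ in test]
    measured = [tolerance for _, tolerance in test]
    return pearson(predicted, measured)


# Invented example data
training_population = [(1.0, 0.1), (2.0, 0.3), (3.0, 0.5), (4.0, 0.7)]
test_population = [(1.5, 0.25), (2.5, 0.35), (3.5, 0.65)]
r = validate(training_population, test_population)
```

The essential point is that the coefficients are estimated on one population only and are never refitted on the samples used to judge the prediction.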
Strategy for marker identification
[Diagram with numbered steps 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 3.1, 3.2, 3.4]
Biomarker identification
- Selection of check cultivars with contrasting tolerance
- Controlled environment experiments
- Metabolome analysis, transcriptome analysis
- Biomarker identification
Generation of test dataset
- Selection of 30 cultivars
- Controlled environment, irrigation field experiment, multisite field trials
- Samples, performance data
Biomarker validation: predict performance from metabolite/transcript data.

Challenge 1: Large number of samples taken by several groups from many sites over a long period of time with different priorities for analysis

Sum of samples by priority
Year   Month   Priority 1   Priority 2   Priority 3   Grand total
2011   5               96                                      96
2011   6               48         1236                       1284
2011   7               96          480                        576
2011   8               48          452          304           804
2011   10             240          240                        480
2011   11                                       480           480
2011   Total          528         2408          784          3720
2012   1                                                      480 *
2012   2                                                      480 *
2012   6                          1236                       1236
2012   7                                                      480 *
2012   8                           452          304           756
2012   Total                      2648          784          3432
Grand total           528         5056         1568          7152

* Each of these rows holds 480 samples in a single priority column; according to the 2012 totals, two of them are priority 2 and one is priority 3.

Narrowing in (along the axes parameters and time)
- Check cultivars: all timepoints, all parameters (7000 samples x 5200 parameters)
- Model optimisation: all cultivars; selected time, experiments; selected parameters
- Model validation: field samples of selected cultivars; selected parameters (2000 samples x 200 parameters)

Challenge 2: High interdependence requires fast data access for modelling

Data warehouse structure
- Result formats: csv & XML
- Connection of scanner terminal with PC (smartphone, laptop)
- File copy & paste
- Manual quality control
- Upload to web page
- Automatic quality control
- Transfer to database
Billiau et al. 2012 FPB 39, 948ff

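The automatic quality control step between upload and database transfer might look roughly like the sketch below. It assumes a csv file with the columns entity, parameter and value, and a controlled vocabulary that lists permitted classes or a numeric range per entity/parameter pair; the column names, vocabulary entries and limits are all hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a controlled vocabulary:
# numeric parameters carry a plausible range, class parameters a set of allowed values.
VOCABULARY = {
    ("shoot", "length"): ("numeric", 0.0, 300.0),
    ("youngest leaf", "color"): ("class", {"green", "yellow", "brown"}),
}


def check_upload(csv_text):
    """Return a list of (line number, problem) for rows that fail the checks."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        rule = VOCABULARY.get((row["entity"], row["parameter"]))
        if rule is None:
            errors.append((line_no, "unknown entity/parameter"))
        elif rule[0] == "numeric":
            try:
                value = float(row["value"])
            except ValueError:
                errors.append((line_no, "not a number"))
                continue
            if not rule[1] <= value <= rule[2]:
                errors.append((line_no, "out of range"))
        elif row["value"] not in rule[1]:
            errors.append((line_no, "value not in vocabulary"))
    return errors


upload = (
    "plant_id,entity,parameter,value\n"
    "P1,shoot,length,41.5\n"
    "P2,shoot,length,4150\n"
    "P3,youngest leaf,color,purple\n"
    "P4,root,length,12\n"
)
problems = check_upload(upload)
```

Only files that come back with an empty problem list would be passed on to the database; everything else goes back to the person who uploaded it.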
Data warehouse concept
- Webserver: upload/download
  - User authentication
  - File upload
  - Data entry
  - Data retrieval
- File server (linked and parsed)
  - Images
  - Raw data scans
  - Scanner files
  - Standard xls sheets
  - Method descriptions
- Databases
ww.m3mobile.net

Data warehouse: three databases
- LIMS
  - Genotypes and experiments
  - Sample workflows: generate, link to plant, store, grind, aliquot
- Phenotyper
  - Entity-value concept
  - Controlled vocabulary
  - Evaluation procedures
- GMD
  - Metabolite profiles
  - Metabolite identification

Data model from plant to sample
Germplasm, Plants, Experiments, Component, Sample

Data model from plant to sample
- Germplasm: source, cultivar, lot
- Experiments: when, where, how
- Plants
- Sample, Component: plant, time, organ

Phenotyping data -> tolerance
Data combined from LIMS and phenotyper database
- Plant data: cultivar, origin, replicate, experiment id
- Growth data: plant id, height, developmental stage
- Yield data: plant id, biomass, starch content
- Experiment information: type, location, year, planting date, harvest date
Script-based evaluation
- QC, normalization
- Calculation of tolerance index DRYM from yield data
- Regression analysis

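The slide does not spell out how DRYM is calculated. The sketch below assumes that DRYM is the deviation of a cultivar's relative starch yield (drought yield divided by control yield) from the median of all cultivars in the same experiment; both this reading of the index and the use of simple cultivar means are assumptions, and the yield values are invented.

```python
from statistics import mean, median


def drym(yields):
    """Tolerance index per cultivar for ONE experiment (assumed definition).

    yields maps cultivar -> {"control": [starch yields], "drought": [starch yields]}.
    """
    relative = {
        cultivar: mean(values["drought"]) / mean(values["control"])
        for cultivar, values in yields.items()
    }
    experiment_median = median(relative.values())
    return {cultivar: rel - experiment_median for cultivar, rel in relative.items()}


# Invented starch yields for three cultivars in one experiment
experiment = {
    "A": {"control": [100.0, 100.0], "drought": [80.0, 80.0]},
    "B": {"control": [100.0, 100.0], "drought": [60.0, 60.0]},
    "C": {"control": [100.0, 100.0], "drought": [50.0, 50.0]},
}
index = drym(experiment)
```

Under this assumed definition, a positive value marks a cultivar that loses less yield under drought than the typical cultivar of that experiment, which makes values comparable across experiments with different stress severity.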
Taking metadata into account: climate
Data combined from LIMS, phenotyper and external data sources
- Plant data: plant id, cultivar, origin, replicate
- Growth data: developmental stage, stress symptoms
- Yield data: biomass, starch content
- Experiment information: type, location, year, planting date, harvest date
- Climate data: temperature, humidity (local data, DWD)
Script-based evaluation
- Calculation of thermal sums and vapor pressure deficits
- Calculation of stress index
  - Early drought
  - Late drought
Evaluation by Schudoma

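The two climate quantities named above can be computed from daily temperature and humidity as sketched here. The saturation vapour pressure uses the Tetens-type equation given in FAO Irrigation and Drainage Paper 56; whether the project scripts used this formula, and which base temperature they used for the thermal sum, is not stated on the slide, so both are assumptions and the base temperature is left as a required argument.

```python
import math


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure in kPa at air temperature temp_c (degrees C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapour_pressure_deficit(temp_c, rel_humidity_pct):
    """Vapour pressure deficit in kPa from temperature and relative humidity (%)."""
    return saturation_vapour_pressure(temp_c) * (1.0 - rel_humidity_pct / 100.0)


def thermal_sum(daily_mean_temps_c, base_temp_c):
    """Sum of daily mean temperatures above the base temperature (degree days)."""
    return sum(max(0.0, t - base_temp_c) for t in daily_mean_temps_c)


vpd = vapour_pressure_deficit(20.0, 50.0)          # about 1.17 kPa
degree_days = thermal_sum([10.0, 4.0, 12.0], 6.0)  # 4 + 0 + 6 = 10
```

Thermal sums place samples from different sites and years on a common developmental time axis, and the vapour pressure deficit gives a measure of atmospheric water demand alongside the soil water supply.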
Metabolite data
Data combined from LIMS, phenotyper and Golm Metabolome Database
- Sample data: plant id, sampling time
- Plant data: cultivar, origin, replicate, experiment id
- Metabolite signals for > 100 metabolites found in all experiments
- Experiment information: type, location, year, planting date, harvest date
Script-based evaluation
- Normalization of metabolite data
- Classification of samples based on tolerance value of cultivar
- Identification of marker metabolites by random forest method
- Tolerance prediction from independent samples
Data and evaluation by Sprenger, Erban, Kopka

Conclusions and acknowledgements
Organized data structures facilitate, speed up and improve data evaluation
- Larger sample size in combined datasets increases test power
- Linked information allows testing for additional effects
Phenotyper
- The use of controlled vocabulary makes collaborations more efficient
- PDA-based on-site data entry saves time and permits quality control immediately after the measurements
Acknowledgement
- All project partners of TROST, who were willing to try new data management methods and to entrust their data to a central database, and thus laid the ground for a powerful analysis
- J. Gremmels, C. Schudoma and K. Billiau kept the system up and running
- Financial support by the DFG and the FNR

The end
Thanks for your attention