Deliverable 7.3.1 First report on sample storage, DNA extraction and sample analysis processes




Model Driven Paediatric European Digital Repository

Call identifier: FP7-ICT-2011-9
Grant agreement no: 600932
Thematic Priority: ICT - ICT-2011.5.2: Virtual Physiological Human

Deliverable 7.3.1
First report on sample storage, DNA extraction and sample analysis processes

Due date of delivery: 31st August 2014
Actual submission date: 4th November 2014
Start of the project: 1st March 2013
Ending date: 28th February 2017
Partner responsible for this deliverable: OPBG
Version: 1.3

Dissemination Level: Public

Document Classification
Title: First report on sample storage, DNA extraction and sample analysis processes
Deliverable: 7.3.1
Reporting Period: 2
Authors: OPBG
Work Package: WP7
Security: PU
Nature: RE
Keyword(s): Sample storage, DNA extraction and sample analysis processes

NB. The content of the present deliverable 7.3.1 is strictly related to deliverable 7.2.1, titled "First Report Data Collection Process", for the experimental, laboratory and bioinformatic procedures.

Document History
Name | Remark | Version | Date
Lorenza Putignani | Preliminary draft (with the first two paragraphs already prepared in common with D7.2.1) | 1.1 | 15/09/2014
Lorenza Putignani | Final version | 1.2 | 31/10/2014

List of Contributors
Name | Affiliation
Baban Anwar | OPBG
Barbara Simionati | BMR GENOMICS SRL
Manco Melania | OPBG
Putignani Lorenza | OPBG

List of Reviewers
Name | Affiliation
Bruno Dallapiccola | OPBG

Abbreviations

Table of Contents
1.1 Introduction
1.2 Materials and Methods
2. Details of Task activities
2.1 Task T7.3 DNA analysis (mm18)
3. Detailed results
4. Conclusions and Future Perspective

Index of Figures
Figure 1. The assignment of extended genotype (gut microbiota) to complement the genomic reservoir and to fully interpret phenotype profiling.
Figure 2. Example of BioProject for metagenome annotation, usually employed at the Metagenomics Unit of OPBG.
Figure 3. Original bioinformatic pipelines designed and set to generate and process metagenomic and metabolomic data.
Figure 4. Biobank of reference and disease samples available at the Metagenomics Unit of OPBG.
Figure 5. Operational steps for bioinformatic data integration workflows and dissemination activities linked to WP7.

1.1 Introduction

NB: This paragraph is the same in Deliverable 7.2.1: First report on data collection process.

In the context of the Model Driven Paediatric European Digital Repository (MD-PAEDIGREE), the collection and management of genomic and metagenomic data can complement instrumental, routine laboratory and clinical data as a staple resource for medical research. Clinical data are collected during the course of ongoing patient care, and the -omic and meta-omic information can complement the electronic health records, providing evidence across the entire spectrum of ontological features of the patients. In detail, the full set of features such as age (i.e., stratification), flare-up conditions, naïve baseline of the pathology manifestation, and external perturbations such as diet, antibiotic administration and stress-related symptoms can be summarised by the term phenomics, the expression of the several phenotyping traits of the patient. Over the past 15 years, many authors have proposed that phenomics (large-scale phenotyping) is the natural complement to genome sequencing as a route to rapid advances in systems biology, paving the way to systems medicine (Schork, N. J. Genetics of complex disease: approaches, problems, and solutions. Am. J. Respir. Crit. Care Med. 156, S103-S109, 1997; Schilling, C. H., Edwards, J. S. & Palsson, B. O. Toward metabolic phenomics: analysis of genomic data using flux balances. Biotechnol. Prog. 15, 288-295, 1999; Houle, D. In The Character Concept in Evolutionary Biology (ed. Wagner, G.) 109-140, Academic Press, 2001; Bilder, R. M. et al. Phenomics: the systematic study of phenotypes on a genome-wide scale. Neuroscience 164, 30-42, 2009; Freimer, N. & Sabatti, C. The human phenome project. Nature Genet. 34, 15-21, 2003).

Phenomic-level data are necessary to understand which genomic variants affect phenotypes, to understand pleiotropy and to furnish the raw data that are needed to decipher the causes of complex diseases (obesity, juvenile idiopathic arthritis, cardiopathies). Our limited ability to understand many important biological phenomena suggests that we are not yet measuring all important variables and that broadening the range of measurements will pay rich dividends. Fundamentally, we can choose to include in this new point of view additional parameters or data, such as genomic fingerprinting indexes (e.g., disease-gene candidates, polymorphisms) and metagenomic gene scaffolds (microbiome) linked to metabolic activities (metabolome), to provide additional and useful indexes of disease. All genotyping and phenotyping parameters need to be measured by omics and meta-omics technologies; WP7 provides this added value to the Project, thanks to technologies for high-throughput phenotyping and genotyping that are fully available in the MD-PAEDIGREE Consortium, at the OPBG facilities, and that include conceptual and analytical frameworks combined with advanced bioinformatic approaches that enable the use of very high-dimensional data. Additionally, dynamic models that link clinical phenomena across levels have been designed and are currently under development. However, phenotypic data continue to be the most powerful predictors of important biological outcomes, such as disease progression and mortality. Although analyses of genomic data have been successful at uncovering biological phenomena, they are, in most cases, supplementing rather than supplanting phenotypic information.

In WP7, we have identified the scientific and operational rationales for carrying out phenomics research and for integrating phenomic with genomic and metagenomic data by advanced approaches. We have employed conceptual frameworks to take full advantage of phenomic-level data, considering phenomics and metagenomics as independent disciplines. To evaluate the role of genomic profiling (assessed by disease-gene or candidate-gene analysis) and metagenomic profiling (based on gut microbiota signatures) in the development and progress of diseases and in their outcomes, the post-analytical data collection and analysis processes have represented one of the milestones of WP7. The theoretical and operational framework is based on the concept of the extended genotype, associated with the new idea of the superorganism (Putignani et al., Pediatric Research-Nature, 2014), in which host genome and gut microbiota metagenome can be considered in the context of the functional and structural activities synergistically produced by the host and its tissue microbiota. Because of different internal and external stimuli, the individual phenotype of the patient and/or individual can be considered the product of different variables such as: i) diet; ii) inflammation; iii) environment; iv) xeno-metabolites. The individual phenotype, therefore, is the combination of all these trans-acting elements with the genomic and metagenomic reservoirs, through genetic and epigenetic controls. Once the single microbiota is fully described, a genetic fingerprint is available to complement the individual genetic reservoir (code), through multi-level meta-omic platforms (metagenomics, metabolomics, metaproteomics). The data produced can be employed at the individual level, to assist in the design of therapeutic and diagnostic pipelines, and at the population level, to support risk prediction for important diseases at early onset (Figure 1).

Figure 1. The assignment of extended genotype (gut microbiota) to complement the genomic reservoir and to fully interpret phenotype profiling.

During this first year of activities, we decided to leave the diet factor out of the integration pipelines, because of the complexity of the nutritional algorithms in the assessment of the microbiota components; we hope this aspect will be developed by dedicated future EU projects. The other factors have been fully considered in the first step of patient recruitment and sample collection (baseline, onset), and they will progressively be considered during follow-up (e.g., flare-up). They have been analyzed for each patient, and the associated ontologies or categories of clinical-diagnostic traits have been uploaded onto the Gnubila database as qualitative and quantitative metagenomics and metabolomics maps, expressed in terms of relative abundances of OTUs (operational taxonomic units) and metabolites (volatilome). During this year, the process of data collection (D7.2.1) has taken place at three levels: i) the OPBG repository database, with in-house data processing and storage procedures; ii) NCBI BioProject submission and the EBI repository database (Figure 2); iii) Gnubila data submission, with the intent to generate a shared platform for model generation.

Figure 2. Example of BioProject for metagenome annotation, usually employed at the Metagenomics Unit of OPBG.
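The maps mentioned above are expressed as relative abundances of OTUs. As a minimal illustration of that normalisation (not the OPBG pipeline itself, which is described in Deliverable 7.2.1), the sketch below converts raw per-OTU read counts for one sample into fractions of the sample total. The function name and the phylum-level counts are hypothetical and chosen only for the example.

```python
from typing import Dict


def relative_abundances(otu_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw OTU read counts for one sample into relative abundances."""
    total = sum(otu_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    # Each OTU is expressed as its share of all reads assigned in the sample.
    return {otu: count / total for otu, count in otu_counts.items()}


# Hypothetical phylum-level counts for a single fecal sample (illustrative only).
sample_counts = {"Firmicutes": 600, "Bacteroidetes": 300, "Proteobacteria": 100}
profile = relative_abundances(sample_counts)

for phylum, fraction in profile.items():
    print(f"{phylum}: {fraction:.1%}")
```

Normalising to fractions makes samples with different sequencing depths comparable, which is a precondition for the between-sample comparisons reported later in the document.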

1.2 Materials and Methods

NB: This paragraph is the same in Deliverable 7.2.1: First report on data collection process.

All the fecal samples (please see Details of Task activities) have been collected, labelled with a software-assisted barcoding system and stored at the Biobank of OPBG under controlled conditions. The analysis of the samples has been made possible by the technological platform and related pipelines developed so far (Figure 3). Several original pipelines have been designed and applied to the analytical phase of data processing, also in collaboration with bioinformatic groups, ranging from statistics to systems biology pipelines for data integration (Figure 3).

Figure 3. Original bioinformatic pipelines designed and set to generate and process metagenomic and metabolomic data.

The large reference database can furthermore provide differential fingerprint profiling, comparing obese and JIA microbiota with other disease signatures, to develop phenotyping maps for pediatric diseases (Figure 4).

Figure 4. Biobank of reference and disease samples available at the Metagenomics Unit of OPBG.

The integration of phenotyping and genotyping traits will represent the next step in the activities of the Consortium and the Metagenomics Unit, with the generation of data repositories at local (server with 6 CPUs) and remote sites (Gnubila, EBI) and with dissemination linked to bioinformatic activities (Figure 5).

Figure 5. Operational steps for bioinformatic data integration workflows and dissemination activities linked to WP7.

2. Details of Task activities

2.1 Task T7.3 DNA analysis (mm18)

Progress

T7.3.1 DCMP. Molecular results from blood target enrichment sequencing are expected to be obtained from BMR by May, as previously described in the project. Regarding the samples from UCL and DHZ, it was decided during the first internal meeting that they will transit through OPBG for DNA extraction and will then be shipped to BMR. However, to date OPBG has not received any sample from the above-mentioned institutions. OPBG is nevertheless proceeding with DNA extraction and QC verification.

T7.3.2 Rheumatology. In order to analyze the OTU content of JIA patients, a targeted approach based on pyrosequencing of the variable regions V1 and V3 of the 16S rRNA locus has been performed. Qualitative and quantitative metagenomic analyses of gut microbiota OTUs at Phylum and Order level have been provided, including the bioinformatic elaboration of the JIA gut microbiota type, described by weighted/unweighted UniFrac and Bray-Curtis algorithms.

T7.3.3 CVD Obesity. Blood SNP analysis is in progress at BMR Genomics. Qualitative and quantitative metagenomic analyses of gut microbiota OTUs at Phylum and Order level have been provided, including the bioinformatic elaboration of the obesity microbiota type, described by weighted/unweighted UniFrac and Bray-Curtis algorithms.
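To make the Bray-Curtis measure named in T7.3.2 and T7.3.3 concrete, the sketch below computes the standard Bray-Curtis dissimilarity between two samples: the sum of absolute differences in abundance divided by the sum of all abundances. It is an illustration only and does not reproduce the project pipeline; the sample profiles are hypothetical. UniFrac is not shown because it additionally requires a phylogenetic tree of the OTUs.

```python
from typing import Dict


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two OTU abundance profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(a) | set(b)  # taxa missing from one sample count as zero
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical phylum-level read counts (illustrative only, not project data).
patient = {"Firmicutes": 50, "Bacteroidetes": 40, "Proteobacteria": 10}
control = {"Firmicutes": 30, "Bacteroidetes": 60, "Actinobacteria": 10}

print(f"Bray-Curtis dissimilarity: {bray_curtis(patient, control):.2f}")
```

Computed for every pair of samples, these values form the distance matrix on which ordination and clustering of microbiota types are usually based.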

Significant Results

T7.3.1 DCMP

The first 18 months of the project have been dedicated to designing a custom gene panel and validating the protocol for target enrichment of 56 genes involved in CMD and other forms of inherited cardiomyopathies (HCM, ARVC, CPVT, LQT, SQT and Brugada Syndrome). In agreement with the other partners of the project, the number of genes has been expanded from 18 to 56, in order to obtain a more comprehensive cardiomyopathy profile for the clinical samples at similar cost. The genes of interest are listed in the attached Excel file Gene list-md-paedigree-wp7. The sequence data obtained from the processing of the first samples were analyzed using a custom bioinformatic pipeline, including variant calling and annotation of the detected variants. These preliminary data were also used to verify the quality of the panel in terms of coverage, reads on-target and specificity of the probes.

Gene panel design. Since none of the commercially available standard kits allows the selective enrichment of all genes of interest, we opted for designing a custom gene panel. A careful preliminary analysis of the performance and costs of several enrichment kits led us to choose the Agilent HaloPlex Custom Target Enrichment kit (1-500 kb, code G9901C). The panel design was carried out using the web tool Agilent SureDesign: https://earray.chem.agilent.com/suredesign/index.htm. The parameters have been optimized in order to improve the coverage in sequence regions characterized by high GC content and low mappability. The design included all coding exons of the genes of interest, UTR regions and 25 to 50 bp of flanking intronic regions. The target region size is 443 kb and the target coverage is 99.38%.

Sample preparation and sequencing. Each genomic DNA (gDNA) sample was first checked for quality and quantity, and then an individual target-enriched, indexed library was prepared, following the official Agilent protocol (in attachment) for the Illumina platform. For each sequencing run, equimolar amounts of 22 libraries were multiplexed and the final pool was sequenced in paired-end format (2 x 150 bp) on an Illumina MiSeq system using the Illumina "MiSeq Reagent Kit v2 (300 cycle)". For the bioinformatic pipeline for CMPD panel sequencing, please see Deliverable 7.2.1 Annex 1.

T7.3.2 Rheumatology, JIA. Bioinformatic pipeline for metagenomic analyses: please see Deliverable 7.2.1.

T7.3.3 CVD Obesity. Bioinformatic pipeline for CVD-risk assessment: please see Deliverable 7.2.1 Annex 2.
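The equimolar pooling described under sample preparation and sequencing can be sketched as a short calculation. The report does not give the laboratory's own worksheet, so what follows is an assumption-laden illustration: it uses the common approximation of 660 g/mol per base pair of double-stranded DNA to convert a mass concentration into molarity, then derives the volume of each library that contributes the same number of molecules to the pool. Library names, concentrations, fragment sizes and the 40 fmol target are all hypothetical.

```python
from typing import Dict, Tuple

AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a library concentration (ng/uL) into molarity (nM)."""
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def pooling_volumes(
    libraries: Dict[str, Tuple[float, float]], fmol_per_library: float
) -> Dict[str, float]:
    """Volume (uL) of each library that delivers the same molar amount.

    `libraries` maps a library name to (concentration in ng/uL, mean size in bp).
    Since 1 nM equals 1 fmol/uL, volume = target fmol / molarity in nM.
    """
    return {
        name: fmol_per_library / molarity_nm(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical QC values for three indexed libraries (illustrative only).
libraries = {
    "lib01": (6.6, 500.0),
    "lib02": (3.3, 500.0),
    "lib03": (6.6, 250.0),
}
volumes = pooling_volumes(libraries, fmol_per_library=40.0)

for name, volume in volumes.items():
    print(f"{name}: {volume:.2f} uL")
```

The same function applies unchanged to a 22-library run; only the input dictionary grows. More dilute or longer-fragment libraries receive proportionally larger volumes, so each sample contributes a similar share of reads to the run.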

Explanation of reasons for failing to achieve critical objectives and its impact

Blood and fecal sample collection for genetic and metagenomic analyses, respectively, is still at an early stage or has not yet started for the DHZ, UCL and Utrecht samples.

Reasons for deviations from DoW

Sampling from DHZ and Utrecht patients has started. Sample collection by UCL should be started immediately. We do not have specific explanations.

Proposed corrective actions

We suggest that the remaining Consortium Centers that have not yet begun sample collection start the sampling process immediately.

3. Detailed results

a) metagenomic and metabolomic profiling of gut microbiota; b) host genomics.

Please see the deliverable 7.2.1 report, which is strictly related to the present deliverable 7.3.1 for both experimental and bioinformatic procedures.

4. Conclusions and Future Perspective

Based on these preliminary results, the sample and DNA biobank will be enlarged, and the procedures standardized during the first year of activities will be followed for both genomic and metagenomic activities. Ontological categories and phenomics features will be deposited in the MD-Paedigree Infostructure database for all patients analyzed. Integration of clinical and omics data will be performed at the local level (OPBG, Metagenomics Unit) for metabolomics and metagenomics, using optimized and dedicated bioinformatic pipelines, and at the Infostructure level by considering the other features, including host genomics and clinical variables, for obese, JIA and CVR-associated patients.