Typing in the NGS era: The way forward!



Similar documents
Molecular typing of VTEC: from PFGE to NGS-based phylogeny

Use of Whole Genome Sequencing (WGS) of food-borne pathogens for public health protection

Basic Course on Bioinformatics tools for Next Generation Sequencing data mining June, 2015 Istituto Superiore di Sanità, SIDBAE Training Room

Whole genome sequencing of foodborne pathogens: experiences from the Reference Laboratory. Kathie Grant Gastrointestinal Bacteria Reference Unit

Identification and Characterization of Foodborne Pathogens by Whole Genome Sequencing: A Shift in Paradigm

Workshop on Methods for Isolation and Identification of Campylobacter spp. June 13-17, 2005

Bacterial Next Generation Sequencing - nur mehr Daten oder auch mehr Wissen? Dag Harmsen Univ. Münster, Germany dharmsen@uni-muenster.

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

EU Reference Laboratory for E. coli Department of Veterinary Public Health and Food Safety Unit of Foodborne Zoonoses Istituto Superiore di Sanità

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

Next Generation Sequencing: Technology, Mapping, and Analysis

Data search and visualization tools at the Comparative Evolutionary Genomics of Cotton Web resource

Network Activities PTs on VTEC detection and typing and training initiatives

DnaSP, DNA polymorphism analyses by the coalescent and other methods.

2 Short biographies and contact information of the workshop organizers

A Primer of Genome Science THIRD

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines

A Fast, Accurate, and Automated Workflow for Multi Locus Sequence Typing of Bacterial Isolates

Master's projects at ITMO University. Daniil Chivilikhin PhD ITMO University

School of Nursing. Presented by Yvette Conley, PhD

Cloud-Based Big Data Analytics in Bioinformatics

E. coli plasmid and gene profiling using Next Generation Sequencing

Databases and platforms for data analysis from NGS of MTB

Szent István University Postgraduate School of Veterinary Science

Data Analysis for Ion Torrent Sequencing

How Sequencing Experiments Fail

Module 1. Sequence Formats and Retrieval. Charles Steward

Introduction to NGS data analysis

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Microarray Data Analysis Workshop. Custom arrays and Probe design Probe design in a pangenomic world. Carsten Friis. MedVetNet Workshop, DTU 2008

Use of Whole Genome. of food-borne pathogens for public health protection. Efsa Scientific Colloquium Summary Report June 2014, Parma, Italy

How many of you have checked out the web site on protein-dna interactions?

BAPS: Bayesian Analysis of Population Structure

mygenomatix - secure cloud for NGS analysis

Rules and Format for Taxonomic Nucleotide Sequence Annotation for Fungi: a proposal

Introduction to Bioinformatics 3. DNA editing and contig assembly

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

Analysis of ChIP-seq data in Galaxy

SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis

Innovations in Molecular Epidemiology

CALL FOR DATA ON VEROTOXIGENIC ESCHERICHIA COLI (VTEC) / SHIGATOXIGENIC E. COLI (STEC)

TOWARD BIG DATA ANALYSIS WORKSHOP

Analyzing A DNA Sequence Chromatogram

Provisioning robust automated analytical pipelines for whole genome-based public health microbiological typing

Version 5.0 Release Notes

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium

Accelerate genomic breakthroughs in microbiology. Gain deeper insights with powerful bioinformatic tools.

17 July 2014 WEB-SERVER MANUAL. Contact: Michael Hackenberg

Factors for success in big data science

Future Perspectives in Public Health Microbiology

Bioinformatics Grid - Enabled Tools For Biologists.

Computational Requirements

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Metagenomics revisits the one pathogen/one disease postulates and translate the One Health concept into action

Biological Databases and Protein Sequence Analysis

The Galaxy workflow. George Magklaras PhD RHCE

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

A data management framework for the Fungal Tree of Life

Practical Guideline for Whole Genome Sequencing

Bioinformatics and its applications

Multi-locus sequence typing (MLST) of C. jejuni infections in the United States Patrick Kwan, PhD

Delivering the power of the world s most successful genomics platform

SNPbrowser Software v3.5

Next Generation Sequencing Technologies in Microbial Ecology. Frank Oliver Glöckner

UF EDGE brings the classroom to you with online, worldwide course delivery!

Description: Molecular Biology Services and DNA Sequencing

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

Custom TaqMan Assays For New SNP Genotyping and Gene Expression Assays. Design and Ordering Guide

Overview sequence projects

Genomes and SNPs in Malaria and Sickle Cell Anemia

Rules for conducting ISAG Comparison Tests (CT) for animal DNA testing.

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015

Basic Analysis of Microarray Data

HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT

-> Integration of MAPHiTS in Galaxy

DNA Mapping/Alignment. Team: I Thought You GNU? Lars Olsen, Venkata Aditya Kovuri, Nick Merowsky

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Analysis of gene expression data. Ulf Leser and Philippe Thomas

PolyLens: Software for Map-based Visualization and Analysis of Genome-scale Polymorphism Data

Genome Explorer For Comparative Genome Analysis

Establishing a Whole Genome Sequence-Based national network for the detection and traceback of Foodborne Pathogens

La capture de la fonction par des approches haut débit

AS Replaces Page 1 of 50 ATF. Software for. DNA Sequencing. Operators Manual. Assign-ATF is intended for Research Use Only (RUO):

Geospiza s Finch-Server: A Complete Data Management System for DNA Sequencing

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B

Introduction To Real Time Quantitative PCR (qpcr)

TruSeq Custom Amplicon v1.5

Towards Integrating the Detection of Genetic Variants into an In-Memory Database

BRCA1 / 2 testing by massive sequencing highlights, shadows or pitfalls?

Castillo et al. Rice Archaeogenetics electronic supplementary information

Changing Concept of FMD diagnostics: from Central to Local. Aniket Sanyal Project Directorate on FMD Mukteswar, India

Transcription:

Typing in the NGS era: The way forward! Valeria Michelacci NGS course, June 2015

Typing from sequence data NGS-derived conventional Multi Locus Sequence Typing (University of Warwick, 7 housekeeping genes) SNPs analysis: whole genome comparison of single nucleotidic polymorphisms Whole Genome MLST (wgmlst), Core Genome MLST (cgmlst) and more Still widely open for development!

7-genes MLST Conventional Sanger sequencing NGS-derived PCR, sequencing, electropherograms analysis Direct upload of WGS contigs on a webserver (e.g. ARIES or CGE) Uploading sequences on a webserver to obtain the corresponding alleles and STs Alleles are directly retrieved through blastn comparison with pre-installed database of alleles from University of Warwick with pre-compiled pipelines The way forward: amplifying the 7 genes, pooling, shearing the pool. About 7x800bp = 5600 bp per sample Possibility to barcode the pools and perform simultaneous MLST of wide panels of strains via NGS

Single Nucletidic Polymorphisms concept Multiple alignment of the WGS of the test strains Compiling of a variant call format file per strain, containing the information about each SNP identified Compiling of a distance matrix Phylogenetic tree built on the distance matrix High sensitivity! Still in implementation: need to only evaluate true differences Parameters: Minimum coverage Minimum distance between SNPs Minimum SNP quality Minimum distance to end of the sequence Minimum read mapping quality

SNPs tree = PFGE correlated isolates, Italy, August 2013 = not correlated isolates, Italy, August 2013 Analysis of all the detected nucleotidic differences based on the comparison with a reference sequence Epidemiologically related cases appear very different. Such a high sensitivity increases the risk for errors

NDtree Place Year SNPs analysis based on a different algorithm: only considering nucleotidic positions where the assigned nt is at least 10 times more represented than the other three More robust, less sensitive Epidemiologically related cases appear in the same cluster, but with no nucletidic differences at all Very far strains appear with no differences The sensitivity may be too low USA 2012 Japan 2001 USA 1997 USA 2012 USA 2012 USA 1997 USA 2003 USA 2012 USA 2012 Italy 1989 Italy 2010

Whole Genome MLST Need for biologically consistent bioinformatic tools for NGS data analysis Whole genome-based MLST (wgmlst): Analysis of the SNPs in a wide panel of genes Allelic profiling of the strains based on wide panels of genes Core Genome MLST (cg-mlst): MLST based on all the core genes of the species Need for reference databases of alleles sequences and codification in cg-sequence types (cg-st, combination of alleles) and cg-clonal complexes (cg-cc, grouping cg-sts sharing a proportion or subset of alleles of the core genes)

The E. coli pangenome Genomic plasticity Huge pan-genome Van Elsas J.D. et al., 2011 Pangenome Whole genome Core genome Accessory Housekeeping genome

Low sensitivity High robustness Applying MLST to E. coli Conventional MLST 7 housekeeping genes Good for phylogenetic analysis Not good enough for outbreak investigation MLST from WGS data whole genome (wgmlst) core genes (cgmlst) housekeeping genes accessory genes (agmlst) They provide a good strain signature and could be relevant for strain identification

Proposed nomenclature wgmlst cgmlst agmlst High sensitivity; need for a threshold? Lower sensitivity; is it enough for E. coli strain identification? Intermediate sensitivity (low computational requirements) High significance for E. coli It could be good for E. coli strain identification The result of MLST typing could take in consideration all these schemes, providing different levels of characterization of the strain Scheme wgmlst Alleles A-D-F-C-A- -B Unique signature of the strain ST cgmlst D-C-A-E Signature of the core genes cg-st agmlst A-F-B Signature of the accessory genes ag-st

Propose for communication A complete signature of the genetic content of the strain structured in levels Example of results: wgmlst signature cg-st ag-st Strain X A-D-F-C-A- -B cg-st170 ag-st211 For a deepest identification, the schemes could be subdivided in subsets: e.g. cg-st divided in housekeeping and nonhousekeeping- ST and ag-st divided in PAIs-ST, plasmids-st... Subsets of accessory genes could be used also for risk assessment and for guiding clinical menagement of the infections

The accessory genome of E. coli Study on the three major PAIs of STEC in 730 strains LEE OI-122 OI-57 TOTAL 38/43 ORFs 12/16 ORFs 41/117 ORFs 91/172 ORFs tested vbs.psu.edu/research/centers/ecoli/e-coli-workshop Analysis of the Tm of PCR amplified products

HReVAP: Automatic BIN assigment Tm Alleles Bin

Typing E. coli with HReVAP 2 to 9 different alleles detected in each of the 91 genes tested (mean = 4.7) Total 435 alleles detected in the 91 genes tested (mean = 4.7) ->impressive number of combinations The combination of alleles represents a significative signature of the tested strain Clustering by HReVAP

HReVAP: cross-platform and cross-generation HReVAP allows following the evolution of the MGEs by using subpanels of PAIs for the analysis Analysis of accessory genes proved valuable for E. coli typing (identification of sub-populations of VTEC even within serogroups) SURVEILLANCE? It could be applied to the accessory genome of other pathogens Already developed in house for ARIES, soon open and running RT-PCR Sanger NGS Possibility to translate the allele calling to use sequence data! HReVAP could be expanded to the whole accessory genome Designation of alleles basing on an established panel of Tm intervals avoids the problems due to base calling problems!

In order to use NGS data for surveillance Standard Operative Procedures are needed for: Quality check Filtering Assembling NGS-based typing Conclusions Need for reference databases for MLST schemes HReVAP-based tools virtually do not need reference databases, but need for curation for new alleles detected Central repository could receive only allelic profiles or Sequence Types identifiers and the log files of the analysis