Introduction to the CGE servers




Center for Genomic Epidemiology
Purpose: to provide the scientific basis for future Internet-based solutions in which a central database will make it possible to handle whole-genome information and compare it with other sequences, including spatio-temporal analyses; and to develop methods for rapid analysis of whole-genome DNA sequences, together with web interfaces that make these methods available to the global scientific and medical community.

Tools for phylogeny (service URLs under cge.cbs.dtu.dk/services)
Name of service | Description | URL | Status | Publication
SnpTree | Creation of phylogenetic trees based on SNPs | snptree | Online | Published Dec 2012, PMID: 23281601
NDtree | Creation of phylogenetic trees | NDtree | Online | Published Feb 2014, PMID: 24505344
CSIPhylogeny | Creation of phylogenetic trees | CSIPhylogeny | Online | Planned

Tools for species identification (service URLs under cge.cbs.dtu.dk/services)
Name of service | Description | URL | Status | Publication
SpeciesFinder | Species identification using 16S rRNA | SpeciesFinder | Online | Published Feb 2014, PMID: 24574292
KmerFinder | Species identification using overlapping 16-mers | KmerFinder | Online | Published Jan 2014, PMID: 24172157
TaxonomyFinder | Taxonomy identification using functional protein domains | TaxonomyFinder | see note | see note
Reads2Type | Species identification on the client computer | Reads2Type | see note | see note
Note: for TaxonomyFinder and Reads2Type the slide gives the statuses "Online" and "None?" and the publication "PMID: 24574292 + Oksana's PhD thesis" without a clear assignment to either row.

Benchmarking of Methods for Bacterial Species Identification

Training data: 1,647 complete or nearly complete genomes downloaded from NCBI in 2011 (1,009 different species).
Evaluation data:
- NCBI draft genomes: 695 isolates from species that overlap with the training set (151 species)
- SRA draft genomes: 10,407 sets of short Illumina reads (168 species), and 10,407 draft genomes from the same Illumina data (168 species)

16S rRNA
16S rRNA sequencing has dominated the molecular taxonomy of prokaryotes for more than 30 years (Fox et al., Int. J. Syst. Bacteriol., 1977), and tremendous amounts of 16S rRNA sequence data are available in databases.
Concerns:
- Low resolution
- Some genomes contain several copies of the 16S rRNA gene, with variation between the copies
- The 16S rRNA gene represents only about 0.1% of the coding part of a microbial genome

CGE implementation of 16S species identification: SpeciesFinder
Reference database: 16S rRNA genes are extracted from the genomes in the training data using RNAmmer (Lagesen et al., NAR, 2007).
Method: input genomes are BLASTed against the 16S rRNA genes in the reference database. The best hit is selected based on a combination of coverage, % identity, bit score, number of mismatches and number of gaps in the alignment.
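The slide lists the criteria used to pick the best 16S hit, but not how they are weighed against each other. The sketch below is a minimal illustration, assuming the BLAST output has already been parsed into records; the hypothetical `Hit` type and the lexicographic ordering (coverage, then % identity, then bit score, then fewer mismatches and gaps) are assumptions, not SpeciesFinder's documented scoring.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit of the query against a reference 16S rRNA gene (hypothetical record)."""
    subject: str       # reference 16S gene / species label
    coverage: float    # % of the reference gene covered by the alignment
    identity: float    # % identity over the alignment
    bitscore: float
    mismatches: int
    gaps: int


def rank_key(hit: Hit) -> tuple:
    # Assumed ordering: higher coverage, identity and bit score are better;
    # fewer mismatches and gaps are better, hence the negation.
    return (hit.coverage, hit.identity, hit.bitscore, -hit.mismatches, -hit.gaps)


def best_hit(hits: Iterable[Hit]) -> Optional[Hit]:
    """Return the highest-ranked hit, or None when there are no hits."""
    return max(hits, key=rank_key, default=None)


# Hypothetical hits for one query genome.
hits = [
    Hit("Species A", 100.0, 99.1, 2700.0, 13, 1),
    Hit("Species B", 100.0, 99.6, 2740.0, 6, 0),
    Hit("Species C", 87.0, 100.0, 2400.0, 0, 0),
]
print(best_hit(hits).subject)
```

A lexicographic key keeps the tie-breaking explicit; a weighted score would be an equally plausible reading of "a combination of" criteria.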

KmerFinder
The genomes in the training data are chopped into overlapping 16-mers, e.g. from the sequence ATGACGTATGATTGATGACGTAGTAGTCC.
Immune-system-inspired downsampling (cf. 9-mer peptides and MHC-I): only 16-mers with a specific prefix are kept.
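The prefix-based downsampling can be sketched in a few lines. This is an illustration, not KmerFinder's code: the function name `prefixed_kmers` is hypothetical, and the default prefix "ATGA" is only inferred from the example 16-mer database on the next slide, whose entries all start with ATGA. Reverse complements are ignored for brevity.

```python
from typing import Set


def prefixed_kmers(seq: str, k: int = 16, prefix: str = "ATGA") -> Set[str]:
    """Return the distinct k-mers of `seq` that start with `prefix`.

    The default prefix is an assumption inferred from the slide's example
    database, not a documented KmerFinder setting.
    """
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)  # every overlapping k-mer start
        if seq.startswith(prefix, i)      # keep only those with the prefix
    }


# Demonstration sequence from the slide, spaces removed.
demo = "ATGACGTATGATTGATGACGTAGTAGTCC"
print(sorted(prefixed_kmers(demo)))
```

Applied to the 29-nt demonstration sequence, only two of its fourteen 16-mers (those starting at positions 0 and 7) pass the filter, which is the point of the downsampling: the database shrinks while each genome still contributes many k-mers.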

16-mer database:
ATGAATGTGTGAGTGA: CP001921 (Acinetobacter baumannii), CP000521 (Acinetobacter baumannii), CP002522 (Acinetobacter baumannii)
ATGACTGTGCCCCTGA: CP001921 (Acinetobacter baumannii), CP002301 (Buchnera aphidicola)
Unknown isolate, unique 16-mers: ATGAATGTGTGAGTGA, ATGACTGTGCCCCTGA, ATGAAAAAAAAAAAA
Species | Match | No. of k-mer hits
Acinetobacter baumannii | CP001921 | 2
Acinetobacter baumannii | CP000521 | 1
Acinetobacter baumannii | CP002522 | 1
Buchnera aphidicola | CP002301 | 1
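The lookup on this slide amounts to inverting a genome-to-k-mer table and counting shared k-mers per reference. The sketch below assumes plain Python dictionaries in place of KmerFinder's actual database format, and the function names are hypothetical; the data are the slide's toy 16-mers.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_database(kmers_by_genome: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert {accession: k-mers} into {k-mer: accessions that contain it}."""
    db: Dict[str, Set[str]] = defaultdict(set)
    for accession, kmers in kmers_by_genome.items():
        for kmer in kmers:
            db[kmer].add(accession)
    return dict(db)


def score(query_kmers: Iterable[str], db: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Count shared unique k-mers per reference, best match first."""
    hits: Counter = Counter()
    for kmer in set(query_kmers):            # unique query k-mers only
        for accession in db.get(kmer, ()):   # k-mers absent from the database add nothing
            hits[accession] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Toy database (only the 16-mers shown on the slide) and the unknown isolate.
genomes = {
    "CP001921": {"ATGAATGTGTGAGTGA", "ATGACTGTGCCCCTGA"},  # Acinetobacter baumannii
    "CP000521": {"ATGAATGTGTGAGTGA"},                      # Acinetobacter baumannii
    "CP002522": {"ATGAATGTGTGAGTGA"},                      # Acinetobacter baumannii
    "CP002301": {"ATGACTGTGCCCCTGA"},                      # Buchnera aphidicola
}
query = ["ATGAATGTGTGAGTGA", "ATGACTGTGCCCCTGA", "ATGAAAAAAAAAAAA"]
for accession, n_hits in score(query, build_database(genomes)):
    print(accession, n_hits)
```

Here CP001921 shares two 16-mers with the isolate and every other accession shares one, while ATGAAAAAAAAAAAA contributes nothing because it is absent from the database.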

KmerFinder is very robust: it needs only one 16-mer!
Example: Desulfovibrio piger GOR1, SRR097356
>NODE 4 length 92 cov 23.119566
TAGGACGTGGAATATGGCAAGAAAACTGAAAATCATGGAAAATGAGAAACATCCACTTGA
CGACTTGAAAAATGACGAAATCACTAAAAAACGTGAAAAATGAGAAATGC
>NODE 15 length 82 cov 2.792683
AGCGAAAAATGTCATAACAACGATCACGACCGATAACCATCTTTGGTCCAAACTTACTCA
CGCAGCAGGCGTATAACTCGCGCATACCAGCTTTGGGCAT
N50 = 110; total no. of bp: 210
Prediction:
Species | Match | No. of k-mer hits
Flavobacterium psychrophilum | AM398681 | 1
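The N50 and total length quoted for the Desulfovibrio piger assembly can be checked directly. Counting the bases in the two FASTA records gives contigs of 110 and 100 bp; the smaller "length" values in the headers are consistent with Velvet reporting node length in k-mers. A minimal sketch of the standard N50 definition:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together contain
    at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length
    raise ValueError("no contigs given")


# NODE 4 and NODE 15 contain 110 bp and 100 bp of sequence.
contigs = [110, 100]
print(n50(contigs), sum(contigs))
```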

TaxonomyFinder: creation of a taxonomy-level-specific gene database
Input: a set of prokaryotic genomes. For each genome:
- Gene prediction with Prodigal
- Whole-genome proteome scan against three HMM-based databases: Pfam-A domains, TIGRFAM and SUPERFAMILY
- Grouping of genes based on their functional domain profiles (e.g. a protein such as MTGENLPPELPATAQAWRASVLYGQHLQLIRHLCVTCPRWSQSTSR carrying domains A, B and C has the profile A-B-C)
- CD-HIT clustering of all CDSs with no hits to any HMM database
- Formation of a whole-genome functional profile
Specific-profile finding then identifies profiles at each taxonomic level, from phylum-specific to species-specific. User-submitted genes are scanned against the same databases.
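Grouping genes by domain profile and finding taxon-specific profiles can be sketched as set operations. Everything here is an assumption for illustration: domain hits are taken as already computed (for example from the HMM scans), genes without hits are simply skipped rather than clustered with CD-HIT, and "specific" is read as present in every genome of the taxon and absent from all others. All names and data are hypothetical.

```python
from typing import Dict, List, Set


def genome_profiles(gene_domains: Dict[str, List[str]]) -> Set[str]:
    """Turn each gene's ordered domain hits into a profile string such as 'A-B-C'.

    Genes without any domain hit are skipped here (the slide clusters them with CD-HIT).
    """
    return {"-".join(domains) for domains in gene_domains.values() if domains}


def specific_profiles(
    profiles_by_genome: Dict[str, Set[str]],
    taxon_of: Dict[str, str],
    taxon: str,
) -> Set[str]:
    """Profiles found in every genome of `taxon` and in no genome outside it."""
    inside = [p for g, p in profiles_by_genome.items() if taxon_of[g] == taxon]
    outside = [p for g, p in profiles_by_genome.items() if taxon_of[g] != taxon]
    if not inside:
        return set()
    shared = set.intersection(*inside)
    return shared - set().union(*outside)


# Hypothetical domain hits per gene for three genomes.
genomes = {
    "genome1": {"gene1": ["A", "B", "C"], "gene2": ["D"], "gene3": []},
    "genome2": {"geneX": ["A", "B", "C"], "geneY": ["E"]},
    "genome3": {"geneZ": ["D"], "geneW": ["E"]},
}
taxon_of = {"genome1": "taxon1", "genome2": "taxon1", "genome3": "taxon2"}
profiles = {g: genome_profiles(d) for g, d in genomes.items()}
print(specific_profiles(profiles, taxon_of, "taxon1"))
```

Repeating the same call with the taxon labels of each level (phylum, class, ..., species) would give the level-specific profile sets the slide describes.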

Reads2Type
Definition: quick-and-dirty taxonomy identification of single isolates.
Database of 50-mers from marker genes:
- 16S rRNA: training-data genomes, RNAmmer (other)
- ITS: training data (Mycobacterium)
- GyrB: training data (Enterobacteriaceae)
The resulting database is ~5 MB. Reads2Type pushes the analysis to the user; the server only provides the 50-mer database.
Suffix tree: an efficient data structure for string matching.
Narrow-down approach: Reads2Type compares the 50-mers of the combined marker genes against the raw reads.
Shared probes vs. unique probes.
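One way to read the narrow-down approach: each probe maps to the set of species whose marker genes contain it, and every probe found in the reads shrinks the candidate set until one species remains. The sketch below assumes that reading and uses a Python dictionary lookup in place of the suffix tree mentioned on the slide; the function name and the rule of skipping a probe that would empty the candidate set are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def narrow_down(reads: Iterable[str], probes: Dict[str, Set[str]], k: int = 50) -> Set[str]:
    """Intersect candidate species over every probe found in the reads.

    `probes` maps each k-mer probe to the species whose marker genes contain it
    (a shared probe lists several species, a unique probe exactly one).
    Stops early once a single candidate remains.
    """
    candidates: Optional[Set[str]] = None  # None = no probe matched yet
    for read in reads:
        for i in range(len(read) - k + 1):
            species = probes.get(read[i:i + k])
            if species is None:
                continue
            narrowed = species if candidates is None else candidates & species
            if not narrowed:
                continue  # conflicting probe (e.g. a sequencing error): ignore it
            candidates = set(narrowed)
            if len(candidates) == 1:
                return candidates
    return candidates or set()


# Tiny hypothetical example with 5-mer probes instead of 50-mers.
probes = {"AAAAA": {"X", "Y"}, "CCCCC": {"X"}}
reads = ["TTAAAAATT", "GGCCCCCGG"]
print(narrow_down(reads, probes, k=5))
```

Shared probes narrow the candidates gradually, while a single unique probe settles the call; the small k in the example only keeps the reads short.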

rMLST
Jolley KA, Bliss CM, Bennett JS, Bratcher HB, Brehony C, Colles FM, Wimalarathna H, Harrison OB, Sheppard SK, Cody AJ, Maiden MC. Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain. Microbiology. 2012 Apr;158(Pt 4):1005-15.
CGE implementation: for each genome in the training data, the 53 ribosomal genes were extracted. Genomes in the evaluation sets were aligned to each gene collection using BLAT (only hits with at least 95% identity and 95% coverage were considered potential matches). The closest match among the training genomes was selected based on a combination of coverage, % identity, bit score, number of mismatches and number of gaps in the alignments across all genes.
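The 95%/95% filter and the per-genome aggregation across the ribosomal genes can be sketched as follows. The thresholds are from the slide; the hypothetical `GeneHit` record (assumed to hold one best hit per gene and training genome) and the ranking by number of genes matched, then summed bit score, are assumptions, since the slide does not say how the criteria are combined.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

MIN_IDENTITY = 95.0  # % identity threshold from the slide
MIN_COVERAGE = 95.0  # % coverage threshold from the slide


@dataclass(frozen=True)
class GeneHit:
    """Best BLAT hit of one ribosomal gene from one training genome (hypothetical record)."""
    genome: str      # training genome the reference gene came from
    gene: str        # ribosomal protein gene, e.g. "rplB"
    identity: float  # % identity
    coverage: float  # % of the reference gene covered
    bitscore: float


def rank_genomes(hits: Iterable[GeneHit]) -> List[Tuple[str, int, float]]:
    """Rank training genomes by number of genes matched, then by summed bit score."""
    genes: Dict[str, Set[str]] = defaultdict(set)
    bits: Dict[str, float] = defaultdict(float)
    for h in hits:
        if h.identity >= MIN_IDENTITY and h.coverage >= MIN_COVERAGE:
            genes[h.genome].add(h.gene)
            bits[h.genome] += h.bitscore
    ranked = [(g, len(genes[g]), bits[g]) for g in genes]
    return sorted(ranked, key=lambda r: (-r[1], -r[2], r[0]))


# Hypothetical hits of one evaluation genome against two training genomes.
hits = [
    GeneHit("genome_A", "rplB", 99.5, 100.0, 1500.0),
    GeneHit("genome_A", "rpsC", 98.0, 99.0, 1200.0),
    GeneHit("genome_B", "rplB", 100.0, 100.0, 1550.0),
    GeneHit("genome_B", "rpsC", 94.0, 100.0, 1100.0),  # fails the identity threshold
]
print(rank_genomes(hits)[0])
```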

Results (16S rRNA)

Overlap in predictions

Isolates in the NCBI draft set for which all four methods predict a species different from the annotated one. * NZAEPO00000000 has been re-annotated as S. oralis since we downloaded the data.

[Charts: prediction results per species for each identification method, and overlap between the methods' predictions]

Speed
Method | Estimated speed (mm:ss)
16S | 00:13*
KmerFinder | 00:09*
TaxonomyFinder | 11:33*
rMLST | 00:45*
Reads2Type | 00:55**
* Estimate based on draft genomes. ** Estimate based on short reads.

Summary of the taxonomy benchmark study
KmerFinder had the highest accuracy and was the fastest method. SpeciesFinder (16S rRNA-based) had the lowest accuracy. Methods that sample only selected genomic loci (16S, Reads2Type, rMLST) had difficulty distinguishing species that diverged only recently, especially when the main difference is a plasmid.

Tools for further typing (service URLs under cge.cbs.dtu.dk/services)
Name of service | Description | URL | Status | Publication
MLST | Multilocus sequence typing | MLST | Online | Published Apr 2012, PMID: 22238442
PlasmidFinder | Identification of plasmids in Enterobacteriaceae | PlasmidFinder | Online | Published Apr 2014, PMID: 24777092
pMLST | pMLST of plasmids in Enterobacteriaceae | pmlst | Online | Published Apr 2014, PMID: 24777092

Multilocus Sequence Typing (MLST)
- First developed in 1998 for Neisseria meningitidis (Maiden et al., PNAS 1998; 95:3140-3145)
- The nucleotide sequences of internal regions of approximately 7 housekeeping genes are determined by PCR followed by Sanger sequencing
- Each distinct allele is assigned an arbitrary number
- The unique combination of alleles is the sequence type (ST)
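Once allele numbers have been assigned, the ST lookup itself is a dictionary lookup on the ordered allele profile. In the sketch below, the locus names are the seven loci of the Neisseria scheme on PubMLST, but the profile table and ST numbers are invented for illustration and do not correspond to real Neisseria STs; `sequence_type` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Seven loci of the Neisseria MLST scheme on PubMLST.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Invented allele profiles -> ST numbers, for illustration only.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 101,
    (1, 2, 1, 1, 3, 1, 1): 102,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a complete allele profile, or None if incomplete or unknown."""
    if any(locus not in alleles for locus in LOCI):
        return None  # a missing locus means no ST can be assigned
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)  # None signals a new, unassigned combination


isolate = dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 1)))
print(sequence_type(isolate))
```

Because the key is the full ordered tuple, a single differing allele yields a different (possibly new) ST, which is exactly the "unique combination" rule above.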

Using WGS data for MLST

www.cbs.dtu.dk/services/mlst
Accepted input: assembled genome; 454 single-end reads; 454 paired-end reads; Illumina single-end reads; Illumina paired-end reads; Ion Torrent reads; SOLiD single-end reads; SOLiD mate-pair reads.
Available schemes: Acinetobacter baumannii #1, Acinetobacter baumannii #2, Arcobacter, Bacillus cereus, Bifidobacterium, Bordetella, Borrelia burgdorferi, Brachyspira, Brachyspira hyodysenteriae, Brachyspira intermedia, Burkholderia cepacia complex, Burkholderia pseudomallei, C. diphtheriae, C. upsaliensis, Campylobacter fetus, Campylobacter helveticus, Campylobacter insulaenigrae, Campylobacter jejuni, Campylobacter lari, Chlamydiales, Clostridium botulinum, Clostridium difficile #1, Clostridium difficile #2, Clostridium septicum, Cronobacter, Enterococcus faecalis, Enterococcus faecium, Escherichia coli #1, Escherichia coli #2, F. psychrophilum, Haemophilus influenzae, Haemophilus parasuis, Helicobacter pylori, Klebsiella pneumoniae, Lactobacillus casei, Lactococcus lactis, Leptospira, Listeria, Listeria monocytogenes, Mannheimia haemolytica, Moraxella catarrhalis, Neisseria, P. acnes, P. gingivalis, Pasteurella multocida (two entries), Pseudomonas aeruginosa, S. maltophilia, S. zooepidemicus, Salmonella enterica, Staphylococcus aureus, Staphylococcus epidermidis, Streptococcus agalactiae, Streptococcus oralis, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus suis, Streptococcus thermophilus, Streptococcus uberis, Streptomyces, Vibrio parahaemolyticus, Vibrio vulnificus, Wolbachia, Xylella fastidiosa, Y. pseudotuberculosis.

Extended Output

aro: WARNING, Identity: 100%, HSP/Length: 349/498, Gaps: 0
aro_122 is the best match for aro
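The slide shows the warning but not the rule behind it. A plausible reading is that the best allele, aro_122, is matched with 100% identity but over only 349 of its 498 bp, so the call is flagged. The hypothetical check below encodes that reading; it is not taken from the MLST server's code, and the "OK" label is invented.

```python
def allele_status(identity: float, hsp_length: int, allele_length: int, gaps: int) -> str:
    """Hypothetical rule: only a full-length, gap-free, 100% identical hit is 'OK'."""
    full_length = hsp_length == allele_length
    if identity == 100.0 and gaps == 0 and full_length:
        return "OK"
    return "WARNING"


# Values from the extended output above.
coverage = 100 * 349 / 498  # share of the aro_122 allele covered by the HSP
print(f"aro: {allele_status(100.0, 349, 498, 0)}, allele coverage {coverage:.1f}%")
```

Under this reading, a partial hit usually points to an allele split across contigs or located at a contig end, so the allele call should be checked before the ST is trusted.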

PlasmidFinder and pMLST
The PlasmidFinder database actually contains replicons, not plasmids.

Tools for phenotyping (service URLs under cge.cbs.dtu.dk/services)
Name of service | Description | URL | Status | Publication
ResFinder | Identification of acquired antibiotic resistance genes | ResFinder | Online | Published Nov 2012, PMID: 22782487
PathogenFinder | Prediction of pathogenic potential | PathogenFinder | Online | Published Oct 2013, PMID: 24204795
VirulenceFinder | Identification of virulence genes in VTEC and Enterococcus | VirulenceFinder | Online | E. coli published Feb 2014, PMID: 24574290. No publication strategy for Enterococcus and S. aureus

ResFinder workflow
NGS reads (Illumina, Ion Torrent, 454, ...) -> assembly pipeline -> ResFinder (BLAST); Sanger sequences can be submitted directly as FASTA.
Output: resistance gene profile (list of genes, accession numbers, theoretical resistance phenotype).

200 isolates from 4 different species (Salmonella Typhimurium, Escherichia coli, Enterococcus faecalis and Enterococcus faecium)
ResFinder thresholds: 98% identity, 60% length coverage
Phenotypic tests: 3,051 in total (482 resistant, 2,569 susceptible)
=> 99.74% of the results agree between ResFinder and the phenotypic tests
23 discrepancies -> 16, typically involving spectinomycin in E. coli
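The two ResFinder thresholds used in this validation and the agreement figure can be expressed as a short sketch. The thresholds are from the slide; the `ResHit` record, the helper names, the gene names in the example and the predicted/observed pairs are hypothetical, and the mapping from genes to a resistance phenotype, which ResFinder also reports, is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

MIN_ID = 98.0   # % identity threshold used in the validation
MIN_COV = 60.0  # % of the reference gene length that must be covered


@dataclass(frozen=True)
class ResHit:
    """Best BLAST hit against one resistance gene (hypothetical record)."""
    gene: str
    identity: float
    coverage: float


def called_genes(hits: Iterable[ResHit]) -> List[str]:
    """Genes whose hit passes both thresholds."""
    return sorted({h.gene for h in hits if h.identity >= MIN_ID and h.coverage >= MIN_COV})


def concordance(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Percentage of (predicted_resistant, observed_resistant) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no test results")
    agree = sum(pred == obs for pred, obs in pairs)
    return 100.0 * agree / len(pairs)


hits = [
    ResHit("blaTEM-1B", 100.0, 100.0),
    ResHit("tet(A)", 97.5, 100.0),  # below the identity threshold
    ResHit("sul2", 99.0, 45.0),     # below the coverage threshold
]
print(called_genes(hits))
print(concordance([(True, True), (False, False), (True, False), (False, False)]))
```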

Alternatives to ResFinder

Unpublished or uncategorized (service URLs under cge.cbs.dtu.dk/services)
Name of service | Description | URL | Status | Publication
PanFunPro | Groups homologous proteins based on functional domain content | PanFunPro | Online | Published in F1000Research 2013, 2:265
MGmapper | Identifies the content of metagenomic samples | MGmapper | Online, but under development | Planned
HostPhinder | Prediction of the host of a bacteriophage | HostPhinder | Online, but under development | Planned
Restriction-ModificationFinder | Identification of restriction-modification (RM) system genes | Restriction-ModificationFinder | Online, but under development? |
SerotypeFinder | Identification of serotypes | SerotypeFinder-1.0 | Online, but under development? |
MetaVirFinder | Identification of viruses in metagenomic data | MetaVirFinder | Online, but under development? |

Which servers are used the most
[Chart: share of requests per server (%). Servers shown: SerotypeFinder, MGmapper, VirulenceFinder, Restriction, NDtree, SpeciesFinder, KmerFinder, HostPhinder, PathogenBuster, Assembler, pMLST, PlasmidFinder, SnpTree, CGE, PrimerFinder, ResFinder, PathogenFinder, MLST, MetaVirFinder]

Server usage by country
[Chart: server usage (% of total requests) for each server, broken down by country (two-letter country codes, plus CBS)]

Top 3 countries per server
[Chart: server usage (number of requests) for each server, showing the top three countries; countries shown: AU, DE, DK, FI, GB, IN, IT, JP, LU, NL, NZ, SA, US]