Introduction to the CGE servers
Center for Genomic Epidemiology
Purpose: To provide the scientific basis for future Internet-based solutions, in which a central database will enable handling of whole-genome information and comparison with other sequences, including spatio-temporal analyses. To develop methods for rapid analysis of whole-genome DNA sequences, and web interfaces that make these methods available to the global scientific and medical community.
Tools for phylogeny
Name of Service | Description | URL (cge.cbs.dtu.dk/services) | Status | Publication
SnpTree | Creation of phylogenetic trees based on SNPs | snptree | Online | Published Dec 2012, PMID: 23281601
NDtree | Creation of phylogenetic trees | NDtree | Online | Published in Feb 2014, PMID: 24505344
CSIPhylogeny | Creation of phylogenetic trees | CSIPhylogeny | Online | Planned
Tools for species identification
Name of Service | Description | URL (cge.cbs.dtu.dk/services) | Status | Publication
SpeciesFinder | Species identification using 16S rRNA | SpeciesFinder | Online | Published Feb 2014, PMID: 24574292
KmerFinder | Species identification using overlapping 16mers | KmerFinder | Online | Published Jan 2014, PMID: 24172157
TaxonomyFinder | Taxonomy identification using functional protein domains | TaxonomyFinder
Reads2Type | Species identification on client computer | Reads2Type
Status and publication for TaxonomyFinder and Reads2Type: Online; None?; Published in PMID: 24574292 + Oksana's PhD thesis
Benchmarking of Methods for Bacterial Species Identification
Training data
  1,647 completed / almost completed genomes downloaded from NCBI in 2011 (1,009 different species)
Evaluation data
  NCBI draft genomes: 695 isolates from species that overlap with the training set (151 species)
  SRA draft genomes: 10,407 sets of short reads from Illumina (168 species); 10,407 draft genomes from Illumina data (168 species)
16S rRNA
16S rRNA sequencing has dominated molecular taxonomy of prokaryotes for more than 30 years (Fox et al., Int. J. Syst. Bacteriol., 1977)
Tremendous amounts of 16S rRNA sequence data are available in databases
Concerns:
  Low resolution
  Some genomes contain several copies of the 16S rRNA gene with inter-gene variation
  The 16S rRNA gene represents only about 0.1% of the coding part of a microbial genome
CGE implementation of 16S species identification: SpeciesFinder
Reference database: 16S rRNA genes are isolated from the genomes in the training data using RNAmmer (Lagesen, NAR, 2007).
Method: Input genomes are BLASTed against the 16S rRNA genes in the reference database. The best hit is selected based on a combination of coverage, % identity, bitscore, number of mismatches and number of gaps in the alignments.
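The best-hit selection can be sketched as a sort over the listed criteria. The slide does not say how the criteria are combined, so the priority order used here (coverage, then % identity, then bitscore, then fewer mismatches, then fewer gaps) is an assumption, as are the `Hit` type, its field names and the example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST alignment against the 16S reference database (illustrative fields)."""
    subject: str
    coverage: float    # % of the reference gene covered by the alignment
    identity: float    # % identity
    bitscore: float
    mismatches: int
    gaps: int


def rank_key(hit):
    # Assumed priority order; higher is better, so counts of errors are negated.
    return (hit.coverage, hit.identity, hit.bitscore, -hit.mismatches, -hit.gaps)


def best_hit(hits):
    """Return the top-ranked hit, or None if there are no hits."""
    return max(hits, key=rank_key) if hits else None


# Made-up example values, not benchmark data.
example_hits = [
    Hit("reference_1", 100.0, 99.8, 2800.0, 3, 0),
    Hit("reference_2", 100.0, 99.8, 2795.0, 3, 1),
    Hit("reference_3", 98.0, 97.1, 2600.0, 40, 2),
]
print(best_hit(example_hits).subject)
```

With the values above, the first two references tie on coverage and identity, and the bitscore decides between them.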
KmerFinder
Genomes in the training data are chopped into 16mers: ATGACGTATGATTGATGACGTAGTAGTCC
Immune system inspired downsampling (9mer MHC-I): only 16mers with a specific prefix are kept
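A minimal sketch of the chopping and downsampling step, applied to the example sequence on the slide. The prefix `ATGA` is an assumption taken from the 16mers shown in the example database; the function names are illustrative and not part of KmerFinder.

```python
def kmers(seq, k=16):
    """All overlapping k-mers of seq, in order of position."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def downsample(seq, k=16, prefix="ATGA"):
    """Keep only the k-mers that start with the chosen prefix (assumed value)."""
    return {kmer for kmer in kmers(seq, k) if kmer.startswith(prefix)}


example = "ATGACGTATGATTGATGACGTAGTAGTCC"  # sequence from the slide, 29 bp
print(len(kmers(example)))          # 14 overlapping 16mers
print(sorted(downsample(example)))  # the 2 that start with ATGA
```

Keeping only one prefix shrinks the database by a large factor while still sampling positions spread over the whole genome.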
16mer database
  ATGAATGTGTGAGTGA: CP001921 (Acinetobacter baumannii), CP000521 (Acinetobacter baumannii), CP002522 (Acinetobacter baumannii)
  ATGACTGTGCCCCTGA: CP001921 (Acinetobacter baumannii), CP002301 (Buchnera aphidicola)
Unknown isolate, unique 16mers: ATGAATGTGTGAGTGA, ATGACTGTGCCCCTGA, ATGAAAAAAAAAAAA
Species | Match | No. of Kmer hits
Acinetobacter baumannii | CP001921 | 2
Acinetobacter baumannii | CP000521 | 1
Acinetobacter baumannii | CP002522 | 1
Buchnera aphidicola | CP002301 | 1
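The lookup in this toy example can be sketched with a dictionary from 16mer to accession numbers and a counter of hits per genome. The accessions follow the database listing on the slide; the function and variable names are illustrative only.

```python
from collections import Counter

# Toy 16mer database from the slide: 16mer -> genomes that contain it.
KMER_DB = {
    "ATGAATGTGTGAGTGA": ["CP001921", "CP000521", "CP002522"],
    "ATGACTGTGCCCCTGA": ["CP001921", "CP002301"],
}


def count_hits(query_kmers, db):
    """Count, per genome, how many unique query k-mers are found in it."""
    hits = Counter()
    for kmer in set(query_kmers):
        hits.update(db.get(kmer, []))
    return hits


# Unique 16mers of the unknown isolate, as listed on the slide;
# the third one has no entry in the database.
query = ["ATGAATGTGTGAGTGA", "ATGACTGTGCCCCTGA", "ATGAAAAAAAAAAAA"]
hits = count_hits(query, KMER_DB)
for accession, n in hits.most_common():
    print(accession, n)
```

The genome with the most hits (CP001921, 2 hits) is reported as the best match.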
KmerFinder is very robust: it only needs one 16mer!
Desulfovibrio piger GOR1 SRR097356
>NODE 4 length 92 cov 23.119566
TAGGACGTGGAATATGGCAAGAAAACTGAAAATCATGGAAAATGAGAAACATCCACTTGA
CGACTTGAAAAATGACGAAATCACTAAAAAACGTGAAAAATGAGAAATGC
>NODE 15 length 82 cov 2.792683
AGCGAAAAATGTCATAACAACGATCACGACCGATAACCATCTTTGGTCCAAACTTACTCA
CGCAGCAGGCGTATAACTCGCGCATACCAGCTTTGGGCAT
N50 = 110
Total no. of bp: 210
Prediction
Species | Match | No. of Kmer hits
Flavobacterium psychrophilum | AM398681 | 1
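The N50 and total size quoted for this tiny assembly can be checked with a short calculation. The contig lengths 110 and 100 are counted from the two sequences shown on the slide; the function name is illustrative.

```python
def n50(lengths):
    """Length of the contig at which the running sum, taken from the longest
    contig downwards, first reaches half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


contig_lengths = [110, 100]  # counted from the two sequences on the slide
print(sum(contig_lengths))   # 210
print(n50(contig_lengths))   # 110
```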
TaxonomyFinder
Taxonomy level-specific gene database creation
Input: set of prokaryotic genomes; user submitted genes
For each genome:
  Gene prediction (Prodigal)
  Whole genome proteome scan against 3 HMM-based databases (PfamA domains, TIGRFAM, Superfamily)
  Gene grouping based on functional domain profiles (example: the protein MTGENLPPELPATAQAWRASVLYGQHLQLIRHLCVTCPRWSQSTSR with domains A, B and C gets the profile A-B-C)
  CD-HIT clustering of all CDSs with no hits to any HMM database
  Whole genome functional profile formation
Specific-profile finding, from phyla-specific to species-specific
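The profile steps can be sketched as follows. This is only an illustration under stated assumptions: a profile is taken to be the domain names ordered along the protein (as in A-B-C), and a profile is called taxon-specific when it occurs in every genome of that taxon and in no genome outside it. The slide does not define specificity, and all names and data here are hypothetical.

```python
from collections import defaultdict


def domain_profile(domain_hits):
    """domain_hits: list of (start_position, domain_name) for one protein.
    Returns the domain names joined in order along the protein, e.g. 'A-B-C'."""
    return "-".join(name for _, name in sorted(domain_hits))


def taxon_specific(profiles_by_genome, taxon_by_genome):
    """Profiles present in all genomes of a taxon and absent from all others
    (assumed definition of a taxon-specific profile)."""
    members = defaultdict(list)
    for genome, profiles in profiles_by_genome.items():
        members[taxon_by_genome[genome]].append(profiles)
    specific = {}
    for taxon, profile_sets in members.items():
        core = set.intersection(*profile_sets)
        outside = set().union(*(
            profiles for genome, profiles in profiles_by_genome.items()
            if taxon_by_genome[genome] != taxon
        ))
        specific[taxon] = core - outside
    return specific


# Hypothetical genomes and profiles, for illustration only.
profiles_by_genome = {
    "genome1": {"A-B-C", "D"},
    "genome2": {"A-B-C", "E"},
    "genome3": {"D", "F"},
}
taxon_by_genome = {"genome1": "taxonX", "genome2": "taxonX", "genome3": "taxonY"}
print(domain_profile([(30, "C"), (1, "A"), (12, "B")]))
print(taxon_specific(profiles_by_genome, taxon_by_genome))
```

The same function can be run once per taxonomy level by swapping the `taxon_by_genome` mapping (phylum labels, species labels, and so on).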
Reads2Type
Definition: Quick & dirty taxonomy identification of single isolates
50-mer marker gene DB:
  16S rRNA: training data genomes, RNAmmer (other)
  ITS: training data (Mycobacterium)
  GyrB: training data (Enterobacteriaceae)
  Resulting database ~5 MB
Reads2Type pushes the analysis to the user; the server provides the 50-mer database
SuffixTree: efficient data structure for string matching
Narrow Down Approach: Reads2Type compares 50-mers of combined marker genes against raw reads
Shared Probes vs Unique Probe
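One way to read the narrow-down idea is sketched here: a shared probe narrows the set of candidate species, and a unique probe settles it. This is a sketch under assumptions, not the Reads2Type code. A plain dictionary lookup stands in for the suffix tree, the toy example uses k = 4 instead of 50 to stay readable, and the marker sequences and species names are made up.

```python
def probes(seq, k):
    """All k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_probe_db(markers, k=50):
    """Map each probe to the set of species whose marker gene contains it."""
    db = {}
    for species, seq in markers.items():
        for probe in probes(seq, k):
            db.setdefault(probe, set()).add(species)
    return db


def narrow_down(reads, db, k=50):
    """Intersect the species sets of all probes found in the reads and
    stop as soon as a single candidate species remains."""
    candidates = None
    for read in reads:
        for probe in probes(read, k):
            species = db.get(probe)
            if not species:
                continue
            candidates = set(species) if candidates is None else candidates & species
            if len(candidates) == 1:
                return candidates
    return candidates or set()


# Made-up marker sequences, k = 4 for readability.
markers = {"species_A": "AAAACCCCGGGG", "species_B": "AAAACCCCTTTT"}
db = build_probe_db(markers, k=4)
print(narrow_down(["AAAACC"], db, k=4))            # shared probes only
print(narrow_down(["AAAACC", "CCGGGG"], db, k=4))  # a unique probe decides
```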
rMLST
Jolley KA, Bliss CM, Bennett JS, Bratcher HB, Brehony C, Colles FM, Wimalarathna H, Harrison OB, Sheppard SK, Cody AJ, Maiden MC. Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain. Microbiology. 2012 Apr;158(Pt 4):1005-15.
CGE implementation: For each genome in the training data the 53 ribosomal genes were extracted. Genomes in the evaluation sets were aligned using BLAT to each gene collection (only hits with at least 95% identity and 95% coverage were considered as a potential match). The closest match among the training genomes was selected based on a combination of coverage, % identity, bitscore, number of mismatches and number of gaps in the alignments across all genes.
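The 95% / 95% filter and the selection across genes can be sketched like this. The two thresholds come from the slide. How the per-gene results are combined is not stated, so counting the genes with a qualifying hit and breaking ties by mean identity is an assumption, and the gene and genome names in the example are illustrative.

```python
from collections import defaultdict

MIN_IDENTITY = 95.0  # % identity threshold from the slide
MIN_COVERAGE = 95.0  # % coverage threshold from the slide


def closest_genome(hits):
    """hits: iterable of (gene, training_genome, identity, coverage).
    Returns the training genome with the most genes passing both thresholds,
    ties broken by mean identity (assumed combination rule)."""
    best = {}  # (genome, gene) -> best identity among qualifying hits
    for gene, genome, identity, coverage in hits:
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            key = (genome, gene)
            best[key] = max(best.get(key, 0.0), identity)
    per_genome = defaultdict(list)
    for (genome, _gene), identity in best.items():
        per_genome[genome].append(identity)
    if not per_genome:
        return None
    return max(
        per_genome,
        key=lambda g: (len(per_genome[g]), sum(per_genome[g]) / len(per_genome[g])),
    )


# Made-up alignment results for two ribosomal genes and two training genomes.
example_hits = [
    ("rpsA", "genome1", 99.0, 100.0),
    ("rpsB", "genome1", 98.0, 97.0),
    ("rpsA", "genome2", 99.5, 100.0),
    ("rpsB", "genome2", 96.0, 90.0),  # fails the coverage threshold
]
print(closest_genome(example_hits))
```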
Results (16S rRNA)
Overlap in predictions
Isolates in the NCBI drafts set for which all four methods predict the species to be different from the annotated one. * NZAEPO00000000 has been re-annotated as S. oralis since we downloaded the data.
[Figure: prediction results per species for each method, with species listed alphabetically from Acinetobacter baumannii to Yersinia pestis. Panel labels and axis values were not recoverable from the extracted text.]
Speed
Method | Estimated speed (mm:ss)
16S | 00:13*
KmerFinder | 00:09*
TaxonomyFinder | 11:33*
rMLST | 00:45*
Reads2Type | 00:55**
*Estimation based on draft genomes
**Estimation based on short reads
Summary of taxonomy benchmark study
KmerFinder had the highest accuracy and was the fastest method.
SpeciesFinder (16S rRNA-based) had the lowest accuracy.
Methods that only sample genomic loci (16S, Reads2Type, rMLST) had difficulties distinguishing species that only recently diverged, especially when the main difference is a plasmid.
Tools for further typing
Name of Service | Description | URL (cge.cbs.dtu.dk/services) | Status | Publication
MLST | Multilocus sequence typing | MLST | Online | Published Apr 2012, PMID: 22238442
PlasmidFinder | Identification of plasmids in Enterobacteriaceae | PlasmidFinder | Online | Published Apr 2014, PMID: 24777092
pMLST | pMLST of plasmids in Enterobacteriaceae | pmlst | Online | Published Apr 2014, PMID: 24777092
Multilocus Sequence Typing (MLST)
First developed in 1998 for Neisseria meningitidis (Maiden et al. PNAS 1998. 95:3140-3145)
The nucleotide sequences of internal regions of approximately 7 housekeeping genes are determined by PCR followed by Sanger sequencing
Different alleles are each assigned an arbitrary number
The unique combination of alleles is the sequence type (ST)
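The last two points amount to a table lookup: the allele numbers, taken in a fixed locus order, form a profile, and each known profile maps to an ST. The sketch below uses a hypothetical 7-locus scheme; the locus names, allele numbers and ST numbers are made up for illustration and do not come from any real MLST scheme.

```python
# Hypothetical scheme: fixed order of 7 housekeeping loci.
SCHEME = ("geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG")

# Hypothetical profile table: allele numbers in scheme order -> sequence type.
ST_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 2, 1, 1, 3, 1, 1): 2,
    (4, 2, 1, 5, 3, 1, 2): 3,
}


def sequence_type(alleles, scheme=SCHEME, st_table=ST_TABLE):
    """alleles: dict locus -> allele number. Returns the ST, or None when
    the combination of alleles is not in the table (a novel profile)."""
    profile = tuple(alleles[locus] for locus in scheme)
    return st_table.get(profile)


isolate = {"geneA": 1, "geneB": 2, "geneC": 1, "geneD": 1,
           "geneE": 3, "geneF": 1, "geneG": 1}
print(sequence_type(isolate))
```

A single changed allele gives a different profile, which is why two isolates with the same ST share all loci in the scheme.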
Using WGS data for MLST
www.cbs.dtu.dk/services/mlst
Input types: Assembled genome, 454 single end reads, 454 paired end reads, Illumina single end reads, Illumina paired end reads, Ion Torrent, SOLiD single end reads, SOLiD mate pair reads
MLST schemes: Campylobacter lari, Acinetobacter baumannii #1, Acinetobacter baumannii #2, Cronobacter, Arcobacter, C. upsaliensis, Borrelia burgdorferi, Escherichia coli #1, Bacillus cereus, Escherichia coli #2, Brachyspira hyodysenteriae, Enterococcus faecalis, Bifidobacterium, Enterococcus faecium, Brachyspira intermedia, F. psychrophilum, Bordetella, Haemophilus influenzae, Burkholderia pseudomallei, Haemophilus parasuis, Brachyspira, Helicobacter pylori, Burkholderia cepacia complex, Klebsiella pneumoniae, Campylobacter jejuni, Lactobacillus casei, Clostridium botulinum, Lactococcus lactis, Clostridium difficile #1, Leptospira, Clostridium difficile #2, Listeria, Campylobacter helveticus, Listeria monocytogenes, Campylobacter insulaenigrae, Moraxella catarrhalis, Clostridium septicum, Mannheimia haemolytica, C. diphtheriae, Neisseria, Campylobacter fetus, P. gingivalis, Chlamydiales, P. acnes, Pseudomonas aeruginosa, Pasteurella multocida, Pasteurella multocida, Staphylococcus aureus, Streptococcus agalactiae, Salmonella enterica, Staphylococcus epidermidis, S. maltophilia, Streptococcus pneumoniae, Streptococcus oralis, S. zooepidemicus, Streptococcus pyogenes, Streptococcus suis, Streptococcus thermophilus, Streptomyces, Streptococcus uberis, Vibrio parahaemolyticus, Vibrio vulnificus, Wolbachia, Xylella fastidiosa, Y. pseudotuberculosis
Extended Output
Extended Output aro: WARNING, Identity: 100%, HSP/Length: 349/498, Gaps: 0, aro_122 is the best match for aro
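In this output line the identity is 100% but the HSP covers only 349 of the 498 positions of the allele, which is presumably why the match is flagged. A minimal sketch of that check, assuming a match counts as perfect only with 100% identity, full-length coverage and no gaps (the exact rule used by the server is not stated on the slide):

```python
def match_status(identity, hsp_length, allele_length, gaps):
    """Return (status, coverage in %) for one locus.
    Assumed rule: anything short of a full-length, gap-free,
    100% identical match is reported as a WARNING."""
    coverage = 100.0 * hsp_length / allele_length
    perfect = identity == 100.0 and hsp_length == allele_length and gaps == 0
    return ("perfect" if perfect else "WARNING"), coverage


status, coverage = match_status(identity=100.0, hsp_length=349,
                                allele_length=498, gaps=0)
print(status, round(coverage, 1))  # WARNING 70.1
```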
PlasmidFinder and pMLST
The PlasmidFinder database actually contains replicons, not plasmids.
Tools for phenotyping
Name of Service | Description | URL (cge.cbs.dtu.dk/services) | Status | Publication
ResFinder | Identification of acquired antibiotic resistance genes | ResFinder | Online | Published Nov 2012, PMID: 22782487
PathogenFinder | Prediction of pathogenic potential | PathogenFinder | Online | Published Oct 2013, PMID: 24204795
VirulenceFinder | Identification of virulence genes in VTEC and Enterococcus | VirulenceFinder | Online | E. coli published Feb 2014, PMID: 24574290. No publication strategy for Enterococcus and S. aureus
ResFinder
Input: NGS reads (Illumina, Ion Torrent, 454, ...) via the assembly pipeline, or FASTA / Sanger sequences
Analysis: ResFinder (BLAST)
Output: resistance gene profile (list of genes, accession numbers, theoretical resistance phenotype)
200 isolates from 4 different species (Salmonella Typhimurium, Escherichia coli, Enterococcus faecalis and Enterococcus faecium)
ResFinder, 98% ID, 60% length coverage
Phenotypic tests, 3,051 in total: 482 resistant, 2,569 susceptible
=> 99.74% of the results are concordant between ResFinder and the phenotypic tests
23 discrepancies -> 16, typically in connection with spectinomycin in E. coli
Alternatives to ResFinder
Unpublished or uncategorized
Name of Service | Description | URL (cge.cbs.dtu.dk/services) | Status | Publication
PanFunPro | Groups homologous proteins based on functional domain content | PanFunPro | Online | Published in F1000Research 2013, 2:265
MGmapper | Identifies content of metagenomic samples | MGmapper | Online, but under development | Planned
HostPhinder | Prediction of the host of a bacteriophage | HostPhinder | Online, but under development | Planned
Restriction-ModificationFinder | Identification of RM system genes | Restriction-ModificationFinder | Online, but under development | ?
SerotypeFinder | Identification of serotypes | SerotypeFinder-1.0 | Online, but under development | ?
MetaVirFinder | Identification of virus in metagenomic data | MetaVirFinder | Online, but under development | ?
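Which servers are used most
[Pie chart: share of usage per server. Values shown: 31.6, 2.7, 0.1, 0.6, 5.4, 0.3, 2.3, 34.1, 3.7, 0.2, 1, 2.1, 10.4, 4.8. Server labels: SerotypeFinder, MGmapper, VirulenceFinder, Restriction, NDtree, SpeciesFinder, KmerFinder, HostPhinder, PathogenBuster, Assembler, pmlst, PlasmidFinder, snptree, CGE, PrimerFinder, ResFinder, PathogenFinder, MLST, MetaVirFinder. The pairing of values and labels is not recoverable from the extracted text.]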
Server usage by country
[Bar chart. Y axis: Server usage (% of total requests), 0 to 30. X axis: Assembler, CGE, HostPhinder, KmerFinder, MGmapper, MLST, MetaVirFinder, NDtree, PathogenBuster, PathogenFinder, PlasmidFinder, PrimerFinder, Prodigal, ResFinder, Restriction, SerotypeFinder, SpeciesFinder, VirulenceFinder, pmlst, snptree. Legend (country codes): AR, AT, AU, BE, BR, BY, CA, CBS, CF, CH, CL, CN, CO, CR, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HK, HU, IE, IL, IN, IR, IT, JP, KE, KH, KR, LB, LT, LU, MA, MU, MY, NG, NL, NO, NZ, PE, PH, PK, PL, PT, RS, RU, SA, SE, SG, SI, TH, TR, TW, TZ, UA, US, UY, ZA.]
Top 3 countries per server
[Bar chart. Y axis: Server usage (number of requests), 0 to 6000. X axis: Assembler, CGE, HostPhinder, KmerFinder, MGmapper, MLST, MetaVirFinder, NDtree, PathogenFinder, PlasmidFinder, Prodigal, ResFinder, Restriction, SerotypeFinder, SpeciesFinder, VirulenceFinder, pmlst, snptree. Legend (country): AU, DE, DK, FI, GB, IN, IT, JP, LU, NL, NZ, SA, US.]