Identification and Characterization of Foodborne Pathogens by Whole Genome Sequencing: A Shift in Paradigm Peter Gerner-Smidt, MD, ScD Enteric Diseases Laboratory Branch EFSA Scientific Colloquium N o 20 Use of Whole Genome Sequencing (WGS) of foodborne pathogens for public health protection Parma, Italy, 16-17 July 2014 National Center for Emerging and Zoonotic Infectious Diseases Division of Foodborne, Waterborne, and Environmental Diseases
Current Methods of Characterizing Foodborne Pathogens in a Public Health Laboratory Growth characteristics Phenotypic panels Agglutination reactions Enzyme immuno assays (EIAs) PCR DNA arrays (hybridization) Sanger sequencing DNA restriction Electrophoresis (PFGE, capillary) Each pathogen is characterized by methods that are specific to that pathogen in multiple workflows Separate workflows for each pathogen TAT: 5 min weeks (months)
Why Move Public Health Microbiology to WGS? Consolidation of workflows in the lab More efficient outbreak detection, investigation & control Precise and flexible case definition More outbreaks will be detected and solved when they are small Scarce epi-resources may be focused More efficient surveillance of sporadic infections Source attribution analysis of sporadic disease Focus on pathogens of particular public health importance: Virulence Resistance - Emerging pathogens - Rapidly spreading clones/ traits- Vaccine preventable diseases
WGS in Public Health: The tools must be Simple Public health microbiologists are NOT bioinformaticians Standard desktop software Comprehensive All characterization in one workflow Work in a network of laboratories Free sharing and comparison of data between labs Central and local databases
Genomics for Diagnostics and Surveillance is a Global Issue How can we implement genomics fast and efficiently at the global level? When to share data? Surveillance WGS should be shared in real-time What is the minimal IT-infrastructure? What technical gaps need to be filled? QA/QC Political and ethical barriers
Generating succes stories: Proof-of-Concept Study on the Use of Real-Time Whole Genome Sequencing in Conjunction with Enhanced Surveillance for Listeriosis Collaboration among the public health departments in the states, CDC, FDA, USDA, and NCBI International component: Developing and refining bioinformatics pipelines with partners in Belgium. Canada, Denmark, England, and France
Why Listeria monocytogenes? Illness is rare but serious, costly, and commonly outbreak associated Controllable Current subtyping methods are not ideal Not highly discriminatory No evolutionary relationships Listeria genome is fairly small and relatively easy to sequence and analyze Strong epidemiology surveillance (Listeria Initiative) Strong regulatory component
Approach: Sequence all clinical isolates in the U.S. during one year as close to real-time as possible in parallel with current surveillance PulseNet PFGE, strain characterization at CDC, interview of case-patients Evaluate data on a weekly basis Follow up on clusters detected Both PFGE and WGS defined clusters Upload sequences to NCBI (Genbank), a public database, as the sequences are generated With metadata that do NOT identify state or isolation date but with link to the PulseNet database
Listeria Whole Genome Sequencing Surveillance in Numbers (8 months into the project, June 2014) ~ 690 clinical isolates sequenced in real-time ~ 75 environmental and historical clinical isolates sequenced 470 food isolates sequenced by FDA/GenomeTrakr 13 clusters identified by PFGE & WGS 3 clusters identified only by WGS
Listeria Whole Genome Sequencing Surveillance Lessons Learned: Possible to perform WGS in real-time Historical data are critical for cluster determination WGS provides more resolution than PFGE One PFGE cluster could not be confirmed by WGS Saving resources by interviewing only patients relevant to the cluster Suspected source disproved by WGS in one cluster A common source identified in one cluster until now Difficult to obtain relevant exposure information from patients A sporadic case was linked to a lettuce recall by WGS Many good analytical approaches yielding (almost) the same information
WGS Minimum Spanning Tree of Listeria monocytogenes Connection between lettuce recall and a sporadic case? Courtesy Public Health Agency of Canada & the Canadian Food Inspection Agency
98 99 100 LMO_1 LMO_4 LMO_5 LMO_6 LMO_7 LMO_10 LMO_11 LMO_12 LMO_13 LMO_14 Lessons learned: WGS may identify clusters not evident by PFGE Cluster 1403MLGX6-1 5 cases, 3 PFGE patterns, 3 states All patients originally from former USSR or Poland wgmlst (<All Characters>) wgmlst wgmlst 1.5 2.1 0.1 0.0 0.3 0.0 0.1 0.3 0.5 0.6 IL isolate 2 18 20 41 11 19 11 8 37 NYC isolate 1 FL isolate NYC isolate 2 NYC isolate 3 2013 isolate Nearest Neighbor 25 2 18 20 41 11 19 11 8 37 25 2 18 20 41 11 19 11 8 37 25 2 20 41 11 19 11 8 37 25 2 18 20 41 11 19 11 8 37 25 2 41 11 19 11 8
Lessons Learned: WGS May Identify Clusters Not Evident by PFGE 1403MLGX6-1WGS hqsnp tree
How different are WGS profiles of isolates belonging to the same outbreak? Salmonella serovar Heidelberg PFGE pattern JF6X01.0022 single state outbreaks in summer 2013 2013T-3050-0.00 77 2013T-3042-0.00 2013T-3111-0.00 2013T-3112-910.01 77 0.01 0.002013T-3110-74 0.00 0.00 74 2013T-3109-0.00 0.00 2013T-3040-87 0.01 2-8 SNPs 2-16 SNPs AL funeral 105-113 SNPs 0.32 0.19 0.032013T-3059-A 0.00 100 2013T-3058-0.01 2013T-3041-0.02 2013K-1037- Sporadic case, NC 0.14 0 0.00 75 0.04 2013AM-0488 410.02 0.00 100 2013AM-0486 0.01 0.18 2013AM-0487 0.01 0.14 2013K-1038 0.00 2013K-1067-100 2013K-1036-0.02 0.19 2013K-1040-930.01 0.02 2013K-1039-0.00 4-5 SNPs NY daycare Sporadic case, NE Sporadic case, OH 3-6 SNPs (3-9 SNPs) CO family gathering 0.05
2012-2013 Salmonella serovar Heidelberg multistate outbreak associated with chicken from manufacturer X PFGE patterns Pattern 122 Pattern 672 Pattern 45 Pattern 22 Pattern 326 Pattern 41 Pattern 258 ** Chicken from manufacturer X *** Clinical isolate from 2011 89 2012K-1420 0.00251 84 0.00779 2012K-1421 0.00481 0.03113 2013K-1635 42 0.01289 2012K-1747 0.00251 98 2012K-1549** 100 0.05323 0.02825 2012K-1550** 0.03381 0.02869 2012K-1417 0.01266 87 2012K-1255 0.01005 0.00763 2012K-1256 0.04386 84 2012K-1254 0.00701 0.02581 2013K-1633** 84 0.02798 2012K-1315*** 0.00707 0.07858 95 100 2013K-1649** 0.02262 0.016370.04630 2013K-1650 0.03288 87 100 2013K-0982 0.00319 0.01024 0.06016 2013K-1636 0.01701 2013K-1361 0.03682 0.05173 2013K-0983 75 0.06983 0.00807 100 2013K-1638 0.01738 0.08289 2013K-1639 0.03360 2013AM-0303 0.02883 100 2013K-0979 0.02834 0.11613 95 2013K-1275 0.01792 0.02671 2013K-0573** 0.03086 72 2013K-0574 0.00248 0.02037 2013K-0980** 0.00000 0.03478 2013K-1274 0.02286 21 SNPs 7-60 SNPs 19 SNPs 6-46 SNPs 6-35 SNPs 6-62 SNPs
To SNP or Not to SNP? in public health Single Nucleotide Polymorphism (SNP) approaches Default for phylogenetic analyses of sequence data Comparative subtyping by nature Results difficult to communicate Computationally intensive = SLOW Gene- gene approach (wgmlst) Definitive subtyping Leads to naming, tracking over time, easy communication Computationally more simple = FAST but Sufficiently discriminatory?
Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreak High confidence core SNP It took days to It took days to generate this tree using our own NY no soft cheese No soft cheese exposure exposure bioinformatics pipeline TX - no exposure info
Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreak High confidence core SNP Historical isolates from the plant environment added to the comparison (courtesy FDA/CFSAN) 0.005 Red= epi-related clinical isolates Blue= retrospective clinical controls or not outbreak related Green= historical environmental isolates from the plant It took additional hours to generate this tree using our own bioinformatics pipeline 100 100 100 CFSAN004365 100 CFSAN004359 CFSAN004361 CFSAN004360 CFSAN004358 CFSAN004353 2010L-1790 CFSAN004348 CFSAN004363 2013L-5298 CFSAN004355 CFSAN004356 CFSAN004352 CFSAN004369 2013L-5214 2011L-2809 CFSAN004377 CFSAN004375 100 CFSAN004374 100 CFSAN004373 CFSAN004349 CFSAN004364 2013L-5275 2013L-5223 2013L-5337 CFSAN004354 2013L-5357 CFSAN004372 CFSAN004370 100 CFSAN004371 CFSAN004368 CFSAN004350 CFSAN004357 2013L-5283 CFSAN004366 CFSAN004376 CFSAN004362 CFSAN004351 2012L-5487 2013L-5121 2013L-5284 2012L-5105 2012L-5274 100 2013L-5374 2013L-5301
Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreak It took less than 5 minutes to draw this tree using wgmlst
Gene Gene Approach: Fixed set of genes ( loci ) leading to typing schemes on different levels MLST Genus/Species Serotype AR emlst cmlst wgmlst Concept of allelic variation, not only point mutations Evolutionary distance for events such as recombination and simultaneous close-range mutations are counted as one event Definitive subtyping Leads to nomenclature Requires curation
Genes That May Be Targeted In a Gene-Gene Analytical Approach Housekeeping genes for MLST & emlst Virulence genes Core (c) genes ( present in all strains in a species ) Pan- genome (wg) ( all genes in the whole population of a species ) Serotyping genes Genes for genus/species/subspecies identification Antimicrobial resistance genes
Gene Gene Approach for Naming Subtyping in Keep with Phylogeny (concept to be developed) 7 gene MLST emlst cmlst wgmlst Isolate A ST24 - e12 - c48 - w214 Isolate B ST24 - e12 - c48 - w352 Isolate C ST24 - e12 - c45 - w132 Isolate D ST31 - e15 - c60 - w582 Isolate A and B closely related Isolate C related to A and B but not as closely as A is to B Isolate D unrelated to all the other isolates Providing phylogenetic information in the name is important because isolates from the same source are more likely to be related than isolates from different sources
Public Health WGS Workflow LIMS End users at CDC and in the States Sequencer Raw sequences External storage NCBI, ENA, BaseSpace PH databases Closed Genus/species Serotype Pathotype Virulence profile AST Lineage Clone Sequence type Allele SNPs Open Nomenclature server Allele databases Open Calculation engine Trimming, mapping, de novo assembly, SNP detection, allele detection Open
GENUS/SPECIES: PATHOTYPE: Shiga toxin producing and Enteroaggregative E. coli (STEC & EaggEC) VIRULENCE PROFILE: stx2a, aagr, aaga, siga, sepa, pic, aata, aaic, aap SEQUENCE TYPE: ST34 ANTIMICROBIAL RESISTANCE GENES: bla TEM-1, bla CTX-M-15 All characteristics have been determined by whole genome sequencing (WGS) The strain contains Shiga toxin subtype 2a typically associated with virulent STEC It does not contain adherence and virulence factors (eae, ehxa) typically associated with virulent STEC It contains adherence and virulence factors typically associated with virulent EaggEc (aagr, aaga, siga, sepa, pic, aata, aaic, aap) This genotype is associated with extremely high (>10%) rates of hemolytic uremic syndrome (HUS)
Real-time Sharing Of Sequence Data Is The Key To Successful Surveillance!
Acknowledgements CDC: John Besser, Kelly Jackson, Amanda Conrad, Lee Katz, Steven Stroika, Heather Carleton, Zuzana Kucerova, Eija Trees, Katie Roache, Ashley Sabol, Andrew Huang, Lori Gladney, Maryann Turnsek, Ben Silk, Katie Fullerton, Matthew Wise, Ian Williams, Duncan MacCannell, Efrain Ribot, Patricia Griffin, Rob Tauxe, Chris Braden, Dana Pitts, Collette Leaumont, Patti Fields Public Health Agency of Canada Disclaimers: The findings and conclusions in this presentation are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention Use of trade names is for identification only and does not imply endorsement by the Centers for Disease Control and Prevention or by the U.S. Department of Health and Human Services. National Center for Emerging and Zoonotic Infectious Diseases Division of Foodborne, Waterborne, and Environmental Diseases 36
WGS Analysis at CDC Similar results Circumstantial evidence suggesting pre-packaged lettuce is the likely food vehicle in a sporadic case of listeriosis WGS Analysis by Lee Katz, CDC 2013L-5439 2013L-5527 2013L-5545 2013L-5556 2013L-5203 2013L-5204 2013L-5308 2013L-5473 2013L-5085 2013L-5587 2013L-5333 2013L-5314 2013L-5182 2013L-5622 2013L-5623 2013L-5521 2013L-5548 2013L-5345 2013L-5359 2013L-5297 2013L-5194 2014L-6017 Lettuce isolate Ohio isolate 2013L-5540 2013L-5560 21.5 hqsnps [0-33] 43 hqsnps 5 hqsnps 58 hqsnps [46-58] 24 hqsnps [0-94] 55 hqsnps [5-66] 66 hqsnps [46-84] 2014L-6086 0.02 12
98 99 100 LMO_1 LMO_3 LMO_4 LMO_6 LMO_7 LMO_9 LMO_10 LMO_11 LMO_12 LMO_13 LMO_14 LMO_15 wgmlst (<All Characters>) wgmlst Canadian Lettuce/OH case patient wgmlst GMLST_LMO0000877 GMLST_LMO0000931 GMLST_LMO0000959 GMLST_LMO0000692 GMLST_LMO0000994 GMLST_LMO0000993 GMLST_LMO0000834 GMLST_LMO0000916 GMLST_LMO0000884 GMLST_LMO0000888 GMLST_LMO0000811 GMLST_LMO0000906 GMLST_LMO0000926 GMLST_LMO0000686 GMLST_LMO0001160 GMLST_LMO0000918 GMLST_LMO0000675 GMLST_LMO0000713 GMLST_LMO0000930 GLMO0001344 4-3198_S1_L001 GMLST_LMO0000727 GMLST_LMO0000901 SRR1016591 SRR1021895 SRR1027086 SRR1041615, SRR98. SRR1030280 SRR1030279 SRR1001027 SRR1112081 SRR1016925 SRR1016594 SRR988736 SRR1021953 SRR1016605 SRR946033 SRR1166834 SRR1021894 SRR1041609, SRR95. SRR972406 SRR1016609 SRR972392 SRR1021948 *UPGMA tree containing PFGE matches to Ohio and Canadian lettuce isolates *wgmlst groups these isolates of interest as well as supports the general tree layout seen by hqsnp analysis 2013L-5521 3 5 PNUSAL000256 22 26 41 NY IDR1300025739 29 27 29 26 30 2 2013L-5204 3 5 PNUSAL000320 22 26 41 HI N13-140 29 27 29 26 30 2 2013L-5587 3 5 PNUSAL000348 22 26 41 WA 19002 29 27 29 26 30 2 2013L-5314 3 5 PNUSAL000069 22 26 41 SC LM14-001 29 27 29 26 30 2 2013L-5623 3 5 PNUSAL000383 22 26 41 WA 19014 29 27 29 26 30 2 2013L-5622 3 5 PNUSAL000382 22 26 41 WA 19029 27 29 26 30 2 2013L-5473 3 5 PNUSAL000212 26 41 MD MDA13096803 29 27 29 26 30 2 2013L-5085 3 5 PNUSAL000305 22 26 41 NYC nyc13-101340416 29 27 29 26 30 2 2013L-5712 3 5 PNUSAL000270 22 26 41 NY IDR1300029167 27 29 26 30 2 2013L-5527 3 5 PNUSAL000278 22 26 41 MI CL13-200750 29 27 29 26 30 2 2013L-5439 3 5 PNUSAL000189 22 26 41 OH 2013042642 29 27 29 26 30 2 2013L-5545 3 5 PNUSAL000296 22 26 41 CA M13X04891 29 27 29 26 30 2 2013L-5556 3 5 PNUSAL000315 22 26 41 WA 18947 29 27 29 26 30 2 2013L-5308 3 5 PNUSAL000063 22 26 41 NY IDR1300016274 29 29 26 30 2 2014L-6017 3 5 PNUSAL000548 22 55 41 MA 14EN0045 29 27 29 26 30 2 2013L-5548 3 5 PNUSAL000307 22 26 41 CT 344766001 29 27 29 26 30 2 2013L-5297 3 5 PNUSAL000052 22 55 41 SC LM13-008 29 27 29 26 30 2 2013L-5345 3 5 PNUSAL000090 OH 22 case 26 41 NYC nyc13-101347849 29 27 29 26 30 2 2013L-5560 3 5 PNUSAL000319 22 26 41 OH 2013063623 29 27 29 26 30 2 2014L-6149 3 5 PNUSAL000172 22 55 41 OH 2014001036 29 27 29 26 30 2 CN 20 lettuce 3 5 CA_lettuce 22 55 41 29 27 29 26 30 2 2013L-5359 3 5 PNUSAL000104 22 55 41 MI CL13-200523 84 27 29 26 30 2 2013L-5540 5 PNUSAL000291 22 55 41 MI CL13-200753 29 27 29 26 30 2