Provisioning robust automated analytical pipelines for whole genome-based public health microbiological typing




Anthony Underwood
Bioinformatics Unit, Infectious Disease Informatics, Microbiological Services, Public Health England

Public Health England
PHE is an executive agency, sponsored by the Department of Health, UK. We protect and improve the nation's health and wellbeing, and reduce health inequalities.
Microbiology Services:
- We provide specialist investigation and control of communicable disease outbreaks, chemical incidents, radiation and other environmental hazards
- We provide the evidence-based science and clinical practice in specialist microbiology in support of the wider public health system and NHS hospitals

PHE Reference Microbiology
Reference Microbiology carries out a broad spectrum of work relating to prevention of infectious disease. The remit of the centre at Colindale includes:
- Infectious disease surveillance
- Providing specialist and reference microbiology and microbial epidemiology
- Research & Development
- Coordinating the investigation and cause of national and uncommon outbreaks
- Helping advise government on the risks posed by various infections
- Responding to international health alerts

PHE Specialist Microbiology Services
The PHE Specialist Microbiology Services (SMS) consists of 8 specialist clinical laboratories operating across England. These laboratories provide a comprehensive range of clinical diagnostic and public health microbiology tests and services to the NHS and allied healthcare providers. SMS also includes a further five dedicated food, water and environmental (FW&E) testing laboratories that undertake statutory testing for the NHS, local authorities, and other key stakeholders.

Bioinformatics Unit, Infectious Disease Informatics
Formed in 2013:
- 3 staff
- 2 Linux servers
- Amongst the first public health institutes to see the potential of bioinformatics and fund it
Now:
- 15 staff
- 512 cores (UGE)
- 300 TB usable HPC storage

MS Public Health Functions
Questions we often ask of a pathogen isolate:
1. What is it?
2. What characteristics does it have?
3. How does it relate to other isolates?

1. What is it?
   Identify the infectious agent or exclude particular infections and associated risks.
2. What characteristics does it possess?
   Antibiotic resistant? Presence of toxins?

3. How does it relate to other isolates?
- Do cases of an infection have a common source or are they linked?
- What is the source and what are the risk factors? (e.g. food, school, travel to certain countries)
- What is the best way of:
  - treating the affected?
  - protecting others?
  - limiting further spread?

Pathogen typing
Giving bugs a label
If we discover isolates with the same type we can:
- Include individual cases in, or exclude them from, an outbreak (e.g. MRSA in hospitals)
- Establish an association between an outbreak of food poisoning and a specific food vehicle (e.g. egg mayo sandwich)
- Trace the source of contaminants within a manufacturing process (e.g. chocolate factory, baby feed)
The type also helps:
- Determine changes in microbial populations in response to interventions (e.g. vaccination strategies, vaccine escape)
- Study variations and trends in the pathogenicity, virulence and antibiotic resistance within a species (e.g. new ABr acquisition)

20th Century Microbiology

Bacterial Identification: Culture

Bacterial Identification: Gram Stain and API Strips

Phenotypic Characterisation of Microbes
- Serotyping
- Sensitivity Testing
- Phage Typing

Gel-based Typing methods
[Figure: Ciprofloxacin-resistant Salmonella Kentucky in Travellers, http://wwwnc.cdc.gov/eid/article/12/10/06-0589-f1.htm]

MLST: multi-locus sequence typing
Locus | Putative function of gene | Size of sequenced fragment (bp) | No. of alleles identified | No. (%) of polymorphic nucleotide sites | % G+C | dN/dS | Position in GBS genome (bp)
adhP | Alcohol dehydrogenase (gbs0054) | 498 | 11 | 12 (2.4) | 43.1 | 0.13 | 72286
pheS | Phenylalanyl tRNA synthetase | 501 | 5 | 7 (1.4) | 37.1 | 0.17 | 912817
atr | Amino acid transporter (gbs0538) | 501 | 8 | 12 (2.4) | 36.9 | 0.14 | 560085
glnA | Glutamine synthetase | 498 | 6 | 6 (1.2) | 35.7 | 0.12 | 1868862
sdhA | Serine dehydratase (gbs2105) | 519 | 6 | 13 (2.5) | 41.4 | 0.12 | 2179923
glcK | Glucose kinase (gbs0518) | 459 | 4 | 7 (1.5) | 42.6 | 0.13 | 538770
tkt | Transketolase (gbs0268) | 480 | 5 | 8 (1.7) | 38.9 | 0.42 | 287111
Sørensen U B S et al. mBio 2010; doi: 10.1128/mBio.00178-10

Virus Identification

Typing of Viruses

Our vision for 21st Century Microbiology

Whole genome sequencing

Why whole genome sequencing?
- Cost: e.g. replacement of Salmonella serotyping
- Speed: e.g. replacement of TB drug resistance testing
- Added value: multiple outputs from one test
- Extra resolution: increased discriminatory power over traditional techniques

Salmonella population structure
- Minimal spanning tree of MLST data for S. enterica subspecies enterica
- Each circle corresponds to a sequence type (ST)
- eBGs are natural clusters of genetically related isolates
- MLST STs correlate with serotypes
Achtman et al., 2012

Hypothetical WGS-based workflow for Diagnostics & Reference Microbiology

Achieving the ambition

Pilot studies: Lab protocols
[Diagram: Clinical scientist, Sequencer]

Pilot studies: Bioinformatics process
[Diagram: Bioinformatician, with speech bubbles "Blah, blah, blah", "X,Y,Z", "A,B,C", "Blah, blah.."]

Pilot studies: Interpretation?
[Diagram: Department of Health officials, Doctors, Epidemiologists, facing the speech bubbles "Blah, blah, blah", "X,Y,Z", "A,B,C", "Blah, blah.."]

Moving from pilot studies to routine WGS for public health microbiology

Writing scripts is easy. Creating software is hard.
In order for WGS to replace current tests, the assays require accreditation (ISO 15189):
- Quality
- Reproducibility
- Audit trail
Any WGS-based test suitable for public health intervention will need:
- Speed
- Resilience

Quality
Working with laboratory scientists and epidemiologists in a 3-phase approach:
1. Generation of a command line workflow based on user requirements
2. User testing and familiarisation using Galaxy
3. Automation

How do we generate outputs?
- Quality assessment and trimming
- Important to be able to provide a quality score for the result as well as the reads
- Majority of our workflows use mapping rather than assembly
- Derivation of 7-locus MLST from mapping to the loci that comprise the scheme
- Gene profiling for ABr and virulence factors, using a single-copy housekeeping gene as a positive control
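As a rough illustration of the MLST step above, the sketch below turns a set of per-locus allele calls into a sequence type (ST) and attaches a result-level quality figure alongside it. It is not the PHE implementation: the locus names are borrowed from the GBS MLST slide purely as placeholders, the profile table holds made-up values, and the names `call_st` and `result_quality`, along with the choice of minimum mapping depth as the quality measure, are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

# Placeholder 7-locus scheme (names taken from the GBS MLST slide).
LOCI: Tuple[str, ...] = ("adhP", "pheS", "atr", "glnA", "sdhA", "glcK", "tkt")

# Made-up allelic profiles mapped to made-up STs, for illustration only.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 2, 1, 1, 2, 2): 1,
    (5, 4, 6, 3, 2, 1, 3): 17,
}


def call_st(alleles: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the ST for a complete, known profile, otherwise None."""
    # A missing or uncalled locus means no ST can be assigned.
    if any(alleles.get(locus) is None for locus in LOCI):
        return None
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)


def result_quality(depth_per_locus: Dict[str, float]) -> float:
    """Crude result-level quality: the lowest mapping depth over all loci.

    A locus with no recorded depth counts as zero.
    """
    return min(depth_per_locus.get(locus, 0.0) for locus in LOCI)
```

Reporting the weakest locus rather than an average is one possible design choice: a single poorly covered locus is enough to make the whole ST call doubtful.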

Reproducibility
- Fastq files are tagged: UID-sample_name-workflow-version.fastq.gz
- Workflows are described in a config file:
    campylobacter-jejuni-complex-typing:
      2-0-0:
        kmerid_pattern: "Campylobacter (jejuni coli)"
        components:
          - component_name: "phe/qa_and_trim"
            version: "1-1"
          - component_name: "phe/kmerid"
            version: "1-0"
          - component_name: "phe/mlst_typing"
            version: "1-1"
          - component_name: "phe/gene_finder"
            version: "1-0"
          - component_name: "phe/combine_xml"
            version: "1-0"
- The results for each sample are tagged with the same workflow and version
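The tagging scheme on this slide can be sketched in a few lines. The example below is an illustration under stated assumptions, not the PHE code: the class names, the `tagged_fastq_name` and `tag_result` helpers, and the use of a plain hyphen as the separator between fields are all hypothetical. Only the workflow name, its version and the component list come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    version: str


@dataclass(frozen=True)
class Workflow:
    name: str
    version: str
    components: Tuple[Component, ...]


# Workflow definition copied from the config shown on the slide.
CAMPY = Workflow(
    name="campylobacter-jejuni-complex-typing",
    version="2-0-0",
    components=(
        Component("phe/qa_and_trim", "1-1"),
        Component("phe/kmerid", "1-0"),
        Component("phe/mlst_typing", "1-1"),
        Component("phe/gene_finder", "1-0"),
        Component("phe/combine_xml", "1-0"),
    ),
)


def tagged_fastq_name(uid: str, sample_name: str, workflow: Workflow) -> str:
    """Build a name of the form UID-sample_name-workflow-version.fastq.gz."""
    return f"{uid}-{sample_name}-{workflow.name}-{workflow.version}.fastq.gz"


def tag_result(result: Dict[str, object], workflow: Workflow) -> Dict[str, object]:
    """Return a copy of a result stamped with the workflow and component versions."""
    return {
        **result,
        "workflow": workflow.name,
        "workflow_version": workflow.version,
        "components": {c.name: c.version for c in workflow.components},
    }
```

Stamping every result with the same workflow name and version that appears in the fastq file name means any report can be traced back to the exact component versions that produced it.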

Auditability
- Each sample is tracked throughout the process, from sending lab to report output
- Metrics from lab processing are recorded
- Sequencing quality is logged
- Each component of the bioinformatics process logs its own progress and success/failure
- Only when all quality thresholds are achieved and all components are completed are results/reports transferred to end-users
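The release rule in the last point can be expressed as a simple gate. The sketch below is one hypothetical way to model it: the `SampleAudit` record, the metric names and the threshold values are assumptions for illustration and are not taken from the PHE system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleAudit:
    """Per-sample audit record: metrics, component outcomes and a log."""
    sample_id: str
    metrics: Dict[str, float] = field(default_factory=dict)
    component_status: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def record_metric(self, name: str, value: float) -> None:
        self.metrics[name] = value
        self.log.append(f"metric {name}={value}")

    def record_component(self, name: str, succeeded: bool) -> None:
        status = "success" if succeeded else "failure"
        self.component_status[name] = status
        self.log.append(f"component {name}: {status}")


def ready_for_release(
    audit: SampleAudit,
    thresholds: Dict[str, float],
    required_components: List[str],
) -> bool:
    """True only if every threshold is met and every component succeeded."""
    # A metric that was never recorded is treated as failing its threshold.
    metrics_ok = all(
        audit.metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )
    components_ok = all(
        audit.component_status.get(name) == "success"
        for name in required_components
    )
    return metrics_ok and components_ok
```

Treating missing metrics and missing component records as failures keeps the gate closed by default, so nothing reaches end-users unless every step has positively reported success.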

[Diagram: end-to-end sample workflow, contrasting ad hoc scripts with automated workflows. Recoverable labels:]
- Steps: pathogen isolates received; plate and form submission; culture; nucleic acid extraction; sample dilution; library preparation (96 well plate, liquid handling robots); sequencing on Illumina HiSeq 2500 in Rapid Mode; deplexing; automated bioinformatics; reports
- Systems: LIMS; web-based form selecting workflows for samples; UGE computing cluster; DDN Lustre-based high performance storage; workflow-specific components (Kmer identification, trimming)
- Outputs: serotype, MLST type, gene profile, drug resistance, consensus sequence, metrics
- People: clinical scientist, sequencing technician, bioinformatician, UNIX system administrator, Department of Health officials, doctors, epidemiologists
- Timings: sample preparation 24 hrs; library preparation and sequencing 72 hrs; deplexing 4 hrs; a further 4 hrs label whose step is not recoverable

Speed and Resilience: Infrastructure

Speed and Resilience
[Diagram: two sites linked over the PHE WAN]
PHE/Colindale zone:
- Sequencing machines
- UGE High Performance Computing cluster
- High Performance Storage: DDN EXAScaler / Lustre filesystem
- DDN WOS object storage system
- HAProxy & iRODS server with SSL SAN certificate
PHE/Birmingham zone:
- Sequencing machines
- Computing and Storage system
- DDN WOS object storage system
- HAProxy, iRODS server, SSL SAN certificate
iRODS zones: PHE zone; other zone? (WTSI?)

Examples of WGS in action
Routine samples processed from April to September 2014
Organism | Number processed
Salmonella | 3954
Staphylococcus aureus | 913
Streptococcus pyogenes | 1274
Streptococcus pneumoniae | 959
Other bacteria | 238
HCV | 114
HEV | 3
HIV | 257
Influenza | 187
MeV | 47
Other viruses | 35
Total | 7981

Current and Future Challenges
- Lustre FS problems
  - Apparently random failures to write, or writing of incomplete files
- Bio-banking for phenotype-genotype studies
- Data release
  - Timely release of raw data
  - Minimal metadata
    o Date of isolation
    o Source (Human/Environmental/Food)
    o Place (Country?)
  - Policy makers still wary

Current and Future Challenges
Scalability
- Plan to scale to 3000 samples/week
- OpenStack for surge compute
- Currently have 300 TB usable Lustre storage
- Medium term archive to object store
- Release data to ENA (SRA) at EBI
Storage tiers:
- Lustre (16 weeks): Fastqs (CRAM?); Bam files; all result/log/error and meta files
- Object store (6 months): Fastqs (with workflow descriptor); text/pdf result files and reports
- SRA (for ever): metadata; Fastqs (CRAM?)
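The retention figures and the scaling target can be combined into a small worked example. The sketch below assumes one particular reading of the slide: that each tier keeps its copy for the period shown, counted from the run date (Lustre 16 weeks, object store 6 months, SRA indefinitely). It also assumes 6 months is about 183 days and 1 TB is 1000 GB. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# (tier, retention); None means the data is kept for ever.
RETENTION: List[Tuple[str, Optional[timedelta]]] = [
    ("lustre", timedelta(weeks=16)),
    ("object_store", timedelta(days=183)),  # assumed length of "6 months"
    ("sra", None),
]


def tiers_holding(run_date: date, today: date) -> List[str]:
    """List the tiers that would still hold data from a run on run_date."""
    age = today - run_date
    return [tier for tier, keep in RETENTION if keep is None or age <= keep]


def lustre_budget_per_sample_gb(
    usable_tb: float = 300.0,
    samples_per_week: int = 3000,
    weeks_retained: int = 16,
) -> float:
    """Average Lustre space available per sample at the planned throughput."""
    samples_on_disk = samples_per_week * weeks_retained
    return usable_tb * 1000.0 / samples_on_disk
```

Under these assumptions, 3000 samples/week held for 16 weeks is 48,000 samples on Lustre at any time, which leaves roughly 6.25 GB per sample for fastqs, bam files and results within 300 TB.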

Acknowledgements
Virtual Pathogen Genomics Unit
Bioinformatics Unit: Francesco Giannoccaro, Matthew Goulden, Steven Platt, Rediat Tewolde, Aleksey Jironkin, Ali Al-Shahib, Ulf Schaefer, Kieren Lythgow, Jonathan Green
Other Bioinformaticians: Tim Dallman, Phil Ashton, Michel Doumith
Reference and Specialist Microbiology laboratories
Icons made by Freepik from www.flaticon.com