Public Health Laboratory Workforce Development Bioinformatics

Public Health Laboratory Workforce Development
Bioinformatics Templates for Course Development

Contents

Overview
Going Beyond the Introductory Courses
Course Templates
  Template 1: Introduction to Bioinformatics for Public Health Laboratorians
    Day 1
    Day 2
  Template 2: Hands-On Introduction to Linux for PHL Bioinformaticians
  Template 3: Introduction to Next-Generation Sequencing and Bioinformatics for Epidemiologists

Overview

This document is provided as an example of curricula for introductory courses on bioinformatics for public health laboratory and epidemiology staff. Public health departments planning training for their staff may wish to use it as a basis for developing their own course series. The materials included here are intended to be used as templates, that is, as examples of how introductory courses might be organized. It is anticipated that health departments choosing to use them will adapt them to their own setting and to the special needs of their own staff. Other health departments may wish to develop their own curricula from scratch. CDC does not endorse a particular approach to curriculum development, except to emphasize that the curriculum needs to be tailored to the existing skill level of the staff and to the resources available for teaching the course.

Going Beyond the Introductory Courses

For individual learning, there are numerous online courses, some paid and some free, in a wide variety of formats. Results of a 2015 survey of online options are available on CDC's AMD website (http://www.cdc.gov/amd/pdf/environmentalscan-08-04-15.pdf).

In addition, CDC encourages health departments to work with local academic and other institutions to develop course options customized to the needs of their staff. For examples of potential topics for courses, refer to the IHRC web site (http://abil.ihrc.com/training.html), which shows 2- and 4-day courses developed by IHRC and the faculty at Georgia Tech for trainings at CDC [note: CDC is including the website as an example and is NOT endorsing the use of this or any other specific vendor to develop the training].

Course Templates

Template 1: Introduction to Bioinformatics for Public Health Laboratorians

Format: two days, lectures

The schedule below provides an example curriculum for a 2-day introduction to bioinformatics. It is aimed at microbiologists who have little or no bioinformatics training and is based on CDC's experience in offering such training to laboratory staff. Note that the two days are fairly intense and consist entirely of lectures. A 2-day course is not unreasonable, but where feasible some groups may wish to divide it into two separate one-day courses. Another option is to divide the second day into two half-days, using the rest of the time for another purpose, such as hands-on training. As with all templates in this document, this one is meant to be modified as much as needed to meet local needs.

Day 1

Basic Pathogen Genomics (75m)
- have a basic understanding of the pathogen genomics to be used throughout the course

Sequencing Technology: library prep to raw NGS data
- have a basic understanding of sequencing technologies, including both Sanger and NGS
- understand the major advantages and disadvantages of each technology
- have an understanding of basic workflows for DNA-seq, RNA-seq, and metagenomic sequencing
- have a sense of library prep technologies for NGS (single-end and paired-end)
- know the file formats generated by these technologies
- know terminology commonly used in NGS

Genome Assembly: raw NGS data to assembled contigs
- be able to explain the different metrics used to assess NGS sequence quality
- be able to explain why trimming and filtering raw NGS data is necessary and how it can affect the final assembly
- be able to describe the process of assembling reads into contigs
- be able to describe which metrics are used to assess assembly quality (coverage, N50, etc.)

Lunch

Databases and Searches
- be able to describe the basic algorithm behind BLAST (words, HSP, e-value)
- be able to list commonly used databases (DNA, protein, functional annotation, metabolic pathways)

Sequence Alignments
- be able to describe the methods used to align two sequences and how those methods can be extended to multiple sequences (DNA and protein)

Genomics (90m)
- be able to explain what an alignment is and what SNPs and indels are
- be able to list and describe current molecular typing techniques (PFGE, MLST, MLVA, VNTR, SNP, wgMLST)
- be able to explain (traditional) MLST and wgMLST
- be able to explain how k-mer-based analysis is used to find SNPs and what advantages and disadvantages it has over wgMLST

Day 2

Metagenomics
- be able to explain different types of metagenomics approaches
- be able to describe several tools used to analyze metagenomic data

Phylogenetic Reconstruction
- be able to explain different types of trees, both in terms of the data underlying the trees and the tree topology
- know how to interpret key aspects of trees: topology, branch lengths, bootstrap values, etc.
- understand the implications of tree structure and how to interpret clustering within a tree

Gene Finding and Function Assignments
- be able to describe the two main methods for gene finding
- be able to list some examples of these methods
- be able to describe methods for functional assignments
- be able to list several of the databases used in functional annotation

Sequence Visualization
- be able to describe examples of commonly used genome browsers
- be able to list the types of data that can be displayed in genome browsers

Lunch

Phenotypic Inference
- understand how phenotype is inferred from sequence
- be able to cite examples of phenotypic characteristics that can be inferred from sequence
- understand the limitations of phenotypic inference

Molecular Epidemiology (case studies/examples)
- be able to describe some examples of how CDC and the PHLs are using molecular epidemiology to detect, investigate, and track outbreaks

Commonly Used Bioinformatics Software
- be able to list some of the commonly used bioinformatics tools for different analytical tasks
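The Day 1 Genome Assembly objectives name coverage and N50 as assembly-quality metrics. Instructors who want a concrete illustration could use something like the minimal Python sketch below. It assumes the common definitions: N50 is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the assembly size, and mean coverage is total sequenced bases divided by genome size. The function names and the example numbers are hypothetical and are not part of the CDC template.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bases)."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig is the one that pushes the running total past 50%.
            return length


def mean_coverage(total_read_bases, genome_size):
    """Return average depth of coverage: sequenced bases per genome base."""
    return total_read_bases / genome_size


# Hypothetical toy assembly of five contigs (lengths in bases).
lengths = [100, 200, 300, 400, 500]
print(n50(lengths))  # 400

# Hypothetical run: 1,000,000 reads of 150 bases on a 5 Mb genome.
print(mean_coverage(1_000_000 * 150, 5_000_000))  # 30.0
```

In the toy assembly the total is 1,500 bases, so the running total first reaches 750 at the 400-base contig. A few very long contigs can raise N50 without the assembly being more correct, which is one reason the objective lists several metrics instead of one.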

Template 2: Hands-On Introduction to Linux for PHL Bioinformaticians

Format: one day, hands-on training using the Linux OS

This course is designed as hands-on training for staff who understand the basics of bioinformatics and wish to gain some facility with more advanced software. Much of that software is open source and freely available, but it requires an understanding of Linux and of the command line. This is typically not a course for all laboratory staff, but it is an excellent option for those who want or need access to more sophisticated analytic tools. Each student will need access to a computer running Linux (most common distributions should work) and, if the HPC section is included, remote access to an HPC cluster. For this reason, a computer training laboratory would be an excellent site for the training. As an alternative to this course, health departments that have licensed commercial bioinformatics software might consider training for that software.

Basic Linux (90m)
- be able to execute simple Linux commands (list a directory, create a directory, change directories; move, copy, and view files)
- be able to get the manual pages for Linux commands
- be able to describe what a Linux desktop environment is
- be able to move files from a PC to Linux
- be able to use a simple text editor
- be able to describe the difference between relative and absolute paths

Intermediate Linux (90m)
- be able to demonstrate the use of file permissions and how to change them
- be able to demonstrate how to redirect and pipe output
- be able to demonstrate the use of wildcards
- be able to demonstrate the use of grep
- have a basic understanding of the Linux file system structure
- be able to describe what an environment variable is
- be able to tar and compress a directory and its contents

Lunch

Scripting
- be able to write and execute a basic bash script
- be able to describe how scripts are used in building analysis pipelines

HPC Cluster Computing (90m)
- be able to describe how cluster computing environments can be used in analysis applications
- be able to describe what a queueing system is and how it works
- be able to describe how applications submit jobs to queueing systems and how to retrieve results from the analysis
- be able to submit a job to a queue using qsub
- be able to check the status of, delete, and list details of their jobs using various q commands
- be able to create and execute a simple qsub script

Template 3: Introduction to Next-Generation Sequencing and Bioinformatics for Epidemiologists

Format: one day, lectures and structured case studies

The schedule below shows a one-day course for epidemiologists. The course is based on a curriculum developed by microbiologists and epidemiologists at CDC and is scheduled to be offered several times in 2016. Because this course is taught by CDC staff, slides should be available for use elsewhere. The template includes case studies during the afternoon. Ideally, these should be based on local experience.

Introduction

Basic Pathogen Genomics
Objectives:
- have a basic understanding of the pathogen genomics to be used throughout the course
Content:
- DNA and RNA
- Typical genome sizes and characteristics
  o Viruses
  o Bacteria
  o Eukaryotes
- Special topics
  o Plasmids
  o Phages
  o Other mobile elements
  o Recombination
- Other important technologies
  o DNA/RNA extraction
  o PCR; real-time PCR

Sequencing Technology: library prep to raw NGS data (45m)
Objectives:
- have a basic understanding of sequencing technologies, including both Sanger and NGS
- understand the major advantages and disadvantages of each technology
- have a sense of library prep technologies for NGS
- know terminology commonly used in NGS
Content:
- Sanger sequencing: how it works, limitations of the technology
- NGS technology: how it works; advantages and disadvantages of different technologies
  o Illumina (Note: plan to spend most of the time on this, given its role at CDC and in state labs), including how samples are multiplexed on a run
  o Other technologies: Ion Torrent, PacBio, Oxford Nanopore
- Library prep for NGS
  o Enzymatic vs physical
  o Walk-through of a typical enzymatic prep
  o Terminology

Genome Assembly: raw NGS data to assembled contigs
Objectives:
- be able to explain different methods for processing raw NGS data into contigs
- have a sense of how bioinformaticians accomplish this assembly
Content:
- Consensus sequence vs. variants

Sequence Comparison
Objectives:
- understand what an alignment is
- be able to explain (traditional) MLST and wgMLST
- understand what a SNP analysis is and some of the considerations that go into SNP analysis
Content:
- What is an alignment?
- MLST: what it is and why it was used for bacterial classification
- wgMLST
  o How this differs from traditional MLST
  o Automated vs curated
  o Flavors of wgMLST (cgMLST, etc.)
- SNP analysis
  o What is a SNP?
  o What needs to be masked out in a SNP analysis
  o Effect of reference sequence on results
- SNPs vs wgMLST: advantages and disadvantages
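The contrast between SNP analysis and wgMLST in the Sequence Comparison topic can be made concrete with a toy calculation. The Python sketch below is an illustration under stated assumptions, not a method from the CDC curriculum: it assumes each locus is already aligned between the two isolates, treats `N` and `-` as masked positions, and compares raw locus sequences where real wgMLST schemes compare curated allele numbers. The isolates, locus names, and function names are all hypothetical.

```python
def snp_distance(seq_a, seq_b, masked="N-"):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a masked character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in masked and b not in masked
    )


def allele_distance(loci_a, loci_b):
    """Count shared loci whose sequences differ at all (wgMLST-style)."""
    shared = loci_a.keys() & loci_b.keys()
    return sum(1 for locus in shared if loci_a[locus] != loci_b[locus])


# Two hypothetical isolates, three aligned loci each.
isolate_1 = {"locusA": "ATGCCGTA", "locusB": "GGCTTAAC", "locusC": "TTGACCGA"}
isolate_2 = {"locusA": "ATGCCGTA", "locusB": "GACTAAAT", "locusC": "TTGACCGT"}

total_snps = sum(
    snp_distance(isolate_1[locus], isolate_2[locus]) for locus in sorted(isolate_1)
)
print(total_snps)                             # 4
print(allele_distance(isolate_1, isolate_2))  # 2
```

Here locusB carries three SNPs but counts as a single allele difference, so the same pair of isolates is 4 SNPs apart and 2 alleles apart. That compression is one of the trade-offs the topic asks participants to weigh, along with the choice of what to mask.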

Phenotypic Inference
Objectives:
- understand how phenotype is inferred from sequence
- be able to cite examples of phenotypic characteristics that can be inferred from sequence
- understand the limitations of phenotypic inference
Content:
- Common types of phenotypic inference
  o antibiotic and antiviral resistance
  o serotype
  o virulence
- How phenotypic inference is done (i.e., generally by comparisons with databases, not by heuristics)
- Limitations of phenotypic inference: need for validation

Lunch

Trees and Clusters (45m)
Objectives:
- be able to explain different types of trees, both in terms of the data underlying the trees and the tree topology
- know how to interpret key aspects of trees: topology, branch lengths, etc.
- understand the implications of tree structure and how to interpret clustering within a tree
Content:
- Basics of trees
  o Sequence-based vs distance-based
  o Rooted vs. unrooted
  o Formats: rectangular, diagonal, polar, radial, etc.
- Distance-based trees
  o Nucleotide substitution models (what they are and how they affect trees; not the details)
  o How neighbor-joining trees work
- Sequence-based tree example: maximum likelihood tree
- Branch lengths: what do they mean?
- Numbers at nodes: what do those mean? How to interpret them
- Special case: Bayesian tree, and what to infer from one

Case Studies 1
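The Trees and Clusters topic mentions that nucleotide substitution models affect distance-based trees without going into detail. For instructors who want one worked example, the Python sketch below compares the raw proportion of differing sites (the p-distance) with the Jukes-Cantor (JC69) corrected distance, d = -(3/4) ln(1 - 4p/3). JC69 is used here only because it is the simplest such model; the template does not say which models the course covers, and the sequences and function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, using only sites where both are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences that differ at 1 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p)                # 0.1
print(jukes_cantor(p))  # slightly larger than p
```

The corrected distance is always at least as large as the p-distance because the model allows for multiple substitutions at the same site. The gap widens as sequences diverge, which is how the choice of model changes the distance matrix that a neighbor-joining tree is built from.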

NGS: applications, pathogens and paradigms
Objectives:
- be able to explain what the AMD program is
- understand that the application of NGS goes beyond cluster detection
- be able to cite several examples of how NGS is being used in public health
- be able to explain what capacity state and local health departments are developing in NGS
Content:
- The AMD program: what it is
- Examples of applications of NGS at CDC
  o Flu: inferring phenotype to improve efficiency and improve vaccine strain selection
  o Streptococcus: inferring phenotype to improve efficiency and to make phenotyping (including serotyping) more widely available
  o Hepatitis C: the importance of quasispecies
  o Pertussis: understanding emergence of a pathogen
  o HIV: understanding transmission, integrating genomic and epi data
  o URDO Seq: a new paradigm for solving outbreaks
- NGS in state health departments: what pathogens will states be typing in the near future

Case Studies 2

Other Technologies and Omics
Objectives:
- be familiar with a few other important AMD technologies
- be able to name several other important -omics fields
Content:
- Other AMD technologies
  o MALDI-TOF
  o Optical Mapping
  o Synthetic long-reads
- Metagenomics
  o What is metagenomics?
  o The microbiome
  o Metagenomics for diagnosis (examples: TB, foodborne)
- Other Omics (give several examples with very brief explanation)

Case Studies 3

Wrap-up, evaluation
- Include the state health department perspective: how will this technology impact state health lab operations, and what pathogens do they see as priorities?