Next Generation Sequencing Technologies in Microbial Ecology. Frank Oliver Glöckner





Max Planck Institute for Marine Microbiology: investigation of the role, diversity and features of microorganisms; interactions with physical and chemical processes in marine and other aquatic habitats. Founded 1992 in Bremen, Germany. 2

Marine Microbiology at MPI: a Holistic Approach. Who is out there, and how much of which kind? What are they doing, and under which conditions are they doing what? 3

Promise of NGS: Much Denser Network of Data. Organisms: phylogenetic diversity (qualitative data, quantitative data). Genes: functional diversity (functional inventory, operon structures, expression profiles). Environment: x, y, z, t; environmental descriptors. -> Integrated datasets. 4

Data Integration www.megx.net Kottmann et al., NAR, submitted 5

Ecological Genomics: The Vision. Statistics; key parameters; ecosystems biology; modelling; predictions. 6

Ribosomal RNA as a universal marker gene. Full-cycle rRNA approach (Amann, 1995): sample; extracted nucleic acids (DNA, rRNA); nucleic acid probe; rDNA clones; pyrosequencing; rDNA sequences; comparative analysis; rDNA dataset; hybridization; phylogeny. 7

Diversity Analysis: Sample; PCR; Clone lib (100-500, 2-3 months); High diversity. Pedrós-Alió, Trends in Microbiology, 2006, vol. 14, issue 6, page 257. 8

Diversity Analysis: Sample; PCR; Clone lib (100-500, 2-3 months); Tags (10,000-50,000, 1 week); High diversity. 9

Problems. Processing the data. Accuracy/Quantitative? DNA/RNA extraction; multiple operons; technical replicates; noise (sequencing errors). 10

SILVA Databases: Specifications. Comprehensive & aligned: Bacteria, Archaea, Eukarya; SSU, LSU; regularly updated. Quality first: quality management; transparent process documentation. Integrative: nomenclature; taxonomy; cultured, type strains; habitat (r100). 11

Growth of SSU ribosomal RNA databases (RDP II & SILVA), www.arb-silva.de. [Figure: bar chart; x-axis: year, 1992-2008; y-axis: rRNA sequences, 0-1,000,000; labelled bar values rise from 473 to 995,747 (SILVA 100).] Pruesse et al., NAR 2007, vol. 35, no. 21, page 7188. 12

SILVA SSURef 100: Fully classified guide tree www.arb-silva.de 13

ARB Software Suite (www.arb-home.de): A Software Environment for Sequence Data (Ludwig et al., NAR, 2004). ARB 5.0, 64-bit version, released on 4 September 2009. 14

Problems. Processing the data. Accuracy/Quantitative? DNA/RNA extraction; multiple operons; noise (sequencing errors); technical replicates. 15

Accuracy. "The rare biosphere: a reality check", Reeder and Knight, 2009, Nature Methods, vol. 6, no. 9, p. 636. 16

SILVA SSUParc 100, Sequence Length Distribution, www.arb-silva.de. [Figure: histogram; x-axis: length (bases), 0-2000; y-axis: rRNA sequences, 0-100,000.] 17
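A length distribution like the one on this slide can be tabulated from any FASTA export. The following is only a minimal sketch, not the procedure used for SILVA: the function names `parse_fasta` and `length_histogram`, the 100-base bin width, and the toy input are assumptions made for illustration.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(lengths, bin_width=100):
    """Count sequences per length bin; each key is the lower bound of its bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


# Toy input (hypothetical sequences, not SILVA data).
example = ">a\nACGT\nACGT\n>b\nACG\n>c\n" + "A" * 150 + "\n"
lengths = [len(seq) for _, seq in parse_fasta(example)]
hist = length_histogram(lengths, bin_width=100)

for lower in sorted(hist):
    print(f"{lower}-{lower + 99} bases: {hist[lower]} sequences")
```

Binning by lower bound keeps the result sparse, so empty length classes simply do not appear in the counter.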

Technical Replicates I: Helgoland sample, 11.02.2009; 454 Ti, 1/2 PTP. Gomez-Alvarez et al., 2009, ISME Journal, pages 1-4. 18

Technical Replicates II: Helgoland sample, 14.04.2009; 454 Ti, 1/2 PTP. 19

Technical Replicates III - Dereplication. [Figure: bar chart comparing SSU and dereplicated reads; y-axis: 0.0%-25.0%; taxon labels: Unclassified, Viridiplantae, TA06, Spirochaetes, SHA-109, Rhodophyta et al., Planctomycetes, ML635J-21, Metazoa, Lentisphaerae, Gemmatimonadetes, Gammaproteobacteria_4, Gammaproteobacteria_3, Gammaproteobacteria_2, Gammaproteobacteria_1, Gammaproteobacteria, Fusobacteria, Fungi, Firmicutes, Euryarchaeota, Euglenozoa, Epsilonproteobacteria, Deltaproteobacteria, Deferribacteres, Cyanobacteria, Crenarchaeota, Chloroflexi, Chlorobi, Candidate divisions WS6, WS3, WS1, TM6, SR1, OP8, OP3, OP11, OD1 and BRC1, Betaproteobacteria, BD1-5, Bacteroidetes, Amoeba, Actinobacteria, Acidobacteria.] 20
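The slide does not say how the reads were dereplicated. As a rough illustration only, the simplest form of dereplication collapses reads with identical sequences and keeps a copy count. The function names `dereplicate` and `duplicate_fraction` and the toy reads below are assumptions, and real pipelines often also merge near-identical reads or prefixes.

```python
def dereplicate(reads):
    """Collapse exact duplicates; return {sequence: copy_count} in first-seen order."""
    counts = {}
    for seq in reads:
        key = seq.upper()  # treat lower- and upper-case bases as identical
        counts[key] = counts.get(key, 0) + 1
    return counts


def duplicate_fraction(reads):
    """Fraction of reads removed by exact-match dereplication."""
    if not reads:
        return 0.0
    return 1 - len(dereplicate(reads)) / len(reads)


# Toy reads (hypothetical, not the Helgoland data).
reads = ["ACGT", "acgt", "ACGA", "ACGT", "TTTT"]
unique = dereplicate(reads)
removed = duplicate_fraction(reads)

for seq, n in unique.items():
    print(seq, n)
print(f"{removed:.0%} of reads were duplicates")
```

Comparing per-taxon proportions before and after this step is one way to see whether duplicate reads inflate particular groups.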

A Bioinformatic Workbench for Ecological Genomics. Comprehensive ribosomal RNA databases; A Software Environment for Sequence Data. Organisms, Environment, Genes. 21

Functional Diversity Analysis: Sample; Fosmids; Random sequencing (10-100, ~400-4000 ORFs); NGS (1000, 40,000 ORFs); DNA; 40,000; 2-3 months; End sequencing (20,000 ORFs); High diversity. 22

Fosmid Sequencing. Fosmids: 42. PTP: 1/4, 454 Ti. Min.: 101 bp; max.: 48,265 bp; mean: 8,085 bp. 48 contigs > 10 kb. Costs: 3000 Euro. 23
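Summary figures of this kind (min., max., mean, number of contigs above 10 kb) are straightforward to compute from a list of lengths. The sketch below assumes the slide's figures describe assembled contig lengths, which the slide does not state outright. The function name `contig_stats` and the example lengths are hypothetical and do not reproduce the actual dataset.

```python
def contig_stats(lengths, threshold=10_000):
    """Return min, max, mean and the number of contigs longer than `threshold` bp."""
    if not lengths:
        raise ValueError("no contig lengths given")
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n_above_threshold": sum(1 for n in lengths if n > threshold),
    }


# Hypothetical contig lengths in bp, for illustration only.
example_lengths = [101, 8085, 48265, 12000]
stats = contig_stats(example_lengths)

for name, value in stats.items():
    print(f"{name}: {value}")
```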

Functional Diversity Analysis: Sample; DNA; Tags (500,000-1 million, 1 week); Statistics; BLAST; COGs; Classify; Assembly?; High diversity. 24

Assembly 25

Are we prepared for the Data Flood? 26

Computing Infrastructure. Cooling: liquid cooling with 5,000 L/h at 8 °C. Power: 25-30 kW at peak; installed: 40 kVA. Storage: 8 TByte RAID file server; 4 TByte RAID database server. Computers: 43 cluster nodes; several larger servers; 400 CPU cores. 27

Moore's Law - Outcompeted 28

Take Home Message for the Next Generation Biologists. Three languages: mother tongue, English, and Perl or Python. Data management: garbage in -> garbage out! Standardisation: www.gensc.org 29

The Group http://www.microbial-genomics.de Thanks for your attention 30