Metagenomic and metatranscriptomic analysis




Metagenomic and metatranscriptomic analysis Marcelo Falsarella Carazzolle mcarazzo@lge.ibi.unicamp.br Laboratório de Genômica e Expressão (LGE) Unicamp

METAGENOMICS Jo Handelsman (1998), University of Wisconsin (USA)

METAGENOMICS timeline: 1980, 1984, 1991, 1996, 1998, 2005
- It was believed that all microorganisms were culturable
- Molecular markers (16S rRNA), Carl Woese
- DNA extraction directly from the environment
- 16S library of prokaryotes
- Metagenomics
- NGS
Who are they? What do they do, and how do they do it?

METAGENOMICS The genomic analysis of the communities of microorganisms in a given environment or habitat. The sampled DNA is a mixture from several microorganisms.

Meta-approaches

Microbial community

- Microbial populations
  - Bacterial 16S ribosomal RNA
  - Fungal ITS
- Metagenome sequencing
  - Genome assembly (wide distribution of genome coverage)
  - Gene prediction (based on ORF finder)
  - Identification of new enzymes based on conserved domains
- Metatranscriptomic sequencing
  - Transcriptome assembly
  - Identification of new enzymes
  - Full-length cDNA

Phylum level

Genus level

HP + = hot phenol

Microbial diversity
- Mitochondrial gene (COX1) for animals
- Ribulose 1,5-bisphosphate carboxylase gene (rbcL) for plants
- Internal transcribed spacer of the ribosomal DNA (ITS) for fungi
- 16S ribosomal RNA for bacteria

http://www.boldsystems.org/

Ribosomal genes

V4 region of the 16S rRNA gene: DNA barcode for bacteria (254 bp)

Communicating current research and educational topics and trends in applied microbiology. Formatex, Spain, pp. 783-787 (2007)

ITS region: universal DNA barcode for fungi. ITS length ranges from ~300 to ~1200 bp.

Ribosomal databases
- Greengenes - http://greengenes.lbl.gov
  - 16S rRNA gene database and alignment
  - Download: FASTA and ARB file format
- Silva - http://www.arb-silva.de/
  - Aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) rRNA for all three domains of life (Bacteria, Archaea and Eukarya)
  - Download: FASTA and ARB file format

RNA secondary structural alignment

Primers forward Primers reverse

METAGENOMICS Terragenome - http://www.terragenome.org/ James R. Cole and James M. Tiedje from Michigan State University, David D. Myrold from Oregon State University, Cindy H. Nakatsu and Phillip R. Owens from Purdue University, George Kowalchuk from Netherlands Institute of Ecology, Christoph Tebbe from Institut für Biodiversität, Braunschweig, 2010

METAGENOMICS Earth Microbiome - http://www.earthmicrobiome.org/ Jack A. Gilbert, Folker Meyer and Rick Stevens from Argonne National Laboratory and University of Chicago, Jonathan Eisen (University of California, Davis), Jed Fuhrman (University of Southern California), Janet Jansson (Lawrence Berkeley National Laboratory), Rob Knight and Noah Fierer (University of Colorado, Boulder), Mark Bailey (Center for Ecology and Hydrology, UK), George Kowalchuk (Netherlands Institute of Ecology), 2010.

High throughput sequencing (150) (200)

MiSeq current performance

A combination of high-throughput sequencing with paired-end reads and barcode methodologies: 16S rRNA and fungal ITS

OTU (operational taxonomic unit) http://nbviewer.ipython.org/github/gregcaporaso/an-introduction-to-applied-Bioinformatics/blob/master/algorithms/5-sequence-mapping-and-clustering.ipynb?create=1

Furthest neighbor clustering

Nearest neighbor clustering

Centroid clustering
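The difference between the furthest and nearest neighbor criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sequences are taken to be pre-aligned and of equal length, distance is the fraction of mismatching positions, and the names `hamming_fraction`, `cluster_otus` and `cutoff` are hypothetical, not taken from any pipeline named in the slides. Centroid clustering, which compares each sequence against a cluster representative, is not shown.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, cutoff, linkage="furthest"):
    """Agglomerative clustering of sequences into OTUs.

    linkage="furthest": cluster distance is the largest pairwise distance.
    linkage="nearest":  cluster distance is the smallest pairwise distance.
    Clusters are merged while the closest pair is within `cutoff`.
    """
    pick = max if linkage == "furthest" else min
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = pick(hamming_fraction(seqs[a], seqs[b]) for a in ci for b in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        if best[0] > cutoff:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"]
print(cluster_otus(toy, cutoff=0.1, linkage="nearest"))
print(cluster_otus(toy, cutoff=0.1, linkage="furthest"))
```

With the toy data, nearest neighbor chains the first three sequences into one OTU, while furthest neighbor keeps the third sequence apart because it is too distant from the first.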

Rarefaction curve
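A rarefaction curve plots the number of OTUs observed against the number of reads sampled. The sketch below assumes each read has already been assigned an OTU label; the function name `rarefaction_curve` and its parameters are hypothetical, and a single random ordering is used where real tools usually average over many.

```python
import random


def rarefaction_curve(otu_labels, step, seed=0):
    """Return (reads_sampled, otus_observed) points for one random read ordering."""
    rng = random.Random(seed)
    reads = list(otu_labels)
    rng.shuffle(reads)
    seen = set()
    curve = []
    for n, label in enumerate(reads, start=1):
        seen.add(label)
        # Record a point every `step` reads and at the very last read.
        if n % step == 0 or n == len(reads):
            curve.append((n, len(seen)))
    return curve


labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 5
for n_reads, n_otus in rarefaction_curve(labels, step=10):
    print(n_reads, n_otus)
```

A curve that flattens suggests that additional sequencing is unlikely to reveal many new OTUs in that sample.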

HMM BLASTx

Samples Taxonomy groups and false discovery rate (FDR).

Family level resolution (100bp non overlapping paired-end reads)

Genus level =>

Metagenomics and metatranscriptomics assembly. De Bruijn graph (k-mer = 7). Source: http://www.homolog.us/blogs/2011/07/28/de-bruijn-graphs-i/

Read: ATGGACCAGATGACAC (k=12) => ATGGACCAGATG TGGACCAGATGA GGACCAGATGAC GACCAGATGACA ACCAGATGACAC
- Split all reads into words of length k (k-mers)
- Count the number of occurrences of each distinct k-mer across the whole dataset
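The two steps on this slide can be sketched in Python as follows. The helper names `kmers` and `count_kmers` are hypothetical, and the sketch ignores reverse complements and sequencing errors, which real assemblers have to handle.

```python
from collections import Counter


def kmers(read, k):
    """Split one read into its overlapping words of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Count occurrences of each distinct k-mer over all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# The read from the slide yields five 12-mers.
print(kmers("ATGGACCAGATGACAC", 12))
print(count_kmers(["ATGAT", "TGATG"], 3))
```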

De Bruijn graph
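One common way to build a De Bruijn graph, sketched here as an assumption rather than as the construction used by any particular assembler, is to treat every (k-1)-mer as a node and every k-mer as an edge from its prefix to its suffix. The function name `de_bruijn` is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the sorted list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return {node: sorted(successors) for node, successors in graph.items()}


graph = de_bruijn(["ATGGC", "TGGCA"], 3)
for node, successors in graph.items():
    print(node, "->", successors)
```

In this toy case the two overlapping reads collapse into a single path (AT, TG, GG, GC, CA), which spells the contig ATGGCA.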

Reads per kilobase per million (RPKM)
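RPKM normalizes a read count by both gene length and sequencing depth: RPKM = 10^9 x C / (N x L), where C is the number of reads mapped to the gene, N the total number of mapped reads and L the gene length in bp. A minimal sketch, with a hypothetical function name and made-up example numbers:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative values only: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```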

Gene prediction in metagenomic and metatranscriptomic data

ORF (open reading frame) concept. Minimum ORF size => ~7 x 10^-5 for L = 50 aa
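A simple ORF finder with a minimum length filter might look like the sketch below. It makes several assumptions that are not in the slides: only the three forward frames are scanned, ORFs run from the first ATG to the next in-frame stop codon, and the standard stop codons are used. The name `find_orfs` and the `min_aa` parameter are hypothetical; the default of 50 follows the L = 50 aa example above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=50):
    """Return (start, end, frame) for forward-strand ORFs of at least min_aa codons.

    `start` is the index of the ATG, `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:  # length in codons, stop excluded
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_aa=5))
```

To cover both strands, the same function could be run on the reverse complement of the sequence.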

Microbial diversity for environmental risk assessment
- Bacteria => V4 region amplification and sequencing via MiSeq
- Fungi => ITS region amplification and sequencing via MiSeq
- Barcode (46 samples/run) and paired-end (2x300 bp) methodologies => ~US$1,200.00
- Large-scale analysis using the MOTHUR pipeline and the SILVA ribosomal database (16S)
- New methodologies for fungal ITS analysis need to be developed

The V4 region of the 16S ribosomal gene and the ITS region of the transcribed ribosomal locus are amplified and sequenced using high-throughput sequencing technology, producing millions of overlapping paired-end reads. Multiple samples can be sequenced together using a multiplexing adapter system.
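Because the paired-end reads overlap, each pair can be merged into a single sequence spanning the amplicon. The sketch below shows the idea under simplifying assumptions: base qualities are ignored, the longest acceptable overlap wins, and the names `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical rather than taken from MOTHUR or any other tool.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (upper-case A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a forward read with its reverse mate; return None if no overlap is found."""
    r2 = reverse_complement(read2)
    # Try the longest possible overlap first and shrink until one is acceptable.
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-olap:], r2[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            return read1 + r2[olap:]
    return None


fragment = "ACGTTGCAAGGCTTAACCGGATCGTACGATTGCA"
forward = fragment[:24]
reverse = reverse_complement(fragment[-24:])
print(merge_pair(forward, reverse, min_overlap=8, max_mismatch_frac=0.0))
```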

Bacterial diversity Fungal diversity

THE END