Metagenomic and metatranscriptomic analysis Marcelo Falsarella Carazzolle mcarazzo@lge.ibi.unicamp.br Laboratório de Genômica e Expressão (LGE) Unicamp
METAGENOMICS Jo Handelsman (1998), University of Wisconsin, USA
METAGENOMICS --- 1980 1984 1991 1996 1998 2005 Molecular markers (16S rRNA) Carl Woese 16S library of prokaryotes NGS It was believed that all microorganisms were culturable. DNA extraction directly from the environment Metagenomics Who are they? What do they do, and how do they do it?
METAGENOMICS The genomic analysis of the communities of microorganisms in a given environment or habitat. The sampled DNA is a mixture of several microorganisms.
Meta-approaches
Microbial community
- Microbial populations
  - Bacterial 16S ribosomal RNA
  - Fungal ITS
- Metagenome sequencing
  - Genome assembly (wide distribution of genome coverage)
  - Gene prediction (based on ORF finder)
  - Identification of new enzymes based on conserved domains
- Metatranscriptomic sequencing
  - Transcriptome assembly
  - Identification of new enzymes
  - Full-length cDNA
Phylum level
Genus level
HP + = hot phenol
Microbial diversity
- Mitochondrial gene (COX1) for animals
- Ribulose-1,5-bisphosphate carboxylase gene (rbcL) for plants
- Internal transcribed spacer of the ribosomal DNA (ITS) for fungi
- 16S ribosomal RNA for bacteria
http://www.boldsystems.org/
Ribosomal genes
V4 region of the 16S rRNA gene: DNA barcode for bacteria (254 bp)
Communicating current research and educational topics and trends in applied microbiology. Formatex, Spain, pp. 783-787 (2007)
ITS region: universal DNA barcode for fungi. ITS length ranges from ~300 to ~1200 bp
Ribosomal databases
- Greengenes - http://greengenes.lbl.gov
  - 16S rRNA gene database and alignment
  - Download: FASTA and ARB file formats
- SILVA - http://www.arb-silva.de/
  - Aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) rRNA for all three domains of life (Bacteria, Archaea and Eukarya)
  - Download: FASTA and ARB file formats
RNA secondary structural alignment
Forward primers Reverse primers
METAGENOMICS Terragenome - http://www.terragenome.org/ James R. Cole and James M. Tiedje from Michigan State University, David D. Myrold from Oregon State University, Cindy H. Nakatsu, Phillip R. Owens and from Purdue University, George Kowalchuk from Netherlands Institute of Ecology, Christoph Tebbe from Institut für Biodiversität, Braunschweig, 2010
METAGENOMICS Earth Microbiome - http://www.earthmicrobiome.org/ Jack A. Gilbert, Folker Meyer and Rick Stevens from Argonne National Laboratory and University of Chicago, Jonathan Eisen (University of California, Davis), Jed Fuhrman (University of Southern California), Janet Jansson (Lawrence Berkeley National Laboratory), Rob Knight and Noah Fierer (University of Colorado, Boulder), Mark Bailey (Center for Ecology and Hydrology, UK), George Kowalchuk (Netherlands Institute of Ecology), 2010.
High throughput sequencing (150) (200)
Current MiSeq performance
A combination of high-throughput sequencing with paired-end reads and barcode methodologies 16S rRNA Fungal ITS
OTU (operational taxonomic unit) http://nbviewer.ipython.org/github/gregcaporaso/an-introduction-to-applied-Bioinformatics/blob/master/algorithms/5-sequence-mapping-and-clustering.ipynb?create=1
Furthest neighbor clustering
Nearest neighbor clustering
Centroid clustering
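The three clustering rules named above differ only in which distance decides whether a sequence may join an existing OTU. The following is a minimal sketch, not the procedure of any particular tool: it assumes pre-aligned sequences of equal length, a mismatch-fraction distance, a greedy single pass over the input, and that the "centroid" of a cluster is simply its first (seed) sequence. The names `distance`, `joins` and `greedy_otus` are hypothetical.

```python
from typing import List


def distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def joins(seq: str, cluster: List[str], threshold: float, linkage: str) -> bool:
    """Decide whether seq may join cluster under the chosen linkage rule."""
    dists = [distance(seq, member) for member in cluster]
    if linkage == "nearest":      # close enough to at least one member
        return min(dists) <= threshold
    if linkage == "furthest":     # close enough to every member
        return max(dists) <= threshold
    if linkage == "centroid":     # close enough to the seed (assumed centroid)
        return dists[0] <= threshold
    raise ValueError(f"unknown linkage: {linkage}")


def greedy_otus(seqs: List[str], threshold: float = 0.03,
                linkage: str = "furthest") -> List[List[str]]:
    """Assign each sequence to the first OTU it may join, else open a new OTU."""
    clusters: List[List[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if joins(seq, cluster, threshold, linkage):
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters
```

With a chain of sequences that each differ from the next by one position, nearest neighbor tends to merge the whole chain into one OTU, while furthest neighbor splits it as soon as any pair exceeds the threshold.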
Rarefaction curve
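A rarefaction curve plots the number of OTUs observed against the number of reads sampled. A minimal sketch of how one might be computed by random subsampling is given here; the function name `rarefaction_curve`, the replicate count and the fixed seed are illustrative assumptions, and the OTU counts in the usage note are invented for demonstration only.

```python
import random
from typing import Dict, List


def rarefaction_curve(otu_counts: Dict[str, int], depths: List[int],
                      replicates: int = 10, seed: int = 0) -> List[float]:
    """Mean number of distinct OTUs seen when subsampling reads at each depth."""
    rng = random.Random(seed)
    # Expand the count table into one OTU label per read.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("depth exceeds the number of reads in the sample")
        # Sample without replacement and count the distinct OTUs observed.
        observed = [len(set(rng.sample(reads, depth))) for _ in range(replicates)]
        curve.append(sum(observed) / replicates)
    return curve
```

For example, `rarefaction_curve({"A": 50, "B": 30, "C": 15, "D": 5}, [1, 10, 100])` starts at one OTU and reaches all four OTUs once every read is sampled; a curve that flattens before the full depth suggests the sample was sequenced deeply enough.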
HMM BLASTx
Samples Taxonomy groups and false discovery rate (FDR).
Family-level resolution (100 bp non-overlapping paired-end reads)
Genus level =>
Metagenomics and metatranscriptomics assembly De Bruijn graph (k-mer = 7) Source: http://www.homolog.us/blogs/2011/07/28/de-bruijn-graphs-i/
Read: ATGGACCAGATGACAC (k=12) => ATGGACCAGATG TGGACCAGATGA GGACCAGATGAC GACCAGATGACA ACCAGATGACAC Split all reads into words of length k (k-mers) Count the number of occurrences of each distinct k-mer across the whole dataset
De Bruijn graph
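The two steps on this slide (splitting reads into k-mers and counting each distinct k-mer) can be sketched in a few lines. The function names are hypothetical, and the edge construction assumes the common convention in which each k-mer is an edge from its (k-1)-base prefix to its (k-1)-base suffix; real assemblers add error correction and graph simplification that are not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def kmers(read: str, k: int) -> List[str]:
    """All overlapping words of length k in a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Number of occurrences of each distinct k-mer across the whole dataset."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def de_bruijn_edges(counts: Counter) -> List[Tuple[str, str, int]]:
    """One edge per distinct k-mer: (prefix node, suffix node, k-mer count)."""
    return [(kmer[:-1], kmer[1:], n) for kmer, n in counts.items()]
```

Calling `kmers("ATGGACCAGATGACAC", 12)` reproduces the five 12-mers listed on the slide.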
Reads per kilobase per million (RPKM)
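RPKM normalizes a gene's read count by gene length (in kilobases) and by sequencing depth (in millions of mapped reads), i.e. RPKM = reads x 10^9 / (gene length in bp x total mapped reads). A minimal sketch, with a hypothetical function name and made-up numbers in the usage note:

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> millions of reads)
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)
```

For instance, 500 reads on a 2,000 bp gene in a library of 10 million mapped reads gives `rpkm(500, 2000, 10_000_000)`, which is 250 reads per kilobase divided by 10 million-read units.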
Gene prediction in metagenomic and metatranscriptomic data
ORF (Open Reading Frame) concept Minimum ORF size => ~7 × 10⁻⁵ for L = 50 aa
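An ORF finder with a minimum-length filter can be sketched as follows. This is an illustrative simplification rather than the method of any specific gene predictor: it assumes ATG as the only start codon, the standard stop codons TAA, TAG and TGA, and it ignores ORFs that run off the end of a read, which are common in fragmented metagenomic data. The name `find_orfs` and the tuple layout are hypothetical.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 50) -> List[Tuple[str, int, int, int, int]]:
    """Scan all six frames for ATG...stop ORFs of at least min_aa codons.

    Returns (strand, frame, start, end, n_aa) tuples; start and end are
    0-based, end-exclusive, and refer to the strand that was scanned
    (for "-" this is the reverse-complemented sequence).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start codon
                elif start is not None and codon in STOPS:
                    n_aa = (i - start) // 3        # codons before the stop
                    if n_aa >= min_aa:
                        orfs.append((strand, frame, start, i + 3, n_aa))
                    start = None
    return orfs
```

Raising `min_aa` discards more short ORFs that arise by chance, at the cost of missing genuinely short genes.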
Microbial diversity for environmental risk assessment
- Bacteria => V4 region amplification and sequencing via MiSeq
- Fungi => ITS region amplification and sequencing via MiSeq
- Barcode (46 samples/run) and paired-end (2x300 bp) methodologies => ~US$1,200.00
- Large-scale analysis using the MOTHUR pipeline and the SILVA ribosomal database (16S)
- New methodologies for fungal ITS analysis need to be developed
The V4 region of the 16S ribosomal gene and the ITS region of the transcribed ribosomal locus are amplified and sequenced using high-throughput sequencing technology, producing millions of overlapping paired-end reads. Multiple samples can be sequenced together using a multiplexing adapter system.
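Sequencing several samples in one run means the reads must later be sorted back to their samples by barcode. A minimal sketch of that demultiplexing step is shown below. It assumes, purely for illustration, that the barcode sits at the start of each read and must match exactly; on a real run the barcode may instead arrive as a separate index read, and some mismatch tolerance is usually allowed. The function name, barcodes and sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by sample using an exact-match barcode at the 5' end.

    barcodes maps barcode sequence -> sample name. The barcode is trimmed
    from assigned reads; reads with no known barcode go to "unassigned".
    """
    by_sample: Dict[str, List[str]] = defaultdict(list)
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    lengths = sorted({len(b) for b in barcodes}, reverse=True)
    for read in reads:
        sample, insert = "unassigned", read
        for n in lengths:
            if read[:n] in barcodes:
                sample, insert = barcodes[read[:n]], read[n:]
                break
        by_sample[sample].append(insert)
    return dict(by_sample)
```

For example, `demultiplex(["ACGTGGGG", "TTAGCCCC"], {"ACGT": "soil_1", "TTAG": "soil_2"})` places one trimmed read in each sample.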
Bacterial diversity Fungal diversity
END