Understanding the Microbiome: Metatranscriptomics. Marcus Claesson APC Microbiome Symposium 2015

Metatranscriptomics
Definition (genetics, ecology): a branch of transcriptomics that studies and correlates the transcriptomes of a group of interacting organisms or species. (Wiktionary)

Sequence-based omics technologies
Metatranscriptome:
  Protein-coding RNA (mRNA)
  Non-coding RNA (rRNA, tRNA, regulatory RNA, etc.)
Metatranscriptomics studies:
  Community functions
  Response to different environments / treatments; differential gene expression
  Regulation of gene expression
Microbiota compositional analysis: quantification of the ubiquitous 16S rRNA gene. WHAT organisms are there?
Metagenomics: shotgun sequencing. Encoded potential functions of the microbiota. What CAN they do?
Metatranscriptomics: cDNA/mRNA sequencing. Microbial gene expression at certain times and/or locations. What ARE they doing?

Measuring the transcriptome
  cDNA clone libraries + Sanger sequencing
  Microarrays: hybridizing mRNA onto cDNA/oligonucleotide probes on a glass slide
  RNA-seq: enabled by next-generation sequencing technologies, usually Illumina HiSeq
  RNA-seq is superior to microarrays for gene expression analysis of microbial communities

From RNA to sequence data
Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671-682

Challenges and considerations
Wet lab
  Deplete host RNA or not? Dual RNA-Seq is an option
  Instability of RNA (half-lives of minutes)
  High rRNA content in total RNA (mRNA < 5% of total RNA)
  Single-end or paired-end?
  Stranded cDNA transcription or not?
  How much sequencing per sample?
  Technical replicates are no longer necessary, but biological replicates are
  Avoid batch effects: randomize samples across runs!
Bioinformatics
  General challenges with short reads and large data size
  Lack of metagenome/genome reference
  Statistical considerations
  Assemble or map reads, or both?
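The advice to randomize samples across runs can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `assign_runs`, the sample IDs, and the choice to shuffle within each condition and then deal samples round-robin are all hypothetical and not taken from the talk. The idea is that every sequencing run receives a similar mix of conditions, so that a run effect cannot be mistaken for a disease effect.

```python
import random
from collections import Counter, defaultdict


def assign_runs(samples, n_runs, seed=0):
    """Spread samples over sequencing runs, balanced by condition.

    samples: dict mapping sample ID -> condition label (hypothetical IDs).
    Returns a dict mapping sample ID -> run index (0-based).
    """
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    by_condition = defaultdict(list)
    for sample_id, condition in samples.items():
        by_condition[condition].append(sample_id)

    assignment = {}
    slot = 0
    for condition in sorted(by_condition):
        ids = sorted(by_condition[condition])
        rng.shuffle(ids)  # random order within each condition
        for sample_id in ids:
            assignment[sample_id] = slot % n_runs  # deal round-robin
            slot += 1
    return assignment


# Hypothetical example: 6 CD and 6 UC biopsies over 3 runs.
samples = {f"CD{i}": "CD" for i in range(6)}
samples.update({f"UC{i}": "UC" for i in range(6)})
design = assign_runs(samples, n_runs=3)
per_run = Counter((run, samples[s]) for s, run in design.items())
```

The run index can later be supplied as a covariate in the statistical model if a batch effect still shows up.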

rRNA removal methods
The majority of bacterial mRNA is not polyadenylated and cannot be isolated using oligo-dT selection
Subtractive hybridization: Ribo-Zero / MICROBExpress Bacterial mRNA Enrichment
Exonuclease digestion: mRNA-ONLY Prokaryotic mRNA Isolation (5' monophosphate-dependent exonuclease; mRNA carries a 5' PPP end, rRNA a 5' P end)
Figure 1. Performance evaluation of five rRNA depletion methods. Distribution of RNA-seq reads aligning to protein-coding sequences (CDS; blue), rRNA (red), and other regions (tRNA, non-coding RNA, small RNA, and intergenic regions; gray) for undepleted total RNA (top) and five rRNA depletion protocols. Giannoukos et al. Genome Biology 2012,

Experimental pipeline (PostDoc Emilio Laserna)
Inflamed/uninflamed colonic biopsies
  Pilot project: 12 CD & 6 UC
  Main project: 60 CD, 86 UC & 30 HC
Total RNA extracted: RNAlater, Trizol
Eukaryotic RNA depleted: MICROBEnrich
Ribosomal RNA depleted: MICROBExpress / Ribo-Zero
cDNA synthesized: MMLV reverse transcriptase / ScriptSeq
cDNA amplification: GenomiPhi
RNA-Seq library prep: Illumina TruSeq
RNA-Seq: Illumina HiSeq 2500 / 2000
72 / 15 million 100 / 125 bp PE reads per sample

Options for Differential Gene Expression (DGE) analysis: Tuxedo suite
  Bowtie and Bowtie2 use Burrows-Wheeler indexing for aligning reads. With Bowtie2 there is no upper limit on the read length.
  TopHat2 uses Bowtie2 to align reads in a splice-aware manner and aids the discovery of new splice junctions.
  The Cufflinks2 package has 4 components; the 2 major ones are:
    Cufflinks2 does both de novo and reference-based transcriptome assembly.
    Cuffdiff2 does statistical analysis and identifies differentially expressed transcripts in a simple pairwise comparison, and in a series of pairwise comparisons in a time-course experiment.
Trapnell et al., Nature Protocols, March 2012
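To give a feel for what Burrows-Wheeler indexing means, here is a toy Python sketch. It is not how Bowtie or Bowtie2 are implemented: real aligners build the transform without materialising all rotations, store compressed occurrence tables, and tolerate mismatches. The function names `bwt` and `count_occurrences` are illustrative, and the input is assumed not to contain the `$` sentinel.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy version)."""
    text = text + "$"  # sentinel, sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern using backward search on the BWT."""
    # first[c]: index of the first row starting with character c
    first = {}
    for i, c in enumerate(sorted(bwt_string)):
        first.setdefault(c, i)

    def occ(c, i):
        # occurrences of c in bwt_string[:i]; a real index precomputes this
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)  # current range of matching rows
    for c in reversed(pattern):  # extend the match one base to the left
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTTACG"  # made-up reference sequence
index = bwt(reference)
hits = count_occurrences(index, "ACG")
```

The point of the design is that the search cost depends on the read length rather than on the size of the reference, which is what makes aligning millions of short reads feasible.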

Options for DGE analysis
Want to learn more about the formats? https://genome.ucsc.edu/faq/faqformat.html

Bioinformatics pipeline (PhD student Feargal Ryan)
  Check MD5 of downloaded FASTQ files
  FastQC
  Quality filtering/trimming using Trimmomatic
  FastQC
  SortMeRNA to remove rRNA
  Align vs hg20 using STAR
    Make count table using HTSeq
    DEXSeq for differential exon usage
    DESeq2 for differential expression
  Align each sample against all known microbial genes (IGC) with Bowtie2
    DESeq2: normalize counts and perform differential expression analysis
  Combine samples and assemble with Trinity or IDBA-MT/IDBA-UD
    Assemble and annotate transcripts
    Visualise sample composition using annotation information
  Visualise normalized counts. Any batch effects? Any other confounding factors separating out the data to control for in DESeq2?
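The first pipeline step, checking the MD5 of downloaded FASTQ files, can be sketched with the Python standard library alone. The manifest layout is an assumption: one `<md5>  <filename>` pair per line, as written by the `md5sum` utility, with the files sitting next to the manifest. The function names `md5_of_file` and `verify_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest_path):
    """Compare files against a manifest; return dict of filename -> bool."""
    manifest_path = Path(manifest_path)
    results = {}
    for line in manifest_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum prefixes binary-mode names with '*'
        actual = md5_of_file(manifest_path.parent / name)
        results[name] = actual == expected.lower()
    return results
```

A call such as `verify_checksums("md5.txt")` (file name hypothetical) returns `False` for any file that was truncated or corrupted in transfer, which is worth knowing before spending compute on trimming and alignment.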

Number of samples required
Power calculation: read coverage (sequencing coverage per sample) vs. sample size
RNASeqPower in R Bioconductor
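The trade-off between coverage and sample size can be illustrated with the closed-form approximation that, to the best of my understanding, underlies RNASeqPower: n = 2 (z_(1-alpha/2) + z_power)^2 (1/depth + cv^2) / ln(effect)^2 samples per group. The Python sketch below is not the RNASeqPower API; the function name `samples_per_group` and the example values (a biological coefficient of variation of 0.4 and a two-fold change) are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, effect, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect a fold change.

    depth:  average read count for the gene (coverage per sample)
    cv:     biological coefficient of variation between samples
    effect: fold change to detect (e.g. 2.0)
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = normal.inv_cdf(power)
    variance = 1 / depth + cv ** 2  # counting noise + biological noise
    return 2 * (z_alpha + z_power) ** 2 * variance / math.log(effect) ** 2


# Required n at increasing depth for the assumed cv and fold change.
required = {
    depth: math.ceil(samples_per_group(depth, cv=0.4, effect=2.0))
    for depth in (5, 20, 100, 1000)
}
```

Because the `1 / depth` term shrinks quickly while `cv ** 2` does not, extra sequencing depth stops reducing the required n once biological variation dominates.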

What the Pilot Study taught us
  Differential gene expression for inflamed and non-inflamed in both UC and CD => higher n required to confirm
  Use a more effective rRNA depletion method
  Don't deplete host mRNA if also interested in it
  No extra cDNA amplification needed
  High n is more important than sample coverage
  Experimental & bioinformatics pipelines
SIRG & Second Genome study: 200 subjects
  Genotype, disease activity, diet, medication