Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)




A typical RNA-Seq experiment: library construction

Protocol variations
- Fragmentation methods
  - RNA: nebulization, hydrolysis
  - cDNA: sonication, DNase I treatment
- Depletion of highly abundant transcripts
  - Positive selection of mRNA: poly(A) selection
  - Negative selection (RiboMinus™)
- Strand specificity: most RNA sequencing is not strand-specific
- Single-end or paired-end sequencing
Source: http://www.bioconductor.org/help/course-materials/2009/embljune09/talks/rnaseq-paul.pdf

Single-end and paired-end sequencing

Microarrays versus RNA-Seq
RNA-Seq
- Counting
- Absolute abundance of transcripts
- All transcripts are present and can be analyzed: mRNA / ncRNA (snoRNA, lncRNA, eRNA, miRNA, ...)
- Analysis of allele-specific gene expression
- Detection of fusions and other structural variations
Microarrays
- Indirect record of expression level (complementary probe)
- Relative abundance
- Cross-hybridization
- Content limited (can only show you what you're already looking for)

High reproducibility and dynamic range
(a) Comparison of two brain technical replicate RNA-Seq determinations for all mouse gene models (from the UCSC genome database), measured in reads per kilobase of exon per million mapped sequence reads (RPKM), which is a normalized measure of exonic read density; R² = 0.96.
(c) Six in vitro synthesized reference transcripts of lengths 0.3–10 kb were added to the liver RNA sample (1.2 × 10^4 to 1.2 × 10^9 transcripts per sample; R² > 0.99).

RNA-Seq drawbacks
Current disadvantages
- More expensive than standard expression arrays
- More time consuming than any microarray technology
- Some (lots of) data analysis issues
  - Computing accurate transcript models
  - Mapping reads to splice junctions
- Contribution of high-abundance RNAs (e.g. ribosomal) could dilute the remaining transcript population; sequencing depth is important
Source: http://www.bioconductor.org/help/course-materials/2009/embljune09/talks/rnaseq-paul.pdf

Do arrays and RNA-Seq tell a consistent story?
The relationship is not quite linear, but the vast majority of the expression values are similar between the methods. Scatter increases at low expression because background correction for arrays becomes difficult when signal levels approach noise levels. Similarly, RNA-Seq is a sampling method, and stochastic events become a source of error in the quantification of rare transcripts. Given the substantial agreement between the two methods, the array data in the literature should be durable.
Figure: Comparison of array and RNA-Seq data for measuring differential gene expression in the heads of male and female D. pseudoobscura.
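The point about stochastic sampling of rare transcripts can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not stated in the slides: read counts for a transcript are treated as Poisson distributed, so the relative sampling noise (coefficient of variation) is 1/sqrt(expected count). Real RNA-Seq data usually show additional biological and technical variability on top of this.

```python
import math

def poisson_cv(expected_reads: float) -> float:
    """Relative sampling noise (std / mean) of a Poisson count.

    Assumption: read counts follow a Poisson distribution, for which
    the variance equals the mean, so std / mean = 1 / sqrt(mean).
    """
    return 1.0 / math.sqrt(expected_reads)

# Rare transcripts (few expected reads) are much noisier than abundant ones.
for expected in (4, 100, 10_000):
    print(f"expected reads = {expected:>6}: CV = {poisson_cv(expected):.2f}")
```

Under this model a transcript expected to yield 4 reads has a relative noise of 50%, whereas one expected to yield 10,000 reads has a relative noise of 1%.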

Sequencing technologies
- RNA-Seq has been performed using the SOLiD, Illumina and 454 platforms
- However, RNA-Seq needs high dynamic range: SOLiD and Illumina sequencers
  - Need PCR amplification
  - Low coverage of GC-rich transcripts
- Future?
  - Helicos: no PCR amplification; direct RNA sequencing seems possible; high error rates
  - Pacific Biosciences

Current technologies
- Roche 454 (pyrosequencing)
- Illumina Genome Analyzer IIx (sequencing by synthesis)
- ABI SOLiD (Sequencing by Oligo Ligation/Detection; sequencing by ligation)

Illumina

Illumina Genome Analyzer (I)

Illumina Genome Analyzer (II)

Illumina Genome Analyzer (III)

Illumina Genome Analyzer

ABI Next Gen Sequencing: SOLiD http://www.flickr.com/photos/ricardipus/4452215186/

ABI Next Gen Sequencing: SOLiD
- Uses emulsion PCR (emPCR) on magnetic beads
- Sequencing by ligation using fluorescent probes

Sequencing by ligation (SOLiD)
We have a flow cell (basically a microscope slide that can be serially exposed to any liquids desired) whose surface is coated with thousands of beads, each carrying a single genomic DNA species with unique adapters on either end. Each microbead can be considered a separate sequencing reaction, and all of them are monitored simultaneously via sequential digital imaging. Up to this point all next-gen sequencing technologies are very similar; this is where ABI/SOLiD diverges dramatically (see figure 4).

Sequencing by ligation (SOLiD)

Sequencing by ligation (SOLiD)

Sequencing by ligation (SOLiD)
According to the colour code table, four different dinucleotides may correspond to the same colour. This ambiguity is resolved using the primer n-1 round: the first ligation of this primer round analyses a dinucleotide in which one nucleotide is already known.

Sequences are in color space format (csfasta)
>853_64_861_F3
T00101210200133033001332333320203311
Dibases: TT TT TG GG ...
Sequence: TTGG...
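The decoding of a color space read into bases can be sketched as follows. This is a minimal illustration, not ABI's software: it assumes the standard SOLiD 2-base colour code (0 = same base, 1 = A<->C or G<->T, 2 = A<->G or C<->T, 3 = A<->T or C<->G), which is equivalent to an XOR of base indices when A, C, G, T are numbered 0 to 3. The function name `decode_colorspace` is hypothetical.

```python
BASES = "ACGT"  # index of each base: A=0, C=1, G=2, T=3

def decode_colorspace(read: str) -> str:
    """Translate a color space read into a base sequence.

    `read` is a known primer base followed by colour digits (0-3).
    Assumption: colour = index(previous base) XOR index(next base),
    which reproduces the standard SOLiD colour code table.
    The primer base itself is not part of the returned sequence.
    """
    current = BASES.index(read[0])
    decoded = []
    for colour in read[1:]:
        current ^= int(colour)          # move to the next base
        decoded.append(BASES[current])
    return "".join(decoded)

read = "T00101210200133033001332333320203311"
print(decode_colorspace(read)[:4])  # TTGG
```

Because every base depends on all preceding colours, a single miscalled colour corrupts the rest of a naively decoded read, which is why SOLiD reads are normally aligned in color space rather than translated first.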

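Advantage of 2-base encoding
Higher accuracy for SNP detection: a true single-base change alters two adjacent colours, whereas an isolated single-colour change points to a measurement error.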
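Why 2-base encoding helps SNP detection can be shown with a toy example. The sketch below assumes the standard SOLiD colour code (colour = XOR of base indices with A=0, C=1, G=2, T=3) and a primer base of T; the sequences and the helper names `encode_colorspace` and `changed_positions` are made up for illustration.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3

def encode_colorspace(sequence: str, primer_base: str = "T") -> str:
    """Encode bases as colours; each colour describes two adjacent bases."""
    previous = BASES.index(primer_base)
    colours = []
    for base in sequence:
        index = BASES.index(base)
        colours.append(str(previous ^ index))
        previous = index
    return "".join(colours)

def changed_positions(a: str, b: str) -> list:
    """Positions at which two equally long colour strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

reference = "ACGTAC"
variant = "ACTTAC"   # hypothetical SNP: G -> T at position 2

ref_colours = encode_colorspace(reference)   # 313131
var_colours = encode_colorspace(variant)     # 312031
print(changed_positions(ref_colours, var_colours))  # [2, 3]
```

A real substitution inside the read changes exactly two neighbouring colours, so a read that differs from the reference at only one isolated colour is more likely to contain a sequencing error than a SNP.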

Typical analysis pipeline
- Raw sequence data
- Pre-processing: removing artifacts (poor-quality reads, sequencing adaptors, low-complexity reads, near-identical reads caused by PCR biases)
- Assembly
  1. Reference-based
  2. De novo strategy
  3. Combined strategy (1 & 2)
- Summarize to gene/transcript
- Statistical analysis: differentially expressed genes
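The pre-processing step can be sketched in a few lines. This is a simplified illustration, not a replacement for dedicated tools: reads are represented as (sequence, Phred quality list) pairs, the adapter string and all thresholds are assumptions chosen for the example, and only exact duplicates are removed (near-identical reads would need a more tolerant comparison).

```python
from collections import Counter

ADAPTER = "AGATCGGAAGAGC"  # assumption: example adapter sequence

def trim_adapter(seq, quals, adapter=ADAPTER):
    """Cut the read at the first occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, quals
    return seq[:pos], quals[:pos]

def is_low_complexity(seq, max_fraction=0.8):
    """Flag reads dominated by a single base (e.g. poly-A runs)."""
    most_common_count = Counter(seq).most_common(1)[0][1]
    return most_common_count / len(seq) > max_fraction

def preprocess(reads, min_mean_quality=20, min_length=20):
    """Filter (sequence, qualities) pairs; thresholds are illustrative."""
    seen = set()
    kept = []
    for seq, quals in reads:
        seq, quals = trim_adapter(seq, quals)
        if len(seq) < min_length:
            continue                      # too short after trimming
        if sum(quals) / len(quals) < min_mean_quality:
            continue                      # poor-quality read
        if is_low_complexity(seq):
            continue                      # low-complexity read
        if seq in seen:
            continue                      # exact duplicate (possible PCR bias)
        seen.add(seq)
        kept.append((seq, quals))
    return kept
```

For example, passing a list that contains a good read twice, a low-quality read and a poly-A read returns only the first copy of the good read.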

Overview of reference-based transcriptome assembly
(a) Reads (grey) are first aligned to a reference genome using a splice-aware aligner, such as TopHat or SpliceMap.
(b) A connectivity or splice graph is then constructed to represent all possible isoforms at a locus.
(c, d) Finally, alternative paths through the graph (blue, red, yellow and green) are followed to join compatible reads together into isoforms.
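The idea of following alternative paths through a splice graph can be sketched as a path enumeration. This is a toy illustration only: the exon names and edges are invented, the graph is assumed to be acyclic (exons ordered along the genome), and real assemblers select a supported subset of paths rather than listing every possible one.

```python
def enumerate_isoforms(graph, source, sink):
    """List every path from `source` to `sink` in a directed acyclic graph.

    `graph` maps an exon name to the exons it is spliced to.
    Each path is one candidate isoform.
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for successor in graph.get(node, []):
            stack.append((successor, path + [successor]))
    return paths

# Hypothetical locus: exon2 is a cassette exon that can be skipped.
splice_graph = {
    "exon1": ["exon2", "exon3"],
    "exon2": ["exon3"],
    "exon3": ["exon4"],
    "exon4": [],
}

for isoform in sorted(enumerate_isoforms(splice_graph, "exon1", "exon4")):
    print(" -> ".join(isoform))
```

In this example the graph yields two candidate isoforms, one including exon2 and one skipping it.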

Summarize to gene expression level
- Transcripts of different lengths have different read counts
- Tag count is normalized for transcript length and total read number in the measurement (RPKM, Reads Per Kilobase of exon model per Million mapped reads)
- 1 RPKM corresponds to approximately one transcript per cell
- FPKM, Fragments Per Kilobase of exon model per Million mapped reads (paired-end sequencing)
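The RPKM normalization follows directly from its definition and can be written as a short function. The sketch below is illustrative; the function name `rpkm` and the example numbers are assumptions, not values from the slides. For paired-end data the same formula gives FPKM when fragments are counted instead of reads.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads.

    RPKM = read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads on a 2 kb exon model, 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

A gene twice as long with twice as many reads gets the same RPKM, which is the purpose of the length normalization.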