RNAseq Introduction. Ian Misner, Ph.D. Bioinformatics Crash Course. Bioinformatics Core

Many types of RNA: rRNA, tRNA, mRNA, miRNA, ncRNA, etc. Only ~2% is mRNA.

Why sequence RNA?
- Functional studies
  - Drug treated vs. untreated cell line
  - Wild type vs. knockout
- SNP finding
- Transcriptome assembly
- Novel gene finding
- Splice variant analysis

Challenges
- Sampling: purity, quantity, quality?
- Exons can be problematic: mapping reads can become difficult
- RNA abundances vary by orders of magnitude
  - Highly expressed genes can overpower genes of interest
  - Organellar RNA can block the overall signal
- RNA is fragile and must be properly handled
- The RNA population turns over quickly within a cell

General workflows
- Obtain raw data
- Align/assemble reads
- Process the alignment with a tool specific to the goal, e.g. Cufflinks, Sailfish
- Post-process: import into downstream software (R, Matlab, Cytoscape, etc.)
- Summarize and visualize: create gene lists, prioritize candidates for validation, etc.

Experimental Design Questions
- What is my biological question?
- How much sequencing do I need?
- What type of sequencing should I do?
  - Read length?
  - Which platform?
  - SE or PE?
- How much multiplexing can I do?
- Should I pool samples?
- How many replicates do I need?
- What about duplicates?

What are you working with?
- Novel: little or no data
- Some data: ESTs or Unigenes
- Basic draft genome
  - Few thousand contigs
  - Some annotation, mostly ab initio
- Good draft genome
  - Few thousand scaffolds to chromosome arms
  - Better annotations with human verification
- Model organism
  - Fully sequenced genome
  - High confidence annotations
  - Genetic maps and markers
  - Mutant data available

Number of Reads/Replicates
Figure, panel (a): Increase in biological replication significantly increases the number of DE genes identified.
Liu Y et al. Bioinformatics 2014;30:301-304

Read Type and Platform

Read Type | Platform                           | Uses
50 SE     | Illumina                           | Gene expression quantification; SNP finding (good reference)
50 PE     | Illumina                           | Above, plus splice variants
100+ PE   | Illumina                           | Above, plus transcriptome assembly; DE within gene families
200+      | Ion Torrent, Sanger, 454, Nanopore | Splice variants; transcriptome assembly; haplotypes; too large for DE

Read Platform. Purdue University Discovery Park

Multiplexing
- 6-8 nt barcodes are added to samples during library prep.
- Allows for pooling of samples into the same lane.
  - Mitigates lane effects
  - Maximizes sequencing efficiency
- Dual barcoding allows for up to 96 samples per lane.
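To make the barcode idea concrete, here is a minimal sketch of how pooled reads could be assigned back to their samples. It assumes each read arrives with a separate index (barcode) sequence and that all barcodes have the same length; the function names, the tuple layout, and the one-mismatch tolerance are illustrative choices, not the behavior of any particular demultiplexing tool.

```python
from collections import defaultdict


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign pooled reads to samples by their index sequence.

    reads: iterable of (read_id, index_seq, sequence) tuples (hypothetical layout)
    barcodes: dict mapping sample name -> barcode, all of the same length
    Reads matching zero or several barcodes go to "undetermined".
    """
    if len({len(b) for b in barcodes.values()}) != 1:
        raise ValueError("all barcodes must have the same length")
    bins = defaultdict(list)
    for read_id, index_seq, sequence in reads:
        hits = [sample for sample, barcode in barcodes.items()
                if len(index_seq) == len(barcode)
                and hamming(index_seq, barcode) <= max_mismatches]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append((read_id, sequence))
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}
    reads = [
        ("r1", "ACGTAC", "GGG"),
        ("r2", "ACGTAA", "CCC"),  # one mismatch to sampleA
        ("r3", "TGCATG", "TTT"),
        ("r4", "NNNNNN", "AAA"),  # matches nothing
    ]
    for sample, assigned in demultiplex(reads, barcodes).items():
        print(sample, [read_id for read_id, _ in assigned])
```

Allowing a mismatch only makes sense when the barcodes in a pool differ from each other at several positions; otherwise a sequencing error could move a read into the wrong sample.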

Replicates: Biological
- Measurement of variation between samples
- More are better
- Can detect genetic variation between samples
- Pooling with barcodes: each sample is a replicate
- Pooling without barcodes: each pool is a replicate

Replicates: Technical
- Can determine variation within sample preparation.
- Can be cost prohibitive.
- More biological replicates are better.
- Useful across lanes to mitigate lane effects.

Should I remove duplicates? Maybe.
- Duplicates may correspond to biased PCR amplification of particular fragments
- For highly expressed, short genes, duplicates are expected even if there is no amplification bias
- Removing them may reduce the dynamic range of expression estimates
- Assess library complexity and decide
- If you do remove them, assess duplicates at the level of paired-end reads, not single-end reads
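The last point can be illustrated with a small sketch. It assumes each mapped read pair has already been reduced to a tuple of (chromosome, strand, start of mate 1, start of mate 2); that tuple layout and the function names are hypothetical, and real duplicate-marking tools use more information than this.

```python
from collections import Counter


def duplication_rate(keys):
    """Fraction of observations that repeat an already-seen key."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


def pair_key(pair):
    """Key a fragment by both mate positions (paired-end level)."""
    chrom, strand, start1, start2 = pair
    return (chrom, strand, start1, start2)


def single_key(pair):
    """Key a fragment by the first mate only (single-end level)."""
    chrom, strand, start1, _ = pair
    return (chrom, strand, start1)


if __name__ == "__main__":
    # Three fragments share a start for mate 1, but only two share both ends.
    pairs = [
        ("chr1", "+", 100, 300),
        ("chr1", "+", 100, 350),
        ("chr1", "+", 100, 300),
        ("chr2", "-", 50, 200),
    ]
    print("paired-end level:", duplication_rate(map(pair_key, pairs)))
    print("single-end level:", duplication_rate(map(single_key, pairs)))
```

In this toy input the single-end key flags twice as many duplicates as the paired-end key, because fragments that merely start at the same position are counted as copies of each other.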

Processing RNA for Sequencing
- Depends upon what you're looking to achieve.
- mRNA is the main target
- PolyA selection
  - Oligo-dT beads
  - Highly efficient at getting mRNA and depleting the rRNA
  - Can't be used with non-polyA RNA
- miRNA kits are available as well

Strand-Specific Sequencing
- Illumina prep that ligates adaptors to the 5′ and 3′ ends of the RNA prior to cDNA reverse transcription
- Having strand information makes mapping more straightforward.
- Can identify antisense transcripts

Insert Sizes

Alignment Options
- No genome?! No problem!
  - Transcriptome assembly
  - There will be redundancy
- NCBI Unigene set
  - Not necessarily complete
  - Good to identify highly expressed genes
  - Valid transcripts from your organism
  - Easy to use but may miss novel genes
- Fully sequenced and annotated genome
  - No excuses, this better be a Nature paper!

Mapping RNAseq Reads
- How many mismatches will you allow? Depends on what you're mapping and what you're using for a reference.
- Number of hits allowed? How many times can a read match in different locations?
- Splice junctions? Is your mapping tool splice aware?
- Expected distance for PE reads? This is important to know so that read pairs can map properly.
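These questions correspond to parameters that most aligners expose in some form. The toy sketch below is not how a real aligner works (it scans the reference by brute force and is not splice aware); it only shows what "mismatches allowed", "number of hits", and "expected distance" mean. All names and the orientation convention in `is_proper_pair` are illustrative assumptions.

```python
def find_hits(read, reference, max_mismatches=2):
    """Return (position, mismatches) for every ungapped placement of the read
    on the reference with at most max_mismatches mismatches."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


def is_proper_pair(pos1, pos2, read_len, expected_insert, tolerance):
    """Check whether two mates sit at roughly the expected distance.

    pos1 and pos2 are the leftmost positions of the upstream and downstream
    mate (assumed orientation); the fragment spans pos1 .. pos2 + read_len.
    """
    fragment = pos2 + read_len - pos1
    return pos2 >= pos1 and abs(fragment - expected_insert) <= tolerance


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    print(find_hits("ACGT", reference, max_mismatches=0))  # two exact hits
    print(find_hits("ACGT", reference, max_mismatches=2))  # one more hit appears
    print(is_proper_pair(100, 250, 50, expected_insert=200, tolerance=25))
```

Raising the mismatch limit increases sensitivity but also the number of locations a read can match, which is why the two settings are usually chosen together.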

Why PE reads are great (figure labels: 2 Mismatches; Exact Match)

Purdue University Discovery Park

RNAseq Pipeline: TopHat -> Cufflinks -> Cuffcompare -> Cuffdiff -> CummeRbund
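As a rough sketch of how these steps chain together, the function below only builds the command lines and does not run anything. It assumes one paired-end library per condition, the default output names `accepted_hits.bam` (TopHat) and `transcripts.gtf` (Cufflinks), and that Cuffcompare writes `<prefix>.combined.gtf`; the function name, directory layout, and file names are hypothetical, and the options shown should be checked against the manuals for the installed versions.

```python
def build_pipeline_commands(bowtie_index, sample_reads, annotation_gtf, outdir="out"):
    """Return the command lines for a TopHat -> Cufflinks -> Cuffcompare ->
    Cuffdiff run as lists of arguments.

    sample_reads: dict mapping condition label -> (reads_1.fastq, reads_2.fastq)
    """
    commands, bams, gtfs = [], [], []
    for label, (reads1, reads2) in sample_reads.items():
        tophat_dir = f"{outdir}/{label}_tophat"
        cufflinks_dir = f"{outdir}/{label}_cufflinks"
        bam = f"{tophat_dir}/accepted_hits.bam"  # assumed default output name
        # Splice-aware alignment of the read pairs
        commands.append(["tophat", "-o", tophat_dir, bowtie_index, reads1, reads2])
        # Transcript assembly from the alignments
        commands.append(["cufflinks", "-o", cufflinks_dir, bam])
        bams.append(bam)
        gtfs.append(f"{cufflinks_dir}/transcripts.gtf")  # assumed default output name
    # Compare the assemblies to each other and to the reference annotation
    prefix = f"{outdir}/cuffcmp"
    commands.append(["cuffcompare", "-r", annotation_gtf, "-o", prefix] + gtfs)
    # Differential expression between conditions
    commands.append(["cuffdiff", "-o", f"{outdir}/cuffdiff",
                     "-L", ",".join(sample_reads),
                     f"{prefix}.combined.gtf"] + bams)
    return commands


if __name__ == "__main__":
    reads = {
        "control": ("control_1.fastq", "control_2.fastq"),
        "treated": ("treated_1.fastq", "treated_2.fastq"),
    }
    for command in build_pipeline_commands("genome_index", reads, "genes.gtf"):
        print(" ".join(command))
```

The Cuffdiff output directory is what CummeRbund would then read in R for visualization.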

There are other options

Not all software is created equal

RNAseq Best Practices
- Platform: Illumina HiSeq
- Read length: minimum of 50 bp; 100 bp is better
- Paired-end or single: PE
- Read depth: 30-40 million reads/sample
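The read depth target also answers the earlier question of how much multiplexing is possible. The sketch below does that arithmetic; the lane yield of 200 million reads is a placeholder assumption (check the real figure with the sequencing facility), and the cap of 96 comes from the dual-barcoding limit mentioned on the multiplexing slide.

```python
def samples_per_lane(lane_reads, reads_per_sample, max_barcodes=96):
    """How many samples fit in one lane at a given depth target.

    lane_reads: expected reads from one lane (assumed value, platform dependent)
    reads_per_sample: depth target per sample
    max_barcodes: number of distinct barcode combinations available
    """
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return min(lane_reads // reads_per_sample, max_barcodes)


if __name__ == "__main__":
    lane_yield = 200_000_000  # hypothetical lane output
    for depth in (30_000_000, 40_000_000):
        print(depth, "reads/sample ->", samples_per_lane(lane_yield, depth), "samples per lane")
```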

RNAseq Best Practices
- Number of biological replicates: 3 or more, as cost allows
- Experimental design: balanced block
- What type of alignment: TopHat (highly confident and splice aware)
- Unique or multiple mapping: unique; expect a 70-90% mapping rate
- Analysis method: use more than one approach
- Know the limits of the experiment
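One simple reading of a balanced block design is that every lane carries the same number of barcoded samples from each condition, so lane effects are not confounded with treatment. The sketch below distributes samples that way; the function name and sample labels are illustrative, and it is only one way to lay out such a design.

```python
def balanced_block(samples_by_condition, n_lanes):
    """Spread samples over lanes so each lane gets an equal share of every
    condition. Raises if a condition cannot be divided evenly."""
    lanes = [[] for _ in range(n_lanes)]
    for condition, samples in samples_by_condition.items():
        if len(samples) % n_lanes != 0:
            raise ValueError(
                f"{condition}: {len(samples)} samples cannot be split evenly "
                f"over {n_lanes} lanes")
        # Deal the samples of this condition round-robin across the lanes
        for i, sample in enumerate(samples):
            lanes[i % n_lanes].append((condition, sample))
    return lanes


if __name__ == "__main__":
    design = balanced_block(
        {"treated": ["T1", "T2", "T3", "T4"], "control": ["C1", "C2", "C3", "C4"]},
        n_lanes=2,
    )
    for number, lane in enumerate(design, start=1):
        print("lane", number, lane)
```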