Quantitative RNA Sequencing (RNA-seq) and Exome Analysis


Richard A. Radcliffe, Ph.D.
Professor of Pharmacology
School of Pharmacy, Department of Pharmaceutical Sciences
Room V20-3124
(303) 724-3362
richard.radcliffe@ucdenver.edu

Why RNA-seq?
  Genetic architecture
  Developmental stage
  Environmental influences
  Tissue type
  Disease state
  Phenotype
  Crick (1970) Nature 227:561-563

Why RNA-seq?
  Understanding the transcriptome is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and disease.
  Catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs
  Determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications
  Quantify the changing expression levels of each transcript during development and under different conditions
  Pathway/network/ontology analysis
  Massively parallel expression analysis
  Wang et al. (2009) Nat Rev Genetics 10:57-63

RNA-seq Overview
  Select fraction of interest
  Library prep
  Sequence and map to reference genome
  Analysis (QC, quantitation, transcript annotation)
  Adapted from: Pepke et al. (2009) Nat Methods 6:S22-S32

Library Prep
  Corney (2013) Mater Methods 3:203

Library Prep: Some Considerations
  RNA fraction
    Many different RNA species
    Poly(A)
    Size (<200 nt vs. >200 nt)
  Strandedness
  Read length
  Single- vs. paired-end
  Multiplexing

RNA Fraction
  [Figure, annotated in note RR34: transcribed genomic distribution and total RNA distribution. Labels: ~80%, ~15%, both strands transcribed, transcribed.]
  Mattick & Makunin (2006) Hum Mol Genet 1:R17-29
  Genomes, 2nd Edition, Oxford: Wiley-Liss, 2002

Library Prep: Strandedness
  Overlapping transcripts
  Annotation of novel transcripts

Note RR34 (slide 7): The area of the box represents the genome. The area of the large green circle is equivalent to the documented extent of transcription, with the darker green area corresponding to that on both strands. CDSs are protein-coding sequences, and UTRs are 5′- and 3′-untranslated sequences in mRNAs. The dots indicate (and in fact overstate) the proportion of the genome occupied by known snoRNAs and miRNAs. Richard Radcliffe, 1/26/2015

Strandedness
  [Figure: the overlapping genes Ncstn (-) and Copa (+). Panels: Transcription; DS library prep vs. SS library prep; Alignment.]
  DS library prep: Which strand (gene) did the fragment come from?
  SS library prep: No question about which strand (gene) the fragment came from.
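The ambiguity on this slide can be sketched in a few lines of Python. This is only an illustration: the gene coordinates are hypothetical placeholders (not the real positions of Ncstn and Copa), the function name is invented, and the sketch assumes the strand of origin of a fragment has already been worked out from the library protocol.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


# Hypothetical coordinates for two genes that overlap on opposite strands.
GENES = [Gene("Ncstn", 100, 500, "-"), Gene("Copa", 400, 900, "+")]


def candidate_genes(pos: int, strand: Optional[str] = None) -> list:
    """Return the genes overlapping pos.

    strand=None models a library that does not preserve strand information;
    passing "+" or "-" models a strand-specific library.
    """
    return [
        g.name
        for g in GENES
        if g.start <= pos <= g.end and (strand is None or g.strand == strand)
    ]


unstranded = candidate_genes(450)           # both genes remain candidates
stranded = candidate_genes(450, strand="+")  # only the + strand gene remains
print(unstranded, stranded)
```

With strand information the fragment in the overlap is assigned to a single gene; without it, both genes remain candidates.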

Read Length
  Read length is related to:
    Sequencing accuracy: quality declines as a function of the length of a read
    Mapping accuracy: the longer the read, the more accurately it maps

Single vs. Paired-end
  Zhernakova et al. (2013) PLoS Genet e1003594

Mapping to the Reference Genome
  Read (FASTQ):
    @HWUSI-EA541_0032:1:2:0:325#0
    CCATCTTTTTGATGTCCGCAATGATTT
    +
    WTORTSOQXTVVYXRXXXVPTXXXWUUL
  Alignment:
    @HWUSI-EA541_0032:1:2:0:325#0 - chr7 13619194 CCATCTTT
  Aligners: Bowtie, BWA
  Computational considerations
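The FASTQ record on this slide carries one quality character per base call, and each character encodes a Phred score Q, where the error probability is 10^(-Q/10). The sketch below decodes the record as transcribed. It assumes a Phred+64 offset, which is a guess based on the older Illumina-style read name; current Illumina and Sanger FASTQ files use an offset of 33. The function names are illustrative. Because the record is copied as transcribed, the sketch does not check that the sequence and quality strings have the same length.

```python
def parse_fastq_record(lines):
    """Split one 4-line FASTQ record into (read_id, sequence, quality string)."""
    header, seq, plus, qual = [ln.strip() for ln in lines]
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    return header[1:], seq, qual


def phred_scores(qual, offset=64):
    """Convert quality characters to Phred scores (offset 64 is an assumption)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


record = [
    "@HWUSI-EA541_0032:1:2:0:325#0",
    "CCATCTTTTTGATGTCCGCAATGATTT",
    "+",
    "WTORTSOQXTVVYXRXXXVPTXXXWUUL",
]
read_id, seq, qual = parse_fastq_record(record)
scores = phred_scores(qual)
print(read_id, scores[:5], round(error_probability(scores[0]), 4))
```

Averaging such scores by position across many reads is how the decline in quality along a read is usually visualised.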

Mapping to the Genome: Some Considerations
  Non-unique reads
    Gene families
    Repeat sequences (simple repeats, transposons)
  Depth
    Probability of representation & limits of detection
    Transcript isoform quantification
    Variant calling (SNPs, small indels)
  Reference genome effects

Non-unique Reads
  [Figure: fraction of reads suppressed (%) and number of alignments (10^6), plotted against the number of multiple alignment reads allowed (bowtie option -m)]
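The effect of the -m threshold in the figure can be imitated with a short sketch. In Bowtie 1, -m suppresses all alignments for a read that has more than the given number of reportable alignments. The code below is not Bowtie itself: the per-read alignment counts are made-up numbers and the function name is hypothetical.

```python
def suppress_multireads(alignment_counts, m):
    """Keep reads with at most m alignments; report the percentage suppressed."""
    kept = {read: n for read, n in alignment_counts.items() if n <= m}
    suppressed = len(alignment_counts) - len(kept)
    return kept, 100.0 * suppressed / len(alignment_counts)


# Hypothetical number of genomic alignments found for each read.
counts = {"read1": 1, "read2": 3, "read3": 250, "read4": 1}

kept_unique, pct_unique = suppress_multireads(counts, m=1)
kept_relaxed, pct_relaxed = suppress_multireads(counts, m=10)
print(pct_unique, pct_relaxed)
```

Raising m lowers the fraction of suppressed reads, at the cost of keeping reads whose true origin is uncertain.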

Non-unique Reads: Gene Families

Non-unique Reads: Repeats

Depth: Transcript Quantification
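Depth determines the probability that a transcript is represented at all, and therefore the limit of detection. A minimal sketch follows, assuming reads are sampled independently so that the read count for a transcript is approximately Poisson with mean (total reads × fraction of reads from that transcript). The model, the function name, and the example numbers are illustrative assumptions, not taken from the slides.

```python
import math


def prob_detect(total_reads, transcript_fraction, min_reads=1):
    """P(at least min_reads reads come from the transcript) under a Poisson model."""
    lam = total_reads * transcript_fraction  # expected reads from this transcript
    p_below = sum(
        math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_reads)
    )
    return 1.0 - p_below


# A transcript making up one millionth of the library, at two sequencing depths.
p_shallow = prob_detect(1_000_000, 1e-6)
p_deep = prob_detect(10_000_000, 1e-6)
print(round(p_shallow, 3), round(p_deep, 5))
```

Under these assumptions a tenfold increase in depth moves a rare transcript from being missed in roughly a third of libraries to being seen almost always.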

Depth: Variant Calling

Reference Genome Effects
  [Figure tracks: RNA-seq: ISS (ISS genome); RNA-seq: ISS (mm10 genome); ILS DNA sequencing; ISS DNA sequencing; gene annotations]

Analysis
  QC
  Assembly/Quantification
    Reads Per Kilobase Exon per Million Mapped Reads (RPKM)
  Differential expression
  Pathway/network functional analysis
  Annotation
    Novel exons
    Novel splice junctions
    Novel genes

Quality Control
  Pre-library construction:
    RNA quality
  Pre-alignment:
    Per base quality
    Per read quality
    Nucleotide distribution per position
    GC content
    Sequence over-representation
  Post-alignment:
    Mean coverage, 5′-3′ and 3′-5′
    Ribosomal RNA contamination
    Percent mapped reads

Quality Control: RNA Degradation
  [Figure labels: 28S, 18S]
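A few of the QC metrics listed above are simple enough to compute directly. The sketch below shows GC content, mean quality per position, and percent mapped reads; the function names and example inputs are hypothetical, and dedicated QC software reports many more metrics than these.

```python
def gc_content(seq):
    """Fraction of G and C bases in a read."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def mean_quality_per_position(quality_lists):
    """Mean Phred score at each read position; reads may differ in length."""
    longest = max(len(q) for q in quality_lists)
    means = []
    for i in range(longest):
        column = [q[i] for q in quality_lists if len(q) > i]
        means.append(sum(column) / len(column))
    return means


def percent_mapped(mapped_reads, total_reads):
    """Percentage of reads that aligned to the reference."""
    return 100.0 * mapped_reads / total_reads


gc = gc_content("GGCCAATT")
per_position = mean_quality_per_position([[30, 20], [10, 40, 20]])
mapped = percent_mapped(90, 100)
print(gc, per_position, mapped)
```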

Quality Control
  Quality per position
  Quality per read
  Nucleotide distribution

Assembly/Quantification: RPKM
  RPKM = C / (L × N)
    C = number of reads mapped to the exons of a gene
    L = total exon length of the gene, in kilobases
    N = total number of mapped reads, in millions
  [Figure label: 3.18]
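As a worked example of the RPKM formula, the sketch below takes exon length in base pairs and the total mapped reads as a raw count, so the kilobase and million scaling factors combine into 10^9. The function name and the example numbers are illustrative.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    Equivalent to C / (L * N) with L in kilobases and N in millions.
    """
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# 500 reads on a gene with 2,000 bp of exon, in a library of 25 million mapped reads.
value = rpkm(500, 2_000, 25_000_000)
print(value)
```

Here C = 500, L = 2 kb and N = 25 million, so RPKM = 500 / (2 × 25) = 10.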

Differential Expression
  [Figure: Hddc3]
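The simplest summary of differential expression is a log2 fold change between group means. The sketch below is only that: it is not a statistical test, the pseudocount of 1 is an arbitrary choice to avoid dividing by zero, and the expression values are invented. Real analyses typically model count variability across replicates before calling a gene differentially expressed.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of (mean of group_b + pseudocount) / (mean of group_a + pseudocount)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical normalised expression values for one gene in two groups.
lfc = log2_fold_change([3.0, 3.0], [15.0, 15.0])
print(lfc)
```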

Pathway/Network Functional Analysis
  Weighted Gene Co-expression Network Analysis (WGCNA)
  Gene Ontology (GO)
  Cluster Analysis
  Darlington et al. (2013) Genes Brain Behav 12:263-274
  Bennett et al. (2015) Alcohol Clin Exp Res NIHMS658870

Annotation

Exome Sequencing
  Why
    Identification of variants (SNPs, CNVs, small indels)
    Linkage/association/pedigree studies
    Clinical diagnostics
  How
    Isolate, fragment DNA
    Build library
    Exome enrichment
    Sequence
    Align to reference genome
    Variant calling
    Higher order genetic analysis

Exome Enrichment
  www.genomics.agilent.com

Variant Calling
  Altmann et al. (2012) Hum Genetics 131:1541-1554

Note RR1 (slide 40): Examples of intragenic deletion and duplication detected by WES and confirmed by exome aCGH. Each bar in the graphs (a)–(c) and (e)–(g) represents an exon. (a–c) WES data from a family trio in which the (a) proband has inherited a whole-gene duplication of KRT34 from the (b) father, whereas the (c) mother shows normal copy number at that gene. (e–g) WES data from a family trio in which the (e) proband has inherited a partial-gene heterozygous deletion in the SYCP2L gene from the (g) mother, whereas the (f) father shows normal copy number at those exons. Each dot in panels d and h represents an oligonucleotide probe in the gene of interest on the exome array, with a duplication shown by probes deviating to a positive log2 ratio (marked in red) and a deletion shown by probes deviating to a negative log2 ratio (marked in green). Panels d and h show confirmation of the KRT34 duplication and the SYCP2L deletion, respectively, by exome aCGH. aCGH, array comparative genomic hybridization; WES, whole-exome sequencing. Radcliffe, Richard, 2/1/2015

Variant Calling: CNVs/Indels
  [Figure panels: Child, Father, Mother]
  Retterer et al. (2014) Genetics Med doi:10.1038/gim.2014

Genetic Analysis: Mendelian Inheritance
  Assumptions:
    Only consider small indels and SNPs
    Causal variants are coding
    Causal variants alter protein sequence
    Near complete penetrance
  Rabbani et al. (2012) J Hum Genetics 57:621-632
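The Mendelian assumptions above translate naturally into a variant filter for a family trio. The following is a sketch only: the record layout, field names, consequence labels, and variants are all hypothetical, both parents are assumed to be unaffected, and just two inheritance models (autosomal recessive and de novo dominant) are checked.

```python
# Consequence labels treated as altering the protein sequence (hypothetical vocabulary).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "inframe_indel"}


def passes_assumptions(variant):
    """Small indel or SNP, coding, and protein-altering."""
    return (
        variant["type"] in {"SNP", "indel"}
        and variant["coding"]
        and variant["consequence"] in PROTEIN_ALTERING
    )


def fits_recessive(gt):
    """Child homozygous for the variant, both unaffected parents carriers."""
    return gt["child"] == "1/1" and gt["father"] == "0/1" and gt["mother"] == "0/1"


def fits_de_novo(gt):
    """Child heterozygous, variant absent from both parents."""
    return gt["child"] == "0/1" and gt["father"] == "0/0" and gt["mother"] == "0/0"


def candidate_variants(variants):
    return [
        v["id"]
        for v in variants
        if passes_assumptions(v)
        and (fits_recessive(v["genotypes"]) or fits_de_novo(v["genotypes"]))
    ]


carriers = {"child": "1/1", "father": "0/1", "mother": "0/1"}
de_novo = {"child": "0/1", "father": "0/0", "mother": "0/0"}
inherited_het = {"child": "0/1", "father": "0/1", "mother": "0/0"}

variants = [
    {"id": "var1", "type": "SNP", "coding": True, "consequence": "missense", "genotypes": carriers},
    {"id": "var2", "type": "SNP", "coding": True, "consequence": "synonymous", "genotypes": carriers},
    {"id": "var3", "type": "indel", "coding": True, "consequence": "frameshift", "genotypes": de_novo},
    {"id": "var4", "type": "CNV", "coding": True, "consequence": "frameshift", "genotypes": de_novo},
    {"id": "var5", "type": "SNP", "coding": True, "consequence": "nonsense", "genotypes": inherited_het},
]
candidates = candidate_variants(variants)
print(candidates)
```

The last variant is dropped because, with near complete penetrance, a heterozygous variant inherited from an unaffected parent is unlikely to be causal.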


Genetic Analysis
  Ku et al. (2012) Ann Neurology 71:5-14

A Few References

RNA-seq:
  Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJ, Tai IT, Marra MA (2010) Alternative expression analysis by RNA sequencing. Nat Methods 7:843-847.
  Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621-628.
  Munger SC, Raghupathy N, Choi K, Simons AK, Gatti DM, Hinerfeld DA, Svenson KL, Keller MP, Attie AD, Hibbs MA, Graber JH, Chesler EJ, Churchill GA (2014) RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics 198:59-73.
  Oshlack A, Robinson MD, Young MD (2010) From RNA-seq reads to differential expression results. Genome Biol 11:220.
  Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57-63.

Exome sequencing:
  Altmann A, Weber P, Bader D, Preuß M, Binder E, Müller-Myhsok B (2012) A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum Genet 131:1541-1554.
  Biesecker LG, Green RC (2014) Diagnostic clinical genome and exome sequencing. N Engl J Med 370:2418-2425.
  Krumm N, Sudmant PH, Ko A, O'Roak BJ, Malig M, Coe BP, Quinlan AR, Nickerson DA, Eichler EE (2012) Copy number variation detection and genotyping from exome sequence data. Genome Res 22:1525-1532.
  Majewski J, Schwartzentruber J, Lalonde E, Montpetit A, Jabado N (2011) What can exome sequencing do for you? J Med Genet 48:580-589.
  Singleton AB (2011) Exome sequencing: a transformative technology. Lancet Neurol 10:942-946.