NGS Data Analysis: An Intro to RNA-Seq

GST Colloquium, March 25th, 2014

Workshop Design
- Basics of NGS
- Sample Prep
- RNA-Seq Analysis

Experimental Design
There are many kinds of sequencing experiments, for example:
- Resequencing
- Assembly
- RNA-Seq
- ChIP-Seq
- Metagenomics

Common experimental questions:
- Measure variation within or between species
- Generate a genome sequence
- Transcriptome characterization
- Identify protein binding sites
- Population genetics
- Differential expression studies

Basic Process [figure]

Design Considerations
- What resources do you already have (reference genome, curated gene models, etc.)?
- Do you need biological replicates? (It depends on the experiment, but the answer is usually yes.)
- Do you need technical replicates? (Most likely not.)
- Do you need controls? (Depends on the experiment.)
- Do you need deep sequencing coverage? (Again, depends on the experiment.)
All of these questions should be answered before you start.

Types of reads
Single-end:
- Fast runs
- Cheapest overall cost
Paired-end:
- More data for each fragment
- More data for alignment/assembly
- Same input requirements as single-end
- Best for isoform detection
Mate-pair:
- Longer-range pairs (larger insert sizes) than paired-end
- Allow sequencing across long repeats
- Good for detecting structural variation
- Require more input DNA than any other library type

How many reads?
Genomic: depends on the size of your genome. You want enough reads to cover the genome at sufficient depth.
RNA-Seq: depends on the complexity of the transcriptional profile you're working on and on whether you need to capture rare events.
Rule of thumb: more replicates are more important than more sequence per sample. Again, this decision depends entirely on the question you are trying to answer and the organism you are working in. In practice, a lane usually has more sequencing capacity than one sample needs, so the real question is how many samples you can pool into a given lane.
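For genomic resequencing, "enough reads to cover your genome at depth" is usually made concrete with the Lander-Waterman estimate C = N x L / G, where N is the number of reads, L the read length, and G the genome size. The minimal sketch below assumes that formula and roughly uniform coverage; the function names are hypothetical, and the genome size and lane yield in the example are illustrative numbers, not platform specifications.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G (assumes uniform coverage)."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Invert C = N * L / G to get the number of reads N for a target depth C."""
    return math.ceil(target_depth * genome_size / read_length)


def samples_per_lane(reads_per_lane: int, reads_per_sample: int) -> int:
    """How many barcoded samples can be pooled in one lane at the desired read count."""
    return reads_per_lane // reads_per_sample


if __name__ == "__main__":
    # Hypothetical numbers: a 100 Mb genome, 100 bp reads, 30x target depth.
    print(reads_needed(30, 100, 100_000_000))          # 30000000
    # Hypothetical lane yield of 200 million reads, 20 million reads per sample.
    print(samples_per_lane(200_000_000, 20_000_000))   # 10
```

For paired-end runs, count each mate as one read of length L. For RNA-Seq the same arithmetic applies to the pooling question, but the per-sample read target has to come from the experiment rather than from a genome-size formula.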

Read length
Again, this depends entirely on the experiment and organism. Longer is usually better, but sometimes short reads are good enough.

Selecting a technology
Platform      Library types               Read length                      Read number (manufacturer's claims, machine dependent)
Illumina      Paired, Single, Mate Pair   150-250 bp                       0.3-1000 Gb
Ion Torrent   Paired, Single, Mate Pair   200-400 bp (100-200 bp paired)   60-80 million reads
SOLiD         Single, Mate Pair           75 bp                            90-300 Gb
454           Single, Mate Pair           500-1000 bp                      1 million reads
PacBio        Single                      1000+ bp                         ?

Sample Prep (RNA-Seq Specific)
Sample collection and storage:
- RNAlater: stabilization buffer; tissue keeps for about a week at room temperature (about a month at 4 °C). Good for field collection.
- Liquid nitrogen: fast, cheap, and effective, as long as you have constant access.
RNA extraction:
- Some sequencing centers only want total RNA so that they can verify sample quality before library prep.

Sample Prep (RNA-Seq Specific) - cont.
Removing rRNA:
- Poly-A enrichment: the poly(A) tails of mRNA are used to enrich the sample (most common).
- rRNA depletion: rRNA is actively bound and removed (important if a large amount of rRNA is present).
cDNA library:
- Non-stranded: total RNA is used for cDNA library construction; strand information is not preserved.
- Stranded: strand information is preserved. Crucial in organisms with overlapping genes.

Library Prep
It is common to have a sequencing center do this step for you, but depending on budget and experience you may want to do it yourself.
1. Fragment DNA: sonication or enzyme-based methods, followed by size selection.
2. DNA repair: blunting + A-overhang.
3. Ligate adapters.
4. Attachment site: PCR addition of the attachment site to one end.
5. Barcode attachment: PCR addition of the barcode and attachment site to the other end.
6. Clean up: remove unligated adapters, etc.

Sequencing
Send your samples off to the sequencing center. You'll get raw data back when it's done.

Quality Control of Raw Data
You need to measure:
- Proportion of high-quality bases called
- Distribution of called nucleotides
- Number of reads with high overall quality
- Distribution of read qualities at each position
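FastQC (used later in the workshop) computes these measures for you. To make them concrete, the sketch below computes simplified versions in plain Python. It assumes Phred+33 quality encoding, treats Q >= 30 as "high quality" (an arbitrary example threshold), and summarizes each position by its mean rather than a full distribution. The function names are hypothetical, and this is not how FastQC is implemented.

```python
from statistics import mean


def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 (Sanger/Illumina 1.8+) encoding."""
    return [ord(ch) - offset for ch in quality]


def qc_summary(fastq_lines, min_q: int = 30) -> dict:
    """Simplified QC measures from FASTQ lines (4 lines per record)."""
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    records = [(lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4)]
    per_position = {}   # position -> list of quality scores seen there
    base_counts = {}    # nucleotide -> count
    high_q_bases = total_bases = high_q_reads = 0
    for seq, qual in records:
        scores = phred_scores(qual)
        for pos, q in enumerate(scores):
            per_position.setdefault(pos, []).append(q)
        for base in seq:
            base_counts[base] = base_counts.get(base, 0) + 1
        high_q_bases += sum(q >= min_q for q in scores)
        total_bases += len(scores)
        if mean(scores) >= min_q:  # "high overall quality" = mean Q >= min_q (an assumption)
            high_q_reads += 1
    return {
        "frac_high_quality_bases": high_q_bases / total_bases,
        "base_composition": base_counts,
        "n_high_quality_reads": high_q_reads,
        "mean_quality_per_position": [mean(per_position[p]) for p in sorted(per_position)],
    }


if __name__ == "__main__":
    # Two toy reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGA", "+", "II##"]
    print(qc_summary(demo)["frac_high_quality_bases"])  # 0.75
```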

Trimming and Filtering Reads
It is common practice to:
- remove reads with poor overall quality
- trim the ends of reads to remove low-quality sequence
- remove low-quality nucleotides
There are compelling arguments for doing some of this later, but in general it is always safe to do these steps before you align reads.
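The three practices above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular trimmer (tools such as Trimmomatic or cutadapt implement their own rules): qualities are Phred+33, "trimming the ends" means trimming low-quality bases from the 3' end, low-quality internal nucleotides are masked as N rather than deleted, and all thresholds are arbitrary example values. The function names are hypothetical.

```python
def phred(ch: str, offset: int = 33) -> int:
    """Quality score of one FASTQ quality character (Phred+33 assumed)."""
    return ord(ch) - offset


def trim_3prime(seq: str, qual: str, trim_q: int = 20):
    """Drop bases from the 3' end until one with quality >= trim_q is reached."""
    end = len(qual)
    while end > 0 and phred(qual[end - 1]) < trim_q:
        end -= 1
    return seq[:end], qual[:end]


def mask_low_quality(seq: str, qual: str, mask_q: int = 10) -> str:
    """Replace individual bases with quality < mask_q by 'N' (masking, an assumption)."""
    return "".join(b if phred(q) >= mask_q else "N" for b, q in zip(seq, qual))


def clean_read(seq, qual, trim_q=20, mask_q=10, min_len=20, min_mean_q=20):
    """Trim, then filter, then mask one read; return None if the read is discarded."""
    seq, qual = trim_3prime(seq, qual, trim_q)
    if len(seq) < min_len:
        return None  # too short after trimming
    if sum(phred(q) for q in qual) / len(qual) < min_mean_q:
        return None  # poor overall quality
    return mask_low_quality(seq, qual, mask_q), qual


if __name__ == "__main__":
    # '*' encodes Q9 and '#' encodes Q2: the 3' '##' is trimmed and the Q9 base is masked.
    print(clean_read("ACGTACGTAA", "IIII*III##", min_len=4))  # ('ACGTNCGT', 'IIII*III')
```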

What comes next? [figures 1 and 2]
Source: Eyras, Eduardo; P. Alamancos, Gael; Agirre, Eneritz (2013): Methods to Study Splicing from RNA-Seq. figshare. http://dx.doi.org/10.6084/m9.figshare.679993

Learning Objectives
- RNA-seq data quality control (FastQC)
- Align sequence reads to a reference genome using TopHat
- Review samtools and file format conversions
- View alignments in IGV
- Analyze differential gene expression (in the R environment)

Analysis workflow [figure]

Tools
- bowtie2
- tophat2
- FastQC
- samtools
- R and the required Bioconductor packages (DESeq)
- RStudio
- HTSeq 0.6.0
- Integrative Genomics Viewer (IGV)
- Java
Most of these tools are already installed in my bin folder: /lustre/home/qjia2/bin

The data
The data used in this tutorial come from this paper:
Trapnell C, et al.: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 2012, 7(3):562-578.
The reads were simulated in silico from Drosophila melanogaster and comprise 6 paired-end samples: 3 biological replicates for each of 2 conditions.
File names                        Description
C1_R1_1.fq.gz, C1_R1_2.fq.gz      Simulated condition 1, replicate 1
C1_R2_1.fq.gz, C1_R2_2.fq.gz      Simulated condition 1, replicate 2
C1_R3_1.fq.gz, C1_R3_2.fq.gz      Simulated condition 1, replicate 3
C2_R1_1.fq.gz, C2_R1_2.fq.gz      Simulated condition 2, replicate 1
C2_R2_1.fq.gz, C2_R2_2.fq.gz      Simulated condition 2, replicate 2
C2_R3_1.fq.gz, C2_R3_2.fq.gz      Simulated condition 2, replicate 3

Download the reference genome and gene model annotations
You also need the reference genome and gene model annotations (GTF), which can be downloaded from Ensembl or Illumina:
    wget ftp://ftp.ensembl.org/pub//mnt2/release-75/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP5.75.dna.toplevel.fa.gz
    wget ftp://ftp.ensembl.org/pub//mnt2/release-75/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.75.gtf.gz
    gunzip Drosophila_melanogaster.BDGP5.75.*
Index your reference genome:
    /lustre/home/qjia2/bin/bowtie2-build -f Drosophila_melanogaster.BDGP5.75.dna.toplevel.fa Dme_BDGP5_75
After the command finishes, the following BT2 files will have been created:
    Dme_BDGP5_75.1.bt2
    Dme_BDGP5_75.2.bt2
    Dme_BDGP5_75.3.bt2
    Dme_BDGP5_75.4.bt2
    Dme_BDGP5_75.rev.1.bt2
    Dme_BDGP5_75.rev.2.bt2
For model species, you can download pre-built Bowtie and Bowtie 2 indexes from the Bowtie website.

Create links to the required data
The required files are stored in the following directory on Newton: /data/scratch/qjia2/data2012
Instead of copying these files into your own folders, create symbolic links to them. From your working directory, type:
    ln -s /data/scratch/qjia2/data2012/Dme_BDGP5_75.* .
    ln -s /data/scratch/qjia2/data2012/genes.gtf .
    ln -s /data/scratch/qjia2/data2012/GSM79448* .
Then type ls and you will see those files.

Assess data quality
In this workshop, we'll use FastQC to check the quality and integrity of the RNA-seq reads. FastQC provides a simple way to run quality-control checks on raw sequence data from high-throughput sequencing pipelines. Its modular set of analyses gives a quick impression of whether your data has any problems you should be aware of before doing further analysis.
Create a directory for the output files:
    mkdir fastqc_reports
Run FastQC:
    /lustre/home/qjia2/bin/fastqc -f fastq -o fastqc_reports *.fq.gz
Inspect the output: FastQC writes an HTML report for each input file, which you view in your web browser. The FastQC website has example reports for a good and a bad Illumina dataset.

Align RNA-seq reads to the genome using TopHat2
Create a job definition file called C1R1.sge:
    #$ -N C1R1
    #$ -q medium*
    #$ -cwd
    #$ -pe threads 8
    /home/qjia2/bin/tophat2 -p 8 -G genes.gtf -o C1_R1_thout Dme_BDGP5_75 GSM794483_C1_R1_1.fq.gz GSM794483_C1_R1_2.fq.gz
(-p 8 tells TopHat2 to use the 8 threads requested with -pe; without it, TopHat2 runs on a single thread.)
Submit the job with qsub:
    qsub C1R1.sge
Check the status of your jobs with qstat:
    qstat
Kill a job:
    qdel your_job_id
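TopHat2 has to be run once per sample, so it can help to generate one job file per sample instead of editing C1R1.sge by hand. The sketch below is a hypothetical helper (write_tophat_jobs is not part of any tool). It assumes the paired files follow the GSM794483_C1_R1_1.fq.gz / _2.fq.gz naming pattern shown above, copies the queue, parallel-environment, and path settings from the slide, and adds -p 8 so TopHat2 uses the 8 requested threads.

```python
from pathlib import Path

# SGE directives and paths copied from the slide; -p 8 makes TopHat2 use the 8 slots.
TEMPLATE = """#$ -N {job}
#$ -q medium*
#$ -cwd
#$ -pe threads 8
/home/qjia2/bin/tophat2 -p 8 -G genes.gtf -o {sample}_thout Dme_BDGP5_75 {r1} {r2}
"""


def write_tophat_jobs(data_dir: str, out_dir: str) -> list:
    """Write one <job>.sge file per read pair found in data_dir (hypothetical helper)."""
    written = []
    for r1 in sorted(Path(data_dir).glob("*_1.fq.gz")):
        prefix = r1.name[: -len("_1.fq.gz")]          # e.g. GSM794483_C1_R1
        r2 = r1.with_name(prefix + "_2.fq.gz")
        if not r2.exists():
            continue                                  # skip files without a mate
        sample = "_".join(prefix.split("_")[-2:])     # e.g. C1_R1
        job = sample.replace("_", "")                 # e.g. C1R1
        script = Path(out_dir) / f"{job}.sge"
        script.write_text(TEMPLATE.format(job=job, sample=sample, r1=r1.name, r2=r2.name))
        written.append(script)
    return written
```

Calling write_tophat_jobs(".", ".") from the working directory that holds the linked reads writes C1R1.sge through C2R3.sge, which can then be submitted one by one with qsub. Because the job files use -cwd and bare file names, they must be submitted from that same directory.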

TopHat2 output
tophat2 produces a number of files, most of which are intermediate files used internally by the pipeline. The output files you will most likely want to look at are:
- accepted_hits.bam: the alignments of the mapped reads
- align_summary.txt: a summary of the alignment rates
- deletions.bed, insertions.bed: the deletions and insertions detected by TopHat
- junctions.bed: all the splice junctions detected by TopHat during alignment
- logs/
- prep_reads.info
- unmapped.bam: the reads that could not be mapped
We use accepted_hits.bam for the rest of the analysis. This file is not human-readable, but samtools can convert it to SAM format. Next, we'll cover samtools and then use IGV to look at the alignments.

samtools
SAMtools provides various utilities for manipulating alignments in SAM format, including sorting, merging, indexing, and generating alignments in a per-position format.
    $ samtools
    Program: samtools (Tools for alignments in the SAM format)
    Version: 0.1.19-44428cd
    Usage:   samtools <command> [options]
    Command: view        SAM<->BAM conversion
             sort        sort alignment file
             mpileup     multi-way pileup
             depth       compute the depth
             faidx       index/extract FASTA
             tview       text alignment viewer
             index       index alignment
             idxstats    BAM index stats (r595 or later)
             fixmate     fix mate information
             flagstat    simple stats
             calmd       recalculate MD/NM tags and '=' bases
             merge       merge sorted alignments
             rmdup       remove PCR duplicates
             reheader    replace BAM header
             cat         concatenate BAMs
             bedcov      read depth per BED region
             targetcut   cut fosmid regions (for fosmid pool only)
             phase       phase heterozygotes
             bamshuf     shuffle and group alignments by name

File manipulation
To analyse differential expression, we need to count the reads that align to each gene. For paired-end data, htseq-count needs SAM files sorted by read name as input, so run the following commands to sort by name (-n) and convert to SAM:
    samtools sort -n C1_R1_thout/accepted_hits.bam C1_R1_sn
    samtools view -o C1_R1_sn.sam C1_R1_sn.bam
To view the alignments in IGV, the BAM files must be sorted by position and indexed:
    samtools sort C1_R1_thout/accepted_hits.bam C1_R1_s
    samtools index C1_R1_s.bam
These commands use the samtools 0.1.19 syntax, in which sort takes an output prefix. Newer samtools releases (1.x) expect the output file to be given with -o instead, e.g. samtools sort -n -o C1_R1_sn.bam C1_R1_thout/accepted_hits.bam.
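Why sort by name? In a SAM file sorted by read name, the two mates of a pair sit next to each other, so a counter can treat them as one fragment just by reading consecutive records; by default, htseq-count expects paired-end input in this order. After position sorting, mates can be separated by many other records. The hypothetical helper below only illustrates that grouping idea on toy records; it is not HTSeq's implementation.

```python
from itertools import groupby


def group_by_read_name(sam_lines):
    """Yield (read_name, records) for runs of consecutive SAM records sharing a QNAME."""
    records = (ln.rstrip("\n").split("\t") for ln in sam_lines if not ln.startswith("@"))
    for qname, group in groupby(records, key=lambda fields: fields[0]):
        yield qname, list(group)


if __name__ == "__main__":
    # Toy records with only QNAME, FLAG, RNAME and POS filled in.
    name_sorted = ["@HD\tVN:1.0\tSO:queryname",
                   "readA\t99\t2L\t100", "readA\t147\t2L\t300",
                   "readB\t99\t2L\t50", "readB\t147\t2L\t250"]
    pos_sorted = ["readB\t99\t2L\t50", "readA\t99\t2L\t100",
                  "readB\t147\t2L\t250", "readA\t147\t2L\t300"]
    print([q for q, _ in group_by_read_name(name_sorted)])  # ['readA', 'readB']
    print([q for q, _ in group_by_read_name(pos_sorted)])   # ['readB', 'readA', 'readB', 'readA']
```

With name-sorted input each pair forms one group; with position-sorted input the same two pairs break into four runs, which is why a streaming counter needs either name order or extra buffering.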

View alignments in IGV
1. Start IGV (install it first if you haven't already).
2. Load the genome and gene annotation into IGV: from the main menu, click Genomes -> Create .genome File. The "Create .genome File" dialog appears. [figure]

View alignments in IGV - cont.
3. Load the mapped reads into IGV: from the main menu, click File -> Load from File, choose C1_R1_s.bam, and wait for IGV to finish loading.
4. Navigate in IGV. For further details, see the IGV User Guide.

Count reads in features with htseq-count
HTSeq is a Python package, so it can be used as a library. It also provides a set of stand-alone scripts that can be run from the command line. The script htseq-count counts the reads overlapping known genes. It takes a SAM file and a genome annotation file (GTF format) as input:
    htseq-count -s no -a 10 C1_R1_sn.sam genes.gtf > C1_R1.count
-s: whether the data come from a strand-specific assay (default: yes)
-a: skip all reads with an alignment quality lower than the given minimum value (default: 10)
It outputs a table with a count for each feature:
    FBgn0000003    0
    FBgn0000008    622
    FBgn0000014    91
    FBgn0000015    73
    FBgn0000017    2700
    ...
After running this command on the other five samples, merge the htseq-count files into one (mergedcounts.txt):
    gene_id      C1R1  C1R2  C1R3  C2R1  C2R2  C2R3
    FBgn0000003     0     0     0     0     0     0
    FBgn0000008   622   618   555   530   606   547
    FBgn0000014    91    81   104    87   125   102
    FBgn0000015    73    67    53    55    71    73
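The slide does not show how the six count files were merged. One possible way is sketched below as a hypothetical Python helper. It assumes htseq-count's tab-separated gene/count output, drops htseq-count's summary rows (names starting with "__", such as __no_feature and __ambiguous) so they are not treated as genes, and fills in 0 for any gene missing from a file.

```python
import csv


def read_htseq_counts(path: str) -> dict:
    """Parse one htseq-count output file into {gene_id: count}."""
    counts = {}
    with open(path) as fh:
        for line in fh:
            gene, count = line.rstrip("\n").split("\t")
            if gene.startswith("__"):
                continue  # summary rows such as __no_feature, __ambiguous
            counts[gene] = int(count)
    return counts


def merge_counts(sample_files: dict, out_path: str) -> list:
    """Merge {column_name: count_file} into one tab-separated table; return the gene IDs."""
    tables = {name: read_htseq_counts(path) for name, path in sample_files.items()}
    genes = sorted(set().union(*(t.keys() for t in tables.values())))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_id", *tables])  # header: gene_id C1R1 C1R2 ...
        for gene in genes:
            writer.writerow([gene, *(tables[name].get(gene, 0) for name in tables)])
    return genes
```

For the workshop files this would be called as merge_counts({"C1R1": "C1_R1.count", "C1R2": "C1_R2.count", ..., "C2R3": "C2_R3.count"}, "mergedcounts.txt"). The resulting header has one more field than the sample columns, which matches how read.table(..., header = TRUE, row.names = 1) reads it in the next step.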

Find differentially expressed genes (DESeq)
The commands used here are also described in the DESeq vignette (PDF).
1. Start R and load the required package:
    R
    library("DESeq")
2. Set your working directory:
    # make sure you are in the For_DESeq directory
    setwd("/users/mac/documents/rna_seq/files/dataset/for_deseq")
    # use getwd() to check your current working directory
    getwd()
3. Read in your count table:
    countTable = read.table("mergedcounts.txt", header = TRUE, row.names = 1)
Your table should look like this:
    head(countTable)
    ##             C1R1 C1R2 C1R3 C2R1 C2R2 C2R3
    ## FBgn0000003    0    0    0    0    0    0
    ## FBgn0000008  622  618  555  530  606  547
    ## FBgn0000014   91   81  104   87  125  102
    ## FBgn0000015   73   67   53   55   71   73
    ## FBgn0000017 2700 2425 2485 2575 2643 2604
    ## FBgn0000018  328  343  363  304  288  345

Find differentially expressed genes (DESeq) - cont.
4. Add the treatment information:
    condition = factor(c("C1", "C1", "C1", "C2", "C2", "C2"))
    condition
    ## [1] C1 C1 C1 C2 C2 C2
    ## Levels: C1 C2
5. Create a CountDataSet:
    cds <- newCountDataSet(countTable, condition)
6. Estimate the size factors from the count data (normalization):
    cds <- estimateSizeFactors(cds)
To see these size factors:
    sizeFactors(cds)
    ##   C1R1   C1R2   C1R3   C2R1   C2R2   C2R3
    ## 1.0297 1.0295 1.0302 0.9755 0.9762 0.9777
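DESeq's estimateSizeFactors uses the median-of-ratios method described in the DESeq vignette: each gene's geometric mean across samples serves as a pseudo-reference, and a sample's size factor is the median, over genes, of its count divided by that reference. Genes with a zero count in any sample have a geometric mean of zero and are left out. The plain-Python sketch below illustrates the idea on a toy table; it is not DESeq's code, and size_factors and normalize are hypothetical names.

```python
import math
from statistics import median


def size_factors(counts: dict) -> list:
    """Median-of-ratios size factors; counts maps gene -> list of raw counts per sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # geometric mean would be 0, so the ratio is undefined
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: dict, factors: list) -> dict:
    """Divide each raw count by its sample's size factor."""
    return {g: [c / s for c, s in zip(row, factors)] for g, row in counts.items()}


if __name__ == "__main__":
    # Toy data: sample 2 has exactly twice the depth of sample 1; g3 is skipped (zero count).
    toy = {"g1": [10, 20], "g2": [100, 200], "g3": [0, 5], "g4": [7, 14]}
    sf = size_factors(toy)
    print([round(s, 4) for s in sf])  # [0.7071, 1.4142]
```

Size factors near 1, as in the output above, simply mean the six libraries had similar effective sequencing depth.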

Find differentially expressed genes (DESeq) - cont.
The counts normalized by the size factors (each raw count divided by its sample's size factor) can be displayed with:
    head(counts(cds, normalized = TRUE))
    ##                C1R1    C1R2    C1R3    C2R1    C2R2    C2R3
    ## FBgn0000003    0.00    0.00    0.00    0.00    0.00    0.00
    ## FBgn0000008  604.04  600.27  538.70  543.30  620.78  559.46
    ## FBgn0000014   88.37   78.68  100.95   89.18  128.05  104.32
    ## FBgn0000015   70.89   65.08   51.44   56.38   72.73   74.66
    ## FBgn0000017 2622.02 2355.43 2412.04 2639.61 2707.45 2663.31
    ## FBgn0000018  318.53  333.16  352.34  311.63  295.02  352.86
7. Estimate the dispersions:
    cds <- estimateDispersions(cds)
8. Inspect the estimated dispersions:
    plotDispEsts(cds)
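DESeq models counts with a negative binomial distribution whose variance grows with the mean, Var = mu + alpha * mu^2, where alpha is the dispersion. As a rough illustration only, the sketch below computes a per-gene method-of-moments estimate from the normalized counts of the replicates of one condition. DESeq's estimateDispersions is more careful: it accounts for the size factors, fits a dispersion-mean trend across genes, and by default (sharingMode = "maximum") keeps the larger of the fitted and per-gene values. The function name is hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(normalized_counts: list) -> float:
    """Rough per-gene NB dispersion: solve var = mu + alpha * mu**2 for alpha, clipped at 0.

    Needs at least two replicate values (statistics.variance requires two).
    """
    mu = mean(normalized_counts)
    if mu == 0:
        return 0.0  # no information for all-zero genes
    var = variance(normalized_counts)  # sample variance (n - 1 in the denominator)
    return max(0.0, (var - mu) / mu ** 2)


if __name__ == "__main__":
    # Replicates with no extra spread give 0; extra (biological) spread gives alpha > 0.
    print(moment_dispersion([10, 10, 10]))  # 0.0
    print(moment_dispersion([0, 20]))       # 1.9
```

The dispersion plot from plotDispEsts shows exactly this kind of per-gene value against the mean, together with DESeq's fitted curve.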

Find differentially expressed genes (DESeq) - cont.
9. Test for differential expression:
    deg = nbinomTest(cds, "C1", "C2")
10. Plot the log2 fold changes against the mean normalized counts:
    plotMA(deg)

Find differentially expressed genes (DESeq) - cont.
11. Plot a histogram of the p-values:
    hist(deg$pval, breaks = 100, col = "skyblue", main = "")
12. Keep the significant genes at a 10% false discovery rate (FDR); which() drops genes whose padj is NA:
    degsig = deg[which(deg$padj < 0.1), ]
Count the number of significant genes:
    addmargins(table(deg$padj < 0.1))
    ##
    ## FALSE  TRUE   Sum
    ## 10012   269 10281
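The padj column holds p-values adjusted for multiple testing with the Benjamini-Hochberg procedure (what R's p.adjust(p, method = "BH") computes), which is why filtering on padj < 0.1 controls the false discovery rate at 10%. The sketch below reimplements the procedure in plain Python for illustration; benjamini_hochberg is a hypothetical name, and it assumes there are no missing (NA) p-values.

```python
def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.20]
    padj = benjamini_hochberg(p)
    print([round(x, 4) for x in padj])  # [0.04, 0.0533, 0.0533, 0.2]
    print(sum(x < 0.1 for x in padj))   # 3
```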

Find differentially expressed genes (DESeq) - cont.
13. Look at the most strongly upregulated and downregulated significant genes (log2FoldChange is the log2 of baseMeanB/baseMeanA, so positive values mean higher expression in C2):
    head(degsig[order(degsig$log2FoldChange, decreasing = TRUE), ])
    ##                id baseMean baseMeanA baseMeanB foldChange log2FoldChange
    ## 2388  FBgn0025682    12095      6624     17565      2.652          1.407
    ## 126   FBgn0000370    15468      8495     22440      2.641          1.401
    ## 13444 FBgn0086904    17531      9887     25174      2.546          1.348
    ## 15309 FBgn0263749     5510      3113      7908      2.540          1.345
    ## 2103  FBgn0022893    15478      8754     22203      2.536          1.343
    ## 2076  FBgn0022268     5276      2989      7562      2.530          1.339
    ##            pval      padj
    ## 2388  1.483e-96 7.626e-93
    ## 126   3.985e-97 4.097e-93
    ## 13444 7.761e-91 2.660e-87
    ## 15309 5.640e-72 6.443e-69
    ## 2103  1.261e-89 3.241e-86
    ## 2076  1.574e-81 2.312e-78
    head(degsig[order(degsig$log2FoldChange, decreasing = FALSE), ])
    ##                id baseMean baseMeanA baseMeanB foldChange log2FoldChange
    ## 11475 FBgn0051953    12.78     19.42     6.146     0.3165        -1.6597
    ## 2685  FBgn0027513   160.39    189.03   131.764     0.6971        -0.5206
    ## 5844  FBgn0033781   245.78    281.27   210.280     0.7476        -0.4197
    ## 5682  FBgn0033539   546.01    624.37   467.646     0.7490        -0.4170
    ## 8947  FBgn0038348   258.89    295.20   222.584     0.7540        -0.4073
    ## 13333 FBgn0086251   529.43    600.73   458.118     0.7626        -0.3910
    ##            pval     padj
    ## 11475 2.557e-03 0.098108
    ## 2685  8.797e-04 0.035054
    ## 5844  1.432e-03 0.055784
    ## 5682  3.157e-05 0.001319
    ## 8947  1.942e-03 0.074794
    ## 13333 1.093e-04 0.004514
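In DESeq's result table, foldChange is baseMeanB / baseMeanA (here C2 over C1) and log2FoldChange is its base-2 logarithm, so positive values are genes higher in C2 and negative values genes lower in C2. The tiny sketch below, with a hypothetical function name, lets you check a row of the output by hand; zero means are handled the way R's arithmetic would treat them (Inf, NaN, -Inf).

```python
import math


def fold_changes(base_mean_a: float, base_mean_b: float):
    """Return (foldChange, log2FoldChange) for condition B relative to condition A."""
    if base_mean_a == 0:
        fc = math.inf if base_mean_b > 0 else math.nan  # x/0 -> Inf, 0/0 -> NaN, as in R
    else:
        fc = base_mean_b / base_mean_a
    if fc == 0:
        return fc, -math.inf                            # log2(0) -> -Inf, as in R
    return fc, math.log2(fc)


if __name__ == "__main__":
    # Row 2388 (FBgn0025682): baseMeanA = 6624, baseMeanB = 17565 (rounded values from the output).
    fc, lfc = fold_changes(6624, 17565)
    print(round(fc, 3), round(lfc, 3))  # 2.652 1.407
```

Because the printed base means are rounded, recomputed fold changes can differ from the table in the last digit.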

Find differentially expressed genes (DESeq) - cont.
14. Save the output to files:
    write.csv(deg, file = "Result_table.csv")
    write.csv(degsig, file = "Result_table_0.1FDR.csv")
You can open .csv files in a spreadsheet program such as Excel.

References
1. S. Anders, D. J. McCarthy, Y. S. Chen, M. Okoniewski, G. K. Smyth, W. Huber, M. D. Robinson, Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols 8, 1765-1786 (2013). doi:10.1038/nprot.2013.099
2. C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D. R. Kelley, H. Pimentel, S. L. Salzberg, J. L. Rinn, L. Pachter, Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562-578 (2012). doi:10.1038/nprot.2012.016
3. DESeq vignette: http://bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf