RNA-Seq Software, Tools, and Workflows

Size: px
Start display at page:

Download "RNA-Seq Software, Tools, and Workflows"

Transcription

1 RNA-Seq Software, Tools, and Workflows Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 2016 Workshop

2 Some mrna-seq Applications Differential gene expression analysis Transcriptional profiling Assumption: Changes in transcription/mrna levels correlate with phenotype (protein expression) Identification of splice variants Novel gene identification Transcriptome assembly SNP finding RNA editing

3 Experimental Design What biological question am I trying to answer? What types of samples (tissue, timepoints, etc.)? How much sequence do I need? Length of read? Platform? Single-end or paired-end? Barcoding? Pooling? Biological replicates: how many? Technical replicates: how many? Protocol considerations?

4 Strand-Specific (Directional) RNA-Seq Now the default for Illumina TruSeq kits Preserves orientation of RNA after reverse transcription to cdna Informs alignments to genome Determine which genomic DNA strand is transcribed Identify anti-sense transcription (e.g., lncrnas) Quantify expression levels more precisely Demarcate coding sequences in microbes with overlapping genes Very useful in transcriptome assemblies Allows precise construction of sense and anti-sense transcripts

5 Influence of the Organism Novel little/no previous sequencing (may need assembly) Non-Model some sequence available (draft genome or transcriptome assembly) Thousands of scaffolds, maybe tens of chromosomes Some annotation (ab initio, EST-based, etc.) Model genome fully sequenced and annotated Multiple genomes available for comparison Well-annotated transcriptome based on experimental evidence Genetic maps with markers available Basic research can be conducted to verify annotations (mutants available)

6 Differential Gene Expression Generalized Workflow

7 Read QC and Trimming Scythe Sickle Stringency of trimming depends on read length and downstream processing. Alignment can often tolerate some amount of contamination. Various sizes of trimmed reads can introduce biases in alignments. Transcriptome assembly works best with aggressive trimming.

8 QC and Alignment You may want to align your reads to a few different datasets, including rrna of your organism (or related organisms) Contaminant screening, if desirable Genome or transcriptome to generate counts for differential expression analysis First we ll go through alignment to a gene set/transcriptome, then we ll look at aligning to a genome

9 Alignment Choosing an Aligner (Pt. 1 Gene Set) Examples of gene sets: rrna sequences potential contaminant sequences (such as PhiX), a transcriptome (eukaryotic or bacterial), where each sequence in the fasta represents a different transcript (mrna) De novo transcriptome assembly A splicing-aware aligner is not needed bwa aln (<76 bp), bwa mem (>75 bp), or bowtie2 Requiring strand-specific orientation can be more difficult (requires filtering the bam file).

10 Library QC rrna Contamination mrna rrna rrna rrna mrna mrna Total RNA sample After PolyA isolation After rrna depletion Total RNA contains >95% rrna Poly A selected libraries should contain <5% rrna rrna-depleted libraries should contain <15% rrna Align reads to rrna sequences from organism or relatives Higher than expected or inconsistent rrna content may indicate degraded RNA or problems in library prep. Generally, don t need to remove rrna reads

11 Insert Size (determined from gene set alignments) Adapter ~60 bases cdna bases =Insert Size Adapter ~60 bases The insert is the cdna (or RNA) ligated between the adapters. Typical insert size is bases, but can be larger. Insert size distribution depends on library prep method. Overlapping paired-end reads Gapped paired-end reads Single-end read

12 Paired-End Reads Read INSERT SIZE (TLEN) Read 2 Read 1 INNER DISTANCE (+ or -) Read 2

13 Generating Raw Counts from Gene Set Alignments The sam/bam file can be filtered for just those pairs where both reads align in concordantly (proper pairs) to one transcript. sam2counts or samtools idxstats can be used for counting. Counts per transcript need to be added up to counts per gene before running stats software

14 Alignment to a Genome - Choosing a Reference Human/mouse: GENCODE (uses Ensembl IDs) ( but may need some manipulation to work with certain software Ensembl genomes ( and Biomart ( Illumina igenomes ( sequencing_software/igenome.html) provides genome indexes for some software, and files with extra info for Tophat/cufflinks. NCBI genomes ( Many specialized databases (Phytozome, Patric, VectorBase, FlyBase, WormBase) Do it yourself genome assembly and gene-finding (don t forget functional annotation)

15 Alignment Choosing an Aligner (Pt. 2 Genome) Eukaryotic genes are separated into exons and introns, so if you are using a genome reference, you need an aligner that is either splicing-aware or won t choke on spliced RNA reads. A splicing-aware aligner will recognize the difference between a short insert and a read that aligns across exon-intron boundaries Spliced Reads (joined by dashed line) Read Pairs (joined by solid line)

16 (Some) Splicing-Aware or Splicing-Agnostic Aligners Tophat2 (bowtie2) what we ll be using in these exercises. (remember, igenomes); also integrates with cufflinks to find novel genes and transcripts HISAT/HISAT2 successor to tophat2, and also works for alignment of populations (i.e., multiple human samples). I haven t tried this yet. It s on our Galaxy try it out and report what happens. GSNAP I haven t used this; can handle (and produce) non-standard file formats. STAR this is what I use. It s very fast and doesn t need a separate counting step. But it does need a lot of memory (>128G RAM to index a mammalian genome). bwa mem a local aligner, is not splicing-aware, but can handle local alignments (doesn t require the entire read to align).

17 Generating Raw Counts from Genome Alignments Reads aligning to a chromosome or scaffold must be parsed to assign them to a gene. We ll need a roadmap to indicate where each gene (exon) is located in the genome

18 Generating Raw Counts from Genome Alignments This roadmap is the GTF (Gene Transfer Format) file: The left columns list source, feature type, and genomic coordinates The right column includes attributes, including gene ID, etc. Let s look at the GTF fields in more detail

19 Fields in the GTF File (left to right) Sequence Name (i.e., chromosome, scaffold, etc.) chr12 Source (program that generated the gtf file or feature) unknown Feature (i.e., gene, exon, CDS, start codon, stop codon) CDS Start (starting location on sequence) End (end position on sequence) Score. Strand (+ or -) + Frame (0, 1, or 2: which is first base in codon, zero-based) 2 Attribute ( ; -delimited list of tags with additional info) gene_id "PRMT8"; gene_name "PRMT8"; p_id "P10933"; transcript_id "NM_019854"; tss_id "TSS4368";

20 An Unusual GTF File

21 Generating Gene Counts from Genome Alignments HTSeq-count runs rather slowly; doesn t work well on coordinate-sorted bam files. FeatureCounts Runs faster, and will work on coordinate-sorted bam files. Has more parameters than HTSeq-count. STAR Will generate counts equivalent to HTSeq-count with default parameters, in the same run as alignment. All of these can utilize strand-specific information in GTF file to assign reads from strand-specific libraries. But what about those reads that can t be assigned unambiguously to a gene?

22 Multiple-Mapping Reads ( Multireads ) Some reads will align to more than one place in the reference, due to: Common domains, gene families Paralogs, pseudogenes, polyploidy etc. This can distort counts, leading to misleading expression levels If a read can t be uniquely mapped, how should it be counted? Should it be ignored (not counted at all)? Should it be randomly assigned to one location among all the locations to which it aligns equally well? This may depend on the question you re asking and also depends on the alignment and counting software you use.

23 Reads That Align to Gene Features From: This read unambiguously aligns to Gene A This read also uniquely aligns to Gene A This read aligns uniquely, but could be unspliced This spliced read uniquely aligns to Gene A Genes A and B overlap, but the read is uniquely within Gene A Genes A and B overlap, the read is within Gene A, but overlaps Gene B Genes A and B overlap, and the read ambiguously maps to both genes

24 Read Count Definitions When using tools such as HTSeq-count, FeatureCounts, and STAR, the output table will include counts of reads that can be assigned to each gene and some additional lines for reads that can t be assigned, including: Ambiguous read overlaps more than one gene Multiple Mapping read maps equally well to more than one place in the genome No Feature read maps uniquely to genome, but in an intergenic (or intronic) region Unmapped read cannot be mapped to genome (based on run parameters)

25 Examples From Two Recent Experiments Genome Human (hg38) Mouse (mm10) Annotation GENCODE GENCODE Library Prep rrna depletion polya selection rrna 8.8% 5.6% Aligned to Genome 98.6% 98.1% Assigned to Genes (exons) 51.8% 84.1% Ambiguous 2.2% 1.3% Multimapping 12.9% 6.2% No Feature 31.6% 6.5% Unmapped 1.4% 1.9%

26 How Well Did Your Reads Align to the Reference? % Reads Mapped 100 Calculate percentage of reads mapped per sample Great! Good Incomplete Reference? Sample Contamination?

27 Checking Your Results Key genes that may confirm sample ID Knock-out or knock-down genes Genes identified in previous research Specific genes of interest Hypothesis testing Important pathways Experimental validation (e.g., qrt-pcr) The best way to determine if your analysis protocol accurately models your organism/experiment Ideally, validation should be conducted on a different set of samples

28 Differential Gene Expression Generalized Workflow Bioinformatics analyses are in silico experiments The tools and parameters you choose will be influenced by factors including: Available reference/annotation Experimental design (e.g., pairwise vs. multi-factor) The right tools are the ones that best inform on your experiment Don t just shop for methods that give you the answer you want

29 Differential Gene Expression Generalized Workflow File Types Fastq Fasta (reference) gtf (gene features) sam/bam Text tables

30 Today s Exercises Differential Gene Expression Today, we ll analyze a few types of data you ll typically encounter: 1. Single-end reads ( tag counting with well-annotated genomes) using RNA-Seq data from the same experiment as yesterday s ChIP-Seq data. 2. Paired-end reads (finding novel transcripts in a genome with incomplete annotation) 3. Paired-end reads for differential gene expression when only a transcriptome is available (such as after a transcriptome assembly)

31 Today s Exercises Differential Gene Expression And we ll be using a few different software: 1. Tophat to align spliced reads to a genome 2. FeatureCounts to generate raw counts tables for 3. edger, for differential expression of genes 4. Cufflinks to find novel genes and transcripts 5. Bowtie2 to align reads to a transcriptome reference 6. sam2counts to extract raw counts from bowtie2 alignments

32 Gene Construction

33 Alignment and Differential Expression Read set(s) TopHat bam file(s) We will follow these steps in the first exercise. Existing annotation (GTF) Toptables, etc. edger/limma FeatureCounts

34 But, do we have all the genes? For organisms with genomes, gene models are stored in gtf files Assumptions: The gtf file contains annotation for ALL transcripts and genes All splice sites, start/stop codons, etc. are correct Are these assumptions correct for every sequenced organism? RNA-Seq reads can be used to independently construct genes and splice variants using limited or no annotation Method used depends on how much sequence information there is for the organism

35 Gene Construction (Alignment) vs. Assembly Genome- Sequenced Organisms Trinity software Novel or Non-Model Organisms Haas and Zody (2010) Nat. Biotech. 28:421-3

36 Gene / Transcriptome Construction Annotation can be improved even for well-annotated model organisms Identify all expressed exons Combine expressed exons into genes Find all splice variants for a gene Discover novel transcripts For newly sequenced organisms Validate ab initio annotation Comparison between different annotation sets Can assist in finding some types of contamination Reconstruction of rrna genes Genomic/mitochondrial DNA in RNA library preps.

37 Reference Annotation Based Transcript (RABT) Assembly Read set(s) TopHat bam file(s) Existing annotation (GTF) [optional] Cufflinks Cuffmerge Cuffcompare Read-set specific GTF(s) Merged GTF Final assembly (GTF and stats) Toptables, etc. edger/limma FeatureCounts

38 A Few Words About Bacterial RNA-Seq

39 Eukaryotic and Bacterial Gene Structures are Different Eukaryotes Gene structure includes introns and exons Splicing, poly-adenylation Each mrna is a discrete molecule when translated Bacteria / Prokaryotes Individual genes and groups of genes in operons Generally, no splicing, no polya One mrna can contain coding sequences for multiple proteins

40 Bacterial RNA-Seq Considerations rrna depletion strategies may leave considerable amounts of non-coding RNA molecules Splicing-aware aligners (such as Tophat) may not be useful Reads from polycistronic mrna may overlap two genes How would HTSeq-Count or FeatureCounts handle this? Compare alignments to the genome to alignments to transcriptome. Some aligners, such as bwa-mem, will report secondary alignments Transcriptome alignments can be used to generate counts table for edger Specialized software, such as Rockhopper (stand-alone,

Frequently Asked Questions Next Generation Sequencing

Frequently Asked Questions Next Generation Sequencing Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided

More information

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc. New Technologies for Sensitive, Low-Input RNA-Seq Clontech Laboratories, Inc. Outline Introduction Single-Cell-Capable mrna-seq Using SMART Technology SMARTer Ultra Low RNA Kit for the Fluidigm C 1 System

More information

Introduction to NGS data analysis

Introduction to NGS data analysis Introduction to NGS data analysis Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Sequencing Illumina platforms Characteristics: High

More information

RNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012

RNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012 RNA-Seq Tutorial 1 John Garbe Research Informatics Support Systems, MSI March 19, 2012 Tutorial 1 RNA-Seq Tutorials RNA-Seq experiment design and analysis Instruction on individual software will be provided

More information

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium I. Introduction: Sequence based assays of transcriptomes (RNA-seq) are in wide use because of their favorable

More information

PreciseTM Whitepaper

PreciseTM Whitepaper Precise TM Whitepaper Introduction LIMITATIONS OF EXISTING RNA-SEQ METHODS Correctly designed gene expression studies require large numbers of samples, accurate results and low analysis costs. Analysis

More information

G E N OM I C S S E RV I C ES

G E N OM I C S S E RV I C ES GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E

More information

NGS Data Analysis: An Intro to RNA-Seq

NGS Data Analysis: An Intro to RNA-Seq NGS Data Analysis: An Intro to RNA-Seq March 25th, 2014 GST Colloquim: March 25th, 2014 1 / 1 Workshop Design Basics of NGS Sample Prep RNA-Seq Analysis GST Colloquim: March 25th, 2014 2 / 1 Experimental

More information

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data The Illumina TopHat Alignment and Cufflinks Assembly and Differential Expression apps make RNA data analysis accessible to any user, regardless

More information

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript

More information

GenBank, Entrez, & FASTA

GenBank, Entrez, & FASTA GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,

More information

Challenges associated with analysis and storage of NGS data

Challenges associated with analysis and storage of NGS data Challenges associated with analysis and storage of NGS data Gabriella Rustici Research and training coordinator Functional Genomics Group gabry@ebi.ac.uk Next-generation sequencing Next-generation sequencing

More information

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249

More information

Comparing Methods for Identifying Transcription Factor Target Genes

Comparing Methods for Identifying Transcription Factor Target Genes Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF

More information

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) A typical RNA Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,

More information

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis By the end of this lab students should be able to: Describe the uses for each line of the DNA subway program (Red/Yellow/Blue/Green) Describe

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

Basic processing of next-generation sequencing (NGS) data

Basic processing of next-generation sequencing (NGS) data Basic processing of next-generation sequencing (NGS) data Getting from raw sequence data to expression analysis! 1 Reminder: we are measuring expression of protein coding genes by transcript abundance

More information

RNA Express. Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance

RNA Express. Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance RNA Express Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance ILLUMINA PROPRIETARY 15052918 Rev. A February 2014 This document and its contents are

More information

1 Mutation and Genetic Change

1 Mutation and Genetic Change CHAPTER 14 1 Mutation and Genetic Change SECTION Genes in Action KEY IDEAS As you read this section, keep these questions in mind: What is the origin of genetic differences among organisms? What kinds

More information

Using Galaxy for NGS Analysis. Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org

Using Galaxy for NGS Analysis. Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org Using Galaxy for NGS Analysis Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org Overview NGS Data Galaxy tools for NGS Data Galaxy for Sequencing Facilities Overview

More information

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want 1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very

More information

Version 5.0 Release Notes

Version 5.0 Release Notes Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing

More information

Searching Nucleotide Databases

Searching Nucleotide Databases Searching Nucleotide Databases 1 When we search a nucleic acid databases, Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from the forward strand and 3 reading frames

More information

The world of non-coding RNA. Espen Enerly

The world of non-coding RNA. Espen Enerly The world of non-coding RNA Espen Enerly ncrna in general Different groups Small RNAs Outline mirnas and sirnas Speculations Common for all ncrna Per def.: never translated Not spurious transcripts Always/often

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources 1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools

More information

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat Bioinformatique et Séquençage Haut Débit, Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat 1 RNA Transcription to RNA and subsequent

More information

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Introduction Mate pair sequencing enables the generation of libraries with insert sizes in the range of several kilobases (Kb).

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

Analysis of ChIP-seq data in Galaxy

Analysis of ChIP-seq data in Galaxy Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers

More information

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute Data Analysis & Management of High-throughput Sequencing Data Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute Current Issues Current Issues The QSEQ file Number files per

More information

Gene Models & Bed format: What they represent.

Gene Models & Bed format: What they represent. GeneModels&Bedformat:Whattheyrepresent. Gene models are hypotheses about the structure of transcripts produced by a gene. Like all models, they may be correct, partly correct, or entirely wrong. Typically,

More information

LifeScope Genomic Analysis Software 2.5

LifeScope Genomic Analysis Software 2.5 USER GUIDE LifeScope Genomic Analysis Software 2.5 Graphical User Interface DATA ANALYSIS METHODS AND INTERPRETATION Publication Part Number 4471877 Rev. A Revision Date November 2011 For Research Use

More information

Introduction to Genome Annotation

Introduction to Genome Annotation Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT

More information

mrna NGS Data Analysis Report

mrna NGS Data Analysis Report mrna NGS Data Analysis Report Project: Test Project (Ref code: 00001) Customer: Test customer Company/Institute: Exiqon Date: Monday, June 29, 2015 Performed by: XploreRNA Exiqon A/S Company Reg. No. (CVR)

More information

Human Genome Organization: An Update. Genome Organization: An Update

Human Genome Organization: An Update. Genome Organization: An Update Human Genome Organization: An Update Genome Organization: An Update Highlights of Human Genome Project Timetable Proposed in 1990 as 3 billion dollar joint venture between DOE and NIH with 15 year completion

More information

Deep Sequencing Data Analysis

Deep Sequencing Data Analysis Deep Sequencing Data Analysis Ross Whetten Professor Forestry & Environmental Resources Background Who am I, and why am I teaching this topic? I am not an expert in bioinformatics I started as a biologist

More information

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design) Experimental Design & Intro to NGS Data Analysis Ryan Peters Field Application Specialist Partek, Incorporated Agenda Experimental Design Examples ANOVA What assays are possible? NGS Analytical Process

More information

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Focusing on results not data comprehensive data analysis for targeted next generation sequencing Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes

More information

Module 1. Sequence Formats and Retrieval. Charles Steward

Module 1. Sequence Formats and Retrieval. Charles Steward The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.

More information

Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d. 13 Multiple Choice RNA and Protein Synthesis Chapter Test A Write the letter that best answers the question or completes the statement on the line provided. 1. Which of the following are found in both

More information

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each

More information

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism )

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Biology 1406 Exam 3 Notes Structure of DNA Ch. 10 Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Proteins

More information

17 July 2014 WEB-SERVER MANUAL. Contact: Michael Hackenberg (hackenberg@ugr.es)

17 July 2014 WEB-SERVER MANUAL. Contact: Michael Hackenberg (hackenberg@ugr.es) WEB-SERVER MANUAL Contact: Michael Hackenberg (hackenberg@ugr.es) 1 1 Introduction srnabench is a free web-server tool and standalone application for processing small- RNA data obtained from next generation

More information

Introduction. Overview of Bioconductor packages for short read analysis

Introduction. Overview of Bioconductor packages for short read analysis Overview of Bioconductor packages for short read analysis Introduction General introduction SRAdb Pseudo code (Shortread) Short overview of some packages Quality assessment Example sequencing data in Bioconductor

More information

Subread/Rsubread Users Guide

Subread/Rsubread Users Guide Subread/Rsubread Users Guide Subread v1.5.0-p1/rsubread v1.20.3 1 February 2016 Wei Shi and Yang Liao Bioinformatics Division The Walter and Eliza Hall Institute of Medical Research The University of Melbourne

More information

Expression Quantification (I)

Expression Quantification (I) Expression Quantification (I) Mario Fasold, LIFE, IZBI Sequencing Technology One Illumina HiSeq 2000 run produces 2 times (paired-end) ca. 1,2 Billion reads ca. 120 GB FASTQ file RNA-seq protocol Task

More information

CIDANE: comprehensive isoform discovery and abundance estimation

CIDANE: comprehensive isoform discovery and abundance estimation Canzar et al. Genome Biology (2016) 17:16 DOI 10.1186/s13059-015-0865-0 METHOD Open Access CIDANE: comprehensive isoform discovery and abundance estimation Stefan Canzar 1,4 *, Sandro Andreotti 2, David

More information

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE QUALITY OF BIOTECHNOLOGICAL PRODUCTS: ANALYSIS

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Services. Updated 05/31/2016

Services. Updated 05/31/2016 Updated 05/31/2016 Services 1. Whole exome sequencing... 2 2. Whole Genome Sequencing (WGS)... 3 3. 16S rrna sequencing... 4 4. Customized gene panels... 5 5. RNA-Seq... 6 6. qpcr... 7 7. HLA typing...

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams.

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams. Module 3 Questions Section 1. Essay and Short Answers. Use diagrams wherever possible 1. With the use of a diagram, provide an overview of the general regulation strategies available to a bacterial cell.

More information

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine

More information

TGC AT YOUR SERVICE. Taking your research to the next generation

TGC AT YOUR SERVICE. Taking your research to the next generation TGC AT YOUR SERVICE Taking your research to the next generation 1. TGC At your service 2. Applications of Next Generation Sequencing 3. Experimental design 4. TGC workflow 5. Sample preparation 6. Illumina

More information

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

An example of bioinformatics application on plant breeding projects in Rijk Zwaan An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on

More information

Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

More information

Recombinant DNA and Biotechnology

Recombinant DNA and Biotechnology Recombinant DNA and Biotechnology Chapter 18 Lecture Objectives What Is Recombinant DNA? How Are New Genes Inserted into Cells? What Sources of DNA Are Used in Cloning? What Other Tools Are Used to Study

More information

Structure and Function of DNA

Structure and Function of DNA Structure and Function of DNA DNA and RNA Structure DNA and RNA are nucleic acids. They consist of chemical units called nucleotides. The nucleotides are joined by a sugar-phosphate backbone. The four

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Technology and applications 10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven 1 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977

More information

Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College

Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College Primary Source for figures and content: Eastern Campus Tortora, G.J. Microbiology

More information

-> Integration of MAPHiTS in Galaxy

-> Integration of MAPHiTS in Galaxy Enabling NGS Analysis with(out) the Infrastructure, 12:0512 Development of a workflow for SNPs detection in grapevine From Sets to Graphs: Towards a Realistic Enrichment Analy species: MAPHiTS -> Integration

More information

RNA- seq de novo ABiMS

RNA- seq de novo ABiMS RNA- seq de novo ABiMS Cleaning 1. import des données d'entrée depuis Data Libraries : Shared Data Data Libraries RNA- seq de- novo 2. lancement des programmes de nettoyage pas à pas BlueLight.sample.read1.fastq

More information

Computational Genomics. Next generation sequencing (NGS)

Computational Genomics. Next generation sequencing (NGS) Computational Genomics Next generation sequencing (NGS) Sequencing technology defies Moore s law Nature Methods 2011 Log 10 (price) Sequencing the Human Genome 2001: Human Genome Project 2.7G$, 11 years

More information

13.4 Gene Regulation and Expression

13.4 Gene Regulation and Expression 13.4 Gene Regulation and Expression Lesson Objectives Describe gene regulation in prokaryotes. Explain how most eukaryotic genes are regulated. Relate gene regulation to development in multicellular organisms.

More information

Next generation sequencing (NGS)

Next generation sequencing (NGS) Next generation sequencing (NGS) Vijayachitra Modhukur BIIT modhukur@ut.ee 1 Bioinformatics course 11/13/12 Sequencing 2 Bioinformatics course 11/13/12 Microarrays vs NGS Sequences do not need to be known

More information

Core Facility Genomics

Core Facility Genomics Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray

More information

How Sequencing Experiments Fail

How Sequencing Experiments Fail How Sequencing Experiments Fail v1.0 Simon Andrews simon.andrews@babraham.ac.uk Classes of Failure Technical Tracking Library Contamination Biological Interpretation Something went wrong with a machine

More information

Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION. Professor Bharat Patel Office: Science 2, 2.36 Email: b.patel@griffith.edu.

Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION. Professor Bharat Patel Office: Science 2, 2.36 Email: b.patel@griffith.edu. Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION Professor Bharat Patel Office: Science 2, 2.36 Email: b.patel@griffith.edu.au What is Gene Expression & Gene Regulation? 1. Gene Expression

More information

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Xiaohui Xie 1, Jun Lu 1, E. J. Kulbokas 1, Todd R. Golub 1, Vamsi Mootha 1, Kerstin Lindblad-Toh

More information

European Medicines Agency

European Medicines Agency European Medicines Agency July 1996 CPMP/ICH/139/95 ICH Topic Q 5 B Quality of Biotechnological Products: Analysis of the Expression Construct in Cell Lines Used for Production of r-dna Derived Protein

More information

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!! DNA Replication & Protein Synthesis This isn t a baaaaaaaddd chapter!!! The Discovery of DNA s Structure Watson and Crick s discovery of DNA s structure was based on almost fifty years of research by other

More information

Genome and DNA Sequence Databases. BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2009

Genome and DNA Sequence Databases. BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2009 Genome and DNA Sequence Databases BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2009 Admin Reading: Chapters 1 & 2 Notes available in PDF format on-line (see class calendar page): http://www.soe.ucsc.edu/classes/bme110/spring09/bme110-calendar.html

More information

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

More information

Text file One header line meta information lines One line : variant/position

Text file One header line meta information lines One line : variant/position Software Calling: GATK SAMTOOLS mpileup Varscan SOAP VCF format Text file One header line meta information lines One line : variant/position ##fileformat=vcfv4.1! ##filedate=20090805! ##source=myimputationprogramv3.1!

More information

Bioinformatics Unit Department of Biological Services. Get to know us

Bioinformatics Unit Department of Biological Services. Get to know us Bioinformatics Unit Department of Biological Services Get to know us Domains of Activity IT & programming Microarray analysis Sequence analysis Bioinformatics Team Biostatistical support NGS data analysis

More information

The Galaxy workflow. George Magklaras PhD RHCE

The Galaxy workflow. George Magklaras PhD RHCE The Galaxy workflow George Magklaras PhD RHCE Biotechnology Center of Oslo & The Norwegian Center of Molecular Medicine University of Oslo, Norway http://www.biotek.uio.no http://www.ncmm.uio.no http://www.no.embnet.org

More information

Analysis of NGS Data

Analysis of NGS Data Analysis of NGS Data Introduction and Basics Folie: 1 Overview of Analysis Workflow Images Basecalling Sequences denovo - Sequencing Assembly Annotation Resequencing Alignments Comparison to reference

More information

Understanding West Nile Virus Infection

Understanding West Nile Virus Infection Understanding West Nile Virus Infection The QIAGEN Bioinformatics Solution: Biomedical Genomics Workbench (BXWB) + Ingenuity Pathway Analysis (IPA) Functional Genomics & Predictive Medicine, May 21-22,

More information

Translation Study Guide

Translation Study Guide Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to

More information

Overview of Eukaryotic Gene Prediction

Overview of Eukaryotic Gene Prediction Overview of Eukaryotic Gene Prediction CBB 231 / COMPSCI 261 W.H. Majoros What is DNA? Nucleus Chromosome Telomere Centromere Cell Telomere base pairs histones DNA (double helix) DNA is a Double Helix

More information

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. To published results faster With proven scalability To the forefront of discovery To limitless applications

More information

Biological Sequence Data Formats

Biological Sequence Data Formats Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA

More information

Data Analysis for Ion Torrent Sequencing

Data Analysis for Ion Torrent Sequencing IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page

More information

A survey of best practices for RNA-seq data analysis

A survey of best practices for RNA-seq data analysis Conesa et al. Genome Biology (2016) 17:13 DOI 10.1186/s13059-016-0881-8 REVIEW A survey of best practices for RNA-seq data analysis Open Access Ana Conesa 1,2*, Pedro Madrigal 3,4*, Sonia Tarazona 2,5,

More information

A Complete Example of Next- Gen DNA Sequencing Read Alignment. Presentation Title Goes Here

A Complete Example of Next- Gen DNA Sequencing Read Alignment. Presentation Title Goes Here A Complete Example of Next- Gen DNA Sequencing Read Alignment Presentation Title Goes Here 1 FASTQ Format: The de- facto file format for sharing sequence read data Sequence and a per- base quality score

More information

New solutions for Big Data Analysis and Visualization

New solutions for Big Data Analysis and Visualization New solutions for Big Data Analysis and Visualization From HPC to cloud-based solutions Barcelona, February 2013 Nacho Medina imedina@cipf.es http://bioinfo.cipf.es/imedina Head of the Computational Biology

More information

Genomes and SNPs in Malaria and Sickle Cell Anemia

Genomes and SNPs in Malaria and Sickle Cell Anemia Genomes and SNPs in Malaria and Sickle Cell Anemia Introduction to Genome Browsing with Ensembl Ensembl The vast amount of information in biological databases today demands a way of organising and accessing

More information

mygenomatix - secure cloud for NGS analysis

mygenomatix - secure cloud for NGS analysis mygenomatix Speed. Quality. Results. mygenomatix - secure cloud for NGS analysis background information & contents 2011 Genomatix Software GmbH Bayerstr. 85a 80335 Munich Germany info@genomatix.de www.genomatix.de

More information

A Primer of Genome Science THIRD

A Primer of Genome Science THIRD A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:

More information

GENE REGULATION. Teacher Packet

GENE REGULATION. Teacher Packet AP * BIOLOGY GENE REGULATION Teacher Packet AP* is a trademark of the College Entrance Examination Board. The College Entrance Examination Board was not involved in the production of this material. Pictures

More information

HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT

HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT Kimberly Bishop Lilly 1,2, Truong Luu 1,2, Regina Cer 1,2, and LT Vishwesh Mokashi 1 1 Naval Medical Research Center, NMRC Frederick, 8400 Research Plaza,

More information

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains Proteins From DNA to Protein Chapter 13 All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequence of a gene The Path From Genes

More information

PrimePCR Assay Validation Report

PrimePCR Assay Validation Report Gene Information Gene Name Gene Symbol Organism Gene Summary Gene Aliases RefSeq Accession No. UniGene ID Ensembl Gene ID papillary renal cell carcinoma (translocation-associated) PRCC Human This gene

More information

REAL TIME PCR USING SYBR GREEN

REAL TIME PCR USING SYBR GREEN REAL TIME PCR USING SYBR GREEN 1 THE PROBLEM NEED TO QUANTITATE DIFFERENCES IN mrna EXPRESSION SMALL AMOUNTS OF mrna LASER CAPTURE SMALL AMOUNTS OF TISSUE PRIMARY CELLS PRECIOUS REAGENTS 2 THE PROBLEM

More information

Normalization of RNA-Seq

Normalization of RNA-Seq Normalization of RNA-Seq Davide Risso Modified: April 27, 2012. Compiled: April 27, 2012 1 Retrieving the data Usually, an RNA-Seq data analysis from scratch starts with a set of FASTQ files (see e.g.

More information

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert May 3, 2016 Abstract FlipFlop implements a fast method for de novo transcript

More information

RAST Automated Analysis. What is RAST for?

RAST Automated Analysis. What is RAST for? RAST Automated Analysis Gordon D. Pusch Fellowship for Interpretation of Genomes What is RAST for? RAST is designed to rapidly call and annotate the genes of a complete or essentially complete prokaryotic

More information

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized: Module 3F Protein Synthesis So far in this unit, we have examined: How genes are transmitted from one generation to the next Where genes are located What genes are made of How genes are replicated How

More information