17 July 2014 WEB-SERVER MANUAL. Contact: Michael Hackenberg (hackenberg@ugr.es)





1 Introduction

sRNAbench is a free web-server tool and standalone application for processing small RNA data obtained from next-generation sequencing platforms such as Illumina or SOLiD. sRNAbench replaces miRanalyzer. This short tutorial is a quick start for the web-server. For further details, and for the complete sRNAbench functionality, see the main manual and install the standalone version of the software.

2 Main menu

Figure 1: Main sRNAbench menu.

- Home: sRNAbench main web page with basic information, releases, etc.
- Restart: clears the sRNA analysis window (see Analysis window).
- Web Manual: link to this manual.
- Manual: main manual of sRNAbench.
- Differential Expression: differential expression analysis of sRNAs (see Differential Expression).
- Helper Tools: tools to parse Ensembl and NCBI formats (see Helper Tools).
- Cite: software publication.
- FAQs: link to frequently asked questions.

3 Analysis window

Figure 2: Analysis window.

3.1 Input data

Figure 3: Input data.

A dataset can be provided by uploading a file from the local computer (i) or by means of a URL (ii). Large files must be gzip compressed and provided by URL, because a file size limit applies to uploads made with the POST method.

Several input formats are accepted. In general, all formats can be compressed with gzip.

WARNING: Compressed files need the 'gz' extension.

- fastq (or fastq.gz).
- read count format: tab-separated format with the read sequence in the first column and the read count in the second.
- fasta: the identifier field of the fasta format must encode the read count. In general, sRNAbench expects >readid#read_count (using '#' as separator).
- Bowtie alignment files: these files must have the 'bowtieout' extension.
- sra files.

3.2 Select species

Figure 4: Select species.

sRNAbench can be used in two different modes: genome mapping (ii) or library mapping (i.a or i.b).

NOTE: sRNAbench can treat an unlimited number of species simultaneously. The main application of this feature is probably the analysis of virus infection and the study of host/parasite interactions.

i. There are two ways to use library mode:
   a. A species is selected and the 'Do not map to genome (Library mode)' checkbox is activated. sRNAbench then uses the annotations from the sRNAbench database for the selected species when mapping the reads.
   b. No species is selected and the 'Do not map to genome (Library mode)' checkbox is not activated. In this case, sRNAbench only analyses microRNAs (see MicroRNA analysis).
ii. If a species is selected and the 'Do not map to genome (Library mode)' checkbox is not activated, sRNAbench uses the genome mapping mode.
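The read count and fasta inputs listed under Input data can be produced from a plain list of reads. The following is a minimal Python sketch of that conversion, not part of sRNAbench itself; the function names and the "read" identifier prefix are hypothetical, and only the two layouts (sequence, tab, count; and >readid#read_count) come from this manual.

```python
from collections import Counter


def collapse_reads(reads):
    """Count identical read sequences, most frequent first."""
    counts = Counter(reads)
    # Sort by descending count, then by sequence, to get a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_read_count_format(collapsed):
    """Tab-separated text: read sequence in column 1, read count in column 2."""
    return "".join(f"{seq}\t{count}\n" for seq, count in collapsed)


def to_fasta_format(collapsed, prefix="read"):
    """fasta text whose identifier encodes the count as >readid#read_count."""
    lines = []
    for index, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{prefix}{index}#{count}")  # '#' separates id and count
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Illustrative reads only; real input would come from a sequencing run.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "ACTGGACTTGGAGTCAGAAGG",
]
collapsed = collapse_reads(reads)
print(to_read_count_format(collapsed), end="")
print(to_fasta_format(collapsed), end="")
```

Either output can then be gzip compressed (keeping the 'gz' extension) before it is uploaded or served from a URL.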

3.3 Adapter removal

Figure 5: Adapter removal.

sRNAbench can perform adapter trimming. By default, the web-server version searches for the first 10 bases of the adapter (ii), allowing a maximum of 1 mismatch (iv). It is recommended to provide the adapter sequence (v) or to select one of the options offered by the application (iii), which are the adapters most commonly used in microRNA analysis: Illumina RA3, Illumina (alternative) or SOLiD (SREK).

If the adapter is not known, the 'guess the adapter sequence' option (i) should be activated, although this is not recommended. sRNAbench then aligns the first 250,000 reads to the genome using the Bowtie seed functionality (so that the adapter bases do not count as mismatches). Out of all aligned reads, the adapter sequence is defined as the most frequent 10-mer starting at the first mismatch.

Finally, when the adapter is sequenced at the very end of the read, the sequenced part of the adapter is sometimes shorter than the length threshold (ii). In that case it must be searched for recursively, without taking the minimum length into account (vi).

NOTE: recursive adapter trimming is crucial when the reads have a length of 36 bp and small RNA populations between 27 and 34 bp are to be analysed.

3.4 MicroRNA analysis

Figure 6: microRNA analysis.

By default, the microRNA analysis is done for all the species selected in the Select species (ii) step (i). During the analysis, sRNAbench usually also tries to assign the reads to other sRNA types. However, if no species is selected and the 'Do not map to genome (Library mode)' checkbox is not activated, as described in Select species (i.b), sRNAbench only analyses microRNAs. In this case, the species names in miRBase nomenclature (such as 'hsa', 'ebv', etc.) must be provided in the microRNA analysis menu (ii).

In addition, sRNAbench can try to detect putative homologous microRNAs based on sequence similarity. This option is activated in the text field 'Analyse homologous microRNAs' (iii), either by providing short species names separated by ':' (for example 'hsa:rno:mmu') or by typing 'all' (which uses the entire miRBase database, except the species included in (ii) or those from the Select species (ii) step). By default, this text field is empty and homologous microRNAs are not analysed.

3.5 Parameters

Figure 7: Alignment parameters.

The sRNAbench server also allows choosing the parameters that will be used during the alignment process:

(i). Fastq input format: SOLiD (activated) or Illumina (default).
(ii). Length of the 5' end of the read that is aligned to either the genome or the libraries (seed).
(iii). Read count threshold: reads with lower counts are filtered out.
(iv). Alignment type: seed alignment ('n' mode) or alignment of the whole read ('v' mode). NOTE: the seed length is ignored if the 'v' mode is chosen.
(v). Number of mismatches allowed during the alignment.
(vi). Barcode trimming: number of nucleotides to remove from the 5' end of the reads before the alignment.
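To make the interplay of barcode trimming and the Adapter removal options more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not sRNAbench's implementation: the function names are hypothetical, the adapter string is only an example value, and the recursive step is interpreted here as an exact match of progressively shorter adapter prefixes at the 3' end of the read.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(read, adapter, min_len=10, max_mm=1, recursive=False):
    """Return the index at which the adapter starts in `read`, or None.

    Step 1: look for the first `min_len` adapter bases anywhere in the read,
    allowing up to `max_mm` mismatches.
    Step 2 (only if `recursive`): look for shorter adapter prefixes sitting
    at the very 3' end of the read. Exact matching is an assumption of this
    sketch; very short prefixes can match by chance.
    """
    seed = adapter[:min_len]
    for start in range(len(read) - len(seed) + 1):
        if hamming(read[start:start + len(seed)], seed) <= max_mm:
            return start
    if recursive:
        for k in range(min(len(seed), len(read)) - 1, 0, -1):
            if read.endswith(adapter[:k]):
                return len(read) - k
    return None


def trim_read(read, adapter, barcode_len=0, **adapter_options):
    """Remove a 5' barcode of fixed length, then cut the read at the adapter."""
    insert = read[barcode_len:]
    pos = find_adapter(insert, adapter, **adapter_options)
    return insert if pos is None else insert[:pos]


ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # example adapter sequence (assumption)

# 22 nt small RNA followed by 14 adapter bases: found by the default search.
read_a = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER[:14]
print(trim_read(read_a, ADAPTER))

# 5 nt barcode + 27 nt insert + only 4 adapter bases in a 36 nt read:
# the default search finds nothing, the recursive search recovers the insert.
read_b = "TCAAG" + "ACGTACGTACGTACGTACGTACGTACC" + ADAPTER[:4]
print(trim_read(read_b, ADAPTER, barcode_len=5, recursive=True))
```

The second read shows why recursive trimming matters for long inserts in short reads: without it the 4 remaining adapter bases would stay attached to the insert.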

3.6 Upload user files

Figure 8: Upload user library files.

The sRNAbench web-server allows the user to upload library files that are not included on the server, for example a new microRNA library for a species missing from the database, or libraries for other RNA types that are not included either. The libraries can be uploaded from a file on the local computer or by means of a URL; the accepted formats are fasta and bed.

3.7 Working example

Figure 9: Working example.

To show the usefulness of sRNAbench, we processed a publicly available small RNA dataset from the BC-1 cell line. BC-1 is a primary effusion lymphoma (PEL) of human B-cell origin; PEL is caused by Kaposi's sarcoma-associated herpesvirus (herpesvirus type 8, HHV-8) and frequently also harbors Epstein-Barr virus (EBV). The expression of HHV-8 and EBV microRNAs in PELs suggests a role for these microRNAs in viral latency and lymphomagenesis.

A brief description of the protocol can be found in the dataset link: 18-25nt long small RNAs were gel purified from 50 mg total RNA and subjected to small RNA cDNA library preparation protocol with barcoded 5' adaptors (BC-1: TCAAG, BC-3: TTGGC, BCBL-1: GCCTA). Resulting PCR products were purified from 10% TBE gels, pooled and sequenced on one lane on the Illumina GAII platform.

Given the dataset description and the protocol, we process this dataset with the following steps:

- Copy and paste the BC-1 dataset link into the URL input data textbox (see Input data).
- As the dataset could contain miRNAs from three different species, all of them must be chosen in the Select species menu: human (hg19), Epstein Barr Virus (NC_007605) and Human herpesvirus 8 (KSHV) (NC_009333).
- The Illumina GAII platform was used to sequence the sRNAs, but we cannot be sure which adapter was used, so the 'guess the adapter sequence' option will be activated in the Adapter removal options.
- Three different barcodes with a fixed length of 5 nt have been added to the 5' end of the reads. These barcodes must be trimmed before the reads are aligned, so the remove barcode option in the Parameters section will be set to 5.
- The reads are 36 nt long, of which 5 nt correspond to the barcode, so they carry at most 31 nt of useful information. We therefore cannot use the default minimum adapter length of 10 nt, as this would imply that we can only profile small RNAs of 31 nt - 10 nt = 21 nt or shorter. Instead, we set the minimum adapter length to 6 nt, which allows the profiling of small RNAs up to 31 nt - 6 nt = 25 nt. As this adapter length is quite short, the maximum number of mismatches allowed in adapter detection will be reduced to 0.

4 Differential Expression

Figure 10: Differential expression analysis.

The sRNAbench differential expression analysis is based on the edgeR package. Although the standalone version has numerous parameters, the web-server version runs the analysis with its default parameters (see the standalone manual). First of all, each sample must be processed independently, and the user should keep the sRNAbench id of each run. The web version needs at least two samples for each group that will be compared (for example, in a case/control study). Once each sample has been processed with sRNAbench, the ids of the analyses are entered in the differential expression text field (i). As shown in the example, the groups to be compared must be separated by '#' and the sample ids within each group by ':'.
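The group string expected by the differential expression text field can be assembled programmatically. The sketch below is a hedged illustration in Python: the function name and the sample ids are hypothetical, and the check for at least two groups is an assumption added here; only the separators ('#' between groups, ':' between sample ids) and the minimum of two samples per group come from the text above.

```python
def build_group_string(groups, min_samples=2):
    """Join sample ids with ':' inside a group and groups with '#'."""
    if len(groups) < 2:
        # Assumption of this sketch: a comparison needs at least two groups.
        raise ValueError("at least two groups are needed for a comparison")
    for group in groups:
        if len(group) < min_samples:
            raise ValueError(
                f"each group needs at least {min_samples} samples, got {len(group)}"
            )
        for sample_id in group:
            if not sample_id or ":" in sample_id or "#" in sample_id:
                raise ValueError(f"invalid sample id: {sample_id!r}")
    return "#".join(":".join(group) for group in groups)


# Hypothetical ids; real ones are the ids returned by each sRNAbench run.
cases = ["ID_case1", "ID_case2"]
controls = ["ID_ctrl1", "ID_ctrl2"]
print(build_group_string([cases, controls]))
# prints: ID_case1:ID_case2#ID_ctrl1:ID_ctrl2
```

Validating the ids before joining avoids a malformed string when an id is empty or accidentally contains one of the two separators.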

5 Helper Tools

The sRNAbench suite is completed with some useful tools to convert common file formats (NCBI, Ensembl and GtRNAdb) to the format accepted by sRNAbench, which is a simple fasta with the transcript name and its classification separated by ':'.

EXAMPLE:

>NR_031589:microRNA
GCGTTGGCTGGCAGAGGAAGGGAAGGGTCCAGGGTCAGCTGAGCATGCCCTCAGGTTG
CTCACTGTTCTTCCCTAGAATGTCAGGTGATGT

NOTE: Please remember to cite any RNA annotation source that you finally decide to use in your research: Ensembl database, NCBI RefSeq database, Genomic tRNA Database or other.

5.1 Parse Ensembl Fasta Files

Figure 11: Parse Ensembl Fasta Files.

The Ensembl format is also a simple fasta format, but with a more complex identifier than the sRNAbench one. To convert the Ensembl format, either a file (i) or a URL (ii) from the Ensembl FTP database can be provided (for example, the annotated cDNA for Canis lupus familiaris). The Ensembl plant format is slightly different, so for plant annotations the 'Ensembl Plant?' option (iii) should be chosen.

5.2 Parse NCBI RefSeq

Figure 12: Parse NCBI RefSeq.

As with the Ensembl parse tool, the NCBI file can be provided from the local computer (i) or by means of a URL (ii). The NCBI RefSeq annotation can be obtained from the NCBI FTP database (for example, the annotated RNAs of Bos taurus). To include the RNA classification in the sRNAbench file, the 'Add RNA classification' option (iii) must be switched on.

5.3 Parse tRNA from the Genomic tRNA Database

Figure 13: Parse tRNA from the Genomic tRNA Database.

To include tRNA information in the analysis, a tool is available that downloads the Genomic tRNA Database information for the species provided in the text field (i). Species names must be written with '_' in place of spaces, for example: Homo_sapiens, Mus_musculus, etc.
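The sRNAbench library format introduced at the start of the Helper Tools section, and the species naming used by the tRNA tool, can be sketched in a few lines of Python. This is an illustrative sketch rather than the helper tools' own code: the function names and the 60-character line width are assumptions, while the >name:classification identifier and the underscore convention come from this manual.

```python
def format_entry(name, rna_class, sequence, width=60):
    """Build one fasta entry with the identifier >name:classification."""
    if ":" in rna_class:
        raise ValueError("the classification must not contain ':'")
    # Wrap the sequence at `width` characters per line (width is an assumption).
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}:{rna_class}\n" + "\n".join(chunks) + "\n"


def parse_header(header):
    """Split '>name:classification' into (name, classification)."""
    # Split at the last ':' so that names containing ':' are tolerated.
    name, rna_class = header.lstrip(">").strip().rsplit(":", 1)
    return name, rna_class


def gtrnadb_species_name(species):
    """Write a species name with '_' in place of spaces, e.g. Homo_sapiens."""
    return "_".join(species.split())


SEQUENCE = (
    "GCGTTGGCTGGCAGAGGAAGGGAAGGGTCCAGGGTCAGCTGAGCATGCCCTCAGGTTG"
    "CTCACTGTTCTTCCCTAGAATGTCAGGTGATGT"
)
entry = format_entry("NR_031589", "microRNA", SEQUENCE)
print(entry, end="")
print(parse_header(entry.splitlines()[0]))
print(gtrnadb_species_name("Homo sapiens"))
```

The entry reuses the NR_031589 example from the Helper Tools introduction; only the line wrapping may differ from the layout shown there.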