Using the RAST prokaryotic genome annotation server

RAST is designed to rapidly call and annotate the genes of a complete or essentially complete prokaryotic genome. RAST (Rapid Annotations based on Subsystem Technology) uses a "Highest Confidence First" assignment propagation strategy, based on manually curated subsystems and subsystem-based protein families, that automatically guarantees a high degree of assignment consistency. RAST returns an analysis of the genes and subsystems in your genome, as supported by comparative and other forms of evidence.

Because NMPDR and the SEED provide access to all essentially complete public genomes without a user account, using RAST without an account makes no sense: you must have a free account so that access to your data is kept under your control. The tools available in RAST for comparing your new private data to public genomes are mostly the same as those available for analyzing public genomes at NMPDR (www.nmpdr.org).

The tour of the site will follow the workflow listed below. For short answers to specific questions, see the RAST FAQ.

Upload and manage your job

Sequence format and upload steps

Log in and select "Upload New Job" from the "Your Jobs" menu.

Step 1: Browse for the sequence file, which must be a plain text file in either FASTA or GenBank format.
o Multiple contigs or replicons of the same genome should all be together in one file.
o Files encoded as html, pdf, rtf, doc, docx, embl, gff3, or gtf will be rejected. Sequences in the correct FASTA or GenBank format must NOT be in a Microsoft Word document; save them as a plain text (*.txt) file with Windows (default) text encoding, and do NOT insert line breaks or allow character substitution.
o Click the button to "Upload and go to step 2."

Step 2: Provide the name of the organism and choose a translation table.
o If you know or find the taxonomy ID from NCBI, paste it into the text box.
Then, when you select either Bacteria or Archaea with the radio button, the corresponding genus, species, and strain will autofill accurately.
o If you do not know or cannot find an ID in the NCBI Taxonomy database, fill in the genus (one word), species (one word), and strain (any number of words). RAST will provide a dummy ID number that corresponds to nothing in the taxonomy database.
o Most bacteria use translation table 11 of the genetic code, but mycoplasmas and spiroplasmas use translation table 4.

Step 3: Provide information about the sequence data and select settings.
o If you uploaded a GenBank file, you may elect to preserve its gene calls. Because a FASTA file contains no gene calls, this choice is unavailable for FASTA uploads.
o If you have low-quality sequence data, select whether the translated proteins should reflect corrected frameshifts.
o Look at the information in the "Upload summary" tab to confirm that the system detected the sequence data you intended to upload.
o Click the button to "Finish Upload."

Manage your job

From the "Your Jobs" menu, select "Jobs Overview." If you have logged out, you will be directed to your jobs overview upon logging back in. Your annotation job could be complete in as few as 8 hours.

Track progress: All of your jobs are displayed in a table with active headers.
o An overview of the progress is shown in the table as a series of colored boxes. Select the link to view details of one job.

Access completed job: From the details page, you may view or download the annotated genome.
o How to navigate the genome viewer is discussed below.

o Download formats include GenBank, GFF3, and EMBL, with and without EC numbers.

Share your annotated genome with one or more other users.
o You can share this job with others by clicking the link and adding the email addresses (one at a time) of registered users to whom you would like to grant access to your otherwise private data.
o If you would like to share with many people, e.g. a class, request a new group by emailing rast @ mcs.anl.gov. Group memberships may be viewed from the account management page, which is accessed by clicking on the pair of people at the far right of the green menu bar. This is also where you can change your password, if needed.

Delete job: First click on the "view details" link in the jobs table. The green menu bar in the header of the page will then provide an option labeled with your job number. The only action in this menu is "Delete this job." An intermediate screen will ask you to confirm the deletion; click the button to do so.

View your annotated genome results

Organism Overview page

From the jobs table, click on "view details" and then "Browse annotated genome in SEED Viewer." The overview page opens with a table that lists the number of contigs, the number of genes, the number of genes assigned to complete subsystems, and the subsystem categories represented in your genome.
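Step 1 accepts only plain-text FASTA or GenBank files, and the overview table reports how many contigs RAST received. One way to catch format problems early, and to have a contig count to compare against the overview table, is to check a FASTA file locally before uploading. The sketch below is illustrative only and rests on its own assumptions: a file passes if it is pure ASCII, every sequence line follows a ">" header, and sequence lines contain only IUPAC nucleotide letters. These rules approximate the requirements above; they are not RAST's actual validation logic, and check_fasta is a hypothetical helper name.

```python
import re

# Assumed alphabet: IUPAC nucleotide codes, upper or lower case.
NUCLEOTIDES = re.compile(r"^[ACGTURYSWKMBDHVNacgturyswkmbdhvn]+$")


def check_fasta(text: str) -> dict:
    """Hypothetical pre-upload check.

    Returns {"contigs": n, "total_bp": m} or raises ValueError
    describing the first problem found.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        # Word processors often substitute "smart" characters; flag them.
        raise ValueError(f"non-ASCII character at offset {err.start}") from None

    contigs = {}  # contig ID -> sequence length
    current = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            # The contig ID is the first word after ">".
            name = line[1:].split()[0] if len(line) > 1 else ""
            if not name:
                raise ValueError(f"line {lineno}: empty FASTA header")
            if name in contigs:
                raise ValueError(f"line {lineno}: duplicate contig ID {name!r}")
            contigs[name] = 0
            current = name
        elif current is None:
            raise ValueError(f"line {lineno}: sequence before first '>' header")
        elif not NUCLEOTIDES.match(line):
            raise ValueError(f"line {lineno}: unexpected characters in sequence")
        else:
            contigs[current] += len(line)

    if not contigs:
        raise ValueError("no FASTA records found")
    return {"contigs": len(contigs), "total_bp": sum(contigs.values())}
```

For example, check_fasta(open("genome.fasta").read()) (the file name is hypothetical) either raises an error naming the offending line or returns the contig count to compare with the overview table.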

The green "Features in Subsystems" tab displays all genes and other features that are automatically included in subsystems because a similar sequence was found for every role in a functional variant of the subsystem. The table is sortable and downloadable. In the menu bar, under Organism, the feature table displays all annotated features in your genome, both those in and those not in complete subsystems. Click on any feature ID to open the Annotation Overview page for that feature.

Annotation Overview pages for individual features

Compare Regions displays the new genome at the top, in comparison with 4 other closely related genomes. Sets of homologous proteins located in the genomic region are presented in the same color with a numerical label. To expand the graphic comparison to more genomes:
o click the advanced button;
o select the option to collapse close genomes;
o type in a larger number of genomes (20);
o select PCH pin for clustered genes, or leave the pin at similarity for isolated genes;
o click the button to redraw the graphic.
All information in the graphic is presented in a table in the next tab.
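The feature table and the GFF3 download describe the same annotated features, so a downloaded GFF3 file can be tallied locally, for example to see how many features of each type sit on each contig. This minimal sketch relies only on the general GFF3 layout (nine tab-separated columns, "#" comment and directive lines, an optional "##FASTA" section at the end); it assumes nothing about the specific feature types or attributes in a RAST export, and summarize_gff3 is a hypothetical name.

```python
from collections import Counter


def summarize_gff3(lines):
    """Count GFF3 features by type (column 3) and by sequence ID (column 1)."""
    by_type = Counter()
    by_seqid = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip directives, comments, and blank lines
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        seqid, ftype = cols[0], cols[2]
        by_type[ftype] += 1
        by_seqid[seqid] += 1
    return by_type, by_seqid
```

Usage: with open("annotations.gff3") as fh: by_type, by_seqid = summarize_gff3(fh), where the file name is a placeholder for your own download.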

Walk the chromosome or contigs of your new genome

Browse genome

The genome browser is accessible from the menu bar, under Organism -> Genome Browser, and from the hint box at the top of the Organism Overview page. The genome browser provides a visual tour of the annotated features. You may choose a larger window, and you may color the features by subsystem or clustering.
o Click the navigation arrows to move forward or back along the genome.
o Clicking a feature arrow in the graphic centers the graphic on that protein and colors the focus protein red.
o Mouse over any feature to pop up its identity.
The table beneath the graphic allows you to scan or search for a feature of interest. Click the "Show" button in the Region column to focus the browser graphic on the selected feature. Click the details button in the focus tab to open the Annotation Overview page for the protein of focus, or click any feature ID in the table to open its Annotation Overview page.

Metabolic reconstruction of your new genome: automated subsystem assignments

The genome overview page displays a pie chart of the complete subsystems identified in the genome. Expand the categories to see subcategories and subsystem names, along with the number of proteins assigned to each. The table (click the green "Features in Subsystems" tab) provides similar access; you can select the "Carbohydrates" category from the column header. From the Carbohydrates category, in either the chart or the table, click to open the Glycolysis and Gluconeogenesis subsystem.
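The pie chart and the "Features in Subsystems" table both summarize how many proteins fall into each subsystem category, and the same tallies can be recomputed from the downloaded table. In this sketch the column names "Category" and "Feature ID", and the one-row-per-feature-per-subsystem layout, are assumptions for illustration; check the header of your own download and pass the matching names. Distinct feature IDs are counted so that a protein listed under two subsystems of the same category is counted once.

```python
import csv
import io
from collections import defaultdict


def features_per_category(tsv_text, category_col="Category", feature_col="Feature ID"):
    """Count distinct features per subsystem category in a tab-separated table.

    The default column names are hypothetical; adjust them to your download.
    """
    seen = defaultdict(set)  # category -> set of feature IDs
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        seen[row[category_col]].add(row[feature_col])
    # A feature listed under two subsystems of one category is counted once.
    return {cat: len(ids) for cat, ids in sorted(seen.items())}
```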

In the subsystem spreadsheet, the new genome is highlighted and displayed in the context of closely related public genomes. The spreadsheet is arranged with functional roles in columns, genomes in rows, and the genes annotated to those roles in the respective cells. Within one row, genes that are clustered on the genome are shown in the same color. Click any gene in the newly annotated genome to open its Annotation Overview page in a new window or tab.
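The spreadsheet layout (genomes as rows, roles as columns, genes in cells, clustered genes sharing a color) maps naturally onto a nested dictionary. The sketch below builds such a grid and assigns group labels to genes that lie close together on the same contig. The Gene record fields and the 5,000 bp gap threshold are assumptions made for illustration; this is not the criterion RAST or the SEED uses to define clusters. Every gene receives a label here, so an isolated gene simply forms a group of one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    genome: str
    contig: str
    start: int
    role: str


def build_spreadsheet(genes):
    """Return {genome: {role: [gene_id, ...]}}: rows are genomes, columns roles."""
    grid = defaultdict(lambda: defaultdict(list))
    for g in genes:
        grid[g.genome][g.role].append(g.gene_id)
    return grid


def cluster_labels(genes, max_gap=5000):
    """Label genes so that neighbors on one contig share a label.

    Genes are sorted by start position per (genome, contig); a gene joins the
    previous gene's group if their starts differ by at most max_gap
    (a hypothetical threshold), otherwise it starts a new group ("color").
    """
    labels = {}
    next_label = 0
    by_contig = defaultdict(list)
    for g in genes:
        by_contig[(g.genome, g.contig)].append(g)
    for key in sorted(by_contig):
        prev = None
        for g in sorted(by_contig[key], key=lambda x: x.start):
            if prev is None or g.start - prev.start > max_gap:
                next_label += 1  # start a new group
            labels[g.gene_id] = next_label
            prev = g
    return labels
```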

How does your genome sequence compare with others?

Run the sequence-based comparison tool

From the page showing the subsystem, use your browser's Back button to return to the overview, then select "Sequence-based comparison" from the Comparative Tools menu.
Step 1: Select a reference genome. If your private sequence is in many contigs, the best selection may be a known, high-quality genome.
Step 2: Select up to three comparison genomes.
Step 3: Click the button to compute.
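RAST runs this comparison for you, but the underlying idea (BLASTP every reference protein against a comparison genome and keep the best hit) can be reproduced locally, for instance to double-check a single result. Assuming NCBI BLAST+ is installed and each genome's proteins are in a FASTA file (the names reference.faa and comparison.faa are placeholders), a command such as blastp -query reference.faa -subject comparison.faa -outfmt 6 -out hits.tsv writes tabular hits. The sketch below keeps the highest-bit-score hit per reference protein; the e-value cutoff is an assumption, and the code does not reproduce RAST's own parameters or filtering.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    subject: str
    pident: float   # percent identity over the aligned region
    length: int     # alignment length
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Parse BLAST+ -outfmt 6 lines and keep the highest-bitscore hit per query.

    Default column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. max_evalue is an assumed cutoff.
    """
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hit = Hit(f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))
        if hit.evalue > max_evalue:
            continue  # too weak to count as a match
        if f[0] not in best or hit.bitscore > best[f[0]].bitscore:
            best[f[0]] = hit
    return best
```

Usage: with open("hits.tsv") as fh: best = best_hits(fh), then best[protein_id].pident gives the identity that the color-coded table reports for that reference protein.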

Result

This tool allows you to select from all your private genomes as well as all public genomes. Because protein sequence similarities are computed on demand, you can select from any of your annotation jobs. The example illustrated below uses two private jobs, the GenBank and FASTA versions of the same Streptococcus equi subsp. zooepidemicus MGCS10565 genome, in order to compare the preserved, published gene calls to the RAST-computed gene calls. The other two genomes are nephritogenic strains of Streptococcus pyogenes from the list of public genomes. Notice that the most similar sequences are ribosomal proteins.

Results are computed on demand, in real time, using BLASTP to compare every protein in the reference genome to every protein in the comparison genomes. They are presented in a color-coded table and in a circular map, in the order of the contigs/genes in the reference genome. If your private genomes are in multiple contigs and you use a closed, public genome as the reference, these results will help you order your contigs.

Comparison proteins are listed with their contig number, gene number, and length. The gene numbers are linked to pop-up boxes that list the annotation (name) as well as the proportion of identical amino acids. The amino acid identity of each comparison genome relative to the reference is color-coded on a logarithmic (not linear) scale that follows the order of the visible spectrum.
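The point that a closed reference genome helps order your contigs can be made concrete: place each draft contig at the typical reference position of the genes it matches, then sort. The sketch below uses the median reference gene index of each contig's hits, a simple heuristic chosen here for illustration rather than RAST's method. The input format (pairs of reference gene index and draft contig name, read off the comparison table) is an assumption, and the sketch ignores contig orientation, rearrangements, and the wrap-around of a circular chromosome. The median is used instead of the mean so that a few stray hits do not drag a contig out of place.

```python
from collections import defaultdict
from statistics import median


def order_contigs(pairs):
    """Order draft contigs along a closed reference genome.

    pairs: iterable of (reference_gene_index, draft_contig) tuples, one per
    reference gene whose best hit lies on a draft contig (hypothetical input).
    Returns contig names sorted by the median reference position of their
    hits; ties are broken by contig name.
    """
    positions = defaultdict(list)
    for ref_index, contig in pairs:
        positions[contig].append(ref_index)
    return sorted(positions, key=lambda c: (median(positions[c]), c))
```

For example, if contig "ctgC" matches reference genes 5, 6, and 60, its median position is 6, so a single distant hit does not push it past a contig whose hits cluster around gene 10.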

How does your genome content compare with others in the database?

Run the function-based comparison tool

From the Comparative Tools menu of the organism summary page, select "Function-based comparison."
Step 1: Your newly annotated genome is already input as the reference.
Step 2: Highlight one comparison genome in the list.
Step 3: Click the "Select" button.

Result

The table opens with all features in your genome (A) or the comparison genome (B) that are associated with a complete subsystem. The table is sortable, searchable by subsystem category or name, and downloadable. The first column may be reset to show features associated with subsystems in A but not B, in both A and B, or in B but not A. When genome B is very closely related to the new genome, A, the list of functions that are annotated in B but not automatically associated with subsystems in A indicates a place to begin looking at the annotations to evaluate accuracy or find missing functions.

It is important to keep in mind that automated subsystem assignments are made only if all roles required for one functional variant of a subsystem have been correctly annotated. Annotations are assigned based on sequence similarity, while inclusion in subsystems is based on the annotation matching the name of a functional role in a subsystem. Therefore, if the best sequence match to your new protein has a functional annotation that does not exactly match the name of a role in a subsystem, the protein in your new genome will not be included in the subsystem. This may happen when the protein with the best sequence match in the database has not yet been reviewed and annotated by the curator of the subsystem in question.

How to include two or more jobs (private organisms) in comparative analyses

Set private organism preferences

For each individual job, RAST calculates similarities between all features in the input (private) genome and all features in the SEED database (public genomes). For two or more of your jobs to be displayed in Compare Regions or other comparative tools, the similarities of the features in the private genomes to each other must also be calculated. To maintain the privacy of the data in each job, similarities between two or more private genomes are calculated only upon request.
o Select "Private Organism Preferences" from the Your Jobs menu.
o Move the genomes you want to compare with each other into the peers box at right by selecting them (one at a time) and clicking the right-pointing arrow.
o Click the button to check requirements for computation.
o Select those comparisons that require calculation, then click the button to request computation.
You will receive an email when the computation is complete.
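The subsystem rule described above (a subsystem is assigned only when every role of one functional variant is filled, and a protein fills a role only when its annotation matches the role name exactly) can be illustrated with a toy model. The variant definitions, role names, and gene IDs in this sketch are hypothetical, and the code models only the exact-name behavior described in the text; it is not RAST's implementation. Note how a single differing character in one annotation is enough to leave the whole subsystem unassigned.

```python
from typing import Dict, FrozenSet, List, Optional


def role_fillers(annotations: Dict[str, str], roles: FrozenSet[str]) -> Dict[str, List[str]]:
    """Map each role to the genes whose annotation equals the role name exactly."""
    fillers: Dict[str, List[str]] = {role: [] for role in roles}
    for gene, function in annotations.items():
        if function in fillers:  # exact string match only
            fillers[function].append(gene)
    return fillers


def complete_variant(annotations: Dict[str, str],
                     variants: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Return the first variant whose roles are all filled, else None."""
    all_roles = frozenset().union(*variants.values())
    fillers = role_fillers(annotations, all_roles)
    for name, roles in variants.items():
        if all(fillers[r] for r in roles):
            return name
    return None
```

With hypothetical variants {"variant 1": {Role A, Role B, Role C}, "variant 2": {Role A, Role D}}, a genome annotated with Role A, Role B, and Role C is assigned variant 1, but if the third protein is annotated "role C" instead, neither variant is complete and none of its proteins appear in the subsystem.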