
SRS & Entrez

SRS: Sequence Retrieval System
Bengt Persson, Linköpings Universitet & Karolinska Institutet
(c) Bengt Persson, Linköping University & Karolinska Institutet

What is SRS?
SRS (Sequence Retrieval System) is a user-friendly interface to databases for information retrieval, available at http://srs.ebi.ac.uk. It was developed by Thure Etzold and co-workers at EMBL/EBI and Lion Bioscience AG.

Why use SRS?
SRS offers an easy way to retrieve information from sequence and sequence-related databases. You can search for multiple words or other criteria, and different databases are linked together, so that you can, for example, find all primary structures with a known three-dimensional structure, and much more.

SRS first view
1. Quick search via the initial tab.
2. Normal search via the Library Page tab, followed by the Query Form tab.

Library page
1. Select one or more databases by ticking the corresponding boxes. A drop-down menu selects the type of database.
2. Select the type of query form.

Different types of database in SRS
Sequence & structure: DNA, protein, three-dimensional structures
Sequence-related
Gene-related: genome, mapping, mutations, transcription factors, SNP
Bibliographic: Medline, enzyme
User-defined

Standard query form
1. Type the text to search for.
2. Select the field to search.
3. Select AND or OR if multiple search items are used.
4. Select the number of results to show at a time.
5. Submit the query.

Query results
1. Hypertext links.
2. Short information about each entry.
3. Tick boxes to select or deselect sequences for further analyses, and a choice of selection method.
4. Results can be saved.
5. Sequences can be linked to other databases.
6. Results can be analysed with other tools, e.g. FASTA and ClustalW.

Starting other programmes from SRS
In this example, three sequences were submitted to ClustalW for a multiple sequence alignment. This was done by:
1. ticking the boxes of the sequences in the results view;
2. choosing the option "apply options on selected results only";
3. choosing to launch the programme ClustalW.

[Screenshots: Preparing for ClustalW analysis; Submission verification]
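The "selected results only" step amounts to filtering the result list by its tick boxes and handing the survivors to the alignment program, typically as FASTA text. The sketch below assumes that hand-off format; the Entry class, its fields and the sample sequences are hypothetical and do not reflect SRS internals.

```python
# Sketch: pass only the ticked results on to an alignment tool as FASTA text.
# Entry, its fields and the sample sequences are hypothetical illustrations.
from dataclasses import dataclass


@dataclass
class Entry:
    accession: str
    description: str
    sequence: str
    ticked: bool = False  # state of the tick box on the results page


def selected_as_fasta(entries, width=60):
    """Return the ticked entries as FASTA records, wrapping sequences at `width`."""
    lines = []
    for entry in entries:
        if not entry.ticked:
            continue  # "apply options on selected results only"
        lines.append(f">{entry.accession} {entry.description}")
        for start in range(0, len(entry.sequence), width):
            lines.append(entry.sequence[start:start + width])
    return "\n".join(lines) + "\n" if lines else ""


results = [
    Entry("SEQ1", "hypothetical protein 1", "MKTAYIAKQR", ticked=True),
    Entry("SEQ2", "hypothetical protein 2", "MKTAYLAKQR"),
    Entry("SEQ3", "hypothetical protein 3", "MKSAYIAKQRQ", ticked=True),
]
print(selected_as_fasta(results), end="")
```

Only SEQ1 and SEQ3 reach the output; the unticked SEQ2 is left behind, just as unticked rows are not submitted to ClustalW.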

[Screenshots: Results page; ClustalW results]

Using the link function
Use SRS to answer the following question: for which short-chain dehydrogenases/reductases (SDRs) is the three-dimensional structure known in PDB?
1. Enter the search term sdr.
2. Enter which field to search.
Then press the Link button to get to the Link page.

[Screenshot: Query result]

Link page
1. Choose one of the three ways of linking (described below).
2. In this case, we select to link to PDB.
3. Then we select the chunk size.
4. Finally, we press the Search button.

Three different ways of linking
1. Entries in the selected databanks that are linked to the current query. A new entry list is made from the entries in the selected databanks that are linked to the entries in the current query. This is most useful for finding additional information about an item in other databanks.
2. Entries in the current query that are linked to all selected databanks. This determines whether any of the entries in the current query are linked to a particular databank or set of databanks; in other words, you are refining the original query.
3. Entries in the current query that are not linked to any of the selected databanks. This is another limiting operation: entries that do link to the specified databanks are excluded from the results. It is useful for eliminating entries based on a known condition that you do not want.

[Screenshots: Linkage type 1, Results; Linkage type 2, Results]
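The three linking modes behave like set operations over cross-references. The sketch below models them in Python under the assumption that each query entry carries a table of linked IDs per databank; the SDR_* entries and PDB/EMBL IDs are hypothetical placeholders, not real database records.

```python
# Sketch of the three SRS linking modes as set operations.
# The cross-reference table below is hypothetical: query entry -> {databank: linked IDs}.
links = {
    "SDR_A": {"PDB": {"1ABC", "2DEF"}, "EMBL": {"X0001"}},
    "SDR_B": {"EMBL": {"X0002"}},
    "SDR_C": {"PDB": {"3GHI"}},
}
current_query = {"SDR_A", "SDR_B", "SDR_C"}


def linked_ids(entry, databank):
    """IDs in `databank` that `entry` links to (empty set if none)."""
    return links.get(entry, {}).get(databank, set())


def type1_entries_in_selected(query, selected):
    """Type 1: new list of (databank, ID) pairs in the selected databanks linked to the query."""
    return {(db, target) for entry in query for db in selected
            for target in linked_ids(entry, db)}


def type2_linked_to_all(query, selected):
    """Type 2: query entries linked to every selected databank (refines the query)."""
    return {entry for entry in query if all(linked_ids(entry, db) for db in selected)}


def type3_linked_to_none(query, selected):
    """Type 3: query entries linked to none of the selected databanks (excludes)."""
    return {entry for entry in query if not any(linked_ids(entry, db) for db in selected)}


print(sorted(type1_entries_in_selected(current_query, ["PDB"])))
print(sorted(type2_linked_to_all(current_query, ["PDB"])))
print(sorted(type3_linked_to_none(current_query, ["PDB"])))
```

With PDB selected, type 1 yields the PDB entries themselves, type 2 keeps SDR_A and SDR_C (those with a known structure), and type 3 keeps only SDR_B.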

[Screenshot: Linkage type 3, Results]

[Screenshots: Example of a Swissprot entry, with links to the original article; Journal page]

Difference between AC and ID
AC (accession number): always follows the sequence.
ID (sequence identification): often an abbreviation of the gene name, or otherwise function-dependent; it might change.
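Because the ID may change while the accession number stays with the sequence, scripts that store references should key on the AC. The sketch below reads both from a Swiss-Prot-style flat-file entry, assuming the usual layout of a two-letter line code, three spaces, then data; the two entry texts and their identifiers are hypothetical.

```python
# Sketch: read the ID and AC lines of a Swiss-Prot-style flat-file entry.
# Assumes the usual layout (two-letter line code, three spaces, data from column 6);
# the two sample entries are hypothetical.
def parse_id_ac(entry_text):
    """Return (entry ID, list of accession numbers); the first accession is the primary one."""
    entry_id, accessions = None, []
    for line in entry_text.splitlines():
        code, data = line[:2], line[5:]
        if code == "ID":
            entry_id = data.split()[0]
        elif code == "AC":
            accessions += [ac.strip() for ac in data.split(";") if ac.strip()]
    return entry_id, accessions


old_release = """ID   EXMP1_HUMAN    Reviewed;    250 AA.
AC   Q00001; Q00002;
AC   Q00003;
"""
# Hypothetical later release: the ID was renamed, the accession numbers were kept.
new_release = old_release.replace("EXMP1_HUMAN", "EXMP2_HUMAN")

old_id, old_acs = parse_id_ac(old_release)
new_id, new_acs = parse_id_ac(new_release)
print(old_id, "->", new_id)      # the ID changed
print(old_acs[0] == new_acs[0])  # the primary accession still identifies the entry
```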

[Screenshots: The Tools tab; The Results tab; The Projects tab; The Views tab]

[Screenshot: The Databanks tab]

The basics of querying (extract from the SRS help file)
Querying is asking the system a question. A simple query may read: "Are there any entries in the SWISSPROT databank that contain the query term 'cancer' in any field?" SRS would check all fields in the SWISSPROT databank for any occurrence of the search word and display a list of all entries matching that query on the results page.

Selecting a databank to query
The databanks are sorted into groups according to databank type. The groupings provide clues about which databanks to include in your query. If you are looking for information about a specific gene, you may want to check the sequence databanks. Gene mutations are found in the databanks listed in the "Mutations" group, and protein structure databanks are included in the "3Dstruct" group. Determine the type of data you want and pick a databank from that group.

The search phrase
SRS looks in the selected data fields of the chosen databanks for the search term you entered. At its simplest, the search term can be a single word, but you can also perform precise, exacting queries using the query language designed specifically for this purpose.
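The "any field" query can be pictured as scanning every field of every entry for the search word. The sketch below assumes whole-word, case-insensitive matching, which may differ from SRS's own indexing; the miniature databank and its field names are hypothetical.

```python
import re

# Hypothetical miniature databank: entry ID -> {field name: text}.
sample_databank = {
    "ENTRY1": {"Description": "Tumour suppressor protein",
               "Comment": "Mutated in several types of cancer."},
    "ENTRY2": {"Description": "Alcohol dehydrogenase",
               "Comment": "Involved in ethanol metabolism."},
    "ENTRY3": {"Description": "Cancer/testis antigen", "Comment": ""},
}


def query(databank, term, fields=None):
    """Return IDs of entries containing `term` as a whole word (case-insensitive).

    fields=None checks every field; a list of field names restricts
    the search to those fields only.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits = []
    for entry_id, entry in databank.items():
        texts = entry.values() if fields is None else (entry.get(f, "") for f in fields)
        if any(pattern.search(text) for text in texts):
            hits.append(entry_id)
    return hits


print(query(sample_databank, "cancer"))                          # any field
print(query(sample_databank, "cancer", fields=["Description"]))  # one field only
```

ENTRY1 matches only through its Comment field, so restricting the search to Description drops it; this is the trade-off between searching everything and searching a chosen field.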

Search phrase types
There are three ways to query the system: the word query, the numeric query and the regular expression query.

Words: a single word or multiple words can be used. When you use a single word, the system simply checks that word against every word in the index for the selected databank(s). For example, typing the word cancer in the search term textbox tells the system to include all instances of that word in the query results.

Numeric: numeric expressions such as dates.

Regular expression: SRS allows you to enter your query as a regular expression. If you are unsure of the spelling of something, you can enter only the first few characters of its name and get a list of matching entries as the result. You can also apply controls to the regular expression that limit the type of search it performs, which can save a lot of query time.

Picking a query type
There are three query types: Quick Search, Standard query and Extended query.

Quick search
This is the fastest way to generate a query: it has the fewest steps from selecting the databank to viewing the results. A quick search always uses the AllText data field. By selecting AllText you tell SRS to query all fields that have the data type text. The benefit of this grouping is that you can search several fields at once without picking them from the list. The drawback is that the results will include entries that have nothing to do with what you want, merely because the search term was used in passing in a comment or description field. It is still a good starting point, and a quick search can give you hints for narrowing the query down using one of the other query methods.
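The three search phrase types can be sketched against a word index and a date field. Here Python's `re` syntax stands in for SRS's own regular-expression and wildcard syntax, which may differ; the index, the entry IDs and the dates are hypothetical.

```python
import re
from datetime import date

# Hypothetical word index (word -> entry IDs) and date field (entry ID -> date).
word_index = {
    "dehydrogenase": {"E1", "E2"},
    "dehydratase": {"E3"},
    "kinase": {"E4"},
}
entry_dates = {"E1": date(1999, 5, 1), "E2": date(2003, 1, 15),
               "E3": date(2008, 7, 30), "E4": date(2004, 11, 2)}


def word_query(word):
    """Word query: look the word up in the index (case-insensitive)."""
    return set(word_index.get(word.lower(), set()))


def numeric_query(start, end):
    """Numeric query: entries whose date lies in the closed range [start, end]."""
    return {e for e, d in entry_dates.items() if start <= d <= end}


def regex_query(pattern):
    """Regular-expression query: union of entries for every indexed word that matches."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = set()
    for word, entries in word_index.items():
        if rx.fullmatch(word):
            hits |= entries
    return hits


print(sorted(word_query("Kinase")))
print(sorted(numeric_query(date(2000, 1, 1), date(2005, 12, 31))))
print(sorted(regex_query(r"dehydr.*")))  # only the first few characters are known
```

The regular-expression query is matched against the index words rather than the full entry text, which is why a short prefix such as "dehydr" still returns every spelling variant it covers.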

Standard query
The standard query is accessible from the "Top Page", in the left-hand column under the heading "search the selected databanks with...". If you are searching more than one databank at a time, the available data fields are limited to those that are valid for all of the selected databanks. You can search as many as four data fields at once using the Standard Query form, which has four drop-down lists and four textboxes. Each drop-down list sits to the left of the textbox it refers to: scroll through the data fields in the list and pick the one you want to query, one for each textbox.

Extended query
The extended query allows you to specify a search term for every single data field available in the databank. All the fields of all the selected databanks are displayed on the extended query form, each with a textbox to its right where you can enter the search term. The common fields and the databank-specific fields are not displayed in any particular order, so including too many databanks quickly makes the form impractical to use. Chances are, however, that if you always use the AllText or description field you will cover all the databanks. The extended query form provides an excellent mechanism for more complicated queries. Take care to use at least one data field that is valid for each databank, and keep in mind that this form gives no way of determining which databanks a particular data field is used for.

Order of data fields
The order in which data fields appear determines the order in which they are checked. There is no way to reverse the order of data fields using the extended form. The standard form, while limiting you to four data fields per query, lets you specify the order in which those fields are checked, which is a clear advantage over the extended form.

Expression query
You can perform an expression query from the Results page. The Expression Query is a textarea near the top of the Results List page: select it, enter the expression and click the "query expression" button. You can use the expression query to combine, link or refine the results of existing queries, and all valid data fields are available to it.
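Two rules of the Standard Query form can be sketched directly: the offered fields are the intersection of the selected databanks' fields, and one to four field/term pairs are combined with AND or OR in the order given. The databank names, field lists and entries below are hypothetical, and matching is a plain case-insensitive substring test rather than SRS's real index lookup.

```python
# Sketch of the Standard Query form's rules. Databank and field names are hypothetical.
databank_fields = {
    "SWISSPROT": {"AllText", "Description", "Organism", "Keywords"},
    "EMBL": {"AllText", "Description", "Organism", "FtKey"},
}


def available_fields(selected):
    """With several databanks selected, only fields valid in all of them are offered."""
    field_sets = [databank_fields[db] for db in selected]
    return set.intersection(*field_sets) if field_sets else set()


def standard_query(entries, conditions, combine="AND"):
    """Evaluate one to four (field, term) pairs joined by AND or OR.

    all()/any() test the pairs left to right and stop as soon as the outcome
    is decided, so the order of `conditions` is the order of checking.
    """
    if not 1 <= len(conditions) <= 4:
        raise ValueError("the standard form takes one to four field/term pairs")
    if combine not in ("AND", "OR"):
        raise ValueError("combine must be 'AND' or 'OR'")
    test = all if combine == "AND" else any
    return [entry_id for entry_id, fields in entries.items()
            if test(term.lower() in fields.get(field, "").lower()
                    for field, term in conditions)]


entries = {
    "E1": {"Description": "alcohol dehydrogenase", "Organism": "Homo sapiens"},
    "E2": {"Description": "alcohol dehydrogenase", "Organism": "Mus musculus"},
    "E3": {"Description": "protein kinase", "Organism": "Homo sapiens"},
}
pairs = [("Description", "dehydrogenase"), ("Organism", "sapiens")]
print(sorted(available_fields(["SWISSPROT", "EMBL"])))
print(standard_query(entries, pairs))
print(standard_query(entries, pairs, "OR"))
```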

List of operators
Operator   Meaning
|          Logical OR
&          Logical AND
!          Logical AND NOT (in colloquial English, BUT NOT)
>          Link left
<          Link right
>^         Get subtree defined by right operand (hierarchical links)
>_         Get leaf entries of the subtree defined by right operand (hierarchical links)

[Screenshots: Starting a FASTA search from SRS; FASTA form; FASTA results, where a link shows the alignment between the query sequence and each hit]
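The logical operators in the list above map naturally onto set operations on existing result lists: AND is intersection, OR is union and AND NOT is difference. The sketch below evaluates such expressions over named result sets. It assumes strictly left-to-right evaluation with parentheses for grouping (SRS's actual precedence rules may differ), leaves out the link and hierarchy operators, and the result names Q1 to Q4 are hypothetical.

```python
import re


def evaluate(expression, results):
    """Evaluate an expression such as 'Q1 & (Q2 | Q3) ! Q4' over named result sets.

    & = AND (intersection), | = OR (union), ! = AND NOT (difference).
    Operators are applied strictly left to right; parentheses group sub-expressions.
    """
    tokens = re.findall(r"\w+|[&|!()]", expression)
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def operand():
        token = next_token()
        if token == "(":
            value = expr()
            if next_token() != ")":
                raise ValueError("missing ')'")
            return value
        if token not in results:
            raise ValueError(f"unknown result set: {token}")
        return set(results[token])  # copy, so the stored results are never modified

    def expr():
        value = operand()
        while pos < len(tokens) and tokens[pos] in {"&", "|", "!"}:
            op = next_token()
            right = operand()
            if op == "&":
                value &= right
            elif op == "|":
                value |= right
            else:
                value -= right
        return value

    value = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return value


named_results = {"Q1": {1, 2, 3, 4}, "Q2": {2, 3}, "Q3": {4, 5}, "Q4": {3}}
print(sorted(evaluate("Q1 & (Q2 | Q3) ! Q4", named_results)))
```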

What is Entrez?
Entrez is another user-friendly interface to databases, similar to SRS (Sequence Retrieval System), but with other functionality. It was developed at NCBI (National Center for Biotechnology Information, USA): http://www.ncbi.nlm.nih.gov

NCBI home page (http://www.ncbi.nlm.nih.gov)
1. Direct links to different services.
2. Possibility to search a database directly.
3. Click here to enter Entrez.

[Screenshot: All databases, Results]
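Entrez searches can also be issued as URLs through NCBI's E-utilities, whose ESearch service takes a database name (`db`), an Entrez query (`term`) and a maximum number of IDs to return (`retmax`). The sketch only builds the URL; the example query, including its field tags, is an illustrative assumption, and no request is sent.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (a programmatic route into Entrez searches).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that returns up to `retmax` matching IDs from `db`."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term, 'retmax': retmax})}"


url = esearch_url("protein", "short-chain dehydrogenase[Title] AND human[Organism]")
print(url)
# Fetching it, e.g. with urllib.request.urlopen(url), returns an XML list of IDs.
```

`urlencode` percent-encodes the square brackets and turns spaces into `+`, so the Entrez field tags survive intact inside the query string.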

[Screenshots: All databases, Results (cont.); Protein databases, Results; Protein results, Detailed view (two slides)]

[Screenshots: Limitation of search results; Taxonomy browser (several slides)]