SRS & Entrez
Bengt Persson, Linköping University & Karolinska Institutet

SRS: Sequence Retrieval System

What is SRS?
  Sequence Retrieval System
  User-friendly interface to databases
  http://srs.ebi.ac.uk
  Developed by Thure Etzold and co-workers, EMBL/EBI & Lion Bioscience AG
  Information retrieval

Why use SRS?
  Easy way to retrieve information from sequence and sequence-related databases
  Possibility to search for multiple words or other criteria
  Linkage between different databases
    E.g. find all primary structures with known three-dimensional structure
  ... and much more

(c) Bengt Persson, Linköping University & Karolinska Institutet

SRS first view
  1. Quick search via the initial tab
  2. Normal search via the Library Page tab followed by the Query Form tab

Library page
  1. Select one or more databases by ticking the corresponding box
  2. Select type of query form
  Drop-down menu for type of database

Different types of database in SRS
  Sequence & structure
    DNA, protein, three-dimensional structures
  Sequence-related
  Gene-related
    Genome, mapping, mutations, transcription factors
    SNP
  Bibliographic
    Medline, enzyme
  User-defined

Standard query form
  1. Type text to search for
  2. Select field to search
  3. Select AND or OR if multiple search items are used
  4. Select number of results to show at a time
  5. Submit query

Query results
  1. Hypertext links
  2. Short information
  3. Tick boxes to select/deselect sequences for further analyses and choose selection method
  4. Results can be saved
  5. Link sequences to other databases
  6. Possibility to analyse results with other tools, e.g. Fasta and ClustalW

Starting other programmes from SRS
  In this example, three sequences were submitted to ClustalW for a multiple sequence alignment.
  This was performed by:
    ticking the boxes of the sequences in the results view
    choosing the alternative "apply options on selected results only"
    choosing to launch the programme ClustalW

Preparing for ClustalW analysis

Submission verification

Results page

ClustalW results

Using the link function
  Use SRS to answer the following question:
  For which short-chain dehydrogenases/reductases (SDR) is the three-dimensional structure known in PDB?

Query result
  1. Enter the search term sdr
  2. Enter which field to search
  Press the Link button to get to the Link page

Link page
  1. You can link in three different ways
  2. In this case, we select to link to PDB
  3. Then we select chunk size
  4. Finally, we press the Search button

Three different ways of linking
  1. In the selected databanks that are linked to the current query
     A new entry list is made using entries found in the selected databanks that are linked to the entries in the current query. This is most useful for finding additional information about an item that is located in other databanks.
  2. In the current query that are linked to all selected databanks
     Determine if any of the entries in the current query are linked to a particular databank or set of databanks. In other words, you are refining the original query.
  3. In the current query that are not linked to any of the selected databanks
     This is another limiting operation. Any entries that do link to the specified databanks will not be included in the results of this linking operation. Useful for eliminating entries based on a known condition that you do not want.

Linkage type 1, Results

Linkage type 2, Results
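
The three ways of linking can be pictured as simple set operations. The following is only a minimal sketch under stated assumptions: the entry names, the PDB codes and the function names are all hypothetical, and SRS itself resolves links through its own indices rather than through code like this.

```python
# Toy data, made up for illustration only.
# Entries found by the current query (e.g. a search for "sdr"):
query_hits = {"SDR_A", "SDR_B", "SDR_C"}

# Cross-references from each query entry to the selected databank (e.g. PDB):
links_to_pdb = {
    "SDR_A": {"1ABC", "2DEF"},
    "SDR_B": set(),            # no known three-dimensional structure
    "SDR_C": {"3GHI"},
}


def link_type_1(hits, links):
    """Entries in the selected databank that are linked to the current query."""
    return set().union(*(links.get(h, set()) for h in hits))


def link_type_2(hits, links):
    """Entries in the current query that are linked to the selected databank."""
    return {h for h in hits if links.get(h)}


def link_type_3(hits, links):
    """Entries in the current query that are NOT linked to the selected databank."""
    return {h for h in hits if not links.get(h)}


print(sorted(link_type_1(query_hits, links_to_pdb)))
print(sorted(link_type_2(query_hits, links_to_pdb)))
print(sorted(link_type_3(query_hits, links_to_pdb)))
```

Type 1 moves to the other databank, whereas types 2 and 3 stay within the original query and split it into two complementary parts.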

Linkage type 3, Results

Example of a Swissprot entry
  Links to original article

Journal page

Difference between AC and ID
  AC   accession number
       always follows the sequence
  ID   sequence identification
       often an abbreviation of the gene name, or otherwise function dependent
       might change

The Tools tab

The Results tab

The Projects tab

The Views tab

The Databanks tab

The basics of querying (extract from SRS help file)
  Querying is asking the system a question. A simple query may read: Are there any entries in the SWISSPROT databank that contain the query term 'cancer' in any field?
  SRS would check all fields in the SWISSPROT databank for any occurrences of the search word and display a list of all the entries matching that query on the result page.

Selecting a databank to query
  The databanks are sorted into groups according to databank type. The groupings provide clues about the databanks that you want to include in your query.
  If you are looking for information about a specific gene, you may want to check the sequence databanks. Gene mutations are found in the databanks listed in the "Mutations" group, and protein structure databanks are included in the "3Dstruct" group.
  Determine the type of data you want and pick a databank from that group.

The search phrase
  SRS looks in the selected data fields of the chosen databanks for the search term you entered.
  At its simplest, the search term can be a single word. You can, however, perform very precise queries using the query language designed specifically for this purpose.

Search phrase types
  There are three ways to query the system:
    the word query
    the numeric query
    the regular expression query
  Words
    A single word or a multi-word phrase can be used.
    When you use a single word, the system simply checks that word against every word in the index for the selected databank(s). For example, typing the word cancer in the textbox for the search term tells the system to look for all instances of the word and include them in the results of the query.

Search phrase types, cont.
  Numeric
    Numeric expressions such as dates
  Regular expression
    SRS allows you to enter your query as a regular expression. This means that if you are unsure of the spelling of a name, you can enter only its first few characters and get a list of matching entries as your result. You can also apply controls to the regular expression that limit the type of search it performs, which saves a lot of query time.

Picking a query type
  Quick search
  Standard query
  Extended query

Quick search
  This is the fastest way to generate a query. With this option you have the fewest steps from selecting the databank to viewing the results.
  The AllText data field is always used for a quick search query. By selecting the AllText field you are telling SRS to query all fields that have a data type of text.
  The benefit of this grouping is that you can search several fields at once without having to pick those fields from the list.
  The drawback of querying with AllText is that the results will include entries that have nothing to do with what you want, because your search term is mentioned in passing in the comment or description field.
  It is still a good starting point, though, and a quick search may well give you hints for narrowing the query down using one of the other query methods.
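
The difference between searching all text fields, searching a single field, and using a regular expression can be tried out on a toy example. This is a minimal sketch, not how SRS is implemented: the three entries, the field names and the `search` function are hypothetical, and the regular expression syntax is Python's, which may differ from the syntax SRS accepts.

```python
import re

# Hypothetical miniature databank with two text fields per entry.
entries = {
    "E1": {"description": "Breast cancer type 1 susceptibility protein",
           "comment": "Tumour suppressor"},
    "E2": {"description": "Alcohol dehydrogenase",
           "comment": "Expression is altered in some cancer cell lines"},
    "E3": {"description": "Carbonyl reductase",
           "comment": "Short-chain dehydrogenase family"},
}


def words(text):
    """Split a text field into lower-case index words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(term, field=None, regex=False):
    """Return IDs of entries with a matching word.

    field=None searches every text field (comparable to AllText);
    regex=True treats the term as a regular expression over whole words.
    """
    pattern = re.compile(term if regex else re.escape(term), re.IGNORECASE)
    hits = []
    for entry_id, fields in entries.items():
        texts = fields.values() if field is None else [fields[field]]
        if any(pattern.fullmatch(w) for t in texts for w in words(t)):
            hits.append(entry_id)
    return hits


print(search("cancer"))                       # all text fields
print(search("cancer", field="description"))  # one field only
print(search("dehydr.*", regex=True))         # first few characters of a word
```

In this toy data the all-text search also returns E2, where the word cancer only occurs in a comment, which is the kind of incidental hit described for the quick search.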

Standard query
  Accessible from the "Top Page" in the left-hand column under the heading "search the selected databanks with..."
  If you are searching more than one databank at a time, the data fields available will be limited to only those fields that are valid for all databanks.
  You can search as many as four data fields at once using the Standard Query form. There are four drop-down lists and four textbox elements. Each drop-down list is to the left of the textbox to which it refers. You can scroll through the data fields in the drop-down list and pick the one that you want to query, one for each textbox.

Extended query
  The extended query allows you to specify a search term for every single data field available in the databank.
  All the fields of all the databanks being queried are displayed on the extended query form. To the right of each data field name is a textbox element where you can enter the search term.
  The common fields and the databank-specific fields are not displayed in any particular order, so it is easy to pick fields that apply to only some of the selected databanks, which makes including the others pointless. Chances are, however, that if you always use the AllText or description field you will catch all the databanks.
  The extended query form provides an excellent querying mechanism for the more complicated queries. Take care that you use at least one data field that is valid for each databank, and keep in mind that from this form there is no way of determining which databanks a particular data field is used for.

Order of data fields
  The order in which a data field appears determines the order in which that data field will be checked.
  There is no way to change the order of data fields using the extended form.
  The standard form, while limiting you to only four data fields per query, allows you to specify the order in which those fields will be checked. Clearly this is an advantage over the extended form.

Expression query
  You can perform an expression query from the Results page. The Expression Query is a text area near the top of the Results List page. Select this text area, enter the expression, and click the "query expression" button.
  You can use the expression query to combine, link, or refine the results of existing queries.
  All valid data fields are available to an expression query.
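
Combining the results of existing queries with the logical operators OR, AND and BUT NOT corresponds to set operations on the entry lists. The sketch below illustrates only that idea: the entry names are hypothetical, and the code uses Python set operators, not the SRS query language itself.

```python
# Hypothetical result lists from two earlier queries (entry names are made up).
q1 = {"ENTRY_A", "ENTRY_B", "ENTRY_C"}   # entries matching one search term
q2 = {"ENTRY_B", "ENTRY_C", "ENTRY_D"}   # entries matching another search term

either = q1 | q2      # logical OR: entries found by at least one query
both = q1 & q2        # logical AND: entries found by both queries
only_q1 = q1 - q2     # logical AND NOT (BUT NOT): in q1 but not in q2

print(sorted(either))
print(sorted(both))
print(sorted(only_q1))
```

OR can only enlarge a result list, whereas AND and BUT NOT can only narrow it, which is why the latter two are the ones to use when refining a query.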

List of Operators
  Operator   Meaning
  |          Logical OR
  &          Logical AND
  !          Logical AND NOT (in colloquial English, BUT NOT)
  >          Link left
  <          Link right
  >^         Get subtree defined by right operand (hierarchical links)
  >_         Get leaf entries of the subtree defined by right operand (hierarchical links)

Starting a FASTA search from SRS

FASTA form

FASTA results
  Click here to see the alignment between the query sequence and this sequence

Entrez

What is Entrez?
  Another user-friendly interface to databases, similar to SRS (Sequence Retrieval System)
  Other functionality
  http://www.ncbi.nlm.nih.gov
  Developed at NCBI (National Center for Biotechnology Information, USA)

NCBI home page
  http://www.ncbi.nlm.nih.gov
  1. Direct links to different services
  2. Possibility to search a database directly
  3. Click here to enter Entrez

All databases, Results

All databases, Results, cont.

Protein databases, Results

Protein results, Detailed view

Protein results, Detailed view, cont.

Limitation of search results

Taxonomy browser