TITLE PAGE - CURRENT PROTOCOLS IN BIOINFORMATICS


Unit Number:
Unit Title: DALI structural comparison of proteins
Authors: Liisa Holm*, Sakari Kääriäinen, Dariusz Plewczynski (1), Chris Wilton
Address(es): Institute of Biotechnology, University of Helsinki, Viikinkaari 5, P.O. Box 56, Helsinki, FI-00014, Finland; (1) Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Pawinskiego 5a Street, Warsaw, Poland.
Telephone: Fax: [email protected]
3-6 Keywords: classification of protein folds; database searching; distance geometry; pattern recognition; protein structure alignment
Abstract: (up to 150 words)
The Dali program is widely used for carrying out automatic comparisons of protein structures determined by X-ray crystallography or NMR. The most familiar version is the Dali server, which performs a database search comparing a query structure supplied by the user against the database of known structures (PDB) and returns the list of structural neighbours by e-mail. The more recently introduced DaliLite server compares two structures against each other and visualizes the result interactively. The Dali database is a structural classification based on precomputed all-against-all structural similarities within the PDB. The resulting hierarchical classification can be browsed on the web and is linked to protein sequence classification resources. All Dali resources use an identical algorithm for structure comparison. Users may run Dali using the Web, or the program may be downloaded to be run locally on Linux computers.

UNIT INTRODUCTION
The rapidly growing number of known tertiary structures makes protein structure comparison important. Of central biological interest are evolutionary relationships inferred from quantifiable similarities between proteins. Sequence similarity searches can detect evolutionary relationships down to a sequence identity of about 25%. Below this level of sequence identity lies the twilight zone of similarity. Comparing structures can help to extend the detection of evolutionary relationships between proteins beyond the border of the twilight zone, because the structure of proteins is much better preserved during evolution than the sequence (Chothia and Lesk, 1986). By searching structural databases, molecular biologists can gain a considerable amount of information about connections between protein families that cannot be seen using sequence alone. Structure-based prediction of protein function aims at unifying protein families into larger sets (super-families). Functionally divergent families classified into the same super-family typically exploit a conserved mechanical or biochemical mechanism that has been adapted to different cellular processes and substrates (Holm and Sander 1996). Inferring such complex conserved properties is the basic reason for the systematic structure-structure comparison and classification of available proteins.
Dali is a tool for both pairwise structure comparison and structure database searching. It is equipped with a web interface for viewing the results, multiple alignments and three-dimensional superimpositions of structures. The method is fully automated and identifies common structural cores and structural resemblances with high sensitivity. Dali uses the 3D Cartesian coordinates of the Cα atoms of each protein to calculate residue-residue distance matrices. A similarity score for a pair of structures is defined as a weighted sum of similarities of equivalent intra-molecular distances.
The result is a scored list of all important structural alignments. The method allows gaps (i.e., insertions or deletions) of any length and detects similarities involving geometrical distortions.
Dali is easily accessible through web servers. Table 1 outlines the relationships of Dali resources. Use the DaliLite server to compare two known structures to each other and visualize the superimposition (Basic protocol 1: Interactive DaliLite server for pairwise comparison). This server requires two sets of atomic coordinates in PDB format as input. The comparison is usually

quite fast, and results should be returned after about one minute. A search against all known structures takes much longer, and can be performed using the Dali server (Basic protocol 2: Dali server for database searching). This server is routinely used by protein crystallographers to compare a newly solved structure against the database of known structures in order to detect possible evolutionary relationships. If you are interested in the structural neighbours of proteins already in the PDB, you can find them in the Dali database. Its web interface allows you to browse the hierarchical classification of protein structures based on all-against-all comparisons of known structures (Basic protocol 3: Dali database). If you have many query structures, you may wish to download the DaliLite standalone program package for your convenience. This uses the same comparison algorithms as the Dali web servers but can be run locally on Linux computers (Alternate protocol 1: Comparing two structures using DaliLite; Alternate protocol 2: Comparing large sets of structures using DaliLite; Support protocol: Obtaining DaliLite).

BASIC PROTOCOL 1
Protocol Title: Interactive DaliLite server for pairwise comparison
Introduction: This interactive web server provides a quick, convenient means to check the structural alignment of two known protein structures and to visualize their structural superimposition. You need only to know the PDB identifiers of the structures. It is also possible to upload your own structures. A fast server can be accessed at
Necessary Resources (list)
Hardware: A computer connected to the Internet.
Software: A web browser (Internet Explorer, Netscape etc.). Rasmol or other PDB viewer.
Files: none. (User PDB files are optional.)
Protocol Steps:
1. You need two inputs to run this server - these are intuitively called First and Second structures in the submission page. You can either enter PDB entry codes (for known

structures), or upload your own coordinate files in PDB format. You can search for the PDB entry codes of known structures for your query protein using NCBI-Entrez, SRS and other similar database cross-linking resources. If you have a structure file containing a number of different chains, you can select a specific chain in the submission page. If no chain is specified, structural comparisons will be performed on every chain in the structure file, and it will take much longer to return your results. Size limits for the comparison are: at least 30 amino acid residues per chain, at most
The results summary page looks like Figure 1. For each chain in the query structure, a table is presented showing significant hits against each chain of the subject structure, with the best hit for each chain highlighted. Note that the First structure is named mol1, the Second is mol2, chain A of the First structure is mol1a, and so on. Suboptimal alignments are also reported; the highest scoring alignment for each pair of chains is highlighted. The tables show: Z-scores, number of aligned residues, root-mean-square deviation (RMSD) of alpha-carbon atoms, and sequence identity between the two chains. Links are then given for:
a. the structural alignment, including DSSP secondary structure information, between the indicated chains (Figure 2).
b. a coordinates file of the superposed alpha-carbon traces for the indicated chains, viewable in Rasmol or other PDB structure viewer (Figure 3). Only the C-alpha coordinates are transmitted, so use the backbone display in Rasmol. Note that the First structure chain is renamed Q, and the Second structure chain S.
c. the First structure file (unchanged), followed by the Second structure file with all ATOM coordinates of the indicated chain rotated/translated to match the First structure. To view the full superposition, either open both files in your structure viewer, or concatenate the two files and view the resulting file.
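The concatenation mentioned in link c can be done with any text tool. The following is a minimal Python sketch, not part of the DaliLite server: the function name is hypothetical, and dropping intermediate END records is an assumption about how some viewers behave (they may stop reading at the first END), not a documented requirement.

```python
def concatenate_pdb(first_lines, second_lines):
    """Join two PDB files, given as lists of text lines, into one list.

    Assumption: a bare END record in the middle of the merged file could make
    a viewer stop reading early, so END records are dropped from both inputs
    and a single END is written as the last line.
    """
    merged = [line for line in first_lines if line.strip() != "END"]
    merged += [line for line in second_lines if line.strip() != "END"]
    merged.append("END")
    return merged


if __name__ == "__main__":
    # Stand-in records; real input would be the lines of the two downloaded files.
    first = ["HEADER    FIRST STRUCTURE", "END"]
    second = ["HEADER    SECOND STRUCTURE (ROTATED/TRANSLATED)", "END"]
    for line in concatenate_pdb(first, second):
        print(line)
```

With real files, read each file into a list of lines (stripping the trailing newline), pass both lists to the function, and write the result to a new file for the viewer.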

You can build a superimposition of multiple Second structures onto the same First structure. This is useful in studying a large superfamily that has many distantly related known structures. Essential and variable structural elements are easily seen in the multiple superimposition. This option preserves ligands that might have been co-crystallised with the protein and also shows quaternary structure interactions. Note that only the indicated chains are superposed (e.g., mol1a with mol2b); however, any other chains will still be contained in the structure files, so you may wish to remove unwanted chains using a text editor before viewing the structures. The following files can also be viewed: the rotation/translation matrices for each alignment, a list of structurally equivalent residue ranges, and a log file indicating all the steps taken by the DaliLite application. These are included for completeness but are uninformative to most users. Finally, at the bottom of the results page, a summary of your two inputs is given, including header information and a report of the chains found within each structure file. If these data are not as expected, the file upload (rather than the program itself) has probably failed for one reason or another.

BASIC PROTOCOL 2
Protocol Title: Dali server for database searching
Introduction: The Dali server is an easy-to-use network service for comparing protein structures. It is routinely used by structural biologists to compare a newly solved structure against previously known structures. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. You submit the coordinates of a query protein structure and Dali compares them against those in the Protein Data Bank. A multiple alignment of structural neighbours is mailed back to you.
If you want to know the structural neighbours of a protein already in the Protein Data Bank, you can find them in the Dali database (Basic Protocol 3). The Dali server is hosted by the EBI (
Necessary Resources (list)
Hardware: A computer connected to the Internet.

Software: A web browser (Internet Explorer, Netscape, etc.)
Files: Atomic coordinates of protein structure in PDB format.
Protocol Steps:
1. Structure submission can be done either interactively or by e-mail.
a. Upload your coordinate file through the web page and press the Submit button. The results will be sent to the e-mail address provided by you. Type carefully.
b. E-mail a message containing the PDB entry to [email protected]. The submission will fail unless the message is plain text, as encoded messages (e.g. MIME or BinHex) are rejected by the server.
2. You will receive an e-mail with the results. Expect a reply within a few days of submission; in case of longer delays, please notify [email protected]. The comparison is carried out against a representative subset of PDB structures. The set is constructed so that the sequence identity between any two chains in the set should be less than 25%. The summary of structural neighbours looks like Figure
Use the DaliLite server for pairwise comparison (Basic Protocol 1) to visualize interesting pairs of structures.

BASIC PROTOCOL 3
Protocol Title: Dali database
Introduction: The Dali database is based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB). The classification and alignments are automatically maintained and continuously updated using the Dali search engine. The database currently (Sep 2005) contains 10,562 representative structures. The set of representative structures is called PDB90 and it contains all polypeptide chains from the PDB with less than 90% sequence identity to each other. The representative structures are decomposed into 14,020 domains. Hierarchical clustering reveals 3,107 fold types. Fold types are defined as clusters of structural neighbours in fold space with average pairwise Dali Z-scores above 2. The threshold has been chosen empirically and groups together structures which have topological similarity.
Higher Z-scores correspond to structures which agree more closely in architectural detail. The Fold Index lists all chains in PDB90 ordered by structural similarity. The order is that of a

dendrogram derived in the hierarchical clustering. Fold types are indexed. A heavier branch with more members is listed above a branch with fewer members. Domains that are structural neighbours are found next to each other. Fold types with similar structural motifs are also found next to each other.
Necessary Resources (list)
Hardware: A computer connected to the Internet.
Software: A web browser.
Files: none.
Protocol Steps:
1. Browsing: The Dali database is accessed from
You can enter the fold classification from the Fold Index or by querying for a text term that occurs in the COMPND records of the PDB entries (Figure 4). More sophisticated queries should be performed using specialized search engines such as NCBI-Entrez or SRS. Figure 5 shows the result of a query for estradiol receptor. The leftmost column shows that there are two PDB entries for estradiol receptor, namely 1qkt and 1qku. The latter has three chains named A, B and C. The second column indicates that the representative of all of them is chain 1qkuA. The third column shows that 1qkuA belongs to domain fold class 342. Clicking on the fold link shows a section of the Fold Index. Here, you see all members of the fold class at a glance (Figure 6). Domains in the Fold Index are annotated by the sequence family that they belong to. Sequence families are defined in the Adda database (Heger & Holm, 2003) based on shared sequence motifs. Adda unifies many structural neighbours with little overall sequence similarity in terms of percent identity. As can be seen from Figure 6, the nuclear receptors are unified by Adda into one family. The interact link shows details about the structural neighbours of each domain. The list of neighbours of estradiol receptor is shown in Figure 7. Structural alignments between estradiol receptor and its neighbours can be displayed as 1D alignments or in 3D superimposition. Select a

few structures (click on check-boxes). The Structure Alignment button shows a multiple structure alignment similar to a sequence alignment. Secondary structure definitions are shown below the amino acid sequences. Typically, secondary structure assignments agree very well even though sequence identity is low (Figure 8). The Structure/Sequence alignment button augments the structural alignment with related sequences, which are detected by PSI-Blast and stored in the Adda database (Heger & Holm 2003). This view is useful for checking sequence patterns that are conserved across distantly related protein families. Conserved functional sites are a strong hint at common evolutionary origins. In the alignment, residues are coloured if the frequency of the amino acid type in the column is above 50%. The superimposed C-alpha traces of the selected structures can be viewed in 3D using Rasmol or other PDB viewer. The 3D superimposition button launches a Rasmol script, if your browser is appropriately configured. Use the PDB format button to download the C-alpha coordinates of selected neighbours superimposed onto the query structure.
2. External links: External sites may link directly to the query engine of the Dali database. To make a link from a PDB identifier to the database, use the call
where the search_term is a PDB identifier (e.g. 2kau or 2kauC).
3. Data downloads: For non-interactive use, we provide comprehensive computer-readable database dumps for large-scale studies. These are accessed from the Downloads link on the home page of the Dali database.

ALTERNATE PROTOCOL 1
Protocol Title: Comparing two structures using DaliLite
Introduction: This simple protocol is the command-line version of that performed online by the DaliLite server for pairwise structure comparison (Basic Protocol 1). The inputs are two protein structures

in PDB format. The output is a set of HTML files, which should be viewed from a browser. Typical run times range from a few seconds to tens of seconds per pairwise comparison.
Necessary Resources (list)
Hardware: Linux workstation (Sun, Alpha, Silicon Graphics, PC).
Software: DaliLite program, Perl interpreter, web browser (Netscape, Internet Explorer, Opera etc.).
Files: Two protein structures in PDB format files.
Protocol Steps:
The command to run DaliLite is DaliLite -pairwise <pdbfile1> <pdbfile2>, where the arguments <pdbfile1> <pdbfile2> should be replaced by the PDB file names, for instance:
    Linux-prompt> perl DaliLite -pairwise /pdb/1wsy.brk /pdb/2kau.brk > log
    Linux-prompt> netscape index.html
The program computes the structural alignments for all chains in pdbfile1 against all chains in pdbfile2, and creates a set of HTML pages linked from the top page 'index.html'. The first structure is called 'mol1' and the second 'mol2'. All data are stored in the current working directory, overwriting any previous results generated using this option. The output is identical to that from Basic Protocol 1 (Figures 1-3).

ALTERNATE PROTOCOL 2
Protocol Title: Comparing large sets of structures using DaliLite
Introduction: This is a more advanced protocol that allows the systematic comparison of large sets of structures. It performs the structural comparisons between all pairs drawn from two user-provided lists of structures. The results are stored in an internal alignment format which can be processed by computer programs for further statistical analysis. There is an option to re-format the results as human-readable output.
Necessary Resources (list)
Hardware: Linux workstation (Sun, Alpha, Silicon Graphics, PC).
Software: DaliLite program, Perl interpreter.
Files: Protein structures in PDB format files.

Protocol Steps:
1. All structures that you want to compare must first be prepared using the -readbrk option. The resulting structural data are stored in a DAT subdirectory under the DaliLite home directory. You must supply a unique identifier for the structure as the second argument. The identifier must be PDB-style, i.e., four characters long.
    Linux-prompt> perl DaliLite -readbrk <pdbfile> <pdbid>
Examples:
    DaliLite -readbrk 3ubp.brk 3ubp
    DaliLite -readbrk /data/pdb/3ubp.brk 3ubp
    DaliLite -readbrk /data/pdb/pdb3ubp.ent 3ubp
The program automatically generates a data file for each chain in the PDB entry. In the above examples, 3ubpA.dat, 3ubpB.dat and 3ubpC.dat are created in the DAT subdirectory. The system uses the DSSP program by Kabsch and Sander (included in the distribution package) to parse the information out of the PDB file. DSSP requires that the complete backbone (N, CA, C, O atoms) is present; otherwise it skips the residue. The MaxSprout server ( can be used to build full coordinates from a C-alpha trace. The DAT file includes information about the CA coordinates, primary structure, secondary structure elements (by program DSSP, Kabsch & Sander 1986) and putative folding pathway of the protein (by program PUU, Holm & Sander 1994). The first line of a properly formed DAT file looks like this:
    >>>> 1xg8A EHEHEEH
The fields annotated in this example are: order of secondary structure elements (EHEHEEH); number of beta-strands (E); number of helices (H); total number of secondary structure elements; number of residues; chain identifier.
If reading the coordinates failed for any reason, you will find only zeros on the first line of the DAT file.
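When many structures have to be prepared in Step 1, the -readbrk command lines can be generated by a script. The sketch below only builds the argument lists shown in the examples above; it does not run DaliLite. The helper names are hypothetical, and deriving the four-character identifier from file names such as 3ubp.brk or pdb3ubp.ent is an assumption based on those examples, not a DaliLite convention.

```python
import os
import re


def pdb_id_from_filename(path):
    """Guess a four-character identifier from names like 3ubp.brk or pdb3ubp.ent."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Assumption: a leading "pdb" on a seven-character stem is a prefix, not part of the id.
    if stem.lower().startswith("pdb") and len(stem) == 7:
        stem = stem[3:]
    if not re.fullmatch(r"[0-9A-Za-z]{4}", stem):
        raise ValueError(f"cannot derive a four-character identifier from {path!r}")
    return stem


def readbrk_commands(paths):
    """Return one ['perl', 'DaliLite', '-readbrk', <pdbfile>, <pdbid>] list per file."""
    commands = []
    seen = set()
    for path in paths:
        pdb_id = pdb_id_from_filename(path)
        if pdb_id in seen:
            # The protocol requires a unique identifier per structure.
            raise ValueError(f"identifier {pdb_id} is not unique")
        seen.add(pdb_id)
        commands.append(["perl", "DaliLite", "-readbrk", path, pdb_id])
    return commands


if __name__ == "__main__":
    for cmd in readbrk_commands(["/data/pdb/3ubp.brk", "/data/pdb/pdb1gkp.ent"]):
        print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run` from the DaliLite home directory; checking the first line of the resulting DAT files for zeros, as described above, remains a manual or separate step.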

2. Generate structural alignments. There are options for pairwise, one-against-many, and many-against-many comparisons. The structures are specified using the unique identifiers that were introduced in the previous step when reading in PDB structures using the -readbrk option. Pairwise alignments of two structures are generated using exhaustive search (Parsi method). If the query structure has few secondary structure elements, the Soap method is used. Monte Carlo optimization is used for refinement (see Table 2). Alignment data are output to <code>.dccp files. An optimal and a number of suboptimal structural alignments are reported for each pair of structures. Similarities with a Z-score below zero are omitted from the output. The format is explained below:
    DCCP ppt 1bba
    alignment
    1 33
The annotated fields of the DCCP line are: second structure; first structure; number of aligned blocks; Z-score; sequence identity; number of structurally equivalent residues; root mean square deviation, in Angstroms, of CAs; raw similarity score. The lines after the alignment keyword give the list of start and end residues of each aligned block in the first structure, followed by the list of start and end residues of each aligned block in the second structure.
If you want to construct a similarity matrix of a large set of proteins, you can extract the DCCP lines from the alignment data files (*.dccp). Note that several alternative alignments may be reported per protein pair.
DaliLite has four options for alignment. The simplest is pairwise alignment (-align option), which takes two chain identifiers as arguments, for example:
    Linux-prompt> perl DaliLite -align 3ubpC 1gkpA
The arguments are the unique identifier with the chain identifier appended. Output (alignment data) is automatically appended to the alignment file <code>.dccp
You may also prepare a list of chain identifiers in a file, and the program will perform a pairwise comparison of the query to each structure in the list. For example, the list file mylist may have the following contents:

    1bf6A
    1j79A
    1a4mA
    1k70A
    3ubpC
The command to compare 3ubpC against each entry in the list file is then:
    Linux-prompt> perl DaliLite -list 3ubpC mylist
There is also an option for all-against-all comparison:
    Linux-prompt> perl DaliLite -AllAll mylist
The database search option (-search) uses the same shortcuts as the Dali server. Note that this option depends on an up-to-date list of representative structures and the complete database of pre-computed structural alignments. This database resides in the DCCP/ subdirectory. Updates of the database are available for download. Click the Downloads link on the home page of the Dali database.
3. Convert the alignment file to a readable format using the -format option. The output of the alignment options is in DaliLite's internal format (files with the extension .dccp). The arguments to the -format option are the identifier of the query structure, the alignment data file, a list file of valid identifiers, and the name of the output file. Only comparisons to structures listed in the list file will be output. For example:
    Linux-prompt> perl DaliLite -format 3ubpC 3ubpC.dccp representatives.list 3ubpC.html
The output file is in HTML format. It contains the list of structural neighbours and links to the structural alignments (similar to Figure 2).

SUPPORT PROTOCOL
Protocol Title: Obtaining the DaliLite standalone program
Introduction: DaliLite is a stand-alone program package that can help researchers compare large numbers of protein structures for specialized projects efficiently and locally. The DaliLite distribution is a self-contained package of scripts and programs written in Perl and Fortran77. It has been tested on the Linux operating system (RedHat distribution, version 6.0) and on Cygwin, a Linux-like environment for Microsoft Windows (

The program code is distributed to academic users. Commercial use is prohibited.
Necessary Resources (list)
Hardware: Linux workstation.
Software: Fortran-77 compiler, Perl5 interpreter.
Files: none.
Protocol Steps:
1. Download the academic licence agreement from
print, sign and fax it to the address indicated.
2. Download the DaliLite program package by clicking on the link at the top of the above web page. The current distribution version (spring 2005) is
Complete instructions for compilation and installation are available in the INSTALL file included in the DaliLite distribution. Instructions on where to obtain the necessary software resources are also included in the INSTALL file. Test examples are included in the distribution package. In brief overview, the installation proceeds as follows:
    ... Unpack the distribution package:
    Linux-prompt> tar -zxvf DaliLite_2.4.1.tar.gz
    Linux-prompt> cd ./dalilite_2.4.1/bin
    ... If you are using cygwin (Linux emulator for Windows):
    Linux-prompt> mv -f Makefile_cygwin Makefile
    ... Use a text editor to set proper HOMEDIR and ESCAPED_HOMEDIR in Makefile
    Linux-prompt> make clean
    Linux-prompt> make install
    Linux-prompt> make test
    Linux-prompt> cd ../
    Linux-prompt> ./DaliLite -help

GUIDELINES FOR UNDERSTANDING RESULTS:

As in sequence analysis, the goal of structural database searching is usually to identify homologous proteins which might provide clues to the function of the query protein. Homology means descent from a common ancestor. We can infer homology from sequence or structural similarities that are so strong they would not be expected to have arisen by chance. The structural neighbours reported by Dali are ranked in order of decreasing structural similarity (Z-score). The Z-score is the most important measure of quality of the structural alignment. Homologous proteins cluster at the top of the ranked list, but the boundary between homologous and unrelated proteins varies from one family to another. As a general rule, a Z-score above 20 means the two structures are definitely homologous, between 8 and 20 means the two are probably homologous, between 2 and 8 is a grey area, and a Z-score below 2 is not significant. The size of the proteins influences Z-scores: small structures will tend to have small Z-scores, whereas a medium Z-score for very large structures need not imply a biologically interesting relationship. Fold type also has an effect: α/β proteins usually have higher Z-scores than all-β proteins. Homologous proteins often share significant functional similarities. You should try to place the query structure in the context of a fold similarity dendrogram like Figure 6 before transferring function. There is always a best hit. Reciprocal nearest neighbours suggest more similar functions than if your query joins a whole branch of functionally diverse proteins. For example, in the receptor dendrogram (Figure 6), sex hormone receptors form one sub-cluster while the orphan receptor is about equidistant from all the other receptors. RMSD is a measure of the average deviation in distance between aligned alpha-carbons. For sequences sharing 50% identity, this should be around 1.0 Angstrom.
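The rule-of-thumb Z-score ranges given above can be written as a small lookup, which may be convenient when screening long hit lists. This is only an illustrative sketch: the function name is hypothetical, and how the exact boundary values 2, 8 and 20 are assigned is an assumption, since the text does not specify it. The function deliberately ignores the protein size and fold type effects described above.

```python
def interpret_z_score(z):
    """Map a Dali Z-score to the rule-of-thumb category from the guidelines.

    Assumption: boundary values (exactly 2, 8 or 20) fall into the
    "between" ranges; the guidelines do not say which side they belong to.
    """
    if z > 20:
        return "definitely homologous"
    if z >= 8:
        return "probably homologous"
    if z >= 2:
        return "grey area"
    return "not significant"


if __name__ == "__main__":
    for z in (35.2, 12.0, 4.7, 1.3):
        print(z, interpret_z_score(z))
```

A hit in the grey area still needs the manual checks described in this section, such as inspecting the structural core and the conservation of active site residues.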
Dali maximizes a geometrical similarity score which is defined in terms of similarities of intra-molecular distances and is thus not primarily aiming to generate alignments with low RMSD. The RMSD and number of equivalent residues (NE) are reported because they are traditional measures. Note that an alignment is better if it has both smaller RMSD and larger NE. If both RMSD and NE are smaller or both are larger, it is not possible to establish an order between the alignments. It is generally assumed that if two sequences share over 40% identity, then they are unambiguously homologous. However, two distantly related proteins may share very low sequence identity but still be homologous, and conversely, two sequences may locally share as

much as 30% identity but be unrelated. Therefore, the percentage of sequence identity is only a guide. Instead of relying on numbers alone, it is often informative to inspect, using Rasmol or another graphics program, whether the structurally equivalent regions form a continuous, compact structural core. If there are many structures known in a super-family, you can see secondary structure elements line up consistently in the multiple structure alignment view (Figure 8). Check especially for the conservation of known active site residues. You can study conservation profiles in multiple sequence alignments of protein families in sequence classification databases such as ADDA ( or PFAM ( Enzyme super-families have sharp signatures but binding domains can have very little sequence similarity. Without a sequence signature, it is harder to establish homology.

COMMENTARY
Background Information
Improved methods of protein engineering, crystallography and NMR spectroscopy have led to a surge of new protein structures deposited in the Protein Data Bank (PDB). At the end of 2004, the PDB contained over 28,000 protein structures, and the structural genomics initiative aims to provide a structure for each major protein family within a decade. This wealth of data needs to be organised and correlated using automated methods. Nearly all proteins have structural similarities to other proteins. General similarities arise from principles of physics and chemistry that limit the number of ways in which a polypeptide chain can fold into a compact globule. Evolutionary relationships result in surprising similarities (which are even stronger than similarity due to convergence caused by physical principles). Because structure tends to diverge more conservatively than sequence during evolution, structure alignment is a more powerful method than pairwise sequence alignment for detecting homology and aligning the sequences of distantly related proteins.
In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences and may help to infer functional properties of hypothetical proteins. Automatic methods enable exhaustive all-against-all structure comparisons. As a result, each structure in the PDB can be represented as a node in a graph where similar structures are neighbours of each other and structurally unrelated proteins are not neighbours. Clustering the

graph at different levels of granularity removes redundancy and aids navigation in protein space. At long range, the overall distribution of folds is dominated by secondary structure composition (for example, all-alpha or alternating alpha/beta). At intermediate range, clusters are related by shape similarity that does not necessarily reflect similarity of biological function (for example, globins and colicin A). At close range, clusters represent protein families related through strong functional constraints (for example, hemoglobin and myoglobin). Evolutionary relationships can be recovered by searching for continuous neighbourhoods (Dietmann & Holm 2001). In order to identify natural groupings of any set of objects, one needs a measure of distance or similarity. Structure comparison programs derive a structural alignment, which maximizes similarity or minimizes distance. The alignment defines a one-to-one correspondence of amino acid residues (sequence positions) in two proteins. This is analogous to sequence alignment except that the notion of (dis)similarity is much more complex between three-dimensional objects than between linear strings. For example, the conformation of a point mutant differs from that of the wild-type protein only locally and only by a few tenths of an Angstrom. Much larger deviations are observed in pairs of homologous proteins: with increasing sequence dissimilarity, small shifts in the relative orientations of secondary structure elements accumulate and reach several Angstroms and tens of degrees. At the largest evolutionary distances, only the topology of the fold or folding motif is conserved; topology here means the relative location of helices and strands and the loop connections between these. Deviations can be even larger and qualitatively different when structural similarity is the result of convergent rather than divergent evolution.
In particular, convergent evolution may result in similar 3D folds that differ in the topology of loop connections.

The modular architecture of proteins presents another complication. Large proteins can be decomposed into semi-autonomous, globular folding units called domains. Domains are often evolutionarily mobile modules and may carry specific biological functions. Because a common domain may be surrounded by completely unrelated domains, most structure comparison methods search for local similarities.

Given a measure of similarity or distance, the algorithmic problem is to find the set of corresponding points in two structures that optimises this target function. Just as there is much latitude in the formulation of the structure comparison problem, many different types of optimization algorithm have been employed. Structure comparison formulated with similarity measures of the sum-of-pairs form, or as subgraph isomorphism, belongs to the NP-complete class of problems, and one has to resort to heuristics for practical algorithms. Heuristic approaches do not aim for provably correct solutions, gaining computational performance at the potential cost of accuracy or precision. Many programs use a hierarchical approach, where promising seeds for alignment are identified using local criteria based on dynamic programming, distance difference matrices, maximal common subgraph detection, fragment matching, geometric hashing, unit vector comparison or local geometry matching (reviewed by Sierk & Kleywegt 2004). The initial set of correspondences is then optimised globally using methods such as double dynamic programming, Monte Carlo algorithms or simulated annealing, a genetic algorithm or combinatorial searching. Recently, it has been proved that brute-force exhaustive scanning of the six degrees of freedom from rotations and translations in rigid-body superimposition leads to a polynomial-time approximation algorithm for the problem of determining the maximum number of C-alpha atom pairs that can be superimposed within a given RMSD at a given error. However, this solution is too computationally demanding for practical application (Kolodny & Linial 2004).

The Dali method is based on a sensitive measure of geometrical similarities defined as a weighted sum of similarities of intra-molecular distances (see Appendix for details). 3D shape is described with a matrix of all intramolecular distances between the C-alpha atoms. Such a distance matrix is independent of coordinate frame but contains more than enough information to reconstruct the 3D coordinates, except for overall chirality, by distance geometry methods. Imagine sliding a (transparent) distance matrix on top of another one. Depending on the register of the two matrices, similar substructures will stand out as submatrices with similar patterns.
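The distance-matrix description and the weighted sum of distance similarities can be illustrated with a minimal Python sketch. It is not the Dali implementation: the function names `distance_matrix` and `dali_score` are hypothetical, the alignment is supplied by the caller rather than optimised, and the score follows the form given in the Appendix (threshold 0.2, envelope width 20 Å). The treatment of the diagonal terms, where the relative deviation is undefined, is an assumption.

```python
import math


def distance_matrix(coords):
    """All-against-all distances for a list of C-alpha (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def dali_score(dist_a, dist_b, core, theta=0.2, alpha=20.0):
    """Weighted sum of distance similarities over a given alignment.

    core is a list of (index_in_A, index_in_B) pairs, i.e. the residue
    equivalences; finding the best core is the hard part and is not
    attempted here.
    """
    total = 0.0
    for ia, ib in core:
        for ja, jb in core:
            da = dist_a[ia][ja]
            db = dist_b[ib][jb]
            mean = 0.5 * (da + db)
            if mean == 0.0:
                # Assumption: a residue paired with itself contributes theta,
                # because the relative deviation is undefined at zero distance.
                total += theta
                continue
            deviation = abs(da - db) / mean            # relative deviation
            envelope = math.exp(-((mean / alpha) ** 2))  # down-weights long distances
            total += (theta - deviation) * envelope
    return total


# Three made-up C-alpha positions and a rigidly translated copy.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in coords]
matrix = distance_matrix(coords)
matrix_shifted = distance_matrix(shifted)
identity_core = [(0, 0), (1, 1), (2, 2)]
self_score = dali_score(matrix, matrix_shifted, identity_core)
```

Because only intramolecular distances enter the score, the translated copy scores the same as the original against itself, which is the frame independence noted above.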
Structurally equivalent regions can be filtered out with a fixed cutoff on acceptable differences of intramolecular distances or, as we prefer, with a continuous function defined in terms of relative distance deviations. The common structure is revealed when the two distance matrices are brought into register by keeping only the rows and columns corresponding to the structurally equivalent residues (Figure 9).

The Dali program has a modular architecture, where the structure alignment / database searching problem is approached by a cascade of algorithms. The Dali package consists of many Fortran programs and Perl5 scripts. The program flow is controlled by a Perl wrapper script that calls other programs as needed. The individual programs implement pairwise structure comparison using different algorithms. References for these programs are given in Table 2.

The goal of a database search is to find all structures that are significantly similar to the query. A conceptual map of fold space is determined by the pre-computed all-against-all structural alignments between all representative structures. Based on this map, the database search by the Dali server tries shortcuts to quickly place the query structure in a known location of fold space. If a strong match is found to one database structure, then the search can be restricted to the pre-computed neighbourhood of this structure. Fast but approximate methods can quickly find obvious structural resemblances. Slower but more sensitive algorithms then need to be applied only to a smaller set of candidates. DaliLite has the core algorithmic functionality of the Dali server. The DaliLite programs perform systematic pairwise comparisons without shortcuts and can therefore be run independently of database updates.

Applications

The exponential growth in the number of newly solved protein structures makes correlating and classifying the data an important task. Dali is now used routinely by crystallographers world-wide to screen the database of known structures for similarity to newly determined structures. The application of Dali to newly released structures led to a string of discoveries of unexpected distant evolutionary relationships. For example, a remarkably diverse set of distant relatives of urease was identified based on structural and sequence analysis (Holm & Sander 1997); several blind fold predictions have since been verified by experimental structure determination.

Comparison to other techniques

Dali was ranked top among seven protein structure comparison methods and two sequence comparison programs that were evaluated on their ability to detect either protein homologues or domains with the same topology (fold) as defined by the CATH structure database (Novotny et al. 2004).
Critical Parameters

The Dali program has been run successfully with default parameters since its inception (Holm & Sander 1993). The results usually agree quite well with human experts' assessment. For example, the dendrogram of structural similarities by Dali has a similar topology to the SCOP hierarchical classification based on visual analysis and biological knowledge (Dietmann & Holm 2001).

While we strongly advise against changing parameter values from their defaults, a description of the numerical parameters that go into the algorithms is given in the Appendix.

Troubleshooting

Similarity not reported. The Dali system reports only similarities above an empirically chosen threshold of Z=2. This captures most cases of topological similarity of globular domains. In some fold types, though, structural similarities between parts of globular domains also score above this threshold.

Known similarity not reported. The Dali server currently reports similarities only to PDB25 representatives. The purpose of using PDB25 is to suppress the redundancy of output due to multiple structure determinations of mutants or of the same protein in slightly differing conditions. Thus, a particular PDB entry, which you know to be structurally similar to the query, might appear to be missing from the output list only because the representative structure is a different PDB entry. The Dali database reports similarities between PDB90 representatives. The PDB90 representatives for any PDB entry can be found by using the search functionality on the homepage of the Dali database.

Empty result. The Dali database includes all peptide chains from the PDB, except Cα-only entries and chains that are shorter than 30 residues. DaliLite requires that the backbone atoms (N, CA, C, O) be complete. You can build a complete backbone model from the CA-trace using the MaxSprout Server. The Dali server runs MaxSprout automatically if only a CA-trace is submitted. The submission to the Dali server will fail unless the message is plain text, as encoded messages (e.g. MIME or BinHex) are rejected by the server.

Complex comparison. Each chain is compared separately. For example, similarities to structural units made up of a dimer of two different chains (say, A and B) will not be detected.
There is a way around this limitation, which requires manual editing of the PDB entry by the user: renumber the residues in sequential order and give all chains the same chain identifier.

Multidomain proteins. It is advisable to break a multidomain query structure into its constituent domains, because the Dali server is designed to report all matches only to the first-found structural neighbourhood. That is, if the query protein has one common domain that is found by the fast filters, the search termination criteria are satisfied without a rarer domain in the same query being tested systematically.
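The manual edit described under "Complex comparison" (sequential renumbering plus a single chain identifier) can be scripted. The following Python sketch is an illustration only and is not part of the Dali package; the function names `merge_chains` and `ca_record` are hypothetical. It relies on the fixed-column layout of PDB ATOM/HETATM records (chain identifier in column 22, residue number in columns 23-26, insertion code in column 27) and, as an assumption, drops TER records so that the merged chain appears continuous.

```python
def merge_chains(pdb_lines, new_chain="A"):
    """Renumber residues sequentially and give every atom the same chain ID."""
    merged = []
    last_residue = None
    new_number = 0
    for line in pdb_lines:
        record = line[:6]
        if record in ("ATOM  ", "HETATM"):
            # Original chain ID plus residue number and insertion code.
            residue = (line[21], line[22:27])
            if residue != last_residue:
                new_number += 1
                last_residue = residue
            # Rewrite columns 22-27; the insertion code is blanked.
            line = line[:21] + new_chain + f"{new_number:4d}" + " " + line[27:]
        elif record.startswith("TER"):
            continue  # assumption: chain terminators are not wanted in the merged chain
        merged.append(line)
    return merged


def ca_record(serial, chain, resseq):
    """Build a minimal C-alpha ATOM record (demo data only, dummy coordinates)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Two chains, A (residue 10) and B (residues 5 and 6), become one chain A, residues 1-3.
demo = [ca_record(1, "A", 10), "TER", ca_record(2, "B", 5), ca_record(3, "B", 6)]
merged_demo = merge_chains(demo)
```

The sketch does not handle more than 9999 residues, alternate locations, or records other than ATOM, HETATM and TER, so the output should be inspected before submission.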

Which Z-score threshold implies homology? This varies for each protein family (Dietmann & Holm 2001). The topology of the fold dendrogram (hierarchical clustering of domains based on structure similarity) represents evolutionary relationships fairly faithfully, so that homologous structures are found collected in one branch of the tree, but the borders of the homologous families might lie at Z-scores around 4 (helix-turn-helix DNA-binding domains) or around 14 (TIM barrels).

Technical failures. The Dali server at the EBI runs automatically with minimal human administrative effort. The assumption that the fold space graph is complete is critical to exhaustive database searching but can sometimes be violated for the following reasons: unpredictable failure of the database update (black-outs, computer crashes, network failures, over-running disk space, etc.), failure to process the PDB entry (for example, chains longer than 1000 residues are not handled well), and program bugs. Please report unexpected behaviour to [email protected].

LITERATURE CITED

Chothia C, Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5.
Dietmann S, Holm L (2001) Identification of homology in protein structure classification. Nature Structural Biology 8.
Heger A, Holm L (2003) Exhaustive enumeration of protein domain families. J Mol Biol 328.
Holm L, Sander C (1997) An evolutionary treasure: unification of a broad set of amidohydrolases related to urease. Proteins 28.
Holm L, Sander C (1994) Parser for protein folding units. Proteins 19.
Holm L, Sander C (1996) Mapping the protein universe. Science 273.
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22.
Novotny M, Madsen D, Kleywegt GJ (2004) Evaluation of protein fold comparison servers. Proteins 54.
Sierk ML, Kleywegt GJ (2004) Deja vu all over again: finding and analyzing protein structure similarities. Structure 12.
Kolodny R, Linial N (2004) Approximate protein structural alignment in polynomial time. PNAS 101.

Key References (optional)

Holm L, Sander C (1993) Protein structure comparison by alignment of distance matrices. J Mol Biol 233.
The original Dali reference.

Holm L, Sander C (1996) Mapping the protein universe. Science 273.
Reviews structure comparison methodology, key results and implications.

Holm L, Park J (2000) DaliLite workbench for protein structure comparison. Bioinformatics 16.
The main DaliLite reference, which should be cited in any publication in which you use DaliLite results.

Internet Resources (optional)

The interactive DaliLite server for comparing two structures to each other and visualizing the structural superimposition.

The Dali server for comparing a new structure against the database of known structures.

The Dali database for browsing structural and sequence neighbours of proteins.

The ADDA classification assigns every residue of known protein sequences into a domain family and interactively visualizes the sequence neighbours of any query protein in a multiple alignment.

SRS at the EBI and Entrez at NCBI are comprehensive search engines that cross-reference the PDB identifier of a protein to many other databases.

FIGURE LEGENDS

1. Results summary page of the DaliLite server.

2. Structural alignment by the DaliLite server.
3. Click on the Superimposed C-alpha traces link to view the superimposition in Rasmol (stereo view).
4. Clicking on the browse link in Figure 3 leads to the list of structural neighbours of estradiol receptor. Hits 1-21 are members of the same fold class comprising nuclear receptors. The last hit (number 22) has a much lower Z-score than the nuclear receptors and represents a biologically non-interesting hit that matches in a helical bundle motif.
5. Home page of the Dali database. The user has typed in Estradiol receptor in the query box.
6. The result of the query for estradiol receptor structures.
7. A large number of nuclear receptors belong to the same fold class as estradiol receptor. Where a sequence-structure-domain mapping is available, they have all been classified into the same Adda domain family (numbered 523).
8. Multiple structure-alignment of estradiol receptor and selected structural neighbours. Notation: three-state secondary structure definitions by DSSP (reduced to H=helix, E=sheet, L=coil) are shown above the amino acid sequence.
9. Left: Distance matrix representation of two different proteins, one in the upper and the other in the lower triangle. Right: Structural alignment identifies a one-to-one correspondence between a subset of residues. The respective sub-matrices of the distance matrix display similar contact patterns.

Table 1: Overview of Dali resources and their relations.

| | Dali server | DaliLite | Dali database | Adda database |
|---|---|---|---|---|
| Input | One PDB structure | Two (lists of) PDB structures | All PDB structures | NRDB (all protein sequences) |
| Steps | Database search using cascaded algorithms | Pairwise structure comparison | Remove redundancy; all-against-all structure comparison; domain decomposition; clustering | Remove redundancy; all-against-all sequence comparison; domain decomposition; clustering |
| Output | Structure neighbours of query | Structure neighbours of query | Protein fold classification | Protein family classification |
| Protocol | Basic Protocol 2 | Basic Protocol 1; Alternate 1-2; Support protocol | Basic Protocol 3 | Linked to Dali database |

Table 2: Program modules of the Dali suite.

| Program | Purpose | Reference |
|---|---|---|
| DSSP dsspcmbi | Parse PDB entry. Define secondary structure elements. | Kabsch W & Sander C (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22. |
| Puu | Derive a tree of compact substructures to guide alignment. | Holm L & Sander C (1994). Parser for protein folding units. Proteins 19. |
| Wolf | Very fast filter to identify obvious similarities. | Holm L & Sander C (1995). 3-D lookup: fast protein structure database searches at 90% reliability. ISMB'95. |
| Soap | Used to align structures with little secondary structure. | Falicov A & Cohen FE (1996). A surface of minimum area metric for the structural comparison of proteins. J Mol Biol 258(5). |
| Parsi | Sensitive branch-and-bound alignment algorithm. | Holm L & Sander C (1996). Mapping the protein universe. Science 273. |
| Dalicon | All alignments generated by the above methods (with different objective functions) are refined using a Monte Carlo algorithm that maximizes the Dali score. | Holm L & Sander C (1993). Protein structure comparison by alignment of distance matrices. J Mol Biol 233. |

APPENDIX

A. OBJECTIVE FUNCTION

Here we describe the objective function of the Dali algorithm and the normalization of structural similarity scores to obtain the Z-score. Let us consider two proteins labeled A and B. The match of two substructures is evaluated using an additive similarity score S of the form:

Equation 1
\[ S = \sum_{i=1}^{L} \sum_{j=1}^{L} \phi(i,j), \]

where i and j label residues, L is the number of matched pairs (the size of each substructure), and \(\phi\) is a similarity measure based on some pairwise relationship, here on the Cα-Cα distances \(d^A_{ij}\) and \(d^B_{ij}\). Unmatched residues do not contribute to the overall score. For a given functional form of \(\phi(i,j)\), the largest value of S corresponds to the optimal set of residue equivalences. Structural similarity searches look for the largest common substructure between two proteins, so one needs to define a similarity measure that balances two contradictory requirements: maximizing the number of equivalenced residues and minimizing structural deviations. The use of relative rather than absolute deviations of equivalent distances is tolerant to the cumulative effect of gradual geometrical distortions. In Dali, the residue-pair score \(\phi\) has the form of Equation 2:

Equation 2
\[ \phi(i,j) = \left( \theta - \frac{\left| d^A_{ij} - d^B_{ij} \right|}{d^*_{ij}} \right) w\!\left( d^*_{ij} \right), \]

where \(d^*_{ij}\) is the average of \(d^A_{ij}\) and \(d^B_{ij}\), \(\theta\) is the similarity threshold, and w is an envelope function. Dali uses \(\theta = 0.2\). Since pairs in the long distance range are abundant but less discriminative, their contribution is weighted down by the envelope function \(w(r) = \exp(-r^2/\alpha^2)\), where \(\alpha = 20\) Å, calibrated on the size of a typical domain. We report alignments generated using the similarity measure of Equation 2, imposing the constraint of strictly sequential alignment. The resulting raw Dali score describing the structural similarity is given by Equation 3:

Equation 3
\[ S(A,B) = \sum_{i \in \mathrm{core}} \sum_{j \in \mathrm{core}} \left( 0.2 - \frac{\left| d^A_{ij} - d^B_{ij} \right|}{d^*_{ij}} \right) \exp\!\left( -\left( \frac{d^*_{ij}}{20\,\text{Å}} \right)^{2} \right), \]

where we have explicitly inserted the values of the constants. The core is defined as a set of equivalences between residues in proteins A and B, which is analogous to a sequence alignment.

For random pairwise comparisons, the expected Dali score (Equation 3) increases with the number of residues in the compared proteins. In order to describe the statistical significance of a pairwise comparison score S(A,B), the Dali server uses the Z-score defined as

Equation 4
\[ Z(A,B) = \frac{S(A,B) - m(L)}{0.5\, m(L)}, \]

where the denominator is an estimate of the average standard deviation of scores for various lengths of protein chains. The approximate empirical relation between the mean score m and the average length \(L = \sqrt{L_A L_B}\) (with L < 400) of two proteins is a cubic polynomial:

Equation 5
\[ m(L) \approx c_0 + c_1 L + c_2 L^2 + c_3 L^3, \]

where \(c_0, \dots, c_3\) are empirically fitted constants.

The Z-score is computed for every possible pair of domains, and the highest value is reported as the Z-score of the protein pair. Possible domains are determined by the Puu algorithm (parser for Protein Unfolding Units). The algorithm recursively cuts a structure into smaller compact substructures at the weakest interface. A number of post-processing rules were introduced to supplement numerical criteria. The whole procedure is fully described in the original publication (Holm & Sander 1994).

B. PROGRAM PARAMETERS

The following parameters are set at the top of the main Perl script. The default values, as used by the Dali server, are indicated. These parameters mainly affect the pruning of search space in the database search.

- $MINLEN=30. Structures with fewer residues are excluded from comparison. Dali was designed to detect similarities at the level of globular domain folding patterns that involve several secondary structure elements. It is not designed to compare conformations of short peptides.

- $MINSSE=2. The Wolf and Parsi methods reduce the complexity of the structural comparison by representing structures (partly) as secondary structure elements. If there are fewer than $MINSSE secondary structure elements in the protein, then the Soap method is used.
- $cut0=20.0; $cut1=4.0; $cut2=2.0. The database search by the Dali server uses a set of rules to prune search space after a strong similarity has been found. If a similarity has been found with a Z-score above $cut0, then the search is stopped completely, because the query is structurally almost identical to the best hit. If similarities have been found with Z-scores above $cut1, then the search list is restricted to the first neighbour shells of all hits. If the best Z-score lies between $cut1 and $cut2, then the search list is restricted to the second neighbour shells of all hits.
- $nbest=1. This parameter controls the number of hits in output. All hits with a Z-score above 2, or at least $nbest hits, will be reported.
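The Z-score normalisation of Appendix A and the pruning rules controlled by $cut0, $cut1 and $cut2 can be summarised in a short Python sketch. It is an illustration of the stated rules, not the server's Perl code: the function names and the returned labels are hypothetical, the mean score m(L) is passed in by the caller rather than computed, and both the behaviour below $cut2 (continuing with the full search list) and the strictness of the comparisons at the boundaries are assumptions.

```python
CUT0, CUT1, CUT2 = 20.0, 4.0, 2.0  # default values listed above


def z_score(raw_score, mean_score):
    """Z = (S - m) / (0.5 * m), with m the expected score for the protein sizes."""
    return (raw_score - mean_score) / (0.5 * mean_score)


def search_scope(best_z, cut0=CUT0, cut1=CUT1, cut2=CUT2):
    """Decide how far to prune the search list given the best Z-score so far."""
    if best_z > cut0:
        return "stop"          # query is almost identical to the best hit
    if best_z > cut1:
        return "first_shell"   # first neighbour shells of all hits
    if best_z > cut2:
        return "second_shell"  # second neighbour shells of all hits
    return "full"              # assumption: no pruning without a significant hit


# Made-up numbers: a raw score of 30 against an expected score of 10.
example_z = z_score(30.0, 10.0)
example_scope = search_scope(example_z)
```

Expressing the decision as a pure function of the best Z-score keeps the thresholds in one place, which mirrors how the parameters are grouped at the top of the wrapper script.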

FIGURE 1: SNAPSHOT FROM THE RESULTS PAGE OF DALILITE SERVER FOR THE COMPARISON OF 1F0KA TO 1F6DA.

DaliLite Results

SUBMISSION PARAMETERS
Structure 1: 1QKU
Structure 2: 1K4W

Results of Structure Comparison

Each chain of mol1 is compared structurally to each chain of mol2 using the DaliLite program. The Dali method optimises a weighted sum of similarities of intramolecular distances. Sequence identity and the root-mean-square deviation of C-alpha atoms after rigid-body superimposition are reported for your information only; they are ignored by the structural alignment method. Suboptimal alignments do not overlap the optimal alignment or each other. Suboptimal alignments detected by the program are reported if the Z-score is above 2; they may be of interest if there are internal repeats in either structure. In the C-alpha traces, the chains of the first and second structure are renamed 'Q' and 'S', respectively. The best match to each chain in the second structure is highlighted in the table below. Z-Scores below 2 are not significant.

First Structure & Chain: mol1a
Columns: No.; Second Structure & Chain; Z-Score; Aligned Residues; RMSD [Å]; Seq. Identity [%]; Structural Alignment; Superimposed C-alpha Traces; PDB Files (mol2 is rotated / translated to mol1 position)
Row 1: mol2a; click here; CA_1.pdb; mol1_original.pdb; mol2_1.pdb

Additional data
- Rotation-translation matrices for superimposition
- Listing of structurally equivalent residue ranges
- View the log (this is only informative to experts)

FIGURE 2: STRUCTURAL ALIGNMENT BETWEEN 1QKUA AND 1K4WA.

NO 1: QUERY=MOL1A SBJCT=MOL2A Z-SCORE=22.2

DSSP lllllllllllhhhhhhhhhhhl..llll...llllllll..lllhhhhhh
Query skknslalsltadqmvsalldae..ppil...yseydptr..pfseasmmg 44
ident
Sbjct...tMSEIDRIAQNIIKSHleTCQYtmeelhqlawqthtyeEIKAyqSKSREALWQ 53
DSSP...lHHHHHHHHHHHHHHHhhLLLLlhhhhhllllllllhhHHHHhhLLLHHHHHH

DSSP HHHHHHHHHHHHHHHHHHHLLLHHHLLHHHHHHHHHHHHHHHHHHHHHHHLLLLLLEELL
Query LLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLF 104
ident
Sbjct QCAIQITHAIQYVVEFAKRITGFMELCQNDQILLLKSGCLEVVLVRMCRAFNPLNNTVLF 113
DSSP HHHHHHHHHHHHHHHHHHLLHHHHLLLHHHHHHHHHHHHHHHHHHHHHHHEELLLLEEEE

DSSP LlLLLEELLHHHHLLlHHHHHHHHHHHHHHHHHHLLLHHHHHHHHHHHHHHLLLLLLLll
Query ApNLLLDRNQGKCVEgMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLss 164
ident
Sbjct E.GKYGGMQMFKALG.SDDLVNEAFDFAKNLCSLQLTEEEIALFSSAVLISPDRAWLL
DSSP L.LEEELHHHHHHHL.LHHHHHHHHHHHHHHHLLLLLHHHHHHHHHHHHLLLLLLLLL..

DSSP llhhhhhhhhhhhhhhhhhhhhhhhhhlllllhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Query tlksleekdhihrvldkitdtlihlmakagltlqqqhqrlaqlllilshirhmsnkgmeh 224
ident
Sbjct...EPRKVQKLQEKIYFALQHVIQKNHLD...DETLAKLIAKIPTITAVCNLHGEK 219
DSSP...LHHHHHHHHHHHHHHHHHHHHHLLLL...LLHHHHHHLLHHHHHHHHHHHHHH

DSSP HHHHHHLL...llLLLHHHHHLLLlllll
Query LYSMKCKN...vvPLYDLLLEMLDahrlh 250
ident
Sbjct LQVFKQSHpdivntLFPPLYKELFN
DSSP HHHHHHHLhhhhhhLLLHHHHHHHL...

FIGURE 3: SUPERIMPOSED C-ALPHA TRACES OF 1QKUA AND 1K4WA, RASMOL STEREO VIEW.

FIGURE 4: HOME PAGE OF DALI DATABASE

Dali fold classification

Reference: L. Holm and C. Sander (1996) Mapping the protein universe. Science 273:

The Dali database is based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB). The classification and alignments are automatically maintained and continuously updated using the Dali search engine. This is a preliminary test version dated May

FOLD CLASSIFICATION
Fold index - complete list of structural domains in PDB90 ordered by similarity. From the Fold index, you can browse the list of structural neighbours and alignments of each representative.
Fold tree - a postscript picture

SEARCH PDB CODES OR PROTEIN NAMES
Enter PDB code or protein name to search for: estradiol receptor

DOWNLOADS
HELP

L. Holm, Sep

FIGURE 5: TEXT QUERY RESULT

Dali database query: estradiol receptor

Click on the Repres. link to browse the structural neighbours and alignments of the representative. Click on the Fold link to view all members of the fold class.

PDB chain  Repres.  Fold  Compound
1qktA/ qkuA_1 342 ESTRADIOL RECEPTOR
1qkuA/ qkuA_1 342 ESTRADIOL RECEPTOR
1qkuB/ qkuA_1 342 ESTRADIOL RECEPTOR
1qkuC/ qkuA_1 342 ESTRADIOL RECEPTOR

FIGURE 6: FOLD QUERY RESULT

Dali fold query: 342

Fold index  PDB code  Adda  Browse  Compound
qkuA_1 523 interact ESTRADIOL RECEPTOR
kv6A_1 523 interact ESTROGEN-RELATED RECEPTOR GAMMA
l2jA_1 523 interact ESTROGEN RECEPTOR BETA
qknA_1 523 interact ESTROGEN RECEPTOR BETA
e3gA_1 523 interact ANDROGEN RECEPTOR
a28A_1 523 interact PROGESTERONE RECEPTOR
nhzA_0 interact GLUCOCORTICOID RECEPTOR
hg4A_1 523 interact ULTRASPIRACLE
g2nA_1 523 interact ULTRASPIRACLE PROTEIN
lv2A_1 523 interact HEPATOCYTE NUCLEAR FACTOR 4-GAMMA
lbd_1 523 interact RETINOID X RECEPTOR
gwxB_1 523 interact PPAR-DELTA
fm9D_1 523 interact RETINOIC ACID RECEPTOR RXR-ALPHA
kkqA_1 523 interact PEROXISOME PROLIFERATOR ACTIVATED RECEPTOR
k4wA_1 523 interact NUCLEAR RECEPTOR ROR-BETA
n83A_1 523 interact NUCLEAR RECEPTOR ROR-ALPHA
dkfB_1 523 interact RETINOID X RECEPTOR-ALPHA
lbd_1 523 interact RETINOIC ACID RECEPTOR GAMMA
nq2A_0 interact THYROID HORMONE RECEPTOR BETA
ie9A_1 523 interact VITAMIN D3 RECEPTOR
m13A_0 interact ORPHAN NUCLEAR RECEPTOR PXR

FIGURE 7: STRUCTURAL NEIGHBOUR LIST FOR ESTRADIOL RECEPTOR

1qkuA: Structural Neighbours in PDB90 and structural alignments

PDB90 is a representative subset of PDB chains that are less than 90 % sequence identical to each other.

No: the top 50 alignments, sorted by Z-score, are shown
Chain: PDB entry code plus chain identifier
raw-score: the sum of weighted similarities of intramolecular distances that Dali maximizes
Z-score: normalized score that depends on the size of the structures
%id: percentage of identical amino acids over all structurally equivalent residues
lali: number of structurally equivalent residues
rmsd: root-mean-square deviation of C-alpha atoms in the least-squares superimposition of the structurally equivalent C-alpha atoms
Description: the COMPND record from the PDB entry

No  Chain  raw-score  Z-score  %id  lali  rmsd  Description
1 1qkuA ESTRADIOL RECEPTOR
2 1qknA ESTROGEN RECEPTOR BETA
3 1kv6A ESTROGEN-RELATED RECEPTOR GAMMA
4 1l2jA ESTROGEN RECEPTOR BETA
5 1e3gA ANDROGEN RECEPTOR
6 1a28A PROGESTERONE RECEPTOR
7 1lv2A HEPATOCYTE NUCLEAR FACTOR 4-GAMMA
8 1nhzA GLUCOCORTICOID RECEPTOR
9 2lbd RETINOIC ACID RECEPTOR GAMMA
10 1k4wA NUCLEAR RECEPTOR ROR-BETA
11 1ie9A VITAMIN D3 RECEPTOR
12 1nq2A THYROID HORMONE RECEPTOR BETA
13 n83A NUCLEAR RECEPTOR ROR-ALPHA
14 1hg4A ULTRASPIRACLE
15 1g2nA ULTRASPIRACLE PROTEIN
16 1dkfB RETINOID X RECEPTOR-ALPHA
17 1fm9D RETINOIC ACID RECEPTOR RXR-ALPHA
18 1gwxB PPAR-DELTA
19 1m13A ORPHAN NUCLEAR RECEPTOR PXR
20 1lbd RETINOID X RECEPTOR
21 1kkqA PEROXISOME PROLIFERATOR ACTIVATED RECEPTOR
22 1n81A PLASMODIUM FALCIPARUM GAMETE ANTIGEN 27/25

Figure 8: Multiple structural alignment of estradiol receptor and selected neighbours.

FIGURE 9: DISTANCE MATRIX ALIGNMENT


More information

IQ MORE / IQ MORE Professional

IQ MORE / IQ MORE Professional IQ MORE / IQ MORE Professional Version 5 Manual APIS Informationstechnologien GmbH The information contained in this document may be changed without advance notice and represents no obligation on the part

More information

Novell ZENworks 10 Configuration Management SP3

Novell ZENworks 10 Configuration Management SP3 AUTHORIZED DOCUMENTATION Software Distribution Reference Novell ZENworks 10 Configuration Management SP3 10.3 November 17, 2011 www.novell.com Legal Notices Novell, Inc., makes no representations or warranties

More information

Multiobjective Robust Design Optimization of a docked ligand

Multiobjective Robust Design Optimization of a docked ligand Multiobjective Robust Design Optimization of a docked ligand Carlo Poloni,, Universitaʼ di Trieste Danilo Di Stefano, ESTECO srl Design Process DESIGN ANALYSIS MODEL Dynamic Analysis Logistics & Field

More information

Hydrogen Bonds The electrostatic nature of hydrogen bonds

Hydrogen Bonds The electrostatic nature of hydrogen bonds Hydrogen Bonds Hydrogen bonds have played an incredibly important role in the history of structural biology. Both the structure of DNA and of protein a-helices and b-sheets were predicted based largely

More information

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

More information

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:

More information

GenBank, Entrez, & FASTA

GenBank, Entrez, & FASTA GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,

More information

An Introduction to Point Pattern Analysis using CrimeStat

An Introduction to Point Pattern Analysis using CrimeStat Introduction An Introduction to Point Pattern Analysis using CrimeStat Luc Anselin Spatial Analysis Laboratory Department of Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

More information

Cloud. Hosted Exchange Administration Manual

Cloud. Hosted Exchange Administration Manual Cloud Hosted Exchange Administration Manual Table of Contents Table of Contents... 1 Table of Figures... 4 1 Preface... 6 2 Telesystem Hosted Exchange Administrative Portal... 7 3 Hosted Exchange Service...

More information

UGENE Quick Start Guide

UGENE Quick Start Guide Quick Start Guide This document contains a quick introduction to UGENE. For more detailed information, you can find the UGENE User Manual and other special manuals in project website: http://ugene.unipro.ru.

More information

Online Backup Client User Manual

Online Backup Client User Manual Online Backup Client User Manual Software version 3.21 For Linux distributions January 2011 Version 2.0 Disclaimer This document is compiled with the greatest possible care. However, errors might have

More information

Dell Enterprise Reporter 2.5. Configuration Manager User Guide

Dell Enterprise Reporter 2.5. Configuration Manager User Guide Dell Enterprise Reporter 2.5 2014 Dell Inc. ALL RIGHTS RESERVED. This guide contains proprietary information protected by copyright. The software described in this guide is furnished under a software license

More information

FEAWEB ASP Issue: 1.0 Stakeholder Needs Issue Date: 03/29/2000. 04/07/2000 1.0 Initial Description Marco Bittencourt

FEAWEB ASP Issue: 1.0 Stakeholder Needs Issue Date: 03/29/2000. 04/07/2000 1.0 Initial Description Marco Bittencourt )($:(%$63 6WDNHKROGHU1HHGV,VVXH 5HYLVLRQ+LVWRU\ 'DWH,VVXH 'HVFULSWLRQ $XWKRU 04/07/2000 1.0 Initial Description Marco Bittencourt &RQILGHQWLDO DPM-FEM-UNICAMP, 2000 Page 2 7DEOHRI&RQWHQWV 1. Objectives

More information

PyRy3D: a software tool for modeling of large macromolecular complexes MODELING OF STRUCTURES FOR LARGE MACROMOLECULAR COMPLEXES

PyRy3D: a software tool for modeling of large macromolecular complexes MODELING OF STRUCTURES FOR LARGE MACROMOLECULAR COMPLEXES MODELING OF STRUCTURES FOR LARGE MACROMOLECULAR COMPLEXES PyRy3D is a method for building low-resolution models of large macromolecular complexes. The components (proteins, nucleic acids and any other

More information

file:///c /Documents%20and%20Settings/terry/Desktop/DOCK%20website/terry/Old%20Versions/dock4.0_faq.txt

file:///c /Documents%20and%20Settings/terry/Desktop/DOCK%20website/terry/Old%20Versions/dock4.0_faq.txt -- X. Zou, 6/28/1999 -- Questions on installation of DOCK4.0.1: ======================================= Q. Can I run DOCK on platforms other than SGI (e.g., SparcStations, DEC Stations, Pentium, etc.)?

More information

Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert Karakaş, Phoebe L. Stewart, and Jens Meiler

Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert Karakaş, Phoebe L. Stewart, and Jens Meiler Structure 17 Supplemental Data EM-Fold: De Novo Folding of α-helical Proteins Guided by Intermediate-Resolution Electron Microscopy Density Maps Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert

More information

Gold (Genetic Optimization for Ligand Docking) G. Jones et al. 1996

Gold (Genetic Optimization for Ligand Docking) G. Jones et al. 1996 Gold (Genetic Optimization for Ligand Docking) G. Jones et al. 1996 LMU Institut für Informatik, LFE Bioinformatik, Cheminformatics, Structure based methods J. Apostolakis 1 Genetic algorithms Inspired

More information

Eventia Log Parsing Editor 1.0 Administration Guide

Eventia Log Parsing Editor 1.0 Administration Guide Eventia Log Parsing Editor 1.0 Administration Guide Revised: November 28, 2007 In This Document Overview page 2 Installation and Supported Platforms page 4 Menus and Main Window page 5 Creating Parsing

More information

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want 1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very

More information

Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures. A Short Introduction Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

More information

T cell Epitope Prediction

T cell Epitope Prediction Institute for Immunology and Informatics T cell Epitope Prediction EpiMatrix Eric Gustafson January 6, 2011 Overview Gathering raw data Popular sources Data Management Conservation Analysis Multiple Alignments

More information

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/ CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu [email protected] 1. Introduction

More information

CHM 579 Lab 1: Basic Monte Carlo Algorithm

CHM 579 Lab 1: Basic Monte Carlo Algorithm CHM 579 Lab 1: Basic Monte Carlo Algorithm Due 02/12/2014 The goal of this lab is to get familiar with a simple Monte Carlo program and to be able to compile and run it on a Linux server. Lab Procedure:

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

NNMi120 Network Node Manager i Software 9.x Essentials

NNMi120 Network Node Manager i Software 9.x Essentials NNMi120 Network Node Manager i Software 9.x Essentials Instructor-Led Training For versions 9.0 9.2 OVERVIEW This course is designed for those Network and/or System administrators tasked with the installation,

More information

Visualizing molecular simulations

Visualizing molecular simulations Visualizing molecular simulations ChE210D Overview Visualization plays a very important role in molecular simulations: it enables us to develop physical intuition about the behavior of a system that is

More information

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

BLAST. Anders Gorm Pedersen & Rasmus Wernersson BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise

More information

Network Protocol Analysis using Bioinformatics Algorithms

Network Protocol Analysis using Bioinformatics Algorithms Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe [email protected] ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol

More information

RNA Movies 2: sequential animation of RNA secondary structures

RNA Movies 2: sequential animation of RNA secondary structures W330 W334 Nucleic Acids Research, 2007, Vol. 35, Web Server issue doi:10.1093/nar/gkm309 RNA Movies 2: sequential animation of RNA secondary structures Alexander Kaiser 1, Jan Krüger 2 and Dirk J. Evers

More information

A QUICK OVERVIEW OF THE OMNeT++ IDE

A QUICK OVERVIEW OF THE OMNeT++ IDE Introduction A QUICK OVERVIEW OF THE OMNeT++ IDE The OMNeT++ 4.x Integrated Development Environment is based on the Eclipse platform, and extends it with new editors, views, wizards, and additional functionality.

More information

MultiExperiment Viewer Quickstart Guide

MultiExperiment Viewer Quickstart Guide MultiExperiment Viewer Quickstart Guide Table of Contents: I. Preface - 2 II. Installing MeV - 2 III. Opening a Data Set - 2 IV. Filtering - 6 V. Clustering a. HCL - 8 b. K-means - 11 VI. Modules a. T-test

More information

How To Use Query Console

How To Use Query Console Query Console User Guide 1 MarkLogic 8 February, 2015 Last Revised: 8.0-1, February, 2015 Copyright 2015 MarkLogic Corporation. All rights reserved. Table of Contents Table of Contents Query Console User

More information

Protein Studies Using CAChe

Protein Studies Using CAChe Protein Studies Using CAChe Exercise 1 Building the Molecules of Interest, and Using the Protein Data Bank In the CAChe workspace, click File / pen, and navigate to the C:\Program Files\Fujitsu\ CAChe\Fragment

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

Network Scanner Tool R3.1. User s Guide Version 3.0.04

Network Scanner Tool R3.1. User s Guide Version 3.0.04 Network Scanner Tool R3.1 User s Guide Version 3.0.04 Copyright 2000-2004 by Sharp Corporation. All rights reserved. Reproduction, adaptation or translation without prior written permission is prohibited,

More information

WS_FTP Professional 12

WS_FTP Professional 12 WS_FTP Professional 12 Tools Guide Contents CHAPTER 1 Introduction Ways to Automate Regular File Transfers...5 Check Transfer Status and Logs...6 Building a List of Files for Transfer...6 Transfer Files

More information

ImageNow User. Getting Started Guide. ImageNow Version: 6.7. x

ImageNow User. Getting Started Guide. ImageNow Version: 6.7. x ImageNow User Getting Started Guide ImageNow Version: 6.7. x Written by: Product Documentation, R&D Date: June 2012 2012 Perceptive Software. All rights reserved CaptureNow, ImageNow, Interact, and WebNow

More information

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

More information

Bridging People and Process. Bridging People and Process. Bridging People and Process. Bridging People and Process

Bridging People and Process. Bridging People and Process. Bridging People and Process. Bridging People and Process USER MANUAL DATAMOTION SECUREMAIL SERVER Bridging People and Process APPLICATION VERSION 1.1 Bridging People and Process Bridging People and Process Bridging People and Process Published By: DataMotion,

More information

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

More information

TD 271 Rev.1 (PLEN/15)

TD 271 Rev.1 (PLEN/15) INTERNATIONAL TELECOMMUNICATION UNION STUDY GROUP 15 TELECOMMUNICATION STANDARDIZATION SECTOR STUDY PERIOD 2009-2012 English only Original: English Question(s): 12/15 Geneva, 31 May - 11 June 2010 Source:

More information

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the

More information

Numerical Algorithms Group

Numerical Algorithms Group Title: Summary: Using the Component Approach to Craft Customized Data Mining Solutions One definition of data mining is the non-trivial extraction of implicit, previously unknown and potentially useful

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

Note : It may be possible to run Test or Development instances on 32-bit systems with less memory.

Note : It may be possible to run Test or Development instances on 32-bit systems with less memory. Oracle Enterprise Data Quality Customer Data Services Pack Installation Guide Release 11g R1 (11.1.1.7) E40736-01 October 2013 1 Installation This guide explains how to install Oracle Enterprise Data Quality

More information

A Business Process Services Portal

A Business Process Services Portal A Business Process Services Portal IBM Research Report RZ 3782 Cédric Favre 1, Zohar Feldman 3, Beat Gfeller 1, Thomas Gschwind 1, Jana Koehler 1, Jochen M. Küster 1, Oleksandr Maistrenko 1, Alexandru

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

IT Service Level Management 2.1 User s Guide SAS

IT Service Level Management 2.1 User s Guide SAS IT Service Level Management 2.1 User s Guide SAS The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS IT Service Level Management 2.1: User s Guide. Cary, NC:

More information

USER GUIDE MANTRA WEB EXTRACTOR. www.altiliagroup.com

USER GUIDE MANTRA WEB EXTRACTOR. www.altiliagroup.com USER GUIDE MANTRA WEB EXTRACTOR www.altiliagroup.com Page 1 of 57 MANTRA WEB EXTRACTOR USER GUIDE TABLE OF CONTENTS CONVENTIONS... 2 CHAPTER 2 BASICS... 6 CHAPTER 3 - WORKSPACE... 7 Menu bar 7 Toolbar

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Pairwise Sequence Alignment [email protected] SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

More information

SAnDReS Tutorial 01 Prof. Dr. Walter F. de Azevedo Jr.

SAnDReS Tutorial 01 Prof. Dr. Walter F. de Azevedo Jr. 2015 Dr. Walter F. de Azevedo Jr. SAnDReS Tutorial 01 Prof. Dr. Walter F. de Azevedo Jr. 1 Running in the Windows On the Windows, left click on Command Prompt. Go to SAnDReS directory (c:\sandres) and

More information

ACCESS 2007. Importing and Exporting Data Files. Information Technology. MS Access 2007 Users Guide. IT Training & Development (818) 677-1700

ACCESS 2007. Importing and Exporting Data Files. Information Technology. MS Access 2007 Users Guide. IT Training & Development (818) 677-1700 Information Technology MS Access 2007 Users Guide ACCESS 2007 Importing and Exporting Data Files IT Training & Development (818) 677-1700 [email protected] TABLE OF CONTENTS Introduction... 1 Import Excel

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

Novell ZENworks Asset Management 7.5

Novell ZENworks Asset Management 7.5 Novell ZENworks Asset Management 7.5 w w w. n o v e l l. c o m October 2006 USING THE WEB CONSOLE Table Of Contents Getting Started with ZENworks Asset Management Web Console... 1 How to Get Started...

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

Polynomial Neural Network Discovery Client User Guide

Polynomial Neural Network Discovery Client User Guide Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3

More information

CDD user guide. PsN 4.4.8. Revised 2015-02-23

CDD user guide. PsN 4.4.8. Revised 2015-02-23 CDD user guide PsN 4.4.8 Revised 2015-02-23 1 Introduction The Case Deletions Diagnostics (CDD) algorithm is a tool primarily used to identify influential components of the dataset, usually individuals.

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Bitrix Site Manager 4.1. User Guide

Bitrix Site Manager 4.1. User Guide Bitrix Site Manager 4.1 User Guide 2 Contents REGISTRATION AND AUTHORISATION...3 SITE SECTIONS...5 Creating a section...6 Changing the section properties...8 SITE PAGES...9 Creating a page...10 Editing

More information

Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet

Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet Nucleic Acids Research, 2006, Vol. 34, Web Server issue W119 W123 doi:10.1093/nar/gkl199 Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet M. Tyagi 1,

More information

PSW Guide. Version 4.7 April 2013

PSW Guide. Version 4.7 April 2013 PSW Guide Version 4.7 April 2013 Contents Contents...2 Documentation...3 Introduction...4 Forms...5 Form Entry...7 Form Authorisation and Review... 16 Reporting in the PSW... 17 Other Features of the Professional

More information

Unemployment Insurance Data Validation Operations Guide

Unemployment Insurance Data Validation Operations Guide Unemployment Insurance Data Validation Operations Guide ETA Operations Guide 411 U.S. Department of Labor Employment and Training Administration Office of Unemployment Insurance TABLE OF CONTENTS Chapter

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Moxa Device Manager 2.3 User s Manual

Moxa Device Manager 2.3 User s Manual User s Manual Third Edition, March 2011 www.moxa.com/product 2011 Moxa Inc. All rights reserved. User s Manual The software described in this manual is furnished under a license agreement and may be used

More information

Email Data Protection. Administrator Guide

Email Data Protection. Administrator Guide Email Data Protection Administrator Guide Email Data Protection Administrator Guide Documentation version: 1.0 Legal Notice Legal Notice Copyright 2015 Symantec Corporation. All rights reserved. Symantec,

More information

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper

More information

Authoring for System Center 2012 Operations Manager

Authoring for System Center 2012 Operations Manager Authoring for System Center 2012 Operations Manager Microsoft Corporation Published: November 1, 2013 Authors Byron Ricks Applies To System Center 2012 Operations Manager System Center 2012 Service Pack

More information

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip Load testing with WAPT: Quick Start Guide This document describes step by step how to create a simple typical test for a web application, execute it and interpret the results. A brief insight is provided

More information

v4.8 Getting Started Guide: Using SpatialWare with MapInfo Professional for Microsoft SQL Server

v4.8 Getting Started Guide: Using SpatialWare with MapInfo Professional for Microsoft SQL Server v4.8 Getting Started Guide: Using SpatialWare with MapInfo Professional for Microsoft SQL Server Information in this document is subject to change without notice and does not represent a commitment on

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

The Real Challenges of Configuration Management

The Real Challenges of Configuration Management The Real Challenges of Configuration Management McCabe & Associates Table of Contents The Real Challenges of CM 3 Introduction 3 Parallel Development 3 Maintaining Multiple Releases 3 Rapid Development

More information