Contemporary approaches to protein structure classification

Mark B. Swindells,1* Christine A. Orengo,2 David T. Jones,3 E. Gail Hutchinson,2 and Janet M. Thornton2,4

1Helix Research Institute, Kisarazu, Japan. 2Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, United Kingdom. 3Department of Biological Sciences, Warwick University, Warwick, United Kingdom. 4Department of Crystallography, Birkbeck College, London, United Kingdom.
*Correspondence to: Mark B. Swindells, Inpharmatica Ltd., 60 Charlotte Street, London W1P 2AX, UK.

Summary
Three-dimensional protein structures can be compared in much the same way that sequences are compared in database searches. Such methods can be extremely useful because a structural similarity may represent a distant evolutionary relationship that is undetectable by sequence analysis. In this review, we summarise the most popular structure comparison methods, show how they can be used for database searching, and then describe some of the most advanced attempts to develop comprehensive protein structure classifications. With such data, it is possible to identify distant evolutionary relationships, provide libraries of unique folds for structure prediction, estimate the total number of folds that exist, and investigate the preference for certain types of structures over others. BioEssays 20:884–891, 1998. © 1998 John Wiley & Sons, Inc.

Introduction
Like all public bioinformatics resources, the main repository for experimentally determined protein structures is growing at an enormous rate. This repository is the Protein Data Bank (1), better known as the PDB. There are currently over 7,000 structures in the PDB, and most of them were determined by x-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. A small number of structures were determined by neutron diffraction or electron microscopy, and there are even some hypothetical models. New coordinates are currently being deposited at a rate of 150 per month. A decade ago, there were only about 600 structures in total. To make these data more accessible, the PDB has recently created a relational database that can be interrogated over the web. Using this database, one can select proteins by key words (such as Perutz or hemoglobin) or by experimental criteria (such as resolution and R-factor in the case of crystal structures). One can also run a sequence search against all PDB entries.

However, it is now known that many distant relationships can be identified only through structure comparison, and this is the subject of this review. Structure-based searches can be run in the same way as sequence searches, and they can find distant homologues that sequence searches miss. For instance, a search with a globin sequence is not sensitive enough to detect all the globins in the PDB. A structure-based search, by contrast, would identify all the globins and also proteins with similar structures. These include phycocyanin, which may share a distant common ancestor with the globins, (2) and colicin, whose evolutionary relationships are less certain. (3) Another example, shown in Figure 1, reveals a clear similarity between several DNA-binding proteins that could not be identified on the basis of sequence alone. (4) This type of structure has so far been seen only in DNA-binding proteins, and in each case the same helix is used to bind the DNA. In the replication terminator protein (5) and histone H5, (6) the residues important for interacting with the DNA also appear to be conserved. As many such cases have now been identified, it is desirable to describe all relationships between known structures in a systematic and ultimately quantitative manner.

TABLE 1. Databases and Programs with Internet Links
Databases and programs: Protein Data Bank; CATH and CATHserver; SCOP; FSSP and DALI; MMDB and VAST; 3Dee; Homstrad and ddbase; 3D-ali; Cosec; ssap; sarf.

Figure 1. (a) Replication terminator protein, a homodimeric structure (32) whose monomeric chains have structural similarities to a large number of winged helix DNA-binding proteins. (33) The coloured regions from (a) are shown in detail in (b), and (c) emphasises the structural similarity to histone H5. (34)

Such data would be useful for detecting evolutionary relationships. They would also be useful for building libraries of unique structures for threading (7) and other fold recognition (8-10) techniques, which are methods of structure prediction. These algorithms work in a manner analogous to the comparison of two sequences. They thread a sequence through the coordinates of a library structure and apply a statistically derived energy function to find the best alignment of the sequence on that structure. The procedure is repeated for each library structure, in the hope that the threading with the lowest energy will identify both the correct structure and the correct alignment. In practice, the correct structure is found more often than the correct alignment. Describing such prediction methods in more detail would be a review in itself. (11,12) The point to appreciate here is that careful organisation of the structures matters for two things: deriving the potential function and building the library of structures to be searched.

Comparing protein structures
Of course, the discussion above gives no indication of how such structural similarities might be identified. The simplest way is to compare structures by eye. This can be a very successful approach, (13-15) particularly when the coordinates of a structure are not available. However, few people are able to do this reliably.
As a result, algorithms that automatically identify structural relationships have become an active research topic in recent years. Inevitably, these algorithms are more complex and computationally intensive than those used for sequence comparison, because they must find similarities between three-dimensional objects. In principle, one could condense three-dimensional information into a linear string, (16,17) but the gain in speed is more than offset by the loss of discriminatory power. When the coordinates of both structures are available, a number of approaches can be used to compare them. These include least-squares superposition, (18-20) double dynamic programming, (21-23) simulated annealing, (24) graph theory, (25) distance matrix comparison, (26) and geometric hashing. (27) Some of these are discussed further below.

Finding similarities to a newly determined structure
In a manner analogous to sequence database searching, one can take a probe structure and compare it with every structure in the PDB. If the structure of interest is not yet in the PDB, the easiest route is the DALI server or CATHserver (Table 1). Both automatically compare the query structure with those already in the database. Some users may not want to send their coordinates over the web. If they have a local mirror of the PDB, they can download a structure alignment program (Table 1) and run the search locally instead.

Comprehensive comparisons of PDB structures
In theory at least, if we can find similarities between pairs of structures, we should be able to describe all similarities in the PDB just by running a program many times. In practice, there are problems with this. Most structure alignment procedures are CPU-intensive, and there is significant redundancy in the PDB. As a result, a huge amount of computing time can be wasted on unnecessary comparisons. For instance, there are over 200 lysozyme entries with more than 95% sequence identity. These relationships can be identified from sequence alone, so it is pointless to compare all of them structurally. Anyone generating a comprehensive classification should therefore first identify the obvious relationships by sequence alignment. A single representative from each resulting cluster can then be passed to the second, more time-consuming phase of structure comparison.

Preliminary attempts at providing comprehensive classifications (28,29) adopted such a two-step procedure. First, all-by-all sequence alignments were calculated. The sequences were then clustered into families based on the alignment score and the overlap between each pair. A representative was selected from each cluster, and a similar procedure was performed using structure comparison instead of sequence comparison. These lists proved popular but still had two problems. First, no attempt was made to split multidomain structures into their constituent domains. Second, new data could not be publicised without writing another paper.

More recent approaches address the first problem directly by splitting all of the representatives into domains before the structure comparison step. Five groups have independently published procedures for splitting proteins into their constituent domains, (30-34) and these can be used to process the representatives before structure comparison. In structural terms, a domain can be thought of as a globular structure that normally contains a hydrophobic core and would be expected to be stable. However, the definition can also cover small proteins held together by disulphide bridges (such as epidermal growth factor). Web technology has helped with the second problem, because data can now be distributed over the internet and updated at regular intervals.
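The two-step procedure just described can be illustrated with a minimal Python sketch. The sketch makes several assumptions. Pairwise percentage sequence identities are taken as already computed (the `identity` mapping is a hypothetical input). Chains are clustered by single linkage at a chosen threshold, and one representative is kept per cluster. The representative is simply the alphabetically first identifier, which is an arbitrary choice. Only then is a structural comparison run between representatives. For that structural step the sketch substitutes a deliberately simple measure: the RMS difference between intramolecular distance matrices (dRMSD). It also assumes the residue equivalences are already known. Real programs such as ssap or DALI must find the alignment themselves.

```python
import math
from itertools import combinations


def cluster_by_identity(ids, identity, threshold):
    """Step 1: single-linkage clustering on precomputed percent identities.
    `identity` maps (id_a, id_b) -> percent identity; pairs absent from
    the mapping are treated as unrelated (identity 0)."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        pct = identity.get((a, b), identity.get((b, a), 0.0))
        if pct >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


def drmsd(coords_a, coords_b):
    """Step 2 stand-in: RMS difference between intramolecular distance
    matrices. Assumes residue k of A is already equivalent to residue k of B."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length chains of at least 2 residues")
    n = len(coords_a)
    squared = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.sqrt(sum(squared) / len(squared))


def two_step(ids, identity, coords, threshold=95.0):
    """Cluster on sequence first, then compare only one representative per
    cluster structurally (hypothetical choice: the alphabetically first ID)."""
    clusters = cluster_by_identity(ids, identity, threshold)
    reps = [members[0] for members in clusters]
    scores = {(a, b): drmsd(coords[a], coords[b]) for a, b in combinations(reps, 2)}
    return clusters, scores
```

With a 95% threshold, two near-identical lysozyme chains fall into a single cluster. Only one of them is then passed to the comparatively expensive structural step.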
A database of all-by-all comparisons can also be used, though it need not be, to build a classification of the structural similarities. This is one of the main differences between the most popular databases. Some groups use all-by-all comparisons to find interesting new relationships and are less concerned with constructing a formal classification. For others, classification is an important goal of the project. In addition, one of the most widely used classifications, SCOP, is made by hand. It therefore lacks the kind of comparison database that the computationally based methods have, although it could quite validly be viewed as a database of binary relationships. Each approach is equally valid and reflects the different ideas each group has on tackling this important area. In the next part of this review, we concentrate on the work of four groups:
- SCOP, (35) a manually constructed hierarchical classification;
- FSSP (36) and MMDB, (37) which are built in a totally automated fashion;
- CATH, (38) the semiautomatic classification developed by ourselves.

In our opinion, these four are currently the most comprehensive. They include most PDB structures, and they combine these data with advanced methods of display. Readers interested in more detail should also look at two other resources. 3Dee (32) is a database of domains that has been clustered using two different hierarchical procedures. 3D-ali (39) is an occasionally updated version of the list originally provided by the same authors. (28) Web links for all data and programs are shown in Table 1. Finally, there is the Homologous Structure Alignment Database (Homstrad). Instead of trying to describe all protein structures, Homstrad concentrates on a smaller number of families (release 5b currently contains 130) but provides highly annotated structure-based sequence alignments (Table 1).
Homstrad also writes each sequence in a form that carries a considerable amount of structural information beyond the usual summary of secondary structure elements. This includes solvent accessibility and hydrogen-bonding groups, which are shown using a variety of typesetting conventions. (40)

SCOP
SCOP was the first classification of PDB structures to be made available on the web. (35) It classifies all proteins in the PDB, as well as additional structures that have not been deposited. SCOP can include undeposited structures because, as mentioned earlier, it is an entirely manual classification. SCOP takes structure, function, and to some extent cellular location into account at various levels of the hierarchy. The top level is Class, which currently holds 10 classes:
- all-α, all-β, α/β, and α+β, following the classification of Levitt and Chothia; (41)
- multidomain, containing structures that have not been split into domains;
- five further classes that deal with membrane-associated proteins and peptides.

Below Class come two levels: Fold and Superfamily. Fold groups together similar proteins purely on structural criteria. Superfamily groups proteins that share both a similar structure and a similar function. Proteins in the same superfamily are therefore expected to be evolutionarily related, even though their sequences may be quite different. Below the Superfamily level, all clusters correspond to relationships that conventional sequence alignment could identify. SCOP has two clear advantages. First, the superfamily level has been classified extremely carefully, drawing on the authors' detailed knowledge of both structures and biological processes. Second, the web site is user-friendly, so it is very easy for a researcher to browse the classification.
A potential disadvantage is that SCOP has no comparison algorithm. As a result, users cannot compare a new structure with the classification, and structure-based sequence alignments of classified families cannot be generated automatically.

TABLE 2. Glossary of Terms
Class: Overall description of a protein in terms of its regular secondary structure content (α, β, or αβ).
Architecture: Gross shape that results from packing regular secondary structure elements. The names used often describe this shape in terms of familiar objects, such as barrels and trefoils. Neither the polarity of the secondary structural elements nor the order in which they are joined is considered.
Topology: Considers the architecture together with the polarity of each secondary structural element and the order in which the elements occur in the sequence.
Homologous superfamily: Essentially the same as topology. It additionally requires a higher degree of structural similarity together with a functional similarity that suggests a homologous relationship.
Fold: In this paper, we use this word interchangeably with topology. Its definition can vary between laboratories.
Superfold: A topology that contains more than one Homologous superfamily.
Structural similarity: General phrase often used to describe how well two structures match after least-squares superposition; hence "high structural similarity", etc.

FSSP
FSSP is the database that results from using the DALI program to compare all PDB structures. It is a completely automatic approach that relies on a set of algorithms (including DALI). These algorithms first split the data into domains (30,42) and then use distance matrix comparison to compare all domain-level structures that have no significant sequence similarity. Each structural similarity, as determined by a system-defined score, is stored in the database. Through this automated procedure, the authors have detected several interesting relationships. (3,43,44) FSSP data can be accessed over the web in two ways. One is to search for a PDB structure of interest. The other is to submit the coordinates of a new structure and ask DALI to run a custom search against the FSSP database. The latter approach is very popular with groups involved in protein structure determination. It lets them find out whether a new structure has any interesting similarities as soon as it has been solved. Many of the structure reports published these days mention a distant evolutionary relationship detected by a DALI search against the FSSP data. In many ways, FSSP and SCOP are highly complementary: a similarity first detected by a DALI search can provide a route into the SCOP classification. The FSSP database is neither a hierarchical classification nor designed to be viewed by eye. Even so, similar structures are clustered into a tree of folds, which makes it easy to analyse a particular family. The major advantage of FSSP, however, is the additional availability of structure-based sequence alignments, which each researcher can tailor to their own preferences. FSSP also contains other interesting features, such as a file of folds that are so far unique in the database.

MMDB
This database is created at the National Center for Biotechnology Information (NCBI) using a vector alignment search tool (VAST). The aim of MMDB is to provide a structural link with the sequence and literature databases maintained by the NCBI. This creates the exciting possibility of starting with a protein of interest and exploring increasingly distant relationships, while always being able to check the literature as the search progresses.
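The distance matrix comparison that underlies FSSP can be made more concrete with a minimal sketch of the elastic similarity score that Holm and Sander described for DALI. (26) For a given residue equivalence, every pair of equivalenced residues (i, j) contributes (θ − |dA − dB|/d*)·exp(−(d*/α)²). Here dA and dB are the i–j distances in the two structures, and d* is their mean. Diagonal terms contribute θ. The constants θ = 0.20 and α = 20 Å are the values usually quoted for DALI and are treated as assumptions here. The sketch only scores an alignment it is given. It does not attempt the search for the alignment that maximises the score, nor any normalisation of the raw score.

```python
import math

THETA = 0.20  # elastic similarity threshold; assumed DALI-style constant
ALPHA = 20.0  # envelope width in angstroms; assumed DALI-style constant


def elastic_score(coords_a, coords_b, theta=THETA, alpha=ALPHA):
    """Elastic similarity score for a given residue equivalence: coords_a[k]
    is assumed equivalent to coords_b[k]. Sums over all ordered pairs (i, j)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("equivalenced residue lists must have equal length")
    n = len(coords_a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                total += theta  # diagonal terms contribute the constant theta
                continue
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            d_mean = (d_a + d_b) / 2.0
            if d_mean == 0.0:
                total += theta  # coincident atoms in both: treat as exact match
                continue
            weight = math.exp(-((d_mean / alpha) ** 2))  # down-weights long distances
            total += (theta - abs(d_a - d_b) / d_mean) * weight
    return total
```

The score uses only intramolecular distances, so it needs no superposition and is unchanged by rotating either structure. The exponential weight stops long, poorly conserved distances from dominating the sum.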
Data for all structural similarities are accessed in the same way as other areas of Entrez. The user can investigate a set of nearest neighbours whose similarity to the query steadily decreases. To build MMDB, domains are first identified automatically on the basis of compactness. (45) Similarities between domains are then found by comparing sets of secondary structure vectors, and a statistical scoring scheme assesses the significance of each hit. Requesting structural neighbours with the NCBI tools therefore amounts to asking for the structures that score highest against the target. No additional assessment of functional similarity is made, however, so the onus is on the user to judge the relevance of each hit.

CATH
Our own approach to structure classification has developed from two earlier pieces of work. (29,46) We wanted to automate the system as far as possible, but we also wanted to distinguish homologous proteins from those that merely share a common fold. We therefore expected that some manual intervention would be needed. We also wanted a classification that could be browsed like a book as well as searched with a structural probe. The result is a classification called CATH, whose construction has been explained in detail elsewhere. (22) CATH stands for Class, Architecture, Topology, and Homologous superfamily (see the glossary in Table 2). It describes four levels of increasingly detailed structural similarity for each protein domain in the PDB. Proteins that share the same CATH numbers therefore have globally similar structures, even though their sequences may be quite different. Figure 2 shows a simple example of these four CATH levels.

Figure 2. Simple example of the four CATH levels. For the αβ Class, the two-layer sandwich architecture is shown, in which one layer is formed by a β-sheet and the other by the α-helices. Two distinct topologies are given for this architecture, and the difference is emphasised by colouring the ribbons from blue to red. For the first topology, two separate Homologous superfamilies are shown, represented in this case by acylphosphatase and a domain of aspartate transcarbamoylase.

To deal with redundancy in the PDB, CATH also has hierarchical divisions below the H level that cluster domains by sequence similarity:
- Sequence (identity ≥35%);
- Nearly identical (≥95%);
- Identical (100%).

Version 1.4 of CATH contains nearly 13,000 domains, and the number of entries at each CATH level is summarised in Figure 3.

Figure 3. CATH pyramid of numbers, showing how the structural entries cluster from 12,899 domain entries to three main classes.

The S level (≥35% identity) shows how much redundancy the PDB contains: at this level, the data reduce to only 1,316 distinct families. We introduce structure comparison to cluster these S levels. The following section shows how the 1,316 Sequence families are distributed in the CATH hierarchy.
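The four CATH levels are nested, so comparing two domains reduces to walking along their numbers until the first mismatch. The sketch below assumes that each domain's classification is written as a dotted code of four integers (C.A.T.H). The code strings and function names are illustrative and are not part of any CATH distribution. A second function maps a pairwise sequence identity onto the S, N, and I sub-levels, using the thresholds given above.

```python
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath(code):
    """Split a dotted C.A.T.H code such as '3.30.1.2' into four integers."""
    fields = tuple(int(part) for part in code.split("."))
    if len(fields) != 4:
        raise ValueError(f"expected four dotted fields, got {code!r}")
    return fields


def deepest_shared_level(code_a, code_b):
    """Return the most detailed CATH level shared by two domains, or None.
    Levels are nested, so the comparison stops at the first mismatch."""
    shared = None
    for level, x, y in zip(LEVELS, parse_cath(code_a), parse_cath(code_b)):
        if x != y:
            break
        shared = level
    return shared


def sequence_sublevel(percent_identity):
    """Map a pairwise identity to the deepest sub-H clustering level it meets:
    Sequence (>= 35%), Nearly identical (>= 95%), Identical (100%)."""
    if percent_identity >= 100.0:
        return "Identical"
    if percent_identity >= 95.0:
        return "Nearly identical"
    if percent_identity >= 35.0:
        return "Sequence"
    return None
```

For example, two hypothetical domains coded 3.30.1.1 and 3.30.1.2 share a topology but not a Homologous superfamily. Under the definitions in Table 2, they have the same fold but no accepted evidence of homology.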
Class and Architecture
There are only three main classes in CATH: all-α, αβ, and all-β. There is also a miscellaneous category for small protein chains with no discernible secondary structure. The number of architectures (Table 2) varies considerably between classes. Currently, there are three in the α class, 10 in the αβ class, and 18 in the β class (Fig. 3). The difference arises because helices can pack against each other at a wide range of angles, (47) despite a known preference for packing angles of 35° and 25°. (48,49) In combination, these angles produce complex structures that are difficult to divide intuitively. Architectures formed by β-strands, in contrast, have more distinct shapes, such as barrels and sheets, because hydrogen bonds between the strands limit their orientation.

Topology and Homologous superfamily
Both of these levels (Table 2) require structures to have a high degree of structural similarity, as measured by the ssap score. The main difference is that a Homologous superfamily also requires evidence that the proteins are related. We use a variety of criteria for this. Evidence of similar function (e.g., proteinase activity, DNA binding) is clearly the most useful. Where such data are lacking, we can also use what we know about how many distinct functions are associated with each fold. As mentioned earlier (Fig. 1), some topologies appear to be associated exclusively with a particular function. Distinguishing homologues from analogues is extremely difficult. Despite detailed research into methods for separating them automatically, (50,51) no method is yet reliable enough to be applied routinely.

Figure 4. CATHerine wheel, showing how the 757 H-level families distribute among the C, A, and T levels. Protein Class is shown in red, green, and yellow for α, β, and αβ, respectively. Within each class, the angle subtended reflects the number of H-level families that belong to each architecture (inner circle) and topology (outer circle). Twelve superfolds, each having at least five Homologous superfamilies, are indicated in paler colours. The structures of selected superfolds are shown around the circumference. Labels in the figure include DNA binding, Doubly wound, Up-down, αβ plaits, TIM barrel, Complex, 3-layer sandwich, 2-layer sandwich, Roll, Barrel, Non-bundle, Sandwich, Bundle, Ribbon, UB rolls, Jelly roll, Single sheet, OB folds, and IG-like.

Some folds occur more than others
Our 1,316 S-level clusters occupy 757 Homologous superfamilies and 527 Topologies (Fig. 3). The outer ring of the CATHerine wheel (Fig. 4) shows two distinct situations: a T-level contains either a single H-level or more than one. Some years ago, we defined the term superfold to describe the observation that certain topologies contain noticeably more Homologous superfamilies than others.
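The superfold idea rests on a simple count: the number of distinct Homologous superfamilies in each topology. The minimal sketch below counts them and flags topologies above a threshold. It assumes that domains are supplied as dotted C.A.T.H codes, which is an illustrative input format and not an official CATH file format. The default threshold of three matches the criterion the authors give in the next paragraph.

```python
from collections import defaultdict


def h_levels_per_topology(codes):
    """Count distinct Homologous superfamilies (H) under each topology (C.A.T),
    given dotted C.A.T.H codes; repeated codes are counted once."""
    per_topology = defaultdict(set)
    for code in codes:
        c, a, t, h = code.split(".")
        per_topology[f"{c}.{a}.{t}"].add(h)
    return {topology: len(hs) for topology, hs in per_topology.items()}


def superfolds(codes, min_h=3):
    """Return the topologies holding at least `min_h` Homologous superfamilies,
    together with the fraction of all topologies they represent."""
    counts = h_levels_per_topology(codes)
    hits = sorted(t for t, n in counts.items() if n >= min_h)
    return hits, len(hits) / len(counts)
```

Applied to the counts quoted below, 23 superfolds among 527 topologies give a fraction of about 0.044, which the text rounds to 4%.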
We required a superfold to have at least three H-levels. At that time, 10 of the 131 topologies identified (7%) met this criterion. The database is now an order of magnitude larger, and the trend has strengthened: only 4% of topologies (23 of 527) have three or more H-levels (Fig. 5). More significantly, the data are highly skewed even within this small set of topologies. The doubly wound topology, for example, has as many as 52 Homologous superfamilies (Fig. 5). This suggests that a small number of popular topologies are occupied by many proteins with no apparent evolutionary relationship. These topologies are presumably the ones with simple folding pathways.

Figure 5. Superfold graph showing the current data for topologies with more than one superfamily.

More recently, Brenner et al. (52) have examined highly populated families in the SCOP database and called them frequently occurring domains (FODs). A FOD requires the presence of two SCOP superfamilies. A recent review described 42 FODs in SCOP, representing about 12% of the classified folds.

1,000 or so folds
Chothia was the first to estimate the number of folds that may exist in protein structure space. (53) His popular, though not necessarily correct, estimate is 1,000. It was calculated from three quantities: the number of known folds, the number of distinct sequence families in databases such as SWISS-PROT, and the fraction of all sequences currently available. Recently, the same group has used a simpler method. It compares the ratio of novel to previously observed folds in a given year with the total number of folds known in the preceding year. (52) The result varies somewhat from year to year but always has the same order of magnitude. Some time ago, we provided a larger estimate of around 6,000. (29) This was based on a slightly different calculation that tried to correct for the redundancy inherent in all databases and to compensate for the presence of superfolds. However, the calculation overestimates the number of folds for two reasons. Long sequences cannot be clustered on the basis of their constituent domains, and it is not possible to predict how many sequence families will ultimately belong to a superfold. The correct number will probably lie somewhere between the two estimates. In all structure classification work, however, perhaps the biggest obstacle to counting folds is the problem of how to define a fold in the first place.

What is a fold?
Take, for example, the all- proteins in Figure 6. The two structures at either extreme are different enough that most current definitions would assign them to separate folds. Yet many proteins show intermediate degrees of similarity.
Given the known variation among truly homologous proteins, one could easily see neighbouring structures as belonging to the same fold. When links like these are allowed to form, large numbers of structures end up clustered together. We call this the Russian doll effect. The effect can occur between proteins of any size, but it predominates in small domains. Small domains are more likely than large structures to have most of their helices and strands superimposed by chance. We currently have no automated way of dealing with these problems. When they arise, we tighten the requirements for a common fold by limiting the number of nonequivalenced helices and strands that are allowed. This clearly has a knock-on effect on the contents of superfolds, because different criteria for defining topologies change the number of Homologous superfamilies that a superfold contains.

Figure 6. Russian doll effect as illustrated by a selection of all- structures.

Conclusion
Protein structure classification is still under development, but the results described here show how much progress has been made in the past few years. We can now classify most proteins into a manageable number of families. This is a significant step forward for the analysis of protein structure itself. It also helps prediction programs such as threading and the new field of genome analysis. Our knowledge of evolution, and of the way it adapts gene products to new functions, is the main reason that structure comparison and classification are possible. The ultimate goal is to relate structure to function, and current classifications offer one route towards it. As the latter part of this review shows, however, proteins do not always evolve with researchers' interests in mind. They create discrepancies that often make divergent and convergent evolutionary paths difficult to distinguish. Structure comparison and classification help us to appreciate both the variety of folds available to globular proteins and the limits imposed by the need to form a compact structure. The exponential rise in determined structures, together with the sequencing of complete genomes, is likely to deepen our understanding. Some old problems will be solved, and other, currently unanticipated problems will take their place.

REFERENCES
1. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, and Tasumi M (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 122.
2. Pastore A, and Lesk AM (1990) Comparison of globins and phycocyanins: evidence for evolutionary relationship. Proteins 8.
3. Holm L, and Sander C (1993) Globin fold in a bacterial toxin. Nature 361.
4. Swindells MB (1995) Identification of a common fold in the replication terminator protein suggests a possible mode for DNA binding. Trends Biochem. Sci. 20.
5. Bussiere DE, Bastia D, and White SW (1995) Crystal structure of the replication terminator protein. Cell 80.
6. Ramakrishnan V, and Finch JT (1993) Crystal structure of globular domain of histone H5 and its implications for nucleosome binding. Nature 362.
7. Jones DT, Taylor WR, and Thornton JM (1992) A new approach to fold recognition. Nature 358.
8. Bowie JU, Lüthy R, and Eisenberg D (1991) A method to identify protein sequences that fold into a known three-dimensional structure. Science 253.
9. Nishikawa K, and Matsuo Y (1993) Development of pseudoenergy potentials for assessing protein 3D-1D compatibility and detecting weak homologies. Protein Eng. 6.
10. Flockner H, Braxenthaler M, Lackner P, Jartz M, Ortner M, and Sippl MJ (1995) Progress in fold recognition. Proteins 23.
11. Fischer D, Rice D, Bowie JU, and Eisenberg D (1995) Assigning amino acid sequences to 3-dimensional protein folds. FASEB J. 10.
12. Sippl MJ, and Flockner H (1996) Threading thrills and threats. Structure 4.
13. Murzin AG, and Chothia C (1992) Protein architecture: new superfamilies. Curr. Opin. Struct. Biol. 2.
14. Swindells MB (1992) Structural similarity between transforming growth factor-β2 and nerve growth factor. Science 258.
15. Murzin AG (1996) Structural classification of proteins: new superfamilies. Curr. Opin. Struct. Biol. 6.
16. Karpen ME, de Haseth PL, and Neet KE (1989) Comparing short protein substructures by a method based on backbone torsion angles. Proteins 6.
17. Matsuo Y, and Kanehisa M (1993) An approach to systematic detection of protein structural motifs. Comput. Appl. Biosci. 9.
18. Rossmann MG, and Argos P (1976) Exploring structural homology of proteins. J. Mol. Biol. 105.
19. Vriend G, and Sander C (1991) Detection of three-dimensional substructures in proteins. Proteins 11.
20. Alexandrov NN, Takahashi K, and Go N (1992) Common spatial arrangements of backbone fragments in homologous and nonhomologous proteins. J. Mol. Biol. 225.
21. Taylor WR, and Orengo C (1989) Protein structure alignment. J. Mol. Biol. 208.
22. Orengo CA, Brown NP, and Taylor WR (1992) Fast structure alignment for protein databank searching. Proteins 14.
23. Orengo CA, and Taylor WR (1996) SSAP: sequential structure alignment program for protein structure comparison. Methods Enzymol. 266.
24. Sali A, and Blundell TL (1990) The definition of general topological equivalence in protein structures: a procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J. Mol. Biol. 212.
25. Mitchell EM, Artymiuk PJ, Rice DW, and Willett DW (1989) Use of techniques derived from graph theory to compare secondary structure motifs in proteins. J. Mol. Biol. 212.
26. Holm L, and Sander C (1993) Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233.
27. Nussinov R, and Woolfson HJ (1989) Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques. Proc. Natl. Acad. Sci. USA 88.
28. Pascarella S, and Argos P (1992) A data bank merging related protein structures and sequences. Protein Eng. 2.
29. Orengo CA, Flores TP, Taylor WR, and Thornton JM (1993) Identification and classification of protein fold families. Protein Eng. 6.
30. Holm L, and Sander C (1994) Parser for protein folding units. Proteins 19.
31. Swindells MB (1995) A procedure for detecting structural domains in proteins. Protein Sci. 4.
32. Siddiqui AS, and Barton GJ (1995) Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions. Protein Sci. 4.
33. Sowdahamini R, Rufino SD, and Blundell TL (1996) A database of globular protein structural domains: clustering of representative family members into similar folds. Fold. Des. 1.
34. Islam SA, Luo J, and Sternberg MJE (1995) Identification and analysis of domains in proteins. Protein Eng. 8.
35. Murzin AG, Brenner SE, Hubbard T, and Chothia T (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247.
36. Holm L, and Sander C (1996) The FSSP database of structurally aligned protein fold families. Nucleic Acids Res. 24.
37. Gibrat JF, Madej T, and Bryant SH (1996) Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6.
38. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, and Thornton JM (1997) CATH: a hierarchic classification of protein domain structures. Structure 5.
39. Pascarella S, Milpetz F, and Argos P (1996) A databank (3D-ali) collecting related protein sequences and structures. Protein Eng. 9.
40. Overington J, Johnson MS, Sali A, and Blundell TL (1990) Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc. R. Soc. Lond. B 241.
41. Levitt M, and Chothia C (1976) Structural patterns in globular proteins. Nature 261.
42. Holm L, and Sander C (1996) Mapping the protein universe. Science 273.
43. Holm L, and Sander C (1995) DNA polymerase β belongs to an ancient nucleotidyltransferase superfamily. Trends Biochem. Sci. 20.
44. Holm L, and Sander C (1997) Enzyme HIT. Trends Biochem. Sci. 22.
45. Madej T, Gibrat JF, and Bryant SH (1995) Threading a database of protein cores. Proteins 23.
46. Orengo CA, Jones DT, and Thornton JM (1994) Protein superfamilies and domain superfolds. Nature 372.
47. Bowie JU (1997) Helix packing angle preferences. Nat. Struct. Biol. 4.
48. Chothia C, Levitt M, and Richardson D (1981) Helix to helix packing in proteins. J. Mol. Biol. 145.
49. Reddy B, and Blundell T (1993) Packing of secondary structural elements in proteins: analysis and prediction of inter-helix distances. J. Mol. Biol. 233.
50. Russell RB, Saqi MA, Sayle RA, Bates PA, and Sternberg MJ (1997) Recognition of analogous and homologous protein folds: analysis of sequence structure conservation. J. Mol. Biol. 269.
51. Rost B (1997) Protein structures sustain evolutionary shift. Fold. Des. 2:S19–S.
52. Brenner SE, Chothia C, and Hubbard TJP (1997) Population statistics of protein structures: lessons from structural classifications. Curr. Opin. Struct. Biol. 7.
53. Chothia C (1992) One thousand families for the molecular biologist. Nature 357.


More information

Basic Concepts of DNA, Proteins, Genes and Genomes

Basic Concepts of DNA, Proteins, Genes and Genomes Basic Concepts of DNA, Proteins, Genes and Genomes Kun-Mao Chao 1,2,3 1 Graduate Institute of Biomedical Electronics and Bioinformatics 2 Department of Computer Science and Information Engineering 3 Graduate

More information

Polarization Dependence in X-ray Spectroscopy and Scattering. S P Collins et al Diamond Light Source UK

Polarization Dependence in X-ray Spectroscopy and Scattering. S P Collins et al Diamond Light Source UK Polarization Dependence in X-ray Spectroscopy and Scattering S P Collins et al Diamond Light Source UK Overview of talk 1. Experimental techniques at Diamond: why we care about x-ray polarization 2. How

More information

Bob Jesberg. Boston, MA April 3, 2014

Bob Jesberg. Boston, MA April 3, 2014 DNA, Replication and Transcription Bob Jesberg NSTA Conference Boston, MA April 3, 2014 1 Workshop Agenda Looking at DNA and Forensics The DNA, Replication i and Transcription i Set DNA Ladder The Double

More information

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) 820 REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) (See also General Regulations) BMS1 Admission to the Degree To be eligible for admission to the degree of Bachelor

More information

Searching Nucleotide Databases

Searching Nucleotide Databases Searching Nucleotide Databases 1 When we search a nucleic acid databases, Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from the forward strand and 3 reading frames

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips

Technology Update White Paper. High Speed RAID 6. Powered by Custom ASIC Parity Chips Technology Update White Paper High Speed RAID 6 Powered by Custom ASIC Parity Chips High Speed RAID 6 Powered by Custom ASIC Parity Chips Why High Speed RAID 6? Winchester Systems has developed High Speed

More information

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing James D. Jackson Philip J. Hatcher Department of Computer Science Kingsbury Hall University of New Hampshire Durham,

More information

Structure and Function of DNA

Structure and Function of DNA Structure and Function of DNA DNA and RNA Structure DNA and RNA are nucleic acids. They consist of chemical units called nucleotides. The nucleotides are joined by a sugar-phosphate backbone. The four

More information

Amazing DNA facts. Hands-on DNA: A Question of Taste Amazing facts and quiz questions

Amazing DNA facts. Hands-on DNA: A Question of Taste Amazing facts and quiz questions Amazing DNA facts These facts can form the basis of a quiz (for example, how many base pairs are there in the human genome?). Students should be familiar with most of this material, so the quiz could be

More information

Peptide bonds: resonance structure. Properties of proteins: Peptide bonds and side chains. Dihedral angles. Peptide bond. Protein physics, Lecture 5

Peptide bonds: resonance structure. Properties of proteins: Peptide bonds and side chains. Dihedral angles. Peptide bond. Protein physics, Lecture 5 Protein physics, Lecture 5 Peptide bonds: resonance structure Properties of proteins: Peptide bonds and side chains Proteins are linear polymers However, the peptide binds and side chains restrict conformational

More information

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011 Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.3 Selected Standards

More information

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006 Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm

More information

Poverty among ethnic groups

Poverty among ethnic groups Poverty among ethnic groups how and why does it differ? Peter Kenway and Guy Palmer, New Policy Institute www.jrf.org.uk Contents Introduction and summary 3 1 Poverty rates by ethnic group 9 1 In low income

More information