Protein annotation and modelling servers at University College London
|
|
|
- Brianna Price
- 10 years ago
- Views:
Transcription
1 Nucleic Acids Research Advance Access published May 27, 2010 Nucleic Acids Research, 2010, 1 6 doi: /nar/gkq427 Protein annotation and modelling servers at University College London D. W. A. Buchan*, S. M. Ward, A. E. Lobley, T. C. O. Nugent, K. Bryson and D. T. Jones* Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, UK Received February 5, 2010; Revised April 26, 2010; Accepted May 6, 2010 ABSTRACT The UCL Bioinformatics Group web portal offers several high quality protein structure prediction and function annotation algorithms including PSIPRED, pgenthreader, pdomthreader, MEMSAT, MetSite, DISOPRED2, DomPred and FFPred for the prediction of secondary structure, protein fold, protein structural domain, transmembrane helix topology, metal binding sites, regions of protein disorder, protein domain boundaries and protein function, respectively. We also now offer a fully automated 3D modelling pipeline: BioSerf, which performed well in CASP8 and uses a fragment-assembly approach which placed it in the top five servers in the de novo modelling category. The servers are available via the group web site at INTRODUCTION As the rate of deposition of new protein sequences outstrips the rate at which new protein structures are deposited in the public databases, there will be a continued need for accurate protein structure prediction. It is also arguable that the need for novel function annotation tools is even more pressing. The UCL Bioinformatics Group web portal aggregates a range of methods, developed and maintained at UCL, which predict key structural features of proteins from either their primary structure (sequence) or tertiary structure. We have recently revamped all of our web tools and have taken the opportunity to port all of our services to a new unified server architecture. The services are now presented with a user-friendly common look and feel and we have greatly increased our capacity to serve requests to the community. ALGORITHMS This section gives an overview of the algorithms and services available via the UCL Bioinformatics Group web server ( PSIPRED is a secondary structure prediction method (1) which uses a two-stage neural network to predict secondary structure using PSI-BLAST output (2). PSIPRED V3.0 currently offers the highest available three-state accuracy (Q 3 ) for secondary structure prediction of 81.4% (0.6%). This year, in addition to improvements to the underlying algorithm, we have also made some minor updates to the PSIPRED web output, where we now provide coloured alignment schematic diagrams alongside the original secondary structure cartoons. MEMSAT3 and MEMSAT-SVM Transmembrane topology prediction is provided by our MEMSAT methods. Our server now supports two algorithms: MEMSAT3 (3) and MEMSAT-SVM (4). Predictions made with MEMSAT3 and MEMSAT-SVM begin with the generation of a PSI-BLAST profile (PSSM). Both algorithms employ dynamic programming to enumerate all possible transmembrane topologies, but differ in the method used to score the different topologies. MEMSAT3 uses a neural network for scoring, and MEMSAT-SVM makes use of a support vector machine (SVM) along with additional target features such as signal peptides and re-entrant helices. Benchmarked with full cross-validation on a data set composed solely of sequences with crystal structures available to validate their topologies, MEMSAT3 and MEMSAT-SVM achieved 76 and 89% accuracy, respectively. Additionally, MEMSAT-SVM was able to predict both signal peptides (93% accuracy) and re-entrant helices (44% accuracy). When run via the portal, both MEMSAT3 and MEMSAT-SVM are run simultaneously and the results *To whom correspondence should be addressed. Tel: +44 (0) ; Fax: +44 (0) ; [email protected] Correspondence may also be addressed to David Jones. [email protected] ß The Author(s) Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
2 2 Nucleic Acids Research, 2010 Figure 1. The diagram produced by the MEMSAT-SVM algorithm available via the PSIPRED server. (a) A schematic diagram of the transmembrane regions is presented. Directly below this a trace of the Kyte Doolittle hydropathy plot (18) and below that four traces of the outputs for the four SVMs used to make the transmembrane region assignment are shown. (b) A cartoon of the transmembrane helix topology summarizing the linear coordinates for the helices and indicating where the protein s extra- and intercellular regions are. produced allow the user to compare both predictions. Results may be returned by and the web presentation displays both the schematic and cartoon diagrams of the predictions (Figure 1). DISOPRED2 allows the user to predict disordered regions of proteins (5). Disordered regions are known to play important roles in protein protein interactions, cell signalling, enzyme kinetics, signalling and a range of other processes. Disordered regions are characterized as those regions which do not have static structure and are believed to move continually through different configurations when the protein is in its functional configuration. DISOPRED2 generates a PSSM matrix and then analyses this with an appropriately trained SVM. DISOPRED accuracy has previously been shown to be 93.1%. Our updated results presentation on the web now features a simple schematic which clearly displays the location of the disordered residues.
3 Nucleic Acids Research, DomPred DomPred is a protein domain boundary prediction service which presents the results of two different domain boundary prediction methods, DPS (Domains Predicted from Sequence) and DomSSEA (6). DPS is a protein domain homology based method which uses PSI-BLAST to find sequential discrete regions of domain alignments along a query sequence and is capable of resolving domain boundaries even when no close structural relatives are known. DomSSEA is a domain recognition method which attempts to estimate the location of domain boundaries using distant homologous domains. This method requires that a given domain already be present in the domain databases. Our new server makes only minor cosmetic changes to the presentation of these results. pgenthreader and pdomthreader The latest implementations of the GenTHREADER algorithm combine profile profile alignments with secondary structure- based gap penalties, pair-wise and solvation-based potentials. A linear regression SVM is used to predict protein fold. Our new method, pdomthreader (7), is capable of discriminating protein domain superfamilies within a query sequence. The performance of pdomthreader exceeds that of most related methods. We have greatly improved the web-based presentation of the results. Results now feature external links to other structural resources including PDBSum (8), SCOP (9) and CATH (10). The sequence alignments are now presented in a fully interactive form using JalView (11); and we also now include links which allow users to explore functional and structural data derived from both the CATH and Gene3D databases (12). The integration with other resources gives the user a richer experience, enabling researchers to move easily from predicted fold through to putative function. In the future we hope to integrate a much wider range of structural and functional annotations into the results. BioSerf BioSerf is our new fully automated de novo and homology modelling pipeline available via the World Wide Web or as an XML-driven web service (manuscript in preparation). BioSerf uses a selection of algorithms including PSI-BLAST, PSIPRED and pgenthreader to attempt to intelligently select appropriate template structures for homology modelling. Where a suitable template or templates can be found, BioSerf uses MODELLER (13) to build a model. Where no suitable template structure or insufficient coverage can be found, FRAGFOLD (14) is used to create an ab initio template (Figure 2). As this service uses MODELLER, a valid MODELLER licence key is required to submit sequences to the service. Users receive a PDB file as the primary result along with the intermediate results used to generate the model. These intermediate results and the model itself can be inspected via a web page. Figure 2. The process flow when analysing a sequence using the BioSerf pipeline is shown. An incoming sequence is initially analysed by both PSI-BLAST, PSIPRED and pgenthreader. The PSI- BLAST run allows the pipeline to identify whether the query sequences closely matches the sequence of a known structure in the PDB. PSIPRED and pgenthreader attempt to identify any remote structural homologues of the sequence should the PSI-BLAST run fail to find any related sequences. These data are then combined and potential template structures for modelling are selected given suitable cut-offs. Where there is at least one suitable template, the pipeline then uses MODELLER to generate a homology model. Where there are no suitable template structures, FRAGFOLD is used to build a possible model structure. A typical run of the pipeline that finds a suitable template structure takes around 10 min. In those cases where no template can be found and FRAGFOLD is used, runs can take up to 3 h. MetSite MetSite (15) is a fully automatic method for predicting clusters of metal-binding residues. It is highly sensitive and works well even on protein models of moderate resolution. Sequence profile information derived from PSI-BLAST is analysed using several neural network classifiers, one classifier for each metal atom type. These classifiers are capable of distinguishing metal binding sites from non-metal sites at a mean accuracy of 94.5%. Users receive a PDB file as the main output with the residue relative-temperature feature used to annotate the prediction. Via the web site, users can explore the annotation in the Jmol applet provided. FFPred The FFPred server (16) offers a method for predicting protein function using a machine learning approach. FFPred attempts to assign reliable Gene Ontology (GO) classes (17) to queries using a series of SVM classifiers. Query sequences are annotated with a large set of possible features including such data as amino acid content, transmembrane regions and disordered regions. Every GO class is represented by five SVM classifiers and the sequence is scored for each class. Final GO class assignment is determined by majority rule wherever at least three out of the five classifiers agree. The method achieves a specificity of >90% at sensitivities >30%. Results are
4 4 Nucleic Acids Research, 2010 (a) (e) (b) (c) (d) Figure 3. A typical series of analyses using our servers to annotate a protein of unknown function are shown. Putative human membrane protein A8MYE5 was selected for this analysis (a). It was initially submitted to a MEMSAT-SVM prediction to locate any transmembrane helices (b). In the third step, (c), the sequence is submitted to DomPred to locate any possible domain boundaries. As these do not conflict with the transmembrane helix assignment, the sequence is divided into it s possible constituent domains and those domains that are not transmembrane are submitted to BioSerf for homology modelling (d). The larger domain has a long terminal region consisting of two beta sheets that are likely to be disordered rather than packed as in the prediction. To confirm this possibility, the larger domain is submitted the DISOPRED2 server. The DISOPRED2 prediction (e) confirms that the terminal stretch of the larger domain is likely to be disordered rather than packed onto the structure. then returned to the user who can also explore the results via a temporary dynamically created web page. Use case scenario In this section, we describe a typical use case scenario for our methods which uses a range of our new servers. Typically we envisage that researchers with a single or a small number of uncharacterized or partially characterized proteins will be interested in making one or more predictions of structural features using our methods. We would advise users wishing to perform high-throughput studies to download the software which we make available via our web site ( Structural annotation We selected a previously uncharacterized, putative membrane protein from UniProt, A8MYE5 (currently available via This is a 552 residue protein derived from human gene MERTK. The annotation via the protein domain databases available at UniProt suggests this may be a putative protein tyrosine kinase. Initially we submitted the sequence to our FFPred server. FFPred confirms this tentative functional annotation, identifying a range of GO terms the most significant of which are: GO: , GO: , GO: for metabolic process, protein kinase activity and protein amino acid phosphorylation, respectively. Structural annotation began by submitting the sequence for analysis with the MEMSAT-SVM service (Figure 3b). The MEMSATSVM prediction indicates the possible presence of two membrane spanning helices towards the C-terminus of the sequence, between positions 115 and 130, and 146 and 165. Presence of these helices could indicate that the putative description assigned by Uniprot is correct. However were this a receptor tyrosine kinase, we might expect there to be only a single helix such that a ligand binding domain might be presented to the extracellular space.
5 Nucleic Acids Research, Figure 4. The final, putative, annotation of Uniprot A8MYE5 sequence. The next analysis performed was to submit the sequence to the DomPred server (Figure 3c). The output shows two strong peaks around residue 150 and 234, with the domain boundary prediction made at residue 234 as the first peak likely corresponds to the transmembrane region we had already predicted. As such, this annotation appears to be in reasonable agreement with the initial MEMSAT-SVM annotation. Based on the MEMSAT-SVM and DomPred annotation, the sequence was then divided into three segments. The initial break point was around residue 234, as suggested by the DomPred analysis, gave two sub-sequences; an N-terminal sequence containing both the initial domain alongside the transmembrane region and a larger C-terminal region containing the second domain. The smaller N-terminal sequence was then divided again at residue 130 to cleave the putative initial domain from the predicted transmembrane region. Both the putative N-terminal domain and the larger C-terminal domain were then submitted to the BioSerf homology modelling server, (Figure 3d). This generated two good, compact homology models for each domain. The smaller N-terminal domain appears to be a simple two-layer a b sandwich but the larger domain appears to be more complex. The smaller domain has a large exposed beta sheet, this type of conformation is not common in extracellular domains and this lends some support to the MEMSAT-SVM annotation that predicts two transmembrane helices, which in turn places the N-terminal domain in the cytosolic region. The homology model of the larger C-terminal domain shows an initial region comprised of a well-resolved beta sheet with an attendant layer of alpha helices forming a two-layer a b sandwich. The other portion of the model appears to form a further discrete region that contains a clear four helix bundle. These two discrete compact regions may indicate that the larger C-terminal domain is in fact comprised of two smaller domains. The homology model for the C-terminal domain has a similar topology to PDB entry 1apm. Searching the CATH database for 1apm reveals that this structure has been divided into two domains by the CATH classification. Finally, following the model all the way to its C-terminus, the tail end of the sequence loops back along the entire length of the domain as a pair of extended b strands. These primarily make contacts only with loop regions. This rather unnatural formation seems likely to be an artefact of the modelling process. To further examine the terminal b strands of the larger domain, the whole C-terminal sequence was submitted to the DISOPRED2 server (Figure 3e). The DISOPRED2 prediction suggests that the final 35 residues of the C-terminal domain are likely to be disordered. This lends some credence to the proposition that the final two b strands of the protein are an artefact of the modelling process. This may instead be a highly mobile region which may contain features such as binding sites, signal sites or phosphorylation sites. Our complete annotation suggests a model for this protein containing, in sequence, an initial domain, a transmembrane region, one large or two putative domains and a trailing disordered region (Figure 4). Our prediction of a tyrosine kinase potentially made up of three domains, comprising two compact a b sandwiches and a compact all alpha domain, is in keeping with other tyrosine kinase structures already deposited at the PDB. DISCUSSION The UCL Bioinformatics Group server offers some of the best performing algorithms of their kind. They can be accessed via the web ( servers/) or most of the programs themselves can be downloaded for use on a local machine ( These services will be useful to any bioinformatician or biologist looking to functionally and structurally characterize proteins. FUNDING Funding for open access charge: Biotechnology and Biological Sciences Research Council, UK. Conflict of interest statement. None declared. REFERENCES 1. Jones,D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, Altschul,S., Madden,T., Schaffer,A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, Jones,D.T. (2007) Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics, 23, Nugent,T. and Jones,D.T. (2009) Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics, 10, Ward,J.J., McGuffin,L.J., Bryson,K., Buxton,B.F. and Jones,D.T. (2004) The DISOPRED server for the prediction of protein disorder. Bioinformatics, 20, Bryson,K., Cozzetto,D. and Jones,D.T. (2007) Computer-assisted protein domain boundary prediction using the DomPred server. Curr. Protein Pept. Sci., 8, Lobley,A., Sadowski,M.I. and Jones,D.T. (2009) pgenthreader and pdomthreader: new methods for improved protein fold recognition and superfamily discrimination. Bioinformatics, 25,
6 6 Nucleic Acids Research, Laskowski,R.A. (2009) PDBsum new things. Nucleic Acids Res., 37, D355 D Andreeva,A., Howorth,D., Chandonia,J.M., Brenner,S.E., Hubbard,T.J.P., Chothia,C. and Murzin,A.G. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res., 36, D419 D Greene,L.H., Lewis,T.E., Addou,S., Cuff,A., Dallman,T., Dibley,M., Redfern,O., Pearl,F., Nambudiry,R., Reid,A. et al. (2007) The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res., 35, D291 D Waterhouse,A.M., Procter,J.B., Martin,D.M.A., Clamp,M. and Barton,G.J. (2009) Jalview version 2 a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25, Yeats,C., Lees,J., Reid,A., Kellam,P., Martin,N., Liu,X. and Orengo,C. (2008) Gene3D: comprehensive structural and functional annotation of genomes. Nucleic Acids Res., 36, D414 D Eswar,N., Eramian,D., Webb,B., Shen,M.Y. and Sali,A. (2008) Protein structure modeling with MODELLER. Methods Mol. Biol., 426, Jones,D.T. and McGuffin,L.J. (2003) Assembling novel protein folds from super-secondary structural fragments. Proteins, 53(Suppl. 6), Sodhi,J.S., Bryson,K., McGuffin,L.J., Ward,J.J., Wernisch,L. and Jones,D.T. (2004) Predicting metal-binding site residues in low-resolution structural models. J. Mol. Biol., 342, Lobley,A.E., Nugent,T., Orengo,C.A. and Jones,D.T. (2008) FFPred: an integrated feature-based function prediction server for vertebrate proteomes. Nucleic Acids Res., 36, W207 W The Gene Ontology Consortium. (2000) Gene ontology: tool for the unification of biology. Nat. Genet., 25, Kyte,J. and Doolittle,R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 157,
Consensus alignment server for reliable comparative modeling with distant templates
W50 W54 Nucleic Acids Research, 2004, Vol. 32, Web Server issue DOI: 10.1093/nar/gkh456 Consensus alignment server for reliable comparative modeling with distant templates Jahnavi C. Prasad 1, Sandor Vajda
Linear Sequence Analysis. 3-D Structure Analysis
Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical properties Molecular weight (MW), isoelectric point (pi), amino acid content, hydropathy (hydrophilic
LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST
Nucleic Acids Research, 2005, Vol. 33, Web Server issue W105 W110 doi:10.1093/nar/gki359 LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST Dan
CSC 2427: Algorithms for Molecular Biology Spring 2006. Lecture 16 March 10
CSC 2427: Algorithms for Molecular Biology Spring 2006 Lecture 16 March 10 Lecturer: Michael Brudno Scribe: Jim Huang 16.1 Overview of proteins Proteins are long chains of amino acids (AA) which are produced
Bioinformatics for Biologists. Protein Structure
Bioinformatics for Biologists Comparative Protein Analysis: Part III. Protein Structure Prediction and Comparison Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research
Lecture 19: Proteins, Primary Struture
CPS260/BGT204.1 Algorithms in Computational Biology November 04, 2003 Lecture 19: Proteins, Primary Struture Lecturer: Pankaj K. Agarwal Scribe: Qiuhua Liu 19.1 The Building Blocks of Protein [1] Proteins
Guide for Bioinformatics Project Module 3
Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
Genome Explorer For Comparative Genome Analysis
Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:
Bioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
RNA Movies 2: sequential animation of RNA secondary structures
W330 W334 Nucleic Acids Research, 2007, Vol. 35, Web Server issue doi:10.1093/nar/gkm309 RNA Movies 2: sequential animation of RNA secondary structures Alexander Kaiser 1, Jan Krüger 2 and Dirk J. Evers
Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert Karakaş, Phoebe L. Stewart, and Jens Meiler
Structure 17 Supplemental Data EM-Fold: De Novo Folding of α-helical Proteins Guided by Intermediate-Resolution Electron Microscopy Density Maps Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert
SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD
White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper
Template-based protein structure modeling using the RaptorX web server
Template-based protein structure modeling using the RaptorX web server Morten Källberg 1 3, Haipeng Wang 1,3, Sheng Wang 1, Jian Peng 1, Zhiyong Wang 1, Hui Lu 2 & Jinbo Xu 1 1 Toyota Technological Institute
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet
Nucleic Acids Research, 2006, Vol. 34, Web Server issue W119 W123 doi:10.1093/nar/gkl199 Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet M. Tyagi 1,
Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: [email protected]
Pairwise Sequence Alignment
Pairwise Sequence Alignment [email protected] SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
T cell Epitope Prediction
Institute for Immunology and Informatics T cell Epitope Prediction EpiMatrix Eric Gustafson January 6, 2011 Overview Gathering raw data Popular sources Data Management Conservation Analysis Multiple Alignments
Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004
Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence
Vaxign Reverse Vaccinology Software Demo Introduction Zhuoshuang Allen Xiang, Yongqun Oliver He
Vaxign Reverse Vaccinology Software Demo Introduction Zhuoshuang Allen Xiang, Yongqun Oliver He Unit for Laboratory Animal Medicine Department of Microbiology and Immunology Center for Computational Medicine
BUDAPEST: Bioinformatics Utility for Data Analysis of Proteomics using ESTs
BUDAPEST: Bioinformatics Utility for Data Analysis of Proteomics using ESTs Richard J. Edwards 2008. Contents 1. Introduction... 2 1.1. Version...2 1.2. Using this Manual...2 1.3. Why use BUDAPEST?...2
The peptide bond is rigid and planar
Level Description Bonds Primary Sequence of amino acids in proteins Covalent (peptide bonds) Secondary Structural motifs in proteins: α- helix and β-sheet Hydrogen bonds (between NH and CO groups in backbone)
Chapter 3. Protein Structure and Function
Chapter 3 Protein Structure and Function Broad functional classes So Proteins have structure and function... Fine! -Why do we care to know more???? Understanding functional architechture gives us POWER
Protein Sequence Analysis - Overview -
Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein
Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011
Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript
Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing
Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing James D. Jackson Philip J. Hatcher Department of Computer Science Kingsbury Hall University of New Hampshire Durham,
Transcription and Translation of DNA
Transcription and Translation of DNA Genotype our genetic constitution ( makeup) is determined (controlled) by the sequence of bases in its genes Phenotype determined by the proteins synthesised when genes
Integrating Bioinformatics, Medical Sciences and Drug Discovery
Integrating Bioinformatics, Medical Sciences and Drug Discovery M. Madan Babu Centre for Biotechnology, Anna University, Chennai - 600025 phone: 44-4332179 :: email: [email protected] Bioinformatics
GenBank, Entrez, & FASTA
GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,
Introduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University [email protected] CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
Protein 3D-structure analysis. why and how
Protein 3D-structure analysis why and how 3D-structures are precious sources of information Shape and domain structure Protein classification Prediction of function for uncharacterized proteins Interaction
Introduction to Proteins and Enzymes
Introduction to Proteins and Enzymes Basics of protein structure and composition The life of a protein Enzymes Theory of enzyme function Not all enzymes are proteins / not all proteins are enzymes Enzyme
Built from 20 kinds of amino acids
Built from 20 kinds of amino acids Each Protein has a three dimensional structure. Majority of proteins are compact. Highly convoluted molecules. Proteins are folded polypeptides. There are four levels
Jeeva: Enterprise Grid-enabled Web Portal for Protein Secondary Structure Prediction
Jeeva: Enterprise Grid-enabled Web Portal for Protein Secondary Structure Prediction Chao Jin, Jayavardhana Gubbi, Rajkumar Buyya, and Marimuthu Palaniswami Grid Computing and Distributed Systems Lab Dept.
Current Motif Discovery Tools and their Limitations
Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.
Student name ID # 2. (4 pts) What is the terminal electron acceptor in respiration? In photosynthesis? O2, NADP+
1. Membrane transport. A. (4 pts) What ion couples primary and secondary active transport in animal cells? What ion serves the same function in plant cells? Na+, H+ 2. (4 pts) What is the terminal electron
REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])
820 REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) (See also General Regulations) BMS1 Admission to the Degree To be eligible for admission to the degree of Bachelor
Activity 7.21 Transcription factors
Purpose To consolidate understanding of protein synthesis. To explain the role of transcription factors and hormones in switching genes on and off. Play the transcription initiation complex game Regulation
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
Analysis of structural dynamics by H/D-exchange coupled to mass spectrometry HDX-MS
Analysis of structural dynamics by H/D-exchange coupled to mass spectrometry () New Approaches in Drug Design & Discovery 2014 25 th of March 2014 Introduction What are the challenges in structure-based
Introduction to Bioinformatics 3. DNA editing and contig assembly
Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
Helices From Readily in Biological Structures
The α Helix and the β Sheet Are Common Folding Patterns Although the overall conformation each protein is unique, there are only two different folding patterns are present in all proteins, which are α
Sickle cell anemia: Altered beta chain Single AA change (#6 Glu to Val) Consequence: Protein polymerizes Change in RBC shape ---> phenotypes
Protein Structure Polypeptide: Protein: Therefore: Example: Single chain of amino acids 1 or more polypeptide chains All polypeptides are proteins Some proteins contain >1 polypeptide Hemoglobin (O 2 binding
4. Which carbohydrate would you find as part of a molecule of RNA? a. Galactose b. Deoxyribose c. Ribose d. Glucose
1. How is a polymer formed from multiple monomers? a. From the growth of the chain of carbon atoms b. By the removal of an OH group and a hydrogen atom c. By the addition of an OH group and a hydrogen
BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm377
Vol. 23 no. 19 2007, pages 2558 2565 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm377 Structural bioinformatics Comparative protein structure modeling by combining multiple templates and
A Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
Focusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Name Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in
DNA, RNA, Protein Synthesis Keystone 1. During the process shown above, the two strands of one DNA molecule are unwound. Then, DNA polymerases add complementary nucleotides to each strand which results
Computational Systems Biology. Lecture 2: Enzymes
Computational Systems Biology Lecture 2: Enzymes 1 Images from: David L. Nelson, Lehninger Principles of Biochemistry, IV Edition, Freeman ed. or under creative commons license (search for images at http://search.creativecommons.org/)
Peptide Bonds: Structure
Peptide Bonds: Structure Peptide primary structure The amino acid sequence, from - to C-terminus, determines the primary structure of a peptide or protein. The amino acids are linked through amide or peptide
Protein engineering for structural biology
Protein engineering for structural biology J. Michael Sauder, Ph.D. Lilly Biotechnology Center Eli Lilly and Company San Diego, CA 92121 [email protected] PSDI 2012, Nov 12-13 Structural Biology
Chapter 6 DNA Replication
Chapter 6 DNA Replication Each strand of the DNA double helix contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand. Each strand can therefore
Introduction to Genome Annotation
Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT
RNA & Protein Synthesis
RNA & Protein Synthesis Genes send messages to cellular machinery RNA Plays a major role in process Process has three phases (Genetic) Transcription (Genetic) Translation Protein Synthesis RNA Synthesis
DBDB : a Disulfide Bridge DataBase for the predictive analysis of cysteine residues involved in disulfide bridges
DBDB : a Disulfide Bridge DataBase for the predictive analysis of cysteine residues involved in disulfide bridges Emmanuel Jaspard Gilles Hunault Jean-Michel Richer Laboratoire PMS UMR A 9, Université
Hormones & Chemical Signaling
Hormones & Chemical Signaling Part 2 modulation of signal pathways and hormone classification & function How are these pathways controlled? Receptors are proteins! Subject to Specificity of binding Competition
Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices
overview Pipeline Pilot Enterprise Server Pipeline Pilot Enterprise Server (PPES) is a powerful client-server platform that streamlines the integration and analysis of the vast quantities of data flooding
Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments
Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Mario Cannataro, Pietro Hiram Guzzi, Tommaso Mazza, and Pierangelo Veltri University Magna Græcia of Catanzaro, 88100
Biological Databases and Protein Sequence Analysis
Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to
Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University
Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary
CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/
CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu [email protected] 1. Introduction
Pipe Cleaner Proteins. Essential question: How does the structure of proteins relate to their function in the cell?
Pipe Cleaner Proteins GPS: SB1 Students will analyze the nature of the relationships between structures and functions in living cells. Essential question: How does the structure of proteins relate to their
This class deals with the fundamental structural features of proteins, which one can understand from the structure of amino acids, and how they are
This class deals with the fundamental structural features of proteins, which one can understand from the structure of amino acids, and how they are put together. 1 A more detailed view of a single protein
Disulfide Bonds at the Hair Salon
Disulfide Bonds at the Hair Salon Three Alpha Helices Stabilized By Disulfide Bonds! In order for hair to grow 6 inches in one year, 9 1/2 turns of α helix must be produced every second!!! In some proteins,
Software Getting Started Guide
Software Getting Started Guide For Research Use Only. Not for use in diagnostic procedures. P/N 001-097-569-03 Copyright 2010-2013, Pacific Biosciences of California, Inc. All rights reserved. Information
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives
Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives [email protected] 2015-05-21 Functional Bioinformatics, Örebro University Vad är bioinformatik och varför
NO CALCULATORS OR CELL PHONES ALLOWED
Biol 205 Exam 1 TEST FORM A Spring 2008 NAME Fill out both sides of the Scantron Sheet. On Side 2 be sure to indicate that you have TEST FORM A The answers to Part I should be placed on the SCANTRON SHEET.
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
Discovering Bioinformatics
Discovering Bioinformatics Sami Khuri Natascha Khuri Alexander Picker Aidan Budd Sophie Chabanis-Davidson Julia Willingale-Theune English version ELLS European Learning Laboratory for the Life Sciences
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
Translation Study Guide
Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to
Functional Architecture of RNA Polymerase I
Cell, Volume 131 Supplemental Data Functional Architecture of RNA Polymerase I Claus-D. Kuhn, Sebastian R. Geiger, Sonja Baumli, Marco Gartmann, Jochen Gerber, Stefan Jennebach, Thorsten Mielke, Herbert
Analysis of ChIP-seq data in Galaxy
Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers
Methods for Protein Analysis
Methods for Protein Analysis 1. Protein Separation Methods The following is a quick review of some common methods used for protein separation: SDS-PAGE (SDS-polyacrylamide gel electrophoresis) separates
AP BIOLOGY 2008 SCORING GUIDELINES
AP BIOLOGY 2008 SCORING GUIDELINES Question 1 1. The physical structure of a protein often reflects and affects its function. (a) Describe THREE types of chemical bonds/interactions found in proteins.
Interaktionen von RNAs und Proteinen
Sonja Prohaska Computational EvoDevo Universitaet Leipzig June 9, 2015 Studying RNA-protein interactions Given: target protein known to bind to RNA problem: find binding partners and binding sites experimental
When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want
1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very
Bioinformatics: course introduction
Bioinformatics: course introduction Filip Železný Czech Technical University in Prague Faculty of Electrical Engineering Department of Cybernetics Intelligent Data Analysis lab http://ida.felk.cvut.cz
