The Bioinformatics of Protein Modification
1 The Bioinformatics of Protein Modification (Part 2)
Lecture 4610, Universität Basel
Dr. Michael Rebhan, Friedrich Miescher Institute, Basel, January
2 1. Introduction: what role does bioinformatics play?
2. Mining information related to protein modifications
- known modifications
- finding proteins with particular modifications
3. Predicting modification sites in proteins:
- general concepts
- filtering and interpretation
- generic tools
- modification-specific tools and issues
- building your own motif
Part 2
4. Related topics:
- protein function
- mutation effects
5. Online Materials: Exercises, Links
3 Predicting modification sites: Building Your Own Motif: 1. Building the data set 2. Alignment 3. Analysis of the alignment 4. Motif building & search
4 Predicting modification sites: Building your own motif
Collect all relevant sequences: your own + public
- ExPASy: SWISSPROT
- Specialized datasets? See the online materials (PubMed, Google)
- Eisenhaber et al. (2004) Proteomics 4, Prediction of sequence signals for lipid post-translational modifications: Insights from case studies
Keep in mind:
- How reliable is the data? (Is there direct evidence?)
- The sequence environment around the main motif matters (see part 1); taking it into account can reduce the false-positive rate
5 Predicting modification sites: Building your own motif
Collect all relevant sequences: your own + public
- ExPASy: SWISSPROT
Example: search for C-linked (man) in the feature descriptions (= C-mannosylation)
- keep only features with direct experimental evidence!
- is the dataset large & diverse enough?
6 Predicting modification sites: Building your own motif
Collect all relevant sequences: your own + public
- ExPASy: SWISSPROT
Example: C-linked (man) in the feature descriptions
The features look OK, so the query is OK (no predictions etc.). Now get more information, including the sequence environment.
7 Predicting modification sites: Building your own motif
Collect all relevant sequences: your own + public
- ExPASy: SWISSPROT
Example: C-linked (man) in the feature descriptions
Back to the query form: retrieve entries instead of features, and display key fields in the output.
8 Predicting modification sites: Building your own motif
Collect all relevant sequences: your own + public
- ExPASy: SWISSPROT
Example: C-linked (man) in the feature descriptions
Why 11 entries, when we had 49 features before? Each entry (= protein) can carry a number of features (= modifications).
Click on the entry link if you'd like to include this protein.
9 Predicting modification sites: Building your own motif
Collect all relevant sequences: your own + public
1. Find the features you'd like to include in the data set ("training set")
2. Click on a feature's position to get the sequence context
3. Build the alignment in FASTA format (by copy & paste, if it's a small set)
4. Import into alignment viewers (like Jalview,
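For larger sets, the copy & paste step can be scripted. A minimal Python sketch, assuming 1-based modification positions and a fixed number of flanking residues on each side. The names `site_window` and `to_fasta`, the flank size and the record names are illustrative and not part of the lecture; positions 29 and 32 are simply the two tryptophans of the demo sequence used later in these slides. Windows near a sequence end are padded with "-" so that all records have equal length and load as an ungapped alignment.

```python
# Demo sequence taken from the slides (first 60 residues).
DEMO = "MARRSVLYFILLNALINKGQACFCDHYAWTQWTSCSKTCNSGTQSRHRQIVVDKYYQENF"


def site_window(sequence, position, flank):
    """Return the residue at a 1-based position plus `flank` residues per side.

    Missing residues at either end are padded with '-' so every window
    has length 2 * flank + 1.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    i = position - 1
    left = sequence[max(0, i - flank):i]
    right = sequence[i + 1:i + 1 + flank]
    return left.rjust(flank, "-") + sequence[i] + right.ljust(flank, "-")


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[k:k + width] for k in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical training set: two sites from the demo sequence.
    sites = [("demo_W29", DEMO, 29), ("demo_W32", DEMO, 32)]
    records = [(name, site_window(seq, pos, 3)) for name, seq, pos in sites]
    print(to_fasta(records), end="")
```

The resulting text can be saved as a file and opened in an alignment viewer.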
10 Predicting modification sites: Building your own motif
Analysis of the alignment / data set:
- any corrections needed, especially gaps?
- is it large/diverse enough?
- sort, and try different color views
In Jalview, color by conservation:
- which positions show clear constraints? These suggest the motif boundaries
Other constraints:
- conserved? (BLAST)
- secondary structure, accessibility? (Quick2D, SABLE; see part 1)
Color: Zappo
11 Predicting modification sites: Building your own motif
Which kind of model to use?
- regular expressions (PROSITE patterns)
- profiles, like PSI-BLAST
- support vector machines (SVMs)
Regular expressions:
[WDMLYSFHQ]-[TGSAYF]-[QSGCTNEPA]-W-[TGSAI]-[SCGPTVEDQ]-[CW]-[SGEDRANTF]
or: W-X-X-[CW] (in an S-rich environment)
Such a pattern could be useful, but it doesn't impose a lot of constraints (and gives no scoring). If you'd like to use it anyway, you can scan protein databases with this motif at ScanProsite (ExPASy).
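To try such a pattern on a few sequences locally before going to ScanProsite, a PROSITE-style pattern can be translated into an ordinary regular expression. The sketch below is a simplified converter written for illustration: it covers only the common syntax elements (x wildcards, [..] allowed sets, {..} excluded sets, (n) and (n,m) repeats, < and > anchors) and is not ScanProsite's own implementation. The function names are hypothetical, and the demo sequence is the one used later in these slides.

```python
import re

# Demo sequence taken from the slides (first 60 residues).
DEMO = "MARRSVLYFILLNALINKGQACFCDHYAWTQWTSCSKTCNSGTQSRHRQIVVDKYYQENF"

# One pattern element: optional '<', a core, optional repeat, optional '>'.
_ELEMENT = re.compile(
    r"(<?)(x|X|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, low, high, end = m.groups()
        if core in ("x", "X"):
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def scan_pattern(pattern, sequence):
    """Return (1-based start, matched text) for all matches, overlaps included."""
    # The lookahead lets consecutive matches overlap, as in W-x-x-W-x-x-C.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    for start, text in scan_pattern("W-x-x-[CW]", DEMO):
        print(start, text)
```

As the slide notes, a plain pattern only reports match or no match; it gives no score for ranking hits.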
12 Predicting modification sites: Building your own motif
ScanProsite: enter pattern, options
13 Predicting modification sites: Building your own motif
ScanProsite results
More: online materials
14 Predicting modification sites: Building your own motif
Profiles, like PSI-BLAST:
Search with the alignment using PSI-BLAST, e.g. at the Bioinformatics Toolkit (MPI Tuebingen)
PSSM profile (see part 1)
15 Predicting modification sites: Building your own motif
Search first against SWISSPROT to check which proteins get the highest scores
- settings: E-value 1000, ungapped alignment
Also: ScanSite (MIT)! (enhanced regular expressions and PSSM search)
Validation / filtering:
- Quick2D: secondary structure, disorder
- conservation (?)
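To make the idea of a scored profile concrete, here is a small Python sketch of a position-specific scoring matrix built from an ungapped alignment. It is a simplified illustration, not what PSI-BLAST or ScanSite compute internally: it assumes a uniform background frequency of 1/20 per residue and a pseudocount of 1, and the training windows in the usage section are made up for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Demo sequence taken from the slides (first 60 residues).
DEMO = "MARRSVLYFILLNALINKGQACFCDHYAWTQWTSCSKTCNSGTQSRHRQIVVDKYYQENF"


def build_pssm(windows, pseudocount=1.0):
    """Build log2-odds scores per column from equal-length, ungapped windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for w in windows:
            if w[col] in counts:         # skip gaps and non-standard letters
                counts[w[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_window(pssm, window):
    """Sum the column scores; unknown letters contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))


def scan_pssm(pssm, sequence):
    """Score every window of the profile's length; return (1-based start, score)."""
    n = len(pssm)
    return [(i + 1, score_window(pssm, sequence[i:i + n]))
            for i in range(len(sequence) - n + 1)]


if __name__ == "__main__":
    # Hypothetical training windows, for illustration only.
    pssm = build_pssm(["AWTQW", "YWSQC", "AWTSC"])
    best = sorted(scan_pssm(pssm, DEMO), key=lambda hit: hit[1], reverse=True)[:3]
    for start, score in best:
        print(start, round(score, 2))
```

Unlike a regular expression, every window gets a score, so hits can be ranked and then filtered with the validation steps listed above.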
16 Predicting modification sites: Building your own motif
Support vector machines (SVMs): training data -> function (classification / regression)
For classification, an SVM finds a hypersurface in the space of possible inputs that separates the positive examples from the negative examples. The split is chosen to maximize the distance from the hypersurface to the nearest positive and negative examples.
AutoMotif server (using SVMs). You need to:
- reformat the sequences (with a simple replace, e.g. in WordPad)
- register at the AutoMotif site (immediate)
- submit the reformatted alignment & search
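The "largest distance to the nearest examples" criterion can be illustrated without a full SVM trainer. The Python sketch below only evaluates the margin of candidate linear separators on toy 2D points; it does not train an SVM and is unrelated to how the AutoMotif server works internally. Real peptide windows would first have to be encoded as numeric vectors, and all names and numbers here are made up for the example.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, positives, negatives):
    """Smallest distance of any example to the hyperplane.

    Positive examples should lie on the positive side and negative
    examples on the negative side; the result is negative if any
    example is on the wrong side.
    """
    distances = [signed_distance(w, b, x) for x in positives]
    distances += [-signed_distance(w, b, x) for x in negatives]
    return min(distances)


if __name__ == "__main__":
    # Toy data: two positive and two negative examples.
    positives = [(2.0, 2.0), (3.0, 3.0)]
    negatives = [(0.0, 0.0), (1.0, 0.0)]
    # Two candidate separators; both split the classes correctly.
    candidates = {
        "A": ((1.0, 1.0), -2.5),
        "B": ((1.0, 0.0), -1.5),
    }
    for name, (w, b) in candidates.items():
        print(name, round(margin(w, b, positives, negatives), 3))
```

Both candidates separate the toy data, but A keeps the nearest examples further away; an SVM searches for the separator that maximizes this quantity.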
17 Predicting modification sites: Building your own motif
My dataset is very small and not very diverse. Is there anything I can do?
Collecting & aligning orthologs:
1. Check SWISSPROT for "by similarity" features and, if that's not enough, use myhits (SIB) to collect orthologs with considerable variation (lots of flanking sequence, use 90% identity clustering, against SWISSPROT [and Ensembl], E values 1e-6 and 0.01; select clear hits, then run the next cycle, then align the trustworthy hits)
2. Trim the alignment in Jalview (e.g. in myhits), sort by pairwise identity
Demo with MARRSVLYFILLNALINKGQACFCDHYAWTQWTSCSKTCNSGTQSRHRQIVVDKYYQENF
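Sorting by pairwise identity, or judging what a 90% identity cut-off would keep, comes down to one simple number per sequence pair. A minimal Python sketch follows, assuming the sequences are already aligned (equal length, "-" for gaps) and that identity is counted over columns where neither sequence has a gap. Tools differ in the denominator they use, so values may not match Jalview or myhits exactly, and the function names are illustrative.

```python
def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def sort_by_identity(reference, aligned):
    """Sort a {name: aligned sequence} mapping by identity to the reference."""
    return sorted(
        aligned.items(),
        key=lambda item: percent_identity(reference, item[1]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical aligned ortholog fragments, for illustration only.
    reference = "HYAWTQWTSC"
    orthologs = {
        "ortholog_1": "HYAWTQWTSC",
        "ortholog_2": "HFAWSQWTSC",
        "ortholog_3": "QY-WTEWSAC",
    }
    for name, seq in sort_by_identity(reference, orthologs):
        print(name, round(percent_identity(reference, seq), 1))
```

Sequences that are nearly identical to one already in the set add little new information about which positions are truly constrained.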
18 Predicting modification sites: Building your own motif
Which residues are conserved? Do all these orthologs still carry the same modification? That requires experiments!
Search: PSI-BLAST at MPI (as before)
(this example: 2 C-mannosylation sites next to each other)
19 Predicting modification sites: Building your own motif
If there are no substrates at all, is there anything I can do? Do you have a kinase, by chance?
PREDIKIN: potential substrates for different kinds of kinases, based on sequence and type; this gives ideas for experiments
Brinkworth et al. (2003) PNAS 100:74
20 Predicting modification sites: Building your own motif
Which kind of model to use?
- regular expressions (PROSITE patterns)
- profiles, like PSI-BLAST
- support vector machines (SVMs)
Need advice? Ask a protein sequence analysis expert
21 SUMMARY: Building your own motif
- Building your own motif is not as hard as you may think
- The main issue: building a good and informative alignment!
Motif building & search:
- Regular expressions: ScanProsite
- PSSMs: PSI-BLAST at MPI
- SVMs: AutoMotif
22 Overview
1. Introduction: what role does bioinformatics play?
2. Mining information related to protein modifications
- known modifications
- finding proteins with particular modifications
3. Predicting modification sites in proteins:
- general concepts
- filtering and interpretation
- generic tools
- modification-specific tools and issues
- building your own motif
4. Related topics:
- protein function prediction
- mutation effects
5. Online Materials: Exercises, Links
23 Protein Function Prediction: Predicting modifications in the context of function prediction
Also:
- Protein isoforms and the prediction of modifications
- Interpretation of potential modifications, e.g. phospho-sites
24 Protein function prediction: Predicting modifications in the context of function prediction
MARRSVLYFI LLNALINKGQ ACFCDHYAWT QWTSCSKTCN SGTQSRHRQI VVDKYYQENF CEQICSKQET RECNWQRCPI NCLLGDFGPW SDCDPCIEKQ SKVRSVLRPS QFGGQPCTEP
What can be (reliably) predicted from the sequence alone?
- Domain architecture (and signal peptides): potential molecular interactions, proteins with similar domain architecture
- Tertiary or secondary structure, disorder & accessibility
- Small motifs: targeting, modifications, transmembrane regions, coiled coils
- Genomic context & phylogenetic occurrence: hints on functional interactions
New predictions are coming out all the time
25 Protein function prediction: our sequence, alternative transcripts
How good/complete is the protein sequence we want to check?
- is the sequence itself reliable?
- is it as complete as we think?
- are there alternative transcripts?
Quick check: BLAT at UCSC
In this example (translated ORF):
- some exons are missing! (alternatively spliced)
- an alternative TSS exists
Pick a better sequence! (maybe run the predictions on both & compare)
26 Protein function prediction: Predicting modifications in the context of function prediction
Domain architecture, signal peptide & low complexity regions: PFAM, Interpro
- molecular interactions (if you're lucky), e.g. RNA-binding
- proteins with similar domain architecture (or composition): PFAM, SMART
Signal peptide
Low complexity
27 Protein function prediction: Predicting modifications in the context of function prediction
MARRSVLYFI LLNALINKGQ ACFCDHYAWT QWTSCSKTCN SGTQSRHRQI VVDKYYQENF CEQICSKQET RECNWQRCPI NCLLGDFGPW SDCDPCIEKQ SKVRSVLRPS QFGGQPCTEP
28 Protein function prediction: Predicting modifications in the context of function prediction
MARRSVLYFI LLNALINKGQ ACFCDHYAWT QWTSCSKTCN SGTQSRHRQI VVDKYYQENF CEQICSKQET RECNWQRCPI NCLLGDFGPW SDCDPCIEKQ SKVRSVLRPS QFGGQPCTEP
Small motifs: targeting, modifications, transmembrane regions
- Modifications: see part 1
- Targeting: TargetP (part of ProtFun, see part 1)
- Disorder, secondary structure, coiled coils etc.: Quick2D (at MPI)
- Transmembrane regions: TMHMM; also Quick2D, SABLE
Quick2D output
29 Protein function prediction: Predicting modifications in the context of function prediction
Transmembrane regions: TMHMM (at CBS), in ProtFun
30 Protein function prediction: Predicting modifications in the context of function prediction
Genomic context & phylogenetic occurrence: STRING at EMBL
Which interactions are supported by different methods?
31 Protein function prediction: Protein isoforms and the prediction of modifications
BLAT at UCSC: alternative transcripts -> protein isoforms
Also: check SWISSPROT!
Do the isoforms show differences in their potential modification sites? (How could that affect function?)
e.g. SWISSPROT:TAU_HUMAN (pos )
32 Protein function prediction: Interpretation of potential modifications
Predicted phosphorylation sites -> protein-protein interactions?
ScanSite at MIT (see part 1)
33 SUMMARY: Prediction of modification sites in the context of protein function prediction
- Prediction of protein modifications is often best done in the context of protein function prediction (comprehensive protein annotation)
- Many kinds of signals can be found in such sequences, and they often suggest interesting hypotheses
- Any isoform-specific things? (modifications?)
- Functional consequences of the modification? (e.g. phospho-sites)
- Synergy between analyses! (e.g. structure, modification sites, evolution)
Reviews:
- F. Eisenhaber (2005) Eurekah Bioscience Collection (at NCBI Books) and the online recipe at
- J. Bienkowska (2005) Expert Rev. Proteomics 2:129
- B. Rost (2003) Cell. Mol. Life Sci. 60:2637
34 Mutation Effects
Will a mutation / polymorphism (e.g. SNP) weaken or destroy a potential modification site, or even create a new one?
Example: NetPhosK analysis of p53_human cancer variants (pos. 151): some modification sites disappear, others appear!
Blom et al. (2004) Proteomics 4:1633
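The same question can be asked of any motif model by predicting sites before and after the substitution and comparing the two sets. The Python sketch below swaps in the simple W-x-x-[CW] pattern from the motif-building slides for the predictor; NetPhosK itself is a trained kinase-specific predictor and is not reproduced here. The substitutions in the usage section are hypothetical and chosen only to show one loss and one gain.

```python
import re

# Demo sequence taken from the slides (first 60 residues).
DEMO = "MARRSVLYFILLNALINKGQACFCDHYAWTQWTSCSKTCNSGTQSRHRQIVVDKYYQENF"
MOTIF = "W..[CW]"  # regex form of W-x-x-[CW]


def motif_sites(sequence, regex):
    """Return the set of 1-based start positions of all (overlapping) matches."""
    return {m.start() + 1 for m in re.finditer("(?=" + regex + ")", sequence)}


def apply_substitution(sequence, position, new_residue):
    """Replace the residue at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    return sequence[:position - 1] + new_residue + sequence[position:]


def mutation_effect(sequence, position, new_residue, regex):
    """Compare predicted sites in the wild type and the variant."""
    before = motif_sites(sequence, regex)
    after = motif_sites(apply_substitution(sequence, position, new_residue), regex)
    return {"lost": sorted(before - after), "gained": sorted(after - before)}


if __name__ == "__main__":
    # Hypothetical variants, for illustration only.
    print("W29A:", mutation_effect(DEMO, 29, "A", MOTIF))
    print("G19W:", mutation_effect(DEMO, 19, "W", MOTIF))
```

A predicted loss or gain is only a hypothesis about the variant; as with the sites themselves, it needs experimental confirmation.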
35 Overview
1. Introduction: what role does bioinformatics play?
2. Mining information related to protein modifications
- known modifications
- finding proteins with particular modifications
3. Predicting modification sites in proteins:
- general concepts
- filtering and interpretation
- generic tools
- modification-specific tools and issues
- building your own motif
4. Related topics:
- protein function
- mutation effects
- analysis of mass spectrometry data
5. Online Materials: Exercises, Links
36 Online Materials: Exercises, Links
1. Protein Function & Structure
2. Modifications: Generic Tools
3. Modification-specific Tools
4. Building Your Own Motif
5. Recommended Materials
6. Exercises
Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence