Linear Sequence Analysis. 3-D Structure Analysis



Similar documents
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Bioinformatics Grid - Enabled Tools For Biologists.

Guide for Bioinformatics Project Module 3

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

Sequence Information. Sequence information. Good web sites. Sequence information. Sequence. Sequence

Biological Databases and Protein Sequence Analysis

Module 1. Sequence Formats and Retrieval. Charles Steward

Bioinformatics for Biologists. Protein Structure

EMBL-EBI Web Services

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

CSC 2427: Algorithms for Molecular Biology Spring Lecture 16 March 10

Discovering Bioinformatics

Protein annotation and modelling servers at University College London

Pairwise Sequence Alignment

Specific problems. The genetic code. The genetic code. Adaptor molecules match amino acids to mrna codons

Bio-Informatics Lectures. A Short Introduction

Structure Tools and Visualization

Lecture 19: Proteins, Primary Struture

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Introduction to Genome Annotation

BIOINFORMATICS TUTORIAL

Molecular Databases and Tools

Databases and mapping BWA. Samtools

Chapter 3. Protein Structure and Function

Protein Sequence Analysis - Overview -

Lecture Series 7. From DNA to Protein. Genotype to Phenotype. Reading Assignments. A. Genes and the Synthesis of Polypeptides

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004

Transcription and Translation of DNA

Scientific databases. Biological data management

Core Bioinformatics. Degree Type Year Semester Bioinformàtica/Bioinformatics OB 0 1

Distributed Data Mining in Discovery Net. Dr. Moustafa Ghanem Department of Computing Imperial College London

Introduction to Proteomics

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

Bioinformatics Resources at a Glance

UNIT I LESSON -1 INTRODUCTION TO BIOINFORMATICS

Bioinformatics: course introduction

An agent-based layered middleware as tool integration

Module 10: Bioinformatics

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

Protein Synthesis How Genes Become Constituent Molecules

CD-HIT User s Guide. Last updated: April 5,

Lab # 12: DNA and RNA

RNA & Protein Synthesis

Translation Study Guide

PeptidomicsDB: a new platform for sharing MS/MS data.

Template-based protein structure modeling using the RaptorX web server

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!

Integrated design of antibodies for systems biology using AbDesigner

Name Class Date. Figure Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

Processing Genome Data using Scalable Database Technology. My Background

Teaching Bioinformatics to Undergraduates

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Transmembrane proteins span the bilayer. α-helix transmembrane domain. Multiple transmembrane helices in one polypeptide

Integrating Bioinformatics, Medical Sciences and Drug Discovery

Proteins and Nucleic Acids

Library page. SRS first view. Different types of database in SRS. Standard query form

Lecture 8. Protein Trafficking/Targeting. Protein targeting is necessary for proteins that are destined to work outside the cytoplasm.

Biological Sequence Data Formats

Database schema documentation for SNPdbe

DNA Insertions and Deletions in the Human Genome. Philipp W. Messer

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

Database manager does something that sounds trivial. It makes it easy to setup a new database for searching with Mascot. It also makes it easy to

Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison

Preliminary MFM Quiz

Sequence homology search tools on the world wide web

Bioinformatics Tools Tutorial Project Gene ID: KRas

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B

Molecular Genetics. RNA, Transcription, & Protein Synthesis

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

Integration of data management and analysis for genome research

Protein engineering for structural biology

Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks

AP BIOLOGY 2010 SCORING GUIDELINES (Form B)

Consensus alignment server for reliable comparative modeling with distant templates

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams.

Interaktionen von RNAs und Proteinen

Structure and Function of DNA

Amino Acids and Their Properties

BUDAPEST: Bioinformatics Utility for Data Analysis of Proteomics using ESTs

Built from 20 kinds of amino acids

H H N - C - C 2 R. Three possible forms (not counting R group) depending on ph

P G DIPLOMA IN BIOINFORMATICS

Concluding lesson. Student manual. What kind of protein are you? (Basic)

Myoglobin and Hemoglobin

國 立 交 通 大 學. 同 源 蛋 白 質 - 蛋 白 質 交 互 作 用 之 研 究 A Study of Homologous Protein-protein Interactions 生 物 科 技 學 系 博 士 論 文

GenBank, Entrez, & FASTA

Course Outline. 1. COURSE INFORMATION Session Offered Winter 2012 Course Name Biochemistry

1 Mutation and Genetic Change

Carbohydrates, proteins and lipids

Transcription:

Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical properties Molecular weight (MW), isoelectric point (pi), amino acid content, hydropathy (hydrophilic v. hydrophobic regions) Does not take into account post-translational modifications of protein, so are usually not 100% accurate Identify sequence motifs and families Signal sequences, transmembrane domains, coiled-coils, posttranslational modification sites, secondary structure (nonhomologous) Domains, functional motifs (homologous) 3-D Structure Analysis Visualization Domain structure, global fold, active sites, point mutations, SNPs, splice sites Evaluate structure quality Calculate physical properties Surface areas, distances, side-chain conformations, contact maps Structural alignment (ie similarity to other structures) Prediction Physical properties: binding affinity, pka s stability, specificity 3D structure (homology modeling, fold recognition, de novo) Advanced: protein design, docking of two proteins, active site modeling

Sequence Databases SwissProt (ExPASy) Highly curated, updated less frequently TrEMBL (ExPASy) Translated nucleotide sequences Automatic translation, fast but less info UniProt (EBI) Unified Protein Resource Combines SwissProt, TrEMBL, PIR sequences Sequence Analysis Sites For protein sequences and tools to analyze them, the two major centers are: ExPASy : Expert Protein Analysis System Many tools: http://ca.expasy.org/tools/ Databases: SwissProt, TrEMBL NCBI : Entrez Protein and Domains PIR: Protein Information Resource (folded into UniProt consortium; no longer major resource site)

More Sequence Databases Non-redundant NR (NCBI), UniRef (PIR/EBI) Reference RefSeq (NCBI) re-annotated by NCBI Domains/Families Pfam protein families (Sanger Center + 4 mirror sites) SMART Simple Modular Architecture Research Tool CDD Conserved protein Domain Database (NCBI), combines Pfam, SMART, and COGs databases InterPro (based on UniProt, at EMBL-EBI) Many others Structure Databases Experimental: PDB: Protein Data Bank Families: SCOP, CATH, Dali database, Homstrad Models/Predictions ModBase SwissModel NOTE: All these databases are described in January Database issue of Nucleic Acids Research (plus other kinds of databases). Also, links to them

Protein Sequence Analysis Tools ExPASy Proteomics Tools Calculate physical properties Predict sequence motifs what ExPASy calls Topology : localization, TM domains Signal sequences, postranslational modifications Search pattern and profile collections PredictProtein and Meta-PP A meta-server providing access to many servers with one submission form Secondary Structure Prediction Three good methods: Psipred Sam-T02/T04/T06 PhD (PredictProtein) Compare a couple methods Use the three-state predictions

SEQUENCE <--> STRUCTURE <--> FUNCTION Evolutionary selection operates on function Structure is more closely linked to function than is sequence, so structure tends to be more conserved than sequence. Need to search farther in sequence space to find proteins with related structures and functions. Detecting Remote Similarities Remote similarities can more easily be detected by comparing protein sequences DNA sequences change faster than protein sequences (wobble position, redundant codons) 4 letter DNA code vs. 20 letter amino acid code means that matches by chance are more likely in DNA; The protein code has more information in it!

Detecting Homology NEAR Evolutionary Distance FAR DNA Sequence BLASTn BLASTp Protein Sequence PSIBLAST Protein Structure Fold Recognition SIMILARITY METHODS Similar Sequences Share Similar Structures Compare all pairs of proteins in the same family (pairs for which homology is very probable) Homologs do not necessarily share much sequence similarity. Proteins with >30% sequence identity almost always share the same fold More structurally similar All other immunoglobulins Sauder et al., (2000) Proteins 40:6