Protein Sequence Analysis - Overview




UDEL Workshop
Raja Mazumder
Research Associate Professor, Department of Biochemistry and Molecular Biology
Georgetown University Medical Center

Topics
Why do protein sequence analysis?
Searching sequence databases (similarity search)
Post-processing search results
Protein classification and function prediction
Detecting remote homologs
Multiple sequence alignment and phylogenetic analysis

Protein bioinformatics: protein sequence analysis
Protein sequence analysis helps characterize protein sequences in silico and allows prediction of protein structure and function.
Statistically significant BLAST hits usually signify sequence homology.
Homologous sequences may or may not have the same function, but with very few exceptions they have the same structural fold.
Protein sequence analysis allows protein classification.
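
Whether a BLAST hit is "statistically significant" is usually judged by its E-value, the number of hits with at least that score expected by chance. A minimal sketch of the Karlin-Altschul relationship E = K * m * n * exp(-lambda * S) and its bit-score form E = m * n * 2^(-S'); the K and lambda values and the query/database lengths below are illustrative assumptions, and real BLAST also corrects m and n for edge effects, so these numbers will not match BLAST output exactly.

```python
import math


def evalue_from_raw(raw_score: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected chance hits with score >= raw_score: E = K * m * n * exp(-lam * S)."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, K: float, lam: float) -> float:
    """Normalized bit score: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent E-value computed from the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative numbers only: m = query length, n = total database length (residues).
m, n = 350, 50_000_000
K, lam = 0.041, 0.267  # assumed scoring-system parameters
for raw in (60, 120):
    bits = bit_score(raw, K, lam)
    print(raw, round(bits, 1), evalue_from_raw(raw, m, n, K, lam))
```

Raising the raw score from 60 to 120 lowers E by a factor of exp(lam * 60), which is why strong hits have vanishingly small E-values.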

Comparative protein sequence analysis and evolution
Patterns of conservation in sequences allow us to determine which residues are under selective constraint (and are thus likely important for protein function).
Comparative analysis of proteins is more sensitive than comparing DNA.
Homologous proteins have a common ancestor.
Different proteins evolve at different rates.
Protein classification systems based on evolution: PIRSF and COG.
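
One common way (among several) to quantify conservation patterns is the Shannon entropy of each alignment column: a low value suggests a residue under selective constraint. A sketch, assuming an already-computed alignment given as equal-length strings with "-" for gaps; the function names and toy sequences are hypothetical, and real tools often weight sequences and account for background amino acid frequencies.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps.

    0.0 means every sequence has the same residue (fully conserved column).
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return float("nan")
    total = len(residues)
    return sum((c / total) * math.log2(total / c) for c in Counter(residues).values())


def conservation_profile(alignment: list[str], gap: str = "-") -> list[float]:
    """Entropy of every column of a multiple sequence alignment (all rows equal length)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(row[i] for row in alignment), gap) for i in range(length)]


# Hypothetical three-sequence alignment: columns 1 and 4 are invariant.
toy_alignment = ["MKDL", "MKEL", "MRDL"]
for pos, h in enumerate(conservation_profile(toy_alignment), start=1):
    print(pos, round(h, 3))
```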

Comparing proteins
Amino acid sequence of a protein generated from a proteomics experiment, e.g. the protein fragment:
DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT
The amino acids of two sequences can be aligned, and the number of identical residues (or an index of similarity) can then be counted as a measure of relatedness.
Protein structures can be compared by superimposition.
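
A minimal sketch of the identity count, assuming the two sequences are already aligned with a gap character marking gaps (the function name is illustrative). Tools differ in the denominator they use (aligned columns, the shorter sequence, or the full alignment length); this version counts only columns where both sequences have a residue. The variant sequence in the usage line is made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# First 10 residues of the fragment vs. a hypothetical variant with two substitutions.
print(percent_identity("DTIKDLLPNV", "DTVKDLLPQV"))  # 80.0
```

Under this convention the slide's pairwise example, percent_identity("abacd", "ab_cd", gap="_"), gives 100.0 even though one residue is unaligned, which is why identity is usually reported together with the alignment length.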

Protein sequence alignment
Pairwise alignment:
a b a c d
a b _ c d
Multiple sequence alignment provides more information:
a b a c d
a b _ c d
x b a c e
MSA is difficult to do for distantly related proteins.
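
A pairwise global alignment like the one in this example can be computed with the Needleman-Wunsch dynamic programming algorithm. A minimal sketch with an assumed toy scoring scheme (+1 match, -1 mismatch, -2 per gap); real protein aligners use a substitution matrix such as BLOSUM62 and affine gap penalties. Gaps are written as "-" rather than the slide's "_".

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment (Needleman-Wunsch) with a linear gap penalty.

    Returns (aligned_a, aligned_b, score); gaps are written as '-'.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# The slide's pairwise example: "abcd" aligned against "abacd".
print(needleman_wunsch("abacd", "abcd"))  # ('abacd', 'ab-cd', 2)
```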

Protein sequence analysis overview
Protein databases: PIR (pir.georgetown.edu) and UniProt (www.uniprot.org)
Searching databases: peptide search, BLAST search, text search
Information retrieval and analysis: protein records at UniProt and PIR
Multiple sequence alignment
Secondary structure prediction
Homology modeling

Query Sequence
The unknown sequence is Q9I7I7.
BLAST Q9I7I7 against the UniProt Knowledgebase (http://www.uniprot.org/search/blast.shtml).
Analyze the results.

BLAST results

SIR2_HUMAN protein record

Are Q9I7I7 and SIR2_HUMAN homologs?
Check the BLAST results.
Check the pairwise alignment.

Protein structure prediction
Programs can predict secondary structure with about 70% accuracy.
Homology modeling: prediction of a target structure from a closely related template structure.
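
Secondary structure accuracy figures are usually per-residue three-state (Q3) scores: the percentage of residues whose predicted state (H = helix, E = strand, C = coil) matches the state assigned from an experimental structure. A sketch, assuming both strings are already reduced to these three letters; the function name and example strings are made up.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label (H, E or C) matches the observed label."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 12-residue example: 10 of 12 states agree.
print(round(q3_accuracy("CCHHHHCCEEEC", "CHHHHHCCEEEE"), 1))  # 83.3
```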

Secondary structure prediction: http://bioinf.cs.ucl.ac.uk/psipred/

Secondary structure prediction results

Sir2 structure

Homology modeling: http://www.expasy.org/swissmod/swiss-model.html

Homology model of Q9I7I7
Model quality: blue = excellent, green = so-so, red = not good
Secondary structure: yellow = beta sheet, red = alpha helix, grey = loop

Sequence features: SIR2_HUMAN

Multiple sequence alignment

Multiple sequence alignment: Q9I7I7, Q82QG9, SIR2_HUMAN

Identifying Remote Homologs

Function prediction


Molecular Phylogenetics and Evolution
Overview:
History of phylogenetics
Sequence analysis and classification
Methods in phylogenetic analysis

Phylogenetics
The field of biology that studies the evolutionary relationships between organisms, proteins or genes that share a common ancestor.
Phylogenetics includes the discovery (estimation) of these relationships and the study of the causes behind this pattern.
Phylogenetics is related to taxonomy.

Tree of Life
Aristotle (384-322 BC) classified all living organisms as either plants or animals.
Whittaker (1969) summarized the "Five Kingdoms" of life: animals, plants, fungi, protists ("protozoa"), and monera (bacteria). R. H. Whittaker, Science 163, 150 (1969).
Zuckerkandl et al. (1965) put forward the concept that sequences could be used to relate organisms. E. Zuckerkandl et al., Biol. 8, 357 (1965).
Woese (1990) proposed "urkingdoms" or "domains": Eucarya (eukaryotes), Bacteria (initially called eubacteria), and Archaea (initially called archaebacteria). Woese et al., Proc. Natl. Acad. Sci. U.S.A. 87, 4576 (1990).
Norman R. Pace, Science 276, 734-740 (1997).

History of Phylogenetics
Charles Darwin (1859): author of The Origin of Species.
Ernst Haeckel (1892): mapped a genealogical tree relating all animal life.
Figure: Romanes's 1892 copy of Ernst Haeckel's allegedly fraudulent embryo drawings.

Monophyly, Paraphyly & Polyphyly
(Figure source: Phylogenetics, Wikipedia)

Molecular Phylogenetics
Morphological or organismal character evolution is less consistent than molecular evolution.
Molecular data can be used to study any organism.
Rates of evolution can be studied in greater detail.
Abundant data are available.

Evolutionary Change in DNA
Several models have been proposed to study the mechanisms of DNA evolution.
Jukes and Cantor's one-parameter model assumes no bias in the direction of change, so substitutions occur randomly among the four types of nucleotides.
Kimura's two-parameter model: transitions are generally more frequent than transversions, so the rate of transitional substitution differs from the rate of transversional substitution.
The amount of change depends on both the rate of substitution and the pattern of substitution.
Figure (from Li and Graur 1991): types of substitution in two sequences descended from a common ancestor ("X>Y" means X was replaced by Y):
Ancestral sequence: A C T G A A C G T A A C G C
Sequence 1: A C T G A>C>T A C>G G T>A A A>C>T C G C
Sequence 2: A C>A T G A A C>A G T>A A A>T C G C>T>C
Position 2, single substitution: C>A in sequence 2.
Position 5, multiple substitution: A>C>T in sequence 1.
Position 7, coincidental substitution: C>G in sequence 1 and C>A in sequence 2.
Position 9, parallel substitution: T>A in both sequences.
Position 11, convergent substitution: A>C>T in sequence 1 and A>T in sequence 2.
Position 14, back substitution: C>T>C in sequence 2.
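
The two models translate into standard distance corrections for multiple hits. A minimal sketch, assuming gap-free aligned uppercase DNA containing only A, C, G and T; the formulas are the textbook JC69 and K80 estimators, the helper names are illustrative, and the example reuses the final states of sequences 1 and 2 from the Li and Graur figure.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _check(seq1: str, seq2: str) -> int:
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return len(seq1)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """JC69 distance d = -(3/4) ln(1 - (4/3) p), p = proportion of differing sites."""
    n = _check(seq1, seq2)
    p = sum(a != b for a, b in zip(seq1, seq2)) / n
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(arg)


def kimura_2p(seq1: str, seq2: str) -> float:
    """K80 distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q).

    P = proportion of transitions (A<->G, C<->T), Q = proportion of transversions.
    Assumes only A, C, G, T; any other mismatch would be counted as a transversion.
    """
    n = _check(seq1, seq2)
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P, Q = transitions / n, transversions / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# Present-day sequences 1 and 2 from the Li and Graur example.
seq1 = "ACTGTAGGAATCGC"
seq2 = "AATGAAAGAATCGC"
p_distance = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
print(round(p_distance, 3), round(jukes_cantor(seq1, seq2), 3), round(kimura_2p(seq1, seq2), 3))
```

These two sequences differ at only 3 of 14 sites, although the figure records 12 substitutions along the two lineages; corrected distances such as JC69 and K80 therefore exceed the raw proportion of differences, although no correction can recover every hidden change.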

Evolutionary Change in Protein
Synonymous and nonsynonymous substitutions: substitutions that result in amino acid replacements are said to be nonsynonymous, while substitutions that do not cause an amino acid replacement are said to be synonymous.
Conservative changes are replacements within the same amino acid class, for example hydrophobic or charged.
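
The distinction can be made concrete by translating the codons before and after a substitution with the standard genetic code. In this sketch the helper names are illustrative, and the amino acid classes used to call a replacement "conservative" are a simplified, assumed grouping; published classifications differ (for example in where Cys, His, Gly and Pro are placed).

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Simplified, illustrative physicochemical classes (groupings differ between textbooks).
AA_CLASSES = {
    "hydrophobic": set("AVLIMFWC"),
    "charged": set("DEKRH"),
    "polar": set("STNQY"),
    "special": set("GP"),
}


def aa_class(aa: str) -> str:
    return next((name for name, members in AA_CLASSES.items() if aa in members), "other")


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change as synonymous, conservative nonsynonymous or radical nonsynonymous."""
    before = CODON_TABLE[codon_before.upper().replace("U", "T")]
    after = CODON_TABLE[codon_after.upper().replace("U", "T")]
    if before == after:
        return "synonymous"
    if aa_class(before) == aa_class(after) != "other":
        return "nonsynonymous (conservative)"
    return "nonsynonymous (radical)"


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu: synonymous
print(classify_substitution("GAA", "GAT"))  # Glu -> Asp: nonsynonymous (conservative)
print(classify_substitution("GAA", "GTA"))  # Glu -> Val: nonsynonymous (radical)
```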

Tutorial
1. Retrieve the 1FSI (PDB ID) sequence and related sequences from UniProtKB using BLAST.
2. Align all the sequences in Clustal (desktop version).
3. Generate a tree (using Clustal).
4. View the tree (http://www.phylowidget.org/; http://www.proweb.org/treeviewer/).

Representation of Phylogeny
The evolutionary relationships between proteins can be represented in the form of a tree.
A phylogeny is a bifurcating tree with nodes, branches and a root (the root represents the common ancestor).
Figure: tree of homologous proteins (Protein 1a, Protein 1b, Protein 1c, Protein 1d) with the root, a node, a branch and a clade labeled.
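
Programs usually store and exchange such trees in the Newick format, where each pair of parentheses is an internal node and ":x" gives a branch length. The sketch below is one possible in-memory representation; the Node class, the topology and the branch lengths are hypothetical, since the original figure's values are not recoverable. Names use underscores because unquoted Newick labels should not contain spaces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node: leaves carry a protein name; internal nodes (including the root) have children."""
    name: str = ""
    branch_length: Optional[float] = None  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

    def to_newick(self) -> str:
        text = self.name
        if self.children:
            text = "(" + ",".join(child.to_newick() for child in self.children) + ")" + self.name
        if self.branch_length is not None:
            text += f":{self.branch_length:g}"
        return text


# Hypothetical topology and branch lengths for the four homologous proteins.
root = Node(children=[
    Node(children=[Node("Protein_1a", 0.1), Node("Protein_1b", 0.2)], branch_length=0.05),
    Node(children=[Node("Protein_1c", 0.1), Node("Protein_1d", 0.3)], branch_length=0.05),
])
print(root.to_newick() + ";")
# ((Protein_1a:0.1,Protein_1b:0.2):0.05,(Protein_1c:0.1,Protein_1d:0.3):0.05);
```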

Terminology
Clade: a monophyletic taxon.
Taxon: any named group of organisms; not necessarily a clade.
Branches: connect nodes.
Nodes: any bifurcating branch point.
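
Because a clade is a monophyletic group, whether a set of taxa forms a clade in a rooted tree can be tested by checking whether it equals the full set of descendants of some node. A sketch, assuming trees written as nested tuples of leaf names (a hypothetical encoding chosen for brevity) and hypothetical function names; the test is only meaningful for rooted trees.

```python
from typing import FrozenSet, Iterator, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name, or a tuple of child subtrees


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf names below (and including) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every node of a rooted tree; each one is a clade."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, group) -> bool:
    """True if the group is exactly the set of descendants of some node (its common ancestor)."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted tree of four homologous proteins.
tree = ((("1a", "1b"), "1c"), "1d")
print(is_monophyletic(tree, {"1a", "1b"}))  # True
print(is_monophyletic(tree, {"1a", "1c"}))  # False: their common ancestor also leads to 1b
```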

Common Phylogenetic Tree Layouts
Rectangular cladogram
Slanted cladogram
Phylogram (branch lengths proportional to distance)
Radial

Rooted vs. Unrooted Phylogenies
An unrooted tree shows only the relationships, not the evolutionary path.
In a rooted tree, the root (R) is the common ancestor.
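
Rooting also matters computationally: an unrooted bifurcating tree with n leaves has 2n - 3 branches, each of which could hold the root, so there are many more rooted trees than unrooted ones. A short sketch of the standard counting formulas, (2n - 5)!! unrooted and (2n - 3)!! rooted trees; the function names are illustrative.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees with n labeled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of distinct rooted bifurcating trees with n labeled leaves: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
# 4 3 15
# 10 2027025 34459425
```

With only 10 taxa there are already 2,027,025 unrooted and 34,459,425 rooted trees, which is why tree-building programs rely on heuristics or algorithms such as neighbor joining rather than enumerating every tree.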

How to Construct a Phylogenetic Tree
1. Construct a multiple sequence alignment.
2. Determine the substitution model.
3. Build the tree.
4. Evaluate the tree.

Bootstrapping
Bootstrapping is a resampling method for evaluating trees.
The bootstrap value is a number associated with a particular branch in the tree that gives the proportion of bootstrap replicates supporting the monophyly of the clade.
It is a two-step process: first, many new data sets are generated from the original data set by sampling alignment columns with replacement; then a number is computed that tells how often a particular branch appears in the trees built from them.
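
The two steps can be sketched as follows, assuming the standard nonparametric bootstrap in which the new data sets are made by resampling alignment columns with replacement. Tree building for each replicate is deliberately left out (any method, such as neighbor joining, can be plugged in), so the function names are illustrative and the clade sets passed to the second function are hypothetical.

```python
import random
from collections import Counter


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Step 1: build a pseudo-alignment by sampling columns with replacement (same number of columns)."""
    n_cols = len(alignment[0])
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in columns) for row in alignment]


def bootstrap_support(replicate_clades: list[set[frozenset[str]]]) -> dict[frozenset[str], float]:
    """Step 2: percentage of replicate trees in which each clade appears."""
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    return {clade: 100.0 * c / len(replicate_clades) for clade, c in counts.items()}


rng = random.Random(42)  # fixed seed so the example is reproducible
alignment = ["ACGTTGCA", "ACGTTGCT", "ACCTAGCT", "TCCTAGGT"]  # hypothetical
print(bootstrap_replicate(alignment, rng))

# Hypothetical clade sets, one per replicate tree (in practice: build a tree from each
# replicate, e.g. with neighbor joining, and collect its clades).
ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
print(bootstrap_support([{ab}, {ab}, {ac}, {ab}]))  # A+B in 75% of replicates, A+C in 25%
```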

Distance: Neighbor-Joining Method
The NJ algorithm is commonly applied in distance-based tree building.
Starting from a fully unresolved star tree, the algorithm inserts a branch between a pair of closest neighbors and the remaining terminals in the tree. The process is repeated until the tree is fully resolved.
NJ is a rapid method.
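
A compact sketch of neighbor joining, assuming the common Saitou-Nei formulation with the Studier-Keppler Q-criterion: at each step the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j) is joined, where r(i) is the sum of i's distances to all active nodes. The function name and the distance matrix are hypothetical (the matrix is tree-like, so the branch lengths come out as whole numbers). NJ yields an unrooted tree; the Newick string simply starts writing from the last join.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an (unrooted) NJ tree from a symmetric distance matrix; returns Newick."""
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    label = dict(enumerate(names))  # node id -> Newick text of the subtree
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(len(names))}
    active = list(range(len(names)))
    next_id = len(names)
    while len(active) > 2:
        n = len(active)
        r = {i: sum(d[i, k] for k in active) for i in active}
        # Choose the pair of "closest neighbors" by the Q-criterion (ties: lowest ids).
        _, i, j = min(((n - 2) * d[a, b] - r[a] - r[b], a, b)
                      for x, a in enumerate(active) for b in active[x + 1:])
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u, next_id = next_id, next_id + 1
        label[u] = f"({label[i]}:{li:g},{label[j]}:{lj:g})"
        # Distances from u to every remaining node.
        for k in active:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        d[u, u] = 0.0
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    return f"({label[a]},{label[b]}:{d[a, b]:g});"


taxa = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(taxa, D))  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

Because this matrix is additive, the recovered branch lengths reproduce every input distance exactly; for example, a to d is 2 + 3 + 2 + 2 = 9.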

Function Prediction from Evolutionary Classification
Example: classification of PFK (phosphofructokinase) revealed that major functional specialization can result not only from major sequence changes but also from mutation of a single amino acid residue.
Figure: classification tree of the PFK families ATP_PFK_DR0635, ATP_PFK_euk, PPi_PFK_PfpB, PPi_PFK_TM0289, PPi_PFK_TP0108, PPi_PFK_SMc01852 and PFK_XF0274, with residues numbered as in E. coli PFK (P06998):
ATP-PFK: Gly105 + Gly125
PPi-PFK: Gly/Asp105 + Lys125
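
Using such signature residues for prediction requires mapping each query residue to the reference numbering (here E. coli PFK, P06998) through an alignment. A sketch under stated assumptions: the alignment is already computed, the function names and toy sequences are hypothetical (not real PFK sequences), and the two-residue rule is only the simplified summary shown in the figure, not a complete classifier.

```python
def residue_at(aligned_ref: str, aligned_query: str, ref_pos: int, gap: str = "-") -> str:
    """Query residue aligned to residue number ref_pos (1-based) of the reference sequence."""
    seen = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != gap:
            seen += 1
            if seen == ref_pos:
                return q
    raise ValueError("ref_pos is beyond the end of the reference sequence")


def pfk_type(res105: str, res125: str) -> str:
    """Toy rule from the figure: Gly105 + Gly125 -> ATP-PFK; Gly/Asp105 + Lys125 -> PPi-PFK."""
    if res105 == "G" and res125 == "G":
        return "ATP-PFK"
    if res105 in ("G", "D") and res125 == "K":
        return "PPi-PFK"
    return "unclassified"


# Hypothetical aligned sequences: the reference stands in for E. coli PFK numbering,
# the query has an insertion (gap in the reference) and Asp105/Lys125.
ref = "M" + "A" * 50 + "--" + "A" * 53 + "G" + "A" * 19 + "G"
query = "M" + "A" * 50 + "SS" + "A" * 53 + "D" + "A" * 19 + "K"
print(pfk_type(residue_at(ref, query, 105), residue_at(ref, query, 125)))  # PPi-PFK
```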

Contact
Raja Mazumder: rm285@georgetown.edu
UniProt: help@uniprot.org
PIR: pirmail@georgetown.edu