Genome Bioinformatics
Protein Families, Annotation, Phylogeny I

Molecule: compare domains? Compare expression? Similar proteins?

What is a phylogenetic tree? How to make a phylogenetic tree? TLR

Some of the slides in this lecture are courtesy of Jaap Heringa, Anders Gorm Pedersen and Michael Rosenberg.

Hierarchical Classification: Linnaeus
Tree: depiction (formalization) of classification
Carl Linnaeus, 1707-1778

Theory of evolution
The only figure in Darwin's On the Origin of Species is a tree.
Charles Darwin, 1809-1882
Phylogenetic trees
Historical pattern of relationships among organisms: interpretation of a tree, e.g. flow of time.

How to read a phylogenetic tree?
Ancestors

Trees are useful in bioinformatics beyond the phylogeny of species.
Where else can phylogenetic trees be used?

Progressive multiple alignment: general principles
[Figure: pairwise scores -> similarity matrix -> scores to distances -> guide tree -> multiple alignment]

Other trees (= clusters): gene expression

Phylogenetic trees: unrooted, rooted
Unrooted vs rooted trees
Trees and evolutionary time
Phylogenies using characters
Faster evolution
Molecular phylogeny changed taxonomy

Three main classes of phylogenetic methods
Distance based: uses pairwise distances; fastest approach
Parsimony: fewest number of evolutionary events (mutations); attempts to construct the maximum parsimony tree
Maximum likelihood
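Distance-based methods start from a table of pairwise distances. A minimal sketch of one simple way to obtain such a table is the fraction of aligned positions that differ (often called the p-distance). The toy sequences, the function names and the choice to skip gapped positions are assumptions made for illustration; the slides do not say which distance measure was used.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ.

    Positions with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Symmetric nested dict of pairwise p-distances."""
    names = list(alignment)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = p_distance(alignment[x], alignment[y])
        dist[x][y] = dist[y][x] = d
    return dist


# Hypothetical toy alignment, not taken from the lecture.
toy_alignment = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "ACCTAC-A",
}
matrix = distance_matrix(toy_alignment)
print(matrix["seq1"]["seq2"])  # one mismatch in eight positions
```

The p-distance underestimates the true number of substitutions for distant sequences, because repeated changes at one site are counted at most once; corrected distances are usually preferred in practice.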
Phylogenetic tree by distance methods (clustering)
[Figure: multiple alignment -> similarity criterion -> scores -> distance matrix -> phylogenetic tree]

Distances
Evolutionary sequence distance = sequence dissimilarity

Human         -KITVVGVGAVGMACAISILMKDLADELALVDVIEDKLKGEMMDLQHGSLFLRTPKIVSGKDYNVTANSKLVIITAGARQ
Chicken       -KISVVGVGAVGMACAISILMKDLADELTLVDVVEDKLKGEMMDLQHGSLFLKTPKITSGKDYSVTAHSKLVIVTAGARQ
Dogfish       KITVVGVGAVGMACAISILMKDLADEVALVDVMEDKLKGEMMDLQHGSLFLHTAKIVSGKDYSVSAGSKLVVITAGARQ
Lamprey       SKVTIVGVGQVGMAAAISVLLRDLADELALVDVVEDRLKGEMMDLLHGSLFLKTAKIVADKDYSVTAGSRLVVVTAGARQ
Barley        TKISVIGAGNVGMAIAQTILTQNLADEIALVDALPDKLRGEALDLQHAAAFLPRVRI-SGTDAAVTKNSDLVIVTAGARQ
Maizey casei  -KVILVGDGAVGSSYAYAMVLQGIAQEIGIVDIFKDKTKGDAIDLSNALPFTSPKKIYSA-EYSDAKDADLVVITAGAPQ
Bacillus      TKVSVIGAGNVGMAIAQTILTRDLADEIALVDAVPDKLRGEMLDLQHAAAFLPRTRLVSGTDMSVTRGSDLVIVTAGARQ
Lacto ste     -RVVVIGAGFVGASYVFALMNQGIADEIVLIDANESKAIGDAMDFNHGKVFAPKPVDIWHGDYDDCRDADLVVICAGANQ
Lacto_plant   QKVVLVGDGAVGSSYAFAMAQQGIAEEFVIVDVVKDRTKGDALDLEDAQAFTAPKKIYSG-EYSDCKDADLVVITAGAPQ
Therma_mari   MKIGIVGLGRVGSSTAFALLMKGFAREMVLIDVDKKRAEGDALDLIHGTPFTRRANIYAG-DYADLKGSDVVIVAAGVPQ
Bifido        -KLAVIGAGAVGSTLAFAAAQRGIAREIVLEDIAKERVEAEVLDMQHGSSFYPTVSIDGSDDPEICRDADMVVITAGPRQ
Thermus_aqua  MKVGIVGSGFVGSATAYALVLQGVAREVVLVDLDRKLAQAHAEDILHATPFAHPVWVRSGW-YEDLEGARVVIVAAGVAQ
Mycoplasma    -KIALIGAGNVGNSFLYAAMNQGLASEYGIIDINPDFADGNAFDFEDASASLPFPISVSRYEYKDLKDADFIVITAGRPQ

Distance Matrix
                  1      2      3      4      5      6      7      8      9      10     11     12     13
 1 Human          0.000  0.     0.8    0.0    0.78   0.46   0.50   0.55   0.5    0.54   0.58   0.65   0.67
 2 Chicken        0.     0.000  0.55   0.4    0.8    0.48   0.58   0.569  0.56   0.54   0.54   0.6    0.65
 3 Dogfish        0.8    0.55   0.000  0.96   0.89   0.7    0.5    0.567  0.56   0.5    0.54   0.600  0.655
 4 Lamprey        0.0    0.4    0.96   0.000  0.46   0.56   0.55   0.589  0.544  0.50   0.544  0.66   0.669
 5 Barley         0.78   0.8    0.89   0.46   0.000  0.7    0.56   0.565  0.56   0.547  0.56   0.69   0.575
 6 Maizey         0.46   0.48   0.7    0.56   0.7    0.000  0.557  0.56   0.58   0.555  0.58   0.64   0.587
 7 Lacto_casei    0.50   0.58   0.5    0.55   0.56   0.557  0.000  0.58   0.08   0.445  0.56   0.56   0.50
 8 Bacillus_stea  0.55   0.569  0.567  0.589  0.565  0.56   0.58   0.000  0.477  0.56   0.56   0.598  0.495
 9 Lacto_plant    0.5    0.56   0.56   0.544  0.56   0.58   0.08   0.477  0.000  0.4    0.489  0.56   0.485
10 Therma_mari    0.54   0.54   0.5    0.50   0.547  0.555  0.445  0.56   0.4    0.000  0.5    0.405  0.598
11 Bifido         0.58   0.54   0.54   0.544  0.56   0.58   0.56   0.56   0.489  0.5    0.000  0.604  0.64
12 Thermus_aqua   0.65   0.6    0.600  0.66   0.69   0.64   0.56   0.598  0.56   0.405  0.604  0.000  0.64
13 Mycoplasma     0.67   0.65   0.655  0.669  0.575  0.587  0.50   0.495  0.485  0.598  0.64   0.64   0.000

NB: because these are evolutionary distances, we obtain a phylogenetic tree.

Clustering
[Figure: scores -> distance matrix -> cluster criterion -> phylogenetic tree]
Cluster criterion:
Single linkage: nearest neighbour
Complete linkage: furthest neighbour
Group averaging: UPGMA
Neighbour joining

Clustering algorithm: UPGMA
        human  mouse  fugu  Yeast
human   -
mouse          -
fugu    4      4      -
Yeast   8      8      8     -
[Figure: resulting UPGMA tree with leaves human, mouse, fugu and Yeast]

Evolutionary clock speeds
Uniform clock: ultrametric distances lead to identical distances from root to leaves.
UPGMA trees would be correct if evolution had a uniform clock, but it often did not!

Neighbour-Joining (Saitou and Nei, 1987)
Global: keeps total branch length minimal
At each step, join the two nodes that are closest, considering their respective distances to all other nodes
Leads to an unrooted tree
Non-uniform evolutionary clock: leaves have different distances to the root
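The UPGMA clustering named above can be sketched in a few lines: repeatedly merge the two closest clusters, place the new node at half their distance, and average distances to the remaining clusters weighted by cluster size. This is an illustrative sketch only. The function name and tree representation are hypothetical, and the human-mouse distance of 2 is an assumed value, since that entry is not legible in the slide's matrix.

```python
from itertools import combinations


def upgma(names, pair_dist):
    """Cluster leaves with UPGMA.

    pair_dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns a nested tuple (left, right, height); leaves are plain strings.
    """
    size = {name: 1 for name in names}   # number of leaves in each cluster
    dist = dict(pair_dist)               # working copy, extended with merged clusters
    active = list(names)
    while len(active) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b, dist[frozenset((a, b))] / 2)  # node height = half the distance
        size[merged] = size[a] + size[b]
        active = [c for c in active if c not in (a, b)]
        for c in active:
            # Size-weighted average equals the mean over all leaf pairs.
            dist[frozenset((merged, c))] = (
                size[a] * dist[frozenset((a, c))]
                + size[b] * dist[frozenset((b, c))]
            ) / size[merged]
        active.append(merged)
    return active[0]


names = ["human", "mouse", "fugu", "yeast"]
pair_dist = {
    frozenset(("human", "mouse")): 2,  # assumed value, missing in the slide
    frozenset(("human", "fugu")): 4,
    frozenset(("mouse", "fugu")): 4,
    frozenset(("human", "yeast")): 8,
    frozenset(("mouse", "yeast")): 8,
    frozenset(("fugu", "yeast")): 8,
}
tree = upgma(names, pair_dist)
print(tree)
```

Because these toy distances are ultrametric, every leaf ends up at the same distance from the root (height 4), which is exactly the uniform-clock situation in which UPGMA is expected to be correct.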
Neighbour joining
At each step all possible neighbour joinings are checked, and the one corresponding to the minimal total tree length (calculated by adding all branch lengths) is taken.
[Figure: successive neighbour-joining steps; labels: leaf, internal node (ancestor)]

Introduce a root
[Figure: unrooted tree with leaves Yeast, fugu, mouse and human; a root is introduced on a branch]

How to root a tree
Outgroup: place the root between a distant (but still homologous) sequence and the rest of the group
Midpoint: place the root at the midpoint of the longest path (sum of branches between any two leaves)
Gene duplication: place the root between paralogous gene copies
[Figure: rooting examples with Yeast (Y), fugu (f), mouse (m) and human (h), and a gene-duplication example with copies f-α, f-β, h-α and h-β]
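The neighbour-joining procedure described earlier can be sketched as follows. The slides state the criterion as minimal total tree length; the sketch uses the Q-matrix formulation commonly given for neighbour joining, which is assumed here to select the same pair. The function name, the internal node names and the four-leaf distances are hypothetical. The distances were chosen to fit a tree with unequal branch lengths, that is, a non-uniform clock.

```python
from itertools import combinations


def neighbor_joining(names, pair_dist):
    """Return the edges of an unrooted tree as (node, node, branch_length)."""
    d = dict(pair_dist)          # frozenset({a, b}) -> distance
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Q criterion: a small Q means close to each other, far from the rest.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"node{counter}"
        counter += 1
        # Branch lengths from the joined pair to the new internal node.
        length_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, length_i))
        edges.append((j, u, dij - length_i))
        nodes = [k for k in nodes if k not in (i, j)]
        for k in nodes:
            d[frozenset((u, k))] = (
                d[frozenset((i, k))] + d[frozenset((j, k))] - dij
            ) / 2
        nodes.append(u)
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Hypothetical distances that fit a tree with leaf branches A=1, B=3, C=2, D=4
# and an internal branch of length 2.
names = ["A", "B", "C", "D"]
pair_dist = {
    frozenset(("A", "B")): 4, frozenset(("A", "C")): 5, frozenset(("A", "D")): 7,
    frozenset(("B", "C")): 7, frozenset(("B", "D")): 9, frozenset(("C", "D")): 6,
}
edges = neighbor_joining(names, pair_dist)
leaf_lengths = {e[0]: e[2] for e in edges if e[0] in set(names)}
print(leaf_lengths)
```

The result is an unrooted tree: the edge list has no designated root, so one of the rooting strategies above (outgroup, midpoint, gene duplication) is still needed afterwards.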
Orthologs and paralogs
Gene duplication and gene loss
Simple real-life example
Kinase-5: essential for centrosome separation in mitosis
Gene duplication: divergence of a gene within one genome

Let's tell a story: Vertebrate Toll-Like Receptors
Spanish Flu (1918)
Roach, Jared C. et al. (2005) Proc. Natl. Acad. Sci. USA 102, 9577-9582

Parsimony
Site  1 2 3 4 5 6 7
A     a c a t g a a
B     a c t t g a a
C     a c a t g t a
D     a c a t g t a
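To make "fewest number of evolutionary events" concrete, the sketch below counts the minimum number of changes each of the three possible four-taxon topologies needs for sequences A to D. It uses Fitch's small-parsimony counting, which is a choice made for this illustration; the slides do not name an algorithm, and the function names are hypothetical.

```python
def fitch(tree, column):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a leaf name or a pair (left_subtree, right_subtree);
    column maps each leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one change is needed on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the minimum number of changes over all alignment sites."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in sequences.items()})[1]
        for i in range(length)
    )


sequences = {"A": "acatgaa", "B": "acttgaa", "C": "acatgta", "D": "acatgta"}
topologies = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {label: parsimony_score(tree, sequences) for label, tree in topologies.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Site 3 costs one change on every topology, so it cannot discriminate between them; only site 6 does, and it favours grouping A with B and C with D.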
Informative sites are the sites where at least two different characters each occur at least twice.

Another example
Site       1 2 3 4 5 6 7
Human      c c t t g a a
Chimp      c c t t g a a
Gorilla    c c t a g t a
Gibbon     t c a a g a a
Orangutan  t c a a g a t
[Figure: tree with leaves Chimp, Gibbon, Human, Gorilla and Orangutan]

Maximum likelihood
Given data = alignment, hypothesis = tree, and an evolutionary model (e.g. a substitution matrix): compute the likelihood that the hypothesis (= tree), under the model, results in the observed data (= multiple sequence alignment).
Maximum likelihood selects the hypothesis (tree) that maximises the likelihood of the observed data.
Extremely time-consuming method
Best approach to find the true tree
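The maximum-likelihood idea can be illustrated on the smallest possible case: two sequences joined by a single branch, where the only thing to estimate is the branch length. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which the slides do not name, and uses the Human and Gorilla rows from the example above. It searches a grid of candidate branch lengths and compares the winner with the known closed-form JC69 estimate. Function and variable names are hypothetical.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, branch_length):
    """Log-likelihood of two aligned DNA sequences joined by one branch,
    assuming the JC69 model with equal base frequencies."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay   # site shows the same nucleotide at both ends
    p_diff = 0.25 - 0.25 * decay   # site shows one specific different nucleotide
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        total += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return total


human = "ccttgaa"
gorilla = "cctagta"

# Hypothesis space: branch lengths from 0.001 to 2.000 substitutions per site.
candidates = [step / 1000 for step in range(1, 2001)]
best_length = max(candidates, key=lambda t: jc69_log_likelihood(human, gorilla, t))

# Closed-form JC69 estimate from the observed fraction of differing sites.
p = sum(a != b for a, b in zip(human, gorilla)) / len(human)
closed_form = -0.75 * math.log(1 - 4 * p / 3)
print(best_length, closed_form)
```

For real trees the search runs over topologies as well as all branch lengths, which is what makes the method so time consuming.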
Parsimony, Maximum Likelihood or Neighbor-Joining?
Common practice: use all methods and compare trees
Data is of greater importance than method
As with alignments, one must remember that a phylogenetic tree is a hypothesis of the true evolutionary history. As a hypothesis it could be right or wrong, or a bit of both.
If we knew the true tree of life, we would also know which method is best.

How to assess confidence in a tree
Distance method bootstrap:
Select multiple alignment columns with replacement
Recalculate the tree
Compare branches with the original tree
Repeat 100-1000 times, so calculate 100-1000 different trees
How often is the branching preserved for each internal node?
Uses samples of the data

The Bootstrap
Original
1 2 3 4 5 6 7 8
C C V K V I Y S
M A V R L I F S
M A L R L L F S

Scrambled
3 4 3 8 6 6 8 6
V K V S I I S I
V R V S I I S I
L R L S L L S L
Figure label: Nonsupportive
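The bootstrap steps listed above can be sketched as below, using the three-row example alignment. A full tree builder is out of scope here, so `closest_pair` is a deliberately simple stand-in that returns only the most similar pair of sequences; it, the sequence names and the other function names are hypothetical and serve only to show the resample, recalculate, compare loop.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of sequences with fewest mismatches."""
    def mismatches(pair):
        return sum(a != b for a, b in zip(alignment[pair[0]], alignment[pair[1]]))
    return frozenset(min(combinations(sorted(alignment), 2), key=mismatches))


def bootstrap_support(alignment, grouping, replicates, seed=0):
    """Fraction of resampled alignments that reproduce the original grouping."""
    rng = random.Random(seed)
    original = grouping(alignment)
    hits = sum(
        grouping(resample_columns(alignment, rng)) == original
        for _ in range(replicates)
    )
    return original, hits / replicates


alignment = {"seq1": "CCVKVIYS", "seq2": "MAVRLIFS", "seq3": "MALRLLFS"}
group, support = bootstrap_support(alignment, closest_pair, replicates=200)
print(sorted(group), support)
```

With a real method, `grouping` would return every internal branch of the tree, and support would be tallied per branch rather than for a single pair.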
Bootstrap example
[Figure labels: 85 times; 5 times; 85]

Horizontal (lateral) gene transfer
The evolutionary history of a gene is not always consistent with the history of the species!

Detecting HGT in trees
Discover horizontal gene transfer by comparing the phylogenetic tree of the species (SSU rRNA) with that of the gene in question.
Be careful, however! The sequences have to be orthologous to each other. Ancient gene duplications followed by differential loss can also give rise to trees that look like horizontal gene transfer.

Leucine aminoacyl-tRNA synthetase
[Figure: tree with Eukaryotes, Archaea and Bacteria]
No apparent horizontal gene transfer in the evolution of leucine aminoacyl-tRNA synthetase (the phylogeny of the sequences fits more or less the species phylogeny).

Proline aminoacyl-tRNA synthetase
[Figure: tree with Archaea, Eukaryotes and Bacteria?]
Detecting HGT in trees
[Figure: tree with Archaea, Eukaryotes and Bacteria]
Apparent horizontal gene transfer to the parasites Bbu (B. burgdorferi) and Mge, Mpe (Mycoplasmas) from the Eukaryotes, represented by Cel (C. elegans) and Sce (S. cerevisiae).

Let's tell a story: MHC molecules

Another use of phylogenies