Unraveling protein networks with Power Graph Analysis PLoS Computational Biology, 2008 Loic Royer Matthias Reimann Bill Andreopoulos Michael Schroeder Schroeder Group Bioinformatics 1
Complex Networks in Biology Direct visualization: too much detail. Clustering / coarse graining: loss of detail. Is there a middle ground? 2
SH3 Interaction Profiles Landgraf et al. (2004) Discovery of peptides that bind 8 SH3 domains in Yeast. What is interesting about the interaction profiles of these SH3 domains? 3
Bipartite Regulatory Network Beyer et al. (2006) A bipartite network between transcription factors and target genes in Yeast. What insights can be gathered about the poorly characterized factor YAP7? 4
Phosphatase Similarity Network (Own data) A network connecting tyrosine phosphatases if their sequences are similar enough (BLAST E-value < 10^-46). Is it at all possible to find something interesting in there? 5
Solution: 'To comprehend is to compress' Gregory Chaitin 6
Outline: Power Graph Analysis; 3 Examples; Statistics of Compressibility; 2-step Algorithm 7
The Language of Power Graphs Solution: Transform networks into Power Graphs by clustering both nodes and edges Biclique Clique Star This is a reversible transformation that preserves all connectivity information 8
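The reversibility claim can be illustrated with a minimal sketch. It assumes a power graph is stored as a list of power edges, each a pair of node sets; this representation, the function name `expand_power_graph`, and the toy node names are illustrative choices, not the authors' data structures.

```python
from itertools import combinations

def expand_power_graph(power_edges):
    """Return the plain edges encoded by a list of power edges.

    Each power edge is a pair (A, B) of node sets (power nodes).
    A == B is read as a clique: all pairs of distinct nodes in A are linked.
    A != B is read as a biclique: every node of A is linked to every node of B
    (A and B are assumed disjoint). A star is a biclique with a one-node side.
    """
    edges = set()
    for a, b in power_edges:
        if a == b:
            pairs = combinations(sorted(a), 2)
        else:
            pairs = ((u, v) for u in a for v in b)
        edges.update(frozenset(p) for p in pairs)
    return edges

# Toy example (hypothetical node names): one star, one clique, one biclique.
hub = frozenset({"h"})
leaves = frozenset({"a", "b", "c"})
core = frozenset({"x", "y", "z"})
power_edges = [(hub, leaves), (core, core), (leaves, core)]

edges = expand_power_graph(power_edges)
print(len(edges), "plain edges recovered from", len(power_edges), "power edges")
```

Because the expansion is deterministic, the original edge set can always be recovered from the power graph, which is what makes the compression lossless.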
Protein Interactions: hubs in networks; protein complexes; domain- and motif-induced interactions 9
Beyond Protein Interactions: regulatory networks (transcription factors); homology / paralogy networks 10
SH3 Interaction Profiles The power graph improves the readability of the network. Is there biology explaining the way the peptides are grouped? 11
SH3 Interaction Profiles Green: SH3 domains, Red: PxxPxR motif, Blue: RxxPxxP motif Now, what about the SH3 domains? 12
SH3 Interaction Profiles The neighborhood similarity implied by the power graph reflects the sequence similarity of the SH3 domains. Example: LSB3 and YSC84 have similar sequences but also similar binding profiles 13
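One common way to quantify neighborhood similarity is the Jaccard index of two nodes' neighbor sets. The sketch below assumes that measure; the slide does not say which one is used. The domain and peptide names are hypothetical placeholders, not data from Landgraf et al.

```python
def neighbors(edges):
    """Build an adjacency map {node: set of neighbors} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, u, v):
    """Jaccard index of the neighbor sets of u and v (0.0 if both are empty)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

# Hypothetical binding profiles: domA and domB bind the same peptides,
# domC shares only one peptide with them.
toy_edges = [
    ("domA", "p1"), ("domA", "p2"), ("domA", "p3"),
    ("domB", "p1"), ("domB", "p2"), ("domB", "p3"),
    ("domC", "p3"), ("domC", "p4"),
]
adj = neighbors(toy_edges)
print(jaccard(adj, "domA", "domB"))  # identical profiles
print(jaccard(adj, "domA", "domC"))  # one shared peptide out of four
```

Nodes with a high index share most of their neighbors, so they are natural candidates for being grouped into the same power node.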
Bipartite Regulatory Network Transcription factors are clustered according to their target genes Target genes are clustered according to their transcription factors 14
Bipartite Regulatory Network: YAP7. All 6 factors are involved in yeast stress response. YAP1/2 regulate metal detoxification genes. Hypothesis: the poorly characterized YAP7 does too. 15
Phosphatase Similarity Network 16
Phosphatase Similarity Network 6 type B receptor PTPs are linked by a power edge to two type 2 non-receptor PTPs 17
Phosphatase Similarity Network The second tyrosine phosphatase domain of the two type G PTPs aligns to an unannotated region of about 370 amino acids, with a sequence identity of 14% and a similarity of 39%. This is evidence of domain erosion. 18
Power Graph Analysis: 'To comprehend is to compress'. Power Graph Analysis reduces redundant information: 18 edges become 2 power nodes and 2 power edges. Conversion rate is one power node for 8 edges. Edge reduction is 88%. Overall, fewer symbols are needed: 4 instead of 18. 19
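The arithmetic on this slide can be checked with a short sketch. The edge-reduction formula follows directly from the definition of a relative reduction. The conversion-rate formula, (edges - power edges) / power nodes, is an assumption chosen because it reproduces the slide's figure of 8; the function names are illustrative.

```python
def edge_reduction(n_edges, n_power_edges):
    """Fraction of edges saved by replacing n_edges with n_power_edges."""
    return (n_edges - n_power_edges) / n_edges

def conversion_rate(n_edges, n_power_edges, n_power_nodes):
    """Edges saved per power node introduced (assumed definition)."""
    return (n_edges - n_power_edges) / n_power_nodes

# Numbers from the slide: 18 edges, 2 power nodes, 2 power edges.
reduction = edge_reduction(18, 2)
rate = conversion_rate(18, 2, 2)
print(f"edge reduction: {reduction:.3f}")
print(f"conversion rate: {rate:.1f} edges per power node")
```

The reduction evaluates to 16/18, about 0.889, which matches the slide's 88% if the percentage is truncated rather than rounded.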
Empirical Statistical Analysis Higher compression levels are achieved for biological networks than for rewired networks with the same degree distribution. Thus the scale-free degree distribution is not the explanation. (Figure panels: Original, Rewired) 20
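A standard way to build a rewired null model with the same degree distribution is the double edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b) when that creates no self-loop or duplicate edge. The sketch below assumes that procedure; the slide does not specify how the authors rewired their networks. The function names and the toy network are illustrative.

```python
import random
from collections import Counter

def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by double edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    # Cap the attempts so the loop ends even when few swaps are valid.
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try the other orientation of the second edge
        if len({a, b, c, d}) < 4:
            continue  # a shared endpoint would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # the swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

# Toy network (hypothetical names): a 3 x 4 biclique, highly compressible.
original = [(t, g) for t in ("t1", "t2", "t3") for g in ("g1", "g2", "g3", "g4")]
rewired = rewire(original, n_swaps=20, seed=1)
print(degrees(original) == degrees(rewired))
```

Every accepted swap leaves each node's degree unchanged, so any loss of compressibility after rewiring cannot be attributed to the degree distribution.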
Power Graph Spectrum Cliques and bicliques almost disappear after rewiring Same holds for manually curated networks (SIN, HPRD) 21
Domains and GO Terms Cliques and bicliques have a biological explanation. Power nodes are enriched in InterPro domains and in GO terms. Domains are a better explanation for cliques and bicliques than GO terms. 22
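Enrichment of an annotation within a group of proteins is often assessed with a hypergeometric tail probability. The sketch below assumes that test; the slide does not state which statistic the authors used. The numbers are toy values, not results from the study.

```python
from math import comb

def hypergeom_tail(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` items without replacement
    from `population` items, of which `annotated` carry the annotation."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favorable / total

# Toy numbers: 10 proteins, 3 carry a given domain, and a power node
# of 3 proteins contains all 3 of them.
p_value = hypergeom_tail(population=10, annotated=3, sample=3, hits=3)
print(f"p = {p_value:.4f}")
```

A small probability indicates that the annotation is concentrated in the power node more than random grouping would explain. In practice, testing many power nodes and annotations would also call for a multiple-testing correction.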
The Power Graph Algorithm Problem: Minimal decomposition into cliques and bicliques Similar problems: Minimal partition into cliques is NP-hard (Kratzke 88) Minimal biclique partition is NP-complete (Duh 97) Our solution: a greedy search, two steps: 23
Summary Compress: Power Graphs compress networks without loss of information. Compression levels of up to 95% are possible. High compressibility is lost after degree-invariant rewiring. Fast, greedy algorithm, applicable to many types of networks. Comprehend: Half of the power nodes have a domain or GO term enrichment. SH3 domain interaction profiles reflect phylogeny. Function prediction for a transcription factor. Discovery of an eroded phosphatase domain. Try it! Available for Cytoscape and as a command-line tool. GOOGLE FOR: Power Graph Analysis 24
Acknowledgments Matthias Reimann, Bill Andreopoulos, Christof Winter, Michael Schroeder. Participant travel costs to present the project described were supported by Award Number R13GM085877 from the U.S. National Institute of General Medical Sciences. The content is solely the responsibility of the author(s) and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. Michael Schroeder Group, Biotec, Dresden University of Technology 25
GCB 2008: German Conference on Bioinformatics. A Systems Approach to Disease. Dresden, September 9-12, 2008. www.gcb2008.de Posters and highlight papers: 1 August. Keynote speakers: Michael Ashburner, Janusz M. Bujnicki, David Gilbert, Trey Ideker, Jens Reich, Marino Zerial. Biotechnology Center Dresden 26