Unraveling protein networks with Power Graph Analysis




Unraveling protein networks with Power Graph Analysis. PLoS Computational Biology, 2008. Loic Royer, Matthias Reimann, Bill Andreopoulos, Michael Schroeder. Schroeder Group, Bioinformatics.

Complex Networks in Biology. Direct visualization: too much detail. Clustering / coarse graining: loss of detail. Is there a middle ground?

SH3 Interaction Profiles. Landgraf et al. (2004): discovery of peptides that bind 8 SH3 domains in yeast. What is interesting about the interaction profiles of these SH3 domains?

Bipartite Regulatory Network. Beyer et al. (2006): a bipartite network between transcription factors and target genes in yeast. What insights can be gathered about the poorly characterized factor YAP7?

Phosphatase Similarity Network (own data). A network connecting tyrosine phosphatases whose sequences are sufficiently similar (BLAST E-value < 10^-46). Is it at all possible to find something interesting in there?

Solution: 'To comprehend is to compress' (Gregory Chaitin). Outline: Power Graph Analysis; 3 examples; statistics of compressibility; 2-step algorithm.

The Language of Power Graphs. Solution: transform networks into power graphs by clustering both nodes and edges. Motifs: biclique, clique, star. This is a reversible transformation that preserves all connectivity information.
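
To make the reversibility claim concrete, here is a minimal Python sketch of how a power graph could be expanded back into ordinary edges. The data layout (a dict of flat power nodes and a list of power edges), the function name `expand_power_graph`, and the toy node names are all assumptions made for illustration; the slides do not specify a representation, and nested power nodes are not handled here.

```python
from itertools import combinations, product

def expand_power_graph(power_nodes, power_edges):
    """Expand power edges back into a set of ordinary undirected edges.

    power_nodes: dict mapping a power node name to its set of member nodes
                 (hypothetical flat layout, no nesting).
    power_edges: list of (a, b) pairs; a or b may be a power node name or a
                 plain node. A power edge from a power node to itself is
                 read as a clique.
    """
    edges = set()
    for a, b in power_edges:
        members_a = power_nodes.get(a, {a})  # a plain node is its own group
        members_b = power_nodes.get(b, {b})
        if a == b:
            pairs = combinations(sorted(members_a), 2)  # clique
        else:
            pairs = product(members_a, members_b)       # biclique or star
        for u, v in pairs:
            if u != v:
                edges.add(frozenset((u, v)))
    return edges

# Toy example: one biclique, one clique, one star.
power_nodes = {"P1": {"a", "b", "c"}, "P2": {"x", "y"}, "P3": {"m", "n", "o"}}
power_edges = [("P1", "P2"), ("P3", "P3"), ("hub", "P2")]
edges = expand_power_graph(power_nodes, power_edges)
```

In this toy case three power edges stand for 6 biclique edges, 3 clique edges and 2 star edges, and no connectivity information is lost in either direction.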

Protein Interactions. Hubs in networks. Protein complexes. Domain- and motif-induced interactions.

Beyond Protein Interactions. Regulatory networks. Transcription factors. Homology / paralogy networks.

SH3 Interaction Profiles. The power graph improves the readability of the network. Is there biology explaining the way the peptides are grouped?

SH3 Interaction Profiles. Green: SH3 domains; red: PxxPxR motif; blue: RxxPxxP motif. Now, what about the SH3 domains?

SH3 Interaction Profiles. The neighborhood similarity implied by the power graph reflects the sequence similarity of the SH3 domains. Example: LSB3 and YSC84 have similar sequences and also similar binding profiles.
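
One way to quantify "similar binding profiles" is to compare the neighbor sets of two nodes. The sketch below uses the Jaccard index, which is an assumption: the slides do not say which similarity measure is used. The peptide names and the binding sets are invented for illustration and are not data from Landgraf et al.

```python
def jaccard(neigh_u, neigh_v):
    """Jaccard index of two neighbor sets: |intersection| / |union|."""
    union = neigh_u | neigh_v
    if not union:
        return 0.0  # two isolated nodes: define similarity as 0
    return len(neigh_u & neigh_v) / len(union)

# Hypothetical binding profiles (domain -> set of bound peptides).
binding = {
    "LSB3": {"pep1", "pep2", "pep3", "pep4"},
    "YSC84": {"pep1", "pep2", "pep3", "pep5"},
    "OTHER": {"pep6", "pep7"},
}

sim_close = jaccard(binding["LSB3"], binding["YSC84"])  # 3 shared of 5 total
sim_far = jaccard(binding["LSB3"], binding["OTHER"])    # nothing shared
```

Nodes with a high neighborhood similarity are natural candidates for being grouped into the same power node.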

Bipartite Regulatory Network. Transcription factors are clustered according to their target genes. Target genes are clustered according to their transcription factors.

Bipartite Regulatory Network: YAP7. All 6 factors are involved in the yeast stress response. YAP1/2 regulate metal detoxification genes. Hypothesis: the poorly characterized YAP7 does too.

Phosphatase Similarity Network.

Phosphatase Similarity Network. Six type B receptor PTPs are linked by a power edge to two type 2 non-receptor PTPs.

Phosphatase Similarity Network. The second tyrosine phosphatase domain of the two type G PTPs aligns to an unannotated region of about 370 amino acids, with a sequence identity of 14% and a similarity of 39%. This is evidence of domain erosion.

Power Graph Analysis. To comprehend is to compress: Power Graph Analysis reduces redundant information. Example: 18 edges become 2 power nodes and 2 power edges. Edges become power nodes and power edges. The conversion rate is one power node for 8 edges. Edge reduction is 88%. Overall, fewer symbols are needed: 4 instead of 18.
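
The figures on this slide can be reproduced with a few lines of arithmetic. The definitions below are an assumption, chosen because they match the slide's numbers: edge reduction is taken as the fraction of edges saved, and conversion rate as the number of edges saved per power node. The function name `compression_stats` is hypothetical.

```python
def compression_stats(n_edges, n_power_nodes, n_power_edges):
    """Return (edge_reduction, conversion_rate) under the assumed definitions.

    edge_reduction  = (edges - power edges) / edges
    conversion_rate = (edges - power edges) / power nodes
    """
    saved = n_edges - n_power_edges
    edge_reduction = saved / n_edges
    conversion_rate = saved / n_power_nodes
    return edge_reduction, conversion_rate

# Slide example: 18 edges become 2 power nodes and 2 power edges.
edge_reduction, conversion_rate = compression_stats(18, 2, 2)
symbols_before = 18     # one symbol per edge
symbols_after = 2 + 2   # power nodes plus power edges
```

With these definitions the edge reduction is 16/18, just under 89%, which the slide reports as 88%, and the conversion rate is 8 edges per power node.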

Empirical Statistical Analysis. Higher compression levels are achieved for biological networks than for rewired networks of the same degree distribution. Thus the scale-free degree distribution is not the explanation. (Figure panels: original, rewired.)
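
A rewired null model with the same degree distribution can be built by repeated edge swaps. The sketch below uses the standard double edge swap, which is an assumption: the slides do not describe how the rewiring was done. The function names, the seed, and the toy network are hypothetical.

```python
import random

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving rewiring by double edge swaps.

    Each swap picks edges (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), unless that would create a self-loop or a
    duplicate edge. Every node keeps its degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap would create a self-loop or no-op
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degrees(edges):
    """Count the degree of every node in an edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

# Toy network: a biclique {a, b} x {x, y} plus two unrelated edges.
original = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "z"), ("d", "w")]
rewired = rewire(original, n_swaps=10)
```

Because every node keeps its degree, any drop in compressibility after rewiring cannot be attributed to the degree distribution alone.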

Power Graph Spectrum. Cliques and bicliques almost disappear after rewiring. The same holds for manually curated networks (SIN, HPRD).

Domains and GO Terms. Cliques and bicliques have a biological explanation. Power nodes are enriched in InterPro domains and in GO terms. Domains are a better explanation for cliques and bicliques than GO terms.

The Power Graph Algorithm. Problem: minimal decomposition into cliques and bicliques. Similar problems: minimal partition into cliques is NP-hard (Kratzke 88); minimal biclique partition is NP-complete (Duh 97). Our solution: a greedy search in two steps.

Summary. Compress: power graphs compress networks without loss of information. Compression levels up to 95% are possible. High compressibility is lost after degree-invariant rewiring. Fast, greedy algorithm, applicable to many types of networks. Comprehend: half of the power nodes have a domain or GO term enrichment. SH3 domain interaction profiles reflect phylogeny. Function prediction for a transcription factor. Discovery of an eroded phosphatase domain. Try it! Available for Cytoscape and as a command line tool. Google for: Power Graph Analysis.

Acknowledgments. Matthias Reimann, Bill Andreopoulos, Christof Winter, Michael Schroeder. Participant travel costs to present the project described were supported by Award Number R13GM085877 from the U.S. National Institute of General Medical Sciences. The content is solely the responsibility of the author(s) and does not necessarily represent the official views of the National Institute of General Medical Sciences of the National Institutes of Health. Michael Schroeder Group, Biotec, Dresden University of Technology.

GCB 2008: German Conference on Bioinformatics. A Systems Approach to Disease. Dresden, September 9-12, 2008. www.gcb2008.de. Posters and highlight papers: 1 August. Keynote speakers: Michael Ashburner, Janusz M. Bujnicki, David Gilbert, Trey Ideker, Jens Reich, Marino Zerial. Biotechnology Center Dresden.