Using Graph Theory to Analyze Gene Network Coherence




Francisco A. Gómez-Vela (fgomez@upo.es), Norberto Díaz-Díaz (ndiaz@upo.es), José A. Lagares, José A. Sánchez, Jesús S. Aguilar

Outline
- Introduction
- Proposed Methodology
- Experiments
- Conclusions

Introduction: Gene Network
There is a need to extract expression patterns and behavioral influences between genes from microarray data.
Gene networks (GNs) arise as a visual and intuitive representation of gene-gene interactions.
They are presented as a graph:
- Nodes: genes.
- Edges: relationships among these genes.
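A gene network of this kind can be held in memory as an undirected graph. The following is a minimal sketch, assuming an adjacency-set representation and hypothetical gene labels A to D; the function name `build_gene_network` and the example edges are illustrative and do not come from the slides.

```python
from collections import defaultdict

def build_gene_network(edges):
    """Build an undirected graph mapping each gene to the set of its neighbours."""
    graph = defaultdict(set)
    for gene_a, gene_b in edges:
        graph[gene_a].add(gene_b)
        graph[gene_b].add(gene_a)
    return dict(graph)

# Hypothetical gene-gene relationships, for illustration only.
example_edges = [("A", "B"), ("B", "D"), ("D", "C")]
network = build_gene_network(example_edges)
```

Each node is a gene and each entry in a neighbour set is one edge, so a relationship is stored once per endpoint.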

Introduction: Gene Network
Many GN inference algorithms have been developed as techniques for extracting biological knowledge (Ponzoni et al., 2007; Gallo et al., 2011).
They can be broadly classified as (Hecker M, 2009):
- Boolean networks
- Information theory models
- Bayesian networks

Introduction: Gene Network Validation in Bioinformatics
Once the network has been generated, it is important to assess its reliability in order to show the quality of the generated model.
- Validation based on synthetic data: normally used to validate new methodologies or algorithms.
- Validation based on well-known data: prior knowledge from the literature is used to validate gene networks.

Introduction: Validation Based on Well-Known Biological Data
The quality of a GN can be measured by a direct comparison between the obtained GN and prior biological knowledge (Wei and Li, 2007; Zhou and Wong, 2011).
However, these approaches are not entirely accurate: they only take direct gene-gene interactions into account for the validation task, leaving aside the weak (indirect) relationships (Poyatos, 2011).

Proposed Methodology
The main features of our method:
- Evaluate the similarities and differences between gene networks and biological databases.
- Take the indirect gene-gene relationships into account in the validation process.
- Use graph theory to evaluate gene networks and obtain different measures.

Proposed Methodology
[Diagram: the Floyd-Warshall algorithm is applied to the input network (genes A, B, C, D) and to the biological database (genes A, B, C, E, F), producing the distance matrices DM_IN and DM_DB.]

Proposed Methodology
[Diagram: the distance matrices DM_IN (input network) and DM_DB (biological database) are combined into the Coherence Matrix (CM).]
$CM = |DM_i - DM_j|$, here $CM = |DM_{IN} - DM_{DB}|$

Proposed Methodology: Floyd-Warshall Algorithm
This graph analysis method computes the shortest path between every pair of nodes.
Example network with nodes A, B, C, E, F and its distance matrix:

     A  B  C  E  F
A    0  2  1  1  2
B    2  0  1  1  2
C    1  1  0  2  1
E    1  1  2  0  1
F    2  2  1  1  0
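The all-pairs shortest path computation can be sketched as follows. This is a minimal illustration, assuming an undirected, unweighted network; the edge list is not given in the slides and is inferred here from the distance-1 entries of the example matrix, and the function name `floyd_warshall` is our own.

```python
import math

def floyd_warshall(nodes, edges):
    """Return all-pairs shortest path lengths for an undirected, unweighted graph."""
    # Start with 0 on the diagonal and infinity (no known path) elsewhere.
    dist = {u: {v: (0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v in edges:
        dist[u][v] = 1
        dist[v][u] = 1
    # Allow each node k in turn to act as an intermediate step.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

nodes = ["A", "B", "C", "E", "F"]
# Assumed edges: the gene pairs at distance 1 in the example matrix.
edges = [("A", "C"), ("A", "E"), ("B", "C"), ("B", "E"), ("C", "F"), ("E", "F")]
dm = floyd_warshall(nodes, edges)
matrix = [[dm[u][v] for v in nodes] for u in nodes]
```

Under that edge assumption, `matrix` holds the same values as the example distance matrix, row by row.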

Proposed Methodology: Distance Threshold
The distance threshold (δ) is used to exclude relationships that lack biological meaning.
It denotes the maximum distance considered relevant in the distance matrix generation process.
If the minimum distance between two genes is greater than δ, no path between those genes is assumed.

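Proposed Methodology: Distance Threshold
Example with δ = 1 for the network with nodes A, B, C, E, F. Distance matrix:

     A  B  C  E  F
A    0  2  1  1  2
B    2  0  1  1  2
C    1  1  0  2  1
E    1  1  2  0  1
F    2  2  1  1  0

With δ = 1, the entries equal to 2 exceed the threshold, so no path is assumed between those gene pairs.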
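Applying the threshold to a distance matrix can be sketched as below. This is an illustration only, assuming that "no path" is encoded as infinity; the function name `apply_distance_threshold` and the dictionary layout are hypothetical choices, not taken from the slides.

```python
import math

def apply_distance_threshold(dm, delta):
    """Replace every distance greater than delta with infinity (no path assumed)."""
    return {
        u: {v: (d if d <= delta else math.inf) for v, d in row.items()}
        for u, row in dm.items()
    }

labels = ["A", "B", "C", "E", "F"]
rows = [
    [0, 2, 1, 1, 2],
    [2, 0, 1, 1, 2],
    [1, 1, 0, 2, 1],
    [1, 1, 2, 0, 1],
    [2, 2, 1, 1, 0],
]
# Distance matrix from the example, stored as nested dictionaries.
dm = {u: dict(zip(labels, row)) for u, row in zip(labels, rows)}
thresholded = apply_distance_threshold(dm, delta=1)
```

With `delta=1`, only the direct relationships keep a finite distance; every pair at distance 2 becomes infinity.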

Proposed Methodology: Coherence Matrix
$CM = |DM_i - DM_j|$

DM_IN:

     A  B  C  D
A    0  1  3  2
B    1  0  2  1
C    3  2  0  1
D    2  1  1  0

DM_DB:

     A  B  C  E  F
A    0  2  1  2  2
B    2  0  1  1  2
C    1  1  0  1  1
E    2  1  1  0  1
F    2  2  1  1  0

Coherence Matrix (CM):

     A  B  C
A    0  1
B    1  0  1
C       1  0
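The coherence matrix computation can be sketched as an element-wise absolute difference. This is a minimal illustration under two assumptions that the slides do not state explicitly: only genes present in both distance matrices are compared, and all distances involved are finite (pairs without a path need separate handling). The function name `coherence_matrix` is hypothetical.

```python
def coherence_matrix(dm_in, dm_db):
    """Absolute difference of two distance matrices over their shared genes."""
    common = sorted(set(dm_in) & set(dm_db))
    return {
        i: {j: abs(dm_in[i][j] - dm_db[i][j]) for j in common}
        for i in common
    }

def to_dict(labels, rows):
    """Turn a label list and a list of rows into nested dictionaries."""
    return {u: dict(zip(labels, row)) for u, row in zip(labels, rows)}

# DM_IN and DM_DB as shown in the example.
dm_in = to_dict(["A", "B", "C", "D"],
                [[0, 1, 3, 2], [1, 0, 2, 1], [3, 2, 0, 1], [2, 1, 1, 0]])
dm_db = to_dict(["A", "B", "C", "E", "F"],
                [[0, 2, 1, 2, 2], [2, 0, 1, 1, 2], [1, 1, 0, 1, 1],
                 [2, 1, 1, 0, 1], [2, 2, 1, 1, 0]])
cm = coherence_matrix(dm_in, dm_db)
```

The A-B and B-C entries of `cm` reproduce the values shown in the example coherence matrix. The A-C entry comes out as the plain absolute difference of the two distances; the example matrix shows no value in that cell.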

Proposed Methodology: Obtaining Measures
The coherence level threshold (θ) denotes the maximum coherence level considered relevant in the coherence matrix.
It is used to obtain well-known indices from the elements $CM_{i,j}$ of the coherence matrix:
- $|v - y| \le \theta$, with $0 < v, y < \infty$: TP
- $|v - y| > \theta$: FP
- $\infty - y$ (α): FN
- $\infty - \infty$ (β): TN

Proposed Methodology: Obtaining Measures
Example with θ = 3. Coherence matrix:

     A  B  C  D  E
A    -  1  α  4  7
B    1  -  β  2  5
C    α  β  -  1  8
D    4  2  1  -  1
E    7  5  8  1  -

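Proposed Methodology: Obtaining Measures
Example with θ = 3. Coherence matrix with the index assigned to each element:

     A        B        C        D        E
A    -        TP (1)   FN (α)   FP (4)   FP (7)
B    TP (1)   -        TN (β)   TP (2)   FP (5)
C    FN (α)   TN (β)   -        TP (1)   FP (8)
D    FP (4)   TP (2)   TP (1)   -        TP (1)
E    FP (7)   FP (5)   FP (8)   TP (1)   -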
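The labelling of the θ = 3 example can be sketched as follows. This is an illustration only: the string markers `ALPHA` and `BETA`, the function name `classify`, and the choice to count each gene pair once (upper triangle of the symmetric matrix) are assumptions made for the sketch.

```python
from collections import Counter

ALPHA = "alpha"  # marker for the FN case (α)
BETA = "beta"    # marker for the TN case (β)

def classify(entry, theta):
    """Map one coherence matrix element to TP, FP, FN or TN."""
    if entry == ALPHA:
        return "FN"
    if entry == BETA:
        return "TN"
    return "TP" if entry <= theta else "FP"

# Upper triangle of the example coherence matrix, one entry per gene pair.
pairs = {
    ("A", "B"): 1, ("A", "C"): ALPHA, ("A", "D"): 4, ("A", "E"): 7,
    ("B", "C"): BETA, ("B", "D"): 2, ("B", "E"): 5,
    ("C", "D"): 1, ("C", "E"): 8,
    ("D", "E"): 1,
}
labels = {pair: classify(value, theta=3) for pair, value in pairs.items()}
counts = Counter(labels.values())
```

For this example the ten gene pairs split into 4 TP, 4 FP, 1 FN and 1 TN, matching the labels in the matrix.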

Results: Real Data Experiment
Input networks were obtained by applying four network inference techniques to the well-known yeast cell cycle expression data set (Spellman et al., 1998), including:
- Soinov et al., 2003
- Bulashevska et al., 2005
- Ponzoni et al., 2007 (GRNCOP)
Comparison with well-known data:
- BioGrid
- KEGG
- SGD
- YeastNet

Results: Real Data Experiment
Several studies were carried out using different combinations of threshold values:
- The distance threshold (δ) and the coherence level threshold (θ) were each varied from one to five, generating 25 different combinations.
- The results show that the higher the δ and θ values, the more noise is introduced.
- The most representative result was obtained for δ = 4 and θ = 1.
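The set of threshold combinations can be enumerated as a simple grid. This is a small sketch, assuming integer threshold values; the names `THRESHOLD_VALUES` and `threshold_grid` are hypothetical, and the evaluation that would run for each pair is left out.

```python
from itertools import product

THRESHOLD_VALUES = range(1, 6)  # one to five, for both delta and theta

def threshold_grid(values=THRESHOLD_VALUES):
    """Return every (delta, theta) pair to be evaluated."""
    return list(product(values, repeat=2))

grid = threshold_grid()
```

Each tuple in `grid` is one (δ, θ) setting, so the list has 5 x 5 entries.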

Results

            Soinov                 Bulashevska            Ponzoni
Database    Accuracy  F-measure    Accuracy  F-measure    Accuracy  F-measure
BioGrid     0.27      0.42         0.65      0.79         0.82      0.90
KEGG        0.58      0.48         0.34      0.50         0.28      0.43
SGD         0.31      0.47         0.53      0.69         1.00      1.00
YeastNet    0.29      0.45         0.50      0.66         1.00      1.00
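Accuracy and F-measure can be derived from the TP, FP, FN and TN counts. The sketch below assumes the standard definitions of both indices, which the slides do not spell out, and uses the counts of the θ = 3 example from the methodology (4 TP, 4 FP, 1 FN, 1 TN when each gene pair is counted once), not the real yeast data.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of elements labelled correctly: (TP + TN) / all."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Counts from the theta = 3 example, each gene pair counted once.
acc = accuracy(tp=4, fp=4, fn=1, tn=1)
f1 = f_measure(tp=4, fp=4, fn=1)
```

When there are no FN and no TN, accuracy equals precision and recall equals 1, which ties the two columns of a row together.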

Results
These results are consistent with the experiment carried out in Ponzoni et al., 2007, in which the Ponzoni approach compared favorably with the Soinov and Bulashevska approaches.

Conclusions
A new gene network validation framework is presented:
- The methodology takes into account not only the direct relationships but also the indirect ones.
- Graph theory is used to perform the validation task.

Conclusions: Experiments with Real Data
- The results are consistent with the experiment carried out in Ponzoni et al., 2007, in which the Ponzoni approach compared favorably with the Soinov and Bulashevska approaches.
- The same behaviour is found in the obtained results: Ponzoni presents better coherence values than Soinov and Bulashevska in BioGrid, SGD and YeastNet.

Future Work
The methodology will be improved:
- The elements of the coherence matrix will be weighted according to the gene-gene relationship distance.
- A new measure based on different databases will be generated.
- A Cytoscape plugin will be implemented.

Some References
- Pavlopoulos GA, et al. (2011) Using graph theory to analyze biological networks. BioData Mining 4:10.
- Asghar A, et al. (2012) Speeding up the Floyd-Warshall algorithm for the cycled shortest path problem. Applied Mathematics Letters 25(1):1.
- Bulashevska S and Eils R (2005) Inferring genetic regulatory logic from expression data. Bioinformatics 21(11):2706.
- Ponzoni I, et al. (2007) Inferring adaptive regulation thresholds and association rules from gene expression data through combinatorial optimization learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4(4):624.
- Poyatos JF (2011) The balance of weak and strong interactions in genetic networks. PLoS One 6(2):e14598.

Thanks for your attention