netresponse probabilistic tools for functional network analysis



Similar documents
Estimating batch effect in Microarray data with Principal Variance Component Analysis(PVCA) method

Visualising big data in R

HowTo: Querying online Data

Using the generecommender Package

Creating a New Annotation Package using SQLForge

R4RNA: A R package for RNA visualization and analysis

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

Practical Differential Gene Expression. Introduction

GEOsubmission. Alexandre Kuhn May 3, 2016

Monocle: Differential expression and time-series analysis for single-cell RNA-Seq and qpcr experiments

mmnet: Metagenomic analysis of microbiome metabolic network

EDASeq: Exploratory Data Analysis and Normalization for RNA-Seq

KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor

Analysis of Bead-summary Data using beadarray

Gene2Pathway Predicting Pathway Membership via Domain Signatures

Tutorial for proteome data analysis using the Perseus software platform

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Exercise with Gene Ontology - Cytoscape - BiNGO

Visualization by Linear Projections as Information Retrieval

Normalization of RNA-Seq

Introduction to robust calibration and variance stabilisation with VSN

SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants

Distance Degree Sequences for Network Analysis

Data, Measurements, Features

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Tutorial for the WGCNA package for R I. Network analysis of liver expression data in female mice

Final Project Report

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Delivering the power of the world s most successful genomics platform

Computer Science Electives and Clusters

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

Exploratory data analysis (Chapter 2) Fall 2011

Doctor of Philosophy in Computer Science

Principles of Data Mining by Hand&Mannila&Smyth

Statistical and computational challenges in networks and cybersecurity

Tutorial for the WGCNA package for R: I. Network analysis of liver expression data in female mice

Distributed Systems Seminar Spatio-Temporal Visualization System - STVS

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

HMMcopy: A package for bias-free copy number estimation and robust CNA detection in tumour samples from WGS HTS data

Client Based Power Iteration Clustering Algorithm to Reduce Dimensionality in Big Data

FlowMergeCluster Documentation

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

High Throughput Network Analysis

Dirichlet Processes A gentle tutorial

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Package RDAVIDWebService

Self Organizing Maps for Visualization of Categories

A Computational Framework for Exploratory Data Analysis

Component Ordering in Independent Component Analysis Based on Data Power

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Gene Expression Analysis

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

Master's projects at ITMO University. Daniil Chivilikhin PhD ITMO University

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Learning Objectives:

Statistics Graduate Courses

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps

Exploratory Data Analysis with MATLAB

Learning outcomes. Knowledge and understanding. Competence and skills

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

Structural Health Monitoring Tools (SHMTools)

Package MixGHD. June 26, 2015

A Demonstration of Hierarchical Clustering

Protein Protein Interaction Networks

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

MapReduce Evaluator: User Guide

SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants

JustClust User Manual

Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices

MediSapiens Ltd. Bio-IT solutions for improving cancer patient care. Because data is not knowledge. 19th of March 2015

ProteinQuest user guide

Cloud-Based Big Data Analytics in Bioinformatics

Unsupervised Data Mining (Clustering)

Statistical issues in the analysis of microarray data

Graphical Modeling for Genomic Data

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Guide for Data Visualization and Analysis using ACSN

Exploratory data analysis for microarray data

An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

COC131 Data Mining - Clustering

Package MBA. February 19, Index 7. Canopy LIDAR data

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo

Distributed Bioinformatics Computing System for DNA Sequence Analysis

Network Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan

Hadoop SNS. renren.com. Saturday, December 3, 11

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

BiCluster Viewer: A Visualization Tool for Analyzing Gene Expression Data

Bioinformatics Grid - Enabled Tools For Biologists.

IC05 Introduction on Networks &Visualization Nov

Transcription:

netresponse probabilistic tools for functional network analysis Leo Lahti 1,2, Olli-Pekka Huovilainen 1, António Gusmão 1 and Juuso Parkkinen 1 (1) Dpt. Information and Computer Science, Aalto University, Finland (2) Wageningen University, Netherlands October 1, 2012 1 Introduction Condition-specific network activation is characteristic for cellular systems and other real-world interaction networks. If measurements of network states are available across a versatile set of conditions or time points, it becomes possible to construct a global view of network activation patterns. Different parts of the network respond to different conditions, and in different ways. Systematic, data-driven identification of these responses will help to obtain a holistic view of network activity [1, 2]. This package provides robust probabilistic algorithms for functional network analysis [1, 5]. The methods are based on nonparametric probabilistic modeling and variational learning, and provide general exploratory tools to investigate the structure (ICMg; [5]) and context-specific behavior (NetResponse; [1]) of interaction networks. ICMg is used to identify community structure in interaction networks; NetResponse detects and characterizes subnetworks that exhibit context-specific activation patterns across versatile collections of functional measurements, such as gene expression data. The implementations are partially based on the agglomerative independent variable group analysis [3] and variational Dirichlet process Gaussian mixture models [4]. The tools are particularly useful for global exploratory analysis of genome-wide interaction networks and versatile collections of gene expression data. 2 Loading the package and example data Load the package and toy data set. The toydata object contains the variables D (gene expression matrix) and netw (network matrix). The data matrix D describes measurements of the network activation over multiple conditions. This simple toy data will be analyzed in the subsequent examples. Note that the leo.lahti@iki.fi 1

method is potentially applicable to networks with thousands of nodes and conditions; the scalability depends on network connectivity. > library(netresponse) > data(toydata) > D <- as.matrix(toydata$emat) > netw <- as.matrix(toydata$netw) 3 Detecting network responses Detect network responses across the different measurement conditions in the data matrix D: > model <- detect.responses(d, netw, verbose = FALSE) Various network formats are supported, see help(detect.responses) for details. With large data sets, consider using the speedup option. 4 Investigating the results Subnetwork statistics: size and number of distinct responses for each subnet > stat <- model.stats(model) > stat subnet.size subnet.responses Subnet-1 7 1 Subnet-2 3 3 List the detected subnetworks (each is a list of nodes). By default, singleton subnetworks (with only one gene) and subnetworks with only a single response (no differences between conditions) are excluded. To change the defaults, see help(get.subnets). Subnetworks can be filtered by size and number of responses. Subnetworks that have only one response are not informative of the differences between conditions, and typically ignored in subsequent analysis. > get.subnets(model, min.size = 2, min.responses = 2) $`Subnet-2` [1] "node-4" "node-5" "node-6" Each subnetwork response has a probabilistic association to each condition. Get the list of samples corresponding to each response (each sample is assigned to the response of the highest probability) with response2sample function. > subnet.id <- 'Subnet-2' > response2sample(model, subnet.id) Retrieve model parameters of a given subnetwork (Gaussian mixture means, covariance diagonal, and component weights): 2

> pars <- get.model.parameters(model, subnet.id) # model parameters > pars $mu [,1] [,2] [,3] Mode-1-0.1286901 3.001058-0.1857475 Mode-2-4.9279611-0.156345 2.1559391 Mode-3 5.0687917-2.924342-3.2136255 $sd [,1] [,2] [,3] Mode-1 0.8685188 0.9146674 0.9118761 Mode-2 0.9552095 1.0309359 0.9005080 Mode-3 0.9725623 1.0904168 0.9451185 $w Mode-1 Mode-2 Mode-3 0.3502574 0.3497425 0.3000002 $free.energy [,1] [1,] 1134.673 $Nparams [1] 21 $nodes [1] "node-4" "node-5" "node-6" Probabilistic sample-response assignments for a given subnet is retrieved with: > response.probabilities <- sample2response(model, subnet.id) 5 Extending the subnetworks After identifying the locally connected subnetworks, it is possible to search for features (genes) that are similar to a given subnetwork but not directly interacting with it. To order the remaining features in the input data based on similarity with the subnetwork, type > g <- find.similar.features(model, subnet.id = "Subnet-1") > subset(g, delta < 0) feature.name delta node-6 node-6-71.59557 node-5 node-5-68.29331 node-4 node-4-31.91128 This gives a data frame which indicates similarity level with the subnetwork for each feature. The smaller, the more similar. Negative values of delta indicate the presence of coordinated responses, positive values of delta indicate 3

independent responses. The data frame is ordered such that the features are listed by decreasing similarity. 6 Nonparametric Gaussian mixture models The package provides additional tools for nonparametric Gaussian mixture modeling based on variational Dirichlet process mixture models and implementations by [4, 3]. See the example in help(vdp.mixt). 7 Interaction Component Model for Gene Modules Interaction Component Model (ICMg) can be used to find functional gene modules [5] from either protein interaction data or from combinations of protein interaction and gene expression data. A short example of how to run ICMg and obtain clustering for the nodes: > library(netresponse) > data(osmo) > res <- ICMg.combined.sampler(osmo$ppi, osmo$exp, C=10) > res$comp.memb <- ICMg.get.comp.memberships(osmo$ppi, res) > res$clustering <- apply(res$comp.memb, 2, which.max) 8 Visualization Plot subnetwork responses (Fig. 1). The visualization tools depend on Rgraphviz and igraph packages. 4

> subnet.id <- "Subnet-2" # specify the subnet to visualize > vis <- plot.responses(model, subnet.id) Subnet 2/Response 1 Subnet 2/Response 2 Subnet 2/Response 3 node 5 node 5 node 5 node 4 node 4 node 4 node 6 node 6 node 6 Figure 1: Plot color scale used in the visualization: > plot.scale(vis$breaks, vis$palette) 9 Citing NetResponse Please cite [1] with the package. When using the ICMg algorithms, additionally cite [5]. 10 Version information This document was written using: > sessioninfo() R version 2.15.1 (2012-06-22) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C 5

[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] grid parallel stats graphics grdevices utils datasets [8] methods base other attached packages: [1] netresponse_1.10.0 Rgraphviz_2.2.0 reshape_0.8.4 plyr_1.7.1 [5] RColorBrewer_1.0-5 qvalue_1.32.0 minet_3.12.0 infotheo_1.1.0 [9] mclust_4.0 graph_1.36.0 ggplot2_0.9.2.1 igraph0_0.5.5-2 [13] dmt_0.8.10 MASS_7.3-21 Matrix_1.0-9 lattice_0.20-10 [17] mvtnorm_0.9-9992 loaded via a namespace (and not attached): [1] BiocGenerics_0.4.0 colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 [5] gtable_0.1.1 labeling_0.1 memoise_0.1 munsell_0.4 [9] proto_0.3-9.2 reshape2_1.2.1 scales_0.2.2 stats4_2.15.1 [13] stringr_0.6.1 tcltk_2.15.1 tools_2.15.1 References [1] Leo Lahti et al. (2010). Global modeling of transcriptional responses in interaction networks. Bioinformatics 26(21):2713-20 Preprint available at: http://www.cis.hut.fi/lmlahti/publications/lahti10bioinf-preprint.pdf [2] Leo Lahti (2010). Probabilistic analysis of the human transcriptome with side information. PhD thesis. TKK Dissertations in Information and Computer Science TKK-ICS-D19. Aalto University School of Science and Technology, Department of Information and Computer Science, Espoo, Finland, December 2010 http://lib.tkk.fi/diss/2010/isbn9789526033686/ [3] Antti Honkela et al. (2008). Agglomerative independent variable group analysis. Neurocomputing 71, 1311 20. [4] Kenichi Kurihara et al. (2007). Accelerated variational Dirichlet process mixtures. In B. Schölkopf, J. Platt, and T. Hoffman, eds., Advances in Neural Information Processing Systems 19, 761-8. MIT Press, Cambridge, MA. [5] Parkkinen, J. and Kaski, S. Searching for functional gene modules with interaction component models. BMC Systems Biology 4 (2010), 4. 6