RamiGO: an R interface for AmiGO

Similar documents
Creating a New Annotation Package using SQLForge

netresponse probabilistic tools for functional network analysis

Estimating batch effect in Microarray data with Principal Variance Component Analysis(PVCA) method

HowTo: Querying online Data

R4RNA: A R package for RNA visualization and analysis

JustClust User Manual

Using the generecommender Package

mmnet: Metagenomic analysis of microbiome metabolic network

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.

KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -

GEOsubmission. Alexandre Kuhn May 3, 2016

Exercise with Gene Ontology - Cytoscape - BiNGO

Methods for network visualization and gene enrichment analysis July 17, Jeremy Miller Scientist I jeremym@alleninstitute.org

Tutorial for proteome data analysis using the Perseus software platform

Gene2Pathway Predicting Pathway Membership via Domain Signatures

Package RDAVIDWebService

Analysis of Illumina Gene Expression Microarray Data

SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants

Cluster software and Java TreeView

Introduction to Synoptic

Package retrosheet. April 13, 2015

Visualization of Semantic Windows with SciDB Integration

Figure 1: Opening a Protos XML Export file in ProM

EDASeq: Exploratory Data Analysis and Normalization for RNA-Seq

TUTORIAL 4 Building a Navigation Bar with Fireworks

NaviCell Data Visualization Python API

Technical Report. The KNIME Text Processing Feature:

Formulas, Functions and Charts

Visualising big data in R

WebSphere Business Monitor

Extracting data from XML. Wednesday DTL

Interactive Data Visualization for the Web Scott Murray

Final Project Report

Guide for Data Visualization and Analysis using ACSN

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])

ProteinQuest user guide

Visualizing Networks: Cytoscape. Prat Thiru

Analysis of Bead-summary Data using beadarray

Stored Documents and the FileCabinet

Introduction to R and UNIX Working with microarray data in a multi-user environment

Package GEOquery. August 18, 2015

DATASTREAM CHARTING ADVANCED FEATURES

Map-like Wikipedia Visualization. Pang Cheong Iao. Master of Science in Software Engineering

Introduction to RStudio

SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants

Data Tool Platform SQL Development Tools

Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools

Monocle: Differential expression and time-series analysis for single-cell RNA-Seq and qpcr experiments

IC05 Introduction on Networks &Visualization Nov

CIS 467/602-01: Data Visualization

G563 Quantitative Paleontology. SQL databases. An introduction. Department of Geological Sciences Indiana University. (c) 2012, P.

HRS 750: UDW+ Ad Hoc Reports Training 2015 Version 1.1

The goal with this tutorial is to show how to implement and use the Selenium testing framework.

RNA Movies 2: sequential animation of RNA secondary structures

Package sjdbc. R topics documented: February 20, 2015

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence

Using self-organizing maps for visualization and interpretation of cytometry data

Talend Component: tjasperreportexec

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

10CS73:Web Programming

A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques

Lecture 11 Data storage and LIMS solutions. Stéphane LE CROM

Terms and Definitions for CMS Administrators, Architects, and Developers

Package hoarder. June 30, 2015

An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data

Introduction to Web Development with R

Gene set and data preparation

aarhus MicroStation Print Organizer V8i-SS3 Your Technology Advantage Spring 2013 Jeanne Aarhus

Minería de Datos ANALISIS DE UN SET DE DATOS.! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions

Agents and Web Services

Tutorial: setting up a web application

NVivo 10 for Windows and NVivo for Mac

Practical Differential Gene Expression. Introduction

Adobe Acrobat 6.0 Professional

Eclipse installation, configuration and operation

Comparing Methods for Identifying Transcription Factor Target Genes

Package brewdata. R topics documented: February 19, Type Package

Gephi Tutorial Quick Start

Package GSA. R topics documented: February 19, 2015

Application of Graph-based Data Mining to Metabolic Pathways

MultiExperiment Viewer Quickstart Guide

Creating a Network Graph with Gephi

DeCyder Extended Data Analysis module Version 1.0

Fireworks CS4 Tutorial Part 1: Intro

SLIDE SHOW 18: Report Management System RMS IGSS. Interactive Graphical SCADA System. Slide Show 2: Report Management System RMS 1

Package cgdsr. August 27, 2015

Semester Thesis Traffic Monitoring in Sensor Networks

Chapter 10: Multimedia and the Web

Transcription:

RamiGO: an R interface for AmiGO Markus S. Schröder 1,2, Daniel Gusenleitner 1, Aedín C. Culhane 1, Benjamin Haibe-Kains 1, and John Quackenbush 1 1 Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, USA 2 Computational Genomics, Center for Biotechnology, Bielefeld University, Germany October 21, 2016 Contents 1 Introduction 1 2 Getting started 2 3 A usefull extension to GSEA 4 4 View and edit GO trees in Cytoscape 5 5 Misc 5 6 Session Info 6 1 Introduction Please cite RamiGo as follows: Markus S. Schröder, Daniel Gusenleitner, John Quackenbush, Aedín C. Culhane and Benjamin Haibe-Kains (2013): RamiGO: an R/Bioconductor package providing an AmiGO Visualize interface. Bioinformatics 29(5):666-668. A common task in recent gene set or gene signature analyses is testing for up- and downregulation of these gene sets or gene signatures in Gene Ontology (GO) terms. Or having a gene or set of genes of interest and looking at the GO terms that include that gene or gene set. For a closer look at the distribution of the GO terms in the different tree structures of the three GO categories one has to either rebuild the GO tree himself with the help of published R packages, or copy and paste the GO terms of interest into existing web services to display the GO tree. One of these web services is AmiGO visualize: AmiGO visualize: http://amigo.geneontology.org/cgi-bin/amigo/amigo?mode=visualize The RamiGO package is providing functions to interact with the AmiGO visualize web server and retrieves GO (Gene Ontology) trees in various formats. The most common requests 1

would be as png or svg, but a file representation of the tree in the GraphViz DOT format is also possible. RamiGO also provides a parser for the GraphViz DOT format that returns a graph object and meta data in R. 2 Getting started At first we load the RamiGO package into the current workspace: > library(ramigo) The RamiGO package currently provides two functions that enable the user to retrieve directed acyclic trees from AmiGO and parse the GraphViz DOT format. An example on how to use the functions is given below. To retrieve a tree from AmiGO, the user has to provide a vector of GO ID s. For example GO:0051130, GO:0019912, GO:0005783, GO:0043229 and GO:0050789. These GO ID s represent entries from the three GO categories: Biological Process, Molecular Function and Cellular Component. The given GO ID s can be highlighted with different colors within the tree, therefor the user has to provide a vector of colors for each GO ID. A request could look like this: > goids <- c("go:0051130","go:0019912","go:0005783","go:0043229","go:0050789") > color <- c("lightblue","red","yellow","green","pink") > pngres <- getamigotree(goids=goids, color=color, filename="example", pictype="png", saver Figure 1: Example PNG returned from AmiGO. The GO tree representing the given GO ID s is dowloaded to the file example.png (see Figure 1); the file extension is created automatically according to pictype. The request for a svg file is similar: > svgres <- getamigotree(goids=goids, color=color, filename="example", pictype="svg", saver svgres is a vector with the svg picture in xml format. In order to further analyze the tree, RamiGO provides the possibility to retrieve the tree in the GraphViz DOT format. The function readamigodot parses these DOT format files and returns a AmigoDot S4 object. This S4 object includes an igraph object (agraph()), an adjacency matrix representing the graph (adjmatrix()), a data.frame with the annotation for each node (annot()), the relations (edges) between the nodes (relations()) and a data.frame with the leaves of the tree and their annotation (leaves()). An example could look like this: > dotres <- getamigotree(goids=goids, color=color, filename="example", pictype="dot", saver > tt <- readamigodot(object=dotres) > ## reading the file would also work! > ## tt <- readamigodot(filename="example.dot") > show(tt) 2

..- attr(*, "vnames")= chr [1:241] "node1 node2" "node1 node69" "node1 node86" "node2 nod class: AmigoDot Class 'igraph.es' atomic [1:241] 1 2 3 4 5 6 7 8 9 10.....- attr(*, "env")=<weakref>..- attr(*, "graph")= chr "4c3a5cc8-336c-431b-a46e-b23d10bd3cb9" nodes: Class 'igraph.vs' atomic [1:93] 1 2 3 4 5 6 7 8 9 10.....- attr(*, "env")=<weakref>..- attr(*, "graph")= chr "4c3a5cc8-336c-431b-a46e-b23d10bd3cb9" edges: 'data.frame': 3 obs. of 6 variables: $ node : chr "node58" "node74" "node83" $ GO_ID : chr "GO:0005783" "GO:0051130" "GO:0019912" $ description: chr "endoplasmic reticulum" "positive regulation of cellular component org $ color : chr "#000000" "#000000" "#000000" $ fillcolor : chr "yellow" "lightblue" "red" $ fontcolor : chr "#000000" "#000000" "#000000" leaves: 'data.frame': 93 obs. of 6 variables: $ node : chr "node1" "node2" "node3" "node4"... $ GO_ID : chr "GO:0044464" "GO:0012505" "GO:0005575" "GO:0044444"... $ description: chr "cell part" "endomembrane system" "cellular_component" "cytoplasmic pa $ color : chr "#000000" "#000000" "#000000" "#000000"... $ fillcolor : chr "#ffffff" "#ffffff" "#ffffff" "#ffffff"... $ fontcolor : chr "#000000" "#000000" "#000000" "#000000"... annot: 'data.frame': 243 obs. of 6 variables: $ parent : chr "node1" "node1" "node1" "node2"... $ child : chr "node2" "node69" "node86" "node58"... $ arrowhead: chr "none" "none" "none" "none"... $ arrowtail: chr "normal" "normal" "normal" "normal"... $ color : chr "\"#000000\"" "\"#000000\"" "\"#000000\"" "\"#add8e6\""... $ style : chr "bold" "bold" "bold" "bold"... relations: The leaves of the tree are returned in leaves(tt): > leavestt <- leaves(tt) > leavestt[,c("node","go_id","description")] node GO_ID 58 node58 GO:0005783 74 node74 GO:0051130 83 node83 GO:0019912 description 58 endoplasmic reticulum 3

74 positive regulation of cellular component organization 83 cyclin-dependent protein kinase activating kinase activity In order to export the tree to an GML file that is readable by Cytoscape, you have to call the adjm2gml with some of the results from the readamigodot function. The following example creates a GML file by internally calling the exportcytogml: > gg <- adjm2gml(adjmatrix(tt),relations(tt)$color,annot(tt)$fillcolor,annot(tt)$go_id,anno The result is a GML file named example.gml that can be imported into Cytoscape as a network file. 3 A usefull extension to GSEA The RamiGO package provides an extremely helpful extension to the GSEA software, in java as well as in R, if run with genesets from GO (C5 in MSigDB). RamiGO provides a mapping from GO terms returned from GSEA to official GO ID s. The mapping is stored in the data object c5.go.mapping. > data(c5.go.mapping) > head(c5.go.mapping) description goid 1 NUCLEOPLASM GO:0005654 2 EXTRINSIC_TO_PLASMA_MEMBRANE GO:0019897 3 ORGANELLE_PART GO:0044422 4 CELL_PROJECTION_PART GO:0044463 5 CYTOPLASMIC_VESICLE_MEMBRANE GO:0030659 6 GOLGI_MEMBRANE GO:0000139 One of the ways to avoid running GSEA in R is to call the java application of GSEA from R with the system() function. An example for a preranked GSEA would be: > ## paths to gsea jar and gmt file > exe.path <- exe.path.string > gmt.path <- gmt.path.string > gsea.collapse <- "false" > ## number of permutations > nperm <- 10000 > gsea.seed <- 54321 > gsea.out <- "out-folder" > ## build GSEA command > gsea.report <- "report-file" > rnk.path <- "rank-file" > gsea.cmd <- sprintf("java -Xmx4g -cp %s xtools.gsea.gseapreranked -gmx %s -collapse %s -n > ## execute command on the system > system(gsea.cmd) 4

The results are stored in a folder with the name specified in gsea.out. The subfolder gsea.report has the detailed results in comma separated files and html pages. In the gsea.cmd string above we specified a few parameters which can be changed according to the type of analysis. ˆ plot top x: the number of results that should have an individual result page linked to the main index.html. ˆ set max and set min: limits the analysis to genesets that have more than 15 and less than 500 genes. Once the GSEA analysis is finished, the important result files are xls files in the gsea.report folder. Named gsea report for na pos <some number>.xls and gsea report for na neg <some number>.xls. We can read them into R with the following command: > resn <- "xxx" ## number generated by GSEA that you can get with grep(), strsplit() and di > tt <- rbind(read.table(sprintf("%s/%s/gsea_report_for_na_pos_%s.xls", gsea.out, gsea.repo With all results from the GSEA analysis stored in tt, you can extract information from the results and call the getamigotree mentioned in the example section. 4 View and edit GO trees in Cytoscape The adjm2gml function in RamiGO creates a Cytoscape specific GML file (see example section above) that can be imported into Cytoscape and further edited (for example for publication purposes). The GO tree from the example above, parsed with the readamigodot function, exported with the adjm2gml and imported into Cytoscape as a network, looks like Figure 2. 5 Misc strapply enables perl-like regular expression in R, as do grep, sub or gsub. In particular, it enables the use of the perl variables $1, $2,... for extracting information from within a regular expression. The code below shows an example of the use of strapply. The string within brackets (...) is returned in a list by strapply. > strapply(c("node25 -> node30"), "node([\\d]+) -> node([\\d]+)", c, backref = -2) [[1]] [1] "25" "30" The RCurl package is useful for communicating with a web server and sending GET or POST requests. RamiGO uses the postform() function to communicate with the AmiGO web server. The png package is used to convert the web server response for a png request into an actual png file. The igraph package is used to build a graph object representing the tree that was parsed from an DOT format file. 5

6 Session Info Figure 2: Example GML imported in Cytoscape. ˆ R version 3.3.1 (2016-06-21), x86_64-pc-linux-gnu ˆ Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=C, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C ˆ Base packages: base, datasets, grdevices, graphics, methods, stats, utils 6

ˆ Other packages: RamiGO 1.20.0, gsubfn 0.6-6, proto 0.3-10 ˆ Loaded via a namespace (and not attached): BiocGenerics 0.20.0, RCurl 1.95-4.8, RCytoscape 1.24.1, XML 3.98-1.4, XMLRPC 0.3-0, bitops 1.0-6, graph 1.52.0, igraph 1.0.1, magrittr 1.5, parallel 3.3.1, png 0.1-7, stats4 3.3.1, tcltk 3.3.1, tools 3.3.1 7