GeneProf and the new GeneProf Web Services


GeneProf and the new GeneProf Web Services
Florian Halbritter, florian.halbritter@ed.ac.uk
Stem Cell Bioinformatics Group (Simon R. Tomlinson), simon.tomlinson@ed.ac.uk
Florian Halbritter (MRC-CRM) http://www.geneprof.org December 10, 2012

Outline
1 GeneProf
   Motivation
   GeneProf - what is it?
   Simple, Transparent and Reproducible Data Analysis
   Straightforward Interpretation of Results
   A Comprehensive Resource of HTS Results
2 GeneProf Web Services (new!)
   Web Services?!?
   Example Use Cases
      R
      UCSC
      Your web site (HTML+jQuery+d3)

GeneProf Motivation: The Next-Generation Analysis Challenge
The full potential of HTS for unbiased, accurate and genome-wide data generation is held back by numerous challenges:
   storage and transfer (big disks, fast networks) and computational complexity (speed & memory),
   lack of established, transparent methodologies, consistency and general expertise,
   integration, visualization and interpretation.
[Figure: Next-Gen Sequencers (adapted from Cochrane et al., NAR, 2010)]
Public databases have accumulated billions of short reads, but there's no convenient and quick way for researchers to access and utilise these data. There's a wealth of biological knowledge buried out there, but it's cumbersome and time-consuming to get to it! This is where GeneProf comes in!

GeneProf Next-Gen Analysis for Next-Gen Data
To address these challenges and to make HTS data more widely interpretable and usable by (all) life scientists, we have developed a web-based graphical software suite called GeneProf. GeneProf combines
   an easy-to-use and versatile data analysis suite that automates large parts of the analysis process, with
   a comprehensive resource of transparently analysed experimental data that can be browsed, searched, exported and, importantly, reused.
With GeneProf we try to keep the focus on biology: it's not just about connecting tools together, but about getting answers out of the system. Use existing data to enrich your findings and create new insight!
http://www.geneprof.org
Halbritter F, Vaidya HJ, Tomlinson SR. GeneProf: Analysis of high-throughput sequencing experiments. Nature Methods, 2012.

GeneProf Simple Interface, Powerful Backend
GeneProf's user interface is completely web-based:
   No need for special software or hardware.
   Data and results accessible from anywhere.
A dedicated, remote compute cluster does all the hard work:
   Concurrent handling of many computationally demanding tasks.
   All required software is installed on these machines.
Future developments: Wire in UoE's high-performance compute cluster (Eddie) and the cloud.

GeneProf Simple, Quick and Transparent Data Analysis
[Figure: a virtual experiment. Panel labels: Data, Analysis, Results (= Virtual Experiment). Workflow components shown: Input Sequences; Quality Control + Bowtie Alignment; Find Peaks with MACS; Calculate TFAS; Assign TFBS to Genes; reference: Ensembl 58 Mouse Genes, NCBIM37 Assembly. Result plots include a heatmap (colour key: Row Z-Score, Count) over samples meb.1 to meb.4 and mesc.1 to mesc.3. Captions: "Create a workflow by wizard.." and ".. then customize it by drag & drop."]
Data, analysis and results are all packed together in one logical unit, a virtual experiment.
GeneProf simplifies workflow creation by providing workflow wizards (typically configured with just a few mouse clicks!).
Wizards make it possible to run best-practice analysis procedures for complex data within minutes!
Analyses can be customised using the drag & drop-based workflow designer tool, benefitting from over 100 versatile analysis components!
The entire analysis process is tracked and all intermediate results are available: a fully transparent and reproducible methodology!

GeneProf Data Summaries & Exploratory Analysis
In addition to primary data analysis results, GeneProf will automatically create a range of informative summary statistics and plots: short read quality before and after quality control, alignment summary, gene expression overview, summary of binding peaks, ..
These summaries help to get a feel for the data and interpret results.
Exploratory data analysis: create an analysis workflow using a wizard, check summary statistics, adjust workflow, re-run, ..

GeneProf A Comprehensive Resource
We have used GeneProf as a tool for large-scale analysis, building up a comprehensive and attainable resource of ChIP-seq and RNA-seq (and related) data: over 3 terabytes of analysed HTS data from 100 published studies, amounting to some 1,500 lanes of sequencing runs or over 22 billion short reads.
This data can be browsed, searched, filtered, plotted and re-used in your own experiments for comparison and meta-analysis purposes!
[Figure: Public Data in GeneProf. Number of experiments and data volume (in units of 10 GB) over time, by category: Gene Expression, Transcription Factor Binding, Histone Modifications, Others.]

GeneProf Making HTS Attainable by All Researchers
Even researchers without their own HTS data can benefit from GeneProf.
Instantly access data about your favourite genes from large-scale genomics experiments:
   General information, functional annotation, protein interactions, ...
   Gene expression (RNA-seq & the like) in different cell types, tissues, conditions, etc.
   Transcription factor / DNA-protein binding activity by this factor (if applicable) ...
   ... and by other factors near this gene (transcriptional regulation).
Browse huge amounts of genomic data using the built-in genome browser: gene expression, transcription factors, histones, polymerase, etc.
[Figure panels: RNA-seq Expression; DNA-binding by Transcription Factor (ChIP-seq); .. and to a gene (also ChIP-seq)]


GeneProf Web Services Web Services?!?
Web services are software systems designed to support interoperable machine-to-machine interaction over a network (source: W3C), i.e. other software can retrieve or manipulate data on the server.
We have implemented a range of RESTful web services that allow programmatic retrieval of GeneProf data in a variety of computer- and human-readable formats (XML, JSON, CSV, FASTQ, BED, R-data, ..).
Anatomy of a request: http://www.geneprof.org/geneprof/api/exp/list.json?with-outputs=true
   Web services base URL: http://www.geneprof.org/geneprof/api/
   Specific web service request: exp/list
   Format: .json
   Additional filter parameters and options: ?with-outputs=true
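The four parts of a request URL can be assembled programmatically. The following is a minimal sketch, not GeneProf code: `buildRequestUrl` is a hypothetical helper name, and only the base URL, the `exp/list` service, the `json` format and the `with-outputs` option are taken from the slide.

```javascript
// Base URL as shown on the slide.
const API_BASE = "http://www.geneprof.org/geneprof/api";

// Hypothetical helper (not part of GeneProf): joins the service path,
// the output format and optional filter parameters into one request URL.
function buildRequestUrl(service, format, params = {}) {
  const path = service.replace(/^\/+|\/+$/g, ""); // trim stray slashes
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  const url = `${API_BASE}/${path}.${format}`;
  return query ? `${url}?${query}` : url;
}

// Reproduces the example request from the slide.
const listUrl = buildRequestUrl("exp/list", "json", { "with-outputs": true });
console.log(listUrl);
// http://www.geneprof.org/geneprof/api/exp/list.json?with-outputs=true
```

Swapping the second argument (for example `"xml"`) is all it takes to ask for another format, provided the service in question supports it.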

GeneProf Web Services Overview
What's available?
   Metadata and search (lists of experiments, datasets, genes), ID translations, ..
   Raw and processed data retrieval from specific GeneProf experiments, e.g. FASTA/Q, BED, results tables, ..
   Gene expression data (as raw counts, RPM, RPKM) and lists of correlated genes (based on RNA-seq).
   Regulatory data (based on ChIP-seq):
      Putative target genes of transcription factors and the like.
      Lists of TFs, HMs, etc. enriched in the proximity of a gene.
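For readers unfamiliar with the expression units listed above, here is a small sketch of how RPM and RPKM are conventionally derived from raw counts. It uses the standard textbook definitions; the slides do not say exactly how GeneProf normalises, so treat the formulas as an assumption, and note that the function names and numbers are made up for illustration.

```javascript
// Reads per million mapped reads: scales a raw count by library size.
function rpm(readCount, totalMappedReads) {
  return (readCount * 1e6) / totalMappedReads;
}

// Reads per kilobase of transcript per million mapped reads:
// additionally scales by gene length (given in base pairs).
function rpkm(readCount, totalMappedReads, geneLengthBp) {
  return (readCount * 1e9) / (totalMappedReads * geneLengthBp);
}

// Illustrative numbers only: 500 reads on a 2,000 bp gene
// in a library of 10 million mapped reads.
console.log(rpm(500, 10e6));        // 50
console.log(rpkm(500, 10e6, 2000)); // 25
```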

GeneProf Web Services Example Use Cases
Now 3 examples: R, UCSC, HTML/AJAX.

GeneProf Web Services Example Use Cases: R
Many web services can export data directly in binary R format, which can be easily loaded into R using a URL connection:

    gpload <- function(webservice) {
      base.url <- "http://www.geneprof.org/geneprof/api/"
      # open a connection to the requested web service
      url.con <- url(description = paste(base.url, webservice, sep = ""))
      load(url.con)    # loads an object called geneprof.data
      close(url.con)
      geneprof.data
    }

We can use the gene expression data web service to retrieve data for two genes and, for instance, generate an annotated scatter plot:

    g1 <- gpload("gene.info/expression/mouse/9066.rdata")
    g2 <- gpload("gene.info/expression/mouse/29219.rdata")
    # column name as it appears in the transcript; check it against the loaded data
    selection <- g1$`cell Type` %in% TYPES.OF.INTEREST
    ...
    plot(g1$rpkm[selection], g2$rpkm[selection], ...)
    ...

(complete source code on web services homepage!)
[Figure: scatter plot of expression of gene 1 (x-axis) against gene 2 (y-axis); legend: embryonic stem cell, neuronal precursor cell, lung fibroblast, oocyte, sperm, embryoid body]

GeneProf Web Services Example Use Cases: UCSC
You can use the GeneProf Web Services to export genomic data directly in formats supported by many modern genome browsers, e.g. the UCSC Genome Browser or IGV.
For example, some Pol2 ChIP-seq data from a realignment of Sultan et al. (2008) + Input DNA (as WIG) + MACS-called peaks (as BED):
[Figure: genome browser view of these tracks]
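One way to hand such an export to the UCSC Genome Browser is its `hgt.customText` URL parameter, which accepts the address of a track file. The sketch below only builds the link; `ucscCustomTrackLink` is a hypothetical helper, the track URL is a placeholder rather than a real GeneProf endpoint, and the assembly name `hg19` is an assumption that has to match whatever assembly the data were aligned to.

```javascript
const UCSC_TRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks";

// Hypothetical helper: builds a UCSC link that loads a remote BED/WIG
// file as a custom track. `position` is optional (e.g. "chr1:1-100000").
function ucscCustomTrackLink(assembly, trackUrl, position) {
  const params = new URLSearchParams({ db: assembly });
  if (position) params.set("position", position);
  params.set("hgt.customText", trackUrl); // URL of the exported track file
  return `${UCSC_TRACKS}?${params.toString()}`;
}

// Placeholder track URL for illustration only.
const trackUrl = "http://www.example.org/exports/pol2_peaks.bed";
const link = ucscCustomTrackLink("hg19", trackUrl);
console.log(link);
```

`URLSearchParams` percent-encodes the track address, so query characters inside it cannot be confused with the browser's own parameters.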

GeneProf Web Services Example Use Cases: Your web site (HTML+jQuery+d3)
You can request the data in XML and JSON format (or JSONP for cross-domain requests), which means you can easily integrate GeneProf data in external web sites.
Example: Search genes by name, then (for each matching gene) display the average RPKM expression in a selection of cell types as a dynamically created plot.
How to do it? jQuery makes issuing JSONP requests trivial; d3.js can generate SVG / HTML5 plots.

    $.ajax({
      url: API_HOME + "/gene.info/expression/" + refid + "/" + geneid + ".json",
      dataType: "jsonp",
      success: function(jsondata) { ... }
    });

(complete source code on web services homepage!)
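The "average RPKM per cell type" step that would run inside the `success` callback might look roughly like this. It is a sketch under a loud assumption: the slides do not show the JSON layout, so the record shape `{ cellType, rpkm }` and the function name `averageRpkmByCellType` are hypothetical and would need adapting to the real response.

```javascript
// Hypothetical record shape: [{ cellType: "...", rpkm: <number> }, ...]
function averageRpkmByCellType(records) {
  const sums = new Map(); // cellType -> running total and sample count
  for (const { cellType, rpkm } of records) {
    const entry = sums.get(cellType) || { total: 0, n: 0 };
    entry.total += rpkm;
    entry.n += 1;
    sums.set(cellType, entry);
  }
  // One row per cell type, in order of first appearance.
  return Array.from(sums, ([cellType, { total, n }]) => ({
    cellType,
    meanRpkm: total / n,
  }));
}

// Made-up sample data for illustration.
const sample = [
  { cellType: "embryonic stem cell", rpkm: 8 },
  { cellType: "embryonic stem cell", rpkm: 10 },
  { cellType: "oocyte", rpkm: 2 },
];
const averages = averageRpkmByCellType(sample);
console.log(averages);
```

The resulting array of `{ cellType, meanRpkm }` rows is a convenient input for a d3 data join, one bar or point per cell type.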

GeneProf Web Services Example Use Cases
Further examples: Perl, Taverna.
Many more examples available at: http://www.geneprof.org/geneprof/webapi.jsp
... and we'd love to hear about further use cases from you!

Funding
Stem Cell Bioinformatics: Simon Tomlinson, Florian Halbritter, Aidan McGlinchey, Will Bowring, Duncan Godwin, Anastacia Kousa, Alison McGarvey
Thank you for your attention! Questions?