GeneProf and the new GeneProf Web Services
Florian Halbritter (MRC-CRM), florian.halbritter@ed.ac.uk
Stem Cell Bioinformatics Group (Simon R. Tomlinson), simon.tomlinson@ed.ac.uk
http://www.geneprof.org
December 10, 2012

Outline
1 GeneProf
  - Motivation
  - GeneProf - what is it?
  - Simple, Transparent and Reproducible Data Analysis
  - Straightforward Interpretation of Results
  - A Comprehensive Resource of HTS Results
2 GeneProf Web Services (new!)
  - Web Services?!?
  - Example Use Cases
    - R
    - UCSC
    - Your web site (HTML+jQuery+d3)

GeneProf Motivation: The Next-Generation Analysis Challenge
The full potential of HTS for unbiased, accurate and genome-wide data generation is held back by numerous challenges:
- storage and transfer (big disks, fast networks) and computational complexity (speed & memory),
- lack of established, transparent methodologies, consistency and general expertise,
- integration, visualization and interpretation.
[Figure: Next-Gen Sequencers (adapted from Cochrane et al., NAR, 2010)]
Public databases have accumulated billions of short reads, but there's no convenient and quick way for researchers to access and utilise these data.
There's a wealth of biological knowledge buried out there, but it's cumbersome and time-consuming to get to it!
This is where GeneProf comes in!

GeneProf: Next-Gen Analysis for Next-Gen Data
To address these challenges and to make HTS data more widely interpretable and usable by (all) life scientists, we have developed a web-based graphical software suite called GeneProf.
GeneProf combines ...
- ... an easy-to-use and versatile data analysis suite that automates large parts of the analysis process, with a ...
- ... comprehensive resource of transparently analysed experimental data that can be browsed, searched, exported and, importantly, reused.
With GeneProf we try to keep the focus on biology: It's not just about connecting tools together, but about getting answers out of the system. Use existing data to enrich your findings and create new insight!
http://www.geneprof.org
Halbritter F, Vaidya HJ, Tomlinson SR. GeneProf: Analysis of high-throughput sequencing experiments. Nature Methods, 2012.

GeneProf: Simple Interface, Powerful Backend
GeneProf's user interface is completely web-based:
- No need for special software or hardware.
- Data and results accessible from anywhere.
A dedicated, remote compute cluster does all the hard work:
- Concurrent handling of many computationally demanding tasks.
- All required software is installed on these machines.
Future developments: Wire in UoE's high-performance compute cluster (Eddie) and the cloud.

GeneProf: Simple, Quick and Transparent Data Analysis
[Figure: Data + Analysis + Results = Virtual Experiment. Workflow components shown: Input Sequences; Ensembl 58 Mouse Genes, NCBIM37 Assembly; Quality Control + Bowtie Alignment; Find Peaks with MACS; Calculate TFAS; Assign TFBS to Genes. Result plots include a row Z-score heatmap of samples mesc.1-3 and meb.1-4.]
- Data, analysis and results are all packed together in one logical unit: a virtual experiment.
- GeneProf simplifies workflow creation by providing workflow wizards (typically configured with just a few mouse clicks!).
- Wizards make it possible to run best-practice analysis procedures for complex data within minutes!
- Analyses can be customised using the drag-and-drop workflow designer tool, benefitting from over 100 versatile analysis components!
- The entire analysis process is tracked and all intermediate results are available: a fully transparent and reproducible methodology!
[Figures: Create a workflow by wizard .. then customize it by drag & drop.]

GeneProf: Data Summaries & Exploratory Analysis
- In addition to primary data analysis results, GeneProf will automatically create a range of informative summary statistics and plots.
- Short read quality before and after quality control, alignment summary, gene expression overview, summary of binding peaks, ..
- These summaries help to get a feel for the data and interpret results.
- Exploratory data analysis: Create an analysis workflow using a wizard, check summary statistics, adjust workflow, re-run, ..

GeneProf: A Comprehensive Resource
We have used GeneProf as a tool for large-scale analysis, building up a comprehensive and attainable resource of ChIP-seq and RNA-seq (and related) data:
- Over 3 terabytes of analysed HTS data from 100 published studies, amounting to some 1,500 lanes of sequencing runs or over 22 billion short reads.
- These data can be browsed, searched, filtered, plotted and re-used in your own experiments for comparison and meta-analysis purposes!
[Figure: Public Data in GeneProf. Number of experiments and data volume (x10 GB) over time, by category: Gene Expression, Transcription Factor Binding, Histone Modifications, Others.]

GeneProf: Making HTS Attainable by All Researchers
Even researchers without their own HTS data can benefit from GeneProf:
- Instantly access data about your favourite genes from large-scale genomics experiments:
  - General information, functional annotation, protein interactions, ...
  - Gene expression (RNA-seq & the like) in different cell types, tissues, conditions, etc.
  - Transcription factor / DNA-protein binding activity by this factor (if applicable) ...
  - ... and by other factors near this gene, i.e. transcriptional regulation.
- Browse huge amounts of genomic data using the built-in genome browser: gene expression, transcription factors, histones, polymerase, etc.
[Figures: DNA-binding by Transcription Factor (ChIP-seq); RNA-seq Expression; .. and to a gene (also ChIP-seq)]

GeneProf Web Services: Web Services?!?
- Web services are software systems designed to support interoperable machine-to-machine interaction over a network (source: W3C), i.e. other software can retrieve or manipulate data on the server.
- We have implemented a range of RESTful web services that allow programmatic retrieval of GeneProf data in a variety of computer- and human-readable formats (XML, JSON, CSV, FASTQ, BED, R-data, ..).
Anatomy of a request:
  http://www.geneprof.org/geneprof/api/exp/list.json?with-outputs=true
  - Web services base URL: http://www.geneprof.org/geneprof/api/
  - Specific web service request: exp/list
  - Format: .json
  - Additional filter parameters and options: ?with-outputs=true

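The request anatomy above can be sketched as a small helper that assembles a URL from its parts. This is only an illustration: the function name `buildRequestUrl` and the constant `GENEPROF_API` are hypothetical, and only the service paths that appear on these slides are used.

```javascript
// Base URL as shown on the slide.
const GENEPROF_API = "http://www.geneprof.org/geneprof/api/";

// Hypothetical helper: joins service path, format extension and
// optional filter parameters into one request URL.
function buildRequestUrl(service, format, params = {}) {
  const query = Object.entries(params)
    .map(([key, value]) =>
      encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return GENEPROF_API + service + "." + format + (query ? "?" + query : "");
}

// Rebuilds the example request from the slide.
console.log(buildRequestUrl("exp/list", "json", { "with-outputs": true }));
```

Keeping the format as a separate argument makes it easy to request the same service in another of the supported formats.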
GeneProf Web Services: Overview
What's available?
- Metadata and search (lists of experiments, datasets, genes), ID translations, ..
- Raw and processed data retrieval from specific GeneProf experiments, e.g. FASTA/Q, BED, results tables, ..
- Gene expression data (as raw counts, RPM, RPKM) and lists of correlated genes (based on RNA-seq).
- Regulatory data (based on ChIP-seq):
  - Putative target genes of transcription factors and the like.
  - Lists of TFs, HMs, etc. enriched in the proximity of a gene.

GeneProf Web Services: Example Use Cases
Now 3 examples: R, UCSC, HTML/AJAX.

GeneProf Web Services: Example Use Cases: R
Many web services can export data directly in binary R format, which can be easily loaded into R using a URL connection:

    gpload <- function(webservice) {
      base.url <- "http://www.geneprof.org/geneprof/api/";
      url.con <- url(description=paste(base.url, webservice, sep=""));
      load(url.con);   # loads the object 'geneprof.data' into this function's environment
      close(url.con);
      geneprof.data
    }

We can use the gene expression data web service to retrieve data for two genes and, for instance, generate an annotated scatter plot:

    g1 <- gpload("gene.info/expression/mouse/9066.rdata")
    g2 <- gpload("gene.info/expression/mouse/29219.rdata")
    selection <- g1$cell Type %in% TYPES.OF.INTEREST
    ...
    plot(g1$rpkm[selection], g2$rpkm[selection], ...)
    ...

(complete source code on web services homepage!)
[Figure: scatter plot of gene 1 against gene 2, points annotated by cell type: embryonic stem cell, neuronal precursor cell, lung fibroblast, oocyte, sperm, embryoid body.]

GeneProf Web Services: Example Use Cases: UCSC
You can use the GeneProf Web Services to export genomic data directly in formats supported by many modern genome browsers, e.g. the UCSC Genome Browser or IGV.
For example, some Pol2 ChIP-seq data from a realignment of Sultan et al. (2008) + Input DNA (as WIG) + MACS-called peaks (as BED).
[Figure: genome browser view of these tracks]

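One way to hand such an export to the UCSC Genome Browser is to pass the data URL as a custom track through the browser's `hgt.customText` URL parameter. The sketch below only builds that link. The track URL is a placeholder rather than a real GeneProf endpoint, the helper name is hypothetical, and the assembly name `mm9` is an assumption (the UCSC name for the NCBIM37 mouse assembly).

```javascript
// Hypothetical helper: builds a UCSC Genome Browser link that loads a
// remote BED/WIG file as a custom track via the hgt.customText parameter.
function ucscCustomTrackLink(db, trackUrl) {
  return "http://genome.ucsc.edu/cgi-bin/hgTracks" +
    "?db=" + encodeURIComponent(db) +
    "&hgt.customText=" + encodeURIComponent(trackUrl);
}

// Placeholder URL standing in for a BED export from the web services.
const exampleTrackUrl = "http://www.example.org/peaks.bed";

// Assumption: mm9 is the UCSC assembly matching the data.
console.log(ucscCustomTrackLink("mm9", exampleTrackUrl));
```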
GeneProf Web Services: Example Use Cases: Your web site (HTML+jQuery+d3)
You can request the data in XML and JSON format (or JSONP for cross-domain requests), which means you can easily integrate GeneProf data in external web sites.
Example: Search genes by name, then (for each matching gene) display the average RPKM expression in a selection of cell types as a dynamically created plot.
How to do it? jQuery makes issuing JSONP requests trivial, d3.js can generate SVG / HTML5 plots.

    $.ajax({
      url: API_HOME + "/gene.info/expression/" + refid + "/" + geneid + ".json",
      dataType: "jsonp",   // jQuery option names are case-sensitive
      success: function(jsondata) { ... }
    });

(complete source code on web services homepage!)

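The averaging step inside the `success` callback could look roughly like the sketch below. The shape of the response is not shown on the slide, so the record fields `cellType` and `rpkm`, the function name and the sample values are all assumptions made for illustration.

```javascript
// Hypothetical helper: groups expression records by cell type and
// returns the mean RPKM per group, ready to be handed to a plotting library.
function averageRpkmByCellType(records) {
  const groups = new Map();
  for (const { cellType, rpkm } of records) {
    const entry = groups.get(cellType) || { total: 0, n: 0 };
    entry.total += rpkm;
    entry.n += 1;
    groups.set(cellType, entry);
  }
  return Array.from(groups, ([cellType, { total, n }]) =>
    ({ cellType, meanRpkm: total / n }));
}

// Made-up records in the assumed shape (not real GeneProf output).
const sampleRecords = [
  { cellType: "embryonic stem cell", rpkm: 10 },
  { cellType: "embryonic stem cell", rpkm: 14 },
  { cellType: "oocyte", rpkm: 2 },
];

console.log(averageRpkmByCellType(sampleRecords));
```

The resulting array of `{ cellType, meanRpkm }` objects is a convenient input for a d3.js bar chart, with one bar per cell type.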
GeneProf Web Services: Example Use Cases
[Figures: Perl; Taverna]
Many more examples available at: http://www.geneprof.org/geneprof/webapi.jsp
... and we'd love to hear about further use cases from you!

Funding
Stem Cell Bioinformatics: Simon Tomlinson, Florian Halbritter, Aidan McGlinchey, Will Bowring, Duncan Godwin, Anastacia Kousa, Alison McGarvey
Thank you for your attention! Questions?