Integrating computational data analysis capabilities into analytics applications
TIBCO Spotfire API
Juan Elvira, Deputy CTO, Integromics
About Integromics
www.integromics.com
Focus on software development for genomic and proteomic data analysis and data management solutions:
o RT-PCR (RealTime StatMiner), TIBCO Spotfire application
o MicroArrays (Integromics Biomarker Discovery), TIBCO Spotfire application
o NGS (SeqSolve), TIBCO Spotfire application
o Proteomic data management (OmicsHub)
Partners:
o TIBCO Spotfire (Life science)
o Applied Biosystems
o Affymetrix GeneChip compatible
o Illumina iconnect
o Ingenuity Pathways
Background Work
2005: Integration of R (Bioconductor) into Spotfire DecisionSite using COM technology
2005: Applied Biosystems 1700 Microarray Analysis DecisionSite Guide
2006: Functional Analysis Guide for DecisionSite
2007: RT-PCR Analysis Guide for DecisionSite
2008-2010: Integromics Biomarker Discovery for TIBCO Spotfire (v. 3.0.0)
2010: SeqSolve, NGS (RNA-Seq) Analysis Workflow for TIBCO Spotfire
Next Generation Sequencing Analytics Applications facing new challenges, seizing new opportunities
Next Generation Sequencing challenges
o Disparate data source formats (multiple instrument vendors: Illumina, Roche, Helicos, SOLiD...)
o Large datasets (10-50 GB)
o Computationally intensive downstream analysis (RNA-Seq, ChIP-Seq,...)
o Advanced and interactive visualizations are required
o Integration of best-of-breed third-party APIs, tools and applications
o Reliability
Next Generation Sequencing challenges: integration, usability, scalability, automation
Integration: third-party software integration
Coupled integration patterns
o Call the external application executable
o Use third-party APIs
De-coupled integration patterns
o Web Services based integration
o Message Oriented Middleware based integration
Spotfire API Extensions enable integration of third-party APIs
Spotfire DataFunctions enable both integration patterns
Spotfire COM Automation interface (two-way integration)
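The first coupled pattern, calling an external executable, can be sketched in a few lines. This is a minimal Python illustration and not Spotfire API code: the helper name `run_external_tool` and the stand-in command (the Python interpreter printing a fixed line) are assumptions made for the example. A real integration would invoke the third-party tool's own executable and parse its actual output.

```python
import subprocess
import sys


def run_external_tool(args, timeout_s=60):
    """Run an external executable and return its standard output as text.

    Hypothetical helper: raises CalledProcessError on a non-zero exit code
    and TimeoutExpired if the tool runs longer than timeout_s seconds.
    """
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout_s,
        check=True,           # fail loudly if the tool reports an error
    )
    return completed.stdout


# Stand-in for a real analysis tool: the Python interpreter itself.
output = run_external_tool([sys.executable, "-c", "print('aligned 42 reads')"])
print(output.strip())
```

The coupling is visible here: the caller must know the executable's location, arguments and output format, which is what the de-coupled patterns try to avoid.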
Integration: SeqSolve use case Genome-Browser integration (Custom Tool visualization context)
Usability: Time consuming tasks NGS analysis usually takes a looo...ong time!
Usability: Time consuming tasks
o Synchronous I/O (blocking mode)
o Asynchronous I/O (non-blocking mode)
Usability: Asynchronous I/O Pattern
Spotfire API Support: Spotfire.Dxp.Framework.Threading
Usability: Asynchronous I/O Pattern
Spotfire API Support: Spotfire.Dxp.Data.DataFunctions
o Executes in a background thread, with an easier API than the Threading Framework
o Takes advantage of multi-core CPUs
o Enables the connection to TIBCO Spotfire Statistics Services
o Use it to implement asynchronous calculations and to wrap custom data sources and transformations
o Supported output operations:
  - Add new table
  - Add columns
  - Add rows
  - Replace data table
  - Set document, table and column properties
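The non-blocking pattern can be illustrated independently of Spotfire. The sketch below uses Python's standard `concurrent.futures` module as a stand-in for a background-thread framework; `long_analysis`, `on_finished` and the 0.2 second delay are hypothetical placeholders for a slow NGS step and its completion handler.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_analysis(sample):
    """Stand-in for a slow NGS analysis step (hypothetical)."""
    time.sleep(0.2)
    return f"{sample}: done"


results = []


def on_finished(future):
    """Completion callback: runs once the background work has a result."""
    results.append(future.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately; the work runs in a background thread.
    future = pool.submit(long_analysis, "sample_A")
    future.add_done_callback(on_finished)
    # The caller is free to do other work here (e.g. keep a UI responsive).
    caller_was_free = not future.done()

# Leaving the "with" block waits for the background work to complete.
print(caller_was_free, results)
```

In blocking mode the caller would simply invoke `long_analysis("sample_A")` and wait for it to return, leaving the application unresponsive for the duration of the call.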
Scalability
Scalability: facing the challenge
"scalability is the ability of a system, network, or process, to handle growing amounts of work in a graceful manner or its ability to be enlarged to accommodate that growth." (from Wikipedia)
o Design (or re-implement) your algorithms following parallelization design patterns, e.g. MapReduce
o Use a computational middleware that supports:
  - distributed and parallel computation
  - scaling up to accommodate more work
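As an illustration of the MapReduce pattern named above, here is a minimal Python sketch that counts reads per gene across data chunks. The gene names, sequences and chunking are invented for the example, and a thread pool stands in for the distributed workers that a real computational middleware would provide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(reads):
    """Map step: count reads per gene within one chunk."""
    return Counter(gene for gene, _sequence in reads)


def merge_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


# Hypothetical input, already split into independent chunks.
chunks = [
    [("BRCA1", "ACGT"), ("TP53", "GGCA")],
    [("BRCA1", "TTAG"), ("BRCA1", "CCGA")],
    [("TP53", "ATAT")],
]

# Each chunk is mapped independently, so the work can be spread over workers.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_chunk, chunks))

# The partial results are then combined into the final table.
total_counts = reduce(merge_counts, partial_counts, Counter())
print(dict(total_counts))
```

Because the map step needs no shared state and the reduce step is associative, adding more workers (or machines) accommodates more chunks without changing the algorithm.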
Analytics Application Scalability
TIBCO Spotfire Statistics Services
o Supports S+/R engines
o Job management
o Seamless integration with TIBCO Spotfire Professional and Web Player
o RESTful communication layer and C# API client
o Cluster support (load balancing and fail-over)
Using the TIBCO Spotfire API to integrate other middleware
o Message Oriented Middleware (MOM)
Scalability: Message Oriented Middleware
Advanced Message Queuing Protocol (AMQP)
o Message producers
o Exchanges
o Queues
o Message consumers
Message patterns
o Request/Response
o Publish/Subscribe
o Round robin
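The roles listed above (producer, exchange, queue, consumer) and the round-robin pattern can be sketched with a toy in-process broker. This is a simplified simulation for illustration only: it does not speak AMQP or use any real broker library, and `ToyBroker`, the routing key and the worker names are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import cycle


class ToyBroker:
    """In-memory stand-in for a message broker with one direct exchange."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> bound queue names
        self.queues = defaultdict(deque)   # queue name -> pending messages

    def bind(self, routing_key, queue_name):
        """Bind a queue to the exchange under a routing key."""
        self.bindings[routing_key].append(queue_name)

    def publish(self, routing_key, message):
        """Producer side: the exchange copies the message to each bound queue."""
        for queue_name in self.bindings[routing_key]:
            self.queues[queue_name].append(message)

    def dispatch_round_robin(self, queue_name, consumers):
        """Consumer side: hand pending messages to the consumers in turn."""
        delivered = {name: [] for name in consumers}
        turn = cycle(consumers)
        pending = self.queues[queue_name]
        while pending:
            delivered[next(turn)].append(pending.popleft())
        return delivered


broker = ToyBroker()
broker.bind("rnaseq.align", "align_jobs")
for lane in range(1, 6):
    broker.publish("rnaseq.align", f"lane_{lane}.fastq")

delivered = broker.dispatch_round_robin("align_jobs", ["worker_1", "worker_2"])
print(delivered)
```

The producer only knows the routing key, never the workers, so adding a third worker to the consumer list spreads the same jobs over more machines without touching the producer.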
Automation... less error prone!... more reliable!!
Automation: Spotfire API Support
TIBCO Spotfire Platform Automation Services
COM Automation Interface
o Exposes a public interface to control Spotfire remotely
o COM-based interprocess communication
o Two-way communication (callback)
Automation: Application Creation
Document, table and column properties serve as metadata that enable the automatic generation of analytics applications:
o Add new tables
o Add new pages
o Add and configure new visualizations
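The idea of driving application generation from metadata can be sketched as follows. This is a conceptual Python illustration, not Spotfire API code: the property names (`analysis.kind`, `x.column`, `y.column`), the table names and the mapping from analysis kind to visualization type are all assumptions invented for the example.

```python
def plan_application(tables):
    """Turn table-level metadata into a list of page/visualization specs.

    Hypothetical rule set: each table declares what kind of result it holds,
    and the planner chooses a page and a visualization accordingly.
    """
    pages = []
    for table in tables:
        props = table["properties"]
        kind = props.get("analysis.kind")
        if kind == "differential_expression":
            pages.append({
                "page": f"{table['name']} overview",
                "visualization": "scatter plot",
                "x": props["x.column"],
                "y": props["y.column"],
            })
        elif kind == "quality_control":
            pages.append({
                "page": f"{table['name']} QC",
                "visualization": "bar chart",
                "x": props["x.column"],
                "y": props["y.column"],
            })
        # Tables without a recognized kind get no page.
    return pages


# Hypothetical tables, as an analysis step might have produced and tagged them.
tables = [
    {"name": "DE results",
     "properties": {"analysis.kind": "differential_expression",
                    "x.column": "log2FoldChange", "y.column": "minusLog10P"}},
    {"name": "Read stats",
     "properties": {"analysis.kind": "quality_control",
                    "x.column": "sample", "y.column": "mappedReads"}},
    {"name": "Scratch", "properties": {}},
]

plan = plan_application(tables)
for spec in plan:
    print(spec)
```

The analysis code only has to tag its output tables; the layout rules live in one place, so the same tagged data always yields the same application.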
Automation: Integromics Click and GO Entry point: SeqSolve CustomDataSource Extension
Automation: Integromics Click and GO Select input files
Automation: Integromics Click and GO Define Analysis Configuration (Analysis Profile)
Automation: Integromics Click and GO Run Click and GO -> creates a complete RNA-Seq Analytics Application ready to be used
Summary
Building analytics applications is not a one-dimensional problem.
o Integration: take advantage of the state of the art
o Usability: use asynchronous I/O patterns
o Scalability: be prepared for larger data and heavier computation
o Automation: save user time and minimize errors
The TIBCO Spotfire Platform and its API provide a valuable set of built-in capabilities that are ready to use.
The TIBCO Spotfire Platform can be extended when your needs require a tailored solution.
Q&A THANK YOU! juan.elvira@integromics.com