Integrating computational data analysis capabilities into analytics applications




TIBCO Spotfire API
Juan Elvira, Deputy CTO, Integromics

About Integromics
www.integromics.com
Focus on software development for:
o genomic and proteomic data analysis and data management solutions
  RT-PCR (RealTime StatMiner): TIBCO Spotfire application
  MicroArrays (Integromics Biomarker Discovery): TIBCO Spotfire application
  NGS (SeqSolve): TIBCO Spotfire application
  Proteomic data management (OmicsHub)
Partners:
o TIBCO Spotfire (Life science)
o Applied Biosystems
o Affymetrix GeneChip compatible
o Illumina iconnect
o Ingenuity Pathways

Background Work
2005: Integration of R (Bioconductor) into Spotfire DecisionSite using COM technology
2005: Applied Biosystems 1700 Microarray Analysis DecisionSite Guide
2006: Functional Analysis Guide for DecisionSite Guide
2007: RT-PCR Analysis Guide for DecisionSite Guide
2008-2010: Integromics Biomarker Discovery for TIBCO Spotfire (v. 3.0.0)
2010: SeqSolve, NGS (RNA-seq) Analysis Workflow for TIBCO Spotfire

Next Generation Sequencing Analytics Applications
Facing new challenges, seizing new opportunities

Next Generation Sequencing challenges
Disparate data source formats (multiple instrument vendors: Illumina, Roche, Helicos, SOLiD...)
Large datasets (10-50 GBytes)
Computationally intensive downstream analysis (RNA-Seq, ChIP-Seq,...)
Requires advanced and interactive visualizations
Integrate best-of-breed third-party APIs, tools, applications
Reliability

Next Generation Sequencing challenges
o integration
o usability
o scalability
o automation

Integration: 3rd-party software integration
Coupled integration patterns
o Call the external application executable
o Use third-party APIs
De-coupled integration patterns
o Web Services based integration
o Message Oriented Middleware based integration
Spotfire API Extensions enable 3rd-party API integration
Spotfire DataFunctions enable both integration patterns
Spotfire COM Automation interface (two-way integration)
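The first coupled pattern above (calling the external application executable) can be sketched in a few lines. This is a minimal illustration in Python, not the Spotfire API: the helper name `run_external_tool` is hypothetical, and the current Python interpreter stands in for a real third-party tool so the example stays self-contained.

```python
import subprocess
import sys


def run_external_tool(args, timeout_s=60):
    """Run an external executable and return its standard output as text.

    Raises subprocess.CalledProcessError if the tool exits with a non-zero code,
    and subprocess.TimeoutExpired if it runs longer than timeout_s seconds.
    """
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # turn a failing exit code into an exception
        timeout=timeout_s,
    )
    return completed.stdout


# Stand-in for a third-party tool (assumption): the Python interpreter printing one line.
output = run_external_tool([sys.executable, "-c", "print('aligned 3 reads')"])
```

The caller and the tool are tightly coupled here: the host must know the executable's location, arguments and output format, which is the trade-off the de-coupled patterns try to avoid.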

Integration: SeqSolve use case
Genome-Browser integration (Custom Tool visualization context)

Usability: Time consuming tasks
NGS analysis usually takes a looo...ong time!

Usability: Time consuming tasks
o Synchronous I/O (blocking mode)
o Asynchronous I/O (non-blocking mode)

Usability: Asynchronous I/O Pattern
Spotfire API Support
Spotfire.Dxp.Framework.Threading

Usability: Asynchronous I/O Pattern
Spotfire API Support
Spotfire.Dxp.Data.DataFunctions
o Executes in a background thread; easier API than the Threading Framework
o Takes advantage of multi-core CPUs
o TIBCO Spotfire Statistics Services connection enabler
o Implement asynchronous calculations, wrap custom data sources and transformations
o Supported output operations:
  Add new table
  Add columns
  Add rows
  Replace data table
  Set document, table and column properties
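The difference between blocking and non-blocking execution can be illustrated outside Spotfire with Python's standard `concurrent.futures` module. This is only a sketch of the general pattern, not of `Spotfire.Dxp.Framework.Threading` or DataFunctions; `long_running_analysis` and `run_in_background` are hypothetical names, and the short sleep stands in for a slow analysis step.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_running_analysis(sample):
    """Stand-in for a slow analysis step (hypothetical)."""
    time.sleep(0.2)
    return f"{sample}: done"


def run_in_background(executor, sample, on_done):
    """Submit the work and return immediately; on_done receives the result later."""
    future = executor.submit(long_running_analysis, sample)
    # The callback runs once the work item has finished.
    future.add_done_callback(lambda finished: on_done(finished.result()))
    return future


results = []
with ThreadPoolExecutor(max_workers=2) as executor:
    run_in_background(executor, "sample_A", results.append)
    # The caller is not blocked here and could keep servicing the user interface.
# Leaving the with-block waits for outstanding work to finish.
```

A synchronous version would simply call `long_running_analysis("sample_A")` and freeze the caller for the whole duration, which is the behaviour the asynchronous pattern is meant to avoid.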

Scalability

Scalability: facing the challenge
"scalability is the ability of a system, network, or process, to handle growing amounts of work in a graceful manner or its ability to be enlarged to accommodate that growth." (from Wikipedia)
Design (or re-implement) your algorithms following parallelization design patterns, e.g. MapReduce
Use a computational middleware that supports:
o distributed and parallel computation
o scaling up to accommodate more work
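As a small illustration of the MapReduce pattern mentioned above, the sketch below counts bases across a list of reads: each chunk is mapped to a partial count, and the partial counts are then reduced into one total. The function names and the toy reads are assumptions made for the example. A thread pool is used to keep it self-contained; for CPU-bound work on real data, a process pool or a distributed middleware would be the more realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(reads):
    """Map step: count the bases in one chunk of reads."""
    counts = Counter()
    for read in reads:
        counts.update(read)  # a string is iterated character by character
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def count_bases(reads, chunk_size=2, workers=4):
    """Split the reads into chunks, map them in parallel, then reduce."""
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


totals = count_bases(["ACGT", "AACC", "GGTT", "ACAC", "TTTT"])
# totals holds A=5, C=5, G=3, T=7
```

Because the map step has no shared state and the reduce step is associative, the chunks can be processed in any order and on any number of workers.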

Analytics Application Scalability
TIBCO Spotfire Statistics Services
o Supports S+/R engines
o Job management
o Seamless integration with TIBCO Spotfire Professional and Web Player
o RESTful communication layer and C# API client
o Cluster support (load balancing and fail-over)
Using the TIBCO Spotfire API to integrate other middleware
o Message Oriented Middleware (MOM)

Scalability: Message Oriented Middleware
Advanced Message Queuing Protocol (AMQP)
o Message producers
o Exchanges
o Queues
o Message consumers
Message patterns
o Request/Response
o Publish/Subscribe
o Round robin
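The roles listed above can be made concrete with a toy in-memory model. This is not an AMQP client and uses no broker: `DirectExchange` and `dispatch_round_robin` are hypothetical names that only mimic how a producer publishes to an exchange, how the exchange routes to bound queues, and how queued messages are shared among consumers in round-robin order.

```python
from collections import defaultdict, deque
from itertools import cycle


class DirectExchange:
    """Toy model: routes a message to every queue bound with its routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> list of queues

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        for queue in self.bindings[routing_key]:
            queue.append(message)


def dispatch_round_robin(queue, consumers):
    """Hand queued messages to consumers in turn; returns {consumer: messages}."""
    delivered = {name: [] for name in consumers}
    turn = cycle(consumers)
    while queue:
        delivered[next(turn)].append(queue.popleft())
    return delivered


# Producer side: publish five hypothetical jobs to the "align" routing key.
exchange = DirectExchange()
jobs = deque()
exchange.bind(jobs, "align")
for job_id in range(5):
    exchange.publish("align", f"job-{job_id}")

# Consumer side: two workers share the queue.
delivered = dispatch_round_robin(jobs, ["worker-1", "worker-2"])
```

Adding a third worker name to the list spreads the same queue over three consumers without touching the producer, which is the scalability property the pattern is used for.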

Scalability: Message Oriented Middleware

Automation... less error prone!... more reliable!!

Automation: Spotfire API Support
TIBCO Spotfire Platform Automation Services
COM Automation Interface
o Exposes a public interface to control Spotfire remotely
o COM-based intercommunication process
o Two-way communication (callback)

Automation: Application Creation
Using document, table and column properties as metadata to enable automatic generation of the analytics application
o Add new tables
o Add new pages
o Add and configure new visualizations
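The idea of driving application generation from metadata can be sketched generically. The example below is an assumption-heavy illustration, not Spotfire code: the property names `page` and `visualizations`, the table names, and the function `plan_application` are all hypothetical. It only shows how per-table properties could be turned into a list of pages and visualizations to create.

```python
def plan_application(table_properties):
    """Turn per-table metadata into (page title, visualization type, table name) steps."""
    plan = []
    for table_name, props in table_properties.items():
        # Fall back to the table name when no page title is given.
        page = props.get("page", table_name)
        for visualization in props.get("visualizations", []):
            plan.append((page, visualization, table_name))
    return plan


# Hypothetical metadata attached to two data tables.
metadata = {
    "expression": {
        "page": "Differential expression",
        "visualizations": ["scatter", "table"],
    },
    "qc": {"visualizations": ["bar"]},
}

plan = plan_application(metadata)
```

Each tuple in `plan` would then be handed to whatever code actually creates the page and configures the visualization.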

Automation: Integromics Click and GO
o Entry point: SeqSolve CustomDataSource Extension
o Select input files
o Define Analysis Configuration (Analysis Profile)
o Run Click and GO -> creates a complete RNA-Seq Analytics Application ready to be used

Summary
Building analytics applications is not a one-dimensional problem.
o Integration: Take advantage of the 'state of the art'
o Usability: Use asynchronous I/O patterns
o Scalability: Be prepared for larger data and heavier computation
o Automation: Save user time and minimize errors
The TIBCO Spotfire Platform and its API provide a valuable set of built-in capabilities that are ready to use
The TIBCO Spotfire Platform can be extended when your needs require a tailored solution

Q&A
THANK YOU!
juan.elvira@integromics.com