KNIME Enterprise server usage and global deployment at NIBR




Gregory Landrum, Ph.D.
NIBR Informatics, Novartis Institutes for BioMedical Research, Basel
8th KNIME Users Group Meeting, Berlin, 26 February 2015

Novartis Institutes for BioMedical Research (NIBR)
A global network of >6,000 scientists, physicians, and business professionals.

R&D at Novartis: Bringing innovative medicines to patients
The Drug Development Process
Source: http://www.nibr.com/cs/groups/public/@nibr_com/documents/document/n_prod_200363.pdf


Timelines and economics
Paul, S. M., Mytelka, D. S., Dunwiddie, C. T., Persinger, C. C., Munos, B. H., Lindborg, S. R., & Schacht, A. L. (2010). How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nature Reviews Drug Discovery, 9(3), 203-214.

NIBR: Making it work
"Our model of research, connecting the laboratory to the clinic and pursuing molecular pathways across a landscape of multiple diseases, means that we have to be a highly collaborative organization. Every project is made up of cross-functional teams, drawn from pathways scientists, chemists, disease area specialists, informaticians, clinicians and more."
We're doing scientific research, not making widgets.
Lots of collaboration, lots of technology, lots of data.

Lots of data

Lots of data
Shape of the data generated for a project:
Hit finding: 10^6 rows, 1-2 columns
Hit-to-lead: 10^3 rows, 5-10 columns
Lead optimization: 10^2 rows, 10^2 columns
Clinic: 1 row, 10^4 columns

The role of NIBR Informatics (NX)
Identifying and driving new opportunities to accelerate science with leading-edge computing and informatics solutions.
Traditional IT stuff: service desk, hardware support, network, etc.
Designing, building, deploying, and supporting tools/systems for:
  portfolio management; document management; compliance and reporting
  lab informatics; sample management and logistics; electronic lab notebooks
  high-performance computing; large-scale data warehousing and mining, machine learning
  scientific data analysis; visualization; reporting
Pushing the frontier: research and exploration
Combination of purchased and in-house developed systems, lots of different technologies, lots of integration work

NIBR and KNIME
We believe KNIME can be really useful, so we want to make it available to all of our scientists.
We're supporting both people who are using KNIME to solve problems in their own labs/groups and people who want to make tools available to others.
Need to support exchange of workflows and information across all our sites.
Need to be integrated into our data and software environment.

Infrastructure
Internal node development
Enterprise servers + cluster integration
Standardized desktop releases for Windows, Linux, Mac
Nightly builds for users comfortable on the leading (bleeding) edge
Dev and test servers to support our node development

NIBR's KNIME servers

KNIME for NIBR: internal distribution
Standardized set of nodes and extensions
Customized preferences

KNIME for NIBR: make it supportable
Allow a reset to the default configuration without requiring a new install.

In-house node development: make it useful
Connections to internal data sources and applications
Wrappers around in-house developed algorithms
Connection to our web service framework for cheminformatics services

Open-source node development
Chemistry nodes based on the RDKit (www.rdkit.org):
  open-source cheminformatics toolkit
  usable from C++, Python, Java, C#
  NIBR scientists/developers actively participate
Standard cheminformatics tasks + some nice extras
Developed both in-house and together with knime.com

Sponsored node development
Modifications to naïve Bayes nodes to support fingerprints
Fingerprint naïve Bayes supporting unbalanced datasets
Database schema browser
Improvements to database connector, readers
Ensemble tree classifier
New Python integration
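To make the "fingerprint naïve Bayes supporting unbalanced datasets" item concrete, here is a minimal sketch in plain Python. It is not the algorithm used by the sponsored KNIME nodes, which the slides do not describe. It assumes a Bernoulli naïve Bayes over fingerprints given as sets of on-bit indices, with add-one smoothing, and uses uniform class priors as one simple way to offset class imbalance. The class name, parameters, and toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class FingerprintNaiveBayes:
    """Bernoulli naive Bayes over fingerprints given as sets of on-bit indices.

    Illustrative only; not the implementation behind the KNIME nodes.
    """

    def __init__(self, n_bits, uniform_priors=False):
        self.n_bits = n_bits
        # Assumption: ignoring class frequencies is one simple way to keep
        # a rare class (e.g. actives) from being swamped by the majority.
        self.uniform_priors = uniform_priors

    def fit(self, fingerprints, labels):
        self.class_counts = Counter(labels)
        self.bit_counts = defaultdict(Counter)  # label -> bit index -> count
        for fp, label in zip(fingerprints, labels):
            self.bit_counts[label].update(fp)
        return self

    def log_score(self, fp, label):
        """Unnormalized log posterior of `label` for fingerprint `fp`."""
        n = self.class_counts[label]
        if self.uniform_priors:
            score = -math.log(len(self.class_counts))
        else:
            score = math.log(n / sum(self.class_counts.values()))
        for bit in range(self.n_bits):
            # Add-one smoothing keeps probabilities away from 0 and 1.
            p_on = (self.bit_counts[label][bit] + 1) / (n + 2)
            score += math.log(p_on if bit in fp else 1.0 - p_on)
        return score

    def predict(self, fp):
        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.class_counts), key=lambda c: self.log_score(fp, c))


# Toy, made-up training set: four inactives, one active.
train_fps = [{0, 1}, {0, 2}, {0, 3}, {0, 1, 2}, {4, 5}]
train_labels = ["inactive"] * 4 + ["active"]

empirical = FingerprintNaiveBayes(n_bits=6).fit(train_fps, train_labels)
balanced = FingerprintNaiveBayes(n_bits=6, uniform_priors=True).fit(train_fps, train_labels)
```

On this toy data the borderline fingerprint `{0, 4}` is classified as inactive with empirical priors and as active with uniform priors, which is the kind of shift one would want to review on real, unbalanced screening data.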

Integration example 1: Descriptor calculation

Integration example 2: DART
Internal web-based tool used by project teams to do querying and reporting from our data warehouse

Integration example 2: DART
Internal web-based tool used by project teams to do querying and reporting from our data warehouse
Access to saved queries and views
URL contains full state of query/view
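The "URL contains full state of query/view" idea is what makes a tool like this easy to call from a workflow: anything that can build a URL can reproduce a view. A minimal sketch of the pattern follows, using only the Python standard library. The base URL, parameter names, and state layout are hypothetical; the slides do not show DART's actual URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address; the real DART URL is not given in the talk.
BASE_URL = "https://dart.example.org/view"


def state_to_url(state):
    """Encode a flat query/view state dict into a bookmarkable URL."""
    # Sorting the keys means the same state always yields the same URL.
    pairs = sorted(state.items())
    # doseq=True expands list values into repeated parameters.
    return BASE_URL + "?" + urlencode(pairs, doseq=True)


def url_to_state(url):
    """Recover the state dict from a URL produced by state_to_url."""
    parsed = parse_qs(urlsplit(url).query)
    # parse_qs always returns lists; unwrap single values for convenience.
    # Caveat: a one-element list therefore comes back as a plain string.
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in parsed.items()}


# Made-up example state for a saved view.
example_state = {"project": "ABC-123", "columns": ["ic50", "logd"], "sort": "ic50"}
example_url = state_to_url(example_state)
```

Because the whole state travels in the URL, a workflow node only needs an HTTP request to fetch the same view a colleague bookmarked, with no server-side session to recreate.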

Integration example 2: DART + KNIME

Integration example 2: DART + KNIME
Access to saved queries and views

Usage snapshot
[Chart: Unique users per month]
Overall: 240 unique users, mostly scientists
[Chart: Users by site]
Notes: 1) stats only include KNIME client; 2) December data incomplete
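Figures like "unique users per month" and the overall unique-user count can be derived from a simple log of client launches. The sketch below shows that aggregation under stated assumptions: the talk does not describe how the statistics were collected, so the event format (user, date) and the sample data are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def unique_users_per_month(events):
    """Count distinct users per (year, month) from (user, date) events."""
    users_by_month = defaultdict(set)
    for user, day in events:
        users_by_month[(day.year, day.month)].add(user)
    # Sort the months so the result reads chronologically.
    return {month: len(users) for month, users in sorted(users_by_month.items())}


def unique_users_overall(events):
    """Count distinct users across the whole log."""
    return len({user for user, _ in events})


# Made-up launch log; real data would come from client usage reporting.
events = [
    ("alice", date(2014, 10, 3)),
    ("bob", date(2014, 10, 17)),
    ("alice", date(2014, 10, 21)),
    ("alice", date(2014, 11, 5)),
    ("carol", date(2014, 11, 9)),
]

monthly = unique_users_per_month(events)
overall = unique_users_overall(events)
```

Note that the overall count is not the sum of the monthly counts, since a user active in several months is counted once overall.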

What are those users doing with KNIME?
Querying and reporting from our warehouse
Virtual chemistry
Processing usage statistics
Mining medchem project data
Processing and analyzing experimental data
Machine learning
Triaging high-throughput screening results
Looking up chemical catalog numbers
In other words: a bit of everything

KNIME Server usage
Primarily used to share workflows
Increasingly used as a quick and easy deployment platform for small applications/services built in KNIME
This is mainly driven by the scientists themselves
Areas for improvement:
  Would be nice if it were easier to sync between servers
  Would be great if the server could do RESTful web services
Still: enabling scientists to share workflows and make (hopefully) simple applications available to each other is great

Wrapping up
KNIME in heavy use to solve many different problems
Enterprise server used to exchange workflows globally
Web portal provides a way for scientists to deploy tools to each other
KNIME is a great platform for us to build upon

Acknowledgements
NIBR: Manuel Schwarze (NX), Mark Duffield (NX), David Nick (NX), Marc Litherland (NX), John Davies (CPC), Richard Lewis (GDC), Remy Evard (NX)
knime.com