Why use workflow? In recent years, explosive amounts of biological information has been obtained and deposited in various databases.

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Why use workflow? In recent years, explosive amounts of biological information has been obtained and deposited in various databases."

Transcription

1 Workflow technology It s a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services). It facilitates knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. 1

2 Workflow tecnology Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. 2

3 Applications Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and system biology. 3

4 Why use workflow? In recent years, explosive amounts of biological information has been obtained and deposited in various databases. Workflow systems could become crucial for enabling scientists to deal with this data explosion. A workflow system is highly flexible and can accommodate any changes or updates whenever new or modified data and corresponding analytical tool become available. 4

5 How workflow works Workflow environment allows life science researchers to perform the integration themselves without involving any programming. Workflow system allows the construction of complex in silico experiments in the form of workflows and data pipelines. 5

6 Data Pipelines Data pipelining is a relatively simple concept. Any computational component or node has data inputs and data outputs. Data pipelining views these nodes as being connected together by pipes through which data flows. In a workflow controlled data pipeline, as the data flows, it is transformed and raw data is analyzed to become information and the collected information gives rise to knowledge. 6

7 Workflow description A workflow system (Hollingsworth, 1995) is a holistic unit that defines, manages, and executes workflow processes aided by software. The order of execution is defined by a computer representation of the workflow process logic. 7

8 Workflow description Internally, a workflow system uses a Workflow Language or Meta-Languages for process specification (Michael and Jörg, 1999) to define the workflow process logic, to be executed by workflow execution engine or workflow controller 8

9 Workflow description (2) Visual representation of the workflow process logic is generally carried out using a Graphical User Interface (GUI) where different types of nodes (data transformation point) or software components are available for connection through edges or pipes that define the workflow process 9

10 Workflow component The anatomy of a workflow node or component is basically defined by three parameters: input metadata transformation rules, algorithms or user parameters, output metadata. 10

11 Anatomy of a workflow node The component properties are best described by the input metadata, output metadata and user defined parameters or transformation rules. The inputs ports can be constrained to only accept data of a specific type such as those provided by another component 11

12 Node linking Nodes can be plugged together only if the output of one, previous (set of) node(s) represents the mandatory input requirements of the following node. Thus, the essential description of a node actually comprises only in and output that are described fully in terms of data types and their semantics. 12

13 A general workflow based framework for life sciences domain 13

14 A general workflow based framework for life sciences domain A general workflow has three different layers: clients layer, component and enactment layer, and database layer 14

15 Clients layer The client has a Graphical User Interface for the creation of workflows along with process definition service. The user can create workflows using any combination of the available tools, services or databases in workflow system by dragging/dropping and linking graphical icons. Process definition services use Meta-Languages for Workflow and Process Modeling. 15

16 Component and enactment layer This layer has component services and enactment services and it can interact with database layer. Component services provide different software components, tools and services (grid An enactment service consists of one or more workflow engines in order to manage and execute workflow instances. Workflow engine or workflow controller is responsible for the part or all of the runtime control environment within an enactment service. and web services). 16

17 Database layer Database layer consist of diverse range of data sources (remote as well as local). 17

18 Commercial Workflow Systems SciTegic s Pipeline Pilot chemistry (compound library acquisition, combinatorial library design, molecular property calculators, filters, and manipulators), ADME/Tox, Decision Trees, Modeling, R Statistics, Reporting, Sequence Analysis, BioMining, Text Analytics and Integration Collection (flexible mechanisms to link external applications and databases) 18

19 Commercial Workflow Systems InforSense KDE environment and its specialized extension BioSense High performance bioinformatics solutions ranging from sequence analysis to microarray informatics and remote database annotation ChemSense High range of chemoinformatics solutions ranging from the analysis and visualization of chemical libraries to the development of combinatorial chemistry libraries, and includes a wide range of QSAR, ADME-Tox prediction, molecular modeling and evaluation methods 19

20 Commercial Workflow Systems InforSense KDE and SciTegic s Pipeline Pilot are state of the art workflow systems helping faster and efficient research in life sciences domain. Due to the high costs involved, these are still not accessible to academics a major contributor to scientific research. Other commercial workflow system to be mentioned are INCOGEN VIBE, BioLog s BioLib, Science Factory s übertool. 20

21 Public Domain Workflow Systems There are dozens of open-source workflow systems in the life sciences domain, many of which were initially developed as specific and often small-scale projects. 21

22 Public Domain Workflow Systems Opensource workflow systems represent a major source of advantage for the academia not just because they are free of license charges, but also because the open-source workflow systems are based on community models of development in which people from diverse background actively contribute to the application. 22

23 Public Domain Workflow Systems KNIME (University of Konstanz) It is based on the Eclipse platform and, it provides an excellent data mining platform for drug discovery informatics, bioinformatics and chemistry research Taverna It addresses problems beyond the capabilities of the present system to improve many areas including Data flow centric model, Scalability and Data streaming. 23

24 Public Domain Workflow Systems my Grid It is developing semantically enabled grid middleware for supporting bioinformatics and drug discovery applications and, is regarded as the most powerful workflow system in public domain. Different application tools built on mygrid cover fields like Genome and Proteome Annotation (e-protein), Integrative Systems Biology (myib), Integration of Biological Data (PlaNet, EMBRACE), Proteomics (ISPIDER), Medical and Health Informatics (PsyGrid, MIAS-Grid), Chemoinformatics (CDK- Taverna, Chimatica) andtext mining (Epitheliome). 24

25 Taverna Taverna project aims to provide a language and software tools to facilitate easy use of workflow and distributed compute technology within the escience community Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) and it is possible to download at the site Taverna is written in Java 5, and runs on a selection of platforms. 25

26 Taverna Taverna allows a biologist or bioinformatician with limited computing background and limited technical resources and support to construct highly complex analyses over public and private data and computational resources, all from a standard PC, UNIX box or Apple computer. 26

27 Taverna The screenshot below shows the workbench in action running an example workflow. 27

28 Workflow example 28

29 Specialized Softwares for Integration with Life Sciences Workflow Systems Within the last few years a large number of tools and softwares dealing with different computational problems related to life sciences have been constantly developed Incorporating third party or new tools into existing frameworks needs a flexible, modular and customizable workflow framework A plug-in based architecture (like Eclipse) is a better option for an ideal workflow system. The workflow execution engine or controller communicates with the integrated components plug-ins using workflow language or XML. 29

30 Desired Traits and Future Trends Workflow systems can be data-intensive, computationintensive, analysis intensive, visualization-intensive (e.g., visualization pipeline systems such as AVS, OpenDX, and SCIRun), process-intensive or a combination of one or more traits. 30

31 Desired Traits and Future Trends Workflow systems can be data-intensive, computationintensive, analysis intensive, visualizationintensive (e.g., visualization pipeline systems such as AVS, OpenDX, and SCIRun), process-intensive or a combination of one or more traits. There are many common desired traits (Ludäscher et al., 2005) of workflow systems like seamless access to resources and services, service composition and reuse, scalability, detached execution, reliability and faulttolerance, semantic binding, process provenance and data provenance. 31

32 Desired Traits and Future Trends Present workflow systems in life sciences need to integrate several resources like Semantic web technology (Shadbolt et al., 2006), grids services and web services. Web and grid services provide access to distributed resources, while workflow techniques enable the integration of these resources to perform in silico experiments. 32

33 Conclusion We described workflow based framework to speed up the life sciences research. Largescale data analysis needs flexible workflow based integration of different software tools and application from diverse domains which can provide in silico experimental design through visual programming and execution on grids. 33

34 Conclusion Workflow systems in life sciences are still evolving and the final goal is a distributed and a ubiquitous environment which can integrate web services, semantic web, grids, domain specific tools and ontologies. 34

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel

More information

BIOINFORMATICS Supporting competencies for the pharma industry

BIOINFORMATICS Supporting competencies for the pharma industry BIOINFORMATICS Supporting competencies for the pharma industry ABOUT QFAB QFAB is a bioinformatics service provider based in Brisbane, Australia operating nationwide and internationally. QFAB was established

More information

The information system in a small biotech company: Overview of the cheminformatics developments at Aptanomics

The information system in a small biotech company: Overview of the cheminformatics developments at Aptanomics The information system in a small biotech company: Overview of the cheminformatics developments at Aptanomics Jérôme PANSANEL Contents Introduction Chemical library management

More information

Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices

Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices overview Pipeline Pilot Enterprise Server Pipeline Pilot Enterprise Server (PPES) is a powerful client-server platform that streamlines the integration and analysis of the vast quantities of data flooding

More information

University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology

University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology Programme Structure - the MSc outcome will require 180 credits total (full-time only) - 60

More information

Cheminformatics and Pharmacophore Modeling, Together at Last

Cheminformatics and Pharmacophore Modeling, Together at Last Application Guide Cheminformatics and Pharmacophore Modeling, Together at Last SciTegic Pipeline Pilot Bridging Accord Database Explorer and Discovery Studio Carl Colburn Shikha Varma-O Brien Introduction

More information

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge

More information

Taverna

Taverna http://www.mygrid.org.uk/ http://www.taverna.org.uk/ Taverna Robert Haines, Stian Soiland-Reyes mygrid, University of Manchester rhaines@manchester.ac.uk http://orcid.org/0000-0002-9538-7919 This work

More information

Bioinformatics and escience. Abstract

Bioinformatics and escience. Abstract Bioinformatics and escience W.A. Gray 1 and C. Thompson 2 1 Cardiff University, School of Computer Science, PO Box 916, Cardiff CF24 3XF, 2 BBSRC, Polaris House, North Star Avenue, Swindon SN2 1UH Abstract

More information

speed thought Getting the most of CHEMAXON Integration June 2006 of The Power of at the

speed thought Getting the most of CHEMAXON Integration June 2006 of The Power of at the ETL Data Mining Workflow Engine In Database Analytics Process Knowledge Creation How Soon Can We Deliver? Which Project Is Most Successful? What More Information Do We Need? Where Is The Risk In My Portfolio?

More information

Soaplab - a unified Sesame door to analysis tools

Soaplab - a unified Sesame door to analysis tools Soaplab - a unified Sesame door to analysis tools Martin Senger, Peter Rice, Tom Oinn European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK http://industry.ebi.ac.uk/soaplab Abstract

More information

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf Jenkins as a Scientific Data and Image Processing Platform Ioannis K. Moutsatsos, Ph.D., M.SE. Novartis Institutes for Biomedical Research www.novartis.com June 18, 2014 #jenkinsconf Life Sciences are

More information

Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke

Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke Agilent s Kalabie Electronic Lab Notebook (ELN) Product Overview ChemAxon UGM 2008 Agilent Software and Informatics Division Mike Burke Kalabie: Extending the OpenLAB Architecture Agilent User Interface

More information

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper CAST-2015 provides an opportunity for researchers, academicians, scientists and

More information

IO Informatics The Sentient Suite

IO Informatics The Sentient Suite IO Informatics The Sentient Suite Our software, The Sentient Suite, allows a user to assemble, view, analyze and search very disparate information in a common environment. The disparate data can be numeric

More information

INCOGEN Professional Services

INCOGEN Professional Services Custom Solutions for Life Science Informatics Whitepaper INCOGEN, Inc. 3000 Easter Circle Williamsburg, VA 23188 www.incogen.com Phone: 757-221-0550 Fax: 757-221-0117 info@incogen.com Introduction INCOGEN,

More information

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Mario Cannataro, Pietro Hiram Guzzi, Tommaso Mazza, and Pierangelo Veltri University Magna Græcia of Catanzaro, 88100

More information

VIBE. Visual Integrated Bioinformatics Environment. Enter the Visual Age of Computational Genomics. Whitepaper

VIBE. Visual Integrated Bioinformatics Environment. Enter the Visual Age of Computational Genomics. Whitepaper VIBE Visual Integrated Bioinformatics Environment Whitepaper Enter the Visual Age of Computational Genomics INCOGEN, Inc. 104 George Perry Williamsburg, VA 23185 www.incogen.com Phone: 757-221-0550 info@incogen.com

More information

The Mantid Project. The challenges of delivering flexible HPC for novice end users. Nicholas Draper SOS18

The Mantid Project. The challenges of delivering flexible HPC for novice end users. Nicholas Draper SOS18 The Mantid Project The challenges of delivering flexible HPC for novice end users Nicholas Draper SOS18 What Is Mantid A framework that supports high-performance computing and visualisation of scientific

More information

An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis

An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis Clemens Neudecker, Mustafa Dogan, Sven Schlarb (IMPACT) Paolo Missier, Shoaib Sufi, Alan Williams, Katy Wolstencroft

More information

Software Development Kit

Software Development Kit Open EMS Suite by Nokia Software Development Kit Functional Overview Version 1.3 Nokia Siemens Networks 1 (21) Software Development Kit The information in this document is subject to change without notice

More information

Management of Scientific Experiments in Computational Modeling: Challenges and Perspectives

Management of Scientific Experiments in Computational Modeling: Challenges and Perspectives Management of Scientific Experiments in Computational Modeling: Challenges and Perspectives Regina Braga 1, Fernanda Campos 1, José Maria N. David 1, Jairo de Souza 1, Leonardo Azevedo 2, Kate Revoredo

More information

Open Source Software in Life Science Research. Woodhead Publishing Series in Biomedicine

Open Source Software in Life Science Research. Woodhead Publishing Series in Biomedicine Brochure More information from http://www.researchandmarkets.com/reports/2719842/ Open Source Software in Life Science Research. Woodhead Publishing Series in Biomedicine Description: The free/open source

More information

A W orkflow Management System for Bioinformatics Grid

A W orkflow Management System for Bioinformatics Grid A W orkflow Management System for Bioinformatics Grid Giovanni Aloisio, Massimo Cafaro, Sandro Fiore, Maria Mirto C A C T/IS U FI SP A CI, University of Lecce and NNL/INFM&CNR,Italy NETTAB 2005, 5-7 October

More information

The Open Analytics Platform

The Open Analytics Platform The Open Analytics Platform Bernd Wiswedel KNIME.com AG Agenda KNIME.com AG The KNIME Platform Recognition Small Sales Pitch KNIME and R the best of two worlds KNIME (Node) Development 2 A Brief History

More information

Integrated Rule-based Data Management System for Genome Sequencing Data

Integrated Rule-based Data Management System for Genome Sequencing Data Integrated Rule-based Data Management System for Genome Sequencing Data A Research Data Management (RDM) Green Shoots Pilots Project Report by Michael Mueller, Simon Burbidge, Steven Lawlor and Jorge Ferrer

More information

On the Applicability of Workflow Management Systems for the Preservation of Business Processes

On the Applicability of Workflow Management Systems for the Preservation of Business Processes On the Applicability of Workflow Management Systems for the Preservation of Business Processes Rudolf Mayer, Stefan Pröll, Andreas Rauber sproell@sba-research.org University of Technology Vienna, Austria

More information

TIBCO Spotfire Helps Organon Bridge the Data Gap Between Basic Research and Clinical Trials

TIBCO Spotfire Helps Organon Bridge the Data Gap Between Basic Research and Clinical Trials TIBCO Spotfire Helps Organon Bridge the Data Gap Between Basic Research and Clinical Trials Pharmaceutical leader deploys TIBCO Spotfire enterprise analytics platform across its drug discovery organization

More information

Primetime for KNIME:

Primetime for KNIME: Primetime for KNIME: Towards an Integrated Analysis and Visualization Environment for RNAi Screening Data F. Oliver Gathmann, Ph. D. Director IT, Cenix BioScience Presentation for: KNIME User Group Meeting

More information

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Center for Information Services and High Performance Computing (ZIH) Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Richard Grunzke*, Jens Krüger, Sandra Gesing, Sonja

More information

Describing Web Services for user-oriented retrieval

Describing Web Services for user-oriented retrieval Describing Web Services for user-oriented retrieval Duncan Hull, Robert Stevens, and Phillip Lord School of Computer Science, University of Manchester, Oxford Road, Manchester, UK. M13 9PL Abstract. As

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

Semantic and Personalised Service Discovery

Semantic and Personalised Service Discovery Semantic and Personalised Service Discovery Phillip Lord 1, Chris Wroe 1, Robert Stevens 1,Carole Goble 1, Simon Miles 2, Luc Moreau 2, Keith Decker 2, Terry Payne 2 and Juri Papay 2 1 Department of Computer

More information

Classic Grid Architecture

Classic Grid Architecture Peer-to to-peer Grids Classic Grid Architecture Resources Database Database Netsolve Collaboration Composition Content Access Computing Security Middle Tier Brokers Service Providers Middle Tier becomes

More information

The Service Revolution software engineering without programming languages

The Service Revolution software engineering without programming languages The Service Revolution software engineering without programming languages Gustavo Alonso Institute for Pervasive Computing Department of Computer Science Swiss Federal Institute of Technology (ETH Zurich)

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.2 Community Needs of

More information

Development of Digital Libraries at Poznań Supercomputing and Networking Center

Development of Digital Libraries at Poznań Supercomputing and Networking Center Development of Digital Libraries at Poznań Supercomputing and Networking Center Cezary Mazurek, Sebastian Szuber 1 What for? Increasing amount of information Growing needs for faster and easier access

More information

IDL. Get the answers you need from your data. IDL

IDL. Get the answers you need from your data. IDL Get the answers you need from your data. IDL is the preferred computing environment for understanding complex data through interactive visualization and analysis. IDL Powerful visualization. Interactive

More information

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India 1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India Call for Papers Colossal Data Analysis and Networking has emerged as a de facto

More information

Architecture Styles. Software Architecture

Architecture Styles. Software Architecture Architecture Styles Software Architecture Architectural Styles and Strategies Software architecture is the first step in producing a design. Three design levels: 1. 2. 3. Architecture: requirements ->

More information

MEng, BSc Computer Science with Artificial Intelligence

MEng, BSc Computer Science with Artificial Intelligence School of Computing FACULTY OF ENGINEERING MEng, BSc Computer Science with Artificial Intelligence Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give

More information

San Diego Supercomputer Center, UCSD. Institute for Digital Research and Education, UCLA

San Diego Supercomputer Center, UCSD. Institute for Digital Research and Education, UCLA Facilitate Parallel Computation Using Kepler Workflow System on Virtual Resources Jianwu Wang 1, Prakashan Korambath 2, Ilkay Altintas 1 1 San Diego Supercomputer Center, UCSD 2 Institute for Digital Research

More information

Building Semantic Content Management Framework

Building Semantic Content Management Framework Building Semantic Content Management Framework Eric Yen Computing Centre, Academia Sinica Outline What is CMS Related Work CMS Evaluation, Selection, and Metrics CMS Applications in Academia Sinica Concluding

More information

Monitoring Infrastructure (MIS) Software Architecture Document. Version 1.1

Monitoring Infrastructure (MIS) Software Architecture Document. Version 1.1 Monitoring Infrastructure (MIS) Software Architecture Document Version 1.1 Revision History Date Version Description Author 28-9-2004 1.0 Created Peter Fennema 8-10-2004 1.1 Processed review comments Peter

More information

SERVICE ORIENTED ARCHITECTURE

SERVICE ORIENTED ARCHITECTURE SERVICE ORIENTED ARCHITECTURE Introduction SOA provides an enterprise architecture that supports building connected enterprise applications to provide solutions to business problems. SOA facilitates the

More information

Leveraging the Eclipse TPTP* Agent Infrastructure

Leveraging the Eclipse TPTP* Agent Infrastructure 2005 Intel Corporation; made available under the EPL v1.0 March 3, 2005 Eclipse is a trademark of Eclipse Foundation, Inc 1 Leveraging the Eclipse TPTP* Agent Infrastructure Andy Kaylor Intel Corporation

More information

RIC 2007 SNAP: Symbolic Nuclear Analysis Package. Chester Gingrich USNRC/RES 3/13/07

RIC 2007 SNAP: Symbolic Nuclear Analysis Package. Chester Gingrich USNRC/RES 3/13/07 RIC 2007 SNAP: Symbolic Nuclear Analysis Package Chester Gingrich USNRC/RES 3/13/07 1 SNAP: What is it? Standard Graphical User Interface designed to simplify the use of USNRC analytical codes providing:

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik Leading Genomics Diagnostic harma Discove Collab Shanghai Cambridge, MA Reykjavik Global leadership for using the genome to create better medicine WuXi NextCODE provides a uniquely proven and integrated

More information

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies Semantic Data Management Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies 1 Enterprise Information Challenge Source: Oracle customer 2 Vision of Semantically Linked Data The Network of Collaborative

More information

secure intelligence collection and assessment system Your business technologists. Powering progress

secure intelligence collection and assessment system Your business technologists. Powering progress secure intelligence collection and assessment system Your business technologists. Powering progress The decisive advantage for intelligence services The rising mass of data items from multiple sources

More information

Manjrasoft Market Oriented Cloud Computing Platform

Manjrasoft Market Oriented Cloud Computing Platform Manjrasoft Market Oriented Cloud Computing Platform Aneka Aneka is a market oriented Cloud development and management platform with rapid application development and workload distribution capabilities.

More information

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th irods and Metadata survey Version 0.1 Date 25th March Purpose Survey of Status Complete Author Abhijeet Kodgire akodgire@indiana.edu Table of Contents 1 Abstract... 3 2 Categories and Subject Descriptors...

More information

Dr Alexander Henzing

Dr Alexander Henzing Horizon 2020 Health, Demographic Change & Wellbeing EU funding, research and collaboration opportunities for 2016/17 Innovate UK funding opportunities in omics, bridging health and life sciences Dr Alexander

More information

Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

More information

MEng, BSc Applied Computer Science

MEng, BSc Applied Computer Science School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions

More information

Using Ontology with Semantic Web Services to Support Modeling in Systems Biology

Using Ontology with Semantic Web Services to Support Modeling in Systems Biology Using Ontology with Semantic Web Services to Support Modeling in Systems Biology!houyang Sun, Anthony Finkelstein, Jonathan Ashmore CoMPLEX, University College London, WC1E 6BT, London Ej.sun, a.finkelsteinijcs.ucl.ac.uk,

More information

PerCuro-A Semantic Approach to Drug Discovery. Final Project Report submitted by Meenakshi Nagarajan Karthik Gomadam Hongyu Yang

PerCuro-A Semantic Approach to Drug Discovery. Final Project Report submitted by Meenakshi Nagarajan Karthik Gomadam Hongyu Yang PerCuro-A Semantic Approach to Drug Discovery Final Project Report submitted by Meenakshi Nagarajan Karthik Gomadam Hongyu Yang Towards the fulfillment of the course Semantic Web CSCI 8350 Fall 2003 Under

More information

Workprogramme 2014-15

Workprogramme 2014-15 Workprogramme 2014-15 e-infrastructures DCH-RP final conference 22 September 2014 Wim Jansen einfrastructure DG CONNECT European Commission DEVELOPMENT AND DEPLOYMENT OF E-INFRASTRUCTURES AND SERVICES

More information

Integrating pharmacological data

Integrating pharmacological data Integrating pharmacological data For scientists For software and application developers A semantic data integration infrastructure Open PHACTS is a 3-year project of the Innovative Medicines Initiative

More information

Manjrasoft Market Oriented Cloud Computing Platform

Manjrasoft Market Oriented Cloud Computing Platform Manjrasoft Market Oriented Cloud Computing Platform Innovative Solutions for 3D Rendering Aneka is a market oriented Cloud development and management platform with rapid application development and workload

More information

Next Generation Sequencing Informatics Markets

Next Generation Sequencing Informatics Markets Next Generation Sequencing Informatics Markets Greg Caressi SVP Healthcare & Life Sciences November, 2014 Personalization, Communication, Decentralization, Collaboration From... One Size Fits All APPROACH...To

More information

UGENE Quick Start Guide

UGENE Quick Start Guide Quick Start Guide This document contains a quick introduction to UGENE. For more detailed information, you can find the UGENE User Manual and other special manuals in project website: http://ugene.unipro.ru.

More information

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Ntinos Krampis Asst. Professor J. Craig Venter Institute kkrampis@jcvi.org http://www.jcvi.org/cms/about/bios/kkrampis/

More information

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo 178627 Database And Data Mining Research Group

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo 178627 Database And Data Mining Research Group RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo 178627 Database And Data Mining Research Group Summary RapidMiner project Strengths How to use RapidMiner Operator

More information

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers Ntinos Krampis Asst. Professor J. Craig Venter Institute kkrampis@jcvi.org http://www.jcvi.org/cms/about/bios/kkrampis/

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

THE CCLRC DATA PORTAL

THE CCLRC DATA PORTAL THE CCLRC DATA PORTAL Glen Drinkwater, Shoaib Sufi CCLRC Daresbury Laboratory, Daresbury, Warrington, Cheshire, WA4 4AD, UK. E-mail: g.j.drinkwater@dl.ac.uk, s.a.sufi@dl.ac.uk Abstract: The project aims

More information

IOTIVITY AND EMBEDDED LINUX SUPPORT. Kishen Maloor Intel Open Source Technology Center

IOTIVITY AND EMBEDDED LINUX SUPPORT. Kishen Maloor Intel Open Source Technology Center IOTIVITY AND EMBEDDED LINUX SUPPORT Kishen Maloor Intel Open Source Technology Center Outline Brief introduction to IoTivity Software development challenges in embedded Yocto Project and how it addresses

More information

Informatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR)

Informatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR) Informatics and Knowledge Management at the Novartis Institutes for BioMedical Research (NIBR) Enable Science in silico & Provide the Right Knowledge to the Right People at the Right Time to enable the

More information

A Platform for Collaborative e-science Applications. Marian Bubak ICS / Cyfronet AGH Krakow, PL bubak@agh.edu.pl

A Platform for Collaborative e-science Applications. Marian Bubak ICS / Cyfronet AGH Krakow, PL bubak@agh.edu.pl A Platform for Collaborative e-science Applications Marian Bubak ICS / Cyfronet AGH Krakow, PL bubak@agh.edu.pl Outline Motivation Idea of an experiment Virtual laboratory Examples of experiments Summary

More information

Overview. Stakes. Context. Model-Based Development of Safety-Critical Systems

Overview. Stakes. Context. Model-Based Development of Safety-Critical Systems 1 2 Model-Based Development of -Critical Systems Miguel A. de Miguel 5/6,, 2006 modeling Stakes 3 Context 4 To increase the industrial competitiveness in the domain of software systems To face the growing

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

Lawrence Tarbox, Ph.D. Washington University in St. Louis School of Medicine Mallinckrodt Institute of Radiology, Electronic Radiology Lab

Lawrence Tarbox, Ph.D. Washington University in St. Louis School of Medicine Mallinckrodt Institute of Radiology, Electronic Radiology Lab Washington University in St. Louis School of Medicine Mallinckrodt Institute of Radiology, Electronic Radiology Lab APPLICATION HOSTING 12/5/2008 1 Disclosures The presenter s work on Application Hosting

More information

Semantic and Personalised Service Discovery

Semantic and Personalised Service Discovery Semantic and Personalised Service Discovery Phillip Lord 1, Chris Wroe 1, Robert Stevens 1,Carole Goble 1, Simon Miles 2, Luc Moreau 2, Keith Decker 2, Terry Payne 2 and Juri Papay 2 1 Department of Computer

More information

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007 Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the

More information

Introduction to Research Data Management

Introduction to Research Data Management Introduction to Research Data Management Erik Brisson ebrisson@bu.edu Why data management? Increase likelihood of successful research More understandable view of process Avoid catastrophic loss Increase

More information

Handling next generation sequence data

Handling next generation sequence data Handling next generation sequence data a pilot to run data analysis on the Dutch Life Sciences Grid Barbera van Schaik Bioinformatics Laboratory - KEBB Academic Medical Center Amsterdam Very short intro

More information

QsarDB first 100 DOIs for predictive models

QsarDB first 100 DOIs for predictive models QsarDB first 100 DOIs for predictive models Uko Maran Institute of chemistry, University of Tartu, Estonia LOD: Content Data Predictive (and descriptive) models? Goal Components Persistent digital identifiers

More information

2012 LABVANTAGE Solutions, Inc. All Rights Reserved.

2012 LABVANTAGE Solutions, Inc. All Rights Reserved. LABVANTAGE Architecture 2012 LABVANTAGE Solutions, Inc. All Rights Reserved. DOCUMENT PURPOSE AND SCOPE This document provides an overview of the LABVANTAGE hardware and software architecture. It is written

More information

SOFTWARE TESTING TRAINING COURSES CONTENTS

SOFTWARE TESTING TRAINING COURSES CONTENTS SOFTWARE TESTING TRAINING COURSES CONTENTS 1 Unit I Description Objectves Duration Contents Software Testing Fundamentals and Best Practices This training course will give basic understanding on software

More information

Technical. Overview. ~ a ~ irods version 4.x

Technical. Overview. ~ a ~ irods version 4.x Technical Overview ~ a ~ irods version 4.x The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number

More information

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India 3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India Call for Papers Cloud computing has emerged as a de facto computing

More information

Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) A Data Driven Science Gateway for Computational Workflows

Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) A Data Driven Science Gateway for Computational Workflows Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) A Data Driven Science Gateway for Computational Workflows (richard.grunzke@tu-dresden.de) Introduction MoSGrid Science Gateway - Simple and

More information

Analyses on functional capabilities of BizTalk Server, Oracle BPEL Process Manger and WebSphere Process Server for applications in Grid middleware

Analyses on functional capabilities of BizTalk Server, Oracle BPEL Process Manger and WebSphere Process Server for applications in Grid middleware Analyses on functional capabilities of BizTalk Server, Oracle BPEL Process Manger and WebSphere Process Server for applications in Grid middleware R. Goranova University of Sofia St. Kliment Ohridski,

More information

Design of a Scientic Workow for the Analysis of Microarray experiments with Taverna and R

Design of a Scientic Workow for the Analysis of Microarray experiments with Taverna and R Design of a Scientic Workow for the Analysis of Microarray experiments with Taverna and R Marcus Ertelt Proposal for a diploma thesis December 2006 - May 2007 referees: Prof. Dr. Ulf Leser, PD Dr. Wolfgang

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.3 Selected Standards

More information

Data Grids. Lidan Wang April 5, 2007

Data Grids. Lidan Wang April 5, 2007 Data Grids Lidan Wang April 5, 2007 Outline Data-intensive applications Challenges in data access, integration and management in Grid setting Grid services for these data-intensive application Architectural

More information

Semantic Web Systems Web Services Part 1 Jacques Fleuriot School of Informatics

Semantic Web Systems Web Services Part 1 Jacques Fleuriot School of Informatics Semantic Web Systems Web Services Part 1 Jacques Fleuriot School of Informatics 9 th March 2015 Antecedents B2B Previous a+empts at distributed compu2ng (CORBA, Distributed Smalltalk, Java RMI) have yielded

More information

Dynamix: An Open Plug-and-Play Context Framework for Android

Dynamix: An Open Plug-and-Play Context Framework for Android Dynamix: An Open Plug-and-Play Context Framework for Android Darren Carlson and Andreas Schrader Ambient Computing Group / Institute of Telematics University of Lübeck, Germany www.ambient.uni-luebeck.de

More information

Running Large Workflows in the Cloud

Running Large Workflows in the Cloud Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk The team: Jacek Cala, Hugo Hiden, Simon Woodman, David Leahy

More information

AUTOMATIC WORKFLOW PROCESS AND DATA ANNOTATION: DESIGN AND EXECUTION ENVIRONMENT AMRITA BASU. (Under the Direction of Krzysztof J.

AUTOMATIC WORKFLOW PROCESS AND DATA ANNOTATION: DESIGN AND EXECUTION ENVIRONMENT AMRITA BASU. (Under the Direction of Krzysztof J. AUTOMATIC WORKFLOW PROCESS AND DATA ANNOTATION: DESIGN AND EXECUTION ENVIRONMENT by AMRITA BASU (Under the Direction of Krzysztof J. Kochut) ABSTRACT This thesis presents an ontology-based approach to

More information

GenomeSpace Architecture

GenomeSpace Architecture GenomeSpace Architecture The primary services, or components, are shown in Figure 1, the high level GenomeSpace architecture. These include (1) an Authorization and Authentication service, (2) an analysis

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

10/21/10. Formatvorlage des Untertitelmasters durch Klicken bearbeiten

10/21/10. Formatvorlage des Untertitelmasters durch Klicken bearbeiten Formatvorlage des Untertitelmasters durch Klicken bearbeiten Introduction Who is Pramari? Leading US Based RFID Software and Consulting Company Member of EPCGlobal (Standards Group for RFID) Partnered

More information

Peer to Peer Search Engine and Collaboration Platform Based on JXTA Protocol

Peer to Peer Search Engine and Collaboration Platform Based on JXTA Protocol Peer to Peer Search Engine and Collaboration Platform Based on JXTA Protocol Andraž Jere, Marko Meža, Boštjan Marušič, Štefan Dobravec, Tomaž Finkšt, Jurij F. Tasič Faculty of Electrical Engineering Tržaška

More information

UniGR Workshop: Big Data «The challenge of visualizing big data»

UniGR Workshop: Big Data «The challenge of visualizing big data» Dept. ISC Informatics, Systems & Collaboration UniGR Workshop: Big Data «The challenge of visualizing big data» Dr Ir Benoît Otjacques Deputy Scientific Director ISC The Future is Data-based Can we help?

More information

Business-Driven Software Engineering Lecture 3 Foundations of Processes

Business-Driven Software Engineering Lecture 3 Foundations of Processes Business-Driven Software Engineering Lecture 3 Foundations of Processes Jochen Küster jku@zurich.ibm.com Agenda Introduction and Background Process Modeling Foundations Activities and Process Models Summary

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information