A Grid Data Integration Service (OGSA-DQP)

Slide 1: A Grid Data Integration Service (OGSA-DQP)
Paul Watson, University of Newcastle-upon-Tyne
Based on the work of:
- Norman Paton, Tasos Gounaris, Alvaro Fernandes, Rizos Sakellariou (University of Manchester)
- Jim Smith, Arijit Mukherjee, Paul Watson (University of Newcastle-upon-Tyne)

Slide 2: The Problem
- Many Grid applications would benefit from access to distributed data.
- Data sources are scattered and autonomous.
- Integration is often done by a tedious manual process or, more recently, by hand-coded workflows.
- We are interested in how to simplify the process of querying distributed data,
  - focussing initially on information held in relational databases.

Slide 3: Distributed Query Processing
- Queries are expressed in OQL, which allows computations to be included in the query.
- A single query may reference data at multiple sites; the data locations may be transparent to the query author.
- Example:

    select p.proteinid, Blast(p.sequence)
    from protein p, proteinterm t
    where t.termid = S92 and p.proteinid = t.proteinid
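
To make the meaning of the example query concrete, here is a minimal single-process sketch of what it computes. It assumes each table is a list of dictionaries, treats S92 as a string term identifier, and replaces Blast with a hypothetical stub, `blast_stub`; it says nothing about how OGSA-DQP distributes the work.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def blast_stub(sequence: str) -> str:
    # Hypothetical stand-in for Blast(); a real deployment would call the
    # BLAST Web Service instead of computing a string locally.
    return f"hits-for-{sequence[:4]}"


def run_example_query(protein: List[Row], proteinterm: List[Row],
                      blast: Callable[[str], str] = blast_stub) -> List[Tuple[str, str]]:
    # where t.termid = S92: build a hash table on the filtered proteinterm rows,
    # keyed on proteinid (assumes S92 is a string term identifier).
    index: Dict[str, List[Row]] = {}
    for t in proteinterm:
        if t["termid"] == "S92":
            index.setdefault(t["proteinid"], []).append(t)

    # and p.proteinid = t.proteinid: probe with each protein row; for every
    # match, project p.proteinid and call Blast on p.sequence.
    results: List[Tuple[str, str]] = []
    for p in protein:
        for _t in index.get(p["proteinid"], []):
            results.append((p["proteinid"], blast(p["sequence"])))
    return results
```

Because the operation call is part of the query, the optimiser is free to decide where and how often Blast runs, rather than the query author wiring it into a workflow by hand.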

Slide 4: Query Compiler
- OGSA-DQP automatically compiles the query and executes it on a set of Grid nodes, in parallel where possible.
[Figure: compiler stages. OQL Parser; Logical Optimiser and Physical Optimiser (the single-node optimiser); Partitioner and Scheduler (the multi-node optimiser); the resulting plan is run by the Evaluator.]
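
The partitioner's job of cutting a plan into independently schedulable pieces can be sketched as follows. This is an illustrative sketch, not the OGSA-DQP code: the `Op` class and `partition` function are hypothetical, and it assumes exchange operators have already been placed in the plan, so a new partition simply starts below each exchange.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    # Hypothetical plan node: an operator name and its input operators.
    name: str
    children: List["Op"] = field(default_factory=list)


def partition(root: Op) -> List[List[str]]:
    """Split a plan tree into partitions at exchange operators.

    Each partition is returned as the pre-order list of operator names it
    contains. An exchange stays in the partition that consumes its output;
    the exchange's input starts a new partition.
    """
    fragments: List[List[str]] = []

    def visit(op: Op, current: List[str]) -> None:
        current.append(op.name)
        for child in op.children:
            if op.name == "exchange":
                new_fragment: List[str] = []
                fragments.append(new_fragment)
                visit(child, new_fragment)
            else:
                visit(child, current)

    top: List[str] = []
    fragments.append(top)
    visit(root, top)
    return fragments
```

Once the plan is cut this way, the scheduler only has to decide which Grid node runs each partition; the exchanges tell it where data must move between nodes.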

Slide 5: Execution Plan
- Plan for the query shown on slide 3.
- The plan is split into a set of partitions.
- Grid resources are acquired to execute the partitions in parallel, where this is possible, required and affordable.
[Figure: the partitioned plan, built from table_scan (protein), table_scan (proteinterm) with termid = S92, reduce, hash_join (proteinid), op_call (Blast) and exchange operators, with the exchanges marking the boundaries between partitions.]
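
Exchange operators are what let the hash join on proteinid run in parallel. A minimal sketch, assuming rows are (key, payload) pairs and that the partitions run one after another in a single Python process rather than on separate Grid nodes: each exchange routes rows by a hash of the join key, so matching rows from both inputs land in the same partition and each partition can be joined locally. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (join key, payload)


def exchange(rows: List[Row], n: int) -> List[List[Row]]:
    # Route each row to one of n consumer partitions by hashing its join key,
    # so equal keys from both join inputs meet in the same partition.
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def local_hash_join(build: List[Row], probe: List[Row]) -> List[Tuple[str, str, str]]:
    # Classic hash join within one partition: build on one input, probe with the other.
    table: Dict[str, List[str]] = {}
    for key, payload in build:
        table.setdefault(key, []).append(payload)
    return [(key, b, p) for key, p in probe for b in table.get(key, [])]


def parallel_hash_join(left: List[Row], right: List[Row], n: int) -> List[Tuple[str, str, str]]:
    left_parts = exchange(left, n)
    right_parts = exchange(right, n)
    out: List[Tuple[str, str, str]] = []
    for i in range(n):
        # On the Grid, partition i would be evaluated on its own node.
        out.extend(local_hash_join(left_parts[i], right_parts[i]))
    return out
```

Python's built-in `hash` for strings is only stable within one process; an exchange spanning several nodes would need a hash function that every node agrees on.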

Slide 6: Evaluation on the Grid
- OGSA-DQP builds on OGSA-DAI:
  - it accesses relational databases wrapped by OGSA-DAI (Oracle, DB2, MySQL).
- Data streams between nodes, with flow control.
- All services are OGSI-compliant and built on GT3.
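
Flow control on a data stream can be illustrated with a bounded buffer: when the consumer falls behind, the producer blocks instead of filling memory. The sketch below assumes a single producer and a single consumer running as two threads in one process, and the `stream` helper is hypothetical; it only illustrates the back-pressure idea, not how OGSA-DQP's services implement flow control between nodes.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # sentinel marking the end of the stream


def stream(rows: Iterable[object], buffer_size: int = 4) -> List[object]:
    # Bounded buffer between producer and consumer: put() blocks while the
    # buffer holds buffer_size items, which throttles the producer.
    buf: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)
    received: List[object] = []

    def producer() -> None:
        for row in rows:
            buf.put(row)  # waits here whenever the consumer falls behind
        buf.put(_END)

    def consumer() -> None:
        while True:
            item = buf.get()
            if item is _END:
                break
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With one producer and one consumer sharing a FIFO queue, rows arrive in the order they were sent, however small the buffer.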

Slide 7: Execution on the Grid
[Figure: the client sends perform(query) to the GDQS (Grid Distributed Query Service) on node N0; the GDQS calls createservice on GQES factories (GQESF) on nodes N1 to N4 and sends each resulting GQES (Grid Query Evaluation Service) a perform(querysubplan) request; the evaluators run sequential_scan, reduce, hash_join (p.proteinid = t.proteinid) and operation_call blast(p.sequence), which calls the BLAST Web Service, over data accessed through OGSA-DAI GDSs, and results flow back to the client.]

Slide 8: Mutual Benefit
- The Grid needs DQP:
  - declarative, high-level resource integration with implicit parallelism.
- DQP needs the Grid:
  - systematic access to remote data and computational resources;
  - cost-based optimisation;
  - dynamic resource discovery and allocation.
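
As a toy example of cost-based optimisation, consider choosing how many nodes to acquire for the op_call (Blast) partition. The sketch assumes a hypothetical cost model (a fixed setup cost per node plus per-tuple work divided evenly across the nodes) and hypothetical function names; OGSA-DQP's own cost model is not described on these slides.

```python
def estimated_time(n: int, tuples: int, cost_per_tuple: float, setup_per_node: float) -> float:
    # Hypothetical model: acquiring each node costs setup_per_node seconds,
    # and the per-tuple work is split evenly over the n nodes.
    return n * setup_per_node + tuples * cost_per_tuple / n


def choose_parallelism(tuples: int, cost_per_tuple: float,
                       setup_per_node: float, max_nodes: int) -> int:
    # Cost-based choice: try every degree of parallelism the available
    # resources allow and keep the one with the lowest estimated time.
    return min(range(1, max_nodes + 1),
               key=lambda n: estimated_time(n, tuples, cost_per_tuple, setup_per_node))
```

With 1000 tuples at 0.5 s each and a 20 s setup cost per node, the estimate 20n + 500/n is lowest at n = 5; beyond that, each extra node costs more in setup than it saves. If only three nodes are available, the same procedure picks 3.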

Slide 9: Summary
- DQP is a potentially important technology for the Grid.
- OGSA-DQP supports:
  - declarative expression of queries;
  - location transparency;
  - access to both data and computational resources;
  - dynamic deployment on Grid resources;
  - implicit parallelism.
- The first release was made in September 2003 and is available for download.
- Dynamic adaptation is now being investigated, for fault tolerance, performance and cost.

Slide 10: Experiences and Issues
- Remote service deployment is not yet available for Grids, but there is some work:
  - PhD project at Newcastle (Chris Fowler):
    - dynamically deploys individual services remotely;
    - initial prototype by the end of November 2003;
    - working on security issues;
    - Web Services only.
  - GridShed project (Newcastle + BT):
    - design of hosting environments for Grids;
    - installs execution images on nodes as required.

Slide 11: Experiences and Issues
- DQP vs. workflow: for what space of problems is each better?
- DQP advantages:
  - declarative expression of intent;
  - cost-based choice of execution plans;
  - implicit parallelisation.
- Investigating with bioinformatics applications in the myGrid project:
  - DQP with workflows, and workflows with DQP.

Slide 12: Projects/Sponsors
- Projects: OGSA-DAI, Polar, Polar*, myGrid
- Sponsors: [logos not reproduced]


More information

LOG MANAGEMENT AND SIEM FOR SECURITY AND COMPLIANCE

LOG MANAGEMENT AND SIEM FOR SECURITY AND COMPLIANCE PRODUCT BRIEF LOG MANAGEMENT AND SIEM FOR SECURITY AND COMPLIANCE As part of the Tripwire VIA platform, Tripwire Log Center offers out-of-the-box integration with Tripwire Enterprise to offer visibility

More information

EMC/Greenplum Driving the Future of Data Warehousing and Analytics

EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC 2010 Forum Series 1 Greenplum Becomes the Foundation of EMC s Data Computing Division E M C A CQ U I R E S G R E E N P L U M Greenplum,

More information

Technology Strategies for Big Data Analytics Paul Bachteal Director, Americas Technology Practice

Technology Strategies for Big Data Analytics Paul Bachteal Director, Americas Technology Practice Technology Strategies for Big Data Analytics Paul Bachteal Director, Americas Technology Practice THRIVING IN THE BIG DATA ERA DATA SIZE VOLUME VARIETY VELOCITY VALUE TODAY THE FUTURE BIG DATA ANALYTICS

More information

Service Oriented Architectures

Service Oriented Architectures 8 Service Oriented Architectures Gustavo Alonso Computer Science Department Swiss Federal Institute of Technology (ETHZ) alonso@inf.ethz.ch http://www.iks.inf.ethz.ch/ The context for SOA A bit of history

More information

Distributed Databases in a Nutshell

Distributed Databases in a Nutshell Distributed Databases in a Nutshell Marc Pouly Marc.Pouly@unifr.ch Department of Informatics University of Fribourg, Switzerland Priciples of Distributed Database Systems M. T. Özsu, P. Valduriez Prentice

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

ORACLE DATABASE 10G ENTERPRISE EDITION

ORACLE DATABASE 10G ENTERPRISE EDITION ORACLE DATABASE 10G ENTERPRISE EDITION OVERVIEW Oracle Database 10g Enterprise Edition is ideal for enterprises that ENTERPRISE EDITION For enterprises of any size For databases up to 8 Exabytes in size.

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information