XML Data Integration in OGSA Grids



Similar documents
Grid Data Integration based on Schema-mapping

Grid Data Integration Based on Schema Mapping

GENERIC DATA ACCESS AND INTEGRATION SERVICE FOR DISTRIBUTED COMPUTING ENVIRONMENT

DATA INTEGRATION AND QUERY REFORMULATION IN SERVICE-BASED GRIDS

SERVICE CHOREOGRAPHY FOR DATA INTEGRATION ON THE GRID

Classic Grid Architecture

DASCOSA: Database Support for Computational Science Applications. Kjetil Nørvåg Norwegian University of Science and Technology Trondheim, Norway

Using Peer to Peer Dynamic Querying in Grid Information Services

Grid-based Distributed Data Mining Systems, Algorithms and Services

Data Grids. Lidan Wang April 5, 2007

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Concepts and Architecture of the Grid. Summary of Grid 2, Chapter 4

Praseeda Manoj Department of Computer Science Muscat College, Sultanate of Oman

Virtual Data Integration

Web Service Based Data Management for Grid Applications

FIPA agent based network distributed control system

INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS

Data and beyond

Introduction to Service Oriented Architectures (SOA)

Content Delivery Network (CDN) and P2P Model

16th International Conference on Control Systems and Computer Science (CSCS16 07)

A Survey Study on Monitoring Service for Grid

Service-Oriented Computing and Service-Oriented Architecture

Shuffling Data Around

Theme 6: Enterprise Knowledge Management Using Knowledge Orchestration Agency

Remote Sensitive Image Stations and Grid Services

KNOWLEDGE GRID An Architecture for Distributed Knowledge Discovery

A Uniform Approach to Workflow and Data Integration

Peer-to-Peer Issue Tracking System: Challenges and Solutions

JOURNAL OF OBJECT TECHNOLOGY

Building Web Services with XML Service Utility Library (XSUL)

Service-Oriented Architectures

Gradient An EII Solution From Infosys

Lesson 4 Web Service Interface Definition (Part I)

Robust service-based semantic querying to distributed heterogeneous databases

Exploiting peer group concept for adaptive and highly available services

DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández

Grid based Integration of Real-Time Value-at-Risk (VaR) Services. Abstract

THE CCLRC DATA PORTAL

Distributed Systems Architectures

IBM Solutions Grid for Business Partners Helping IBM Business Partners to Grid-enable applications for the next phase of e-business on demand

ENABLING SEMANTIC SEARCH IN STRUCTURED P2P NETWORKS VIA DISTRIBUTED DATABASES AND WEB SERVICES

DATABASES AND THE GRID

A Peer-to-Peer Approach to Content Dissemination and Search in Collaborative Networks

GRIDB: A SCALABLE DISTRIBUTED DATABASE SHARING SYSTEM FOR GRID ENVIRONMENTS *

Interacting the Edutella/JXTA Peer-to-Peer Network with Web Services

Database Middleware and Web Services for Data Distribution and Integration in Distributed Heterogeneous Database Systems

Distributed Databases

XpoLog Center Suite Data Sheet

DObjects: Enabling Distributed Data Services for Metacomputing Platforms

Resource Management on Computational Grids

Enterprise Application Designs In Relation to ERP and SOA

GridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources

Achieving Semantic Interoperability By UsingComplex Event Processing Technology

A Distributed Architecture for Multi-dimensional Indexing and Data Retrieval in Grid Environments

REST vs. SOAP: Making the Right Architectural Decision

Data Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609.

Integration of the OCM-G Monitoring System into the MonALISA Infrastructure

Service Oriented Architecture 1 COMPILED BY BJ

Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)

Data Integration and Network Marketing

Multi-Agent Support for Internet-Scale Grid Management

An innovative, open-standards solution for Konnex interoperability with other domotic middlewares

Cluster, Grid, Cloud Concepts

Zhenping Liu *, Yao Liang * Virginia Polytechnic Institute and State University. Xu Liang ** University of California, Berkeley

Getting Started with Service- Oriented Architecture (SOA) Terminology

GEDDM - Commercial Data Mining Using Distributed Resources

A P2P SERVICE DISCOVERY STRATEGY BASED ON CONTENT

Analyses on functional capabilities of BizTalk Server, Oracle BPEL Process Manger and WebSphere Process Server for applications in Grid middleware

Towards Trusted Semantic Service Computing

Tier Architectures. Kathleen Durant CS 3200

OGSA - A Guide to Data Access and Integration in UK

A standards-based approach to application integration

Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus

Relational Database Basics Review

UNIVERSITY OF TRENTO A PEER-TO-PEER DATABASE MANAGEMENT SYSTEM. Albena Roshelova. June Technical Report # DIT

DATA INTEGRATION CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

A Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents

Analisi di un servizio SRM: StoRM

Closer Look at Enterprise Service Bus. Deb L. Ayers Sr. Principle Product Manager Oracle Service Bus SOA Fusion Middleware Division

Chapter 2: Remote Procedure Call (RPC)

Integration Using the MultiSpeak Specification

Software design (Cont.)

Parallel Processing of JOIN Queries in OGSA-DAI

SOFT 437. Software Performance Analysis. Ch 5:Web Applications and Other Distributed Systems

CLEVER: a CLoud-Enabled Virtual EnviRonment

Enabling Collaboration Using the Biomedical Informatics Research Network (BIRN):

EAI OVERVIEW OF ENTERPRISE APPLICATION INTEGRATION CONCEPTS AND ARCHITECTURES. Enterprise Application Integration. Peter R. Egli INDIGOO.

Providing Semantically Equivalent, Complete Views for Multilingual Access to Integrated Data

PROGRESS Portal Access Whitepaper

OSIRIS Middleware & ISIS Application

Vertical Integration of Enterprise Industrial Systems Utilizing Web Services

Peer-to-peer framework of Distributed Environment for Cooperative and Collaborative Work Service Composition

Enterprise Application Integration (EAI) Techniques

Distributed Database for Environmental Data Integration

Concept-Based Discovery of Mobile Services

Web Services Description Language (WSDL) Wanasanan Thongsongkrit

Data Integration Hub for a Hybrid Paper Search

Automating Rich Internet Application Development for Enterprise Web 2.0 and SOA

elearning Content Management Middleware

Transcription:

XML Data Integration in OGSA Grids Carmela Comito and Domenico Talia University of Calabria Italy comito@si.deis.unical.it

Outline Introduction Data Integration and Grids The XMAP Data Integration Framework Integration Model XPath Query Reformulation Algorithm The Grid Data Integration System Conclusions 2

The Problem (1) Grid applications can access distributed heterogeneous data sources Managed by different software system Accessible through different protocols and interfaces Modeled through different data models Data Sources are autonomous and highly dynamic The case for high-level services: Assist users to access several databases Exploit the variety and dynamic nature of resources offered by the Grid 3

The Problem (2) NG id Artist artefact style Name I d like to find all the places holding works of art of Impressionists National Gallery (NG) Data Integration is a key issue for expoliting the title category availability of large, heterogeneous, distributed data volumes on Grids Integration formalisms can benefit from OGSAbased Grids BM ArtistInfo Dynamic discovery, allocation, access and use of resources code category first-name Last-Nam British Museum (BM) Painter Sculptor Painting School artefact Style 4

Our Solution British Museum (BM) XMAP GDIS Museums (M) National Gallery (NG) Historic Buildings (HB) Tate Gallery (TG) St. Paul Cathedral (SPC) Buckingham Palace (BP) Westminster 5

Goals Develop a decentralized framework for integrating heterogeneous XML data source Addressing challenges arisen from autonomous, dynamic data sources across unpredictable network Meeting the requirements of scalability, robustness, autonomy Deploy the integration framework in a service-based Grid architectur Expose data integration utilities as Grid services Exploit the middleware provided by OGSA-DAI, OGSA-DQP and Globus Toolkit 6

Data Integration and Grids data integration system provides a uniform query interface across utonomous, heterogeneous networked or local data sources Federated Database Management Systems (FDBMSs) Mediator/Wrapper based Integration System n the Grid Multiple, autonomous, unpredictable sites Huge, highly dynamic, data volumes Heterogeneity and Distribution of data resources Sites both clients and servers Traditional approaches to data integration are not suitable in Grid settings 7

Challenges of Grid-based Data Integration Architectures The Grid raises new challenges in data integration systems: No need for a central mediated schema (Decentralization) Ability to map data as is most convenient (Flexibility, Dynamism Wide-scale, ad-hoc nature (Scalability) Queries are posed using the node s schema. Answers come from anywhere in the system (Sharing and Cooperation) 8

Peer-to to-peer and Grids Recent works on data management in peer-to-peer systems Lacks a global schema Each peer represents an autonomous information system Semantic mappings are established directly among peers Peer-to-peer based data management architectures present similar features with respect to Grid-based ones OGSA Grids can provide a suitable and reliable infrastructure for P2P systems P2P architectures address issues and problems common to several Grid applicatio The proposed integration model is inspired from recent approaches in P2P data integration 9

MAP: : XML Data Integration Framework decentralized network of semantically related XML data sources A set of distributed, heterogeneous, autonomous XML data sources Different data sources have their own schema Mapping is a key issue to any data sharing architecture XMAP integration model is based on schema mappings Mapping specification is flexible and scalable not resorting to any hierarchical structure Each source schema is directly connected to only a small number of other schemas (point-to-point mapping) Each source schema is reachable from all other schemas belonging to its transitive closure (transitive mapping) 10

MAP: : XML Data Integration Framework British Museum (BM) Mediators Point-to-point Mapping Transitive Mapping Museums (M) Each data source National Gallery (NG) Export data in its own schema Historic Buildings (HB) Serve as logical mediators for other sources Tate Gallery (TG) St. Paul Cathedral (SPC) Buckingham Palace (BP) Westminster 11

Schema Mapping in XMAP Schema mapping in XMAP associates paths in different schemas (path-to-path mapping) A Mapping M over a source schema S is a set of mapping rules R M ={R M 1, RM 2,, RM k } A mapping rule R M i relates a pair of schemas by associating paths on the basis of mappings cardinality constraints Mapping rules are specified in XML documents called XMAP documents Each source schema is associated to an XMAP document containing all the mapping rules related to it. 12

XMAP Document Structure <schema targetnamespace="http://xmap/xmapdocument" xmlns="http://www.w3.org/2001/xmlschema" > <element name="mapping"> <complextype> <sequence> <element name="sourceschema" type="string" minoccurs="1" maxoccurs= 1 /> <element name="rule" minoccurs="1"> <complextype> <sequence> <attribute name="cardinality" type="string" minoccurs="1" maxoccurs= 1 /> <element name="sourcepath" type="string" minoccurs="1"/> <element name="destschema" type="string" minoccurs="1" maxoccurs= 1 /> <element name="destpath" type="string" minoccurs="1"/> </sequence> </complextype> </element> </sequence> </complextype> </element> </schema> 13

XMAP Reformulation Algorithm XMAP Reformulation Algorithm reformulates an XPath query Q over all the schemas related to Q Query is answered by chaining of mapped sources using the mapping rules defined in XMAP documents Direct reformulations of Q by using the mapping of S (point-to-point mapping) Transitive reformulations are obtained by recursively invoking the algorithm over each reformulated query (transitive mappings). INPUT A query Q over the schema NG XMAP document associated with NG, XMAPNG OUTPUT A set of reformulated queries Q R i 14

XMAP Reformulation I d Algorithm like to find all - Example the places holding works of art of Impressionists NG Artist id artefact style Name title category Q=/Artist[style= Impressionism ]/artefact/title 15

XMAP Reformulation Algorithm - Steps 1. Identifying the paths in Q. 16

XMAP Reformulation Algorithm - Example Q=/Artist[style= Impressionism ]/artefact/title NG Artist id artefact style Name title category 17

XMAP Reformulation Algorithm - Steps 1. Identifying the paths in Q. 2. Looking for candidate paths in all source schemas related to NG. 18

XMAP Document of schema NG?xml version="1.0"?> <XMAP> <sourceschema>ng</sourceschema> <Rule cardinality="mappingn-1"> <destinationschema>bm</destinationschema> <sourcepath> /artist/first-name </sourcepath> <sourcepath>/artist/last-name</sourcepath> <destinationpath> /Info/Name </destinationpath> </Rule> <Rule cardinality="mapping1-n"> <destinationschema> BM </destinationschema> <sourcepath> /artist/style </sourcepath> <destinationpath> /Info/Kind/Painter/School </destinationpath> <destinationpath> /Info/Kind/Sculptor/Style </destinationpath> </Rule> <Rule cardinality="mapping1-n"> <destinationschema> BM </destinationschema> <sourcepath> /artist/artefact/title </sourcepath> <destinationpath> /Info/Kind/Painter/Painting/Title</destinationPath> <destinationpath> /Info/Kind/Sculptor/Artefact</destinationPath> </Rule> </XMAP> 19

XMAP Reformulation Algorithm - Example NG Artist id artefact style Name title category Info - <Rule BMcardinality="Mapping1-N"> <destinationschema> BM </destinationschema> <sourcepath> /artist/artefact/title /artist/style </sourcepath> </sourcepath> <destinationpath> code /Info/Kind/Painter/Painting/Title</destinationPath> /Info/Kind/Painter/School first-name Last-Name <destinationpath> /Info/Kind/Sculptor/artefact</destinationPath> /Info/Kind/Sculptor/Style </Rule> Painter Sculptor Painting School artefact Style 20

XMAP Reformulation Algorithm - Steps 1. Identifying the paths in Q 2. Looking for candidate paths in all source schemas related to NG. 3. Pruning of Candidate schemas 4. Constructing Reformulated Queries 21

XMAP Reformulation Algorithm - Example R1 = /ArtistInfo/category/painter[school= Impressionism ]/painting/title R2 = /ArtistInfo/category/sculptor[style= Impressionism ]/artefact BM Info code Kind first-name Last-Name Painter Sculptor Painting School artefact Style Title 22

XMAP Reformulation Algorithm - Steps 1. Identifying the paths in Q 2. Looking for candidate paths in all source schemas related to NG. 3. Pruning of Candidate schemas. 4. Constructing Reformulated Queries 5. Recursive invocation of the algorithm 23

The Grid Data Integration System The Grid Data Integration System (GDIS) is a service-based architecture for data integration on Grid-enabled databases Offers a wrapper/mediator-based approach to integrate data sources Adopts the XMAP decentralized mediator approach to handle semantic heterogeneity over data sources Syntactic heterogeneity is hidden behind OGSA-DAI wrappers Exposes data integration utilities as Grid Data Services Mapping Specifications XMAP Reformulation Algorithm 24

Wrapper/Mediator Approach in GDIS Query XMAP Query Reformulator Source Schema 3 Source Schema 5 Source Schema 6 Source Schema 1 Source Schema 2 Source Schema 4 Source Schema n Reformulated Query Distributed Query Distributed Query Processor Wrapper Wrapper Wrapper Wrapper Data Source 1 Data Source 2 Data Source 3 Data Source n 25

GDIS Logical Model A set of Grid nodes Each node can Provide Data sources (Data Provider) Provide Schemas (Mediator) Expose Semantic Mappings Formulate queries (Client) Challenges of a wrapper/mediator-based integration system Processing nodes Execution nodes Data integration nodes Wrapper nodes 26

GDIS Layered Architecture Applications Application Layer OGSA-GDI OGSA-DQP OGSA-DAI Data Middleware Layer Globus Toolkit 3 Fabric Grid Middleware Layer MySql Xindice Data 27

Conclusions XMAP proposes a decentralized solution to address data heterogeneity among XML databases Integration approach based on flexible and scalable semantic connections among small set of database schemas XMAP is deployed in GDIS, a service-based Grid architecture The XMAP framework has been recently implemented using Java 1.4 and integrated in the GDIS system using OGSA-DAI 5.0 and Globus Toolkit 3.2.1 Future directions PARIS: using XMAP for reformulating queries in a P2P archictecture Embed the XMAP framework within the OGSA-DQP engin 28