Intelligent Data Integration Middleware Based on Updateable Views



Similar documents
ODRA: A Next Generation Object-Oriented Environment for Rapid Database Application Development

Three-Level Object-Oriented Database Architecture Based on Virtual Updateable Views 1

Virtual Schemas in Visual Interfaces to Databases

Project VIDE Challenges of Executable Modelling of Business Applications

Overview of the Project ODRA

Workflow Object Driven Model

Dynamic Changes of Workflow Processes

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

OODBMS Metamodel Supporting Configuration Management of Large Applications 1

Advanced egovernment Information Service Bus (egov-bus)

MIGRATING DESKTOP AND ROAMING ACCESS. Migrating Desktop and Roaming Access Whitepaper

CHAPTER 1: OPERATING SYSTEM FUNDAMENTALS

REST vs. SOAP: Making the Right Architectural Decision

TOP-DOWN APPROACH PROCESS BUILT ON CONCEPTUAL DESIGN TO PHYSICAL DESIGN USING LIS, GCS SCHEMA

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Distributed System Principles

International Journal of Advanced Research in Computer Science and Software Engineering

Application of ontologies for the integration of network monitoring platforms

Advanced egovernment Information Service Bus

Database Middleware and Web Services for Data Distribution and Integration in Distributed Heterogeneous Database Systems

A common interface for multi-rule-engine distributed systems

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES

Introduction to CORBA. 1. Introduction 2. Distributed Systems: Notions 3. Middleware 4. CORBA Architecture

How To Understand The Concept Of A Distributed System

Integration of Distributed Healthcare Records: Publishing Legacy Data as XML Documents Compliant with CEN/TC251 ENV13606

AUGMENTING WEB-BASED COLLABORATION WITH ADAPTIVE REPLICATION AND MOBILITY (EXTENDED ABSTRACT )

Transparency and Efficiency in Grid Computing for Big Data

Filtering the Web to Feed Data Warehouses

An Overview of Distributed Databases

Secure Data Transfer and Replication Mechanisms in Grid Environments p. 1

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

chapater 7 : Distributed Database Management Systems

Data Grids. Lidan Wang April 5, 2007

Code Generation for Mobile Terminals Remote Accessing to the Database Based on Object Relational Mapping

Distributed Databases

jeti: A Tool for Remote Tool Integration

A Grid Architecture for Manufacturing Database System

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Integration of Application Business Logic and Business Rules with DSL and AOP

Mobile Storage and Search Engine of Information Oriented to Food Cloud

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

Mobile Agent System for Web Services Integration in Pervasive Networks

Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases

Designing an Object Relational Data Warehousing System: Project ORDAWA * (Extended Abstract)

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

Development and Execution of Collaborative Application on the ViroLab Virtual Laboratory

Implementing reusable software components for SNOMED CT diagram and expression concept representations

Data Management System for grid and portal services

MTCache: Mid-Tier Database Caching for SQL Server

Enterprise Application Integration (EAI) Techniques

DATA PORTABILITY AMONG PROVIDERS OF PLATFORM AS A SERVICE. Darko ANDROCEC

Web Service Based Data Management for Grid Applications

Acknowledgements References 5. Conclusion and Future Works Sung Wan Kim

DATABASES AND THE GRID

Business Process Management

Annotation for the Semantic Web during Website Development

In Memory Accelerator for MongoDB

A Semantic Approach for Access Control in Web Services

SODDA A SERVICE-ORIENTED DISTRIBUTED DATABASE ARCHITECTURE

Contents RELATIONAL DATABASES

Migrating Legacy Software Systems to CORBA based Distributed Environments through an Automatic Wrapper Generation Technique

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

SNMP, CMIP based Distributed Heterogeneous Network Management using WBEM Gateway Enabled Integration Approach

Development of a generic IT service catalog as pre-arrangement for Service Level Agreements

Live Model Pointers A requirement for future model repositories

Dynamic Scheduling of Object Invocations in Distributed Object Oriented Real-Time Systems Jørgensen, Bo Nørregaard; Joosen, Wouter

The Concept of Automated Process Control

AN INTELLIGENT TUTORING SYSTEM FOR LEARNING DESIGN PATTERNS

Research and Design of Heterogeneous Data Exchange System in E-Government Based on XML

Implementing Web-Based Computing Services To Improve Performance And Assist Telemedicine Database Management System

Motivation Definitions EAI Architectures Elements Integration Technologies. Part I. EAI: Foundations, Concepts, and Architectures

Chapter 3 - Data Replication and Materialized Integration

ACHIEVING HIGH DEPENDABILITY OF AN ENDOSCOPY RECOMMENDER SYSTEM(ERS)

Ontology-based Domain Modelling for Consistent Content Change Management

Functionalities of a Content Management System specialized for Digital Library Applications

Remote Sensitive Image Stations and Grid Services

CS550. Distributed Operating Systems (Advanced Operating Systems) Instructor: Xian-He Sun

A Data Browsing from Various Sources Driven by the User s Data Models

Lightweight Data Integration using the WebComposition Data Grid Service

Design of Data Archive in Virtual Test Architecture

Distributed Data Management

Oracle8i Spatial: Experiences with Extensible Databases

Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

A Contribution to Expert Decision-based Virtual Product Development

Web services to allow access for all in dotlrn

A generic approach for data integration using RDF, OWL and XML

Service Oriented Architectures

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY

A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS

Web Services in 2008: to REST or not to REST?

Chapter 1: Introduction

An Architecture for Web-based DSS

Integrating Heterogeneous Data Sources Using XML

Log Mining Based on Hadoop s Map and Reduce Technique

A Multidatabase System as 4-Tiered Client-Server Distributed Heterogeneous Database System

Using Object And Object-Oriented Technologies for XML-native Database Systems

Context Capture in Software Development

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

Transcription:

Intelligent Data Integration Middleware Based on Updateable Views Hanna Kozankiewicz 1, Krzysztof Stencel 2, Kazimierz Subieta 1,3 1 Institute of Computer Sciences of the Polish Academy of Sciences, Warsaw, Poland {hanka,subieta}@ipipan.waw.pl 2 Institute of Informatics Warsaw University, Warsaw, Poland stencel@mimuw.edu.pl 3 Polish-Japanese Institute of Information Technology, Warsaw, Poland Abstract. We present a new approach to the grid technology that is based on updateable views. Views are used in two ways: (1) as wrappers of local servers that adopt local ta to the federated database requirements; (2) as a facility for intelligent data integration and transformation into a canonical form according to the federated database. Views deliver virtual updatable objects to global clients. Objects can be associated with methods that present the procedural part of remote services, like e.g. in Web Services. The fundamental quality of the approach is transparency of servers: the user perceives the distributed data/service environment as an integrated virtual whole. Such a quality is achieved by applications based on CORBA. We attempt to achieve a higher level of transparency by providing means for integrating horizontal and vertical fragmentations of data and by taking into account various forms of data redundancy. The approach is based on a very simple and universal architecture and on the stack-based approach (SBA) to object-oriented query languages. 1 Introduction The grid technology provides a new information processing culture that integrates many local services into a big virtual service, summing all the resources belonging to particular services. The grid user is interested in services rather than in service providers thus local service providers should be transparent for applications. Such transparency has to be supported by many technical solutions concerning the network infrastructure and distributed data/service environment. In this paper we focus on data-intensive applications where distribution of bulk data implies distributed and parallel computation. From this point of view, the grid technology can be perceived as continuation of federated databases, the topic that has been developed for many years in the database domain. It worked out many concepts that are close to the current grid research, such as various forms of transparency, heterogeneity, canonical data models, distributed transaction procession, metamodels, etc.

The key issue behind such integration is transparency, which means abstraction from secondary features of distributed resources. There are many forms of transparency, in particular location, access, concurrency, implementation, scaling, fragmentation, replication, indexing and failure transparency. Due to transparency (implemented on the middleware level) some complex technical details of the distributed data/service environment need not to be taken into account in the application code. Thus transparency much amplifies programmers productivity and greatly supports flexibility and maintainability of software products. The novelty of our approach to the grid technology is that we propose to use updateable object-oriented views with full computational power. Such views (defined in a very high-level query language) much facilitate the development of intelligent mappings between heterogeneous data/service ontologies. The approach allows the designers to reduce the time required for development and maintenance of database federations. Views are defined in the query language SBQL that is integrated with imperative constructs (e.g. updating) and abstractions (functions, methods, procedures). While the idea of using views for integration of distributed/federated databases is not new (see e.g. [1, 8]), to the best of our knowledge implementation of this feature is still a challenge because of the updateability of virtual views and the practical universality of view definitions. In our recent research [3, 4, 6] we have developed and implemented object-oriented virtual views that have full algorithmic power and are updateable with no anomalies and limitations. Our views support full transparency of virtual objects, i.e. the programmer is unable to distinguish stored and virtual objects by any programming option. Due to this feature views can be considered as a general facility for integrating distributed and heterogeneous resources, including methods acting on virtual objects. The advantage of our approach is that it offers a very simple architecture that is much more flexible and universal than the currently known integration technologies for distributed heterogeneous data/service environments. The rest of this paper is structured as follows. The next section sketches the main ideas of updateable views that are the basis of our approach. Section 3 presents the architecture of our intelligent middleware. In Section 4 we show an example illustrating data integration within our approach. Section 5 concludes. 2 View Mechanism in the Stack-Based Approach Our grid mechanism is based on the Stack-Based Approach (SBA) to query languages [7, 9]. SBA treats a query language as a kind of programming languages. Therefore, queries are evaluated using mechanisms that are common in programming languages, such as an environmental stack. SBA defines an abstract formal framework that is known as the Stack-Based Query Language (SBQL). A view is a mapping of stored data into virtual ones. In the classical approach (SQL) a view is a function, similar to programming languages functions. View updating means that a function result is treated as an l-value in updating statements. Such an approach, however, appeared to be inconsistent due to problems with finding un-

equivocal mapping of updates on virtual data into updates on stored data. In our approach to view updates [3, 4] a view definer can augment a view definition by the information on update intents. The information allows the programmer to eliminate ambiguities connected with multiple ways of performing a view update and the risk of warping user intention, which is the well-known problem related to view updates [3]. In the approach a view definition consists of two main parts: (1) mapping between stored and virtual objects; (2) re-definitions of generic operations that can be performed on virtual objects. We have identified four operations on virtual objects, i.e., dereference returning the value of a given virtual object, insertion of an object into a given virtual object, deletion of a given virtual object, and update the value of a given virtual object. The view definer has freedom to decide which and how these operations have to be re-defined. The mapping of stored objects into virtual ones and the definitions of view updating operations are procedures with arbitrary complexity. The mapping procedure returns entities called seeds that are used as parameters for the re-defining procedures. This procedure is distinguished by the clause virtual objects. For procedures that re-define operations we use fixed names on_retrieve, on_insert, on_delete, on_update. An example view definition is presented in Section 4. View definitions can contain other elements such as definition of subviews, internal state variables, and others. 3 Architecture of the Intelligent Middleware The heart of our approach is the global virtual object and service store (shortly: global virtual store), Fig.1 [2, 5]. Global clients are applications that send requests to the global virtual store. Global is a collection of definitions of data/services provided by the global virtual store. The global is agreed upon by a consortium, by a standard or by a law that establishes the grid. The grid offers data/services supplied by local servers. Local ta define data/services inside a local server. These ta, as well as the business meaning of the data/services, can be different at each local server. The ta are invisible to the grid users. The first step of integration of a local server into the grid is done by the administrator of this local server who has to define the contributory. It is the description of the data and services contributed by the local server to the grid. The local server s administrator also defines contributory views that constitute the mapping of the local data/services to the canonical data/service model assumed by the grid. The second step of the integration of local servers into the grid is the creation of global views. These views are stored inside the global virtual store. The interface of them is defined by the global. They map the data and services provided by the local servers to the data and services available to the global clients. The global views are defined by the grid designer, which is a person, a team or software that generates these views upon the contributory ta, the global and the integration. The integration contains additional information how particular data and services of local servers contribute to the global canonical model of the grid.

Global client 1 Global client 2 Global client 3 Global Global views Global virtual object and service store Communication protocol Grid designer Integration views Local service and object store Local views Local service and object store Local Local server Local server Fig. 1 Architecture of the GRID 4 Data Integration Example Assume a database that contains Book objects with attributes ISBN, title, author and price. Below we present an example of integration of horizontally fragmented data with a replication. Assume there are three local servers located in Cracow, Gdansk, and Warsaw (we call these servers by their locations). Each server keeps data of the books. All contributory s are the same. In the considered case the integration contains the following information: The local servers in Cracow and Warsaw contain the same data. Data should be retrieved from a replica with shorter access time. Data in Cracow cannot be modified. Data in Warsaw can be modified, and the changes are immediately and automatically reflected in Cracow. Each book is uniquely identified by ISBN. Updating from a global application may change only the title of a book. The global database is the virtual union of the databases in Warsaw and Gdansk or (alternatively) in Cracow and Gdansk. The access timeout for users is 300 seconds. The definition of the global view might look as shown below: create view MyBookDef { virtual objects MyBook {

} int accesstimeout := 300; int accesstimecracow := accesstimeout; int accesstimewarsaw := accesstimeout; if alive(cracow) then accesstimecracow := checkaccesstime(cracow); if alive( Warsaw ) then accesstimewarsaw := checkaccesstime(warsaw); if min(accesstimecracow, accesstimewarsaw) accesstimeout then { throw exception( accesstooslow ); return ; } return (Gdansk.Book if accesstimecracow < accesstimewarsaw then Cracow.Book else Warsaw.Book) as b /* returning seeds */ on_retrieve do { return b.( deref(title) as title, deref(author) as author, deref(price) as price, deref(isbn) as ISBN ) } on_delete do { delete ( if server(b) = Cracow then (Warsaw.Book where ISBN = b.isbn) else b) } on_update newbook do {( if server(b) = Cracow then (Warsaw.Book where ISBN = b.isbn) else b).title := newbook.title; } }... /* further elements of the view definition */ Note that the view refers to objects from a particular server using the name of its location (e.g. Warsaw.Book). The view delegates the updates to appropriate servers. The view definer can use implicitly several routines of the communication protocol, for instance, the navigation (. ), insert, delete, update ( := ). These remote operations are called as if they were local. There are also auxiliary routines of the communication protocol. In the example their calls are underlined. Functions alive and checkaccesstime are used to determine, which of the two local servers (Warsaw or Cracow) should be used when constructing the virtual objects. Function server determines which server is the origin for the given virtual object. This routine makes it possible to distinguish proper updating site, for instance, because objects in Cracow cannot be updated one must make appropriate updates in Warsaw. Note that the global clients of the view cannot see the origin of particular virtual objects. They just refer to the name of the view and attribute names, like in the following query (return the titles of all books by Lem): (MyBook where author = Lem ). title or in an updating statement (change title Salaris into Solaris ): (MyBook where title = Salaris ) := ( Solaris as title );

5 Conclusion We have presented a novel approach to implementation of grid databases based on updateable views. The approach fulfills some fundamental requirements for grid applications such as transparency, interoperability and computational universality. The approach is based on a powerful query language SBQL which ensures high abstraction level. The advantage is decreasing development time of a new grid application and simpler adaptation to changing requirements. The presented view mechanism is flexible and allows one to describe any mapping between local databases and a federated database. Because our views can also map methods, we offer facilities as powerful as CORBA or Web Services. We have implemented SBQL and virtual updateable views for XML repositories based on the DOM model. After this experience our nearest plans assume implementation of our approach to the grid technology within the prototype ODRA, an object-oriented database platform build from scratch by our team (ca. 20 researchers) under.net. Currently in ODRA we have finished implementation of SBQL and are advanced in implementation of other functionalities (imperative constructs, procedures, methods, views, distribution protocols, etc.) that are necessary to make our idea sufficiently ready for testing and prototype applications. References [1] Z.Bellahsene: Extending a View Mechanism to Support Schema Evolution in Federated Database Systems. Proc. of DEXA 1997, 573-582 [2] K.Kaczmarski, P.Habela, K.Subieta. Metadata in a Data Grid Construction. Workshop on Emerging Technologies for Next generation GRID (ETNGRID-2004), June 2004, Proc. published by IEEE, to appear [3] H.Kozankiewicz, J.Leszczyłowski, J.Płodzień, K.Subieta. Updateable Object Views. Institute of Computer Science Polish Ac. Sci. Report 950, October 2002 [4] H.Kozankiewicz, J.Leszczyłowski, K.Subieta. Updateable XML Views. Proc. of ADBIS 03, Springer LNCS 2798, 2003, 385-399 [5] H.Kozankiewicz, K.Stencel, K.Subieta. Integration of Heterogeneous Resources through Updatable Views. Workshop on Emerging Technologies for Next Generation GRID (ETNGRID-2004), June 2004, Proc. published by IEEE, to appear [6] H.Kozankiewicz, K.Subieta. SBQL Views Prototype of Updateable Views. Proc. 8th ADBIS Conf., September 2004, Budapest, Hungary, to appear [7] K.Subieta, Y.Kambayashi, J.Leszczyłowski. Procedures in Object-Oriented Query Languages. Proc. of 21-st VLDB Conf., 1995, 182-193 [8] K.Subieta. Mapping Heterogeneous Ontologies through Object Views. Proc. 3-rd Workshop on Engineering Federated Information Systems (EFIS 2000), IOS Press, 1-10, 2000 [9] K.Subieta. Theory and Construction of Object-Oriented Query Languages. Editors of the Polish-Japanese Institute of Information Technology, 2004, 530 pages