SODDA A SERVICE-ORIENTED DISTRIBUTED DATABASE ARCHITECTURE



Similar documents
Chapter 3. Database Environment - Objectives. Multi-user DBMS Architectures. Teleprocessing. File-Server

Distributed Database Management Systems for Information Management and Access

A Multidatabase System as 4-Tiered Client-Server Distributed Heterogeneous Database System

Tier Architectures. Kathleen Durant CS 3200

Motivation Definitions EAI Architectures Elements Integration Technologies. Part I. EAI: Foundations, Concepts, and Architectures

chapater 7 : Distributed Database Management Systems

Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

A Grid Architecture for Manufacturing Database System

Distributed Databases

COMP5426 Parallel and Distributed Computing. Distributed Systems: Client/Server and Clusters

A standards-based approach to application integration

Enterprise Application Designs In Relation to ERP and SOA

Lightweight Data Integration using the WebComposition Data Grid Service

A Generic Database Web Service

What You Need to Know About Transitioning to SOA

SQL Server Training Course Content

A Survey Study on Monitoring Service for Grid

DISTRIBUTED AND PARALLELL DATABASE

A Unified Messaging-Based Architectural Pattern for Building Scalable Enterprise Service Bus

Advanced Database Group Project - Distributed Database with SQL Server

Service Oriented Architecture (SOA) An Introduction

Service Oriented Architecture

Service Oriented Architecture: A driving force for paperless healthcare system

Protecting Database Centric Web Services against SQL/XPath Injection Attacks

TOP-DOWN APPROACH PROCESS BUILT ON CONCEPTUAL DESIGN TO PHYSICAL DESIGN USING LIS, GCS SCHEMA

Distributed Systems Architectures

Present a New Middleware to Control and Management Database Distributed Environment

An Overview of Distributed Databases

GORDA: An Open Architecture for Database Replication

Service Oriented Architecture

Alternatives to SNMP and Challenges in Management Protocols. Communication Systems Seminar Talk 10 Francesco Luminati

Introduction. Introduction: Database management system. Introduction: DBS concepts & architecture. Introduction: DBS versus File system

Create a single 360 view of data Red Hat JBoss Data Virtualization consolidates master and transactional data

Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases

Service Oriented Architectures

DATABASE SYSTEM CONCEPTS AND ARCHITECTURE CHAPTER 2

Introduction: Database management system

Service-oriented architecture in e-commerce applications

Deploying a distributed data storage system on the UK National Grid Service using federated SRB

Introduction to UDDI: Important Features and Functional Concepts

An empirical study of messaging systems and migration to service-oriented architecture

Event-based middleware services

Toward Next Generation Distributed Business Information Systems: Five Inherent Capabilities of Service-Oriented Computing

GENERIC DATA ACCESS AND INTEGRATION SERVICE FOR DISTRIBUTED COMPUTING ENVIRONMENT

Database Middleware and Web Services for Data Distribution and Integration in Distributed Heterogeneous Database Systems

Integration of Hotel Property Management Systems (HPMS) with Global Internet Reservation Systems

Run-time Service Oriented Architecture (SOA) V 0.1

Introduction to Parallel and Distributed Databases

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Middleware for Heterogeneous and Distributed Information Systems

Chapter 2 Database System Concepts and Architecture

XIII. Service Oriented Computing. Laurea Triennale in Informatica Corso di Ingegneria del Software I A.A. 2006/2007 Andrea Polini

HexaCorp. White Paper. SOA with.net. Ser vice O rient ed Ar c hit ecture

Guiding Principles for Modeling and Designing Reusable Services

CONDIS. IT Service Management and CMDB

ESB as a SOA mediator: Minimizing Communications Complexity

Principles of Distributed Database Systems

1 What Are Web Services?

Service-oriented Development of Federated ERP Systems

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

CHAPTER 1 INTRODUCTION

Outline. Definitions. The Evolution of Distributed Systems. Distributed Systems: The Overall Architecture. Chapter 5

Microsoft SQL Server for Oracle DBAs Course 40045; 4 Days, Instructor-led

Data processing goes big

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES

JOHN KNEILING APRIL 3-5, 2006 APRIL 6-7, 2006 RESIDENZA DI RIPETTA - VIA DI RIPETTA, 231 ROME (ITALY)

Evolution of Distributed Database Management System

Topics. Distributed Databases. Desirable Properties. Introduction. Distributed DBMS Architectures. Types of Distributed Databases

Christoph Bussler. B2B Integration. Concepts and Architecture. With 165 Figures and 4 Tables. IIIBibliothek. Springer

XOP: Sharing XML Data Objects through Peer-to-Peer Networks

Integration using IBM Solutions

B.Com(Computers) II Year DATABASE MANAGEMENT SYSTEM UNIT- V

Distributed Databases in a Nutshell

Unlocking the Power of SOA with Business Process Modeling

Data Mining Governance for Service Oriented Architecture

Enterprise Application Integration (Middleware)

Literature Review Service Frameworks and Architectural Design Patterns in Web Development

AHAIWE Josiah Information Management Technology Department, Federal University of Technology, Owerri - Nigeria jahaiwe@yahoo.

Introduction to Database Systems

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

What is Middleware? Software that functions as a conversion or translation layer. It is also a consolidator and integrator.

Platform Technology VALUMO for Dynamic Collaboration

Fig. 3. PostgreSQL subsystems

Reengineering Open Source CMS using Service-Orientation: The Case of Joomla

How To Create A C++ Web Service

The Information Revolution for the Enterprise

Getting started with API testing

Leveraging Service Oriented Architecture (SOA) to integrate Oracle Applications with SalesForce.com

3-Tier Architecture. 3-Tier Architecture. Prepared By. Channu Kambalyal. Page 1 of 19

OWB Users, Enter The New ODI World

PIE. Internal Structure

STRATEGIES ON SOFTWARE INTEGRATION

Distributed Database Management Systems

BUILDING OLAP TOOLS OVER LARGE DATABASES

XBRL Processor Interstage XWand and Its Application Programs

Architecting for the cloud designing for scalability in cloud-based applications

Overview: Siebel Enterprise Application Integration. Siebel Innovation Pack 2013 Version 8.1/8.2 September 2013

Development of a generic IT service catalog as pre-arrangement for Service Level Agreements

Getting Started with Service- Oriented Architecture (SOA) Terminology

Transcription:

SODDA A SERVICE-ORIENTED DISTRIBUTED DATABASE ARCHITECTURE Breno Mansur Rabelo Centro EData Universidade do Estado de Minas Gerais, Belo Horizonte, MG, Brazil breno.mansur@uemg.br Clodoveu Augusto Davis Jr. Instituto de Informática Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, MG, Brazil clodoveu@pucminas.br Keywords. Abstract. Distributed database, service-oriented architecture, web services, database management system, heterogeneous data sources, data integration, distributed query processing. The dissemination of information systems over computer networks has reinforced the interest on distributed database management systems (DDBMS). A review on the architecture of such systems is motivated by the wide availability of networking resources, which allows the cost of communication among the nodes to be reduced. Besides, the coming of age of syntactic interoperability standards, such as, and of service-based networking allow for new alternatives for the implementation and deployment of distributed databases. This paper proposes the adoption of elements from service-oriented architectures for the implementation of the connections among distributed database components, thus configuring a service-oriented distributed database architecture. 1 INTRODUCTION Organizations are actively pursuing the integration of their systems with those of their partners, with the intention of achieving improvements in their production chains and logistics processes. Managing distributed data, stored in physically distinct and independent servers, is often required. As the volume of distributed data grows, the use of distributed database management techniques increases in importance. With such techniques, one can build an integrated database from isolated and independent segments. The cost of high-capacity servers and the operational decentralization of several types of organizations are among the reasons for this tendency. Kossman (2000) points out that, even though good ideas have been presented by research initiatives on distributed database management systems (DDBMS), the prototypes that have been developed did not make it to commercial tools. Kossman (2000) believes that such research may have taken place ahead of the ideal time, mainly because of the lack of a communications infrastructure as stable and inexpensive as the Internet. With the growing availability of network communication resources, through the Internet, along with growing quality of service and a strong trend towards cost reduction, some of the obstacles in the path of developing distributed databases have vanished. However, the theoretical background in this field still does not consider peculiar aspects related to the use of Internet standards such as (Bray, Paoli et al. 2006), services (Alonso, Casati et al. 2004), SOAP (Simple Object Access Protocol) (Curbera, Duftler et al. 2002) and SOA (-Oriented Architecture) (Endrei, Ang et al. 2004; Huhns and Singh 2005), among others. Such recent technological elements can be put together, in a review of DDBMS concepts, using SOA to implement Internet-based data exchange. Current demand for distributed databases makes it possible to envision the use of previously developed ideas in a more favorable technological context, thus generating interesting alternatives to centralized DBMS. Recent works (Campbell 2005; Tok and Bressan 2006) present initiatives related to the integration of

SOA and DBMS, thus proposing a service-oriented database architecture (SODA). We propose an extension of SODA, as a new architecture for Internet-based distributed database management using SOA-based communication. The remainder of this paper is organized as follows. Section 2 presents related work. Sections 3 and 4 present the proposed architecture and a prototype implementation. Finally, Section 5 presents conclusions and notes for future work. 2 SERVICE-ORIENTED DATABASE ARCHITECTURE SODA represents a step towards the integration between database systems and SOA. SODA uses services to publish functions and procedures that can handle the content of databases across the Internet. SODA has been proposed (Campbell 2005) in the context of the creation of SOA support structures in the core of Microsoft s Server. The paper presents details on a novel HTTP daemon in the DBMS core, so that the DB server and the server can be independent, while allowing the DBMS to respond to HTTP requests. According to Tok and Bressan (2006), several commercial database management systems are beginning to include features for the integration between services and stored procedures, and mention Server as an example. However, this integration is superficial; the database query processor ends up ignoring the communications impact in its cost calculations. We agree with the notion that the ideas behind the SODA architecture, as proposed by Campbell (2005), can lead to interesting technological solutions to many distributed database issues which have been previously documented. We present a proposal for the evolution of SODA towards distributed databases, in a SOA-based context. 3 SODDA SODDA works out the main issues involving communication needs in distributed database environments. Basically, SODDA merges DDBMS concepts with new technologies and initiatives associated to SOA and the Internet. We propose using service-oriented protocols over the Internet as the principal means of communication for a DDBMS. This approach makes it necessary to review the situations in which communications take place among DDBMS nodes, so that services and SOA features can be used as an alternative. We describe the necessary adaptations next. SODDA uses services to coordinate operations among distributed database nodes. Each node includes a service to coordinate the local database, and which is capable to respond to a client data provider, called the SODDA Hub, when the node receives requests for queries or other database operations (Figure 1). SODDA Hub can be seen as a common connectivity middleware. It runs on the client s side and it is responsible for all interactions with the SOA components of SODDA. Therefore, in the application developer s point of view, SODDA is a fully transparent and modular distributed data provider that can be used in a way that is similar to OLEDB, JDBC or ODBC. Unlike SODA, in which services are used only to expose stored procedures that can handle data, SODDA uses services as fundamental components of the communication among nodes. This is a typical activity in SOA. All operations that are submitted to database nodes are conducted through the SODDA Hub, which is also capable of accessing the distributed database s global catalog. Direct access to local databases is encapsulated by two elements of the architecture: Node s and Data Source s. services act as Node s. In any node of the distributed environment there must be a service to respond to client invocations. To execute commands in the local context of a node, the Node must use a Data Source. These wrappers implement methods to adapt and convert standard into specific query commands at each data source. In a DDBMS, the global catalog stores data on fragments and their location. In SODDA, the global catalog is replaced by a catalog service, which provides information about the location of Node s. This brings location transparency, one of the main SOA advantages, to SODDA. The catalog service only contains locations and basic information on Node s, but includes the statistics service, which contains methods to retrieve data on communication and storage costs from every node, and to prepare these data for use in cost-based query optimization. The catalog service is accessed only to determine the nodes that are possibly involved in the operation, and the statistical data guides the selection of the best alternatives among redundant nodes. The SODDA Hub then ex-

changes documents directly with the Node s, in order to retrieve data to respond to a client request. This exchange is fundamentally important for the distributed query processing. Catalog Database Client SODDA Hub Internet Database Statistics % Figure 1. Basic layout of the service-oriented distributed database architecture. Distributed query processing is naturally one of the most important functions in a DDBMS, since it is used in every data retrieval situation involving fragments that are in various sites. In SODDA, we assume that the distributed database nodes are managed by regular DBMSs with local autonomy. Therefore, these nodes also include the internal features of a conventional DBMS. Considering this, SODDA includes a query processor with optimization in two levels. The first level deals with distribution aspects, based on metadata from the catalog service and from the statistics service. The optimizer gathers statistics on the tables that are involved in a query, using a cost-based strategy supported by the statistics service. The second optimization level handles the part of the query which is transformed into a local query in the point of view of a given node, and executed using the native query processing and optimization methods available to the local DBMS. Since part of the optimization work is performed by the conventional DBMSs, SODDA can focus on distribution and consolidation issues. The query processing mechanism in the SODDA architecture is based on the basic model presented by Kossman (2000). The query processor is divided into three parts: the Query Decomposition Engine (QDE), the Distribution Optimizer (DO), and the Distributed Execution Engine (DEE) (Figure 2). QDE is responsible for partitioning the queries, finding out which nodes are involved in the operation, decomposing the query into sub queries, based on the fragmentation schema. QDE generates query parts (QP), each of local scope, in regard to a DDBMS node. DO then must determine which tasks can be performed in parallel, building a tree to guide the work of ing the QP. The result of this phase is the execution plan, obtained from a tree, built from the dependencies among the QP. Independent QP s, therefore, are always defined as leaves in this tree. Finally, the DEE traverses the tree generated by DO, and commands the execution of each QP. DEE can identify nodes which can receive query parts simultaneously, and which therefore can execute their tasks in parallel. When results are received, DEE dynamically creates a representation of global schema from the distributed environment, and copies partial results. Finally, DEE performs the required s over partial results in a temporary swap database, called Dynamic Swap (DS), in order to generate the final results. The DS is either handled at the SODDA Hub s site, or at a different node, added to the network precisely to support thin or mobile clients. Standard input Query SODDA Hub Query Decomp. Engine Query Catalog Partitioned query Nodes locations Nodes & Tables Statistics % Figure 2. Distributed query processing in SODDA. At this point, we observe that SODDA presents not only location transparency, but also access transparency, since the access to the distributed database is performed from the client s location, running the SODDA Hub, which concentrates services required to access the distributed database. Beyond the query processor, a DDBMS must implement some other services to manage transactions and replication features of the distributed environment. We propose three additional services to that effect: the Distributed Transactions Manager (DTM), the Replicated Data Manager (RDM), and the Database Recovery (DRS). Distribution Optmizer Execution plan Statistical data Distributed Execution Engine WS Data Join results Results WS Data

The DTM is a mechanism that deals with transaction atomicity in a distributed database environment. In SODDA, the Hub coordinates the execution of the transaction. The original transaction, through the SODDA Hub, starts sub-transactions at each node involved in the process, and waits for their completion. In case of failure at any particular node, the Hub issues messages, using services, to cause a rollback at every node involved. If every node completes its task successfully, the Hub issues messages to cause a commit at every node. The success in the implementation of this procedure requires the implementation of a services-based distributed commit protocol. The RDM implements such a protocol to promote synchronization of data that exist in different nodes and are possibly replicated. RDM uses a publish/subscribe method. The nodes that subscribe to a topic receive change notifications. If a node becomes offline, the previously notified changes must be processed at the moment it becomes online again, before it can accept any further commands. The DRS is a mechanism that is responsible for finding out whether or when a node becomes inaccessible, and for determining which redundant copy (if any) can replace it during the period of unavailability. This mechanism is integrated to the RDM since, if the distributed database manages redundant data, every modification is propagated using the publish/subscribe notification system, in order to ensure the synchronization. Therefore, available nodes receive changes immediately, and unavailable nodes receive updates later, in a queue. 4 SODDA PROTOTYPE To validate proposed architecture, we have implemented a SODDA prototype as a Microsoft.NET data provider. However, we cannot provide many implementation details due to space limitations. Results obtained so far demonstrate the feasibility of the ideas and proposals presented here, along with a large potential to support future work, both in research and in commercial applications. In order to test our prototype, we developed three Data Source s, respectively for Server, Server CE and Access. Support for data sources such as My, Oracle, files, and CSV files is planned for future releases. The prototype currently only implements data retrieval features. Data manipulation is to be included in the next releases of the SODDA data provider. 5 CONCLUSIONS SODDA intends to use some of the most interesting features of SOA to implement distributed databases. Expected benefits include easier implementation, lower communications costs, and greater access capillarity. The proposed statistics service facilitates query optimization, since it unifies the treatment of performance-oriented metadata and allows implementation of automatic statistics updating. An extensive list of possibilities for future work presents itself at this stage. We observe that the distributed database nodes do not, necessarily, need to be managed by a full DBMS. Since they are accessed only through services, it would be possible to have other data sources, such as spreadsheets, pages and others. Data Source s can be written to use these sources as nodes on a SODDA-based environment. Another possibility involves the implementation of hotswapping of nodes, making it possible to achieve full availability for, say, equipment maintenance, through the simple modification of catalog entries. 6 REFERENCES Alonso, G., F. Casati, et al. (2004). -s: Concepts, Architecture and Applications, Springer Verlag. Bray, T., J. Paoli, et al. (2006). "Extensible Markup Language () 1.0 (Fourth Edition) - W3C Recommendation 16 August 2006, edited in place 29 September 2006." Retrieved 24/04/2007, from http://www.w3.org/tr/xml/. Campbell, D. (2005). Oriented Database Architecture: App Server-Lite? Proceedings of the ACM SIGMOD International Conference on Management of Data. Curbera, F., M. Duftler, et al. (2002). Unraveling the s : An Introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing. Vol 6: 86-93. Endrei, M., J. Ang, et al. (2004). Patterns: - Oriented Architecture and s, International Business Machines Corporation (IBM). Huhns, M. and M. P. Singh (2005). -Oriented Computing: Key Concepts and Principles. IEEE Internet Computing. Vol 9: 75-81. Kossman, D. (2000). The State of the Art in Distributed Query Processing. ACM Computing Surveys. Vol. 32: 442-469. Tok, W. H. and S. Bressan (2006). DBNet: A - Oriented Database Architecture. Proceedings of the 17th International Conference on Database and Expert Systems Applications (DEXA'06).