A Document Management System Based on an OODB




Tamkang Journal of Science and Engineering, Vol. 3, No. 4, pp. 257-262 (2000)

Ching-Ming Chao
Department of Computer and Information Science, Soochow University, Taipei 100, Taiwan, R.O.C.
E-mail: chao@cis.scu.edu.tw

Abstract

Efficient document management is extremely important because modern information systems produce and access a tremendous volume of documents. This paper describes a document management system that stores SGML documents in an ObjectStore object-oriented database and, by accommodating multiple DTDs, can store different types of documents within one database. We create an object type for all DTDs and store each DTD as an object of that object type. We also create an object type for each element definition in a DTD and store each element of an SGML document as an object. This database representation facilitates declarative querying and fine-grained modification of documents. The system supports automatic creation of object types and automatic insertion of documents into the database. Two different interfaces are provided for the user to retrieve, modify, and delete documents. The system supports declarative queries on documents, posed with respect to either their contents or their structure.

Key Words: Document Management, Object-Oriented Database, SGML

1. Introduction

Structured documents are central to a wide class of applications such as software engineering, digital libraries, and information retrieval. The ever-increasing volume of structured documents produced by modern information systems makes efficient document management extremely important. Recognizing and marking up the internal structure of structured documents increases the efficiency of document retrieval in document management systems. SGML [4] has been widely used for marking up structured documents; it is a standard markup language for document description.
It is designed specifically to enable text interchange, so that marked-up documents can be used and exchanged among different systems and platforms. It can also be used to add logical structure information to documents, which gives them greater applicability. Researchers have recognized that the management of structured documents can benefit notably from database support, and a current trend is to employ object-oriented database technology for this purpose.

In this paper, we report our work on developing an SGML document management system. The system stores SGML documents in an ObjectStore object-oriented database and supports insertion, modification, retrieval, and deletion of documents. For the database representation of documents, we create an object type for each element definition in a document type definition (DTD), so that each element of an SGML document is stored as an object. This storage representation facilitates declarative querying and fine-grained modification of documents. The system can store, within one database, different types of documents by accommodating multiple DTDs. The system includes three primary components: a DTD parser, an SGML parser, and a query processor. We assume that SGML documents and their associated DTDs have already been created and validated by an authoring tool. The DTD parser accepts a new DTD and automatically generates object types that correspond to the elements defined in the DTD. The SGML parser accepts an SGML document instance and automatically inserts the document into the database by instantiating the objects that correspond to the elements in the document. The query processor is responsible for retrieval, modification, and deletion of documents. The system supports declarative queries on documents, which can be posed with respect to their contents or their structure.

The rest of this paper is organized as follows. In Section 2 we review previous work on storage and retrieval of structured documents (in particular, SGML documents). In Section 3 we briefly introduce the syntactic structure of SGML documents and investigate how to represent SGML documents in an object-oriented database. In Section 4 we present the document management system. Section 5 concludes this paper and suggests future research directions.

2. Related Work

In this section we briefly review previous work on storage and retrieval of structured documents (in particular, SGML documents). These approaches differ mainly in how documents are stored and accessed. Schouten [8] used the relational data model to design an SGML document database. Because of the hierarchical and intricate structure of SGML documents, relational databases with flat tables and scalar data types are ill-suited to storing these documents, for at least two reasons. First, mapping SGML documents into relational tables is a complicated and unnatural process that may lose some structural information.
Second, because a document is scattered over several tables, retrieving a document from the database requires several join operations and is therefore inefficient. VERSO [3], developed at INRIA in France, is an object-oriented database system for SGML documents. It is built on top of the O2 object-oriented database management system to exploit O2's sophisticated type system and its extensible query language, O2SQL. Using an extended version of the Euroclid SGML parser, VERSO maps DTDs into O2 schemas and document instances into corresponding objects. This required extending the O2 data model with union types and ordered tuples; VERSO also extends O2SQL for document retrieval. HyperStorM [2] (Hypermedia Document Storage and Modeling) is a project developed at GMD-IPSI in Germany and built on the VODAK object-oriented database management system. The Structured Document Database component [1] of HyperStorM investigates various object-oriented technologies for structured documents. It suggests a hybrid database-internal representation for documents: some elements are represented by individual database objects, while others (the flat elements) are not. This representation can be configured for each document type. HyperStorM also proposes the concept of query templates as a declarative access mechanism for SGML documents. Ozsu et al. [6] developed an object-oriented multimedia database management system that can store and manage SGML/HyTime compliant multimedia documents. The system can store and manage different types of documents in one database by dynamically creating object types according to the element definitions in each DTD. It also has tools to automatically insert marked-up documents into the database and provides facilities for querying these documents with respect to both their contents and their structure.
Sengupta and Dillon [9] proposed an approach to representing SGML documents that differs from those mentioned above. They argued that converting SGML documents into database formats is unnatural and may lose information. Their system keeps a set of SGML documents in a repository and poses queries directly on these documents. They also proposed a query language based on the SQL standard and a query interface based on QBE.

3. An Object-Oriented Document Database

In this section, we investigate the issue of representing SGML documents in an object-oriented database. Before doing that, we first need to understand the basic concepts and syntactic structure of SGML documents. An SGML document is composed of three parts: an SGML declaration, a document type definition (DTD), and a document instance (DI). The SGML declaration defines the character set and any special SGML features used in the document; if it is absent, the default declaration is used. The document type definition of a document defines the structure of the document and the rules for marking up the document instance. Because many documents can share the same DTD, the DTD is usually stored separately from the document instance; this makes the document itself more concise and lets different documents share the DTD. The document instance contains the content and tags of the document, including a reference to its DTD, and is marked up according to the rules defined by the DTD. Figure 1 shows the document instance of a memo document and Figure 2 shows its DTD.

    <!DOCTYPE Memo SYSTEM "C:\Memo.dtd">
    <Memo>
      <To> All Employees </To>
      <From> The President </From>
      <Body>
        <P> In the last year, our company earnings increased 100%. It is good news.
        Please remember: <Q> Working hard is the best policy. </Q>
        I hope our company will be better tomorrow. </P>
      </Body>
      <Close> Isaac Newton </Close>
    </Memo>

Figure 1. A Document Instance

    <!-- DTD for simple memoranda -->
    <!ELEMENT Memo  - - ((To & From), Body, Close?)>
    <!ELEMENT To    - O (#PCDATA)>
    <!ELEMENT From  - O (#PCDATA)>
    <!ELEMENT Body  - O (P*)>
    <!ELEMENT P     - O (#PCDATA | Q)*>
    <!ELEMENT Q     - - (#PCDATA)>
    <!ELEMENT Close - O (#PCDATA)>
    <!ATTLIST Memo status (confiden | public) public>
    <!-- End of DTD -->

Figure 2. A Document Type Definition

In Figure 3 we formally specify the syntactic structure of SGML documents in the OMT notation [7]. An SGML document has a name, an optional SGML declaration, one or more DTDs, and an element. An element may contain text data and/or component elements and may have any number of attributes. An element and its attributes are defined by their corresponding definitions in the DTD. A DTD has a name, any number of entity definitions, notation definitions, and public texts, and at least one element definition. An element definition contains either a content model or declared content, together with an optional exception list. A model group is defined recursively. An element definition may be associated with the definition of its attributes. We will not go into further details of the syntactic structure of SGML documents; the interested reader is referred to [5].

[Figure 3. A Class Diagram of SGML Documents, showing the classes SGML Document, SGML Declaration, DTD, Element, Attribute, Entity Definition, Notation Definition, Public Text, Element Definition, and Attribute Definition]

Now that we have learned the syntactic structure of SGML documents, it is time to discuss how to represent SGML documents in an object-oriented database. First, let us discuss how to store document type definitions. A DTD may include entity definitions, element definitions, and attribute definitions. An entity definition defines a symbolic name for any type of data; ENTITY is the keyword for an entity definition and is followed by the symbolic name and the data of the entity. An element definition defines the structure of an element of a document; ELEMENT is the keyword for an element definition and is followed by the element name, a two-character tag omission indicator, and a content model or declared content. An element may have an attribute definition that defines one or more attributes of the element; ATTLIST is the keyword for an attribute definition and is followed by the element name and one or more attribute declarations. Each attribute declaration contains the name, all possible values, and the default value of the attribute. In addition to storing each DTD in a file, we create an object type for all DTDs and store each DTD as an object of that type. The structure of the object type for all DTDs is shown in Figure 4, which is drawn in the OMT notation.

[Figure 4. Object Types for All DTDs: a DTD (name) with entity definitions (name, data), element definitions (name, tag omission, content model), and attribute definitions (name, all values, default value)]

Now let us discuss how to store document instances. We create an object type for each element definition in a DTD, so that each element of an SGML document is stored as an object. This storage representation facilitates declarative querying and fine-grained modification of documents. Our system can store, within one database, different types of documents by accommodating multiple DTDs. This is accomplished by creating different object types for different DTDs. For example, the object types created for the element definitions in the memo DTD are shown in Figure 5.

[Figure 5. Object Types for Elements in Memo DTD: Memo, To (text), From (text), Body, P (text), Q (text), and Close (text)]

4. A Document Management System

A document management system must support storing and accessing documents. The system we developed can be used to define an ObjectStore object-oriented database and store SGML documents in it; in addition, it supports retrieval, modification, and deletion of SGML documents. The architecture of our document management system is shown in Figure 6. The system includes three primary components: a DTD parser, an SGML parser, and a query processor. The DTD parser accepts a new DTD and automatically transforms it into a collection of object type definitions, each corresponding to an element definition in the DTD. The type generator is responsible for creating these object types in the database. The SGML parser accepts an SGML document, parses it, and breaks the document instance into elements. The instance generator is responsible for automatically storing the document instance in the database by instantiating objects of the appropriate object types, each object corresponding to an element of the document instance. The query processor is responsible for retrieval, modification, and deletion of documents.

[Figure 6. The System Architecture: DTDs pass through the DTD parser and the type generator, and SGML documents pass through the SGML parser and the instance generator, into the database managed by the OODBMS; the query processor accesses the database]

The system provides a graphical user interface. Figure 7 shows the main menu of the system, which contains seven menu items: File, Edit, View, Parser, Database, Window, and Help. When a menu item is selected, a pull-down menu containing several commands is displayed.

Figure 7. Main Menu of the Document Management System

The File menu includes commands for opening a file, closing a file, saving a file, saving a file under another name, previewing a file, setting up the print format, printing a file, and exiting the program. The Edit menu includes commands for undoing (and redoing) the previous command, cutting, copying, pasting, and searching and replacing the content of a file. The View menu includes commands to show or hide the toolbar and the status bar. The Parser menu includes commands for invoking the DTD parser and the SGML parser. The Database menu is used for retrieving, modifying, and deleting documents. The Window menu includes commands for opening a new window and arranging open windows in cascade or tile layout. Finally, the Help menu provides online help for using this SGML document management system.

The system provides two different interfaces for the user to retrieve, modify, and delete SGML documents: one is command-driven and the other is form-driven. In the command-driven interface, the user enters statements in an Object SQL-like language.
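The type-generator and instance-generator pipeline described above can be illustrated with a small in-memory sketch. It is written in Python and is not the paper's implementation: Python classes created with the built-in type() stand in for ObjectStore object types, xml.etree.ElementTree stands in for the SGML parser (this works only because every tag in Figure 1 is closed; documents relying on the tag omission declared in Figure 2 would need a real SGML parser), and the helper names parse_dtd, generate_types, load_instance, find, and select are hypothetical. The DTD text is the element declarations of Figure 2, and the contains condition is assumed to be a plain substring test.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical stand-in for the DTD parser: recognises only
# <!ELEMENT name start-omission end-omission content-model> declarations.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+([-O])\s*([-O])\s+(.+?)\s*>")


@dataclass
class DTDObject:
    """One object per DTD, loosely modelled on Figure 4."""
    name: str
    element_defs: dict = field(default_factory=dict)  # element name -> content model


class ElementObject:
    """Base class for the object types generated from element definitions."""

    def __init__(self, text, attributes):
        self.text = text              # normalised text of the element and its descendants
        self.attributes = attributes  # attribute name -> value
        self.children = []            # component element objects, in document order


def parse_dtd(name, dtd_text):
    dtd = DTDObject(name)
    for elem, _start, _end, model in ELEMENT_DECL.findall(dtd_text):
        dtd.element_defs[elem] = model
    return dtd


def generate_types(dtd):
    # "Type generator": one class per element definition (Memo, To, From, ...).
    return {n: type(n, (ElementObject,), {"content_model": m})
            for n, m in dtd.element_defs.items()}


def load_instance(elem, types):
    # "Instance generator": instantiate the generated type matching each tag.
    text = " ".join("".join(elem.itertext()).split())
    obj = types[elem.tag](text, dict(elem.attrib))
    obj.children = [load_instance(child, types) for child in elem]
    return obj


def find(obj, tag):
    """Yield obj and every nested element object whose type is named tag."""
    if type(obj).__name__ == tag:
        yield obj
    for child in obj.children:
        yield from find(child, tag)


def select(docs, conditions):
    # Rough analogue of SELECT * FROM Memo WHERE Memo.To contains "..." and ...
    return [d for d in docs
            if all(any(phrase in e.text for e in find(d, tag))
                   for tag, phrase in conditions)]


MEMO_DTD = """
<!ELEMENT Memo  - - ((To & From), Body, Close?)>
<!ELEMENT To    - O (#PCDATA)>
<!ELEMENT From  - O (#PCDATA)>
<!ELEMENT Body  - O (P*)>
<!ELEMENT P     - O (#PCDATA | Q)*>
<!ELEMENT Q     - - (#PCDATA)>
<!ELEMENT Close - O (#PCDATA)>
"""

# Figure 1 without its DOCTYPE line; every tag is closed, so an XML parser can read it.
MEMO_DOC = """<Memo><To>All Employees</To><From>The President</From>
<Body><P>In the last year, our company earnings increased 100%. It is good news.
Please remember: <Q>Working hard is the best policy.</Q>
I hope our company will be better tomorrow.</P></Body>
<Close>Isaac Newton</Close></Memo>"""

types = generate_types(parse_dtd("Memo", MEMO_DTD))
memo = load_instance(ET.fromstring(MEMO_DOC), types)
print(sorted(types))  # → ['Body', 'Close', 'From', 'Memo', 'P', 'Q', 'To']
print(len(select([memo], [("To", "All Employees"), ("From", "The President")])))  # → 1
```

Because each element is a separate object, the To and From conditions are checked against the To and From objects only, rather than against the whole document text. In the system described here, these objects would be persistent objects in the ObjectStore database rather than entries in a Python dictionary.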
The statement for retrieving documents takes the form

    SELECT elements FROM DTD WHERE condition

where DTD specifies the DTD of the collection of documents to be searched, condition specifies a condition to be satisfied by the retrieved documents, and elements specifies the elements of the documents to be displayed. For example, the following statement

    SELECT * FROM Memo WHERE Memo.To contains "All Employees" and Memo.From contains "The President"

retrieves all documents of the Memo DTD in which the element To contains "All Employees" and the element From contains "The President". The statement for modifying documents takes the form

    UPDATE DTD elements modification WHERE condition

where DTD specifies the DTD of the collection of documents to be updated, condition specifies the condition to be satisfied by the updated documents, and elements modification specifies how the elements of the documents are to be modified. There are two ways to modify an element: one is to replace the whole element and the other is to replace only part of the element. For example, the following statement

    UPDATE Memo replace "President" in Memo.From by "Chair" WHERE Memo.From contains "The President"

modifies all documents of the Memo DTD in which the element From contains "The President", replacing "President" in the element From by "Chair". The statement for deleting documents takes the form

    DELETE FROM DTD WHERE condition

where DTD specifies the DTD of the collection of documents to be deleted and condition specifies the condition to be satisfied by the deleted documents. For example, the following statement

    DELETE FROM Memo WHERE Memo/status = confiden

deletes all documents of the Memo DTD in which the value of the attribute status of the element Memo is confiden.

In the form-driven interface, the user first selects the statement (retrieve, modify, or delete) and the DTD of the documents to be accessed. The system then presents a form specific to that statement and DTD, and the user only has to fill in the required information to execute the statement.

5. Conclusion

In this paper we described an SGML document management system based on an object-oriented database. The system stores SGML documents in an ObjectStore object-oriented database. We create an object type for all DTDs and store each DTD as an object of that object type. We create an object type for each element definition in a DTD and store each element of an SGML document as an object. This database representation facilitates declarative querying and fine-grained modification of documents. The system supports automatic creation of object types and insertion of documents into the database. It provides two different interfaces for the user to retrieve, modify, and delete documents. Currently the system supports only declarative queries; we plan to add navigational access to make it useful in the WWW environment.

Manuscript Received: Apr. 12, 2000; Accepted: Nov. 23, 2000

References

[1] Bohm, K. and Aberer, K., HyperStorM - Administering Structured Documents Using Object-Oriented Database Technology, in Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Canada, p. 547 (1996).
[2] Bohm, K., Aberer, K., Neuhold, E.J. and Yang, X., Structured Document Storage and Refined Declarative and Navigational Access Mechanisms in HyperStorM, The VLDB Journal, Vol. 6, No. 4, pp. 296-311 (1997).
[3] Christophides, V., Abiteboul, S., Cluet, S. and Scholl, M., From Structured Documents to Novel Query Facilities, SIGMOD Record, Vol. 23, No. 2, pp. 313-324 (1994).
[4] Goldfarb, C.F., The Standard Generalized Markup Language (ISO 8879), International Organization for Standardization, Geneva (1986).
[5] Maler, E. and El Andaloussi, J., Developing SGML DTDs: From Text to Model to Markup, Prentice Hall PTR, Upper Saddle River, New Jersey (1996).
[6] Ozsu, M.T., Iglinski, P., Szafron, D., El-Medani, S. and Junghanns, M., An Object-Oriented SGML/HyTime Compliant Multimedia Database Management System, in Proceedings of the 1997 ACM Multimedia Conference, Seattle, Washington, USA (1997).
[7] Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F. and Lorensen, W., Object-Oriented Modeling and Design, Prentice Hall, Englewood Cliffs, New Jersey (1991).
[8] Schouten, H., SGML*CASE: The Storage of Documents in Databases, The Netherlands Ministry for Agriculture and Fisheries, Wageningen (1989).
[9] Sengupta, A. and Dillon, A., Extending SGML to Accommodate Database Functions: A Methodological Overview, Journal of the American Society for Information Science, Vol. 48, No. 7, pp. 629-637 (1997).