MASARYK UNIVERSITY
FACULTY OF INFORMATICS

A Metadata-Driven Approach to Relational Database Management

DISSERTATION PROPOSAL

Mgr. Vojtěch Přehnal

Supervisor: doc. RNDr. Ivan Kopeček, CSc.

Brno, January 2012
Acknowledgements

I would like to thank my supervisor, Ivan Kopeček, for his helpful discussions and advice, as well as the members of LSD (Laboratory of Searching and Dialogue) for their support and creative working environment.
Contents

1 Introduction
2 State of the Art
  2.1 Automated SQL code generation
  2.2 Automated Rich User Interface Generation
  2.3 Serialization Schema Redefinition
  2.4 Relational Metadata Models
3 Dissertation Thesis Intent
4 Achieved Results
  4.1 Relational Schema Model (RSM)
  4.2 Relational Schema Protocol (RSP)
5 References
6 Summary of Study Results
1 Introduction

Relational databases, originally introduced by E. F. Codd in [4], have emerged as the predominant means of data storage in various industries, such as finance, banking and accounting, manufacturing and logistics, human resources management, medical care, public administration and many more [6]. The most widely used language for relational database management is SQL (Structured Query Language). It is standardized by ISO (International Organization for Standardization) and is supported by all major vendors of relational database management systems, such as Microsoft, Oracle, IBM or Sybase. SQL comprises DML (Data Manipulation Language) statements (e.g. SELECT, INSERT, UPDATE, DELETE) for data manipulation, DDL (Data Definition Language) statements (e.g. CREATE, ALTER, RENAME, DROP) for schema alteration, and others [1].

SQL statements contain relational schema elements (such as names of columns or tables) in their syntax; hence, in order to execute SQL statements on a database, the schema of the database has to be known. In simple scenarios, when the relational schema is invariable, the application logic is customized precisely for the particular schema: the appropriate SQL queries are stored in the database, hard-wired in the application tier, or computed on-the-fly from a fixed hierarchy of classes generated by an ORM (Object-Relational Mapping) tool. The interacting applications perform custom business logic on the particular schema and, possibly, have a custom-designed user interface.

As the relational schema evolves, the SQL queries have to be redefined and the interacting applications have to be reimplemented and recompiled. This is a serious issue for evolving data models, because every change in the relational schema requires additional work by programmers. In other words, the user cannot perform any action that results in redefining the schema, which becomes a limitation in many cases.
When users need to change the schema, they cannot do so themselves. Instead, they have to contact the vendor of the application and wait for their requirements to be fulfilled. This takes time, which may cause considerable financial loss for the company and may discourage users from raising new requirements, even though these could bring additional gains. In some cases it is necessary to stop the database for a while (during the schema adjustments), which may cause additional losses or may be completely impossible. Furthermore, users have to pay for something they could easily do themselves with a few mouse clicks. They are dependent on the supplier of the application, and if the contract is terminated, any kind of maintenance effectively becomes impossible.

Last but not least, most of the workload in the development of data-driven applications is spent on simple, fully automatable tasks, such as altering the database schema, altering the source code, particularly the data model (classes and objects), either manually or automatically using ORM tools, implementing business logic for CRUD (Create, Read, Update, Delete) operations, creating the user interface, or redefining the data serialization schema for transport over the network [12].

In order to access an evolving relational schema in real-time, without the need to rewrite and/or recompile the source code, the application is required to retrieve the relational schema from the database at run-time and build SQL queries on-the-fly. However, this introduces several challenges:

QUERYING INFORMATION SCHEMA: The only way of retrieving metadata about the relational schema out-of-the-box (using only the resources of database engines) is querying a set of
system views called INFORMATION SCHEMA [7]. These views have several limitations, though: they provide very limited metadata about the relational schema and are not easily extensible. They are incompatible across different database engines, and their performance may degrade considerably as the number of tables increases.

DESIGN-TIME OBJECT-RELATIONAL MAPPING EXCLUSION: Relations (tables) cannot be mapped to the data model of the application (internal classes and objects) at design-time (including automated mapping using ORM tools). Instead, the data has to be retrieved dynamically and mapped to in-memory objects in real-time using specialized algorithms based on the current relational metadata.

SERIALIZATION SCHEMA REDEFINITION: In order to exchange relational data over a shared environment (e.g. the internet), the format (i.e. serialization schema) of the data has to be defined and shared by the interacting applications [8]. The serialization schema is typically mapped to the data model (classes and objects) of the interacting applications at design-time; hence, as the relational schema evolves, the serialization schema has to be redefined and the interacting applications have to be recompiled. This contradicts the primary objective: avoiding recompilation of the application in the course of relational schema evolution.

AUTO-GENERATED USER INTERFACE: The user interface has to be inferred from the relational schema and generated on-the-fly instead of being designed and customized by the software vendor. This places high demands on metadata models, which must supply sufficient information for generating a rich user interface. The current relational metadata models, such as INFORMATION SCHEMA, CWM or OIM, do not provide sufficient information; hence they have to be extended [5]. This introduces further problems with synchronization between relational metadata and the relational schema (more in the next chapter).
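The first challenge, retrieving the schema at run-time and building SQL on-the-fly, can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the approach proposed in this work: it uses Python with an in-memory SQLite database, and because SQLite does not offer INFORMATION SCHEMA views, the engine-specific PRAGMA table_info is used as a stand-in. The helper names quote, read_columns and build_select are hypothetical.

```python
import sqlite3


def quote(identifier):
    """Quote an SQL identifier so that it can be embedded in generated SQL."""
    return '"' + identifier.replace('"', '""') + '"'


def read_columns(conn, table):
    """Return the column names of `table` as reported by the engine at run-time."""
    # Assumption: PRAGMA table_info stands in for INFORMATION_SCHEMA.COLUMNS.
    # Each returned row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({quote(table)})").fetchall()
    return [row[1] for row in rows]


def build_select(conn, table):
    """Build a SELECT statement from the current schema, not from a hard-wired one."""
    column_list = ", ".join(quote(c) for c in read_columns(conn, table))
    return f"SELECT {column_list} FROM {quote(table)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int, LastName varchar(255))")
before = build_select(conn, "Persons")

# The schema evolves; the unchanged code now produces a different query.
conn.execute("ALTER TABLE Persons ADD COLUMN DateOfBirth date")
after = build_select(conn, "Persons")

print(before)
print(after)
```

The same application code serves both versions of the schema, but it depends on an engine-specific metadata source, which is exactly the incompatibility described above.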
The aim of this work is to propose a novel, metadata-driven approach to relational database management that enables schema alteration in real-time, i.e. without recompiling the application. The key idea of the new approach is the automated mapping of relational metadata to the relational schema: instead of altering the relational schema using SQL statements and retrieving relational metadata from specialized database views, the opposite approach is proposed: the relational schema is altered automatically by modifying relational metadata stored in regular tables.

In this work, a new software tier, the Relational Schema Tier (RST), is proposed. This tier enables automated relational schema management using relational metadata exchange. It provides algorithms for the automated mapping of relational metadata to the relational schema.

For relational data and metadata exchange, a new communication protocol, the Relational Schema Protocol (RSP), is specified. The purpose of this protocol is to replace schema-dependent SQL statements with schema-independent remote procedure calls. It defines remote operations (procedures) for relational schema exploration, data and metadata exchange, and efficient computation of aggregate functions, as well as a novel serialization schema for generic relational data and metadata.

For relational metadata representation and storage, a new metadata model, the Relational Schema Model (RSM), is defined. This model comprises relational metadata from standard relational metadata
models with a revised structure for more effective processing, as well as additional metadata for relational schema localization, data visualization and validation.

[Figure 1.1: Architecture of the proposed approach. Diagram labels: Data Access Tier, Application Logic Tier, Presentation Tier; Database incl. RSM tables; SQL; Data, Metadata; Relational Schema Tier (RST); RSP; Application; Autogenerated UI.]

Figure 1.1 illustrates the architecture of the proposed approach. Relational metadata is represented using the proposed metadata model (RSM) and stored in regular database tables instead of being retrieved directly from the resources of the database engine (i.e. from INFORMATION SCHEMA views). The database is not directly accessible using SQL commands; instead, access to the database is provided by the proposed software tier (RST). This tier enables the exchange of relational data and metadata using the proposed protocol (RSP) and alters the relational schema automatically according to changes in the metadata.

ADVANTAGES OF THE PROPOSED APPROACH: The proposed metadata-driven approach to relational database management has several advantages over the traditional SQL-driven approach:

CONSISTENCY: Relational metadata is automatically synchronized with the relational schema using specialized algorithms in the proposed RST tier. Other applications have no access to the database other than through the RST; hence they cannot alter the database schema without redefining the metadata, and so the consistency between the stored metadata and the current relational schema is assured.

COMPATIBILITY: The proposed RSM model provides a uniform, alternative way of representing relational metadata. It enables platform-independent relational metadata retrieval and storage and avoids the issues caused by INFORMATION SCHEMA incompatibilities across different database engines.
PORTABILITY: The proposed RST tier completely encapsulates the functions of the database engine and provides relational database management that is independent of SQL syntax. It replaces traditional SQL queries and statements with the operations of the proposed RSP protocol and builds the SQL statements dynamically, on-the-fly. It can be implemented for any relational database engine (or any application based on a relational database).

EXTENSIBILITY: The proposed RSM model is stored in regular database tables and hence is easily extensible in real-time. This makes it possible to store all the required metadata (e.g. metadata for automated rich user interface generation) directly in the database in the same manner as regular data.
SCALABILITY: Operations provided by the RST tier (and exposed by the RSP protocol) are designed for efficient and highly scalable data retrieval (with built-in support for data filtering, paging, ordering and computation of aggregate functions).

VARIABILITY: The proposed RST tier enables relational schema alteration in real-time using relational metadata exchange, the proposed RSP protocol provides a fixed serialization schema for generic relational data, and the proposed RSM model represents all the metadata necessary for dynamic user interface generation on-the-fly. Altogether, using the proposed approach, the interacting applications do not need to be recompiled in the course of relational schema evolution.

STABILITY: In Service-Oriented Architectures (SOA), applications from different software vendors (e.g. web services) have to interact and cooperate smoothly, 24 hours a day. The proposed RSP protocol enables altering the relational schema without redefining the serialization schema. This prevents message format mismatches and serialization schema incompatibilities in the course of relational schema evolution and results in higher stability and easier maintenance across the interacting applications.

UNIFORMITY: Metadata is represented by the proposed RSM model and stored in regular database tables, and changes in metadata are automatically reflected in the relational schema. Hence, it is possible to alter both the relational schema and relational data using the same set of functions, in contrast to SQL, where DML statements are used for data manipulation, while DDL statements (different statements with a completely different syntax) are used for schema alteration.
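The idea behind CONSISTENCY and UNIFORMITY, that the schema is altered by modifying metadata rows rather than by issuing DDL directly, can be sketched as follows. This is a simplified illustration and not the RST algorithm itself: it assumes Python with SQLite, a hypothetical metadata table field with the columns table_name, name and data_type, and a small allow-list of type names. It handles only table creation and column addition; removals and type changes are left out.

```python
import sqlite3

# Assumption: only these type names may appear in the metadata.
ALLOWED_TYPES = {"INTEGER", "TEXT", "DATE"}


def quote(identifier):
    """Quote an SQL identifier for use in generated DDL."""
    return '"' + identifier.replace('"', '""') + '"'


def sync_schema(conn):
    """Bring the physical schema in line with the rows of the metadata table."""
    wanted = {}
    for table, column, data_type in conn.execute(
        "SELECT table_name, name, data_type FROM field ORDER BY rowid"
    ):
        if data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported data type: {data_type}")
        wanted.setdefault(table, []).append((column, data_type))

    executed = []
    for table, columns in wanted.items():
        existing = [r[1] for r in conn.execute(f"PRAGMA table_info({quote(table)})")]
        if not existing:
            # The table is described in the metadata but does not exist yet.
            body = ", ".join(f"{quote(c)} {t}" for c, t in columns)
            ddl = [f"CREATE TABLE {quote(table)} ({body})"]
        else:
            # The table exists; add the columns the metadata describes but it lacks.
            ddl = [
                f"ALTER TABLE {quote(table)} ADD COLUMN {quote(c)} {t}"
                for c, t in columns
                if c not in existing
            ]
        for statement in ddl:
            conn.execute(statement)
            executed.append(statement)
    return executed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field (table_name TEXT, name TEXT, data_type TEXT)")
conn.executemany(
    "INSERT INTO field VALUES (?, ?, ?)",
    [("Persons", "P_Id", "INTEGER"), ("Persons", "LastName", "TEXT")],
)
first = sync_schema(conn)   # creates the table from metadata

# A plain data insertion into the metadata table results in a schema change.
conn.execute("INSERT INTO field VALUES ('Persons', 'DateOfBirth', 'DATE')")
second = sync_schema(conn)  # adds the new column
```

In this sketch the caller uses only data manipulation (an INSERT into field); the DDL statements are derived from the metadata, which is the inversion of roles described above.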
APPLICATIONS OF THE PROPOSED APPROACH: The proposed approach can be utilized in several scenarios:

ERP (ENTERPRISE RESOURCE PLANNING) SOFTWARE: The primary purpose of the proposed approach is to enable users of various ERP software, such as accounting software, CRM (Customer Relationship Management) software or SCM (Supply Chain Management) software, to alter the relational schema without rewriting and/or recompiling the source code (provided the existing business logic is not affected by the changes made to the relational schema).

METADATA-DRIVEN SOFTWARE: The proposed approach is simple to employ in any metadata-driven software, such as graphical database managers. Such software can take advantage of the provided metadata, especially in multi-platform environments.

ONTOLOGY TO RELATIONAL SCHEMA MAPPING: Using the proposed approach, ontologies can be stored in relational databases in real-time, such that classes of concepts are represented by tables and attributes of classes are represented by table columns. It is also possible to extend the proposed metadata model (RSM) with additional metadata related to ontology elements (e.g. namespaces of concepts).
2 State of the Art

Relational database management involves a wide range of operations for exchanging data, altering the data schema and performing business logic on the data. These operations are issued by the interacting application using SQL statements and performed by a Relational Database Management System (RDBMS), e.g. a database server. In most cases, the SQL statements are hard-wired into the application at design-time (typed by a programmer or auto-generated using ORM tools). In this case, the application has to be recoded and recompiled whenever the relational schema is altered [3]. This is a serious issue for evolving relational schemas. To overcome it, the SQL statements have to be built dynamically at run-time according to the current relational schema and the action requested by the user.

Generally, in order to prevent the application from being recompiled in the course of relational schema evolution, the following must apply:

- Relational metadata is retrieved in real-time
- SQL code is generated automatically on-the-fly according to the retrieved metadata
- The user interface is generated automatically on-the-fly according to the retrieved metadata
- Object-relational mapping is not performed at design-time
- The serialization schema definition does not depend on the relational schema

This chapter analyses what metadata is necessary for automated generation of SQL code and of a rich user interface. Problems with serialization schema redefinition are described and an alternative solution is proposed. Current relational metadata models are compared and evaluated with regard to the required metadata.

2.1 Automated SQL code generation

In order to be able to generate SQL statements on-the-fly, the application has to retrieve information about the current relational schema (i.e. relational metadata) in real-time, because relational metadata has to be specified as a part of SQL syntax.
This applies to almost all SQL statements, including those for schema alteration and data exchange (the relational metadata are the table names, column names and data types appearing in each statement):

- SQL statements for schema alteration
  o Table Creation
      Syntax: CREATE TABLE table_name (column_name1 data_type, column_name2 data_type, ...)
      Example: CREATE TABLE Persons (P_Id int, LastName varchar(255), FirstName varchar(255))
  o Table Alteration
      Column Addition
        Syntax: ALTER TABLE table_name ADD column_name datatype
        Example: ALTER TABLE Persons ADD DateOfBirth date
      Column Alteration
        Syntax: ALTER TABLE table_name ALTER COLUMN column_name datatype
        Example: ALTER TABLE Persons ALTER COLUMN DateOfBirth year
      Column Removal
        Syntax: ALTER TABLE table_name DROP COLUMN column_name
        Example: ALTER TABLE Persons DROP COLUMN DateOfBirth
  o Table Removal
      Syntax: DROP TABLE table_name
      Example: DROP TABLE Persons
- SQL statements for data exchange
  o Data Retrieval
      Syntax: SELECT column_name(s) FROM table_name
      Example: SELECT LastName, FirstName FROM Persons
  o Data Insertion
      Syntax: INSERT INTO table_name (column1, column2, column3, ...) VALUES (value1, value2, value3, ...)
      Example: INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')
  o Data Modification
      Syntax: UPDATE table_name SET column1=value, column2=value2, ... WHERE some_column=some_value
      Example: UPDATE Persons SET Address='Nissestien 67', City='Sandnes' WHERE LastName='Tjessem' AND FirstName='Jakob'
  o Data Deletion
      Syntax: DELETE FROM table_name WHERE some_column=some_value
      Example: DELETE FROM Persons WHERE LastName='Tjessem' AND FirstName='Jakob'

Note that the list above is only a subset of the most frequently used SQL statements and serves only to illustrate the presence of relational metadata in SQL commands and queries. As a consequence, the application has to retrieve relational metadata in real-time in order to be able to access an evolving relational schema.

2.2 Automated Rich User Interface Generation

To access databases with an evolving relational schema in real-time, the application has to retrieve the current metadata from the database and generate the user interface on-the-fly [10]. The problem is that the current metadata models do not provide sufficient metadata for generating a high-quality user interface. Although there are several database managers with a user interface generated from the standard relational metadata provided by INFORMATION SCHEMA (e.g.
Microsoft SQL Server Management Studio, MySQL Workbench, Oracle SQL Developer, ...), they do not meet the most common and elementary user requirements for a rich user interface:

First of all, the metadata provided by INFORMATION SCHEMA does not include information about relational schema localization. It only includes the real (native) names of database elements (e.g. names of tables, columns, keys, ...) and cannot provide user-friendly names that depend on the selected language of the user interface. For example, for the table purchase_order, INFORMATION SCHEMA can only provide the native database name purchase_order, but English-speaking users want to see a user-friendly plural name of the table (e.g. Purchase Orders) as the title of the table, while German-speaking users want to see Bestellungen in the same place. Furthermore, when the user opens an
item in the table, they want to see the singular name of the item as the title of the window (e.g. Purchase Order or Bestellung). In the details of the item, the user wants to see localized user-friendly names of the attributes/fields/columns of the item (e.g. No. / Nr., Vendor / Verkäufer, Total Price / Gesamtpreis, ...). The user also wants to see localized plural names of the tables referencing the current table (e.g. Order Lines / Bestellpositionen) as well as localized names of the tables referenced by the current table while picking the value of one of its foreign keys (e.g. Select Customer / Verkäufer Suche). None of this metadata is included in INFORMATION SCHEMA; it has to be stored in separate metadata tables.

INFORMATION SCHEMA provides information about the data types of table columns. However, it can only return the native data types of the current database engine. This introduces several problems:

The names of matching data types are not identical in different database engines. For example, int in one database engine refers to the same data type as Integer in another engine, but the application has to know both names of this data type in order to generate an appropriate user interface (e.g. an input text box that checks for an integer value).

Another problem occurs when different database engines provide data types with the same name but different semantics: for example, the data type int refers to a 32-bit integer type in one engine and to a 64-bit integer type in another.

Furthermore, the sets of data types are incompatible across different database engines. For example, engine A provides the data types ntext, int, bool and binary, and engine B provides the data types varchar, integer and date. While ntext matches varchar and int matches integer (they are compatible and differ only in name), bool, binary and date have no matching data type in the other engine.
In such a case, the application cannot provide a unified list of data types to choose from when inserting a new column into a table or modifying an existing one.

It is also often very useful to define custom data types for better data visualization and validation. For example, a text box for a field of the data type password should display only bullets instead of letters and/or validate password strength, a text box for the e-mail data type should validate the e-mail address format, and so on. However, it is possible neither to define such custom data types using SQL nor to retrieve them using INFORMATION SCHEMA views. Hence, the application cannot provide extended logic and data controls for custom data types in the user interface without storing additional metadata in dedicated tables, performing two-way mapping to native data types and ensuring data type consistency manually.

2.3 Serialization Schema Redefinition

Widely disparate applications are very often required to share and exchange data from relational data sources. This is accomplished by passing messages over a shared environment (e.g. computer network, file system, computing memory, etc.) in a well-defined, machine-processable format. In such a case, the elements of the serialization schema (e.g. XML tags) typically reflect the relations (tables) and their attributes (columns) in the relational schema [2]. Below is an example of simple relational schema serialization in XML format:
Listing 1: Simple relational data

+----+-----------+----------+
| ID | FirstName | LastName |
+----+-----------+----------+
|  1 | Joe       | Perry    |
|  2 | Mark      | Tremonti |
|  3 | Richie    | Sambora  |
+----+-----------+----------+

Listing 2: XML schema definition (DTD)

<!ELEMENT Customers (Customer+)>
<!ELEMENT Customer (Id, FirstName, LastName)>
<!ELEMENT Id (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>

Listing 3: XML schema definition (XSD)

<complexType name="Customers">
  <sequence>
    <element minOccurs="0" maxOccurs="unbounded" name="Customer" type="Customer"/>
  </sequence>
</complexType>
<complexType name="Customer">
  <sequence>
    <element minOccurs="1" maxOccurs="1" name="Id" type="int"/>
    <element minOccurs="1" maxOccurs="1" name="FirstName" type="string"/>
    <element minOccurs="1" maxOccurs="1" name="LastName" type="string"/>
  </sequence>
</complexType>

Listing 4: Relational data serialized in XML format

<Customers>
  <Customer>
    <Id>1</Id>
    <FirstName>Joe</FirstName>
    <LastName>Perry</LastName>
  </Customer>
  <Customer>
    <Id>2</Id>
    <FirstName>Mark</FirstName>
    <LastName>Tremonti</LastName>
  </Customer>
  <Customer>
    <Id>3</Id>
    <FirstName>Richie</FirstName>
    <LastName>Sambora</LastName>
  </Customer>
</Customers>

Figure 2.1: Relational schema to serialization schema mapping

Listing 1 shows a simple result set retrieved from the table Customers. In order to serialize this data in XML format, the XML schema is defined using a DTD (Listing 2) or an XSD (Listing 3) and shared by the interacting applications. Using this XML schema, the data is serialized as illustrated in Listing 4.

Now, consider a simple change in the relational schema: for example, suppose that the columns FirstName and LastName have to be replaced with a new column Name. In such a case, the XML schema has to be redefined, which may break its backward compatibility and result in a message format mismatch between the interacting applications.
Generally, as the relational schema evolves, the serialization schema needs to be redefined and all the interacting applications need to be reimplemented. This is a serious issue for applications working with evolving relational schemas.

In order to prevent the serialization schema from being redefined as the relational schema evolves, the relational schema elements (tables, columns, ...) must not be mapped to the serialization schema. Instead, we propose that the data items (records) from each table be serialized in the form of ordered arrays (lists) of un-typed values and, consequently, that the individual un-typed values be serialized uniformly, e.g. as sequences of binary values (byte arrays) or sequences of characters (text strings). The same applies to the metadata provided with the data.

Listing 5: Relational data serialized as ordered un-typed arrays in XML

<table name="Customers">
  <fields>
    <field name="Id"/>
    <field name="FirstName"/>
    <field name="LastName"/>
  </fields>
  <items>
    <item> <value>1</value> <value>Joe</value> <value>Perry</value> </item>
    <item> <value>2</value> <value>Mark</value> <value>Tremonti</value> </item>
    <item> <value>3</value> <value>Richie</value> <value>Sambora</value> </item>
  </items>
</table>

Listing 6: Simple XML schema definition for generic relational data

<complexType name="table">
  <sequence>
    <element minOccurs="1" maxOccurs="1" name="fields" type="fields"/>
    <element minOccurs="1" maxOccurs="1" name="items" type="items"/>
  </sequence>
  <attribute name="name" type="string"/>
</complexType>
<complexType name="fields">
  <sequence>
    <element minOccurs="0" maxOccurs="unbounded" name="field" type="field"/>
  </sequence>
</complexType>
<complexType name="field">
  <attribute name="name" type="string"/>
</complexType>
<complexType name="items">
  <sequence>
    <element minOccurs="0" maxOccurs="unbounded" name="item" type="item"/>
  </sequence>
</complexType>
<complexType name="item">
  <sequence>
    <element minOccurs="0" maxOccurs="unbounded" name="value" type="string"/>
  </sequence>
</complexType>

In Listing 5, the data items from Listing 4 are serialized in the form of ordered arrays of character sequences (strings) in XML format. They are also supplied with simple additional metadata (the name of the table and the names of its fields). The corresponding serialization schema for generic relational data is defined in XSD format in Listing 6. The column name FirstName is not part of the serialization schema definition (it does not occur in Listing 6); it is encoded as a data value in the metadata section of Listing 5. In terms of XML, the column name FirstName is not the name of a tag (or node) but a value carried by a tag.
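The generic serialization can be illustrated with a short sketch that produces documents shaped like Listing 5. It is only an illustration in Python using the standard xml.etree.ElementTree module; the function names serialize_table and tag_names are hypothetical and not part of RSP. The sketch serializes the same table before and after the schema change discussed above (FirstName and LastName replaced with Name) and compares the sets of tag names used.

```python
import xml.etree.ElementTree as ET


def serialize_table(name, fields, items):
    """Serialize rows as ordered lists of un-typed (string) values."""
    table = ET.Element("table", {"name": name})
    fields_el = ET.SubElement(table, "fields")
    for field in fields:
        # Column names travel as attribute values, never as tag names.
        ET.SubElement(fields_el, "field", {"name": field})
    items_el = ET.SubElement(table, "items")
    for item in items:
        item_el = ET.SubElement(items_el, "item")
        for value in item:
            ET.SubElement(item_el, "value").text = str(value)
    return table


def tag_names(element):
    """Return the set of tag names occurring anywhere in the document."""
    return {el.tag for el in element.iter()}


old = serialize_table(
    "Customers", ["Id", "FirstName", "LastName"], [(1, "Joe", "Perry")]
)
new = serialize_table("Customers", ["Id", "Name"], [(1, "Joe Perry")])

# The relational schema changed, but the vocabulary of the message did not.
print(tag_names(old) == tag_names(new))
print(ET.tostring(new, encoding="unicode"))
```

Both documents use the same six tag names, so a receiver that understands the generic format can parse either one without regenerating classes or recompiling.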
Encoding column names as data values rather than as tag names is a major difference (compared with the schema definition in Listing 3) with important consequences: if, for example, a new column is created in the table of customers, only the data changes (a new field occurs in the list of fields and a fourth value occurs in each item in Listing 5), but the schema definition (in Listing 6) remains the same. Generally, the proposed schema definition in Listing 6 does not need to be redefined as the relational schema evolves. The only drawback of this schema is its very simple metadata model: for brevity, it only allows representing the name of the table and the names of its fields. One of the aims of this work is to define a metadata model for generic relational data and metadata exchange as a part of the proposed Relational Schema Protocol (RSP).

2.4 Relational Metadata Models

For relational schema representation, a number of relational metadata models have been developed. Below is a list of the most widely used ones:

- INFORMATION_SCHEMA: a standard set of database views providing relational metadata, standardized in the SQL-92 specification [7]; its implementation differs slightly across database engines.
- CWM (Common Warehouse Metamodel): probably the most widely accepted model for metadata representation and exchange, adopted as a standard by the OMG (Object Management Group) in 2000 [11]; the relational metadata model is described in the Relational package.
- OIM (Open Information Model): a metadata interchange format adopted as a standard by the MDC (Meta Data Coalition) in 1999 [9]; the relational metadata model is part of the Database schema package in the Database and Warehousing submodel.
Although there are several minor differences between the individual models listed above [13], they all support the representation of the core relational metadata:

- Tables (unique ID, name, table privileges for the current user)
- Columns (unique ID, name, table, ordinal, identity increment, nullability, numeric precision, character length, fixed/variable length, signed/unsigned data type, case sensitivity, column privileges for the current user)
- Keys/indexes (type: primary/foreign/unique/search, fill factor, columns in key)
- Check constraints (name, column, check clause)
- Referential constraints (name, column with foreign key, column with primary key)
- Routines, Views, Triggers (out of scope of this paper)

The following metadata, required for automated rich user interface generation, is missing:

- Support for relational schema localization (localized plural names, singular names and documentation of tables, columns and foreign-key references)
- Support for data visualization (custom data types, auto-join functionality)
- Support for data validation (custom data types, editability of table fields/columns)
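As an illustration of the missing localization metadata, the following minimal sketch shows how a user interface generator might resolve a table title for the purchase_order example from Section 2.2. It is written in Python and is purely hypothetical: the dictionary TABLE_HEADERS stands in for localization metadata kept in a regular table, and its keys, the field names singular and plural, and the function table_title are assumptions made for this example only.

```python
# Hypothetical localization metadata, keyed by
# (native table name, two-letter ISO language name).
TABLE_HEADERS = {
    ("purchase_order", "en"): {"singular": "Purchase Order", "plural": "Purchase Orders"},
    ("purchase_order", "de"): {"singular": "Bestellung", "plural": "Bestellungen"},
}


def table_title(table, language, plural=True):
    """Return the localized title of a table, or its native name as a fallback."""
    header = TABLE_HEADERS.get((table, language))
    if header is None:
        # No localization stored: this is all INFORMATION SCHEMA alone can offer.
        return table
    return header["plural" if plural else "singular"]


print(table_title("purchase_order", "en"))                # title of the table view
print(table_title("purchase_order", "de", plural=False))  # title of an item window
print(table_title("purchase_order", "fr"))                # falls back to the native name
```

The fallback branch corresponds to what a generator can display when only native names are available.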
3 Dissertation Thesis Intent

The intent of the dissertation thesis is to propose a new approach to relational database management, particularly to data retrieval, serialization, visualization, validation and submission (insertion, alteration, deletion) and to schema exploration, serialization and alteration using relational metadata exchange. Instead of performing SQL statements and then retrieving the relational schema, the opposite approach is proposed in this work: the relational schema is altered automatically by modifying relational metadata.

EXPECTED RESULTS OF THE THESIS:

- Relational Schema Model (RSM)
  o Metadata model for relational metadata storage
  o Involves:
      Metadata from standard relational metadata models with a revised structure for more effective processing
      Additional metadata for relational schema localization, data visualization and validation
- Relational Schema Protocol (RSP)
  o Communication protocol for relational data and metadata exchange
  o Involves:
      Metadata model for relational schema and data serialization
      Remote operations for data and metadata exchange
- Relational Schema Tier (RST)
  o Software tier for automated relational schema management (including data manipulation and schema alteration) using relational metadata exchange
  o Implements RSP operations (algorithms) for automated relational metadata to relational schema mapping
- Reviewed publications in relevant international forums

STUDY PLAN:

- Definition of algorithms for automated relational schema alteration: Sep 2012 to Feb 2013
- Testing of the proposed algorithms: Feb 2013 to Sep 2013
- Work on the text of the thesis: Sep 2013 to Feb 2014
- Final version of the thesis: Feb 2014 to June 2014
4 Achieved Results

In this chapter, the results achieved so far are presented. Two components for the implementation of metadata-driven database management have been designed: the Relational Schema Model (RSM) and the Relational Schema Protocol (RSP).

4.1 Relational Schema Model (RSM)

The Relational Schema Model (RSM) is a data model for representing relational metadata and storing it in regular database tables. It comprises the core relational metadata from the standard relational metadata models (such as INFORMATION SCHEMA, OIM or CWM) as well as extended relational metadata for automated generic data logging (tables journal and journal_item), mapping of custom data types to native data types (table data_type), relational schema localization (tables table_header, field_header and reference_header), and better data visualization and validation (fields data_type and is_computed in table field). The chart below displays the schema of the proposed metadata model:

Figure 4.1: Relational Schema Model (RSM)
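The role of the data_type table, mapping custom data types to native ones and supporting validation, can be sketched as follows. This Python sketch is an assumption-laden illustration rather than the RSM definition: the structure of DataType, the engine names engine_a and engine_b (taken from the example in Section 2.2), the e-mail pattern and the functions native_type and is_valid are all hypothetical. It covers only the mapping from custom to native types and omits the reverse direction.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DataType:
    """One hypothetical row of custom data type metadata."""
    name: str
    native: Dict[str, str]         # engine name -> native data type name
    pattern: Optional[str] = None  # optional validation pattern for the UI


DATA_TYPES = {
    "integer": DataType("integer", {"engine_a": "int", "engine_b": "integer"}),
    "text": DataType("text", {"engine_a": "ntext", "engine_b": "varchar"}),
    # A custom type: stored as text, but validated as an e-mail address.
    "email": DataType(
        "email",
        {"engine_a": "ntext", "engine_b": "varchar"},
        r"[^@\s]+@[^@\s]+\.[^@\s]+",
    ),
}


def native_type(custom, engine):
    """Translate a custom data type into the native type of the given engine."""
    return DATA_TYPES[custom].native[engine]


def is_valid(custom, value):
    """Validate a textual value against the pattern of its custom data type."""
    pattern = DATA_TYPES[custom].pattern
    return pattern is None or re.fullmatch(pattern, value) is not None


print(native_type("email", "engine_a"))
print(is_valid("email", "joe@example.com"))
print(is_valid("email", "joe"))
```

With such metadata, the application can offer one unified list of data types regardless of the engine and attach validation or specialized controls to custom types.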
4.2 Relational Schema Protocol (RSP)

Relational Schema Protocol (RSP) is the proposed communication protocol for relational data and metadata exchange. It allows an application to manage a relational database through the proposed RST tier without using SQL code. It defines a fixed relational schema for generic relational data and metadata and provides a set of functions (operations) for relational database management. The serialization schema for generic relational data is shown in Figure 4.2.

Figure 4.2: Serialization Schema of Relational Schema Protocol (RSP)

The proposed RSP protocol specifies the following operations for relational database management:

    ReadTableHeaders - schema exploration
    ReadTable        - data and metadata retrieval
    Submit           - data manipulation
    ReadScalar       - single data value retrieval
    ReadCount        - item count calculation
    ReadSum          - sum calculation
    ReadMinimum      - minimum value calculation
    ReadMaximum      - maximum value calculation

The operations are defined in detail in Listings 4.1 to 4.5.

Listing 4.1: User Access Control

    /// <summary>
    /// Authenticates user using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name</param>
    /// <param name="Password">password</param>
    /// <returns>full name of the user if the user name and password are valid, null otherwise</returns>
    [OperationContract]
    public string Authenticate(string UserName, string Password)

    /// <summary>
    /// Authorizes user for specified table using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name</param>
    /// <param name="TableName">name of table.</param>
    /// <returns>list of granted action-ids (1 = read, 2 = insert, 3 = update, 4 = delete,... ).</returns>
    [OperationContract]
    public List<int> Authorize(string UserName, string TableName)

Listing 4.2: Relational Schema Exploration

    /// <summary>
    /// Returns list of tables in the database using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TwoLetterISOLanguageName">iso language name.</param>
    /// <returns>list of table headers</returns>
    [OperationContract]
    public List<TableHeader> ReadTableHeaders(string UserName, string Password, string TwoLetterISOLanguageName)

Listing 4.3: Relational Data Retrieval

    /// <summary>
    /// Reads scalar value from specified table using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TableName">name of table.</param>
    /// <param name="ColumnName">name of column. If null is specified, display column will be used.</param>
    /// <param name="FilterExpression">filter expression.</param>
    /// <returns>value from specified cell (row and column). If more values are found, the first one is returned and the others are ignored.</returns>
    [OperationContract]
    public string ReadScalar(string UserName, string Password, string TableName, string ColumnName, string FilterExpression)

    /// <summary>
    /// Reads header, fields, items, relations and access rights (actions) in the specified table using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TableName">name of table.</param>
    /// <param name="TwoLetterISOLanguageName">iso language name.</param>
    /// <param name="Skip">number of records to skip.</param>
    /// <param name="Take">number of items to take. If 0 is specified, all found items are returned.</param>
    /// <param name="OrderExpression">order expression.</param>
    /// <param name="FilterExpression">filter expression.</param>
    /// <returns>list of rows</returns>
    [OperationContract]
    public Table ReadTable(string UserName, string Password, string TableName, string TwoLetterISOLanguageName, long Skip, long Take, string OrderExpression, string FilterExpression)

Listing 4.4: Relational Data Submission

    /// <summary>
    /// Submits a record to the specified table (insertion, alteration or deletion, depending on the operation) using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TableName">database table name</param>
    /// <param name="Operation">operation to perform on the record.</param>
    /// <param name="Fields">list of columns. Required members for all columns: Name, IsIdentity, IsComputed, IsJoined. At least 1 column with "PRIMARY KEY" constraint required for Update operation.</param>
    /// <param name="Data">list of data.</param>
    /// <returns>identity of new item (or null if the table has no identity field).</returns>
    [OperationContract]
    public string Submit(string UserName, string Password, string TableName, SubmitOperation Operation, List<Field> Fields, List<string> Data)
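The Submit operation of Listing 4.4 receives only a table name, an operation, a list of field descriptions and a list of values, so the SQL statement has to be derived from that metadata. The following Python/SQLite sketch shows one way this mapping could look. It is an illustration under stated assumptions, not the RST implementation: the operation names "insert", "update" and "delete", the is_primary_key member, and the rule that identity, computed and joined columns are never written are all hypothetical choices made for this example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Field:
    """Column description mirroring the members named in Listing 4.4."""
    name: str
    is_identity: bool = False
    is_computed: bool = False
    is_joined: bool = False
    is_primary_key: bool = False  # assumed member, not named in the listing


def quote_ident(name):
    """Quote an SQL identifier so that metadata values cannot break the statement."""
    return '"' + name.replace('"', '""') + '"'


def build_submit(operation, table_name, fields, data):
    """Return (sql, parameters) for the requested operation."""
    if len(fields) != len(data):
        raise ValueError("fields and data must have the same length")
    pairs = list(zip(fields, data))
    # Assumption: identity, computed and joined columns are not written.
    writable = [(f, v) for f, v in pairs
                if not (f.is_identity or f.is_computed or f.is_joined)]
    keys = [(f, v) for f, v in pairs if f.is_primary_key]
    table = quote_ident(table_name)

    if operation == "insert":
        if not writable:
            raise ValueError("nothing to insert")
        columns = ", ".join(quote_ident(f.name) for f, _ in writable)
        marks = ", ".join("?" for _ in writable)
        return (f"INSERT INTO {table} ({columns}) VALUES ({marks})",
                [v for _, v in writable])

    if not keys:
        raise ValueError("update and delete need a primary key column")
    where = " AND ".join(f"{quote_ident(f.name)} = ?" for f, _ in keys)
    key_values = [v for _, v in keys]

    if operation == "update":
        to_set = [(f, v) for f, v in writable if not f.is_primary_key]
        if not to_set:
            raise ValueError("nothing to update")
        assignments = ", ".join(f"{quote_ident(f.name)} = ?" for f, _ in to_set)
        return (f"UPDATE {table} SET {assignments} WHERE {where}",
                [v for _, v in to_set] + key_values)
    if operation == "delete":
        return f"DELETE FROM {table} WHERE {where}", key_values
    raise ValueError("unknown operation: " + operation)


def submit(conn, operation, table_name, fields, data):
    """Execute the operation; return the new identity as a string, or None."""
    sql, params = build_submit(operation, table_name, fields, data)
    cursor = conn.execute(sql, params)
    if operation == "insert" and any(f.is_identity for f in fields):
        return str(cursor.lastrowid)
    return None
```

All values are passed as bound parameters and all identifiers are quoted, so neither the data nor the field names can alter the structure of the generated statement.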

Listing 4.5: Aggregate Functions

    /// <summary>
    /// Reads total number of records from specified table using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TableName">name of table.</param>
    /// <param name="FilterExpression">filter expression.</param>
    /// <returns>total number of records in specified table passing filter expression.</returns>
    [OperationContract]
    public long ReadCount(string UserName, string Password, string TableName, bool AutoJoin, string ColumnName, string FilterExpression)

    /// <summary>
    /// Reads sum value from specified column from specified table using isolated database connection and transaction.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TableName">name of table.</param>
    /// <param name="ColumnName">name of column.</param>
    /// <param name="FilterExpression">filter expression.</param>
    /// <returns>sum of values in specified table and column passing filter expression.</returns>
    [OperationContract]
    public decimal ReadSum(string UserName, string Password, string TableName, bool AutoJoin, string ColumnName, string FilterExpression)

    /// <summary>
    /// Reads minimum value from specified table and column.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TableName">name of table.</param>
    /// <param name="ColumnName">name of column.</param>
    /// <param name="FilterExpression">filter expression.</param>
    /// <returns>minimum value in specified table and column passing filter expression.</returns>
    [OperationContract]
    public decimal ReadMinimum(string UserName, string Password, string TableName, bool AutoJoin, string ColumnName, string FilterExpression)

    /// <summary>
    /// Reads maximum value from specified table and column.
    /// </summary>
    /// <param name="UserName">user name.</param>
    /// <param name="Password">password.</param>
    /// <param name="TableName">name of table.</param>
    /// <param name="ColumnName">name of column.</param>
    /// <param name="FilterExpression">filter expression.</param>
    /// <returns>maximum value in specified table and column passing filter expression.</returns>
    [OperationContract]
    public decimal ReadMaximum(string UserName, string Password, string TableName, bool AutoJoin, string ColumnName, string FilterExpression)
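The four aggregate operations of Listing 4.5 differ only in the SQL aggregate function they apply, so a single generic routine can serve all of them. The following Python/SQLite sketch illustrates this. The syntax of FilterExpression is not specified in the listings, so the sketch replaces it with a hypothetical list of (column, operator, value) conditions joined by AND; that representation, the operation names and the helper functions are assumptions made for this example only, and the AutoJoin parameter is not modelled.

```python
import sqlite3

# Mapping from (assumed) operation names to SQL aggregate functions.
AGGREGATES = {"count": "COUNT", "sum": "SUM", "minimum": "MIN", "maximum": "MAX"}
# Comparison operators accepted in conditions (assumed, kept small on purpose).
OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def quote_ident(name):
    """Quote an SQL identifier so that metadata values cannot break the statement."""
    return '"' + name.replace('"', '""') + '"'


def build_aggregate(kind, table_name, column_name=None, conditions=()):
    """Return (sql, parameters) for an aggregate over table_name."""
    if kind not in AGGREGATES:
        raise ValueError("unknown aggregate: " + kind)
    if column_name is None:
        # Only a record count makes sense without a column.
        if kind != "count":
            raise ValueError(kind + " requires a column name")
        target = "*"
    else:
        target = quote_ident(column_name)

    sql = f"SELECT {AGGREGATES[kind]}({target}) FROM {quote_ident(table_name)}"
    clauses, params = [], []
    for column, operator, value in conditions:
        if operator not in OPERATORS:
            raise ValueError("unsupported operator: " + operator)
        clauses.append(f"{quote_ident(column)} {operator} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def read_aggregate(conn, kind, table_name, column_name=None, conditions=()):
    """Execute the aggregate and return its single result value."""
    sql, params = build_aggregate(kind, table_name, column_name, conditions)
    return conn.execute(sql, params).fetchone()[0]
```

Restricting the function names and operators to fixed sets and binding the compared values as parameters keeps the generated statement under the control of the tier rather than of the caller.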

5 References

[1] ANSI/ISO 9075-2-1999 International Standard (IS). Database Language SQL Part 2: Foundation (SQL/Foundation). 1999.
[2] Bei, J., Cai, F., Tao, L.J., Pan, J.G. 2004. A direct method of data exchange between XML and relational database. In Proceedings of the 26th International Conference on Information Technology Interfaces, vol. 1, pp. 127-132.
[3] Cabibbo, Luca. 2009. On keys, foreign keys and nullable attributes in relational mapping systems. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), Martin Kersten, Boris Novikov, Jens Teubner, Vladimir Polutin, and Stefan Manegold (Eds.). ACM, New York, NY, USA, 263-274.
[4] E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387.
[5] Falb, J., Kavaldjian, S., Popp, R., Raneburger, D., Arnautovic, E., Kaindl, H. 2009. Fully automatic user interface generation from discourse models. In Proceedings of the 14th international conference on Intelligent user interfaces (IUI '09). ACM, New York, NY, USA, 475-476.
[6] Hartung, M., Terwilliger, J. 2011. Schema Matching and Mapping, Springer Berlin Heidelberg, ISBN: 978-3-642-16518-4, pp. 149-190.
[7] INCITS (International Committee for Information Technology Standards). The SQL-92 Standard. 1992.
[8] Jingtao, Z., Shusheng, Z., Hongwei, S., Mingwei, W. 2004. An XML-based schema translation method for relational data sharing and exchanging. In Proceedings of the 8th International Conference on Computer Supported Cooperative Work in Design, vol. 1, pp. 714-717.
[9] Meta Data Coalition. Open Information Model Version, 1.0 Edition. 1999. Available on-line at: http://www.mdcinfo.com
[10] Nicholson, Andrew L., Glass, Michael J., Kosbie, David S., Vaughan, Thomas A. Automated schema and interface generation. US Patent 6,661,519 B1, 2003.
[11] OMG. Common Warehouse Metamodel (CWM) Specification (OMG document ad/99-09-01, Initial Submission Edition). 1999. Available on-line at: http://www.omg.org/
[12] Richardson, Chris. 2009. ORM in Dynamic Languages. Commun. ACM 52, 4 (April 2009), 48-55.
[13] Vetterli, T., Vaduva, A., Staudt, M. 2000. Metadata standards for data warehousing: open information model vs. common warehouse metadata. In SIGMOD Rec. 29, 3 (September 2000), pp. 68-75.

6 Summary of Study Results

SEMINARS ATTENDED:

• Digital Data Processing Methods
• Seminar on Informatics
• Seminar of Searching and Dialog Laboratory
• Enterprise IT Systems and Services

RESEARCH ACTIVITIES:

As a member of the Laboratory of Searching and Dialogue, I have participated in a research project aimed at image ontology querying, supported by grant GA201/07/0881. I have cooperated on the implementation of image ontology storage based on the proposed RSP protocol.

PUBLISHED PAPERS:

• Přehnal, Vojtěch. Relational Schema Protocol (RSP) - a formal specification, 2011. Available on-line at: http://arxiv.org/ftp/arxiv/papers/1105/1105.5718.pdf

PRESENTATIONS:

• Presentation at Laboratory of Searching and Dialogue, Spring 2010
• Presentation at Seminar on Informatics, Fall 2010
• Presentation at Seminar on Informatics, Spring 2011