Analyzing Triggers in XML Data Integration Systems

Jing Lu 1, Dunlu Peng 1, Huan Huo 1, Liping Gao 1, Xiaodong Zhu 2
(1: School of Optical Electrical and Computing Engineering, University of Shanghai for Science and Technology, Shanghai; 2: School of Management, University of Shanghai for Science and Technology, Shanghai, China)
Email: jinglu76@gmail.com
doi: 10.4156/jdcta.vol4.issue5.4

Abstract

XML-based data integration systems are now widely accepted as data service providers on the web, and users are permitted to submit updates through them. It is therefore necessary to establish the best possible data consistency across the whole data integration system. To that end, we present an approach based on an XQuery trigger service, in which trigger termination must be ensured. This paper focuses on the trigger analysis problem. We first define three categories of trigger dependence: total independence, possible dependence and definite dependence. Based on these categories, we distinguish tolerant termination from absolute termination. A triggering graph is used to carry out the analysis, and a DAG algorithm is used to find cycles in the triggering graph. The implementation shows that termination of the XQuery triggers can be ensured at different levels.

Keywords: XML Data Integration Systems, XQuery, XQuery Trigger, Trigger Analyzer, Trigger Dependence, Trigger Termination

1. Introduction

Web services are emerging as a new paradigm for building web applications. However, a large fraction of data continues to be stored in legacy data sources, including relational databases, object-oriented databases, XML files, delimited files, HTML files, Excel files, etc. It is now generally recognized that there is great value in taking information from various sources and making it work together as a whole [1].
The goal of a data integration system (DIS) is to provide a uniform interface to a multitude of data sources. It enables users to focus on specifying what they want rather than on how to obtain the answers. As a result, it frees users from the tedious tasks of finding the relevant data sources, interacting with each one through its particular interface, and combining data from multiple sources [2].

XML has emerged as a dominant standard for information exchange on the Internet, and XQuery is a powerful and convenient language designed for querying XML. Figure 1 shows a typical XML-based data integration system (XML-DIS) at the conceptual level. In an XML-DIS, each wrapper exports an XML Schema describing the content of the corresponding data source as XML. The query processor accepts an XQuery from the client, parses it, decomposes it, transforms it and pushes the query plan down to the wrappers. The wrappers translate the queries into the local language and transform the local results into the target XML. Typical XML-based data integration systems include BEA AquaLogic [3] and Xcalia Intermediation Core [4].

Because the W3C has not yet published the final standard of the XQuery Update Facility, most XML-DISs support, at the integration level, only their own built-in XQuery update functions, which differ considerably from system to system. Programming frameworks are also designed to support updates at the global level. When a data integration system supports global updates, data consistency should be enforced; e.g., when the user updates data source A, data source B might also need to be updated to keep A and B consistent. Currently, data consistency enforcement rules in data integration systems are described in different forms, managed by different components, or even embedded in program code. A uniform way to define, manage and maintain data consistency enforcement rules is therefore needed.
These requirements are best fulfilled by triggers [5][6]. Triggers enable a uniform and compact description of active rules and integrity constraints, which are the foundation of data consistency, and facilitate their maintenance [7]. Triggers have a simple syntax and are automatically invoked in response to events.

International Journal of Digital Content Technology and its Applications, Volume 4, Number 5, August 2010

[8] proposes an active system whereby users can place triggers on unmaterialized nested XML views of relational data. Its triggers are based on Active XQuery [9]. That work targets relational data, while ours targets the integration of heterogeneous data sources. [10] extends XML triggers with path granularity so that the context node can be identified and related context paths can be extracted during trigger execution. Triggers in both that work and ours have node-level and statement-level granularity. The difference is that the XML triggers in [10] are defined for XML databases, while our XQuery triggers are defined for XML-based data integration systems and therefore carry namespace definitions.

Figure 1. Architecture of XML-based Data Integration System

[11][12] specify the XQuery trigger semantic model and discuss the XQuery trigger execution model. The key analysis question is the termination of trigger execution. A set of triggers is said to be terminating if, for any initial event and any initial database state, the trigger execution terminates. Analysis of ECA rules in active databases is a well-studied topic, with a number of approaches appearing in the literature, e.g., [13][14], mostly in the context of relational databases. A natural question is whether the analysis techniques developed for triggers in relational databases can be reused by translating the set of XML documents and associated triggers into a relational form and then applying the earlier techniques to these triggers. The problem with this approach is that the transformation may result in a significant loss of information.
[15][9][16] propose new languages for defining Event-Condition-Action (ECA) rules on XML, providing reactive functionality on XML repositories. [17] proposes a specification language for active view definition on top of an XML repository. Our work is applied to XML views of the underlying heterogeneous data sources. [18] proposes and validates XBML (XML-based Business Modeling Language) as an XML active query language approach to specifying electronic commerce business models. None of these works discusses the trigger termination problem. In XML-based data integration systems, triggers are considerably more dynamic. We therefore need a new mechanism for analyzing XQuery triggers in XML-based data integration systems to ensure that trigger execution terminates.

Our contributions are as follows. We introduce the XQuery trigger model and the XQuery trigger service in an XML data integration system, including the semantic model and the execution model. We propose three levels of trigger independence: total independence, possible dependence and definite dependence. Based on these three levels, we define two levels of termination suited to the XML-DIS environment: tolerant termination and absolute termination. We give methods for recognizing trigger independence by analyzing the triggers.
We also develop both run-time analysis and repository analysis. Finally, we implement the whole trigger analysis procedure on top of the XQuery trigger service platform and evaluate it.

This paper is organized as follows. Section 2 gives an overview of the XQuery trigger semantic model and execution model. Section 3 introduces trigger independence and discusses how to identify its three levels. Section 4 proposes the triggering graph and two levels of termination, tolerant termination and absolute termination. Section 5 discusses the implementation and gives the evaluation. Finally, Section 6 draws the conclusion.

2. XQuery Trigger Service

2.1 Trigger Semantic Model

We use a simplified and slightly modified trigger model based on Active XQuery, chosen for its simplicity and good compatibility with XQuery. Our XQuery trigger uses update syntax that conforms to the W3C standard update facility [19] and adheres to the spirit of SQL99 [20], which has gained tremendous popularity for developing data-intensive applications and is the most widely used in commercial systems.

The meta model of the XQuery trigger is defined in Figure 2. The first line declares the namespace; the namespace declaration specifies which data objects in the data integration system the trigger definition relates to. The second line is the trigger name. The third line gives the operations associated with the trigger. The fourth line gives the trigger-relative elements. The fifth line defines the XQuery variables, which cover both the condition part and the action part. The sixth line is the condition the trigger checks, and the seventh line is the action part. The action part can be an INSERT, DELETE or REPLACE, an external operation, or simply an error message (for data integrity constraints).

Figure 2. The XQuery trigger semantic model

2.2 Trigger Execution Model

Figure 3 shows the architecture of the XQuery trigger service in the XML-DIS. When the client application submits an update to the data integration system (Step 1), the data integration system calls the trigger service. The trigger service first determines which kind of operation the update is (INSERT, DELETE or REPLACE) (Step 2). It then consults the trigger repository and fetches the related triggers (Step 3). The related triggers are put into the conflict set (Step 4). The trigger scheduling component fetches the triggers in the conflict set (Step 5) and passes the fired triggers to the condition evaluator (Step 6). The trigger service evaluates the condition with the help of the XQuery engine (Step 8). During condition evaluation, other data sources may be queried through the data integration system (Step 7). The result of the condition evaluation is sent to the action firing component (Step 9). If the condition evaluates to true, there are two possible kinds of actions: an error message or a queue of updates. If the action is an error message, the triggered active rules are CHECK constraints and the constraints have been violated. The action firing component calls the message generator (Step 10) and the error message is sent back to the user (Step 11). The operation is aborted, so data integrity is guaranteed. If the action is a queue of updates, the queue is sent to the data integration system (Step 12) and the updates are executed in the underlying data sources (Step 13).

Figure 3. Architecture of the XQuery Trigger Enforcement Service

3. Trigger Analysis

3.1 Trigger Analysis in RDB

Trigger analysis deals with predicting how a set of triggers behaves at run-time. The following are the three properties of trigger behavior in active database systems (ADBs) [21]:

Termination. ADB triggers are terminating only if there is no recursive firing of triggers.

Confluence. The confluence property concerns whether the execution order of non-prioritized triggers makes any difference to the final database state.

Observable Determinism. A trigger set is observably deterministic if the effect of trigger processing, as observed by the user of the system, is independent of the order in which the fired triggers are selected for processing.

3.2 Runtime analysis and repository analysis

In Fig. 3, after the events are detected, the associated triggers are retrieved from the repository and put into the conflict set. We call the analysis of the triggers in the conflict set runtime analysis. Analysis can also take place when a new trigger is inserted into the trigger repository: every time a trigger is inserted, the analysis procedure must check whether the insertion may lead to a loop among the triggers in the repository.
We call this analysis repository analysis.

3.3 Trigger Abstraction

The trigger model in Fig. 2 has concrete and detailed semantics. In XML-based data integration systems, however, data sources are autonomous, loosely coupled and change quickly, so a more relaxed set of trigger analysis rules is needed. For trigger analysis we therefore first abstract the triggers to make the semantic model more compact and simple. The trigger abstraction model is as follows:

    IN namespace1, namespace2, ..., namespacen
    ON events
    IF conditions
    DO actions

The first line names the corresponding data sources, as defined in the Fig. 2 model. Events stand for the ON-clause (lines 3-4 in Fig. 2), conditions for the WHEN-clause (line 7 in Fig. 2), and actions for the DO-clause (lines 8-13 in Fig. 2). Note that in runtime trigger analysis the variables in the LET-clause should be replaced with their actual values, since those values are known. In repository trigger analysis, however, these variables are unknown, so repository analysis needs a more relaxed procedure. We discuss this in the following sections.

3.4 Trigger Independence

A trigger Ri may trigger a trigger Rj if the action of Ri may generate an event which triggers Rj. Because data integration systems are autonomous and loosely coupled, we consider trigger independence both in the trigger conflict set and in the trigger repository. Generally speaking, trigger Ri is independent of trigger Rj if the action of Ri does not generate an event which triggers Rj. We consider three different degrees of trigger independence: total independence, possible dependence and definite dependence. We explain these three degrees in the following sections.

3.4.1 Total independence

A trigger Ri is totally independent of a trigger Rj if the two triggers satisfy at least one of the following conditions:
1. The namespace of Ri is totally different from the namespace of Rj; or
2. The action part of Ri is totally different from the event part of Rj; or
3. The action part of Ri is the same as the event part of Rj, but the condition part of trigger Rj can be evaluated to false.
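The abstraction model and the three conditions above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the `AbstractTrigger` record, the encoding of an event as an (operation, target path) pair, the reading of "totally different" as set disjointness, and the `condition_value` field (`None` when the truth value is not known) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical event encoding: (operation, target path).
Event = Tuple[str, str]


@dataclass(frozen=True)
class AbstractTrigger:
    """One abstracted trigger: IN namespaces ON events IF condition DO actions."""
    name: str
    namespaces: FrozenSet[str]   # IN clause
    events: FrozenSet[Event]     # ON clause
    actions: FrozenSet[Event]    # DO clause, reduced to the events it can raise
    has_condition: bool = True   # False when the trigger has no IF clause
    condition_value: Optional[bool] = None  # known truth value, None if unknown


def is_totally_independent(ri: AbstractTrigger, rj: AbstractTrigger) -> bool:
    """Return True if ri is totally independent of rj (conditions 1 to 3)."""
    # Condition 1: the two triggers share no namespace.
    if ri.namespaces.isdisjoint(rj.namespaces):
        return True
    # Condition 2: no action of ri raises an event that rj listens for.
    if ri.actions.isdisjoint(rj.events):
        return True
    # Condition 3: an action matches an event, but rj's condition is known to be false.
    return rj.has_condition and rj.condition_value is False


# Illustrative triggers; every name and path below is made up.
ORDER_REPLACED = ("REPLACE", "/orders/order")
r1 = AbstractTrigger("R1", frozenset({"crm"}),
                     frozenset({("INSERT", "/customers/customer")}),
                     frozenset({ORDER_REPLACED}))
r2 = AbstractTrigger("R2", frozenset({"crm", "erp"}),
                     frozenset({ORDER_REPLACED}),
                     frozenset({("DELETE", "/invoices/invoice")}))
r3 = AbstractTrigger("R3", frozenset({"hr"}),
                     frozenset({ORDER_REPLACED}), frozenset())
r4 = AbstractTrigger("R4", frozenset({"crm"}),
                     frozenset({ORDER_REPLACED}), frozenset(),
                     condition_value=False)
```

With these sample triggers, `is_totally_independent(r1, r3)` holds by condition 1, `is_totally_independent(r2, r1)` by condition 2, and `is_totally_independent(r1, r4)` by condition 3, while `is_totally_independent(r1, r2)` does not hold because the condition of R2 is unknown. The relation is directional: R2 is totally independent of R1, but not the other way round.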
Clearly, if the namespaces of two triggers are different, the triggers execute over different data sources and there is no data interaction between them, so they cannot cause non-termination. The second condition means that if executing the action part of trigger Ri does not raise the event of trigger Rj, the two triggers are independent.

For the triggers in the conflict set, we define the following rule:

Rule 1 If the triggers in the conflict set are totally independent of each other, the execution of the triggers terminates.

For the triggers in the repository, we define the following rule:

Rule 2 If all the triggers in the trigger repository are totally independent of each other, the execution of the whole trigger service system terminates.

3.4.2 Possible dependence

Total independence describes the situation in which triggers are certain to terminate. In practice, however, the execution of some triggers may well cause the execution of others. Note that data in a data integration system changes dynamically. Trigger definitions may contain variables whose values come from the changed content, and they may also contain functions whose results come from queries to the data integration system (for details, see [11][12]). Therefore, in the repository analysis procedure it is impossible to obtain the values of the variables or the results of the functions. Possible dependence captures this situation, and we define it as follows. Two triggers are said to be possibly dependent if:
1. The triggers are not totally independent; and
2. The action part of trigger Ri is the same as the event part of trigger Rj; and
3. There are variables or functions in the condition part of trigger Rj.

Possible dependence means that if the variables or functions in the condition part of Rj make the condition true, trigger Ri will fire Rj. It is equally possible that they do not make the condition true, in which case trigger Ri will not fire trigger Rj. Possible dependence applies to both repository analysis and runtime analysis. In repository analysis, it is impossible to obtain the values of the variables or to calculate the results of the functions. In runtime analysis, the functions in the triggers are calculated in Step 7 (in Fig. 3), which takes place before the analysis.

3.4.3 Definite dependence

Two triggers are said to be definitely dependent if:
1. The triggers are not totally independent; and
2. The action part of trigger Ri is the same as the event part of trigger Rj; and
3. There is no condition part in trigger Rj.

If trigger Rj has no condition part, Rj will definitely be fired whenever its event is detected.

4. Triggering Graph and Two levels of termination

We use the triggering graph to analyze the triggers in the conflict set or in the repository. If trigger Rj depends on trigger Ri, i.e., Ri may fire Rj, the graph contains a directed arrow from Ri to Rj.
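The arrow rule can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: the `Dependence` labels, the `build_graph` and `has_cycle` names, and the `include_possible` switch (whether possibly dependent pairs contribute an arrow) are assumptions made for illustration. The cycle check is a standard depth-first search with three colours, swapped in here as one common way to test whether a directed graph is acyclic; a cycle corresponds to the recursive firing that Section 3.1 rules out for terminating triggers.

```python
from enum import Enum
from typing import Dict, Iterable, Set, Tuple


class Dependence(Enum):
    TOTAL_INDEPENDENCE = "total independence"
    POSSIBLE = "possible dependence"
    DEFINITE = "definite dependence"


# (Ri, Rj, category): how Rj depends on Ri, as classified by Section 3.4.
Classified = Tuple[str, str, Dependence]


def build_graph(pairs: Iterable[Classified],
                include_possible: bool) -> Dict[str, Set[str]]:
    """Build the triggering graph as an adjacency map from Ri to its Rj's."""
    graph: Dict[str, Set[str]] = {}
    for ri, rj, dep in pairs:
        graph.setdefault(ri, set())
        graph.setdefault(rj, set())
        if dep is Dependence.DEFINITE or (
                dep is Dependence.POSSIBLE and include_possible):
            graph[ri].add(rj)  # directed arrow Ri -> Rj
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Return True if the directed graph contains a cycle (DFS colouring)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:  # back edge: succ is on the current path
                return True
            if colour[succ] == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


# Illustrative classification of three hypothetical triggers.
pairs = [
    ("R1", "R2", Dependence.DEFINITE),
    ("R2", "R3", Dependence.POSSIBLE),
    ("R3", "R1", Dependence.DEFINITE),
]
```

For this sample input, `has_cycle(build_graph(pairs, include_possible=True))` finds the loop R1, R2, R3, R1, whereas with `include_possible=False` the arrow from R2 to R3 is dropped and no cycle remains. The recursive search is adequate for small trigger sets; an iterative version would be preferable for very large repositories.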
Again because of the autonomy and dynamic nature of data integration systems, we define two levels of termination: tolerant termination and absolute termination. In tolerant termination, we suppose that the condition part of trigger Rj will not become true, so there is no directed arrow from Ri to Rj. In absolute termination, we suppose that the condition part of trigger Rj will become true, so there is a directed arrow from Ri to Rj. A triggering graph is thus formed. We define the following rule to avoid non-termination of the triggers in the conflict set or in the repository:

Rule 3 The triggers in the conflict set or in the repository are said to terminate if there is no cycle in the triggering graph.

We use the DAG algorithm to find cycles in the triggering graph. The administrator can decide whether the whole trigger service system runs under tolerant termination or absolute termination.

5. Implementation and evaluation

With reference to Fig. 3, we use BEA AquaLogic Data Services Platform 3.0 [22] as the XML-DIS, which supports the SDO programming framework. We use Software AG's Tamino XML Server [23] as the trigger repository and Tamino's XQuery engine to evaluate conditions. The administrator decides whether the system uses tolerant termination or absolute termination.

Trigger dependence is categorized into three levels in our system: total independence, possible dependence and definite dependence. The categories reflect the real situation of data integration systems, which are autonomous, loosely coupled and highly dynamic. Tolerant termination and absolute termination can be used to fit the different requirements of data integration systems. After the triggers are analyzed, their execution can be guaranteed to terminate: triggers that cannot terminate are detected and their execution is avoided. Possible non-termination can also be detected, in which case the administrator gets a warning. Trigger analysis thus improves the quality of the whole trigger service and supports its reliability and stability.

6. Conclusion and future work

This paper presents an approach to analyzing the triggers in the trigger service of an XML-based data integration system. The XQuery trigger service is introduced at the beginning. We propose runtime analysis and repository analysis. The relationship between triggers is divided into three levels: total independence, possible dependence and definite dependence. The termination of trigger execution can be chosen to be tolerant termination or absolute termination. We use the triggering graph to analyze the possibly dependent and definitely dependent triggers, and the DAG algorithm is used to find cycles in the triggering graph. After the trigger analysis, the execution of triggers can be controlled, which adds to the robustness and stability of the whole XQuery trigger service. Analysis of trigger confluence and observable determinism is left as future work.

7. References

[1] Gio Wiederhold. Mediators in the architecture of future information systems. Computer, 25(3):38-49, 1992.
[2] Alon Y. Halevy. Answering queries using views: A survey. The VLDB Journal, 10(4):270-294, 2001.
[3] Michael Carey. Data delivery in a service-oriented world: the bea aqualogic data services platform. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 695-705, New York, NY, USA, 2006. ACM.
[4] Xcalia. Xcalia intermediation core. http://www.xcalia.com/products/datasheets/xcalia Intermediation Platform.pdf.
[5] Eric N. Hanson and Samir Khosla. An introduction to the triggerman asynchronous trigger processor. In Lecture Notes In Computer Science; Vol. 1312, Proceedings of the Third International Workshop on Rules in Database Systems, pages 51-66. Springer-Verlag, 1997.
[6] Genoveva Vargas-Solar, Christine Collet, and Helena G. Ribeiro. Active services for federated databases. In SAC '00: Proceedings of the 2000 ACM symposium on Applied computing, pages 356-360, New York, NY, USA, 2000. ACM.
[7] Elena Baralis and Jennifer Widom. An algebraic approach to rule analysis in expert database systems. Technical report, Stanford, CA, USA, 1994.
[8] Shao, F., Novak, A. & Shanmugasundaram, J., Triggers over nested views of relational data, ACM Transactions on Database Systems, Vol 31, No 3, pp 921-967, 2006.
[9] A. Bonifati, D. Braga, A. Campi, and S. Ceri. Active xquery. In ICDE '02: Proceedings of the Eighteenth International Conference on Data Engineering, pages 403-412, San Jose, USA, 2002. IEEE Computer Society.
[10] Landberg, A. H., Rahayu, J. W., Pardede, E., Extending XML Triggers with Path-Granularity, Web Information Systems Engineering (WISE 2007), pp 410-422, 2007.
[11] Jing Lu, Bernhard Mitschang. Enforcing Data Consistency in Data Integration Systems by Trigger Service. In: International Journal of Web Information Systems, vol 5(2), 2009.
[12] Jing Lu, Bernhard Mitschang. An XQuery-based Trigger Service to Bring Consistency Management to Data Integration Systems. In: Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services (iiWAS2008), Nov. 24-26, Linz, Austria.
[13] Eric N. Hanson and Samir Khosla. An introduction to the triggerman asynchronous trigger processor. In Lecture Notes In Computer Science; Vol. 1312, Proceedings of the Third International Workshop on Rules in Database Systems, pages 51-66. Springer-Verlag, 1997.
[14] Elena Baralis, Stefano Ceri, and Stefano Paraboschi. Improving rule analysis by means of triggering and activation graphs. In RIDS '95: Proceedings of the Second International Workshop on Rules in Database Systems, pages 165-181, London, UK, 1995. Springer-Verlag.
[15] James Bailey, Alexandra Poulovassilis, and Peter T. Wood. Analysis and optimization for event-condition-action rules on xml. Computer Networks, 2001.
[16] G. Papamarkos, A. Poulovassilis, and P. Wood. Event-condition-action rule languages for the semantic web. In Proc. Workshop on Semantic Web and Databases, at VLDB '03, Berlin, 2003.
[17] Serge Abiteboul, Bernd Amann, Sophie Cluet, Adi Eyal, Laurent Mignet, and Tova Milo. Active views for electronic commerce. In VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases, pages 138-149, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[18] H. Ishikawa and M. Ohta. An active web-based distributed database system for e-commerce. In Proc. Web Dynamics Workshop, London, 2001.
[19] W3C. Xquery update: last call for specification. http://www.w3c.org.
[20] Jim Melton, editor. Advanced SQL: 1999, Understanding Object-Oriented and Other Advanced Features. Morgan Kaufmann, 2003.
[21] Alexander Aiken, Joseph M. Hellerstein, and Jennifer Widom. Static analysis techniques for predicting the behavior of active database rules. ACM Trans. Database Syst., 20(1):3-41, 1995.
[22] BEA Systems, Inc. Bea aqualogic data services platform 3.0, 2009. http://edocs.bea.com/aldsp/docs30/index.html.
[23] Software AG. Number one in xml management: Tamino xml server, technical factsheet, 2009.