Join Minimization in XML-to-SQL Translation: An Algebraic Approach
|
|
|
- Brandon Barnett
- 9 years ago
- Views:
Transcription
1 Join Minimization in XML-to-SQL Translation: An Algebraic Approach Murali Mani Song Wang Daniel J. Dougherty Elke A. Rundensteiner Computer Science Dept, WPI Abstract Consider an XML view defined over a relational database, and a user query specified over this view. This user XML query is typically processed using the following steps: (a) our translator maps the XML query to one or more SQL queries, (b) the relational engine translates an SQL query to a relational algebra plan, (c) the relational engine executes the algebra plan and returns SQL results, and (d) our translator translates the SQL results back to XML. However, a straightforward approach produces a relational algebra plan after step (b) that is inefficient and has redundant joins. In this paper, we report on our preliminary observations with respect to how joins in such a relational algebra plan can be minimized, while maintaining bag semantics. Our approach works on the relational algebra plan and optimizes it using novel rewrite rules that consider pairs of joins in the plan and determine whether one of them is redundant and hence can be removed. Our study shows that algebraic techniques achieve effective join minimization, and such techniques are useful and can be integrated into mainstream SQL engines. 1 Introduction Queries, and their corresponding algebra plans, generated automatically by translating queries specified over virtual views tend to have unnecessary joins [16]. Such algebra plans take much longer time to execute when compared to an equivalent algebra plan without the unnecessary joins. In this paper, we study the problem of how to remove unnecessary joins from a relational algebra plan. As it sounds, this problem has been extensively studied in the more than thirty years of SQL and relational history [2, 1, 8, 15, 4]. In spite of the large amount of research, current SQL engines do very minimal join-minimization; the only kind of join minimization done is that of removing a join such as A c B, where c is a condition of the form A.key = B.fk, and B.fk is foreign key attribute(s) of B that reference A. The reason for this minimal adoption is because existing solutions in research assume a set semantics, which give incorrect results when we assume bag semantics required by SQL. As a simple example, consider the algebra plan Π atta (A B), where A, B are relations, att A is the set of attributes of A, and Π is a set-based project that removes duplicates. The above plan is equivalent to the plan A, under such a project based on set semantics. However, if the project operator does not remove duplicates as under bag semantics, the above two plans give different results. Motivating Example: Let us consider an example application scenario from the medical domain to illustrate the practicality of this problem. Consider two relations in the database of a primary clinic: one that describes doctors and their speciality, and another that describes patients, who their primary doctor is, and what their primary health issue is. The two relations and their sample data are shown in Table 1. docid name speciality ID1 Mike ENT ID2 Mary General ID3 Cathy General (a) Doctor Relation with Sample Data patid name primaryhealthissue doctor SSN1 Matt Arthritis ID1 SSN2 Joe Polio ID1 SSN3 Mark Cancer ID2 SSN4 Emily Arthritis ID2 SSN5 Greg Cancer ID2 SSN6 Terry Cancer ID3 SSN7 Tina Cancer ID3 (b) Relation with Sample Data Table 1: Example Relational Database
2 Now consider that the primary clinic needs to export an XML view of its data to a certain class of users. The view must specify the patients diagnosed with cancer, and their primary health care physicians, grouped by the physicians. Such view definitions have been studied in several systems such as SilkRoute [5], XPERANTO [14], and CoT [10]. Fig 1 shows this view defined using XQuery [17]. <root> { for $d in //Doctor where exists (//[@doctor=$d/@docid Cancer ]) return <doctor DoctorID={$d/@docID}> for $p in //[@doctor=$d/@docid Cancer ]) return <patient ID={$p/@patID}/> </doctor> } </root> Figure 1: An XML view of the relational database from Table 1 defined using an XQuery <root> <doctor DoctorID= ID2 > <patient ID= SSN3 /> <patient ID= SSN5 /> </doctor> <doctor DoctorID= ID3 > <patient ID= SSN6 /> <patient ID= SSN7 /> </doctor> </root> Figure 2: The result from a user query /root against the view defined in Figure 1 Such a view is typically virtual, and not materialized. Once such a view is defined, one needs to support arbitrary queries to be specified over this view. For instance, the result of the query /root is shown in Figure 2. Consider a user query U 1 = //patient/@id, that retrieves all the patient IDs in the view. Such a query could be answered using the following steps: (a) our translator translates the above XML query into SQL queries, (b) the relational engine translates an SQL query into a relational algebra plan, (c) the relational engine executes the algebra plan to get SQL results, and (d) our translator translates the SQL results back to XML to conform to the view. After these steps, the user will get the answer {SSN3, SSN5, SSN6, SSN7} 1. 1 Note that we are assuming an unordered semantics. Considering order constraints, such as SSN3 and SSN5 must appear next to each other, are outside the scope of this work. Such unordered semantics as we assume might be appropriate, if the user knows that the underlying data source is relational. For step (a), our translator uses a mapping as shown in Figure 3. This mapping says that one root node always exists in the view; the set of doctor children of this root node is the doctors that have a patient with cancer; given a doctor, her patients are those who have cancer. Such mappings are derived from the view query definition [5, 14]. root SELECT * FROM Doctor d WHERE EXISTS ( SELECT * FROM p doctor WHERE p.primaryhealthissue= Cancer AND p.doctor=d.docid) SELECT * FROM p, doctor d patient WHERE p.primaryhealthissue= Cancer AND p.doctor=d.docid Figure 3: Mapping obtained from the view query (Figure 1) used to answer queries. Let us see how the translator translates the user XML query U 1 into SQL using the above mapping. The translator can determine that the set of patients can be obtained from the SQL query corresponding to the patient node in the mapping. This query in turn uses the doctor node in the mapping, which in turn can be substituted by the SQL query corresponding to the doctor node in the mapping. After such substitutions, and some minor syntactic rewriting, we get the SQL query Q 1 that answers the user query as: SELECT p.patientid FROM p, (SELECT * FROM Doctor d1 WHERE EXISTS ( SELECT * FROM p1 WHERE p1.primaryhealthissue= Cancer AND p1.doctor=d1.docid)) d WHERE p.primaryhealthissue= Cancer AND p.doctor=d.docid The above query specifies two joins: first there is a join between Doctor d1 and p1 to produce d, that is the set of doctors who have cancer patients. This d is then joined with p to get the final result. However, from the application semantics, we know that every patient who has cancer will appear in the view. Therefore a simpler SQL query Q 2 for answering U 1 would be: 2 2 Q 2 answers U 1 if we assume that every patient has one doctor. However even without this assumption, Q 1 can be optimized to a query which has only one join, as we will see later. In other words, Q 1 always has redundant joins.
3 SELECT p.patientid FROM p WHERE p.primaryhealthissue= Cancer A query such as Q 1 might not be inefficient if the relational engine can optimize the query. A relational engine first translates an SQL query into a relational algebra plan and tries to optimize this plan. This optimized plan is then executed. However, when Q 1 is fed to a relational engine (we use IBM DB2 V8), we get a final plan that looks like the one in Figure 4. Observe that the plan still has the two joins. TBSCAN JOIN TBSCAN Doctor JOIN TBSCAN Figure 4: Algebra Plan corresponding to Q 1 generated by an SQL engine. In this paper, we come up with a novel set of rules for minimizing joins in a relational algebra plan. Our rules determine whether a join in an algebra plan can be removed by examining other joins in the plan. Using our rules, as well as previously studied rules that minimize joins by examining semantic constraints in the schema, we are able to minimize the query plan in Figure 4 to an equivalent query plan without any joins, while maintaining bag semantics. Outline of the paper The rest of the paper is organized as follows. Section 2 describes some of the related work in minimizing joins. Our rules for minimizing joins, along with an example illustration, are described in Section 3. We report on our preliminary experimental studies that show the performance gain possible by such join minimization in Section 4. Our conclusions and future directions are given in Section 5. 2 Related Work Dan Suciu reported in [16] that the translator (step (a) in our process) in SilkRoute can produce SQL queries with unnecessary joins, and gave some insights as to why this problem might be more critical in the world of XML views, as opposed to plain SQL views. In XML views, there is a query associated with each type whereas in SQL views, there is only one type and a query associated with that type. Hence in XML views, queries that join multiple view queries are very frequent. In [9], the authors study the problem of join minimization for XML views. Here the authors try to optimize step (a) (as opposed to step (b) in our approach). They do this by identifying which nodes in the view mapping such as Figure 3 form bijective mappings. A node in the view mapping is said to be a bijective mapping with respect to a relation in the SQL database, if there is an element of this node type in the view instance corresponding to every row in the relation. In our example view mapping shown in Figure 3, every row in the Doctor relation does not appear in the view; every row in the relation also does not appear in the view. Therefore both the doctor node and the patient node in Figure 3 do not form bijective mappings. This means that the techniques studied in [9] will end up with an inefficient query plan such as the one in Figure 4. In [10], the authors study a class of views where every node in the mapping is necessarily bijective. In other words, they disallow a view definition such as the one in Figure 1. By making this assumption, the authors are able to optimize step (a), and come up with minimal SQL queries easily: every XPath expression (or subexpression) that selects every element in the instance corresponding to a node can be obtained by a select query from the corresponding relation (and no joins are needed). In the previous section, we mentioned the rich body of work that study join minimization assuming set semantics. In [2], Chandra and Merlin showed that there is a unique minimal query for any given conjunctive query, and that such minimization is NPhard. In [1], the authors considered additional constraints such as functional dependencies specified on the relations, and came up with a tableau (matrix) based approach for decreasing joins. Minimization of joins in the presence of functional dependencies was also shown to be NP-complete in the size of the query. In [8], the authors considered functional and inclusion dependencies and showed that minimization of joins is still NP-complete. Here the authors came up with a chase technique that, given a query, expands the query by adding conjuncts based on the functional and inclusion dependencies. This expanded query can then be minimized. A graph based approach, consisting of expansion and reduction steps, for join minimization is studied in [15]. Recently, in [4], the authors consider physical structures such as primary and secondary indexes, extent-based representation of OO classes, join indexes, path indexes, access support relations, gmaps etc. The authors study how to
4 translate a logical query into a minimal query against the physical schema, using a chase step that expands the logical query to a universal query, and then a backchase step that minimizes the universal query. The above approaches [2, 1, 8, 15, 4] do provide a good understanding of the problem; however, these techniques cannot be used in SQL engines, because SQL is based on bag semantics. The complexity of join minimization of conjunctive queries under bag semantics as in SQL is studied in [7, 3], and they report that query containment is Π p 2-hard. Further, in [3], the authors consider select-from-where queries with bag semantics, and remark that such queries cannot be minimized without additional semantic constraints. In our work, we consider queries that produce semi-joins in the plans (such as queries with exists), and show that these joins can infact be reduced without any additional semantic constraints. The approach that we propose for join minimization is an algebraic rewriting technique. Algebraic rewriting rules for SQL queries have been studied extensively, for example in [11, 12, 6, 13]. Some of the rules include removal of DISTINCT if one of the returned columns is known to be unique, techniques for decorrelation etc. However, none of the techniques study join minimization that can optimize the query plan shown in Figure 4. We expect that our techniques described in this paper will complement existing algebraic optimization techniques. 3 Rules for Minimizing Joins In this section, we will describe our rules for minimizing joins in an algebra plan. We will state each rule informally, rather than using a formal notation, for ease of explanation. Further, we assume that some preliminary analysis of the algebra plan has already been done to identify characteristics such as for every operator, what columns are needed in the rest of the algebra plan (refer to any commercial optimizer like IBM DB2). We will use the following common notations for our relational algebra operators: select is denoted by σ; project is denoted by π; denotes join; denotes semi-join; o L denotes left-outer join; δ removes duplicates; γ denotes grouping. Before we define the rules, we would like to introduce the notion of logical entailment. For instance, we say that the condition (a = b) (c = d) logically entails the condition (a = b). Given two conditions (boolean expressions) c 1 and c 2, c 2 logically entails c 1 if c 2 c 1 is always true. In other words, whenever c 2 evaluates to true c 1 will necessarily be true. A naive method for checking logical entailment is: identify common terms in c 1 and c 2 using syntactic analysis, and then check for all combinations of truth values of every term, whether c 2 c 1 is true. Our first two rules are already studied and implemented in most commercial systems. They utilize semantic constraints (key-foreign key constraints) in the schema to remove joins. Rule 1 A c B can be reduced to σ c (B) if c is a condition that logically entails the condition A.key = B.fk, where B.fk is foreign key referencing A, no column in B.fk can be NULL, and no column of A is needed in the rest of the algebra plan. c is obtained from c by removing the condition A.key = B.fk. Rule 2 A c B can be reduced to σ c (B) if c is a condition that logically entails the condition A.key = B.fk, where B.fk is foreign key of B that references A, and no column of A is needed in the rest of the algebra plan. c is obtained from c by removing the condition A.key = B.fk, and by adding condition of the form B.fk IS NOT NULL. Our third and fourth rules are more complex, and form the crux of our approach. They try to remove unnecessary semi-joins that may appear in the algebra plan. Semi-joins may appear in an algebra plan when we decorrelate a correlated SQL query. For example, consider the SQL query corresponding to the doctor node in Figure 3. It specifies a correlated query, which is translated into an algebra plan such as: Doctor c P atient, where c = (doctor = docid AND primaryhealthissue = Cancer ) is the join condition. The result of this semi-join is the set of rows in the Doctor relation, that satisfy the condition. Now in Q 1, the above result is then joined with the P atient relation. The algebra plan corresponding to this is (Doctor c 1 P atient) c 2 P atient. Further, in this query the two conditions c 1 and c 2 are identical. In other words, the doctors who have patients are then joined with patients. We see that the first semijoin can be removed. We now get the query plan Doctor c 2 P atient. Rule 3 (A c 1 B) c 2 B can be reduced to A c 2 B if the condition c 2 logically entails the condition c 1. The above rule can be implemented by doing a bottom-up traversal of the algebra plan. For any semi-join such as (A c 1 B), check if this operator has an ancestor operator in the plan that is a join with B, and has a join condition c 2 where c 2 logically entails c 1. This rule can be extended to an ancestor semi-join also, and the correctness holds. Rule 4 (A c 1 B) c 2 B can be reduced to A c 2 B if the conditions c 2 logically entails the condition c 1.
5 Using the above rules, we can come up with an efficient relational algebra plan for Q 1, as shown in Figure 5. First we start with a plan that includes a semi-join and a join. Using Rule 3, we first remove the semi-join. We then use Rule 1 to remove the remaining join. The result is an efficient algebra plan with no unnecessary joins. In our experimental section, we show this efficient plan executes orders of magnitude faster; we achieved improvement of a factor of about 26 for simple queries 3. Rule 3 Doctor Rule 1 (a) Query Plan from Q 4 (b) Query Plan from Q 5 Doctor Figure 5: Using our rules to minimize relational algebra plan for query Q 1. 4 Experimental Evaluation We performed some preliminary experiments to illustrate the effectiveness of our proposed approach. Our experiments were done on IBM DB2 V8 Database Server, which is installed on an 1.4 GHz Pentium machine with 512 MB RAM, running Windows XP. We used the TPC-H 4 benchmark data, loading data of different amounts from 500 MB to 4 GB. We performed three sets of experiments. The first set of experiments illustrate that joins can be expensive. For this, we executed the following two queries: Q 4 : SELECT COUNT (*) FROM LINEITEM l, PART p WHERE l.l PARTKEY=p.P PARTKEY Q 5 : SELECT COUNT (*) FROM LINEITEM l The plans for these two queries are shown in Figure 6. The execution times for these two queries against TPC-H data are shown in Figure 8. Note that this join can actually be very expensive, as it is not a key-foreign key join. The second set of experiments is similar to our motivating example, and show the effectiveness of Rule 3. For this we executed the queries Q 6, and the equivalent query Q 5. Our rules are able to reduce Q 6 to Q 5. The plan for Q 6 is shown in Figure 7. The execution times for these two queries against TPC-H data are shown in Figure 8. Note that we get considerable performance gain using our rules. Q 6 : SELECT COUNT (*) FROM LINEITEM l, (SELECT * FROM ORDERS o1 3 To clarify, for more complex queries, where the percentage of unnecessary joins is smaller, we expect to get lower factors of improvement, but larger absolute values of improvement. 4 Figure 6: Illustrating that joins can be expensive. The execution times are shown in Figure 8. WHERE EXISTS ( (SELECT * FROM LINEITEM l1 WHERE l1.l ORDERKEY=o1.O ORDERKEY)) o WHERE l.l ORDERKEY=o.O ORDERKEY Figure 7: Query plan corresponding to Q 6. Our rules reduce this plan to the plan in Figure 6(b). The third set of experiments illustrate the effectiveness of Rule 4. For this consider query Q 7 below: SELECT COUNT (*) FROM LINEITEM l WHERE EXISTS (SELECT * FROM ORDERS o1 WHERE o1.o_orderkey=l.l_orderkey) AND EXISTS (SELECT * FROM ORDERS o1 WHERE o1.o_orderkey=l.l_orderkey)
6 Using our Rule 4, we can remove one of the joins. We then get an algebra plan that is equivalent to the query Q 8 given below: (Execution times of Q 7 and Q 8 are shown in Figure 8.) SELECT COUNT (*) FROM LINEITEM l WHERE EXISTS (SELECT * FROM ORDERS o1 WHERE o1.o_orderkey=l.l_orderkey) Figure 8: Execution times for the different queries 5 Conclusions and Future Work In this paper, we have shown that significant performance gain can be achieved by performing join minimization, and that research so far has not solved the join minimization in a satisfactory manner. We have come up with a solution for join minimization that is based on the commercially used algebraic rewriting techniques and preserves SQL bag semantics. We expect that our work will open up renewed interest in this problem, and that the solutions will get adopted into commercial SQL engines. As part of future work, we need to integrate our solutions into commercial optimizers in order to study the query compilation time, as well as demonstrate the feasibility of our techniques. References [1] A. V. Aho, Y. Sagiv, and J. D. Ullman. Efficient Optimization of a Class of Relational Expressions. ACM Trans. on Database Systems (TODS), 4(4): , [2] A. K. Chandra and P. M. Merlin. Optimal Implementation of Conjunctive Queries in Relational Data Bases. ACM Symposium on Theory of Computing (STOC), pages 77 90, [3] S. Chaudhuri and M. Y. Vardi. Optimization of Real Conjunctive Queries. In ACM PODS, Washington, DC, May [4] A. Deutsch, L. Popa, and V. Tannen. Physical Data Independence, Constraints and Optimization with Universal Plans. In VLDB, Edinburgh, Scotland, Sep [5] M. F. Fernandez, Y. Kadiyska, D. Suciu, A. Morishima, and W. C. Tan. SilkRoute: A Framework for Publishing Relational Data in XML. ACM Trans. on Database Systems (TODS), 27(4): , Dec [6] P. Gassner, G. M. Lohman, K. B. Schiefer, and Y. Wang. Query Optimization in the IBM DB2 Family. IEEE Data Eng. Bulletin, 16(4):4 18, [7] Y. E. Ioannidis and R. Ramakrishnan. Containment of Conjunctive Queries: Beyond Relations as Sets. ACM Trans. on Database Systems (TODS), 20(3): , Sep [8] D. Johnson and A. Klug. Testing Containment of Conjunctive Queries under Functional and Inclusion Dependencies. In ACM PODS, Los Angeles, CA, Mar [9] R. Krishnamurthy, R. Kaushik, and J. F. Naughton. Efficient XML-to-SQL Query Translation: Where to Add the Intelligence. In VLDB, Toronto, Canada, Sep [10] D. Lee, M. Mani, F. Chiu, and W. W. Chu. NeT & CoT: Translating Relational Schemas to XML Schemas. In ACM CIKM, McLean, Virginia, Nov [11] H. Pirahesh, J. M. Hellerstein, and W. Hasan. Extensible/Rule Based Query Rewrite Optimization in Starburst. In ACM SIGMOD, San Diego, CA, June [12] H. Pirahesh, T. Y. C. Leung, and W. Hasan. A Rule Engine for Query Transformation in Starburst and IBM DB2 C/S DBMS. In IEEE ICDE, Birmingham, UK, Apr [13] P. Seshadri, H. Pirahesh, and T. Y. C. Leung. Complex Query Decorrelation. In IEEE ICDE, New Orleans, LA, Feb [14] J. Shanmughasundaram, J. Kiernan, E. Shekita, C. Fan, and J. Funderburk. Querying XML Views of Relational Data. In VLDB, Roma, Italy, Sep [15] S. T. Shenoy and Z. M. Ozsoyoglu. A System for Semantic Query Optimization. In ACM SIGMOD, San Francisco, CA, May [16] D. Suciu. On Database Theory and XML. ACM SIGMOD Record, 30(3):39 45, Sep [17] W3C. XQuery Working Group.
Unraveling the Duplicate-Elimination Problem in XML-to-SQL Query Translation
Unraveling the Duplicate-Elimination Problem in XML-to-SQL Query Translation Rajasekar Krishnamurthy University of Wisconsin [email protected] Raghav Kaushik Microsoft Corporation [email protected]
Translating WFS Query to SQL/XML Query
Translating WFS Query to SQL/XML Query Vânia Vidal, Fernando Lemos, Fábio Feitosa Departamento de Computação Universidade Federal do Ceará (UFC) Fortaleza CE Brazil {vvidal, fernandocl, fabiofbf}@lia.ufc.br
KEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
An Efficient and Scalable Management of Ontology
An Efficient and Scalable Management of Ontology Myung-Jae Park 1, Jihyun Lee 1, Chun-Hee Lee 1, Jiexi Lin 1, Olivier Serres 2, and Chin-Wan Chung 1 1 Korea Advanced Institute of Science and Technology,
Unified XML/relational storage March 2005. The IBM approach to unified XML/relational databases
March 2005 The IBM approach to unified XML/relational databases Page 2 Contents 2 What is native XML storage? 3 What options are available today? 3 Shred 5 CLOB 5 BLOB (pseudo native) 6 True native 7 The
Relational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS Tadeusz Pankowski 1,2 1 Institute of Control and Information Engineering Poznan University of Technology Pl. M.S.-Curie 5, 60-965 Poznan
CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model
CS2Bh: Current Technologies Introduction to XML and Relational Databases Spring 2005 The Relational Model CS2 Spring 2005 (LN6) 1 The relational model Proposed by Codd in 1970. It is the dominant data
The Relational Model. Why Study the Relational Model? Relational Database: Definitions
The Relational Model Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Microsoft, Oracle, Sybase, etc. Legacy systems in
Deferred node-copying scheme for XQuery processors
Deferred node-copying scheme for XQuery processors Jan Kurš and Jan Vraný Software Engineering Group, FIT ČVUT, Kolejn 550/2, 160 00, Prague, Czech Republic [email protected], [email protected] Abstract.
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Constraint Preserving XML Storage in Relations
Constraint Preserving XML Storage in Relations Yi Chen, Susan B. Davidson and Yifeng Zheng ÔØº Ó ÓÑÔÙØ Ö Ò ÁÒ ÓÖÑ Ø ÓÒ Ë Ò ÍÒ Ú Ö ØÝ Ó È ÒÒ ÝÐÚ Ò [email protected] [email protected] [email protected]
Integrating Heterogeneous Data Sources Using XML
Integrating Heterogeneous Data Sources Using XML 1 Yogesh R.Rochlani, 2 Prof. A.R. Itkikar 1 Department of Computer Science & Engineering Sipna COET, SGBAU, Amravati (MH), India 2 Department of Computer
The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3
The Relational Model Chapter 3 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase,
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor
Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey
Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey 1 Amjad Qtaish, 2 Kamsuriah Ahmad 1 School of Computer Science, Faculty of Information Science and Technology,
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Extended Abstract Ioanna Koffina 1, Giorgos Serfiotis 1, Vassilis Christophides 1, Val Tannen
Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms
Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms Mohammed M. Elsheh and Mick J. Ridley Abstract Automatic and dynamic generation of Web applications is the future
Technologies for a CERIF XML based CRIS
Technologies for a CERIF XML based CRIS Stefan Bärisch GESIS-IZ, Bonn, Germany Abstract The use of XML as a primary storage format as opposed to data exchange raises a number of questions regarding the
Semantic Analysis of Business Process Executions
Semantic Analysis of Business Process Executions Fabio Casati, Ming-Chien Shan Software Technology Laboratory HP Laboratories Palo Alto HPL-2001-328 December 17 th, 2001* E-mail: [casati, shan] @hpl.hp.com
Clean Answers over Dirty Databases: A Probabilistic Approach
Clean Answers over Dirty Databases: A Probabilistic Approach Periklis Andritsos University of Trento [email protected] Ariel Fuxman University of Toronto [email protected] Renée J. Miller University
Query reformulation for an XML-based Data Integration System
Query reformulation for an XML-based Data Integration System Bernadette Farias Lóscio Ceará State University Av. Paranjana, 1700, Itaperi, 60740-000 - Fortaleza - CE, Brazil +55 85 3101.8600 [email protected]
PartJoin: An Efficient Storage and Query Execution for Data Warehouses
PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE [email protected] 2
Likewise, we have contradictions: formulas that can only be false, e.g. (p p).
CHAPTER 4. STATEMENT LOGIC 59 The rightmost column of this truth table contains instances of T and instances of F. Notice that there are no degrees of contingency. If both values are possible, the formula
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques Sean Thorpe 1, Indrajit Ray 2, and Tyrone Grandison 3 1 Faculty of Engineering and Computing,
IBM DB2 XML support. How to Configure the IBM DB2 Support in oxygen
Table of Contents IBM DB2 XML support About this Tutorial... 1 How to Configure the IBM DB2 Support in oxygen... 1 Database Explorer View... 3 Table Explorer View... 5 Editing XML Content of the XMLType
MTCache: Mid-Tier Database Caching for SQL Server
MTCache: Mid-Tier Database Caching for SQL Server Per-Åke Larson Jonathan Goldstein Microsoft {palarson,jongold}@microsoft.com Hongfei Guo University of Wisconsin [email protected] Jingren Zhou Columbia
CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?
CS2Bh: Current Technologies Introduction to XML and Relational Databases Spring 2005 Introduction to Databases CS2 Spring 2005 (LN5) 1 Why databases? Why not use XML? What is missing from XML: Consistency
XML Data Integration
XML Data Integration Lucja Kot Cornell University 11 November 2010 Lucja Kot (Cornell University) XML Data Integration 11 November 2010 1 / 42 Introduction Data Integration and Query Answering A data integration
A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS
A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS Abdelsalam Almarimi 1, Jaroslav Pokorny 2 Abstract This paper describes an approach for mediation of heterogeneous XML schemas. Such an approach is proposed
Efficiently Identifying Inclusion Dependencies in RDBMS
Efficiently Identifying Inclusion Dependencies in RDBMS Jana Bauckmann Department for Computer Science, Humboldt-Universität zu Berlin Rudower Chaussee 25, 12489 Berlin, Germany [email protected]
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries
Indexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,
DLDB: Extending Relational Databases to Support Semantic Web Queries
DLDB: Extending Relational Databases to Support Semantic Web Queries Zhengxiang Pan (Lehigh University, USA [email protected]) Jeff Heflin (Lehigh University, USA [email protected]) Abstract: We
Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange
Data Integration and Exchange L. Libkin 1 Data Integration and Exchange Traditional approach to databases A single large repository of data. Database administrator in charge of access to data. Users interact
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
International Journal of Advanced Research in Computer Science and Software Engineering
Volume, Issue, March 201 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient Approach
AT&T Global Network Client for Windows Product Support Matrix January 29, 2015
AT&T Global Network Client for Windows Product Support Matrix January 29, 2015 Product Support Matrix Following is the Product Support Matrix for the AT&T Global Network Client. See the AT&T Global Network
Relational model. Relational model - practice. Relational Database Definitions 9/27/11. Relational model. Relational Database: Terminology
COS 597A: Principles of Database and Information Systems elational model elational model A formal (mathematical) model to represent objects (data/information), relationships between objects Constraints
II. PREVIOUS RELATED WORK
An extended rule framework for web forms: adding to metadata with custom rules to control appearance Atia M. Albhbah and Mick J. Ridley Abstract This paper proposes the use of rules that involve code to
Relational Algebra and SQL
Relational Algebra and SQL Johannes Gehrke [email protected] http://www.cs.cornell.edu/johannes Slides from Database Management Systems, 3 rd Edition, Ramakrishnan and Gehrke. Database Management
Efficient Integration of Data Mining Techniques in Database Management Systems
Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France
CIS 631 Database Management Systems Sample Final Exam
CIS 631 Database Management Systems Sample Final Exam 1. (25 points) Match the items from the left column with those in the right and place the letters in the empty slots. k 1. Single-level index files
Caching XML Data on Mobile Web Clients
Caching XML Data on Mobile Web Clients Stefan Böttcher, Adelhard Türling University of Paderborn, Faculty 5 (Computer Science, Electrical Engineering & Mathematics) Fürstenallee 11, D-33102 Paderborn,
Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner
24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India
Modern Databases. Database Systems Lecture 18 Natasha Alechina
Modern Databases Database Systems Lecture 18 Natasha Alechina In This Lecture Distributed DBs Web-based DBs Object Oriented DBs Semistructured Data and XML Multimedia DBs For more information Connolly
Dependency Free Distributed Database Caching for Web Applications and Web Services
Dependency Free Distributed Database Caching for Web Applications and Web Services Hemant Kumar Mehta School of Computer Science and IT, Devi Ahilya University Indore, India Priyesh Kanungo Patel College
Translating SQL into the Relational Algebra
Translating SQL into the Relational Algebra Jan Van den Bussche Stijn Vansummeren Required background Before reading these notes, please ensure that you are familiar with (1) the relational data model
Normalization in OODB Design
Normalization in OODB Design Byung S. Lee Graduate Programs in Software University of St. Thomas St. Paul, Minnesota [email protected] Abstract When we design an object-oriented database schema, we need
Database Design Overview. Conceptual Design ER Model. Entities and Entity Sets. Entity Set Representation. Keys
Database Design Overview Conceptual Design. The Entity-Relationship (ER) Model CS430/630 Lecture 12 Conceptual design The Entity-Relationship (ER) Model, UML High-level, close to human thinking Semantic
Schema Mappings and Data Exchange
Schema Mappings and Data Exchange Phokion G. Kolaitis University of California, Santa Cruz & IBM Research-Almaden EASSLC 2012 Southwest University August 2012 1 Logic and Databases Extensive interaction
Load Balancing in Distributed Data Base and Distributed Computing System
Load Balancing in Distributed Data Base and Distributed Computing System Lovely Arya Research Scholar Dravidian University KUPPAM, ANDHRA PRADESH Abstract With a distributed system, data can be located
The Entity-Relationship Model
The Entity-Relationship Model 221 After completing this chapter, you should be able to explain the three phases of database design, Why are multiple phases useful? evaluate the significance of the Entity-Relationship
USING MYWEBSQL FIGURE 1: FIRST AUTHENTICATION LAYER (ENTER YOUR REGULAR SIMMONS USERNAME AND PASSWORD)
USING MYWEBSQL MyWebSQL is a database web administration tool that will be used during LIS 458 & CS 333. This document will provide the basic steps for you to become familiar with the application. 1. To
A Secure Mediator for Integrating Multiple Level Access Control Policies
A Secure Mediator for Integrating Multiple Level Access Control Policies Isabel F. Cruz Rigel Gjomemo Mirko Orsini ADVIS Lab Department of Computer Science University of Illinois at Chicago {ifc rgjomemo
On Rewriting and Answering Queries in OBDA Systems for Big Data (Short Paper)
On Rewriting and Answering ueries in OBDA Systems for Big Data (Short Paper) Diego Calvanese 2, Ian Horrocks 3, Ernesto Jimenez-Ruiz 3, Evgeny Kharlamov 3, Michael Meier 1 Mariano Rodriguez-Muro 2, Dmitriy
Constraint-Based XML Query Rewriting for Data Integration
Constraint-Based XML Query Rewriting for Data Integration Cong Yu Department of EECS, University of Michigan [email protected] Lucian Popa IBM Almaden Research Center [email protected] ABSTRACT
2. Basic Relational Data Model
2. Basic Relational Data Model 2.1 Introduction Basic concepts of information models, their realisation in databases comprising data objects and object relationships, and their management by DBMS s that
Fundamentals of Database System
Fundamentals of Database System Chapter 4 Normalization Fundamentals of Database Systems (Chapter 4) Page 1 Introduction To Normalization In general, the goal of a relational database design is to generate
The core theory of relational databases. Bibliography
The core theory of relational databases Slide 1 La meilleure pratique... c est une bonne théorie Bibliography M.Levene, G.Loizou, Guided Tour of Relational Databases and Beyond, Springer, 625 pages,1999.
XML Data Integration Based on Content and Structure Similarity Using Keys
XML Data Integration Based on Content and Structure Similarity Using Keys Waraporn Viyanon 1, Sanjay K. Madria 1, and Sourav S. Bhowmick 2 1 Department of Computer Science, Missouri University of Science
DBMS / Business Intelligence, SQL Server
DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.
Role of Materialized View Maintenance with PIVOT and UNPIVOT Operators A.N.M. Bazlur Rashid, M. S. Islam
2009 IEEE International Advance Computing Conference (IACC 2009) Patiala, India, 6 7 March 2009 Role of Materialized View Maintenance with PIVOT and UNPIVOT Operators A.N.M. Bazlur Rashid, M. S. Islam
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,
Software quality improvement via pattern matching
Software quality improvement via pattern matching Radu Kopetz and Pierre-Etienne Moreau INRIA & LORIA {Radu.Kopetz, [email protected] Abstract. Nested if-then-else statements is the most common
Enhancing Traditional Databases to Support Broader Data Management Applications. Yi Chen Computer Science & Engineering Arizona State University
Enhancing Traditional Databases to Support Broader Data Management Applications Yi Chen Computer Science & Engineering Arizona State University What Is a Database System? Of course, there are traditional
The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1
The Relational Model Ramakrishnan&Gehrke, Chapter 3 CS4320 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. Legacy systems in older models
Data Integration. Maurizio Lenzerini. Universitá di Roma La Sapienza
Data Integration Maurizio Lenzerini Universitá di Roma La Sapienza DASI 06: Phd School on Data and Service Integration Bertinoro, December 11 15, 2006 M. Lenzerini Data Integration DASI 06 1 / 213 Structure
2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
Portable Bushy Processing Trees for Join Queries
Reihe Informatik 11 / 1996 Constructing Optimal Bushy Processing Trees for Join Queries is NP-hard Wolfgang Scheufele Guido Moerkotte 1 Constructing Optimal Bushy Processing Trees for Join Queries is NP-hard
Semantic Errors in SQL Queries: A Quite Complete List
Semantic Errors in SQL Queries: A Quite Complete List Christian Goldberg, Stefan Brass Martin-Luther-Universität Halle-Wittenberg {goldberg,brass}@informatik.uni-halle.de Abstract We investigate classes
ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski [email protected]
Kamil Bajda-Pawlikowski [email protected] Querying RDF data stored in DBMS: SPARQL to SQL Conversion Yale University technical report #1409 ABSTRACT This paper discusses the design and implementation
AVOIDANCE OF CYCLICAL REFERENCE OF FOREIGN KEYS IN DATA MODELING USING THE ENTITY-RELATIONSHIP MODEL
AVOIDANCE OF CYCLICAL REFERENCE OF FOREIGN KEYS IN DATA MODELING USING THE ENTITY-RELATIONSHIP MODEL Ben B. Kim, Seattle University, [email protected] ABSTRACT The entity-relationship (ER model is clearly
Keywords: Regression testing, database applications, and impact analysis. Abstract. 1 Introduction
Regression Testing of Database Applications Bassel Daou, Ramzi A. Haraty, Nash at Mansour Lebanese American University P.O. Box 13-5053 Beirut, Lebanon Email: rharaty, [email protected] Keywords: Regression
Integrating Pattern Mining in Relational Databases
Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a
Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
