Efficient Storage and Temporal Query Evaluation of Hierarchical Data Archiving Systems
- Ann Foster
- 8 years ago
Transcription
1 Efficient Storage and Temporal Query Evaluation of Hierarchical Data Archiving Systems. Hui (Wendy) Wang, Ruilin Liu (Stevens Institute of Technology, New Jersey, USA); Dimitri Theodoratos, Xiaoying Wu (New Jersey Institute of Technology, New Jersey, USA)
2 Scientific Data in XML. eXtensible Markup Language (XML): a markup language. XML for scientific data: describes data structure and gives simple processing instructions (e.g., XSIL and XDMF); provides a common data format that is an open, universal standard (e.g., CML, MathML, GML). [Figure: the Swiss-Prot dataset.]
3 Updates on Scientific Data in XML. Scientific databases are continuously updated, and all versions need to be maintained in the archive. Problem: as increasing volumes of these data accumulate, the archives can reach a critical mass.
4 An Example of Multiple XML Dataset Versions Four consecutive instances of an extract of the Swiss-Prot Dataset 4
5 Challenges (1) how to store successive versions of XML databases in an archiving database in a cost-effective way. (2) how to evaluate queries efficiently on the archiving database. 5
6 Our Contributions A novel compact and updateable storage scheme for XML archiving databases. A simple yet expressive query language for XML archiving databases. Optimization of query evaluation over the compact storage. 6
7 Outline Preliminaries Compact storage Temporal query language Query optimization Experiments Conclusion 7
8 Compact Storage Scheme (I) Merge versions Store multiple occurrences of the same node only once in the archiving database A. Each node p in A is associated with a timestamp set which contains the timestamps of the instances of p. 8
9 An Example of Merging [Figure: Version 1 and Version 2 of the Swiss-Prot extract (nodes root 0, Entry 1, Species 2, Features 3, Descr 4, Domain 5, Descr 6, Ref 7, Author 8) and the merged archive, in which each node carries its timestamp set: [1-2], [1], or [2].] Monotone property on timestamps of the nodes on the same path.
10 Compact Storage Scheme (II) Timestamp set compaction: the timestamp label of the parent node only preserves the timestamps that do not appear on any of its children. $L_p = ts(p) \setminus \bigcup_{c \in child(p)} ts(c)$
11 An Example of Timestamp Set Compaction [Figure: Version 1, Version 2, and the merged archive with its timestamp sets during compaction.]
12 An Example of Compact Storage [Figure: Version 1, Version 2, and the compact archive. After compaction only these labels remain: Species 2 [1-2], Descr 4 [1-2], Domain 5 [2], Author 8 [2], Descr 6 [1].]
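The merge and compaction steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the node-to-parent maps `V1` and `V2` are an assumed reading of the tree shapes in the Swiss-Prot example, and `merge` and `compact` are hypothetical helper names.

```python
from collections import defaultdict

# Each version maps node id -> parent id (None for the root).
# The tree shapes are an assumption read off the slides' example.
V1 = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 3, 6: 5}
V2 = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 3, 7: 1, 8: 7}


def merge(versions):
    """Store every node once and collect its full timestamp set ts(p)."""
    parent, ts = {}, defaultdict(set)
    for timestamp, version in versions.items():
        for node, par in version.items():
            parent[node] = par
            ts[node].add(timestamp)
    return parent, dict(ts)


def compact(parent, ts):
    """L_p = ts(p) minus the union of ts(c) over all children c of p."""
    covered = defaultdict(set)
    for node, par in parent.items():
        if par is not None:
            covered[par] |= ts[node]
    return {node: ts[node] - covered[node] for node in parent}


parent, ts = merge({1: V1, 2: V2})
labels = compact(parent, ts)
# Keep only the nodes whose compact label is non-empty.
non_empty = {node: sorted(label) for node, label in labels.items() if label}
print(non_empty)
```

Under these assumptions the surviving labels sit on Species 2, Descr 4, Domain 5, Descr 6, and Author 8; every other node has an empty label because its timestamps are covered by its children.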
13 Efficient Updates. Incremental timestamp label computation: when inserting a new instance $D_k$ at timestamp k, (1) add the newly inserted trees $T_1, \dots, T_m$ in $D_k$ to the archive; (2) for each leaf node in $D_k$, update the timestamp label of its corresponding archive node by adding timestamp k. [Figure: Version 3 (root 0, Entry 1, Features 3, Domain 5, Descr 9) and the archiving dataset before the update.]
14 Efficient Updates. Incremental timestamp label computation: when inserting a new instance $D_k$ at timestamp k, (1) add the newly inserted trees $T_1, \dots, T_m$ in $D_k$ to the archive; (2) update the archive nodes of the leaf nodes in $D_k$ by adding the timestamp k. [Figure: Version 3 and the archiving dataset after the update: the new node Descr 9 is labeled [3] and the label of Domain 5 becomes [2-3].]
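The incremental update can be illustrated with a short Python sketch. It assumes the compact archive is held as a parent map plus a label map, and that Descr 9 hangs under Entry 1 in Version 3; `insert_version` is a hypothetical name, not an API from the talk.

```python
def insert_version(parent, labels, version, k):
    """Archive a new instance (node -> parent map) at timestamp k.

    Only the labels of the instance's leaf nodes are touched; internal
    nodes keep their labels because a child already carries timestamp k.
    """
    has_child = {par for par in version.values() if par is not None}
    for node, par in version.items():
        if node not in parent:       # newly inserted node: add it to the archive
            parent[node] = par
            labels[node] = set()
        if node not in has_child:    # leaf of the new instance: record k
            labels[node].add(k)
    return labels


# Compact archive after versions 1 and 2 (assumed shape of the slide example).
parent = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 3, 6: 5, 7: 1, 8: 7}
labels = {0: set(), 1: set(), 2: {1, 2}, 3: set(), 4: {1, 2},
          5: {2}, 6: {1}, 7: set(), 8: {2}}

# Version 3: root 0 > Entry 1 > {Features 3 > Domain 5, Descr 9}.
V3 = {0: None, 1: 0, 3: 1, 5: 3, 9: 1}
insert_version(parent, labels, V3, 3)
print(sorted(labels[5]), sorted(labels[9]))
```

Only two labels change here: Domain 5 (a leaf in Version 3) and the new node Descr 9, which matches the number of leaf nodes in the inserted instance.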
15 Comparison with Related Work. Compact storage [1] (top-down approach): remove timestamps at children if they are the same as those of the parent node. Compared with our solution (bottom-up approach) w.r.t. the number of updated timestamp labels when inserting a new instance $D_k$ into the archiving database A. TD: number of nodes in $D_k$ whose corresponding nodes in A have timestamp labels, plus the number of new nodes in $D_k$. BU: number of leaf nodes in $D_k$.
16 An Example of TD vs. BU [Figure: the archiving dataset before the update under the TD approach (root labeled [1-4]) and under the BU approach.]
17 An Example of TD vs. BU [Figure: Version 5 (root 0, Entry 1, Features 3, Domain 5, Descr 6) and the archiving dataset after the update. Under TD four labels change ([1-4] to [1-5], [2-4] to [2-5], [3-4] to [3-5], [4] to [4-5]); under BU only the label of Descr 6 changes, from [4] to [4-5].]
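The two update-cost formulas can be checked on the Version 5 example with a small Python sketch. The set of TD-labeled nodes below is an assumption read off the figure (Entry 1 is taken to be unlabeled because it shares the root's timestamps), and the function names are hypothetical.

```python
def td_updates(version, td_labeled, archive_nodes):
    """TD cost: labeled archive nodes that occur in D_k, plus new nodes in D_k."""
    labeled = sum(1 for node in version if node in td_labeled)
    new = sum(1 for node in version if node not in archive_nodes)
    return labeled + new


def bu_updates(version):
    """BU cost: number of leaf nodes in D_k."""
    has_child = {par for par in version.values() if par is not None}
    return sum(1 for node in version if node not in has_child)


# Version 5 as a node -> parent map.
V5 = {0: None, 1: 0, 3: 1, 5: 3, 6: 5}
archive_nodes = {0, 1, 2, 3, 4, 5, 6, 7, 8}
# Assumed: every archive node except Entry 1 carries a TD label.
td_labeled = archive_nodes - {1}

print(td_updates(V5, td_labeled, archive_nodes), bu_updates(V5))
```

With these inputs TD touches the labels of root, Features, Domain, and Descr 6, while BU touches only the single leaf Descr 6.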
18 Outline Compact storage Temporal query language Query optimization Experiments Conclusion 18
19 Temporal Query Language. Types: snapshot, history, trace. Temporal constraints: includes(t), overlaps($t_a$, $t_b$), before(t)/after(t), contains($t_a$, $t_b$)/is_contained($t_a$, $t_b$), meets($t_a$, $t_b$). Temporal queries: XML structural queries + temporal constraints on query nodes.
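The slides name the temporal constraints without defining them, so the following Python sketch fixes one plausible set-based reading purely for illustration: a node's timestamps form a set of integers and an interval [t_a, t_b] is inclusive. These semantics are assumptions, and meets is left out because its meaning cannot be inferred from the slide.

```python
def includes(ts, t):
    """The node exists at timestamp t."""
    return t in ts


def overlaps(ts, ta, tb):
    """The node exists at some timestamp inside [ta, tb]."""
    return any(ta <= t <= tb for t in ts)


def contains(ts, ta, tb):
    """The node exists at every timestamp of [ta, tb]."""
    return all(t in ts for t in range(ta, tb + 1))


def is_contained(ts, ta, tb):
    """Every timestamp of the node lies inside [ta, tb]."""
    return all(ta <= t <= tb for t in ts)


def before(ts, t):
    """Every timestamp of the node is earlier than t."""
    return all(x < t for x in ts)


def after(ts, t):
    """Every timestamp of the node is later than t."""
    return all(x > t for x in ts)


# Full timestamp sets of two nodes from the running example (assumed values).
descr4, descr6 = {1, 2}, {1}
print(contains(descr4, 1, 2), contains(descr6, 1, 2))
```

Under this reading a node present in versions 1 and 2 satisfies contains(1, 2), while a node present only in version 1 does not.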
20 Evaluation of Temporal Constraints over Compressed Timestamps [Figure: the archiving database with compact timestamp labels; a query with nodes root, Entry includes(2), Domain overlaps(1,3), and Descr* contains(1,2); and the answer node Descr 4.]
21 Temporal Evaluation Annotations: DC (Descendant Check), LC (Local Check), NC (No Check). [Figure: the archive and the query (Entry includes(2), Domain overlaps(1,3), Descr* contains(1,2)) with DC and LC annotations on its nodes.]
22 Cost Model. Cost model of temporal evaluation annotations, where $T_q$ is the number of nodes of the tree rooted at q in a query. [Figure: the example query (Entry includes(2), Domain overlaps(1,3), Descr* contains(1,2)) annotated with NC(0), DC(2), and LC(1), evaluated over the archive.]
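One plausible reading of the three annotations, sketched in Python: NC skips the temporal check, LC inspects only the node's own compact label, and DC first rebuilds the full timestamp set from the labels of all descendants. The `children` and `labels` maps are the assumed compact archive of the running example, and the function names are hypothetical.

```python
# Compact archive of the running example (assumed shape and labels).
children = {0: [1], 1: [2, 3, 4, 7], 3: [5], 5: [6], 7: [8]}
labels = {0: set(), 1: set(), 2: {1, 2}, 3: set(), 4: {1, 2},
          5: {2}, 6: {1}, 7: set(), 8: {2}}


def full_ts(node):
    """Rebuild ts(p) as L_p united with ts(c) for every child c (recursive)."""
    result = set(labels[node])
    for child in children.get(node, ()):
        result |= full_ts(child)
    return result


def check_includes(node, t, annotation):
    """Evaluate includes(t) on a node under a given evaluation annotation."""
    if annotation == "NC":   # constraint known to hold: nothing to evaluate
        return True
    if annotation == "LC":   # look at the local compact label only
        return t in labels[node]
    if annotation == "DC":   # recursive descent over the subtree
        return t in full_ts(node)
    raise ValueError(f"unknown annotation: {annotation}")


print(sorted(full_ts(1)), check_includes(1, 2, "LC"), check_includes(1, 2, "DC"))
```

Entry 1 has an empty compact label, so a local check alone would miss that it exists in version 2; this is why DC is needed there and why it costs more than LC.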
23 Outline Preliminaries Compact storage Temporal query language Query optimization Experiments Conclusion 23
24 Optimization Problem DC is expensive (recursive check) DC can be replaced by LC/NC in some cases Goal: Replace as many DCs as possible with LCs/NCs 24
25 An Example of Optimization [Figure: query Q (root; Entry includes(2), annotated DC; Descr* contains(1,4), annotated DC), the database schema, and the query after optimization, where the annotation of Entry becomes NC and the annotation of Descr* becomes LC.]
26 Inference Rules. We use inference rules to find redundant temporal annotations. Inference rules: $P_1, \dots, P_k \vdash R$; if the premises $P_1, \dots, P_k$ are true, then the conclusion R is also true. Types of inference rules: without a database schema; in the presence of a database schema.
27 Inference Rules: No Database Schema. AD (ancestor-descendant) rule: $Q = p//q \vdash q \subseteq_t p$. TR (transitivity) rule: $p \subseteq_t q$ and $q \subseteq_t r \vdash p \subseteq_t r$.
28 Inference Rules: with Database Schema. SP (SinglePath) rule: $Q = p//q$, SinglePath(p, q) $\vdash p =_t q$. DC (descendant) rule: $Q = p//q$, $Q = p//r$, SinglePath(p, r) $\vdash q \subseteq_t r$. DE (derived) rule: $Q = p//q$, $Q = p//r$, SinglePath(p, r), SinglePath(q, r) $\vdash r \subseteq_t q$.
29 Temporal Constraint Graph [Figure: a query (root; Entry includes(2); Features overlaps(1,3); Descr* contains(1,4); Species contains(3,4)) and its temporal constraint graph.] An edge from p to q indicates an inferred $p \subseteq_t q$ relationship.
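A temporal constraint graph can be derived from the schema-free rules alone, as in this Python sketch. The query shape is hypothetical (the slide's figure is not preserved), each query edge p//q contributes one edge for the AD rule, and the TR rule is applied as a transitive closure; `constraint_graph` is an illustrative name.

```python
def constraint_graph(query_edges):
    """Return edges (a, b) meaning a's timestamps are a subset of b's."""
    # AD rule: for a query edge p//q, the descendant q is contained in p.
    edges = {(q, p) for p, q in query_edges}
    # TR rule: keep adding implied edges until nothing changes.
    changed = True
    while changed:
        changed = False
        for a, b in list(edges):
            for c, d in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return edges


# Hypothetical query shape: (ancestor, descendant) pairs.
query = [("root", "Entry"), ("Entry", "Features"),
         ("Features", "Descr"), ("Entry", "Species")]
graph = constraint_graph(query)
print(("Descr", "Entry") in graph, ("Entry", "Descr") in graph)
```

The closure is a simple fixed-point loop, which is adequate for the handful of nodes in a query; the schema-based rules (SP, DC, DE) would add further edges and are not covered here.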
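30 Temporal Constraint Consumption
We check temporal constraint consumption on the temporal constraint graph. Each rule pairs a consuming temporal constraint on p with the temporal constraints on q that it consumes.
Relationship $p \subseteq_t q$:
includes(t) consumes includes(t), and overlaps($t_1$, $t_2$) with $t \in [t_1, t_2]$.
contains($t_1$, $t_2$) consumes includes($t_3$) with $t_3 \in [t_1, t_2]$; contains($t_3$, $t_4$) with $t_3 \geq t_1$, $t_4 \leq t_2$; and overlaps($t_3$, $t_4$) with $t_1 < t_3 < t_2$ or $t_1 < t_4 < t_2$.
Relationship $q \subseteq_t p$:
is_contained($t_1$, $t_2$) consumes is_contained($t_3$, $t_4$) with $t_3 \leq t_1$, $t_4 \geq t_2$.
before(t) consumes before($t_1$) with $t_1 \geq t$.
after(t) consumes after($t_1$) with $t_1 \leq t$.
Relationship $q =_t p$:
meets($t_1$, $t_2$) consumes meets($t_1$, $t_2$).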
31 An Example of Query Optimization. Consuming temporal constraint on p (Descr): contains(1, 4). Consumed temporal constraint on q (Entry): includes(2), given Descr $\subseteq_t$ Entry. [Figure: the query (Entry includes(2), Features overlaps(1,3), Descr* contains(1,4), Species contains(3,4)) with its DC annotations before optimization and the NC/LC annotations that replace them afterwards.]
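The consumption test for the case where p's timestamps are contained in q's can be sketched as a small Python function. It covers only the includes and contains rows, reads intervals as inclusive, and uses the comparison directions that set containment requires; these readings, the tuple encoding of constraints, and the name `consumes` are assumptions made for illustration.

```python
def consumes(consuming, consumed):
    """True if the constraint on p makes the constraint on q redundant.

    Assumes p's timestamp set is a subset of q's. Constraints are tuples
    such as ("includes", 2) or ("contains", 1, 4).
    """
    kind_p, *args_p = consuming
    kind_q, *args_q = consumed
    if kind_p == "includes":
        (t,) = args_p
        if kind_q == "includes":
            return args_q[0] == t
        if kind_q == "overlaps":
            t1, t2 = args_q
            return t1 <= t <= t2
    if kind_p == "contains":
        t1, t2 = args_p
        if kind_q == "includes":
            return t1 <= args_q[0] <= t2
        if kind_q == "contains":
            t3, t4 = args_q
            return t1 <= t3 and t4 <= t2
    return False  # pairs not covered by this sketch are never consumed


# Slide example: contains(1, 4) on Descr consumes includes(2) on Entry.
print(consumes(("contains", 1, 4), ("includes", 2)))
```

When a constraint is consumed, the check on q can be dropped (annotation NC), which is how the example replaces the DC on Entry.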
32 Outline Preliminaries Compact storage Temporal query language Query optimization Experiments Conclusion 32
33 Experiment Setup. Hardware: Intel Core 2 CPU 2.40 GHz processor, 4.00 GB of RAM. Software: Windows 7 (OS); the algorithms were implemented in Java; JDOM engine to parse the XML databases; Wutka DTD parser to parse the XML DTD; Oracle Berkeley DB XML engine for query evaluation.
34 Datasets
Synthetic dataset: generated using the IBM XML generator on the DTD of the XMark benchmark.
Real dataset: the Treebank dataset.
Dataset sizes: Treebank 22.3MB; XMark 14.6MB. (Table columns: Dataset, Size, # of elements, Max. depth, Avg. depth.)
35 Versions. For the XMark and Treebank datasets, we created 50 and 20 consecutive database instances, respectively. Each instance was generated from the previous one by first deleting and then inserting (sub)trees. The inserted and deleted trees account for 10% of the nodes of the database instance.
36 Experiment. Three storage approaches: the naive approach (NA) keeps the timestamp sets uncompacted; the top-down (TD) approach [1] eliminates the timestamps of the child nodes that are identical to those of the parent; our bottom-up (BU) approach eliminates the timestamps from the parent nodes that are repeated on the children.
37 Experiment: Archiving Archiving Time Overhead Compaction ratio Dataset Top-down Bottom-up XMark (shallow&fat) XMark (deep&thin) 2.15% 2.72% 3.09% 2.75% Total number of timestamps (space overhead) 37
38 Experiment: Update Cost. Summary of archiving overhead: both TD and BU can reduce the number of timestamps in the archive. The difference between TD and BU regarding the number of timestamps is not significant. BU always has a much better update cost than TD.
39 Experiment: Query Optimization Temporal Constraint Evaluation Optimization Our optimization can bring significant performance improvement with negligible overhead 39
40 Outline Preliminaries Compact storage Temporal query language Query optimization Experiments Conclusion 40
41 Conclusion We proposed an efficient XML archiving database system that consists of A novel compact and updateable storage scheme A simple yet expressive query language for XML archiving databases Optimization of query evaluation over the compact storage 41
42 Future Work Consider additional temporal constraints Consider unrestricted database schemas that may contain cycles Design efficient optimization algorithms that can work in this broader framework 42
43 Thank you! 43
44 Reference
[1] P. Buneman, S. Khanna, K. Tajima, and W.-C. Tan. Archiving scientific data. In ACM Transactions on Database Systems,
[2] A. P. Chapman, H. Jagadish, and P. Ramanan. Efficient provenance storage. In SIGMOD,
[3] S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. In SIGMOD,
[4] P. T. Jayant and J. R. Haritsa. XGrind: A query-friendly XML compressor. In ICDE,
[5] H. Liefke and D. Suciu. XMill: An efficient compressor for XML data. In SIGMOD,
[6] H. Müller, P. Buneman, and I. Koltsidas. XArch: Archiving scientific and reference data. In SIGMOD,
[7] F. Rizzolo and A. A. Vaisman. Temporal XML: Modeling, indexing, and query processing. The VLDB Journal, 17: , August
[8] F. Wang and C. Zaniolo. Temporal queries in XML document archives and web warehouses. In TIME-ICTL,
[9] F. Wang and C. Zaniolo. Temporal queries and version management in XML-based document archives. Data Knowl. Eng., 65: , May
45 Outline Preliminaries Compact storage Temporal query language Query optimization Experiments Conclusion 45
46 XML database XML database: tree-structured, ID-based. 46
47 Archiving XML database. XML instance: the XML database at a certain time point. Each instance is associated with a timestamp (version number). Archiving database: multiple instances are merged into one database.
48 Updating XML database. Update operations: insertion, deletion. A sequence of update operations can be modeled as a set of deletions followed by a set of insertions. [Figure: two instances of the Swiss-Prot extract illustrating an update.]
49 A Piece of Related Work P. Buneman, et al. Archiving scientific data. TODS, Solution: Merge instances Remove timestamps if same as parent node (Top-down) Weakness: Inefficient updates Inefficient query 49
50 Witness Graph. Definition: given a query Q, a witness graph for Q is a graph $W_Q$ such that (a) the nodes of $W_Q$ correspond to the nodes of Q, and (b) there is an edge from node p to node q in $W_Q$ iff p is a witness node of q in Q. Example: construct the witness graph from the temporal constraint graph after temporal annotation consumption (next page).
Table of Content General Purpose Database Summarization A web service architecture for on-line database summarization Régis Saint-Paul (speaker), Guillaume Raschia, Noureddine Mouaddib LINA - Polytech
More informationEfficient Structure Oriented Storage of XML Documents Using ORDBMS
Efficient Structure Oriented Storage of XML Documents Using ORDBMS Alexander Kuckelberg 1 and Ralph Krieger 2 1 Chair of Railway Studies and Transport Economics, RWTH Aachen Mies-van-der-Rohe-Str. 1, D-52056
More informationEnergy Efficiency in Secure and Dynamic Cloud Storage
Energy Efficiency in Secure and Dynamic Cloud Storage Adilet Kachkeev Ertem Esiner Alptekin Küpçü Öznur Özkasap Koç University Department of Computer Science and Engineering, İstanbul, Turkey {akachkeev,eesiner,akupcu,oozkasap}@ku.edu.tr
More informationIntroduction to XML Applications
EMC White Paper Introduction to XML Applications Umair Nauman Abstract: This document provides an overview of XML Applications. This is not a comprehensive guide to XML Applications and is intended for
More informationUnraveling the Duplicate-Elimination Problem in XML-to-SQL Query Translation
Unraveling the Duplicate-Elimination Problem in XML-to-SQL Query Translation Rajasekar Krishnamurthy University of Wisconsin sekar@cs.wisc.edu Raghav Kaushik Microsoft Corporation skaushi@microsoft.com
More informationWhat is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World
COSC 304 Introduction to Systems Introduction Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca What is a database? A database is a collection of logically related data for
More informationFirewall Design: Consistency, Completeness, Compactness
Firewall Design: Consistency, Completeness, Compactness Alex X. Liu alex@cs.utexas.edu Department of Computer Sciences The University of Texas at Austin Austin, Texas 78712-1188, U.S.A. March, 2004 Co-author:
More informationBinary Image Scanning Algorithm for Cane Segmentation
Binary Image Scanning Algorithm for Cane Segmentation Ricardo D. C. Marin Department of Computer Science University Of Canterbury Canterbury, Christchurch ricardo.castanedamarin@pg.canterbury.ac.nz Tom
More informationProcess Mining by Measuring Process Block Similarity
Process Mining by Measuring Process Block Similarity Joonsoo Bae, James Caverlee 2, Ling Liu 2, Bill Rouse 2, Hua Yan 2 Dept of Industrial & Sys Eng, Chonbuk National Univ, South Korea jsbae@chonbukackr
More informationFast Contextual Preference Scoring of Database Tuples
Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation
More informationSupporting Database Provenance under Schema Evolution
Supporting Database Provenance under Schema Evolution Shi Gao and Carlo Zaniolo University of California, Los Angeles {gaoshi, zaniolo}@cs.ucla.edu Abstract. Database schema upgrades are common in modern
More informationImproving Query Performance Using Materialized XML Views: A Learning-Based Approach
Improving Query Performance Using Materialized XML Views: A Learning-Based Approach Ashish Shah and Rada Chirkova Department of Computer Science North Carolina State University Campus Box 7535, Raleigh
More informationThe Planets Preservation Planning workflow and the planning tool Plato
The Planets Preservation Planning workflow and the planning tool Plato Hannes Kulovits Vienna University of Technology http://www.ifs.tuwien.ac.at/~kulovits Outline Preservation Planning Evaluation of
More informationSupporting Ontology-based Keyword Search over Medical Databases
Supporting Ontology-based Keyword Search over Medical Databases Anastasios Kementsietsidis, Ph.D. Lipyeow Lim, Ph.D. Min Wang, Ph.D. IBM T.J. Watson Research Center, Skyline Drive, Hawthorne, NY, USA.
More informationDatabase Design Patterns. Winter 2006-2007 Lecture 24
Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each
More informationData Management in RFID Applications
Data Management in RFID Applications Dan Lin 1, Hicham G. Elmongui 1,, Elisa Bertino 1, and Beng Chin Ooi 2 1 Department of Computer Science, Purdue University, USA {lindan, elmongui, bertino}@cs.purdue.edu
More informationDependency Free Distributed Database Caching for Web Applications and Web Services
Dependency Free Distributed Database Caching for Web Applications and Web Services Hemant Kumar Mehta School of Computer Science and IT, Devi Ahilya University Indore, India Priyesh Kanungo Patel College
More informationHow To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)
Scalability Results Select the right hardware configuration for your organization to optimize performance Table of Contents Introduction... 1 Scalability... 2 Definition... 2 CPU and Memory Usage... 2
More informationThree Effective Top-Down Clustering Algorithms for Location Database Systems
Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr
More informationA Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.
More informationOn Mining Group Patterns of Mobile Users
On Mining Group Patterns of Mobile Users Yida Wang 1, Ee-Peng Lim 1, and San-Yih Hwang 2 1 Centre for Advanced Information Systems, School of Computer Engineering Nanyang Technological University, Singapore
More informationConstraint Preserving XML Storage in Relations
Constraint Preserving XML Storage in Relations Yi Chen, Susan B. Davidson and Yifeng Zheng Ôغ Ó ÓÑÔÙØ Ö Ò ÁÒ ÓÖÑ Ø ÓÒ Ë Ò ÍÒ Ú Ö ØÝ Ó È ÒÒ ÝÐÚ Ò yicn@saul.cis.upenn.edu susan@cis.upenn.edu yifeng@saul.cis.upenn.edu
More informationAn Efficient and Scalable Management of Ontology
An Efficient and Scalable Management of Ontology Myung-Jae Park 1, Jihyun Lee 1, Chun-Hee Lee 1, Jiexi Lin 1, Olivier Serres 2, and Chin-Wan Chung 1 1 Korea Advanced Institute of Science and Technology,
More informationPersistent Binary Search Trees
Persistent Binary Search Trees Datastructures, UvA. May 30, 2008 0440949, Andreas van Cranenburgh Abstract A persistent binary tree allows access to all previous versions of the tree. This paper presents
More informationMining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
More informationProtein Protein Interactions (PPI) APID (Agile Protein Interaction DataAnalyzer)
APID (Agile Protein Interaction DataAnalyzer) 23 APID (Agile Protein Interaction DataAnalyzer) Integrates and unifies 7 DBs: BIND, DIP, HPRD, IntAct, MINT, BioGRID. Includes 51,873 proteins 241,204 interactions
More informationMerkle Hash Tree based Techniques for Data Integrity of Outsourced Data
Merkle Hash Tree based Techniques for Data Integrity of Outsourced Data ABSTRACT Muhammad Saqib Niaz Dept. of Computer Science Otto von Guericke University Magdeburg, Germany saqib@iti.cs.uni-magdeburg.de
More informationLearning Outcomes. COMP202 Complexity of Algorithms. Binary Search Trees and Other Search Trees
Learning Outcomes COMP202 Complexity of Algorithms Binary Search Trees and Other Search Trees [See relevant sections in chapters 2 and 3 in Goodrich and Tamassia.] At the conclusion of this set of lecture
More informationEfficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
More informationXML Fragment Caching for Small Mobile Internet Devices
XML Fragment Caching for Small Mobile Internet Devices Stefan Böttcher, Adelhard Türling University of Paderborn Fachbereich 17 (Mathematik-Informatik) Fürstenallee 11, D-33102 Paderborn, Germany email
More information