Probabilistic XML: A Data Model for the Web

Size: px
Start display at page:

Download "Probabilistic XML: A Data Model for the Web"

Transcription

1 Probabilistic XML: A Data Model for the Web Pierre Senellart DBWeb Habilitation à diriger les recherches, 13 juin 2012

2 An Uncertain World

3 Outline An Uncertain World Uncertainty is Everywhere Probabilistic Databases Probabilistic XML Applications Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web 3 / 34 Habilitation Pierre Senellart DBWeb

4 Uncertain data Numerous sources of uncertain data: Measurement errors Data integration from contradicting sources Imprecise mappings between heterogeneous schemata Imprecise automatic process (information extraction, natural language processing, graph mining, etc.) Imperfect human judgment Lies, opinions, rumors 4 / 34 Habilitation Pierre Senellart DBWeb

5 Uncertain data Numerous sources of uncertain data: Measurement errors Data integration from contradicting sources Imprecise mappings between heterogeneous schemata Imprecise automatic process (information extraction, natural language processing, graph mining, etc.) Imperfect human judgment Lies, opinions, rumors 4 / 34 Habilitation Pierre Senellart DBWeb

6 Use case: Web information extraction Never-ending Language Learning (NELL, CMU), 5 / 34 Habilitation Pierre Senellart DBWeb

7 Use case: Web information extraction Google Squared (terminated), screenshot from [Fink et al., 2011] 5 / 34 Habilitation Pierre Senellart DBWeb

8 Use case: Web information extraction Subject Predicate Object Confidence Elvis Presley diedondate % Elvis Presley ismarriedto Priscilla Presley 97.29% Elvis Presley influences Carlo Wolff 96.25% YAGO, [Suchanek et al., 2007] 5 / 34 Habilitation Pierre Senellart DBWeb

9 Uncertainty in Web information extraction The information extraction system is imprecise The system has some confidence in the information extracted, which can be: a probability of the information being true (e.g., conditional random fields) an ad-hoc numeric confidence score a rough discrete level of confidence (low, medium, high) What if this uncertain information is not seen as something final, but is used as a source of, e.g., a query answering system? 6 / 34 Habilitation Pierre Senellart DBWeb

10 Outline An Uncertain World Uncertainty is Everywhere Probabilistic Databases Probabilistic XML Applications Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web 7 / 34 Habilitation Pierre Senellart DBWeb

11 Managing uncertainty Objective Not to pretend this imprecision does not exist, and manage it as rigorously as possible throughout a long, automatic and human, potentially complex, process Especially: Represent different forms of uncertainty Probabilities are used to measure uncertainty in the data Query data and retrieve uncertain results Allow adding, deleting, modifying data in an uncertain way 8 / 34 Habilitation Pierre Senellart DBWeb

12 Managing uncertainty Objective Not to pretend this imprecision does not exist, and manage it as rigorously as possible throughout a long, automatic and human, potentially complex, process Especially: Represent different forms of uncertainty Probabilities are used to measure uncertainty in the data Query data and retrieve uncertain results Allow adding, deleting, modifying data in an uncertain way 8 / 34 Habilitation Pierre Senellart DBWeb

13 Why probabilities? Not the only option: fuzzy set theory [Galindo et al., 2005], Dempster-Shafer theory [Zadeh, 1986] Mathematically rich theory, nice semantics with respect to traditional database operations (e.g., joins) Some applications already generate probabilities (e.g., statistical information extraction or natural language probabilities) Naturally arising in case of conflicting information, based on the trust in the sources In other cases, we cheat and pretend that (normalized) confidence scores are probabilities: see this as a first-order approximation 9 / 34 Habilitation Pierre Senellart DBWeb

14 Probabilistic relational DBMSs Numerous studies on the modeling and querying of probabilistic relations: block-independent disjoint (BIDs) databases [Dalvi et al., 2009] tuples either mutually exclusive or independent, depending on key values probabilistic c-tables [Imieliński and Lipski, 1984, Green and Tannen, 2006, Senellart, 2007], tuples annotated with arbitrary probabilistic formulas Two main probabilistic relational DBMS: Trio [Widom, 2005] Various uncertainty operators: unknown value, uncertain tuple, choice between different possible values, with probabilistic annotations MayBMS [Koch, 2009] Uncertain tables can be constructed using a REPAIR-KEY operator, similar to BIDs, and more elaborate correlations can result from querying 10 / 34 Habilitation Pierre Senellart DBWeb

15 Probabilistic relational DBMSs Numerous studies on the modeling and querying of probabilistic relations: block-independent disjoint (BIDs) databases [Dalvi et al., 2009] tuples either mutually exclusive or independent, depending on key values probabilistic c-tables [Imieliński and Lipski, 1984, Green and Tannen, 2006, Senellart, 2007], tuples annotated with arbitrary probabilistic formulas Two main probabilistic relational DBMS: Trio [Widom, 2005] Various uncertainty operators: unknown value, uncertain tuple, choice between different possible values, with probabilistic annotations MayBMS [Koch, 2009] Uncertain tables can be constructed using a REPAIR-KEY operator, similar to BIDs, and more elaborate correlations can result from querying 10 / 34 Habilitation Pierre Senellart DBWeb

16 Probabilistic relational DBMSs Numerous studies on the modeling and querying of probabilistic relations: block-independent disjoint (BIDs) databases [Dalvi et al., 2009] tuples either mutually exclusive or independent, depending on key values probabilistic c-tables [Imieliński and Lipski, 1984, Green and Tannen, 2006, Senellart, 2007], tuples annotated with arbitrary probabilistic formulas Two main probabilistic relational DBMS: Trio [Widom, 2005] Various uncertainty operators: unknown value, uncertain tuple, choice between different possible values, with probabilistic annotations MayBMS [Koch, 2009] Uncertain tables can be constructed using a REPAIR-KEY operator, similar to BIDs, and more elaborate correlations can result from querying 10 / 34 Habilitation Pierre Senellart DBWeb

17 Outline An Uncertain World Uncertainty is Everywhere Probabilistic Databases Probabilistic XML Applications Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web 11 / 34 Habilitation Pierre Senellart DBWeb

18 Why probabilistic XML? Different typical querying languages: SQL and conjunctive queries vs XPath and tree-pattern queries (possibly with joins) Cases where a tree-like model might be appropriate: No schema or few constraints on the schema Independent modules annotating freely a content warehouse Inherently tree-like data (e.g., mailing lists, parse trees) with naturally occurring queries involving the descendant axis Remark Some results can be transferred from one model to the other. In other cases, connection much trickier! 12 / 34 Habilitation Pierre Senellart DBWeb

19 Web information extraction [Senellart et al., 2008] table / articles tr / article td / title td / authors token / title token / author #text #text Y 1 Y 2 Y 3 Y 4 Y 5 Y 8 Y 6 Y 7 Annotate HTML Web pages with possible labels Labels can be learned from a corpus of annotated documents Conditional random fields for XML: estimate probabilities of annotations given annotations of neighboring nodes Provides probabilistic labeling of Web pages 13 / 34 Habilitation Pierre Senellart DBWeb joint work with

20 Uncertain version control [Abdessalem et al., 2011, Ba et al., 2011] Use trees with probabilistic annotations to represent the uncertainty in the correctness of a document under open version control (e.g., Wikipedia articles) 14 / 34 Habilitation Pierre Senellart DBWeb joint work with

21 Probabilistic summaries of XML corpora [Abiteboul et al., 2012a,b] publish ppublish present ppresent q0 next 1- ppublish q1 $ 1- ppresent q2 Transform an XML schema (deterministic top-down tree automaton) into a probabilistic generator (probabilistic tree automaton) of XML documents Probability distribution optimal with respect to a given corpus Application: Optimal auto-completions in an XML editor 15 / 34 Habilitation Pierre Senellart joint work with DBWeb

22 Probabilistic XML: Models and Complexity

23 Outline An Uncertain World Probabilistic XML: Models and Complexity Modeling Querying Updating Reconciling Web Data Models with the Actual Web 17 / 34 Habilitation Pierre Senellart DBWeb

24 A general discrete Probabilistic XML Model [Abiteboul and Senellart, 2006, Senellart, 2007, Abiteboul et al., 2009] A A w 1, w 2 B semantics C A C A w 2 D C D B C Event Prob. p 1 = 0.06 p 2 = 0.70 p 3 = 0.24 w w Expresses arbitrarily complex dependencies Analogous to probabilistic c-tables 18 / 34 Habilitation Pierre Senellart DBWebjoint work with

25 A general discrete Probabilistic XML Model [Abiteboul and Senellart, 2006, Senellart, 2007, Abiteboul et al., 2009] A A w 1, w 2 B semantics C A C A w 2 D C D B C Event Prob. p 1 = 0.06 p 2 = 0.70 p 3 = 0.24 w w Expresses arbitrarily complex dependencies Analogous to probabilistic c-tables 18 / 34 Habilitation Pierre Senellart DBWebjoint work with

26 Continuous distributions [Abiteboul et al., 2011] root sensor e mes id i25 sensor mes id t vl t vl N (70, 4) i35 e: event it did not rain at time 1 e mes t vl 1 mux mux: mutually exclusive options N (70, 4): normal distribution.3 20 Two kinds of dependencies in discrete probabilistic XML: global (e) and local (mux) Addcontinuous leaves; non-obvious semantics issues 19 / 34 Habilitation Pierre Senellart joint work with DBWeb

27 Recursive Markov chains [Benedikt et al., 2010] <!ELEMENT directory (person*)> <!ELEMENT person (name,phone*)> D: directory 1 P P: person 1 1 N T 0.5 N : name T : phone 1 1 Probabilistic model that subsumes local discrete models Allows generating documents of unbounded width or depth 20 / 34 Habilitation Pierre Senellart DBWebjoint work with

28 Outline An Uncertain World Probabilistic XML: Models and Complexity Modeling Querying Updating Reconciling Web Data Models with the Actual Web 21 / 34 Habilitation Pierre Senellart DBWeb

29 Querying probabilistic XML Semantics of a (Boolean) query = probability the query is true: 1. Generate all possible worlds of a given probabilistic document 2. In each world, evaluate the query 3. Add up the probabilities of the worlds that make the query true EXPTIME algorithm! Can we do better, i.e., can we apply directly the algorithm on the probabilistic document? We shall talk about data complexity of query answering. 22 / 34 Habilitation Pierre Senellart DBWeb

30 Querying probabilistic XML Semantics of a (Boolean) query = probability the query is true: 1. Generate all possible worlds of a given probabilistic document (possibly exponentially many) 2. In each world, evaluate the query 3. Add up the probabilities of the worlds that make the query true EXPTIME algorithm! Can we do better, i.e., can we apply directly the algorithm on the probabilistic document? We shall talk about data complexity of query answering. 22 / 34 Habilitation Pierre Senellart DBWeb

31 Previously known results [Kimelfeld et al., 2009, Cohen et al., 2009] Tree-pattern queries are evaluable in linear time over local dependencies with a bottom-up, dynamic programming algorithm Indeed, any monadic second-order query is tractable over local dependencies Even trivial queries are #P-hard over global dependencies Monte-Carlo sampling works Multiplicative approximation is also tractable (existence of a FPRAS) 23 / 34 Habilitation Pierre Senellart DBWeb

32 Aggregation queries [Abiteboul et al., 2010] Aggregate Queries: sum, count, avg, countd, min, max, etc. Distributions? Possible values? Expected value? Computing expected values of sum and count tractable with global dependencies; everything else intractable Computing expected values of every of these aggregate functions tractable with local dependencies Computing distributions and possible values tractable for count, min, max, intractable for the others Continuous distributions do not add any more complexity! 24 / 34 Habilitation Pierre Senellart joint work with DBWeb

33 Queries with joins are hard [Kharlamov et al., 2011] Tree-pattern queries (TP) /A[C/D]//B C A B D Tree-pattern queries with joins (TPJ) for $x in $doc/a/c/d A return $doc/a//b[.=$x] C B D Over local dependencies, dichotomy for queries with a single join: If equivalent to a join-free query, linear-time Otherwise, #P-hard 25 / 34 Habilitation Pierre Senellart DBWeb joint work with

34 Queries with joins are hard [Kharlamov et al., 2011] Tree-pattern queries (TP) /A[C/D]//B C A B D Tree-pattern queries with joins (TPJ) for $x in $doc/a/c/d A return $doc/a//b[.=$x] C B D Over local dependencies, dichotomy for queries with a single join: If equivalent to a join-free query, linear-time Otherwise, #P-hard 25 / 34 Habilitation Pierre Senellart DBWeb joint work with

35 Rely on approximations... when needed [Senellart and Souihli, 2011, Souihli, 2011, Souihli and Senellart, 2012] Monte-Carlo is very good at approximating high probabilities Sometimes the structure of a query makes the probability of a query easy to evaluate For small formulas, naïve evaluation techniques good enough Refined approximation methods best when everything else fails Use a cost model to decide which evaluation algorithm to use! 1,000 Average time (ms) join 26 / 34 Habilitation Pierre Senellart DBWeb N / A joint work with Naïve Sieve Monte-Carlo Coverage EvalDP Best

36 Rely on approximations... when needed [Senellart and Souihli, 2011, Souihli, 2011, Souihli and Senellart, 2012] Monte-Carlo is very good at approximating high probabilities Sometimes the structure of a query makes the probability of a query easy to evaluate For small formulas, naïve evaluation techniques good enough Refined approximation methods best when everything else fails Use a cost model to decide which evaluation algorithm to use! 1,000 Average time (ms) join 26 / 34 Habilitation Pierre Senellart DBWeb N / A joint work with Naïve Sieve Monte-Carlo Coverage EvalDP Best

37 Outline An Uncertain World Probabilistic XML: Models and Complexity Modeling Querying Updating Reconciling Web Data Models with the Actual Web 27 / 34 Habilitation Pierre Senellart DBWeb

38 The complexity of updates [Senellart and Abiteboul, 2007, Kharlamov et al., 2010] Updates defined by a query (cf. XUpdate, XQuery Update). Semantics: for all matches of a query, insert or delete a node in the tree at a place located by the query. Results Most updates are intractable with local dependencies: the result of an update can require an exponentially larger representation size Insertions with a for-each-match semantics are tractable with arbitrary dependencies; deletions are intractable Some insert-if-there-is-a-match operations tractable for local dependencies but not for arbitrary dependencies 28 / 34 Habilitation Pierre Senellart DBWebjoint work with

39 Reconciling Web Data Models with the Actual Web

40 Outline An Uncertain World Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web Applying Probabilistic XML to Web data Formal Models for Web Content Acquisition 30 / 34 Habilitation Pierre Senellart DBWeb

41 Applying Probabilistic XML to Web data Current mismatch (in my research, but also in general) between: Applications that produce and have to deal with uncertainty: Information extraction Graph mining Natural language parsing Data integration etc. Probabilistic database management systems Analogy to data management in the 1960 s: Each data management application (accounting, cataloging, etc.) was developing its own way of managing data Each application dealing with imprecise data is developing its own way of managing imprecision Probabilistic DBMSs are becoming more and more mature time to put them to use for real applications? 31 / 34 Habilitation Pierre Senellart DBWeb

42 Applying Probabilistic XML to Web data Current mismatch (in my research, but also in general) between: Applications that produce and have to deal with uncertainty: Information extraction Graph mining Natural language parsing Data integration etc. Probabilistic database management systems Analogy to data management in the 1960 s: Each data management application (accounting, cataloging, etc.) was developing its own way of managing data Each application dealing with imprecise data is developing its own way of managing imprecision Probabilistic DBMSs are becoming more and more mature time to put them to use for real applications? 31 / 34 Habilitation Pierre Senellart DBWeb

43 Outline An Uncertain World Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web Applying Probabilistic XML to Web data Formal Models for Web Content Acquisition 32 / 34 Habilitation Pierre Senellart DBWeb

44 Formal Models for Web Content Acquisition Current mismatch between: Actual Web mining and Web data management applications: Web graph mining wrapper induction Web crawling Research on the foundations of Web data management Plan: bridge the two Use formal models for data exchange [Senellart and Gottlob, 2008, Gottlob and Senellart, 2010] to perform wrapper induction or ontology matching Use static analysis techniques for JavaScript to perform deep Web data extraction [Benedikt et al., 2012b] Use results of formal studies of the containment of recursive query languages [Benedikt et al., 2011, 2012a] to optimize query answering over the deep Web 33 / 34 Habilitation Pierre Senellart DBWeb

45 Formal Models for Web Content Acquisition Current mismatch between: Actual Web mining and Web data management applications: Web graph mining wrapper induction Web crawling Research on the foundations of Web data management Plan: bridge the two Use formal models for data exchange [Senellart and Gottlob, 2008, Gottlob and Senellart, 2010] to perform wrapper induction or ontology matching Use static analysis techniques for JavaScript to perform deep Web data extraction [Benedikt et al., 2012b] Use results of formal studies of the containment of recursive query languages [Benedikt et al., 2011, 2012a] to optimize query answering over the deep Web 33 / 34 Habilitation Pierre Senellart DBWeb

46 Questions?

47 Talel Abdessalem, M. Lamine Ba, and Pierre Senellart. A probabilistic XML merging tool. In Proc. EDBT, pages , Uppsala, Sweden, March Demonstration. Serge Abiteboul and Pierre Senellart. Querying and updating probabilistic information in XML. In Proc. EDBT, pages , Munich, Germany, March Serge Abiteboul, Benny Kimelfeld, Yehoshua Sagiv, and Pierre Senellart. On the expressiveness of probabilistic XML models. VLDB Journal, 18(5): , October Serge Abiteboul, T-H. Hubert Chan, Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Aggregate queries for discrete and continuous probabilistic XML. In Proc. ICDT, pages 50 61, Lausanne, Switzerland, March 2010.

48 Serge Abiteboul, T-H. Hubert Chan, Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Capturing continuous data and answering aggregate queries in probabilistic XML. ACM Transactions on Database Systems, 36(4), Serge Abiteboul, Yael Amsterdamer, Daniel Deutch, Tova Milo, and Pierre Senellart. Finding optimal probabilistic generators for XML collections. In Proc. ICDT, Berlin, Germany, March 2012a. Serge Abiteboul, Yael Amsterdamer, Tova Milo, and Pierre Senellart. Auto-completion learning for XML. In Proc. SIGMOD, pages , Scottsdale, USA, May 2012b. Demonstration. M. Lamine Ba, Talel Abdessalem, and Pierre Senellart. Towards a version control model with uncertain data. In Proc. PIKM, Glasgow, United Kingdom, October 2011.

49 Michael Benedikt, Evgeny Kharlamov, Dan Olteanu, and Pierre Senellart. Probabilistic XML via Markov chains. Proceedings of the VLDB Endowment, 3(1): , September Presented at the VLDB 2010 conference, Singapore. Michael Benedikt, Georg Gottlob, and Pierre Senellart. Determining relevance of accesses at runtime. In Proc. PODS, pages , Athens, Greece, June Michael Benedikt, Pierre Bourhis, and Pierre Senellart. Monadic datalog containment. In Proc. ICALP, pages 79 91, Warwick, United Kingdom, July 2012a. Michael Benedikt, Tim Furche, Andreas Savvides, and Pierre Senellart. ProFoUnd: Program-analysis based form understanding. In Proc. WWW, pages , Lyon, France, April 2012b. Demonstration. Sara Cohen, Benny Kimelfeld, and Yehoshua Sagiv. Running tree automata on probabilistic XML. In PODS, 2009.

50 Nilesh Dalvi, Chrisopher Ré, and Dan Suciu. Probabilistic databases: Diamonds in the dirt. Communications of the ACM, 52(7), Robert Fink, Andrew Hogue, Dan Olteanu, and Swaroop Rath. SPROUT 2 : a squared query engine for uncertain web data. In SIGMOD, José Galindo, Angelica Urrutia, and Mario Piattini. Fuzzy Databases: Modeling, Design And Implementation. IGI Global, Georg Gottlob and Pierre Senellart. Schema mapping discovery from data instances. Journal of the ACM, 57(2), January Todd J. Green and Val Tannen. Models for incomplete and probabilistic information. In Proc. EDBT Workshops, IIDB, Munich, Germany, March Tomasz Imieliński and Witold Lipski. Incomplete information in relational databases. Journal of the ACM, 31(4): , 1984.

51 Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Updating probabilistic XML. In Proc. Updates in XML, Lausanne, Switzerland, March Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Value joins are expensive over (probabilistic) XML. In Proc. LID, pages 41 48, Uppsala, Sweden, March Benny Kimelfeld, Yuri Kosharovsky, and Yehoshua Sagiv. Query evaluation over probabilistic XML. VLDB Journal, 18(5), Christoph Koch. MayBMS: A system for managing large uncertain and probabilistic databases. In Charu Aggarwal, editor, Managing and Mining Uncertain Data. Springer, Pierre Senellart. Comprendre le Web caché. Understanding the Hidden Web. PhD thesis, Université Paris XI, Orsay, France, December 2007.

52 Pierre Senellart and Serge Abiteboul. On the complexity of managing probabilistic XML data. In Proc. PODS, pages , Beijing, China, June Pierre Senellart and Georg Gottlob. On the complexity of deriving schema mappings from database instances. In Proc. PODS, pages 23 32, Vancouver, Canada, June Pierre Senellart and Asma Souihli. ProApproX: A lightweight approximation query processor over probabilistic trees. In Proc. SIGMOD, pages , Athens, Greece, June Demonstration. Pierre Senellart, Avin Mittal, Daniel Muschick, Rémi Gilleron, and Marc Tommasi. Automatic wrapper induction from hidden-web sources with domain knowledge. In Proc. WIDM, pages 9 16, Napa, USA, October 2008.

53 Asma Souihli. Efficient query evaluation over probabilistic XML with long-distance dependencies. In EDBT/ICDT PhD Workshop, Asma Souihli and Pierre Senellart. Optimizing approximations of dnf query lineage in probabilistic xml, April Preprint. Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. YAGO: A core of semantic knowledge. Unifying WordNet and Wikipedia. In WWW, pages , ISBN Jennifer Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, Lotfi A. Zadeh. A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination. AI Magazine, 7(2), 1986.

Probabilistic XML: A Data Model for the Web

Probabilistic XML: A Data Model for the Web Probabilistic XML: A Data Model for the Web Pierre Senellart To cite this version: Pierre Senellart. Probabilistic XML: A Data Model for the Web. Databases. Université Pierre et Marie Curie - Paris VI,

More information

M. Lamine BA Post-doc fellow

M. Lamine BA Post-doc fellow M. Lamine BA Post-doc fellow 212 rue de Tolbiac 75013 Paris, France H (+33) 6-66-80-75-60 B mouhamadou.ba@telecom-paristech.fr Í http://perso.telecom-paristech.fr/ ba Personal data January 4, 1984 Dakar,

More information

Query Selectivity Estimation for Uncertain Data

Query Selectivity Estimation for Uncertain Data Query Selectivity Estimation for Uncertain Data Sarvjeet Singh *, Chris Mayfield *, Rahul Shah #, Sunil Prabhakar * and Susanne Hambrusch * * Department of Computer Science, Purdue University # Department

More information

Web Database Integration

Web Database Integration Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,

More information

XML Data Integration

XML Data Integration XML Data Integration Lucja Kot Cornell University 11 November 2010 Lucja Kot (Cornell University) XML Data Integration 11 November 2010 1 / 42 Introduction Data Integration and Query Answering A data integration

More information

Querying Past and Future in Web Applications

Querying Past and Future in Web Applications Querying Past and Future in Web Applications Daniel Deutch Tel-Aviv University Tova Milo Customer HR System Logistics ERP Bank ecomm CRM Supplier Outline Introduction & Motivation Querying Future [VLDB

More information

Mining the Web of Linked Data with RapidMiner

Mining the Web of Linked Data with RapidMiner Mining the Web of Linked Data with RapidMiner Petar Ristoski, Christian Bizer, and Heiko Paulheim University of Mannheim, Germany Data and Web Science Group {petar.ristoski,heiko,chris}@informatik.uni-mannheim.de

More information

DATA MINING IN FINANCE

DATA MINING IN FINANCE DATA MINING IN FINANCE Advances in Relational and Hybrid Methods by BORIS KOVALERCHUK Central Washington University, USA and EVGENII VITYAEV Institute of Mathematics Russian Academy of Sciences, Russia

More information

Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)

Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Extended Abstract Ioanna Koffina 1, Giorgos Serfiotis 1, Vassilis Christophides 1, Val Tannen

More information

XML with Incomplete Information

XML with Incomplete Information XML with Incomplete Information Pablo Barceló Leonid Libkin Antonella Poggi Cristina Sirangelo Abstract We study models of incomplete information for XML, their computational properties, and query answering.

More information

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Irina Astrova 1, Bela Stantic 2 1 Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn,

More information

Available online at www.sciencedirect.com Available online at www.sciencedirect.com. Procedia Engineering 29 (2012) 1119 1125

Available online at www.sciencedirect.com Available online at www.sciencedirect.com. Procedia Engineering 29 (2012) 1119 1125 Available online at www.sciencedirect.com Available online at www.sciencedirect.com Procedia Engineering 00 (2011) 000 000 Procedia Engineering 29 (2012) 1119 1125 Procedia Engineering www.elsevier.com/locate/procedia

More information

Data Posting: a New Frontier for Data Exchange in the Big Data Era

Data Posting: a New Frontier for Data Exchange in the Big Data Era Data Posting: a New Frontier for Data Exchange in the Big Data Era Domenico Saccà and Edoardo Serra DIMES, Università della Calabria, 87036 Rende, Italy sacca@unical.it, eserra@deis.unical.it 1 Preliminaries

More information

Integrating Pattern Mining in Relational Databases

Integrating Pattern Mining in Relational Databases Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Navigational Plans For Data Integration

Navigational Plans For Data Integration Navigational Plans For Data Integration Marc Friedman University of Washington friedman@cs.washington.edu Alon Levy University of Washington alon@cs.washington.edu Todd Millstein University of Washington

More information

INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS

INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS Tadeusz Pankowski 1,2 1 Institute of Control and Information Engineering Poznan University of Technology Pl. M.S.-Curie 5, 60-965 Poznan

More information

Automatic Wrapper Induction from Hidden-Web Sources with Domain Knowledge

Automatic Wrapper Induction from Hidden-Web Sources with Domain Knowledge Automatic Wrapper Induction from Hidden-Web Sources with Domain Knowledge Pierre Senellart INRIA Saclay & TELECOM ParisTech Paris, France pierre@senellart.com Avin Mittal Indian Institute of Technology

More information

Veracity in Big Data Reliability of Routes

Veracity in Big Data Reliability of Routes Veracity in Big Data Reliability of Routes Dr. Tobias Emrich Post-Doctoral Scholar Integrated Media Systems Center (IMSC) Viterbi School of Engineering University of Southern California Los Angeles, CA

More information

XML data integration in SixP2P a theoretical framework

XML data integration in SixP2P a theoretical framework XML data integration in SixP2P a theoretical framework Tadeusz Pankowski Institute of Control and Information Engineering Poznań University of Technology Poland Faculty of Mathematics and Computer Science

More information

Data Quality in Information Integration and Business Intelligence

Data Quality in Information Integration and Business Intelligence Data Quality in Information Integration and Business Intelligence Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies

More information

Efficient Storage and Temporal Query Evaluation of Hierarchical Data Archiving Systems

Efficient Storage and Temporal Query Evaluation of Hierarchical Data Archiving Systems Efficient Storage and Temporal Query Evaluation of Hierarchical Data Archiving Systems Hui (Wendy) Wang, Ruilin Liu Stevens Institute of Technology, New Jersey, USA Dimitri Theodoratos, Xiaoying Wu New

More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

Automatic Annotation Wrapper Generation and Mining Web Database Search Result

Automatic Annotation Wrapper Generation and Mining Web Database Search Result Automatic Annotation Wrapper Generation and Mining Web Database Search Result V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil nadu, India

More information

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval

More information

Unraveling the Duplicate-Elimination Problem in XML-to-SQL Query Translation

Unraveling the Duplicate-Elimination Problem in XML-to-SQL Query Translation Unraveling the Duplicate-Elimination Problem in XML-to-SQL Query Translation Rajasekar Krishnamurthy University of Wisconsin sekar@cs.wisc.edu Raghav Kaushik Microsoft Corporation skaushi@microsoft.com

More information

Analyzing Triggers in XML Data Integration Systems

Analyzing Triggers in XML Data Integration Systems Analyzing Triggers in XML Data Integration Systems Jing Lu, Dunlu Peng, Huan Huo, Liping Gao, Xiaodong Zhu Analyzing Triggers in XML Data Integration Systems Jing Lu 1, Dunlu Peng 1, Huan Huo 1, Liping

More information

Efficient Integration of Data Mining Techniques in Database Management Systems

Efficient Integration of Data Mining Techniques in Database Management Systems Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

More information

REST vs. SOAP: Making the Right Architectural Decision

REST vs. SOAP: Making the Right Architectural Decision REST vs. SOAP: Making the Right Architectural Decision Cesare Pautasso Faculty of Informatics University of Lugano (USI), Switzerland http://www.pautasso.info 1 Agenda 1. Motivation: A short history of

More information

KEYWORD SEARCH IN RELATIONAL DATABASES

KEYWORD SEARCH IN RELATIONAL DATABASES KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to

More information

Big Data Provenance: Challenges and Implications for Benchmarking

Big Data Provenance: Challenges and Implications for Benchmarking Big Data Provenance: Challenges and Implications for Benchmarking Boris Glavic Illinois Institute of Technology 10 W 31st Street, Chicago, IL 60615, USA glavic@iit.edu Abstract. Data Provenance is information

More information

A Design and implementation of a data warehouse for research administration universities

A Design and implementation of a data warehouse for research administration universities A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon

More information

Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1

Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1 Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1 Hassina Bounif, Stefano Spaccapietra, Rachel Pottinger Database Laboratory, EPFL, School of Computer and Communication

More information

Querying Combined Cloud-Based and Relational Databases

Querying Combined Cloud-Based and Relational Databases Querying Combined Cloud-Based and Relational Databases Minpeng Zhu and Tore Risch Department of Information Technology, Uppsala University, Sweden Minpeng.Zhu@it.uu.se Tore.Risch@it.uu.se Abstract An increasing

More information

Caching XML Data on Mobile Web Clients

Caching XML Data on Mobile Web Clients Caching XML Data on Mobile Web Clients Stefan Böttcher, Adelhard Türling University of Paderborn, Faculty 5 (Computer Science, Electrical Engineering & Mathematics) Fürstenallee 11, D-33102 Paderborn,

More information

Creating Probabilistic Databases from Duplicated Data

Creating Probabilistic Databases from Duplicated Data VLDB Journal manuscript No. (will be inserted by the editor) Creating Probabilistic Databases from Duplicated Data Oktie Hassanzadeh Renée J. Miller Received: 14 September 2008 / Revised: 1 April 2009

More information

Issues in Monitoring Web Data

Issues in Monitoring Web Data Issues in Monitoring Web Data Serge Abiteboul INRIA and Xyleme Abstract. The web is turning from a collection of static documents to a global source of dynamic knowledge. First, HTMLis increasingly complemented

More information

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange Data Integration and Exchange L. Libkin 1 Data Integration and Exchange Traditional approach to databases A single large repository of data. Database administrator in charge of access to data. Users interact

More information

Big Data Begets Big Database Theory

Big Data Begets Big Database Theory Big Data Begets Big Database Theory Dan Suciu University of Washington 1 Motivation Industry analysts describe Big Data in terms of three V s: volume, velocity, variety. The data is too big to process

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

Technologies for a CERIF XML based CRIS

Technologies for a CERIF XML based CRIS Technologies for a CERIF XML based CRIS Stefan Bärisch GESIS-IZ, Bonn, Germany Abstract The use of XML as a primary storage format as opposed to data exchange raises a number of questions regarding the

More information

CURRICULUM VITAE JORGE PÉREZ

CURRICULUM VITAE JORGE PÉREZ EDUCATION CURRICULUM VITAE JORGE PÉREZ Ph.D. Student, Department of Computer Science Pontificia Universidad Católica de Chile Email: jperez@ing.puc.cl, Http: www.ing.puc.cl/~jperez 2009 Ph.D. Student in

More information

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner 24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

Search Engines. Stephen Shaw 18th of February, 2014. Netsoc

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

More information

Data Integration with Uncertainty

Data Integration with Uncertainty Data Integration with Uncertainty Xin Dong University of Washington Seattle, WA 98195 lunadong@cs.washington.edu Alon Y. Halevy Google Inc. Mountain View, CA 94043 halevy@google.com Cong Yu University

More information

Offshore Holdings Analytics Using Datalog + RuleML Rules

Offshore Holdings Analytics Using Datalog + RuleML Rules Offshore Holdings Analytics Using Datalog + RuleML Rules Mohammad Sadnan Al Manir and Christopher J.O. Baker Department of Computer Science and Applied Statistics University of New Brunswick, Saint John,

More information

A Hybrid Approach for Ontology Integration

A Hybrid Approach for Ontology Integration A Hybrid Approach for Ontology Integration Ahmed Alasoud Volker Haarslev Nematollaah Shiri Concordia University Concordia University Concordia University 1455 De Maisonneuve Blvd. West 1455 De Maisonneuve

More information

Some Research Challenges for Big Data Analytics of Intelligent Security

Some Research Challenges for Big Data Analytics of Intelligent Security Some Research Challenges for Big Data Analytics of Intelligent Security Yuh-Jong Hu hu at cs.nccu.edu.tw Emerging Network Technology (ENT) Lab. Department of Computer Science National Chengchi University,

More information

Fuzzy Duplicate Detection on XML Data

Fuzzy Duplicate Detection on XML Data Fuzzy Duplicate Detection on XML Data Melanie Weis Humboldt-Universität zu Berlin Unter den Linden 6, Berlin, Germany mweis@informatik.hu-berlin.de Abstract XML is popular for data exchange and data publishing

More information

Managing large sound databases using Mpeg7

Managing large sound databases using Mpeg7 Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

Report on the Dagstuhl Seminar Data Quality on the Web

Report on the Dagstuhl Seminar Data Quality on the Web Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,

More information

Certain Answers as Objects and Knowledge

Certain Answers as Objects and Knowledge Certain Answers as Objects and Knowledge Leonid Libkin School of Informatics, University of Edinburgh Abstract The standard way of answering queries over incomplete databases is to compute certain answers,

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Duplicate Detection Algorithm In Hierarchical Data Using Efficient And Effective Network Pruning Algorithm: Survey

Duplicate Detection Algorithm In Hierarchical Data Using Efficient And Effective Network Pruning Algorithm: Survey www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 12 December 2014, Page No. 9766-9773 Duplicate Detection Algorithm In Hierarchical Data Using Efficient

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Efficient Query Evaluation on Probabilistic Databases

Efficient Query Evaluation on Probabilistic Databases The VLDB Journal manuscript No. (will be inserted by the editor) Nilesh Dalvi Dan Suciu Efficient Query Evaluation on Probabilistic Databases the date of receipt and acceptance should be inserted later

More information

Constraint-based Query Distribution Framework for an Integrated Global Schema

Constraint-based Query Distribution Framework for an Integrated Global Schema Constraint-based Query Distribution Framework for an Integrated Global Schema Ahmad Kamran Malik 1, Muhammad Abdul Qadir 1, Nadeem Iftikhar 2, and Muhammad Usman 3 1 Muhammad Ali Jinnah University, Islamabad,

More information

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

A Note on Maximum Independent Sets in Rectangle Intersection Graphs A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,

More information

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them Vangelis Karkaletsis and Constantine D. Spyropoulos NCSR Demokritos, Institute of Informatics & Telecommunications,

More information

Analysis of Uncertain Data: Tools for Representation and Processing

Analysis of Uncertain Data: Tools for Representation and Processing Analysis of Uncertain Data: Tools for Representation and Processing Bin Fu binf@cs.cmu.edu Eugene Fink e.fink@cs.cmu.edu Jaime G. Carbonell jgc@cs.cmu.edu Abstract We present initial work on a general-purpose

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES

More information

Creating Synthetic Temporal Document Collections for Web Archive Benchmarking

Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Kjetil Nørvåg and Albert Overskeid Nybø Norwegian University of Science and Technology 7491 Trondheim, Norway Abstract. In

More information

Enhancing Traditional Databases to Support Broader Data Management Applications. Yi Chen Computer Science & Engineering Arizona State University

Enhancing Traditional Databases to Support Broader Data Management Applications. Yi Chen Computer Science & Engineering Arizona State University Enhancing Traditional Databases to Support Broader Data Management Applications Yi Chen Computer Science & Engineering Arizona State University What Is a Database System? Of course, there are traditional

More information

Databases 2011 The Relational Model and SQL

Databases 2011 The Relational Model and SQL Databases 2011 Christian S. Jensen Computer Science, Aarhus University What is a Database? Main Entry: da ta base Pronunciation: \ˈdā-tə-ˌbās, ˈda- also ˈdä-\ Function: noun Date: circa 1962 : a usually

More information

Smarter Planet evolution

Smarter Planet evolution Smarter Planet evolution 13/03/2012 2012 IBM Corporation Ignacio Pérez González Enterprise Architect ignacio.perez@es.ibm.com @ignaciopr Mike May Technologies of the Change Capabilities Tendencies Vision

More information

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS. PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

More information

Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics

Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada Joint work with: Andrei Lopatenko (Google,

More information

Logical Foundations of Relational Data Exchange

Logical Foundations of Relational Data Exchange Logical Foundations of Relational Data Exchange Pablo Barceló Department of Computer Science, University of Chile pbarcelo@dcc.uchile.cl 1 Introduction Data exchange has been defined as the problem of

More information

Data Integration for XML based on Semantic Knowledge

Data Integration for XML based on Semantic Knowledge Data Integration for XML based on Semantic Knowledge Kamsuriah Ahmad a, Ali Mamat b, Hamidah Ibrahim c and Shahrul Azman Mohd Noah d a,d Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia,

More information

Web Content Mining and NLP. Bing Liu Department of Computer Science University of Illinois at Chicago liub@cs.uic.edu http://www.cs.uic.

Web Content Mining and NLP. Bing Liu Department of Computer Science University of Illinois at Chicago liub@cs.uic.edu http://www.cs.uic. Web Content Mining and NLP Bing Liu Department of Computer Science University of Illinois at Chicago liub@cs.uic.edu http://www.cs.uic.edu/~liub Introduction The Web is perhaps the single largest and distributed

More information

Structured and Semi-Structured Data Integration

Structured and Semi-Structured Data Integration UNIVERSITÀ DEGLI STUDI DI ROMA LA SAPIENZA DOTTORATO DI RICERCA IN INGEGNERIA INFORMATICA XIX CICLO 2006 UNIVERSITÉ DE PARIS SUD DOCTORAT DE RECHERCHE EN INFORMATIQUE Structured and Semi-Structured Data

More information

An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams

An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams Susan D. Urban 1, Suzanne W. Dietrich 1, 2, and Yi Chen 1 Arizona

More information

Uncertain Data Management for Sensor Networks

Uncertain Data Management for Sensor Networks Uncertain Data Management for Sensor Networks Amol Deshpande, University of Maryland (joint work w/ Bhargav Kanagal, Prithviraj Sen, Lise Getoor, Sam Madden) Motivation: Sensor Networks Unprecedented,

More information

Interactive Learning of HTML Wrappers Using Attribute Classification

Interactive Learning of HTML Wrappers Using Attribute Classification Interactive Learning of HTML Wrappers Using Attribute Classification Interactive Learning of HTML Wrappers Using Attribute Classification Michal Ceresna DBAI, TUMichal Wien, Ceresna Vienna, Austria ceresna@dbai.tuwien.ac.at

More information

A Paradigm for Learning Queries on Big Data

A Paradigm for Learning Queries on Big Data A Paradigm for Learning Queries on Big Data Angela Bonifati Radu Ciucanu Aurélien Lemay Sławek Staworko University of Lille & INRIA, France {angela.bonifati, radu.ciucanu, aurelien.lemay, slawomir.staworko}@inria.fr

More information

Archivage du contenu éphémère du Web à l aide des flux Web *

Archivage du contenu éphémère du Web à l aide des flux Web * Archivage du contenu éphémère du Web à l aide des flux Web * Marilena Oita Pierre Senellart Résumé Cette proposition de démonstration concerne une application d archivage du contenu du Web à l aide des

More information

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential

More information

Logical and categorical methods in data transformation (TransLoCaTe)

Logical and categorical methods in data transformation (TransLoCaTe) Logical and categorical methods in data transformation (TransLoCaTe) 1 Introduction to the abbreviated project description This is an abbreviated project description of the TransLoCaTe project, with an

More information

Integrating Heterogeneous Data Sources Using XML

Integrating Heterogeneous Data Sources Using XML Integrating Heterogeneous Data Sources Using XML 1 Yogesh R.Rochlani, 2 Prof. A.R. Itkikar 1 Department of Computer Science & Engineering Sipna COET, SGBAU, Amravati (MH), India 2 Department of Computer

More information

Dealing with redundancies and dependencies in normalization of XML data

Dealing with redundancies and dependencies in normalization of XML data Proceedings of the International Multiconference on Computer Science and Information Technology pp. 543 550 ISBN 978-83-60810-14-9 ISSN 1896-7094 Dealing with redundancies and dependencies in normalization

More information

Classification and Prediction

Classification and Prediction Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

More information

DYNAMIC QUERY FORMS WITH NoSQL

DYNAMIC QUERY FORMS WITH NoSQL IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 7, Jul 2014, 157-162 Impact Journals DYNAMIC QUERY FORMS WITH

More information

Peer Data Management Systems Concepts and Approaches

Peer Data Management Systems Concepts and Approaches Peer Data Management Systems Concepts and Approaches Armin Roth HPI, Potsdam, Germany Nov. 10, 2010 Armin Roth (HPI, Potsdam, Germany) Peer Data Management Systems Nov. 10, 2010 1 / 28 Agenda 1 Large-scale

More information

Applying an XML Warehouse to Social Network Analysis

Applying an XML Warehouse to Social Network Analysis Applying an XML Warehouse to Social Network Analysis Benjamin Nguyen University of Versailles benjamin.nguyen@prism.uvsq.fr History evidences it, sociology extracts it. ABSTRACT Lessons from the WebStand

More information

Data Stream Management

Data Stream Management Data Stream Management Synthesis Lectures on Data Management Editor M. Tamer Özsu, University of Waterloo Synthesis Lectures on Data Management is edited by Tamer Özsu of the University of Waterloo. The

More information

15.564 Information Technology I. Business Intelligence

15.564 Information Technology I. Business Intelligence 15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses

More information

Schema mapping and query reformulation in peer-to-peer XML data integration system

Schema mapping and query reformulation in peer-to-peer XML data integration system Control and Cybernetics vol. 38 (2009) No. 1 Schema mapping and query reformulation in peer-to-peer XML data integration system by Tadeusz Pankowski Institute of Control and Information Engineering Poznań

More information

Converging Web-Data and Database Data: Big - and Small Data via Linked Data

Converging Web-Data and Database Data: Big - and Small Data via Linked Data DBKDA/WEB Panel 2014, Chamonix, 24.04.2014 DBKDA/WEB Panel 2014, Chamonix, 24.04.2014 Reutlingen University Converging Web-Data and Database Data: Big - and Small Data via Linked Data Moderation: Fritz

More information

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

More information

On Development of Fuzzy Relational Database Applications

On Development of Fuzzy Relational Database Applications On Development of Fuzzy Relational Database Applications Srdjan Skrbic Faculty of Science Trg Dositeja Obradovica 3 21000 Novi Sad Serbia shkrba@uns.ns.ac.yu Aleksandar Takači Faculty of Technology Bulevar

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web

More information

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment www.ijcsi.org 434 A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment V.THAVAVEL and S.SIVAKUMAR* Department of Computer Applications, Karunya University,

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

Talel ABDESSALEM. May 7, 2014. 1 Curriculum Vitæ 3 1.1 Personal Data... 3 1.2 Contact Information... 3 1.3 Professional Record... 3 1.4 Short Bio...

Talel ABDESSALEM. May 7, 2014. 1 Curriculum Vitæ 3 1.1 Personal Data... 3 1.2 Contact Information... 3 1.3 Professional Record... 3 1.4 Short Bio... Talel ABDESSALEM Professor Télécom ParisTech - Dept. of Computer Science and Networks 46, rue Barrault - 75013 Paris Tel.: (+33) 1 45 81 77 79, Fax: (+33) 1 45 81 31 19 Talel.Abdessalem@telecom-paristech.fr

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

Semantification of Query Interfaces to Improve Access to Deep Web Content

Semantification of Query Interfaces to Improve Access to Deep Web Content Semantification of Query Interfaces to Improve Access to Deep Web Content Arne Martin Klemenz, Klaus Tochtermann ZBW German National Library of Economics Leibniz Information Centre for Economics, Düsternbrooker

More information