Probabilistic XML: A Data Model for the Web


 Barnaby Jones
 1 years ago
 Views:
Transcription
1 Probabilistic XML: A Data Model for the Web Pierre Senellart DBWeb Habilitation à diriger les recherches, 13 juin 2012
2 An Uncertain World
3 Outline An Uncertain World Uncertainty is Everywhere Probabilistic Databases Probabilistic XML Applications Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web 3 / 34 Habilitation Pierre Senellart DBWeb
4 Uncertain data Numerous sources of uncertain data: Measurement errors Data integration from contradicting sources Imprecise mappings between heterogeneous schemata Imprecise automatic process (information extraction, natural language processing, graph mining, etc.) Imperfect human judgment Lies, opinions, rumors 4 / 34 Habilitation Pierre Senellart DBWeb
5 Uncertain data Numerous sources of uncertain data: Measurement errors Data integration from contradicting sources Imprecise mappings between heterogeneous schemata Imprecise automatic process (information extraction, natural language processing, graph mining, etc.) Imperfect human judgment Lies, opinions, rumors 4 / 34 Habilitation Pierre Senellart DBWeb
6 Use case: Web information extraction Neverending Language Learning (NELL, CMU), 5 / 34 Habilitation Pierre Senellart DBWeb
7 Use case: Web information extraction Google Squared (terminated), screenshot from [Fink et al., 2011] 5 / 34 Habilitation Pierre Senellart DBWeb
8 Use case: Web information extraction Subject Predicate Object Confidence Elvis Presley diedondate % Elvis Presley ismarriedto Priscilla Presley 97.29% Elvis Presley influences Carlo Wolff 96.25% YAGO, [Suchanek et al., 2007] 5 / 34 Habilitation Pierre Senellart DBWeb
9 Uncertainty in Web information extraction The information extraction system is imprecise The system has some confidence in the information extracted, which can be: a probability of the information being true (e.g., conditional random fields) an adhoc numeric confidence score a rough discrete level of confidence (low, medium, high) What if this uncertain information is not seen as something final, but is used as a source of, e.g., a query answering system? 6 / 34 Habilitation Pierre Senellart DBWeb
10 Outline An Uncertain World Uncertainty is Everywhere Probabilistic Databases Probabilistic XML Applications Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web 7 / 34 Habilitation Pierre Senellart DBWeb
11 Managing uncertainty Objective Not to pretend this imprecision does not exist, and manage it as rigorously as possible throughout a long, automatic and human, potentially complex, process Especially: Represent different forms of uncertainty Probabilities are used to measure uncertainty in the data Query data and retrieve uncertain results Allow adding, deleting, modifying data in an uncertain way 8 / 34 Habilitation Pierre Senellart DBWeb
12 Managing uncertainty Objective Not to pretend this imprecision does not exist, and manage it as rigorously as possible throughout a long, automatic and human, potentially complex, process Especially: Represent different forms of uncertainty Probabilities are used to measure uncertainty in the data Query data and retrieve uncertain results Allow adding, deleting, modifying data in an uncertain way 8 / 34 Habilitation Pierre Senellart DBWeb
13 Why probabilities? Not the only option: fuzzy set theory [Galindo et al., 2005], DempsterShafer theory [Zadeh, 1986] Mathematically rich theory, nice semantics with respect to traditional database operations (e.g., joins) Some applications already generate probabilities (e.g., statistical information extraction or natural language probabilities) Naturally arising in case of conflicting information, based on the trust in the sources In other cases, we cheat and pretend that (normalized) confidence scores are probabilities: see this as a firstorder approximation 9 / 34 Habilitation Pierre Senellart DBWeb
14 Probabilistic relational DBMSs Numerous studies on the modeling and querying of probabilistic relations: blockindependent disjoint (BIDs) databases [Dalvi et al., 2009] tuples either mutually exclusive or independent, depending on key values probabilistic ctables [Imieliński and Lipski, 1984, Green and Tannen, 2006, Senellart, 2007], tuples annotated with arbitrary probabilistic formulas Two main probabilistic relational DBMS: Trio [Widom, 2005] Various uncertainty operators: unknown value, uncertain tuple, choice between different possible values, with probabilistic annotations MayBMS [Koch, 2009] Uncertain tables can be constructed using a REPAIRKEY operator, similar to BIDs, and more elaborate correlations can result from querying 10 / 34 Habilitation Pierre Senellart DBWeb
15 Probabilistic relational DBMSs Numerous studies on the modeling and querying of probabilistic relations: blockindependent disjoint (BIDs) databases [Dalvi et al., 2009] tuples either mutually exclusive or independent, depending on key values probabilistic ctables [Imieliński and Lipski, 1984, Green and Tannen, 2006, Senellart, 2007], tuples annotated with arbitrary probabilistic formulas Two main probabilistic relational DBMS: Trio [Widom, 2005] Various uncertainty operators: unknown value, uncertain tuple, choice between different possible values, with probabilistic annotations MayBMS [Koch, 2009] Uncertain tables can be constructed using a REPAIRKEY operator, similar to BIDs, and more elaborate correlations can result from querying 10 / 34 Habilitation Pierre Senellart DBWeb
16 Probabilistic relational DBMSs Numerous studies on the modeling and querying of probabilistic relations: blockindependent disjoint (BIDs) databases [Dalvi et al., 2009] tuples either mutually exclusive or independent, depending on key values probabilistic ctables [Imieliński and Lipski, 1984, Green and Tannen, 2006, Senellart, 2007], tuples annotated with arbitrary probabilistic formulas Two main probabilistic relational DBMS: Trio [Widom, 2005] Various uncertainty operators: unknown value, uncertain tuple, choice between different possible values, with probabilistic annotations MayBMS [Koch, 2009] Uncertain tables can be constructed using a REPAIRKEY operator, similar to BIDs, and more elaborate correlations can result from querying 10 / 34 Habilitation Pierre Senellart DBWeb
17 Outline An Uncertain World Uncertainty is Everywhere Probabilistic Databases Probabilistic XML Applications Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web 11 / 34 Habilitation Pierre Senellart DBWeb
18 Why probabilistic XML? Different typical querying languages: SQL and conjunctive queries vs XPath and treepattern queries (possibly with joins) Cases where a treelike model might be appropriate: No schema or few constraints on the schema Independent modules annotating freely a content warehouse Inherently treelike data (e.g., mailing lists, parse trees) with naturally occurring queries involving the descendant axis Remark Some results can be transferred from one model to the other. In other cases, connection much trickier! 12 / 34 Habilitation Pierre Senellart DBWeb
19 Web information extraction [Senellart et al., 2008] table / articles tr / article td / title td / authors token / title token / author #text #text Y 1 Y 2 Y 3 Y 4 Y 5 Y 8 Y 6 Y 7 Annotate HTML Web pages with possible labels Labels can be learned from a corpus of annotated documents Conditional random fields for XML: estimate probabilities of annotations given annotations of neighboring nodes Provides probabilistic labeling of Web pages 13 / 34 Habilitation Pierre Senellart DBWeb joint work with
20 Uncertain version control [Abdessalem et al., 2011, Ba et al., 2011] Use trees with probabilistic annotations to represent the uncertainty in the correctness of a document under open version control (e.g., Wikipedia articles) 14 / 34 Habilitation Pierre Senellart DBWeb joint work with
21 Probabilistic summaries of XML corpora [Abiteboul et al., 2012a,b] publish ppublish present ppresent q0 next 1 ppublish q1 $ 1 ppresent q2 Transform an XML schema (deterministic topdown tree automaton) into a probabilistic generator (probabilistic tree automaton) of XML documents Probability distribution optimal with respect to a given corpus Application: Optimal autocompletions in an XML editor 15 / 34 Habilitation Pierre Senellart joint work with DBWeb
22 Probabilistic XML: Models and Complexity
23 Outline An Uncertain World Probabilistic XML: Models and Complexity Modeling Querying Updating Reconciling Web Data Models with the Actual Web 17 / 34 Habilitation Pierre Senellart DBWeb
24 A general discrete Probabilistic XML Model [Abiteboul and Senellart, 2006, Senellart, 2007, Abiteboul et al., 2009] A A w 1, w 2 B semantics C A C A w 2 D C D B C Event Prob. p 1 = 0.06 p 2 = 0.70 p 3 = 0.24 w w Expresses arbitrarily complex dependencies Analogous to probabilistic ctables 18 / 34 Habilitation Pierre Senellart DBWebjoint work with
25 A general discrete Probabilistic XML Model [Abiteboul and Senellart, 2006, Senellart, 2007, Abiteboul et al., 2009] A A w 1, w 2 B semantics C A C A w 2 D C D B C Event Prob. p 1 = 0.06 p 2 = 0.70 p 3 = 0.24 w w Expresses arbitrarily complex dependencies Analogous to probabilistic ctables 18 / 34 Habilitation Pierre Senellart DBWebjoint work with
26 Continuous distributions [Abiteboul et al., 2011] root sensor e mes id i25 sensor mes id t vl t vl N (70, 4) i35 e: event it did not rain at time 1 e mes t vl 1 mux mux: mutually exclusive options N (70, 4): normal distribution.3 20 Two kinds of dependencies in discrete probabilistic XML: global (e) and local (mux) Addcontinuous leaves; nonobvious semantics issues 19 / 34 Habilitation Pierre Senellart joint work with DBWeb
27 Recursive Markov chains [Benedikt et al., 2010] <!ELEMENT directory (person*)> <!ELEMENT person (name,phone*)> D: directory 1 P P: person 1 1 N T 0.5 N : name T : phone 1 1 Probabilistic model that subsumes local discrete models Allows generating documents of unbounded width or depth 20 / 34 Habilitation Pierre Senellart DBWebjoint work with
28 Outline An Uncertain World Probabilistic XML: Models and Complexity Modeling Querying Updating Reconciling Web Data Models with the Actual Web 21 / 34 Habilitation Pierre Senellart DBWeb
29 Querying probabilistic XML Semantics of a (Boolean) query = probability the query is true: 1. Generate all possible worlds of a given probabilistic document 2. In each world, evaluate the query 3. Add up the probabilities of the worlds that make the query true EXPTIME algorithm! Can we do better, i.e., can we apply directly the algorithm on the probabilistic document? We shall talk about data complexity of query answering. 22 / 34 Habilitation Pierre Senellart DBWeb
30 Querying probabilistic XML Semantics of a (Boolean) query = probability the query is true: 1. Generate all possible worlds of a given probabilistic document (possibly exponentially many) 2. In each world, evaluate the query 3. Add up the probabilities of the worlds that make the query true EXPTIME algorithm! Can we do better, i.e., can we apply directly the algorithm on the probabilistic document? We shall talk about data complexity of query answering. 22 / 34 Habilitation Pierre Senellart DBWeb
31 Previously known results [Kimelfeld et al., 2009, Cohen et al., 2009] Treepattern queries are evaluable in linear time over local dependencies with a bottomup, dynamic programming algorithm Indeed, any monadic secondorder query is tractable over local dependencies Even trivial queries are #Phard over global dependencies MonteCarlo sampling works Multiplicative approximation is also tractable (existence of a FPRAS) 23 / 34 Habilitation Pierre Senellart DBWeb
32 Aggregation queries [Abiteboul et al., 2010] Aggregate Queries: sum, count, avg, countd, min, max, etc. Distributions? Possible values? Expected value? Computing expected values of sum and count tractable with global dependencies; everything else intractable Computing expected values of every of these aggregate functions tractable with local dependencies Computing distributions and possible values tractable for count, min, max, intractable for the others Continuous distributions do not add any more complexity! 24 / 34 Habilitation Pierre Senellart joint work with DBWeb
33 Queries with joins are hard [Kharlamov et al., 2011] Treepattern queries (TP) /A[C/D]//B C A B D Treepattern queries with joins (TPJ) for $x in $doc/a/c/d A return $doc/a//b[.=$x] C B D Over local dependencies, dichotomy for queries with a single join: If equivalent to a joinfree query, lineartime Otherwise, #Phard 25 / 34 Habilitation Pierre Senellart DBWeb joint work with
34 Queries with joins are hard [Kharlamov et al., 2011] Treepattern queries (TP) /A[C/D]//B C A B D Treepattern queries with joins (TPJ) for $x in $doc/a/c/d A return $doc/a//b[.=$x] C B D Over local dependencies, dichotomy for queries with a single join: If equivalent to a joinfree query, lineartime Otherwise, #Phard 25 / 34 Habilitation Pierre Senellart DBWeb joint work with
35 Rely on approximations... when needed [Senellart and Souihli, 2011, Souihli, 2011, Souihli and Senellart, 2012] MonteCarlo is very good at approximating high probabilities Sometimes the structure of a query makes the probability of a query easy to evaluate For small formulas, naïve evaluation techniques good enough Refined approximation methods best when everything else fails Use a cost model to decide which evaluation algorithm to use! 1,000 Average time (ms) join 26 / 34 Habilitation Pierre Senellart DBWeb N / A joint work with Naïve Sieve MonteCarlo Coverage EvalDP Best
36 Rely on approximations... when needed [Senellart and Souihli, 2011, Souihli, 2011, Souihli and Senellart, 2012] MonteCarlo is very good at approximating high probabilities Sometimes the structure of a query makes the probability of a query easy to evaluate For small formulas, naïve evaluation techniques good enough Refined approximation methods best when everything else fails Use a cost model to decide which evaluation algorithm to use! 1,000 Average time (ms) join 26 / 34 Habilitation Pierre Senellart DBWeb N / A joint work with Naïve Sieve MonteCarlo Coverage EvalDP Best
37 Outline An Uncertain World Probabilistic XML: Models and Complexity Modeling Querying Updating Reconciling Web Data Models with the Actual Web 27 / 34 Habilitation Pierre Senellart DBWeb
38 The complexity of updates [Senellart and Abiteboul, 2007, Kharlamov et al., 2010] Updates defined by a query (cf. XUpdate, XQuery Update). Semantics: for all matches of a query, insert or delete a node in the tree at a place located by the query. Results Most updates are intractable with local dependencies: the result of an update can require an exponentially larger representation size Insertions with a foreachmatch semantics are tractable with arbitrary dependencies; deletions are intractable Some insertifthereisamatch operations tractable for local dependencies but not for arbitrary dependencies 28 / 34 Habilitation Pierre Senellart DBWebjoint work with
39 Reconciling Web Data Models with the Actual Web
40 Outline An Uncertain World Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web Applying Probabilistic XML to Web data Formal Models for Web Content Acquisition 30 / 34 Habilitation Pierre Senellart DBWeb
41 Applying Probabilistic XML to Web data Current mismatch (in my research, but also in general) between: Applications that produce and have to deal with uncertainty: Information extraction Graph mining Natural language parsing Data integration etc. Probabilistic database management systems Analogy to data management in the 1960 s: Each data management application (accounting, cataloging, etc.) was developing its own way of managing data Each application dealing with imprecise data is developing its own way of managing imprecision Probabilistic DBMSs are becoming more and more mature time to put them to use for real applications? 31 / 34 Habilitation Pierre Senellart DBWeb
42 Applying Probabilistic XML to Web data Current mismatch (in my research, but also in general) between: Applications that produce and have to deal with uncertainty: Information extraction Graph mining Natural language parsing Data integration etc. Probabilistic database management systems Analogy to data management in the 1960 s: Each data management application (accounting, cataloging, etc.) was developing its own way of managing data Each application dealing with imprecise data is developing its own way of managing imprecision Probabilistic DBMSs are becoming more and more mature time to put them to use for real applications? 31 / 34 Habilitation Pierre Senellart DBWeb
43 Outline An Uncertain World Probabilistic XML: Models and Complexity Reconciling Web Data Models with the Actual Web Applying Probabilistic XML to Web data Formal Models for Web Content Acquisition 32 / 34 Habilitation Pierre Senellart DBWeb
44 Formal Models for Web Content Acquisition Current mismatch between: Actual Web mining and Web data management applications: Web graph mining wrapper induction Web crawling Research on the foundations of Web data management Plan: bridge the two Use formal models for data exchange [Senellart and Gottlob, 2008, Gottlob and Senellart, 2010] to perform wrapper induction or ontology matching Use static analysis techniques for JavaScript to perform deep Web data extraction [Benedikt et al., 2012b] Use results of formal studies of the containment of recursive query languages [Benedikt et al., 2011, 2012a] to optimize query answering over the deep Web 33 / 34 Habilitation Pierre Senellart DBWeb
45 Formal Models for Web Content Acquisition Current mismatch between: Actual Web mining and Web data management applications: Web graph mining wrapper induction Web crawling Research on the foundations of Web data management Plan: bridge the two Use formal models for data exchange [Senellart and Gottlob, 2008, Gottlob and Senellart, 2010] to perform wrapper induction or ontology matching Use static analysis techniques for JavaScript to perform deep Web data extraction [Benedikt et al., 2012b] Use results of formal studies of the containment of recursive query languages [Benedikt et al., 2011, 2012a] to optimize query answering over the deep Web 33 / 34 Habilitation Pierre Senellart DBWeb
46 Questions?
47 Talel Abdessalem, M. Lamine Ba, and Pierre Senellart. A probabilistic XML merging tool. In Proc. EDBT, pages , Uppsala, Sweden, March Demonstration. Serge Abiteboul and Pierre Senellart. Querying and updating probabilistic information in XML. In Proc. EDBT, pages , Munich, Germany, March Serge Abiteboul, Benny Kimelfeld, Yehoshua Sagiv, and Pierre Senellart. On the expressiveness of probabilistic XML models. VLDB Journal, 18(5): , October Serge Abiteboul, TH. Hubert Chan, Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Aggregate queries for discrete and continuous probabilistic XML. In Proc. ICDT, pages 50 61, Lausanne, Switzerland, March 2010.
48 Serge Abiteboul, TH. Hubert Chan, Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Capturing continuous data and answering aggregate queries in probabilistic XML. ACM Transactions on Database Systems, 36(4), Serge Abiteboul, Yael Amsterdamer, Daniel Deutch, Tova Milo, and Pierre Senellart. Finding optimal probabilistic generators for XML collections. In Proc. ICDT, Berlin, Germany, March 2012a. Serge Abiteboul, Yael Amsterdamer, Tova Milo, and Pierre Senellart. Autocompletion learning for XML. In Proc. SIGMOD, pages , Scottsdale, USA, May 2012b. Demonstration. M. Lamine Ba, Talel Abdessalem, and Pierre Senellart. Towards a version control model with uncertain data. In Proc. PIKM, Glasgow, United Kingdom, October 2011.
49 Michael Benedikt, Evgeny Kharlamov, Dan Olteanu, and Pierre Senellart. Probabilistic XML via Markov chains. Proceedings of the VLDB Endowment, 3(1): , September Presented at the VLDB 2010 conference, Singapore. Michael Benedikt, Georg Gottlob, and Pierre Senellart. Determining relevance of accesses at runtime. In Proc. PODS, pages , Athens, Greece, June Michael Benedikt, Pierre Bourhis, and Pierre Senellart. Monadic datalog containment. In Proc. ICALP, pages 79 91, Warwick, United Kingdom, July 2012a. Michael Benedikt, Tim Furche, Andreas Savvides, and Pierre Senellart. ProFoUnd: Programanalysis based form understanding. In Proc. WWW, pages , Lyon, France, April 2012b. Demonstration. Sara Cohen, Benny Kimelfeld, and Yehoshua Sagiv. Running tree automata on probabilistic XML. In PODS, 2009.
50 Nilesh Dalvi, Chrisopher Ré, and Dan Suciu. Probabilistic databases: Diamonds in the dirt. Communications of the ACM, 52(7), Robert Fink, Andrew Hogue, Dan Olteanu, and Swaroop Rath. SPROUT 2 : a squared query engine for uncertain web data. In SIGMOD, José Galindo, Angelica Urrutia, and Mario Piattini. Fuzzy Databases: Modeling, Design And Implementation. IGI Global, Georg Gottlob and Pierre Senellart. Schema mapping discovery from data instances. Journal of the ACM, 57(2), January Todd J. Green and Val Tannen. Models for incomplete and probabilistic information. In Proc. EDBT Workshops, IIDB, Munich, Germany, March Tomasz Imieliński and Witold Lipski. Incomplete information in relational databases. Journal of the ACM, 31(4): , 1984.
51 Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Updating probabilistic XML. In Proc. Updates in XML, Lausanne, Switzerland, March Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Value joins are expensive over (probabilistic) XML. In Proc. LID, pages 41 48, Uppsala, Sweden, March Benny Kimelfeld, Yuri Kosharovsky, and Yehoshua Sagiv. Query evaluation over probabilistic XML. VLDB Journal, 18(5), Christoph Koch. MayBMS: A system for managing large uncertain and probabilistic databases. In Charu Aggarwal, editor, Managing and Mining Uncertain Data. Springer, Pierre Senellart. Comprendre le Web caché. Understanding the Hidden Web. PhD thesis, Université Paris XI, Orsay, France, December 2007.
52 Pierre Senellart and Serge Abiteboul. On the complexity of managing probabilistic XML data. In Proc. PODS, pages , Beijing, China, June Pierre Senellart and Georg Gottlob. On the complexity of deriving schema mappings from database instances. In Proc. PODS, pages 23 32, Vancouver, Canada, June Pierre Senellart and Asma Souihli. ProApproX: A lightweight approximation query processor over probabilistic trees. In Proc. SIGMOD, pages , Athens, Greece, June Demonstration. Pierre Senellart, Avin Mittal, Daniel Muschick, Rémi Gilleron, and Marc Tommasi. Automatic wrapper induction from hiddenweb sources with domain knowledge. In Proc. WIDM, pages 9 16, Napa, USA, October 2008.
53 Asma Souihli. Efficient query evaluation over probabilistic XML with longdistance dependencies. In EDBT/ICDT PhD Workshop, Asma Souihli and Pierre Senellart. Optimizing approximations of dnf query lineage in probabilistic xml, April Preprint. Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. YAGO: A core of semantic knowledge. Unifying WordNet and Wikipedia. In WWW, pages , ISBN Jennifer Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, Lotfi A. Zadeh. A simple view of the DempsterShafer theory of evidence and its implication for the rule of combination. AI Magazine, 7(2), 1986.
Answering Queries using Humans, Algorithms and Databases
Answering Queries using Humans, Algorithms and Databases Aditya Parameswaran Stanford University adityagp@cs.stanford.edu Neoklis Polyzotis Univ. of California at Santa Cruz alkis@cs.ucsc.edu ABSTRACT
More informationRecovering Semantics of Tables on the Web
Recovering Semantics of Tables on the Web Petros Venetis Alon Halevy Jayant Madhavan Marius Paşca Stanford University Google Inc. Google Inc. Google Inc. venetis@cs.stanford.edu halevy@google.com jayant@google.com
More informationLoad Shedding for Aggregation Queries over Data Streams
Load Shedding for Aggregation Queries over Data Streams Brian Babcock Mayur Datar Rajeev Motwani Department of Computer Science Stanford University, Stanford, CA 94305 {babcock, datar, rajeev}@cs.stanford.edu
More informationThe limits of Web metadata, and beyond
The limits of Web metadata, and beyond Massimo Marchiori The World Wide Web Consortium (W3C), MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139, U.S.A. max@lcs.mit.edu Abstract
More informationClean Answers over Dirty Databases: A Probabilistic Approach
Clean Answers over Dirty Databases: A Probabilistic Approach Periklis Andritsos University of Trento periklis@dit.unitn.it Ariel Fuxman University of Toronto afuxman@cs.toronto.edu Renée J. Miller University
More informationimap: Discovering Complex Semantic Matches between Database Schemas
imap: Discovering Complex Semantic Matches between Database Schemas Robin Dhamankar, Yoonkyong Lee, AnHai Doan Department of Computer Science University of Illinois, UrbanaChampaign, IL, USA {dhamanka,ylee11,anhai}@cs.uiuc.edu
More information1 NOT ALL ANSWERS ARE EQUALLY
1 NOT ALL ANSWERS ARE EQUALLY GOOD: ESTIMATING THE QUALITY OF DATABASE ANSWERS Amihai Motro, Igor Rakov Department of Information and Software Systems Engineering George Mason University Fairfax, VA 220304444
More informationQuestion Classification by Approximating Semantics
Question Classification by Approximating Semantics Guangyu Feng, Kun Xiong, Yang Tang, Anqi Cui School of Computer Science, University of Waterloo {gfeng, caq}@uwaterloo.ca, {xiongkun04, tangyang9}@gmail.com
More informationData Integration: A Theoretical Perspective
Data Integration: A Theoretical Perspective Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza Via Salaria 113, I 00198 Roma, Italy lenzerini@dis.uniroma1.it ABSTRACT
More informationAutomatic segmentation of text into structured records
Automatic segmentation of text into structured records Vinayak Borkar Kaustubh Deshmukh Sunita Sarawagi Indian Institute of Technology, Bombay ABSTRACT In this paper we present a method for automatically
More informationReasoning With Neural Tensor Networks for Knowledge Base Completion
Reasoning With Neural Tensor Networks for Knowledge Base Completion Richard Socher, Danqi Chen*, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305,
More informationFrom Data Fusion to Knowledge Fusion
From Data Fusion to Knowledge Fusion Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Kevin Murphy, Shaohua Sun, Wei Zhang Google Inc. {lunadong gabr geremy wilko kpmurphy sunsh weizh}@google.com
More informationEmerging Methods in Predictive Analytics:
Emerging Methods in Predictive Analytics: Risk Management and Decision Making William H. Hsu Kansas State University, USA A volume in the Advances in Data Mining and Database Management (ADMDM) Book Series
More informationQuery Processing over Incomplete Autonomous Databases
Processing over Incomplete Autonomous Databases Garrett Wolf Hemal Khatri Bhaumik Chokshi Jianchun Fan Yi Chen Subbarao Kambhampati Department of Computer Science and Engineering Arizona State University
More informationAnomalous System Call Detection
Anomalous System Call Detection Darren Mutz, Fredrik Valeur, Christopher Kruegel, and Giovanni Vigna Reliable Software Group, University of California, Santa Barbara Secure Systems Lab, Technical University
More informationA Dataspace Odyssey: The imemex Personal Dataspace Management System
A Dataspace Odyssey: The imemex Personal Dataspace Management System Lukas Blunschi JensPeter Dittrich Olivier René Girard Shant Kirakos Karakashian ETH Zurich, Switzerland dbis.ethz.ch imemex.org Marcos
More informationType Inference on Noisy RDF Data
Type Inference on Noisy RDF Data Heiko Paulheim and Christian Bizer University of Mannheim, Germany Research Group Data and Web Science {heiko,chris}@informatik.unimannheim.de Abstract. Type information
More informationTowards a Benchmark for Graph Data Management and Processing
Towards a Benchmark for Graph Data Management and Processing Michael Grossniklaus University of Konstanz Germany michael.grossniklaus @unikonstanz.de Stefania Leone University of Southern California United
More informationStrabon: A Semantic Geospatial DBMS
Strabon: A Semantic Geospatial DBMS Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis National and Kapodistrian University of Athens, Greece {kkyzir,mk,koubarak}@di.uoa.gr Abstract. We present
More informationThe Role Mining Problem: Finding a Minimal Descriptive Set of Roles
The Role Mining Problem: Finding a Minimal Descriptive Set of Roles Jaideep Vaidya MSIS Department and CIMIC Rutgers University 8 University Ave, Newark, NJ 72 jsvaidya@rbs.rutgers.edu Vijayalakshmi Atluri
More informationQualityaware ServiceOriented Data Integration: Requirements, State of the Art and Open Challenges
Qualityaware ServiceOriented Data Integration: Requirements, State of the Art and Open Challenges Schahram Dustdar Reinhard Pichler Vadim Savenkov HongLinh Truong Vienna University of Technology {dustdar,truong}@infosys.tuwien.ac.at
More informationQuestion Answering on Interlinked Data
Question Answering on Interlinked Data Saeedeh Shekarpour Universität Leipzig, IFI/AKSW shekarpour@informatik.unileipzig.de AxelCyrille Ngonga Ngomo Universität Leipzig, IFI/AKSW ngonga@informatik.unileipzig.de
More informationThe Stratosphere platform for big data analytics
The VLDB Journal DOI 10.1007/s007780140357y REGULAR PAPER The Stratosphere platform for big data analytics Alexander Alexandrov Rico Bergmann Stephan Ewen JohannChristoph Freytag Fabian Hueske Arvid
More informationEfficient ExternalMemory Bisimulation on DAGs
Efficient ExternalMemory Bisimulation on DAGs Jelle Hellings Hasselt University and transnational University of Limburg Belgium jelle.hellings@uhasselt.be George H.L. Fletcher Eindhoven University of
More informationMining Templates from Search Result Records of Search Engines
Mining Templates from Search Result Records of Search Engines Hongkun Zhao, Weiyi Meng State University of New York at Binghamton Binghamton, NY 13902, USA {hkzhao, meng}@cs.binghamton.edu Clement Yu University
More informationAutomatically Detecting Vulnerable Websites Before They Turn Malicious
Automatically Detecting Vulnerable Websites Before They Turn Malicious Kyle Soska and Nicolas Christin, Carnegie Mellon University https://www.usenix.org/conference/usenixsecurity14/technicalsessions/presentation/soska
More informationYAGO3: A Knowledge Base from Multilingual Wikipedias
YAGO3: A Knowledge Base from Multilingual Wikipedias Farzaneh Mahdisoltani Max Planck Institute for Informatics, Germany fmahdisoltani@gmail.com Joanna Biega Max Planck Institute for Informatics, Germany
More informationKnowledge Base Completion via SearchBased Question Answering
Knowledge Base Completion via SearchBased Question Answering Robert West*, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin *Computer Science Department, Stanford University, Stanford,
More informationSupporting Keyword Search in Product Database: A Probabilistic Approach
Supporting Keyword Search in Product Database: A Probabilistic Approach Huizhong Duan 1, ChengXiang Zhai 2, Jinxing Cheng 3, Abhishek Gattani 4 University of Illinois at UrbanaChampaign 1,2 Walmart Labs
More informationPerformance Profiling with EndoScope, an Acquisitional Software Monitoring Framework
Performance Profiling with EndoScope, an Acquisitional Software Monitoring Framework Alvin Cheung MIT CSAIL akcheung@mit.edu Samuel Madden MIT CSAIL madden@csail.mit.edu ABSTRACT We propose EndoScope,
More information