Path Querying on Graph Databases



Similar documents
Formal Verification and Linear-time Model Checking

Fixed-Point Logics and Computation

Temporal Logics. Computation Tree Logic

Algorithmic Software Verification

Semantics and Verification of Software

Introduction to Logic in Computer Science: Autumn 2006

logic language, static/dynamic models SAT solvers Verified Software Systems 1 How can we model check of a program or system?

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

Model Checking: An Introduction

Schedule. Logic (master program) Literature & Online Material. gic. Time and Place. Literature. Exercises & Exam. Online Material

A first step towards modeling semistructured data in hybrid multimodal logic

Software Modeling and Verification

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Social Media Mining. Graph Essentials

Software Verification and Testing. Lecture Notes: Temporal Logics

A Propositional Dynamic Logic for CCS Programs

Introduction to Graph Theory

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

Midterm Practice Problems

CHAPTER 7 GENERAL PROOF SYSTEMS

Testing LTL Formula Translation into Büchi Automata

(LMCS, p. 317) V.1. First Order Logic. This is the most powerful, most expressive logic that we will examine.

SPARQL: Un Lenguaje de Consulta para la Web

Graph Theory Problems and Solutions

Theoretical Computer Science (Bridging Course) Complexity

Automata-based Verification - I

tutorial: hardware and software model checking

x < y iff x < y, or x and y are incomparable and x χ(x,y) < y χ(x,y).

XML Data Integration

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

Journal of Mathematics Volume 1, Number 1, Summer 2006 pp

Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics

Introduction to Software Verification

Generating models of a matched formula with a polynomial delay

Bounded Treewidth in Knowledge Representation and Reasoning 1

On strong fairness in UNITY

Answer: (a) Since we cannot repeat men on the committee, and the order we select them in does not matter, ( )

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Introducing Formal Methods. Software Engineering and Formal Methods

Model Checking II Temporal Logic Model Checking

Bindings, mobility of bindings, and the -quantifier

Development of dynamically evolving and self-adaptive software. 1. Background

MATHEMATICS: CONCEPTS, AND FOUNDATIONS Vol. III - Logic and Computer Science - Phokion G. Kolaitis

2. The Language of First-order Logic

The Model Checker SPIN

Anotherpossibilityistoentersomealreadyknownimplicationsbeforestartingtheexploration.Theseimplications,theuseralreadyknowstobevalid,

DEGREES OF CATEGORICITY AND THE HYPERARITHMETIC HIERARCHY

CS510 Software Engineering

Computational Methods for Database Repair by Signed Formulae

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

Handout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, Chapter 7: Digraphs

MATHEMATICS Unit Decision 1

University of Ostrava. Reasoning in Description Logic with Semantic Tableau Binary Trees

A computational model for MapReduce job flow

On the Modeling and Verification of Security-Aware and Process-Aware Information Systems

On the k-path cover problem for cacti

Decidable Fragments of First-Order and Fixed-Point Logic

Introduction to Algorithms. Part 3: P, NP Hard Problems

Dedekind s forgotten axiom and why we should teach it (and why we shouldn t teach mathematical induction in our calculus classes)

CODING TRUE ARITHMETIC IN THE MEDVEDEV AND MUCHNIK DEGREES

INTRODUCTORY SET THEORY

Runtime Verification - Monitor-oriented Programming - Monitor-based Runtime Reflection

Model checking test models. Author: Kevin de Berk Supervisors: Prof. dr. Wan Fokkink, dr. ir. Machiel van der Bijl

Formal Verification of Software

ON FUNCTIONAL SYMBOL-FREE LOGIC PROGRAMS

Turing Degrees and Definability of the Jump. Theodore A. Slaman. University of California, Berkeley. CJuly, 2005

Foundational Proof Certificates

The Ultimate Undecidability Result for the Halpern Shoham Logic

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

Automated Route Planning for Milk-Run Transport Logistics with the NuSMV Model Checker

Scalable Automated Symbolic Analysis of Administrative Role-Based Access Control Policies by SMT solving

Analysis of Algorithms, I

JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004

Optimizing Description Logic Subsumption

Formal Verification Problems in a Bigdata World: Towards a Mighty Synergy

Reliability Guarantees in Automata Based Scheduling for Embedded Control Software

Explorer's Guide to the Semantic Web

Formalization of the CRM: Initial Thoughts

Notes on NP Completeness

! " # The Logic of Descriptions. Logics for Data and Knowledge Representation. Terminology. Overview. Three Basic Features. Some History on DLs

Using Patterns and Composite Propositions to Automate the Generation of Complex LTL

First-Order Stable Model Semantics and First-Order Loop Formulas

Monitoring Metric First-order Temporal Properties

Transcription:

Path Querying on Graph Databases Jelle Hellings Hasselt University and transnational University of Limburg 1/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 2/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 3/38

Graphs Pieces of data (nodes) Relations between the pieces of data (edges) Example (Social networks) workswith Jan Jelle worksat UHasselt worksat 4/38

Applications XML and RDF, Social networks, Transportation networks, The World Wide Web,... 5/38

Graph Database: Google Maps Nodes: points of interest, addresses,... Edges: road network Queries: Example (Distance based query) university close to <my address> (answer: Universiteit Hasselt; 5.2 km) Example (Route-planning query) From: <my address>, to: <university> (answer: options for university; followed by route) 6/38

Challenges Engineering: big data Storage, distributed processing, hardware failures,... Conceptual: semantics and consistency Structured data (facebook) versus structured? data (the web) Conceptual: data querying Local/navigational based versus graph-wide path based No widely used general purpose languages Current practice: application specific languages 7/38

Challenges Engineering: big data Storage, distributed processing, hardware failures,... Conceptual: semantics and consistency Structured data (facebook) versus structured? data (the web) Conceptual: data querying Local/navigational based versus graph-wide path based No widely used general purpose languages Current practice: application specific languages Our focus Path-based graph querying 7/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 8/38

Motivation Expressing graph-queries Properties of paths, walks,... Route planning We want to travel from our office to a cafetaria and from this cafetaria get back to the office using a different route 9/38

Graph querying Graphs as traditional relations worksat(person, company) Already deep knowledge of these systems First-order based query languages: Q(n) := m workswith(n, m) worksat(m, UHasselt) Largely restricted to local reasoning no paths, no or only limited reachability,... 10/38

Higher-order logics Monadic second-order logic Extend first-order logic with quantification over sets Strong theoretical background Sets with only nodes versus sets with nodes and edges Some graph problems are naturally expressible with sets: Graph coloring, bipartite graph,... S T ( x (x S x T ) (x S = x T ) (x T = x S) y edge(x, y) = ((x S y T ) (y S x T ))) Paths non-straightforward: y is reachable from x S [(x S) u v (u S edge(u, v) = v S) = y S] 11/38

Conjunctive Regular Path Queries Idea Query nodes based on labelling of paths between nodes Express labelling by a regular expression Example Q(a, b) := aπb, (αβ + γδ) + (π) α n 1 n 2 β n 3 γ n 4 δ n 5 γ n 6 α n 7 β n 8 δ 12/38

Conjunctive Regular Path Queries Idea Query nodes based on labelling of paths between nodes Express labelling by a regular expression Example Q(a, b) := aπb, (αβ + γδ) + (π) α n 1 n 2 β n 3 γ n 4 δ n 5 γ n 6 α n 7 β n 8 δ 12/38

Conjunctive Regular Path Queries Idea Query nodes based on labelling of paths between nodes Express labelling by a regular expression Example Q(a, b) := aπb, (αβ + γδ) + (π) α n 1 n 2 β n 3 γ n 4 δ n 5 γ n 6 α n 7 β n 8 δ 12/38

Extended Conjunctive Regular Path Queries Idea Comparing labelling of paths Regular expressions over n-tuples Use special symbol to specify end-of-path Example Q(a, b) := aπ 1 b, aπ 2 b, ([ α β ] + [ α ])(π 1, π 2 ) α n 1 n 2 α n 3 α n 4 β n 6 β n 7 β n 8 α 13/38

Computation tree logic Usage: verification of formal models Describe behaviour by a transition system (graph) Write propositions that should hold Example shutdown buy start idle work crash giveup 14/38

Computation tree logic Usage: verification of formal models Describe behaviour by a transition system (graph) Write propositions that should hold Example shutdown buy start idle work crash giveup Machine never crashes: A G crash 14/38

Computation tree logic Usage: verification of formal models Describe behaviour by a transition system (graph) Write propositions that should hold Example shutdown buy start idle work crash giveup Machine can work without crashing: E G crash 14/38

Hybrid CTL Idea CTL has only implicit paths and nodes Add ability to name nodes in our formulae Example a b We can get from a to b in two different ways: E v1 E F( v2 E X F(b v3 @ v1 E( v 2 U v 3 ))) 15/38

Hybrid CTL Idea CTL has only implicit paths and nodes Add ability to name nodes in our formulae Example v 1 a b We can get from a to b in two different ways: E v1 E F( v2 E X F(b v3 @ v1 E( v 2 U v 3 ))) 15/38

Hybrid CTL Idea CTL has only implicit paths and nodes Add ability to name nodes in our formulae Example v 2 v 1 a b We can get from a to b in two different ways: E v1 E F( v2 E X F(b v3 @ v1 E( v 2 U v 3 ))) 15/38

Hybrid CTL Idea CTL has only implicit paths and nodes Add ability to name nodes in our formulae Example v 2 v 1 v 3 a b We can get from a to b in two different ways: E v1 E F( v2 E X F(b v3 @ v1 E( v 2 U v 3 ))) 15/38

Hybrid CTL Idea CTL has only implicit paths and nodes Add ability to name nodes in our formulae Example v 2 v 1 v 3 a b v 2 v 2 We can get from a to b in two different ways: E v1 E F( v2 E X F(b v3 @ v1 E( v 2 U v 3 ))) 15/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 16/38

Walk Logic Idea: extend first-order logic Add walks Add positions on walks Necessary operators to compare positions Route planning We want to travel from our office to a cafetaria (R) and from this cafetaria get back to the office using a different route (S) R S t 1 R t 2 R u 1 S u 2 S u 3 S (office(t 1 ) t 1 < t 2 cafeteria(t 2 ) u 1 < u 3 < u 2 u 1 t 2 u 2 t 1 t 3 R (t 1 < t 3 < t 2 = t 3 u 3 )) 17/38

Definitions Definition (Directed node-labeled graph) A directed node-labeled graph is a triple G = (N, E, l): N is a finite set of nodes E N N is the set of edges l : N 2 AP is a node-label function 18/38

Walk Logic Walk variables Position variables per walk variable Atomic Formulae a(t) Node referred to by position variable t has labelling a t 1 t 2 Position variables t 1, t 2 refer to the same node t 1 < t 2 Position variable t 1 comes before t 2 in walk W Position variables t 1 and t 2 must be of the same sort ϕ, ψ are formulae ϕ, ϕ ψ W ϕ t W ϕ Negation and disjunction Quantification over walks Quantification over positions on walks 19/38

Semantics: walks? Definition Infinite walk A finite or infinite sequence v 1... of nodes such that (v i, v i+1 ) E for each 1 i v 1... Walk A nonempty finite sequence v 1... v n of nodes such that (v i, v i+1 ) E for each 1 i n Trail A walk without edge repetition Path A walk without node repetition 20/38

Semantics: walks? Definition Infinite walk A finite or infinite sequence v 1... of nodes such that (v i, v i+1 ) E for each 1 i v 1... Walk A nonempty finite sequence v 1... v n of nodes such that (v i, v i+1 ) E for each 1 i n Trail A walk without edge repetition Path A walk without node repetition CTL and Hybrid CTL : primarily infinite walks CRPQs: primarily walks 20/38

Semantics: expressive power Hierarchy of expressive power Infinite walk A walk W is finite: t W u W t < u Walk A walk is a trail (informally): t W u W (t u t +1 u +1 ) = t = u Trail A trail W is a path (informally): t W u W t u = t = u Path Logic Trail Logic Walk Logic Infinite Walk Logic 21/38

Walk-based Graph Properties Example (Hamiltonian Path (in Path Logic)) P Q t Q u P (t u) Example (Eulerian Trail (in Trail Logic)) T Q t Q u T (t u) (t +1 u +1 ) Example (Strongly Connected) P Q t P u Q R v R w R (v < w t v u w) 22/38

Properties on undirected graphs Theorem Weakly Connected is not expressible on directed graphs Proof. n 1 n 2 n 3 n 4 n 5 n 6 All walks contain at most 2 nodes: reduce to first-order logic Direction matters! On undirected graphs: Weakly Connected same way as strongly connected Planar Graph using Kuratowski s Theorem 23/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 24/38

MSO(nodes, edges) and paths Observations Path: sequence of connected edges No node repetition: nodes and positions coincide Node a before node b on path P if and only if Node b is reachable from a using the edges in P Theorem Path Logic MSO(nodes, edges) 25/38

Set-based Graph Properties Theorem Bipartite graph is not expressible on directed graphs Lemma (Dénes Kőnig) A graph is bipartite iff it does not contain an odd cycle Proof. n 2 n 3 m 2 m 3 m 4 n 1 m 1 m 6 m 5 All walks contain at most 3 nodes: reduce to first-order logic MSO(nodes) can express bipartite graph Is Walk Logic strictly subsumed by MSO? 26/38

Eulerian Trail Theorem MSO(nodes, edges) cannot express Eulerian Trail Lemma (well known result) MSO cannot distinguish sets with i from sets with j elements Proof. For MSO: existence of Eulerian Trail in the graph a n v 2. a 1 v 1 a n b m b m. =.. b 1 a 1 b 1 Reduces to sets A and B having the equal number of elements 27/38

Relations with FO and MSO We have FO Path Logic MSO(nodes, edges) Trail Logic, Walk Logic, and Infinite Walk Logic are incomparable with MSO(nodes) and MSO(nodes, edges) Lemma (Courcelle and Engelfriet) MSO(nodes) cannot express Hamiltonian Path Path Logic and MSO(nodes) are incomparable Path Logic Trail Logic 1 Walk Logic Infinite Walk Logic 1 The proof for Trail Logic Walk Logic is omitted but is similar to the proof of Path Logic Trail Logic 28/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 29/38

Review: Hybrid CTL Definition Let a be an atomic proposition, x a node variable, ϕ 1 and ϕ 2 node formulas, ψ 1 and ψ 2 path formulas Node formulas ϕ ::= a x ϕ 1 ϕ 1 ϕ 2 x ϕ 1 @ x ϕ 1 E ψ Path formulas ψ ::= ϕ ψ 1 ψ 1 ψ 1 X ψ 1 ψ 1 U ψ 2 30/38

Translating Hybrid CTL to Walk Logic Idea Node formulas Properties of single node: translate to properties of single position Hybrid extensions Named nodes: translate to named position variables Path formulas Properties on a single path with forward navigation: translate to walk variable; keep track of current position using position variables and < 31/38

Translating Hybrid CTL to Walk Logic Idea Node formulas Properties of single node: translate to properties of single position Hybrid extensions Named nodes: translate to named position variables Path formulas Properties on a single path with forward navigation: translate to walk variable; keep track of current position using position variables and < CTL Hybrid CTL Infinite Walk Logic 31/38

Hybrid CTL Infinite Walk Logic? Theorem Hybrid CTL Infinite Walk Logic Proof. CTL Infinite Walk Logic as CTL is invariant under bisimulation Hybrid CTL Infinite Walk Logic as Hybrid CTL is invariant under generated submodels 32/38

Hybrid CTL Infinite Walk Logic? Theorem Hybrid CTL Infinite Walk Logic Proof. CTL Infinite Walk Logic as CTL is invariant under bisimulation Hybrid CTL Infinite Walk Logic as Hybrid CTL is invariant under generated submodels CTL Hybrid CTL Infinite Walk Logic 32/38

Hybrid CTL Infinite Walk Logic? Theorem Hybrid CTL Infinite Walk Logic Proof. CTL Infinite Walk Logic as CTL is invariant under bisimulation Hybrid CTL Infinite Walk Logic as Hybrid CTL is invariant under generated submodels CTL Hybrid CTL Infinite Walk Logic Walk Logic Infinite Walk Logic 32/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 33/38

CRPQs versus Walk Logics Different languages! Focus on path labelling versus focus on path-structure of graphs All CRPQs are incomparable with all Walk Logics. 34/38

CRPQs versus Walk Logics Different languages! Focus on path labelling versus focus on path-structure of graphs All CRPQs are incomparable with all Walk Logics. Similar semantically questions Similar proof techniques 34/38

Some results Hamiltonian path cannot be expressed Eulerian trail cannot be expressed 35/38

Some results Hamiltonian path cannot be expressed Eulerian trail cannot be expressed Paths versus Walks CRPQ with paths can express queries not expressible in the strongest language with Walks! 35/38

Overview Graph Databases Motivation Walk Logic Relations with FO and MSO Relations with CTL and Hybrid CTL Conjunctive Regular Path Queries Open Problems and Conclusion 36/38

Open Problems Walk Logic versus Infinite Walk Logic: Infinite walks are the standard in verification logics Can we express the verification logics in Walk Logic? Also interesting: finite CTL versus infinite CTL Complexity bounds on model checking for WL: WL model checking is decidable Current approach has horrible complexity 37/38

Conclusion General walk-based reasoning on graphs Relates to practical graph languages Framework for studying expressivity 38/38