When the Web Meets Programming Languages and Databases: Foundations of XML

Similar documents
Model Checking: An Introduction

A first step towards modeling semistructured data in hybrid multimodal logic

Logic in general. Inference rules and theorem proving

CS510 Software Engineering

Fixed-Point Logics and Computation

XML Data Integration

Path Querying on Graph Databases

Bounded Treewidth in Knowledge Representation and Reasoning 1

Formal Verification of Software

ON FUNCTIONAL SYMBOL-FREE LOGIC PROGRAMS

The Model Checker SPIN

SPARQL: Un Lenguaje de Consulta para la Web

Schedule. Logic (master program) Literature & Online Material. gic. Time and Place. Literature. Exercises & Exam. Online Material

Software Verification and Testing. Lecture Notes: Temporal Logics

Testing LTL Formula Translation into Büchi Automata

COMPUTER SCIENCE TRIPOS

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

(LMCS, p. 317) V.1. First Order Logic. This is the most powerful, most expressive logic that we will examine.

Correspondence analysis for strong three-valued logic

HECTOR a software model checker with cooperating analysis plugins. Nathaniel Charlton and Michael Huth Imperial College London

Course: Introduction to XML

Satisfiability Checking

CHAPTER 7 GENERAL PROOF SYSTEMS

Formal Verification Coverage: Computing the Coverage Gap between Temporal Specifications

Automated Theorem Proving - summary of lecture 1

Non-deterministic Semantics and the Undecidability of Boolean BI

Optimizing Description Logic Subsumption

Handout #1: Mathematical Reasoning

Software Modeling and Verification

Data Structures Fibonacci Heaps, Amortized Analysis

Automata-based Verification - I

COURSE DESCRIPTION FOR THE COMPUTER INFORMATION SYSTEMS CURRICULUM

MetaGame: An Animation Tool for Model-Checking Games

µz An Efficient Engine for Fixed points with Constraints

Foundational Proof Certificates

Lecture 7: NP-Complete Problems

Complexity of Higher-Order Queries

Finite Automata. Reading: Chapter 2

Introduction to Logic in Computer Science: Autumn 2006

Regression Verification: Status Report

logic language, static/dynamic models SAT solvers Verified Software Systems 1 How can we model check of a program or system?

2 Temporal Logic Model Checking

Model Checking II Temporal Logic Model Checking

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

An Approach for Generating Concrete Test Cases Utilizing Formal Specifications of Web Applications

Which Semantics for Neighbourhood Semantics?

A Theorem Prover for Boolean BI

Properties of Stabilizing Computations

WRITING PROOFS. Christopher Heil Georgia Institute of Technology

CSE 135: Introduction to Theory of Computation Decidability and Recognizability

o-minimality and Uniformity in n 1 Graphs

Generating models of a matched formula with a polynomial delay

A Propositional Dynamic Logic for CCS Programs

Informatique Fondamentale IMA S8

Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students

Transformation Techniques for Constraint Logic Programs with Applications to Protocol Verification

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

Cassandra. References:

Full and Complete Binary Trees

x < y iff x < y, or x and y are incomparable and x χ(x,y) < y χ(x,y).

Computational Methods for Database Repair by Signed Formulae

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

NP-Completeness and Cook s Theorem

A Beginner s Guide to Modern Set Theory

University of Ostrava. Reasoning in Description Logic with Semantic Tableau Binary Trees

The Classes P and NP

Lecture 1: Introduction

InvGen: An Efficient Invariant Generator

Graph Theory Problems and Solutions

Jedd: A BDD-based Relational Extension of Java

Complexity Theory. Jörg Kreiker. Summer term Chair for Theoretical Computer Science Prof. Esparza TU München

Borne inférieure de circuit : une application des expanders

Monitoring Metric First-order Temporal Properties

each college c i C has a capacity q i - the maximum number of students it will admit

Logical Agents. Explorations in Artificial Intelligence. Knowledge-based Agents. Knowledge-base Agents. Outline. Knowledge bases

A Workbench for Prototyping XML Data Exchange (extended abstract)

Coverability for Parallel Programs

Updating Action Domain Descriptions

A Note on Context Logic

Querying Past and Future in Web Applications

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

6.080/6.089 GITCS Feb 12, Lecture 3

EQUATIONAL LOGIC AND ABSTRACT ALGEBRA * ABSTRACT

Sudoku puzzles and how to solve them

Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams

Cartesian Products and Relations

Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics

The Course.

Lecture 1: Course overview, circuits, and formulas

Lesson 4 Web Service Interface Definition (Part I)

Resources, process calculi and Godel-Dummett logics

Logic in Computer Science: Logic Gates

XML with Incomplete Information

Class Notes CS Creating and Using a Huffman Code. Ref: Weiss, page 433

! " # The Logic of Descriptions. Logics for Data and Knowledge Representation. Terminology. Overview. Three Basic Features. Some History on DLs

Union-Find Algorithms. network connectivity quick find quick union improvements applications

6.080 / Great Ideas in Theoretical Computer Science Spring 2008

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Transcription:

1 / 22 When the Web Meets Programming Languages and Databases: Foundations of XML Pierre Genevès CNRS LIG WAM May 16 th, 2008

For more than half of the fifty years computer programmers have been writing code, O Reilly has provided developers with comprehensive, in-depth technical information. We ve kept pace with rapidly changing technologies as new languages have emerged, developed, and matured. Whether you want to learn something new or need answers to tough technical questions, you ll find what you need in O Reilly books and on the O Reilly Network. This timeline includes fifty of the more than 2500 documented programming languages. It is based on an original diagram created by Éric Lévénez (www.levenez.com), augmented with suggestions from O Reilly authors, friends, and conference attendees. For information and discussion on this poster, go to www.oreilly.com/go/languageposter. 2 / 22 History of Programming Languages 1954 1960 1965 1970 1975 1980 1985 1990 1995 2000 2002 2004 1954 1960 1970 2001 2004 www.oreilly.com Objective CAML (1996) FORTRAN (1954) LISP (1958) C # (ISO) (2003) Java 2 (v1.5) (2003)

3 / 22 What about Data? Often, data is more important than programs (e.g. banks, aeronautical technical documentation, ) How to ensure long-term access to data?

3 / 22 What about Data? Often, data is more important than programs (e.g. banks, aeronautical technical documentation, ) How to ensure long-term access to data? A quite old problem Can we really do better with computers? La pierre de Rosette.

What has not changed for 50 years in Computer Science? 4 / 22

5 / 22 Standards for Data Representation 1963 2008 ASCII

5 / 22 Standards for Data Representation 1963 1998 2008 ASCII XML Before: file format tied to a processor (due to processor-specific instructions) After: markup language for describing (structured) data in itself (independently from processors)

6 / 22 Back in 2008 Now XML is the Lingua franca for communicating on the web (documents, mobiles data, financial data, molecules, architectures ) Two key notions: XML Types: define constraints on children and siblings of nodes using regular expressions XML Queries: expressions for selecting a set of matching nodes (XPath standard)

7 / 22 Processing XML Documents Transform(t, t ) Validate(t) For x in (q) do { } let y=x; Write(y, t ) Validate(t ) Program P 3 Essential Tasks Validation: check that an XML document is valid w.r.t. a given type Navigation/Extraction: select a set of nodes (q: XPath expression) Transformation: build a new document from an existing one

8 / 22 Major Challenge for the Years to Come Data manipulations must be safe and efficient Do not design a n th new programming language for XML processing Ensure that those which are used for this purpose are safe and efficient (whatever the programming language family is) Introduce XML as a first-class citizen in programming languages Key (and hard) problem Reasoning with XML types and (XPath) queries Approach Design static analysis methods for XML processing

9 / 22 Static Analysis for XML Processing: Examples Type T in Transform(t, t ) Validate(t) For x in (q) do { } let y=x; Write(y, t ) Validate(t ) t T in, does P(t) T out? Errors are very difficult to detect (invalid output, empty queries) Huge performance problems at execution (language overuse, tree size) Difficult to check properties on programs (security holes, termination)

9 / 22 Static Analysis for XML Processing: Examples Type T in q T in? Transform(t, t ) Validate(t) For x in (q) do { } let y=x; Write(y, t ) Validate(t ) Errors are very difficult to detect (invalid output, empty queries) Huge performance problems at execution (language overuse, tree size) Difficult to check properties on programs (security holes, termination)

9 / 22 Static Analysis for XML Processing: Examples Type T in? q T in qoptim Transform(t, t ) Validate(t) For x in (q) do { } q let y=x; optim Write(y, t ) Validate(t ) Errors are very difficult to detect (invalid output, empty queries) Huge performance problems at execution (language overuse, tree size) Difficult to check properties on programs (security holes, termination)

9 / 22 Static Analysis for XML Processing: Examples Type T in q T forbidden? Transform(t, t ) Validate(t) For x in (q) do { } let y=x; Write(y, t ) Validate(t ) Errors are very difficult to detect (invalid output, empty queries) Huge performance problems at execution (language overuse, tree size) Difficult to check properties on programs (security holes, termination)

9 / 22 Static Analysis for XML Processing: Examples Type T in q T forbidden?! Forbidden Access Transform(t, t ) Validate(t) For x in (q) do { } let y=x; Write(y, t ) Validate(t ) Errors are very difficult to detect (invalid output, empty queries) Huge performance problems at execution (language overuse, tree size) Difficult to check properties on programs (security holes, termination)

10 / 22 Outline 1. The logical approach for static analysis 2. A logic of finite ordered trees for XML 3. Satisfiability-testing algorithm

11 / 22 The Logical Approach for XML Types/XPath Analysis Define an appropriate logic for reasoning on XML trees Formulate the problem into the logic and test satisfiability q 2 XPath Fragment q 1 XML Types T Translations (ϕ 1 ϕ 2 ) Logic ϕ Yes/No + Satisfiability-Testing Algorithm Advantages Solve problems that are boolean combination of other problems Provide formal proofs of safety properties and bugs Can be used in synthesis: query optimization, counter examples, etc. Requirements for the Logic 1. Expressive enough to capture XPath and XML types but succinct 2. Best complexity 3. Nice algorithmic properties (not always worst case)

12 / 22 Data Model for the Logic XML trees are n-ary trees with one label per node There is a bijective encoding of unranked trees as binary trees 0 1 2 3 3 0 1 2 General Encoding Queries (binary relations on tree nodes) XML Types

13 / 22 Formulas of the L µ Logic: the Holy Grail Programs α {1, 2, 1, 2} for navigating binary trees (α = α) 1 2 L µ ϕ, ψ ::= formula true σ σ atomic prop (negated) S S context (negated) ϕ ψ ϕ ψ disjunction (conjunction) α ϕ α existential (negated) µx.ϕ unary fixpoint (finite recursion) µx i.ϕ i in ψ n-ary fixpoint

14 / 22 Sample Formula and Satisfying Tree a a

14 / 22 Sample Formula and Satisfying Tree a 2 b a b

14 / 22 Sample Formula and Satisfying Tree?? c a 2 b µx. 2 c 1 X a b

14 / 22 Sample Formula and Satisfying Tree?? c a 2 b µx. 2 c 1 X a b Semantics: models of ϕ are finite trees for which ϕ holds at some node XPath and XML types can be translated into the logic, linearly

15 / 22 Outline 1. The logical approach for static analysis 2. A logic of finite ordered trees for XML 3. Satisfiability-testing algorithm

16 / 22 Deciding Satisfiability Is a formula ψ satisfiable? Given ψ, determine whether there exists a tree that satisfies ψ Validity: test ψ Different (more complex) than model-checking Principles: Automatic Theorem Proving Search for a proof tree Build the proof bottom up: if ψ holds then it is necessarily somewhere up

17 / 22 Search Space Optimization Idea: Truth Status is Inductive The truth status of ψ can be expressed as a function of its subformulas For boolean connectives, it can be deduced (truth tables) Only base subformulas really matter: Lean(ψ) Lean(ψ) : 1 2 D E 1 {z } topological propositions D E 2 a b σ 1 ϕ 2 ϕ {z } atomic propositions in ψ {z } existential subformulas A Tree Node: Truth Assignment of Lean(ψ) Formulas D E D E With some additional constraints, e.g. 1 2

18 / 22 Satisfiability-Testing Algorithm: Principles Bottom-up construction of proof tree A set of nodes is repeatedly updated (fixpoint computation)

18 / 22 Satisfiability-Testing Algorithm: Principles Bottom-up construction of proof tree Step 1: all possible leaves are added

18 / 22 Satisfiability-Testing Algorithm: Principles Bottom-up construction of proof tree Step i > 1: all possible parents of previous nodes are added

18 / 22 Satisfiability-Testing Algorithm: Principles 1 ϕ ϕ ϕ D E 2 ϕ Compatibility relation between nodes Nodes from previous step are proof support: α ϕ is added if ϕ holds in some node added at previous step

18 / 22 Satisfiability-Testing Algorithm: Principles η D E b µx.b 2 X {z } η Compatibility relation between nodes Nodes from previous step are proof support: α ϕ is added if ϕ holds in some node added at previous step

18 / 22 Satisfiability-Testing Algorithm: Principles Progressive bottom-up reasoning (partial satisfiability) α ϕ are left unproved until a parent is connected

18 / 22 Satisfiability-Testing Algorithm: Principles α ϕ ψ Termination If ψ is present in some root node, then ψ is satisfiable Otherwise, the algorithm terminates when no more nodes can be added

18 / 22 Satisfiability-Testing Algorithm: Principles ψ Implementation techniques Crucial optimization: symbolic representation

19 / 22 Correctness & Complexity Theorem The satisfiability problem for a formula ψ L µ is decidable in time 2 O(n) where n = Lean(ψ). Theorem Translations of XPath and XML types into the logic are linear. Corollary Decision problems involving XPath and types (e.g. typing, containment, emptiness, equivalence) can be decided in time 2 O(n). System fully implemented: solver + XPath & XML types compilers [PLDI 07]

20 / 22 Overview of Experiments DTD Symbols Binary type variables SMIL 1.0 19 11 XHTML 1.0 Strict 77 325 Table: Types used in experiments. XPath decision problem XML type Time (ms) e 1 e 2 and e 2 e 1 none 353 e 4 e 3 and e 4 e 3 none 45 e 6 e 5 and e 5 e 6 none 41 e 7 is satisfiable SMIL 1.0 157 e 8 is satisfiable XHTML 1.0 2630 e 9 (e 10 e 11 e 12 ) XHTML 1.0 2872 Table: Some decision problems and corresponding results. For the last test, size of the Lean is 550. The search space is 2 550 10 165 more than the square number of atoms in the universe 10 80

21 / 22 Tree Logics: an Overview 1968 1977 1981 1983 2006 WS2S PDL (tree) CTL µ-calculus (for trees) L µ (forward & backward) Expr Power: MSO Sat: Hyperexponential? (<MSO) EXPTIME FO EXPTIME MSO EXPTIME MSO 2 O(n)

22 / 22 Extending the tree logic Future Works Decidable counting constraints and data-value comparisons Higher order XML types (web services) Logic for graphs (semantic web) Some applications XML code optimization / security XML databases C/Java code analysis (Bohne, PALE) Typing of (reconfigurable) component based languages (FScript/Sardes) Binary Trees (Desert of California).