Efficient Search on the Web
Lecture 12: Querying RDF Graphs with SPARQL
Sebastian Maneth, Universität Leipzig, Summer 2012
Agenda
1. Background on RDF and SPARQL
2. Turtle RDF Syntax
3. SPARQL by example
4. RDF Schema
1. RDF Background
- RDF: W3C Recommendation February 1999
- Revised Recommendations February 2004
- Implementors have led RDF query language development
- Querying XML (XQuery) developed from September 1999
1. RDF Background
The main RDF query language styles:
- SQL-like: SPARQL, RDQL/Squish, SeRQL, RDFDB QL, RQL, ...
- XPath-like: Versa, RDFPath
- Rules-like: N3QL, Triple, DQL, OWL-QL, ...
- Language-like: Algae2, Fabl, Abeline
- Using XML: XSLT, XPath, XQuery
Most popular by far are the SQL-like languages.
Triple = labeled edge between two nodes: (subject, predicate, object)
[Figure: node "sub" connected to node "obj" by an edge labeled "pre"]
1. SPARQL Background
SPARQL = SPARQL Protocol and RDF Query Language
- An RDF data access query language
- Data access means reading information, not writing (updates)
- Query model: graph patterns (conjunctions of triple patterns, using variables)
- Services running SPARQL queries over a set of graphs
- A transport protocol for invoking the service
2. Turtle RDF Syntax
Turtle = Terse RDF Triple Language (Dave Beckett, Tim Berners-Lee)
URIs
  Enclosed in <>.
  @prefix foo: <http://example.org/ns#> declares a prefix in the style of XML QNames, as a shorthand for the full URI.
  foo:bar expands to http://example.org/ns#bar
Blank nodes
  _:name
  A node representing a resource for which neither a URI nor a literal is given (can only be used as subject or object).
  e.g. "John has a friend born on April 21st":
    ex:john foaf:knows _:p1 .
    _:p1 foaf:birthdate "04-21" .
Literals (values)
  "Literal"
  "Literal"@language
  """long literal with newlines"""
  May be an object, but not a subject or predicate.
Datatyped literals
  "lexical form"^^datatype URI
  e.g. "10"^^xsd:integer, "true"^^xsd:boolean
2. Turtle RDF Syntax
Triples are separated by "."
    :a :b :c . :d :e :f .
Common triple subject and predicate
    :a :b :c, :d .
  which is the same as
    :a :b :c . :a :b :d .
Common triple subject
    :a :b :c ; :d :e .
  which is the same as
    :a :b :c . :a :d :e .
Blank node as an object / subject
    :a :b [ :c :d ] .
  which is the same as
    :a :b _:x . _:x :c :d .
2. Turtle RDF Syntax
    _:a foaf:name "alice" .
    _:b foaf:name "bob" .
    _:c foaf:name "eve" .
    _:a foaf:knows _:b .
    _:a foaf:knows _:c .
    _:c foaf:knows _:a .
3. SPARQL by example
SPARQL queries consist of three parts:
1) Pattern matching part: optional parts, unions, nesting, filtering
2) Solution modifiers: projection, distinct, order, limit, offset
3) Output: yes/no, selection of values, construction of new triples, description of resources
Keywords: PREFIX, SELECT, SELECT DISTINCT, SELECT REDUCED, CONSTRUCT, FROM, FROM NAMED, WHERE, LIMIT, OFFSET, ORDER BY
3. SPARQL by example
Simplest query: ask for the existence of a single edge.
For instance, is there an edge (Amazon_River, length, ?x) in the dbpedia RDF graph?
    PREFIX prop: <http://dbpedia.org/property/>
    ASK { <http://dbpedia.org/resource/Amazon_River> prop:length ?x . }
Paste this query at http://dbpedia.org/sparql/
Answer: true
3. SPARQL by example
In the query on the previous slide, ASK returns a Boolean, and the expression inside the braces is a triple pattern.
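3. SPARQL by example
    PREFIX prop: <http://dbpedia.org/property/>
    ASK { <http://dbpedia.org/resource/Amazon_River> prop:length ?x . }
The expression inside the braces is a triple pattern.
A triple pattern $P$ is a tuple in $(IL \cup V) \times (I \cup V) \times (IL \cup V)$, where $IL = I \cup L$ and
  I = IRIs (Internationalized Resource Identifiers)
  L = Literals
  V = Variables
A mapping $\mu$ is a partial function from variables to RDF terms; $\mu(P)$ is $P$ with each variable replaced by its value under $\mu$.
Let $D$ be an RDF dataset.
  $[\![P]\!]_D = \{ \mu \mid \mathrm{dom}(\mu) = \mathrm{var}(P) \text{ and } \mu(P) \in D \}$
  $[\![(P_1 \text{ UNION } P_2)]\!]_D = [\![P_1]\!]_D \cup [\![P_2]\!]_D$
Note: IRIs are the extension of URIs to Unicode (internationalized URIs).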
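The definition of the evaluation of a triple pattern over D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: variables are strings starting with "?", the dataset is a set of 3-tuples, blank node labels are treated as plain constants, and the helper names `is_var` and `match_triple_pattern` are hypothetical. The data is the foaf graph from the Turtle slide.

```python
def is_var(term):
    """Variables are strings starting with '?' (a convention of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def match_triple_pattern(pattern, dataset):
    """Return all mappings mu with dom(mu) = var(pattern) and mu(pattern) in dataset."""
    solutions = []
    for triple in dataset:
        mu = {}
        ok = True
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable repeated in the pattern must bind to the same term.
                if mu.setdefault(p_term, d_term) != d_term:
                    ok = False
                    break
            elif p_term != d_term:
                ok = False
                break
        if ok:
            solutions.append(mu)
    return solutions


# The foaf graph from the Turtle slide, as a set of (subject, predicate, object).
D = {
    ("_:a", "foaf:name", "alice"),
    ("_:b", "foaf:name", "bob"),
    ("_:c", "foaf:name", "eve"),
    ("_:a", "foaf:knows", "_:b"),
    ("_:a", "foaf:knows", "_:c"),
    ("_:c", "foaf:knows", "_:a"),
}

known_by_a = match_triple_pattern(("_:a", "foaf:knows", "?x"), D)
print(sorted(mu["?x"] for mu in known_by_a))
```

An ASK query over a single triple pattern then amounts to checking whether the returned list is non-empty.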
3. SPARQL by example
Simplest query: ask for a particular value.
For instance, what is ?x for (Amazon_River, length, ?x) in the dbpedia RDF graph?
    PREFIX prop: <http://dbpedia.org/property/>
    SELECT ?x
    WHERE { <http://dbpedia.org/resource/Amazon_River> prop:length ?x . }
Paste this query at http://dbpedia.org/sparql/
Answer: "6800"^^<http://www.w3.org/2001/XMLSchema#int>
3. SPARQL by example
Comparing values with FILTER.
For instance, is the length of the Amazon River greater than the length of the Nile in the dbpedia RDF graph?
    PREFIX prop: <http://dbpedia.org/property/>
    ASK { <http://dbpedia.org/resource/Amazon_River> prop:length ?x .
          <http://dbpedia.org/resource/Nile> prop:length ?y .
          FILTER(?x > ?y) . }
Answer: true
3. SPARQL by example
    PREFIX prop: <http://dbpedia.org/property/>
    ASK { <http://dbpedia.org/resource/Amazon_River> prop:length ?x .
          <http://dbpedia.org/resource/Nile> prop:length ?y .
          FILTER(?x > ?y) . }
{ ... FILTER(...) ... } = group graph pattern
- The scope of a FILTER is the group.
- A FILTER can appear anywhere in the group (same semantics).
3. SPARQL by example
Query with a variable in predicate position:
What properties/values are known about the Amazon river?
    PREFIX prop: <http://dbpedia.org/property/>
    SELECT ?p ?x
    WHERE { <http://dbpedia.org/resource/Amazon_River> ?p ?x . }
Answer: (result table not reproduced)
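Default semantics is CONJUNCTION:
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?mbox
    WHERE { ?x foaf:name ?name .
            ?x foaf:mbox ?mbox }
[Figure: node ?x with an edge foaf:name to ?name and an edge foaf:mbox to ?mbox]
  $[\![(P_1 \text{ AND } P_2)]\!]_D = [\![P_1]\!]_D \bowtie [\![P_2]\!]_D$
  $\Omega_1 \bowtie \Omega_2 = \{ \mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1, \mu_2 \in \Omega_2 \text{ are compatible mappings} \}$
  $[\![(P_1 \text{ UNION } P_2)]\!]_D = [\![P_1]\!]_D \cup [\![P_2]\!]_D$
Here $\bowtie$ denotes the join. Two mappings are compatible if they agree on every variable they share.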
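The join of two sets of mappings can be sketched as follows. This is a minimal illustration, assuming mappings are Python dicts from variable names to terms; the helper names `compatible` and `join` and the mailbox value are hypothetical, not taken from the slides.

```python
def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(mu2[var] == term for var, term in mu1.items() if var in mu2)


def join(omega1, omega2):
    """All unions mu1 | mu2 of compatible mappings mu1 in omega1, mu2 in omega2."""
    return [
        {**mu1, **mu2}
        for mu1 in omega1
        for mu2 in omega2
        if compatible(mu1, mu2)
    ]


# Solutions of the two triple patterns of the conjunctive query (made-up data).
names = [{"?x": "_:a", "?name": "alice"}, {"?x": "_:b", "?name": "bob"}]
mboxes = [{"?x": "_:a", "?mbox": "mailto:alice@example.org"}]

result = join(names, mboxes)
print(result)
```

Only `_:a` has both a name and a mailbox, so the conjunction keeps a single mapping; `_:b` is dropped because no mailbox mapping is compatible with it.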
Example: Arithmetic Filters
Data
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix : <http://example.org/book/> .
    @prefix ns: <http://example.org/ns#> .
    :book1 dc:title "SPARQL Tutorial" .
    :book1 ns:price 42 .
    :book2 dc:title "The Semantic Web" .
    :book2 ns:price 23 .
Query
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX ns: <http://example.org/ns#>
    SELECT ?title ?price
    WHERE { ?x ns:price ?price .
            FILTER (?price < 30.5)
            ?x dc:title ?title . }
Result
    title               | price
    "The Semantic Web"  | 23
Example: String Filters
Data
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix : <http://example.org/book/> .
    @prefix ns: <http://example.org/ns#> .
    :book1 dc:title "SPARQL Tutorial" .
    :book1 ns:price 42 .
    :book2 dc:title "The Semantic Web" .
    :book2 ns:price 23 .
Query
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX ns: <http://example.org/ns#>
    SELECT ?title ?price
    WHERE { ?x ns:price ?price .
            FILTER regex(?title, "^SPARQL")
            ?x dc:title ?title . }
Result
    title              | price
    "SPARQL Tutorial"  | 42
3. SPARQL by example
Semantics of FILTER:
  $[\![(P \text{ FILTER } R)]\!]_D = \{ \mu \in [\![P]\!]_D \mid \mu \models R \}$
R is an expression over AND, OR, NOT, =, and built-in conditions.
$\mu \models R$ means that $\mu$ satisfies $R$.
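The FILTER semantics can be sketched as a plain selection over a list of mappings. The sketch below assumes mappings are dicts and the condition R is a Python predicate; `filter_mappings` is a hypothetical helper name, and the mappings are the solutions of the book example before filtering.

```python
import re


def filter_mappings(omega, condition):
    """Keep exactly the mappings mu in omega that satisfy the condition R."""
    return [mu for mu in omega if condition(mu)]


# Solutions of { ?x ns:price ?price . ?x dc:title ?title } on the book data.
books = [
    {"?x": ":book1", "?title": "SPARQL Tutorial", "?price": 42},
    {"?x": ":book2", "?title": "The Semantic Web", "?price": 23},
]

# Arithmetic filter: FILTER (?price < 30.5)
cheap = filter_mappings(books, lambda mu: mu["?price"] < 30.5)

# String filter: FILTER regex(?title, "^SPARQL")
sparql_titles = filter_mappings(
    books, lambda mu: re.search(r"^SPARQL", mu["?title"]) is not None
)

print([mu["?title"] for mu in cheap])
print([mu["?title"] for mu in sparql_titles])
```

Because the filter is applied to the solutions of the whole group, its position inside the group does not matter. One simplification to keep in mind: in SPARQL a filter that refers to an unbound variable eliminates the solution, whereas this sketch would raise a `KeyError`.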
Value Tests
- Based on XQuery 1.0 and XPath 2.0 Functions and Operators
- XSD types: boolean, string, integer, decimal, float, double, dateTime
- Notation: <, >, =, <=, >= and != for value comparison
- Apply to any type: BOUND, isURI, isBlank, isLiteral
- REGEX, LANG, DATATYPE, STR (lexical form)
- Function calls for casting and extension functions
OPT: allows adding information to a mapping.
    P1 = SELECT ?A, ?E, ?W
         WHERE ((?A email ?E) OPT (?A webpage ?W))
Select persons with email addresses and also include their web page, if it exists.
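OPT: allows adding information to a mapping.
    P1 = SELECT ?A, ?E, ?W
         WHERE ((?A email ?E) OPT (?A webpage ?W))
Select persons with email addresses and also include their web page, if it exists.
Semantics:
  $\Omega_1 \bowtie \Omega_2 = \{ \mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1, \mu_2 \in \Omega_2 \text{ are compatible mappings} \}$
  $\Omega_1 \setminus \Omega_2 = \{ \mu \in \Omega_1 \mid \text{for all } \mu' \in \Omega_2,\ \mu \text{ and } \mu' \text{ are not compatible} \}$
  $\Omega_1 \mathbin{\#} \Omega_2 = (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 \setminus \Omega_2)$
  $[\![(P_1 \text{ OPT } P_2)]\!]_D = [\![P_1]\!]_D \mathbin{\#} [\![P_2]\!]_D$
Here $\bowtie$ is the join and $\#$ is the left outer join.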
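The three operators behind OPT can be sketched directly from these definitions. This is a minimal illustration with mappings as Python dicts; the helper names and the example persons `p1`, `p2` are hypothetical.

```python
def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(mu2[var] == term for var, term in mu1.items() if var in mu2)


def join(omega1, omega2):
    """Unions of all compatible pairs."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def difference(omega1, omega2):
    """Mappings of omega1 that are compatible with no mapping of omega2."""
    return [m1 for m1 in omega1 if not any(compatible(m1, m2) for m2 in omega2)]


def left_outer_join(omega1, omega2):
    """Join plus the mappings of omega1 that could not be extended."""
    return join(omega1, omega2) + difference(omega1, omega2)


# Made-up solutions of (?A email ?E) and (?A webpage ?W).
emails = [
    {"?A": "p1", "?E": "a@example.org"},
    {"?A": "p2", "?E": "b@example.org"},
]
webpages = [{"?A": "p2", "?W": "http://example.org/b"}]

result = left_outer_join(emails, webpages)
print(result)
```

Person `p2` is extended with a web page, while `p1` is kept with `?W` left unbound; a plain join would have dropped `p1`.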
OPT: allows adding information to a mapping.
    P2 = SELECT ?A, ?N, ?E, ?W
         WHERE (((?A name ?N) OPT (?A email ?E)) OPT (?A webpage ?W))
Select all persons and include their email if it exists, then add their web page if it exists.
OPT: allows adding information to a mapping.
How is P3 different from P2?
    P2 = SELECT ?A, ?N, ?E, ?W
         WHERE (((?A name ?N) OPT (?A email ?E)) OPT (?A webpage ?W))
    P3 = SELECT ?A, ?N, ?E, ?W
         WHERE ((?A name ?N) OPT ((?A email ?E) OPT (?A webpage ?W)))
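One way to see the difference between P2 and P3 is to evaluate both on a person who has a name and a web page but no email. The sketch below is an illustration on made-up data (`p1`, "Ann"); `match` and `opt` are hypothetical helpers, and `match` assumes that the variables within one triple pattern are distinct.

```python
def match(pattern, data):
    """Mappings for a triple pattern; terms starting with '?' are variables."""
    out = []
    for triple in data:
        if all(p.startswith("?") or p == t for p, t in zip(pattern, triple)):
            out.append({p: t for p, t in zip(pattern, triple) if p.startswith("?")})
    return out


def compatible(m1, m2):
    return all(m2[v] == t for v, t in m1.items() if v in m2)


def opt(omega1, omega2):
    """Left outer join: extend where possible, otherwise keep the left mapping."""
    joined = [{**a, **b} for a in omega1 for b in omega2 if compatible(a, b)]
    kept = [a for a in omega1 if not any(compatible(a, b) for b in omega2)]
    return joined + kept


# A person with a name and a web page, but no email (hypothetical data).
data = [
    ("p1", "name", "Ann"),
    ("p1", "webpage", "http://example.org/ann"),
]

name = match(("?A", "name", "?N"), data)
email = match(("?A", "email", "?E"), data)
web = match(("?A", "webpage", "?W"), data)

p2 = opt(opt(name, email), web)   # ((name OPT email) OPT webpage)
p3 = opt(name, opt(email, web))   # (name OPT (email OPT webpage))

print(p2)
print(p3)
```

In P3 the web page can only be attached to a mapping that already has an email, so on this data P3 returns the name alone while P2 also returns the web page.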
OPT: allows adding information to a mapping.
What is the result for P4?
    P2 = SELECT ?A, ?N, ?E, ?W
         WHERE (((?A name ?N) OPT (?A email ?E)) OPT (?A webpage ?W))
    P4 = SELECT ?A, ?N, ?E, ?W
         WHERE ((?A name ?N) AND ((?A email ?E) UNION (?A webpage ?W)))
What is the result for P4 when projecting onto ?A and ?N only?
    P4 = SELECT ?A, ?N
         WHERE ((?A name ?N) AND ((?A email ?E) UNION (?A webpage ?W)))
The same projection with DISTINCT:
    P41 = SELECT DISTINCT ?A, ?N
          WHERE ((?A name ?N) AND ((?A email ?E) UNION (?A webpage ?W)))
$[\![P_{41}]\!]_D$ = (result table not reproduced)
The same projection with REDUCED:
    P42 = SELECT REDUCED ?A, ?N
          WHERE ((?A name ?N) AND ((?A email ?E) UNION (?A webpage ?W)))
$[\![P_{42}]\!]_D$ = (result table not reproduced)
OPT: allows adding information to a mapping.
What is the result for P5?
    P2 = SELECT ?A, ?N, ?E, ?W
         WHERE (((?A name ?N) OPT (?A email ?E)) OPT (?A webpage ?W))
    P5 = SELECT ?A, ?N, ?P
         WHERE (((?A name ?N) OPT (?A phone ?P)) FILTER NOT(bound(?P)))
$\mu \models \mathrm{bound}(?X)$ if $?X \in \mathrm{dom}(\mu)$
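The combination of OPT with NOT(bound(...)) expresses a form of negation: it keeps the persons for whom no phone number is known. A minimal sketch, assuming the OPT part has already been evaluated into a list of dict mappings (the persons and the phone number are made up, and `bound` is a hypothetical helper):

```python
def bound(mu, var):
    """mu satisfies bound(var) if var is in the domain of mu."""
    return var in mu


# Hypothetical result of ((?A name ?N) OPT (?A phone ?P)).
solutions = [
    {"?A": "p1", "?N": "Ann", "?P": "555-0100"},
    {"?A": "p2", "?N": "Bob"},
]

# FILTER NOT(bound(?P)) keeps the mappings where ?P stayed unbound.
no_phone = [mu for mu in solutions if not bound(mu, "?P")]
print(no_phone)
```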
Note
Any graph pattern expression can be transformed into the form
    P1 UNION P2 UNION ... UNION Pn
where P1, P2, ..., Pn are UNION-free.
The next 4 slides are by Dieter Fensel, Federico Facca and Ioan Toma (Semantic Web lecture at STI Innsbruck).
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX uni: <http://example.org/uni/>
    SELECT ?name
    FROM <http://example.org/personal>
    WHERE { ?s uni:name ?name .
            ?s rdf:type uni:lecturer }
- PREFIX: prefix mechanism for abbreviating URIs
- SELECT: identifies the variables to be returned in the query answer (variants: SELECT DISTINCT, SELECT REDUCED)
- FROM: name of the graph to be queried (variant: FROM NAMED)
- WHERE: query pattern as a list of triple patterns
- Solution modifiers: LIMIT, OFFSET, ORDER BY
PREFIX: based on namespaces.
DISTINCT: The DISTINCT solution modifier eliminates duplicate solutions. Specifically, each solution that binds the same variables to the same RDF terms as another solution is eliminated from the solution set.
REDUCED: While the DISTINCT modifier ensures that duplicate solutions are eliminated from the solution set, REDUCED simply permits them to be eliminated. The cardinality of any set of variable bindings in a REDUCED solution set is at least one and not more than the cardinality of the solution set with no DISTINCT or REDUCED modifier.
LIMIT: The LIMIT clause puts an upper bound on the number of solutions returned. If the number of actual solutions is greater than the limit, then at most the limit number of solutions will be returned.
OFFSET: OFFSET causes the solutions generated to start after the specified number of solutions. An OFFSET of zero has no effect.
ORDER BY: The ORDER BY clause establishes the order of a solution sequence. Following the ORDER BY clause is a sequence of order comparators, composed of an expression and an optional order modifier (either ASC() or DESC()). Each ordering comparator is either ascending (indicated by the ASC() modifier or by no modifier) or descending (indicated by the DESC() modifier).
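The effect of the solution modifiers can be sketched as a small pipeline over a list of mappings. This is only an illustration: `apply_modifiers` and its parameters are hypothetical names, ordering is restricted to a single variable, and the modifiers are applied in the order ORDER BY, DISTINCT, OFFSET, LIMIT.

```python
def apply_modifiers(solutions, order_key=None, descending=False,
                    distinct=False, offset=0, limit=None):
    """Apply ORDER BY, DISTINCT, OFFSET and LIMIT to a list of dict mappings."""
    result = list(solutions)
    if order_key is not None:
        # ORDER BY ASC(?var) by default, DESC(?var) if descending is set.
        result.sort(key=lambda mu: mu[order_key], reverse=descending)
    if distinct:
        # DISTINCT: drop mappings that bind the same variables to the same terms.
        seen, unique = set(), []
        for mu in result:
            key = frozenset(mu.items())
            if key not in seen:
                seen.add(key)
                unique.append(mu)
        result = unique
    # OFFSET skips the first solutions; LIMIT caps how many are returned.
    result = result[offset:]
    if limit is not None:
        result = result[:limit]
    return result


solutions = [{"?name": "eve"}, {"?name": "alice"}, {"?name": "bob"}, {"?name": "alice"}]

page = apply_modifiers(solutions, order_key="?name", distinct=True, offset=1, limit=1)
print(page)
```

Combining ORDER BY with OFFSET and LIMIT in this way is the usual means of paging through a result; without ORDER BY the selected slice is not predictable.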
Use CONSTRUCT to generate new graphs.
Rewrite the naming information in the original graph using foaf:name:
    PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    CONSTRUCT { ?x foaf:name ?name }
    WHERE { ?x vcard:fn ?name }
Result:
    ex:john foaf:name "John Smith" .
    ex:mary foaf:name "Mary Smith" .
Data:
    @prefix ex: <http://example.org/#> .
    @prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
    ex:john vcard:fn "John Smith" ;
            vcard:n [ vcard:given "John" ; vcard:family "Smith" ] ;
            ex:hasage 32 ;
            ex:marriedto ex:mary .
    ex:mary vcard:fn "Mary Smith" ;
            vcard:n [ vcard:given "Mary" ; vcard:family "Smith" ] ;
            ex:hasage 29 .
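What CONSTRUCT does with the solutions of its WHERE part can be sketched as template instantiation. The sketch assumes mappings are dicts and triples are 3-tuples of strings; `construct` is a hypothetical helper, and the two mappings are the solutions of `?x vcard:fn ?name` on the data above.

```python
def construct(template, solutions):
    """Instantiate every template triple under every solution mapping."""
    graph = set()
    for mu in solutions:
        for triple in template:
            # Replace variables by their values; constants are kept as they are.
            instantiated = tuple(mu.get(term, term) for term in triple)
            # Triples that still contain an unbound variable are skipped.
            if not any(t.startswith("?") for t in instantiated):
                graph.add(instantiated)
    return graph


solutions = [
    {"?x": "ex:john", "?name": "John Smith"},
    {"?x": "ex:mary", "?name": "Mary Smith"},
]

new_graph = construct([("?x", "foaf:name", "?name")], solutions)
for triple in sorted(new_graph):
    print(triple)
```

The result is a set of triples, that is, a new RDF graph rather than a table of bindings.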
    SELECT DISTINCT ?author
    WHERE { ?book rdf:type swrc:book .
            ?book dc:creator ?author .
            ?paper swrc:journal ?journal .
            ?paper dc:creator ?author . }
Select all authors that wrote a book and a journal paper.
How can we select all book authors that never published a journal paper?
Note
SPARQL is still in the making. SPARQL 1.1 has a working draft from January 2012.
The new features in SPARQL 1.1 Query are:
- Aggregates
- Subqueries
- Negation
- Expressions in the SELECT clause
- Property Paths
- Assignment
- A short form for CONSTRUCT
- An expanded set of functions and operators
Aggregates, Expressions in the SELECT clause: Example (not reproduced)
Negation: Example (not reproduced)
Property Paths: path syntax constructs

Syntax form | Matches
iri | An IRI. A path of length one.
^elt | Inverse path (object to subject).
!iri or !(iri1|...|irin) | Negated property set. An IRI which is not one of iri_i. !iri is short for !(iri).
!^iri or !(iri1|...|irij|^irij+1|...|^irin) | Negated property set with some inverse properties. An IRI which is not one of iri_i, nor one of iri_j+1 ... iri_n as reverse paths. !^iri is short for !(^iri).
(elt) | A group path elt; brackets control precedence.
elt1 / elt2 | A sequence path of elt1 followed by elt2.
elt1 | elt2 | An alternative path of elt1 or elt2 (all possibilities are tried).
elt* | A path of zero or more occurrences of elt.
elt+ | A path of one or more occurrences of elt.
elt? | A path of zero or one occurrences of elt.
elt{n,m} | A path of between n and m occurrences of elt.
elt{n} | A path of exactly n occurrences of elt.
elt{n,} | A path of n or more occurrences of elt.
elt{,n} | A path of between 0 and n occurrences of elt.
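The closure operators elt+ and elt* can be sketched as graph reachability. The code below is an illustration under simplifying assumptions: the path element is a single property whose edges are given as (subject, object) pairs, and `one_or_more` and `zero_or_more` are hypothetical helper names. The edges are the foaf:knows triples from the Turtle slide.

```python
def one_or_more(edges, start):
    """Nodes reachable from start by following one or more edges (elt+)."""
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for subject, obj in edges:
            if subject == node and obj not in reached:
                reached.add(obj)
                frontier.append(obj)
    return reached


def zero_or_more(edges, start):
    """Like elt+, but the start node itself always matches (elt*)."""
    return {start} | one_or_more(edges, start)


# The foaf:knows edges of the example graph.
knows = {("_:a", "_:b"), ("_:a", "_:c"), ("_:c", "_:a")}

print(sorted(one_or_more(knows, "_:a")))
print(sorted(zero_or_more(knows, "_:b")))
```

Since `_:a` knows `_:c` and `_:c` knows `_:a`, the node `_:a` reaches itself via foaf:knows+; `_:b` knows nobody, so only the zero-length path matches for it.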
The next slides about RDF Schema are by Roger L. Costello and David B. Jacobs of the MITRE Corporation.
4. RDF Schema
The purpose of RDF Schema is to provide an XML vocabulary to:
- express classes and their (subclass) relationships.
- define properties and associate them with classes.
The benefit of an RDF Schema is that it facilitates inferencing on your data, and enhanced searching.
4. RDF Schema
RDF Schema is about generating taxonomies (class hierarchies)!
[Figure: class hierarchy rooted at NaturallyOccurringWaterSource, with the classes Stream, BodyOfWater, Brook, Rivulet, River, Tributary, Lake, Ocean, Sea]
Properties:
    length: Literal
    emptiesinto: BodyOfWater
4. RDF Schema
What inferences can be made with this data, using the taxonomy of the previous slide?
Yangtze.rdf:
    <?xml version="1.0"?>
    <River rdf:ID="yangtze"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns="http://www.geodesy.org/water/naturally-occurring#">
        <length>6300 kilometers</length>
        <emptiesinto rdf:resource="http://www.china.org/geography#eastchinasea"/>
    </River>
Inferences are made by examining a taxonomy that contains River. See next slide.
4. RDF Schema
[Figure: the taxonomy (rooted at NaturallyOccurringWaterSource, with the properties length: Literal and emptiesinto: BodyOfWater) and Yangtze.rdf from the previous slide are fed into an inference engine]
Inferences:
- Yangtze is a Stream
- Yangtze is a NaturallyOccurringWaterSource
- http://www.china.org/geography#eastchinasea is a BodyOfWater
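The three inferences can be reproduced with a very small rule engine. The sketch below makes two assumptions that are read off the inferences on the slide rather than off the figure: River is a subclass of Stream, and Stream is a subclass of NaturallyOccurringWaterSource. It also simplifies by giving each class at most one superclass; `infer_types` is a hypothetical helper name.

```python
# Assumed subclass edges (child -> parent), inferred from the slide's results.
SUBCLASS_OF = {
    "River": "Stream",
    "Stream": "NaturallyOccurringWaterSource",
}

# Range declaration from the slide: emptiesinto points to a BodyOfWater.
RANGE = {"emptiesinto": "BodyOfWater"}


def infer_types(resource, asserted_class, properties):
    """Derive rdf:type triples from subclass chains and property ranges."""
    inferred = []
    cls = asserted_class
    # Subclass rule: an instance of a class is an instance of every superclass.
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        inferred.append((resource, "rdf:type", cls))
    # Range rule: the value of a property is an instance of the property's range.
    for prop, value in properties:
        if prop in RANGE:
            inferred.append((value, "rdf:type", RANGE[prop]))
    return inferred


facts = infer_types(
    "Yangtze",
    "River",
    [("length", "6300 kilometers"),
     ("emptiesinto", "http://www.china.org/geography#eastchinasea")],
)
for fact in facts:
    print(fact)
```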
How does a taxonomy facilitate searching?
[Figure: class hierarchy rooted at NaturallyOccurringWaterSource, with the classes Stream, BodyOfWater, Brook, Rivulet, River, Tributary, Lake, Ocean, Sea; properties length: Literal and emptiesinto: BodyOfWater]
[Figure: the taxonomy and Yangtze.rdf (the River "yangtze" with length 6300 kilometers, emptying into http://www.china.org/geography#eastchinasea) are fed into a search engine]
Query: "Show me all documents that contain info about Streams"
Results:
- Yangtze is a Stream, so this document is relevant to the query.
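Classes have properties. Properties may have subproperties.
[Table: properties (rows: length, emptiesinto, obstacle, estimatedlength, officiallength) against classes (columns: Stream, Brook, Rivulet, River, Tributary). length, estimatedlength and officiallength are marked for all five classes; emptiesinto and obstacle are each marked for a single class, whose column is not recoverable here.]
Property hierarchy: officiallength and estimatedlength are each rdfs:subPropertyOf length.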
6.1 RDF classes

Class name | Comment
rdfs:Resource | The class resource, everything.
rdfs:Literal | The class of literal values, e.g. textual strings and integers.
rdf:XMLLiteral | The class of XML literal values.
rdfs:Class | The class of classes.
rdf:Property | The class of RDF properties.
rdfs:Datatype | The class of RDF datatypes.
rdf:Statement | The class of RDF statements.
rdf:Bag | The class of unordered containers.
rdf:Seq | The class of ordered containers.
rdf:Alt | The class of containers of alternatives.
rdfs:Container | The class of RDF containers.
rdfs:ContainerMembershipProperty | The class of container membership properties, rdf:_1, rdf:_2, ..., all of which are sub-properties of 'member'.
rdf:List | The class of RDF Lists.
6.2 RDF properties

Property name | Comment | Domain | Range
rdf:type | The subject is an instance of a class. | rdfs:Resource | rdfs:Class
rdfs:subClassOf | The subject is a subclass of a class. | rdfs:Class | rdfs:Class
rdfs:subPropertyOf | The subject is a subproperty of a property. | rdf:Property | rdf:Property
rdfs:domain | A domain of the subject property. | rdf:Property | rdfs:Class
rdfs:range | A range of the subject property. | rdf:Property | rdfs:Class
rdfs:label | A human-readable name for the subject. | rdfs:Resource | rdfs:Literal
rdfs:comment | A description of the subject resource. | rdfs:Resource | rdfs:Literal
rdfs:member | A member of the subject resource. | rdfs:Resource | rdfs:Resource
rdf:first | The first item in the subject RDF list. | rdf:List | rdfs:Resource
rdf:rest | The rest of the subject RDF list after the first item. | rdf:List | rdf:List
rdfs:seeAlso | Further information about the subject resource. | rdfs:Resource | rdfs:Resource
rdfs:isDefinedBy | The definition of the subject resource. | rdfs:Resource | rdfs:Resource
rdf:value | Idiomatic property used for structured values (see the RDF Primer for an example of its usage). | rdfs:Resource | rdfs:Resource
rdf:subject | The subject of the subject RDF statement. | rdf:Statement | rdfs:Resource
rdf:predicate | The predicate of the subject RDF statement. | rdf:Statement | rdfs:Resource
rdf:object | The object of the subject RDF statement. | rdf:Statement | rdfs:Resource
End of Lecture 12