|
|
|
- Kerrie Gibbs
- 10 years ago
- Views:
Transcription
1 Dan Suciu AT&T Labs Labs 1 AT&T
2 How the Web is Today HTML documents all intended for human consumption many are generated automatically by applications Labs 2 AT&T
3 applications consuming HTML documents are out there Netbot): brittle technology (Junglee, the new Web standard XML makes data exchange between radically simpler applications Paradigm Shift on the Web { XML generated by applications { XML consumed by applications data exchange { across platforms (enterprise inter-operability) { across enterprises Web shifts from collection of documents to data and documents Labs 3 AT&T
4 Database Community Can Help Relevant elds: query optimization, query processing views, transformations data warehouses, data integration systems mediators, query rewriting secondary storage, indexes Labs 4 AT&T
5 But it Needs a Paradigm Shift Too Web data is dierent from database data: self-describing, schema-less structure changes without notice heterogeneous, deeply nested, irregular documents and data mixed together accessed through limited capabilities distributed, linked XML was designed by document experts, not database experts! Need to extend data management to Web data management Labs 5 AT&T
6 What This Tutorial is About what the database community has done: semistructured data { data model: OEM { query languages: Lore, UnQL, StruQL, etc. { schemas what the Web community has done: { data format: XML { transformation language: XSL { schemas where they meet and where they dier Labs 6 AT&T
7 Outline Semistructured data and XML Query languages Schemas Systems issues Conclusions Labs 7 AT&T
8 Part 1 Semistructured Data and XML Labs 8 AT&T
9 Semistructured Data Origins: integration of heterogeneous sources data sources with non-rigid structure biological data Web data Labs 9 AT&T
10 Semistructured Data Model The Bib &o1 paper book paper author title references references references author author year author publisher author http author title &o12 &o24 &o29 title page "..." &o43 &o96 &o "..." "..." "..." "..." "..." "..." firstname lastname first last &o243 &o206 "Serge" "Abiteboul" "Vianu" "Victor" 8 < - complex objects: &o1, &o12,... Object Exchange Model (OEM): : - atomic objects: &o243, &o206,... Semistructured data is an edge-labeled graph Labs 10 AT&T
11 the data: Serializing f paper: &o12 f... g, Bib:&o1 Syntax for Semistructured Data book: &o24 f... g, paper: &o29 f author: &o52 "Abiteboul", author: &o96 f rstname: &o243 "Victor", lastname: &o206 "Vianu"g, title: &o93 "Regular path queries with constraints", references: &o12, references: &o24, page: &o25 f rst: &o64 122, last: &o92 133ggg Labs 11 AT&T
12 omit oid's: May paper: f Syntax for Semistructured Data f author: "Abiteboul", author: f rstname: "Victor", lastname: "Vianu"g, title: "Regular path queries with constraints", page: frst: 122, last: 133ggg Labs 12 AT&T
13 Characteristics of Semistructured Data missing or additional attributes multiple attributes attributes with dierent types in dierent objects heterogeneous collections self-describing, irregular data, no a priori structure Labs 13 AT&T
14 Comparison of the Semistructured and Relational Model row row row name phone name phone name phone name phone John 3634 "John" 3634 "Sue" 6343 "Dick" 6363 Sue 6343 f row : fname: "John", phone: 3634g, Dick 6363 row : fname: "Sue", phone: 6343g, row : fname: "Dick", phone: 6363g g schema components become labels in semistructured data Labs 14 AT&T
15 standard approved by W3C, to complement HTML New structured text (SGML) Origins: XML Motivation: HTML describes presentation XML describes content XML v.s. HTML [Bosak'97] { users dene new tags { arbitrary nesting { validation is possible Labs 15 AT&T
16 HTML: Bibliography </h1> <h1> From HTML to XML <i> Foundations of Databases </i>, Abiteboul, Hull, Vianu <p> Addison Wesley, 1995 <br> <i> Data on the Web </i>, Abiteboul, Buneman, Suciu <p> Morgan Kaufmann, <br> HTML describes the presentation Labs 16 AT&T
17 Basics XML <book> <title> Foundations of Databases </title> <bibliography> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book>... </bibliography> XML describes the content Labs 17 AT&T
18 XML Terminology tags: book, title, author,... start tag: <book> end tag: </book> elements: <book>... </book>, or <author>... </author> elements are nested empty elements: <red></red>, or <red/> an XML document consists of a single element called root element a well-formed XML document is one with correctly matching tags Labs 18 AT&T
19 XML: Attributes More price="55" currency="usd"> <book <title> Foundations of Databases </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> attributes price, currency, values "55", "USD" attributes are alternative ways to represent same data Labs 19 AT&T
20 XML: Oid's and References More id="o555"> <name>jane</name> <person <person id="o456"> <name>mary</name> <person id="o123" mother="o456"> <children idref="o123 o555"/> </person> </person> <name>john</name> </person> oids and references in XML are just syntax Labs 20 AT&T
21 XML Data Model does not exists closest approximation: Document Object Model (DOM): { class hierarchy (node, element, attribute,...) { object model, not data model: objects have behavior { denes API to inspect/modify the document Labs 21 AT&T
22 of XML and Semistructured Data Comparison id="o123"> f person: &o123 <person <name> Alan </name> f name: "Alan", <age> 42 </age> age: 42, < > [email protected] </ > "[email protected]"gg <person> <person father="o123">... f person: f father &o123...g person </person> father person father g name age name age Alan 42 [email protected] Alan 42 [email protected] labels on nodes v.s. on edges: trees easy to convert, graphs harder Labs 22 AT&T
23 XML can mix text and elements: { Making Java easier to type and easier to type <talk> <speaker>phil Wadler</speaker> Comparison of XML and Semistructured Data similarities: { both described best by a graph { both are schema-less, self-describing dierences: { XML is ordered, ssd is not </talk> { XML has other stu (processing instructions, comments) Labs 23 AT&T
24 Part 2 Query Languages Semistructured data and XML Query languages Schemas Systems issues Conclusions Labs 24 AT&T
25 Query Languages: Outline For Semistructured Data: { Lorel (coercions, regular path expressions) { UnQL (patterns, constructors) { Skolem Functions For XML: XML-QL A dierent paradigm: { Structural Recursion { XSL Labs 25 AT&T
26 Lorel Part of the Lore system (Stanford) Lorel adapts OQL to semistructured data select X.title from Bib.paper X where X.year > 1995 Abbreviated to: select Bib.paper.title where Bib.paper.year > 1995 Labs 26 AT&T
27 Lorel v.s. OQL implicit coercions ( 1995 v.s. "1995") missing attributes (empty answer v.s. type error) set-valued attributes ( X.year>1995: may be several years) regular path expressions (next) Labs 27 AT&T
28 Lorel Regular Path Expressions select X.title from Bib.paper X, Bib.(paperjbook) Y where Y.author.lastname? = "Ullman" and Y.reference+ X Useful for: syntactic substitute for inheritance: paperjbook navigating partially known structures: lastname? transitive closure: reference+ Labs 28 AT&T
29 UnQL Pattern Matching: select T where Bib.paper: f title: T, year: Y, journal: "TODS"g and Y > 1995 Labs 29 AT&T
30 looks like: Results result: f fn:"john", ln:"smith", f UnQL Simple Constructors select result:f fn: F, ln: L, pub: f title: T, year: Ygg where Bib.paper: f title: T, author: frstname: F, lastname: Lg, year: Yg and Y > 2000 pub:ftitle:"p equals NP", year: 2005gg, result: f fn:"joe", ln:"doe", pub:ftitle:"errata to P=NP", year: 2006gg,...g Labs 30 AT&T
31 languages for semistructured data have Skolem functions: Many StruQL, Yat, Florid. MSL, Skolem Functions Skolem Functions in databases: Maier, 1986: in OO systems. Kifer et al, 1989: F-logic. Hull and Yoshikawa, 1990: in deductive systems (ILOG). Papakonstantinou et al., 1996: in semistructured data (MSL). Labs 31 AT&T
32 Skolem Functions Strudel: a Web Site Management System StruQL: it's query language where Root! "Bib"! X, X! "paper"! P, P! "author"! A, P! "title"! T, P! "year"! Y create Root(), HomePage(A), YearPage(A,Y), PubPage(P) link Root()! "person"! HomePage(A), HomePage(A)! "yearentry"! YearPage(A,Y), YearPage(A,Y)! "publication"! PubPage(P), PubPage(P)! "author"! HomePage(A) PubPage(P)! "title"! T Labs 32 AT&T
33 Functions Skolem Root() person person person HomePage("Smith") HomePage("Jones") HomePage("Mark") yearentry yearentry author author yearentry yearentry yearentry author YearPage("Smith",1994) YearPage("Smith",1996) YearPage("Jones",1994)YearPage("Jones",1996) YearPage("Mark",1996) publication author publication publication author publication PubPage("The Comma") PubPage("The Dot") PubPage("Experience with Comma Categories") title title title Labs 33 AT&T
34 relies on the edge-labeled semistructured data model - suers model mismatch from A query language for XML: XML-QL comes from the semistructured data community features: { regular path expressions { patterns { simple constructors { Skolem Functions plus: XML syntax Labs 34 AT&T
35 where <book language="french"> <publisher><name>morgan Kaufmann</name> Pattern Matching in XML-QL </publisher> <title> $t </title> <author> $a </author> </book> in " construct $a Labs 35 AT&T
36 Simple Constructors in XML-QL where language=$l> <book $a </> <author> IN " </> <result> <author> $a </> construct <lang> $l </> </> Note: </> abbreviates </book> or </result>, etc. Groups titles by authors: <result> <author>smith</author> <lang>english</lang> </result> <result> <author>smith</author> <lang>mandarin</lang> </result>... Labs 36 AT&T
37 Skolem Functions in XML-QL <paperjbook> <title> $t </> where $a </> <author> IN " </> <result id=f($a)> <author> $a </> construct <t> $t </> </> <result> <author>joe</author> <t><t/> <t><t/> </result> <result> <author>jim</author> <t><t/> <t></t> <t></t> </result> <result> <author>jan</author> <t><t/> </result>... Labs 37 AT&T
38 as sets, with union operator: Data a:fb:"one",c:5g, b:4g = fa:3g U fa:fb:"one",c:5g, b:4g fa:3, fresult:3, result:5, result:4g A Dierent Paradigm: Structural Recursion = fa:3g U fa:fb:"one",c:5gg U fb:4g Example: retrieve all integers in the data. f(v) = if isint(v) then f"result": vg else fg f(fg) = fg f(fl: tg) = f(t) f(t1 U t2) = f(t1) U f(t2) Answer: standard textbook programming on trees Labs 38 AT&T
39 Structural Recursion with Multiple Functions Example: increase all engine's prices by 10%. f engine:f part: fprice: 100g, f engine:f part: fprice: 110g, price: 1000g, price: 1100g, f! body: f part: fprice: 100g, body: f part: fprice: 100g, price: 1000gg price: 1000gg f(v) = v g(v) = v g(fg) = fg f(fg) = fg f(fl: tg) g(fl: tg) = if l = "engine" = if l = "price" then fl: g(t)g then fl: 1.1*tg else fl: g(t)g else fl: f(t)g g(t1 U t2) = g(t1) U g(t2) f(t1 U t2) = f(t1) U f(t2) Labs 39 AT&T
40 limited expressive power, e.g. no joins (but continues to be changed) XSL a proposal at W3C (soon a standard) implemented in IE 5.0 purpose: stylesheet specication language: { stylesheet: XML! HTML { more general: XML! XML perfectly integrated with XML still changing... Labs 40 AT&T
41 <xsl:template match="/bib/*/title"> XSL: Templates and Rules a query is collection of template rules a template rule has a match pattern and a template Example: retrieve all book titles in the bibliography database: <xsl:template> <xsl:apply-templates/> </xsl:template> <result> <xsl:value-of/> </result> </xsl:template> Labs 41 AT&T
42 Path Expressions in XSL bib matches a bib element * matches any element / matches the root element /bib matches a bib element immediately after root bib/paper matches a paper following a bib bib//paper matches a paper following a bib at any depth //paper matches a paper at any depth paper j book matches a paper or a matches a price attribute bib/book/@price the price attribute of a book Labs 42 AT&T
43 <xsl:template match="a"> <A> <xsl:apply-templates/> </A> <xsl:template match="b"> <B> <xsl:apply-templates/> </B> <xsl:template match="c"> <C> <xsl:value-of/> </C> Flow Control in XSL <xsl:template> <xsl:apply-templates/> </xsl:template> </xsl:template> </xsl:template> </xsl:template> Labs 43 AT&T
44 </b> <a> </e> 4 </c> <c> 1 </c> <c> 2 </c> <c> & % <B> <C> 1 </C> <A> 2 </C> <C> </B> <C> 3 </C> <A> </A> 4 </C> <C> $ % ' $ <a> <e> <b> '! <c> 3 </c> </a> </A> </a> & Labs 44 AT&T
45 query = a single structural recursion function XSL query with modes = multiple functions XSL XSL is Structural Recursion Equivalent to: f(v) = v f(fg) = fg f(fl:tg) = if l = "c" then f"c":tg else if l = "b" then f"b":f(t)g else if l = "a" then f"a":f(t)g else f(t) f(t1 U t2) = f(t1) U f(t2) Labs 45 AT&T
46 <xsl:template match="e"> <xsl:apply-patterns select="/"> Comparison of XSL and Structural Recursion applicability: { structural recursion: on arbitrary graphs { XSL: on trees only termination: { structural recursion: always terminates { XSL: may run forever. Add this to previous program </xsl:template> Stack overow on IE 5.0 Labs 46 AT&T
47 powerful features: regular expressions, constructors, Skolem structural recursion Functions, XSL intended to be a style sheet language, can serve as query substitute language Summary of Query Languages studied extensively in the context of semistructured data no standard query language for XML yet control ow in XSL is structural recursion Labs 47 AT&T
48 Part 3 Schemas Semistructured data and XML Query languages Schemas Systems issues Conclusions Labs 48 AT&T
49 Highlights of Schemas in Semistructured Data why? { to describe semantics { to improve data processing here lies our interest what? { many proposals (both in ssd and in the W3C) { this tutorial: show two basic formalisms, and DTDs when? { a posteriori { RDBMS: need schema a priori to interpret binary data how? { independent from the data (may have several schemas!) { RDF, DCD: schema hardwired with the data Labs 49 AT&T
50 &r1 Schemas: An Example person company person company person manages employee c.e.o. works-for &p1 &c1 &p2 &c2 &p3 works-for works-for c.e.o. name name name name name position address Phone address position url &s0 &s1 &s2 &s3 &s4 &s5 &s6 &s7 &s8 &s9 "Smith" "Manager" "Widget" "Trenton" "Jones" "Gaget" "Paris" "Dupont" " " &s10 "Sales" description description " &a1 &a4 performance procurement &a5 salesrep task 1998 &a contact &a3 &a7 &a6 "below target" "on target" some well-dened classes: companies, persons some irregular data Labs 50 AT&T
51 Lower-Bound Schemas Shows required labels Root company person works-for Company c.e.o. Employee managed-by name address name string Labs 51 AT&T
52 Upper Bound Schemas Shows permitted labels. Root manages person employee company Person c.e.o. worksfor Company description name phone position name address url Any string _ Called Schema Graphs in [Buenman et al'97] Labs 52 AT&T
53 The Two Questions to Ask Conformance: does that data conform to this schema? Classication: if so, then which objects belong to what classes? Labs 53 AT&T
54 Using Unary Datalog on Lower-Bound Schemas [Nestorov et. al 1997] Unary Datalog Types Root(X) :- ref(x,company,c), Company(C), ref(x,person,m), Employee(M) Employee(E) :- ref(e,works-for,c), Company(C), EDBs: ref, string ref(e,managed-by,m), Employee(M) IDBs: ref(e,name,n), string(n) Root, Employee, Company Company(C) :- ref(c,ceo.,e), Employee(E), ref(c,name,n), string(n), ref(c,address,a), string(a) Conformance: check Root(&r1) Classication: check Employee(&p2), Company(&c1), etc Need greatest xpoint semantics for Datalog! Labs 54 AT&T
55 Denition Given two edge-labeled graphs G 1 G 2 if 1 x 2 ) 2 R, then for every edge x a 1! y 1 in G 1 a (x 2 in G 2 (same label) y! there exists an edge x 2...G 2 can do too! compute simulation quite eciently: O(j G 1 j j G 2 j). can Henzinger, and Kopke'95] [Henzinger, Graph Simulation a simulation is a relation R on their nodes s.t.: s.t. (y 1 y 2 ) 2 R. Notation: G 1 R G 2 x R x 1 2 l l y 1 R y 2 what G 1 can do... G 1 G 2 Labs 55 AT&T
56 Using Simulation Given data instance D, schema S Upper bound schemas: { conformance: check D R S { classication: check (x c) 2 R Lower bound schemas: { conformance: check S R D { classication: check (c x) 2 R where R is the largest simulation For lower bound schemas: same as unary datalog Labs 56 AT&T
57 is an ecient way to decide conformance and compute Simulation classication object Conformance and Classication &r1 Root Root person company company Company works-for c.e.o. person Employee manages person company person company person manages employee c.e.o. works-for &p1 &c1 &p2 &c2 &p3 works-for works-for c.e.o. name name name name name position address Phone address position url &s0 &s1 &s2 &s3 &s4 &s5 &s6 &s7 &s8 &s9 manages description Person employee c.e.o. worksfor name phone position Company name address url name address name "Smith" "Manager" "Widget" "Trenton" "Jones" "Gaget" "Paris" "Dupont" " " &s10 "Sales" description Any string Lower Bound (Unary Datalog) description " &a1 &a4 performance procurement &a5 salesrep task &a contact &a3 &a6 "on target" Semistructured Data Instance 1998 &a7 "below target" _ Upper Bound (Schema Graph) Labs 57 AT&T
58 Application 1: Improve Secondary Storage Root Company company person oid name address c.e.o. works-for Company name address c.e.o. Employee name managed-by Employee oid name managed-by works-for string Store rest in overow repository Lower-Bound Schema Labs 58 AT&T
59 Application 2: Query Optimization Bib select X.title from Bib. X paper book where X.*.zip = \12345" year journal title address author title + int string string string X.title select zip city street last name first name string string string string string from Bib.book X where X.address.zip= \12345" Upper-Bound Schema Labs 59 AT&T
60 Schema Extraction from Data Problem statement: given data instance D nd the \most specic" schema S for D in practice: may be too large, need to relax (approximate, etc) Labs 60 AT&T
61 Schema Extraction Example data: r employee employee employee employee employee employee employee employee manages manages manages manages company manages &p1 managedby &p2 &p3 &p4 &p5 &p6 &p7 managedby &p8 worksfor managedby managedby name phone position position managedby position name name name name name name position name "Jones" "6666" "Smith" "Joe" "Marketing" "Dupont" "Gaston" "Sales" "Gonnet" worksfor "Jack" "IT" "Fred" "IT" &c worksfor name worksfor worksfor worksfor worksfor worksfor "Widget Trenton" Labs 61 AT&T
62 data guide is precisely the most specic (deterministic) the schema upper-bound Extracting Upper Bound Data Guide [Goldman and Widom 1997] Given data instance D: a data guide S for D is a graph s.t.: accuracy D and S have the same sets of paths conciseness every path in S is unique construction of S: powerset automaton for D implemented in Lore Labs 62 AT&T
63 Root &r employee Data Guide company Employees &p1, &p2, &p3, &p4 &p5, &p6, &p7, &p8 worksfor managedby name position phone manages Company &c worksfor managedby Boss &p1, &p4, &p6 name phone manages Regular &p2, &p3, &p5 &p7, &p8 name position name worksfor Data guides are deterministic Labs 63 AT&T
64 Extracting the Lower Bound [Nestorov et al., 1998] construction: { build \most specic" datalog program: each node = a class { run the program { collapse equal classes alternatively: compute the simulation equivalence classes Root company employee manages employee employee employee manages employee manages Boss1 Regular1 Regular2 Boss2 Regular3 managedby worksfor name phone managedby position managedby name name name name position worksfor worksfor Company worksfor worksfor name Labs 64 AT&T
65 Schema Inference Problem statement: given query Q nd most specic schema S s.t. { for every D, Q(D) conforms to S related to type inference in functional programming languages Labs 65 AT&T
66 Schema Inference where Root! "Bib"! X, X! "paper"! P, P! "author"! A, P! "title"! T, P! "year"! Y create Root(), HomePage(A), YearPage(A,Y), PubPage(P) link Root()! "person"! HomePage(A), HomePage(A)! "yearentry"! YearPage(A,Y), YearPage(A,Y)! "publication"! PubPage(P), PubPage(P)! "author"! HomePage(A) PubPage(P)! "title"! T Labs 66 AT&T
67 Inference Schema Root() Root() person person person HomePage("Smith") HomePage("Jones") HomePage() yearnetry yearnetry author author yearnetry yearnetry yearnetry author YearPage("Smith",1994) YearPage("Smith",1996) YearPage("Jones",1994)YearPage("Jones",1996) YearPage() publication author author publication PubPage("Moving the Comma") PubPage("Moving the Dot") PubPage() title title title Some data instance Q(D) Inferred schema Labs 67 AT&T
68 Schemas in XML Document Type Descriptor (DTD) optionally accompanies an XML document species a grammar has only limited use as a schema Labs 68 AT&T
69 ] > <!ELEMENT section ((title,section*) j text)> DTD's as Grammars <!DOCTYPE paper [ <!ELEMENT paper (section*)> <!ELEMENT title (#PCDATA)> <!ELEMENT text (#PCDATA)> <paper> <section> <text> </text> </section> <section> <title> </title> <section>...</section> <section>...</section> </section> </paper> Labs 69 AT&T
70 DTD's as Schemas Not so well suited: impose unwanted restrictions on order: <!ELEMENT person (name,phone)> means that <name> must occur before <phone> problems with references: cannot restrict their range can be too vague: <!ELEMENT person ((namejphonej )*)> what combinations of attributes should one expect? Labs 70 AT&T
71 all proposals the XML documents must make explicit reference in the schema to Other Schemas Proposed for XML Currently considered by the W3C (no clear winner): XML-Data RDF Schema DCD ::: Characteristics: object-oriented avor (classes, subclasses) constraints Labs 71 AT&T
72 in SS data: data and schema are decoupled ) schema from data extraction in SS data: the emphasis is on using schema information in processing (optimization, storage) data in XML: schemas range from grammars (DTD) to object-oriented Summary of Schemas in XML: emphasis is on semantics Labs 72 AT&T
73 Part 4 Systems Issues Semistructured data and XML Query languages Schemas Systems issues Conclusions Labs 73 AT&T
74 Systems Issues: Outline servers for semistructured data mediators for semistructured data Labs 74 AT&T
75 Servers for Semistructured Data secondary storage: { text le (XML) { OO repository (need DTD): POQL [Christophides et al. 94], ObjectDesign's excelon, Bluestone, etc. { extend OO: Ozone { build object repository from scratch (Lore) { as ternary relation: [Florescu and Kossmann '99] { from data mining to relational schema [Deutsch et al. 99] indexes { deal with dierent atomic types: 1995, "1995" (Lore) { compute path expressions: Region Algebras (structured text), data guides, T-indexes, XSet query processing and optimization (Lore) Labs 75 AT&T
76 query optimization in the presence of limited source { (nding feasible plans) [Yerneni et al'99, capabilities query composition and rewriting [Papakonstantinou et { Papakonstantinou&Vassalos'99, Deutsch et al'99, al'96, Mediator for Semistructured Data XML = a view over Relational/OO/OR data mediators: { translation between models, e.g. Relational $ XML { integration issues: Florescu et al'99] Calvanese et al'99] { data conversion [Cluet et al'98] Labs 76 AT&T
77 Part 5 Conclusions Semistructured data and XML Query languages Schemas Systems issues Conclusions Labs 77 AT&T
78 not covered by the tutorial: transactions, limited source [Yerneni et al'99], lots of W3C standard proposals capabilities Summary data on the Web { at the intersection of semistructured data and structured text (XML) { highlights cultural mismatch: db community and document community paradigm shift, for both Web and db: { Web shifts from document to data { db shifts from structured to irregular, self-describing data covered by this tutorial: data models, queries, schemas (XML-Pointer, XML-Link, RDF, RDF-Schema, DCD) Labs 78 AT&T
79 XML object repository, indexes, query needed: data mining processing/optimization, What Technologies are Needed Web applications possible today: { servers store/process data in traditional ways { applications exchange data in XML (and queries too!) { XML = view over structured databases needed: mediator technology, good compression (not there yet) Web applications in the future: { store/process native XML data { mine/analyze XML data Labs 79 AT&T
80 Why This Is Cool for Database Researchers put to work all those techniques you teach in CS101! { recursive tree traversal (structural recursion, XSL) { automata theory (DTD's, path expressions) { graph theory (simulation, bisimulation, graph morphism) adapt old db techniques to handle the new kind of data make the world a better place (from FAX to XML)! The End Labs 80 AT&T
Introduction to XML. Data Integration. Structure in Data Representation. Yanlei Diao UMass Amherst Nov 15, 2007
Introduction to XML Yanlei Diao UMass Amherst Nov 15, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau. 1 Structure in Data Representation Relational data is highly
AN ENHANCED DATA MODEL AND QUERY ALGEBRA FOR PARTIALLY STRUCTURED XML DATABASE
THE UNIVERSITY OF SHEFFIELD DEPARTMENT OF COMPUTER SCIENCE RESEARCH MEMORANDA CS-03-08 MPHIL/PHD UPGRADE REPORT AN ENHANCED DATA MODEL AND QUERY ALGEBRA FOR PARTIALLY STRUCTURED XML DATABASE SUPERVISORS:
Modern Databases. Database Systems Lecture 18 Natasha Alechina
Modern Databases Database Systems Lecture 18 Natasha Alechina In This Lecture Distributed DBs Web-based DBs Object Oriented DBs Semistructured Data and XML Multimedia DBs For more information Connolly
Managing large sound databases using Mpeg7
Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob ([email protected]) ABSTRACT
How To Improve Performance In A Database
Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed
Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?
Database Indexes How costly is this operation (naive solution)? course per weekday hour room TDA356 2 VR Monday 13:15 TDA356 2 VR Thursday 08:00 TDA356 4 HB1 Tuesday 08:00 TDA356 4 HB1 Friday 13:15 TIN090
Relational Databases for Querying XML Documents: Limitations and Opportunities. Outline. Motivation and Problem Definition Querying XML using a RDBMS
Relational Databases for Querying XML Documents: Limitations and Opportunities Jayavel Shanmugasundaram Kristin Tufte Gang He Chun Zhang David DeWitt Jeffrey Naughton Outline Motivation and Problem Definition
Semistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu
Semistructured data and XML Institutt for Informatikk 1 Unstructured, Structured and Semistructured data Unstructured data e.g., text documents Structured data: data with a rigid and fixed data format
Department of Computer and Information Science, Ohio State University. In Section 3, the concepts and structure of signature
Proceedings of the 2nd International Computer Science Conference, Hong Kong, Dec. 1992, 616-622. 616 SIGNATURE FILE METHODS FOR INDEXING OBJECT-ORIENTED DATABASE SYSTEMS Wang-chien Lee and Dik L. Lee Department
II. PREVIOUS RELATED WORK
An extended rule framework for web forms: adding to metadata with custom rules to control appearance Atia M. Albhbah and Mick J. Ridley Abstract This paper proposes the use of rules that involve code to
Last Week. XML (extensible Markup Language) HTML Deficiencies. XML Advantages. Syntax of XML DHTML. Applets. Modifying DOM Event bubbling
XML (extensible Markup Language) Nan Niu ([email protected]) CSC309 -- Fall 2008 DHTML Modifying DOM Event bubbling Applets Last Week 2 HTML Deficiencies Fixed set of tags No standard way to create new
A Workbench for Prototyping XML Data Exchange (extended abstract)
A Workbench for Prototyping XML Data Exchange (extended abstract) Renzo Orsini and Augusto Celentano Università Ca Foscari di Venezia, Dipartimento di Informatica via Torino 155, 30172 Mestre (VE), Italy
Web Data Management - Some Issues
Web Data Management - Some Issues Properties of Web Data Lack of a schema Data is at best semi-structured Missing data, additional attributes, similar data but not identical Volatility Changes frequently
An XML Based Data Exchange Model for Power System Studies
ARI The Bulletin of the Istanbul Technical University VOLUME 54, NUMBER 2 Communicated by Sondan Durukanoğlu Feyiz An XML Based Data Exchange Model for Power System Studies Hasan Dağ Department of Electrical
XML WEB TECHNOLOGIES
XML WEB TECHNOLOGIES Chakib Chraibi, Barry University, [email protected] ABSTRACT The Extensible Markup Language (XML) provides a simple, extendable, well-structured, platform independent and easily
A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS
A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS Abdelsalam Almarimi 1, Jaroslav Pokorny 2 Abstract This paper describes an approach for mediation of heterogeneous XML schemas. Such an approach is proposed
Databases in Organizations
The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron
Lightweight Data Integration using the WebComposition Data Grid Service
Lightweight Data Integration using the WebComposition Data Grid Service Ralph Sommermeier 1, Andreas Heil 2, Martin Gaedke 1 1 Chemnitz University of Technology, Faculty of Computer Science, Distributed
Combining Unstructured, Fully Structured and Semi-Structured Information in Semantic Wikis
Combining Unstructured, Fully Structured and Semi-Structured Information in Semantic Wikis Rolf Sint 1, Sebastian Schaffert 1, Stephanie Stroka 1 and Roland Ferstl 2 1 {firstname.surname}@salzburgresearch.at
NETMARK: A SCHEMA-LESS EXTENSION FOR RELATIONAL DATABASES FOR MANAGING SEMI-STRUCTURED DATA DYNAMICALLY
NETMARK: A SCHEMA-LESS EXTENSION FOR RELATIONAL DATABASES FOR MANAGING SEMI-STRUCTURED DATA DYNAMICALLY David A. Maluf, NASA Ames Research Center, Mail Stop 269-4, Moffett Field, California, USA Peter
Integrating Heterogeneous Data Sources Using XML
Integrating Heterogeneous Data Sources Using XML 1 Yogesh R.Rochlani, 2 Prof. A.R. Itkikar 1 Department of Computer Science & Engineering Sipna COET, SGBAU, Amravati (MH), India 2 Department of Computer
Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks
Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks Ramaswamy Chandramouli National Institute of Standards and Technology Gaithersburg, MD 20899,USA 001-301-975-5013 [email protected]
Chapter 3: XML Namespaces
3. XML Namespaces 3-1 Chapter 3: XML Namespaces References: Tim Bray, Dave Hollander, Andrew Layman: Namespaces in XML. W3C Recommendation, World Wide Web Consortium, Jan 14, 1999. [http://www.w3.org/tr/1999/rec-xml-names-19990114],
XML Processing and Web Services. Chapter 17
XML Processing and Web Services Chapter 17 Textbook to be published by Pearson Ed 2015 in early Pearson 2014 Fundamentals of http://www.funwebdev.com Web Development Objectives 1 XML Overview 2 XML Processing
OBJECT ORIENTED EXTENSIONS TO SQL
OBJECT ORIENTED EXTENSIONS TO SQL Thomas B. Gendreau Computer Science Department University Wisconsin La Crosse La Crosse, WI 54601 [email protected] Abstract Object oriented technology is influencing
Graph-Based Linking and Visualization for Legislation Documents (GLVD) Dincer Gultemen & Tom van Engers
Graph-Based Linking and Visualization for Legislation Documents (GLVD) Dincer Gultemen & Tom van Engers Demand of Parliaments Semi-structured information and semantic technologies Inter-institutional business
10CS73:Web Programming
10CS73:Web Programming Question Bank Fundamentals of Web: 1.What is WWW? 2. What are domain names? Explain domain name conversion with diagram 3.What are the difference between web browser and web server
The MongoDB Tutorial Introduction for MySQL Users. Stephane Combaudon April 1st, 2014
The MongoDB Tutorial Introduction for MySQL Users Stephane Combaudon April 1st, 2014 Agenda 2 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability
CST6445: Web Services Development with Java and XML Lesson 1 Introduction To Web Services 1995 2008 Skilltop Technology Limited. All rights reserved.
CST6445: Web Services Development with Java and XML Lesson 1 Introduction To Web Services 1995 2008 Skilltop Technology Limited. All rights reserved. Opening Night Course Overview Perspective Business
XML. CIS-3152, Spring 2013 Peter C. Chapin
XML CIS-3152, Spring 2013 Peter C. Chapin Markup Languages Plain text documents with special commands PRO Plays well with version control and other program development tools. Easy to manipulate with scripts
CSE 233. Database System Overview
CSE 233 Database System Overview 1 Data Management An evolving, expanding field: Classical stand-alone databases (Oracle, DB2, SQL Server) Computer science is becoming data-centric: web knowledge harvesting,
Advantages of XML as a data model for a CRIS
Advantages of XML as a data model for a CRIS Patrick Lay, Stefan Bärisch GESIS-IZ, Bonn, Germany Summary In this paper, we present advantages of using a hierarchical, XML 1 -based data model as the basis
Mobile Web Design with HTML5, CSS3, JavaScript and JQuery Mobile Training BSP-2256 Length: 5 days Price: $ 2,895.00
Course Page - Page 1 of 12 Mobile Web Design with HTML5, CSS3, JavaScript and JQuery Mobile Training BSP-2256 Length: 5 days Price: $ 2,895.00 Course Description Responsive Mobile Web Development is more
Agents and Web Services
Agents and Web Services ------SENG609.22 Tutorial 1 Dong Liu Abstract: The basics of web services are reviewed in this tutorial. Agents are compared to web services in many aspects, and the impacts of
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Extended Abstract Ioanna Koffina 1, Giorgos Serfiotis 1, Vassilis Christophides 1, Val Tannen
XML Data Integration
XML Data Integration Lucja Kot Cornell University 11 November 2010 Lucja Kot (Cornell University) XML Data Integration 11 November 2010 1 / 42 Introduction Data Integration and Query Answering A data integration
Big Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Introduction
XML and Data Integration
XML and Data Integration Week 11-12 Week 11-12 MIE253-Consens 1 Schedule Week Date Lecture Topic 1 Jan 9 Introduction to Data Management 2 Jan 16 The Relational Model 3 Jan. 23 Constraints and SQL DDL
A first step towards modeling semistructured data in hybrid multimodal logic
A first step towards modeling semistructured data in hybrid multimodal logic Nicole Bidoit * Serenella Cerrito ** Virginie Thion * * LRI UMR CNRS 8623, Université Paris 11, Centre d Orsay. ** LaMI UMR
Introduction to XML Applications
EMC White Paper Introduction to XML Applications Umair Nauman Abstract: This document provides an overview of XML Applications. This is not a comprehensive guide to XML Applications and is intended for
Lecture 21: NoSQL III. Monday, April 20, 2015
Lecture 21: NoSQL III Monday, April 20, 2015 Announcements Issues/questions with Quiz 6 or HW4? This week: MongoDB Next class: Quiz 7 Make-up quiz: 04/29 at 6pm (or after class) Reminders: HW 4 and Project
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS Tadeusz Pankowski 1,2 1 Institute of Control and Information Engineering Poznan University of Technology Pl. M.S.-Curie 5, 60-965 Poznan
HETEROGENEOUS DATABASE INTEGRATION FOR WEB APPLICATIONS
HETEROGENEOUS DATABASE INTEGRATION FOR WEB APPLICATIONS V. Rajeswari, Assistant Professor, Department of Information Technology, Karpagam College of Engineering, Coimbatore 32, [email protected]
XSLT Mapping in SAP PI 7.1
Applies to: SAP NetWeaver Process Integration 7.1 (SAP PI 7.1) Summary This document explains about using XSLT mapping in SAP Process Integration for converting a simple input to a relatively complex output.
Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey
Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey 1 Amjad Qtaish, 2 Kamsuriah Ahmad 1 School of Computer Science, Faculty of Information Science and Technology,
SPARQL: Un Lenguaje de Consulta para la Web
SPARQL: Un Lenguaje de Consulta para la Web Semántica Marcelo Arenas Pontificia Universidad Católica de Chile y Centro de Investigación de la Web M. Arenas SPARQL: Un Lenguaje de Consulta para la Web Semántica
Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms
Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms Mohammed M. Elsheh and Mick J. Ridley Abstract Automatic and dynamic generation of Web applications is the future
XML: extensible Markup Language. Anabel Fraga
XML: extensible Markup Language Anabel Fraga Table of Contents Historic Introduction XML vs. HTML XML Characteristics HTML Document XML Document XML General Rules Well Formed and Valid Documents Elements
Lesson 8: Introduction to Databases E-R Data Modeling
Lesson 8: Introduction to Databases E-R Data Modeling Contents Introduction to Databases Abstraction, Schemas, and Views Data Models Database Management System (DBMS) Components Entity Relationship Data
CSE 132A. Database Systems Principles
CSE 132A Database Systems Principles Prof. Victor Vianu 1 Data Management An evolving, expanding field: Classical stand-alone databases (Oracle, DB2, SQL Server) Computer science is becoming data-centric:
Language. Johann Eder. Universitat Klagenfurt. Institut fur Informatik. Universiatsstr. 65. A-9020 Klagenfurt / AUSTRIA
PLOP: A Polymorphic Logic Database Programming Language Johann Eder Universitat Klagenfurt Institut fur Informatik Universiatsstr. 65 A-9020 Klagenfurt / AUSTRIA February 12, 1993 Extended Abstract The
Translating between XML and Relational Databases using XML Schema and Automed
Imperial College of Science, Technology and Medicine (University of London) Department of Computing Translating between XML and Relational Databases using XML Schema and Automed Andrew Charles Smith acs203
2. Background on Data Management. Aspects of Data Management and an Overview of Solutions used in Engineering Applications
2. Background on Data Management Aspects of Data Management and an Overview of Solutions used in Engineering Applications Overview Basic Terms What is data, information, data management, a data model,
Data Integration through XML/XSLT. Presenter: Xin Gu
Data Integration through XML/XSLT Presenter: Xin Gu q7.jar op.xsl goalmodel.q7 goalmodel.xml q7.xsl help, hurt GUI +, -, ++, -- goalmodel.op.xml merge.xsl goalmodel.input.xml profile.xml Goal model configurator
DLDB: Extending Relational Databases to Support Semantic Web Queries
DLDB: Extending Relational Databases to Support Semantic Web Queries Zhengxiang Pan (Lehigh University, USA [email protected]) Jeff Heflin (Lehigh University, USA [email protected]) Abstract: We
Structured vs. unstructured data. Semistructured data, XML, DTDs. Motivation for self-describing data
Structured vs. unstructured data 2 Semistructured data, XML, DTDs Introduction to databases CSCC43 Winter 2011 Ryan Johnson Databases are highly structured Well-known data format: relations and tuples
Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar
Object Oriented Databases OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar Executive Summary The presentation on Object Oriented Databases gives a basic introduction to the concepts governing OODBs
Chapter 2: Designing XML DTDs
2. Designing XML DTDs 2-1 Chapter 2: Designing XML DTDs References: Tim Bray, Jean Paoli, C.M. Sperberg-McQueen: Extensible Markup Language (XML) 1.0, 1998. [http://www.w3.org/tr/rec-xml] See also: [http://www.w3.org/xml].
Formal Engineering for Industrial Software Development
Shaoying Liu Formal Engineering for Industrial Software Development Using the SOFL Method With 90 Figures and 30 Tables Springer Contents Introduction 1 1.1 Software Life Cycle... 2 1.2 The Problem 4 1.3
Technologies for a CERIF XML based CRIS
Technologies for a CERIF XML based CRIS Stefan Bärisch GESIS-IZ, Bonn, Germany Abstract The use of XML as a primary storage format as opposed to data exchange raises a number of questions regarding the
Programming Languages
Programming Languages Qing Yi Course web site: www.cs.utsa.edu/~qingyi/cs3723 cs3723 1 A little about myself Qing Yi Ph.D. Rice University, USA. Assistant Professor, Department of Computer Science Office:
by LindaMay Patterson PartnerWorld for Developers, AS/400 January 2000
Home Products Consulting Industries News About IBM by LindaMay Patterson PartnerWorld for Developers, AS/400 January 2000 Copyright IBM Corporation, 1999. All Rights Reserved. All trademarks or registered
Cedalion A Language Oriented Programming Language (Extended Abstract)
Cedalion A Language Oriented Programming Language (Extended Abstract) David H. Lorenz Boaz Rosenan The Open University of Israel Abstract Implementations of language oriented programming (LOP) are typically
Data Integration Hub for a Hybrid Paper Search
Data Integration Hub for a Hybrid Paper Search Jungkee Kim 1,2, Geoffrey Fox 2, and Seong-Joon Yoo 3 1 Department of Computer Science, Florida State University, Tallahassee FL 32306, U.S.A., [email protected],
Chapter 3. Data Modeling Using the Entity-Relationship (ER) Model
Chapter 3 Data Modeling Using the Entity-Relationship (ER) Model Chapter Outline Overview of Database Design Process Example Database Application (COMPANY) ER Model Concepts Entities and Attributes Entity
Introduction to Ingeniux Forms Builder. 90 minute Course CMSFB-V6 P.0-20080901
Introduction to Ingeniux Forms Builder 90 minute Course CMSFB-V6 P.0-20080901 Table of Contents COURSE OBJECTIVES... 1 Introducing Ingeniux Forms Builder... 3 Acquiring Ingeniux Forms Builder... 3 Installing
DTD Tutorial. About the tutorial. Tutorial
About the tutorial Tutorial Simply Easy Learning 2 About the tutorial DTD Tutorial XML Document Type Declaration commonly known as DTD is a way to describe precisely the XML language. DTDs check the validity
04 XML Schemas. Software Technology 2. MSc in Communication Sciences 2009-10 Program in Technologies for Human Communication Davide Eynard
MSc in Communication Sciences 2009-10 Program in Technologies for Human Communication Davide Eynard Software Technology 2 04 XML Schemas 2 XML: recap and evaluation During last lesson we saw the basics
Semester Review. CSC 301, Fall 2015
Semester Review CSC 301, Fall 2015 Programming Language Classes There are many different programming language classes, but four classes or paradigms stand out:! Imperative Languages! assignment and iteration!
Part VI. Object-relational Data Models
Part VI Overview Object-relational Database Models Concepts of Object-relational Database Models Object-relational Features in Oracle10g Object-relational Database Models Object-relational Database Models
