Semistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu




Unstructured, structured and semistructured data
- Unstructured data: e.g., text documents.
- Structured data: data with a rigid, fixed format, e.g., tables in relational databases.
- Semistructured data: no predefined schema; the data is self-describing, with schema information mixed in with the data (schemaless, self-describing data), e.g., email, iCalendar.

Unstructured data
- Data can be of any type and need not follow any format or sequence.
- It follows no rules and is not predictable.
- Examples: text, video, sound, images.

Structured data
- Data is organized in semantic chunks (entities).
- Similar entities are grouped together (classes).
- Entities in the same group have the same descriptions (attributes).
- The descriptions for all entities in a group (the schema) have the same defined format and a predefined length, are all present, and follow the same order.

Semistructured data
- Data is organized in semantic entities, and similar entities are grouped together.
- Entities in the same group may not have the same attributes.
- The order of attributes is not necessarily important.
- Not all attributes may be required.
- The size of the same attribute may differ between entities in a group.
- The type of the same attribute may differ between entities in a group.

Semistructured data: why?
- Integration of databases: similar data with different schemas.
- Information sharing on the Web, e.g., XML, JSON.
- Flexibility: the structure is irregular and evolves rapidly; new attributes can be added freely, values can be empty, and new relationships can be introduced without changing a schema.

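Semistructured data: example
The three records below differ in structure: name is atomic in one record and nested in another, and email may have several values or be absent.

    name: Peter Wood
    email: ptw@dcs.bbk.ac.uk, p.wood@bbk.ac.uk

    name:
        first name: Mark
        last name: Levene
    email: mark@dcs.bbk.ac.uk

    name: Alex Poulovassilis
    affiliation: Birkbeck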
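As a minimal sketch, the three records can be written as Python dictionaries (a JSON-like encoding chosen here only for illustration; the helper names full_name and emails are hypothetical). Code that reads such data has to handle each variation explicitly:

```python
# The three records from the example as JSON-like Python dicts.
# Each record carries its own labels; no schema is shared.
people = [
    {"name": "Peter Wood",
     "email": ["ptw@dcs.bbk.ac.uk", "p.wood@bbk.ac.uk"]},
    {"name": {"first name": "Mark", "last name": "Levene"},
     "email": "mark@dcs.bbk.ac.uk"},
    {"name": "Alex Poulovassilis",
     "affiliation": "Birkbeck"},
]


def full_name(record):
    """Return a display name whether 'name' is atomic or nested."""
    name = record["name"]
    if isinstance(name, dict):
        return f"{name['first name']} {name['last name']}"
    return name


def emails(record):
    """Return e-mail addresses as a list: the attribute may be
    missing, a single string, or a list of strings."""
    value = record.get("email", [])
    return [value] if isinstance(value, str) else list(value)


for person in people:
    # A missing attribute shows up as None rather than as an error.
    print(full_name(person), emails(person), person.get("affiliation"))
```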

Semistructured data: representation
- A labelled directed graph whose nodes are either leaf or interior nodes.
- Schema information is in the edge labels.
- Data is stored at the leaves.

Figure: a labelled graph rooted at StarMovieData, with Star nodes for Carrie Fisher and Mark Hamill and a Movie node for Star Wars (1977). Edge labels include Star, Movie, Name, Address, Street, City, Title, Year, StarsIn and StarOf; leaves include the streets Maple, Locust and Oak and the cities Hollywood, Malibu and Redwood.
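A minimal sketch of this model in Python, assuming an adjacency-list encoding with invented node ids: labels live only on the edges, data values only at the leaves, and a query is a path of labels. Only the Carrie Fisher and Star Wars part of the figure is encoded.

```python
from collections import defaultdict

# Node ids ("root", "cf", "sw", ...) are made up for this sketch.
edges = defaultdict(list)   # node id -> list of (edge label, target node id)
values = {}                 # leaf node id -> data value


def add_edge(src, label, dst):
    edges[src].append((label, dst))


def add_leaf(src, label, dst, value):
    add_edge(src, label, dst)
    values[dst] = value


add_edge("root", "Star", "cf")
add_edge("root", "Movie", "sw")
add_leaf("cf", "Name", "cf_name", "Carrie Fisher")
add_edge("cf", "Address", "cf_addr")
add_leaf("cf_addr", "Street", "cf_street", "Maple")
add_leaf("cf_addr", "City", "cf_city", "Hollywood")
add_leaf("sw", "Title", "sw_title", "Star Wars")
add_leaf("sw", "Year", "sw_year", 1977)
# Relationship edges make this a graph rather than a tree.
add_edge("cf", "StarsIn", "sw")
add_edge("sw", "StarOf", "cf")


def follow(start, labels):
    """Return the nodes reached from `start` by following a path of edge labels."""
    frontier = [start]
    for label in labels:
        frontier = [dst for node in frontier
                    for (lab, dst) in edges[node] if lab == label]
    return frontier


def leaf_values(start, labels):
    """Return the data values at the leaves reached by the label path."""
    return [values[n] for n in follow(start, labels) if n in values]


print(leaf_values("root", ["Star", "Name"]))                        # ['Carrie Fisher']
print(leaf_values("root", ["Movie", "StarOf", "Address", "City"]))  # ['Hollywood']
```

The second query shows why relationship edges matter: the path goes from the movie back to its star before reaching the star's address.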

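Semistructured data: information integration
- Problem: there is no common schema (the legacy-database problem).
- Approach: put a wrapper around each database that presents its data as semistructured data, giving applications a common interface.

Figure: several applications access several databases through a common interface, with a wrapper around each database.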

Semistructured data: markup languages
- Markup languages allow marking up documents by representing structural, presentational and semantic information alongside the content.
- Markup languages play a key role here, notably XML.
- XML is derived from SGML (Standard Generalized Markup Language), an ISO standard technology for defining markup languages.
- HTML is another example of a markup language originally derived from SGML.

XML: Extensible Markup Language
- Follows a tag-based notation, similar to HTML.
- HTML tags describe presentation, while XML tags describe meaning.

HTML:

    <html>
      <body>
        <i>this is italic</i>
        <p>this is a paragraph.</p>
      </body>
    </html>

XML:

    <note>
      <to>tove</to>
      <from>jani</from>
      <subject />
      <heading>reminder</heading>
      <body>call me!</body>
    </note>
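To illustrate the point about meaning, here is a short sketch using Python's standard xml.etree.ElementTree module: a program looks values up by tag name, and the tag name says what the value is.

```python
import xml.etree.ElementTree as ET

note_xml = """<note>
  <to>tove</to>
  <from>jani</from>
  <subject />
  <heading>reminder</heading>
  <body>call me!</body>
</note>"""

note = ET.fromstring(note_xml)
# Fields are found by what they mean, not by position or formatting.
print(note.find("to").text)        # tove
print(note.find("body").text)      # call me!
print(note.find("subject").text)   # None: <subject /> is an empty element
```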

XML: with and without a schema
XML can be used in different modes:
- Well-formed XML: there is no predefined schema and you can invent your own tags, but the nesting rules must be obeyed, i.e., the document must be syntactically correct (well-formed).
- Valid XML: involves a schema definition that specifies the allowable tags and their grammar.
XML thus sits between strict-schema and schemaless data models.

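Well-formed XML
- Typically begins with an XML declaration stating that the document is XML, with its version and character encoding.
- Has a single root element that encloses the entire body.

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <sometag>
      ...
    </sometag>

Here the encoding attribute gives the character encoding, standalone="yes" indicates that the document does not depend on external markup declarations (such as an external DTD), and sometag is the root element.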
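Checking well-formedness needs no schema. A sketch with Python's xml.etree.ElementTree simply tries to parse the text (the function name is_well_formed is ours); it checks syntax only, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, i.e., it is well-formed.

    Only syntax is checked; no DTD or schema is consulted.
    """
    try:
        # Parse from bytes, as when reading a file; the declared encoding applies.
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
                     '<sometag>...</sometag>'))   # True
print(is_well_formed('<a><b>x</a></b>'))          # False: improper nesting
print(is_well_formed('<a/><b/>'))                 # False: two root elements
```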

Well-formed XML: example

    <?xml version="1.0" encoding="utf-8"?>
    <StarMovieData>
      <Star>
        <Name>Carrie Fisher</Name>
        <Address>
          <Street>123 Maple Street</Street>
          <City>Hollywood</City>
        </Address>
      </Star>
      <Movie>
        <Title>Star Wars</Title>
        <Year>1977</Year>
      </Movie>
    </StarMovieData>

Figure: the same document drawn as a tree rooted at StarMovieData, with a Star subtree (Name: Carrie Fisher; Address with Street: Maple and City: Hollywood) and a Movie subtree (Title: Star Wars; Year: 1977).
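The correspondence between the document and its tree can be made concrete with a small traversal. This sketch (the generator name leaf_paths is hypothetical) lists each root-to-leaf path of tag names, i.e., the edge labels of the tree, together with the text at the leaf.

```python
import xml.etree.ElementTree as ET

doc = """<StarMovieData>
  <Star>
    <Name>Carrie Fisher</Name>
    <Address>
      <Street>123 Maple Street</Street>
      <City>Hollywood</City>
    </Address>
  </Star>
  <Movie>
    <Title>Star Wars</Title>
    <Year>1977</Year>
  </Movie>
</StarMovieData>"""


def leaf_paths(elem, prefix=""):
    """Yield (path, text) for each leaf element at or below `elem`."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path, elem.text
    for child in children:
        yield from leaf_paths(child, path)


for path, text in leaf_paths(ET.fromstring(doc)):
    print(path, "=", text)
```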

Well-formed XML: attributes
- XML elements can have attributes inside their opening tags.
- Attributes are an alternative way to represent a leaf node; they can represent labelled arcs.

    <Movie year="1977"><Title>Star Wars</Title></Movie>
    <Movie title="Star Wars" year="1977"></Movie>
    <Movie title="Star Wars" year="1977" />

The last two forms are equivalent: an element with no content may be written with a self-closing tag.

Well-formed XML: attributes as relationships
Attributes can also represent relationships. Here starredin refers to the movie's movieid, and starof refers back to the star's starid:

    <?xml version="1.0" encoding="utf-8"?>
    <StarMovieData>
      <Star starid="cf" starredin="sw">
        <Name>Carrie Fisher</Name>
        <Address>
          <Street>123 Maple Street</Street>
          <City>Hollywood</City>
        </Address>
      </Star>
      <Movie movieid="sw" starof="cf">
        <Title>Star Wars</Title>
        <Year>1977</Year>
      </Movie>
    </StarMovieData>
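Without a DTD, a parser does not know which attributes are identifiers, so the application must resolve such references itself. A sketch with ElementTree on an abbreviated version of the document, assuming (as in this example) that starid and movieid hold identifiers and that a reference attribute may list several ids separated by spaces:

```python
import xml.etree.ElementTree as ET

doc = """<StarMovieData>
  <Star starid="cf" starredin="sw">
    <Name>Carrie Fisher</Name>
  </Star>
  <Movie movieid="sw" starof="cf">
    <Title>Star Wars</Title>
    <Year>1977</Year>
  </Movie>
</StarMovieData>"""

root = ET.fromstring(doc)

# Index elements by the attributes assumed to carry identifiers.
by_id = {}
for elem in root.iter():
    for id_attr in ("starid", "movieid"):
        if id_attr in elem.attrib:
            by_id[elem.get(id_attr)] = elem


def resolve(elem, ref_attr):
    """Return the elements referenced by a space-separated list of ids."""
    return [by_id[ref] for ref in elem.get(ref_attr, "").split()]


star = root.find("Star")
movie = root.find("Movie")
print([m.findtext("Title") for m in resolve(star, "starredin")])   # ['Star Wars']
print([s.findtext("Name") for s in resolve(movie, "starof")])      # ['Carrie Fisher']
```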

Well-formed XML: namespaces
- Namespaces qualify the tags in an XML document.
- They facilitate reuse of vocabularies: several vocabularies can be used in the same XML document without name conflicts.
- A namespace is identified by a URI, typically a URL that refers to a document describing how the tags in the namespace are to be interpreted. This document can be an XML document, an informal (HTML) document, ... or nothing at all.

Well-formed XML: namespaces, example
An HTML table:

    <table>
      <tr>
        <td>apples</td>
        <td>bananas</td>
      </tr>
    </table>

A real table (a piece of furniture):

    <table>
      <name>African Coffee Table</name>
      <width>80</width>
      <length>120</length>
    </table>

Both in one document, with the name conflict resolved by namespace prefixes:

    <root>
      <h:table xmlns:h="http://www.w3.org/">
        <h:tr>
          <h:td>apples</h:td>
          <h:td>bananas</h:td>
        </h:tr>
      </h:table>
      <f:table xmlns:f="http://www.furniture.com">
        <f:name>African Coffee Table</f:name>
        <f:width>80</f:width>
        <f:length>120</f:length>
      </f:table>
    </root>
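Python's ElementTree shows what namespace qualification does to names. In this sketch of the combined document, the query prefixes html and furn are arbitrary choices: prefixes in a query only need to map to the same URIs, not to match the prefixes in the document.

```python
import xml.etree.ElementTree as ET

doc = """<root>
  <h:table xmlns:h="http://www.w3.org/">
    <h:tr><h:td>apples</h:td><h:td>bananas</h:td></h:tr>
  </h:table>
  <f:table xmlns:f="http://www.furniture.com">
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

root = ET.fromstring(doc)
# ElementTree expands each prefixed name to "{namespace URI}localname",
# so the two <table> elements end up with different names.
print([child.tag for child in root])

# Prefixes used in queries are bound through a mapping to namespace URIs.
ns = {"html": "http://www.w3.org/", "furn": "http://www.furniture.com"}
print(root.find("furn:table/furn:name", ns).text)
print([td.text for td in root.iterfind("html:table/html:tr/html:td", ns)])
```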

Well-formed XML: XML and databases
- Computers commonly share data across the Internet by passing messages in the form of XML.
- XML is increasingly also used for data storage, similar to relational databases.
- How do we achieve efficient data access with XML? Two approaches:
  - Store the XML data in parsed form and provide a library of tools for navigating it; SAX (Simple API for XML) and DOM (Document Object Model) are two common standards for this.
  - Represent documents and their elements as relations and store them in a conventional database.
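The difference between the two APIs can be sketched with Python's standard xml.dom.minidom (DOM) and xml.sax (SAX) modules, both collecting the movie titles from a small sample document; the TitleHandler class is our own.

```python
import xml.dom.minidom
import xml.sax

doc = b"""<Movies>
  <Movie><Title>Star Wars</Title></Movie>
  <Movie><Title>Nine Months</Title></Movie>
</Movies>"""

# DOM: the whole document becomes an in-memory tree that can be navigated freely.
dom = xml.dom.minidom.parseString(doc)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("Title")]


# SAX: the parser streams through the document and reports events to a handler;
# only what the handler chooses to keep is retained.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self.in_title = False

    def startElement(self, name, attrs):
        if name == "Title":
            self.in_title = True
            self.titles.append("")

    def characters(self, content):
        if self.in_title:
            self.titles[-1] += content   # text may arrive in several chunks

    def endElement(self, name):
        if name == "Title":
            self.in_title = False


handler = TitleHandler()
xml.sax.parseString(doc, handler)

print(dom_titles)       # ['Star Wars', 'Nine Months']
print(handler.titles)   # ['Star Wars', 'Nine Months']
```

DOM is convenient when the program needs to move around the document; SAX uses little memory because it never builds the tree.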

Well-formed XML: storing XML in relations
A possible relational schema for storing XML:
- DocRoot(docID, rootElementID): relates document IDs to the IDs of their root elements.
- SubElement(parentID, childID, position): connects an element to each of its immediate subelements.
- ElementAttribute(elementID, name, value): relates elements to their attributes.
- ElementValue(elementID, value): relates leaf elements to their values.
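A sketch of shredding a document into these four relations with Python's sqlite3 and ElementTree modules. Element ids are generated in document order, which is an assumption of this sketch. The schema as given records no element names, so the sketch cannot recover tag names either, and the query below relies on child positions instead.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DocRoot(docID TEXT, rootElementID INTEGER);
CREATE TABLE SubElement(parentID INTEGER, childID INTEGER, position INTEGER);
CREATE TABLE ElementAttribute(elementID INTEGER, name TEXT, value TEXT);
CREATE TABLE ElementValue(elementID INTEGER, value TEXT);
""")

next_id = count(1)   # shared counter, so element ids stay unique across documents


def store(elem):
    """Insert `elem` and its descendants; return the id given to `elem`."""
    elem_id = next(next_id)
    for name, value in elem.attrib.items():
        conn.execute("INSERT INTO ElementAttribute VALUES (?, ?, ?)",
                     (elem_id, name, value))
    children = list(elem)
    if not children and elem.text is not None:   # leaf element with a value
        conn.execute("INSERT INTO ElementValue VALUES (?, ?)", (elem_id, elem.text))
    for position, child in enumerate(children, start=1):
        conn.execute("INSERT INTO SubElement VALUES (?, ?, ?)",
                     (elem_id, store(child), position))
    return elem_id


def shred(doc_id, xml_text):
    root_id = store(ET.fromstring(xml_text))
    conn.execute("INSERT INTO DocRoot VALUES (?, ?)", (doc_id, root_id))


shred("d1", '<Movie movieid="sw"><Title>Star Wars</Title><Year>1977</Year></Movie>')

# Values of the root element's children, in document order.
rows = conn.execute("""
    SELECT v.value
    FROM DocRoot d
    JOIN SubElement s ON s.parentID = d.rootElementID
    JOIN ElementValue v ON v.elementID = s.childID
    WHERE d.docID = 'd1'
    ORDER BY s.position
""").fetchall()
print(rows)   # [('Star Wars',), ('1977',)]
```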

Valid XML
- Valid XML is well-formed and also follows a particular schema.
- A schema defines the syntax of an XML-based language, i.e., it defines a class of XML documents.
- Because the allowed elements and their structure are known in advance, programs can rely on them when interpreting documents automatically.
- Two prominent alternatives: XML DTD (Document Type Definition) and XML Schema.

Valid XML: XML DTD

    <!DOCTYPE StarMovieData [
      <!ELEMENT StarMovieData (Star*, Movie*)>
      <!ELEMENT Star (Name, Address+)>
      <!ATTLIST Star
          starid    ID     #REQUIRED
          starredin IDREFS #IMPLIED>
      <!ELEMENT Name (#PCDATA)>
      <!ELEMENT Address (Street, (City | Zip))>
      <!ELEMENT Street (#PCDATA)>
      <!ELEMENT City (#PCDATA)>
      <!ELEMENT Movie (Title, Year, Genre)>
      <!ATTLIST Movie
          movieid ID     #REQUIRED
          starof  IDREFS #IMPLIED>
      <!ELEMENT Title (#PCDATA)>
      <!ELEMENT Year (#PCDATA)>
      <!ELEMENT Genre (Comedy | Drama | SciFi)>
    ]>

Legend:
- ELEMENT: element declaration
- ATTLIST: attribute declarations
- #PCDATA: parsed character data (text that is parsed for markup)
- CDATA: character data that is not parsed for markup (used as an attribute type)
- #REQUIRED: the attribute must be present
- #IMPLIED: the attribute is optional
- ID: an identifier, unique within the document
- IDREF / IDREFS: a reference / a space-separated list of references to the IDs of other elements
- *: the element may occur any number of times (including zero)
- +: the element may occur one or more times
- ?: the element may occur zero or one time
- |: exactly one of the alternatives appears

Valid XML: XML Schema
XML Schema is more powerful than DTD:
- It gives the developer far more control over what is legal, and a detailed way to define what the data can and cannot contain.
- It allows arbitrary restrictions on the number of occurrences of subelements.
- It allows declaring types such as integer, float, ...
- It makes it possible to declare keys and foreign keys.
- XML schemas are themselves XML documents.

XML Schema

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    </xs:schema>

XML Schema: elements and simple types

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Title" type="xs:string" />
      <xs:element name="Year" type="xs:integer" />
    </xs:schema>

XML Schema: complex types

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:complexType name="movietype">
        <xs:sequence>
          <xs:element name="Title" type="xs:string" />
          <xs:element name="Year" type="xs:integer" />
        </xs:sequence>
      </xs:complexType>
      <xs:element name="Movies">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Movie" type="movietype" minOccurs="0" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

XML Schema: example XML document

    <?xml version="1.0" encoding="utf-8"?>
    <Movies xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="movies.xsd">
      <Movie>
        <Title>Star Wars</Title>
        <Year>1977</Year>
      </Movie>
      <Movie>
        ...
      </Movie>
    </Movies>

XML Schema: attributes
Attribute declarations come after the xs:sequence inside an xs:complexType:

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:complexType name="movietype">
        <xs:sequence>
          <xs:element name="Title" type="xs:string" />
          <xs:element name="Year" type="xs:integer" />
        </xs:sequence>
        <xs:attribute name="movieid" type="xs:string" use="required" />
        <xs:attribute name="starof" type="xs:string" />
      </xs:complexType>
      <xs:element name="Movies">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Movie" type="movietype" minOccurs="0" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

XML Schema: example XML document with attributes

    <?xml version="1.0" encoding="utf-8"?>
    <Movies xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="movies.xsd">
      <Movie movieid="sw">
        <Title>Star Wars</Title>
        <Year>1977</Year>
      </Movie>
      <Movie movieid="rj">
        ...
      </Movie>
    </Movies>

XML Schema: restricted simple types
Numerical values can be restricted with minInclusive and maxInclusive:

    <xs:simpleType name="MovieYearType">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1915" />
      </xs:restriction>
    </xs:simpleType>

Values can be restricted to an enumerated set:

    <xs:simpleType name="genretype">
      <xs:restriction base="xs:string">
        <xs:enumeration value="comedy" />
        <xs:enumeration value="drama" />
        <xs:enumeration value="scifi" />
      </xs:restriction>
    </xs:simpleType>
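Outside a validator, the effect of these two facets can be mirrored by hand. This is a sketch only: the function names are hypothetical, a real XML Schema processor enforces the facets itself, and the regular expression approximates the lexical form of xs:integer.

```python
import re

GENRES = {"comedy", "drama", "scifi"}   # the enumeration facets of genretype


def valid_movie_year(lexical):
    """MovieYearType: an xs:integer with minInclusive 1915."""
    text = lexical.strip()   # xs:integer ignores surrounding whitespace
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False
    return int(text) >= 1915


def valid_genre(lexical):
    """genretype: exactly one of the enumerated strings (case-sensitive)."""
    return lexical in GENRES


print(valid_movie_year("1977"), valid_movie_year("1903"), valid_movie_year("19x7"))
print(valid_genre("drama"), valid_genre("Drama"))
```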

XML Schema: keys
The key below states that the pair (Title, Year) identifies each Movie element inside Movies:

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      ...
      <xs:element name="Movies">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Movie" type="movietype" minOccurs="0" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
        <xs:key name="moviekey">
          <xs:selector xpath="Movie" />
          <xs:field xpath="Title" />
          <xs:field xpath="Year" />
        </xs:key>
      </xs:element>
    </xs:schema>
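What xs:key checks can be sketched in a few lines of Python with ElementTree. key_violations is a hypothetical helper; it compares field text, whereas a schema processor compares typed values. The two King Kong films are sample data showing why the key needs both Title and Year.

```python
import xml.etree.ElementTree as ET
from collections import Counter

movies = ET.fromstring("""<Movies>
  <Movie><Title>King Kong</Title><Year>1933</Year></Movie>
  <Movie><Title>King Kong</Title><Year>1976</Year></Movie>
  <Movie><Title>Star Wars</Title><Year>1977</Year></Movie>
</Movies>""")


def key_violations(root, selector, fields):
    """Mimic xs:key: each selected node must have every field, and the
    field-value tuples must be unique. Returns the problems found."""
    problems = []
    keys = []
    for node in root.findall(selector):
        values = tuple(node.findtext(f) for f in fields)
        if None in values:
            problems.append(("missing field", values))
        else:
            keys.append(values)
    problems += [("duplicate key", k) for k, n in Counter(keys).items() if n > 1]
    return problems


print(key_violations(movies, "Movie", ["Title", "Year"]))   # []
print(key_violations(movies, "Movie", ["Title"]))           # [('duplicate key', ('King Kong',))]
```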

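XML Schema: foreign keys
The keyref below requires every (Title, Year) pair under Star/StarredIn to match a value of moviekey. Note that the attribute naming the referenced key is refer:

    <xs:element name="Stars">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Star" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="StarredIn" minOccurs="0" maxOccurs="unbounded">
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element name="Title" type="xs:string" />
                      <xs:element name="Year" type="xs:integer" />
                    </xs:sequence>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
      <xs:keyref name="movieref" refer="moviekey">
        <xs:selector xpath="Star/StarredIn" />
        <xs:field xpath="Title" />
        <xs:field xpath="Year" />
      </xs:keyref>
    </xs:element>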
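The foreign-key check can be sketched the same way. dangling_refs is a hypothetical helper that, like the keyref, looks for referencing (Title, Year) pairs with no matching key; the two documents are sample data, one of them containing a deliberately wrong year.

```python
import xml.etree.ElementTree as ET

movies = ET.fromstring("""<Movies>
  <Movie><Title>Star Wars</Title><Year>1977</Year></Movie>
</Movies>""")

stars = ET.fromstring("""<Stars>
  <Star>
    <StarredIn><Title>Star Wars</Title><Year>1977</Year></StarredIn>
    <StarredIn><Title>Star Wars</Title><Year>1978</Year></StarredIn>
  </Star>
</Stars>""")

FIELDS = ("Title", "Year")


def field_tuple(node):
    return tuple(node.findtext(f) for f in FIELDS)


def dangling_refs(ref_root, ref_selector, key_root, key_selector):
    """Mimic xs:keyref: return referencing tuples that match no key tuple."""
    keys = {field_tuple(n) for n in key_root.findall(key_selector)}
    return [field_tuple(n) for n in ref_root.findall(ref_selector)
            if field_tuple(n) not in keys]


print(dangling_refs(stars, "Star/StarredIn", movies, "Movie"))   # [('Star Wars', '1978')]
```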

XML programming languages
- XPath uses path expressions to navigate in XML documents.
- XQuery is the language for querying XML data; it is built on XPath expressions and plays the role that SQL plays for relational databases.
- XSLT transforms an XML document into another document, e.g., another XML document.

XPath
- XPath expressions generally return a sequence of items that satisfy certain patterns.
- A sequence of elements can be specified using an absolute or a relative path:
  - /Movies: the root element Movies and all its content
  - /Movies/Movie: all Movie elements that are direct children of the Movies element
  - /Movies//Title: all Title elements at any level inside the Movies element
  - *: any element
  - /Movies/Movie[Year="1980"]: all Movie elements whose Year value is 1980
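Python's ElementTree supports only a limited subset of XPath, but it covers the patterns above. A sketch on a small sample document; note that ElementTree evaluates paths relative to an element, so the root element Movies is the starting point rather than part of the path.

```python
import xml.etree.ElementTree as ET

# Sample data for illustration.
movies = ET.fromstring("""<Movies>
  <Movie><Title>Star Wars</Title><Year>1977</Year></Movie>
  <Movie><Title>The Empire Strikes Back</Title><Year>1980</Year></Movie>
  <Movie><Title>Raging Bull</Title><Year>1980</Year></Movie>
</Movies>""")

children = movies.findall("Movie")                  # /Movies/Movie
titles = movies.findall(".//Title")                 # /Movies//Title
any_child = movies.findall("*")                     # /Movies/*
from_1980 = movies.findall("Movie[Year='1980']")    # /Movies/Movie[Year="1980"]

print(len(children), len(titles), len(any_child))   # 3 3 3
print([m.findtext("Title") for m in from_1980])     # ['The Empire Strikes Back', 'Raging Bull']
```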

XQuery
- Allows specifying more complex queries over one or more documents.
- The typical form of an XQuery query is the FLWR expression (called FLWOR in the full language, which adds an optional ORDER BY clause):

    FOR    <variable bindings to individual nodes>
    LET    <variable bindings to collections of nodes>
    WHERE  <qualifier conditions>
    RETURN <query result specification>

XQuery: example XML document
In well-formed XML the ampersand in a title must be written as the entity &amp;.

    <?xml version="1.0" encoding="utf-8"?>
    <Movies>
      <Movie genre="comedy">
        <Title>Bruce Almighty</Title>
        <Star><Name>Jim Carrey</Name></Star>
      </Movie>
      <Movie genre="comedy">
        <Title>Dumb &amp; Dumber</Title>
        <Star><Name>Jim Carrey</Name></Star>
      </Movie>
      <Movie genre="drama">
        <Title>The Truman Show</Title>
        <Star><Name>Jim Carrey</Name></Star>
      </Movie>
      <Movie genre="comedy">
        <Title>Nine Months</Title>
        <Star><Name>Hugh Grant</Name></Star>
      </Movie>
    </Movies>

XQuery: example queries
Find all comedy movies in which Jim Carrey is an actor:

    let $movies := doc("movies.xml")
    for $movie in $movies//Movie[@genre = "comedy"]
    where $movie/Star[Name = "Jim Carrey"]
    return $movie/Title

Find the cities where the stars mentioned in the movies document live:

    let $movies := doc("movies.xml")
    let $stars := doc("stars.xml")
    for $s1 in $movies/Movies/Movie/Version/Star,
        $s2 in $stars/Stars/Star
    where data($s1) = data($s2/Name)
    return $s2/Address/City
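For comparison, the first query can be mimicked in Python using ElementTree's XPath subset. This is a sketch only (ElementTree is not an XQuery processor), with the FOR, WHERE and RETURN parts marked in comments.

```python
import xml.etree.ElementTree as ET

movies = ET.fromstring("""<Movies>
  <Movie genre="comedy"><Title>Bruce Almighty</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="comedy"><Title>Dumb &amp; Dumber</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="drama"><Title>The Truman Show</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="comedy"><Title>Nine Months</Title><Star><Name>Hugh Grant</Name></Star></Movie>
</Movies>""")

result = [
    movie.findtext("Title")                                    # RETURN $movie/Title
    for movie in movies.iterfind(".//Movie[@genre='comedy']")  # FOR $movie in ...[@genre = "comedy"]
    if movie.find("Star[Name='Jim Carrey']") is not None       # WHERE $movie/Star[Name = "Jim Carrey"]
]
print(result)   # ['Bruce Almighty', 'Dumb & Dumber']
```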

XQuery: other features
- Eliminating duplicates: let $s := distinct-values(...)
- Quantifiers: every $s in ... satisfies ..., some $s in ... satisfies ...
- Aggregation: count, sum, max, ...
- Branching: if (...) then ... else ...
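Rough Python counterparts of these features, sketched on the example document with ElementTree; the mapping is informal, and each comment names the XQuery construct being imitated.

```python
import xml.etree.ElementTree as ET

movies = ET.fromstring("""<Movies>
  <Movie genre="comedy"><Title>Bruce Almighty</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="comedy"><Title>Dumb &amp; Dumber</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="drama"><Title>The Truman Show</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="comedy"><Title>Nine Months</Title><Star><Name>Hugh Grant</Name></Star></Movie>
</Movies>""")
all_movies = movies.findall("Movie")
names = [n.text for n in movies.iter("Name")]

# distinct-values(...): XQuery leaves the order unspecified; this keeps first-seen order.
distinct_names = list(dict.fromkeys(names))

# every $m in ... satisfies ...  /  some $m in ... satisfies ...
every_has_star = all(m.find("Star") is not None for m in all_movies)
some_drama = any(m.get("genre") == "drama" for m in all_movies)

# count(...)
comedy_count = len(movies.findall("Movie[@genre='comedy']"))

# if (...) then ... else ...
summary = "mostly comedies" if comedy_count > len(all_movies) / 2 else "mixed"

print(distinct_names, every_has_star, some_drama, comedy_count, summary)
```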

XSLT
- Extensible Stylesheet Language Transformations.
- Its original purpose is to transform XML documents into other document forms (XML, HTML, etc.).
- In practice it is another query language.
- Uses XPath for navigating in XML documents.

XSLT: example
An XSLT processor takes an XML document and an XSLT stylesheet as input and produces a new XML document.

Input XML document:

    <?xml version="1.0" encoding="utf-8"?>
    <Movies>
      <Movie genre="comedy">
        <Title>Bruce Almighty</Title>
        <Star><Name>Jim Carrey</Name></Star>
      </Movie>
      ...
    </Movies>

XSLT stylesheet (excerpt):

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http:...XSL/Transform" version="1.0">
      <xsl:output method="xml" indent="yes" />
      <xsl:template match="/Movies">
        <ComedyMovies>
          <xsl:apply-templates />
        </ComedyMovies>
      ...

Output XML document:

    <?xml version="1.0" encoding="utf-8"?>
    <ComedyMovies>
      <Comedy title="Bruce Almighty" />
      <Comedy title="Dumb &amp; Dumber" />
      <Comedy title="Nine Months" />
    </ComedyMovies>

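XSLT: the stylesheet

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="xml" indent="yes" />
      <xsl:strip-space elements="*" />

      <!-- Wrap the result in a ComedyMovies element -->
      <xsl:template match="/Movies">
        <ComedyMovies>
          <xsl:apply-templates />
        </ComedyMovies>
      </xsl:template>

      <!-- For comedies, process only the Title child -->
      <xsl:template match="Movie[@genre='comedy']">
        <xsl:apply-templates select="Title" />
      </xsl:template>

      <!-- Other movies produce no output -->
      <xsl:template match="Movie" />

      <!-- Each title becomes an empty Comedy element; {.} is an
           attribute value template holding the title text -->
      <xsl:template match="Title">
        <Comedy title="{.}" />
      </xsl:template>
    </xsl:stylesheet>

Without the empty Movie template and the select on Title, XSLT's built-in template rules would also copy the titles of non-comedy movies and the star names into the output. An xsl:value-of element cannot be placed inside an attribute value, which is why the attribute value template {.} is used.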
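Python's standard library has no XSLT processor (the third-party lxml package provides one), so this sketch hand-codes the same transformation with ElementTree to show what the template rules compute. It is an illustration of the result, not a substitute for running the stylesheet.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring("""<Movies>
  <Movie genre="comedy"><Title>Bruce Almighty</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="comedy"><Title>Dumb &amp; Dumber</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="drama"><Title>The Truman Show</Title><Star><Name>Jim Carrey</Name></Star></Movie>
  <Movie genre="comedy"><Title>Nine Months</Title><Star><Name>Hugh Grant</Name></Star></Movie>
</Movies>""")

# Template for /Movies: create the wrapper element.
result = ET.Element("ComedyMovies")
# Template for Movie[@genre='comedy']: process only its Title children.
# Other movies are skipped, like the empty Movie template.
for movie in source.findall("Movie[@genre='comedy']"):
    for title in movie.findall("Title"):
        # Template for Title: emit <Comedy title="..."/>.
        ET.SubElement(result, "Comedy", title=title.text)

# The serializer escapes "&" in the attribute value as "&amp;".
print(ET.tostring(result, encoding="unicode"))
```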

Some online resources
- XML: http://www.w3schools.com/xml/
- XPath: www.w3schools.com/xpath/
- XPath tester: http://www.xpathtester.com/test
- XQuery: www.w3schools.com/xquery/
- XQuery tester: http://www.zorba-xquery.com/html/demo
- XSLT: www.w3schools.com/xsl/
- XSLT tester: http://www.w3.org/2005/08/online_xslt/