XML and Data Integration




XML and Data Integration. Week 11-12. MIE253-Consens

Schedule

Week | Date   | Lecture Topic
1    | Jan 9  | Introduction to Data Management
2    | Jan 16 | The Relational Model
3    | Jan 23 | Constraints and SQL DDL
4    | Jan 30 | SQL DML, DB Applications, JDBC
5    | Feb 6  | JDBC, DDL (Views, Access Control)
6    | Feb 13 | Relational Algebra, Advanced SQL
-    | Feb 20 | [Reading Week]
7    | Feb 27 | Review and Midterm (Mar 1)
8    | Mar 5  | OLAP
9    | Mar 12 | ER Conceptual Modelling
10   | Mar 19 | Normalization
11   | Mar 26 | XML and Data Integration
12   | Apr 2  | Transactions and the Internet, Query Processing
13   | Apr 9  | Final Review

This week's reading: Chapter 15

Semistructured Data
A typical piece of data on the Web:

    <dt>name: John Doe
        <dd>student Id: 111111111
        <dd>address:
            <ul>
                <li>number: 123
                <li>street: Main
            </ul>
    </dt>
    <dt>name: Joe Public
        <dd>student Id: 222222222
    </dt>

Semistructured Data (contd.)
To make the previous student list suitable for machine consumption on the Web, it should have these characteristics:
- Be object-like.
- Be schemaless: the data is not guaranteed to conform exactly to any schema, but different objects have some commonality among themselves.
- Be self-describing: some schema-like information, such as attribute names, is part of the data itself.
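A minimal sketch of these three properties, assuming the student list above is rewritten as XML. The tag names (Students, Student, Name, StudentId, Address, Number, Street) are our own illustrative choice, not a published schema. Each Student is an object, the field names travel with the data, and the two objects share some fields but not all of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing version of the student list; tag names are assumptions.
STUDENTS_XML = """
<Students>
  <Student>
    <Name>John Doe</Name>
    <StudentId>111111111</StudentId>
    <Address><Number>123</Number><Street>Main</Street></Address>
  </Student>
  <Student>
    <Name>Joe Public</Name>
    <StudentId>222222222</StudentId>
  </Student>
</Students>
"""

def describe(xml_text):
    """Return one dict per Student, keyed by the field names found in the data itself."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in student}
        for student in root.findall("Student")
    ]

def common_fields(records):
    """Field names shared by every record: the 'commonality' among objects."""
    return set.intersection(*(set(r) for r in records)) if records else set()

records = describe(STUDENTS_XML)
for record in records:
    print(sorted(record))
print(sorted(common_fields(records)))
```

No schema is consulted: the code discovers field names by reading the tags, and it tolerates the second student having no Address.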

Why XML?
- XML is a standard format for data exchange.
  - There are plenty of industry-specific standards; take a look at http://xml.coverpages.org
- Extensive software support:
  - All major relational database products have been retrofitted with facilities to store and construct XML documents.
  - Web browser and operating system support.

Health Data Exchange
The HL7 Patient Record Architecture is a framework for the exchange of clinical documents.

Sample HL7 Exam Report

    <LevelOne>
      <header>...</header>
      <body>
        <section>
          <section.title>admitting PHYSICAL EXAMINATION</section.title>
          <section>
            <section.title>general</section.title>
            <paragraph>the blood pressure is 170/88, pulse 80 and regular, and
              <healthcare.code identifier="9279-1" preferred.name="respiratory RATE"
                  name.of.coding.system="ln" local.coding.system="N">respirations</healthcare.code>
              18. She weighs 240 pounds.
            </paragraph>
          </section>
          <section>
            <section.title>heent</section.title>
            <paragraph>examination of the head is normocephalic. The patient has bilateral
              <healthcare.code identifier="f-f5480" preferred.name="carotid bruit"
                  name.of.coding.system="sn3" local.coding.system="N">carotid bruit</healthcare.code>.
              There is no jugular venous distention or lymphadenopathy.
            </paragraph>
          </section>
        </section>
      </body>
    </LevelOne>

Sample HL7 Header

    <header>
      <document>...</document>
      <event>
        <event.id><id.value>1009</id.value></event.id>
        <event.date>19990212</event.date>
      </event>
      <patient>
        <patient.id><id.value>p001</id.value></patient.id>
        <patient.name>
          <family.name>lantry</family.name>
          <given.name>connie</given.name>
        </patient.name>
        <patient.date.of.birth>19630613</patient.date.of.birth>
        <patient.sex value="female"/>
      </patient>
      <practitioner>
        <practitioner.id><id.value>24680</id.value></practitioner.id>
        <practitioner.role>
          <text>attending PHYSICIAN</text>
          <name.of.coding.system>hl70133</name.of.coding.system>
        </practitioner.role>
      </practitioner>
    </header>
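Because the header is self-describing, a receiving system can pull out patient data just by following element names. The sketch below uses a trimmed copy of the header above (the document and practitioner parts are left out for brevity); `patient_summary` and its output keys are hypothetical names chosen for illustration, not part of HL7.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the HL7 header shown above.
HEADER_XML = """
<header>
  <event>
    <event.id><id.value>1009</id.value></event.id>
    <event.date>19990212</event.date>
  </event>
  <patient>
    <patient.id><id.value>p001</id.value></patient.id>
    <patient.name>
      <family.name>lantry</family.name>
      <given.name>connie</given.name>
    </patient.name>
    <patient.date.of.birth>19630613</patient.date.of.birth>
    <patient.sex value="female"/>
  </patient>
</header>
"""

def patient_summary(xml_text):
    """Hypothetical helper: collect patient fields by walking element paths."""
    header = ET.fromstring(xml_text)
    patient = header.find("patient")
    return {
        "id": patient.findtext("patient.id/id.value"),
        "family_name": patient.findtext("patient.name/family.name"),
        "given_name": patient.findtext("patient.name/given.name"),
        "date_of_birth": patient.findtext("patient.date.of.birth"),
        # Sex is carried in an attribute of an empty element, not in text content.
        "sex": patient.find("patient.sex").get("value"),
    }

print(patient_summary(HEADER_XML))
```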

Sample HL7 Document Origin

    <document>
      <document.creation.date>19990212</document.creation.date>
      <document.id>
        <id.value>1009</id.value>
      </document.id>
      <document.originating.system>
        <id.value>systemx</id.value>
        <organization.name>global Healthcare, INC</organization.name>
      </document.originating.system>
      <document.originator.id>
        <id.value>24680</id.value>
        <family.name>levin</family.name>
        <given.name>henry</given.name>
        <suffix>the 7th</suffix>
        <degree>md</degree>
      </document.originator.id>
      <document.state value="original"/>
      <document.type>
        <identifier>11492-6</identifier>
        <text>history AND PHYSICAL</text>
        <name.of.coding.system>ln</name.of.coding.system>
      </document.type>
    </document>

Summary: XML
- XML and semi-structured data: schema-less, self-describing
- XML for data exchange

Additional Material: XML
- Well-formed XML
- Valid XML (DTD, XML Schema)
- XPath basics
- Further material and applications: MIE354H1F Business Process Engineering

Example XML Document

    <?xml version="1.0"?>
    <PersonList Type="Student" Date="2002-02-02">
      <Title Value="Student List"/>
      <Person> </Person>
      <Person> </Person>
    </PersonList>

- <?xml version="1.0"?> is the declaration.
- Type and Date are attributes of PersonList.
- PersonList, Title and Person are elements; these are the element (or tag) names.
- <Title Value="Student List"/> is an empty element.
- PersonList is the root element. Elements are nested, and the root element contains all others.

More Terminology

    <Person Name="John" Id="111111111">
      John is a nice fellow
      <Address>
        <Number>21</Number>
        <Street>Main St.</Street>
      </Address>
    </Person>

- <Person Name="John" Id="111111111"> is the opening tag; </Person> is the closing tag. What is opened must be closed.
- "John is a nice fellow" is standalone text, not useful as data.
- Address is a nested element, a child of Person. Person is the parent of Address and an ancestor of Number.
- Number is a child of Address and a descendant of Person.
- Everything between the opening and closing tags of Person is the content of Person.
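The tree terminology maps directly onto a parsed document. A small sketch with Python's standard `xml.etree.ElementTree`, assuming the Person example above with attribute values quoted: iterating an element yields its children, `iter()` yields descendants, and the standalone text ends up in `.text`. ElementTree elements carry no parent pointer, so the `ancestors` helper (a hypothetical name) builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """<Person Name="John" Id="111111111">
  John is a nice fellow
  <Address>
    <Number>21</Number>
    <Street>Main St.</Street>
  </Address>
</Person>"""

person = ET.fromstring(PERSON_XML)
address = person.find("Address")
number = address.find("Number")

# Children: elements directly nested inside Person.
children_of_person = [child.tag for child in person]
# Descendants: elements nested at any depth (iter() also yields Person itself, so skip it).
descendants_of_person = [e.tag for e in person.iter() if e is not person]
# Attributes are a dict; the standalone text is Person's .text (whitespace included).
attributes = person.attrib
standalone_text = person.text.strip()

# Map each element to its parent so ancestor questions can be answered.
parent_of = {child: parent for parent in person.iter() for child in parent}

def ancestors(elem):
    """Tags of elem's ancestors, nearest first."""
    result = []
    while elem in parent_of:
        elem = parent_of[elem]
        result.append(elem.tag)
    return result

print(children_of_person, descendants_of_person, ancestors(number))
```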

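Well-formed XML Documents
- A document must have exactly one root element, which contains all other elements.
- Every opening tag must have a matching closing tag (or be written as an empty-element tag, such as <Title Value="Student List"/>).
- Elements must be properly nested:
  - <a><b></a></b> is not well-formed.
  - <a><b></b></a> is well-formed. <a></a><b></b> is also properly nested, but as a complete document it has two top-level elements, so it must be wrapped in a single root element to be well-formed.
- An attribute name can occur at most once in an opening tag. If it occurs, it must have a value, and the value must be quoted (with " or ').
- XML processors are not supposed to try to fix ill-formed documents (unlike HTML browsers).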
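One way to try these rules out: Python's standard XML parser refuses ill-formed input with `xml.etree.ElementTree.ParseError` instead of repairing it. The `is_well_formed` helper and the test strings below are our own illustration; the parser checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

CASES = {
    "<a><b></a></b>": False,      # improper nesting
    "<a><b></b></a>": True,       # properly nested, single root
    "<a></a><b></b>": False,      # nested fine, but two top-level elements
    '<a x="1" x="2"/>': False,    # attribute repeated in one opening tag
    "<a x=1/>": False,            # attribute value not quoted
    "<a x='1'/>": True,           # single quotes are allowed too
}

for text, expected in CASES.items():
    print(f"{text!r}: {is_well_formed(text)}")
```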

XML Document Tree

Valid XML Documents
- Two mechanisms describe the schema of an XML document:
  - DTD (Document Type Definition)
  - XML Schema
- A document that satisfies the constraints in an XML DTD or schema is valid.
- XML documents must always be well-formed; validity is an additional property.
- There are historic reasons for having multiple schema languages for XML; tools translate among them and from conceptual models (ER, UML).

DTD Elements and Attributes

    <!DOCTYPE Report [
      <!ELEMENT Report (Students, Classes, Courses)>
      <!ELEMENT Students (Student*)>
      <!ELEMENT Classes (Class*)>
      <!ELEMENT Courses (Course*)>
      <!ELEMENT Student (Name, Status, CrsTaken*)>
      <!ELEMENT Name (First, Last)>
      <!ELEMENT First (#PCDATA)>
      <!ELEMENT CrsTaken EMPTY>
      <!ELEMENT Class (CrsCode, Semester, ClassRoster)>
      <!ELEMENT Course (CrsName)>
      <!ATTLIST Report Date CDATA #IMPLIED>
      <!ATTLIST Student StudId ID #REQUIRED>
      <!ATTLIST Course CrsCode ID #REQUIRED>
      <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
      <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
    ]>

- #PCDATA: text content
- *: zero or more occurrences
- EMPTY: an empty element
- CrsCode: the same attribute name declared in different elements (Course and CrsTaken)
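Python's standard-library XML parser does not validate against a DTD, so as a rough sketch the code below hand-checks just one family of constraints from the DTD above: ID values must be unique, and every IDREF or IDREFS value must match some ID. The attribute tables are copied by hand from the DTD; `check_id_refs` and the sample report (including the StudId "s111111111", the course name and the dangling "CS999") are hypothetical. It is not a full validator: content models and the XML Name syntax of ID values are not checked.

```python
import xml.etree.ElementTree as ET

# (element, attribute) pairs declared as ID / IDREF / IDREFS in the DTD above.
ID_ATTRS = {("Student", "StudId"), ("Course", "CrsCode")}
IDREF_ATTRS = {("CrsTaken", "CrsCode")}
IDREFS_ATTRS = {("ClassRoster", "Members")}

def check_id_refs(xml_text):
    """Return a list of ID/IDREF problems (an empty list means none found)."""
    root = ET.fromstring(xml_text)
    problems, ids = [], set()
    for elem in root.iter():                      # pass 1: collect IDs
        for name, value in elem.attrib.items():
            if (elem.tag, name) in ID_ATTRS:
                if value in ids:
                    problems.append(f"duplicate ID {value!r}")
                ids.add(value)
    for elem in root.iter():                      # pass 2: resolve references
        for name, value in elem.attrib.items():
            if (elem.tag, name) in IDREF_ATTRS:
                refs = [value]
            elif (elem.tag, name) in IDREFS_ATTRS:
                refs = value.split()              # IDREFS is whitespace-separated
            else:
                continue
            for ref in refs:
                if ref not in ids:
                    problems.append(f"{elem.tag}/@{name} refers to unknown ID {ref!r}")
    return problems

# Hypothetical report with one dangling reference (CS999).
SAMPLE = """<Report Date="2002-02-02">
  <Students>
    <Student StudId="s111111111">
      <Name><First>John</First><Last>Doe</Last></Name>
      <Status>U2</Status>
      <CrsTaken CrsCode="CS308"/>
      <CrsTaken CrsCode="CS999"/>
    </Student>
  </Students>
  <Classes>
    <Class>
      <CrsCode>CS308</CrsCode>
      <Semester>F1997</Semester>
      <ClassRoster Members="s111111111"/>
    </Class>
  </Classes>
  <Courses>
    <Course CrsCode="CS308"><CrsName>Example course</CrsName></Course>
  </Courses>
</Report>"""

print(check_id_refs(SAMPLE))
```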

XPath Document Tree

Document Corresponding to the Tree

    <?xml version="1.0"?>
    <!-- Some comment -->
    <Students>
      <Student StudId="111111111">
        <Name><First>John</First><Last>Doe</Last></Name>
        <Status>U2</Status>
        <CrsTaken CrsCode="CS308" Semester="F1997"/>
        <CrsTaken CrsCode="MAT123" Semester="F1997"/>
      </Student>
      <Student StudId="987654321">
        <Name><First>Bart</First><Last>Simpson</Last></Name>
        <Status>U4</Status>
        <CrsTaken CrsCode="CS308" Semester="F1994"/>
      </Student>
    </Students>
    <!-- Some other comment -->

XML Query Languages
- XPath: the core query language, used in XML Schema, XSLT, XQuery and many other XML standards.
- XSLT: a functional-style document transformation language. Very powerful, very complicated.
- XQuery: a W3C standard (XQuery 1.0 became a W3C Recommendation in 2007). Very powerful, fairly intuitive, SQL-style.
- Also: SQL extensions supporting XML.

XPath Basics
- / returns the root node (the document node, whose only element child is Students).
- /Students/Student returns all Student elements that are children of Students elements, which in turn must be children of the root.
- /Student returns the empty set (the root element is Students, not Student).
- //Student returns all Student elements anywhere below the root.
- Students who have taken CS532: //Student[CrsTaken/@CrsCode="CS532"]
- Last course taken by the first student in the list: /Students/Student[1]/CrsTaken[last()]
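Python's `xml.etree.ElementTree` supports a limited XPath subset, enough to approximate most of these queries on the Students document. Two differences to keep in mind: ElementTree has no separate document node, so paths are written relative to the root element (Students); and its predicates cannot contain a path such as `CrsTaken/@CrsCode`, so the sketch selects the matching CrsTaken elements and steps up to their parent with `..`. Because the sample document has no CS532 entries, the sketch also runs the same query for CS308.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!-- Some comment -->
<Students>
  <Student StudId="111111111">
    <Name><First>John</First><Last>Doe</Last></Name>
    <Status>U2</Status>
    <CrsTaken CrsCode="CS308" Semester="F1997"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
  </Student>
  <Student StudId="987654321">
    <Name><First>Bart</First><Last>Simpson</Last></Name>
    <Status>U4</Status>
    <CrsTaken CrsCode="CS308" Semester="F1994"/>
  </Student>
</Students>
<!-- Some other comment -->"""

root = ET.fromstring(DOC)  # the Students element (comments are dropped by default)

# /Students/Student
all_students = root.findall("Student")
# //Student (descendants of the root element)
anywhere = root.findall(".//Student")
# //Student[CrsTaken/@CrsCode="..."], rewritten as "matching CrsTaken, then parent"
took_cs532 = root.findall(".//Student/CrsTaken[@CrsCode='CS532']/..")
took_cs308 = root.findall(".//Student/CrsTaken[@CrsCode='CS308']/..")
# /Students/Student[1]/CrsTaken[last()]
last_course = root.find("Student[1]/CrsTaken[last()]")

print([s.get("StudId") for s in took_cs308], last_course.get("CrsCode"))
```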

XPath Semantics
locationstep1/locationstep2/... means:
- Find all nodes specified by locationstep1.
- For each such node N, find all nodes specified by locationstep2, using N as the current node.
- Take the union of these node sets.
- For each node returned by locationstep2, do the same with locationstep3, and so on.

locationstep = axis::node[predicate]
- Find all nodes specified by axis::node.
- Select only those that satisfy the predicate.
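The step-by-step semantics can be sketched as a tiny evaluator. This is a simplified model, not a real XPath engine: only the child and descendant axes are handled, a node test is an element tag or "*", and a predicate is represented as a Python function on a node (a hypothetical encoding; positional predicates such as [1] are not modelled). The names `Step`, `axis_nodes` and `evaluate` are our own. The result of each step is the union over all current nodes, returned in document order as XPath does.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# A location step: (axis, node test, optional predicate).
Step = Tuple[str, str, Optional[Callable[[ET.Element], bool]]]

def axis_nodes(node: ET.Element, axis: str) -> List[ET.Element]:
    """Nodes reachable from node along the given axis."""
    if axis == "child":
        return list(node)
    if axis == "descendant":
        return [e for e in node.iter() if e is not node]
    raise ValueError(f"unsupported axis: {axis}")

def evaluate(context: ET.Element, steps: List[Step]) -> List[ET.Element]:
    """Evaluate step1/step2/... starting from context."""
    order = {e: i for i, e in enumerate(context.iter())}  # document order
    current = [context]
    for axis, test, predicate in steps:
        found = {}
        for n in current:                      # each node N from the previous step
            for m in axis_nodes(n, axis):      # axis::node with N as current node
                if test != "*" and m.tag != test:
                    continue
                if predicate is not None and not predicate(m):
                    continue                   # keep only nodes satisfying the predicate
                found[m] = None                # union: each node kept once
        current = sorted(found, key=order.__getitem__)
    return current

DOC = """<Students>
  <Student StudId="111111111"><CrsTaken CrsCode="CS308"/><CrsTaken CrsCode="MAT123"/></Student>
  <Student StudId="987654321"><CrsTaken CrsCode="CS308"/></Student>
</Students>"""
root = ET.fromstring(DOC)

def took_mat123(student: ET.Element) -> bool:
    return any(c.get("CrsCode") == "MAT123" for c in student.findall("CrsTaken"))

# Roughly child::Student[CrsTaken/@CrsCode="MAT123"]/child::CrsTaken from the root element
courses = evaluate(root, [("child", "Student", took_mat123), ("child", "CrsTaken", None)])
print([c.get("CrsCode") for c in courses])
```

The union step matters when several current nodes reach the same node: evaluating descendant::*/descendant::CrsTaken from the root still yields each CrsTaken element only once.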