Semi-structured Data. 1 - Introduction



Similar documents
Semistructured data and XML. Institutt for Informatikk INF Ahmet Soylu

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

The business of Gatwick Airport Limited

COST TUD0804 Final Conference Merano 2013

Flight Connections to Austria Winter 2014/2015

New inspirations for the Winter season at Nice-Côte d Azur airport

BORYSPIL INTERNATIONAL AIRPORT SEASON TIMETABLE WINTER

Database Systems. Lecture 1: Introduction

The Entity-Relationship Model

2. Basic Relational Data Model

TTI-use in Monitoring Flight Catering and other foods

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

A Workbench for Prototyping XML Data Exchange (extended abstract)

Chapter 1: Introduction

AN ENHANCED DATA MODEL AND QUERY ALGEBRA FOR PARTIALLY STRUCTURED XML DATABASE

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

Organizational Search in Systems

2013 Ruby on Rails Exploits. CS 558 Allan Wirth

A new chapter of travel to Asia and Australia

Relational Databases

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model

Application Centric Infrastructure Object-Oriented Data Model: Gain Advanced Network Control and Programmability

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 2,

Incentive Schemes to air transport currently in force in Cyprus

Database System Concepts

How To Improve Performance In A Database

Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar

Automatic data collection on the Internet (web scraping)

The Mayor of London s Submission:

CSE 530A Database Management Systems. Introduction. Washington University Fall 2013

RYANAIR HOLDINGS plc. Annual Report and Financial statements 2000

A day in the life of an aircraft operator

Representing XML Schema in UML A Comparison of Approaches

Lesson 8: Introduction to Databases E-R Data Modeling

Abstract 1. INTRODUCTION

Treemap Visualisations

FLUGHAFEN WIEN AG. Traffic Results 2015 and Business Outlook for 2016

Introduction to XML. Data Integration. Structure in Data Representation. Yanlei Diao UMass Amherst Nov 15, 2007

NoSQL's biggest secret: SQL went nowhere Matthew Revell Director of Developer Advocacy, Couchbase

Structured vs. unstructured data. Motivation for self describing data. Enter semistructured data. Databases are highly structured

Building a Complex UML BIRT Report

A first step towards modeling semistructured data in hybrid multimodal logic

Advantages of XML as a data model for a CRIS

Databases What the Specification Says

Chapter 1: Introduction. Database Management System (DBMS)

INTRODUCTION TO DATABASE SYSTEMS

Structured vs. unstructured data. Semistructured data, XML, DTDs. Motivation for self-describing data

An XML Based Data Exchange Model for Power System Studies

Using Multiple Operations. Implementing Table Operations Using Structured Query Language (SQL)

Heathrow and Dubai the World s Hub Airports

YOUR DAILY CONNECTION TO GREECE!

Business Requirements Document. BCBP Passenger Data exchange using the 2D Barcode. Version March 2009

Instant YANG. The Basics. Hakan Millroth, Tail- f Systems ( hakan@tail- f.com)

Chapter 7 Memory Management

FLUGHAFEN WIEN AG. Results Q1-3/2015

Understanding Infrastructure as Code. By Michael Wittig and Andreas Wittig

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Rod Eddington. Chief Executive

UK AIRSPACE Introduction

Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey

CSE 233. Database System Overview

02267: Software Development of Web Services

Course 6232A: Implementing a Microsoft SQL Server 2008 Database

ERASMUS AT READING. /studyabroadreading

Modern Databases. Database Systems Lecture 18 Natasha Alechina

Oracle Fusion Middleware

ECS 165A: Introduction to Database Systems

Symbol Tables. Introduction

Efficient Data Structures for Decision Diagrams

Information Models, Data Models, and YANG. IETF 86, Orlando,

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Service Oriented Architecture

Document Engineering: Analyzing and Designing the Semantics of Business Service Networks

Building the Semantic Web on XML

Qantas International Network Changes Frequently Asked Questions Trade

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange

Android Based Healthcare System Using Augmented Reality

OData Extension for XML Data A Directional White Paper

SQL Simple Queries. Chapter 3.1 V3.0. Napier University Dr Gordon Russell

Introduction to database management systems

Wiley. Automated Data Collection with R. Text Mining. A Practical Guide to Web Scraping and

Relational model. Relational model - practice. Relational Database Definitions 9/27/11. Relational model. Relational Database: Terminology

Conceptual Level Design of Semi-structured Database System: Graph-semantic Based Approach

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Chapter 8 The Enhanced Entity- Relationship (EER) Model

Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types

Relational Database Basics Review

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Electronic Medical Records

II. PREVIOUS RELATED WORK

Transcription:

Semi-structured Data 1 - Introduction Andreas Pieris and Wolfgang Fischl, Summer Term 2016

Outline Structured Data Semi-structured Data Why Semi-structured Data? The Data Model Store Semi-structured Data

Structured Data Data is structured in semantic chunks - entities VIE, Vienna International, Vienna LHR, London Heathrow, London VIE, LHR, BA VIE, LHR, OS BA, British Airways OS, Austrian Airlines Similar entities are grouped together - classes Flights VIE, Vienna International, Vienna LHR, London Heathrow, London VIE, LHR, BA VIE, LHR, OS BA, British Airways Airlines OS, Austrian Airlines Airports

Structured Data Entities in the same class have the same descriptions - attributes Airports (VIE, Vienna International, Vienna) (LHR, London Heathrow, London) (Airport_Code, Name, City) Airlines (BA, British Airways) (OS, Austrian Airlines) Flights (VIE, LHR, BA) (VIE, LHR, OS) (Origin, Destination, Airline) (Airline_Code, Name)

Structured Data Entities in the same class have the same descriptions - attributes Airports (VIE, Vienna International, Vienna) (LHR, London Heathrow, London) (Airport_Code, Name, City) Flights (VIE, LHR, BA) (VIE, LHR, OS) (Origin, Destination, Airline) Airlines (BA, British Airways) (OS, Austrian Airlines) (Airline_Code, Name) same format (string, integer, date, etc.) Attributes in similar entities predefined length all present same order strict structure forced by a schema!!!

Structured Data - Relational Model Database model for structured data: entities records (or tuples) classes tables (or relations) Records grouped in tables attribute record table

Structured Data: On the Fly Example Airports Code Name City VIE Vienna International Vienna LHR London Heathrow London LGW London Gatwick London LCA Larnaca International Larnaca GLA Glasgow Glasgow EDI Edinburgh Edinburgh Airlines Code Name BA British Airways OS Austrian Airlines U2 EasyJet Flights Origin Destination Airline VIE LHR British Airways VIE LHR Austrian Airlines LHR EDI British Airways LGW GLA EasyJet

Persons Example Gerti Kappel, 18870, 18896, gerti@big.tuwien.ac.at Andreas, Pieris, pieris@dbai.tuwien.ac.at, 740072, 18493 Wolfgang Fischl, wfischl@dbai.tuwien.ac.at, 740050 Bill, Robert, 188316, bill@big.tuwien.ac.at

Semi-structured Data (SSD) Data is structured in semantic entities Similar entities are grouped in classes there is structure Entities in the same class may not have the same attributes Attributes of similar entities may have different format may have different length not all required may have different order but not too much structure

Semi-structured Data: Persons Example Gerti Kappel, 18870, 18896, gerti@big.tuwien.ac.at Andreas, Pieris, pieris@dbai.tuwien.ac.at, 740072, 18493 Wolfgang Fischl, wfischl@dbai.tuwien.ac.at, 740050 Martin, Fleck, 58801, fleck@big.tuwien.ac.at There is structure o Each row is a semantic entity - person o All entities are grouped in a class - persons But not too much structure o Entities have no regular structure o Structure of future entities is unpredictable

Why Semi-structured Data? There are data sources that we would like to treat as databases, but which cannot be constraint by a schema Flexible format for exchanging data between different places the WEB GOAL: Reconcile document view (web) with strict structures (databases)

Data Model We need an effective way to represent semi-structured data Like the relational model for structured data any ideas?

Trees as Data Model Gerti Kappel, 18870, 18896, gerti@big.tuwien.ac.at Andreas, Pieris, pieris@dbai.tuwien.ac.at, 740072, 18493 persons person name email tel fax Gerti Kappel 18870 18896 gerti@big.tuwien.ac.at person name email tel fax first Andreas pieris@dbai.tuwien.ac.at 740072 18493 last Pieris

Trees as Data Model SSD can be represented as a (labelled) tree: o leaf nodes standing for single data items o inner nodes have no label o edges labelled with elements person persons person name tel fax email Gerti Kappel 18870 18896 gerti@big.tuwien.ac.at Such a model is called self-describing - information that is usually associated with a schema is contained within the data Data carries its own description

SSD: Representing Relational Data Structured data is a special case of semi-structured data + relational data can be represented as a tree (with an overhead) R A B C a 1 b 1 c 1 a 2 b 2 c 2 R-table R-record R-record A B C A B C a 1 b 1 c 1 a 2 b 2 c 2

Store Semi-structured Data There are various formalisms to store semi-structured data o Object Exchange Model (OEM) o JavaScript Object Notation (JSON) o extensible Markup Language (XML)

Store Semi-structured Data persons person person name tel fax email Gerti Kappel 18870 18896 gerti@big.tuwien.ac.at {persons: {person: {name: Gerti Kappel tel: 18870 fax: 18896 email: gerti@big.tuwien.ac.at } } {person: {name: {first: Andreas, last: Pieris } email: pieris@dbai.tuwien.ac.at tel: 740072 fax: 18493} } } OEM Representation

Store Semi-structured Data persons person person name Gerti Kappel 18870 18896 gerti@big.tuwien.ac.at <persons> <person> <name> Gerti Kappel </name> <tel> 18870</tel> <fax> 18896</fax> <email> gerti@big.tuwien.ac.at </email> </person> <person> <name> <first> Andreas </first> <last> Pieris </last> </name> <email> pieris@dbai.tuwien.ac.at </email> <tel> 740072 </tel> <fax> 18493</fax> </person> </persons> tel fax email XML Representation

Store Semi-structured Data There are various formalisms to store semi-structured data o Object Exchange Model (OEM) o JavaScript Object Notation (JSON) o extensible Markup Language (XML) Different syntax Different mechanisms for self-describing but the goal is the same: store SSD Different description mechanisms o Which attributes are allowed/required o Which values are allowed/required Different query languages and manipulation mechanisms

Sum Up Structured Data o Similar entities grouped in classes o Similar entities have a regular structure o Relational Model Semi-structured Data o Similar entities grouped in classes o Similar entities have irregular structure o Trees as a Model Store Semi-structured Data o Various formalisms o extensible Markup Language (XML)