Database Technologies




Database Technologies: Bachelor and Master Projects
XML Databases
Database & Information Systems Group
Christian Grün

Introduction
- XML: just small files? Why databases?
  - library of U (800 MB)
  - genetic data (Swissprot, 3 GB)
  - Wikipedia (8 GB)
  - Medline (38 GB)
- challenges:
  - support new standards
  - find relevant query optimizations
  - visualizing results
  - tree-structured data structure
- <XML/> vs. an excerpt from Swissprot:

  <root>
    <Entry id="100k_rat" class="standard" mtype="prt" seqlen="889">
      <AC>Q62671</AC>
      <Mod date="01-nov-1997" Rel="35" type="created"></Mod>
      <Mod date="01-nov-1997" Rel="35" type="last sequence update"></Mod>
      <Mod date="15-jul-1999" Rel="38" type="last annotation update"></Mod>
      <Descr>100 KDA PROTEIN (EC 6.3.2.-)</Descr>
      <Species>Rattus norvegicus (Rat)</Species>
      <Org>Eukaryota</Org>
      <Org>Metazoa</Org>
      <Org>Chordata</Org>
      <Org>Craniata</Org>
      <Org>Vertebrata</Org>
      <Org>Euteleostomi</Org>
      <Org>Mammalia</Org>
      <Org>Eutheria</Org>
      <Org>Rodentia</Org>
      <Org>Sciurognathi</Org>
      <Org>Muridae</Org>
      <Org>Murinae</Org>
      <Org>Rattus</Org>
      <Ref num="1" pos="sequence FROM N.A">
        <Comment>STRAIN=WISTAR</Comment>
        <Comment>TISSUE=TESTIS</Comment>
        <DB>MEDLINE</DB>
        <MedlineID>92253337</MedlineID>
        <Author>Mueller D</Author>
        <Author>Rehbein M</Author>
        <Author>Baumeister H</Author>
        <Author>Richter D</Author>
        <Cite>Nucleic Acids Res. 20:1471-1475(1992)</Cite>
      </Ref>
      <Ref num="2" pos="erratum">
        <Author>Mueller D</Author>
        <Author>Rehbein M</Author>
        <Author>Baumeister H</Author>
        <Author>Richter D</Author>
        <Cite>Nucleic Acids Res. 20:2624-2624(1992)</Cite>
      </Ref>
      <EMBL prim_id="x64411" sec_id="CAA45756"></EMBL>
      <INTERPRO prim_id="ipr000569" sec_id="-"></INTERPRO>
      <INTERPRO prim_id="ipr002004" sec_id="-"></INTERPRO>
      <PFAM prim_id="pf00632" sec_id="hect" status="1"></PFAM>
      <PFAM prim_id="pf00658" sec_id="pabp" status="1"></PFAM>
      <Keyword>Ubiquitin conjugation</Keyword>
      <Keyword>Ligase</Keyword>
      <Features>
        <DOMAIN from="77" to="88"><Descr>ASP/GLU-RICH (ACIDIC)</Descr></DOMAIN>
        <DOMAIN from="127" to="150"><Descr>PRO-RICH</Descr></DOMAIN>
        <DOMAIN from="420" to="439"><Descr>ARG/GLU-RICH (MIXED CHARGE)</Descr></DOMAIN>
        <DOMAIN from="448" to="457"><Descr>ARG/ASP-RICH (MIXED CHARGE)</Descr></DOMAIN>
        <DOMAIN from="485" to="514"><Descr>PABP-LIKE</Descr></DOMAIN>
        <DOMAIN from="579" to="590"><Descr>ASP/GLU-RICH (ACIDIC)</Descr></DOMAIN>
        <DOMAIN from="786" to="889"><Descr>HECT DOMAIN</Descr></DOMAIN>
        <DOMAIN from="827" to="847"><Descr>PRO-RICH</Descr></DOMAIN>
        <BINDING from="858" to="858"><Descr>UBIQUITIN (BY SIMILARITY)</Descr></BINDING>
      </Features>
    </Entry>
    <Entry id="104k_thepa" class="standard" mtype="prt" seqlen="924">
      <AC>P15711</AC>
      <Mod date="01-apr-1990" Rel="14" type="created"></Mod>
      <Mod date="01-apr-1990" Rel="14" type="last sequence update"></Mod>
      <Mod date="01-aug-1992" Rel="23" type="last annotation update"></Mod>
      <Descr>104 KDA MICRONEME-RHOPTRY ANTIGEN</Descr>
      <Species>Theileria parva</Species>
      <Org>Eukaryota</Org>
      <Org>Alveolata</Org>
      <Org>Apicomplexa</Org>
      <Org>Piroplasmida</Org>
      <Org>Theileriidae</Org>
      <Org>Theileria</Org>
      <Ref num="1" pos="SEQUENCE FROM N.A">
        <Comment>STRAIN=MUGUGA</Comment>
        <DB>MEDLINE</DB>
        <MedlineID>90158697</MedlineID>
        <Author>Iams K.P</Author>
        <Author>Young J.R</Author>
        <Author>Nene V</Author>
        <Author>Desai J</Author>
        <Author>Webster P</Author>
        <Author>Ole-Moiyoi O.K</Author>
        <Author>Musoke A.J</Author>
        <Cite>Mol. Biochem. Parasitol. 39:47-60(1990)</Cite>
      </Ref>
      <EMBL prim_id="m29954" sec_id="aaa18217"></EMBL>
      <PIR prim_id="A44945" sec_id="a44945"></PIR>
      <Keyword>Antigen</Keyword>
      <Keyword>Sporozoite</Keyword>
      ...

BaseX
- XML database, developed in the DBIS workgroup
- open source: www.basex.org
- query languages: W3C standards XPath & XQuery
- extensions: XQuery Update, Full-Text
- indexes: attributes, texts, full-text
- special focus: tight coupling between frontend and backend

Topics (Backend): Namespace Support
- what are namespaces?

  <Address>
    <FirstName>John</FirstName>
    <FamilyName>McHilton</FamilyName>
    <Street>12 Donovan Road</Street>
    <Town>Chicago, 31072</Town>
  </Address>
  XPath: //FirstName, //FamilyName

  <Address xmlns:name="names">
    <name:first>john</name:first>
    <name:family>mchilton</name:family>
    <Street>12 Donovan Road</Street>
    <Town>Chicago, 31072</Town>
  </Address>
  XPath: //name:*

- design of an elegant solution for namespace access
- extension of the internal BaseX storage
- understanding of the specification
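To make the difference concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module (not BaseX) on the namespaced <Address> document above. The helper names local_name and elements_in_namespace are hypothetical. A lookup by the bare local name misses the prefixed elements, while resolving the prefix to its URI finds them; this is the kind of access a namespace-aware storage has to support.

```python
import xml.etree.ElementTree as ET

# The namespaced <Address> document from the slide.
DOC = """<Address xmlns:name="names">
  <name:first>john</name:first>
  <name:family>mchilton</name:family>
  <Street>12 Donovan Road</Street>
  <Town>Chicago, 31072</Town>
</Address>"""


def local_name(tag):
    """Strip the '{uri}' part that ElementTree prepends to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def elements_in_namespace(root, uri):
    """Rough equivalent of //name:* : local names of all elements bound to uri."""
    prefix = "{" + uri + "}"
    return [local_name(e.tag) for e in root.iter() if e.tag.startswith(prefix)]


root = ET.fromstring(DOC)

# Namespace-unaware lookup by the bare local name: finds nothing.
unprefixed = root.findall(".//first")
# Prefix resolved to its URI via a prefix map: finds the element.
prefixed = root.findall(".//name:first", {"name": "names"})

print(len(unprefixed), prefixed[0].text)      # 0 john
print(elements_in_namespace(root, "names"))   # ['first', 'family']
```

ElementTree stores each namespaced name as "{uri}local", so the prefix itself is gone after parsing; queries must bring their own prefix map.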

Topics (Backend): DTD Parsing
- what is a DTD?
  - defines the document structure and entities
  - allows document validation

  <mondial>
    <country id="f0_136">
      <name>germany</name>
      <city>münchen</city>
    </country>
  </mondial>

  <!ELEMENT mondial (country*) >
  <!ELEMENT country (name, city*) >
  <!ELEMENT name (#PCDATA) >
  <!ELEMENT city (#PCDATA) >
  <!ATTLIST country id ID #REQUIRED >
  <!ENTITY uuml "ü" >

- extension of the XML parser
- integration of validate commands
- understanding of the specification
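Content models in a DTD are essentially regular expressions over child element names. The minimal sketch below, using only Python's standard library, exploits that: it parses the mondial example with its DTD as an internal subset and checks the declarations by hand. Writing the city name with the &uuml; entity is an assumption about how the entity is meant to be used; ElementTree's expat parser expands it but does not validate. CONTENT_MODELS and PCDATA_ONLY are hypothetical hand translations of the <!ELEMENT> rules; a real validator would derive them from the parsed DTD.

```python
import re
import xml.etree.ElementTree as ET

# The mondial example with its DTD as internal subset (entity use is assumed).
DOC = """<!DOCTYPE mondial [
  <!ELEMENT mondial (country*) >
  <!ELEMENT country (name, city*) >
  <!ELEMENT name (#PCDATA) >
  <!ELEMENT city (#PCDATA) >
  <!ATTLIST country id ID #REQUIRED >
  <!ENTITY uuml "ü" >
]>
<mondial>
  <country id="f0_136"><name>germany</name><city>m&uuml;nchen</city></country>
</mondial>"""

# Hypothetical hand translation of the <!ELEMENT> rules: each child sequence is
# encoded as "tag tag ... " and must fully match the regex.
CONTENT_MODELS = {
    "mondial": r"(country )*",
    "country": r"name (city )*",
}
PCDATA_ONLY = {"name", "city"}


def validate(root):
    """Return a list of validity errors; an empty list means valid."""
    errors, seen_ids = [], set()
    for elem in root.iter():
        children = "".join(child.tag + " " for child in elem)
        if elem.tag in CONTENT_MODELS:
            if not re.fullmatch(CONTENT_MODELS[elem.tag], children):
                errors.append(f"<{elem.tag}>: unexpected children {children!r}")
        elif elem.tag in PCDATA_ONLY:
            if children:
                errors.append(f"<{elem.tag}>: element content not allowed")
        else:
            errors.append(f"<{elem.tag}>: not declared")
        if elem.tag == "country":  # <!ATTLIST country id ID #REQUIRED>
            cid = elem.get("id")
            if cid is None:
                errors.append("<country>: missing required id")
            elif cid in seen_ids:
                errors.append(f"<country>: duplicate ID {cid!r}")
            else:
                seen_ids.add(cid)
    return errors


root = ET.fromstring(DOC)
print(root.find("country/city").text)  # münchen
print(validate(root))                   # []
```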

Topics (Backend): XQuery Optimizations
- sample (returns all media with the title "Casablanca"):

  for $i in doc("library.xml")//Medium
  where $i/Title = "Casablanca"
  return $i

- possible query plans:
  - parse all Medium and Title tags (sequential scan): very slow
  - access the index and check the results: much faster!
- implementation of existing XPath optimizations for XQuery
- learning much about XQuery and tree-structured optimizations!
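The two query plans can be contrasted in a few lines. This is a sketch assuming a tiny, hypothetical in-memory library.xml and Python's ElementTree; build_text_index is a toy dictionary index, not the BaseX text index. Both plans return the same <Medium> elements in document order, but the second touches only the index entry for "Casablanca" instead of every element in the document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny hypothetical stand-in for library.xml.
LIBRARY = """<Library>
  <Medium><Title>Casablanca</Title><Year>1942</Year><Type>DVD</Type></Medium>
  <Medium><Title>Matrix</Title><Year>1999</Year><Type>DVD</Type></Medium>
  <Medium><Title>Casablanca</Title><Year>1942</Year><Type>VHS</Type></Medium>
</Library>"""
root = ET.fromstring(LIBRARY)


def scan(root, title):
    """Plan 1 (sequential scan): visit every <Medium> and compare its titles.
    Like XQuery's '=', the test is true if any <Title> child matches."""
    return [m for m in root.iter("Medium")
            if any(t.text == title for t in m.findall("Title"))]


def build_text_index(root):
    """Toy text index: Title text -> <Medium> elements, in document order."""
    index = defaultdict(list)
    for medium in root.iter("Medium"):
        for text in {t.text for t in medium.findall("Title")}:
            index[text].append(medium)
    return index


def index_lookup(index, title):
    """Plan 2 (index access): one dictionary probe instead of a full traversal."""
    return index.get(title, [])


index = build_text_index(root)
print([m.findtext("Type") for m in scan(root, "Casablanca")])          # ['DVD', 'VHS']
print([m.findtext("Type") for m in index_lookup(index, "Casablanca")])  # ['DVD', 'VHS']
```

The index costs one traversal up front and must be kept in sync on updates; the optimizer's job is to recognize when a where clause can be answered by such a probe.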

Topics (Backend): Index Management
- current state: one index for all texts & attribute values
- desirable: special-purpose indexes:
  - indexes for single tags/attributes
  - indexes on numeric values (range queries)
  - index for approximate text search
- extension of the existing indexes
- adaptation of the query optimizations
- thoughts on new index structures

  <Medium>
    <Title>Matrix</Title>
    <Year>1999</Year>
    <Type>DVD</Type>
  </Medium>
  <Medium>
    <Title>Matrix Reloaded</Title>
    <Year>2003</Year>
    <Type>DVD</Type>
  </Medium>
  XPath: //Medium[Year > 2000]?
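A special-purpose index on numeric values turns //Medium[Year > 2000] into a binary search. The sketch below assumes Python's bisect module and a hypothetical NumericIndex class over the two <Medium> entries above plus an invented third one. It keeps (value, document position, element) triples sorted by value and re-sorts each hit list by position, since XPath results must come back in document order.

```python
import bisect
import xml.etree.ElementTree as ET

# The two <Medium> entries from the slide plus a hypothetical third one.
LIBRARY = """<Library>
  <Medium><Title>Matrix</Title><Year>1999</Year><Type>DVD</Type></Medium>
  <Medium><Title>Matrix Reloaded</Title><Year>2003</Year><Type>DVD</Type></Medium>
  <Medium><Title>Casablanca</Title><Year>1942</Year><Type>DVD</Type></Medium>
</Library>"""


class NumericIndex:
    """Hypothetical special-purpose index over one tag's numeric text values."""

    def __init__(self, root, tag):
        self.entries = []  # (value, document position, element)
        for pos, elem in enumerate(root.iter(tag)):
            try:
                self.entries.append((float(elem.text), pos, elem))
            except (TypeError, ValueError):
                pass  # non-numeric or empty text is not indexed
        self.entries.sort(key=lambda e: (e[0], e[1]))
        self.keys = [value for value, _, _ in self.entries]

    def greater_than(self, bound):
        """Elements whose value is > bound, found by binary search."""
        start = bisect.bisect_right(self.keys, bound)
        # Re-sort the hits by position: XPath returns results in document order.
        hits = sorted(self.entries[start:], key=lambda e: e[1])
        return [elem for _, _, elem in hits]


root = ET.fromstring(LIBRARY)
years = NumericIndex(root, "Year")
# //Medium[Year > 2000]: probe the index, then step up to each parent <Medium>
# (dict.fromkeys removes duplicate parents while keeping their order).
parent_of = {child: parent for parent in root.iter() for child in parent}
hits = list(dict.fromkeys(parent_of[y] for y in years.greater_than(2000)))
print([m.findtext("Title") for m in hits])  # ['Matrix Reloaded']
```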

Topics (Frontend): View Schemas
- XML structure and contents can be very diverse:
  - attribute-based storage:
    <item id="0" firstname="hans" lastname="gruber" title="b.sc." />
    <item id="1" firstname="thomas" lastname="schmid" title="prof." />
  - text-based storage:
    <item><id>0</id><first>hans</first><last>gruber</last><title>...
  - flat vs. hierarchic data
- desirable: view definitions to optimize the visualization output
- analysis of existing XML documents
- design of a view schema
- implementation of schema parsing and interpretation
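One possible reading of a view definition is a mapping from logical fields to the places where they may be stored. The Python sketch below is an illustration under assumptions: the VIEW dictionary and apply_view function are hypothetical, the <items> wrapper is invented, and the text-based record is completed with the same values as the first attribute-based record because the slide truncates it. Both storage styles yield identical flat records, which a frontend could then visualize uniformly.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_BASED = """<items>
  <item id="0" firstname="hans" lastname="gruber" title="b.sc." />
  <item id="1" firstname="thomas" lastname="schmid" title="prof." />
</items>"""
# Text-based record; the values after <last> are assumed, not from the slide.
TEXT_BASED = """<items>
  <item><id>0</id><first>hans</first><last>gruber</last><title>b.sc.</title></item>
</items>"""

# Hypothetical view schema: logical field -> candidate attribute/child names.
VIEW = {
    "id": ["id"],
    "first": ["firstname", "first"],
    "last": ["lastname", "last"],
    "title": ["title"],
}


def apply_view(root, record_tag, view):
    """Turn each record element into a flat dict, whichever storage style it uses.
    For every candidate name, an attribute is tried first, then a child element."""
    rows = []
    for rec in root.iter(record_tag):
        row = {}
        for field, names in view.items():
            for name in names:
                value = rec.get(name)
                if value is None:
                    value = rec.findtext(name)
                if value is not None:
                    row[field] = value
                    break
        rows.append(row)
    return rows


print(apply_view(ET.fromstring(ATTRIBUTE_BASED), "item", VIEW))
print(apply_view(ET.fromstring(TEXT_BASED), "item", VIEW))
```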

Topics (Frontend): TreeMap
- space-filling visualization for hierarchic data
- diversity of layout algorithms available
- numerous attributes unexploited: color, intensity, ...
- popular example: size-based file system visualization
- visualization of tree-structured data
- implementation of efficient Java visualizations
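Slice-and-dice, the original treemap layout, is one of the many available layout algorithms. This Python sketch (the Node class and the sample file tree are hypothetical) gives each node a rectangle whose area is proportional to its total size, alternating the split direction per level. Other algorithms, such as squarified layouts, trade this simplicity for better aspect ratios. The projects themselves target Java, so this only outlines the logic.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: leaves carry a size, inner nodes sum their children."""
    name: str
    size: float = 0.0
    children: list = field(default_factory=list)

    def total(self):
        return self.size if not self.children else sum(c.total() for c in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign each node a rectangle (name, x, y, w, h) whose area is proportional
    to its total size. Even depths split the parent along x, odd depths along y."""
    if out is None:
        out = []
    out.append((node.name, x, y, w, h))
    total = node.total()
    offset = 0.0  # fraction of the parent already handed out
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Usage: a hypothetical file system tree laid out on a 100 x 100 canvas.
tree = Node("root", children=[
    Node("docs", children=[Node("a.txt", 10), Node("b.txt", 20)]),
    Node("src", 70),
])
for name, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 100):
    print(f"{name:6} x={x:6.2f} y={y:6.2f} w={w:6.2f} h={h:6.2f}")
```

Color or intensity could then encode a second attribute (e.g. file age) without affecting the layout.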

Topics (Frontend): Visualization
- numerous visualizations exist for tree-structured data:
  - conventional tree view
  - hyperbolic view
  - InterRing, ...
- visualization of tree-structured data
- implementation of efficient Java visualizations
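InterRing belongs to the family of radial, space-filling views. The sketch below shows only the basic ring-and-sector assignment behind such views (closer to a plain sunburst layout). It is not the InterRing algorithm and omits its interactive features. Trees are hypothetical (name, children) tuples with unique names, and each node's angular sweep is assumed proportional to its number of leaves.

```python
import math


def leaf_count(tree):
    """Number of leaves below a (name, children) tuple."""
    _, children = tree
    return 1 if not children else sum(leaf_count(c) for c in children)


def radial_layout(tree, start=0.0, end=2 * math.pi, depth=0, out=None):
    """Map each node name to (ring, start angle, end angle): the ring is the
    node's depth, and children split their parent's sector by leaf count."""
    if out is None:
        out = {}
    name, children = tree
    out[name] = (depth, start, end)
    total = leaf_count(tree)
    angle = start
    for child in children:
        sweep = (end - start) * leaf_count(child) / total
        radial_layout(child, angle, angle + sweep, depth + 1, out)
        angle += sweep
    return out


# Usage: a small hypothetical tree; angles are printed in degrees.
tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])
for name, (ring, a0, a1) in radial_layout(tree).items():
    print(name, ring, round(math.degrees(a0)), round(math.degrees(a1)))
```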

Organization
- first, take some time for your decision
- feel free to suggest your own topics
- events: the project is accompanied by a weekly project seminar
- the seminar includes regular updates between all members and one talk on your project
- Room: E217, Phone: 88-4449, E-mail: christian.gruen@uni-konstanz.de