Tools and Experiences in Implementing INSPIRE Data Specifications IV:
Practical Schema Translation for INSPIRE
Eddie Curtis
Snowflake Software
Agenda
  - Background to the INSPIRE testing project
  - The solution architecture
  - The schema translation
      - Model differences between HMLR and INSPIRE
      - Schema translation patterns
  - Quality of service issues
      - Performance
      - Scalability
      - Extensibility
Background
  - Land Registry of England and Wales (HMLR)
      - Registers title to land in England and Wales
      - Records dealings with registered land
  - Snowflake Software
      - The data exchange company specialising in XML and GML
      - Provides solutions via consultancy, software and training
  - INSPIRE Testing
      - HMLR: data provider for the Annex I Cadastral Parcels theme
      - Snowflake Software: provider for data translation
Approach
  - Test the viability of using Commercial Off-The-Shelf (COTS) software for INSPIRE
      - Develop the translation without software customisation
      - Work quickly and productively to reduce costs
      - Refine the translation over several iterations within a limited time period
  - Implement Download and Direct Access Services to explore the practical issues of implementing real business requirements
      - On-the-fly translation to avoid replicating database infrastructure
      - Source data from the existing HMLR data model to avoid disruption to existing business processes
      - Implement an industrial-strength solution
Solution Architecture
Schema Translation: GO Publisher Software Architecture
  - Database to GML/XML translation
  - Desktop
      - Configuration of translation rules
      - File export
      - WFS configuration
      - Agent configuration
  - Agent
      - Bulk file generation
      - Chunking of output data
  - WFS
      - OGC-compliant Web Feature Service
Schema Translation: One-Directional Translation - GO Publisher Desktop
  [Diagram labels: Desktop, Data Store, SQL Query, Database Records, Schema Translation, Configuration with GUI, User, XML/GML preview, XML/GML, XML File]
Schema Translation: Bulk data creation - GO Publisher Agent
Schema Translation: Query Translation - GO Publisher WFS
  [Diagram labels: WFS Client, Data Request, Server, SQL Query, Data Store, Database Records, Schema Translation, GML, Desktop, Database table information, Translation Configuration]
  - GO Publisher can publish the same data in different XML formats for different clients.
  - Translations are uploaded to the server.
  - A graphical user interface is used to define the translation from the storage model to the XML data model.
The Solution Architecture
Schema Translation
Translation Patterns: Simple Translation

  BUILDING
  ID  NAME              GEOMETRY
  1   Eastgate House
  2   Villa Villekulla
  3   West Quay

  <Building gml:id="b1">
    <name>Eastgate House</name>
    <geometry>
      <gml:Polygon> ... </gml:Polygon>
    </geometry>
  </Building>
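The simple pattern maps one row to one feature and one column to one element. A minimal sketch of that mapping in Python follows; it is not how GO Publisher works internally. The coordinate values are invented for illustration (the slide shows no geometry values), and the GML 3.2 namespace and the `building_element` helper are assumptions.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"  # assumed GML version
ET.register_namespace("gml", GML)

# Hypothetical BUILDING rows: (ID, NAME, polygon ring as "x y x y ..." text).
ROWS = [
    (1, "Eastgate House", "0 0 10 0 10 10 0 10 0 0"),
    (2, "Villa Villekulla", "20 0 30 0 30 10 20 10 20 0"),
    (3, "West Quay", "40 0 50 0 50 10 40 10 40 0"),
]


def building_element(row):
    """Translate one BUILDING row into one <Building> element."""
    bid, name, ring = row
    building = ET.Element("Building", {f"{{{GML}}}id": f"b{bid}"})
    ET.SubElement(building, "name").text = name
    geometry = ET.SubElement(building, "geometry")
    polygon = ET.SubElement(geometry, f"{{{GML}}}Polygon")
    exterior = ET.SubElement(polygon, f"{{{GML}}}exterior")
    linear_ring = ET.SubElement(exterior, f"{{{GML}}}LinearRing")
    ET.SubElement(linear_ring, f"{{{GML}}}posList").text = ring
    return building


for row in ROWS:
    print(ET.tostring(building_element(row), encoding="unicode"))
```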
Translation Patterns: Grouping of Columns

  BUILDING
  ID  NAME              POSTCODE  GEOMETRY
  1   Eastgate House    SO15 2NY
  2   Villa Villekulla  VI12 4KT
  3   West Quay         SO12 4KD

  <Building>
    <address>
      <name>Eastgate House</name>
      <postcode>SO15 2NY</postcode>
    </address>
    <geometry> ... </geometry>
  </Building>
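Grouping wraps several flat columns in a container element that has no counterpart in the table. A short Python sketch of the idea, with geometry left empty and a hypothetical helper name (`building_with_address`):

```python
import xml.etree.ElementTree as ET


def building_with_address(name, postcode):
    """Group the NAME and POSTCODE columns under a single <address> element."""
    building = ET.Element("Building")
    address = ET.SubElement(building, "address")  # container with no source column
    ET.SubElement(address, "name").text = name
    ET.SubElement(address, "postcode").text = postcode
    ET.SubElement(building, "geometry")  # geometry omitted in this sketch
    return building


print(ET.tostring(building_with_address("Eastgate House", "SO15 2NY"), encoding="unicode"))
```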
Translation Patterns: Concatenation

  BUILDING
  ID  NAME              TOWN         GEOMETRY
  1   Eastgate House    Southampton
  2   Villa Villekulla  Portsmouth
  3   West Quay         Southampton

  <Building>
    <address>Eastgate House, Southampton</address>
    <geometry> ... </geometry>
  </Building>
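Concatenation is the reverse of grouping: several columns collapse into one text value. A minimal sketch, assuming a comma-and-space separator and that empty columns should be skipped (both are assumptions, as is the `concatenate` helper):

```python
def concatenate(*values, sep=", "):
    """Join column values into one string, skipping NULL or blank columns."""
    return sep.join(v.strip() for v in values if v and v.strip())


print(concatenate("Eastgate House", "Southampton"))
```

Skipping blank values avoids a dangling separator when a column such as the town is NULL.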
Translation Patterns: Translating Table Joins - Database vs. XML Relationships
  - Relational
      - Relationships (joins) are two-directional
      - Identifiers are essential to maintaining relationships
      - Relationships can be discovered through queries
  - XML
      - Relationships are inherently directed
      - Identity of objects does not have to be explicit
      - Relationships are explicit
          - parent/child
          - previous/next sibling
          - XLink/XPointer references
Translation Patterns: Translating Table Joins

  BUILDING
  ID  NAME              GEOMETRY
  1   Eastgate House
  2   Villa Villekulla
  3   West Quay

  BUILDING_EXTENSION
  BLDG_ID  TYPE    GEOMETRY
  1        Garage
  1        Shed
  2        Shed

  <Building gml:id="b1">
    <name>Eastgate House</name>
    <geometry>
      <gml:Polygon> ... </gml:Polygon>
    </geometry>
    <extension>
      <BuildingExtension>
        <type>Garage</type>
        <geometry> ... </geometry>
      </BuildingExtension>
    </extension>
    <extension> ... </extension>
  </Building>
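A join result is flat, so each parent row repeats once per child; the translation has to fold those repeats back into one parent element with nested children. The sketch below shows one way to do that using an in-memory SQLite database loaded with the rows from the slide. Geometry columns are left out, and the function name is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BUILDING (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE BUILDING_EXTENSION (BLDG_ID INTEGER, TYPE TEXT);
    INSERT INTO BUILDING VALUES
        (1, 'Eastgate House'), (2, 'Villa Villekulla'), (3, 'West Quay');
    INSERT INTO BUILDING_EXTENSION VALUES (1, 'Garage'), (1, 'Shed'), (2, 'Shed');
""")

QUERY = """
    SELECT b.ID, b.NAME, e.TYPE
    FROM BUILDING b
    LEFT JOIN BUILDING_EXTENSION e ON e.BLDG_ID = b.ID
    ORDER BY b.ID, e.rowid
"""


def buildings_with_extensions(connection):
    """Fold the flat join result into <Building> elements with nested extensions."""
    buildings = {}
    for bid, name, ext_type in connection.execute(QUERY):
        building = buildings.get(bid)
        if building is None:  # first row seen for this building
            building = ET.Element("Building", {"id": f"b{bid}"})
            ET.SubElement(building, "name").text = name
            buildings[bid] = building
        if ext_type is not None:  # LEFT JOIN yields NULL when there is no extension
            extension = ET.SubElement(building, "extension")
            child = ET.SubElement(extension, "BuildingExtension")
            ET.SubElement(child, "type").text = ext_type
    return list(buildings.values())


for element in buildings_with_extensions(conn):
    print(ET.tostring(element, encoding="unicode"))
```

The LEFT JOIN keeps buildings that have no extension (West Quay here), which an inner join would silently drop from the output.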
Translation Patterns: Filtering

  TOPOGRAPHIC_AREA
  ID  DESCRIPTIVE_GROUP    GEOMETRY
  1   Building
  2   Natural Environment
  3   Road Or Track

  <Building gml:id="id1">
    <geometry> ... </geometry>
  </Building>
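Filtering publishes only the rows that match a condition; the other rows produce no output at all. A minimal sketch using the three rows from the slide, with geometry left empty and plain `id` attributes in place of `gml:id` for brevity (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

TOPOGRAPHIC_AREA = [
    {"ID": 1, "DESCRIPTIVE_GROUP": "Building"},
    {"ID": 2, "DESCRIPTIVE_GROUP": "Natural Environment"},
    {"ID": 3, "DESCRIPTIVE_GROUP": "Road Or Track"},
]


def translate_buildings(rows):
    """Yield a <Building> element only for rows whose group is 'Building'."""
    for row in rows:
        if row["DESCRIPTIVE_GROUP"] != "Building":
            continue  # filtered out: no element is written for this row
        element = ET.Element("Building", {"id": f"id{row['ID']}"})
        ET.SubElement(element, "geometry")
        yield element


for element in translate_buildings(TOPOGRAPHIC_AREA):
    print(ET.tostring(element, encoding="unicode"))
```

In a database-backed translation the same condition would normally sit in the SQL WHERE clause so that unwanted rows are never fetched.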
Translation Patterns: Fan-Out Translation

  TOPOGRAPHIC_AREA
  ID  DESCRIPTIVE_GROUP    GEOMETRY
  1   Building
  2   Natural Environment
  3   Road Or Track

  <Building gml:id="id1">
    <geometry> ... </geometry>
  </Building>
  <NaturalEnvironment gml:id="id2">
    <geometry> ... </geometry>
  </NaturalEnvironment>
  <RoadOrTrack gml:id="id3">
    <geometry> ... </geometry>
  </RoadOrTrack>
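Fan-out splits one table into several feature types, with a column value deciding which type each row becomes. In the sketch below the element name is derived by removing the spaces from the group value, which happens to match the slide's example; a real configuration would more likely use an explicit mapping. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TOPOGRAPHIC_AREA = [
    {"ID": 1, "DESCRIPTIVE_GROUP": "Building"},
    {"ID": 2, "DESCRIPTIVE_GROUP": "Natural Environment"},
    {"ID": 3, "DESCRIPTIVE_GROUP": "Road Or Track"},
]


def feature_type(descriptive_group):
    """Derive an element name from the group value, e.g. 'Road Or Track' to 'RoadOrTrack'."""
    return "".join(descriptive_group.split())


def fan_out(rows):
    """Yield one element per row, choosing the element type from the row's group."""
    for row in rows:
        element = ET.Element(feature_type(row["DESCRIPTIVE_GROUP"]), {"id": f"id{row['ID']}"})
        ET.SubElement(element, "geometry")  # geometry omitted in this sketch
        yield element


for element in fan_out(TOPOGRAPHIC_AREA):
    print(ET.tostring(element, encoding="unicode"))
```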
Translation Patterns: Fan-In Translation

  PYLON
  ID  GEOMETRY
  1
  2

  CRANE
  ID  GEOMETRY
  3

  BRIDGE
  ID  GEOMETRY
  4

  <Structure gml:id="id1">
    <geometry> ... </geometry>
  </Structure>
  <Structure gml:id="id2">
    <geometry> ... </geometry>
  </Structure>
  <Structure gml:id="id3">
    <geometry> ... </geometry>
  </Structure>
  <Structure gml:id="id4">
    <geometry> ... </geometry>
  </Structure>
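Fan-in is the opposite of fan-out: rows from several tables all become the same feature type. A minimal sketch with the three source tables reduced to lists of IDs (geometry omitted, `fan_in` is a hypothetical name):

```python
import xml.etree.ElementTree as ET

# Source tables from the slide, reduced to their ID columns.
SOURCE_TABLES = {"PYLON": [1, 2], "CRANE": [3], "BRIDGE": [4]}


def fan_in(tables):
    """Yield a <Structure> element for every row of every source table."""
    for ids in tables.values():
        for fid in ids:
            element = ET.Element("Structure", {"id": f"id{fid}"})
            ET.SubElement(element, "geometry")
            yield element


for element in fan_in(SOURCE_TABLES):
    print(ET.tostring(element, encoding="unicode"))
```

Note that the source table name is lost in the output, which is the gap the constants pattern fills.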
Translation Patterns: Constants

  PYLON
  ID  GEOMETRY
  1
  2

  CRANE
  ID  GEOMETRY
  3

  BRIDGE
  ID  GEOMETRY
  4

  <Structure gml:id="id1">
    <type>pylon</type>
    <geometry> ... </geometry>
  </Structure>
  <Structure gml:id="id2">
    <type>pylon</type>
    <geometry> ... </geometry>
  </Structure>
  <Structure gml:id="id3">
    <type>crane</type>
    <geometry> ... </geometry>
  </Structure>
  <Structure gml:id="id4">
    <type>bridge</type>
    <geometry> ... </geometry>
  </Structure>
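A constant supplies a value that is not stored in any column, here the structure type implied by the source table. One way to sketch this is a UNION ALL query in which each branch contributes its own literal; the example uses an in-memory SQLite database with the IDs from the slide and omits geometry. The function name is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PYLON (ID INTEGER);
    CREATE TABLE CRANE (ID INTEGER);
    CREATE TABLE BRIDGE (ID INTEGER);
    INSERT INTO PYLON VALUES (1), (2);
    INSERT INTO CRANE VALUES (3);
    INSERT INTO BRIDGE VALUES (4);
""")

# Each branch of the union adds a constant naming its source table.
QUERY = """
    SELECT ID, 'pylon' AS TYPE FROM PYLON
    UNION ALL SELECT ID, 'crane' FROM CRANE
    UNION ALL SELECT ID, 'bridge' FROM BRIDGE
    ORDER BY ID
"""


def structures(connection):
    """Build <Structure> elements whose <type> comes from the per-table constant."""
    result = []
    for fid, type_name in connection.execute(QUERY):
        element = ET.Element("Structure", {"id": f"id{fid}"})
        ET.SubElement(element, "type").text = type_name
        ET.SubElement(element, "geometry")
        result.append(element)
    return result


for element in structures(conn):
    print(ET.tostring(element, encoding="unicode"))
```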
Translation Patterns: Relationships Between Patterns
Advanced Scenarios
  - SQL processing
  - Using the integrated XSLT engine
      - XSLT processing of the entire output
      - XSLT processing of fragments of output
  - Java processing
Advanced Scenarios
  - Most translations are covered by the translation patterns
  - For very complex cases, programming or scripting is needed
  - Several approaches are available
  - The choice of approach is based on
      - Limitations of the approaches
      - Performance and scalability considerations
      - Availability of programming/scripting skills
Advanced Scenarios: SQL Processing
  - Applies functions to data values, e.g.
      - Mathematical functions
      - Lookups for coded values
      - Coordinate system transformations
  - Applies before XML creation
  - Functions can have
      - Single output value
      - Multiple input values
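Because SQL processing runs before any XML is created, the translation only ever sees the finished values. The sketch below illustrates two of the listed cases, a coded-value lookup and a mathematical function, using SQLite. The PARCEL and STATUS_LOOKUP tables, their columns and their contents are invented for illustration and are not the HMLR data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; not taken from the HMLR schema.
    CREATE TABLE PARCEL (ID INTEGER, STATUS_CODE TEXT, AREA_SQM REAL);
    CREATE TABLE STATUS_LOOKUP (CODE TEXT, LABEL TEXT);
    INSERT INTO PARCEL VALUES (1, 'F', 25000.0), (2, 'L', 400.0);
    INSERT INTO STATUS_LOOKUP VALUES ('F', 'Freehold'), ('L', 'Leasehold');
""")

# The lookup join decodes STATUS_CODE; the division converts square metres to hectares.
QUERY = """
    SELECT p.ID, s.LABEL, p.AREA_SQM / 10000.0 AS AREA_HA
    FROM PARCEL p
    JOIN STATUS_LOOKUP s ON s.CODE = p.STATUS_CODE
    ORDER BY p.ID
"""

rows = conn.execute(QUERY).fetchall()
for parcel_id, status, area_ha in rows:
    print(parcel_id, status, area_ha)
```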
Advanced Scenarios: XSLT Processing
  - Applies after initial XML creation
      - Create additional XML child elements
      - Delete elements
      - Modify data values
  - Can be applied to
      - The whole output stream
          - Can transform the entire document structure
          - Can carry out XML to non-XML conversion
          - May require the whole output in memory
      - Fragments of output
          - Can change the data structure or values of a fragment
          - Only requires the fragment to be held in memory
Advanced Scenarios: Java Processing
  - Applies after initial XML creation
      - Create additional XML child elements
      - Delete elements
      - Modify data values
  - Standard Java interface (XMLFilter)
  - SAX stream processing
  - Requires additional programming tools
  - Typically higher performance than XSLT processing
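The slide refers to Java's SAX `XMLFilter` interface. Python's standard library offers a close analogue, `xml.sax.saxutils.XMLFilterBase`, which is used here only to illustrate the idea of a stream filter sitting between the parser and the output writer. The sketch deletes an element from the stream; the element name `internalNote` and the helper names are hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElementFilter(XMLFilterBase):
    """SAX filter that removes one named element, and everything inside it, from the stream."""

    def __init__(self, parent, drop="internalNote"):
        super().__init__(parent)
        self._drop = drop
        self._depth = 0  # greater than zero while inside a dropped element

    def startElement(self, name, attrs):
        if self._depth or name == self._drop:
            self._depth += 1
            return
        super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
            return
        super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def filter_xml(source_xml):
    """Run the filter over an XML string and return the filtered XML text."""
    out = io.StringIO()
    reader = DropElementFilter(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(source_xml.encode("utf-8")))
    return out.getvalue()


print(filter_xml("<Building><name>West Quay</name><internalNote>draft</internalNote></Building>"))
```

The filter handles events one at a time and never builds a tree, so its memory use does not grow with the size of the document.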
Quality of Service
Quality of Service
  - Performance
      - Response Time
      - Throughput
      - Request Throughput
      - Data Volume
  - Resilience
      - Failover
      - Graceful Degradation
  - Scaling
      - Concurrent Requests
      - Data Volume
      - Response Size
  - Extensibility
      - Additional Schemas
      - Additional Interfaces
Quality of Service: Streaming vs. In-memory Processing
  - Streaming
      - Data returned piece by piece, with only part of the data in memory at once
      - Low memory footprint
      - Large number of concurrent requests
      - No limit on size of data returned
      - Suitable for large XML documents (like GML)
      - Typically SAX based
  - In-memory
      - Data loaded into memory for processing
      - Higher memory footprint
      - Smaller number of concurrent requests
      - Size of data returned limited by memory
      - Typical implementation for small XML files
      - Typically based on DOM, XSLT or static binding
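The difference between the two approaches can be sketched in a few lines. Both functions below produce the same document, but the streaming version writes each feature as soon as it is built, while the in-memory version holds the whole tree before serialising it. The element names and function names are hypothetical, and this is an illustration of the trade-off rather than of any particular product.

```python
import io
import xml.etree.ElementTree as ET


def write_streaming(ids, out):
    """Write features one at a time; only the current feature is held in memory."""
    out.write("<FeatureCollection>")
    for fid in ids:  # ids may be a generator over a database cursor
        element = ET.Element("Building", {"id": f"id{fid}"})
        out.write(ET.tostring(element, encoding="unicode"))
    out.write("</FeatureCollection>")


def build_in_memory(ids):
    """Build the complete tree first; memory use grows with the number of features."""
    root = ET.Element("FeatureCollection")
    for fid in ids:
        ET.SubElement(root, "Building", {"id": f"id{fid}"})
    return ET.tostring(root, encoding="unicode")


buffer = io.StringIO()
write_streaming(range(1, 4), buffer)
print(buffer.getvalue())
print(build_in_memory(range(1, 4)))
```

In a service, `out` would be the HTTP response stream, so the client starts receiving data before the last feature has been read from the database.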
Quality of Service: On-the-fly Translation
  - Re-uses existing database infrastructure
  - No disruption to existing business processes
  - Extra translations added at low cost
  - Low initial investment: costs scale with increasing levels of data traffic
Thank you!
Eddie.Curtis@snowflakesoftware.com
www.snowflakesoftware.com