Data Models For Interoperability Rob Atkinson
Contents: Problem Statement; Conceptual Architecture; Role of Data Models; Role of Registries; Integration of GRID and SDI
Problem Statement: How do we derive useful information from sparse sampling and many possible models of behaviour? The focus here is the solid earth domain, but note the many similar issues in atmosphere, marine, land management and ecosystems. The data is collected and/or managed by many different agencies.
Problem (Facets): Modelling many possible inputs. [Diagram labels: Geochemical sampling; Geochemical properties; Spatial Constraints; Model Parameters (e.g. element size); Predictive Data Model; Modelled Results; Information Products]
Problem (Facets): Many possible models. We may want to re-run conditions with a new model or new data. Models may take time to run, or results may need to be archived for audit trail reasons. Data volumes may be huge or trivial.
Architecture: "...the strategic decisions about the structure and behaviour of the system, the collaborations among the system elements, and the physical deployment of the system" (Quatrani)
Conceptual Architecture: GRID and Web Services. Differences? Commonalities?
Sloan Digital Sky Survey Production System
Motivations: "I've come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes." "I've detected a calibration error in an instrument and want to know which derived data to recompute." "I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won't have to write one from scratch." "I want to apply an astronomical analysis program to millions of objects. If the results already exist, I'll save weeks of computation." [Diagram labels: Data; Transformation; Derivation; created-by; consumed-by/generated-by; execution-of]
Motivations: Manage workflow; patch workflow following changes; on-demand data generation; explain provenance, e.g. file8: psearch -t 10 -i file3 file4 file5 -o file8; summarize -t 10 -i file6 -o file7; reformat -f fz -i file2 -o file3 file4 file5; conv -l esd -o aod -i file2 -o file6; simulate -t 10 -o file1 file2. [Diagram: workflow graph linking file1 to file8 through the simulate, reformat, conv, summarize and psearch steps]
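The provenance listing on this slide can be treated as a small dependency graph. The sketch below is illustrative only, not the Virtual Data Grid implementation: it transcribes the slide's derivations as (command, inputs, outputs) tuples, assuming the flag hyphens were lost in extraction, and the helper names `downstream` and `provenance` are hypothetical. It shows how "which derived data to recompute" and "explain provenance" reduce to walking the graph in opposite directions.

```python
from collections import defaultdict, deque

# Derivations transcribed from the slide as (command, inputs, outputs).
# Assumption: the "-t", "-f", "-l", "-o" hyphens were dropped in extraction.
DERIVATIONS = [
    ("simulate -t 10", [], ["file1", "file2"]),
    ("reformat -f fz", ["file2"], ["file3", "file4", "file5"]),
    ("conv -l esd -o aod", ["file2"], ["file6"]),
    ("summarize -t 10", ["file6"], ["file7"]),
    ("psearch -t 10", ["file3", "file4", "file5"], ["file8"]),
]


def downstream(changed, derivations=DERIVATIONS):
    """Return every file derived, directly or indirectly, from `changed`."""
    children = defaultdict(set)
    for _cmd, inputs, outputs in derivations:
        for src in inputs:
            children[src].update(outputs)
    stale, queue = set(), deque([changed])
    while queue:
        for out in children[queue.popleft()]:
            if out not in stale:
                stale.add(out)
                queue.append(out)
    return sorted(stale)


def provenance(target, derivations=DERIVATIONS):
    """Return the commands, in execution order, needed to produce `target`."""
    producer = {out: (cmd, inputs)
                for cmd, inputs, outputs in derivations
                for out in outputs}
    steps = []

    def visit(name):
        cmd, inputs = producer[name]
        for src in inputs:
            visit(src)
        if cmd not in steps:
            steps.append(cmd)

    visit(target)
    return steps
```

If file2 turns out to be wrong, `downstream("file2")` lists the files to regenerate, while `provenance("file8")` lists the commands that led to file8.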
Virtual Data Grid [Diagram labels: Researcher; Production Manager; Science Review; Grid Operations; Grid Monitor; virtual data catalog; virtual data index; workflow planner; workflow executor (DAGMan); request planner; request executor (Condor-G, GRAM); request predictor (Prophesy); replica location service; storage element; Storage Resource Mgmt; Data Transport; Data Grid; Computing Grid; simulation; simulation data; raw data; detector; analysis. Activities: discovery; sharing; composition; planning; derivation]
Example GRID Dataset Types [Diagram labels. Representational: FileDataset; File; FileSet; MultiFileSet; TarFileSet. Logical: EventCollection; RawEventSet; SimulatedEventSet; MonteCarloSimulation; DiscreteEventSimulation]
Web Services: 1. Publish (Provider to Registry); 2. Find (Requestor to Registry); 3. Bind (Requestor to Provider); 4. Chain (to a further Provider). The service provides binding metadata, e.g. WSDL, OpenGIS GetCapabilities, etc. Web protocols; XML encoding of responses.
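As a hedged illustration of the binding metadata mentioned on this slide: OGC services conventionally answer a GetCapabilities request sent as key-value query parameters. The sketch below only builds such a request URL; the function name and the endpoint used in the usage note are hypothetical, and a real client would normally add a version parameter and parse the XML response.

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint, service):
    """Build a key-value-pair GetCapabilities request for an OGC service.

    `endpoint` is the provider's base URL (hypothetical in this sketch);
    `service` is the OGC service type, e.g. "WFS" or "WMS".
    """
    query = urlencode({"service": service, "request": "GetCapabilities"})
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"
```

For example, `get_capabilities_url("http://example.org/wfs", "WFS")` yields a URL a requestor could fetch after finding the provider in a registry and before binding to it.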
OpenGIS Web Services [Diagram labels: Application Client; Browser; Application Server; Network; OpenGIS Services Framework; OpenGIS Application Services; Registry Services; Data Services; Portrayal Services; Processing Services; Encodings; Publish; Find; Bind]
Example GIS Data Service (WFS) [Diagram labels: Data service (TD database); Application Control; GML (geometry); Modelling service; XML (material/thermodynamic properties); SVG (graphics); Portrayal & Graphics service; XML (results)]
SDI Component Model [Diagram labels: Data Repository; Interface; Applications; Search (using keywords); Catalogs: Metadata Catalogue, Map Catalogue, Services Catalogue, Feature Type Catalogue; Organisations and People; Species Taxonomy Register; Observables Dictionary; Symbology; Keywords]
Atkinson's instant type hierarchy: Primitives (supplied by environment); standards (allow technology implementation: GML, RDF etc.); profiles (mandated common patterns, e.g. metadata requirements); Feature Types (domain models); Service Types; Service Offerings (introduces content models)
Data Models: GRID: files, homogeneous data model, interoperable but not flexible. OpenGIS WS: feature types; implementation policies determine the level of interoperability. Can we develop a humongous data model to anticipate all needs and capabilities?
Component Data Models: Reusable sub-components allow services to be created to serve reusable data, because a service consumer doesn't have to subscribe to the entire enterprise model! NB: service offers are described using types: WSDL, using XML Schema (c.f. GML).
MarineXML Architecture [Diagram labels. Examples: Marine Cadastre; Land/Sea interface; ECDIS; Marine Navigation; Cruise; Marine Survey Events; Marine Biodiversity; Marine Science; Water Column Features; Benthic Features. Layers: Domain Profiles; Common Types; Abstract Types. Components: LandXML; Observations; SensorML; Ontology; Platform Components; GML 3]
Data Model: Feature Types. Example Feature; Feature Type Definitions
Strong or Weak Typing: Is a feature best described as GeneralFeature (attribute: type="Hut"), or Building (attribute: type="Hut", beds=3), or Hut (beds=3), or 3 Bed Hut? Granularity depends on usage requirements.
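The four options on this slide can be sketched as four class definitions, from weakest to strongest typing. This is an illustration only: the class and attribute names follow the slide, but the `properties` dictionary, the `ThreeBedHut` spelling and the `count_hut_beds` consumer are assumptions added to show what each granularity costs the code that reads the data.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralFeature:
    """Weakest typing: one class; the kind of feature lives in an attribute."""
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Building:
    """Intermediate: a domain class, still sub-typed by an attribute."""
    type: str
    beds: int


@dataclass
class Hut:
    """Strong typing: the class itself says what the feature is."""
    beds: int


@dataclass
class ThreeBedHut:
    """Strongest typing: even the bed count is fixed by the type."""
    beds: int = field(default=3, init=False)


def count_hut_beds(features):
    """Sum the beds of all huts, whichever typing style describes them."""
    total = 0
    for f in features:
        if isinstance(f, (Hut, ThreeBedHut)):
            total += f.beds  # the type alone identifies a hut
        elif isinstance(f, Building) and f.type == "Hut":
            total += f.beds  # must inspect an attribute value
        elif isinstance(f, GeneralFeature) and f.type == "Hut":
            total += f.properties.get("beds", 0)  # nothing is guaranteed
    return total
```

The weaker the typing, the more the consumer has to know about attribute values rather than types, which is why the right granularity depends on usage requirements.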
Logical and Physical Data Models: ArcGIS Marine Data Model. Physical Model: Abstraction from a geometry
Example Logical View
Implementation
SDI Information Architecture [Diagram labels: Registry; Client; Encoding; Abstract Logical Data Model; Web Service; Physical Data Model]
Role of Registries [Diagram labels: OGC Service Registry (type definition; service offer; service instance); Provider: publish; Requester: find; Requester to Provider: bind]
Role of Registries: The central purpose is to store service offers and match them with service requests; registries provide a context in which resources can be discovered and used. Data and service types are the key units of classification. Ipso facto, registry design is a translation of policy into technology, driven by a system architecture that clearly identifies the data model.
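A minimal sketch of that central purpose, under stated assumptions: the `ServiceOffer` and `Registry` names, their fields and the example endpoint are hypothetical and do not reflect the OGC registry interface. The point it illustrates is that offers are classified by service type and data (feature) type, and a request is matched on exactly those two keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceOffer:
    service_type: str   # e.g. "WFS"
    feature_type: str   # the data type the service serves
    endpoint: str       # hypothetical provider URL


class Registry:
    """Stores published offers and matches them against requests."""

    def __init__(self):
        self._offers = []

    def publish(self, offer):
        """Provider side: record a service offer."""
        self._offers.append(offer)

    def find(self, service_type=None, feature_type=None):
        """Requester side: return offers matching the given types.

        A criterion left as None matches any value.
        """
        return [
            offer for offer in self._offers
            if (service_type is None or offer.service_type == service_type)
            and (feature_type is None or offer.feature_type == feature_type)
        ]
```

A requester would call `find(service_type="WFS", feature_type="Hut")` and then bind directly to the `endpoint` of a returned offer; the registry itself never carries the data.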
Registries: Easy to build: many implementations, little consistency; not too hard to connect to if the content is valuable! Hard to populate: technically no problems; the difficulty is achieving critical mass and being useful to future applications.
Integration of GRID and Web Services? GRID: pre-defined applications; computational and data storage; a range of outputs; control functions. Web Services: initiation; exploitation; discovery.
Virtual Data Grid [Diagram repeated from the Conceptual Architecture section]
Example GIS Data Service (WFS) [Diagram repeated from the Conceptual Architecture section]
Going forward: Model desired outputs (are these already well known?). Know the current storage and processing baseline; identify possible GRID applications. System architecture (what goes where): Data Models, Web Services, Registries. Build something and learn! Deployed services are the performance benchmark.
Infrastructure: Grants will allow us to work out WHAT to do. Services require a long-term home. Services drive other people's business plans! So we need to address both in parallel.