Catalogue or Register? A Comparison of Standards for Managing Geospatial Metadata

Gerhard JOOS and Lydia GIETLER

Abstract

Publishing information items of any kind for discovery purposes is becoming increasingly important in every kind of business process. Several standards supporting this functionality have been developed by different standardization organizations. A standard for generic registry services was developed by OASIS and later adopted by ISO. Almost simultaneously, OGC developed a catalogue service interface specification which was ambiguous and led to two different, non-interoperable profiles. The OASIS/ISO standard and the two OGC specifications are compared with respect to the requirements for managing different types of geospatial metadata. It is shown that one of the two OGC specifications, namely the CSW-ISO profile, supports only ISO 19115 and ISO 19119 metadata. This is problematic because, as soon as additional kinds of metadata such as feature catalogues are to be provided, an additional content model and interface have to be defined. This leads not only to a variety of unnecessary interfaces but also to additional implementation costs. Since the geo-community often uses the terms catalogue and register as synonyms, the authors provide definitions for a clear distinction. A register is basically a catalogue with additional management mechanisms for versioning, security, and traceability. Since the register is the more powerful concept, with the catalogue as a true subset, it is recommended to support the registry content models with the required extensions and generic registry interfaces, in harmony with the enterprise IT infrastructure.

1 Introduction

Different kinds of metadata are required to support modelling, management, analysis and portrayal of geospatial information. In the classical view, metadata is used to describe geospatial datasets and to make their availability publicly known through a catalogue for data discovery.
Other meta information, such as the definition of a feature type, the parameters and values of coordinate reference systems, the rules for the cartographic presentation of a certain feature, or the meaning of quality values, should likewise be publicly available. GIS and other information systems should be able to reference these items. The definition of, e.g., a feature type shall not be changed without a strong reason, and the old definition shall remain accessible even if it has been superseded.

The Organization for the Advancement of Structured Information Standards (OASIS) has developed standards for managing arbitrary items. These standards have been forwarded to and accepted by the International Organization for Standardization (ISO) as an international standard. In the Open Geospatial Consortium (OGC), a specification for managing specific items has been developed. A managed list of items is sometimes referred to as a catalogue and sometimes as a register, but there are significant differences between the two. The way their content and their interfaces are specified is important for the developer of the service and has consequences for the user. This paper discusses how the relevant standards are related and where they differ. Implementation cost is not considered.

2 Geospatial Metadata

In the context of geospatial information systems, several kinds of metadata have to be managed either in catalogues or in registers. Not all of these kinds of items have a spatial property. The following paragraphs examine, for each kind of metadata, whether it bears spatial characteristics.

The ISO 19115 metadata standard provides elements to describe geospatial features or datasets as collections of features. A geospatial feature or a geospatial dataset has a geographic extent or a location. The extent or the location is useful when data for a particular area is sought.

The ISO 19119 standard for service metadata provides elements to describe services that deal with geospatial properties, e.g. Web Map Services and Web Feature Services. Even if the services themselves do not have a spatial extent, their content does. For that reason service metadata must be capable of holding spatial properties.

An ISO 19126 feature concept dictionary provides basic definitions and related information about a set of concepts that may be used to describe geographic features and that may be shared across multiple application areas. Elements from a feature concept dictionary may be re-used in one or more feature catalogues. This abstraction has no geometry.
The ISO 19110 methodology for feature cataloguing specifies how feature types can be organized into a feature catalogue and presented to the users of a set of geographic data. Like feature concept dictionaries, this standard deals only with the type level, so no instances with a spatial extent are needed.

Portrayal rules and symbols as defined in ISO 19117 have no spatial extent per se. However, some portrayal rules may need to describe geometric or topological relationships between feature instances in order to portray special geometric constellations. Symbols do have geometry, but it is not related to a coordinate reference system.

Coordinate reference systems (CRS) as defined in ISO 19111 are required to give meaning to any coordinate in a geospatial dataset. The systems are mathematically defined by the parameters of geodetic datums and projections. Most CRS have a limited area of validity. In order to manage this spatial property in a register, geospatial data types have to be provided.

ISO 19138 data quality measures are the primitives for any quality description. They are used for any quantitative quality evaluation and its result. Some data quality measures refer to the spatial properties of features, but the measures themselves do not have any spatial property.
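The classification made in this section can be summarized in a small data structure. The following Python sketch is only an illustration: the class name MetadataKind, its fields, and the helper function are hypothetical and are not taken from any of the standards. The True/False flags simply restate the findings of the paragraphs above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataKind:
    """One kind of geospatial metadata (hypothetical illustration type)."""
    standard: str
    description: str
    has_spatial_property: bool


# Flags restate the discussion in section 2; they are not normative.
METADATA_KINDS = [
    MetadataKind("ISO 19115", "dataset and feature metadata", True),
    MetadataKind("ISO 19119", "service metadata", True),
    MetadataKind("ISO 19126", "feature concept dictionary", False),
    MetadataKind("ISO 19110", "feature catalogue", False),
    MetadataKind("ISO 19117", "portrayal rules and symbols", False),
    MetadataKind("ISO 19111", "coordinate reference systems", True),
    MetadataKind("ISO 19138", "data quality measures", False),
]


def standards_by_spatial_property(kinds, spatial):
    """Return the standards whose items do (or do not) carry a spatial property."""
    return [k.standard for k in kinds if k.has_spatial_property == spatial]


spatial = standards_by_spatial_property(METADATA_KINDS, True)
non_spatial = standards_by_spatial_property(METADATA_KINDS, False)
print("need spatial data types:", spatial)
print("no spatial property:    ", non_spatial)
```

Under this reading, three of the seven kinds need spatial data types in the managing system, while four do not.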
3 Catalogues and Registers

Registers are defined in ISO 19135:2005, procedures for item registration, as a set of files containing identifiers assigned to items with descriptions of the associated items. In contrast, no clear definition of a catalogue can be found in the standards. The OGC Abstract Specification (OGC, 1999) attempts some explanations, but these are ambiguous and contradictory.

The content of catalogues and registers is metadata describing and/or summarizing associated resources. Both are designed to support discovery and retrieval of the associated resources. The big difference is that the content of a register is authoritative. This means that only items accepted by an authorized register control body can be recorded in the register. This requires a register to support a number of additional functions, as described in ISO 19135:

A well-defined registration process must be followed. This requires security policies to be enforced at several levels.

It must always be possible to find out who made which changes to the register and when. This functionality is called audit trail.

Superseded register items shall not be replaced; instead, a new version shall be added and the old version shall be marked as retired. This requires the register to support versioning of register items.

Given these requirements, the ISO 19135 definition of a register, after a slight revision, actually describes a catalogue, and the new catalogue definition can then be used to derive a clear definition of a register. A catalogue is a set of metadata entities which contain identifiers and descriptions of associated resources. A register is an authoritative catalogue.

4 Services for Catalogues and Registers

Both catalogues and registers are maintained in information systems.
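The requirements that distinguish a register from a catalogue in section 3 (changes by authorized parties only, an audit trail, and versioning in which superseded items are retired rather than replaced) can be illustrated with a minimal in-memory sketch. All class, method, and field names below are hypothetical and chosen for illustration only; they are not taken from ISO 19135 or from any registry implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ItemVersion:
    version: int
    description: str
    status: str = "valid"  # becomes "retired" once superseded


@dataclass(frozen=True)
class AuditEntry:
    who: str
    when: datetime
    what: str


class Register:
    """Authoritative catalogue: only the control body may change it."""

    def __init__(self, control_body):
        self.control_body = set(control_body)
        self.items = {}        # identifier -> list of ItemVersion, oldest first
        self.audit_trail = []  # list of AuditEntry

    def register(self, user, identifier, description):
        # Security policy: reject changes from unauthorized users.
        if user not in self.control_body:
            raise PermissionError(f"{user} may not change the register")
        versions = self.items.setdefault(identifier, [])
        # Versioning: keep the old version, but mark it as retired.
        if versions:
            versions[-1].status = "retired"
        new_version = ItemVersion(len(versions) + 1, description)
        versions.append(new_version)
        # Audit trail: record who made which change, and when.
        self.audit_trail.append(AuditEntry(
            user, datetime.now(timezone.utc),
            f"registered {identifier} v{new_version.version}"))
        return new_version

    def current(self, identifier):
        return self.items[identifier][-1]

    def history(self, identifier):
        return list(self.items[identifier])


reg = Register(control_body={"alice"})
reg.register("alice", "Road", "A way for vehicles")
reg.register("alice", "Road", "A paved way for vehicles")
print(reg.current("Road"))
print([(v.version, v.status) for v in reg.history("Road")])
```

A plain catalogue, in this picture, would be the same dictionary of descriptions without the authorization check, the retained history, and the audit trail.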
While an information system maintaining registers is called a registry (ISO 19135:2005), there is no special name for a catalogue information system. Access to the content of catalogues and registers is provided by catalogue services and registry services, respectively.

A service interface for catalogue services is defined in the OpenGIS Catalogue Services Specification (OGC, 2007a). This specification is abstract and describes several types of implementations based on different network protocols. The most popular of these is the catalogue service for the web (CSW), which uses HTTP as its network protocol. OGC developed two non-interoperable profiles of CSW: the ISO metadata application profile (CSW-ISO) and the ebRIM application profile of CSW (CSW-ebRIM). The latter is also called the CSW-ebRIM registry service. A set of generic standards for registry services is provided by the ebXML Registry Specifications, also known as ebXML RegRep, initially developed by OASIS and later adopted by ISO.

CSW-ISO uses a combination of a subset of elements from ISO 19115 and a subset of elements from ISO 19119 as its fixed catalogue information model (OGC, 2007b). The standard does not allow the information model to be adapted to other types of geospatial metadata.

CSW-ebRIM uses OASIS ebRIM version 3 as the underlying registry information model (OGC, 2007c). ebRIM (ebXML Registry Information Model) is part 3 of ebXML (electronic business eXtensible Markup Language), a modular suite of specifications enabling enterprises to conduct business over the Internet (ISO/TS 15000:2004-3; OASIS 2005a). ebRIM defines a registry information model for managing digital artefacts stored in repositories. It provides a mechanism for extending the model to support different types of repository content by developing extension packages. CSW-ebRIM introduces the basic extension package, which defines a number of additional elements and taxonomies for describing geographic information services (OGC, 2007d). This includes the definition of some spatial and temporal data types, which are not supported natively in ebRIM. Additional extension packages are needed for each kind of content.

ebXML RegRep is short for a system implementing ebRIM and ebRS (ebXML Registry Services and Protocols). ebRS is part 4 of ebXML and defines the registry service interface. It has been developed as the native interface to ebRIM and therefore has a normative reference to ebRIM (OASIS, 2005b). It uses the same extension mechanism as CSW-ebRIM in order to support different types of resources. In this context extensions are called profiles.

OASIS ebXML version 2 has been adopted by ISO as technical specification ISO/TS 15000, which consists of 6 parts. The revision of this standard is taking place this year, and it is expected that OASIS ebXML version 3 will become the new ISO standard without major technical changes. For this reason, and because OASIS is already working on version 4, we discuss ebXML RegRep version 3 in the following paragraphs.

Registry and catalogue services may support two different ways of querying entries based on their values: ad hoc queries and stored queries (OGC, 2007c; OASIS 2005b). An ad hoc query is a query that incorporates a filter encoded in a syntax supported by both client and server.
In this case the client builds the filter when needed, which gives a lot of flexibility, especially if the filter syntax supports joins. A stored query is a named, parameterized query provided by the service to the client through an interface. In this case, the list of search parameters is fixed and the client can only send values for those parameters.

5 Comparison

CSW-ISO in general can only be used for maintaining catalogues. Although CSW-ebRIM claims to be a registry service, it supports neither audit trail nor version management. ebXML RegRep, on the other hand, does not natively support spatial data types and spatial query syntax, but provides appropriate extension mechanisms.

Table 1 shows a detailed comparison of the relevant functionality described in section 3 and the standards for catalogues and registries. "Not supported" indicates that a standard does not provide the functionality and that there is no extension mechanism. "Supported" means that implementing the functionality is optional. "Mandatory" indicates that the functionality must be supported by a system implementing the standard. "Extensible" means that the standard does not mention the feature, but provides an extension mechanism which can be used to support the functionality.

A comparison of the standards with the types of geospatial metadata described in section 2 shows that CSW-ISO only supports cataloguing of ISO 19115 and ISO 19119 metadata. These types as well as the other types of geospatial metadata can be registered in both CSW-ebRIM and ebXML RegRep. In all cases, ebRIM extension packages are required in order to support core queryables, which differ from type to type. An extension package developed for a certain type of metadata can be used for both CSW-ebRIM and ebXML RegRep. An OGC discussion paper describing an extension package for ISO 19115 and ISO 19119 metadata is already available (OGC, 2007e). A draft of an extension package for feature types is under development by OGC.

Table 1: Comparison of functionality and the catalogue and registry standards

Functionality                CSW-ISO        CSW-ebRIM      ebXML RegRep
Registration process         not supported  mandatory      mandatory
Version management           not supported  not supported  mandatory
Audit trail                  not supported  not supported  mandatory
Security policy enforcement  not supported  supported      mandatory
Spatial datatypes            supported      mandatory      extensible
Spatial query syntax         supported      mandatory      extensible
Stored queries               not supported  supported      supported

6 Conclusion and Discussion

From the comparison of the different specifications in section 5 it can be concluded that the CSW-ISO profile does not support items that are not of a directly spatial nature. On the other hand, as shown in section 2, the majority of metadata kinds have no spatial dimension. Due to the fixed content model of CSW-ISO, a new content model has to be defined for each additional catalogue. This leads not only to a variety of unnecessary interfaces but also to additional implementation costs.

Neither CSW-ebRIM nor ebXML RegRep supports all the functionality required for managing registers as defined in section 3. Adapting CSW-ebRIM to fulfil these requirements means adding functionality for audit trail and version management. Adjusting ebXML RegRep is only necessary if the type of metadata requires it, as ISO 19115 and ISO 19119 metadata do. If this is the case, the provided extension mechanisms may be used to develop an appropriate ebRIM profile as well as an ebRS profile which extends the existing query syntax by spatial operators or makes a spatial query language mandatory, such as OGC Filter Encoding, which is also used by CSW-ebRIM.
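The difference between the two query styles defined in section 4 can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions, not an implementation of CSW, ebRS, or OGC Filter Encoding: extents are plain bounding boxes, the ad hoc filter is an arbitrary Python predicate standing in for a filter syntax shared by client and server, and all names (BBox, Entry, stored_query_by_extent, the sample entries) are hypothetical. In the stored query, the set of parameters is fixed and one parameter value names the spatial operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        return not (self.east < other.west or other.east < self.west
                    or self.north < other.south or other.north < self.south)

    def within(self, other):
        return (other.west <= self.west and self.east <= other.east
                and other.south <= self.south and self.north <= other.north)


@dataclass(frozen=True)
class Entry:
    identifier: str
    extent: BBox


ENTRIES = [
    Entry("austria-roads", BBox(9.5, 46.4, 17.2, 49.0)),
    Entry("salzburg-buildings", BBox(12.9, 47.7, 13.2, 47.9)),
    Entry("iceland-glaciers", BBox(-24.5, 63.3, -13.5, 66.6)),
]


def ad_hoc_query(entries, predicate):
    """Ad hoc query: the client builds an arbitrary filter at run time."""
    return [e.identifier for e in entries if predicate(e)]


# Operators the service chooses to expose through its stored query.
SPATIAL_OPERATORS = {"intersects": BBox.intersects, "within": BBox.within}


def stored_query_by_extent(entries, operator, area):
    """Stored query: fixed parameters; 'operator' selects the spatial test."""
    if operator not in SPATIAL_OPERATORS:
        raise ValueError(f"unsupported spatial operator: {operator}")
    test = SPATIAL_OPERATORS[operator]
    return [e.identifier for e in entries if test(e.extent, area)]


area = BBox(12.0, 47.0, 14.0, 48.5)
print(stored_query_by_extent(ENTRIES, "intersects", area))
print(stored_query_by_extent(ENTRIES, "within", area))
print(ad_hoc_query(ENTRIES, lambda e: e.extent.intersects(area)
                   and e.identifier.endswith("roads")))
```

The client of the stored query only needs to know the parameter names and the allowed operator values, whereas the client of the ad hoc query must be able to construct filters itself.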
Besides this, spatial query syntax can be simulated by means of stored queries in which the value of a parameter indicates the spatial operator. In this case, a client does not have the flexibility to create filters at run time, but this is not really a limitation compared to CSW-ebRIM, as OGC Filter Encoding does not support joins. The support of stored queries is even an advantage, as the implementation of a client becomes less complex.

Similar arguments hold for enterprises where geospatial information is only one component of a service-oriented architecture supporting business processes. An implementation compliant with CSW-ISO or with CSW-ebRIM is not interoperable with other registers in such enterprises. Hence a workflow for searching any kind of data via the same interfaces, regardless of whether it is geospatial or not, is broken. This is especially regrettable because the European initiative INSPIRE, which aims to establish a homogeneous spatial data infrastructure for all European governments, currently prefers the CSW-ISO profile in the proposed implementation rules, and hence such enterprises have to support two interfaces.

References

OASIS (2005a): ebXML Registry Information Model. Available: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=regrep (4.5.2008)
OASIS (2005b): ebXML Registry Services and Protocols. Available: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=regrep (4.5.2008)
OGC (1999): The OpenGIS Abstract Specification Topic 13: Catalogue Services. Available: http://portal.opengeospatial.org/files/?artifact_id=20555 (4.5.2008)
OGC (2005): OpenGIS Filter Encoding Implementation Specification. Available: http://portal.opengeospatial.org/files/index.php?artifact_id=8340
OGC (2007a): OpenGIS Catalogue Services Specification. Available: http://portal.opengeospatial.org/files/?artifact_id=20555 (4.5.2008)
OGC (2007b): OpenGIS Catalogue Services Specification 2.0.2 - ISO Metadata Application Profile. Available: http://portal.opengeospatial.org/files/?artifact_id=20596
OGC (2007c): CSW-ebRIM Registry Service - Part 1: ebRIM profile of CSW. Available: http://portal.opengeospatial.org/files/?artifact_id=27092 (4.5.2008)
OGC (2007d): CSW-ebRIM Registry Service - Part 2: Basic extension package. Available: http://portal.opengeospatial.org/files/?artifact_id=27093 (4.5.2008)
OGC (2007e): Cataloguing of ISO Metadata (CIM) Using the ebRIM profile of CS-W. Available: http://portal.opengeospatial.org/files/index.php?artifact_id=20596 (4.5.2008)
ISO 15000-3:2004, Electronic business eXtensible Markup Language (ebXML) - Part 3: Registry information model specification (ebRIM)
ISO 15000-4:2004, Electronic business eXtensible Markup Language (ebXML) - Part 4: Registry services specification (ebRS)
ISO 19110:2005, Geographic information - Methodology for feature cataloguing
ISO 19110:2005/FDAM 1, Geographic information - Methodology for feature cataloguing, Amendment 1
ISO/FDIS 19111, Geographic information - Spatial referencing by coordinates
ISO 19117:2005, Geographic information - Portrayal
ISO/CD 19126:2007, Geographic information - Feature concept dictionaries and registers
ISO 19135:2005, Geographic information - Procedures for item registration
ISO/TS 19138, Geographic information - Data quality measures