ORCA-Registry v1.0 Documentation

Document History
James Blanden, 13 November 2007, Version 1.0: Initial document.

Table of Contents
Background
Document Scope
Requirements
Overview
Functional Specification
Public Access
Search
Browse
View Registry Object
Web Services
Administration
List Data Sources
Add Data Source
View Data Source
Edit Data Source
Delete Data Source
Brief Technical Overview
System Requirements
File Structure
Database
Schemata

ORCA-Registry v1.0 Documentation 13 November 2007 Page 1 of 19
Background
The Online Research Collections Australia (ORCA)-Registry software is a product of the Australian Partnership for Sustainable Repositories (APSR) Collection Services and Infrastructures (COSI) initiatives of 2007. The ORCA-Registry is a PHP/PostgreSQL web application designed to be housed within an instance of the COSI-Framework. More information about ORCA can be found at the APSR website http://www.apsr.edu.au/orca/index.htm.

Document Scope
This document describes the ORCA-Registry software only. It does not describe the specifics of the software framework required to house it (see the COSI-Framework documentation for more information) or issues relating to the governance of an instance of the software, nor does it provide guidelines or information for those who wish to provide data to an instance of the software. Installation instructions are provided in the software download, and so are not repeated here. A general knowledge of web application design and technologies is assumed.

Requirements
Requirements to be met by the ORCA-Registry web application software were defined at a high level in background and project documentation. As development progressed, the requirements became better understood and more clearly defined; they are summarised below:

- The registry data structures are to be based on those described in (and the schema provided by) ISO 2146 - Registry Services for Libraries and Related Organisations (ISO TC46 SC4 Working Draft, 13 December 2005), available as a Microsoft Word document at http://www.nla.gov.au/wgroups/iso2146/n197.doc.
- Define an XML schema (based on the above) for the import, export, and interchange of data into, out of, and between instances of the software.
- Provide for the import of conforming XML data over HTTP.
- Provide for the export of conforming XML over HTTP, thus enabling the aggregation/federation of registries and providing data for utilisation by external services.
- Provide public access to a web-based user interface for searching and browsing (discovery) of data in the registry.
- Provide restricted access to a web-based user interface for complete administration of the registry by the owners of the registry, so that they can manage the sources of data from which the registry will be populated.
- Provide restricted access to a web-based user interface for partial administration of a defined set of records in the registry by the owners of the sources of data from which the registry will be populated, so that they can manage some aspects of the registry's interaction with their data.
Overview
The ORCA-Registry is a PHP/PostgreSQL web application designed to be housed in an instance of the COSI-Framework (see the COSI-Framework documentation for more information). Figure 1 (below) provides a conceptual overview of the application design.

Figure 1: Overview of the ORCA-Registry.
The registry is a database containing Registry Object records, along with some information to support the gathering of this data from data providers via their Data Sources. Each Registry Object record belongs to one of four classes, describing a Collection, Service, Party, or Activity. These records are defined using the ORCA-Registry Data Interchange Schema (an XML schema based on the one provided in the ISO 2146 draft standard). The registry software will only import, store, and export a subset of the data that can be described using the ORCA-Registry Data Interchange Schema. This subset is referred to as the supported set (for the convenience of data providers, the supported set is also defined in a separate XML schema).

The basic operation of the registry software can be described as follows. An organisation has data (usually, though not necessarily, housed in a repository of some description) that it wishes to make discoverable by users of the registry. The data providers request that the owners of the registry configure their organisation's repository as a Data Source in the registry. The registry owners configure the Data Source: they issue a key (unique within the registry, and taking the form of the organisation's reverse domain name), set a URI at which to find the data that the data providers will expose, and create roles that allow the data providers to manage and test their Data Source against the registry. The data providers then need to provide XML conforming to the ORCA-Registry Data Interchange Schema (without going beyond the supported set) at the URI set for their Data Source; this forms a Repository Interface for the Metadata Export box in Figure 1. The data can then be imported into the registry, where it will be discoverable by search and browse, and exposed by web services.
Functional Specification

Public Access
The ORCA-Registry provides public access to functionality that supports the discovery, exchange, and further utilisation of the data that it contains, by users and by external software.

Search
The Search activity provides a simple form for searching the registry. The form provides filtering by Registry Object type (using checkboxes) and a text field for entering a search term or phrase. The Search will look for a case-insensitive match of the entered text across Registry Object names, keys, identifiers, and subjects. Any resulting matches will be filtered according to which types (Collections, Services, Parties, and Activities) have been included in the search. A checked checkbox next to a type indicates that objects of that type will be included in the results. By default all of the checkboxes are checked, so objects of all types will be included in the results.

The search does not look separately at the words within a search phrase; it will only return objects in which it can find the exact search phrase (ignoring case). For example, a search for 'the australian' (without the quotes) would return objects with the name 'The Australian National University', but would not return objects with the name 'An Australian Story', while a search for 'australian' would return both objects.

Figure 2 (below) shows an example of the Search activity displaying results for a search.

Figure 2: ORCA-Registry, Search example.

If more results are returned than can reasonably be displayed on one page, the bottom of the page will provide navigation between pages of results, as shown in Figure 3 (below).

Figure 3: ORCA-Registry results pagination example.
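The matching rule described above (a case-insensitive search for the whole phrase across names, keys, identifiers, and subjects, followed by filtering on the checked types) can be sketched in Python as follows. This is an illustration only, not the registry's PHP/PostgreSQL implementation: the RegistryObject class, its field names, and the lowercase type labels are hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a Registry Object record.
@dataclass
class RegistryObject:
    key: str
    object_type: str  # "collection", "service", "party" or "activity"
    names: list[str] = field(default_factory=list)
    identifiers: list[str] = field(default_factory=list)
    subjects: list[str] = field(default_factory=list)


# By default every type checkbox is checked.
ALL_TYPES = frozenset({"collection", "service", "party", "activity"})


def search(objects, phrase, types=ALL_TYPES):
    """Return objects in which the whole phrase occurs (ignoring case)
    in a name, the key, an identifier or a subject, limited to the checked types."""
    needle = phrase.casefold()
    matches = []
    for obj in objects:
        searchable = [obj.key, *obj.names, *obj.identifiers, *obj.subjects]
        if any(needle in value.casefold() for value in searchable):
            matches.append(obj)
    # Filter the matches by the types that were included in the search.
    return [obj for obj in matches if obj.object_type in types]


# Usage with the two names from the example above.
anu = RegistryObject("au.edu.anu/1", "party", names=["The Australian National University"])
story = RegistryObject("au.example/2", "collection", names=["An Australian Story"])
print([o.key for o in search([anu, story], "the australian")])  # ['au.edu.anu/1']
print([o.key for o in search([anu, story], "australian")])  # ['au.edu.anu/1', 'au.example/2']
```

Because the phrase is treated as a single substring, 'the australian' does not match 'An Australian Story' even though both words appear in other records.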
Any search results will consist of a list of records, including the following information for each record (where available):

Names. A list of all recorded names for the record, consisting of all name values for the record (excluding name role and name part type), delimited by a middle dot character (·) and hyperlinked to the View Registry Object activity for that record.
Type Key. Registry Object type key.
Source. The title of the Data Source from which the data was collected.
Identifiers. A list of all recorded globally recognised identifiers for the record, delimited by a middle dot character.
Relations. A list of all recorded relations that the Registry Object has to other Registry Objects collected from the same Data Source, consisting of relation type: value pairs delimited by a middle dot character and hyperlinked to the View Registry Object activities for the related records.
Subjects. A list of all recorded subjects for the record, consisting of subject scheme: value pairs delimited by a middle dot character.
Description. All of the recorded descriptions for the record, consisting of description role: value pairs, truncated to fewer than 300 characters.

Browse
The Browse activity provides an alternative means of finding information within the registry. Using the Browse, the entire contents of the registry are accessible by following hyperlinks for the type of Registry Object, and then for an alphabetical categorisation of record names. Records with a name beginning with the word "The" will be listed under T as well as under the first letter of the first word following "The". All Browse results are displayed in the same manner as the Search results.

Because the Browse is navigated via hyperlinks and exposes all records, it also provides a means for external search engines to crawl and index all of the data, which can then be discovered with whatever tools those search engines offer.

Figure 4 (below) shows an example of the Browse activity.

Figure 4: ORCA-Registry, Browse example.
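The Browse letter rule can be sketched as follows. The helpers browse_letters and browse_index are hypothetical. The sketch assumes that a category is the upper-cased first character of a name, and that "The" is matched case-insensitively as a whole word, so "Theatre" is listed only under T. The document does not say how names beginning with digits or punctuation are handled.

```python
def browse_letters(name: str) -> set[str]:
    """Return the Browse letter categories under which a record name is listed."""
    words = name.split()
    if not words:
        return set()
    letters = {words[0][0].upper()}
    # A leading word "The" also lists the record under the first letter
    # of the word that follows it (assumed case-insensitive, whole word only).
    if words[0].casefold() == "the" and len(words) > 1:
        letters.add(words[1][0].upper())
    return letters


def browse_index(names):
    """Group names by Browse letter; a name may appear under two letters."""
    index = {}
    for name in names:
        for letter in browse_letters(name):
            index.setdefault(letter, []).append(name)
    return {letter: sorted(group) for letter, group in sorted(index.items())}


print(browse_index(["The Australian National University", "An Australian Story", "Theatre Programs"]))
```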
View Registry Object
The View Registry Object activity displays all of the data held in the registry for a Registry Object; that is, all of the data for a Registry Object within the supported set collected from a Data Source. The data is displayed as follows:

Type. The heading for the record is the Registry Object type of the record.
Key. The Registry Object key.
Source. The title of the Data Source from which the data was collected.
Identifiers. A list of all recorded globally recognised identifiers for the record, delimited by line breaks.
Relations. A list of all recorded relations that the Registry Object has to other Registry Objects collected from the same Data Source, consisting of relation type: value pairs delimited by line breaks and hyperlinked to the View Registry Object activities for the related records.
Physical Addresses. A list of all recorded physical addresses for the record, consisting of sequences of address parts in the form part type: value (the part type being optional), delimited by line breaks.
Electronic Addresses. A list of all recorded electronic addresses for the record, consisting of sequences of address parts in the form part type: value (the part type being optional), delimited by line breaks.
Subjects. A list of all recorded subjects for the record, consisting of subject scheme: value pairs delimited by line breaks.
Description. All of the recorded descriptions for the record, consisting of description role: value pairs delimited by line breaks.
Activity Type. The Activity type.
Group Type. The Group type.
Person Type. The Person type.
Collection Type. The Collection type.
Service Type. The Service type.
Figure 5 (below) shows an example of the View Registry Object activity.

Figure 5: ORCA-Registry, View Registry Object example.
Web Services
The Web Services activity displays information about, and links to, the data export services provided by an instance of the registry. As well as supporting the transfer and aggregation of data between and among multiple instances of the ORCA-Registry software, these services provide a means for external (or "downstream") services to utilise the data (as per the overview diagram in Figure 1).

The Get Data Source List service retrieves a list of the Data Sources from which the registry is configured to gather data. The response (described by the ORCA-Registry Data Source List Schema) provides the key for each Data Source, along with URIs that can be used to retrieve either all of the data that the registry has gathered from that source (the instancedatauri) or the data directly from the source itself (the sourcedatauri).

The Get Registry Objects service retrieves all of the supported set of data (described by the ORCA-Registry Data Interchange Schema) that this registry has gathered from the identified Data Source.

Figure 6 (below) shows an example of the Web Services activity.

Figure 6: ORCA-Registry, Web Services example.
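A downstream service might read the Get Data Source List response along the lines sketched below, using Python's standard xml.etree.ElementTree module. Only instancedatauri and sourcedatauri come from the description above. The dataSources, dataSource and key element names, the absence of a namespace, and the sample URIs are all hypothetical, so the ORCA-Registry Data Source List Schema should be consulted for the real structure.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class DataSourceEntry(NamedTuple):
    key: str
    instance_data_uri: str  # data the registry has gathered from the source
    source_data_uri: str    # data retrieved directly from the source itself


def parse_data_source_list(xml_text: str) -> list[DataSourceEntry]:
    """Extract one entry per (hypothetical) dataSource element."""
    root = ET.fromstring(xml_text)
    return [
        DataSourceEntry(
            key=ds.findtext("key", default=""),
            instance_data_uri=ds.findtext("instancedatauri", default=""),
            source_data_uri=ds.findtext("sourcedatauri", default=""),
        )
        for ds in root.iter("dataSource")
    ]


# Hypothetical response; structure and URIs are assumptions for illustration.
SAMPLE = """<dataSources>
  <dataSource>
    <key>au.edu.example</key>
    <instancedatauri>http://registry.example.org/orca/objects?source=au.edu.example</instancedatauri>
    <sourcedatauri>http://repository.example.edu.au/orca-export.xml</sourcedatauri>
  </dataSource>
</dataSources>"""

for entry in parse_data_source_list(SAMPLE):
    print(entry.key, entry.instance_data_uri, entry.source_data_uri)
```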
Administration
The ORCA-Registry installation creates two functional roles in the COSI-Framework: ORCA Administrator and ORCA Data Source Administrator. Members of the ORCA Administrator functional role will have access to all administrative activities and all Data Source records. Members of the ORCA Data Source Administrator functional role will not be able to add or delete Data Source records, and will only have access to Data Source records that are owned by organisational roles of which they are a member.

List Data Sources
The List Data Sources activity displays a list of all of the Data Sources from which the registry is configured to collect data. ORCA Data Source Administrators will only see Data Sources that are owned by organisational roles of which they are a member. The list can be filtered by title using the form at the top of the page. If the list contains only one record, the user will be taken directly to the View Data Source activity for that record.

Figure 7 (below) shows an example of the List Data Sources activity.

Figure 7: ORCA-Registry, List Data Sources example.
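The visibility rules for the two functional roles, the title filter, and the single-record shortcut can be sketched as follows. The role names come from the text above. The DataSource class, its owner_role field, the case-insensitive substring title filter, and the decision to show nothing to users holding neither role are assumptions made for illustration.

```python
from dataclasses import dataclass

ORCA_ADMIN = "ORCA Administrator"
ORCA_DS_ADMIN = "ORCA Data Source Administrator"


@dataclass
class DataSource:
    key: str
    title: str
    owner_role: str  # organisational role that owns the record (hypothetical field)


def visible_data_sources(sources, user_roles, title_filter=""):
    """Data Sources a user may see, optionally filtered by title."""
    if ORCA_ADMIN in user_roles:
        visible = list(sources)
    elif ORCA_DS_ADMIN in user_roles:
        # Only records owned by organisational roles the user belongs to.
        visible = [s for s in sources if s.owner_role in user_roles]
    else:
        visible = []  # assumption: no access without either functional role
    needle = title_filter.casefold()
    return [s for s in visible if needle in s.title.casefold()]


def list_or_view(visible):
    """A single result goes straight to View Data Source; otherwise show the list."""
    if len(visible) == 1:
        return ("view", visible[0].key)
    return ("list", [s.key for s in visible])


sources = [
    DataSource("au.edu.alpha", "Alpha University Repository", "Alpha Library"),
    DataSource("au.edu.beta", "Beta Institute Archive", "Beta Library"),
]
print(list_or_view(visible_data_sources(sources, {ORCA_DS_ADMIN, "Alpha Library"})))  # ('view', 'au.edu.alpha')
print(list_or_view(visible_data_sources(sources, {ORCA_ADMIN})))  # ('list', ['au.edu.alpha', 'au.edu.beta'])
```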
Add Data Source
The Add Data Source activity (only accessible to ORCA Administrators) displays a form for adding a new Data Source to the registry. The form indicates that key, title, and URI are mandatory.

The key must be unique within an instance of the framework, and it is recommended that it take the form of the reverse domain name of the organisation for which the Data Source is being configured. Before they are imported into the registry, all Registry Objects at this Data Source will be tested to make sure that their key is prefixed by the Data Source key; objects that fail this test won't be imported. This ensures that all records in the registry have a unique key.

The URI is the location of the data (conforming to the ORCA-Registry Data Interchange Schema). If the URI is not yet known, just enter http:// as a placeholder.

Figure 8 (below) shows an example of the Add Data Source activity.

Figure 8: ORCA-Registry, Add Data Source example.
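The form rules above can be sketched as a simple validation step. The helpers reverse_domain_key and validate_new_data_source are hypothetical. Because the reverse-domain form is recommended rather than enforced, the sketch only suggests a key from a domain name. It also treats the http:// placeholder as a valid, non-empty URI.

```python
PLACEHOLDER_URI = "http://"  # accepted when the real URI is not yet known


def reverse_domain_key(domain: str) -> str:
    """Suggest a Data Source key from a domain name, e.g. example.edu.au -> au.edu.example."""
    labels = domain.strip().strip(".").lower().split(".")
    return ".".join(reversed(labels))


def validate_new_data_source(key: str, title: str, uri: str, existing_keys: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the form can be saved."""
    problems = [
        f"{label} is mandatory"
        for label, value in (("Key", key), ("Title", title), ("URI", uri))
        if not value.strip()
    ]
    # The key must be unique within the instance.
    if key.strip() and key.strip() in existing_keys:
        problems.append(f"Key {key.strip()!r} is already in use")
    return problems


print(reverse_domain_key("example.edu.au"))  # au.edu.example
print(validate_new_data_source("au.edu.example", "Example Repository", PLACEHOLDER_URI, set()))  # []
```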
View Data Source
The View Data Source activity displays detailed information for a Data Source. This information includes details of the last interaction between the registry and the Data Source (the last run). Figure 9 (below) shows an example of the View Data Source activity for a Data Source that has had no interactions with the registry since it was created.

Figure 9: ORCA-Registry, View Data Source example.

Clicking the Edit button will take the user to the Edit Data Source activity for this record. Clicking the Delete button (only available to ORCA Administrators) will take the user to the Delete Data Source activity for this record. The Test and Import buttons control interaction between the registry and the Data Source. The Clear button removes all records collected from this Data Source from the registry.
Clicking the Test button will cause the registry to attempt a connection to the Data Source at its configured URI, and then to validate the data contained in the response against the ORCA-Registry Data Interchange Schema. The results of this test will be displayed at the bottom of the page, but will not be recorded. This provides a test of the Data Source that ORCA Data Source Administrators can use to assist in the development of their interfaces, or in troubleshooting a Data Source, without risk of modifying data in the registry. Figure 10 (below) shows an example of the View Data Source activity displaying the results of a test of the Data Source.

Figure 10: ORCA-Registry, View Data Source, Test example.
Clicking the Import button will cause the registry to attempt a connection to the Data Source at its configured URI, validate the data in the response against the ORCA-Registry Data Interchange Schema, and then import the Registry Objects contained in the data into the registry. Each Registry Object described in the data is checked to ensure that its Registry Object key is prefixed by the Data Source key (to make sure that it will be unique in the registry); any existing Registry Object in the registry with the same key is deleted; and the Registry Object is inserted into the registry. The results of this activity are recorded, and the page is updated to show them. Figure 11 (below) shows an example of the View Data Source activity displaying the results of an import run against the Data Source.

Figure 11: ORCA-Registry, View Data Source, Import example.
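The per-object steps of an import run (prefix check, delete any existing object with the same key, insert) can be sketched as follows, with a Python dict standing in for the PostgreSQL tables. Schema validation is assumed to have already succeeded. The function name and record layout are hypothetical, and the prefix test is a plain string-prefix comparison, as the description above states.

```python
def import_registry_objects(registry: dict, data_source_key: str, incoming: list[dict]) -> dict:
    """Import already-validated Registry Objects from one Data Source.

    registry maps Registry Object keys to stored records (hypothetical layout).
    """
    imported, rejected = [], []
    for obj in incoming:
        key = obj["key"]
        # Only keys prefixed by the Data Source key are accepted, keeping keys unique.
        if not key.startswith(data_source_key):
            rejected.append(key)
            continue
        registry.pop(key, None)  # delete any existing object with the same key
        registry[key] = {**obj, "data_source": data_source_key}  # insert the new object
        imported.append(key)
    # The run results that would be recorded and shown on the page.
    return {"imported": imported, "rejected": rejected}


registry = {"au.edu.example/1": {"key": "au.edu.example/1", "name": "Old name", "data_source": "au.edu.example"}}
run = import_registry_objects(
    registry,
    "au.edu.example",
    [{"key": "au.edu.example/1", "name": "New name"}, {"key": "org.other/9", "name": "Foreign"}],
)
print(run)  # {'imported': ['au.edu.example/1'], 'rejected': ['org.other/9']}
```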
Clicking the Clear button will remove all Registry Objects collected from the Data Source from the registry. Figure 12 (below) shows an example of the View Data Source activity displaying the results of clearing the registry of objects collected from the Data Source.

Figure 12: ORCA-Registry, View Data Source, Clear example.
Edit Data Source
The Edit Data Source activity provides a form for modifying a Data Source. Only ORCA Administrators will be able to modify the Record Owner attribute. The key cannot be changed; if there is an error in the key, the record must be deleted and a new one with the correct key created using the Add Data Source activity. After the changes have been saved successfully, the user will be taken to the View Data Source activity for that record. Clicking the Cancel button will take the user back to the View Data Source activity for this record (without saving any changes). Figure 13 (below) shows an example of the Edit Data Source activity.

Figure 13: ORCA-Registry, Edit Data Source example.
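The two Edit Data Source rules (an unchangeable key and an administrator-only Record Owner) might be enforced on save roughly as follows. The dict-based record, the record_owner field name, and the use of PermissionError are assumptions made for illustration.

```python
def apply_edit(source: dict, changes: dict, is_orca_admin: bool) -> dict:
    """Return an updated copy of a Data Source record, enforcing the edit rules."""
    # The key can never change; a wrong key means delete and re-add the Data Source.
    if changes.get("key", source["key"]) != source["key"]:
        raise PermissionError("The key cannot be changed; delete the Data Source and add it again.")
    # Only ORCA Administrators may change the Record Owner.
    current_owner = source.get("record_owner")
    if changes.get("record_owner", current_owner) != current_owner and not is_orca_admin:
        raise PermissionError("Only ORCA Administrators can modify the Record Owner.")
    return {**source, **changes}  # the original record is left untouched


source = {"key": "au.edu.example", "title": "Example Repository", "record_owner": "Example Library"}
print(apply_edit(source, {"title": "Example Research Repository"}, is_orca_admin=False)["title"])
```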
Delete Data Source
The Delete Data Source activity prompts for confirmation that a Data Source is to be deleted. Clicking the Delete button will remove all Registry Objects collected from the Data Source from the registry, and then delete the Data Source itself from the registry. Clicking the Cancel button will take the user to the View Data Source activity for that record (without deleting any records). Figure 14 (below) shows an example of the Delete Data Source activity.

Figure 14: ORCA-Registry, Delete Data Source example.
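Clear and Delete share the same first step: removing every Registry Object collected from the Data Source. This can be sketched as follows, with Python dicts standing in for the database. The function names and the data_source field linking each stored object to its source are hypothetical.

```python
def clear_data_source(registry: dict, data_source_key: str) -> int:
    """Remove every Registry Object collected from one Data Source; return how many."""
    # Collect the keys first so the dict is not modified while iterating over it.
    doomed = [key for key, obj in registry.items() if obj["data_source"] == data_source_key]
    for key in doomed:
        del registry[key]
    return len(doomed)


def delete_data_source(data_sources: dict, registry: dict, data_source_key: str) -> int:
    """Clear the Data Source's objects first, then remove the Data Source record."""
    removed = clear_data_source(registry, data_source_key)
    del data_sources[data_source_key]
    return removed


data_sources = {"au.edu.example": {"title": "Example Repository"}, "au.edu.other": {"title": "Other Archive"}}
registry = {
    "au.edu.example/1": {"data_source": "au.edu.example"},
    "au.edu.example/2": {"data_source": "au.edu.example"},
    "au.edu.other/1": {"data_source": "au.edu.other"},
}
print(delete_data_source(data_sources, registry, "au.edu.example"))  # 2
```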
Brief Technical Overview

System Requirements
The ORCA-Registry is a PHP/PostgreSQL web application built to utilise the COSI-Framework (an installation of which is a prerequisite). System and web browser requirements are the same as those for the COSI-Framework (see the COSI-Framework documentation for more information).

File Structure
The ORCA-Registry file structure is similar to that of the COSI-Framework within which it is housed. Activities are defined in files at the application root, and within folders that are not prefixed with an underscore (_). Folders that are prefixed with an underscore are used to hold function libraries, help content, etc. Function library files are prefixed with "orca" to prevent name clashes with other files included in the COSI-Framework. Figure 15 (below) shows the ORCA-Registry file structure.

Figure 15: ORCA-Registry file structure.

The file orca_init.php at the root of the application contains the environment settings for an installed instance of the ORCA-Registry, namely the locations of the schemata that are to be used by the instance. The ORCA-Registry defines some Cascading Style Sheets (CSS) styles for its own use in orca.css; these are included in the response by orca_init.php via the COSI-Framework API. The installation process makes changes to the application_config.php and database_env.php files in the housing COSI-Framework.
Database
The registry database (named dbs_orca) contains just under one hundred tables, plus supporting views and user-defined functions. The table structure has been designed to hold the complete set of data described by the ORCA-Registry Data Interchange Schema, though the current version of the application only uses the tables required for the supported set of this data. The accompanying document ORCA-Registry.1.0.DatabaseSchema.pdf shows the complete table structure, and identifies (via orange highlight) all of the fields that the application uses. All access to data is via parameterised calls to user-defined functions.

Schemata
The ORCA-Registry relies on three XML schemata:

ORCA-Registry Data Interchange Schema. Defines the structure of an XML document containing data for import into, or export from, an instance of the ORCA-Registry.
ORCA-Registry Data Interchange Schema (supported set). Defines the subset of elements from the ORCA-Registry Data Interchange Schema that is currently supported by the application (i.e. the subset of data that will be imported, stored, displayed, and exported). This is provided for use by developers in exposing repository data, as it is smaller and easier to read than the complete schema. Finalised interfaces should not reference this schema; they should reference the complete schema.
ORCA-Registry Data Source List Schema. Defines the structure of the XML document listing the Data Sources in an instance of the ORCA-Registry, including URIs for retrieving data from either the registry or the original Data Source itself.

These schemata are provided with the installation files. For stand-alone use, the schemata can be located on the same web server as the ORCA-Registry, and the registry configured to utilise the files at that location. The XML from Data Sources configured in that instance must then also reference the schemata at that location in its xsi:noNamespaceSchemaLocation attribute.
For use in a federation, the schemata can be located in an agreed location, and all member instances of the ORCA-Registry can be configured to utilise the schemata at that location. In this case, the XML from all Data Sources configured within the member registries would also reference the schemata at that location.

The accompanying document orca.xsd.pdf shows the XML schema for the ORCA-Registry Data Interchange Schema, and identifies (via yellow highlight) the supported subset.
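A registry owner or data provider could check which schema location a Data Source document references, for example to confirm that it matches the stand-alone or federation location. The Python sketch below reads the attribute with the standard xml.etree.ElementTree module. The schema URI and the registryObjects root element name are hypothetical. The sketch only inspects the reference; it does not validate the document against the schema.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree exposes namespaced attributes in {namespace}localname form.
NO_NAMESPACE_SCHEMA_LOCATION = f"{{{XSI_NS}}}noNamespaceSchemaLocation"


def schema_location(xml_text: str):
    """Return the xsi:noNamespaceSchemaLocation of the root element, or None."""
    return ET.fromstring(xml_text).get(NO_NAMESPACE_SCHEMA_LOCATION)


def references_schema(xml_text: str, expected_uri: str) -> bool:
    """True if the document points at the expected (configured) schema location."""
    return schema_location(xml_text) == expected_uri


SCHEMA_URI = "http://registry.example.org/orca/schemata/orca.xsd"  # hypothetical agreed location
DOCUMENT = f"""<registryObjects
    xmlns:xsi="{XSI_NS}"
    xsi:noNamespaceSchemaLocation="{SCHEMA_URI}"/>"""

print(references_schema(DOCUMENT, SCHEMA_URI))  # True
```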