Nick Chin, Lauren Merrill, Olena Zubaryeva

This document is a response to the Boston Public Library's request for a review of systems for requesting, processing, and delivering digital images. An analysis of four systems is included below. Fedora and Greenstone are open-source systems. ContentDM and DigiTool are proprietary products.

Fedora

Fedora stands for Flexible Extensible Digital Object Repository Architecture. It is an open-source digital repository system based on digital objects composed of datastreams. The latest version of Fedora was released on August 13th, 2007 and is available for download.

History

Fedora's roots lie in early 1990s research, funded by DARPA (Defense Advanced Research Projects Agency), on defining and networking digital objects. The University of Virginia developed this work further as a solution for managing complex digital content. Fedora is currently developed and supported by the Fedora Project, a collaboration between the University of Virginia and Cornell University. Current funding for the project runs through 2007 and is provided by a grant from the Andrew W. Mellon Foundation. The Fedora community's online home is soon to be moved to a new address.

Digital Objects

The Fedora white paper defines a digital object as an aggregation of content items in which each content item maps to a representation. Datastreams are the primary components of a digital object and can be digital content, metadata, disseminators (which display content), or relationships to other objects. The system locates digital objects through unique persistent identifiers (PIDs) and tracks them through system-defined object properties. Because Fedora is a service housed on a web server, users access datastreams via URL and are presented with content through a standard web browser. Fedora digital objects are represented in the file system as files in an open XML format.
Because of this architecture, a digital object can house any type of data (video, text, audio) or metadata. For example, one digital object could contain a photo in one datastream and Dublin Core metadata in another; a digital object could just as easily contain a series of hundreds of photos (in a corresponding number of datastreams) with one set of metadata; or a digital object could contain pointers to a number of other digital objects, each of which contains an individual photo and its own metadata. This is what is meant by "flexible": Fedora provides an architecture based on digital objects that can serve as the basis for any kind of structure and can contain any kind of content, so long as it can be delivered digitally through the web.
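The three arrangements described above can be made concrete with a small model. The following is a minimal sketch only: the class names, field names, sample PIDs, file names, and the URL pattern are all assumptions made for illustration, and none of them is taken from Fedora's own code or documentation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datastream:
    """One component of a digital object: content, metadata, or a pointer."""
    ds_id: str      # identifier unique within the object, e.g. "IMG1"
    mime_type: str  # e.g. "image/jpeg" or "text/xml"
    location: str   # hypothetical file name, or the PID of another object


@dataclass
class DigitalObject:
    """An aggregation of datastreams located by a persistent identifier."""
    pid: str
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def add(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds

    def datastream_url(self, base_url: str, ds_id: str) -> str:
        # Illustrative URL pattern only; not taken from Fedora documentation.
        if ds_id not in self.datastreams:
            raise KeyError(f"{self.pid} has no datastream {ds_id!r}")
        return f"{base_url.rstrip('/')}/{self.pid}/{ds_id}"


# Arrangement 1: one photo plus one Dublin Core record.
single = DigitalObject(pid="demo:1")
single.add(Datastream("IMG", "image/jpeg", "photo.jpg"))
single.add(Datastream("DC", "text/xml", "photo-dc.xml"))

# Arrangement 2: many photos sharing one metadata record.
album = DigitalObject(pid="demo:2")
for n in range(1, 101):
    album.add(Datastream(f"IMG{n}", "image/jpeg", f"photo{n}.jpg"))
album.add(Datastream("DC", "text/xml", "album-dc.xml"))

# Arrangement 3: an object holding only pointers to other objects.
collection = DigitalObject(pid="demo:3")
for target in (single, album):
    collection.add(Datastream(f"REL-{target.pid}", "text/plain", target.pid))

print(single.datastream_url("http://example.org/repository", "IMG"))
```

The same two classes cover all three cases, which is the practical meaning of the flexibility claim: the structure lives in how datastreams are combined, not in separate object types.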
Workflow

The following review of administrative workflow comes from the SherpaDP review of the Fedora architecture. The interface provides the following functions to the Fedora administrator:

Ingest in Fedora - Fedora digital objects can be encoded in different XML formats for ingest and export. Objects can be ingested and exported in either the Fedora Object XML (FOXML) format or the Metadata Encoding and Transmission Standard (METS). Fedora provides an extension of the METS 1.2 schema that adds a few attributes to support Fedora. The ingest source can be another Fedora archive or a directory.
Search/Browse - The administrator can search the repository by date of creation, title, and object ID.
Purge/Remove Object - Purging an object completely and permanently removes it from the repository.
Exporting Object from Fedora - The administrator has the option to export one object, or multiple objects by type. Export writes the object out in a metadata format but does not export the datastreams associated with it.

In addition, the following actions can be taken with datastreams: add, modify, withdraw, delete, purge, get, and get history. Fedora keeps track of a datastream's version history in order to log and track changes for preservation purposes.

Reviews

Fedora has gained publicity primarily through its use by larger institutions in the creation of digital libraries and through papers published by its developers at Cornell and the University of Virginia. Fedora Commons contains a Top 10 list of reasons to use Fedora. They include: the ability to store whatever you want; the ability to express relationships using pointer datastreams; an extensible framework that can ease integration with existing systems; and support for preservation. Much of the available literature about Fedora is anecdotal. While it is generally positive, it is difficult to draw definitive conclusions from it.
However, one 2006 evaluation comparing Fedora with Greenstone, CDSware, and EPrints found Fedora fully capable of the claims made in the Top 10 list: it earned full scores in the preservation, standards support, and metadata categories. The evaluation also showed some weaknesses. Fedora scored the lowest in the content management category (4.5 out of 10) due to a lack of submission support and review and of notification of submission status. Fedora also showed a lack of support for automated content acquisition, harvesting, and metadata generation. Overall, Fedora placed third, with Greenstone scoring the highest.

Greenstone

Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and
the Human Info NGO. It is open-source software, available under the terms of the GNU General Public License.

The Greenstone digital library software is a comprehensive, open-source system for constructing, presenting, and maintaining information collections. Collections can be built from HTML documents, Word, PDF and PostScript documents, images in various formats, MP3 and MIDI audio, MARC records and more. For each collection, a variety of full-text search indexes and metadata-based browsers can be created. There is support for Dublin Core, OAI, and METS, and collections can be exported to and imported from DSpace.

Greenstone's librarian interface allows users to gather together sets of documents, import or assign metadata, build them into a Greenstone collection, and serve it from their web site. It supports seven basic activities: opening an existing collection or defining a new one; copying documents into it, with metadata attached (if any); mirroring documents from the Web if required; enriching the documents by adding further metadata to individual documents or groups; designing the collection by determining its appearance and the access facilities it will support; building it using Greenstone; and previewing the newly created collection from the Greenstone home page.

The interface explicitly supports four levels of user proficiency: Library Assistants, who can add documents and metadata to collections, and create new ones whose structure mirrors that of existing collections; Librarians, who can, in addition, design new collections, but cannot use specialist IT features (e.g. regular expressions); Library Systems Specialists, who can use all design features, but cannot perform troubleshooting tasks (e.g. interpreting debugging output from Perl scripts); and Experts, who can perform all functions.

Collections built with Greenstone automatically include effective full-text searching and metadata-based browsing facilities that are attractive and easy to use.
They are easily maintainable and can be rebuilt entirely automatically. Searching is full-text, and different indexes can be constructed (including metadata indexes). Browsing uses hierarchical structures that are created automatically from metadata associated with the source documents. Collections can include text, pictures, audio, and video. The interface to collections can be extensively customized. Documents can be in any language, and the interface itself has been translated into about thirty languages. Although primarily designed for Web access, collections can be made available, in precisely the same form, on CD-ROM or DVD. The system is extensible: software "plug-ins" accommodate different document and metadata types. The Greenstone software runs under Unix, Windows and Mac OS X, and is issued as source code under the GNU General Public License. Users with programming skills can extend and tailor the system extensively.
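The idea of a browsing hierarchy derived automatically from document metadata can be illustrated in a few lines. This is a sketch of the general technique only, not Greenstone's own classifier code; the function name, the field names, and the sample records are made up for the example.

```python
from typing import Dict, List, Sequence


def build_browse_tree(documents: List[Dict[str, str]],
                      fields: Sequence[str]) -> dict:
    """Nest documents under the values of the given metadata fields.

    Each field adds one level to the hierarchy; the innermost level maps
    to a list of document titles. Missing values fall under "Unknown".
    """
    if not fields:
        raise ValueError("at least one metadata field is required")
    tree: dict = {}
    for doc in documents:
        node = tree
        # Walk (and create) one level per field, except the last.
        for name in fields[:-1]:
            node = node.setdefault(doc.get(name, "Unknown"), {})
        # The last field holds the list of titles.
        node.setdefault(doc.get(fields[-1], "Unknown"), []).append(doc["Title"])
    return tree


# Hypothetical records standing in for a collection's source metadata.
records = [
    {"Title": "A", "Subject": "Maps", "Year": "1890"},
    {"Title": "B", "Subject": "Maps", "Year": "1900"},
    {"Title": "C", "Subject": "Photographs", "Year": "1900"},
    {"Title": "D", "Subject": "Photographs"},  # no Year recorded
]

browse = build_browse_tree(records, ["Subject", "Year"])
print(browse)
```

A hierarchy built this way is only as consistent as the metadata behind it: two spellings of the same subject produce two separate branches.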
Users creating collections with Greenstone should keep in mind that the system places no control on the format of the metadata that is entered. For example, it is possible to enter values for the Date field in different formats (such as 10/7/2007 or October 7th, 2007). A controlled vocabulary and rules for metadata format should therefore be established beforehand. The system also allows some or all metadata fields to be skipped, so the specialists describing the collection materials should be trained in advance to minimize errors and ambiguities when entering data in Greenstone collections.

ContentDM

CONTENTdm is a digital content management system owned by the company DiMeMa, Inc. It follows common standards, specifically Z39.50 through the program ZCOTENT, which allows access to digital collections on CONTENTdm servers from library portals and local catalogs and can be downloaded for free from the CONTENTdm User Support Center. The software also comes with the Library of Congress Thesaurus for Graphic Materials I, and allows the creation or import of other controlled vocabularies. CONTENTdm supports audio, video, PDF, and image files. It also supports multi-sided objects. CONTENTdm claims to support other file types, but I could not find a listing that names any. CONTENTdm also includes image rights tools within its software.

CONTENTdm works by giving each user a personal project space within the CONTENTdm Acquisition Station on their desktop. When image acquisition and metadata entry are complete within a user's project, the items are uploaded to a centralized pending queue for final review or editing by a collection administrator before being added to the collection. This would give BPL the division of work processes that it is looking for. CONTENTdm also allows objects to be added singly or via batch processing.
The software also allows metadata templates to be created and used to speed up and standardize the entry of metadata, namely descriptive and administrative metadata. Users can apply image settings to items as they are imported into a project, or make global changes to metadata within live collections. The program also gives users options for working with high-resolution JPEG or TIFF files, ranging from automatically creating lower-resolution display copies (with adjustments to image quality and legibility) to using custom display copies that the user creates. As the BPL currently stores data about its files in Microsoft Excel, a perk of CONTENTdm is the ability to import data from existing tools such as Access, Excel, and FileMaker Pro. CONTENTdm is server-based, and collections can be administered from remote desktops through Web browsers, so there can be separate stations for the different work processes involved. For the Boston Public Library, this would allow the multiple departments and individuals involved in the digitization process to access the servers.
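Before spreadsheet data is imported into any of these systems, it can be checked against an agreed metadata template. The sketch below shows one way to do that for a CSV file exported from Excel. It does not use or represent any CONTENTdm feature or API: the required field names, the accepted date formats, the US month/day reading of dates such as 10/7/2007, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical template: fields every record must fill in.
REQUIRED_FIELDS = ("Title", "Date", "Creator")
# Date formats accepted on input (assumes US month/day order).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no format matches."""
    # Drop ordinal suffixes so "October 7th, 2007" parses like "October 7, 2007".
    cleaned = re.sub(r"(\d)(st|nd|rd|th)\b", r"\1", raw.strip())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def check_rows(csv_text: str) -> Tuple[List[Dict[str, str]], List[tuple]]:
    """Split rows into accepted records and (row, problems) rejections."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        problems = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        iso_date = normalize_date(row.get("Date") or "")
        if "Date" not in problems and iso_date is None:
            problems.append("Date (unrecognized format)")
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append({**row, "Date": iso_date})
    return accepted, rejected


# Made-up sample rows standing in for an Excel export.
SAMPLE = (
    "Title,Date,Creator\n"
    "Photo A,10/7/2007,Unknown\n"
    'Photo B,"October 7th, 2007",Unknown\n'
    "Photo C,,Unknown\n"
)

accepted, rejected = check_rows(SAMPLE)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Running a check like this before import means that format decisions are made once, in the template, rather than by each person entering data.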
One claim that CONTENTdm makes, and which is worth noting, is that search results are returned in less than a second in a collection of more than a million objects. It claims to do this without any degradation of speed as the collection grows.

In terms of sharing its collection, CONTENTdm would allow the BPL to add its digital objects to OCLC WorldCat and to expose them through the Open Archives Initiative (OAI), which would allow for metadata harvesting. CONTENTdm follows version 2.0 of the OAI Protocol for Metadata Harvesting. CONTENTdm also employs Dublin Core and the Visual Resources Association (VRA) Core, which provide a common standard for describing media; however, CONTENTdm does allow collection administrators to add their own field descriptions if needed. From the Web perspective, CONTENTdm uses XML for all internal structure description and also allows a custom XML export of metadata.

As far as pricing goes, CONTENTdm offers packages at seven levels, priced according to collection size. Pricing includes a CONTENTdm server (installed on a single system), acquisition station software for up to 50 workstations (although it should be noted that this can only be installed on Windows machines), the search client and web templates, and the JPEG2000 and OCR extensions. Technical support is included in the price. DiMeMa also provides contact information for organizations that have similar types of collections and environments, which allows for sharing ideas and issues down the line. The minimum price of the software is $9,800 with an annual maintenance fee of $1,900, while the maximum price is $49,800 with an annual maintenance fee of $8,900. CONTENTdm will also provide installation assistance and on-site training, if needed, for one day at a cost of $2,500 plus travel expenses.

DigiTool

As the BPL has already purchased DigiTool, it is worth looking at whether this software can be used.
DigiTool is a modular program that allows an institution to manage its digital assets through creation, management and preservation. DigiTool supports text, images, audio and video through its modular approach. DigiTool claims to address the multiple needs, functions and workflows specific to the life cycle of a digital object. Specifically, the workflows/modules are broken down by function, including ingesting, metadata editing, collection management and system administration. DigiTool works through the DigiTool Repository, which stores and manages the digital objects and associated metadata. Whereas metadata is stored in the Repository's Oracle-based database, uploaded objects are stored in a secure network file system (NFS) or on remote systems accessed via URLs. A standard Web services (SOAP) layer enables the Repository to interact with the other DigiTool modules as well as with local or third-party systems. The repository can be searched by object metadata or by the full text of documents. Viewing privileges can be established by the institution to control access and are customizable. It should be mentioned that DigiTool is a product of ExLibris and much of its end-user
management is based on coordination with the ExLibris system. The DigiTool architecture is based on technologies such as Web services (SOAP), XML, XSD, XSL, ODBC, Unicode, and JPEG. As the system has already been purchased by the BPL, I have left out pricing; annual maintenance is not listed in any of the supporting materials until a license agreement has been established. BPL may already have established this with DigiTool, and it can be referenced later on.
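Since DigiTool pricing is not available, only the CONTENTdm figures quoted earlier can be projected over time. The following is a minimal sketch of that arithmetic. The five-year horizon and the assumption that maintenance is charged in every year, including the first, are mine and are not taken from any vendor material; travel expenses for training are excluded because no figure is given.

```python
def total_cost(license_fee: int, annual_maintenance: int, years: int,
               training_days: int = 0, training_day_rate: int = 2500) -> int:
    """License fee plus maintenance for each year plus optional training.

    Assumes maintenance is charged every year, starting in year one.
    Travel expenses for on-site training are not included.
    """
    if years < 0 or training_days < 0:
        raise ValueError("years and training_days must not be negative")
    return (license_fee
            + annual_maintenance * years
            + training_days * training_day_rate)


YEARS = 5  # hypothetical planning horizon

lowest = total_cost(9_800, 1_900, YEARS)    # minimum package
highest = total_cost(49_800, 8_900, YEARS)  # maximum package

print(f"Lowest level over {YEARS} years:  ${lowest:,}")
print(f"Highest level over {YEARS} years: ${highest:,}")
```

Under these assumptions the lowest level comes to $19,300 and the highest to $94,300 over five years, so maintenance accounts for nearly half of the total in both cases.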
Sources

Bodhmage, Kirti (2005, December 1). Fedora architectural review. Retrieved October 1, 2007, from SherpaDP Web site.
Boston College Libraries. DigiTool Implementation and Documentation. Retrieved October 5, 2007, from ol/digitool.html
ExLibris. DigiTool Overview. Retrieved October 5, 2007, from ExLibris Web site.
Fedora Development Team (2005). Fedora Open Source Repository Software. Retrieved October 7, 2007, from Fedora Web site: WhitePaper.pdf
CONTENTdm. Frequently Asked Questions. Retrieved October 5, 2007.
Goh, D, Chua, A, Khoo, D, Khoo, E, & Mak, E (2006). A checklist for evaluating open source digital library software. Online Information Review, 30. Retrieved October 2, 2007, from Library, Information Science, and Technology Abstracts Database.
Greenstone Digital Library. Retrieved October 7, 2007.
Greenstone Digital Library General Documentation. Retrieved October 7, 2007, from umentation
Johnston, Leslie (2005). Development and assessment of a public discovery and delivery interface for a Fedora repository. D-Lib Magazine, 11. Retrieved October 4, 2007, from n.html
Lagoze, C, Payette, S, Shin, E, & Wilper, C (2006). Fedora: an architecture for complex objects and their relationships. International Journal on Digital Libraries, 6. Retrieved October 3, 2007, from Library, Information Science, and Technology Abstracts Database.
Lally, Ann (2007). University of Washington Libraries Digital Collections. D-Lib Magazine, 13. Retrieved October 6, 2007.
OCLC. CONTENTdm for digital collections. Retrieved October 5, 2007.