Arts Image Database - Specifications
Sean Dooley
June 6, 2007
Contents
1 Introduction
2 Project Outline
3 User Requirements
3.1 User Types
4 Functional Requirements
5 Non Functional Specifications
5.1 Platforms and Server Environments
5.2 External Standards
6 Deployment Hardware
7 Security
7.1 User Authentication
7.2 Image Access
7.3 File Access
8 Risk Analysis
List of Figures
1 Anonymous user functions
2 Departmental user functions
3 Departmental administrator functions
4 Site administrator functions
5 Deployment Hardware
1 Introduction

This document is meant to provide an informal specification for the functionality provided by the Arts Image Database system currently under development. The Arts Image Database is being built as a central repository of images for all the Schools, mainly those in the Arts Faculty. Users will be able to add images to the repository when they have image collections that need to be managed. They will also be able to browse the images already in the system when searching for suitable teaching and research materials. It is intended that users with sufficient access permissions will be able to download images in different formats and resolutions, depending on the intended use.

Questions and comments are welcome. Please send all email correspondence to both Sean Dooley (srd@st-andrews.ac.uk) and Birgit Plietzsch (bp0@st-andrews.ac.uk).
2 Project Outline

The tasks involved in the development of the system can be broken down as follows.

Specification: This document is the first step towards a full specification of the system's functionality. In a general sense, we need to specify what the system will do and how it will do it.

Database Design and Build: A PostgreSQL relational database needs to be designed and built to store all image data and to permit searching and retrieval of images.

Interface Design and Development: Interface design will run in parallel with database design. An interface has to be built to provide access to the data stored within the PostgreSQL database. The interface will be web based, using PHP on an Apache web server to give access to the image repository.

Other Repository Integration: Integration with other image repositories (such as the AHDS system) will need to be considered in conjunction with the interface design and development.

User Testing and Feedback: A period of user testing and feedback is essential to ensure the system meets as many of the desired functional requirements as possible.

Modifications: Following user testing and feedback, any necessary changes to the site will have to be implemented.

Deployment and Testing: Once the system is fully functional within the test environment, it will be deployed to the live environment, where a small set of users can start using it.

Final Bug Fixing: In response to user tests of the final system, any remaining bugs will have to be fixed.

Full Deployment: Following successful bug fixes, the site will be fully deployed, with all users being given access.

Several of these steps will often take place in parallel.
3 User Requirements

3.1 User Types

At this stage we are providing for four main user types:

Guest Users: Guest users may be from anywhere in the world. Guest users can view any collections or images that are marked for public view.

University Departmental Users: Departmental users will normally be staff of the University who have been given access to view and download particular sets of images.

University Departmental Administrators: Departmental administrators will normally be staff or students of the University who are able to add and remove images in their own collections on the system.

Site Administrators: Site administrators will have access to all collections, and will be able to add and remove users from the system.

We hope to integrate some existing collections from within and outwith the University into the system. The requirements of these collections will have to be addressed on a case-by-case basis, so other user types may need to be added in the future.
4 Functional Requirements

The following use case diagrams show the basic functionality available to each type of user. As the specification is developed, these diagrams will become far more complex.
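As a rough illustration, the functions drawn in Figures 1 to 4 could be encoded as a lookup from each user type in section 3.1 to the use cases available to it. The Python sketch below is not part of the planned PHP implementation. The identifiers are hypothetical names for the diagram labels. The diagrams do not say whether site administrators also inherit the departmental functions, so only the use cases actually drawn in Figure 4 are listed for them.

```python
from enum import Enum, auto


class UserType(Enum):
    """The four user types listed in section 3.1."""
    GUEST = auto()
    DEPARTMENTAL_USER = auto()
    DEPARTMENTAL_ADMIN = auto()
    SITE_ADMIN = auto()


# Hypothetical identifiers for the use cases drawn in Figures 1-4. The
# "Verify ... Access" use cases are omitted: they are checks the system
# performs, not functions a user invokes.
_BROWSING = frozenset({"browse", "search", "view_image_details"})
_GUEST = _BROWSING | {"view_public_collections"}                  # Figure 1
_DEPT_USER = _GUEST | {"view_departmental_collections",           # Figure 2
                       "download_resampled_image"}
_DEPT_ADMIN = _DEPT_USER | {"add_collection",                     # Figure 3
                            "modify_collection_access",
                            "edit_item_metadata",
                            "add_item_to_collection"}
_SITE_ADMIN = frozenset({"browse", "manage_images", "add_user",   # Figure 4
                         "delete_user", "modify_user_permissions"})

FUNCTIONS: dict[UserType, frozenset[str]] = {
    UserType.GUEST: _GUEST,
    UserType.DEPARTMENTAL_USER: _DEPT_USER,
    UserType.DEPARTMENTAL_ADMIN: _DEPT_ADMIN,
    UserType.SITE_ADMIN: _SITE_ADMIN,
}


def can_perform(user_type: UserType, function: str) -> bool:
    """Return True if the given user type is offered the named function."""
    return function in FUNCTIONS[user_type]


# Example: guests can search but cannot download resampled images.
print(can_perform(UserType.GUEST, "search"))                    # True
print(can_perform(UserType.GUEST, "download_resampled_image"))  # False
```

Keeping the role definitions in one table would make it simple to add the further user types anticipated in section 3.1 for external collections.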
Figure 1: Anonymous user functions (use case diagram). Actor: Anonymous User. Use cases: Browse Database, Search Database, View Public Access Collections, View Image Details, Verify Image Permits Anonymous Access.

Questions:
Figure 2: Departmental user functions (use case diagram). Actor: Departmental User. Use cases: Browse Image Database, Search Database, View Departmental / Anonymous Access Collections, View Image Details, Verify User's Access Permissions, Download Resampled Image.

Questions: Should departmental users have to be explicitly given access to their department's collections? Or should, for example, an Art History student automatically gain access to all Art History collections marked for departmental access?
Figure 3: Departmental administrator functions (use case diagram). Use cases: Browse Image Database, Search Database, View Departmental / Anonymous Access Collections, View Image Details, Verify User's Access Permissions, Download Resampled Image, Add Collection, Modify Collection Access, Add/Modify Item Metadata, Add Item (Image / Metadata) To Collection.

Questions:
Figure 4: Site administrator functions (use case diagram). Actor: Site Administrator. Use cases: Browse Image Database, Manage Images, Add User, Delete User, Modify User Permissions, Verify User's Access Permissions.

Questions:
5 Non Functional Specifications

5.1 Platforms and Server Environments

The system is spread across three different technologies: a file store, a PostgreSQL database, and an Apache 2 web server. High resolution images will be stored on the file store, in whatever format they are submitted. At this stage, the file store will probably be on the same server as the database. References to the files will be stored within the database, along with all metadata relating to the images. Some images (e.g. thumbnails and web-sized versions) may be stored within the database itself to improve system performance.

The front end will be built with PHP, and will act as an interface between the user and the database. Separating the interface from the storage engine makes the system as future-proof as possible. If the interface can be improved in some way, it can be modified or completely replaced without affecting the content within the database.

5.2 External Standards

Image Metadata: It is intended that all images will be stored with a corresponding metadata set that complies with the VRA Core 4 specification. VRA Core 4 allows the specification to be extended for individual cases, but this will be discouraged, as tailoring the system to such specific purposes is not sustainable as an ongoing practice. Rather, users will be encouraged to ensure that the data they use complies with the VRA Core 4 Restricted specification set.

HTML/CSS: All HTML and CSS will be coded to the W3C XHTML 1.0 Strict standard. This should mean that the site functions properly in all recent standards-compliant browsers. Users of older browsers will still be able to use the site, but with reduced visual finesse.

Accessibility: All aspects of the system must comply with the UK Disability Discrimination Act (1995). They must also comply with the relevant changes to the DDA made by the Special Educational Needs and Disability Act (2001) that specifically apply to educational establishments.
In accordance with this, every step will be taken to ensure compliance with the W3C Web Content Accessibility Guidelines (WCAG). To this end, all markup will be semantically accurate and content will be kept as separate from presentation as possible. Additionally, all content must be understandable and navigable without relying on graphical interfaces or colour alone. This will impose limits on the user interface to the database.

Many other web repositories are either not required to comply with the UK DDA, have chosen not to, or are unaware of their responsibilities. In practical terms, this means that people will sometimes request a feature for the database interface (perhaps one that exists on another site) that will have to be refused for legal reasons.
6 Deployment Hardware

Figure 5: Deployment Hardware (deployment diagram). The diagram shows three devices:
- a Linux webserver <<device>> with execution environments for Apache 2.x with PHP 5.x, the PostgreSQL 8.2 database engine, and local filesystem storage;
- a personal computer <<device>> running a browser, which sends client requests to the webserver over <<http>>;
- an LDAP server <<device>> (LDAP, Open Directory?) used for user authentication.

Notes:
7 Security

7.1 User Authentication

It is intended that users will be authenticated using their University logon. It appears at this stage that the permissions system for collection and image access will be too complex to be managed using LDAP groups. The database will therefore store its own access permissions at departmental and user level.

7.2 Image Access

All image access will be performed through the web server. Because different users will be allowed different levels of access, the server will have to check every request for an image against the user's access rights. For this reason, images will not be served directly by the web server. Instead, the web server will receive the request and check the access permissions. If the request is authorised, it will either open the file on the file store or retrieve the image binary data from the database, generate the image and output it to the browser. This is going to place a large load on the server, and may become a significant performance issue. It is nevertheless necessary if access is to be sufficiently secure.

7.3 File Access

As the images are to be stored on a file store, local filesystem privileges will have to be managed. Implementing this is the responsibility of the server administrator.
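The request flow described in 7.2 and the departmental and user-level permissions described in 7.1 could be modelled roughly as in the Python sketch below. It is illustrative only and makes several assumptions that this specification does not fix:
- the class and function names are hypothetical;
- collections are assumed to be either public or departmental;
- site administrators are assumed to bypass the check, following section 3.1;
- authentication against the University logon is assumed to have already happened;
- the resampling step is omitted.

The system itself is planned in PHP. Python is used here only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Optional


class Access(Enum):
    PUBLIC = "public"              # viewable by guest users
    DEPARTMENTAL = "departmental"  # needs a department- or user-level grant


@dataclass
class Collection:
    name: str
    access: Access
    departments: set[str] = field(default_factory=set)  # department-level grants
    users: set[str] = field(default_factory=set)        # user-level grants


@dataclass
class User:
    username: str                     # University logon, already authenticated
    department: Optional[str] = None
    is_site_admin: bool = False


@dataclass
class ImageRecord:
    collection: Collection
    file_path: Optional[Path] = None  # file held on the file store
    blob: Optional[bytes] = None      # binary data held in the database


def may_view(user: Optional[User], collection: Collection) -> bool:
    """Check a (possibly anonymous) user against a collection's permissions."""
    if collection.access is Access.PUBLIC:
        return True
    if user is None:                  # guest user: public collections only
        return False
    if user.is_site_admin:            # site administrators see all collections
        return True
    return (user.username in collection.users
            or user.department in collection.departments)


def fetch_image(user: Optional[User], image: ImageRecord) -> bytes:
    """Return image bytes only after the permission check has passed."""
    if not may_view(user, image.collection):
        raise PermissionError(f"access to {image.collection.name!r} denied")
    if image.blob is not None:        # e.g. thumbnail stored in the database
        return image.blob
    if image.file_path is not None:   # e.g. original stored on the file store
        return image.file_path.read_bytes()
    raise FileNotFoundError("no stored data for this image")
```

In this sketch, the `departments` field corresponds to the automatic departmental access option raised under Figure 2, and `users` corresponds to explicit grants; the two can coexist. Every image passes through `fetch_image`, so no direct file URLs are ever exposed. That is what provides the security, and also what produces the server load discussed in section 8.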
8 Risk Analysis

At this stage, the main risks identified in the system, and possible solutions, are:

Server Load: Image manipulation requires large amounts of processing power. It may turn out that the web server, where the processing will take place, simply does not have the resources to drive the site. If this is the case, there are a number of strategies that can be used to minimise server load, but all have associated costs.
- The best solution would be to set up a dedicated server: for a small financial and administrative cost, the result would be a much more capable system.
- One possible modification to the system would be to resample the images at upload time and save the resampled versions, which would greatly increase the required storage space.
- There are advantages to storing resampled images within the database, but this would require storage space and processor power on the database server.
- Storing them as files on the file store in a directly usable way would reduce the security of access to the images.

File Storage: Images can often take up large amounts of hard drive space. Initial estimates suggest that 1-2TB of storage space will be sufficient to begin with for the original, unresampled images entered into the system. We must bear in mind from the outset that this is an estimate, and that storage requirements will almost certainly increase in the future.

Drive Failure: Given the large quantities of data that will be accessed, and an as yet unknown frequency of access, drive failure must be considered a risk. We should therefore ensure that either a mirrored system is used or a stringent backup schedule is followed.
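Suppose the upload-time resampling option described under Server Load were adopted. Each upload would then need a fixed plan of derived sizes to generate and store. The Python sketch below computes such a plan. The preset names and pixel sizes are hypothetical placeholders, as this document does not fix them. The pixel resampling itself, which would be done by an image-processing library, is not shown.

```python
from typing import NamedTuple


class Size(NamedTuple):
    width: int
    height: int


# Hypothetical presets: longest edge in pixels for each derived version.
# The real sizes have not been decided in this specification.
PRESETS: dict[str, int] = {"thumbnail": 150, "web": 800}


def fit_within(original: Size, longest_edge: int) -> Size:
    """Scale so the longest edge is at most `longest_edge`, preserving aspect ratio.

    Uses integer arithmetic with round-half-up, and never enlarges an image
    that is already small enough.
    """
    longest = max(original.width, original.height)
    if longest <= longest_edge:
        return original

    def scale(dim: int) -> int:
        return max(1, (dim * longest_edge + longest // 2) // longest)

    return Size(scale(original.width), scale(original.height))


def derivative_plan(original: Size) -> dict[str, Size]:
    """Sizes of the versions that would be generated and stored at upload time."""
    return {name: fit_within(original, edge) for name, edge in PRESETS.items()}


print(derivative_plan(Size(4000, 3000)))
# {'thumbnail': Size(width=150, height=113), 'web': Size(width=800, height=600)}
```

Once real presets are agreed, a plan like this would also give a simple basis for estimating how much extra storage the derived versions would add to the initial storage estimate above.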