Database Preservation Toolkit: a flexible tool to normalize and give access to databases
José Carlos Ramalho, University of Minho
Luis Faria, KEEP SOLUTIONS Lda
Miguel Coutada, University of Minho
Hélder Silva, KEEP SOLUTIONS Lda

ABSTRACT

Digital preservation is emerging as an area of work and research that tries to provide answers that will ensure continued, long-term access to digitally stored information. IT platforms are constantly changing and evolving, and nothing can guarantee continued access to digital artifacts in their absence. This paper focuses on a specific family of digital objects: relational databases, the most frequent type of database used by organizations worldwide. The Database Preservation Toolkit enables the preservation of relational databases by holding the structure and content of the database in a preservation format, in order to provide access to the database information over the long term.

If on the one hand there is a need to migrate databases to the newer systems that appear with technological evolution, on the other hand there is also a need to preserve the information they hold for a long period, due to legal duties but also for archival reasons. That information must therefore remain available regardless of the database management system it came from. In this area, solutions are still scarce. The main products for relational database preservation include CHRONOS and SIARD. The first is, in most cases, out of reach due to its associated costs. The second is not really a product but a preservation format.
The main idea behind this work was to explore the main features and limitations of the existing products in order to improve db-preservation-toolkit ( db-preservation-toolkit/), a component extracted from the RODA project. db-preservation-toolkit was therefore improved with respect to performance and extended with new features in order to support more database management systems, address some features missing from the other products, support a new preservation format (SIARD) and provide an interface through which the information in the archived databases can be accessed and searched.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics - complexity measures, performance measures

Keywords
Digital Preservation, Databases, Migration, Significant Properties, Digital Object

1. INTRODUCTION

In the current paradigm of the information society, more than one hundred exabytes of data are already used to support our information systems [6]. The evolution of the hardware and software industry means that progressively more intellectual and business information is stored on computer platforms. The main issue lies exactly within these platforms. In the past there was no need for mediators to understand analogue artifacts; today, in order to understand digital objects, we depend on such mediators (computer platforms). In the eventual absence of appropriate mediators, who can guarantee the preservation of the digital artifacts? In other words, who has the responsibility to support the continuity of access to digital data [1]?

Regardless of where the concrete responsibilities lie, and considering that there is no generic solution, several researchers and research projects aim to face this problem. Although digital information can be preserved exactly in its original form simply by copying (preserving) the bits, the problem appears when we notice the very fast evolution of the platforms (hardware and software) on which the bits can be transformed into something humanly intelligible [4]. Digital archives and digital libraries are complex structures; without the software and hardware they depend on, human beings, or others, will certainly be unable to experience or understand them [3].

Our work addresses this issue of digital preservation and focuses on a specific class of digital objects: relational databases [4]. Relational databases are a very important piece in the global context of digital information, and it is therefore fundamental not to compromise their longevity (life cycle), nor their integrity, reliability and authenticity [8]. These kinds of archives are especially important to organizations because they can justify their activities and characterize the organization itself. Current studies claim that 90% of the information produced on a daily basis is stored in a relational database.

Currently, in this project, we aim to support more database formats on ingestion, more database preservation formats as AIPs, and new ways to explore the archived databases.

In the following section we describe the project context and its roots. Next we analyze the relational database class of objects; we should be able to completely characterize this type of digital object so that one may choose which aspects are important, valid or necessary for preservation. The subsequent section establishes the significant properties for the digital preservation of relational databases. The significant properties are addressed, individually and globally, at different levels of abstraction. At the end we draw some conclusions, specify the future work to be done and enumerate some questions that emerge from the research.

2. RODA: THE BEGINNING...
In mid 2006, the Portuguese National Archives (Directorate-General of the Portuguese Archives) launched a project called RODA (Repository of Authentic Digital Objects), aimed at identifying and bringing together all the technology, human resources and political support necessary to carry out long-term preservation of digital materials produced by the Portuguese public administration. One of the original goals of the RODA project was the development of a digital repository capable of ingesting, managing and providing access to the various types of digital objects produced by national public institutions. The development of such a repository was to be supported by open-source technologies and should, as much as possible, be based on existing standards such as the Open Archival Information System (OAIS) [2], METS [12], EAD [11] and PREMIS [7].

The OAIS model is composed of three top-level processes: ingest, administration and dissemination. In RODA we have specified the workflows for each of these processes. The ingest process takes care of new information added to the repository. This information is delivered by the producer as a Submission Information Package (SIP). The SIP structure had to be formally specified so that third-party institutions were able to communicate with the repository. During ingest, SIPs are transformed into Archival Information Packages (AIPs). The dissemination process takes care of consumer requests by transforming AIPs into Dissemination Information Packages (DIPs), a subset of the preserved information better suited for delivery to end users.

Currently RODA is capable of storing and giving access to the following types of digital objects: text documents, still images, relational databases, video, audio and emails.

Normalization plays an important role in RODA. It was not possible to archive every kind of text document or every kind of still image.
Even with databases, normalization was necessary, as each Database Management System (DBMS) had its own data model. So we had to take measures towards format normalization. Every digital object stored in RODA is subjected to a normalization process: text documents are normalized as PDF files; still images are converted to uncompressed TIFFs; relational databases are converted to DBML (Database Markup Language) [8].

The RODA project is divided into many different components and services, with Fedora Commons at the core of its framework. Fedora implements the common digital repository features, such as storage of digital objects and metadata and the ability to create relationships between objects. Fedora Commons also provides search capabilities by using the Lucene search engine under the hood. On top of that, we developed the RODA Core Services, i.e. the basic RODA services, which can be accessed programmatically. Finally, the RODA Web User Interface allows the end user to easily browse, search, access and administer stored information and metadata, and to execute ingest procedures and preservation and dissemination tasks.

In spite of all the effort invested in the development of RODA, there was still no support for real, active digital preservation. Once materials got into archival storage they remained untouched and, therefore, susceptible to technological obsolescence, especially at the format level.

At the same time, at the University of Minho, a project called CRiB (Conversion and Recommendation of Digital Object Formats) was being devised. This project aimed at assisting cultural heritage institutions, as well as ordinary users, in the implementation of migration-based preservation interventions. Its services included format converters, quality-assessment tools, preservation planning and automatic metadata production for retaining the authenticity of representations.
The CRiB system was developed as a Service Oriented Architecture (SOA) and is capable of providing the following set of services:

- File format identification;
- Recommendation of optimal migration options, taking into consideration the individual preservation requirements of each client institution or user;
- Conversion of digital objects from their original formats to more up-to-date encodings;
- Quality-control assessment of the overall migration process: data loss, performance and format suitability for long-term preservation;
- Generation of preservation metadata in PREMIS format to adequately document the preservation intervention and retain the objects' authenticity.

After obtaining supplementary funding to continue the development of RODA, the team decided to use CRiB as its preservation planning and execution unit. The RODA project follows a service-oriented architecture to facilitate parallel development and updates, and to allow heterogeneous technologies and platform independence between its various components. The CRiB project is also service-oriented, to allow the implementation of services that are only possible on specific platforms and technologies. This paper provides a description of both projects and of the integration of CRiB as one of RODA's components, allowing the use of its features for normalization processes during ingest, metadata generation, preservation planning and format migrations, and even dissemination services.

In this paper we focus on the digital preservation of databases. We raise the relevant questions on this topic and discuss the decisions we took in the past and the ongoing work.

3. PAST

Preserving digital data is a complex technological puzzle, and databases are one of the most complex digital object types to deal with. To simplify the problem we decided to address it by layers: data, structure and semantics. These layers match the significant properties of a database; they tell us what to preserve and how to measure the quality of the digital preservation strategy being followed. The data layer extracts the data and migrates it to the preservation format. The structure layer does the same with the database structure. The semantics layer deals with all the remaining database features that should be preserved.

Our first approach was to deal with the first two layers, the preservation of the database data and structure, i.e. the preservation of the database logical model. We developed a RODA component that extracts the first two layers from their specific database management environment (DBMS). Its first version used the DBML [8] neutral format for the representation of both the data and the structure (schema) of the database. This component was presented and demonstrated at the Open Planet workshop, Database Archiving, held at the Danish National Archives. During the workshop it became clear that more formats should be supported and that we should also change the preservation format.

Although there is no standard for a database preservation format, SIARD [10] is being adopted in several European institutions and projects and, compared to DBML, it already supports part of the semantics layer and has some scalability properties. So we decided to adopt it as a preservation format as well. Back then we also decided to support other DBMS formats, such as DB2, and other preservation formats, such as ADDML (used by Sweden, Norway and Finland), as input and output formats of our toolkit. In this way our toolkit will become a real interoperability tool.

4. OAIS, SIPS AND DATABASES

RODA follows the Open Archival Information System Reference Model (OAIS) [2]. OAIS identifies the main functional components that should be present in an archival system capable of performing long-term preservation of digital materials. The proposed model is composed of four principal functional units: Ingest, Data management, Archival storage and Access; and two additional units called Preservation planning and Administration. Figure 1 depicts how these functional units interact with each other and with all the stakeholders of the repository (internal and external).
Figure 1: RODA general architecture

4.1 Ingest process

The ingest process is responsible for accommodating new materials into the repository and takes care of every task necessary to adequately describe, index and store those materials. For example, at this stage the repository may transform submitted representations into normalized formats adequate for long-term preservation and request the user to add descriptive metadata to those objects to facilitate their future retrieval using the available search mechanisms. It is also common practice to store the original bit-streams of ingested materials together with the normalized version (just in case a more advanced preservation strategy comes along to rescue those old bits of information).

New entries come in packages called Submission Information Packages (SIPs). When the ingest process terminates, SIPs are transformed into Archival Information Packages (AIPs), i.e. the actual packages that will be kept in the repository. Associated with the AIP are the structural, technical and preservation metadata, as they are essential for carrying out preservation activities.

The SIP is the format used to transfer new content from the producer to the repository. It is composed of one or more digital representations and all of the associated metadata, packaged inside a METS envelope. The structure of a SIP supported by RODA is depicted in Figure 2. The RODA SIP is basically a compressed ZIP file containing a METS document, the set of files that compose the submitted representations and a series of metadata records. Within the SIP there should be at least one record of descriptive metadata in EAD-Component format (1). However, one may also find preservation and technical metadata inside a submission package, although this last set of metadata is not mandatory as it is seldom created by producers. Nevertheless, it was felt important that RODA should support those additional SIP elements for special situations such as repository succession, i.e. when ingested items belong to another repository that is to be deactivated.

(1) An EAD record does not describe a single representation. In fact, EAD is used to describe an entire collection of representations. Our SIP includes only a segment of EAD, sufficient to describe one representation, i.e. a <c> element and all its sub-elements. The team has called this subset of the EAD an EAD-Component.

Figure 2: Submission Information Package structure (a compressed ZIP file holding a <METS> envelope, descriptive metadata <EAD>, preservation metadata <PREMIS>, technical metadata <NISO,...> and the representation files File 1, File 2, ..., File n)

Before SIPs can be fully incorporated into the repository they are submitted to a series of tests to assess their integrity, completeness and conformity to the ingest policy. If any of the validation steps fails, the SIP is rejected and a report is sent to the archivists group as well as to the producer. The producer may then fix the problem and resubmit a new version of the SIP.

5. DATABASE SIP

Database SIPs are very similar to other SIPs. The difference lies in the representation files. For the other formats we only had to choose one normalization format for the representation files: for images we chose TIFF, for text-based documents we chose PDF, and so on. But for databases there was no such format. Each DBMS supported its own format. Even SQL exists in several different versions. So we had to create and specify a new format.

A neutral format that is hardware and software (platform) independent is the key to achieving a standard format for the digital preservation of relational databases. This neutral format should meet all the requirements established by the designated community of interest.

Since the late 1990s, XML has been accepted as the neutral format for information representation and interchange. This is due mainly to two factors. On the one hand, XML documents are purely textual files, structured and independent of any hardware or software platform. On the other hand, XML is widespread and more and more public domain tools are available to help users transform XML documents. XML was the obvious choice for the base format of our representation files. Both DBML and SIARD use XML as the base format. DBML and SIARD are the only XML-based database preservation formats. Although they are easy to process by both machines and humans, converting a database into DBML or SIARD is not easy and is not a task humans can do by hand. So the next step was to create a tool capable of generating DBML from different DBMSs. We also keep an SQL version of the database information in the version supported by the original DBMS. This follows from our preservation policy: we always keep the original object or, at least, the closest version of it (we do not know what the future may bring and we cannot predict how the current DBMS will evolve).

6. DATABASE SIP BUILDER

In RODA's context we soon realized that we could not just deliver a format and demand that producers send us information packages conforming to it. In projects like this one it is important to have wide acceptance from the community of users. We therefore developed a tool to create these SIPs. This tool was initially integrated in RODA but, due to growing interest, it has become a standalone tool that can be integrated with other systems and tools. Its architecture is presented in Figure 3: import modules (MySQL, Oracle'12, SQL Server, PostgreSQL, DB2, MS Access, ODBC, DBML, SIARD) feed a streaming data model, which is consumed by export modules (MySQL, SQL Server, PostgreSQL, DB2, MS Access, ODBC, DBML, SIARD, PhpMyAdmin).
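The import module / streaming data model / export module arrangement of the SIP builder can be illustrated with a small sketch. This is not the toolkit's actual code: the function names, the event tuples and the DBML-like element names are hypothetical, and SQLite (from the Python standard library) stands in for a real DBMS so that the example is self-contained. The import side streams structure and then data, one table at a time; the export side consumes those events without ever holding a whole table in memory.

```python
import sqlite3
from typing import Any, Iterator, Tuple
from xml.sax.saxutils import escape, quoteattr

# Hypothetical streaming data model: a flat sequence of (kind, payload) events.
Event = Tuple[str, Any]


def import_sqlite(conn: sqlite3.Connection) -> Iterator[Event]:
    """Hypothetical import module: streams structure, then data, per table."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        # Structure layer: column names and declared types.
        columns = [(c[1], c[2])
                   for c in conn.execute(f'PRAGMA table_info("{table}")')]
        yield ("table", (table, columns))
        # Data layer: one event per row, read lazily from the cursor.
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            yield ("row", row)
        yield ("end_table", table)


def export_xml(events: Iterator[Event]) -> Iterator[str]:
    """Hypothetical export module: emits DBML-like XML, one line at a time."""
    yield "<database>"
    for kind, payload in events:
        if kind == "table":
            name, columns = payload
            yield f"  <table name={quoteattr(name)}>"
            for col_name, col_type in columns:
                yield (f"    <column name={quoteattr(col_name)} "
                       f"type={quoteattr(col_type)}/>")
        elif kind == "row":
            # NULL values become empty cells in this simplified sketch.
            cells = "".join(
                f"<cell>{'' if v is None else escape(str(v))}</cell>"
                for v in payload)
            yield f"    <row>{cells}</row>"
        elif kind == "end_table":
            yield "  </table>"
    yield "</database>"


def demo() -> str:
    """Builds a tiny in-memory database and converts it to XML."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO person VALUES (1, 'Ana & Rui')")
    return "\n".join(export_xml(import_sqlite(conn)))


if __name__ == "__main__":
    print(demo())
```

Because both sides only agree on the event stream, supporting another DBMS or another output format would mean writing one more generator of either kind, which mirrors the module-per-format design described in the text.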
Figure 3: Database SIP builder architecture

We are addressing database significant properties by layers. Each layer raises different problems that have to be solved with appropriate solutions.

6.1 Data layer

Extracting data from a DBMS is not difficult: we just have to connect to the DBMS and issue an SQL statement like SELECT * FROM against each table. In DBML all the data is dumped into a single file. We had the idea of segmenting the data, but SIARD already did that. This was one of the reasons that led us to support SIARD as the preservation format. DBML had to change in order to handle real databases, and most of the needed changes were already implemented in SIARD.

6.2 Structure layer

Each DBMS stores its structural information in its own specific way, and to overcome this we had to develop specific connectors, the import modules, for each one. For each DBMS we created a connector that connects to the database and knows how to extract its structural information. If we need to support a new DBMS in the future, we just need to program a new import module for that DBMS. In the latest version we added support for DB2 by creating a new import module for this DBMS.

6.3 Semantics layer

This layer corresponds to the behavioral part of a database and is where the discussion in this area is currently focused. We include in this layer: views, stored procedures, rights, roles, user management, APIs, interfaces and any other features we may come across. Currently there are only partial solutions for it. DBML does not support it; it only deals with the first two layers. SIARD enables SIP creators to store views, stored procedures and constraints, capturing a significant part of the database behavior. These behavioral components are captured in SQL99 and stored inside an XML envelope. For some consumers we are still missing many things: the forms that the application uses to capture input from users, reports, etc. In most of these cases, we try to capture the knowledge with application metadata and application images or screenshots.

7. DATABASE ACCESS

We can see dbtoolkit as a SIP builder, but also as a tool that enables several ways to access or deploy archived databases. It can deploy a DIP very similar to the original SIP, an SQL-based copy of the original database or, as Figure 3 shows, any other format that has an export module contributed by some community programmer. In this way, we can also look at this tool as a database converter between different DBMSs.

Back in RODA, we needed a good user interface that would enable users to explore the archived databases. We took phpMyAdmin, simplified it, and ended up with a tool that allows users to browse databases, access data, access structure information and execute some SQL queries. This access component works with the MySQL export module and uses a local MySQL DBMS to cache the database. The problem with this approach is that it does not scale to large databases or to a large number of databases.

Pursuing scalability, we are launching a new project to create a faster viewer with simpler interfaces. The project is illustrated in Figure 4. The idea is to dump and index the data on a search engine like Lucene and to use that engine as the interface for accessing the data. In this way we will not need an external DBMS or an external database cache to access the data, making the access functionality simpler and faster.

Figure 4: New Database Viewer (components shown: preservation formats DBML and SIARD; db-preservation-toolkit; fast viewer resources with a Lucene / Solr index; fast viewer application with web interface and REST API)

8. FUTURE WORK

As future work we still have to improve some features and run some tests. We are also working on new small projects that pursue the idea of reverse engineering the relational model. Since we are free from the DBMS, why should we stick with the relational model? The relational model is optimized for transactions. If we have an archived, frozen database, we will not be executing transactions. If we do not need the relational model, we can undo the database normalization, moving back towards the original conceptual database model. In the course of a PhD thesis we have been working on algorithms to migrate data from a relational model into an ontological model close to the database conceptual model [5].
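To make the move from relational rows towards an ontological representation more concrete, the following sketch turns table rows into subject-predicate-object triples. It is only an illustration of the general idea and is not the algorithm developed in the thesis or the toolkit's converter: the function names, the `http://example.org/db/` base URI and the URI layout are assumptions chosen for the example, and every value is emitted as a plain string literal.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value: object) -> str:
    """Formats a value as a plain string literal, escaping \\ and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def rows_to_triples(table: str, key: str, rows: List[Dict[str, object]],
                    base: str = "http://example.org/db/") -> Iterator[Triple]:
    """Maps each row to one typed resource plus one triple per non-NULL column.

    `base` is a made-up namespace; a real converter would choose its own.
    """
    for row in rows:
        subject = f"<{base}{table}/{row[key]}>"
        # The table plays the role of a class in the target model.
        yield (subject, RDF_TYPE, f"<{base}{table}>")
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already in the subject; NULLs are skipped
            yield (subject, f"<{base}{table}#{column}>", literal(value))


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serializes triples one per line, in N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    people = [{"id": 1, "name": "Ana", "email": None}]
    print(to_ntriples(rows_to_triples("person", "id", people)))
```

A fuller mapping would also turn foreign keys into links between resources, which is where the normalization of the relational model starts to be undone.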
In more recent work we created a SIARD to RDF converter and implemented a simple RDF navigator for databases [9].

9. ACKNOWLEDGMENTS

This work is supported by the European Commission under FP7 CIP PSP grant agreement number E-ARK.

10. REFERENCES

[1] F. Berman. Surviving the data deluge. Communications of the ACM, 51(12).
[2] Consultative Committee for Space Data Systems. National Aeronautics and Space Administration.
[3] M. Ferreira. Introdução à preservação digital - Conceitos, estratégias e actuais consensos. Escola de Engenharia da Universidade do Minho, Guimarães.
[4] R. Freitas. Preservação digital de bases de dados relacionais. Master's thesis, Escola de Engenharia, Universidade do Minho.
[5] R. A. P. Freitas. Relational databases digital preservation. PhD thesis, Engineering School, University of Minho.
[6] P. Manson. Digital preservation research: An evolving landscape. European Research Consortium for Informatics and Mathematics - NEWS.
[7] PREMIS Working Group, OCLC Online Computer Library Center & Research Libraries Group. Data dictionary for preservation metadata: final report of the PREMIS working group. Technical report, Dublin, Ohio, USA.
[8] J. Ramalho, M. Ferreira, L. Faria, and R. Castro. Relational database preservation through XML modelling. In Extreme Markup Languages 2007, Montréal, Québec.
[9] F. Rocha. Preservação de Bases de Dados com SIARD. Master's thesis, Engineering School, University of Minho.
[10] Swiss Federal Archives - SFA. SIARD - format description.
[11] The Library of Congress. Página oficial do ead versão de
[12] The Library of Congress. METS webpage.
DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM
DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM Introduction The Institute of Museum and Library Services (IMLS) is committed to expanding public access to federally funded research, data, software,
Best Practices for Structural Metadata Version 1 Yale University Library June 1, 2008
Best Practices for Structural Metadata Version 1 Yale University Library June 1, 2008 Background The Digital Production and Integration Program (DPIP) is sponsoring the development of documentation outlining
Managing large sound databases using Mpeg7
Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob ([email protected]) ABSTRACT
A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel
A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated
EUR-Lex 2012 Data Extraction using Web Services
DOCUMENT HISTORY DOCUMENT HISTORY Version Release Date Description 0.01 24/01/2013 Initial draft 0.02 01/02/2013 Review 1.00 07/08/2013 Version 1.00 -v1.00.doc Page 2 of 17 TABLE OF CONTENTS 1 Introduction...
SEVENTH FRAMEWORK PROGRAMME THEME ICT -1-4.1 Digital libraries and technology-enhanced learning
Briefing paper: Value of software agents in digital preservation Ver 1.0 Dissemination Level: Public Lead Editor: NAE 2010-08-10 Status: Draft SEVENTH FRAMEWORK PROGRAMME THEME ICT -1-4.1 Digital libraries
#MMTM15 #INFOARCHIVE #EMCWORLD 1
#MMTM15 #INFOARCHIVE #EMCWORLD 1 1 INFOARCHIVE A TECHNICAL OVERVIEW DAVID HUMBY SOFTWARE ARCHITECT #MMTM15 2 TWEET LIVE DURING THE SESSION! Connect with us: Sign up for a Hands On Lab 6 th May, 1.30 PM,
Gradient An EII Solution From Infosys
Gradient An EII Solution From Infosys Keywords: Grid, Enterprise Integration, EII Introduction New arrays of business are emerging that require cross-functional data in near real-time. Examples of such
DAR: A Digital Assets Repository for Library Collections
DAR: A Digital Assets Repository for Library Collections Iman Saleh 1, Noha Adly 1,2, Magdy Nagi 1,2 1 Bibliotheca Alexandrina, El Shatby 21526, Alexandria, Egypt {iman.saleh, noha.adly, magdy.nagi}@bibalex.org
Building An Institutional Repository With DSpace
102 PLANNER - 2008 Building An Institutional Repository With DSpace Juli Thakuria Abstract Paper deals with open source institutional repository software specially DSpace. After defining the terms, it
The Spatial Data Standards for Facilities, Infrastructure, and Environment Online (SDSFIE Online) Web Site. http://www.sdsfieonline.
The Spatial Data Standards for Facilities, Infrastructure, and Environment Online (SDSFIE Online) Web Site http://www.sdsfieonline.org Mr. Kurt Buehler DISDI Program Support Image Matters LLC July 22,
Organization of VizieR's Catalogs Archival
Organization of VizieR's Catalogs Archival Organization of VizieR's Catalogs Archival Table of Contents Foreword...2 Environment applied to VizieR archives...3 The archive... 3 The producer...3 The user...3
MultiMimsy database extractions and OAI repositories at the Museum of London
MultiMimsy database extractions and OAI repositories at the Museum of London Mia Ridge Museum Systems Team Museum of London [email protected] Scope Extractions from the MultiMimsy 2000/MultiMimsy
Preservation Handbook
Preservation Handbook [Binary Text / Word Processor Documents] Author Rowan Wilson and Martin Wynne Version Draft V3 Date 22 / 08 / 05 Change History Revised by MW 22.8.05; 2.12.05; 7.3.06 Page 1 of 7
Report of the Ad Hoc Committee for Development of a Standardized Tool for Encoding Archival Finding Aids
1. Introduction Report of the Ad Hoc Committee for Development of a Standardized Tool for Encoding Archival Finding Aids Mandate The International Council on Archives has asked the Ad Hoc Committee to
Create a single 360 view of data Red Hat JBoss Data Virtualization consolidates master and transactional data
Whitepaper Create a single 360 view of Red Hat JBoss Data Virtualization consolidates master and transactional Red Hat JBoss Data Virtualization can play diverse roles in a master management initiative,
Security Issues for the Semantic Web
Security Issues for the Semantic Web Dr. Bhavani Thuraisingham Program Director Data and Applications Security The National Science Foundation Arlington, VA On leave from The MITRE Corporation Bedford,
Institutional Repositories: Staff and Skills Set
SHERPA Document Institutional Repositories: Staff and Skills Set University of Nottingham 25 th August 2009 Circulation PUBLIC Mary Robinson University of Nottingham Introduction This document began in
Semantic Exploration of Archived Product Lifecycle Metadata under Schema and Instance Evolution
Semantic Exploration of Archived Lifecycle Metadata under Schema and Instance Evolution Jörg Brunsmann Faculty of Mathematics and Computer Science, University of Hagen, D-58097 Hagen, Germany [email protected]
Building Semantic Content Management Framework
Building Semantic Content Management Framework Eric Yen Computing Centre, Academia Sinica Outline What is CMS Related Work CMS Evaluation, Selection, and Metrics CMS Applications in Academia Sinica Concluding
A Grid Architecture for Manufacturing Database System
Database Systems Journal vol. II, no. 2/2011 23 A Grid Architecture for Manufacturing Database System Laurentiu CIOVICĂ, Constantin Daniel AVRAM Economic Informatics Department, Academy of Economic Studies
Service-Oriented Architectures
Architectures Computing & 2009-11-06 Architectures Computing & SERVICE-ORIENTED COMPUTING (SOC) A new computing paradigm revolving around the concept of software as a service Assumes that entire systems
Technology Watch Report
Technology Watch Report The Open Archival Information System Reference Model: Introductory Guide Brian F. Lavoie Office of Research OCLC Online Computer Library Center, Inc. 6565 Frantz Road, Dublin OH
M3039 MPEG 97/ January 1998
INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039
Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object
Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object Anne Monceaux 1, Joanna Guss 1 1 EADS-CCR, Centreda 1, 4 Avenue Didier Daurat 31700 Blagnac France
XML Processing and Web Services. Chapter 17
XML Processing and Web Services Chapter 17 Textbook to be published by Pearson Ed 2015 in early Pearson 2014 Fundamentals of http://www.funwebdev.com Web Development Objectives 1 XML Overview 2 XML Processing
Semantic Knowledge Management System. Paripati Lohith Kumar. School of Information Technology
Semantic Knowledge Management System Paripati Lohith Kumar School of Information Technology Vellore Institute of Technology University, Vellore, India. [email protected] Abstract The scholarly activities
An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials
ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity
Copying Archives. Ngoni Munyaradzi (MNYNGO001) Email: [email protected]
Copying Archives Ngoni Munyaradzi (MNYNGO001) Email: [email protected] Abstract This paper focuses on the problem of trying to define a common exchange interface. That will be used to implement repository-to-repository
MarkLogic Enterprise Data Layer
MarkLogic Enterprise Data Layer MarkLogic Enterprise Data Layer MarkLogic Enterprise Data Layer September 2011 September 2011 September 2011 Table of Contents Executive Summary... 3 An Enterprise Data
SQL Maestro and the ELT Paradigm Shift
SQL Maestro and the ELT Paradigm Shift Abstract ELT extract, load, and transform is replacing ETL (extract, transform, load) as the usual method of populating data warehouses. Modern data warehouse appliances
DA-NRW: A distributed architecture for longterm preservation
DA-NRW: A distributed architecture for longterm preservation Manfred Thaller, Sebastian Cuy, Jens Peters, Daniel de Oliveira, Martin Fischer Universität zu Köln International Workshop on Semantic Digital
A Selection of Questions from the. Stewardship of Digital Assets Workshop Questionnaire
A Selection of Questions from the Stewardship of Digital Assets Workshop Questionnaire SECTION A: Institution Information What year did your institution begin creating digital resources? What year did
Introduction to Service Oriented Architectures (SOA)
Introduction to Service Oriented Architectures (SOA) Responsible Institutions: ETHZ (Concept) ETHZ (Overall) ETHZ (Revision) http://www.eu-orchestra.org - Version from: 26.10.2007 1 Content 1. Introduction
North Carolina Digital Preservation Policy. April 2014
North Carolina Digital Preservation Policy April 2014 North Carolina Digital Preservation Policy Page 1 Executive Summary The North Carolina Digital Preservation Policy (Policy) governs the operation,
EQUELLA. One Central Repository for a Diverse Range of Content. www.equella.com
EQUELLA One Central Repository for a Diverse Range of Content www.equella.com What is EQUELLA? EQUELLA, our web-based platform, provides one central location for the delivery of a diverse range of content
Digital Archiving at the Swiss Federal Archives (SFA)
Swiss Federal Archives SFA Dr. Krystyna W. Ohnesorge Archivstrasse 24, CH-3003 Bern Phone +41 31 32 45827, Fax +41 31 32 27823 [email protected] www.bar.admin.ch itopia ag corporate information
The basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
A DICOM-based Software Infrastructure for Data Archiving
A DICOM-based Software Infrastructure for Data Archiving Dhaval Dalal, Julien Jomier, and Stephen R. Aylward Computer-Aided Diagnosis and Display Lab The University of North Carolina at Chapel Hill, Department
ADRI. Digital Record Export Standard. ADRI-2007-01-v1.0. ADRI Submission Information Package (ASIP)
ADRI Digital Record Export Standard ADRI Submission Information Package (ASIP) ADRI-2007-01-v1.0 Version 1.0 31 July 2007 Digital Record Export Standard 2 Copyright 2007, Further copies of this document
a division of Technical Overview Xenos Enterprise Server 2.0
Technical Overview Enterprise Server 2.0 Enterprise Server Architecture The Enterprise Server (ES) platform addresses the HVTO business challenges facing today s enterprise. It provides robust, flexible
Sisense. Product Highlights. www.sisense.com
Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze
Open Source egovernment Reference Architecture Osera.modeldriven.org. Copyright 2006 Data Access Technologies, Inc. Slide 1
Open Source egovernment Reference Architecture Osera.modeldriven.org Slide 1 Caveat OsEra and the Semantic Core is work in progress, not a ready to use capability Slide 2 OsEra What we will cover OsEra
U.S. Department of Health and Human Services (HHS) The Office of the National Coordinator for Health Information Technology (ONC)
U.S. Department of Health and Human Services (HHS) The Office of the National Coordinator for Health Information Technology (ONC) econsent Trial Project Architectural Analysis & Technical Standards Produced
Implementing an Institutional Repository for Digital Archive Communities: Experiences from National Taiwan University
Implementing an Institutional Repository for Digital Archive Communities: Experiences from National Taiwan University Chiung-min Tsai Department of Library and Information Science, National Taiwan University
METADATA STANDARDS AND GUIDELINES RELEVANT TO DIGITAL AUDIO
This chart provides a quick overview of metadata standards and guidelines that are in use with digital audio, including metadata used to describe the content of the files; metadata used to describe properties
Assignment 1 Briefing Paper on the Pratt Archives Digitization Projects
Twila Rios Digital Preservation Spring 2012 Assignment 1 Briefing Paper on the Pratt Archives Digitization Projects The Pratt library digitization efforts actually encompass more than one project, including
Chapter 2 Database System Concepts and Architecture
Chapter 2 Database System Concepts and Architecture Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Outline Data Models, Schemas, and Instances Three-Schema Architecture
Database Preservation Case Study: Review
Database Preservation Case Study: Review Mette van Essen, Maurice de Rooij, Bill Roberts, Maurice van den Dobbelsteen National Archives of the Netherlands 12 July 2011 Introduction As part of the PLANETS
BUSINESS REQUIREMENTS SPECIFICATION (BRS) Business domain: Archiving and records management. Transfer of digital records
BUSINESS REQUIREMENTS SPECIFICATION (BRS) Business domain: Archiving and records management Business process: Transfer of digital records Document identification: Title: Transfer of digital records Trade
Web-based Multimedia Content Management System for Effective News Personalization on Interactive Broadcasting
Web-based Multimedia Content Management System for Effective News Personalization on Interactive Broadcasting S.N.CHEONG AZHAR K.M. M. HANMANDLU Faculty Of Engineering, Multimedia University, Jalan Multimedia,
Preserving Digital Materials
Preserving Digital Materials Final Report of the Digital Preservation and Archive Committee Submitted to SOPAG on October 18, 2001 Membership Howard Besser, UCLA Curtis Fornadley, UCLA Anne Gilliland-Swetland,
High-Volume Data Warehousing in Centerprise. Product Datasheet
High-Volume Data Warehousing in Centerprise Product Datasheet Table of Contents Overview 3 Data Complexity 3 Data Quality 3 Speed and Scalability 3 Centerprise Data Warehouse Features 4 ETL in a Unified
Notes about possible technical criteria for evaluating institutional repository (IR) software
Notes about possible technical criteria for evaluating institutional repository (IR) software Introduction Andy Powell UKOLN, University of Bath December 2005 This document attempts to identify some of
