Knowledge-Based Persistent Archives
SDSC TR: Knowledge-Based Persistent Archives

Reagan W. Moore
San Diego Supercomputer Center

Sponsored by the National Archives and Records Administration and the Advanced Research Projects Agency/ITO, Intelligent Metacomputing Testbed, ARPA Order D570, issued by ESC/ENS under contract F C-0020.

January 18, 2001
San Diego Supercomputer Center Technical Report
Copyright 2001, The Regents of the University of California
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Advanced Research Projects Agency or the U.S. Government.
Knowledge-based Persistent Archives

Reagan W. Moore
San Diego Supercomputer Center, La Jolla, CA

Abstract

The preservation of digital information for long periods of time is becoming feasible through the integration of archival storage technology from supercomputer centers, information models from the digital library community, and preservation models from the archivists' community. The supercomputer centers provide the technology needed to store the immense amounts of digital data that are being created, while the digital library community provides the mechanisms to define the context needed to interpret the data. The coordination of these technologies with preservation and management policies defines the infrastructure for a collection-based persistent archive [1]. This report discusses the use of knowledge representations to augment collection-based persistent archives.

1. Introduction

Supercomputer centers, digital libraries, and archival storage communities have common persistent archival storage requirements. Each of these communities is building software infrastructure to organize and store large collections of data. An emerging common requirement is the ability to maintain data collections for long periods of time. The challenge is to maintain the ability to discover, access, and display digital objects that are stored within the archive while the technology used to manage the archive evolves. We originally implemented a collection-based persistent archive [1] in which a description of the collection is stored along with the data. The approach focused on the development of infrastructure-independent representations for the information content of the collection, interoperability mechanisms to support migration of the collection onto new software and hardware systems, and the use of a standard tagging language to annotate the information content.
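As a concrete illustration of the tagging approach, a digital object can be wrapped with its collection-defined attributes in a single self-describing structure. This is a minimal sketch only; the element names and attribute set below are hypothetical, not the report's actual DTD.

```python
import base64
import xml.etree.ElementTree as ET

def wrap_digital_object(data: bytes, attributes: dict) -> str:
    """Wrap the original bytes of a digital object, together with the
    attributes defined as relevant for the collection, in a single
    infrastructure-independent XML structure (illustrative tag names)."""
    obj = ET.Element("digital-object")
    meta = ET.SubElement(obj, "attributes")
    for name, value in attributes.items():
        attr = ET.SubElement(meta, "attribute", name=name)
        attr.text = str(value)
    # The payload is base64-encoded so arbitrary bytes survive as text.
    payload = ET.SubElement(obj, "data", encoding="base64")
    payload.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(obj, encoding="unicode")

xml_form = wrap_digital_object(
    b"original record bytes",
    {"title": "Sample record", "creator": "NARA", "date": "1998-07-04"},
)
```

Because the wrapped form is plain tagged text, it can be parsed back on any future platform that understands the tagging language, independent of the archive software that stored it.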
The process used to ingest a collection, transform it into an infrastructure-independent form, and recreate the collection on new technology is shown schematically in Figure 1.
Figure 1. Persistent Collection Process

Two phases are emphasized: the archiving of the collection, and the retrieval or instantiation of the collection onto new technology. The diagram shows the multiple steps that are necessary to preserve digital objects through time. The steps form a cycle that can be used for migrating data collections onto new infrastructure as technology evolves. The technology changes can occur at the system level, where archive, file, compute, and database software evolves, or at the information-model level, where formats, programming languages, and practices change. The ultimate goal is to maintain not only the bits associated with the original data, but also the context that permits the data to be interpreted. We rely on the use of collections to define the context to associate with digital data. Each digital object is maintained as a tagged structure that includes the original bytes of data, as well as attributes that have been defined as relevant for the data collection. A collection-based persistent archive is therefore one in which the organization of the collection is archived simultaneously with the digital objects that comprise the collection. A persistent collection requires the ability to dynamically recreate the collection on new technology. Scalable archival storage systems are used to ensure that sufficient resources are available for continual migration of digital objects to new media. The software systems that interpret the infrastructure-independent representation for the collections are based upon generic digital library systems, and are migrated explicitly to new platforms. In this system, the original representation of the digital objects and of the collections does
not change. The maintenance of the persistent archive is then achieved through application of archivist policies that govern the rate of migration of the objects and the collection instantiation software.

2. Knowledge-based Archives

The preservation of the context to associate with digital objects is the dominant issue for knowledge-based persistent archives. The context is traditionally defined through specification of attributes that are associated with each digital object. The context is also defined through the implied relationships that exist between the attributes, and the preferred organization of the attributes in user interfaces for viewing the data collection. Management of the collection context is made difficult by the rapid change of technology. Software systems used to manage collections are changing on a five- to ten-year time scale. Of greater concern is that the information tagging languages used to annotate digital objects are also changing. The persistent archiving of a collection must therefore also handle the evolution of the information markup language. We have characterized persistent archives in prior publications [1,2] as collection-based repositories. We now recognize the need to broaden the archive characterization to knowledge-based repositories. Not only the information content, but also the processing steps used to accession the collection must be preserved. Conceptually, one can view the accessioning process as the equivalent of the process needed to instantiate the collection on new technology. If the accessioning process can be captured in an infrastructure-independent representation, the same process can be used to manage the migration of the collection to new markup languages, archival data repositories, information repositories, and knowledge repositories.
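One way to capture an accessioning process in an infrastructure-independent form is to record the workflow itself as data, an ordered list of named transformations, so the same steps can be replayed later on new technology. This is a sketch under the assumption that each step is a pure text transformation; the step names are illustrative, not the project's actual process.

```python
# Each accessioning step is a named, self-contained transformation.
def strip_whitespace(text: str) -> str:
    """Normalize runs of whitespace in the ingested record."""
    return " ".join(text.split())

def tag_record(text: str) -> str:
    """Wrap the normalized text in a (hypothetical) record tag."""
    return f"<record>{text}</record>"

# The workflow is archived as data: an ordered list of step names.
# Migrating the collection later means re-running the same list,
# possibly with new implementations bound to the same names.
WORKFLOW = ["strip_whitespace", "tag_record"]
STEPS = {"strip_whitespace": strip_whitespace, "tag_record": tag_record}

def replay(workflow, raw: str) -> str:
    """Re-apply the archived accessioning steps to a raw record."""
    for name in workflow:
        raw = STEPS[name](raw)
    return raw

archived = replay(WORKFLOW, "  A   digital   record  ")
```

Because the workflow description is separate from the code that implements each step, a future migration can rebind the step names to new software while preserving the recorded process.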
The archival description of a collection then must include not only contextual information about the digital objects, but also knowledge about the relationships used to derive the contextual information. The architecture that is needed to implement a knowledge-based persistent archive is shown in Figure 2.
Figure 2. Knowledge-based Persistent Archive (rows: Knowledge, Information, Data; columns: Ingest, Manage, Access)

The three columns represent the technologies needed to manage the ingestion process, manage the persistent archive, and manage the access environment. The three rows represent the infrastructure needed to manage knowledge, information, and data. Knowledge is represented as relationships between domain concepts. Information is represented as attributes about digital objects within the collection. The digital objects are images of the reality described by the domain concepts. Ingestion corresponds to the steps of knowledge mining/tagging, information mining/tagging, and digital object organization/storage. Persistent archive management requires infrastructure to store the digital objects (archives), information repositories to hold the metadata (databases), and knowledge repositories to organize the relationships (logic systems). The access environment provides mechanisms to query the collection at the data level through feature extraction, at the information level through database queries, and at the knowledge level through domain concepts. Just as the data management infrastructure is intended to provide access without having to know data object names, the knowledge access infrastructure is intended to provide access without having to know the explicit metadata attribute names used to organize the collection database.
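Knowledge-level access of this kind can be sketched with a topic-map-like table that maps domain concepts onto the schema attribute names actually used in the collection database, so a user can pose a concept-level query without knowing the attributes. All concept, attribute, and record names here are hypothetical.

```python
# Topic-map-like mapping from domain concepts to schema attribute names.
CONCEPT_MAP = {
    "author": ["creator", "submitted_by"],
    "subject": ["topic_code", "keywords"],
}

# A toy information repository: records keyed by schema attribute names.
RECORDS = [
    {"creator": "R. Moore", "topic_code": "archives"},
    {"submitted_by": "C. Baru", "topic_code": "mediation"},
]

def concept_query(concept: str, value: str):
    """Translate a knowledge-level (concept) query into the
    attribute-level queries the collection schema understands."""
    attributes = CONCEPT_MAP.get(concept, [])
    return [r for r in RECORDS
            if any(r.get(a) == value for a in attributes)]

hits = concept_query("author", "C. Baru")
```

When the collection migrates to a new schema, only the concept map needs updating; concept-level queries issued by users remain unchanged.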
The knowledge-based persistent archive requires software infrastructure to support interoperability between different implementations of ingestion, management, and access infrastructure components. This is shown in Figure 3. Between ingest platforms and management repositories, standards are needed to define consistent tagging mechanisms for knowledge (the XML Topic Map DTD [3], or XTM DTD), for information (XML DTDs [4]), and for data organization (logical folders and physical containers). Between management repositories and access platforms, standard query languages are needed for knowledge-based access (a knowledge query language or rule manipulation language), attribute-based access (the EMCAT SQL generator or MIX mediator [5]), and feature-based access (application of procedures within a computational grid). Between the knowledge and information environments, a standard representation is needed to map from concepts to attributes, such as topic maps or model-based access systems. Between the information and data storage environments, a data handling system is needed to map from attributes to storage locations, such as the SDSC Storage Resource Broker [6].

Figure 3. Persistent Archive Interfaces (the Figure 2 architecture annotated with the interface standards: XTM DTD, XML DTD, MCAT/HDF, KQL rules, EMCAT/MIX, and grids)
Persistence is achieved through the infrastructure middleware (shown in Figure 3 as the blue grid) that links accession platforms, management repositories, and access platforms. The same middleware is needed to support grid environments (such as computation on distributed data collections) and digital library environments (such as curricula support in the National Science, Mathematics, Engineering, and Technology Education Digital Library, NSDL). This architecture has been proposed to both the Grid Forum and the NSDL, and may be the architecture that integrates knowledge management activities from these communities with the persistent archive community.

2.1 Archive Accessioning Process:

Of interest is the emerging need for knowledge management, as well as information management and data management, when ingesting collections. When we look at collections, we see multiple interfaces where knowledge is required to adequately describe relationships inherent within the collection. We have been looking at the preservation of relationships that are needed to describe:

- implied knowledge (interpretation of fields)
- structural knowledge (topology associated with digital line graphs)
- domain knowledge (relationships between domain concepts)
- procedural knowledge (workflow creation steps for digital objects)
- presentation knowledge (support for knowledge-based queries)

One way to accomplish the goal of knowledge-based access is to use the ISO Topic Maps standard to maintain mappings between domain concepts and the attribute names used in the collection schema. It is interesting to note that relationships are implicit between each of the nine infrastructure components defined in Figure 2. The relationships either define rules that can be applied to the collection, or quantify associations that can be made between collection elements.
Examples are:

Relationships that quantify rules:
- Rules for defining collection attributes
- Rules for organizing attributes into a schema
- Rules for feature extraction
- Rules governing data set creation

Relationships that quantify associations:
- Organization of concepts into topic maps
- Ontology mapping between concept maps
- Mapping of concepts to collection attributes
- Mapping of concepts to feature extraction rules
- Mapping between attributes and data fields (semantics)
- Semantic mapping between collections
- Mapping between attributes and storage
- Mapping between attributes and features
- Clustering of data into containers

The relationships can be separated into four broad classes:

Semantic/logical relationships. Relationships can be defined to map from the concepts used to describe the collection to the attribute tags used to annotate the collection. Semantic relationships can also be defined between the domain-specific concepts as knowledge bases or semantic maps.

Procedural/temporal relationships. The transformations that are applied to the collection to create the archival form constitute a workflow that represents the ingestion process. The temporal order and explicit transformations can be represented as a set of states through which the collection is processed.

Structural/spatial relationships. The internal organization of digital objects within the collection can be represented as a structural ordering of the tagged elements. The representation of the structure can be expressed using the same types of characterization as needed for spatially tagged data.

Functional relationships. For scientific applications, analysis algorithms are needed to identify features that might be associated with a digital object. The expression of the relationship between a named feature and its presence within a digital object will require the ability to archive mathematical expressions.

In the ingestion process, a major challenge has been the need to differentiate between artifacts and implied knowledge. Essentially, the steps of refining the description of a collection by including more attributes must be integrated with the identification of anomalies. To make progress, we apply the concepts of occurrence tagging and closure to the archived collections. Occurrence tagging is the explicit annotation of the location of each tagged attribute along with the associated value.
This provides a representation that captures all of the information content without imposing constraints on permissible attribute values. Closure is the analysis of the occurrences to identify both completeness and consistency. Completeness is evaluated by verifying that all attributes are populated and that the information content is fully annotated. Consistency checks that all attribute values fall within defined ranges. Consistency can be checked by construction of inverse indexes that point to all occurrences of each attribute value. It is necessary to iterate between knowledge extraction and attribute mining. We illustrate this through application of the ingestion process shown in Figure 4.
- Define a representation of the concepts inherent within the collection.
- Build a concept map that identifies all of the possible attributes to associate with each concept.
- Tag the collection to identify attributes for each of the possible fields.
- Restructure the concept map to eliminate unused fields, specialize classes, rearrange class attributes, etc.
- Mine the collection to identify differences between bill versions, identify missing attributes, identify implicit attributes, and identify invalid data (such as duplicated pages).

Figure 4. Ingestion Process (knowledge generation, information generation, attribute selection, attribute tagging, occurrence tagging, inverse indexing, closure, view management, and data organization, applied from the accession template to the collection)

At one time, the hope was to be able to ingest a collection in a single pass. Based upon the above steps, at least three analyses are needed: to mine knowledge, to mine information, and to organize the data. Depending upon the number of iterations used to refine the concept space, additional passes through the data may be necessary. It is still an area of debate whether it will be possible, in general, to differentiate between concept map refinement and error analysis. These steps will have to be done jointly for most collections.
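The occurrence tagging and closure analysis used during ingestion can be sketched as follows. This is a minimal illustration with a made-up record layout, not the project's implementation: occurrences are (occurrence, attribute, value) triples, completeness checks that every attribute is populated, and an inverse index over attribute values exposes anomalies.

```python
from collections import defaultdict

# A toy slice of a tagged collection (hypothetical attributes).
records = [
    {"bill_id": "HR-101", "session": "105"},
    {"bill_id": "HR-102", "session": "105"},
    {"bill_id": "HR-103", "session": "1O5"},  # anomaly: letter O, not zero
]

# Occurrence tagging: annotate the location of every tagged attribute
# together with its value, as (occurrence, attribute, value) triples.
occurrences = [(i, attr, val)
               for i, rec in enumerate(records)
               for attr, val in rec.items()]

# Closure, part 1 -- completeness: every record populates every attribute.
all_attrs = {attr for _, attr, _ in occurrences}
complete = all(all_attrs <= rec.keys() for rec in records)

# Closure, part 2 -- consistency: an inverse index from each attribute
# value back to its occurrences makes out-of-range values easy to find.
inverse = defaultdict(list)
for occ, attr, val in occurrences:
    inverse[(attr, val)].append(occ)

bad = [(attr, val) for (attr, val) in inverse
       if attr == "session" and not val.isdigit()]
```

The inverse index also supports the iteration between knowledge extraction and attribute mining: each anomalous value points back to the exact occurrences that must be re-examined or re-tagged.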
Note that once the data has been wrapped into XML, all integrity checking, knowledge mining, derivation of a "consolidated version", etc., can be seen as (albeit very elaborate) queries against an XML collection. The interesting research issue is to find out how well XML query languages (including the UCSD/SDSC XMAS system) are able to express the analysis queries. Especially for integrity checking, logic-based XML query languages seem to be a good choice for an ingestion environment.

2.2 Archival Representation of Collections:

One of the results of the analysis of the collections provided by NARA was the realization that multiple views of a collection may need to be archived. Typical views include:

- Original form as submitted
- XML-tagged form
- Occurrence representation (occurrence, attribute, value)
- Knowledge-based representation (recreation of the original form from the occurrence representation). This view can be thought of as the noise-free representation of the original collection, based upon the knowledge and information content that was created during the accessioning process. This view can be designed to include white space and all anomalies if desired.
- Consolidated representation (elimination of all duplicated information)

By archiving descriptions of the processing steps needed to go between each of these views, one can guarantee that the same processing steps could be applied in the future to re-instantiate the collection on new technology, including new information and knowledge representations.

3. Relationships between NARA and other Agency projects:

There is a strong synergy between the development of persistent archive infrastructure for NARA, digital library development for NSF, and data grid development for DOE, NASA, and NLM. All of these research areas require the ability to manage knowledge, information, and data objects.
What has become apparent is that even though the requirements driving the infrastructure development for each agency are different, a uniform architecture is emerging that meets all agency requirements. The architecture shown in Figure 3 provides:

- Validation mechanism for the common data management architecture
- Validation mechanism for the differentiation between knowledge, information, and data, and the choice of representation standards
- Integration vehicle for tying together persistent archives with grid environments
- Integration vehicle for tying together grid environments with digital libraries
- Integration vehicle for tying together digital libraries with persistent archives

It is interesting to note the multiple projects that are building upon the architecture that is being developed in the NARA collaboration:

- NSF Digital Library Initiative, Phase 2
- NSF National SMET Education Digital Library
- NSF NPACI data grid for neuroscience brain image federation
- NASA Information Power Grid distributed data processing
- DOE ASCI Data Visualization Corridor remote data processing
- DOE Particle Physics Data Grid object replication
- NLM Digital Embryo Project data grid for image processing and storage
- NARA Persistent Archive

It is also interesting to note the iterative technology development cycle that links all of these projects. An original DARPA project developed the data handling capabilities as part of the Distributed Object Computation Testbed. The NASA IPG integrated the data handling technology with computational grid technology (common security environments). The NSF NPACI project integrated information management with data handling to support digital libraries. The DOE PPDG then applied the technology to support replica management across heterogeneous systems. And the NARA project applied the technology to manage migration of collections across evolving infrastructure technology.

Acknowledgements:

This research has been sponsored by the National Archives and Records Administration and Advanced Research Projects Agency/ITO, "Intelligent Metacomputing Testbed", ARPA Order No.
D570, issued by ESC/ENS under Contract #F C-0020, and by the Data Intensive Computing thrust area of the National Science Foundation project ASC National Partnership for Advanced Computational Infrastructure. The research topics have been investigated by the following members of the Data Intensive Computing Environment Group at the San Diego Supercomputer Center: Richard Marciano, Bertram Ludaescher, Ilya Zaslavsky, Amarnath Gupta, and Chaitan Baru.
References:

[1] Moore, R., C. Baru, A. Rajasekar, B. Ludascher, R. Marciano, M. Wan, W. Schroeder, and A. Gupta, "Collection-Based Persistent Digital Archives - Part 1," D-Lib Magazine, March 2000.
[2] Moore, R., C. Baru, A. Rajasekar, B. Ludascher, R. Marciano, M. Wan, W. Schroeder, and A. Gupta, "Collection-Based Persistent Digital Archives - Part 2," D-Lib Magazine, April 2000.
[3] ISO/IEC FCD Topic Maps.
[4] Extensible Markup Language (XML).
[5] Baru, C., V. Chu, A. Gupta, B. Ludäscher, R. Marciano, Y. Papakonstantinou, and P. Velikhov, "XML-Based Information Mediation for Digital Libraries," ACM Conference on Digital Libraries, Berkeley, CA, exhibition program.
[6] Baru, C., R. Moore, A. Rajasekar, and M. Wan, "The SDSC Storage Resource Broker," Proc. CASCON'98 Conference, Nov. 30 - Dec. 3, 1998, Toronto, Canada.
More informationDataGrids 2.0 irods - A Second Generation Data Cyberinfrastructure. Arcot (RAJA) Rajasekar DICE/SDSC/UCSD
DataGrids 2.0 irods - A Second Generation Data Cyberinfrastructure Arcot (RAJA) Rajasekar DICE/SDSC/UCSD What is SRB? First Generation Data Grid middleware developed at the San Diego Supercomputer Center
More informationJOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,
More informationThe Service Availability Forum Specification for High Availability Middleware
The Availability Forum Specification for High Availability Middleware Timo Jokiaho, Fred Herrmann, Dave Penkler, Manfred Reitenspiess, Louise Moser Availability Forum Timo.Jokiaho@nokia.com, Frederic.Herrmann@sun.com,
More informationInformation Services for Smart Grids
Smart Grid and Renewable Energy, 2009, 8 12 Published Online September 2009 (http://www.scirp.org/journal/sgre/). ABSTRACT Interconnected and integrated electrical power systems, by their very dynamic
More informationInfosys GRADIENT. Enabling Enterprise Data Virtualization. Keywords. Grid, Enterprise Data Integration, EII Introduction
Infosys GRADIENT Enabling Enterprise Data Virtualization Keywords Grid, Enterprise Data Integration, EII Introduction A new generation of business applications is emerging to support customer service,
More informationSemantic Exploration of Archived Product Lifecycle Metadata under Schema and Instance Evolution
Semantic Exploration of Archived Lifecycle Metadata under Schema and Instance Evolution Jörg Brunsmann Faculty of Mathematics and Computer Science, University of Hagen, D-58097 Hagen, Germany joerg.brunsmann@fernuni-hagen.de
More informationReport on the Dagstuhl Seminar Data Quality on the Web
Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,
More informationLuc Declerck AUL, Technology Services Declan Fleming Director, Information Technology Department
Luc Declerck AUL, Technology Services Declan Fleming Director, Information Technology Department What is cyberinfrastructure? Outline Examples of cyberinfrastructure t Why is this relevant to Libraries?
More informationMetadata Hierarchy in Integrated Geoscientific Database for Regional Mineral Prospecting
Metadata Hierarchy in Integrated Geoscientific Database for Regional Mineral Prospecting MA Xiaogang WANG Xinqing WU Chonglong JU Feng ABSTRACT: One of the core developments in geomathematics in now days
More informationTheme 6: Enterprise Knowledge Management Using Knowledge Orchestration Agency
Theme 6: Enterprise Knowledge Management Using Knowledge Orchestration Agency Abstract Distributed knowledge management, intelligent software agents and XML based knowledge representation are three research
More informationService Oriented Architecture
Service Oriented Architecture Charlie Abela Department of Artificial Intelligence charlie.abela@um.edu.mt Last Lecture Web Ontology Language Problems? CSA 3210 Service Oriented Architecture 2 Lecture Outline
More informationDA-NRW: a distributed architecture for long-term preservation
DA-NRW: a distributed architecture for long-term preservation Manfred Thaller manfred.thaller@uni-koeln.de, Sebastian Cuy sebastian.cuy@uni-koeln.de, Jens Peters jens.peters@uni-koeln.de, Daniel de Oliveira
More informationTransparency and Efficiency in Grid Computing for Big Data
Transparency and Efficiency in Grid Computing for Big Data Paul L. Bergstein Dept. of Computer and Information Science University of Massachusetts Dartmouth Dartmouth, MA pbergstein@umassd.edu Abstract
More informationDesign and Implementation of a Semantic Web Solution for Real-time Reservoir Management
Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management Ram Soma 2, Amol Bakshi 1, Kanwal Gupta 3, Will Da Sie 2, Viktor Prasanna 1 1 University of Southern California,
More informationDistributed Database for Environmental Data Integration
Distributed Database for Environmental Data Integration A. Amato', V. Di Lecce2, and V. Piuri 3 II Engineering Faculty of Politecnico di Bari - Italy 2 DIASS, Politecnico di Bari, Italy 3Dept Information
More informationRUP Design. Purpose of Analysis & Design. Analysis & Design Workflow. Define Candidate Architecture. Create Initial Architecture Sketch
RUP Design RUP Artifacts and Deliverables RUP Purpose of Analysis & Design To transform the requirements into a design of the system to-be. To evolve a robust architecture for the system. To adapt the
More informationIntegrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Extended Abstract Ioanna Koffina 1, Giorgos Serfiotis 1, Vassilis Christophides 1, Val Tannen
More informationIn ediscovery and Litigation Support Repositories MPeterson, June 2009
XAM PRESENTATION (extensible TITLE Access GOES Method) HERE In ediscovery and Litigation Support Repositories MPeterson, June 2009 Contents XAM Introduction XAM Value Propositions XAM Use Cases Digital
More informationTalend Metadata Manager. Reduce Risk and Friction in your Information Supply Chain
Talend Metadata Manager Reduce Risk and Friction in your Information Supply Chain Talend Metadata Manager Talend Metadata Manager provides a comprehensive set of capabilities for all facets of metadata
More informationBusiness Intelligence: Recent Experiences in Canada
Business Intelligence: Recent Experiences in Canada Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies 2 Business Intelligence
More informationCiteSeer x in the Cloud
Published in the 2nd USENIX Workshop on Hot Topics in Cloud Computing 2010 CiteSeer x in the Cloud Pradeep B. Teregowda Pennsylvania State University C. Lee Giles Pennsylvania State University Bhuvan Urgaonkar
More informationAmit Sheth & Ajith Ranabahu, 2010. Presented by Mohammad Hossein Danesh
Amit Sheth & Ajith Ranabahu, 2010 Presented by Mohammad Hossein Danesh 1 Agenda Introduction to Cloud Computing Research Motivation Semantic Modeling Can Help Use of DSLs Solution Conclusion 2 3 Motivation
More informationDatabases in Organizations
The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron
More informationA View Integration Approach to Dynamic Composition of Web Services
A View Integration Approach to Dynamic Composition of Web Services Snehal Thakkar, Craig A. Knoblock, and José Luis Ambite University of Southern California/ Information Sciences Institute 4676 Admiralty
More informationReverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms
Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Irina Astrova 1, Bela Stantic 2 1 Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn,
More informationService Cloud for information retrieval from multiple origins
Service Cloud for information retrieval from multiple origins Authors: Marisa R. De Giusti, CICPBA (Comisión de Investigaciones Científicas de la provincia de Buenos Aires), PrEBi, National University
More informationIntegrating Relational Database Schemas using a Standardized Dictionary
Integrating Relational Database Schemas using a Standardized Dictionary Ramon Lawrence Advanced Database Systems Laboratory University of Manitoba Winnipeg, Manitoba, Canada umlawren@cs.umanitoba.ca Ken
More informationCollaborative SRB Data Federations
WHITE PAPER Collaborative SRB Data Federations A Unified View for Heterogeneous High-Performance Computing INTRODUCTION This paper describes Storage Resource Broker (SRB): its architecture and capabilities
More informationLong Term Knowledge Retention and Preservation
Long Term Knowledge Retention and Preservation Aziz Bouras University of Lyon, DISP Laboratory France abdelaziz.bouras@univ-lyon2.fr Recent years: How should digital 3D data and multimedia information
More informationFiltering the Web to Feed Data Warehouses
Witold Abramowicz, Pawel Kalczynski and Krzysztof We^cel Filtering the Web to Feed Data Warehouses Springer Table of Contents CHAPTER 1 INTRODUCTION 1 1.1 Information Systems 1 1.2 Information Filtering
More informationData Integration Hub for a Hybrid Paper Search
Data Integration Hub for a Hybrid Paper Search Jungkee Kim 1,2, Geoffrey Fox 2, and Seong-Joon Yoo 3 1 Department of Computer Science, Florida State University, Tallahassee FL 32306, U.S.A., jungkkim@cs.fsu.edu,
More informationKnowledge Management in Heterogeneous Data Warehouse Environments
Management in Heterogeneous Data Warehouse Environments Larry Kerschberg Co-Director, E-Center for E-Business, Department of Information and Software Engineering, George Mason University, MSN 4A4, 4400
More informationKnowledge-based Expressive Technologies within Cloud Computing Environments
Knowledge-based Expressive Technologies within Cloud Computing Environments Sergey V. Kovalchuk, Pavel A. Smirnov, Konstantin V. Knyazkov, Alexander S. Zagarskikh, Alexander V. Boukhanovsky 1 Abstract.
More information1 What Are Web Services?
Oracle Fusion Middleware Introducing Web Services 11g Release 1 (11.1.1) E14294-04 January 2011 This document provides an overview of Web services in Oracle Fusion Middleware 11g. Sections include: What
More informationChapter 11 Mining Databases on the Web
Chapter 11 Mining bases on the Web INTRODUCTION While Chapters 9 and 10 provided an overview of Web data mining, this chapter discusses aspects of mining the databases on the Web. Essentially, we use the
More information1 What Are Web Services?
Oracle Fusion Middleware Introducing Web Services 11g Release 1 (11.1.1.6) E14294-06 November 2011 This document provides an overview of Web services in Oracle Fusion Middleware 11g. Sections include:
More informationMiddleware support for the Internet of Things
Middleware support for the Internet of Things Karl Aberer, Manfred Hauswirth, Ali Salehi School of Computer and Communication Sciences Ecole Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne,
More informationAutonomy for SOHO Ground Operations
From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). All rights reserved. Autonomy for SOHO Ground Operations Walt Truszkowski, NASA Goddard Space Flight Center (GSFC) Walt.Truszkowski@gsfc.nasa.gov
More informationWHITE PAPER DATA GOVERNANCE ENTERPRISE MODEL MANAGEMENT
WHITE PAPER DATA GOVERNANCE ENTERPRISE MODEL MANAGEMENT CONTENTS 1. THE NEED FOR DATA GOVERNANCE... 2 2. DATA GOVERNANCE... 2 2.1. Definition... 2 2.2. Responsibilities... 3 3. ACTIVITIES... 6 4. THE
More informationWeb Service Based Data Management for Grid Applications
Web Service Based Data Management for Grid Applications T. Boehm Zuse-Institute Berlin (ZIB), Berlin, Germany Abstract Web Services play an important role in providing an interface between end user applications
More informationEnabling the Big Data Commons through indexing of data and their interactions
biomedical and healthcare Data Discovery Index Ecosystem Enabling the Big Data Commons through indexing of and their interactions 2 nd BD2K all-hands meeting Bethesda 11/12/15 Aims 1. Help users find accessible
More informationReference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
More informationOracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper September 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node Denny Wong Oracle Data Mining
More informationECS 165A: Introduction to Database Systems
ECS 165A: Introduction to Database Systems Todd J. Green based on material and slides by Michael Gertz and Bertram Ludäscher Winter 2011 Dept. of Computer Science UC Davis ECS-165A WQ 11 1 1. Introduction
More informationBringing Business Objects into ETL Technology
Bringing Business Objects into ETL Technology Jing Shan Ryan Wisnesky Phay Lau Eugene Kawamoto Huong Morris Sriram Srinivasn Hui Liao 1. Northeastern University, jshan@ccs.neu.edu 2. Stanford University,
More informationInternational Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,
More informationSemantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
More informationBUSINESS VALUE OF SEMANTIC TECHNOLOGY
BUSINESS VALUE OF SEMANTIC TECHNOLOGY Preliminary Findings Industry Advisory Council Emerging Technology (ET) SIG Information Sharing & Collaboration Committee July 15, 2005 Mills Davis Managing Director
More informationMETA DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 asistithod@gmail.com
More informationImplementing Ontology-based Information Sharing in Product Lifecycle Management
Implementing Ontology-based Information Sharing in Product Lifecycle Management Dillon McKenzie-Veal, Nathan W. Hartman, and John Springer College of Technology, Purdue University, West Lafayette, Indiana
More informationIO Informatics The Sentient Suite
IO Informatics The Sentient Suite Our software, The Sentient Suite, allows a user to assemble, view, analyze and search very disparate information in a common environment. The disparate data can be numeric
More informationWorkflow Requirements (Dec. 12, 2006)
1 Functional Requirements Workflow Requirements (Dec. 12, 2006) 1.1 Designing Workflow Templates The workflow design system should provide means for designing (modeling) workflow templates in graphical
More informationOWL based XML Data Integration
OWL based XML Data Integration Manjula Shenoy K Manipal University CSE MIT Manipal, India K.C.Shet, PhD. N.I.T.K. CSE, Suratkal Karnataka, India U. Dinesh Acharya, PhD. ManipalUniversity CSE MIT, Manipal,
More informationLightweight Data Integration using the WebComposition Data Grid Service
Lightweight Data Integration using the WebComposition Data Grid Service Ralph Sommermeier 1, Andreas Heil 2, Martin Gaedke 1 1 Chemnitz University of Technology, Faculty of Computer Science, Distributed
More information2 Associating Facts with Time
TEMPORAL DATABASES Richard Thomas Snodgrass A temporal database (see Temporal Database) contains time-varying data. Time is an important aspect of all real-world phenomena. Events occur at specific points
More informationData Quality in Information Integration and Business Intelligence
Data Quality in Information Integration and Business Intelligence Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies
More informationA Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems
Proceedings of the Postgraduate Annual Research Seminar 2005 68 A Model-based Software Architecture for XML and Metadata Integration in Warehouse Systems Abstract Wan Mohd Haffiz Mohd Nasir, Shamsul Sahibuddin
More information