Metadata for Big Data: A preliminary investigation of metadata quality issues in research data repositories
Information Services & Use 34 (2014) DOI /ISU IOS Press

Dimitris Rousidis a, Emmanouel Garoufallou b,*, Panos Balatsoukas c and Miguel-Angel Sicilia a
a University of Alcala, Madrid, Spain
b Alexander Technological Educational Institute of Thessaloniki, Thessaloniki, Greece
c University of Manchester, Manchester, UK

Abstract. Data-driven approaches to scientific research have generated new types of repositories that give scientists the means to store, share and re-use the big data-sets generated at various stages of the research process. As the number and heterogeneity of research data repositories increase, it becomes critical for scientists to solve the data quality problems associated with the data-sets stored in these repositories. To date, several authors have focused on the quality of the data-sets themselves, yet little is known about the quality problems of the metadata used to describe these data-sets. Metadata is important for the long-term sustainability of research data repositories and for data re-use. The aim of the research reported in this paper was to identify the data quality problems associated with the metadata used in the Dryad data repository. The paper concludes with some recommendations for improving the quality of metadata in research data repositories.

Keywords: Big Data, data quality, descriptive analysis, metadata, research objects, e-research

1. Introduction

The availability of scientific data (big data) and the emergence of cloud computing have radically changed research activities. E-science and e-research applications have extended traditional forms of scholarly e-infrastructure (such as institutional repositories and digital libraries) and enabled scientists to store, access, analyze, use and share datasets generated at various stages of the research process [5].
Given the large volume and diversity of scientific data, research repositories are becoming an integral part of the communication and collaboration process between scientists and research groups. Yet problems related to data quality may impede the process of analysing, integrating and re-using heterogeneous datasets. Although several researchers have focused on developing new methods to improve the quality of the data stored in research data repositories, e.g. [19,21], there is little research on the quality of the metadata used to describe and annotate datasets in this type of repository. The use of complete and accurate metadata is important for several processes, including the re-use and sharing of research datasets among scientists; the application of digital curation and data provenance strategies; and the analysis of the contents of research data repositories.

* Corresponding author. [email protected]
/14/$ IOS Press and the authors. This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License.

The aim of the research reported in this paper was to identify the data quality problems associated with the metadata used in a research data repository called Dryad. Since this is a first attempt to analyse the metadata used in research data repositories [17], the objectives were chiefly exploratory, namely:
- To perform a descriptive analysis of the metadata elements used in the Dryad repository; and
- To identify the main metadata quality issues.

This paper is structured as follows. First, a review of previous work is presented and the Dryad repository is described. Then the methodology and results of this study are presented. Finally, conclusions and suggestions for further research are reported in the last section of the paper.

2. Background

Data quality is defined as the state of completeness, validity, consistency, timeliness and accuracy that makes data suitable for a specific use [6]. Dekkers et al. [3] state that data are of high quality if they are fit for their intended uses in operations, decision making and planning. There is no distinction between data quality and metadata quality considerations [3]. The growth, proliferation and evolution of digital objects are accompanied by an analogous transformation of their metadata, which causes consistency problems that in turn affect metadata quality [12,13]. In many cases, the larger the dataset, the greater the probability that a problem will emerge [2]. Research has also shown that discipline affects the quality of metadata, which suggests a cultural dimension to data quality (e.g. [1]).

2.1. Metadata quality issues in repositories

In an early study, Sokvitne [18] examined the effectiveness of the metadata elements of the Dublin Core for information retrieval.
The study showed several problems, especially with popular elements. In particular, the author found that the DC.title and DC.subject elements did not add any value for retrieval purposes, while the DC.creator, DC.publisher and DC.contributor elements presented inconsistent name formats. Sokvitne [18] concluded the study by questioning the suitability of the Dublin Core for information retrieval unless various problematic issues were resolved. The main recommendations were that the elements should be populated and used correctly, and that precise instructions, descriptions and rules should be set.

In addition to general metadata standards, like Dublin Core, researchers have examined metadata quality in the context of specialized repositories, such as architectural repositories [15]; digital libraries [20]; agricultural collections [23]; health databases [16]; and learning object repositories [14]. Despite the heterogeneity of the metadata and repositories examined in these studies, a common set of metadata issues appears to influence quality. For example, Barton et al. [2] outlined the areas where metadata element problems most commonly occur. These were:
- Spelling, abbreviations and other similar data entry errors and ambiguities;
- Inconsistencies with the Author and other contributor/creator metadata elements;
- Use of multiple Title elements;
- Use of correct and standardised terminology (in the case of the Subject metadata element);
- Inconsistencies with the format of the Date metadata element.

While the aforementioned studies provided some evidence about the type of metadata quality issues that apply in the context of information-driven repositories (such as digital libraries and repositories of information resources), little is known about research data repositories.
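The inconsistent name formats reported for creator and contributor elements can be surfaced mechanically. The following is a minimal sketch, not a method used in any of the cited studies: it assumes names arrive as plain strings in either "Surname, Given" or "Given Surname" order, and the helpers name_key and group_variants are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def name_key(name: str) -> tuple:
    """Reduce a personal name to (surname, first initial), lower-cased.

    Assumption: the surname is the part before a comma, or the last
    token when no comma is present.
    """
    # Treat dots as spaces and collapse repeated whitespace.
    cleaned = " ".join(name.replace(".", " ").split())
    if "," in cleaned:
        surname, _, given = cleaned.partition(",")
    else:
        parts = cleaned.split(" ")
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.strip().lower(), given.strip().lower()[:1]


def group_variants(names):
    """Return only those keys that are written in more than one way."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


sample = ["Smith, John", "Smith, J.", "J. Smith", "Jones, Ann"]
variants = group_variants(sample)
```

Grouping by surname and first initial is deliberately coarse: it can merge distinct people who share both, so the output is best read as a list of candidates for manual review rather than a set of confirmed duplicates.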
2.2. The Dryad repository

Dryad is an open access repository that allows scientists in the pure sciences and medicine to store, search, retrieve and re-use research data associated with their scholarly publications. Data are deposited as files with permanent identifiers (DOIs) and metadata. Collections of related files may be grouped into data packages, with metadata describing the combined set of files. Currently the repository contains approximately 4500 data packages associated with scholarly articles published in almost 300 international journals [4]. Dryad's developers used the Singapore framework metadata architecture in a DSpace environment, via an Extensible Markup Language (XML) schema [11,22], together with HIVE (Helping Interdisciplinary Vocabulary Engineering). They implemented the infrastructure so that the automatically generated metadata inherit characteristics from their original sources by harvesting keywords assigned by authors and terms from controlled vocabularies or ontologies [7].

Greenberg et al., initially in [10] and later in [9], performed quantitative studies focused on the reusability of the repository's metadata. The main finding, based on the study of two Dryad workflows, was that 8 out of 12 metadata elements (contributor, corresponding author, identifier citation, subject, publication name, description, relation is referenced by, title) had a reuse rate of 50% or greater. The researchers concluded that reuse was more common for traditional bibliographic elements, and that more accurate metadata must be generated earlier in the metadata workflow. In contrast to the studies by Greenberg and colleagues on the re-usability of metadata, the research reported in this paper focuses on identifying the main quality issues that arise when analysing the metadata elements of the Dryad repository, and on how these may affect re-use and the analysis of the contents of research data repositories.

3. Methodology

The study used a mechanism that downloaded the metadata elements from Dryad and transformed them into a format suitable for analysis. Metadata was harvested in January. At that point Dryad held 4,557 packages, 13,638 data files, 287 journals and 16,595 authors, and items in the repository had been downloaded 751,658 times. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Validator & Data Extraction Tool was used for the metadata harvesting. A total of 516 XML files were downloaded (135 MB). The XML files were merged into a single file using Mergex, a command line tool for merging XML files. Finally, a method to use and analyze the data from the XML files had to be chosen. Given the descriptive nature of the statistical analysis, it was decided to analyze the data using Microsoft Excel 2010 (rather than a dedicated data analysis tool such as R). Therefore the XML to CSV Conversion Tool was used to transform the XML files into CSVs, which were then imported into Excel. It is worth mentioning that importing the XML file directly into Excel produced unsatisfactory results. The converter produced 19 CSV files, each corresponding to a different metadata element: (i) contributor, (ii) coverage, (iii) creator, (iv) date, (v) dc, (vi) format, (vii) header, (viii) identifier, (ix) listrecords, (x) metadata, (xi) record, (xii) relation, (xiii) request, (xiv) responsedate, (xv) resumptiontoken, (xvi) setspec, (xvii) subject, (xviii) title and (xix) type. A selected
sample of metadata elements was analyzed. These were: contributor, creator, date, subject, type, relation, coverage, dc, identifier and title. However, since the focus of this paper is on the presentation of the data quality issues, rather than a detailed description of the contents of the Dryad repository, a small subset of three metadata elements is presented: Creator, Type and Date. These elements represent typical cases where data quality issues can impede the quantitative and qualitative analysis of the Dryad repository, as well as the re-use of the data stored in the repository.

4. Results

A significant number of major data problems were identified in the Creator, Date and Type metadata elements. The conversion and analysis of the data was itself problematic. The main problems with the conversion were the accumulation of noise and the incorrect assignment of records to fields. Irrelevant and misplaced data made the initial files difficult to analyze and manipulate, so manual intervention was essential. Furthermore, the quality of the data, an issue unrelated to the conversion procedure, was lower than expected given Dryad's development.

4.1. Creator

The number of contributions per author is shown in Table 1. In total 16,567 authors, just 28 fewer than the number of authors reported on Dryad's webpage, contributed 86,087 objects. As shown in Table 1, the majority of creators (i.e. authors of the research objects) contributed between one and five research objects to the repository. Out of 16,568 records, a total of 1,443 (8.71%) demonstrated the following issues:
- Additional names: Many authors were entered with just their first name. The problem emerged in 614 (42.55%) cases. This percentage also includes cases where an author's additional names or surname were added as a different record.
- Use of initials: Another major issue was the use of initials instead of the whole name (11.64%).
- Different languages: About twelve percent (12.06%) of the problems fell into this category. There are numerous variations for writing a name from a non-English language. Transliterating a name into the English alphabet can be problematic, as many symbols (for example, accented characters) do not exist in it. The most frequent mistakes were made in French, Spanish, Scandinavian, German, Chinese, Balkan and East European names. The use of short names and diminutives was also included in this category.
- Invalid input: 2.56% of the errors were due to typos. Typical examples in this category occurred when a first name was missing or when the first name was inserted in the surname field.
- Dots and commas: The second most frequent type of error (23.08%) involved the absence of dots or the use of commas at the end of initials.
- Spacing: In a few cases (2.36%) creator entries were invalid because no spaces or too many spaces were inserted when the name was entered.
- Miscellaneous: Issues like the use of irrelevant text (e.g. et al., PhD, status, code, etc.) were grouped in this category (0.83%).

[Table 1. Amount of objects published by each contributor. Columns: Number of contributions, Number of creators. Total: 16,567 creators.]

4.2. Date

This metadata element was used for various kinds of date, such as date accessed, date available and date issued. For the purpose of this analysis we gathered the dates corresponding to the date issued of the 43,453 objects in the repository. According to the cataloging guidelines of Dryad's wiki, DC.date.issued is the official date of publication, inherited by the dataset; i.e. the date of the formal issuance of the resource. The distribution per year is shown in Table 2. The growth of the Dryad repository over time, based on the objects' issued date, is shown in Fig. 1. It should be noted that there are two abnormalities in the flow of records into the repository. On October ,572 publications were entered, when the previous month the amount was a few dozen, and in April 2011 the number skyrocketed to around 23,000, more than half (52.67%) of the total publications of the repository.
Since it is highly unlikely that half of the repository's content was published in a single month, it seems that the date issued has been confused with the date of entry into Dryad. Major quality issues also occurred in the DC.date element. Most of these involved:
- Lack of consistency in the format. For example, there were four dates from , 321 dates after the date on which the metadata was harvested, 476 dates equal to 1/1/9999 and 40 dates that were blank or contained text; and
- Lack of standardization of the date format. Table 3 shows the inconsistency in the length of this element's values, which varies from blank to 20 characters long.

[Table 2. Contributions per creator. Columns: Date, Amount of contributions.]
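The categories of date problems listed above can be expressed as a small classifier. This is a hedged sketch rather than the procedure used in the study, which relied on Excel: it assumes the values are ISO-like strings (YYYY-MM-DD, optionally followed by a time such as T10:19:28Z), and both the function names and the harvest date used in the example are hypothetical.

```python
from collections import Counter
from datetime import date, datetime

# The 1/1/9999 value reported in the text, treated here as a placeholder.
PLACEHOLDER = date(9999, 1, 1)


def parse_issued(value: str):
    """Return a date for the two assumed ISO-like layouts, else None."""
    text = value.strip()
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def classify_issued(value: str, harvest_date: date) -> str:
    """Assign one quality label to a raw date-issued value."""
    if not value.strip():
        return "blank"
    parsed = parse_issued(value)
    if parsed is None:
        return "unparseable"      # free text or an unexpected layout
    if parsed == PLACEHOLDER:
        return "placeholder"
    if parsed > harvest_date:
        return "after_harvest"    # issued later than the harvest itself
    return "ok"


# Illustrative values only; the harvest date below is an assumption.
harvest = date(2014, 1, 31)
raw_values = ["2011-04-15T10:19:28Z", "9999-01-01", "2015-03-01", "", "in press"]
tally = Counter(classify_issued(v, harvest) for v in raw_values)
```

Keeping the labels mutually exclusive makes the tally directly comparable with counts of the kind reported in the text, although the order of the checks (blank, unparseable, placeholder, after harvest) is a design choice and not something the paper specifies.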
[Fig. 1. Growth over time. (Colors are visible in the online version of the article.)]

[Table 3. Length of issued date. Columns: Length, Count, Example. The last row reads: Various, 41, blanks or unacceptable format.]

4.3. Type

A total of 53,598 records were retrieved for the DC.type element and their distribution is shown in Table 4. The Type column shows the exact text found in the type field, except for Blank, which marks records where nothing was entered. As shown in Table 4, the Dataset type accounts for the vast majority of the DC.type values with 70.17%, followed by Article with 8.30%. However, the table clearly contains types that should not appear in the first place, such as custom, blank, none, oneyear, protocol and untilarticleappears. According to Dryad's Cataloging Guidelines, the DC.type element is the code indicating the type of file. This is automatically detected by DSpace, but can be modified manually. Evidently there are problems with the automatic detection, and entries unrelated to DC.type were inserted.

[Table 4. Type distribution of objects. Columns: Type, Amount, Percentage %. Types listed: Activity, Article, Book, Blank, Custom, Dataset, Image, Map, None, Oneyear, Protocol, UntilArticleAppears.]

If we cleaned the data and kept only the suitable types, then 42,129 records would remain and the percentages would change as follows: Activity 0.009%; Article %; Book 0.007%; Dataset %; Image 0.147%; and Map 0.002%. Consequently, nearly 90% of the stored files were datasets and nearly 10% were articles. Just over twenty percent (21.4%) of the records in the DC.type metadata element were jargon, blank or completely irrelevant to the element. The absence of data control and quality checks was obvious. As with the other elements, a mechanism that allows only correct data entry has to be employed.

5. Conclusions

The purpose of this study was to illustrate some of the main data quality issues associated with the use of metadata in the Dryad repository. Our study validated the assumption that such issues exist, as major issues were identified in the DC.creator, DC.date and DC.type metadata elements. Beyond the reusability of research data, addressing the quality of metadata in the Dryad repository is important for the accurate analysis and monitoring of the growth of the repository. In order to address the aim of this study, all metadata from the Dryad repository were harvested and analyzed. A plethora of data misuse issues were identified; issues that render the data unsuitable for text mining or data mining purposes. A mechanism that protects metadata input from the issues we identified needs to be employed. Data control would make repositories far more appealing and sustainable. For example, a fixed format for creators' names should be specified. Each creator and contributor should be assigned a unique ID that would hold their full name (e.g.
When the repository requests the entry of a full name, this unique ID should be inserted instead. If for any reason the creator wishes to change the name, then all of the records related to the name should be updated automatically through the unique ID. In the case of dates, input should follow a single format (e.g. dd-mm-yyyy). Validation rules must be applied when each date is entered (e.g. a date obviously cannot be later than the current date or earlier than the creator's date of birth). In the case of the Type metadata element, inconsistencies can be fixed by offering authors pre-defined lists of values to select from. Finally, we confirmed that poor quality metadata has a drastic impact on the results of the quantitative analysis of the repository's metadata elements.

Our future work will focus on the analysis of the remaining metadata elements of the Dryad repository. More elaborate statistical analysis using R will be employed, and data mining and text mining techniques will be applied to provide a better understanding of the repository's data, to identify associations, clusters or hidden patterns, and to develop novel visualisations for displaying the contents of research data repositories based on the analysis of their metadata [8].
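Two of the recommendations above, a unique ID per creator and a pre-defined list of Type values, can be sketched as follows. This is an illustration under stated assumptions, not Dryad's or DSpace's implementation: CreatorRegistry and validate_type are hypothetical names, the IDs are simple integers, and the allowed list is taken from the six types the paper treats as suitable.

```python
from itertools import count

# The six values the paper treats as suitable DC.type entries.
ALLOWED_TYPES = frozenset({"Activity", "Article", "Book", "Dataset", "Image", "Map"})


def validate_type(value: str) -> str:
    """Accept only values from the pre-defined list; reject everything else."""
    if value not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {value!r}")
    return value


class CreatorRegistry:
    """Hypothetical registry that maps a unique ID to a creator's full name."""

    def __init__(self):
        self._names = {}
        self._next_id = count(1)

    def register(self, full_name: str) -> int:
        creator_id = next(self._next_id)
        self._names[creator_id] = full_name
        return creator_id

    def rename(self, creator_id: int, new_name: str) -> None:
        if creator_id not in self._names:
            raise KeyError(creator_id)
        self._names[creator_id] = new_name

    def name_of(self, creator_id: int) -> str:
        return self._names[creator_id]


# Records store the ID, never the name, so one rename reaches every record.
registry = CreatorRegistry()
cid = registry.register("Ada Example")
records = [
    {"creator_id": cid, "type": validate_type("Dataset")},
    {"creator_id": cid, "type": validate_type("Article")},
]
registry.rename(cid, "Ada B. Example")
display_names = [registry.name_of(r["creator_id"]) for r in records]
```

Because each record holds only the ID, the rename needs no per-record update, which is the behaviour the recommendation asks for. A production system would more plausibly rely on an external persistent identifier than on a local counter.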
References

[1] P. Balatsoukas, A. O'Brien and A. Morris, The effects of discipline on the application of learning object metadata in UK Higher Education: the case of the JORUM repository, Information Research 16(3) (2011).
[2] J. Barton, S. Currier and J.M.N. Hey, Building quality assurance into metadata creation: an analysis based on the learning objects and e-prints communities of practice, in: Proceedings of 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications, 2013.
[3] M. Dekkers, N. Loutas, M. De Keyzer and S. Goedertier, Open data and metadata quality, 2013, available at: v0.09_en.pdf [25/01/2014].
[4] Dryad Digital Repository, Frequently asked questions, available at: [29/12/2013].
[5] E. Garoufallou and C. Papatheodorou, Editorial: Special issue on Metadata for e-science and e-research, International Journal of Metadata Semantics and Ontologies 9(1) (2014), 1-4.
[6] K. Gordon, Principles of Data Management, Facilitating Information Sharing.
[7] J. Greenberg, Theoretical considerations of lifecycle modeling: an analysis of the Dryad repository demonstrating automatic metadata propagation, inheritance, and value system adoption, Cataloguing & Classification Quarterly 47(3,4) (2009).
[8] J. Greenberg and E. Garoufallou, Change and a future for metadata, in: Metadata and Semantic Research: 7th Research Conference, MTSR 2013, Thessaloniki, Greece, November 19-22, 2013, Proceedings. Communications in Computer and Information Science (CCIS), E. Garoufallou and J. Greenberg, eds, Vol. 390, 2013.
[9] J. Greenberg, S. Swauger and E.M. Feinstein, Metadata capital in a data repository, in: Proceedings of the International Conference on Dublin Core and Metadata Applications, 2013.
[10] J. Greenberg and T. Vision, The Dryad repository: A new path for data publication in scholarly communication, OCLC, Dublin, Ohio, 2011, available at: [22/1/2014].
[11] J. Greenberg, H.C. White, S. Carrier and R. Scherle, A metadata best practice for a scientific data repository, Journal of Library Metadata 9(3) (2009).
[12] D. Lee, Practical maintenance of evolving metadata for digital preservation: Algorithmic solution and system support, International Journal on Digital Libraries 6(4) (2007).
[13] X. Ochoa and E. Duval, Automatic evaluation of metadata quality in digital repositories, International Journal on Digital Libraries 10(2) (2009).
[14] N. Palavitsinis, N. Manouselis and S. Sanchez-Alonso, Metadata quality in digital repositories: empirical results from the cross-domain transfer of a quality assurance process, Journal of the Association of Information Science and Technology 65(6) (2014).
[15] E. Park, Building interoperable Canadian architecture collections: Initial metadata assessment, The Electronic Library 25(2) (2007).
[16] D. Rousidis, E. Garoufallou, P. Balatsoukas, K. Paraskevopoulos, S. Asderi and D. Koutsomicha, Metadata requirements for repositories in health informatics research: evidence from the analysis of social media citations, in: MTSR 2013, E. Garoufallou and J. Greenberg, eds, 2013.
[17] D. Rousidis, E. Garoufallou, P. Balatsoukas and M.-A. Sicilia, Data quality issues and content analysis for research data repositories: The case of Dryad, ELPUB2014. Let's put data to use: digital scholarship for the next generation, in: 18th International Conference on Electronic Publishing, Thessaloniki, Greece, June 2014, available at:
[18] L. Sokvitne, An evaluation of the effectiveness of current Dublin core metadata for retrieval, in: Proceedings of VALA 2000, Victorian Association for Library Automation, Melbourne.
[19] B. Stvilia, C. Hinnant, S. Wu, A. Worall, D.J. Lee, K. Burnett, G. Burnett, M. Kazmer and P. Marty, Research project tasks, data and perceptions of data quality in a condensed matter physics community, Journal of the Association for Information Science and Technology, in press.
[20] M.H. Vinagre, L.G. Pinto and P. Ochôa, Revisiting digital libraries quality: a multiple-item scale approach, Performance Measurement and Metrics 12(3) (2011).
[21] N.G. Weiskopf and C. Weng, Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research, JAMIA 20(1).
[22] H. White, S. Carrier, A. Thompson, J. Greenberg and R. Scherle, The Dryad data repository: A Singapore framework metadata architecture in a DSpace environment, in: DC 2008, The 2008 International Conference on Dublin Core and Metadata Applications, Berlin.
[23] T. Zechocke and J. Beniest, Adapting a quality assurance framework for creating educational metadata in an agricultural learning repository, The Electronic Library 29(2) (2011).
SHared Access Research Ecosystem (SHARE)
SHared Access Research Ecosystem (SHARE) June 7, 2013 DRAFT Association of American Universities (AAU) Association of Public and Land-grant Universities (APLU) Association of Research Libraries (ARL) This
D5.5 Initial EDSA Data Management Plan
Project acronym: Project full : EDSA European Data Science Academy Grant agreement no: 643937 D5.5 Initial EDSA Data Management Plan Deliverable Editor: Other contributors: Mandy Costello (Open Data Institute)
Lightweight Data Integration using the WebComposition Data Grid Service
Lightweight Data Integration using the WebComposition Data Grid Service Ralph Sommermeier 1, Andreas Heil 2, Martin Gaedke 1 1 Chemnitz University of Technology, Faculty of Computer Science, Distributed
SQL Server 2012. Integration Services. Design Patterns. Andy Leonard. Matt Masson Tim Mitchell. Jessica M. Moss. Michelle Ufford
SQL Server 2012 Integration Services Design Patterns Andy Leonard Matt Masson Tim Mitchell Jessica M. Moss Michelle Ufford Contents J Foreword About the Authors About the Technical Reviewers Acknowledgments
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
TABLE OF CONTENT IDENTIFICATION OF CORE COMPETENCIES FOR 35 SOFTWARE ENGINEERS
TABLE OF CONTENT DECLARATION BY THE SCHOLAR SUPERVISOR S CERTIFICATE ACKNOWLEDGEMENTS ABSTRACT LIST OF FIGURES LIST OF TABLES iv vi xiv xvi xviii xix CHAPTER-1 INTRODUCTION 1 1.1 BASIS FOR THE NEED FOR
RCUK Policy on Open Access and Supporting Guidance
RCUK Policy on Open Access and Supporting Guidance 1. Introduction (i) Free and open access to the outputs of publicly funded research offers significant social and economic benefits as well as aiding
DRIVER Providing value-added services on top of Open Access institutional repositories
DRIVER Providing value-added services on top of Open Access institutional repositories Dr Dale Peters Scientific Technical Manager : DRIVER SUB Goettingen Germany Gaining the momentum: Open Access and
Web Archiving and Scholarly Use of Web Archives
Web Archiving and Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 April 2013 Overview 1. Introduction 2. Access and usage: UK Web Archive 3. Scholarly feedback on
Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
ELPUB Digital Library v2.0. Application of semantic web technologies
ELPUB Digital Library v2.0 Application of semantic web technologies Anand BHATT a, and Bob MARTENS b a ABA-NET/Architexturez Imprints, New Delhi, India b Vienna University of Technology, Vienna, Austria
Implementation of Open Researcher and Contributor ID. (ORCID) at a Large Academic Institution
Implementation of Open Researcher and Contributor ID (ORCID) at a Large Academic Institution Merle Rosenzweig*, AMLS Informationist Taubman Health Sciences Library, University of Michigan Abstract ORCID
Vilas Wuwongse, Thiti Vacharasintopchai, Neelawat Intaraksa Asian Institute of Technology www.ait.asia
A Common Infrastructure for Digital Contents Vilas Wuwongse, Thiti Vacharasintopchai, Neelawat Intaraksa Asian Institute of Technology www.ait.asia Outline Introduction Issues Proposed Approach A Common
Metadata Repositories in Health Care. Discussion Paper
Health Care and Informatics Review Online, 2008, 12(3), pp 37-44, Published online at www.hinz.org.nz ISSN 1174-3379 Metadata Repositories in Health Care Discussion Paper Dr Karolyn Kerr [email protected]
A Java Tool for Creating ISO/FGDC Geographic Metadata
F.J. Zarazaga-Soria, J. Lacasta, J. Nogueras-Iso, M. Pilar Torres, P.R. Muro-Medrano17 A Java Tool for Creating ISO/FGDC Geographic Metadata F. Javier Zarazaga-Soria, Javier Lacasta, Javier Nogueras-Iso,
Integration of Polish National Bibliography within the repository platform for science and humanities
Marcin Roszkowski Integration of Polish National Bibliography within the repository platform for science and humanities The best thing to do to your data will be thought of by somebody else W3C LLD Agenda
I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION
Noesis: A Semantic Search Engine and Resource Aggregator for Atmospheric Science Sunil Movva, Rahul Ramachandran, Xiang Li, Phani Cherukuri, Sara Graves Information Technology and Systems Center University
The Data Warehouse Challenge
The Data Warehouse Challenge Taming Data Chaos Michael H. Brackett Technische Hochschule Darmstadt Fachbereichsbibliothek Informatik TU Darmstadt FACHBEREICH INFORMATIK B I B L I O T H E K Irwentar-Nr.:...H.3...:T...G3.ty..2iL..
Implementing and Administering an Enterprise SharePoint Environment
Implementing and Administering an Enterprise SharePoint Environment There are numerous planning and management issues that your team needs to address when deploying SharePoint. This process can be simplified
European Data Infrastructure - EUDAT Data Services & Tools
European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28
Annotea and Semantic Web Supported Collaboration
Annotea and Semantic Web Supported Collaboration Marja-Riitta Koivunen, Ph.D. Annotea project Abstract Like any other technology, the Semantic Web cannot succeed if the applications using it do not serve
Information and documentation The Dublin Core metadata element set
ISO TC 46/SC 4 N515 Date: 2003-02-26 ISO 15836:2003(E) ISO TC 46/SC 4 Secretariat: ANSI Information and documentation The Dublin Core metadata element set Information et documentation Éléments fondamentaux
CLIR/CIC - Archives for Non-Archivists Workshop Introduction to The Archivists Toolkit. Introduction
Introduction The Archivists Toolkit, or the AT, is the first open source archival data management system to provide broad, integrated support for the management of archives. It is intended for a wide range
1. Who can use Agent Portal? 2. What is the definition of an active agent? 3. How to access Agent portal? 4. How to login?
1. Who can use Agent Portal? Any active agent who is associated with Future Generali Life Insurance Company Limited can logon to Agent Portal 2. What is the definition of an active agent? An agent, whose
Research Data Archival Guidelines
Research Data Archival Guidelines LEROY MWANZIA RESEARCH METHODS GROUP APRIL 2012 Table of Contents Table of Contents... i 1 World Agroforestry Centre s Mission and Research Data... 1 2 Definitions:...
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 [email protected]
RESEARCH DATA MANAGEMENT POLICY
Document Title Version 1.1 Document Review Date March 2016 Document Owner Revision Timetable / Process RESEARCH DATA MANAGEMENT POLICY RESEARCH DATA MANAGEMENT POLICY Director of the Research Office Regular
Multi-domain Research Data Description
Multi-domain Research Data Description Fostering the participation of researchers in a ontology-based data management environment João Aguiar Castro Faculdade de Engenharia da Universidade do Porto / INESC
DATA QUALITY AND SCALE IN CONTEXT OF EUROPEAN SPATIAL DATA HARMONISATION
DATA QUALITY AND SCALE IN CONTEXT OF EUROPEAN SPATIAL DATA HARMONISATION Katalin Tóth, Vanda Nunes de Lima European Commission Joint Research Centre, Ispra, Italy ABSTRACT The proposal for the INSPIRE
SERVICE-ORIENTED MODELING FRAMEWORK (SOMF ) SERVICE-ORIENTED SOFTWARE ARCHITECTURE MODEL LANGUAGE SPECIFICATIONS
SERVICE-ORIENTED MODELING FRAMEWORK (SOMF ) VERSION 2.1 SERVICE-ORIENTED SOFTWARE ARCHITECTURE MODEL LANGUAGE SPECIFICATIONS 1 TABLE OF CONTENTS INTRODUCTION... 3 About The Service-Oriented Modeling Framework
An Introduction to Managing Research Data
An Introduction to Managing Research Data Author University of Bristol Research Data Service Date 1 August 2013 Version 3 Notes URI IPR data.bris.ac.uk Copyright 2013 University of Bristol Within the Research
OvidSP Quick Reference Guide
OvidSP Quick Reference Guide Opening an OvidSP Session Open the OvidSP URL with a browser or Follow a link on a web page or Use Athens or Shibboleth access Select Resources to Search In the Select Resource(s)
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC
Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep Neil Raden Hired Brains Research, LLC Traditionally, the job of gathering and integrating data for analytics fell on data warehouses.
Master Data Management
Master Data Management David Loshin AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO Ик^И V^ SAN FRANCISCO SINGAPORE SYDNEY TOKYO W*m k^ MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER
Security Metrics. A Beginner's Guide. Caroline Wong. Mc Graw Hill. Singapore Sydney Toronto. Lisbon London Madrid Mexico City Milan New Delhi San Juan
Security Metrics A Beginner's Guide Caroline Wong Mc Graw Hill New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto Contents FOREWORD
MicroStrategy Course Catalog
MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY
Metadata Quality Control for Content Migration: The Metadata Migration Project at the University of Houston Libraries
Metadata Quality Control for Content Migration: The Metadata Migration Project at the University of Houston Libraries Andrew Weidner University of Houston, USA [email protected] Annie Wu University of Houston,
Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources
Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Investigating Automated Sentiment Analysis of Feedback Tags in a Programming Course Stephen Cummins, Liz Burd, Andrew
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
A grant number provides unique identification for the grant.
Data Management Plan template Name of student/researcher(s) Name of group/project Description of your research Briefly summarise the type of your research to help others understand the purposes for which
A. Document repository services for EU policy support
A. Document repository services for EU policy support 1. CONTEXT Type of Action Type of Activity Service in charge Associated Services Project Reusable generic tools DG DIGIT Policy DGs (e.g. FP7 DGs,
Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer
Survey of Canadian and International Data Management Initiatives By Diego Argáez and Kathleen Shearer on behalf of the CARL Data Management Working Group (Working paper) April 28, 2008 Introduction Today,
OCLC CONTENTdm. Geri Ingram Community Manager. Overview. Spring 2015 CONTENTdm User Conference Goucher College Baltimore MD May 27, 2015
OCLC CONTENTdm Overview Spring 2015 CONTENTdm User Conference Goucher College Baltimore MD May 27, 2015 Geri Ingram Community Manager Overview Audience This session is for users library staff, curators,
