Project Data Archiving Lessons from a Case Study



March 1998
The University of Reading Statistical Services Centre
Biometrics Advisory and Support Service to DFID

Contents
1. Introduction
2. Why preserve project data?
3. What should be archived?
4. What is a good data archive?
5. Medium of dissemination
6. Example of a successful data archiving exercise
7. What does the archive comprise?
8. Further Information

Acknowledgements

Statistical Services Centre staff members Savitri Abeyasekera and Carlos Barahona, who were involved in the case study work described below, are pleased to acknowledge their indebtedness to members of the ELUS project team in Malawi, 1995-1997: Joanne Bosworth, Rural Sociologist; Jasper Steele, Farming Systems Economist; and especially Steve Gossage, ELUS Project Team Leader, for their generous help, encouragement and hospitality during the creation of this data archive. Thanks are also due to Harry Potter, DFID Natural Resources Adviser in Malawi, for his support to this project.

© 1998 Statistical Services Centre, The University of Reading, UK.

1. Introduction

An integral part of many research projects is the collection of survey or experimental data, at considerable cost in money, time and effort. Great care may be taken to produce good quality data at both the data collection and computerisation stages, but there is usually little emphasis on ensuring that the data are available to other users in a form that allows them to be readily understood and correctly used in subsequent studies. Ideally the creation of an archive should be integrated with the ongoing work of a project rather than being an afterthought, for which time may run out once the team has dispersed.

This brief note is intended to raise awareness of the importance of preserving project data, to discuss the characteristics that make for a good data archive, and to provide an example of a successful archive.

2. Why preserve project data?

In the past, the generally available records of project data have often been only in publications, where limited space was available to summarise key features relevant to the specific slants of the papers concerned. Modern computing and data storage facilities mean there is now no technical reason why much more detailed data should not be preserved and readily reproduced, in a form where they can be accessed and used by others.

As part of the case for support for certain projects, proposers may argue that the quantitative information produced would be of relatively long-term or wide-ranging interest. Certainly, producing results of only ephemeral value should not recommend a project. This implies a duty on the project team to document and archive data collected in the course of the work.

Given a worthwhile project, such a record is potentially valuable to secondary users and later workers, if they are given the opportunity to extract information in a form that makes their own work more effective. Guaranteeing to add this value strengthens the case for funding the initial project.

3. What should be archived?

There are three main types of information which need to be accurately recorded:
- the main project data themselves, not just summary tables;
- the record of how and why data were acquired, and what they represent; and
- documentation about computer files which will allow later data retrieval.

To be useful beyond the project lifespan, archives need to be in an organised form, in almost all cases computerised. Files should be backed up, with a securely stored master version, and a system should be set up during the project to make full or partial copies accessible to legitimate users thereafter.

General principles of data quality control apply at all stages, e.g. during the definition and development of the data to be collected, data acquisition, and the creation of final computerised data files.
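One way to document the computer files, and to check that a distributed copy still matches the securely stored master version, is to record a checksum for every file. The following is a minimal sketch in Python, not part of the original case study; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict:
    """Map each file's relative path to (size in bytes, SHA-256 digest)."""
    manifest = {}
    for path in sorted(archive_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(archive_dir).as_posix()
            manifest[rel] = (path.stat().st_size, sha256_of(path))
    return manifest


def changed_files(master: dict, copy: dict) -> list:
    """List files that are missing from, or differ in, a copy of the master."""
    return sorted(name for name, entry in master.items()
                  if copy.get(name) != entry)
```

Calling `build_manifest` on the master directory and on a copy, then passing both results to `changed_files`, gives the names of any files that were lost or altered in copying. The manifest itself can be printed and kept with the paper documentation.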

4. What is a good data archive?

Many characteristics make for a good data archive. In brief:
- Accessibility, so that users can reach the stored information via widely available software.
- Ease of use, by ensuring that (i) the data archiving structure is simple, so that the relationship between the forms used in the field and the computerised information is evident; (ii) there are clear definitions of variables stored in the archive (e.g. units of measurement) and of codes used (labels for categorical variates, etc.); and (iii) there is consistency in names, codes, units of measurement, and abbreviations throughout the archive.
- Reliability: the archive should be as free of errors as can be managed within the timescale and budget of the project.
- Documentation, viz. (i) procedures used for data collection, including sampling methodology and sampling units used; (ii) the structure of the archive, e.g. how different files link together; (iii) a list of computer files comprising the archive; (iv) a full list of all variables, including notes on how missing values are treated; (v) summary statistics that allow the user to cross-check whether the information retrieved corresponds to that required; and (vi) relevant warnings and comments relating to any part of the database.
- Preservation of anonymity, and of any conditions of confidentiality under which the data sets were made available by the sources.
- Completeness, as far as that is possible and useful. The archive should include a computer file copy of (i) the field forms; (ii) the data management log-book; (iii) descriptions of derived variables; and (iv) special comments and observations.
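Clear definitions of variables, codes and missing-value conventions become most useful when they are held in a form that the data can be checked against. The sketch below, in Python, is one possible illustration under stated assumptions: the variable names, codes, ranges and missing-value codes are all hypothetical and are not taken from any real archive.

```python
# Hypothetical variable definitions: label, units or codes, and missing code.
CODEBOOK = {
    "tenure": {
        "label": "Type of land tenure",
        "codes": {1: "Leasehold", 2: "Freehold"},
        "missing": 9,
    },
    "area_ha": {
        "label": "Estate area",
        "units": "hectares",
        "range": (0.0, 5000.0),
        "missing": -1.0,
    },
}


def check_record(record, codebook=CODEBOOK):
    """Return a list of problems found in one data record."""
    problems = []
    # Every variable in the data must be defined in the codebook.
    for name in record:
        if name not in codebook:
            problems.append(f"{name}: not defined in the codebook")
    # Every defined variable must be present and hold a permitted value.
    for name, spec in codebook.items():
        if name not in record:
            problems.append(f"{name}: absent from record")
            continue
        value = record[name]
        if value == spec["missing"]:
            continue  # documented missing-value code
        if "codes" in spec and value not in spec["codes"]:
            problems.append(f"{name}: undefined code {value!r}")
        if "range" in spec:
            low, high = spec["range"]
            if not low <= value <= high:
                problems.append(f"{name}: {value!r} outside {low}..{high}")
    return problems
```

For example, `check_record({"tenure": 3, "area_ha": 12.5})` reports that code 3 is undefined for `tenure`, while a record using the documented missing-value codes passes without comment.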

5. Medium of dissemination

The medium for dissemination of project data has to be considered in planning any form of archiving. So does the choice of items to be disseminated, which may clearly be selective, e.g. because some data are confidential. The argument made at the present time is that:
- it is easier and cheaper to duplicate floppy disks than to photocopy lots of reports, and
- it is easier to re-use numerical information if it is disseminated in computer-readable form.

In future it should become possible to disseminate data on CDs, including GIS data, and perhaps also large images with built-in software to let the user view them. At present this more advanced work looks in many cases too expensive, and too demanding of the facilities available to project staff or to those who will use the data.

For the moment it seems that items like aerial photographs should be lodged in a place where they can be preserved safely for a reasonably long time, and which has the capacity to copy negatives and positives for legitimate users. Of course, details of how to obtain copies should be part of the archive information!

6. Example of a successful data archiving exercise

The Statistical Services Centre (SSC) at the University of Reading was closely involved with statistical aspects of the Estate Land Utilisation Study (ELUS) in Malawi, a large nationwide survey carried out over the period from mid-1995 to mid-1997 and funded by ODA (now DFID). As part of this involvement, a proposal was made for archiving the large volume of detailed information collected about the socioeconomic structure and utilisation of land within the estate sector of Malawi. The proposal was supported by the NR Adviser in Malawi and the ELUS project team.

The main survey involved a 125- and a 411-response questionnaire, while three subsequent and more detailed sub-sample studies used longer questionnaires. Two additional but smaller surveys were also part of the project.

A member of SSC staff, who had considerable experience of all the computer packages concerned, visited Malawi for three weeks in February/March 1997 to carry out the archiving exercise. An additional week was needed after the visit to complete the archive and its documentation. The time needed depends on the length of the questionnaires concerned and the quality of the data available to the data archiving consultant. While the ELUS team had paid close attention to ensuring that their main datasets were as free as possible from errors, we note on the basis of other experiences that data-cleaning can be immensely time-consuming.

The archiving exercise included a one-day workshop for a few identified users of the ELUS database. The aim was to familiarise the participants with the archive structure and organisation, and to get their views on ways to improve the archiving procedure.

Archiving the ELUS database successfully in the time span described was possible for three reasons: the work was approved, funded and completed within the life span of the project; SSC staff were familiar with the ELUS data structure; and the ELUS team cooperated in making all relevant information available on disk at the appropriate time.

7. What does the archive comprise?

In the case of ELUS, a large A4 ringbinder contains three write-protected floppy disks of zipped (compressed) files, and software to allow these to be automatically restored into 15 MB of hard disk space. The data files are all included in duplicate in two common formats, as SPSS portable files and as dBASE IV files, at least one of which should be accessible to users for many years to come. Word 6 was used for text files, including the full description of the sampling schemes.

On paper, there is a ten-page introduction and a summary sampling report. There are then several hundred pages giving details, for each questionnaire used, of every file and every variable. For each file the description includes the number of cases, the number of variables, the full list of variables declared, and the variable labels and value labels. For each variable the description includes the variable name and label, the minimum and maximum values, and the number of valid (non-missing) cases stored. The volume weighs 1.65 kg.

Thirty copies were prepared, five of them including some information rated as confidential for commercial reasons or to protect the anonymity of respondents. Both types of copy were appropriately distributed, e.g. amongst offices of the Government of Malawi, academic institutions, and DFID.

Legitimate users and authorised researchers should be able to find a copy of the data in a form where they can, for example, (a) perform further analyses; (b) with appropriate access, use the information in the archive as an extremely detailed sampling frame, through which to revisit sub-samples of the ELUS estates; (c) integrate ELUS data with their own later findings for longitudinal analysis.
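The per-file and per-variable descriptions listed above (number of cases, variable name and label, minimum, maximum, and number of valid cases) can be generated mechanically from the data. The following Python sketch is an illustration only, not the procedure used for ELUS, whose files were in SPSS and dBASE formats. It assumes numeric data held as comma-separated text with blank cells for missing values, and the column names in the usage note are hypothetical.

```python
import csv
import io


def describe_variables(csv_text, labels, missing=""):
    """Return (number of cases, per-variable summary) for numeric CSV data.

    Each summary entry holds the variable label, the minimum and maximum
    of the valid values, and the number of valid (non-missing) cases.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {
        name: {"label": labels.get(name, ""), "min": None, "max": None, "valid": 0}
        for name in reader.fieldnames
    }
    n_cases = 0
    for row in reader:
        n_cases += 1
        for name, raw in row.items():
            if raw is None or raw.strip() == missing:
                continue  # missing value: not counted as a valid case
            value = float(raw)
            entry = summary[name]
            entry["valid"] += 1
            entry["min"] = value if entry["min"] is None else min(entry["min"], value)
            entry["max"] = value if entry["max"] is None else max(entry["max"], value)
    return n_cases, summary
```

Given the text `"estate_id,area_ha\n1,12.5\n2,\n3,7.0\n"` and the label mapping `{"area_ha": "Estate area (hectares)"}`, the function counts three cases and, for `area_ha`, two valid cases. A user who retrieves the data can compare such figures with the printed documentation to confirm that the file read is the one intended.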

8. Further Information

It is hoped that this document provides some initial ideas on issues of importance in data archiving. The Statistical Services Centre intends to work towards more detailed guidelines on archiving procedures for potential researchers initiating projects, for their appraisers, and perhaps for government agencies in countries where projects may be done.

We would be pleased to hear from researchers with examples or experience of work similar to that described here, from successful or frustrated users of data archives, or from anyone with ideas to share on the issues involved.


The Statistical Services Centre is attached to the Department of Applied Statistics at The University of Reading, UK, and undertakes training and consultancy work on a non-profit-making basis for clients outside the University.

These statistical guides were originally written as part of a contract with DFID to give guidance to research and support staff working on DFID Natural Resources projects. The available titles are listed below.
- Statistical Guidelines for Natural Resources Projects
- On-Farm Trials Some Biometric Guidelines
- Data Management Guidelines for Experimental Projects
- Guidelines for Planning Effective Surveys
- Project Data Archiving Lessons from a Case Study
- Informative Presentation of Tables, Graphs and Statistics
- Concepts Underlying the Design of Experiments
- One Animal per Farm?
- Disciplined Use of Spreadsheets for Data Entry
- The Role of a Database Package for Research Projects
- Excel for Statistics: Tips and Warnings
- The Statistical Background to ANOVA
- Moving on from MSTAT (to Genstat)
- Some Basic Ideas of Sampling
- Modern Methods of Analysis
- Confidence & Significance: Key Concepts of Inferential Statistics
- Modern Approaches to the Analysis of Experimental Data
- Approaches to the Analysis of Survey Data
- Mixed Models and Multilevel Data Structures in Agriculture

The guides are available in both printed and computer-readable form. For copies or for further information about the SSC, please use the contact details given below.

Statistical Services Centre, The University of Reading
P.O. Box 240, Reading, RG6 6FN, United Kingdom
tel: SSC Administration +44 118 931 8025
fax: +44 118 975 3169
e-mail: statistics@reading.ac.uk
web: http://www.reading.ac.uk/ssc/