Multi-Environment Trials: Data Quality Guide


Thomas Mawora (tmawora@yahoo.com), Maseno University, Kenya
Cathy Garlick (c.a.garlick@reading.ac.uk), Statistical Services Centre, University of Reading, UK
Maxwell Mkondiwa (maxii88@yahoo.co.uk), Bunda College, Malawi
Ric Coe (r.coe@cgiar.org), Statistical Services Centre, University of Reading, UK and World Agroforestry Centre, Kenya

5 April 2012

1. Introduction

The purpose of this guide is to help managers of Multi-Environment Trials (METs) to collect, share, compile and store data ready for analysis. The emphasis is on data quality. METs are expensive to run, and we want to ensure that the datasets they generate are reliable, secure, well documented, and can be processed efficiently both now and in the future. In other words, we want them to be of high quality.

This guide is organised using the Data Flow concept devised by the Statistical Services Centre (SSC) in 2010, which is summarised in Figure 1.

Figure 1: Data flow (Statistical Services Centre, 2010).

2. Using the guide

The guide is designed to act as a checklist. As you plan and carry out a MET, look through it regularly to make sure you have all angles covered. If there are areas you need help with, ask the Research Methods Support Team.

3. Data Ownership

Data ownership can be a contentious issue. To avoid disagreements, draw up a clear, written agreement at the start of the work and have all parties sign it. The agreement should state who has access to the data, both during and after the project, and for what purposes. It should also state what data, if any, will be put into the public domain, and set a timeline for carrying this out. Include a section on authorship of any publications coming from the project. In this context we consider the data and accompanying documentation to be a publication from the project. A data ownership and sharing agreement is particularly important if the MET involves individuals from different organisations.

Will the MET involve people from different organisations? Yes [ ] No [ ]
Do you have a written Data Ownership & Sharing agreement? Yes [ ] No [ ]
Does the agreement include information on:
  o Who has access to the data? Yes [ ] No [ ]
  o Authorship of publications? Yes [ ] No [ ]
  o Archiving including timescales? Yes [ ] No [ ]

4. Planning Data Flow

Planning the whole process is not a trivial task. You will need to define the activities for the MET and work out what data you expect from each of them. Objectives and timelines for each activity must be documented. Staff roles and responsibilities need to be assigned, and training may be needed to address any skills shortage. Alternatively, extra staff or external consultants may need to be brought on board.

At this stage you should also consider equipment needs: make sure you have everything you need for measuring and collecting the data in the field, and for entering, storing and managing the data in the field and/or back at the main office. When considering IT equipment, also think about the software you might need.

Do you have a list of the activities for the MET that will generate data? Yes [ ] No [ ]
Are all the activities documented with objectives and timelines? Yes [ ] No [ ]
Have staff responsibilities been assigned? Yes [ ] No [ ]
Have you identified the data collectors? Yes [ ] No [ ]
Do you have a list of all equipment needed? Yes [ ] No [ ]
Is all equipment included in your budget? Yes [ ] No [ ]

5. The Data Manager

The role of the data manager is pivotal in this process; this section is included here to emphasise the importance of this role.
The data manager should be in position from the start of the MET so that he/she can be instrumental in ensuring good data management practices are followed throughout. The data manager's role can include:

- Preparing Data Management Guidelines and documenting procedures;
- Contributing to fieldwork manuals;
- Creating and documenting data entry systems;
- Training data entry staff;
- Validating data from all sites and producing site reports on the quality of the data;
- Merging and compiling data from all sites;
- Producing a secure data and document store (including pictures and videos) and controlling access as agreed by the PI and in the Data Ownership agreement;
- Regularly backing up the data and documentation;
- Preparing the meta-data and keeping all data documentation up to date;
- Providing data for analysis as and when needed;
- Ensuring data can be managed and used across seasons and sites;
- Regularly updating the project teams on the status of the data;
- Archiving data and documentation at the end of the MET/research activity (as defined in the data ownership guide), taking into account confidentiality and anonymity.

The data manager has a great deal of responsibility, and you will need to decide whether you need just one person for this role or whether you need a data manager at each site, albeit on a part-time basis. Depending on the size of the MET, you may also need a data technician or two to assist with some of the detailed data checking tasks; you may find you need this extra help just at particular times.

Have you identified your data manager? Yes [ ] No [ ]
Do you need a data manager at each site as well as someone with overall responsibility? Yes [ ] No [ ]
Have you drawn up a Terms of Reference (ToR) for the data manager? Yes [ ] No [ ]
Do you need to employ one or more data technicians for extra help at certain times? Yes [ ] No [ ]
Does your data manager have all the necessary skills? Yes [ ] No [ ]
Does your data manager need further training? Yes [ ] No [ ]
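One of the data manager's tasks listed above is producing site reports on the quality of the data. As a minimal sketch of what such a report might compute, the Python below counts records and missing values per site. The field names (`site_id`, `yield_t_ha`, `disease_score`) and the rule that `None` or an empty string means "missing" are assumptions made for illustration; they do not come from this guide.

```python
from collections import defaultdict


def site_quality_report(records, fields):
    """Count records and missing values per site.

    Assumption: a value of None or "" is treated as missing.
    """
    report = defaultdict(lambda: {"records": 0, "missing": 0})
    for rec in records:
        entry = report[rec["site_id"]]
        entry["records"] += 1
        entry["missing"] += sum(1 for f in fields if rec.get(f) in (None, ""))
    return dict(report)


# Hypothetical plot records from two sites.
records = [
    {"site_id": "S1", "yield_t_ha": 3.2, "disease_score": 2},
    {"site_id": "S1", "yield_t_ha": None, "disease_score": 1},
    {"site_id": "S2", "yield_t_ha": 2.5, "disease_score": ""},
]

report = site_quality_report(records, ["yield_t_ha", "disease_score"])
for site, counts in sorted(report.items()):
    print(site, counts)
```

A summary like this could be sent back to each site soon after data arrive, while collectors can still explain or correct the gaps.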

6. Data Entry Systems

We mention data entry systems at this stage because it is useful to consider the software you are going to use for data entry before you do any data collection. We also want to separate the task of preparing the data entry system from the task of data entry. In many research activities the task of preparing the data entry system is omitted altogether, and researchers enter their data directly into blank Excel worksheets. This is not recommended: there are then no checks on the data during entry, which inevitably leads to errors.

There are a number of software packages that can be used for data entry. We mention three of them here: Excel, Access and CS-Pro.

Excel is easy to use and is the preferred choice for many researchers. However, it must be used carefully. We recommend using the guides produced by the Statistical Services Centre (SSC) for learning how to enter and manage data in Excel: Disciplined Use of Spreadsheets for Data Entry, and Data Management for Multi-Environment Trials in Excel. If using Excel, we strongly recommend preparing a worksheet for data entry with validation rules and checks already in place. A video demonstrating the use of such a worksheet is available on the SSC's YouTube channel, www.youtube.com/user/sscbroadcast; this video was prepared for a data entry workshop.

Access is a relational database package ideal for storing data with complex structures, for example data at several levels. Data entry screens can be designed to resemble the questionnaire or data collection instrument, and skip patterns and checks can be programmed into the system. Creating a well-designed database is a highly skilled undertaking and, if you want to follow this route, you may need to employ the services of an external consultant.

CS-Pro is a software package produced by the US Census Bureau and designed for survey data entry and management. As in Access, data entry screens can be created to resemble the questionnaire, and skip patterns can be programmed to follow those in the questionnaire to help ensure accurate data entry. The package also includes a Data Comparison tool for double data entry and a tabulation component to assist with data checking. Once entered and checked, data can be exported to Excel or directly into SPSS, Stata or SAS. The SSC have prepared a series of video demonstrations showing how to set up a data entry system in CS-Pro; these are available from the SSC website, www.reading.ac.uk/ssc.

Have you selected the software to use for data entry? Yes [ ] No [ ]
Do your staff need training in the use of this software? Yes [ ] No [ ]
Do you need to employ an external consultant to set up, or help set up, the data entry system? Yes [ ] No [ ]
Have you prepared the data entry system? Yes [ ] No [ ]
Has the system been documented? Yes [ ] No [ ]

7. Planning Data Collection

The MET must have a written protocol that has been developed and approved. This should include details of how all data will be collected and the choice of environments/locations/plots. A field manual should be produced detailing the method of data collection, the units of measurement to be used, etc.

You will need to consider whether farmers are to be involved in the data collection and, if so, what their role might be. For example:

- Designing aspects of the experiment;
- Managing field plots;
- Measuring and assessing the experiment;
- Helping with data interpretation.

Data collection staff need to be identified and trained. Even if they have carried out data collection for other projects, they will still need instruction in the methods to be used for your MET.

Has a written protocol been developed? Yes [ ] No [ ]
Has it been approved? Yes [ ] No [ ]
Has a field manual been written? Yes [ ] No [ ]
Have data collection staff been identified? Yes [ ] No [ ]
Have staff been trained in the relevant techniques for your MET? Yes [ ] No [ ]

8. Confidentiality of participants

Personal information from participating farmers and others should be treated as confidential. Participants should be made aware of how the data is to be used. Consent forms should be prepared and participants given the opportunity to opt out at any time. Before photos or videos of individuals are taken, consent must be obtained from those involved.

Has signed consent been given by all participating farmers and others? Yes [ ] No [ ]
Are signed consent forms stored centrally? Yes [ ] No [ ]
Do you have means of keeping data about people confidential? Yes [ ] No [ ]

9. Different levels of data

A MET has at least two levels of variation and possibly more. Data must be collected and managed from each level. There will be data from sites or environments and from plots. You may also have farmer-level data if farmers are not the same as sites.

Data at the site level could include:

- The location (GPS)
- Data to describe the environment, such as:
  o Climate
  o Soil
  o Farming system
  o Cultural variables

At the plot level you might have:

- Treatments in each plot
- Management, such as fertilizer input
- Sole crop or intercropping used
- Crop performance: yield, disease, farmer's assessments

Have you identified the different levels of data for your MET? Yes [ ] No [ ]
Have you identified the site level data relevant to your MET? Yes [ ] No [ ]
Have you identified the plot level data? Yes [ ] No [ ]
Have the data collection tools been tested? Yes [ ] No [ ]
Have you devised a means of field data recording? Yes [ ] No [ ]

10. Identification

Every item in the MET must have a unique key that allows it to be identified and linked to other data. This includes every plot, field, farmer, site, etc. Data at the lower levels must include the key for the higher level so that the data can be linked. For example, the plot level data may include the key for the site, which links it to the site level data.

In Excel, site level and plot level data would be stored on separate worksheets, and VLOOKUP could be used to merge the data across levels. In Access, site and plot level data would be stored in separate tables and a relationship created between them. Queries would be used to merge the data across levels.

Do you have unique identifiers for each item in your trial? Yes [ ] No [ ]

11. Data Collection

The organisation of field data measurement can have a large impact on the overall quality of the data. If measurement processes are clumsy, inappropriate or too time-consuming, they will not generate good data. Likewise, they may not give good data if they require subjective decisions in the field or excessive work in harsh conditions, or if they are simply too tedious.

You should pilot all measurement and data recording procedures to make sure they are practical, understood and repeatable. Prepare data recording sheets that are in field order (not treatment order), and prohibit any hand calculations. Check that data collectors understand how to use the forms correctly. Data collection should include recording notes and comments.

It is also very useful to take photos of plots before they are disturbed. These provide a valuable visual record of the trial but, to be useful, every photo must be linked to a plot and a date so that it is later possible to tell exactly which plot each photo shows.

Have you tested all data collection procedures? Yes [ ] No [ ]
Have you estimated how long data collection will take and ensured workloads are reasonable? Yes [ ] No [ ]
Will you record photos of every plot? Yes [ ] No [ ]

12. Data entry and organisation

The data entry system should have been devised before data collection started (see Section 6). Once you have the data, you should be able to get data entry completed swiftly and accurately. How will you confirm data entry is accurate? Build in quality control procedures that are run as data is entered, not long afterwards when it is no longer possible to correct mistakes. Files then need organising. Keep records of who enters and checks data. Make sure your file naming and backup system is fool-proof!
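As a minimal sketch of quality control run at entry time, the Python below checks plot records against the site table using the keys described in Section 10, then links the two levels, which is the step VLOOKUP or an Access query would perform. The field names (`site_id`, `plot_id`, `yield_t_ha`, `soil`) and the 0 to 20 t/ha plausibility range are hypothetical, chosen only for illustration; a real system would take them from the trial protocol.

```python
def check_plots(sites, plots, yield_range=(0.0, 20.0)):
    """Return a list of problems found in the plot records.

    Checks (all illustrative): the (site_id, plot_id) key is unique,
    the site key exists in the site table, and yield is within range.
    """
    site_ids = {s["site_id"] for s in sites}
    low, high = yield_range
    problems = []
    seen = set()
    for row_no, p in enumerate(plots, start=1):
        key = (p["site_id"], p["plot_id"])
        if key in seen:
            problems.append(f"row {row_no}: duplicate plot key {key}")
        seen.add(key)
        if p["site_id"] not in site_ids:
            problems.append(f"row {row_no}: unknown site {p['site_id']}")
        y = p["yield_t_ha"]
        if y is None or not (low <= y <= high):
            problems.append(f"row {row_no}: yield {y} outside {low} to {high}")
    return problems


def link_levels(sites, plots):
    """Attach site-level fields to each plot record via the shared site key.

    Plot records whose site key is not in the site table are left out.
    """
    by_id = {s["site_id"]: s for s in sites}
    return [{**by_id[p["site_id"]], **p} for p in plots if p["site_id"] in by_id]


# Hypothetical example data.
sites = [{"site_id": "S1", "soil": "clay"}, {"site_id": "S2", "soil": "sand"}]
plots = [
    {"site_id": "S1", "plot_id": 1, "yield_t_ha": 3.2},
    {"site_id": "S1", "plot_id": 1, "yield_t_ha": 2.9},    # duplicate key
    {"site_id": "S9", "plot_id": 2, "yield_t_ha": 250.0},  # unknown site, implausible yield
]

problems = check_plots(sites, plots)
merged = link_levels(sites, plots)
for line in problems:
    print(line)
```

Reporting the row number with each problem makes it easier to go back to the paper record sheet, and to the person who collected the data, while the details are still fresh.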
Data entry at each site is probably better than trying to do it centrally, as it is easier to query data with the collectors and correct oddities.

Do you have a clear plan for getting data entry completed promptly? Yes [ ] No [ ]

13. Data storage and archiving

Requirements for long-term secure storage and archiving of data from a MET are the same as those for other research data.

Do you have a plan for secure storage and archiving of the data? Yes [ ] No [ ]

14. Conclusion

It helps to have a planned set of actions. The table in Figure 2 will help identify what needs to be done, and the quality assurance measures needed, at each stage of the data flow.

Figure 2: Some questions that can be asked during the data flow process to ensure adequate data management for the MET

Stages: Data ownership | Planning data flow | Planning data collection | Data collection | Data entry | Data storage

Are there collaborators from different organisations?
Is there an agreement on how data will be shared?
Who will manage the data?
What skills should the data manager possess?
Which other staff will be there for the MET?
Is there a protocol for data collection?
What roles will different farmers have?
What measures are there for confidentiality of participants' information?
Have the collectors been trained?
How soon after collection should all data be submitted for compiling?
Do the data collectors and data entry clerks have unique identifiers that can be entered?
Where will data be entered from? Field or central site?
What measurements will be taken for the crops?
How will data be backed up?
Will a remote store be used?
What naming conventions will be used?
What site information will be collected?
What plot information will be collected?
Will they be similar for all MET sites or will conversion be necessary?
How will field reports be entered?
How will videos and pictures be stored?
How will the data be archived at the end of season / project?
Will videos be used?
What data entry tools will be used for this MET?
Have the data collection tools been checked/tested?
What quality check measures are in place?
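Section 11 requires every photo to be linked to a plot and a date, and Figure 2 asks what naming conventions will be used. One possible convention, sketched below in Python, encodes the site key, plot number and date in the file name so that the link survives even if the photo is separated from its folder. The pattern `SITE_Pnnn_YYYY-MM-DD_nn.jpg` is an assumption made for illustration, not a convention prescribed by this guide, and it assumes site keys contain no underscore.

```python
from datetime import date


def photo_filename(site_id, plot_id, taken_on, seq=1):
    """Build a file name such as S1_P007_2012-04-05_01.jpg (illustrative pattern)."""
    return f"{site_id}_P{plot_id:03d}_{taken_on.isoformat()}_{seq:02d}.jpg"


def parse_photo_filename(name):
    """Recover (site_id, plot_id, date, sequence number) from a file name."""
    stem = name.rsplit(".", 1)[0]
    site_id, plot, day, seq = stem.split("_")
    return site_id, int(plot[1:]), date.fromisoformat(day), int(seq)


name = photo_filename("S1", 7, date(2012, 4, 5))
print(name)
print(parse_photo_filename(name))
```

Because the date is written year first, sorting the files by name also sorts each plot's photos in date order.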

Works Cited

Statistical Services Centre. 2010. Data Flow: Organisation action on Research Methods and Data Management. University of Reading, McKnight Foundation. pp. 1-18.