Multi-Environment Trials: Data Quality Guide

Thomas Mawora (tmawora@yahoo.com), Maseno University, Kenya
Cathy Garlick (c.a.garlick@reading.ac.uk), Statistical Services Centre, University of Reading, UK
Maxwell Mkondiwa (maxii88@yahoo.co.uk), Bunda College, Malawi
Ric Coe (r.coe@cgiar.org), Statistical Services Centre, University of Reading, UK and World Agroforestry Centre, Kenya

5 April 2012

1. Introduction

The purpose of this guide is to help managers of Multi-Environment Trials (METs) to collect, share, compile and store data ready for analysis. The emphasis is on data quality. METs are expensive to run, so the datasets they generate should be reliable, secure and well documented, and it should be possible to process them efficiently both now and in the future. In other words, they should be of high quality.

This guide is organised using the Data Flow concept devised by the Statistical Services Centre (SSC) in 2010. This is summarised in Figure 1.

Figure 1: Data flow (Statistical Services Centre, 2010).

2. Using the guide

The guide is designed to act as a checklist. As you plan and carry out a MET, look through it regularly to make sure you have all angles covered. If there are areas you need help with, ask the Research Methods Support Team.

3. Data Ownership

Data ownership is a contentious issue at times. To avoid disagreements, draw up a clear, written agreement at the start of the work and have all parties sign it. Include in the agreement details of who has access to the data both during and after the project and for what purposes; include details on what data, if any, will be put into the public domain, and set
a timeline for carrying this out. Also include a section on authorship of any publications coming from the project. In this context we consider the data and accompanying documentation to be a publication from the project. A data ownership and sharing agreement is particularly important if the MET involves individuals from different organisations.

Will the MET involve people from different organisations? Yes [ ] No [ ]
Do you have a written Data Ownership & Sharing agreement? Yes [ ] No [ ]
Does the agreement include information on:
  o Who has access to the data? Yes [ ] No [ ]
  o Authorship of publications? Yes [ ] No [ ]
  o Archiving including timescales? Yes [ ] No [ ]

4. Planning Data Flow

Planning the whole process is not a trivial task. You will need to define the activities for the MET and work out what data you expect from these activities. Objectives and timelines for each activity must be documented. Staff roles and responsibilities need to be assigned, and training may be needed to address any skills shortage. Alternatively, extra staff or external consultants may need to be brought on board. At this stage you should also consider equipment needs: make sure you have everything you need for measuring and collecting the data in the field, and for entering, storing and managing the data in the field and/or back at the main office. When considering IT equipment, also think about the software you might need.

Do you have a list of the activities for the MET that will generate data? Yes [ ] No [ ]
Are all the activities documented with objectives and timelines? Yes [ ] No [ ]
Have staff responsibilities been assigned? Yes [ ] No [ ]
Have you identified the data collectors? Yes [ ] No [ ]
Do you have a list of all equipment needed? Yes [ ] No [ ]
Is all equipment included in your budget? Yes [ ] No [ ]

5. The Data Manager

The role of the data manager is pivotal in this process; this section is included here to emphasise the importance of this role.
The data manager should be in position from the start of the MET so that he/she can be instrumental in ensuring that good data management practices are followed throughout. The data manager's role can include:

• Preparing Data Management Guidelines and documenting procedures;
• Contributing to fieldwork manuals;
• Creating and documenting data entry systems;
• Training data entry staff;
• Validating data from all sites and producing site reports on the quality of the data;
• Merging and compiling data from all sites;
• Producing a secure data and document store (including pictures and videos) and controlling access as agreed by the PI and in the Data Ownership agreement;
• Regularly backing up the data and documentation;
• Preparing the meta-data and keeping all data documentation up to date;
• Providing data for analysis as and when needed;
• Ensuring data can be managed and used across seasons and sites;
• Regularly updating the project teams on the status of the data;
• Archiving data and documentation at the end of the MET/research activity (as defined in the data ownership guide), taking into account confidentiality and anonymity.

The data manager has a great deal of responsibility, and you will need to decide whether you need just one person for this role or whether you need a data manager at each site, albeit on a part-time basis. Depending on the size of the MET, you may also need a data technician or two to assist with some of the detailed data checking tasks; you may find you need this extra help only at particular times.

Have you identified your data manager? Yes [ ] No [ ]
Do you need a data manager at each site as well as someone with overall responsibility? Yes [ ] No [ ]
Have you drawn up a Terms of Reference (ToR) for the data manager? Yes [ ] No [ ]
Do you need to employ one or more data technicians for extra help at certain times? Yes [ ] No [ ]
Does your data manager have all the necessary skills? Yes [ ] No [ ]
Does your data manager need further training? Yes [ ] No [ ]

6. Data Entry Systems

We mention data entry systems at this stage because it is useful to consider the software you are going to use for data entry before you do any data collection. We also want to separate the task of preparing the data entry system from the task of data entry. In many research activities the task of preparing the data entry system is omitted altogether and researchers enter their data directly into blank Excel worksheets. This is not recommended, as there are then no checks on the data during data entry, which inevitably leads to errors.

There are a number of software packages that can be used for data entry. We shall mention three of them here: Excel, Access and CS-Pro.

Excel is easy to use and is the preferred choice for many researchers. However, it must be used carefully. We recommend using the guides produced by the Statistical Services Centre (SSC) for learning how to enter and manage data in Excel: Disciplined Use of Spreadsheets for Data Entry, and Data Management for Multi-Experiments Trials in Excel. If using Excel, we strongly recommend preparing a worksheet for data entry with validation rules and checks already in place. A video demonstrating the use of such a worksheet, prepared for a data entry workshop, is available on the SSC's YouTube channel, www.youtube.com/user/sscbroadcast.

Access is a relational database package ideal for storing data with complex structures, for example data at several levels. Data entry screens can be designed to resemble the questionnaire or data collection instrument, and skip patterns and checks can be programmed into the system. Creating a well-designed database is a highly skilled undertaking and, if you want to follow this route, you may need to employ the services of an external consultant.

CS-Pro is a software package produced by the US Census Bureau and designed for survey data entry and management.
As in Access, data entry screens can be created to resemble the questionnaire, and skip patterns can be programmed to follow those in the questionnaire to help ensure accurate data entry. The package also includes a Data Comparison tool for double data entry and a tabulation component to assist with data checking. Once entered and checked, data can be exported to Excel or directly into SPSS, STATA or SAS. The SSC have prepared a series of video demonstrations showing how to set up a data entry system in CS-Pro; these are available from the SSC website, www.reading.ac.uk/ssc.

Have you selected the software to use for data entry? Yes [ ] No [ ]
Do your staff need training in the use of this software? Yes [ ] No [ ]
Do you need to employ an external consultant to set up, or help set up, the data entry system? Yes [ ] No [ ]
Have you prepared the data entry system? Yes [ ] No [ ]
Has the system been documented? Yes [ ] No [ ]

7. Planning Data Collection

The MET must have a written protocol developed and approved. This should include details of how all data will be collected and the choice of environments/locations/plots. A field manual should be produced detailing the method of data collection, units of measurement to be used, etc. You will need to consider whether farmers are to be involved in the data collection and, if so, what their role might be. For example:

• Designing aspects of the experiment;
• Managing field plots;
• Measuring and assessing the experiment;
• Helping with data interpretation.

Data collection staff need to be identified and trained. Even if they have carried out data collection for other projects, they will still need instruction in the methods to be used for your MET.

Has a written protocol been developed? Yes [ ] No [ ]
Has it been approved? Yes [ ] No [ ]
Has a field manual been written? Yes [ ] No [ ]
Have data collection staff been identified? Yes [ ] No [ ]
Have staff been trained in the relevant techniques for your MET? Yes [ ] No [ ]

8. Confidentiality of participants

Personal information from participating farmers and others should be treated as confidential. Participants should be made aware of how the data is to be used. Consent forms should be prepared and participants given the opportunity to opt out at any time. Before photos or videos of individuals are taken, consent must be obtained from those involved.

Has signed consent been given by all participating farmers and others? Yes [ ] No [ ]
Are signed consent forms stored centrally? Yes [ ] No [ ]
Do you have a means of keeping data about people confidential? Yes [ ] No [ ]

9. Different levels of data

A MET has at least two levels of variation and possibly more. Data must be collected and managed from each level. There will be data from sites or environments and from plots. You may also have farmer level data if farmers are not the same as sites.

Data at the site level could include:

• The location (GPS)
• Data to describe the environment such as:
  o Climate
  o Soil
  o Farming system
  o Cultural variables

At the plot level you might have:

• Treatments in each plot
• Management such as fertilizer input
• Sole crop or intercropping used
• Crop performance: yield, disease, farmers' assessments

Have you identified the different levels of data for your MET? Yes [ ] No [ ]
Have you identified the site level data relevant to your MET? Yes [ ] No [ ]
Have you identified the plot level data? Yes [ ] No [ ]
Have the data collection tools been tested? Yes [ ] No [ ]
Have you devised a means of field data recording? Yes [ ] No [ ]

10. Identification

Every item in the MET must have a unique key that allows it to be identified and linked to other data. This includes every plot, field, farmer, site, etc. Data at the lower levels need to include the key for the higher level so that the data can be linked. For example, the plot level data may include the key for the site, which links it to the site level data. In Excel, site level and plot level data would be stored on separate worksheets and VLOOKUP could be used to merge the data across levels. In Access, site and plot level data would be stored in
separate tables and a relationship created between them. Queries would be used to merge the data across levels.

Do you have unique identifiers for each item in your trial? Yes [ ] No [ ]

11. Data Collection

The organisation of field data measurement can have a large impact on the overall quality of the data. If measurement processes are clumsy, inappropriate, too time-consuming, etc., then they will not generate good data. Likewise, they may not give good data if they require subjective decisions in the field, involve excessive work in harsh conditions, or are simply too tedious. You should pilot all measurement and data recording procedures to make sure they are practical, understood and repeatable. Prepare data recording sheets that are in field order (not treatment order), and prohibit any hand calculations. Check that data collectors understand how to use the forms correctly. Data collection should include recording notes and comments.

It is also very useful to take photos of plots before they are disturbed. These provide a valuable visual record of the trial but, to be useful, every photo must be linked to a plot and date so that it is later possible to tell exactly which plot matches each photo.

Have you tested all data collection procedures? Yes [ ] No [ ]
Have you estimated how long data collection will take and ensured workloads are reasonable? Yes [ ] No [ ]
Will you record photos of every plot? Yes [ ] No [ ]

12. Data entry and organisation

The data entry system should have been devised before data collection started (see Section 6). Once you have the data, you should be able to get data entry completed swiftly and accurately. How will you confirm that data entry is accurate? Build in quality control procedures that are run as data is entered, not long afterwards when it is no longer possible to correct mistakes. Files then need organising. Keep records of who enters and checks data. Make sure your file naming and backup system is fool-proof!
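
As an illustration only, the kind of check that can be run as data is entered might look like the Python sketch below. It is not part of any SSC tool. The site keys, the yield range, the field names and the clerk identifiers are all hypothetical and would need to be replaced with values from your own protocol. Each record is checked for a known site key, a unique plot key, a plausible yield and the identity of the person entering it, and anything that fails is set aside as a query to resolve with the data collector.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical site keys; in practice these come from the site level data.
KNOWN_SITES = {"S01", "S02"}

# Hypothetical plausible yield range in t/ha; set limits per crop and trial.
YIELD_RANGE = (0.0, 15.0)


@dataclass
class PlotRecord:
    site_id: str       # key linking the plot to its site
    plot_id: str       # unique within a site
    yield_t_ha: float  # measured yield
    entered_by: str    # identifier of the data entry clerk
    entered_on: date   # date the record was keyed in


def check_record(record, seen_keys):
    """Return a list of problems found in one plot record (empty if none)."""
    problems = []
    if record.site_id not in KNOWN_SITES:
        problems.append(f"unknown site key {record.site_id!r}")
    key = (record.site_id, record.plot_id)
    if key in seen_keys:
        problems.append(f"duplicate plot key {key}")
    low, high = YIELD_RANGE
    if not low <= record.yield_t_ha <= high:
        problems.append(f"yield {record.yield_t_ha} outside {low}-{high} t/ha")
    if not record.entered_by.strip():
        problems.append("missing identifier of data entry clerk")
    return problems


def enter_records(records):
    """Check records in entry order; return (accepted, queries)."""
    accepted, queries, seen = [], [], set()
    for rec in records:
        problems = check_record(rec, seen)
        if problems:
            queries.append((rec, problems))
        else:
            seen.add((rec.site_id, rec.plot_id))
            accepted.append(rec)
    return accepted, queries


# Made-up example records, for illustration only.
records = [
    PlotRecord("S01", "P001", 3.2, "clerk_1", date(2012, 4, 5)),
    PlotRecord("S01", "P001", 3.4, "clerk_1", date(2012, 4, 5)),  # repeated key
    PlotRecord("S09", "P002", 2.8, "clerk_1", date(2012, 4, 5)),  # unknown site
    PlotRecord("S02", "P001", 32.0, "", date(2012, 4, 5)),  # bad yield, no clerk
]
accepted, queries = enter_records(records)
for rec, problems in queries:
    print(rec.site_id, rec.plot_id, "->", "; ".join(problems))
```

The same rules can be set up as validation rules in an Excel worksheet or as checks in Access or CS-Pro; the point is that they run at the time of entry, while the field sheets and the collectors are still at hand.
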

Data entry at each site is probably better than trying to do it centrally, as it is easier to query data with the collectors and correct oddities.

Do you have a clear plan for getting data entry completed promptly? Yes [ ] No [ ]

13. Data storage and archiving

Requirements for long-term secure storage and archiving of data from a MET are the same as those for other research data.

Do you have a plan for secure storage and archiving of the data? Yes [ ] No [ ]

14. Conclusion

It is helpful to have a planned set of actions to take. The following table will help identify what needs to be done and the quality assurance measures for each of the stages in data flow.

Figure 2: Some questions that can be asked during the data flow process to ensure adequate data management for the MET

Data ownership | Planning data flow | Planning data collection | Data collection | Data entry | Data storage

Are there collaborators from different organisations?
Is there an agreement on how data will be shared?
Who will manage the data?
What skills should the data manager possess?
Which other staff will be there for the MET?
Is there a protocol for data collection?
What roles will different farmers have?
What measures are there for confidentiality of participants' information?
Have the collectors been trained?
How soon after collection should all data be submitted for compiling?
Do the data collectors and data entry clerks have unique identifications that can be entered?
Where will data be entered from? Field or central site?
What measurements will be taken for the crops?
How will data be backed up?
Will a remote store be used?
What naming conventions will be used?
What site information will be collected?
What plot information will be collected?
Will they be similar for all MET sites or will conversion be necessary?
How will field reports be entered?
How will videos and pictures be stored?
How will the data be archived at the end of season / project?
Will videos be used?
What data entry tools will be used for this MET?
Have the data collection tools been checked/tested?
What quality check measures are in place?

Works Cited

Statistical Services Centre. 2010. Data Flow: Organisation action on Research Methods and Data Management. University of Reading, McKnight Foundation. pp. 1-18.