Data Flow Organising action on Research Methods and Data Management Research Methods Support for Collaborative Crop Research Program (CCRP) Projects Funded by the McKnight Foundation
Why has this guide been written?

SSC has been asked to provide support and training to the projects funded under the McKnight Foundation CCRP. This guide on "Data Flow" was written because, while much of this support is on demand, we need a way of organising and structuring our offer so that:

- Projects and scientists have a good idea of what it is reasonable to demand.
- We can communicate efficiently through the use of a common means of visualising and describing what projects and SSC are doing.
- We can link the work to the integrated monitoring and planning of the CCRP.

Data Flow

The idea of data flow was developed by participants of the E/H Af CoP meeting in May 2009. Since then it has been further developed and discussed by SSC staff and by the S Af CoP at their meeting in September 2009.

All the SSC work in support of Research Methods is about contributing to the generation, management, quality assurance and use of research data within research and development projects. We are therefore structuring our inputs around the way data 'flows' through a project.

A first thought is that typically when a researcher collects a data set in the field or lab, he or she enters it into a computer and organises it, then subjects it to statistical analysis. A little further reflection shows that there are important steps before and after those, as shown in Figure 1.

Figure 1: Data Flow

In any project there are loops and feedback. This is acknowledged, but for the purpose of this guide those loops are implicit (not indicated in the figure). An important feature of the data flow depicted is that there are multiple stages in the flow of information. At each of them, quality assurance needs to be considered to ensure the overall quality of the research, leading to effective knowledge generation.
For each stage in data flow (as illustrated in Figure 1), there are a number of aspects that project teams and scientists need to be aware of and, usually, take some decisive action on. Some of the most common aspects are listed in Table 1. This list is not exhaustive, nor are the items prioritised. They will have differing relevance for different projects and we expect further items of concern to emerge when specific projects assess their data flow process.

Many of the items in Table 1 are relevant and important for any type of research for development, including social research and participatory research. The concepts behind many are also relevant to qualitative research and to action research that will generate and use data.

Table 1: Data flow and quality assurance

Data ownership
  Examples of areas for action:
  - Intellectual property
  - Data exchange and sharing agreements
  - Authorship

Planning for data collection
  Examples of areas for action:
  - Understand the problem and set clear objectives
  - Determine appropriate research approaches
  - Plan and describe the outputs, including outline tables and graphs, that the data will be used to generate
  - Design the research activities (experiments, surveys, observation, or other information collection techniques)
  - Decide what to measure
  - Decide what supporting data to collect, e.g. climatic data
  - Design data collection tools and instruments
  - Plan details: field layout, sampling, calendar
  - Prepare human and other resources: training and logistics
  - Document the plan in a research protocol
  - Liaise with partners to ensure relevance and shared objectives
  Quality assurance:
  - Are we using current best practice in our methods?
  - Is there clear allocation of responsibilities to team members?
  - Does the research protocol exist?
  - Is there a system of version control for all documents (protocols, questionnaires etc)?
  - Is there a well defined calendar of activities?
  - Has the design of the research activity been optimised?
  - Is the research plan understood by all involved?
Data collection
  Examples of areas for action:
  - Field and lab work
  - Monitoring quality
  Quality assurance:
  - Implementation of an error trapping system at the time of data collection
  - Assessment of field reports to take actions to improve quality
  - Identification of omissions and gaps in the data collected with respect to the original plan
  - Has the plan of activities been updated to reflect the implementation of the research process?

Data entry
  Examples of areas for action:
  - Find efficient, accurate and well-adapted systems
  - Prepare data entry system (e.g. spreadsheets, databases)
  - Enter or import data
  - Clean and validate data
  - Document how it was done

Statistical analysis
  Examples of areas for action:
  - Plan and describe the statistical analyses that will lead from data to outputs
  - Decide which statistical or other information processing software to use
  - Data formatting for analysis
  - Indicators and transformations
  - Data exploration and summary
  - Statistical analysis: trade off perfection and practice

  Quality assurance for data entry and statistical analysis:
  - Design of automatic checks for data entry
  - Ensure well trained staff and collaborators
  - Ensure backups are made of all the information in electronic form
  - Keep records of data processing decisions
  - Keep a data processing log
  - Keep well documented syntax of data processing tasks
  - Well organised and documented datasets

Interpret and write up
  Examples of areas for action:
  - Interpret data
  - Presentation of results: tables and graphs
  - Merging new information with what was previously known and working out the implications
  - Reporting summary outputs, conclusions, and next steps
  - Documentation for dissemination (report, journal papers, leaflets etc)
  Quality assurance:
  - Does the research team have a system for checking and reviewing quality of products before they are released?

Storage and access
  Examples of areas for action:
  - Data storage and archive
  - Data re-use
  - Public access and donor requirements
  Quality assurance:
  - Existence of a clear access policy and a defined system to request data access
  - Metadata available and organised together with the data

Feedback to originators
  Examples of areas for action:
  - Suitable formats and occasions
  Quality assurance:
  - Have research products been generated in a form appropriate to the intended audiences?
  - Is there a defined mechanism for receiving feedback from interested parties?
  - Has the plan of activities been updated to reflect the implementation of the research process?
  - Have research products been reviewed and accepted by stakeholders?

There are loops and feedbacks implicit in this system. This means projects can and should be planning for some of the later steps in the data flow before the earlier steps have been completed. For example, data entry procedures should be in place prior to data collection. Indeed, developing the data entry system in parallel with the data collection instruments speeds up the data entry process, as entry can start as soon as the first data collection sheets are returned.

Diagnosis, planning and action

The Data Flow concept can be used for improved performance of research projects, using the ideas of IMEP (integrated monitoring, evaluation and planning) or the What? So What? Now What? framework. For Data Flow, we find it useful to think of problems (What?), targets (So what?) and strategies and actions (Now what?). The same model is used in quality assurance, with the emphasis on continuous improvement of performance. This is very relevant in this context, as project teams try to improve themselves so that they reach their own goals, are better prepared for future projects, and contribute to building the skills of research organisations.

Diagnosing problems

Researchers are well aware of some of the data flow problems they have. However, there are some areas in which they are not aware of the issues. This might be because they have not yet had to face them (e.g. if a data sharing conflict has not yet arisen in their project) or because they are unaware of alternatives (e.g. if data has always been collected on paper and then entered by hand onto a computer). Therefore projects will need assistance in assessing the current status. This could be (a) with experts visiting a project, (b) during workshops, or (c) through self-diagnosis guides.
Setting targets

The target is the state of managing data flow that the project would like to reach. For each data flow step and issue there are many possible targets, with no universally applicable standards. However, there are some recognised good practices, and some parties (such as journal editors, donors, and universities in which students are registered) have their own expectations. Projects should select data flow targets which (a) will ensure they meet their overall project objectives, (b) are appropriate to the scale of the project and its human and technical resources, and (c) are realistic and achievable while also pushing the project towards higher standards. Hence we will help projects set targets by describing some of the options and standards that have been used elsewhere.
Taking action

The first action needed is for the projects to agree that they want to try to change and improve their data flow, and to set suitable targets. The actions that follow will be a blend of formal training, support from specialists and self-learning. The last is most important: no amount of formal training and external input can make a real difference if scientists do not take seriously the task of broadening their understanding and learning new skills. So the actions of those outside the project should be directed at helping scientists become self-learners and providing access to appropriate resources.

Project level

It is assumed that the project is the right level at which to do the diagnosis and planning and to take action. Here we are referring to a project as a body of work supported by a grant from The McKnight Foundation. Such a project has a (fairly) clear boundary and timeframe, specific outputs to produce and an identified team to implement it, all of which are necessary for diagnosis, planning and action.

Some projects are spread across countries and organisations. In these cases it may be sensible for different parts of a project to manage their data flow in different ways. However, at some point data for a project will have to be pulled together, and this will be easiest if there is some coherence across sites.

Projects actually function because of the individuals in them. Improving data flow on a project will only be possible if those individuals understand and are committed to it. It is not sufficient for the project (usually meaning the PI) to take a decision on improving data flow if the people who will need to do something different do not understand and support the decision.

Next Steps

The final two tables here are provided to help projects through the process of exploring data flow, exploring important issues and deciding on priorities for action.
Table 2 provides examples of some of the problems, targets and actions that are commonly found in research projects' data flows. Table 3 is a template for you to use to diagnose the current situation in your own project.
Table 2: Potential problems, targets and actions

For each data flow item the table gives:
  Diagnosis (What?): typical problems and diagnosis tools
  Targets (So what?): examples of targets projects may set, including those used by others
  Taking action (What now?): possible actions; guides and tools to help

Data ownership
  Diagnosis (What?):
  - No clear understanding of data ownership
  - Conflicts over who can access and use data
  - Conflicts over authorship of publications
  - Unclear IP ownership with project scientists from different organisations
  Targets (So what?):
  - Written data sharing and authorship agreements in place at the start of the project
  - All partners and partner organisations understand and agree to IP status of project outputs
  Taking action (What now?):
  - Discuss authorship and data ownership and access in early project meetings
  - SSC prepare a template data sharing agreement
  - SSC prepare a template authorship agreement
  - Raise awareness of IP ownership early in the project, before any problem arises
  - All project scientists look at CCRP IP pages

Planning for data collection
  Diagnosis (What?):
  - Lack of rigour in collection of data in participatory research
  - No sample size justification
  - Experimental designs not suitable or optimised for the problem
  Targets (So what?):
  - All participatory research meets usual standards of scientific rigour and could be published
  - Explicit sample size justification for all research studies
  - Design of all experiments goes through peer and statistical review
  Taking action (What now?):
  - Seek peer review of research design
  - Seek exchange of experience with partners who may have experience in tackling this issue
  - Get statistical advice (Yes! This is important in participatory research!)
  - Seek guidance on choice of sample size
  - Learn about methods of choosing sample size
  - Seek guidance on experimental designs
  - Training courses in design of research studies
  - SSC prepare checklists for some common designs
Planning for data collection (continued)
  Diagnosis (What?):
  - No written protocol before start of data collection
  - Project does not use research activity protocols
  - Starting data collection with no plan of exactly how the data will be used and the outputs that will be generated from it
  Targets (So what?):
  - A written protocol prepared, shared and reviewed before every data collection activity starts (survey, experiment, participatory data collection)
  - Skeleton or outline tables and graphs prepared that show how the data will be used before every data collection activity
  Taking action (What now?):
  - SSC prepare templates or checklists for protocols for some common research types
  - SSC provide a service that reads protocols and offers feedback
  - SSC prepare a guide on use of skeleton tables, graphs and statistical analyses
  - Get all such plans reviewed by SSC

Data collection
  Diagnosis (What?):
  - Data returned from field is often incomplete or error prone
  - No formal system exists to ensure completeness of data when brought from the field to the office
  - Senior scientist rarely in the field to check data collection
  Targets (So what?):
  - Explicit quality assurance processes in place for all data collection
  - All involved in field data collection trained in their use
  - Senior scientists spend time in the field during each data collection activity
  Taking action (What now?):
  - SSC prepare guide on simple field data quality assurance techniques
  - Develop a list of key skills for data collection and organise training

Data entry
  Diagnosis (What?):
  - Data entry and organisation of low quality as no one in the team is an expert
  - Data entry very slow, delaying progress with research
  Targets (So what?):
  - A data manager with the right skills and background in each project team
  - All data ready for analysis within 1 week of field collection
  Taking action (What now?):
  - Seek advice on key skills required from a competent data manager
  - Train a member of the team on data management
  - Recruit a new team member who brings good data management skills
  - Get advice on appropriate data entry methods
  - Adopt new technology for data entry: entry in the field, data entry software, etc.
Data entry (continued)
  Diagnosis (What?):
  - Multiple versions of data files exist with no one certain why they differ and which is correct
  - Data dispersed over many computers and locations
  Targets (So what?):
  - Everyone in a project knows where the current correct data is, and is able to access and use any data they need
  Taking action (What now?):
  - Prepare a data management plan that includes processes for tracking changes to data file versions and the reasons for changes

Statistical analysis
  Diagnosis (What?):
  - Difficulties in deciding which statistical technique is most appropriate to ensure that the project data provides evidence that fulfils the project objectives
  - Long time spent organising data sets when trying to analyse
  - Project does not have the skills required to carry out statistical data analysis
  Targets (So what?):
  - Data organised for efficient processing
  - Difficulties with statistical analysis never hold up interpretation and use of research data
  Taking action (What now?):
  - Seek advice on statistical techniques available to fulfil project objectives
  - Seek advice from SSC on options available for data organisation
  - Plan data organisation and formats before data collection
  - Seek partnership with institutions or individuals who can provide data analysis services
  - Request SSC to help in developing data analysis skills and/or analysis of project data

Interpret and write up
  Diagnosis (What?):
  - Results from data analysis are difficult to interpret?
  Taking action (What now?):
  - Seek advice on the interpretation of statistical results from SSC
  - Seek advice on interpretation of non-statistical results from experts in the field (SSC may be able to help locate them)
Interpret and write up (continued)
  Diagnosis (What?):
  - Difficulties in linking data processing with writing up because different people and skills are required for each stage?
  Taking action (What now?):
  - Organise an analysis and writing up workshop
  - Seek specialised support from SSC and Regional Team

Storage and access
  Diagnosis (What?):
  - Data lost due to hardware failure, theft or misplacement
  - Data dispersed over many locations and computers, with no one able to retrieve what is needed
  - Data from earlier activities cannot be found, retrieved or used
  - Project progress held up because some project members hold data needed by others
  Targets (So what?):
  - Data archive built as the project progresses, so it is complete when the project finishes
  Taking action (What now?):
  - Seek advice on technical options available for the production of a data archive
  - Training in data archiving, including use of public data archives

Feedback to originators
  Diagnosis (What?):
  - There is no defined mechanism to give feedback to the originators of the data, for example farmers engaged in the implementation of field activities and collection of information
  Targets (So what?):
  - All research results shared with farmers who participated in the research process
  - Use the process of feedback to engage in discussions about the "So what" and "What now" of the research outputs with the farmers who participated in the research process
  Taking action (What now?):
  - Plan for and budget activities that enable the project to share results with farmers
  - Engage participating farmers in the process of analysis and conclusions from the research process
  - Produce research results dissemination products that are suitable for participating farmers and their communities
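Several entries above call for an error trapping system at data collection and for automatic checks at data entry. What follows is only a minimal sketch of what such checks might look like, not a recommended system. The field names (`plot_id`, `treatment`, `yield_kg`), the treatment codes and the 0 to 50 kg range are hypothetical and would come from a project's own research protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One automatic check applied to one field of a data sheet."""
    field: str
    description: str
    check: Callable[[str], bool]


def not_blank(value: str) -> bool:
    return value.strip() != ""


def one_of(*allowed: str) -> Callable[[str], bool]:
    return lambda value: value in allowed


def number_in_range(low: float, high: float) -> Callable[[str], bool]:
    def check(value: str) -> bool:
        try:
            return low <= float(value) <= high
        except ValueError:
            return False  # not a number at all
    return check


# Hypothetical rules for a hypothetical field sheet (assumptions, not from the guide).
RULES = [
    Rule("plot_id", "must not be blank", not_blank),
    Rule("treatment", "must be a planned treatment code", one_of("T1", "T2", "T3")),
    Rule("yield_kg", "must be a number between 0 and 50", number_in_range(0, 50)),
]


def validate_record(record: Dict[str, str], rules: List[Rule] = RULES) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for rule in rules:
        value = record.get(rule.field)
        if value is None:
            problems.append(f"{rule.field}: missing from record")
        elif not rule.check(value):
            problems.append(f"{rule.field}: {rule.description} (got {value!r})")
    return problems


def validate_sheet(records: List[Dict[str, str]], key: str = "plot_id") -> Dict[int, List[str]]:
    """Check every record and flag repeated keys; return {row number: problems}."""
    report: Dict[int, List[str]] = {}
    first_seen: Dict[str, int] = {}
    for row_number, record in enumerate(records, start=1):
        problems = validate_record(record)
        identifier = record.get(key, "").strip()
        if identifier:
            if identifier in first_seen:
                problems.append(f"{key}: duplicate of row {first_seen[identifier]}")
            else:
                first_seen[identifier] = row_number
        if problems:
            report[row_number] = problems
    return report
```

Running `validate_sheet` on each batch of sheets as it returns from the field gives a short list of rows to query while the people who collected the data can still remember what happened. The same rules could equally be built into a spreadsheet or database entry form.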
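Table 3: Template for problems, targets and actions

For each data flow item, fill in:
  Diagnosis (What?): typical problems and diagnosis tools
  Targets (So what?): examples of targets projects may set, including those used by others
  Taking action (What now?): possible actions

Data ownership
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):

Planning for data collection
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):

Data collection
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):

Data entry
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):

Statistical analysis
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):

Interpret and write up
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):

Storage and access
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):

Feedback to originators
  Diagnosis (What?):
  Targets (So what?):
  Taking action (What now?):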
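Table 2 lists "explicit sample size justification for all research studies" as a possible target. As one hedged illustration of what such a justification can rest on, the sketch below uses the standard normal-approximation formula for comparing the means of two equal-sized groups, n = 2 (z_(1-alpha/2) + z_(power))^2 sigma^2 / delta^2 per group. The function name and the example numbers are assumptions for illustration. Real designs (blocking, clustering, unequal groups, participatory trials) usually need more than this, which is why statistical advice is still recommended.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of units needed PER GROUP to detect a difference
    `delta` between two group means, given a standard deviation `sigma`,
    using a two-sided test at level `alpha` with the requested power.

    Normal approximation only: a t-based calculation gives slightly
    larger values when the groups are small.
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return ceil(n)


# Hypothetical example: detect a 0.5 t/ha yield difference when the
# plot-to-plot standard deviation is about 1.0 t/ha.
plots_per_treatment = sample_size_two_means(delta=0.5, sigma=1.0)
```

Writing the assumed difference, the assumed variability and their sources into the research protocol alongside the resulting number is what turns the calculation into a justification that reviewers can check.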