Data Management for Multi-Environment Trials in Excel




Cathy Garlick (c.a.garlick@reading.ac.uk), Statistical Services Centre, University of Reading, UK. 27 May 2010

1. Introduction

Excel is widely available and easy to use, so it is frequently used in agricultural research for entering and managing data. However, it is also easy to use badly. The Statistical Services Centre has produced a guide on using Excel with discipline, entitled "Disciplined Use of Spreadsheets for Data Entry", available from the SSC web site (www.reading.ac.uk/ssc). It is recommended reading for all researchers using Excel. In this supplement we emphasise the use of Excel for preparing data from multi-environment trials (METs) for analysis.

2. List Format

Many users of Excel are tempted to make their spreadsheets look "pretty", effectively laying out the data in tabular form. The following is a simple example for an experiment with 3 treatments and 4 breeds of hen. The number of eggs laid and the cholesterol levels in the blood were measured. The researcher laid out the data in table format as shown below:

This layout is fine for a summary report but is not suitable for analysis. Instead you should store your data in list format. List format is a rectangular block of data where the rows represent cases and the columns represent variables; for experimental data these are generally factors or measurement variables.

The list should include column headers. When transferring to a statistics package, the column headers are generally used as variable names. There should be no blank rows in the data, because a statistics package will often interpret a blank row as the end of the data. Each column should have the same data type throughout, because a statistics package often determines the data type from the first few rows of data.

So the data for our example experiment should be stored as follows:
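
To make the difference between the two layouts concrete, here is a minimal Python sketch that converts a table layout into list format. It is only an illustration: the treatment and breed labels, the egg counts and the function name table_to_list are all invented for this example and are not taken from the experiment described above.

```python
# Hypothetical "table format" layout: one row per treatment, one column per breed.
# All labels and values are invented for illustration.
breeds = ["Breed1", "Breed2", "Breed3", "Breed4"]
eggs_table = {
    "T1": [20, 22, 19, 25],
    "T2": [18, 21, 23, 24],
    "T3": [17, 20, 22, 26],
}


def table_to_list(table, column_labels):
    """Return list-format rows: a header row, then one row per case."""
    rows = [["treatment", "breed", "eggs"]]
    for treatment, values in table.items():
        for breed, value in zip(column_labels, values):
            rows.append([treatment, breed, value])
    return rows


list_rows = table_to_list(eggs_table, breeds)
for row in list_rows:
    print(row)
```

Each of the 12 treatment-by-breed combinations becomes one row, with the factors and the measurement in separate columns. A second measurement such as cholesterol would simply be a further column.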

The layout is less pretty but more practical. You can always use Pivot Tables to create the summary table from the list. Note that we have included a longer description of each variable in row 1, with a short variable name in row 2. The longer description is useful, but some statistics packages restrict variable names to 8 characters and do not allow spaces, so the short name is needed as well. When the data are exported, we export from row 2 onwards.

3. Multiple Site Experiments

In multiple site experiments the same rules apply: data from all plots at all sites should be together in one list, with a column added to distinguish the site. This is illustrated below:

The last case for site 1 is on row 14, and the first case for site 2 is on row 15. Do not attempt to make this prettier by adding blank rows between sites; there should be no blank rows in the data, as a blank row is often interpreted as the end of the data.
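
The same idea can be sketched in Python: stack the rows from each site into one list with a site column, then check that no blank rows have crept in. This is a rough sketch under assumptions. The column names, the values and the helper names combine_sites and find_blank_rows are hypothetical.

```python
def combine_sites(site_tables):
    """Stack per-site rows into one list, adding a leading 'site' column."""
    combined = [["site", "plot", "variety", "yield"]]
    for site, rows in site_tables.items():
        for row in rows:
            combined.append([site] + list(row))
    return combined


def find_blank_rows(rows):
    """Return 1-based positions of rows whose cells are all empty."""
    return [
        position
        for position, row in enumerate(rows, start=1)
        if all(cell is None or str(cell).strip() == "" for cell in row)
    ]


# Hypothetical data: two plots at each of two sites.
site_tables = {
    1: [(1, "A", 2.1), (2, "B", 1.8)],
    2: [(1, "A", 2.6), (2, "B", 2.2)],
}

combined = combine_sites(site_tables)
print(find_blank_rows(combined))   # no blank rows in the combined list

# A "prettified" copy with a blank separator row between the sites.
with_gap = combined[:3] + [["", "", "", ""]] + combined[3:]
print(find_blank_rows(with_gap))   # reports the position of the blank row
```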

If the layout factors differ between sites, include extra columns and leave blanks as appropriate. For example, let's assume site 1 has blocks as a layout factor, but site 2 has ROWs and COLUMNs [1]. We need 3 columns for the layout factors, and would leave ROW and COLUMN blank for site 1 and block blank for site 2:

[1] Note that when we write ROW and COLUMN in capital letters we are referring to rows and columns as used in agricultural experiments. When we use lower case letters we are referring to rows and columns in the Excel spreadsheet.

4. Dealing with Numeric Codes

Many variables are set up as numeric codes. For example, let's assume we have codes associated with districts as follows:

District Code   District Name
1               Nsitu
2               Kalulu
3               Mkanda
4               Dididi
5               Chitunda
6               Linga
7               Mwansamba
8               Nkhunga

The best way to deal with these codes is to enter just the numeric code in the main data sheet. Then, on a separate sheet, enter the list of codes with the district names; this is like a codebook. You can then use the VLOOKUP function in Excel to include the district names in a new column in the main sheet. So we would have a sheet, which we'll call Codes, shown below:
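
For readers who find code easier to follow than worksheet formulae, the lookup used in this section can be imitated in Python. This is a rough sketch, not Excel's implementation. The district codes and names are those listed above, while the function name vlookup, its argument names and the NA constant are assumptions made for this illustration. The range_lookup argument stands in for the TRUE/FALSE parameter of the Excel function.

```python
from bisect import bisect_right

NA = "#N/A"  # stand-in for Excel's "not available" error value

# The code list from the text: (district code, district name).
codes = [
    (1, "Nsitu"), (2, "Kalulu"), (3, "Mkanda"), (4, "Dididi"),
    (5, "Chitunda"), (6, "Linga"), (7, "Mwansamba"), (8, "Nkhunga"),
]


def vlookup(value, table, col_index, range_lookup=False):
    """Rough imitation of VLOOKUP for a table given as a list of rows.

    col_index is 1-based, as in Excel. With range_lookup=True the first
    column is assumed to be sorted in ascending order.
    """
    if range_lookup:
        keys = [row[0] for row in table]
        count = bisect_right(keys, value)  # number of keys <= value
        return table[count - 1][col_index - 1] if count > 0 else NA
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return NA


print(vlookup(1, codes, 2))                     # exact match for a valid code
print(vlookup(9, codes, 2))                     # invalid code, exact match
print(vlookup(9, codes, 2, range_lookup=True))  # invalid code, approximate match
```

With exact matching an invalid code such as 9 is reported as #N/A, whereas approximate matching silently falls back to the entry for code 8.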

This is our lookup table or code list. On the main data sheet shown below, we have entered the district code into column A. We insert a new column B and use the VLOOKUP function as shown:

Note the formula =VLOOKUP(A3, Codes!$A$2:$B$9, 2, FALSE) in cell B3. VLOOKUP takes 4 parameters.

The first, in this case A3, is the reference for the cell containing the value we are looking up. In this case A3 contains the value 1.

The second, Codes!$A$2:$B$9, is the range of cells that defines the lookup table or code list. In this example it is the range of cells A2:B9 on the sheet named Codes. Note that here we use absolute cell references (i.e. with the $ signs), so that if we copy the formula the reference remains the same.

The third specifies the column in the lookup table from which the value should be taken. In this case it is column 2, i.e. column B. Thus in our example, Excel will look for a 1 in the first column of the range A2:B9 on the sheet called Codes. It then takes the corresponding value from the second column, column B. There is a 1 in cell A2, so the function returns the value in B2, which is Nsitu.

The fourth is TRUE or FALSE and determines the action if the code is not found in the lookup table. If you choose TRUE, Excel performs an approximate match and returns the entry for the largest code that does not exceed the value being looked up (this requires the first column of the table to be sorted in ascending order); if you choose FALSE, it gives the result #N/A. For example, our coding table includes codes from 1 to 8. If we were to look up the value 9, Excel would use this fourth parameter: if TRUE, it would fall back to code 8 and return Nkhunga; if FALSE, it would return #N/A. For most cases it is best to use FALSE, so that you can easily see if you have used invalid codes.

This method of using codes improves data quality, as you type each text value only once; entering text values repeatedly can lead to errors. Note that you should not attempt to change the district names in the main data sheet. If the district is wrong, change the code and the name will update automatically. If the district name has a mistake, change it in the Codes sheet.

5. Importing Excel data into a Statistics Package

Note that when Excel data are imported into statistics packages, it is the values that are imported and not the formulae. Therefore, if the data mentioned above with the district codes and names were imported into Genstat, changing the district code in Genstat would not automatically change the district name.

You must be aware that once you start importing data into other packages you are producing copies of the data, and it is important to keep all copies up to date or to have some level of version control. A typical scenario is that a researcher has the data in Excel and imports it into Genstat. She then notices an error, which she corrects in Genstat. Meanwhile, another researcher takes the Excel data into STATA, where he notices a different error, which he corrects within STATA. There are now 3 copies of the data, all slightly different and none of them totally correct.

The solution is to designate the Excel file as the MASTER dataset. If and when errors are found, the correction must be made immediately in the MASTER dataset, and ideally a log should be kept of the change so that others using the data will be aware of it.
The corrected data can then be reimported into the statistics package, and the analysis can continue or be rerun. In most statistics packages it is possible to save the syntax for any analysis you do, so rerunning the analysis should not be a problem.

6. Raw data and Results

It is advisable to keep raw data and results on separate sheets in the Excel file. For example, Pivot Tables and graphs should not be on the same sheet as the raw data. Some researchers like to include summaries (totals and averages) at the end of the data; again, we recommend storing these summaries on a separate sheet, otherwise you will have problems when exporting the data.

7. Calculations

As far as possible, all calculations should be done using formulae in Excel. For example, in the dataset below we have the Fresh Pod Weight in column H and the Sample Grain Weight in column I; to calculate the Total grain per plot we use the formula I*H as shown below.

We keep the formulae in the spreadsheet so that the values can easily be checked; in this case we would not enter values into column K, only the formulae. Doing calculations by hand and entering the results into the spreadsheet can and will lead to errors, both in the calculations themselves and in the transcription of the results. Let the computer do the boring work!

8. Data at different levels

Often you will have data at different levels. For example, you may have data at the plot level and at the plant level, or data at the site or district level as well as plot level data. Data at different levels should be stored on separate sheets in the workbook to avoid unnecessary duplication, which can lead to errors. There must be a column or variable which links the 2 sets of data.

The example below shows some data from a pigeon pea experiment. Note that the farmers are identified by a code (Farmer ID). We also have data at the farmer level, which are shown below:
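
As an illustration outside Excel, the two levels can be held as two tables that share a farmer identifier, with a farmer-level variable attached to each plot by looking up that identifier. This is a minimal sketch. The identifiers, variable names and values are hypothetical, and plot 3 deliberately refers to a farmer who is missing from the farmer table in order to show how a broken link can be detected.

```python
# Hypothetical farmer-level table: one entry per farmer, keyed by farmer ID.
farmers = {
    "F01": {"village": "V1", "farm_size_ha": 1.5},
    "F02": {"village": "V2", "farm_size_ha": 0.8},
}

# Hypothetical plot-level table: one row per plot.
plots = [
    {"plot": 1, "farmer_id": "F01", "yield_kg": 12.0},
    {"plot": 2, "farmer_id": "F02", "yield_kg": 9.5},
    {"plot": 3, "farmer_id": "F03", "yield_kg": 11.2},  # F03 is not in farmers
]


def attach_farmer_variable(plots, farmers, variable):
    """Copy one farmer-level variable onto each plot row via the farmer ID."""
    linked = []
    for plot in plots:
        farmer = farmers.get(plot["farmer_id"])
        value = farmer[variable] if farmer is not None else None
        linked.append({**plot, variable: value})
    return linked


linked = attach_farmer_variable(plots, farmers, "farm_size_ha")
unmatched = [p["farmer_id"] for p in plots if p["farmer_id"] not in farmers]

for row in linked:
    print(row)
print("Farmer IDs with no farmer-level record:", unmatched)
```

Because the farm size is stored once per farmer and only looked up for each plot, a correction to the farmer table is picked up by every plot that belongs to that farmer.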

The data are linked by the FarmerID, which appears in both sheets. Data from the farmer level can be displayed in the pigeon pea sheet using VLOOKUP. The example below shows farm size, a farmer-level variable, included with the pigeon pea data.

This is an extension of using VLOOKUP to display labels for numeric codes. Here we simply have more columns in our lookup table, the range for which is A3:K57, so we have 11 columns in the table and we are picking data from the 10th column.

In each sheet you should be able to determine what each row represents, e.g. one row per plot, one row per farmer, etc. You should know the number of plots, farmers and other subjects in your experiment, and be able to check easily that you have the correct number of rows. If you have 10 farmers, you must have 10 rows of data at the farmer level; if you have 50 plots in total, you must have 50 rows of data at the plot level.

9. One item per cell: dealing with repetitions and missing data

A single cell in an Excel spreadsheet must contain only a single item. Two or more numbers in the same cell will be treated as a text value and will be unusable for analysis. The examples below to the left show a case where the diameter was measured twice for each tree; this should be split into two columns, as shown in the example below to the right.
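
If data have already been entered with two measurements in one cell, the split can be done programmatically rather than by retyping. The following is a sketch only. It assumes the two numbers are separated by a space, a comma or a slash, and that the decimal separator is a full stop; the function name split_measurements and the example values are hypothetical.

```python
def split_measurements(cell, expected=2):
    """Split a text cell such as '12.5 13.1' into a list of numbers.

    Assumes values are separated by spaces, commas or slashes, and that
    a full stop is used as the decimal separator.
    """
    parts = cell.replace(",", " ").replace("/", " ").split()
    if len(parts) != expected:
        raise ValueError(f"expected {expected} values, found {len(parts)}: {cell!r}")
    return [float(part) for part in parts]


# Hypothetical cell contents, as they might have been typed.
for cell in ["12.5 13.1", "12.5, 13.1", "12.5/13.1"]:
    diameter1, diameter2 = split_measurements(cell)
    print(diameter1, diameter2)
```

Cells that do not contain exactly the expected number of values raise an error, so they can be checked by hand rather than silently mis-split.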

Where data are missing, the cell should be left blank in Excel. You might want to record the reason for the missing value; this can be done by giving the explanation as a comment or in a separate column. In this example, the tree in row 15 died and the text "Dead" was entered instead of the diameter measurements. This is text in an otherwise numeric column, which will cause problems and should be avoided. The cells should be left blank and a comment added explaining the missing data, as shown below.

10. Sorting the data

Data should be entered and stored in field collection order. There is no need to sort the data. The layout factors can be entered prior to data collection, and the spreadsheet printed and used as a data collection sheet.

Sorting the sheet can lead to problems, and it is easy in Excel to sort a single column accidentally. It is important to check your data after every manipulation and, of course, to keep backup copies of your data.

11. Summary

The ideas introduced here can be summarised into a set of rules or recommendations as follows:

1. Store the data in list format;
2. Data from a multiple site experiment should be combined into a single worksheet, not a separate sheet or file for each location;
3. Do not have blank rows in the data;
4. Keep data and results separate;
5. Let the computer do the calculations; don't do hand calculations;
6. Store data at different levels on separate sheets;
7. Use VLOOKUP for coded data and to link data at different levels;
8. Only enter one item per cell;
9. Do not include text in numeric columns; use comments instead;
10. Keep regular backups of all your data;
11. Designate a MASTER dataset and ensure all corrections are made to this dataset.
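
Some of these rules lend themselves to automatic checking before the data are exported. As one example, rule 9 can be checked with a short Python sketch that lists the cells in a numeric column that are neither blank nor numbers. The column name, the tree records and the function name non_numeric_cells are hypothetical; blank cells are treated as legitimate missing values.

```python
def non_numeric_cells(rows, column):
    """Return (row_number, value) pairs for non-blank cells that are not numbers."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        value = row.get(column)
        if value is None or str(value).strip() == "":
            continue  # blank means missing, which is allowed
        try:
            float(value)
        except (TypeError, ValueError):
            problems.append((row_number, value))
    return problems


# Hypothetical tree records: the second has text where a number belongs,
# and the third is correctly left blank.
trees = [
    {"tree": 1, "diameter": 10.2},
    {"tree": 2, "diameter": "Dead"},
    {"tree": 3, "diameter": ""},
    {"tree": 4, "diameter": "11.7"},
]

print(non_numeric_cells(trees, "diameter"))
```

A check of this kind reports the "Dead" entry and leaves the blank cell alone, which matches the recommendation to leave missing values blank and record the reason elsewhere.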