DATA PREPARATION: PRECODING THE QUESTIONNAIRE

Coding: the process of preparing both qualitative and quantitative information from the survey for data analysis
To ease the transition from survey to coding, researchers take several steps to precode the survey:
- response codes: numerical values set for the categorical responses to questions
- format codes: numbers that guide the positioning of numerical data in the database

Response codes
1. What is your marital status? Check one.
   1. married
   2. separated or divorced
   3. widowed
   4. never married
2. For the following statements, indicate whether you agree or disagree using the following scale. Place the number in the space to the left of the statement.
   1. strongly agree
   2. somewhat agree
   3. somewhat disagree
   4. strongly disagree
   2.1 I think my pay is fair.
   2.2 I think that my company's promotion policies are appropriate.
   2.3 I think the number of sick days offered to me at this firm is appropriate.
   2.4 I am satisfied with the health plan my firm has offered me.
   2.5 My firm would give me an appropriate amount of time off if I became ill.
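As a minimal sketch of what precoding buys at data-entry time, the response codes above can be kept as lookup tables that translate a checked answer into its number. The table names and the `encode` helper are hypothetical; the codes themselves are the ones listed in the two example questions.

```python
# Hypothetical lookup tables holding the response codes from the two example questions.
MARITAL_STATUS = {
    "married": 1,
    "separated or divorced": 2,
    "widowed": 3,
    "never married": 4,
}

AGREEMENT_SCALE = {
    "strongly agree": 1,
    "somewhat agree": 2,
    "somewhat disagree": 3,
    "strongly disagree": 4,
}


def encode(answer: str, codes: dict[str, int]) -> int:
    """Translate a checked answer into its precoded number.

    Matching ignores case and surrounding spaces. An answer with no precode
    raises KeyError, which flags it for post-coding.
    """
    return codes[answer.strip().lower()]


print(encode("Widowed", MARITAL_STATUS))             # 3
print(encode("somewhat disagree", AGREEMENT_SCALE))  # 3
```

Raising an error for an unknown answer, rather than silently assigning a number, keeps unanticipated responses visible so they can be post-coded and added to the codebook.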
Format codes
- placed in the right-hand margin of the survey
- two critical notes:
  - do coding last
  - code precisely
- the structure of the format codes will depend on the type of database

1. Spreadsheets (like Excel)
- Advantages:
  (i) all PCs have the necessary software
  (ii) simplicity
  (iii) allows for conversion into the other two types of databases
- Disadvantages:
  (i) limits on the number of variables (columns)
  (ii) limits on the number of observations (rows)
  (iii) analysis of data is limited
- Example

         A      B      C         D      E
    1    Race   Sex    Marstat   Age    Ed
    2    2      1      4         22     2
    3    1      1      2         57     2
    4    1      2      1         42     1
    5    3      1      1         38     3

  (1) Which of the following best describes your race?                A
      1. White  2. Black  3. Asian  4. Other
  (2) Which of the following best describes your sex?                 B
      1. Male  2. Female
  (3) Which of the following best describes your marital status?      C
      1. Married  2. Separated or divorced  3. Widowed  4. Never married
  (4) What is your age?                                               D
  (5) What is the highest level of education you attained?            E
      1. Less than high school degree  2. High school degree  3. BS, BA, or higher
2. Free-floating format
- each response in a survey is entered and separated from the next response by a comma; each survey occupies one line

    2,1,4,22,2
    1,1,2,57,2
    1,2,1,42,1
    3,1,1,38,3

  (1) Which of the following best describes your race?                1
      1. White  2. Black  3. Asian  4. Other
  (2) Which of the following best describes your sex?                 2
      1. Male  2. Female
  (3) Which of the following best describes your marital status?      3
      1. Married  2. Separated or divorced  3. Widowed  4. Never married
  (4) What is your age?                                               4
  (5) What is the highest level of education you attained?            5
      1. Less than high school degree  2. High school degree  3. BS, BA, or higher
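Reading free-floating records back is straightforward with Python's standard `csv` module. The sketch assumes the five example variables in questionnaire order (the names `race`, `sex`, `marstat`, `age`, `ed` come from the spreadsheet example; `read_free_floating` is a hypothetical helper). Checking that every line has the expected number of fields catches some, though not all, comma-entry errors.

```python
import csv
import io

# Variable names in questionnaire order (taken from the spreadsheet example).
VARIABLES = ["race", "sex", "marstat", "age", "ed"]

RAW = """2,1,4,22,2
1,1,2,57,2
1,2,1,42,1
3,1,1,38,3
"""


def read_free_floating(text: str) -> list[dict[str, int]]:
    """Parse comma-delimited records: one survey per line, one response per field."""
    records = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(VARIABLES):
            # A missing or extra comma shifts every later value, so stop here.
            raise ValueError(f"line {line_no}: expected {len(VARIABLES)} fields, got {len(row)}")
        records.append({name: int(value) for name, value in zip(VARIABLES, row)})
    return records


records = read_free_floating(RAW)
print(records[0])  # {'race': 2, 'sex': 1, 'marstat': 4, 'age': 22, 'ed': 2}
```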
- Advantages:
  (i) statistical packages can read it
  (ii) easy to precode
- Disadvantages:
  (i) adding commas when entering data takes time
  (ii) difficult to catch errors

3. Fixed format
- data is entered in preset horizontal positions (columns)

    214222
    112572
    121421
    311383

  (1) Which of the following best describes your race?                1
      1. White  2. Black  3. Asian  4. Other
  (2) Which of the following best describes your sex?                 2
      1. Male  2. Female
  (3) Which of the following best describes your marital status?      3
      1. Married  2. Separated or divorced  3. Widowed  4. Never married
  (4) What is your age?                                               4-5
  (5) What is the highest level of education you attained?            6
      1. Less than high school degree  2. High school degree  3. BS, BA, or higher

- Advantages:
  (i) the computer can easily handle data in fixed record format
  (ii) no limits are placed on the size of the database
  (iii) numbers are easy to enter
- Disadvantages:
  (i) often requires more work once the data is entered
  (ii) raw data is difficult to work with

Other notes on precoding
- assign case numbers
- leave space in the spreadsheet for the maximum number of questionnaires
- pretest the questionnaire
- before collecting any surveys or entering data, also pretest the database, which should include entering preliminary data and performing mock analyses
- prepare a codebook, which lists all variable names, response codes, and format codes
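In fixed format, the codebook's column positions are what make the raw digits readable. A minimal sketch, assuming the positions from the fixed-format example (age occupies columns 4-5); the `CODEBOOK_POSITIONS` table and `read_fixed` function are hypothetical illustrations, not part of any statistical package.

```python
from typing import Optional

# Hypothetical codebook entries: variable -> (first column, last column),
# 1-based and inclusive, matching the format codes in the example.
CODEBOOK_POSITIONS = {
    "race": (1, 1),
    "sex": (2, 2),
    "marstat": (3, 3),
    "age": (4, 5),
    "ed": (6, 6),
}

RAW_LINES = ["214222", "112572", "121421", "311383"]


def read_fixed(line: str) -> dict[str, Optional[int]]:
    """Slice one fixed-format record into variables using the codebook positions."""
    record: dict[str, Optional[int]] = {}
    for name, (first, last) in CODEBOOK_POSITIONS.items():
        field = line[first - 1:last].strip()  # 1-based inclusive columns -> Python slice
        record[name] = int(field) if field else None  # blank columns mean missing data
    return record


for raw in RAW_LINES:
    print(read_fixed(raw))
```

Because every value is located by position alone, a single misplaced digit shifts the meaning of the record, which is why the codebook must list the columns for each variable exactly.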
EDITING

Editing: the part of data preparation that involves assessing the completeness of surveys and preparing them for data analysis
When the questionnaires are returned to the research/clerical staff, several steps need to be undertaken:
  (i) record the date of collection and/or completion of the interview
  (ii) record on the survey a number indicating the order of receipt/completion
  (iii) maintain the same procedures for handling completed questionnaires throughout the collection process
Two basic tasks are necessary in editing:
- sight-editing: checking the received questionnaires for completeness and accuracy
- post-coding: assigning values to any unstructured questions that could not be precoded
Sight-editing
First task: spot-check all questionnaires for glaring problems
Second task: judge completeness, which proceeds in two steps:
  (i) quick glance: if the survey is blank or barely filled out, discard it
  (ii) if it is largely answered, a thorough examination of its usability is necessary; some notes:
    - set criteria for usable surveys before the study
    - recognize that some missing data can be tolerated
    - any time you discard a survey, the chance of non-response bias or termination bias increases
    - questions may be completed but marked in the wrong place
Final task: check the branching instructions
- use a key so you can highlight the places where branching could be misunderstood
- answering some questions inappropriately does not render the rest of the survey useless

Post-coding
While most of the survey was precoded, some information will likely be left to post-coding.
Post-coding may be done along with sight-editing or on its own (if the post-coding task is complex).
In either case, post-coding uses a sample questionnaire that highlights the portions needing attention.
Enter new codes for answers that have no precodes, and add this information to the codebook.
Post-coding will be required in three cases:
  a. purely open-ended questions
  b. branching renders questions inapplicable
  c. a question allows the respondent to fill in details when "Other" is a possible answer
Illustrative examples
- For (a):
  1. Do you think that your employer treats you fairly?
     1. yes  2. no
  2. If no, briefly explain in what way you were treated unfairly:
  Aside: why might we see such an open-ended question?
    (i) for questions of lesser importance, survey designers save space by using less-structured questions
    (ii) for some questions, you want as much detail as possible; you could be looking for colorful quotes
    (iii) sometimes others have designed the survey, and you are left to use their questionnaires and precodes
  When post-coding this data:
    - design a unique code for every possible answer
    - ensure that one person handles the post-coding for each individual question
    - in practice, one settles on the possible post-codes by the end of the first 50 questionnaires
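The bookkeeping side of post-coding (one unique code per answer category, with every new code recorded in the codebook) can be sketched as follows. Deciding which category an open-ended answer belongs to remains the coder's judgment, so the sketch takes the coder's category labels as input; the labels and the `post_code` helper are hypothetical.

```python
def post_code(labels: list[str], codebook: dict[str, int]) -> list[int]:
    """Return a code for each coder-assigned label, adding unseen labels to the codebook.

    Every distinct label gets its own code, so no two answer categories share one.
    """
    codes = []
    for label in labels:
        if label not in codebook:
            codebook[label] = max(codebook.values(), default=0) + 1  # next unused code
        codes.append(codebook[label])
    return codes


# Hypothetical codes for question 2 ("If no, briefly explain ...").
codebook_q2 = {"denied promotion": 1, "unfair pay": 2}
coded = post_code(["unfair pay", "harassment", "denied promotion", "harassment"], codebook_q2)
print(coded)        # [2, 3, 1, 3]
print(codebook_q2)  # {'denied promotion': 1, 'unfair pay': 2, 'harassment': 3}
```

Updating the codebook in the same step that assigns the code is one way to avoid codes that appear in the data but never in the book.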
- For (b):
  1. Do you qualify for health insurance from your employer?
     1. Yes  2. No
  2. If so, are you a member of the XYZ HMO?
     1. Yes  2. No
  3. If enrolled, please rate your level of satisfaction with each of the following aspects of your health insurance plan on the following scale (likely more instructions):
     Extremely unsatisfied  1  2  3  4  5  6  7  Extremely satisfied
     a. General attention to your care provided by XYZ HMO
     b. Choice of physicians afforded under XYZ
     c. Level of care received
     d. Amount of employee contribution
- For (c):
  1. Which of these best describes what attracted you to this job? Check one.
     1. Pay
     2. Benefits
     3. Challenge
     4. Fits training
     5. Location
     6. Other [Please specify: ]
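For case (b), the branching in the health-insurance example can be checked and post-coded mechanically. A minimal sketch, assuming hypothetical variable names `q1`, `q2`, `q3a` to `q3d`, `None` for a blank answer, and a hypothetical post-code of 8 for "not applicable" (chosen because it lies outside both the 1-2 and the 1-7 scales). It shows one possible policy: answers given to questions the respondent should have skipped are flagged and recoded as not applicable, while the rest of the survey is kept.

```python
from typing import Optional

Record = dict[str, Optional[int]]

NOT_APPLICABLE = 8  # hypothetical post-code for questions skipped by branching
RATINGS = ["q3a", "q3b", "q3c", "q3d"]  # hypothetical names for items a-d of question 3


def code_branching(r: Record) -> tuple[Record, list[str]]:
    """Post-code questions made inapplicable by branching and flag answers given to them."""
    if r["q1"] == 2:    # does not qualify: questions 2 and 3 do not apply
        skipped = ["q2"] + RATINGS
    elif r["q2"] == 2:  # qualifies but not enrolled in XYZ HMO: question 3 does not apply
        skipped = RATINGS
    else:
        skipped = []

    coded = dict(r)
    problems = []
    for key in skipped:
        if r[key] is not None:
            problems.append(f"{key} answered although the branching skips it")
        coded[key] = NOT_APPLICABLE  # keep the survey; only this item is recoded
    return coded, problems


respondent = {"q1": 1, "q2": 2, "q3a": 5, "q3b": None, "q3c": None, "q3d": None}
coded, problems = code_branching(respondent)
print(problems)  # ['q3a answered although the branching skips it']
```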
Final notes on post-coding:
- leave significant space in the codebook for question responses that require additional codes
- when the choice is between narrow and broad response categories, researchers should choose narrow classifications (narrow categories can later be combined into broad ones, but broad ones cannot be split apart)

Preparation of final codebook
- it is critical that the codes are clearly listed
- if possible, maintain the codebook in a word processing program that allows for constant editing and adjusting
- failure to list a code in the codebook can be difficult to detect and can make the research unreliable
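Because a code missing from the codebook is hard to spot by eye, it can help to check the entered data against the codebook directly. A sketch under the assumption that the codebook is held as a mapping from each variable to its set of listed codes (the mapping and `unlisted_codes` are hypothetical; the listed codes are those of the race, sex, marital status, and education questions).

```python
from typing import Optional

# Hypothetical codebook: variable -> set of listed codes (from the example questions).
CODEBOOK = {
    "race": {1, 2, 3, 4},
    "sex": {1, 2},
    "marstat": {1, 2, 3, 4},
    "ed": {1, 2, 3},
}


def unlisted_codes(records: list[dict[str, Optional[int]]],
                   codebook: dict[str, set[int]]) -> list[tuple[int, str, int]]:
    """Return (case ID, variable, value) for each entered value the codebook does not list."""
    found = []
    for rec in records:
        for var, listed in codebook.items():
            value = rec.get(var)
            if value is not None and value not in listed:  # blanks are a separate check
                found.append((rec["case_id"], var, value))
    return found


records = [
    {"case_id": 1, "race": 2, "sex": 1, "marstat": 4, "ed": 2},
    {"case_id": 2, "race": 5, "sex": 1, "marstat": 2, "ed": 2},  # race 5 is not in the codebook
]
print(unlisted_codes(records, CODEBOOK))  # [(2, 'race', 5)]
```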
Documents for data entry
1. Spreadsheets: the simplest and easiest mode of data entry
- each variable listed in the codebook corresponds to a column in the spreadsheet
- each row of the spreadsheet should correspond to a single questionnaire
- the case identifier should be listed in the first column of the spreadsheet
2. Text files: generated in word processing programs like Word; Excel can also generate them
- free-floating (comma-delimited) and fixed format databases are created as text files
- most statistical packages read text files only
- for larger databases, one will need word processing programs that can handle large amounts of data
- some notes on generating fixed format files:
  (i) data can be saved as a text file in a format referred to as ASCII
  (ii) codebooks must specify the position of the information for each variable in the database
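Generating either kind of text file from the same records is a matter of following the codebook. The sketch below assumes a hypothetical layout with a three-column case ID followed by the example variables; zero-padding each value to its width keeps it in its assigned columns, and blank columns mark missing data. The `LAYOUT` table and both functions are illustrative, not a standard.

```python
from typing import Optional

# Hypothetical layout: (variable, width in columns), case ID first, as the codebook would list it.
LAYOUT = [("case_id", 3), ("race", 1), ("sex", 1), ("marstat", 1), ("age", 2), ("ed", 1)]


def to_fixed(rec: dict[str, Optional[int]]) -> str:
    """Render one record in fixed format; blanks fill the columns of missing values."""
    parts = []
    for name, width in LAYOUT:
        value = rec.get(name)
        if value is None:
            parts.append(" " * width)
            continue
        text = str(value).zfill(width)  # zero-pad so each value stays in its columns
        if len(text) > width:
            raise ValueError(f"{name}={value} does not fit in {width} column(s)")
        parts.append(text)
    return "".join(parts)


def to_free_floating(rec: dict[str, Optional[int]]) -> str:
    """Render one record as comma-delimited text; a missing value leaves an empty field."""
    return ",".join("" if rec.get(name) is None else str(rec[name]) for name, _ in LAYOUT)


rec = {"case_id": 1, "race": 2, "sex": 1, "marstat": 4, "age": 22, "ed": 2}
print(to_fixed(rec))          # 001214222
print(to_free_floating(rec))  # 1,2,1,4,22,2
```

To save the lines as an ASCII text file, one option is Python's built-in `open(path, "w", encoding="ascii")`, which raises an error if a non-ASCII character slips into the data.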
Computer database editing
Once all data are entered into the computer database, one must check for several types of errors:
- missing records or records entered twice
- variable values above or below the accepted range
- positions in the database that are missing information
Checking for problems with spreadsheet databases:
- to check for missing or duplicate records, sort the data by case ID where possible
- to check for values above or below the allowable range, calculate the maximum and minimum of each variable; if any values fall outside the range, sort the column to find them
- to check for missing data, sort each column so that the blank cells are grouped together and easy to spot
Be sure to back up the data somewhere other than a PC hard drive.
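The three spreadsheet checks above translate directly into code for databases too large to inspect by sorting. A minimal sketch, assuming case IDs numbered 1 to `expected_n`, records held as dictionaries with `None` for blank cells, and hypothetical allowable ranges (the age bounds in particular are illustrative only); all function names are hypothetical.

```python
from collections import Counter
from typing import Optional

Record = dict[str, Optional[int]]


def duplicate_and_missing_ids(records: list[Record], expected_n: int) -> tuple[list[int], list[int]]:
    """Case IDs entered more than once, and IDs 1..expected_n never entered."""
    counts = Counter(r["case_id"] for r in records)
    duplicates = sorted(cid for cid, n in counts.items() if n > 1)
    missing = sorted(set(range(1, expected_n + 1)) - set(counts))
    return duplicates, missing


def out_of_range(records: list[Record], ranges: dict[str, tuple[int, int]]) -> list[tuple[int, str, int]]:
    """(case ID, variable, value) for every value below the minimum or above the maximum."""
    return [
        (r["case_id"], var, r[var])
        for r in records
        for var, (low, high) in ranges.items()
        if r.get(var) is not None and not low <= r[var] <= high
    ]


def blanks(records: list[Record], variables: list[str]) -> list[tuple[int, str]]:
    """(case ID, variable) for every position left without information."""
    return [(r["case_id"], var) for r in records for var in variables if r.get(var) is None]


records = [
    {"case_id": 1, "race": 2, "sex": 1, "age": 22},
    {"case_id": 2, "race": 1, "sex": 1, "age": 57},
    {"case_id": 2, "race": 1, "sex": 1, "age": 57},     # entered twice
    {"case_id": 4, "race": 7, "sex": None, "age": 38},  # race out of range, sex blank
]
ranges = {"race": (1, 4), "sex": (1, 2), "age": (18, 99)}  # age bounds are illustrative

print(duplicate_and_missing_ids(records, expected_n=4))  # ([2], [3])
print(out_of_range(records, ranges))                     # [(4, 'race', 7)]
print(blanks(records, ["race", "sex", "age"]))           # [(4, 'sex')]
```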