A support tool for composing questionnaires in social survey data archive SRDQ


20th Workshop on Methodologies and Tools for Complex System Modeling and Integrated Policy Assessment (CSM2006), 2006.8.28-30, Laxenburg, Austria
Norihisa Komoda, Shingo Tamura, Yoshitomo Ikkai, Koichi Higuchi
Graduate School of Information Science and Technology, Osaka University, Japan

1. Social Survey Data Archive
A social survey data archive collects, stores, and disseminates large amounts of social survey data, such as the Social Network Survey. Each survey contains various types of items: question items, the dataset (answers of respondents), the sample design, and papers/reports about the survey.
Objectives:
- Maintaining the quality of social surveys: when composing questionnaires for new surveys, it is imperative to review the question items and datasets of existing surveys to maintain quality.
- Effective use of existing data: this reduces the need to conduct repetitious surveys for similar purposes, so large survey costs can be avoided.
- Education: the archive makes it possible to develop social survey methodology lessons using high-quality survey data.

1. Social Survey Data Archive: SRDQ
SRDQ: the Social Research Database on Questionnaires (http://srdq.hus.osaka-u.ac.jp/en)
One of the most advanced social survey data archives in Japan, developed by the Graduate School of Human Science, Osaka University, in 2003.

1. Social Survey Data Archive: SRDQ
SRDQ: the Social Research Database on Questionnaires
Specifications:
- Hierarchical textual data
- Searching system (string search)
- Dataset analysis system (crosstab, etc.)
- Subjects, sample designs, and papers & reports of each survey are also stored
- 119 surveys, 17,232 items
Information Class
- Information Society Survey. Subjects: Information, Social Psychology, etc. Question items, dataset, papers & reports.
- Social Network Survey. Subjects: Information, Family, etc. Question items, dataset, papers & reports.
Question items (question item 1, question item 2, ...) and surveys are found by simple string search.

1. Social Survey Data Archive: SRDQ
SRDQ allows direct analysis of the datasets over the web pages.
Example: crosstab analysis. In this window, the row and column items of the crosstab are selected. The rows of the crosstab are the alternatives of the row item (male, female) and the columns are the alternatives of the column item (A, B). Each cell holds a count (30, 20, 100, 50 in the example); for instance, 30 male respondents answered A.

1. Social Survey Data Archive: SRDQ
How to execute crosstab analysis by using SRDQ: the analyst selects the variables he/she wants to use, then pushes >>. Finally, the analyst pushes "crosstabs" to start the analysis.

1. Social Survey Data Archive: SRDQ
The result (row: Gender, column: PC): 47% of males use a PC, while only 29% of females do.
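The crosstab result above can be illustrated with a minimal sketch. It is not SRDQ's implementation: the function names and the respondent records are hypothetical, and the records are invented only so that the row percentages match the 47% and 29% quoted on the slide.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Count how many records fall into each (row, column) cell."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {
            col: (100.0 * n / total if total else 0.0) for col, n in cells.items()
        }
    return result


# Hypothetical respondents, invented for illustration only.
records = (
    [{"gender": "male", "pc": "use"}] * 47
    + [{"gender": "male", "pc": "do not use"}] * 53
    + [{"gender": "female", "pc": "use"}] * 29
    + [{"gender": "female", "pc": "do not use"}] * 71
)

table = crosstab(records, "gender", "pc")
percent = row_percentages(table)
print(percent["male"]["use"], percent["female"]["use"])
```

Row percentages are used here because the slide compares the share of PC users within each gender.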

2. Purpose of the Study
To make SRDQ more useful, we planned to add a new function to help researchers compose new questionnaires.
Procedures to compose a new questionnaire (for a survey containing 200-300 questions):
- Decide the purpose and the design: 3 man-months. Intermittent discussion by research group members (approx. 10 researchers), continuing for 3-12 months.
- Summarize existing question items: 0.5 man-month
- Select existing surveys or question items to compare with new ones: 0.75 man-month
- Create new question items: 0.75 man-month
- Decide the order of question items: 0.25 man-month
Search for related surveys.
A tool to support this process has been developed.

2. Purpose of the Study
Procedures to compose a new questionnaire:
- Decide the purpose and the design
- Summarize existing question items (in this process, the Summary of Question Items is used)
- Select existing surveys or question items to compare with new ones
- Create new question items
- Decide the order of question items
A tool to support this process has been developed.

2. Summary of Question Items
The Summary of Question Items is a synopsis of similar question items included in particular surveys. The surveys are searched in SRDQ with keywords and survey names.

Source question items:
Information Society Survey 2001
  q1. Do you use the following items?  a. E-Mail  b. Fax  ...  f. Home Page
Information Society Survey 2002
  q3. Do you use e-mail on your cell phone or PC?  1. yes  2. no
  q45. Do you use Home Page on your cell phone or PC?  1. yes  2. no

Summary:
Question items: "Do you use the following items? E-Mail"; "Do you use the following items? Fax"; "Do you use the following items? Home Page"
ISS 2001: q1a, q1b, q1f
ISS 2002: q3, q45
JGSS 2003: q22a, q22b

Break down the question items to the minimum units (underlined in red on the slide) and summarize the similar items/units. Trends of the question items and differences between the surveys become clear.
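As a rough illustration of the data structure behind such a summary, the sketch below lays out one row per group of similar items and one column per survey. The function name `build_summary` and the input layout are assumptions made for this example. Only the ISS 2001 and ISS 2002 entries are used, because the row positions of the JGSS 2003 items cannot be recovered from the slide text.

```python
def build_summary(groups, surveys):
    """Lay out groups of similar items as rows, with one column per survey.

    groups: list of (label, {survey_name: question_id}) pairs.
    surveys: column order. A survey without a matching item gets an empty cell.
    """
    rows = [["Question item"] + list(surveys)]
    for label, members in groups:
        rows.append([label] + [members.get(name, "") for name in surveys])
    return rows


surveys = ["ISS 2001", "ISS 2002"]
groups = [
    ("Do you use the following items? E-Mail", {"ISS 2001": "q1a", "ISS 2002": "q3"}),
    ("Do you use the following items? Fax", {"ISS 2001": "q1b"}),
    ("Do you use the following items? Home Page", {"ISS 2001": "q1f", "ISS 2002": "q45"}),
]

summary = build_summary(groups, surveys)
for row in summary:
    print(" | ".join(cell or "-" for cell in row))
```

An empty cell makes a difference between surveys visible at a glance, for example that ISS 2002 has no Fax item.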

2. Support System for the Summarization
Summary of Question Items (same example):
Question items: "Do you use the following items? E-Mail"; "Do you use the following items? Fax"; "Do you use the following items? Home Page"
ISS 2001: q1a, q1b, q1f
ISS 2002: q3, q45
JGSS 2003: q22a, q22b
It takes approx. 1 week to process only 3 or 4 surveys manually.
Goal: the automatic creation of a summary that is sufficiently accurate to meet the demands of social survey specialists, and the provision of an editing interface to correct the errors and to produce a final, completed summary in less time.
Evaluation of accuracy: E = W * (non-detection items) + (miss detection items), with W > 1.
The number of rows that include detection errors should be under 10%.

3. Overview of the System
Input: surveys + keywords (e.g., Surveys A, B, C, D + "mail").
Search target: several surveys (A, B, C, D). For similarity judgments of question items, the Jaccard coefficient is used.
Output: 19 question items about mail, for example:
  1. How often do you use e-mail for each of the purposes listed below?
    1.1 Business communication: a. every day  b. 3 or more days a week
    1.2 Personal communication with friends: a. every day  b. 3 or more days a week
    1.3 Personal communication with family: a. every day  b. 3 or more days a week
The columns of the output are the surveys A, B, C, D; the cells hold the matching question numbers (q2, q2, q2, q3, q3, q15, q23 on the slide).

3. Similarity Judgments by Jaccard Coefficient
Original method
Jaccard coefficient: J = a / (a + b + c)
  a: number of words common to the 2 question items
  b, c: number of words that appear in only 1 of the question items
Similarity judgment:
- Calculate the similarity for all combinations of question items in the target surveys.
- The pair that has the maximum similarity value is judged as similar.
- Repeat this step while the similarity values are higher than the threshold.
Example: for the items Q.A1, Q.B1, Q.C1, Q.A2, Q.B2, Q.C2, the pair Q.A1 and Q.B1 has the maximum similarity, so the two are merged into one row (Q.A1, Q.B1).
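The original method above can be sketched as follows. This is a minimal illustration, not the authors' code. The tokenizer, the restriction to pairs from different surveys, and the rule that a summary row holds at most one item per survey are assumptions made for the example, and the question texts are hypothetical.

```python
import re
from itertools import combinations


def words(text):
    """Lower-case word set of a question item (simple tokenizer, an assumption)."""
    return set(re.findall(r"[a-z0-9\-]+", text.lower()))


def jaccard(text_a, text_b):
    """J = a / (a + b + c): shared words divided by all distinct words."""
    wa, wb = words(text_a), words(text_b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def group_items(items, threshold):
    """Greedily merge the most similar pairs while similarity exceeds the threshold.

    items: dict mapping (survey, question_id) -> question text.
    """
    group_of = {key: {key} for key in items}
    pairs = [
        (jaccard(items[p], items[q]), p, q)
        for p, q in combinations(sorted(items), 2)
        if p[0] != q[0]  # assumption: only compare items from different surveys
    ]
    for sim, p, q in sorted(pairs, key=lambda t: -t[0]):
        if sim <= threshold:
            break  # remaining pairs are no more similar than the threshold
        gp, gq = group_of[p], group_of[q]
        if gp is gq:
            continue
        # Assumption: a summary row holds at most one item per survey.
        if {s for s, _ in gp} & {s for s, _ in gq}:
            continue
        merged = gp | gq
        for key in merged:
            group_of[key] = merged
    unique = {id(g): g for g in group_of.values()}
    return sorted(sorted(g) for g in unique.values())


items = {
    ("A", "q1"): "Do you use e-mail on your PC?",
    ("B", "q3"): "Do you use e-mail on your cell phone or PC?",
    ("C", "q7"): "How many hours do you watch television?",
}
print(jaccard(items[("A", "q1")], items[("B", "q3")]))
print(group_items(items, threshold=0.6))
```

In the example, the two e-mail items share 7 of 10 distinct words, so they end up in the same row, while the television item stays on its own.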

3. Difficulty of Similarity Judgments
1. Partial match in juxtaposed words (leads to non-detection)
   Survey A: How often do you do the things on this list? Practice flower arranging, tea ceremony, or calligraphy
   Survey B: Do you practice cooking, sewing, or calligraphy?
   Countermeasure 1: Treat juxtaposed words as a group
2. Almost all words are the same except one core word, but the intended purposes of the questions are different (leads to miss detection)
   Survey A: How often do you use e-mail for personal communication with friends?
   Survey B: How often do you use e-mail for personal communication with family?
   Countermeasure 2: Apply a penalty if core words don't match
3. Different expression, but asking the same thing (leads to non-detection)
   Survey A: Do you perform following actions in your everyday life? Reuse bathwater for laundering to conserve water.
   Survey B: Do you try to do things in this list? Saving resources such as water.
   Countermeasure 3: Apply a neighborhood bonus for word matches

3. Similarity Judgments (1/2)
A new similarity measure that uses structural characteristics of surveys. Under specific conditions, the values of the existing Jaccard coefficient are adjusted.
1. Treat juxtaposed words as a group
   Juxtaposed words can be viewed as a group. If one or more words match within the juxtaposed words, treat those words as a group and ignore the unmatched words when calculating similarity.
2. Apply a penalty if core words don't match
   For pairs of similar question items within one survey, if only a few words differ, those words are recognized as core words. Detect core words before calculating similarity, and decrease the similarity value if the core words don't match.
   If a pair within one survey has a similarity value higher than 0.6, the unmatched words are recognized as core words.
   Example:
   Survey A: Q1-a. How often do you use e-mail for each of the purposes? communication with family
   Survey A: Q1-a. How often do you use e-mail for each of the purposes? business communication
   Survey B: Q1-a. How often do you use e-mail for each of the purposes? communication with friends
   The core words of the Survey A and Survey B items don't match, so the penalty is applied.
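The two adjustments above can be sketched on plain word sets. This is an illustration under stated assumptions, not the authors' implementation. The slides do not say how the penalty is applied, so subtracting it (0.5, the value used in the evaluation slides) is an assumption, as is the premise that the juxtaposed word groups have already been identified. All function names are hypothetical.

```python
from itertools import combinations


def jaccard(wa, wb):
    """Jaccard coefficient of two word sets."""
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def collapse_juxtaposed(wa, wb, groups_a, groups_b):
    """Ignore the unmatched words of juxtaposed groups that share at least one word."""
    wa, wb = set(wa), set(wb)
    for ga in groups_a:
        for gb in groups_b:
            if ga & gb:
                unmatched = ga ^ gb
                wa -= unmatched
                wb -= unmatched
    return wa, wb


def core_words(survey_items, within_threshold=0.6):
    """Unmatched words of highly similar item pairs inside one survey."""
    core = set()
    for wa, wb in combinations(survey_items, 2):
        if jaccard(wa, wb) > within_threshold:
            core |= wa ^ wb
    return core


def penalized_similarity(wa, wb, core, penalty=0.5):
    """Lower the similarity when the two items carry different core words.

    Assumption: the penalty is subtracted and the result is floored at 0.
    """
    sim = jaccard(wa, wb)
    if (wa & core) != (wb & core):
        sim = max(0.0, sim - penalty)
    return sim


# Juxtaposed words: only "calligraphy" matches, the other listed words are ignored.
a = {"practice", "flower", "tea", "calligraphy"}
b = {"practice", "cooking", "sewing", "calligraphy"}
a2, b2 = collapse_juxtaposed(
    a, b, [{"flower", "tea", "calligraphy"}], [{"cooking", "sewing", "calligraphy"}]
)
print(jaccard(a, b), jaccard(a2, b2))

# Core words: two near-identical items inside Survey A differ in a few words.
family = {"how", "often", "use", "e-mail", "communication", "with", "family"}
business = {"how", "often", "use", "e-mail", "business", "communication"}
friends = {"how", "often", "use", "e-mail", "communication", "with", "friends"}
core = core_words([family, business])
print(core, penalized_similarity(family, friends, core))
```

In the second example, the plain Jaccard value of the "family" and "friends" items is 0.75, which would exceed a threshold of 0.6. After the penalty it drops to 0.25, so the pair is no longer judged as similar.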

3. Similarity Judgments (2/2)
3. Apply a neighborhood bonus for word matches
   The order of the question items is significant: question items having the same meaning tend to be arranged in the same order. Increase the similarity values if highly similar pairs are found in the neighborhood.
   Example 1 (question items in the same hierarchical positions):
   Survey A: Q7. Do you perform following actions in your daily life? 1. Turn off lights not in use. 2. Reuse bathwater for laundering to conserve water.
   Survey B: Q2. Do you try to do things in this list? 1. Always turn off lights not in use. a. yes b. no 2. Saving resources such as water. a. yes b. no
   The first sub-items have a high similarity value, so the second sub-items receive the bonus.
   Example 2:
   Survey C: 1. Do you use e-mail on your pc? 2. How often do you use e-mail? 2-a. Gathering info for daily life
   Survey D: 1. Do you use e-mail on your cell phone or pc? 2. How many times do you send/receive e-mails? 2-1. To get info about everyday life
   The first items have a high similarity value, so the following items receive the bonus.
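A minimal sketch of the neighborhood bonus, assuming that "neighborhood" means the directly preceding or following positions in both questionnaires and that the bonus is added and capped at 1.0. Neither detail is specified on the slides. The bonus of 0.3 is the value from the evaluation slides; the `high` cut-off and the similarity values in the example are hypothetical.

```python
def apply_neighborhood_bonus(base, bonus=0.3, high=0.6):
    """Raise similarity values whose neighboring pair is highly similar.

    base[i][j]: similarity between item i of one survey and item j of another,
    with both surveys listed in questionnaire order.
    """
    n = len(base)
    m = len(base[0]) if base else 0
    adjusted = [row[:] for row in base]
    for i in range(n):
        for j in range(m):
            # Assumption: neighbors are the previous and the next aligned pair.
            neighbors = [(i - 1, j - 1), (i + 1, j + 1)]
            if any(
                0 <= p < n and 0 <= q < m and base[p][q] > high
                for p, q in neighbors
            ):
                adjusted[i][j] = min(1.0, base[i][j] + bonus)
    return adjusted


# Hypothetical values. Rows: Survey A (lights, bathwater).
# Columns: Survey B (lights, saving resources).
base = [
    [0.9, 0.1],
    [0.1, 0.3],
]
adjusted = apply_neighborhood_bonus(base)
print(adjusted)
```

Here the "bathwater" and "saving resources" items share few words (0.3), but the items just before them match strongly (0.9), so their similarity is raised to 0.6.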

4. Evaluation of Similarity Judgments (1/2)
Compare the manually prepared correct result with the result using the proposed measure and the result using the Jaccard coefficient only.
Material: 36 question items about environmental protection (from 3 surveys)
Penalty: 0.5, neighborhood bonus: 0.3

Threshold value of similarity judgments T = 0.6
Method     Errors  Non-detection  Miss detection  E    Rows  Rows containing errors
Jaccard    8       7              1               22   30    5
Proposed   2       0              2               2    22    2

T = 0.5
Method     Errors  Non-detection  Miss detection  E    Rows  Rows containing errors
Jaccard    6       5              1               16   28    5
Proposed   2       0              2               2    22    2

Non-detections are more problematic than miss detections.
Evaluation: E = W * (non-detections) + (miss detections), with W = 3 (the number of surveys)
Non-detection: a pair was judged as not similar while it should be judged as similar
Miss detection: a pair was judged as similar while it should be judged as not similar

4. Evaluation of Similarity Judgments (2/2)
Material: 113 question items about leisure (from 10 surveys)
Penalty: 0.5, neighborhood bonus: 0.3

Threshold value of similarity judgments T = 0.6
Method     Errors  Non-detection  Miss detection  E    Rows  Rows containing errors
Jaccard    36      35             1               351  70    17
Proposed   19      15             4               154  50    7

T = 0.5
Method     Errors  Non-detection  Miss detection  E    Rows  Rows containing errors
Jaccard    35      33             2               332  67    15
Proposed   10      1              9               19   43    3

Evaluation: E = W * (non-detections) + (miss detections), with W = 10 (the number of surveys)
Non-detection: a pair was judged as not similar while it should be judged as similar
Miss detection: a pair was judged as similar while it should be judged as not similar
Non-detections and miss detections are reduced, and thus E is improved.
The number of rows containing detection errors is under 10% (3 of 43 rows at T = 0.5).
The efficiency of the proposed method has been confirmed.
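The evaluation value E can be checked with a few lines of code. The function name is hypothetical. The counts are the ones reported on the two evaluation slides, with W set to the number of surveys (3 and 10).

```python
def evaluation_score(non_detections, miss_detections, weight):
    """E = W * (non-detections) + (miss detections), with W > 1."""
    return weight * non_detections + miss_detections


# (material, method, T, W, non-detections, miss detections, E reported on the slides)
reported = [
    ("environment", "Jaccard", 0.6, 3, 7, 1, 22),
    ("environment", "Proposed", 0.6, 3, 0, 2, 2),
    ("environment", "Jaccard", 0.5, 3, 5, 1, 16),
    ("environment", "Proposed", 0.5, 3, 0, 2, 2),
    ("leisure", "Jaccard", 0.6, 10, 35, 1, 351),
    ("leisure", "Proposed", 0.6, 10, 15, 4, 154),
    ("leisure", "Jaccard", 0.5, 10, 33, 2, 332),
    ("leisure", "Proposed", 0.5, 10, 1, 9, 19),
]

for material, method, t, w, non, miss, e_reported in reported:
    e = evaluation_score(non, miss, w)
    print(f"{material:12s} {method:9s} T={t}  E={e}  (reported {e_reported})")
```

Because W is larger than 1, a single non-detection costs as much as W miss detections, which reflects the slide's remark that non-detections are more problematic.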

5. Editing Interface
The prototype tool has been developed. The editing interface is built as a CGI script (Perl).
The summary (10 surveys in total) can be scrolled in both directions. Click to open an editing window, select the item to move, and specify the destination.
Possible miss detection: the value exceeds the threshold but is close to the threshold value (0.5~0.6).
Possible non-detection: the value does not exceed the threshold but is close to the threshold (0.4~0.5).
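The flagging rule can be sketched as a small function. This is written in Python for illustration only (the prototype itself is a Perl CGI script). The threshold of 0.5 and the margin of 0.1 follow the ranges shown on the slide, while the function name and the handling of values exactly on a boundary are assumptions.

```python
def flag_pair(similarity, threshold=0.5, margin=0.1):
    """Mark similarity values close to the threshold for manual review."""
    if threshold < similarity <= threshold + margin:
        # Judged as similar, but only barely.
        return "possible miss detection"
    if threshold - margin <= similarity <= threshold:
        # Judged as not similar, but only barely.
        return "possible non-detection"
    return None


for value in (0.9, 0.55, 0.45, 0.2):
    print(value, flag_pair(value))
```

Flagging both sides of the threshold lets the editor concentrate on the rows most likely to contain an error instead of checking every row.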

5. Editing Interface
An item is moved to a new row (the last row) to correct a detection error.
(Screenshot labels: Survey E; 10 surveys in total; scrolling; possible miss detection, moved; possible non-detection.)

5. Evaluation Test of the Editing Interface
Evaluation test: compare the time taken to create the summary by hand with the time taken using the proposed system/interface.
Material: 113 question items about leisure (from 10 surveys). The automatic result contains 1 non-detection and 9 miss detections (T = 0.5).
Time taken to create a correct summary:
  Manual:           3 hours
  Proposed system:  20 minutes (view & check the question items: 15 min.; move the items to correct errors: 5 min.)
3 rows contain detection errors; 10 question items are moved.
Flagged items: possible miss detection, 6 items; possible non-detection, 22 items. (All detection errors are displayed among these possible errors.)

6. Conclusions
Using structural characteristics of social survey questionnaires, we have developed a support tool for generating the summary of question items.
The proposed method is capable of automatically creating a summary that is sufficiently accurate to meet the demands of specialists.
With the man-machine interface system, final and completed summaries can be generated in less time than by manual means.

Thank you for your kind attention.