Intelligent Use of Metadata in the Questionnaire Design Process

Karen BRANNEN
Centre for Educational Sociology, University of Edinburgh, St John's Land, Holyrood Road, Edinburgh EH8 8AQ, United Kingdom
e-mail: K.Brannen@ed.ac.uk

Abstract: IQML (A Software Suite and Extended Markup Language (XML) Standard for Intelligent Questionnaires) is a project funded by the EU. Modules will be produced for metadata maintenance, questionnaire design, questionnaire presentation, database interrogation and survey administration. This paper discusses the main points of innovation in the questionnaire design module: capturing the data used within the questionnaire design process and storing it as metadata in a repository for later re-use; the ability to design any questionnaire at a conceptual level and have this design realised in different media; and the storage of questions in question banks. Additionally, different approaches to questionnaire design and different stages within the process are discussed. Finally, the paper gives an overview of the other modules in the IQML system and demonstrates how they interact.

Keywords: metadata, intelligent questionnaires, questionnaire design, surveys, question re-use, question banks, metadata repository

1. Introduction

Within the questionnaire design process a great deal of information is used to produce a questionnaire and is subsequently discarded. This data includes structural, contextual and semantic information as well as validation and navigation rules. However, much of it can be used as metadata later in the statistical process, for example in the production of datasets and documentation. This is one element of intelligence which is inherent in every questionnaire but is very often not used. Another interpretation of intelligence lies in the tools given to the questionnaire designer, one aspect of which is the re-use of questions stored in question banks.
An intelligent tool also links the questions to underlying concepts and to the variables underlying those concepts.

This paper discusses the main points of innovation used in the IQML system, particularly within the questionnaire design module. The first is that of capturing and storing the data used within the questionnaire design process as metadata within a repository for re-use
later within the statistical process. The second is the ability to design any questionnaire at a conceptual level and have this same design realised in different media. The third is the ability to re-use questions which have been stored in a question bank in the metadata repository.

In addition, the paper discusses different ways of approaching questionnaire design based on the needs of different types of users and different types of questionnaires. Furthermore, the paper describes the two clear stages within the questionnaire design process: development and specification. In IQML, we define development as the complete conceptual design of the questionnaire, and specification as the implementation of a questionnaire which has already been designed at the conceptual level. Finally, the paper places the questionnaire design process in a broader framework by giving an overview of the other modules in the IQML system and demonstrating how they interact.

2. The IQML project

IQML (footnote 1) (A Software Suite and Extended Markup Language (XML) Standard for Intelligent Questionnaires) is a project funded by the EU under the Framework 5 IST programme. Its goal is to automate and integrate the data collection process by producing software to support and implement emerging metadata standards using definition tools such as XML. The aims of the project will be achieved by developing five related software modules (metadata maintenance, questionnaire design, questionnaire presentation, database interrogation and survey administration) and by contributing to specification standards. These five modules are described in a little more detail later in the paper.

The project contributes to standards by participating in the development of the Common Warehouse Metamodel (CWM) of the Object Management Group (OMG).
This ensures that emerging models in the domain of object analysis can be used to define the structure, behaviour and visualisation of a statistical questionnaire, including its relevant metadata. Furthermore, the resulting DTD and supporting XML software will demonstrate the benefits of both XML and object technology for administrations, enterprises and other organisations in the context of intelligent questionnaires.

The resulting software will be demonstrated to the wider community by using it directly with the databases of six of the largest financial institutions in Ireland, and with six SMEs using the Internet solution. By the end of the project, the software will also have been demonstrated at an intensive workshop aimed at members of National Statistical Institutes (NSIs) and candidate countries.

Footnote 1: http://www.epros.ed.ac.uk/iqml

The Centre for Educational Sociology (CES) at the University of Edinburgh co-ordinates and manages the project. The bulk of the technical work is split between four partners: the University of Edinburgh (questionnaire design), Dimension EDI (metadata maintenance), DESAN Marktonderzoek (survey administration) and Comfact AB (questionnaire
presentation and database interrogation). Two National Statistical Institutes (the Central Statistics Office of Ireland and Statistics Norway) contribute to user needs and carry out the prototype testing. The National Statistical University of Athens provides knowledge of statistics, and acts as a channel of communication with users in Greece.

This paper discusses the main points of innovation used within the questionnaire design module, focussing particularly on the theoretical and conceptual aspects of the questionnaire design process.

3. Interpreting intelligence

What do we mean by an intelligent questionnaire? In this paper the focus is on how intelligence already available within the questionnaire design process can, and indeed should, be exploited. The paper also considers how intelligence, especially in the form of a tool, can be used to aid the questionnaire designer throughout the whole process.

3.1 History of CES

From 1976 to 1993, the CES conducted the Scottish Young People's Survey, first on behalf of the Social Science Research Council (SSRC) and later for the Scottish Office. During this time considerable expertise in the design, conduct and analysis of educational surveys was gained. The Centre wrote its own software to support these activities.

Questmast [1], a questionnaire design tool, was developed in the Centre in 1981. In its original form it was used for CES surveys for 10 years, running on the University mainframe, and was later translated into a PC version with a GUI. Questmast allowed users to define questions and group them into related questionnaires. Camera-ready output was produced for printing, and the program also exported an SPSS setup file for the resulting data. Later, it also exported a relational database schema.
In the late 1980s, the output from the Questmast program was linked to a survey metadata system which included management of the databases and production of the documentation [2].

The experience gained from writing this software and observing its flaws has been vital for our input to the questionnaire design module of the IQML system. Since we were involved in the whole process of survey design and implementation (including questionnaire design, survey administration, data collection, database creation, data documentation and analysis), we were able to make some observations about the use of metadata throughout the process. For instance, there were cases where the same metadata had to be retyped for a different part of the process. We believe that, although there are many questionnaire design packages on the market, there are very few, if any, which deliver the functionality of capturing all of the metadata once and storing it for subsequent use. This has been the ideal of statistical information systems for many years but has not yet been realised [3].

3.2 Data used as metadata

Figure 1 shows an example of a relatively simple question which was recently used during a project currently running at CES [4]. All of the text is the
semantic metadata which can be used in the design of the database and documentation of the survey. However, during the questionnaire design process, this is simply the data which is required in order to produce a questionnaire. Furthermore, there is a set of data which may be unseen in the published questionnaire but which should be captured as metadata for subsequent processing. Some examples of this are:

- Routing information (e.g. whether this question is skipped by a filter)
- Validation rules (e.g. whether the answer to this question is used to validate any other)
- Sequencing (e.g. identifying which question comes before and after this one).

The ultimate goal of a questionnaire is to collect data for some form of analysis. All of the previously mentioned metadata is vital if the data is to be interpreted in the correct way.

    4  How would you describe the involvement of staff in your school in the
       Higher Still Development programme in any of the capacities listed in Q.3?
       Please tick one box

       [ ] No staff involved
       [ ] A few staff involved
       [ ] Some staff involved
       [ ] A lot staff involved

Figure 1: an example question

The ability to capture such metadata at the first moment it is used and store it for subsequent use is one element of intelligence.

3.3 An intelligent tool

3.3.1 Re-use of questions in question banks

Intelligence within the questionnaire design process can also be seen as the intelligence of the tools provided to the designer, one aspect of which would be tools to allow re-use of existing questions. Very often, the questions asked are very similar, or identical, to questions which appear elsewhere in the current questionnaire or are asked in another questionnaire, possibly at another time-point. One of the major innovations within the IQML system is the storage of question banks within a metadata repository [5].
In this way, questions will either be available from a question bank belonging to someone else for browsing and copying, or from one's own question bank for browsing, copying and saving.
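The two ideas above (capturing routing, validation and sequencing alongside the visible text, and re-using questions through banks) can be illustrated with a minimal Python sketch. It is not the IQML repository interface, which this paper does not specify: the class names, field names, owner names and question identifiers are all assumptions made for illustration, and the question text from Figure 1 is abbreviated.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Question:
    """One question plus the metadata captured while it is being designed."""

    name: str                              # hypothetical identifier, e.g. "Q4"
    text: str                              # semantic metadata: wording seen by the respondent
    responses: list[str] = field(default_factory=list)
    skipped_by: str | None = None          # routing: filter question that can skip this one
    validates: list[str] = field(default_factory=list)  # validation: questions this answer checks
    follows: str | None = None             # sequencing: the preceding question


class QuestionBank:
    """A bank anyone may browse and copy from, but only its owner may save to."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._questions: dict[str, Question] = {}

    def browse(self) -> list[str]:
        return sorted(self._questions)

    def copy_question(self, name: str) -> Question:
        # Return an independent copy so edits never alter the stored original.
        return copy.deepcopy(self._questions[name])

    def save(self, user: str, question: Question) -> None:
        if user != self.owner:
            raise PermissionError(f"{user} cannot save to {self.owner}'s bank")
        self._questions[question.name] = copy.deepcopy(question)


# Store the (abbreviated) question from Figure 1 in one bank ...
ces_bank = QuestionBank(owner="ces")
ces_bank.save("ces", Question(
    name="Q4",
    text="How would you describe the involvement of staff in your school "
         "in the Higher Still Development programme?",
    responses=["No staff involved", "A few staff involved",
               "Some staff involved", "A lot staff involved"],
    follows="Q3",
))

# ... then copy it into a bank owned by someone else and adapt the copy.
my_bank = QuestionBank(owner="me")
reused = ces_bank.copy_question("Q4")
reused.name = "Q12"
reused.follows = "Q11"
my_bank.save("me", reused)
```

Copying by value is a deliberate choice in this sketch: the designer can adapt a borrowed question without changing the original held in the other bank.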
3.3.2 Linking questions to concepts and variables

Another aspect of an intelligent tool is that it gives the questionnaire designer the ability to link the questions to an underlying concept. The concept is the main underlying construct or indicator which the questionnaire designer is trying to identify. Generally, the questionnaire designer starts with an idea of the research question or hypothesis which needs to be answered [6]. This main hypothesis is broken down into a series of concepts into which the questions can be collected. The concepts are in turn broken down into a series of variables which, taken together, can be used to describe the concept. A classic example of this is the construction of some kind of social class indicator. There are usually a number of questions (e.g. description of job, whether self-employed, size of business), the answers to which are taken together to form a further variable describing social class. A concept can therefore be seen as a hierarchical structure, each branch of which ends in a single variable, and each variable should be linked to a question.

3.3.3 Using different media

A further point of innovation within the IQML system is the ability to design a questionnaire within the system and then to see this same design realised in several different media. How a questionnaire is delivered to a respondent has changed along with technology. In the past, the only method available was paper. This still has a place and is still widely used. However, there are now many other media which can be used, for instance computer-aided telephone interviewing, computer-aided personal interviewing and delivery via the web. Each of these has a place, and it is conceivable that several different media could be used in a single survey design. For instance, the main survey could be carried out on paper but telephone interviewing may be used in the case of non-response.
In the IQML system, the user will be able to design the questionnaire conceptually and then present it using different media.

3.3.4 Question types

In the IQML system, every question in a questionnaire has a type. The use of question types is desirable for a number of reasons:

- it helps keep the styles of the questionnaire consistent;
- it gives the user a type of shorthand;
- it maintains consistency with the data to be captured.

At first sight, a question type may be treated syntactically; however, there are some underlying semantics which can be captured. There are three underlying concepts which define a question type: data type, response type and sub-questions. The data type is that of the resulting answer (e.g. integer, string, date). There is also the type of response expected. This can be classified as, for example:

- Single: a single response to a question (e.g. number of children)
- Choice: the respondent must choose one of a number of possible responses
- Multiple: the respondent can choose any of a number of possible responses.

A single question on a page can be seen as having several sub-questions. For instance, Figure 2 shows a single question which actually asks three separate questions with the same response categories.
    5  Before Higher Still, did your school offer ...
       Please tick one box for each line

                               Yes    No    If yes, how many
       GSVQs                   [ ]    [ ]   ________
       School Group Awards     [ ]    [ ]   ________
       NC clusters             [ ]    [ ]   ________

Figure 2: an example of a more complex question

In some cases, the expected data type of the response changes within a question (see Figure 2), in which case the question has several chunks. In the figure, the data type of the third column differs from the other two since it expects a number to be entered rather than a tick or a check. A chunk is therefore a part of a question which shares a uniform response data type.

Once question types have been identified, they are available to the user, who can specify which question type a question belongs to [7]. The use of question types also determines the nature of the related variables, thereby using intelligence in the construction of the database for later data capture.

4. Questionnaire development

4.1 Approaches to design

There are different approaches to the development of a questionnaire, which are based on the needs of different types of user and different types of questionnaire. The expected users of the questionnaire design module of IQML are, in the main, statistical institutes, academic researchers and market researchers [8]. Cutting across the types of user are the types of questionnaire which these users produce. These vary from very short and simple questionnaires (no routing) distributed to a small number of respondents, up to long and complex questionnaires (lots of routing) distributed to a large number of respondents several times over a period of years. Everything in between is also covered (e.g. short questionnaires to many respondents, or long questionnaires distributed just once to a small number of respondents). The combination of type of user and type of questionnaire will influence the way in which questionnaire design is approached.
If a short, simple questionnaire is being designed, then the user will not want to start with the design of concepts; they will simply want to design a few questions and put them in the order they require as quickly as possible. However, if the user is designing a large study with several different types of lengthy questionnaire with lots of routing, then they will require a different level of support. The use of question banks and the design of concepts will be vital for this user.
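For the second kind of user, the design of concepts can be pictured as building the hierarchy described in section 3.3.2 and checking that every branch ends in a variable linked to a question. The following Python sketch illustrates that check only; it is not the IQML data model, and the class names, variable names and question identifiers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Variable:
    """Leaf of the hierarchy; should be linked to the question that feeds it."""

    name: str
    question: str | None = None   # hypothetical question identifier, None if not yet linked


@dataclass
class Concept:
    """A concept broken down into sub-concepts and, ultimately, variables."""

    name: str
    children: list[Concept | Variable] = field(default_factory=list)

    def variables(self) -> list[Variable]:
        # Walk every branch down to its variables, depth first.
        found: list[Variable] = []
        for child in self.children:
            if isinstance(child, Variable):
                found.append(child)
            else:
                found.extend(child.variables())
        return found

    def unlinked(self) -> list[str]:
        # Variables that no question has been designed for yet.
        return [v.name for v in self.variables() if v.question is None]


# The social class example: three answers taken together describe the concept.
social_class = Concept("social class", [
    Variable("job_description", question="Q10"),
    Concept("employment status", [
        Variable("self_employed", question="Q11"),
        Variable("business_size"),          # no question designed yet
    ]),
])

missing = social_class.unlinked()
```

A designer working top-down could run such a check at the end of the development stage to see which variables still need a question.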
4.2 Stages of design

Linked with the above discussion on types of user and types of questionnaire is the concept of stages of questionnaire design. In the IQML system we have identified two clear stages of designing a questionnaire. These are defined as development and specification. In the development stage, the designer carries out the complete conceptual design of the questionnaire, starting with the design of the concepts and continuing down to the definition of variables. The question types to be used, overall style, classifications etc. may also be defined at this stage. The questionnaire is then specified by actually constructing the questions using the previously defined question types and applying style and navigation.

There are some cases where the questionnaire designer is handed a design of a questionnaire already sketched out, perhaps by hand, and asked to implement it. In this case, only the questionnaire specification stage needs to be done.

5. The IQML system

The Questionnaire Design module, as previously discussed, enables the user to design and manage questionnaires which can be deployed using the other software modules of the suite. The tool allows the user to define questionnaires at a number of levels: conceptual, logical and formal. Attention is paid to the requirements of different types of respondent (business and individual), and to the different types of surveys (e.g. economic or social) that may be addressed. The questionnaire design tool captures all relevant metadata and stores it in the metadata repository.

The Questionnaire Presentation tool renders the questionnaire for use with PCs and in particular with web browsers. XML support for presentation, validation, navigation and calculation will be implemented by the tool. This allows users to fill in the data and validate it as appropriate.
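As a rough illustration of specifying a question from a previously defined question type and then realising the same design in two media, consider the Python sketch below. It assumes a much simplified question type (data type and response type only, without sub-questions or chunks). The XML element and attribute names are invented for this example and are not taken from the IQML DTD; the paper rendering, including the hint wording for multiple responses, is likewise an assumption.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionType:
    name: str
    data_type: str        # type of the resulting answer: "integer", "string", "date"
    response_type: str    # "single", "choice" or "multiple"


@dataclass
class Question:
    name: str
    text: str
    qtype: QuestionType
    categories: tuple[str, ...] = ()


def render_paper(q: Question) -> str:
    """Plain-text layout of the kind that might be sent for printing."""
    lines = [f"{q.name}  {q.text}"]
    if q.qtype.response_type == "single":
        lines.append("    Answer: ____________")
    else:
        hint = ("Please tick one box" if q.qtype.response_type == "choice"
                else "Please tick all that apply")   # assumed wording
        lines.append(f"    {hint}")
        lines.extend(f"    [ ] {c}" for c in q.categories)
    return "\n".join(lines)


def render_xml(q: Question) -> str:
    """XML for electronic presentation; element names are invented here."""
    el = ET.Element("question", {
        "name": q.name,
        "dataType": q.qtype.data_type,
        "responseType": q.qtype.response_type,
    })
    ET.SubElement(el, "text").text = q.text
    for c in q.categories:
        ET.SubElement(el, "category").text = c
    return ET.tostring(el, encoding="unicode")


# One conceptual definition, two realisations.
tick_one = QuestionType("tick-one", data_type="string", response_type="choice")
q4 = Question(
    "4",
    "How would you describe the involvement of staff in your school?",
    tick_one,
    ("No staff involved", "A few staff involved",
     "Some staff involved", "A lot staff involved"),
)
paper = render_paper(q4)
xml_text = render_xml(q4)
```

Because both renderers read the same `Question` object, a change to the wording or the categories is made once and appears in every medium.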
The first trial of the IQML project tested the Questionnaire Presentation tool in order to prove the concept. It was piloted in the field in two applications: balance of payments data by the CSO (Central Statistics Office of Ireland), and financial and servicing data from local authorities by SSB (Statistics Norway). The trial was completed successfully and an evaluation report was written and peer-reviewed; it forms Deliverable 5 of the project [9].

The Database Interrogation tool supports the extraction of data from popular databases and maps these data to the XML. It also allows data to be extracted from the XML and loaded into a database. Once configured, this will support the automated loading and extraction of data to and from databases and the electronic questionnaire.

The Survey Administration package allows the questionnaires to be integrated with registers and sample frames. It tracks the despatch and receipt of questionnaires and software to and from individuals and organisations.

Sitting at the heart of all of these modules is the Metadata Repository, which supports the definition of metadata objects that can be used in a questionnaire. APIs are being developed to store and access these metadata objects. The product allows questionnaire design systems and other software to access the metadata without the need to know the
underlying structure or source of the metadata, by implementing object interfaces that follow international standards.

6. Conclusions

In conclusion, intelligence should be used throughout the questionnaire design process and into the subsequent stages of survey processing. Intelligence within this process can take many forms. The questionnaire designer should be given a tool with enough intelligence to allow any type of user to design any type of questionnaire at the conceptual level and then realise that same design in many different media. At the same time, the tool should allow as much metadata as possible to be captured and stored so that it can be used not only by the designers themselves in the design of future questionnaires, but also in the eventual interpretation of the resulting data. The use of a question bank is important to allow re-use of questions. The innovation in IQML is in the use of a metadata repository which allows sharing of metadata objects (including question banks) between modules of the IQML system and other software. IQML is contributing towards the realisation of the ideal of capturing all metadata, once, at the moment it is most appropriate, for use in the subsequent processing of the resulting data.

References

[1] Lamb, J.M. QUESTMAST: a package to aid the design and construction of questionnaires, Royal Statistical Society News and Notes, December 1983.
[2] Lamb, J.M. Metadata in survey processing, Proceedings of the EUROSTAT Statistical Meta Information Systems Workshop, Luxembourg, 2-4 February, 1993, ISBN 92-826-0478-0.
[3] Sundgren, B. An Infological Approach to Data Bases, Statistics Sweden, Stockholm, 1973.
[4] The Introduction of a Unified System of Post-Compulsory Education in Scotland (IUS), ESRC, April 2000 to July 2003.
[5] Nelson, C.
The Affect of Standards on Software Component Architecture, in Proceedings of the ASC conference: The Challenge of the Internet, edited by Andrew Westlake, Chesham, UK, 11-12 May, 2001, forthcoming.
[6] Peterson, R. A. Constructing Effective Questionnaires, Sage Publications Inc., California, 2000, ISBN 0-7619-1641-5.
[7] Lamb, J.M. Formatting Questionnaires - the Questionnaire Design Module, presented to IQML Fourth Project Meeting, Den Dolder, Netherlands, 28-30 March, 2001.
[8] Pagrach, K., Rutjes, H., Tjemmes, R. Synthesised description of user needs, Deliverable 4 of the IQML project, 2001.
[9] Folkedal, J., Hoel, T. Evaluation report of Trial 1, Deliverable 5 of the IQML project, 2000.