Acronym: Data without Boundaries




Project N°: 262608
Acronym: Data without Boundaries

STAND-ALONE DOCUMENT
Analysis of Researchers' Needs re. Secure Access to Official Microdata

WORK PACKAGE 4
Improving Access to Official Statistics Microdata

DATE OF ISSUE OF DELIVERABLE: April 2015
DOCUMENT PREPARED BY: Partner 1 CNRS-RQ

Combination of CP & CSA project funded by the European Community under the programme FP7 - SP4 Capacities, Priority 1.1.3: European Social Science Data Archives and remote access to Official Statistics

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 262608 (DwB - Data without Boundaries).

PRELIMINARY STATEMENT

This report is meant to complement and update the findings of a feasibility study on the organisational architecture for managing pan-European access, carried out as part of DwB deliverable D4.2. D4.2 is available online on the DwB website.

TABLE OF CONTENTS
PRELIMINARY STATEMENT
INTRODUCTION
PART 1 - SURVEY METHODOLOGY: A DOUBLE QUANTITATIVE AND QUALITATIVE APPROACH TO CAPTURE A SHARPER VISION OF NEEDS
  1. Preliminary Approach: A Business Case Analysis
  2. The Online Survey: A Quantitative Approach
  3. Individual and Collective Interviews: The Paris & The Hague Workshops and The Supplementary Questionnaire
PART 2 - SOME CHARACTERISTICS OF ACCESS CENTERS, RESEARCHERS, TEAMS AND RESEARCH PROJECTS
  1. The Surveyed Research Data Centers
  2. Researchers' Profile
PART 3 - RESULTS
  1. "Soft Needs": What Can Be Easily Done to Improve Current Way of Work
  2. Strong Needs and Researchers' Requirements Regarding a Eu-RAN
     Reduce Waiting Time throughout the Process
     Access Points: No Travel and One System
     Anticipate a Potential Mobility between Institutions or Countries
     Research Environment
     Improve Documentation and Support
     Output Checking: An Essential Issue
     Merging Data: A Request For New Research Opportunities
IN CONCLUSION: SOME GENERAL POINTS
ANNEXES
  ANNEX 1: QUESTIONNAIRE
  ANNEX 2: WHO ARE THE RESPONDENTS?
  ANNEX 3: ADDITIONAL QUESTIONNAIRE

INTRODUCTION

The drive to meet researchers' expectations while protecting confidential data was crucial in extending secure access to data in a number of countries, first on site and then through remote execution and remote access. Likewise, researchers' expectations must be taken into account when monitoring the implementation of a secure European data network that makes it possible to carry out complex research projects involving transnational access to confidential data stored at different sites across borders. To demonstrate that such a network of secure data centers is necessary and feasible, we first have to identify accurately to what extent the current forms of transnational access leave research needs unfulfilled, whether non-resident researchers travel to the relevant country and access the requested data in situ (United Kingdom) or are allowed to use remote access and process foreign data from their regular workplace (France). Such a network also has to provide researchers with the relevant facilities. This requires a full exploration of researchers' working methods and organization, without discounting the legal constraints and security requirements of the centers where the data are stored. To collect this type of information, we conducted an online survey as well as individual and group interviews with a sample of researchers with some experience of secure remote facilities, that is, researchers who had processed confidential data, mostly in DwB partner Research Data Centres.
Between the start of the online survey, at the beginning of the project, and the group interviews, at its end, thinking about a possible architecture for a European Remote Access Network had progressed, so that researchers could be presented with a more elaborate project for discussion and helped to assess the needs for such a network more precisely. The various phases of a research project involving secure access to confidential data had been defined beforehand. The online survey was designed to capture, for each phase, the difficulties and needs researchers encountered in their actual experience, which was mainly restricted to a single facility; the group interviews allowed a more prospective approach, in which researchers were asked to project themselves into a different context where they could access several facilities from a single point of access. It is worth noting that, when discussing obstacles pertaining to legal or privacy issues (for instance those that make it difficult to merge datasets from different countries to run a single analysis), researchers suggested mid-range solutions, somewhat less ambitious but more pragmatic in the short term than those WP4 had envisaged with a longer perspective in mind. We will first briefly describe the methods we used (online survey, individual and group interviews) and their rationale; then, after describing the responding population and underlining some characteristics important for the analysis, we will present the results under two headings. We will first identify what we label "soft needs", which are likely to be met relatively easily and may already be addressed in some facilities. These are relevant for all types of access, transnational or not, whether within a future network or in the current context, and they are widely expressed by researchers whose experience is mostly based on national access.
We also identified "strong needs", which are harder to meet given the current design of access procedures and security constraints, especially in the case of transnational access, which is our main concern here. It is worth noting that a network meeting these strong needs would also be likely to satisfy needs that have so far not been addressed at the national level either. In conclusion, we will present a few user cases reflecting the problems some researchers experience in the current situation; we will also address some major points for the design of a European Remote Access Network and the benefits of such a network for research activities.

PART 1 - SURVEY METHODOLOGY: A DOUBLE QUANTITATIVE AND QUALITATIVE APPROACH TO CAPTURE A SHARPER VISION OF NEEDS

This was a three-step process: first, a preliminary approach aimed at defining the basis of the survey and its main assumptions about the problems researchers experience; second, a quantitative online survey, which made it possible to weigh the relative importance of these problems and to investigate possible relations with researchers' profiles (disciplines, types of projects, institutional research contexts); and third, a qualitative investigation, better suited to a prospective approach to transnational access issues. This last choice was dictated by the fact that, given the current constraints, few researchers have had the opportunity to gain any actual experience of transnational access.

1. Preliminary Approach: A Business Case Analysis

Our major concern at first was to avoid an abstract mode of reasoning, so we decided to start from a real situation, which made it possible to identify the obstacles researchers face whenever their project requires secure transnational access to confidential databases stored in various facilities and in different countries. Although work on it is in progress (see WP3), secure transnational access remains complex and, where available, poorly publicized. It is therefore no wonder that complex research projects (involving multiple databases from different countries) remain the exception; the very few projects of that kind happen to be led by highly experienced research groups. Rare as they are, such situations do exist, and the researchers who designed them have invented solutions in a highly constraining environment that can help anticipate the design of the future network.
Such is the case of the research project we selected as a starting point. It was conducted by a team of several partners based in four countries already equipped with Research Data Centers that are also DwB partners: France, the Netherlands, Germany and the United Kingdom. In this project, the researchers used administrative microdata produced by the various national social insurance offices, including individual data on wages and employment in each country. The project aims to provide new evidence on the effects of social security contributions through a micro-based cross-country analysis of large administrative panel datasets from France, Germany, the Netherlands and the UK [1]. Following an in-depth discussion with our team, the analysis provided by J. Grenet, one of the persons in charge of this project, made it possible to identify three major problems hindering this type of project. This analysis, which illustrates the researchers' perspective, was presented by Grenet at the 1st European Data Access Forum.

The first problem pertains to information: there is little information on existing data; information at the variable level is rarely available, since the data are confidential; metadata are poorly documented; and there is a language barrier, with translations either non-existent or very partial.

The second major issue pertains to the accreditation process required to obtain permission to access and process the data. Whenever a project involves several teams and several national databases stored in various facilities and countries, multiple accreditations are needed, with procedures and forms specific to each country and facility, which have to be signed by all members. This is time consuming and incompatible with the schedule and financial constraints of research conducted under funding agencies' calls.

Once these obstacles are overcome, access to the data itself still differs across facilities and countries. The researcher has to deal with different access systems and different standards for output checking and anonymization, not to mention access fees that vary in amount and are impossible to predict at the start of the project. There are also problems arising from incompatible software and, ultimately, long delays before exploitable results are obtained.

Lastly, whenever research projects involve more than one country, remote access across borders may be forbidden, sometimes forcing non-residents to go on site. An even more serious issue may be the impossibility of merging databases from various national sources (this would imply transferring them from one country to another) in order to run a single analysis, as opposed to several distinct cross tabs yielding different statistical results.

Once the survey was completed, the organizational structure of the team in charge of this project (several teams based in various countries), its working methods (the need to work together at certain points and to compare the data), the comparative nature of the project, the type of data used (administrative files) and the obstacles met ultimately appeared typical of the projects that a network of secure data centers would make possible.

[1] GRENET J. (2012), «Crossing Obstacles for a European Research Project: From a Business Case to an Ideal World», DwB First European Data Access Forum, Luxembourg.

2. The Online Survey: A Quantitative Approach

The online survey, conducted on a sample of researchers working in several secure data centers, was based on this preliminary study, which made it possible to better identify the major points to explore further for each phase of a research project requiring access to confidential data. Eight phases had previously been described for such projects: information, accreditation, data, access, support, output checking, feedback and project closure; the business case analysis largely confirmed them. Although the business case highlighted issues related to the information and accreditation phases, these questions were being explored in detail in other parts of the project (WP3, WP7 and WP8), so we decided to focus the questionnaire on four phases: access, data (processing), output checking, and support and surveillance. Interestingly, although these issues were not at the core of the online survey, the researchers here again spontaneously mentioned the importance of information and accreditation. The questionnaire was designed to explore further the possible impact of the various technical modes and security constraints on researchers' data processing, as they appear in the description of several Research Data Centers (see deliverable 4.1). For each step, based on the business case analysis, the questionnaire tried to identify the more specific difficulties arising with transnational access; in addition, a few questions targeted transversal issues common to all phases, such as language. Detailed results of this investigation and the underlying assumptions are presented further on. A final part of the questionnaire also allowed researchers to address other points freely.

Assuming that, in a European secure data network, researchers should be able to process all foreign data without having to travel abroad, we selected a sample of researchers who had all experienced remote access, whether from their desk, from a dedicated room within their university, or from a specific center outside their university but within reasonable distance. In line with the overall project, we selected researchers who had worked with remote access, which lets them see the actual data and work freely until the final output; however, a few researchers had also worked in remote execution mode and were able to make some comparisons. The first part of the questionnaire checked the researcher's experience of secure access modes and identified his/her institutional affiliation and context: research environment, field of research, nature of the research project (in particular the need to access data stored in different RDCs or countries), and individual or collective working method, all characteristics to be set against the intensity of the problems experienced. The questionnaire, with 57 questions (Annex 1), was sent to researchers by 5 European RDCs, partners of the DwB project, offering remote access solutions in France (CASD), Germany (IAB), the Netherlands (CBS) and the United Kingdom (SDS, ONS). The centers sent the questionnaire's weblink to their users, either via their general mailing list or to a selection of researchers. Researchers completed the questionnaire anonymously and submitted it directly online to the survey design team. 90 researchers submitted a questionnaire, of which 65 were fully completed and usable.
Among the 65 respondents, 40% were contacted by the Centre d'Accès Sécurisé Distant aux Données (CASD) in France, 22% by the Research Data Centre of the German Federal Employment Agency at the Institute for Employment Research (IAB), 26% by a UK system (20% by the Virtual Microdata Laboratory, although not all questions were usable for this centre, and 6% by the Secure Data Service) and 12% by the Central Bureau of Statistics (CBS) in the Netherlands. Obviously, this sample, deliberately focused on a specific population, cannot be viewed as representative; however, it provides converging pieces of information suited to the aim of identifying, on the basis of researchers' actual experience (mostly limited to national access), the major points to consider in designing a secure European data network.

3. Individual and Collective Interviews: The Paris & The Hague Workshops and The Supplementary Questionnaire

To explore more prospective aspects of research projects that are based on multiple national data sources and conducted by teams located at various access points who need to work together across borders, we conducted a set of focused individual and group interviews. These explored further certain points insufficiently covered by the online survey: researchers' organization and working methods, and the division of labor within teams (who processes the data? who needs to analyze the output during the data processing phase without working on raw data? who is only involved in writing the report?). Few interviewees were involved in transnational projects involving several teams, which is no surprise given the problems currently experienced in carrying out this type of project. This was also the case for the researchers who took part in the group interviews; however, the group discussions were a great opportunity for researchers to describe, on the basis of their current practice, the essential needs that should be taken into account to support teamwork in a secure network providing access to databases from multiple national sources. The group discussions also revealed some problems teams experience when working in a less complex situation, i.e. within a national framework, suggesting that a network structure designed for transnational access might also provide solutions at the national level.

At two workshops, organized with the assistance of Kamel Gadouche (CASD) and Leo Engberts (CBS), two groups of ten researchers were formed: the first in Paris at the Centre d'accès sécurisé distant (CASD) attached to GENES, the second at the Research Data Center attached to CBS, the Dutch statistical institute. In both cases, workshop attendees were users of the data facility. After describing their research project and the framework in which it was carried out (individual or collective and, in the latter case, each partner's role), the researchers were asked to comment on the overall project and on the preliminary results of the online survey, and to debate the broad outline of the design of a secure European network (Eu-RAN). Following these two workshops, a supplementary questionnaire focused on the typology of projects and teams was sent to the researchers, enabling them to revise their stated needs and add comments. In a few cases, individual interviews were carried out by phone with researchers who had been unable to attend a workshop, or to explore further some points of major interest.
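The response figures in section 2 of this part are given only as rounded percentages of the 65 usable questionnaires. As an illustrative cross-check that is not part of the original analysis, the following Python sketch converts those shares into approximate respondent counts per contacting centre and computes the completion rate (65 usable out of 90 submitted). It assumes each published percentage was rounded to the nearest whole number, so the counts are estimates; the names `shares`, `implied_counts` and `round_trip_ok` are hypothetical.

```python
# Illustrative cross-check of the Part 1 response figures.
# Assumption: each published share is rounded to the nearest whole percent,
# so the counts derived here are estimates, not figures from the report.

TOTAL_SUBMITTED = 90  # questionnaires submitted
TOTAL_USABLE = 65     # fully completed, usable questionnaires

# Share (%) of usable respondents by the centre that contacted them.
shares = {
    "CASD (France)": 40,
    "IAB (Germany)": 22,
    "VML (UK)": 20,
    "SDS (UK)": 6,
    "CBS (Netherlands)": 12,
}


def implied_counts(shares_pct, total):
    """Convert percentage shares of `total` into whole-respondent estimates."""
    return {name: round(total * pct / 100) for name, pct in shares_pct.items()}


def round_trip_ok(counts, shares_pct, total):
    """Check that each estimated count reproduces its published rounded share."""
    return all(
        round(100 * counts[name] / total) == pct
        for name, pct in shares_pct.items()
    )


counts = implied_counts(shares, TOTAL_USABLE)
completion_rate = TOTAL_USABLE / TOTAL_SUBMITTED

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: about {n} respondents")
    print("Counts sum to", sum(counts.values()))
    print("Consistent with published shares:", round_trip_ok(counts, shares, TOTAL_USABLE))
    print(f"Completion rate: {completion_rate:.0%}")
```

Under these assumptions the estimates come to about 26 (CASD), 14 (IAB), 13 (VML), 4 (SDS) and 8 (CBS) respondents, which sum to 65; the UK subtotal of 17 reproduces the reported 26%, and the completion rate is about 72%. The round-trip check guards against an estimate that would not round back to the published share.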

PART 2 - SOME CHARACTERISTICS OF ACCESS CENTERS, RESEARCHERS, TEAMS AND RESEARCH PROJECTS

Let us first present some information on researchers, teams and research projects, as revealed by this qualitative and quantitative investigation. There is obviously some link between the problems and needs expressed by interviewees and their personal characteristics, which can differ widely from one researcher or project to another. As already mentioned, few interviewees were currently involved in complex multinational projects, owing to the extreme difficulty of carrying out projects of that type. Nevertheless, we are dealing with a population with some experience of secure access to confidential data, mostly from a single facility but sometimes from other centers as well; we may reasonably assume that this population is quite similar to the population of future Eu-RAN users. The results below are drawn from the online survey. Workshop attendees are not strictly a subsample of the online survey sample: the researchers selected for the workshops were not necessarily among the online survey respondents. However, they do not form a very different group, since they were drawn from the pool of users of the two workshop facilities, which were also among those that provided the online survey sample. The graphs and tables below report on the 65 respondents to the online survey, but we can assume that the picture would be broadly similar for the 20 additional researchers who took part in the group interviews.

1. The Surveyed Research Data Centers

Let us first recap the characteristics of the various Research Data Centers whose users were sampled for the questionnaire.
All these centers provide remote access, but their equipment and procedures vary (see deliverable 4.1 on the state of the art regarding secure remote access centers in Europe); certain issues mentioned by researchers can be directly linked to these specificities, and their importance may be weighted differently from one center to another. As mentioned above, 5 RDCs from 4 European countries, all partners of the DwB project, took part in the survey: IAB for Germany, CASD attached to GENES for France, CBS for the Netherlands and, in the United Kingdom, two centers: VML, affiliated with ONS, and SDS, affiliated with UKDA. All of them provide remote access, but with notable differences regarding the earlier phases (accreditation procedure, available information), the access system, data processing possibilities and constraints on points of access. Some are well established, such as VML and especially CBS, where many researchers have long had the opportunity to work; others, like CASD and SDS, are more recent and were still being developed at the time of the survey. Some issues mentioned by researchers during the survey, e.g. authentication problems, can be related to this context and may have been solved since then. The institutional context of these centers is also of some relevance. Those attached to a National Statistical Institute (NSI), such as VML for ONS in the United Kingdom and CBS in the Netherlands, provide access to a wide range of datasets held by the NSI. The scope may be narrower when the data center is a service of a ministry, as is the case for IAB with its longitudinal employer-employee databases. The situation is less predictable for centers under contract with data producers to provide secure access: such is the case for CASD in France, under contract with INSEE and increasingly with other official statistics producers (in particular the statistical services of various ministries), and for SDS, attached to the UK Data Archive (UKDA), which currently provides access only to several Secure Use Files from ONS. The accreditation procedures also vary. In the case of IAB, for example, the accreditation procedure is handled by the access data center itself and generally takes a short time; conversely, the accreditation procedure may depend on a specific external committee, as is the case for VML, SDS and CASD. This implies that the researcher, while submitting an accreditation request to the relevant committee, must simultaneously contact the access data center and the data producer, which must assess the project's feasibility and approve the accreditation application. Moreover, when the screening of accreditation requests is not continuous but performed on a quarterly basis, as is the case for the Committee of Statistical Confidentiality in France, the delay increases; it is therefore no wonder that CASD users are likely to mention this problem more frequently than others. Equipment also matters: if the facility does not use dedicated, integrated equipment, as is the case for CBS, the researcher is more likely to experience problems installing software on his/her computer, requiring third-party help; on the other hand, whenever dedicated equipment is provided, the researcher is deprived of his/her usual working environment and has to become familiar with the new equipment. Constraints vary widely from one facility to another, in particular regarding the places that can serve as points of remote access.
In the case of CBS and CASD (the latter using dedicated equipment, the SdBox, sent by post), the researcher is allowed to access data from his/her university office, even when affiliated with an institution in another European country. For all the other centers the constraints are heavier: the researcher has to travel to a center accredited at national level, which also means that researchers from other European countries currently have to travel to the United Kingdom or to Germany to use remote access. Note that, as a result of the work conducted within DwB (including installation of an IAB server at SDS and signature of a contract with the University of Essex), SDS will in the future be accredited as a point of access in the UK for IAB data, thus saving researchers the trip to Germany. Researchers using the facilities with the strongest constraints are, unsurprisingly, the ones who tend to consider travel a hassle. Finally, researchers who have had comparative experience of VML and, more recently, SDS in the United Kingdom spontaneously report that they greatly appreciate systems that allow them to actually see the data.

2. Researchers' Profile

The breakdown of the 65 respondents by the country of their affiliated institution (see Annex 2, graph 1) is fairly similar to their distribution by access center: in most cases (9 out of 10), respondents use an access center located in the country of their academic institution. Few of them have any experience of other secure data centers, and fewer still of transnational access, since in many cases this would precisely involve travelling abroad.

In most cases (83%), users were affiliated with academic and public research centers (see Annex 2, table 1). This is likely to ease the implementation of a secure European network, which will run much better once the accreditation authorities have agreed on converging criteria in this matter. It is clear, however, that the specific organization of research in each country cannot be discounted: some countries leave more room for private institutions (11% of the researchers in our sample are affiliated with private institutions).

As to the definition of who is, or is not, a researcher, there is some debate. This was shown by an experiment conducted during the 1st European Data Access Forum, in which research projects and researchers' profiles were submitted to a group of participants. The participants had to declare whether, in their opinion, these profiles pertained to research activity or not.

The questionnaire collected information on the researchers' institutional environment, profile and experience. These characteristics may influence their opinion of the current framework, and they are also relevant to the design of a secure European network able to embrace a variety of situations.

The institutional research environment is particularly relevant because institutions differ in their capacity and willingness to provide two things: dedicated spaces for access points, and financial support for access fees. Such fees are usually charged to end users when a share of the infrastructure's investment and overhead costs is passed on to them (as is the case for VML, CBS and CASD). We may thus assume that a lone researcher at a university with little demand for such services faces more difficulties than researchers attached to large research centers with high demand for confidential data. The institutional environment may also affect how far new users can benefit from their colleagues' experience.
Two researchers out of three are economists, and one in four is a sociologist, demographer or geographer. The predominance of economists is not surprising: economists are generally more likely than other social scientists to use large individual databases. Moreover, econometric tools increasingly require highly detailed data, which are accessible only via secure data systems.

Other disciplines are nonetheless represented, and their importance may be expected to grow. Some answers reveal that their needs are somewhat specific, though these needs are difficult to generalize. Geographers, for instance, need highly disaggregated spatial data, which pose specific problems for the anonymization criteria applied to their outputs.

The respondents appear fairly experienced in secure access: in most cases they conduct at least two research projects using such facilities. They are also familiar with the datasets and may have previously worked on Scientific Use Files. This obviously makes working in secure mode easier, all the more so as documentation is often incomplete. The datasets at stake may be business data as well as household or individual data. The proportion of business data is somewhat larger because such data are considered impossible to anonymize and can therefore only be accessed through secure facilities.

Some research projects use several different data files. However, these datasets were mostly stored in a single access center: only 4 researchers declared they used two different access centers. Of these four, two used the twin UK centers, VML and SDS, probably because the recent creation of SDS allows more flexible access to certain ONS data.

Even though they work only on data stored in a single secure data access center, the majority of respondents declare that they work on their project with other researchers (see Annex 2, graph 2). These colleagues are often affiliated with the same research institution. Collective work under secure access was of special interest to us because it helps assess the specific problems of multinational research projects involving one or several teams. It turns out to be relevant even for national projects. It reveals the problems that research projects encounter under the current framework. It also allows us to extend our conclusions to more complex situations: projects with multiple partners and/or several teams, based on several data files stored in different places.

The central question concerning teamwork was how the work was organized and how tasks were distributed. Only half of the respondents working in a team declared that all team members had access to the data and could process them directly. Finally, 82% of the respondents, whether or not they were involved in collective work, declared that they had good control over the organization of their work. When this is not the case, it is mostly due to spatial problems and travel constraints, because the access point is far from their regular working environment (their study or office in their institution).

PART 3 - RESULTS

As mentioned above, the questionnaire was structured along the steps we had previously identified for a research project involving secure access to confidential data. There are eight steps: information, accreditation, access, data processing, support, output checking, feedback and project closure. Of these, we retained the phases directly related to access systems: access, data processing, support and output checking. The answers can be sorted according to the degree of difficulty they reveal. The following table gives an overview of this classification for selected issues in each phase.

Access NO PROBLEM SOFT NEEDS STRONG NEEDS Training Authentication Location Equipment Hardware/Software Merging datasets X Best success of authentication More software available Best support to install No travel Its own work environment Single point of access would offer new research opportunities Working with data Output checking Organization of the work More flexibility Software More software Homogeneous practices Going back to the data X Storage X Delay X Could be shorter Documentation Improve documentation and support Delay Restrictions Additional formats Support and surveillance General opinion More formats Shorter Be able to discuss outputs with other researchers Human and reactive support Reduce waiting time Anticipate potential mobility with a more flexible system

Some points raise no major problems, or only problems that can be solved without compromising the delicate but necessary balance between researchers' needs and the security requirements specific to confidential data. Such problems can be solved either by adopting better practices in some facilities, to everyone's satisfaction, or through modest investments that do not jeopardize the whole operation. Other issues raised by the researchers are harder to address in the current context and call for different compromises and new solutions that still meet security requirements. The construction of a secure European network must obviously take both minor and major issues into account. However, we will see that the needs most difficult to meet are also the thorniest within a transnational network.

Before examining these points, let us first consider two results worthy of attention. First, the questionnaire focused on the phases directly involving the secure systems. Yet on every occasion (the preliminary study or business case, the online survey and the group interviews), the researchers repeatedly mentioned difficulties encountered before these phases: first with access to information, and second with accreditation. This is something of a paradox, since the technical aspects of secure access are well accepted, even when they involve specific authentication constraints, as we will see below.

A specific metadata concern for Secure Use Files is the need for details at the variable level. When both Scientific Use Files and Secure Use Files are available, it is sometimes hard for researchers to identify quickly which file they need and how the files differ.
The lengthy accreditation and access procedure is all the harder to accept when the researcher lacks precise documentation, or when repeated interactions with the producer are needed to obtain it. The problem is less acute for older centers (CBS) or more highly specialized ones (IAB). It becomes harder when researchers need data (often administrative) that are not managed by the national statistical institute. Examples are social security files or job-search files held by government agencies that have invested less in metadata.

Only a few respondents had some experience of transnational access, but the difficulty is obviously greater for researchers from other countries. Not only are they less familiar with the datasets, but they also face a language barrier when metadata are unavailable in English. Part of the difficulty experienced within WP9 and WP10 in obtaining research proposals was probably due to this information deficit and to language issues for non-resident researchers.

Researchers also spontaneously mentioned other accreditation issues:
- difficulty obtaining information on the accreditation process;
- cumbersome procedures whenever the team includes many researchers;
- lengthy delays in obtaining accreditation;
- application forms going back and forth between the data producer, who must give authorization, and the organization in charge of accreditation.

A frequently mentioned problem is the mismatch between the timing of calls for proposals and the timing of the accreditation process. The seriousness of this issue depends on the complexity of the procedure. It is likely to be quite serious in a transnational network, as evidenced by the obstacles met within DwB Work Packages 9 and 10.

Conversely, several points concerning access (enrollment, authentication) and data processing raise no significant remarks. In three cases out of four, researchers declare they are satisfied with the data processing time. One in four is unhappy because, in his/her opinion, the system runs too slowly, which is likely to be fixable. A large majority (61/65) consider the storage capacity appropriate; only researchers in charge of very large projects express some dissatisfaction on that issue. Almost all researchers were able to resume data processing after running their programs a first time. This matters greatly for going further into the details or for meeting referees' demands before publication.

Secure access solutions had sometimes been a source of concern in the research community. Yet we observe that this working method was easily accepted, especially in the case of remote access (as opposed to remote execution). We may therefore assume that a secure European network will quickly attract a growing pool of users, as the national secure facilities did. Let us now consider in more detail, first, the issues that raise difficulties without being major obstacles and, second, the issues that are harder to fix, typically in a transnational network.

1. "Soft Needs": What Can Easily Be Done to Improve the Current Way of Working

For the initial access phase, the questionnaire included questions about enrollment, authentication methods, places of access and computer equipment. A very large share of researchers attended training sessions on the following topics:
- the technical use of the facility (47);
- legal aspects (44);
- anonymization and output procedures (41);
- data and metadata (33).
Some differences between the centers emerge: some seem to emphasize technical aspects, whereas others insist on legal or anonymization issues. In general, researchers appreciate these sessions, which seem to meet their expectations. Once they have attended the enrollment meeting, researchers go through an authentication process in order to access the data. This process relies on various techniques (biometrics, login/password, smartcards). Most of the time a single method is required (51 cases out of 65). The most widespread is the login/password combination, followed by biometric fingerprint (see graph). When two methods are combined, the biometric print is supplemented by the smartcard.

Graph: Which authentication methods were required? (Biometric; Login, password; Smartcard; Other)

The good news is that a large majority of researchers seem fairly satisfied with the available authentication methods (58/65) and with the frequency of authentication (52/65). Only a minority consider authentication requests too frequent and complain that they may slow down data processing. The bad news is that authentication can sometimes be somewhat unpredictable. More than half of the survey respondents and workshop participants experienced problems with passwords or biometric devices, such as an oversensitive fingerprint reader, a problem concentrated in certain facilities. However, this problem appears to be on its way to being fixed, since researchers mentioned that problems they experienced at first were solved later on. Lastly, having to use a different card for each project seems to be a hassle. Authentication is a serious issue because it takes place at the beginning of the process, but the problems mentioned seem fairly easy to fix.

Once researchers are allowed to access the data, two major "soft needs" appear in the next phase (data processing): installing computer equipment and obtaining additional software. Some researchers may have to install a device that gives them access to the data, and/or install data processing software. Depending on the facility's access system, this may mean installing dedicated equipment such as the SD Box, modifying their computer, or having a technician do it for them. Out of 65 respondents, 24 had to install a hardware device or software. In one half of cases, this task was performed by the researcher or by a member of his/her institution's IT staff (9 and 8 answers respectively); in the other cases, an external operator performed it.
Most of the difficulties mentioned here relate to technical issues arising from the facilities' security requirements. They also relate to communication problems between the secure data center staff and members of the academic or research institution. Getting help from IT experts, particularly from the university's IT team, may sometimes be a problem, and the local IT team may also have security concerns about external installations.

For a European remote access network (Eu-RAN), this issue deserves the greatest attention, since all access systems can be expected to differ. The adjustment process should be user-friendly and the computer installations as homogeneous as possible. Providing swift and competent assistance to researchers will be crucial to success. This is indeed a very important point. We nevertheless categorize it as a "soft need", since in general the researchers who needed assistance ultimately obtained satisfaction.

Software turns out to be a major topic of discussion, again with strong variations from one facility to another: software supply varies, and facilities handle researchers' expectations in different ways. Thirteen researchers needed software that was unavailable. On several occasions, in the questionnaire as well as during the workshops and interviews, researchers mentioned the need to obtain or install new software for data analysis, or tools for processing the data other than those at hand. Cartography software, SPSS and R are specifically in demand. Sometimes researchers only need to add a few lines of code or update certain modules, and this generally causes technical problems. Likewise, the lack of an Internet connection makes it impossible to download software packages directly. Researchers must then ask the facility staff to do it, which often causes installation delays of varying length.

As to the support and surveillance phases, a majority of researchers do not know whether they are under scrutiny during the data processing phase. Only 11 of them mention this point, concerning the control of the methodology used or the organization of work at the beginning of the analysis. They generally accept the constraints of secure data access willingly. However, the output checking phase turns out to be a source of difficulty, as do the lack of trust in researchers and the absence of guarantees for their property rights. These issues are addressed in the strong needs part.

2. Strong Needs and Researchers' Requirements Regarding a Eu-RAN

Some issues either go unmentioned by the researchers or raise only minor problems that are easy to fix without major changes in the way secure data access centers are run. Other issues, however, turn out to be major obstacles. They are likely to delay research activities considerably, to limit their scope or, at worst, to cause the project to fall apart. These difficulties are all the greater from a transnational perspective. We may group them under seven major headings corresponding to strong requirements expressed by the researchers. In a secure European network of centers, these requirements must be treated with great care. Some of these points call for new solutions adapted to researchers' needs that still respect privacy and security constraints.

Reduce Waiting Time throughout the Process

Researchers mention issues of timeframes and delays over and over, in all modes of investigation, qualitative as well as quantitative. Simplifying the data access procedure and reducing delays are two major requirements. The results suggest that the most disruptive delays occur at two points in the data access process.

The first point comes early, with the accreditation procedure. As already mentioned, this issue was outside the scope of the survey, but the researchers spontaneously raised it on several occasions. They emphasize the discrepancy between two timings. "Research timing" is marked by specific constraints, such as dissertations or research projects subject to the deadlines of calls for proposals. Bureaucratic timing involves delays before accreditation that are too long and too unpredictable, leaving too little time for data processing. Some projects may even fail because of such delays. This is typically the case when the researcher has to obtain accreditations for data involving several countries and/or several teams.

At a later stage, researchers complain that the checking of their outputs often takes too long. One third of all researchers consider the delay in obtaining outputs too long. Most researchers (58%) reported a delay of several days, and one in two of them considers this unacceptable.

Graph: How long did each output check take, on average? (A few minutes; A few hours; A few days; Other)

Some comments show that these delays are neither understood nor accepted: "such delay is not convenient for quality research", "any waiting just to learn that the program crashed is very frustrating", "not acceptable for a paid service". Some challenge the competence of the staff in charge of output checking: "Remote Access services are generally not experienced enough to judge that output is correct."

Other delays add to these. Researchers need time to find relevant information on the available data and procedures. They lose time on trips when no access point is available at their home institution. Once accreditation has been granted, accessing the data takes time, and so does obtaining the necessary support. In short, the researchers challenge the whole chain of operations. Research projects most often involve several teams, each with its own deadlines, so coordination becomes extremely tricky, which in turn delays the whole operation.

Access Points: No Travel and One System

Researchers express a second major requirement: they want to access and process data from their home institution, without having to travel elsewhere.
Among the online survey respondents, 6 researchers out of 10 could reach the data from their home institution, either from their own study or from a dedicated room. 3 out of 10 could access data from their own study, which researchers perceive as the least frustrating solution (see graph below).

Graph: Where did you access the data? (In your own institution, in your own office; In your own institution, in a specific room; In another research institution; In a data center of an NSI; Other)

In fact, these results reflect the breakdown of respondents across the secure centers they used. At CASD and CBS, researchers may access the data from their home institution. The other facilities require researchers to visit an accredited center, which may be in a city other than the researcher's own. Half of the respondents (53%) who could not work from their study viewed this as a problem.

The issue at stake is time but also money, i.e., the cost of transportation and accommodation if the accredited access point is far from the researcher's home city. It also creates organizational problems, implying a tighter schedule for researchers who teach or are involved in other activities. The problems listed include:
- the need to book seats in advance;
- being unable to work when the center is closed, which is serious when the user has had to find accommodation on the spot;
- not having one's documentation at hand;
- not being able to discuss a problem immediately with a colleague.

All these elements tend to be detrimental to teamwork, interaction and, ultimately, efficiency. What is true at national level also applies to transnational access. By using a secure European network to process confidential data, researchers will save travel expenses, time and hassle. Whether they will be able to process data from their home institution, however, remains an open question.
This point was crucial in the discussions among national statistical institutes about the recent European regulation on researchers' access to the European microdata held by Eurostat. This regulation should allow the creation (currently planned) of secure remote access to the Secure Use Files, saving users a trip to the on-site access facility in Luxembourg, which is in fact little used. It turned out that access would only be allowed from accredited facilities, initially limited to those of the NSIs, although other institutions might later be accredited under very strict conditions. Few universities are likely to make such an investment for a limited number of potential users. Many researchers would then be forced to travel within their country to accredited access points that may be far from their usual workplace. A similar situation may arise for a Eu-RAN providing access to national microdata, which would raise serious problems for researchers.

Anticipate a Potential Mobility between Institutions or Countries

This point is partly related to the access-point issue. Three types of mobility may arise:
a) occasional or short-term mobility involving a trip to another location to work with colleagues. In this situation, the researcher cannot immediately resume data processing to modify or test new models or to refine the analysis;
b) short-term mobility when the researcher makes a trip of a few days or weeks. Whether access takes place from the researcher's office or from a dedicated location, the access point is not mobile. Researchers expect a more flexible solution ("I wish the box could be usable from several locations"). For such cases, some suggest a more restricted form of access such as remote execution;
c) long-term mobility to another national or foreign institution. If it occurs during the project, the accreditation and access procedures have to start all over again, typically when the accreditation is tied to the institution, as is generally the case.

Research Environment

A fourth strong requirement of the researchers concerns their research environment. Fewer than one survey respondent in ten was able to use his/her own computer to access the data. Most used dedicated equipment, whatever the center and whatever the location. For those who work on dedicated equipment in their own study, the hardship is reduced, since their own computer is in the same room.
Not working on their own computer is often a problem (42% of the researchers mentioned it), for two reasons. The first is that researchers cannot reach their personal files, documentation and software or, more generally, their own work environment. Researchers described this as a significant constraint, particularly during the workshop discussions. The second concerns transferring programs and retrieving results or programs. Several centers authorize program transfer after checking, but the center handles the transfer itself, which sometimes causes long delays.

The research environment issue can also be considered from another angle: communication with other researchers. We have already mentioned the handicap of being unable to work in the same place as the research team and other colleagues. Next comes the lack of web access, frequently mentioned as a limitation that causes further problems. Researchers cannot check references or compare results, nor can they ask questions or get support on work in progress. Researchers willingly accept protected access and its constraints, but they would consider a more open research environment an improvement. One researcher's comment reflects the general opinion well: "it would be possible to do much more in terms of research with accessibility from my desktop computer in the office". How far a more open research environment can be reconciled with highly secure access to data remains an open question for a Eu-RAN.