Urban Big Data Centre
Data services: Guide for researchers
January 2016, Version 3.0
Authors: Mirjam Allik, Nick Bailey
Who we are

The Urban Big Data Centre (UBDC) is a national data service, funded by the Economic and Social Research Council to support researchers, data owners, policymakers, and everyday citizens in extracting useful information from urban-related data. We help others harness the potential of big data to develop solutions for environmentally sustainable, economically resilient, and socially just cities. The UBDC brings together the interdisciplinary expertise of urban social scientists and data scientists from the University of Glasgow and six partner universities (Edinburgh, Bristol, Cambridge, Reading, Sheffield and Illinois-Chicago) to address these challenges.

UBDC services for researchers

We offer four distinct services that give researchers access to a wide range of data for studying urban problems:

Data collections
Computing service
Data sourcing service
Controlled data service

In general, the UBDC aims to provide these services free of charge, but additional costs may arise for data extraction, cleaning, programming or linkage, especially when sourcing new data.

Data collections

The UBDC provides a web-based service that permits free access to open and safeguarded data. Researchers can browse or search the collection and download the open data they need. Safeguarded data can be accessed after agreeing to the terms of use.

Open data. There is a general policy in Government to make data freely available where possible. We hope that a wide range of organisations will be willing to share data with us on an open basis. We will also seek to produce a range of new open data products using modelling or estimation techniques.

Safeguarded data. Many organisations share data with the UBDC, or licence it to us, on the condition that we restrict its use. For example, they may limit use to academic purposes, retaining future commercial interests for themselves. The UBDC provides researcher access to these data under the conditions set by the data owners.
This means researchers have to agree to the terms of use, which commonly restrict use to non-commercial purposes. As safeguarded datasets can be large, researchers may need access to a more powerful computing environment. In such cases, researchers may request access to the UBDC computing service.
Computing service

The UBDC has a computing cluster that researchers can use when they need large volumes of storage and/or powerful processing. They may also use it to take advantage of software tools developed for big data analysis. Researchers can bring their own data to work on, or use data from the UBDC's collection. Access is via remote login to a virtual machine. Researchers need to complete an application form explaining the purpose of their research and why they need access to this service. Where necessary, they must also provide documentation on data or software licences. An independent Research Approvals Committee (RAC) decides whether to permit use of the UBDC computing cluster for the project.

Data sourcing service

In some cases, the UBDC may be able to source additional data at researchers' request. To access this service, the project for which the data are needed must be approved by the RAC. As data sourcing may incur additional costs, all requests for sourcing new data will also require prior approval within the context of the UBDC's overall Data Strategy.

Controlled data service

Personal data cannot generally be sold, shared or passed on without the permission of the individual concerned. However, controllers of such data may agree to give others access for purposes where there is a public interest case for doing so. Access will be limited to approved uses and users, and any risks of disclosure must be minimised. The UBDC does not acquire these data, but can help researchers negotiate access to a secure environment in which to work with them. To access controlled data, researchers need to go through the RAC, which decides whether to permit use of UBDC facilities for the project. The RAC will also assess the researchers' experience and competence for undertaking the project. Projects by postgraduate students will be considered, but supervisors will have to be listed on the project as lead researchers.
In addition, data controllers need to approve each project and may require additional applications. Before gaining access to data, researchers will also need to be accredited by completing approved researcher training, so that they are aware of their legal responsibilities. Controlled data are held within a highly secure computing environment. In this environment it is possible to closely monitor who works on the data and to ensure that no personal data leave the system. The UBDC has contracted with an established and highly respected provider, the electronic Data Research and Innovation Service (eDRIS), to provide this service. Datasets constructed for each project will be destroyed on completion of the research, and the UBDC will not become the permanent controller of these data.
Accessing our data services

The steps researchers need to go through to access the different kinds of data or services are summarised in Figure 1. For those wishing to access open data through the web portal, or safeguarded data held by the UBDC, the process is straightforward. The stages are more complex if researchers wish to use the UBDC cluster, ask us to source new data, or use controlled data. In these cases, the researcher will need to go through the approvals process. For more information about the detailed process for these three services, please see our Research Projects Strategy and Processes.

[Figure 1. The user journey, showing the steps for each service: data collection (open data and safeguarded data), computing service, data sourcing and controlled data services.]
Timescales

The timescales for gaining access to our services depend a great deal on what researchers want to access. For our data collection, access should be quick, subject to signing the terms of use in the case of safeguarded data. Timescales can be longer where UBDC staff need to negotiate licences for new uses or to source new data. For access to controlled data, it will always be necessary to negotiate a new data sharing agreement with the data owner or data controller. The data must also be transferred to our secure facility. We aim to minimise the time this takes by developing relationships with the major data owners, building up information on the data they hold, and agreeing standard protocols with them.

Applying for funding

While there are no charges for UBDC services, researchers may need to apply for funding. For example, funding may be needed to cover the costs of additional data acquisition or of staff time. Applications for funding can be made before or after applying for approval. We recommend that researchers secure approval from the Research Approvals Committee before applying for funding.

If you require more information about UBDC services, please contact ubdc@glasgow.ac.uk.