Getting access to NPD data for EEF evaluations: Advice for evaluators

This document outlines the EEF's plans for a data archive and discusses the implications this will have for evaluators. It includes some guidance on applying to the NPD and asking for consent from schools.

CONTENTS

BACKGROUND
THE EEF DATA ARCHIVE
APPLYING FOR THE NPD
CONSENT

BACKGROUND

The aim of the EEF is to build the evidence for what works in raising attainment. Ultimately, this means demonstrating the impact of its projects on children's attainment at Key Stage 2 at age 11 and on GCSE results at age 16.

All evaluations will require data on the background characteristics of pupils and their attainment. This data can either be collected directly from schools and pupils, or be accessed through the National Pupil Database (NPD). Projects that are being conducted immediately before Key Stage tests will use these as their outcomes. Others might use standardised tests to assess the immediate impact on attainment and then link to the NPD to look at long-term impact.

Ultimately, the EEF wants to track all of its pupils longitudinally using the NPD and link with data collected directly from its evaluations. All this data will then be stored in an EEF data archive and eventually the data will be made publicly available in an anonymised form through the UK Data Archive. This will result in a rich and powerful dataset both for the EEF and for the education research community.

THE EEF DATA ARCHIVE

The EEF has commissioned FFT Education, in partnership with the Institute of Education (IoE), to manage its data archive.
All evaluators will be required to submit their data to FFT.

Purpose

The EEF will use the dataset to:

- Track the impact of its projects longitudinally;
- Compare and group the impacts of different projects to draw lessons about which approaches to teaching and learning are most effective generally and for different groups (such as FSM);
- Answer methodological questions about research design and outcome measurement;
- Look at the cumulative impact of projects on schools and pupils; and
- Understand its target group better.

Once made available through the UK Data Archive, the education research community will be able to use the dataset to:

- Verify the results of EEF evaluations;
- Conduct secondary analysis, such as on particular subgroups or types of intervention; and
- Link to other datasets for research purposes.

The IoE plans to work closely with FFT to ensure that EEF datasets become widely known and used. For example, they plan to create teaching resources that allow EEF datasets to be used to teach a variety of statistical methods to undergraduates and graduates. They will also promote the datasets through their networks to ensure that the maximum value and learning is created from the data.

Size and extent

In its first two years the EEF funded 72 projects, worth £37.4 million, working in 2,300 schools and with 500,000 pupils. There are some large projects, and an additional £10 million in funding for literacy catch-up projects at the transition, in the EEF's first year. Therefore, conservatively assuming approximately 100,000 pupils in 700 schools reached per year going forward, we envisage the data warehouse holding data on over 1 million pupils covering 8,000 schools or more within the EEF's first 10 years. This would be updated annually as results come out.

Data submission requirements

Every evaluation commissioned by the EEF will generate data as part of both the impact and process evaluation. All evaluators will be required to submit the following to FFT for input into the EEF data archive within a month of their report being published:

- Standardised dataset for import into the longitudinal database
- Anonymised datasets to be submitted to the UK Data Archive
- Do files or syntax files
- Project documentation (final report, technical appendices, user guide or data dictionary)

The diagram on the next page goes into more detail about these four submissions. Further information on submissions is available on FFT's SharePoint site. This can be accessed once you have been sent an invitation from the project site, which can be requested from FFT at any time. Evaluators can also contact Andrew Bibby at FFT if they have any specific questions: andrew.bibby@fft.org.uk.
The standardised data set

Evaluators are expected to submit a set of minimum fields. These fields include:

- Pupil Matching Reference (retrieved from NPD using UPN and other identifiers)
- FSM, Pupil Premium, EAL
- Any standardised attainment test data collected for the evaluation
- Any project information (project / treatment or control).

The most important piece of data for linking is the Pupil Matching Reference (PMR). More information on the standardised data submission can be found on the SharePoint site. This can be accessed once you have a login, which can be requested from FFT at any time.
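As a rough illustration of the minimum fields listed above, the following Python sketch checks that each record in a submission carries them before it is sent to FFT. The column names (`pmr`, `fsm`, `pupil_premium`, `eal`, `arm`), the allowed allocation labels and the helper functions are assumptions made for this example only; the actual field names and formats are those set out on FFT's SharePoint site.

```python
# Hypothetical column names; the real specification is on FFT's SharePoint site.
REQUIRED_FIELDS = ("pmr", "fsm", "pupil_premium", "eal", "arm")
ALLOWED_ARMS = {"treatment", "control"}  # assumed labels for project allocation


def check_record(record):
    """Return a list of problems found in one pupil record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat absent keys, None and blank strings as missing; 0 is a valid value.
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")
    arm = str(record.get("arm") or "").strip().lower()
    if arm and arm not in ALLOWED_ARMS:
        problems.append(f"unrecognised arm: {arm}")
    return problems


def check_dataset(records):
    """Map row index -> problems, for rows that have at least one problem."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            report[index] = problems
    return report
```

An empty report from `check_dataset` would suggest that every row has a PMR, the three background flags and an allocation label. It says nothing about whether the values themselves are correct.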
APPLYING FOR THE NPD

Most evaluations, where UPN is being used to link to data generated by the project, will require Tier 1 access for the pupils in their evaluations. This is because the variables UPN, FSM and Pupil Premium require Tier 1 access (see Table 1).

Many evaluations have not yet put in their request to the DfE for the NPD. For those evaluators about to make a request, please ensure you:

- Request Tier 1 access (for the appropriate subset of data);
- Ask for PMR (this will enable us to link to future NPD records); and
- Use the attached guidance provided on completing the NPD application process.

The NPD application process requires that you fill in:

- The NPD Application Pack
- NPD Data Security Self-Assessment Questionnaire

We have gone through these two documents and included suggested wording and guidance in comment boxes in the sections that are relevant to the EEF's requirements and plans for a data archive (you can find these on EEF's website: http://educationendowmentfoundation.org.uk/evaluation/evaluationresources/evaluation-resources/).

For those that have already made a request, if you have not asked for PMR, please ensure that you keep the data that is being used for the matching so that this can be done retrospectively if necessary.

Table 1: Tiers of NPD data

| Tier | Information type | Description |
|------|------------------|-------------|
| 1 | Individual pupil level data: Identifying and / or Identifiable and Highly Sensitive | Individual pupil level extracts that include identifying and highly sensitive information about pupils and their characteristics including items described as sensitive personal data within the UK Data Protection Act 1998. Examples of identifying items include UPN, Names, Address and Date of Birth. Examples of highly sensitive data items include Pupil Premium, Looked After Status, In Need Status, Full Ethnicity, Full Language and Primary and Secondary SEN Type, reasons for exclusions and absence. |
| 2 | Individual pupil level data: Identifiable and Sensitive | Individual pupil level extracts that include sensitive information about pupils and their characteristics including items described as sensitive personal data within the UK Data Protection Act 1998 which have been recoded to become less sensitive. Examples of sensitive data items include ethnic group major/minor, language group major/minor, SEN and eligibility for Free School Meals. |
| 3 | Aggregate school level data: Identifiable and Sensitive | Aggregated extracts of school level data from the Department's School Level Database which include items described as sensitive personal data within the Data Protection Act 1998 and could include small numbers and single counts. For example, there is 1 white boy eligible for Free School Meals in school x that did not achieve level 4 in English and maths at Key Stage 2. |
| 4 | Individual pupil level data: Identifiable | Individual pupil level extracts that do not contain information about pupils and their characteristics which is considered to be identifying or described as sensitive personal data within the Data Protection Act 1998. For example, the extracts may include information about pupil attainment, prior attainment, progression and pupil absences but do not include any trivially identifying data items like names and addresses, or any information about pupil characteristics other than gender. |

Matching

There are two options by which data collected from evaluations can be matched to the NPD:

- Evaluator matching: Evaluators use identifying information to match to records in the NPD themselves. They can then remove the identifying information and provide PMR alongside data generated by the project to the archive.
- DfE matching: Evaluators provide their own dataset with UPN and other identifying information alongside the project data, and the DfE conducts the matching on their behalf. You will need to request that they include PMR in the matched dataset.

The decision about whether the evaluator or the DfE does the matching is made by the DfE on a case-by-case basis. If the DfE insists on doing the matching for you, you will need to request that they return the matched data with PMR, not completely anonymised, so that we can track these pupils longitudinally in the future.

For either process, matching will need to be done using a combination of UPN and other identifiers such as date of birth. It is therefore essential for evaluation teams to collect this information from schools.

PMR should be automatically included in your matched dataset. However, if you have not been given PMR, please ensure that you keep the data that is being used for the matching so that this can be done retrospectively if necessary.
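The evaluator-matching route described above can be sketched in Python as follows. This is a minimal illustration, assuming that both the project data and the NPD extract are held as lists of dictionaries and that a match requires exact agreement on both UPN and date of birth. The field names (`upn`, `dob`, `pmr`) and the helper `match_to_npd` are hypothetical, and real matching may need further identifiers or rules agreed with the DfE.

```python
def match_to_npd(project_rows, npd_rows):
    """Link project records to NPD records on UPN plus date of birth.

    Returns (matched, unmatched). Matched rows carry the PMR and have the
    identifying fields removed; unmatched rows are returned untouched so the
    identifiers can be kept for retrospective matching.
    """
    # Index the NPD extract by the (UPN, date of birth) pair.
    npd_index = {(row["upn"], row["dob"]): row["pmr"] for row in npd_rows}

    matched, unmatched = [], []
    for row in project_rows:
        pmr = npd_index.get((row["upn"], row["dob"]))
        if pmr is None:
            unmatched.append(row)
            continue
        # Keep project data, drop identifiers, attach the PMR for the archive.
        linked = {k: v for k, v in row.items() if k not in ("upn", "dob")}
        linked["pmr"] = pmr
        matched.append(linked)
    return matched, unmatched
```

Returning unmatched rows with their identifiers intact reflects the advice above to retain the matching data whenever PMR has not been supplied.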
CONSENT

You will need to get advice from your research ethics committee on the level of consent required, depending upon the nature of interaction with schools and pupils. Here we provide some broad guidelines, but ultimately the responsibility for determining the appropriate consent lies with the evaluator.

There are four things that you will need to gain consent for:

- The intervention: Where the intervention is delivered within school hours, consent from the school is sufficient. Where the intervention is delivered outside school hours, you will need parent consent.
- The evaluation: Where the random allocation and testing is being conducted with whole classes, school consent is sufficient. Where children are being identified and tested separately from their peers, or random allocation is at an individual level, opt-out parent consent may be required.
- Data-linking: School-level consent is sufficient in most cases for evaluators to link attainment data collected and data on the project (eg, whether treatment or control) to the NPD. However, where you want to link to more sensitive or personal questionnaire data (eg, on family relationships or emotional symptoms), individual opt-out consent from parents should be sought.
- Archiving data: Data from EEF evaluations will be shared in an anonymised form through the UK Data Archive for research purposes. The UK Data Service has said that all consent needs to make this clear and that ideally consent should be at the individual level, although school-level consent does not necessarily preclude data being submitted.

You will need to provide evidence to the DfE that you have gone through a process to determine the appropriate level of consent and had this reviewed by an ethics committee. The DfE does not provide guidance on the level of consent that is required for linking to the NPD. However, EEF's understanding is that in the majority of cases school-level consent is sufficient, except where you want to link to sensitive data.
EEF and the UK Data Archive would prefer individual opt-out consent, so going forward we recommend that projects aim to collect this.

The evidence that will be required by FFT, who manage EEF's data archive, is reassurance from the evaluator (by ticking a box on the data submission form) that consent was sought. A copy of the template consent letter used needs to be included in the appendices to the evaluation report. We have provided some suggested wording for the consent letters below.

School consent

The following can be adapted and embedded as part of the memorandum of understanding that schools must sign in order to take part in the evaluation:

Pupils' test responses and any other pupil data will be treated with the strictest confidence. The responses will be collected online by [test deliverers] and accessed by [evaluator]. Named data will be matched with the National Pupil Database and shared with [delivery partner], [evaluator], EEF's data archive and the UK Data Archive for research purposes. No individual school or pupil will be identified in any report arising from the research.

Parent consent

The following can be adapted and embedded as part of the letter from the delivery partner and evaluator to parents:
If your child takes part, they will be randomly selected to experience the programme either [insert intervention terms] or [insert control terms]. The programme involves [insert details of involvement]. They will be asked to take a test before and after the programme.

Pupils' test responses and any other pupil data will be treated with the strictest confidence. The responses will be collected online by [test deliverers] and accessed by [evaluator]. Named data will be matched with the National Pupil Database and shared with [delivery partner], [evaluator], EEF's data archive, and the UK Data Archive for research purposes. We will not use your child's name or the name of the school in any report arising from the research.

We expect that your child will enjoy doing the tests and being part of the programme. Your child may withdraw at any time. If you prefer for your child NOT to take part, please inform their teacher. If you would like more information, please contact [delivery partner contact details].