The Data Exchange Project
An update for SIF Conference, 19th November, Sheffield
Iain Bradley, DfE Data Exchange Lead
Andy Evason, Actica Consulting, DfE's SIAM Partner

The current landscape: a summary of problems
Process
1. Gather Data
2. Process and Store Data
3. Make Data Available
The Problem
1. Gather Data
- Bulk collections limit detail
- Significant front line compliance cost
- Changes reliant on MIS systems responding to COLLECT specifications
- Schools have to send the same data to several places
2. Process and Store Data
- Data stored in silos
- Data stored inconsistently (version control)
- Data processed with locally chosen software
- Data not stored at lowest level
- Responding to new policies or queries can be time consuming / inflexible
Taken together, accessing, combining and then using data is more difficult than it ought to be.
3. Make Data Available
- Several places to go for the same data. Multiple websites and passwords. Varying analytical and visualisation tools
- Parents, schools, DfE, inspectors, researchers... not all pointing to the same data for the same issue
- 3rd party access to data is a bespoke and labour intensive process
The Solution
1. DATA EXCHANGE
2. WAREHOUSE
3. PORTAL
- Data cleaning can happen outside school MIS
- School Performance Data Programme
- The Data Transformation Programme

Context: The vision for the end state
For richer, more accurate data to be available quickly in accessible and usable forms, in order to enable others to drive up the quality of education and services received by children.
Specifically within the context of data exchange:
- From bulk upload to regular movement with minimal manual intervention: a business process should trigger a movement
- Able to tell someone plugged into the exchange within minutes, but most typically within hours will do
- Data could be pushed on change, at defined times, or pulled
- Schools don't have to repackage data for different users, just be plugged into the exchange

Context: The vision for the end state
- We anticipate every school and LA that uses an MIS being plugged into the exchange via that MIS
- Appropriate role-based authorised access and security
- Data movements controlled via a central hub as part of a hub and spoke configuration (as opposed to hierarchical, distributed or centralised)
- SPDP's data warehouse and portal will be key consumers, and as such the exchange architecture should be closely integrated with them to maximise performance
- Able to handle a variety of formats for moving data around the sector: when data moves it may be in various formats, but only ISB format will be accepted by the Department
- Significant front line involvement in governance

Designing the Architecture: what do we know?
- 25,000 schools, 152 LAs. Nearly all have MIS, but not all use it to the full degree.
- Data Exchange will be "one size fits most", but we need a way to bring data into the data store for the tail of schools not using MIS.
- Given that DfE is already buying a warehouse and portal under the School Performance Data Programme, we should fully exploit the elements of those which can deliver part of the solution for Data Exchange.
- Hub and spoke considered the most efficient design

Data Exchange: What's out of scope?
- Data / organisational scope could potentially be massive, but the risk of never getting off the ground would be substantial.
- Initial scope will focus on individualised data sitting in school and LA MIS as end points. But by building a scalable solution using open standards, we will avoid a cul-de-sac in future.
Within that scope, a number of scenarios have been identified which may fall outside the scope of DTP, including:
- the transfer of information between systems within an organisation, for example to maintain common data in separate systems within the organisation in a consistent state on a sub-second timescale. Whilst not our focus, we of course don't want to unintentionally orphan any existing local movements whilst implementing the exchange.
- the transfer of information between schools working collaboratively, for example to move in-lesson attainment data captured in an interactive learning environment from one school to another during the lesson
- to alert the Local Authority children's services immediately when a learner that is being monitored does not arrive for school
Data exchange will support the sending of any package, but the SPDP project does not cover the extraction or loading of any data which is not in ISB format and not required for SPDP. So sending data that is within an organisation can be done locally using the data exchange but is out of scope of SPDP.

What we need
To integrate an exchange within the warehouse and portal solutions in hand we need:
a) School / LA MIS to be able to communicate with the Data Exchange Hub
b) A data exchange hub, with appropriate routing, control, audit and security
c) The hub to seamlessly integrate with the SPDP data warehouse to provide the storage area for all the data DfE receive
We do not need a data store, or a way to present data: these are to be delivered by SPDP.

Challenges / Risks
- Number of end points and variation in technical ability of schools
- Implementing the ISB Enterprise Data Architecture with MIS suppliers with whom we have no formal contractual relationship
- Ensuring integration with SPDP architecture, which itself has not been built yet
- Cultural shift for data providers, from annual data collections which are physically sent, to data flowing out of systems automatically
- Greater transparency of information than ever before at a local and national level
- Data cleaning / validation. Ensuring we better support front-line data entry by developing easily accessible rules across the end-to-end solution, and not throwing out the baby with the bathwater in terms of current cleaning and checking roles played by schools and LAs

Summary / Next Steps / Timings
[Timeline chart spanning 2013, 2014, 2015 and 2016+, with rows for SPDP and Data Exchange. Items shown on the chart:]
- Requirements Gathering & Technical Options Work
- Vision, Blueprint & end to end design
- DfE Ministerial approval of Full Business Case
- Cabinet Office approves Full Business Case
- Preferred supplier chosen, contract fine tuning
- Procurement Strategy
- Standards implementation decisions
- Outline Business Case
- Contract Signed
- Data Warehouse design, build and test
- Portal design, build and test
- Procurement Activity Start: Target Spring 2014
- Start Build: Likely to be phased based on SPDP readiness and standards maturity
- Phased DWH go live
- Phase one of exchange complete?
- Phase two of exchange complete?
- Phased Portal go live
- Data Warehouse and Portal operational
Developing The Blueprint: Latest Thinking Andy Evason, Actica Consulting

End to end solution overview
[Architecture diagram. Labels recoverable from the diagram:]
- Regions: Connected to Data Exchange; Not Connected to Data Exchange; Agnostic of Transfer Mechanism
- End Point (DE Connected): Data Entry End User, Application, Data Store, Interface, Data Reader End User, DE Message
- End Point (Not DE Connected): Data Entry End User, Application, Data Store, Interface, Data File, Manual Data Entry, Non-DE Interface
- Data Exchange Hub: Control Application, End Point Security, Message Routing / Queue, Audit Database
- Authentication Service
- Schools Performance Data Warehouse: Admin End User, Control Application (including ETL), RBAC, DE API, Master Data Management, Data Validation, ODS, Data Warehouse, Data Extracts, Web Portal, Web-based End User
- Other labels: CIOG Analysis, Legacy Data Store, Data Analysis Tools, Analytical End User

Key requirements for transfer mechanism
Automatically transfer information
- Guaranteed transfer of information between any two end points with no manual intervention (order of delivery not guaranteed)
- Addition / removal of end points with minimum effort
- Prioritisation / precedence to meet SLAs for different message types
Configure dataflows (control capability)
- Enable authorised users to configure pre-defined dataflow services
- Trigger (on change, scheduled, on request, others TBD)
- Destination end points (individual or multiple)
Validate and cleanse data
- Data quality is important for the end to end solution
- Transfer mechanism must validate against XSD (rely on end points and SPDP for data model specific validation rules)
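
The dataflow configuration and prioritisation requirements above can be illustrated with a minimal Python sketch. Every name in it (`Trigger`, `DataFlow`, `PriorityOutbox`, the message types and the priority numbers) is hypothetical and not taken from any Data Exchange specification. It only shows one way a pre-defined flow with a trigger and multiple destinations might be represented, and how a queue might release more urgent message types first.

```python
import heapq
import itertools
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    # Trigger types named on the slide; "others TBD" are not modelled.
    ON_CHANGE = "on change"
    SCHEDULED = "scheduled"
    ON_REQUEST = "on request"


@dataclass(frozen=True)
class DataFlow:
    """A pre-defined dataflow an authorised user could configure (hypothetical shape)."""
    name: str
    trigger: Trigger
    destinations: tuple  # one or more end point identifiers


class PriorityOutbox:
    """Queue that releases messages by priority (lower number = more urgent)."""

    def __init__(self, priorities):
        self._priorities = priorities      # message type -> priority (assumed values)
        self._heap = []
        self._counter = itertools.count()  # tie-breaker so payloads are never compared

    def put(self, message_type, payload):
        priority = self._priorities[message_type]
        heapq.heappush(self._heap, (priority, next(self._counter), message_type, payload))

    def get(self):
        _, _, message_type, payload = heapq.heappop(self._heap)
        return message_type, payload

    def __len__(self):
        return len(self._heap)


flow = DataFlow("session-attendance", Trigger.ON_CHANGE, ("SPDP", "LA-001"))
outbox = PriorityOutbox({"attendance": 0, "census": 5})
outbox.put("census", {"pupils": 210})
outbox.put("attendance", {"session": "AM", "absent": 3})
first_type, _ = outbox.get()  # attendance is released before census
```

A heap keyed on message type priority is only one possible design; a real transfer mechanism might instead use separate queues per SLA class.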

Key requirements for transfer mechanism
Monitor and improve performance
- Performance logging and alerting
Maintain security
- Solution will need accreditation to Official level
- Authenticate end points (and end points must authenticate hub)
- Protect information in transit
- Ensure only authorised users can configure data flows
- Ensure data flows reflect access privileges of end points
- Accounting / Audit capability
- (SPDP protects data at rest)
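
Two of the security requirements above (only authorised users configure data flows, and data flows reflect the access privileges of end points) can be sketched as a simple check at configuration time. This is an illustrative assumption, not the project's design: the `EndPoint` model, the `flow-admin` role name and the data type labels are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPoint:
    identifier: str
    allowed_data_types: frozenset  # data types this end point may receive (assumed model)


def unauthorised_destinations(data_type, destinations, registry):
    """Return destinations that are unknown or not entitled to receive data_type."""
    rejected = []
    for dest in destinations:
        end_point = registry.get(dest)
        if end_point is None or data_type not in end_point.allowed_data_types:
            rejected.append(dest)
    return rejected


def configure_flow(user_roles, data_type, destinations, registry):
    """Accept a flow only if the user may configure flows and every destination is entitled."""
    if "flow-admin" not in user_roles:  # hypothetical role name
        raise PermissionError("user is not authorised to configure data flows")
    rejected = unauthorised_destinations(data_type, destinations, registry)
    if rejected:
        raise PermissionError(f"destinations not entitled to {data_type}: {rejected}")
    return {"data_type": data_type, "destinations": list(destinations)}


registry = {
    "SPDP": EndPoint("SPDP", frozenset({"attendance", "census"})),
    "LA-001": EndPoint("LA-001", frozenset({"attendance"})),
}
accepted = configure_flow({"flow-admin"}, "attendance", ["SPDP", "LA-001"], registry)
```

Checking privileges when the flow is configured, rather than on every message, keeps the hub's routing path simple, although a real solution might well do both.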

Key non-functional requirements
Support tens of thousands of end points
- 25k schools, could be several end points for some schools
- Scalable to support future growth
Message volumes still being developed
- 25k schools reporting on (non)attendance twice per day
- Message per session / school or message per class / session TBD
- Other messages one or more orders of magnitude lower in frequency
Performance
- Session attendance available on portal within 1 hour of leaving school
- Performance budget needs to be split between transfer mechanism and SPDP analysis / reporting / publication activities
Availability
- 24/7 with 99%, core working hours higher
- < 30 minutes interruption of service
Simple / efficient / timely process to establish new data flows
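
The volume and availability figures above imply some simple arithmetic, sketched below. The school count, the two sessions per day and the 99% target come from the slide; the figure of 10 classes per school is purely an assumed placeholder, since the per-class option is still TBD.

```python
SCHOOLS = 25_000       # from the slide
SESSIONS_PER_DAY = 2   # attendance reported twice per day (from the slide)
AVAILABILITY = 0.99    # 24/7 availability target (from the slide)


def attendance_messages_per_day(classes_per_school=None):
    """Daily attendance messages; per-class granularity needs an assumed class count."""
    per_session = SCHOOLS if classes_per_school is None else SCHOOLS * classes_per_school
    return per_session * SESSIONS_PER_DAY


def allowed_downtime_hours(period_hours, availability=AVAILABILITY):
    """Downtime permitted within a period at a given availability."""
    return period_hours * (1 - availability)


per_school = attendance_messages_per_day()                      # 50,000 messages per day
per_class = attendance_messages_per_day(classes_per_school=10)  # 500,000 (10 is an assumption)
weekly_downtime = allowed_downtime_hours(24 * 7)                # about 1.68 hours per week
yearly_downtime = allowed_downtime_hours(24 * 365)              # about 87.6 hours per year
```

Moving from one message per session per school to one per class per session multiplies the attendance volume by the average number of classes, which is why that choice matters for sizing.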

Key non-functional requirements
Flexibility
- Keep data model and transfer mechanism separate
- Maximum flexibility in terms of modes from user perspective
Standards
- Use open standards in widespread use
- Identify output based requirements and assess solutions offered against VFM for the sector

Information flows and control
[Layered diagram: End point, Hub, and End point / SPDP, connected over a wide area network.]
End point and SPDP / end point layers:
- Core (proprietary data model; may support ISB natively)
- ISB data layer (ISB BDA specific): pack / unpack ISB format business data payload
- Message layer (data model agnostic): create / send, receive / extract
Hub:
- Control application: configure end point data flows
- Message queuing and routing
Control message: Send {Existing flow X} When {trigger} To A, B, C
Data flows IAW control messages:
- {Existing flow X} sent to hub labelled for A, B, C when trigger condition met
- Hub routes message to A, B, C based on information in header
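
The control and data flows described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the blueprint itself: the class names, the header fields and the in-memory queues are all hypothetical. It shows a control message configuring a flow at an end point, and the hub routing on header information alone without unpacking the ISB payload.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    flow: str            # header: name of the configured flow
    destinations: tuple  # header: end points the message is labelled for
    payload: bytes       # ISB-format business data, opaque to the hub


class Hub:
    """Data-model-agnostic hub: reads only the header, never unpacks the payload."""

    def __init__(self):
        self.queues = defaultdict(list)  # destination -> queued messages
        self.audit_log = []              # (flow, destination) records

    def route(self, message):
        for destination in message.destinations:
            self.queues[destination].append(message)
            self.audit_log.append((message.flow, destination))


class EndPoint:
    """End point whose flows are configured by control messages."""

    def __init__(self, hub):
        self.hub = hub
        self.flows = {}  # flow name -> (trigger, destinations)

    def handle_control(self, flow, trigger, destinations):
        # Corresponds to: Send {flow} When {trigger} To {destinations}
        self.flows[flow] = (trigger, tuple(destinations))

    def on_trigger(self, trigger, flow, payload):
        configured_trigger, destinations = self.flows[flow]
        if trigger == configured_trigger:
            self.hub.route(Message(flow, destinations, payload))


hub = Hub()
school = EndPoint(hub)
school.handle_control("Existing flow X", "on change", ["A", "B", "C"])
school.on_trigger("on change", "Existing flow X", b"<isb-payload/>")
```

Keeping the payload opaque to the hub is what lets the data model and the transfer mechanism evolve separately.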