SDMX Connectors: using SDMX data in statistical packages and tools (Excel, R, Matlab, SAS)
Bank of Italy, IT Support Unit for Economic Research Department
Gianpaolo Lopez, Attilio Mattiocco, Diana Nicoletti
SDMX Global Conference - OECD, Paris, 11-13 September 2013
Motivations for the SDMX Connectors
The first step of a typical data analysis process is the retrieval of the data to be processed.
What do users want?
- To use the statistical tools they know (R, Excel, Matlab, SAS etc.) to analyse data from different sources
- To discover the data they might want to use for their analysis
- To repeat the analysis they have developed in their tools, with updated data
[Diagram: Data Provider, Data Capture, Data Processing and Analysis, Data Dissemination]
What do users face?
- Data of interest that is often replicated inside the organization
- Different external and internal data providers, each with its own formats
- Manual steps to get the data into the tool
- Frustration
The SDMX Connectors
SDMX and Web Services greatly simplify data retrieval from external sources. They reduce the amount of external data that an organization needs to replicate in its internal systems, without affecting the efficiency of data processing.
[Diagram: SDMX Provider, SDMX Connector, Data Process, Data Dissemination]
But the SDMX standard and Web Services technology are quite complex, and end users don't want to cope with this kind of IT complexity. The SDMX Connectors framework has been developed to hide this complexity from the end user.
The Framework
[Diagram labels: ECB Secure, OECD Secure, IMF, ECB, OECD, ISTAT, "In the future?", LOCAL DB, BIS; SDMX Connectors; internet/intranet; SDMX Providers; SDMX library]
Use case: a user wants to get exchange rates from the ECB...
Some basics about SDMX
- Data Flows in SDMX are families of data that share a common structure
- The structure of an SDMX data flow is multidimensional and is declared in a Data Structure Definition in SDMX 2.1 (Key Family in SDMX 2.0)
- The dimensions of a data flow can be used (like the columns of an SQL table) to retrieve the specific parts of the data flow that are of interest
Example: Exchange Rates dataflow in the ECB ( http://sdw.ecb.europa.eu/browse.do?node=2018794 )
Data Flow identifier: EXR
Dimensions: FREQ, CURRENCY, CURRENCY_DENOM, EXR_TYPE, EXR_SUFFIX
To get the desired time series, you need to know the name of the dataflow, its dimensions, and the dimension codes that correspond exactly to your needs.
IT'S NOT EASY!!
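To make the role of the dimensions concrete, here is a minimal Python sketch of how a series key could be assembled from them. It assumes the dot-separated key syntax of the SDMX 2.1 REST interface, where an empty position acts as a wildcard; the syntax expected by the Connectors themselves may differ. The function name build_series_key and the code values (D, USD, EUR, SP00, A) are illustrative assumptions, not taken from the slides.

```python
from typing import Mapping, Sequence

# Dimension order as listed for the EXR dataflow on this slide.
EXR_DIMENSIONS = ("FREQ", "CURRENCY", "CURRENCY_DENOM", "EXR_TYPE", "EXR_SUFFIX")


def build_series_key(dimensions: Sequence[str], selection: Mapping[str, str]) -> str:
    """Join one code per dimension with dots, following the dimension order.

    A dimension that is absent from `selection` is left empty, which the
    SDMX 2.1 REST key syntax (assumed here) treats as "any value".
    """
    unknown = set(selection) - set(dimensions)
    if unknown:
        raise KeyError(f"not a dimension of this dataflow: {sorted(unknown)}")
    return ".".join(selection.get(dim, "") for dim in dimensions)


# Illustrative codes (assumptions): daily USD rate against EUR.
full_key = build_series_key(
    EXR_DIMENSIONS,
    {
        "FREQ": "D",
        "CURRENCY": "USD",
        "CURRENCY_DENOM": "EUR",
        "EXR_TYPE": "SP00",
        "EXR_SUFFIX": "A",
    },
)

# Leaving dimensions out produces wildcards in those positions.
partial_key = build_series_key(EXR_DIMENSIONS, {"FREQ": "M", "CURRENCY_DENOM": "EUR"})

print(full_key)
print(partial_key)
```

The sketch only shows why the dimension order and the exact codes matter: a single wrong or misplaced code yields a different series or none at all.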
Knowing SDMX means Knowing DSDs: To facilitate query building, we have developed a lightweight HELPER tool, driven by the DSDs that the data provider makes available through web services
Knowing SDMX means Knowing DSDs: With the HELPER tool, the user can browse through the DSDs to search for data of interest
Knowing SDMX means Knowing DSDs: The user can learn the codes needed to build the queries
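The kind of lookup that such a helper supports can be sketched in a few lines of Python. This is not the HELPER tool's implementation: the CODE_LISTS content is a hypothetical, heavily shortened stand-in for the code lists a provider publishes with its DSD, and search_codes is an illustrative name.

```python
from typing import Dict, Mapping

# Hypothetical, shortened code lists; real ones come from the provider's DSD.
CODE_LISTS: Dict[str, Dict[str, str]] = {
    "FREQ": {"A": "Annual", "M": "Monthly", "D": "Daily"},
    "CURRENCY": {"USD": "US dollar", "GBP": "Pound sterling", "JPY": "Japanese yen"},
}


def search_codes(
    code_lists: Mapping[str, Mapping[str, str]], dimension: str, text: str
) -> Dict[str, str]:
    """Return {code: description} for codes of `dimension` matching `text`.

    The match is case-insensitive and looks at both code and description.
    """
    needle = text.lower()
    return {
        code: description
        for code, description in code_lists[dimension].items()
        if needle in code.lower() or needle in description.lower()
    }


matches = search_codes(CODE_LISTS, "CURRENCY", "dollar")
print(matches)
```

Searching descriptions rather than codes is the point: the user knows "dollar", not "USD", and the code list is what bridges the two.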
The everyday life of the data analyst: Excel video demo
The same access from R: the data
The same access from R: the Helper
The same access from Matlab
Next steps and...
- SDMX 2.1 support, REST (in progress), JSON?
- Easier configuration of data sources (in progress)
- Registry for data discovery?
- New SDMX Data Providers?
- Further statistical tools?
- Open Source, collaborations
A dream?
What would be possible if ALL data providers were to give data in SDMX? If we also had an SDMX registry that would tell us where the data is? Life could be so easy, both for those who need to make data available to researchers and for the researchers, who want all the data. And all of this would work even better if we had DSDs for global use (e.g. BOP and SNA).
Thank You!
gianpaolo.lopez@bancaditalia.it
attilio.mattiocco@bancaditalia.it
diana.nicoletti@bancaditalia.it
Backup slides for Excel demo