Scanner Data Project: the experience of Statistics Portugal


Paper presented at the Workshop on Scanner Data, Stockholm, June 7-8 2012

Paulo Saraiva dos Santos, Filipa Lidónio and Cecília Cardoso¹
Statistics Portugal

Abstract: Portugal is undertaking its first integrated research work on the exploitation of scanner data, in order to support further European Statistical System (ESS) development in the production of multipurpose consumer price statistics. Statistics Portugal has recently been awarded a Eurostat grant to undertake research on the exploitation of scanner data, covering the 2011-13 period. In line with the national strategy on the modernization of data collection methods, the project focuses on pragmatic approaches to using scanner data to improve the existing price collection systems and other statistical operations. The project is being carried out by internal staff, in close collaboration with data providers and other NSIs. It covers the following lines of action: (1) Knowledge acquisition; (2) Collaboration with data providers; (3) Pilot project; (4) Infrastructure. This paper presents the current status of the scanner data project, especially our experience to date in cooperating with other authorities and obtaining the collaboration of data providers.

Key Words: scanner data; consumer price index (CPI); collaboration with data providers; data collection

¹ Correspondence: paulo.saraiva@ine.pt; filoipa.lidonio@ine.pt; cecilia.cardoso@ine.pt

1. Introduction

Scanner data is defined as detailed data on sales of consumer goods obtained by scanning the bar codes of individual products at electronic points of sale in retail outlets. The data can provide detailed information about quantities, characteristics and values of goods sold, as well as their prices. This approach constitutes a rapidly expanding source of data with considerable potential for CPI purposes [1].

Many countries have been using this source of information since the early 2000s and there is broad consensus about its potential. Compared with traditional data collection, where price collectors directly observe prices at points of sale, scanner data has remarkable advantages, especially in reducing costs and burden, as well as substantially reducing the risk of manual collection errors. Scanner data also offers opportunities to improve CPI methods, namely in dealing with the conflict between the representativeness² and the continuity³ of the items observed [2].

National CPI authorities can benefit from the scanner data approach in many ways, especially by replacing part of the manually collected prices, or by computing an index for a subset of items based on all products for which scanner data are available. Other uses can complement the existing procedures, such as using scanner data as auxiliary information, or for quality control and auditing.

After an internal discussion, Statistics Portugal decided to prepare for the use of scanner data and to evaluate the best way to achieve this. In the first step, we intend to replace part of the manually collected price data with scanner data for the ordinary sample of outlets and products. Afterwards, an evaluation will be made in order to improve the CPI methodology through extensive use of the scanner data available.
This paper describes the present experience of Statistics Portugal in developing its scanner data project and lists the results achieved. It starts with a brief presentation of the context, explaining the drivers and motivations for adopting this new approach. Afterwards, section 3 presents the Portuguese scanner data project, especially the planned roadmap.

² Representativeness: in the CPI, for each elementary aggregate, the selected items are those that are most representative of private household consumption. With the scanner data approach, the sales volume of an item can measure its representativeness.

³ Continuity: in the CPI, with the exception of seasonal products, the observed items should be selected continuously in order to determine the variation of prices. This means that we should track the price of the same item for as long as possible.

The paper then presents our recent experience of collaborating with data providers to obtain scanner data, sharing some lessons learned, and closes with a view of the next envisaged developments.

2. Background and context

The role of Statistics Portugal

Statistics Portugal is the Portuguese central authority for the production of statistics. Its main task is to develop and supervise the national statistical system. As a national statistical authority, Statistics Portugal is legally empowered to require from all departments or agencies, individuals and legal entities, on a mandatory and free-of-charge basis, any elements necessary for the production of official statistics.

Survey data collection is a core activity of Statistics Portugal, consuming around 40% of its annual budget and 30% of its human resources. A Data Collection department is mainly responsible for the statistical production phases of collection, processing and analysis of collected microdata, covering all business and social surveys. Data collection staff are spread all over the country (mainland and islands), especially in Lisbon, Oporto, Coimbra, Évora and Faro, but work under a centralized system. The Autonomous Regions of Madeira and the Azores have their own authorities for the production of region-specific statistics, which also act as Statistics Portugal's data collection centres for those areas, under common technical requirements and infrastructure.

The Portuguese CPI

The prices for the Portuguese CPI are mainly collected using paper questionnaires, and the price collector who observes the price also enters the data at home. The prices collected by each price collector are then sent daily to local offices, where they are processed and analysed. Regional indexes are then calculated and sent to the Lisbon office to be aggregated and consolidated. Annually, CPI data collection involves around 1,355,000 prices collected from 11,000 outlets.
In 2011, the collection cost was around €740 thousand, 83% of which corresponded to the costs of the field team, which is composed of 150 price collectors⁴ spread across the mainland, the Azores and Madeira.

Statistical production integration

Like many other National Statistical Institutes, Statistics Portugal produced statistics through a non-integrated organizational architecture until 2005, based on numerous parallel processes, domain by domain, following a traditional stovepipe approach. This way of producing statistics came to be considered inefficient and inflexible. After a period of reflection and a reorganization process in 2004, a project to re-engineer the production architecture was undertaken, based on an integrated and process-driven approach aimed at improving efficiency and flexibility.

Consequently, a central data collection department was created, regional directorates were abolished, and the domain production departments were merged into three units: economics, social and national accounts. Methods and information systems were merged into one department. The current production architecture (simplified) is shown in Figure 1. The departments of Data Collection and National Accounts are directly involved in CPI production.

This was a remarkable challenge, considering the new distribution of resources, roles and responsibilities. The transition was carried out in such a way that statistical operations were not substantially affected, in spite of some resistance and other constraints. This effort resulted in an Integrated Survey Management System (SIGINQ), which first covered the business surveys and later the social surveys. The intention is to integrate the CPI production system into SIGINQ by 2014, incorporating new features and scanner data as an additional information source.

Figure 1: Production architecture of Statistics Portugal

⁴ Statistics Portugal has a team of 350 freelance price collectors and interviewers, most of them working on several surveys.

Integrated Survey Management System

SIGINQ aims to offer an integrated infrastructure that supports statistical production and development efficiently, covering all statistical operations (business and social) [3][4]. It unifies the main components into a comprehensive and interdependent system based on the architecture illustrated in Figure 2.

Figure 2: Integrated Survey Management System architecture (Level 1)

The system follows the basic production sub-processes: collect, process, analyse and disseminate. Statistical unit registers and metainformation support the flow of the processes. A contact centre system provides the infrastructure for telephone interviews (for social surveys) and for support to data providers.

One component of SIGINQ is especially relevant for the scanner data project: respondent management. This component aims to make the most of the relationship with the data provider and the respondent. This is achieved through a repository of all respondents, including information about their identification, location, contacts, relationships and collection behaviour (history of the collection activity, quality of the data provided, response timing, etc.). This tool is very important when processes are repeated regularly. Information is shared, notably with provider support, and mainly during the "Set up collection" sub-process (preparing the collection strategy and instruments).

SIGINQ building blocks

To conclude this description, Figure 3 shows the building blocks of a complete representation of SIGINQ. Note that the system is specialised in three domains: (1) Business Surveys; (2) Social Surveys; and (3) Agriculture Surveys.

Figure 3: SIGINQ building blocks

3. Scanner Data project

As a consequence of Statistics Portugal's strategy to collect data efficiently, with a high level of quality and the lowest possible response burden, the effort to implement scanner data in the Portuguese CPI has a high priority for the coming years.

In 2011, Statistics Portugal was awarded a Eurostat grant⁵ to undertake the initial research on the exploitation of scanner data, covering the period 2011-13. The scanner data project covers the following lines of action:

- Line 1 (Knowledge acquisition): to access data on product characteristics for products covered by scanner data, especially on the use of the European Article Number (EAN) and its linkage with the COICOP classification and in-store codes; to learn from the experiences of other countries using scanner data;
- Line 2 (Collaboration with data providers): to explore and negotiate arrangements to access scanner data from retail chains, and to select data providers for a pilot experience;
- Line 3 (Pilot project): to establish continuous scanner data flow routines with selected retail chains at the beginning of 2012; to develop the necessary linking of the aggregated and product-level codes found in scanner data to the COICOP statistical classification; to develop internal methods for storing and processing scanner data based on data warehouse and data mining approaches; to implement actions to develop sample designs and weights, including methods to integrate scanner data with the existing price collection processes;
- Line 4 (Infrastructure): to design and build an information system to support the pilot project.

This paper covers the stages of Line 1 (knowledge acquisition) and Line 2 (collaboration with data providers). The approaches taken are described below.

⁵ Eurostat Grant n.º 61229.2010.001-2010.549. Objective B: Multi-purpose consumer price statistics. The use of scanner data.

Line 1: Knowledge acquisition

To learn from the experiences of other countries using scanner data, Statistics Portugal carried out two visits, one to the Netherlands and another to Switzerland.

First visit: CBS, The Hague, The Netherlands: Scanner Data Workshop, September 2011

The Scanner Data Workshop was hosted by CBS Netherlands, with 21 participants from 12 countries and Eurostat, and covered the following agenda:

- Based on previously sent contributions, each country presented a brief overview of its involvement with scanner data so far and its plans for the future;
- Introductory presentation on scanner data (The Netherlands);
- How we get the data: the practical side of obtaining the data (The Netherlands);
- Processing the data: the statistical process (The Netherlands);
- Further and new developments in Norway;
- Scanner data system and project issues (The Netherlands);
- Bilateral exchange of views in three discussion groups: one on obtaining the data, one on the IT system and one on the statistical process.

Second visit: Swiss Federal Statistical Office, Neuchâtel, Switzerland, October 2011

We had the opportunity to organize a bilateral meeting with the Swiss Federal Statistical Office to exchange practical experiences on the use of scanner data, covering the following items:

- The Swiss approach to scanner data;
- Advantages and weaknesses of scanner data price collection;
- Quality assurance and risk management;
- Collaboration with retail chains;
- Scanner data supply and allocation of items to COICOP;
- Item sampling and replacement, and other peculiarities of scanner data;
- Test price collections;
- Overview of the scanner data software application.

In addition to the main results achieved at these meetings, a significant volume of presentations and papers was available to explore. Reading the documentation allowed us to understand the different phases to implement, the best practices to adopt and the particular approaches to follow. The main issues to explore, at different steps of the implementation plan, are the following:

- Exploring and identifying the data and file structures;
- Defining the frequency of data transfers and security issues;
- Defining the data collection period (week, month);
- Linking items to COICOP;
- Defining the best data for each item: EAN, short description, (chain-specific) product group used to link items to COICOP automatically, expenditures, quantities sold, and other issues;
- Selecting the appropriate statistical method;
- Using expenditure weights;
- Dealing with temporarily missing prices and imputation methods;
- Dealing with seasonal items;
- Dealing with quality differences;
- Handling quality-adjustment bias with new products;
- Data cleaning procedures.

In the initial phase, especially in designing the plan and the contact strategies with partners, we confirmed the advantages of this new approach in terms of data quality and efficiency of the data collection processes.
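
One of the issues listed above, linking items to COICOP automatically through the chain-specific product group, can be illustrated with a minimal Python sketch. The product group names, the example EAN and the override table are hypothetical and are not taken from the project; only the COICOP class codes are real. The sketch assumes that a manual override per EAN takes precedence over the group-level mapping, which is one possible design rather than the approach adopted by any particular office.

```python
from typing import Optional

# Hypothetical chain-specific product groups mapped to COICOP classes.
GROUP_TO_COICOP = {
    "BAKERY": "01.1.1",  # Bread and cereals
    "DAIRY": "01.1.4",   # Milk, cheese and eggs
    "FRUIT": "01.1.6",   # Fruit
}

# Hypothetical manual overrides keyed by EAN, for items whose chain
# product group does not match the right COICOP class.
EAN_OVERRIDES = {
    "5601234567890": "01.2.1",  # Coffee, tea and cocoa (illustrative EAN)
}


def link_to_coicop(ean: str, product_group: str) -> Optional[str]:
    """Return the COICOP class for an item, or None if it is unmapped."""
    if ean in EAN_OVERRIDES:
        return EAN_OVERRIDES[ean]
    return GROUP_TO_COICOP.get(product_group.strip().upper())


if __name__ == "__main__":
    print(link_to_coicop("0000000000000", "dairy"))
    print(link_to_coicop("5601234567890", "DAIRY"))
    print(link_to_coicop("0000000000000", "PET FOOD"))
```

Items for which the function returns None would be left for manual classification.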

Line 2: Collaboration with data providers

As the main focus of the project is to find structured ways to obtain scanner data and implement its use, the scope was simplified by limiting it to purchases of food and non-alcoholic beverages, which represent approximately 15% of our CPI fixed basket of goods and services.

The groceries sector in Portugal was then researched. Five chains dominate the sector. The current estimated structure of the sector shows the following market shares (food and non-alcoholic beverages): Modelo Continente (31%); Pingo Doce (30%); Auchan (14%); Lidl (10%); Minipreço (8%); and others (7%).

Consequently, the top two retail chains were selected, those with the highest turnover in the food and non-alcoholic beverages sector, representing around 60% of the national share:

1. SONAE Group (Hipermercados Modelo Continente), and
2. Jerónimo Martins Group (Pingo Doce).

These two retailers provide, on average, 40% of the prices and outlets collected for the HICP in the food and beverage class, with significant geographical representativeness.

In November 2011, the first meeting with the SONAE Group was held to present the project and gauge its willingness to collaborate with INE. The results were extremely positive and it was possible to create the conditions for a formal protocol with this retailer, which was formalized at the beginning of 2012. The first two deliveries of scanner data from the SONAE Group were received in April and May 2012, covering two consecutive months of 2012 and including the list of outlets, the identification and characteristics of the items, and the accumulated transactions (quantities and turnover) in the reference period.

In December 2011, the first meeting with the Jerónimo Martins Group was held with the same purpose. The cooperation was accepted in March 2012, and two bilateral meetings have been arranged. We expect to receive the first data set in June 2012.
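
The retailer selection described above, ranking the chains by market share and taking the top two, can be checked with a few lines of Python. This is only an illustrative sketch using the shares quoted in the text; the function name and the handling of the residual "Others" category are assumptions.

```python
# Market shares (%) for food and non-alcoholic beverages, as quoted in the text.
market_share = {
    "Modelo Continente": 31,
    "Pingo Doce": 30,
    "Auchan": 14,
    "Lidl": 10,
    "Minipreço": 8,
    "Others": 7,
}


def top_chains(shares, n):
    """Return the n named chains with the largest share (residual excluded)."""
    named = {chain: s for chain, s in shares.items() if chain != "Others"}
    return sorted(named, key=named.get, reverse=True)[:n]


selected = top_chains(market_share, 2)
coverage = sum(market_share[chain] for chain in selected)

print(selected)  # ['Modelo Continente', 'Pingo Doce']
print(coverage)  # 61, i.e. the "around 60%" mentioned in the text
```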
Approaching retailers

As mentioned before, Statistics Portugal is legally empowered to require statistical information from all companies, on a mandatory and free-of-charge basis. This is a strong advantage, but it is considered insufficient to motivate data providers to take part in the scanner data project free of charge.

Moreover, it is difficult to demonstrate to data providers that the scanner data initiative will reduce their statistical burden. In fact, we found a completely different perception: having our price collectors visit their outlets is not perceived as a relevant burden. It is considered similar to other natural and frequent activities, since data providers do the same with their competitors in order to compare prices. Thus, a key challenge is to find strong arguments to convince retailers to join the project.

As Statistics Portugal has an integrated data collection process, a large amount of information is available to prepare a contextualized approach to the retailers. Prior to the first meetings it is possible to find out which surveys the target data providers are currently involved in, who the respondents or key contacts are, and what their response behaviour and quality are like. This context is considered a critical success factor for the first contact, lending relevance and credibility to Statistics Portugal.

Another issue was explaining what kind of information would be required from the providers. It seemed too vague and unproductive to start with the following approach: "we are very flexible; give us everything you have, because the data structure is not yet defined". We should never forget that prices, quantities and turnover per item are very sensitive information for retailers. So, even while mentioning that we are flexible, we opted to show an example of the data structure, as presented in the Appendix of this paper. This simple initiative was much appreciated by the providers.
However, we found that the most convincing message in the first face-to-face meeting was the following: (1) scanner data is the future and will be adopted in our country; (2) the design of this new process has started and will define the way in which retail chains will be required to provide CPI data; (3) we offer the provider the opportunity to participate from the very beginning of the project, influencing its design in order to be prepared in advance.

It has been an interesting journey to reach the current stage of development. There were many successes, but also some drawbacks. The lessons learned include the following:

- Full support from top management is needed from the very beginning of the project;
- Multi-domain working groups with a single leader are required;
- It is difficult and takes time to reach internal consensus on some of the CPI methodological changes resulting from scanner data;
- The main issues to deal with are not the technical ones. Collaboration with data providers is one key point. Organizational and change management are also areas where major barriers and bottlenecks are found;
- The benefits of using scanner data materialize better if a step-by-step approach is taken;
- Bringing a top retail chain on board at the very beginning is crucial.

4. The way forward

Statistics Portugal remains seriously engaged with the project, and further developments are planned for the period 2013-2017:

- Create an internal Working Group on Scanner Data, bringing together members from the prices unit, data collection, information systems and methodology;
- Establish a working plan for 2013 to 2017;
- Obtain first access to data from the two data providers in a retrospective approach;
- Evaluate ways to link products to the COICOP classification, especially using EAN and internal store codes;
- Link the items collected traditionally with the scanner data sent by the two providers;
- Analyse and understand price variations between scanner data and the traditional approach;
- Specify and develop the application to support the pilot project, which aims to establish continuous scanner data flow routines with selected retail chains. We intend to develop the necessary linking of the aggregated and product-level codes found in scanner data to the COICOP statistical classification; to develop internal methods for storing and processing scanner data based on data warehouse and data mining approaches; and to implement actions to develop sample designs and weights, including methods to integrate scanner data with the existing price collection processes. As mentioned before, given the many methodological difficulties related to scanner data and the experience known from other countries, Statistics Portugal wants to develop the pilot project through a step-by-step approach. In the initial phase, the traditional collection and calculation methods will be pursued. Afterwards, scanner data is expected to be used to improve the existing price collection system (a new and better data source rather than a new calculation method);
- Infrastructure: design and build an information system to support the pilot project;
- Evaluate some changes in the CPI methods;
- Submit the final report of the project, describing the approaches taken, detailing the results achieved, and providing an assessment of the scope for further work in the field at the national and ESS levels (March 2013).

5. References

[1] OECD Glossary of Statistical Terms, http://stats.oecd.org/glossary/

[2] Swiss Federal Statistical Office, Scanner data in the Swiss CPI: An alternative to price collection in the field, Reto Müller, room document for the UNECE/ILO meeting of experts on Consumer Price Indices, Geneva, Switzerland, 10-12 May 2010.

[3] Statistics Portugal (2005), Statistical Production System: Architecture, Workgroup Report, internal document.

[4] Statistics Portugal (2012), Integrated Data Collection System on business surveys in Statistics Portugal; Paulo Saraiva dos Santos and Carlos Valente; paper presented at the European Conference on Quality in Official Statistics (Q2012); Athens, Greece.

Appendix: Structure of scanner data files (draft version)

ITEMS

Field                  Description
Chain                  Retail chain code
EAN                    EAN-13 code
Store code             Internal code for the item
Description            Complete description of the item
Hierarchy              Retail chain hierarchy which includes the item
Package                Package characteristics of the item
Dimensions             Item package dimensions and capacities
Characteristics        Primary characteristics of the item package
Other characteristics  Other characteristics of the item package

TRANSACTIONS

Field                  Description
Chain                  Retail chain code
Outlet                 Retail chain outlet code
EAN                    EAN-13 code
Store code             Internal code for the item
Reference              Reference month and year of the item transaction
Period                 Reference code of the accumulation quantities (e.g. W1, H1, M)
Quantities             Number of items sold in the accumulation reference period
Turnover               Total turnover value of items sold in the accumulation reference period
VAT                    VAT value applied for the item for the accumulation reference period
Price                  List price of the item for the accumulation reference period
Notes                  Other relevant notes, especially concerning its expected continuity
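
As an illustration of how the TRANSACTIONS layout above might be represented in code, the following Python sketch defines a record type with the listed fields and derives an average price from turnover and quantities. The field types, the "YYYY-MM" reference format and the sample values are assumptions. The paper does not state how prices would be computed from these files; the unit value (turnover divided by quantity) is used here only as one common option.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Transaction:
    chain: str         # retail chain code
    outlet: str        # retail chain outlet code
    ean: str           # EAN-13 code
    store_code: str    # internal code for the item
    reference: str     # reference month and year (format assumed: "YYYY-MM")
    period: str        # accumulation period code, e.g. "W1", "H1", "M"
    quantities: float  # number of items sold in the period
    turnover: float    # total turnover of items sold in the period
    vat: float         # VAT value applied for the item
    price: float       # list price of the item
    notes: str = ""    # other relevant notes


def unit_value(records: Iterable[Transaction]) -> float:
    """Total turnover divided by total quantity sold over the records."""
    records = list(records)
    total_quantity = sum(r.quantities for r in records)
    if total_quantity <= 0:
        raise ValueError("no quantities sold in the given records")
    return sum(r.turnover for r in records) / total_quantity


# Hypothetical sample: one item in one outlet over two weekly periods.
sample = [
    Transaction("C1", "O1", "5601234567890", "S1", "2012-04", "W1",
                100, 120.0, 0.06, 1.25),
    Transaction("C1", "O1", "5601234567890", "S1", "2012-04", "W2",
                50, 55.0, 0.06, 1.25),
]
monthly_unit_value = unit_value(sample)
print(round(monthly_unit_value, 4))
```

Because promotions change the quantities sold, the unit value can differ from the list price carried in the Price field, which is one reason both turnover and quantities are requested.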