Production of Official Statistics by Using Big Data




Distr. GENERAL
Working Paper No.
April 2013
ENGLISH ONLY

UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE
CONFERENCE OF EUROPEAN STATISTICIANS

EUROPEAN COMMISSION
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)

ORGANISATION FOR ECONOMIC COOPERATION AND DEVELOPMENT (OECD)
STATISTICS DIRECTORATE

Meeting on the Management of Statistical Information Systems (MSIS 2013)
(Paris, France, and Bangkok, Thailand, 23-25 April 2013)

Topic (iii): Innovation

Production of Official Statistics by Using Big Data

Supporting Paper

Prepared by Jeong-Im Ahn and Young-Ja Hwang, Statistics Korea, Republic of Korea

I. Introduction

1. Recently, the boom in smartphones and social networking services (SNS) has generated a tremendous amount of digital data. Around the world, big data is regarded as a key resource for creating value. In Korea, big data is recognized as a core element of future national competitiveness that can create new value in both the private and public sectors. Accordingly, the Korean government is pushing ahead with a Big Data Master Plan for the Implementation of a Smart Nation.

2. The Google Price Index and the unemployment rates produced by Google, the biggest portal site, are representative examples of the private sector using big data for statistical production. These trends require official statistical agencies both to respond to statistics that the private sector produces from big data and to find ways to use big data themselves. They also raise the problem of checking the reliability of statistics produced by unofficial statistical institutes. Currently, statistically advanced countries and international organizations such as the OECD and UNECE are discussing statistical policies related to big data and the roles of national statistical offices.

3. Statistics Korea, the central statistical agency, is reviewing the possibility of incorporating big data into statistical production, both to respond to these domestic and overseas trends and to improve the efficiency of statistical production. To this end, Statistics Korea designed a Pilot Project that uses big data directly in statistical business processes. Under the standard business process for the Industrial Production Index, enumerators visit a sample of establishments every month, the data on industrial classification, items, sales, etc. are edited, and the Index is then published. In the Pilot Project, the editing process was redesigned to use media data, and a big data processing model was inserted into it. Through this Pilot Project, Statistics Korea aims to establish a foundation for producing official statistics by using big data.

4. Chapter II describes the background and overview of the Pilot Project, the methods used to collect and analyze data, and the results of project development. Chapter III presents future plans.

II. Pilot Project for Using Big Data in Official Statistics

A. Project Overview and Direction

5. With the increase in one-person households and growing awareness of privacy protection, the survey environment is steadily deteriorating. Under these circumstances, producing official statistics by using big data offers great advantages in timeliness and cost-effectiveness.

6. However, there is currently no established rationale for applying information or results derived from big data to the target population. It is therefore difficult to substitute big data for official statistics, but big data can be used to supplement existing statistics.

7. As part of this effort, Statistics Korea plans to develop a pilot project for using big data in official statistics. The pilot system is designed to collect media data automatically and provide an integrated analysis function, to present survey data in visualized form so that big data analysis techniques can be applied, and to reduce the editing time of the Monthly Survey of Mining and Manufacturing.

8. The project directions are as follows. First, producing the Mining and Manufacturing Production Index has so far required much time and effort for data editing and level analysis. Big data will therefore be used widely to reduce the time needed for editing and analysis.

9. Second, a large volume of media data will be used for data analysis. The project will first be applied to major establishments and items in the Monthly Survey of Mining and Manufacturing and then expanded gradually.

B. Analysis Range

10. The Pilot Project analyzes survey data as well as media data. Out of the total survey subjects* of the Monthly Survey of Mining and Manufacturing, 4 industry groups (C21, C24, C26, C28), 162 items and 1,438 establishments are selected. Internet news and websites related to those industries, items or establishments will be browsed. For survey data from 2005 to the current month, indices by industry, item and establishment as well as production tables will be analyzed. [Table 1][Table 2]

* The Monthly Survey of Mining and Manufacturing covers 26 industry groups, 633 items and a sample of 8,300 establishments.

Industry | Source | Contents
C21 (Pharmaceutical products) | Korea Pharmaceutical Manufacturers Association | www.kpma.or.kr / pharmaceutical news
C21 | C24 (Basic metal products) | Production, goods of pharmaceutical companies
C21 | C26 (Electronic components, computer, radio, television and communication equipment and apparatuses) | Related articles
C24 (Basic metal products) | Korea Metal Journal | new.kmj.co.kr / news per product
C24 | Korea Iron & Steel Association | www.kosa.or.kr / iron & steel information, survey report, iron & steel journal
C24 | Internet newspaper | Related articles
C26 (Electronic components, computer, radio, television and communication equipment and apparatuses) | Ministry of Knowledge Economy | Website / press releases, notices, policy
C26 | Korea Customs Service | Website / policy report, press releases
C26 | Korea Semiconductor Industry Association | Real-time news on semiconductor
C26 | Semiconductor Network | Semiconductor Network news
C26 | Korea Display Industry Association | News on display industry
C26 | Internet newspaper | Related articles
C28 (Electrical equipment) | Internet newspaper | Related articles

[Table 1] Analysis Range of Media Data

Table title | Contents
mi_jisu_analysis | Index by industry
mi_jisu_analysis_m | Index by item
mi_dong1_analysis | Table by establishment
mi_dong2_analysis | Table by item

[Table 2] Analysis Range of Survey Data

C. Data Collection and Analysis

11. The methods used to collect and analyze data in the Pilot Project are as follows.

12. First, media data are collected by crawling related articles on the Internet and on relevant websites and examining them for words such as "increase" and "decrease" in connection with the items and establishments within the analysis range, for the period from the 1st day of the previous month to the current month. Internet news is crawled in real time, while documents attached on the websites in PDF or MS Word format are loaded into the analysis server on a schedule.

13. When media data are analyzed, the attached documents from the websites and the Internet news crawled in real time are integrated, and sources are analyzed in order of retrieval accuracy: specific websites, economic news and portal news. To improve retrieval accuracy, similar search words are also registered in advance. [Table 3][Figure 1]

Classification | C21; C24, C26, C28
Common | Increase, grow, rebound, rise or expand; decrease, drop, fall or decline; item name; establishment name
Additional | Release, operating profit, renewal, futures, export, transfer, public relations, prescription drug, UNESCO

[Table 3] Media Data Search Word
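The keyword search in paragraphs 12 and 13 can be illustrated with a minimal sketch. It is not the Pilot Project's implementation: the function name `classify_article`, the sentence-level matching, the prefix matching of word forms and the "mixed" label are all assumptions made for illustration. Only the direction words themselves come from Table 3.

```python
import re

# Direction words from Table 3; splitting them into two groups is an assumption.
INCREASE_WORDS = ("increase", "grow", "rebound", "rise", "expand")
DECREASE_WORDS = ("decrease", "drop", "fall", "decline")


def classify_article(text, targets):
    """Map each target (item or establishment name) to a direction signal.

    A target receives "increase" or "decrease" when it appears in a sentence
    that also contains a direction word, and "mixed" when signals conflict.
    Targets with no signal are omitted. (Hypothetical helper.)
    """
    signals = {}
    # Naive sentence split; good enough for a sketch, not for URLs or decimals.
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Prefix match so that "rebounded" or "dropped" also count.
        up = any(re.search(r"\b" + w, sentence) for w in INCREASE_WORDS)
        down = any(re.search(r"\b" + w, sentence) for w in DECREASE_WORDS)
        if not (up or down):
            continue
        direction = "mixed" if up and down else ("increase" if up else "decrease")
        for target in targets:
            if target.lower() in sentence:
                previous = signals.get(target)
                signals[target] = direction if previous in (None, direction) else "mixed"
    return signals
```

Matching at sentence level rather than article level is a deliberate choice in this sketch: it avoids attaching a direction word to every item that happens to be mentioned somewhere in the same article.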

[Figure 1] Collection and Analysis of Media Data

14. Second, survey data are collected by linking to the database of the Mining and Manufacturing Survey System, so that survey data and tabulation data from 2005 to the current period are available in real time. In addition, responses entered by enumerators, or by establishments through CASI (Computer Assisted Survey Input), are used in real time.

15. When survey data are analyzed, the Mining and Manufacturing Production Index and time-series data are presented in visualized form for easier understanding. For example, month-on-month and year-on-year changes in production by item and establishment are presented in graphs. The production indices by industry and item, and their month-on-month and year-on-year changes, are also presented in graphs. [Figure 2]

[Figure 2] Collection and Analysis of Survey Data
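The month-on-month and year-on-year changes mentioned in paragraph 15 can be computed as in the sketch below. The data layout (a dictionary keyed by year and month) and the function names are assumptions for illustration; the paper does not describe how the system stores or computes these series.

```python
def _pct_change(current, base):
    """Percent change from base to current, or None if the base is missing or zero."""
    if base is None or base == 0:
        return None
    return (current - base) * 100.0 / base


def percent_changes(series):
    """Compute month-on-month and year-on-year percent changes.

    series: dict mapping (year, month) -> production value (hypothetical layout).
    Returns a dict mapping (year, month) -> (mom_pct, yoy_pct).
    """
    changes = {}
    for (year, month), value in series.items():
        # January compares against December of the previous year.
        prev_month = (year, month - 1) if month > 1 else (year - 1, 12)
        prev_year = (year - 1, month)
        changes[(year, month)] = (
            _pct_change(value, series.get(prev_month)),
            _pct_change(value, series.get(prev_year)),
        )
    return changes
```

Returning `None` where the comparison month is missing keeps gaps in the series visible instead of silently treating them as zero change.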

D. Analysis System Environment

16. Given the importance of data security for the Mining and Manufacturing Production Index, the visualized data analysis system was built on the Intranet, while the public media data analysis system was built on the Internet. [Figure 3]

[Figure 3] Pilot System Environment

E. As-Is and To-Be

17. For the Monthly Survey of Mining and Manufacturing, data input, editing, inquiry, index analysis and data dissemination are carried out every month.

18. Currently, to produce the Mining and Manufacturing Production Index, data are entered and then edited by using statistical tables, and establishments are telephoned when an outlier is found. For example, if Samsung shows a sharp month-on-month or year-on-year decrease in semiconductor production, calls are made to find out the reason for the decrease. In addition, to edit outliers, an enumerator visits the establishment, browses Internet news concerning the items showing outliers, or browses an association website.

19. The Pilot Project, in which big data are used in statistical production, is expected to have the following effects.

20. First, data related to items and establishments (e.g. data indicating a production increase or decrease) are automatically collected from specific websites (e.g. an association website), economic news and portal news, and uploaded into the integrated system. Outliers can therefore be detected at a glance, and editing can be done almost simultaneously. This is expected to reduce editing time.

21. Second, for the data from 2005 to the current period, time-series production data by establishment and item as well as indices by industry and item are shown in graphs, so outliers can be detected easily. This is expected to support editing effectively and reduce editing time. [Figure 4]

[Figure 4] Derivation of the Mining and Manufacturing Production Index
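One way to picture the expected effect in paragraphs 18 to 21 is a triage step that compares sharp changes in survey data with the direction reported in media data. The sketch below is purely illustrative: the paper describes visual detection by staff and gives no numeric rule, so the 30 percent threshold, the function name `triage_outliers` and the "corroborated" and "follow_up" labels are all assumptions.

```python
def triage_outliers(changes, media_signals, threshold_pct=30.0):
    """Flag sharp changes and check them against media signals.

    changes: dict mapping name -> month-on-month percent change (or None).
    media_signals: dict mapping name -> "increase", "decrease" or "mixed".
    threshold_pct: hypothetical cut-off; the paper specifies no value.
    Returns a dict mapping each flagged name to "corroborated" when the media
    signal points the same way as the change, else "follow_up" (for example,
    a phone call to the establishment).
    """
    result = {}
    for name, change in changes.items():
        if change is None or abs(change) < threshold_pct:
            continue  # not treated as an outlier in this sketch
        expected = "increase" if change > 0 else "decrease"
        if media_signals.get(name) == expected:
            result[name] = "corroborated"
        else:
            result[name] = "follow_up"
    return result
```

In this sketch a corroborated outlier still warrants review by an editor; the label only indicates that public media already offer a plausible explanation, which is where the time saving described above would come from.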

III. Future Plans

22. As mentioned above, substituting big data for official statistics could be risky, and it is preferable to supplement official statistics with big data. In the future, big data will be widely used to analyze current trends, predict future trends and suggest alternatives.

23. Through the Pilot Project, Statistics Korea is finding ways to use big data in the production of official statistics, improving the efficiency of statistical business processes, and building up techniques for utilizing big data. Based on the findings of the Project, the analysis will be expanded to all industry groups in the Monthly Survey of Mining and Manufacturing, and the Project will then be extended to other economy-related surveys, including the Monthly Service Industry Survey.

24. In addition, Statistics Korea will cooperate with international organizations, including the High Level Group, to find ways to use big data for official statistics. Statistics Korea will also carry out research on the utilization of big data along the lines of the Billion Prices Project, a project started at MIT that aggregates price information from online retailers and shows daily price fluctuations.