BIG Data and OFFICIAL Statistics



Similar documents
What is the Ideal Labour Code: A Measurement Perspective from the United States Mike Horrigan US Bureau of Labor Statistics

Big Data Landscape,

Quarterly Census of Employment and Wages (QCEW) Business Register Metrics August 2005

JOB OPENINGS AND LABOR TURNOVER APRIL 2015

Table of Contents. FY BLS Strategic Plan Our Strategies and Goals Strategy Framework Strategy 1 (Products)...

The Billion Prices Project Using Online Prices for Inflation and Research

Comparing the Consumer Price Index and the Personal Consumption Expenditures Price Index

Consumer Price Index, Anchorage First Half 2015 Areapricesup0.1percentoverthepastsixmonths,up1.1percentfromayearago

The Recession of

2003 National Survey of College Graduates Nonresponse Bias Analysis 1

2003 Annual Survey of Government Employment Methodology

How To Price Index For Couriers

REPORT OF THE MAINE STATE REVENUE FORECASTING COMMITTEE

WEIE Labor Market Context: Methods & Resources

NHE DEFLATOR INTERMEDIATE SUMMARY. National Health Expenditures

QUARTERLY ESTIMATES FOR SELECTED SERVICE INDUSTRIES 1st QUARTER 2015

Research. Dental Services: Use, Expenses, and Sources of Payment,

Technical Memorandum REVENUE FORECASTS. Prepared for: Prepared by:

Tracking the Macroeconomy

Alternative data collection methods -

Quarterly Employment Survey: September 2008 quarter

NATIONAL INCOME AND PRODUCT ACCOUNTING MEASURING THE MACROECONOMY

2. Issues using administrative data for statistical purposes

How To Pass The Health Care Bill

State of the Satellite Industry Report

Business cycles, monetary policy and property markets Governor Svein Gjedrem Næringseiendom April 2005

CHAPTER 7: CHANGE IN PRIVATE INVENTORIES

CHAPTER 7: CHANGE IN PRIVATE INVENTORIES

Grand Challenges Making Drill Down Analysis of the Economy a Reality. John Haltiwanger

Analysis of JOLTS Research Estimates by Size of Firm

QUARTERLY RETAIL E-COMMERCE SALES 1 ST QUARTER 2016

Retail Food Price Formation in the United States

South Carolina Nurse Supply and Demand Models Technical Report

CHAPTER 5: PERSONAL CONSUMPTION EXPENDITURES (Updated: November 2012)

Wholesale Trade Survey: December 2014 quarter

Quality and Methodology Information

Broward County. Information Technology Industry Edition. Source: Florida Department of Economic Opportunity, Bureau of Labor Market Statistics

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Transit Cost Analysis

State Government Tax Collections Summary Report: 2013

CONSUMER PRICE INDEX: METHODOLOGICAL GUIDELINES ST KITTS & NEVIS

Danny R. Childers and Howard Hogan, Bureau of the,census

Oracle CPQ Cloud Product Overview. Sergio Martini CX/CRM Master Principal Sales Consultant CX Organization June 11, 2014

Facts. Who are direct-care workers? Job titles and responsibilities. February 2011 Update

Reference: Gregory Mankiw s Principles of Macroeconomics, 2 nd edition, Chapters 10 and 11. Gross Domestic Product

Thank you for choosing our firm to prepare your income tax returns for tax year This letter confirms the services we will provide.

World Fuel Services Corporation Overview

Broward County. Aviation Industry Edition. Source: Florida Department of Economic Opportunity, Bureau of Labor Market Statistics

Re: Inquiry into the Private Health Insurance Legislation Amendment (Base Premium) Bill 2013

CHAPTER 5: PERSONAL CONSUMPTION EXPENDITURES

Transportation & Logistics Industry Cluster Profile I Fond du Lac County, WI

Household health care spending: comparing the Consumer Expenditure Survey and the National Health Expenditure Accounts Ann C.

Position Classification Flysheet for Inventory Management Series, GS Table of Contents

Interpretations of the Productivity Growth Slowdown

Big Data: Uses and Limitations

Practice Management 101

Big Data for the Airport How can it Contribute to an Enhanced Travel Experience?

1135 E. Route 66, Suite 108 Glendora, CA

Co-employment and the Business Register:

NORTH CAROLINA GENERAL ASSEMBLY LEGISLATIVE FISCAL NOTE

UK Service Industries: definition, classification and evolution. Jacqui Jones Office for National Statistics

Further Integrating BEA s Economic Accounts: Introducing Annual Input-Output Estimates into the Gross State Product by Industry Accounts 1

Transcription:

BIG Data and OFFICIAL Statistics International Year of Statistics November 13, 2013 Michael W. Horrigan Associate Commissioner Office of Prices and Living Conditions

Big Data and Official Statistics What are big data? A few examples Definition / scope How is BLS using big data? Examining big data through the lens of data quality frameworks What is the future of using big data by statistical agencies? 2

What are Big Data? A few examples Billion prices project Daily CPIs in 20 countries Webscraping technology Google Tools to create large data files that combine publicly available data on social and economic activity stratified by geography, and socialdemographic characteristics 3

What are Big Data? A few examples Google Modeling form combines Google search index data in the current period with past values of an economic measure from the statistical system to predict a future value of the same concept. Y t = f ( Search t-1 to t, Y t-1, t-2, ) Example: Initial claims 4

What are Big Data? A few examples UPS GE Using telematic sensors in over 46,000 vehicles, big data on route selection, speed, and direction Estimated savings of 8.4 million gallons of fuel by cutting off 85 million miles of route driven in 2011. Use of real time monitoring of machines with big data analytic techniques to improve productivity of electricity generating machines, aviation, rail transportation, and health care. 5

Big Data Definitions/Scope Wikipedia Big data is term for the collection of data sets so large and complex that it becomes difficult to process using hands-on data base management tools or traditional data base processing applications. 3V definition Volume, Velocity, Variety 6

Big Data and Official Statistics Non-sampled data Big Data Admin Data Sampled survey data 7

Bureau of Economic Analysis Non-sampled data Big Data Admin Data Sampled survey data GDP 8

Big Data and Official Statistics What are big data? A few examples Definition / scope How is BLS using big data? Examining big data through the lens of data quality frameworks What is the future of using big data by statistical agencies? 9

How are Big Data being used? Webscraping BLS CPI Create data base of product characteristics for use in quality adjustment hedonic models Televisions Camcorders Camera Washing Machines Research to expand use to collect prices for cable TV plans and airline prices 10

How are Big Data being used? Scanner data: Homescan, Nielson Actual sales transactions Comparison of national distribution of selected products with results from CPI disaggregation process JD Power Used car frame for CPI Researching use for CPI production of new car price indexes 11

How are Big Data being used? Medicare part B PPI and CPI use reimbursements to doctors by procedure code in indexes Claims data Validation of MEPS and CPI inflation rates Note: CPI constructs experimental disease based price indexes using annual weights from the MEPS household survey data 12

How are Big Data being used? Stock Exchange Security Trades PPI receives a monthly census of all bid and ask prices and trading volume for all traded securities as of market close for 3 selected days of the month. These data are used for index estimation Ratio allocation CPI cost allocation weights and gasoline data from EIA 13

How are Big Data being used? Company provided data Corp X Research by CPI to use company provided data on all register transactions for sampled outlets Challenges: Can the matched model requirement be satisfied Accounting for substitutes IT production requirements Risk of losing access Opportunity: Use of corporate data in direct estimation 14

How are Big Data being used? Administrative data Published data using universe counts Sampled surveys Estimation Drawing samples Frame refinement Development of weights Imputation 15

How are Big Data being used? BLS Quarterly Census of Employment and Wages: Some examples of uses: BLS sampling: PPI, NCS, CES, OES, OSH, JOLTS, Green Jobs Imputation: State based estimates use QCEW data to impute for key non-respondents Use of QCEW data to develop forecasts that are used in the CES birth death model 16

How are Big Data being used? Administrative data Used directly in estimation IPP uses EIA data on crude petroleum for their import indexes PPI uses Department of Transportation data on baggage fees CPI uses SABRE data for airline prices 17

How are Big Data being used? Administrative data Linking Census Bureau s Longitudinal Establishment. BLS Business Employment Dynamics Linking within agencies Sharing across agencies: CIPSEA 18

Big Data and Official Statistics What are big data? A few examples Definition / scope How is BLS using big data? Examining big data through the lens of data quality frameworks What is the future of using big data by statistical agencies? 19

Assessing Big Data through the lens of Quality frameworks Statistical agencies use a variety of quality dimensions to judge the efficacy of their direct data collection programs. It is reasonable to ask how the use of Big Data by Billion Prices, Google, Intuit and others fare along the same dimensions The use of external data sets (Big, Administrative, Other surveys) by statistical agencies to produce blended estimates should come under the same scrutiny 20

Quality as a three-level concept Product Quality Process Quality Organizational Quality 21

Product Quality Timeliness Relevance Objectivity Clear, unbiased Accuracy sampling errors Calculated, published, used in analysis Accuracy non sampling errors Coverage Primary challenge to statistical systems Often an advantage of Big Data 22

Product Quality Accuracy non sampling errors Non response bias Significant concern of statistical systems about their own data and for Big Data Classification/specification Lack of cross walks across different classification systems across statistical systems, administrative data, firm data, big data Metadata/transparency/interpretability Coherence / comparability 23

Big Data and Official Statistics What are big data? A few examples Definition / scope How is BLS using big data? Examining big data through the lens of data quality frameworks What is the future of using big data by statistical agencies? 24

Leveraging Big Data and the Future of the U.S. statistical system Budgets for the U.S. statistical system are flat or declining in real terms Faced with budget cuts, such as through the sequester, most agencies cut programs For example, in FY 2013, BLS eliminated the International Comparisons Program, the Mass Layoff Survey Program, and the Green Jobs Employment Program 25

Leveraging Big Data and the Future of the U.S. statistical system Big Data Potential for large volumes of data that, except for up front ($) investment in infrastructure and skill enhancement needed for handling big data, potentially relatively less expensive than direct data collection. Replace, enhance, expand or validate 26

Leveraging Big Data and the Future of the U.S. statistical system Replace Replace directly collected data with big data estimates Administrative data less subject to selection bias Use of DOT baggage fees, ask/bid prices on securities Self selection bias from social media sources very problematic for top side estimates Use reliable survey methods for control totals and ratio allocate biased proportions to sub categories Replace the goal of unbiased estimation for detailed estimates that minimize Mean Square Error Bias is not always an issue QA model example 27

Leveraging Big Data and the Future of the U.S. statistical system Enhance Use direct survey (or high quality blended) estimates for periodic repeated cross-sectional or time series estimates (monthly, quarterly, semiannual, annual) and redefine the relevance and mission of the statistical system by publishing (noisier and blended) data on a more frequent basis (weekly, daily) EIA weekly gasoline prices CPI s three pricing periods 28

Leveraging Big Data and the Future of the U.S. statistical system Expand Use big data, especially administrative data and corporate data, to expand coverage of the economy. As of March 2013, 4.3% percent of businesses were of size 50 or more, accounting for just under 60% of employment. Could IRS administrative records for the remaining 95.7% be used as a substitute for expensive direct data collection from small firms and greatly 29 increase their coverage?

Leveraging Big Data and the Future of the U.S. statistical system Expand In the first quarter of 2013, there were just under 16,000 firms with 250 or more employees, accounting for 17% of total employment in the US Explicit strategies to gain access to corporate records from the 16,000 firms could again reduce the direct data collection burden from such large firms. Corporate records have the potential of greatly increasing the quantity, quality and timeliness of data we collect. 30

Leveraging Big Data and the Future of the U.S. statistical system Expand Two largest gaps in our coverage of the economy is in the service providing sector and in global production processes. Could corporate data record keeping systems, especially firms with global production processes, be leveraged to fill in these gaps? 31

Leveraging Big Data and the Future of the U.S. statistical system Validate Use of big data as a check on findings from direct survey collection such as validating the CPI market basket for select items against scanner data Development of control/treatment groups Machine learning by Google Experimental design for social interventions 32

Contact Information Michael Horrigan Associate Commissioner Office of Prices and Living Conditions www.bls.gov 202-691-6960 horrigan.michael@bls.gov

What are Big Data? 34