Grand Challenges Making Drill Down Analysis of the Economy a Reality. John Haltiwanger



The Vision

Here is the vision. A social scientist or policy analyst (hereafter, the analyst) is investigating the impact of the Great Recession and anemic recovery (as of September 2010) on businesses and workers. The analyst begins by exploring the latest aggregate data showing economy-wide, sector-level and broad regional-level variation in business productivity, output, capital investment, prices, wages, employment, unemployment and population. The data on employment changes can be decomposed into hiring, quits, layoffs, job postings, job creation and destruction. The data on unemployment can be decomposed into gross worker flows tracking flows into and out of unemployment. The data on workers are linked to measures from household data tracking income, consumption, wealth, consumer finances and household composition. The data are high frequency (monthly or quarterly) and timely (data for the most recent quarter or month). The data are available not only for the present time period but historically for several decades, permitting analysis of both secular trends and cyclical variation.

Starting at the economy-wide level, the analyst can drill down into the various key indicators by detailed worker and firm characteristics, such as gender, age, immigration status and education for workers, and business size and business age for firms. The business data permit identifying firm startups and tracking firm exits. The firm characteristics include measures of intangible capital assets, such as investments in R&D and innovation, as well as measures of foreign trade and outsourcing activity.
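
The decomposition of employment changes into job creation and destruction mentioned in this vision can be made concrete with a small sketch. It assumes the conventional definitions: gross job creation is the sum of employment gains at expanding and entering establishments, and gross job destruction is the sum of employment losses at contracting and exiting establishments. The record layout, field names and figures below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Establishment:
    """Hypothetical establishment record with employment in two periods."""
    estab_id: str
    emp_prev: int  # employment in the previous period
    emp_curr: int  # employment in the current period


def job_flows(establishments):
    """Decompose the net employment change into gross job flows."""
    # Gains at expanding and entering establishments.
    creation = sum(max(e.emp_curr - e.emp_prev, 0) for e in establishments)
    # Losses at contracting and exiting establishments.
    destruction = sum(max(e.emp_prev - e.emp_curr, 0) for e in establishments)
    return {
        "job_creation": creation,
        "job_destruction": destruction,
        "net_change": creation - destruction,
        "reallocation": creation + destruction,
    }


# Made-up data: one startup, one expansion, one contraction, one exit,
# and one establishment with no change.
sample = [
    Establishment("A", 0, 12),
    Establishment("B", 50, 58),
    Establishment("C", 40, 31),
    Establishment("D", 7, 0),
    Establishment("E", 20, 20),
]

flows = job_flows(sample)
print(flows)
```

In this toy example a net gain of 4 jobs masks 20 jobs created and 16 destroyed, which is the kind of detail that aggregate employment figures alone do not reveal.
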
The sources of financing for businesses are available by type of financing and by type of business (e.g., by business size, age, industry and geographic location). Examples include the amount of financing from debt, the debt type, debt terms such as the interest rate, the amount of financing from equity, and the equity type (angel, venture capital, hedge fund, private equity or public equity financing). The analyst can conduct empirical studies at the economy-wide, broad sectoral and broad regional level with data broken down by all of these dimensions.

In addition, the analyst can drill down to the individual and firm level, creating a longitudinal matched employer-employee data set with all of this information at the micro level. This permits panel data analysis using rich cross-sectional and time variation in the outcomes of businesses, workers and households. These outcomes can be tracked at a very detailed location level (Census block or tract) and by detailed characteristics. The drilled-down data aggregate to the national key indicators that receive so much attention.

The analyst can ascertain, for example, whether the anemic recovery as of September 2010 is really accounted for by small, young businesses that, given their productivity and profitability, would normally be creating jobs but cannot get credit. The analyst could track what type of financing has especially decreased relative to other economic recoveries. The analyst could analyze the impact of policy interventions historically and how they have or have not influenced different types of businesses and, in turn, the workers employed by these businesses.

Beyond analysis of the recession and recovery, the drill down infrastructure would permit a range of analyses of the factors driving economic growth and other important economic outcomes for households and businesses. The role of business startups can be tracked in terms of their contribution to productivity, innovation and job growth. The origins of business startups can be tracked using the longitudinal matched employer-employee data: for example, what is the career path of entrepreneurs, and what factors affect that career path as well as the success and failure of business startups?

The drill down data infrastructure also permits rich studies of the outcomes of households and workers. The impact of immigrant flows on native and immigrant labor market outcomes can be tracked and studied. The infrastructure enables tracking the education experiences and outcomes of individuals, so the factors affecting human capital investment can be studied. In principle, the infrastructure is also integrated with a variety of other measures of experience: the housing experiences of children and adults, the health experiences of children and adults, and criminal activity as well as crime victimization among children and adults. With these added dimensions, a wide range of socio-economic issues can be investigated.

The Reality as of 2010

The above vision is not as far from reality in 2010 as one might first surmise, but achieving it faces many different challenges. The good news is that progress has already been made on many of these challenges. The bad news is that many very difficult challenges remain that may take decades to overcome. First, the good news.
The U.S. federal statistical agencies, state agencies and private sector data developers have been working on various components of this vision for the last decade or more. The rapid increases in computer speed and disk storage have meant that the ability to track every business, household and worker in the U.S. on many of these outcomes is already a reality. The U.S. Census Bureau has developed longitudinal business databases tracking every establishment and firm in the U.S. that can be integrated with the rich surveys and censuses of businesses conducted by Census. The Bureau of Labor Statistics has developed a similar longitudinal business database tracking every establishment in the U.S. that can be integrated with the surveys of businesses conducted by BLS. [1] The Census Bureau has also developed a longitudinal matched employer-employee database that tracks the employment relationships between all employers and employees in the U.S. These data can in turn be linked to the business data at Census as well as to household databases such as the Current Population Survey and the American Community Survey. A variety of administrative databases on housing, education and health outcomes can be integrated into these data. State agencies are likewise developing longitudinal databases that track education experiences and outcomes and link them to labor market outcomes for their workers. Other major efforts to develop longitudinal micro data on households and businesses include the tracking of health and other outcomes for older Americans in the Health and Retirement Study by the University of Michigan, as well as private sector efforts (such as Dun & Bradstreet) to track U.S. businesses. The Federal Reserve has rich data on the balance sheets of the financial sector and is working on developing longitudinal data on financial sector firms and activity.

[1] Of course, one might immediately ask: why this duplication? We turn to this below.

The other part of the good news is that it is not only the hardware of computer processing that has advanced rapidly; so has the software. The software and methodology are increasingly available for massive data integration, permitting matching of records at a variety of levels of aggregation using all available information. The software and methodology are also increasingly available to address the inherent problems of protecting the confidentiality of such drill down data.

However, in spite of enormous progress, the bad news is that the U.S. statistical system (broadly defined to track economic, health, education, housing and other outcomes) remains incredibly balkanized. Legal issues still block the major U.S. federal agencies from sharing their data. The result is the duplication between BLS and Census noted above, which leads not only to wasted effort but also to significant limitations in the quality of key economic indicators such as U.S. productivity and GDP.
Legal issues block the federal and state agencies from working collaboratively. Beyond the legal issues, many technical challenges remain for the cyber infrastructure needed for a drill down data infrastructure. We turn to discussing such challenges in the next section.

The Challenges

The challenges are many. At a broad level, the challenge is that attaining this vision requires integration of a vast array of administrative and survey data from a variety of sources with different objectives and legal requirements for using and protecting the confidentiality of the data. There are many, many details; a non-exhaustive list is as follows:

1. Attaining this vision requires a change in the way data are collected and processed. The statistical system needs to be smart in using scarce resources to avoid duplication, but also in collecting the many different components in a fashion that allows them to be integrated. Even within statistical agencies, there is a silo approach to data collection: collection and processing methods are designed to produce the specific micro and aggregate data of interest without regard to data integration. Of course, the problem is much worse across federal and state agencies. The efforts discussed above to integrate administrative and survey data have been severely hampered by the disparate nature of the data. The development of standards and the agreed-upon use of common identifiers would greatly facilitate data integration. Of course, the use of common identifiers raises many privacy concerns, which we discuss below.

2. Overcoming the legal challenges is a significant obstacle in its own right. This vision can likely be achieved only with the intensive use of administrative data that are collected for other purposes (i.e., the administration of some other program, including the collection of taxes and the monitoring of public programs). There needs to be a mandate that administrative data from all of the types of sources discussed above can be used for this type of statistical analysis.

3. Privacy advocates rightfully express concerns that the above vision creates Big Brother. There are many issues in dealing with privacy concerns, which I will not deal with adequately here. One critical issue is ensuring a secure environment for developing and accessing this type of data infrastructure. One working model so far is to create secure enclaves (the Census-NSF research data centers). Secure enclaves have shown that they can provide access for many valuable projects from the research community without endangering the privacy and confidentiality of respondents. While this represents great progress relative to 20 years ago, it is likely not the long-run solution (more on this below). Still, if over the next decade or so the Census-NSF data centers can become Federal-State data center enclaves where a host of federal and state data can be housed, this would represent major progress. This has already happened to some degree, with agencies such as NCHS now housing their data at the Census RDCs.

4. The long-run solution is to create a cyber-infrastructure environment that creates the data infrastructure, is securely protected from hostile attacks as well as inadvertent disclosure, is accessible to the user community from many locations (ideally the desktop of the user), and generates statistically valid inferences from the above data infrastructure without the user ever being able to observe enough details to identify any business or person. This requires real-time disclosure protection on an interactive basis. It requires a 21st-century approach to disclosure protection: not simply cell suppression or top coding of some sort, but rather the use of statistical methodology that protects confidentiality in a flexible manner. The use of synthetic methods to create and analyze micro data has made great progress in the last decade, but the statistical and research community has much work to do before these methods are ready for widespread use.

5. Further advances in this drill down data infrastructure will require further advances in both hardware and software. Vast amounts of data will need to be stored and processed in a timely fashion. In addition, data integration methods need further development and refinement. Currently, the data matching programs are capable of developing data matches using a variety of criteria, with measures of the quality of matches. But we often need more than this: we need an ability to deal with differences in units of observation and frequency. We need smart systems that can readily integrate weekly, monthly, quarterly and annual data in terms of both disaggregation (e.g., creating the best possible estimates of weekly data using available data) and aggregation (e.g., creating the best possible annual data). The smart systems need to be able to take data using a variety of different units of observation (e.g., households vs. individuals, establishments vs. firms, larger vs. smaller geographic areas, detailed industry vs. broad sectoral classifications) and integrate them. The smart systems need to use the insights and continuing developments from statisticians on missing data imputation, with statistical software packages that can adjust standard errors appropriately for both sampling variation and the associated imputation. Smart systems are also needed for the collection of data. Many administrative data sources are in principle available on almost a real-time basis (often with filings in the last week, month or quarter) but become available to the statistical system only with a considerable lag.

6. The private sector is already in many ways further along in attaining this vision for profit-making purposes, using data mining techniques. This competition with the public sector in the provision of statistics is not necessarily a bad thing. However, the ability of the scientific community to develop the methodology and standards for data integration and associated analysis is limited for most private sector developments. Of related concern, many in the social science community are already beginning to conduct studies using these private sector data sources.
Again, this is not a bad thing per se, but often the representativeness and statistical properties of these data are not known, and access is not regulated in a manner that permits replication, a critical feature in the advancement of science. A very bad outcome would be that in a few decades the private sector developments have superseded the public sector developments so much that social scientists are essentially forced to use private sector data without adequate quality and methodological standards. The private sector data mining developments will not go away, nor should they. Finding a way to create public-private partnerships and to help set standards for methodology, access and privacy protection that have some commonality across the public and private sector developments would be valuable.

7. Achieving this data vision requires champions. It requires champions in the executive and legislative branches of government who are committed both to developing this type of data infrastructure and to protecting the privacy and confidentiality of respondents. It is hard to find champions in these branches of government since data integration is not a hot-button topic. The ironic thing, of course, is that these branches of government continually ask questions that have critical consequences for the future of the U.S. economy and involve billions or even trillions of dollars, and the only way to answer the questions would be to have this type of data infrastructure available.
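
Challenge 5 above calls for software that adjusts standard errors for both sampling variation and imputation. The essay does not name a specific method; as one hedged illustration, the sketch below applies Rubin's combining rules for multiply imputed data, in which the total variance is the average within-imputation variance plus an inflated between-imputation variance. The function name and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Combine results from m imputed data sets using Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors (sampling variance), one per data set
    Returns the pooled point estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching inputs")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Made-up example: three imputed data sets, each with sampling variance 4.0.
pooled_estimate, pooled_se = combine_imputations([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
print(pooled_estimate, pooled_se)
```

Ignoring the between-imputation term here would understate the standard error (2.0 instead of roughly 2.31), which is the adjustment the challenge describes.
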