ACCESS METHODS FOR UNITED STATES MICRODATA

Similar documents
Triangle Census Research Data Center Notes from information sessions

Research Opportunities at the Triangle Census Research Data Center

An Examination of Monitored, Remote Microdata Access Systems

LOCAL EMPLOYMENT DYNAMICS: Connecting Workers and Jobs. Heath Hayward Geographer Center for Economic Studies U.S. Census Bureau

National Center for Health Statistics Research Data Center. Disclosure Manual. Preventing Disclosure: Rules for Researchers

Research Data Centre network for transnational access - four years of experiences by seven European RDCs

Grand Challenges Making Drill Down Analysis of the Economy a Reality. John Haltiwanger

Improving Access to Documentation on Restricted Labor Market Data

Statistique. Canada. Statistics. Canada

A User s Guide to the PSU-Census Bureau Research Data Center. Mark Roberts & Jennifer Van Hook

Northwest Census Research Data Center (NWCRDC)

Texas Census Research Data Center

Regulations for Data Access

SECURE DATA SERVICE Karen Dennison 4 th Workshop on Data Access, Luxembourg, 26 th March 2012

Statistical Data Stewardship in the 21st Century: An Academic Perspective

New Developments in Data Sharing, Remote Access, Secure Data, and Documentation at the Cornell Institute for Social and Economic Research (CISER)

Census RDC Research Proposal Guidelines

The California Census Research Data Center. Vocational Education Cluster

Expanding the Research Data Center in Research Data Center Approach

DwB - WP4 Improving Access to Microdata

The 2006 Earnings Public-Use Microdata File:

Measuring Innovation: Building a Data Infrastructure

Testimony on behalf of the. Population Association of America/Association of Population Centers

Volume Title: Producer Dynamics: New Evidence from Micro Data. Volume Author/Editor: Timothy Dunne, J. Bradford Jensen, and Mark J.

RESEARCH DATA CENTRES PROGRAM CONTRACT PROCESSING FOR COURSES CONDUCTED IN THE RDC

Challenges of Data Privacy in the Era of Big Data. Rebecca C. Steorts, Vishesh Karwa Carnegie Mellon University November 18, 2014

Women in the Workforce

ESSnet Project. Decentralised Access to EU microdata sets" Network of national safe-centres

Results of the Task Team Privacy

A Comparison of the Tax Burden on Labor in the OECD By Kyle Pomerleau

Susan G. Queen, Ph.D. Assistant Secretary for Planning and Evaluation

The Future of Workforce Management and Buyer Perspectives

Expanding Access to Administrative Data for Research in the United States

An Introduction to Secondary Data Analysis

Challenges in Developing a Small Business Tax payer Burden Model

Minnesota State Colleges and Universities System Procedures Chapter 5 Administration. Guideline Information Security Incident Response

National Data Archive Application for Access to a Licensed Dataset

relating to household s disposable income. A Gini Coefficient of zero indicates

BY Aaron Smith NUMBERS, FACTS AND TRENDS SHAPING THE WORLD FOR RELEASE MARCH 10, 2016 FOR MEDIA OR OTHER INQUIRIES:

CORPORATION FOR PUBLIC BROADCASTING Request for Proposals Moderated Online Focus Groups and Data Analysis

MULTIPLE EMPLOYER PLANS

Statistics Canada Research Data Centres (RDCs) Guide for Researcher Under Agreement with Statistics Canada (Draft)

Statistics Canada Research Data Centres (RDCs) Guide for Researchers under Agreement with Statistics Canada

German Administrative Data

Wealth and Assets Survey Introduction to the survey and results. Simon Robinson and Matthew Steel Office for National Statistics

Leases Summary of outreach meetings with investors and analysts on proposed accounting by lessees May September 2013

FORUM ON TAX ADMINISTRATION

APPLICATION FORM FOR POST OF SENIOR CLINICAL BIOCHEMIST. NB: 5 Curriculum Vitae (unbound) must accompany this Application Form

This data brief is the second in a series that profiles children

Forty years ago when the discovery of North Slope

SAP Splash Privacy Statement

A. KOSTAT IT Vision and Strategy Overview

August, 2005 A COMPARISON OF THE BUSINESS REGISTERS USED BY THE BUREAU OF LABOR STATISTICS AND THE BUREAU OF THE CENSUS

Please visit

Westpac Home Ownership Report AUGUST 2013

Opening Conference of the GINI Project March 2010 Income inequality in historical and comparative perspective

INTERNATIONAL PRIVATE PHYSICAL THERAPY ASSOCIATION DATA SURVEY

Staffing Models in Retail Bank Branches Worldwide Summary & Insights

Technology and Writing Assessment: Lessons Learned from the US National Assessment of Educational Progress 1

What Proportion of National Wealth Is Spent on Education?

P R E S S R E L E A S E

THE RISK OF SOCIAL ENGINEERING ON INFORMATION SECURITY:

Miami-Dade County - Downtown Miami Employment Center Census Tracts Selected for Analysis

Privacy Committee. Privacy and Open Data Guideline. Guideline. Of South Australia. Version 1

Insight for Informed Decisions

How Many People Have Nongroup Health Insurance?

Corus Information Security Group

LMF1.2: Maternal employment rates

MEASURING DISCLOSURE RISK AND AN EXAMINATION OF THE POSSIBILITIES OF USING SYNTHETIC DATA IN THE INDIVIDUAL INCOME TAX RETURN PUBLIC USE FILE

REQUEST; PROPOSED INFORMATION COLLECTION ACTIVITY; COMMENT REQUEST; PROPOSED PROJECT

2. Issues using administrative data for statistical purposes

CHILD CARE CENTERS SELF-ASSESSMENT GUIDE HOW TO MAKE YOUR CHILD CARE CENTER A SAFER PLACE FOR CHILDREN

CERTIFIED. SECURE SOFTWARE DEVELOPMENT with COMMON CRITERIA

Policies and Procedures Audit Checklist for HIPAA Privacy, Security, and Breach Notification

DC Fee Management Mitigating Fiduciary Risk and Maximizing Plan Performance. A Mercer US Point of View

Vermont Behavioral Risk Factor Surveillance System. Strategic Plan and Performance Measures

The Beneficiaries of the Dividend Tax Rate Reduction: A Profile of Qualified Dividend Shareholders

Privacy Policy Version 1.0, 1 st of May 2016

Annex 1: Quality Assurance Framework at Statistics Canada

Data Bulletin Findings from the Medical Expenditure Panel Survey Insurance Component

Statistics on E-commerce and Information and Communication Technology Activity

Access to social housing supports for non-irish nationals including clarification re Stamp 4 holders

Establishing Testing Knowledge and Experience Sharing at Siemens

HIPAA POLICY REGARDING DE-IDENTIFICATION OF PROTECTED HEALTH INFORMATION AND USE OF LIMITED DATA SETS

Statistical Issues in Interactive Web-based Public Health Data Dissemination Systems

The AmeriSpeak ADVANTAGE

INTERNATIONAL COMPARISONS OF PART-TIME WORK

OECD SERIES ON PRINCIPLES OF GOOD LABORATORY PRACTICE AND COMPLIANCE MONITORING NUMBER 10 GLP CONSENSUS DOCUMENT

Benchmark Report. Marketing: Sponsored By: 2014 Demand Metric Research Corporation in Partnership with Ascend2. All Rights Reserved.

Comprehensive emissions per capita for industrialised countries

Why actuaries are interested in demographic issues and why others should listen to them IAA Population Issues Working Group

One Hundred Twelfth Congress of the United States of America

Investing in the United States

2002 Census of Government Employment Methodology

IDAHO STATE UNIVERSITY POLICIES AND PROCEDURES (ISUPP) HIPAA Privacy - De-identification of PHI 10030

Statistics Canada data on Aboriginal children

INTERNATIONALE STICHTING ALZHEIMER ONDERZOEK (ISAO) 2015 GRANT APPLICATION GUIDELINES

Università degli Studi di Roma. "Tor Vergata" Facoltà di Economia. Corso di laurea triennale in. Economia e Management.

Research Data Management. Jeff Moon Data & Government Information Librarian Academic Director, Queen s Research Data Centre

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS. Managing Statistical Confidentiality. Access

Transcription:

ACCESS METHODS FOR UNITED STATES MICRODATA Daniel Weinberg, US Census Bureau John Abowd, US Census Bureau and Cornell U Sandra Rowland, US Census Bureau (retired) Philip Steel, US Census Bureau Laura Zayatz, US Census Bureau August 20, 2007 This presentation is intended to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed on methodological issues are those of the presenter and not necessarily those of the U.S. Census Bureau. 1

Traditional Methods of Data Access 1. Tabulations Confidentiality of respondents protected by limiting the number of cells relative to the number of observations May use complementary suppression 2

Traditional Methods of Data Access 2. Public-Use Microdata Samples Confidentiality of respondents protected by omitting some information and modifying some of the remaining information Methods include top- and bottomcoding, re-categorization, noise infusion, swapping, and geographic aggregation 3

Four recent data access approaches Licensing providing restricted data directly to individuals or organizations under a confidentiality protection agreement. Research data centers statistical enclaves for research purposes. Remote access submission of analysis requests (typically computer programs) Synthetic data data that mirrors the properties of the collected data yet fully protects the confidential data provided by respondents. 4

Licensing Aspects of the license document : Defines the information subject to the agreement; Specifies the individuals who may have access to subject data; Describes limitations of disclosure and clearance procedures; Lists administrative requirements; Requires that copies of publications based on the data be sent to the sponsoring agency; 5

Licensing, continued Requires the organization to contact the sponsoring agency in case of (suspected) breaches of security; Requires the organization to agree to unannounced and unscheduled inspections; and Reviews the security requirements for the maintenance of, and access to, subject data, and describes penalties for violations. 6

Licensing, continued Examples of U.S. uses: National Center for Education Statistics Division of Science Resource Statistics Bureau of Labor Statistics Panel Study of Income Dynamics 7

Research Data Centers (Data Enclaves) The nine Census Bureau facilities are partnerships with academic and nonprofit organizations (one at HQ). Staffed by a Census Bureau employee. Meet all physical and computer security requirements. Host Census Bureau and other federal agency data. Researchers become Special Sworn Status employees of the Census Bureau. 8

Research Data Centers, continued Goals: Increase the utility and quality of Census Bureau data products; Encourages knowledgeable researchers to become familiar with an agency s data products and data collection methods; Research can address important policy questions without the need for additional data collection; 9

Research Data Centers, continued Goals, continued: Improves the quality of data collection and processing practices by subjecting current methods to testing through additional uses; Allows for data linking not possible with aggregates that leverage the value of existing data; 10

Census RDC Projects Must Meet Five Standards Benefit to Census Bureau programs (13 criteria used) Scientific merit Requires non-public data Must be feasible No risk of disclosure Proposals must also pass a review by the Census Bureau policy office. 11

Other Research Data Centers Examples: Bureau of Labor Statistics HQ National Center for Health Statistics HQ National Institute of Child Health and Human Development (3 locations) National Opinion Research Center (2 locations) Canada, UK, Germany 12

Remote Access Data files usually edited in advance to reduce the possibility of disclosure. Employ automated and manual filters that block certain kinds of queries and results. Must be monitored automatically and/or manually for disclosure avoidance. A difficult issue is complementary disclosure review. 13

Remote Access, continued Methodologies Remote job execution systems - an email interface that allows users to send programs; processing is usually done in batch mode. Web interface with custom-built or custom-tailored software. 14

Examples Remote Access, continued Luxembourg Income Study Canada, Denmark, the Netherlands, Sweden, Australia In the U.S.: National Centers for Education and Health Statistics, Census Bureau 15

Synthetic Microdata Has a relatively long history (e.g., U.S. 1990 decennial census, 1989 U.S. Survey of Consumer Finances) Fully synthetic data - posterior predictive distribution from a modelbased data analysis [Rubin, Fienberg] Partially synthetic data - synthesizing either the sensitive values or the identifiers of sensitive cases [Little] 16

Synthetic Microdata, continued Standards to meet: Protect confidentiality at least as well as other methods. Provide inferences that are consistent with the inferences an analyst would have made from the original data (analytical validity). 17

Synthetic Microdata, continued Major Census Bureau uses (1): Survey of Income and Program Participation linked to longitudinal social security benefit histories and longitudinal employeeemployer earnings reports. Cleared for release More tests of analytical validity of value 18

Synthetic Microdata, continued Major Census Bureau uses (2): Longitudinally integrated employer-employee data from the Longitudinal Employer- Household Dynamics (LEHD) program in OnTheMap Internet application tabulating origindestination data. 19

Synthetic Microdata, continued Major Census Bureau uses (3,4): Longitudinal Business Database Beta version released for testing American Community Survey Not yet released 20

Concluding Comment Threats to public microdata release are increasing. Statistical agencies must respond to this threat in order to meet the needs of their users. Statistical agencies have responded - through licensing, research enclaves, remote access, and synthetic data. 21