A User's Guide to the PSU-Census Bureau Research Data Center
Mark Roberts & Jennifer Van Hook
Outline
- Introduction to RDCs
- Data Resources in Health and Demography
- Application Process for NCHS
- Data Resources in Economics
- Application Process for Census Bureau Data
- Using the RDC
- Special Sworn Status
- Conducting Research
- Resources at Penn State
- Synthetic Data Sets
Introduction to RDCs
Social Science & Data
- Many social scientists rely on large national-level data sets
- Everyone can access the same public-use data (e.g., ACS, NHANES, SIPP)
- Innovation is fueled by how we use data:
  - New statistical and measurement methods
  - Creatively combining data sets
  - Access to restricted-use data
Limitations of Public-use Data
- To protect confidentiality, public-use data:
  - Are completely de-identified
  - Have limited geography (e.g., county, state, country of birth)
  - Group response categories (e.g., income)
  - Exclude sensitive personal characteristics (e.g., weight)
  - Exclude sensitive economic & social data (e.g., linked tax and Social Security data)
  - Are perturbed (e.g., age)
- Data availability has become more limited over the past decade
Census Research Data Centers
- Established in the 1990s by the Center for Economic Studies (CES) at the U.S. Census Bureau
- Network of highly secure computer labs
- Permits researchers to access restricted-use Census and NCHS data without travel
- Major Research-I universities have a Census RDC: Michigan, Minnesota, UCLA, UNC, Berkeley, Stanford, Cornell, Chicago, Columbia, Boston area, Texas, Penn State
Penn State's Research Data Center
- Opening in March 2014
- Location: 203E Pattee
- Supported by:
  - NSF
  - Office of the President
  - SSRI
  - PRI
  - University Libraries
  - Colleges of the Liberal Arts, Health and Human Development, Agriculture, and Science (Eberly)
Restricted Data Can Answer Questions Like These:
- How much has income inequality changed over time?
- What are the characteristics of firms that provide pension benefits versus those that do not?
- How does natural gas development affect population distribution and health?
Working papers and bibliographies available here:
http://ideas.repec.org/s/cen/wpaper.html
http://www.cdc.gov/rdc/b6pubeyond/pub611.htm
RDC Data Resources in Health and Demography
Primary Data Producers
- U.S. Census Bureau
- National Center for Health Statistics
- National Institute of Justice
- Agency for Healthcare Research and Quality
- NOT NCES or Add Health (we have other arrangements for these data sets)
Added Value
- Detailed geographic identifiers
- Unperturbed data (e.g., age)
- More detail on characteristics & events:
  - Place of birth
  - Date of birth
  - Occupation
  - Income
- Administrative record linkages
Demographic Data
- American Community Survey (1996-2009)
- American Housing Survey (1984-2009)
- Decennial Census (1970-2000)
- Current Population Survey March Supplements (1967-2005)
- National Longitudinal Survey (original cohorts)
- Survey of Income and Program Participation
Linkage is possible at detailed levels of geography.
Complete list: http://www.census.gov/ces/dataproducts/demographicdata.html
NCHS Data (unlinked)
- National Health and Nutrition Examination Survey I, II, and III
- National Ambulatory Medical Care Survey
- National Hospital Ambulatory Medical Care Survey
- National Survey of Ambulatory Surgery
- National Hospital Discharge Survey
- National Nursing Home Survey
- National Home and Hospice Care Survey
- National Employer Health Insurance Survey
- National Health Provider Inventory
- National Health Interview Survey
- National Immunization Survey
- Longitudinal Study on Aging
- National Survey of Family Growth
- State and Local Area Integrated Telephone Survey:
  - Health
  - Child Well-Being and Welfare, 1997
  - National Survey of Early Childhood Health
  - National Survey of Children with Special Health Care Needs
  - National Survey of Children's Health
  - National Asthma Survey
- Vital Statistics (Birth, Mortality, Marriages and Divorces, Fetal Death, National Death Index)
http://www.cdc.gov/nchs/r&d/rdc.htm
NCHS Data (linked)
- National Health Interview Survey with:
  - Mortality Data, 1986-2000
  - Medicare Enrollment and Claims Data
  - Social Security Administration Retirement, Survivors, and Disability Insurance Data, 1962-2003
  - Social Security Administration Supplemental Security Income Data, 1974-2003
- National Health and Nutrition Examination Survey I Epidemiologic Follow-up Study with:
  - Mortality Data, 1971-2000
  - Medicare Enrollment and Claims Data, 1991-2000
- National Health and Nutrition Examination Survey I with:
  - Social Security Administration Retirement, Survivors, and Disability Insurance Data, 1962-2003
  - Social Security Administration Supplemental Security Income Data, 1974-2003
- National Health and Nutrition Examination Survey II with:
  - Mortality Data, 1976-2000
  - Medicare Utilization and Expenditure Data, 1991-2000
- National Health and Nutrition Examination Survey III with:
  - Mortality Data, 1988-2000
  - Medicare Enrollment and Claims Data (CMS, 1991-2000)
http://www.cdc.gov/nchs/r&d/rdc.htm
AHRQ Data
- Medical Expenditure Panel Survey Datasets:
  - Household Component-Insurance Component linked file (1996-1999, 2001)
  - Nursing Home Component (1996)
  - Medical Provider Component (except directly identifiable data)
  - Two-Year, Two-Panel Files
  - Area Resource File (county-level data that can be linked to MEPS-HC)
  - MEPS-HC Public Use Files
- AHRQ will create a custom extract for each project.
http://www.meps.ahrq.gov/mepsweb/data_stats/onsite_datacenter.jsp
Application Process for NCHS Data
Step 1: Determine a need for restricted data
Step 2: Determine the best mode of access (remote, NCHS, Census)
Step 3: Develop your research proposal
Step 4: Submit your proposal for review, emailed as one document to Peter Meyer, RDC Director, at rdca@cdc.gov
Step 5: Wait for comments from the review committee and respond quickly to expedite review
Step 6: Update your proposal when there are changes
http://www.cdc.gov/rdc/b3prosal/pp300.htm
Application Process for NCHS Data: Proposal Outline
A. Abstract
B. Research Question
C. Background
D. Public Health Benefit
E. Data Requirements:
   - Survey, Years, Files
   - Restricted Variables
   - Non-NCHS Data
   - Merge Variables
F. Methodology
G. Output:
   - Overview
   - Examples/Table Shells
   - Presentation of Results
H. Data Dictionary
I. References
J. Other Authors
K. Resumes/C.V.
Examples:
- NCHS: http://www.cdc.gov/rdc/data/b3/sampleproposal.pdf
- Van Hook et al. (see handout)
Application Process for NCHS Data: Tips
- Consult with Penn State's RDC administrator and NCHS RDC staff
- Smaller-scope projects are easier to get approved
- Proposals can be slim on theory and literature review
- Lots of detail about the data and file construction is required:
  - What variables will you use and/or construct?
  - What files do the variables come from?
  - How will the data files be constructed?
  - What files are merged, by what identifiers, in what order?
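One way to answer the file-construction questions above is to write the merge plan down before drafting the proposal. The following is a minimal sketch, not an NCHS requirement or template: the file names, identifier names, and the `MergeStep` and `describe_plan` helpers are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeStep:
    """One merge in the file-construction plan (all names are hypothetical)."""
    left: str      # file that receives the new variables
    right: str     # file merged onto the left file
    keys: tuple    # identifiers used to match records
    how: str = "left"  # join type; "left" keeps every record of the left file


def describe_plan(steps):
    """Return one numbered sentence per merge, in the order they will be run."""
    lines = []
    for number, step in enumerate(steps, start=1):
        key_list = ", ".join(step.keys)
        lines.append(
            f"{number}. Merge {step.right} onto {step.left} "
            f"by {key_list} ({step.how} join)"
        )
    return lines


# Hypothetical plan: attach restricted geography to a person-level survey file,
# then attach county-level contextual data using the county identifier.
PLAN = [
    MergeStep("survey_person_file", "restricted_geography_file", ("PERSON_ID",)),
    MergeStep("survey_person_file", "county_contextual_file", ("COUNTY_FIPS",)),
]

if __name__ == "__main__":
    for line in describe_plan(PLAN):
        print(line)
```

The numbered sentences it prints can be pasted into the Data Requirements section of a proposal and edited by hand, so that the files, identifiers, and merge order are stated explicitly.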
Application Process for Census Data
- Preliminary Proposal: approved by local RDC Director or Administrator
- Final Proposal: for Census data this includes a Predominant Purpose Statement
- Examples
- Must identify all data sets you will need for the project
NCHS versus Census

                                    NCHS                                              Census
Information about data in proposal  Lots of detail about data and variables required  Less detail required
Benefit to statistical agency       Not required                                      Required
Time to gain approval               Usually 3-4 months                                At least 6 months; more time for projects requesting IRS data
Who merges the data files?          NCHS programmers                                  Researchers
Data access                         Only the variables specified in proposal          All variables in requested data files
Disclosure review                   1-2 days                                          3-4 weeks