German Record Linkage Center




German Record Linkage Center
Microdata Computation Centre (MiCoCe) Workshop, Nuremberg, 29 April 2014
Johanna Eberle, FDZ of BA at IAB

Agenda
- Basic information on German RLC
- Services & Software
- Projects (past/present)
- Access to linked IAB data
- Conclusion

German Record Linkage Center: Basic information
- Established: 2011
- Directors: Prof. Dr. Rainer Schnell (University of Duisburg-Essen), Stefan Bender (Research Data Centre, Nuremberg)
- Current staff (N): Dr. Manfred Antoni, Dr. Christopher-Johannes Schild, Johanna Eberle
- Funding: DFG (German Research Foundation), funding programme Scientific Library Services and Information Systems (LIS)
- Funding period: 2011-2014 (follow-up grant proposal submitted in Jan 14)

Objectives
- Sustained increase in the number and quality of record linkage applications in scientific research (across various fields)
- Development of new (linked) data sources for research
- Performing service tasks (FDZ Nuremberg) and research on technical solutions (University of Duisburg-Essen)

Activities of the GRLC's two locations
FDZ Nuremberg (focus: service facility)
- Project advisory center
- Conducting data linkage
- Recruitment of junior researchers
University of Duisburg-Essen (focus: research unit)
- Development and evaluation of algorithms
- Development of linkage software
- Dissemination of current research results
- Recruitment of junior researchers

Services of G-RLC
- Individual advice during the planning and realization stages of data linkage projects
- Conducting data linkages as commissioned work
- Updating and maintaining the record linkage software MTB (Merge ToolBox)
- Acting as a trustee for the linkage of sensitive datasets
- Organization of regular workshops and tutorials on record linkage, participation in sessions (JSM, ISI, ESRA, SHIP, IHDL), and presence at national and international conferences

Webpage: www.record-linkage.de
- Basic information on record linkage (concepts, literature, current research)
- Record linkage bibliography
- Overview of past and present projects and partners
- Publications of G-RLC staff and Working Paper Series
- Downloads: MTB, Safelink, TDGen (test data generator)

German RLC Working Paper Series
Current volumes 2014:
- Gramlich T 2014. STROKES Record Linkage der Schlaganfälle in Hessen 2007-2010. German RLC Working Paper No. wp-grlc-2014-03.
- Schild CJ, Antoni M 2014. Linking Survey Data with Administrative Social Security Data - the Project Interactions Between Capabilities in Work and Private Life. German RLC Working Paper No. wp-grlc-2014-02.
- Kroll M 2014. A Graph Theoretic Linkage Attack on Microdata in a Metric Space. German RLC Working Paper No. wp-grlc-2014-01.
All working papers 2011-2014 can be downloaded for free from www.record-linkage.de

Merge ToolBox (MTB)
- Collection of Java programs and one GUI
- Platform-independent (tested on Windows, MacOS, Unix)
- MTB can be downloaded for free (non-commercial use only) from www.record-linkage.de
- MTB is widely used in Germany (e.g. evaluation of cancer registry systems)

Merge ToolBox (MTB): Features
- Probabilistic record linkage with EM estimation
- Many different string similarity functions (e.g., Jaro, N-Gram, Levenshtein)
- Array matching
- Fuzzy blocking
- Privacy-preserving record linkage with Bloom filters (Safelink)
References:
- Schnell, R., Bachteler, T. & Bender, S. (2004): A Toolbox for Record Linkage. In: Austrian Journal of Statistics, Vol. 33 (1-2), pp. 125-133.
- Schnell, R., Bachteler, T. & Reiher, J. (2009): Privacy-preserving record linkage using Bloom filters. In: BMC Medical Informatics and Decision Making, Vol. 9, 41.
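To make the Bloom filter idea on this slide more concrete, here is a minimal Python sketch of how a name could be encoded and compared without exchanging the clear-text name. It is only an illustration of the general technique, not the code of MTB or Safelink: the function names, the filter length of 1000 bits, the 30 hash functions, and the secret key are all assumptions chosen for the example.

```python
from __future__ import annotations

import hashlib
import hmac


def bigrams(name: str) -> set[str]:
    """Split a lower-cased, blank-padded name into character bigrams."""
    padded = f" {name.strip().lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name: str, size: int = 1000, num_hashes: int = 30,
                 key: str = "shared-secret") -> set[int]:
    """Return the bit positions set to 1 in the Bloom filter for a name.

    Each bigram is mapped to num_hashes positions by double hashing,
    (h1 + i * h2) mod size, using two keyed hashes (HMAC-SHA1, HMAC-MD5).
    All parameter values are illustrative assumptions.
    """
    key_bytes = key.encode("utf-8")
    positions: set[int] = set()
    for gram in bigrams(name):
        msg = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key_bytes, msg, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key_bytes, msg, hashlib.md5).digest(), "big")
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % size)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two sets of bit positions (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Two data holders who share the key can each encode their names locally and pass only the bit positions to the linkage unit; similar names such as "Smith" and "Smyth" share most bigrams and therefore obtain a high Dice coefficient, while unrelated names overlap only by chance.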

Merge ToolBox (MTB): Screenshot

Linkage Projects
- Current focus: linkage of data on individuals or establishments with administrative data of the German Federal Employment Agency (BA) / Institute for Employment Research (IAB)
- Advancement of methods:
  - Further development of preprocessing and data cleaning routines (currently: Stata, R; future: Perl)
  - Speed-up of preprocessing and linkage processes

Linkage of the German SAVE study with administrative employment biographies (past project)
Linkage of two data sets:
- Wave 9 of the study SAVE ("Saving and old-age provision in Germany"), conducted by the Munich Center for the Economics of Aging (MEA): survey on households' saving and asset choices with a special focus on old-age provision
- Administrative Integrated Employment Biographies (IEB) data of the Institute for Employment Research
Purpose:
- Link survey data with administrative information about periods of employment and social security contributions
- Enhance information from the household survey with administrative data on the labour market biographies of respondents and (if applicable) their partners
Linkage performed by the G-RLC on behalf of the MEA institute

Linkage of Bureau van Dijk company data and IAB establishment data (current project)
- Linkage of Bureau van Dijk enterprise data (German financial company information and business intelligence) with administrative establishment data of the Institute for Employment Research
- Task: identification of establishments (IAB) within enterprises (BvD) using company name and legal form
- Aims:
  - New encompassing data product combining information on establishments and company background (company-level linked employer-employee data)
  - Opening up new research questions: company-level vs. establishment-level factors; relationship between labor and productivity / capital output
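Matching on company name and legal form usually starts with a normalization step. The following Python sketch shows one possible way to split a raw company name into a cleaned name and a legal form token. It is a hypothetical illustration, not the project's actual routine (the slides mention Stata and R for preprocessing); the list of legal forms is deliberately short and incomplete, and the function name is an assumption.

```python
import re

# Hypothetical, incomplete list of German legal forms, for illustration only.
LEGAL_FORMS = ["gmbh & co. kg", "gmbh", "ag", "kg", "ohg", "e.v.", "gbr"]


def split_legal_form(raw_name):
    """Return (normalized name without legal form, legal form or "")."""
    # Lower-case and collapse runs of whitespace.
    name = re.sub(r"\s+", " ", raw_name.strip().lower())
    # Try the longest legal forms first so "gmbh & co. kg" wins over "kg".
    for form in sorted(LEGAL_FORMS, key=len, reverse=True):
        if name.endswith(" " + form):
            return name[: -len(form)].strip(" ,"), form
    return name, ""
```

With the two parts separated, the legal form can serve as an exact comparison field while the remaining name is compared with an approximate string similarity function.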

Consulting for the project "Record Linkage between IAB-SOEP Migration Sample and administrative data"
- Project head: P. Trübswetter (Institute for Employment Research)
- Draw a sample of households with migration background from Federal Employment Agency data
- Integration of the subsample into the GSOEP survey (German Socioeconomic Panel Study)
- Link survey data to administrative data of the Institute for Employment Research
- Advantages:
  - Precise data on the employment history of participants over time
  - Longitudinal analyses possible already after wave 1 of the survey

Access to linked IAB data
- Currently 2 modes to access data linked to sensitive IAB data:
  1) On-site use at FDZ and remote data access
  2) Data transfer to partner institution
- Both require judicial approval by the German Ministry of Social Affairs; mode 2 is more extensive
- Mode 1 requires data sets to be transferred to FDZ
- So far there is no ad-hoc way of linking micro data (for data protection reasons), and data sets cannot be stored at separate locations
- Linkage process:
  - Name and address data are separated from other contents
  - Anonymous linkage (PPRL) is possible and does not require identifiers to be transferred
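The separation of name and address data from other contents can be pictured with a small Python sketch. It is a simplified assumption of how such a split might look, not a description of the FDZ's actual procedure: the function name, the field names, and the sequential pseudo ID `pid` are all hypothetical.

```python
def separate_identifiers(records, id_fields):
    """Split records into an identifier file and a content file.

    Both files carry the same pseudo ID (here simply the row number, an
    assumption), so linkage results obtained from the identifier file can
    later be attached to the content file without exposing names there.
    """
    identifiers, contents = [], []
    for pid, rec in enumerate(records):
        identifiers.append({"pid": pid, **{k: rec[k] for k in id_fields}})
        contents.append(
            {"pid": pid, **{k: v for k, v in rec.items() if k not in id_fields}}
        )
    return identifiers, contents
```

Under this scheme the unit that performs the linkage only needs the identifier file, and the researcher who analyses the data only needs the content file plus the linked pseudo IDs.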

Conclusions
- Foster the creation of linked data sets for scientific research
- German Record Linkage Center gathers knowledge and resources regarding the linkage of micro data:
  - Data cleaning procedures, approximate string matching algorithms, blocking strategies, choice of matching parameters
  - Software and hardware requirements
  - Legal provisions regarding linkage of micro data (e.g., informed consent questions)
- Privacy-preserving record linkage might reduce privacy concerns
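How blocking and approximate string matching work together can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not MTB code: the function names are invented for the example, the blocking key is passed in by the caller, and the similarity is a simple Levenshtein distance normalized by the longer string.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Levenshtein distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def candidate_pairs(file_a, file_b, block_key):
    """Yield only those record pairs that agree on the blocking key."""
    blocks = defaultdict(list)
    for rec_b in file_b:
        blocks[block_key(rec_b)].append(rec_b)
    for rec_a in file_a:
        for rec_b in blocks.get(block_key(rec_a), []):
            yield rec_a, rec_b
```

Blocking on, say, year of birth keeps the number of comparisons manageable, and a pair is then accepted as a possible match when the name similarity exceeds a threshold; both the blocking key and the threshold are among the matching parameters that have to be chosen per project.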

Thank you for your attention!
Visit the German Record Linkage Center online: www.record-linkage.de
Or contact us by email: recordlinkage@iab.de
www.iab.de

BACKUP

Technical equipment
- Compute / file server embedded in a secure IT environment at the Institute for Employment Research: multi-core processor, huge RAM, large disk space
- Software:
  - Statistical software packages: Stata, R
  - Merge ToolBox
- Routines:
  - Data cleaning: scripts in Stata & R