Data Management for Biobanks



Similar documents
A collaborative platform for knowledge management

Software Architecture Document

Data and beyond

Connecting Basic Research and Healthcare Big Data

<Insert Picture Here> The Evolution Of Clinical Data Warehousing

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

ECRIN (European Clinical Research Infrastructures Network)

A Generic Database Web Service

Data Integration and ETL with Oracle Warehouse Builder: Part 1

2012 LABVANTAGE Solutions, Inc. All Rights Reserved.

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

BUILDING OLAP TOOLS OVER LARGE DATABASES

OBIEE DEVELOPER RESUME

<Insert Picture Here> Oracle SQL Developer 3.0: Overview and New Features

The basic data mining algorithms introduced may be enhanced in a number of ways.

MS Designing and Optimizing Database Solutions with Microsoft SQL Server 2008

System requirements. Java SE Runtime Environment(JRE) 7 (32bit) Java SE Runtime Environment(JRE) 6 (64bit) Java SE Runtime Environment(JRE) 7 (64bit)

MEGA Web Application Architecture Overview MEGA 2009 SP4

SERVICE ORIENTED ARCHITECTURE

Oracle Warehouse Builder 10g

Oracle Database 11g Comparison Chart

AN INTEGRATION APPROACH FOR THE STATISTICAL INFORMATION SYSTEM OF ISTAT USING SDMX STANDARDS

Software Architecture Document

Communiqué 4. Standardized Global Content Management. Designed for World s Leading Enterprises. Industry Leading Products & Platform

An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives

SimCorp Solution Guide

Denodo Data Virtualization Security Architecture & Protocols

Abdullah Mohammed Abdullah Khamis

The maturity level of APEX. Patrick Hellemans Competence Manager Technology


RRF Reply Reporting Framework

<Insert Picture Here> Move to Oracle Database with Oracle SQL Developer Migrations

APPLICATION COMPLIANCE AUDIT & ENFORCEMENT

Oracle Identity Analytics Architecture. An Oracle White Paper July 2010

Filtering the Web to Feed Data Warehouses

Adam Rauch Partner, LabKey Software Extending LabKey Server Part 1: Retrieving and Presenting Data

enterprise professional expertise distilled

Developing Microsoft SQL Server Databases MOC 20464

Oracle Architecture, Concepts & Facilities

Oracle PharmaGRID Response. Dave Pearson Oracle Corporation UK

An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis

IBM Rational Asset Manager

Oracle 11g New Features - OCP Upgrade Exam

Introduction: Database management system

Report of the DTL focus meeting on Life Science Data Repositories

An Oracle White Paper September Oracle Database 11g DICOM Medical Image Support

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Software Architecture Document

i2b2 Clinical Research Chart

DAWIS-M.D.-adata warehouse system for metabolic data

mframe Software Development Platform KEY FEATURES

Oracle9i Database Release 2 Product Family

SAP BODS - BUSINESS OBJECTS DATA SERVICES 4.0 amron

EHR Standards Landscape

Oracle Data Integrator 11g: Integration and Administration

EDG Project: Database Management Services

Customer Bank Account Management System Technical Specification Document

Metadata Repositories in Health Care. Discussion Paper

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Metadata Strategies: your guide through the data jungle Achim Granzen EMEA Technology Strategist

LabKey Server: An open source platform for scientific data integration, analysis, and collaboration

Module 1: Getting Started with Databases and Transact-SQL in SQL Server 2008

Query Organizer Tool for Oracle DBA Chennakeshava Ramesh Oracle DBA

By Makesh Kannaiyan 8/27/2011 1

Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd

Development and Implementation

Reverse Engineering in Data Integration Software

Writing Queries Using Microsoft SQL Server 2008 Transact-SQL

Gradient An EII Solution From Infosys

Establish and maintain Center of Excellence (CoE) around Data Architecture

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April Page 1 of 12

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Structure of the presentation

The Inside Scoop on Hadoop

TRANSFoRm: Vision of a learning healthcare system

B.Sc (Computer Science) Database Management Systems UNIT-V

Introduction. Introduction: Database management system. Introduction: DBS concepts & architecture. Introduction: DBS versus File system

Oracle Database 11g: New Features for Administrators DBA Release 2

An Oracle White Paper March Managing Metadata with Oracle Data Integrator

Course 20464C: Developing Microsoft SQL Server Databases

i2b2 Clinical Research Chart

Development of Open Source RESTful WHOIS. Haikuo Zhang

Hyperion Data Relationship Management (DRM)

In-Database Analytics

Oracle Data Integrator: Administration and Development

Developing Microsoft SQL Server Databases 20464C; 5 Days

ORM IN WEB PROGRAMMING. Course project report for 6WW Erik Wang

Oracle Big Data Strategy Simplified Infrastrcuture

Writing Queries Using Microsoft SQL Server 2008 Transact-SQL

BusinessObjects XI R2 Product Documentation Roadmap

<Insert Picture Here> Michael Hichwa VP Database Development Tools Stuttgart September 18, 2007 Hamburg September 20, 2007

Extended RBAC Based Design and Implementation for a Secure Data Warehouse

Content Management Systems: Drupal Vs Jahia

Migrating a Discoverer System to Oracle Business Intelligence Enterprise Edition

Data Domain Profiling and Data Masking for Hadoop

Digital Libraries and Content Management

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

Transcription:

Data Management for Biobanks JOHANN EDER CLAUS DABRINGER MICHAELA SCHICHO KONRAD STARK University of Klagenfurt and University of Vienna

Data Management for Biobanks Local Integration Project Support Anonymization of Sensitive Data Global Integration

Local Integration 3 Source: MUG Biobank

Project Support for Medical Research Computer Supported Collaborative System Medical research is a highly complex collaborative process Interdisciplinary environment (medical, biological and technical scientists) Distributed resources: patient records, images, gene expression profiles, survival data, lifestyle data Different access rights and permissions (Privacy of patient records, k-anonymization for external cooperation) Cooperative data annotation, formulate hypotheses, statistical analyses Development of a Virtual Scientific Environment allowing to share data and knowledge among collaborating groups

Main Requirements of a CSCW System - R(1) User and Role Management - R(2) Transparency of physical storage - R(3) Flexible data presentation - R(4) Flexible integration and composition of services - R(5) Support of cooperative functions - R(6) Data-coupled communication mechanisms - R(7) Knowledge creation and transparency

Gene Expression Analysis Workflow 6

Anonymization of sensitive-data Release of person-specific data Protection of the anonymity of the individuals Data anonymization Eliminate explicit identifying attributes Transformation of attribute values (generalisation, recoding,..) Problems Information loss: Transformed data may not be applicable in further processing Data quality: Different data quality decrease in transformations

Anonymisation of patient data Our approach Released records meet the k-anonymity constraint Anonymization is influenced by data quality requirements Search for appropriate anonymization solution: Weighted generalisation hierarchies User-defined priorities and generalisation limits Achieve k-anonymity in distributed sources Generate data twins for each data source separately Consider selection and projection criteria

Example: Data to be released

Example: Transformations 10 Generation of data twins Use of international classifications: e.g. ICD-N Remove columns having date type Calculate body mass index

Example: Anonymized data

Global Integration BBMRI Project 12

Challenges within BBMRI Partiality A Biobank as a node in a federation needs descriptive capabilities to be useful for other nodes in the network and it needs the capability to make use of other biobanks. This needs careful design of metadata about the contents of the biobank, the acceptance and interoperability of heterogeneous partner resources. Auditability Documenting the origins and the quality of data and specimens, documenting the sources used for studies and the methods and tools and results of studies is essential for the reproducibility of results. Longevity Incorporating of changes like: new therapies, new analytical methods, new legal regulations and new IT standards. Confidentiality IT-Infrastructure must provide means to protect the confidentiality of protected data and enable the best possible use of data for studies respecting confidentiality constraints.

GATIB II SP4 Research Goals 1. Integration of Biobanks within BBMRI Federation of European Biobanks as essential research infrastructure Service based architecture to implement BBMRI interface specifications Integration of clinical databases and data collections (patient data, clinical records, questionnaires, etc. Security and privacy is a key issue 2. Distributed CSCW system for multicenter studies Researchers workbench extended for distributed work Modeling and enforcement of legal/ethical rules for data exchange Coordination for joint projects 3. Versioning and Evolution of databases historical databases to support longevity of data collection Mapping between different versions (e.g. diagnosis codes, etc.) Re-evaluation of results when underlying data (e.g. GeneOntology) changes

IT Architecture Web Browser (Internet Explorer, Firefox) Apache Tomcat Application Server Distributed CSCW (Scientific Workbench) Http Requests JSP Pages Experiment Server Other Exp. Server Data Management Data Loader Java Objects K-Anonymity Data Analysis Tools Virtual Data Warehouse Data Integration Layer Meta Data Repository Versioning Data Mart JDBC/ODBC JDBC/ODBC PL/SQL Parser XML JDBC/ODBC BBMRI-Services BBMRI Host Sample DB Clinical DBs Gene Expressions Query Repository Experiment Storage Oracle DB MS Access / Excel / SQL / XML Flat Files / DB-Schema XML Files Oracle DB

Example Workflow

Conclusions Information system for biobanks Requirements Challenges in the data management Integration problems Support for medical processes E.g.: Gene Expression Analysis Protection of the anonymity of patients Data twins K-anonymity Challenges of the global integration BBMRI Project