cloud4health: cloud-based text mining to exploit the value of free-text documentation within electronic health records




cloud4health: cloud-based text mining to exploit the value of free-text documentation within electronic health records
Dr. Martin Sedlmayr, Chair of Medical Informatics
10 Years Medical Informatics Erlangen, 26.04.2013

2 What about Free Text?
99.9%, 71%, 80%, 53% (chart values; their labels are not preserved in the transcript)

3 Basic Idea
- Text mining
- Text annotation
- Deidentification

4 What is Cloud Computing?
- Metaphor / paradigm
- Unlimited (elastic) resources
- Everybody can access from everywhere

5 cloud4health
- Focus: cloud services for Big Data analytics in medicine
- Volume: EUR 4 million, 46.5 person-years
- Duration: 3 years, since 01.12.2011

6 Secondary Use
- BI
- Data Warehouse
- ETL

7 ETL (diagram, local)
- Stages: Extract, Transform, Load
- Sources: CDMS, HIS, SQL, CSV, ...
- Processing: PIDgen, Terminology, DeIdent, Facts, Dimensions, Aggregation
- Outputs: Query-Tool, WWW, XLS, Statistics, Visualization, ...
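The local Extract-Transform-Load flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the column names, the sample rows, and the `pseudonym` helper (a keyed hash standing in for the PIDgen step) are all hypothetical.

```python
import csv
import hashlib
import hmac
import io
from collections import Counter

# Hypothetical secret; a real pseudonymization service would manage this key.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonym(patient_id: str) -> str:
    """Stand-in for the PIDgen step: a keyed hash of the patient ID."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def extract(csv_text: str) -> list:
    """Extract: read rows from a CSV export of a source system."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list) -> list:
    """Transform: replace the patient ID by a pseudonym and drop the name."""
    return [
        {"pid": pseudonym(row["patient_id"]), "code": row["diagnosis_code"]}
        for row in rows
    ]


def load(facts: list) -> Counter:
    """Load and aggregate: count facts per diagnosis code."""
    return Counter(fact["code"] for fact in facts)


# Hypothetical sample export (assumed column names).
SAMPLE = """patient_id,name,diagnosis_code
1001,Anna Example,M16.1
1002,Ben Example,M16.1
1001,Anna Example,L40.0
"""

facts = transform(extract(SAMPLE))
counts = load(facts)
print(counts)
```

The same patient always maps to the same pseudonym, so facts from different sources can still be linked after the identifying columns are dropped.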

8 ETL (diagram, local and cloud)
- Stages: Extract, Transform, Load
- Sources: CDMS, HIS, SQL, CSV, letters
- Processing: PIDgen, Terminology, DeIdent, Textmining, Facts, Dimensions, Aggregation
- Outputs: Query-Tool, WWW, XLS, Statistics, Visualization, ...

9 Architecture (diagram)
- Zones: Hospital, Study Portal, Trusted Cloud
- Components: ETL, Anonymization, Data Mining, Data Warehouse, Text Mining
- Data: Anonymized Text, Structured Data, Annotations

10 Architecture (diagram)
- Components: Data Extraction, Deidentification, Text Mining, Text Annotation, Data Access, Data Analysis, Data Mining
- Data: Structured Data, Annotation Data, K-Anonym Export
- Markers: A, B, C, D
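The slide names a "K-Anonym Export" without giving details. As a hedged illustration of the general idea of k-anonymity, the following Python sketch checks whether every combination of quasi-identifiers occurs at least k times and suppresses records that do not meet this threshold. The field names, the sample records, and the choice of suppression (rather than generalization) are assumptions, not the project's method.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how often each quasi-identifier combination occurs."""
    return Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return all(count >= k for count in sizes.values())


def suppress_small_groups(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        record
        for record in records
        if sizes[tuple(record[q] for q in quasi_identifiers)] >= k
    ]


# Hypothetical records; "age_group" and "sex" act as quasi-identifiers.
records = [
    {"age_group": "60-69", "sex": "f", "procedure": "hip, cemented"},
    {"age_group": "60-69", "sex": "f", "procedure": "hip, uncemented"},
    {"age_group": "70-79", "sex": "m", "procedure": "hip, cemented"},
]
quasi_identifiers = ["age_group", "sex"]

before = is_k_anonymous(records, quasi_identifiers, k=2)
export = suppress_small_groups(records, quasi_identifiers, k=2)
after = is_k_anonymous(export, quasi_identifiers, k=2)
print(before, len(export), after)
```

Suppression is the simplest way to reach the threshold; it trades data loss for privacy, which is why real exports often generalize values (for example, wider age groups) before dropping records.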

11 Architecture (diagram repeated; steps 1, 2, 3 marked)

12 Data Extraction (step 1)

13 Deidentification (step 2)
- Metadata
- Name lists
- Patterns
- Machine learning
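Two of the techniques listed on this slide, name lists and patterns, can be illustrated with a short Python sketch. It is a simplified example under assumptions: the name list, the regular expressions, and the placeholder tokens are hypothetical, the `extra_names` parameter stands in for names known from metadata, and the machine-learning component is not covered.

```python
import re

# Hypothetical name list; a real system would use much larger lists.
NAME_LIST = {"Huber", "Meier", "Schmidt"}

# Simple patterns for dates (DD.MM.YYYY) and e-mail addresses.
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def deidentify(text, extra_names=()):
    """Replace e-mail addresses, dates and listed names by placeholders.

    extra_names can carry names known from metadata (e.g. the patient's
    own name taken from the record header).
    """
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = DATE_PATTERN.sub("[DATE]", text)
    names = NAME_LIST | set(extra_names)
    if names:
        alternatives = "|".join(re.escape(name) for name in sorted(names))
        text = re.sub(r"\b(?:" + alternatives + r")\b", "[NAME]", text)
    return text


letter = "Mr. Huber was admitted on 26.04.2013. Contact: jenshuber@strand.vg"
clean = deidentify(letter)
print(clean)
# Mr. [NAME] was admitted on [DATE]. Contact: [EMAIL]
```

List and pattern matching of this kind misses misspelled or unlisted names, which is one reason to combine it with metadata and learned models.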

14 IDAT-Translator (step 3)

Person (corresponds to Name)
- surname <string>
- familyname <string>
- affix <string> (Graf von)
- titel <string> (Dr., Prof., ...)
- sex [f m] <enumeration>

Date
- Day <byte> 11
- Month <byte> 1..12
- Year <byte> 1921
- Weekday <byte> 1..7
- Holiday <string> (Christmas, Easter, ...)

Location
- street <string> (Tennenbacherstrasse)
- housenumber <string> (11a)
- city code <int> (79132)
- city <string> (Freiburg)
- country <string>
- building? (e.g. railway station, airport, post office)

ContactData (corresponds to Phone)
- phonenumber <int>
-- countrycode (+49)
-- areacode (761)
-- phonenumber (65465468)
- email (jenshuber@strand.vg)

Division
- organisation <string> (university, Rhön Kliniken)
- clinic <string> (e.g. university hospital, Waldkrankenhaus)
- department <string> (internal medicine)
- city <string> (Freiburg)
- service? <string> (consultation hours, outpatient clinic, ...)

ID
- entity [MedicalRecordId,???] <enumeration>
- value <string>

AGE
- days <int> # in days, because age data for newborns must be included

BIOMETRICS
- entity <enumeration> [size, weight] # eav schema
- unit <enumeration> [metric]
- value

OTHER # all other

Example rule (identifier casing as preserved in the transcript):

    rule "IdatPerson"
    when
        idat:personidat()
    then
        idat.setfirstname("XXXXX");
        idat.setfamilyname(stringutils.left(idat.getfamilyname(),1));
        idat.setaffix(null);
        idat.settitel(null);
        idat.setsex(idat.getsex());
    end
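The effect of the "IdatPerson" rule on this slide can be restated in Python to make it easier to follow. This is a sketch under assumptions, not the project's code: the `PersonIdat` class, its field names, and the sample values are hypothetical and only mirror the setters that appear in the rule (mask the first name, keep the first letter of the family name, clear affix and title, keep sex).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PersonIdat:
    """Hypothetical container for the person fields used by the rule."""
    firstname: Optional[str]
    familyname: Optional[str]
    affix: Optional[str]
    titel: Optional[str]
    sex: Optional[str]


def translate_person(idat: PersonIdat) -> PersonIdat:
    """Return a masked copy of the person record; the input is not modified."""
    initial = idat.familyname[:1] if idat.familyname is not None else None
    return replace(
        idat,
        firstname="XXXXX",   # mask the first name completely
        familyname=initial,  # keep only the first letter
        affix=None,          # drop name affix (e.g. "Graf von")
        titel=None,          # drop title (e.g. "Dr.")
        # sex is left unchanged, as in the rule
    )


original = PersonIdat("Jens", "Huber", "Graf von", "Dr.", "m")
masked = translate_person(original)
print(masked)
```

Returning a copy rather than mutating the record keeps the identifying original separate from the translated version, which makes it harder to export the wrong one by accident.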

15 Architecture (diagram repeated; steps 1 to 5 marked)

16 Cloud Infrastructure (step 4)

17 Text Mining (step 5)

18 Architecture (diagram repeated; steps 1 to 6 marked)

19 Study Portal (step 6)
- Raw data
- Added value services
- Statistical analysis
- Data mining
- i2b2, R, tranSMART, ...

20 Use Cases in cloud4health
- Building registries: support for building registries for medical research and health technology assessment (HTA). Example: is it better to implant a hip prosthesis with or without cement?
- Pharmacovigilance: help to detect signals from narrative reports and medication lists. Example: suspicion that antibiotics cause joint rupture.
- Plausibility check: are biologicals used as a last resort in psoriasis treatment?
- Pathology: get TNM, grading, and morphology (ICD-O-3) from dictated reports.
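The pathology use case (getting TNM and grading from dictated reports) can be illustrated with a deliberately simple pattern-based sketch in Python. The slides do not describe the project's actual text-mining components, so everything here is an assumption: the regular expressions cover only a basic subset of TNM notation, and the sample report is invented.

```python
import re

# Simplified TNM pattern: optional prefix (c, p, y, yp), then T, N and M
# categories. Real reports use many more variants than this covers.
TNM_PATTERN = re.compile(
    r"\b(?P<prefix>[cpy]{0,2})"
    r"T(?P<t>is|[0-4xX])[a-d]?\s*"
    r"N(?P<n>[0-3xX])[a-c]?\s*"
    r"M(?P<m>[01xX])\b"
)
GRADING_PATTERN = re.compile(r"\bG(?P<grade>[1-4xX])\b")


def extract_tnm(report):
    """Return the first TNM classification and grading found, or None."""
    match = TNM_PATTERN.search(report)
    if match is None:
        return None
    grading = GRADING_PATTERN.search(report)
    return {
        "prefix": match.group("prefix"),
        "T": match.group("t"),
        "N": match.group("n"),
        "M": match.group("m"),
        "G": grading.group("grade") if grading else None,
    }


# Invented sample sentence from a dictated report.
report = "Histology: invasive ductal carcinoma, pT2 N1 M0, G2."
result = extract_tnm(report)
print(result)
# {'prefix': 'p', 'T': '2', 'N': '1', 'M': '0', 'G': '2'}
```

The extracted values become structured annotations that can be stored next to the existing structured data and queried together with it.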

21 Use Case: Endoprosthesis Register
- 200 discharge letters
- 500 surgery (OP) reports
- + 2 more hospitals

22 Summary
- Secondary use
-- Structured & unstructured data
-- Text mining
-- Deidentification
- Cloud computing (hybrid)
-- Dynamic infrastructure
-- Services on demand
-- External and own use
-- One stop shop
- Use cases
-- Registries, pharmacovigilance

23 BACKUP SLIDES

24 Deidentification (step 2)

25 Trusted Cloud

26 Process
- Use case description
-- Research question
-- Inclusion criteria
-- Data needed to answer the question
- Identification of data sources
-- Clinical source systems
-- Interfaces, formats, quality, ...
-- Owners and protection requirements
- Approval
-- Scenario
-- Owner
-- Data protection officer
-- Patient consent, where applicable
- Data extraction
-- Technical implementation
-- Syntactic & semantic

27 Challenges - Data Privacy
- Health data = sensitive data (§ 3 Abs. 9 BDSG)
- Different laws to be considered
-- State hospital acts (Landeskrankenhausgesetze): hospitals
-- Medical and labour law (Arzt- und Arbeitsrecht): doctors
-- Patients' property, usage, and personality rights: patients
-- Federal (BDSG) and state data protection acts (Landesdatenschutzgesetze): states, country
- Peculiarities of medical research
-- Informed consent
-- Bound to a well-defined research question
-- Data sparseness
- Goals
-- Generic data privacy concept agreed upon at a national level
-- Contract templates, guidelines, etc.

28 Agenda
- Motivation
- Approach
- Architecture
- Walkthrough
- Use Cases