cloud4health: cloud-based text mining to exploit the value of free-text documentation within electronic health records. Dr. Martin Sedlmayr, Chair of Medical Informatics. 10 Years Medical Informatics Erlangen, 26.04.2013
2 What about Freetext? 99.9% 71% 80% 53%
3 Basic Idea Text Mining Text Annotation Deidentification
4 What is Cloud Computing? Metaphor / Paradigm Unlimited (elastic) resources Everybody can access from everywhere
5 cloud4health Focus: cloud services for big data analytics in medicine. Volume: 4 million EUR, 46.5 person-years. Duration: 3 years, since 01.12.2011
6 Secondary Use BI Data Warehouse ETL
7 local Extract Transform Load CDMS HIS SQL CSV... PIDgen Terminology DeIdent Facts Aggregation Query-Tool WWW XLS...... Dimensions Statistics Visualization...
8 local cloud Extract Transform Load CDMS HIS SQL CSV letters PIDgen Terminology DeIdent Facts Aggregation Query-Tool WWW XLS... Textmining Dimensions Statistics... Visualization...
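The Extract, Transform, Load chain on this slide can be illustrated with a minimal Python sketch. It assumes a CSV export with the hypothetical columns `patient_id` and `code`, and it stands in for PIDgen with a keyed hash (HMAC-SHA256). The slides do not show how PIDgen or the terminology mapping actually work, so every name below is an assumption.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical secret that never leaves the hospital (assumption for this sketch).
SECRET_KEY = b"replace-with-a-locally-held-secret"


def pseudonym(patient_id):
    """Derive a stable pseudonym from a patient ID using a keyed hash."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def etl(csv_text):
    """Extract rows from CSV text, transform them, and return the resulting facts."""
    facts = []
    for row in csv.DictReader(io.StringIO(csv_text)):  # extract
        facts.append({
            "pid": pseudonym(row["patient_id"]),   # replace the identifier by a pseudonym
            "code": row["code"].strip().upper(),   # stand-in for a terminology mapping
        })
    return facts  # "load": a real pipeline would write these to the data warehouse


facts = etl("patient_id,code\n4711,m16.1\n4711,i10\n0815,m16.1\n")
```

Because the hash is keyed and deterministic, the same patient always receives the same pseudonym, so facts can still be aggregated per patient without exposing the original ID.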
9 Architecture HOSPITAL STUDY PORTAL ETL Anonymization Data Mining Anonymized Text Structured Data Annotations Data Warehouse TRUSTED CLOUD Text Mining
10 Architecture Data Extraction Deidentification A Text Mining Text Annotation Structured Data Annotation Data k-anonymous export C Data Access Data Analysis Data Mining B D
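The export step in the architecture is labelled k-anonymous. As a rough illustration of what such a check involves, the following Python sketch tests whether every combination of quasi-identifiers occurs at least k times and suppresses the groups that do not. The column names and the suppression strategy are assumptions; the slides do not say how the project implements this step.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


def suppress_small_groups(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] >= k]


# Hypothetical export rows (illustrative values only).
rows = [
    {"age_group": "60-69", "sex": "f", "diagnosis": "M16"},
    {"age_group": "60-69", "sex": "f", "diagnosis": "M17"},
    {"age_group": "70-79", "sex": "m", "diagnosis": "M16"},
]
safe_rows = suppress_small_groups(rows, ["age_group", "sex"], k=2)
```

Suppression is only one option. Generalising values (for example wider age groups) keeps more records at the cost of precision.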
11 Architecture walkthrough (diagram): steps 1-3; components A-D; Structured Data, Annotation Data, k-anonymous export
12 Data Extraction 1
13 Deidentification 2 Metadata Name Lists Patterns Machine Learning
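Two of the techniques named on this slide, name lists and patterns, can be sketched in a few lines of Python. The name list, the regular expressions, and the placeholder tokens below are illustrative assumptions, not the project's actual rules. The machine-learning component is not covered.

```python
import re

# Hypothetical name list; a real system would load curated lists of names.
NAME_LIST = {"Jens", "Huber"}

# Illustrative patterns (assumptions): dates like 26.04.2013 and e-mail addresses.
PATTERNS = [
    (re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"), "<DATE>"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "<EMAIL>"),
]


def deidentify(text):
    """Replace pattern matches first, then mask words found in the name list."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)

    def mask_name(match):
        word = match.group(0)
        return "<NAME>" if word in NAME_LIST else word

    return re.sub(r"\b\w+\b", mask_name, text)


masked = deidentify("Patient Jens Huber, born 11.01.1921, contact jenshuber@strand.vg")
```

Patterns run before the name lookup so that an address such as `jenshuber@strand.vg` is replaced as a whole rather than being split into words.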
14 IDAT-Translator 3
Person (corresponds to Name)
- surname <string>
- familyname <string>
- affix <string> (Graf von)
- titel <string> (Dr., Prof., ...)
- sex <enumeration> [f, m]
Date
- Day <byte> 11
- Month <byte> 1..12
- Year <byte> 1921
- Weekday <byte> 1..7
- Holiday <string> (Christmas, Easter, ...)
Location
- street <string> (Tennenbacherstrasse)
- housenumber <string> (11a)
- city code <int> (79132)
- city <string> (Freiburg)
- country <string>
- building? (e.g. train station, airport, post office)
ContactData (corresponds to Phone)
- phonenumber <int>
-- countrycode (+49)
-- areacode (761)
-- phonenumber (65465468)
- email (jenshuber@strand.vg)
Division
- organisation <string> (university, Rhön Kliniken)
- clinic <string> (e.g. university hospital, Waldkrankenhaus)
- department <string> (internal medicine)
- city <string> (Freiburg)
- service? <string> (consultation hours, outpatient clinic, ...)
ID
- entity <enumeration> [MedicalRecordId, ???]
- value <string>
AGE
- days <int> # in days, because age data for newborns must be included
BIOMETRICS
- entity <enumeration> [size, weight] # eav schema
- unit <enumeration> [metric]
- value
OTHER # all other
Rule:
rule "IdatPerson"
when
  idat:personidat()
then
  idat.setfirstname( XXXXX );
  idat.setfamilyname(stringutils.left(idat.getfamilyname(),1));
  idat.setsex(idat.getsex());
  idat.setaffix(null);
  idat.settitel(null);
end
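The effect of the "IdatPerson" rule on this slide can be restated as a small Python sketch: the first name is masked, the family name is cut to its first letter, sex is kept, and affix and title are cleared. The `PersonIdat` class and the `translate_person` function are hypothetical stand-ins for the rule engine's fact type, written only to make the transformation concrete.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PersonIdat:
    """Hypothetical counterpart of the Person fact on the slide."""
    firstname: Optional[str] = None
    familyname: Optional[str] = None
    affix: Optional[str] = None
    titel: Optional[str] = None
    sex: Optional[str] = None


def translate_person(idat):
    """Apply the same field changes as the "IdatPerson" rule."""
    initial = idat.familyname[:1] if idat.familyname is not None else None
    return replace(
        idat,
        firstname="XXXXX",   # mask the first name completely
        familyname=initial,  # keep only the first letter of the family name
        affix=None,          # drop name affixes such as "Graf von"
        titel=None,          # drop titles such as "Dr."
    )                        # sex is left unchanged


translated = translate_person(PersonIdat("Jens", "Huber", "Graf von", "Dr.", "m"))
```

Returning a new object instead of mutating the input keeps the identifying original separate from its translated copy.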
15 Architecture walkthrough (diagram): steps 1-5; components A-D; Structured Data, Annotation Data, k-anonymous export
16 Cloud Infrastructure 4
17 Text Mining 5
18 Architecture walkthrough (diagram): steps 1-6; components A-D; Structured Data, Annotation Data, k-anonymous export
19 Study Portal 6 Raw data Added value services Statistical analysis Data mining i2b2, R, tranSMART,...
20 Use Cases in cloud4health
Building registries: support to build registries for medical research and health technology assessment (HTA). Is it better to implant a hip prosthesis with or without cement?
Pharmacovigilance: help to detect signals from narrative reports & medication lists. Suspicion: antibiotics cause joint rupture.
Plausibility check: are biologicals used as a last resort in psoriasis treatment?
Pathology: get TNM, grading, and morphology (ICD-O-3) from dictated reports.
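For the pathology use case, a first approximation of pulling TNM, grading, and an ICD-O-3 morphology code out of report text can be sketched with regular expressions in Python. The patterns below cover only simple notations such as `pT2 pN1 M0`, `G2`, and `8140/3`, and the sample sentence is invented. A production text-mining pipeline would need far more robust rules than this sketch assumes.

```python
import re

# Simplified patterns (assumptions): optional c/p/y prefix, basic T/N/M values.
TNM_RE = re.compile(
    r"\b(?P<prefix>[cpy]?)T(?P<T>is|[0-4][a-d]?|[xX])\s*"
    r"[cpy]?N(?P<N>[0-3][a-c]?|[xX])\s*"
    r"[cpy]?M(?P<M>[01]|[xX])\b"
)
GRADE_RE = re.compile(r"\bG([1-4])\b")       # histological grading G1..G4
MORPH_RE = re.compile(r"\b(\d{4}/\d)\b")     # morphology code such as 8140/3


def extract_pathology(report):
    """Return the fields that could be found in the report text."""
    result = {}
    tnm = TNM_RE.search(report)
    if tnm:
        result.update(tnm.groupdict())
    grade = GRADE_RE.search(report)
    if grade:
        result["grading"] = grade.group(1)
    morph = MORPH_RE.search(report)
    if morph:
        result["morphology"] = morph.group(1)
    return result


found = extract_pathology("Adenocarcinoma of the colon, pT2 pN1 M0, G2, ICD-O 8140/3.")
```

Fields that are not found are simply absent from the result, which lets a downstream plausibility check distinguish "not reported" from an explicit value.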
21 Use Case: Endoprosthesis Registry. 200 discharge letters, 500 surgery reports, + 2 more hospitals
22 Summary
Secondary use: structured & unstructured data, text mining, deidentification
Cloud computing (hybrid): dynamic infrastructure, services on demand, external and own use, one stop shop
Use cases: registries, pharmacovigilance
23 BACKUP SLIDES
24 Deidentification 2
25 Trusted Cloud
26 Process
Use Case Description: research question; inclusion criteria; data needed to answer the question
Identification of Data Sources: clinical source systems; interfaces, formats, quality...; owner and protection requirements
Allowance: scenario; owner; data protection officer; patient consent where required
Data Extraction: technical implementation; syntactic & semantic
27 Challenges - Data Privacy
Health data = sensitive data (§ 3 (9) BDSG)
Different laws to be considered
- State hospital acts (Landeskrankenhausgesetze) (hospitals)
- Medical professional law and labour law (doctors)
- Property, usage, and personality rights of patients (patients)
- Federal (BDSG) and state data protection acts (states, country)
Peculiarities of medical research
- Informed consent
- Bound to a well-defined research question
- Data sparseness
Goals
- Generic data privacy concept agreed upon at a national level
- Contract templates, guidelines etc.
28 Agenda Motivation Approach Architecture Walkthrough Use Cases