Combining structured data with machine learning to improve clinical text de-identification

DT Tran, Scott Halgrim, David Carrell
Group Health Research Institute

Clinical text contains personally identifiable information (PII): information that can identify an individual in context. Some PII is protected by the Health Insurance Portability and Accountability Act (HIPAA).
HIPAA-protected PII: patient name; medical record number; age; Social Security number; dates (including birth date); address and room number; URL and IP address; others
PII not protected by HIPAA: provider name and initials; organization name

Steps to de-identify PII in clinical text
Step 1: Find PII text spans
Step 2a: Replace PII spans
Step 2b: Remove PII spans
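The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The (start, end, type) span format and the bracketed "[TYPE]" default surrogate are assumptions. A realistic surrogate, such as a fictional name, could be passed in instead.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical span format: (start, end, pii_type) with end-exclusive character offsets.
Span = Tuple[int, int, str]


def deidentify(text: str, spans: List[Span], mode: str = "replace",
               surrogates: Optional[Dict[str, str]] = None) -> str:
    """Apply step 2 to the spans found in step 1.

    mode="replace" (step 2a) swaps each span for a surrogate, by default a
    bracketed type tag such as "[NAME]"; mode="remove" (step 2b) deletes it.
    """
    if mode not in ("replace", "remove"):
        raise ValueError(f"unknown mode: {mode!r}")
    surrogates = surrogates or {}
    pieces = []
    cursor = 0
    for start, end, pii_type in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already handled.
            continue
        pieces.append(text[cursor:start])
        if mode == "replace":
            pieces.append(surrogates.get(pii_type, f"[{pii_type}]"))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Mr. Hill is a 44 year old male"
found = [(4, 8, "NAME"), (14, 16, "AGE")]
print(deidentify(note, found))            # Mr. [NAME] is a [AGE] year old male
print(deidentify(note, found, "remove"))  # Mr.  is a  year old male
```

Removal leaves doubled spaces and broken sentences. Replacement keeps the note readable.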

Finding PII text spans: measurements
Recall: the number of identifiers (PII) correctly detected divided by the total number of PII in the reference standard
Precision: the number of correctly predicted identifiers divided by the total number of predictions
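The two measurements can be computed over sets of spans. In this sketch a prediction counts only if its offsets and PII type exactly match a reference-standard span. That exact-match rule and the handling of empty sets are assumptions; the slides do not state them.

```python
from typing import Iterable, Tuple

# Hypothetical span format: (start, end, pii_type), end-exclusive offsets.
Span = Tuple[int, int, str]


def precision_recall(predicted: Iterable[Span],
                     gold: Iterable[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall for PII spans (no partial credit)."""
    pred_set = set(predicted)
    gold_set = set(gold)
    correct = len(pred_set & gold_set)
    # Conventions for empty sets vary; here, no predictions means no false
    # positives and no gold spans means nothing was missed.
    precision = correct / len(pred_set) if pred_set else 1.0
    recall = correct / len(gold_set) if gold_set else 1.0
    return precision, recall


gold = [(0, 4, "NAME"), (10, 12, "AGE")]
pred = [(0, 4, "NAME"), (20, 25, "DATE")]
print(precision_recall(pred, gold))  # (0.5, 0.5)
```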

Hypotheses
1. We can train a high-performing machine learning model to find most PII with reasonable precision.
2. We can increase recall of sensitive, HIPAA-protected PII with a data matching algorithm.

Experiment description
1. Chart reviewers annotate PII to create our gold standard
2. Train a machine learning model on corpus A
3. Apply the model to corpus B
4. Review documents where recall is not 100% (corpus B′)
5. Develop a secondary algorithm
6. Compare the machine model vs. the hybrid on corpus C

Experiment corpora
Corpus A: training data to develop a machine learning model (635 Family Practice, 70 Internal Medicine, 131 Oncology, and 70 OBGYN notes)
Corpus B: baseline test data used to select the documents for corpus B′ (129 Family Practice notes)
Corpus B′: documents in corpus B where the machine learning model did not have 100% recall; used to inform the data matching rules
Corpus C: reserved test data to assess whether the hybrid approach can improve performance on future, unseen documents
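Corpus B′ is defined by a per-document filter: keep the notes where the model missed at least one gold-standard PII. Below is a hypothetical sketch of that selection, assuming predictions and gold annotations are stored as span sets keyed by document ID. The storage layout and exact-match recall are assumptions, not the authors' setup.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span format: (start, end, pii_type), end-exclusive offsets.
Span = Tuple[int, int, str]


def document_recall(predicted: Set[Span], gold: Set[Span]) -> float:
    """Exact-match recall for one document; no gold PII means nothing was missed."""
    if not gold:
        return 1.0
    return len(predicted & gold) / len(gold)


def select_imperfect_recall(predictions: Dict[str, Set[Span]],
                            gold: Dict[str, Set[Span]]) -> List[str]:
    """Return the IDs of documents where the model's recall is below 100%."""
    return sorted(
        doc_id for doc_id, gold_spans in gold.items()
        if document_recall(predictions.get(doc_id, set()), gold_spans) < 1.0
    )
```

A document with no model output at all is treated as having empty predictions, so any gold PII in it puts the document in the selection.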

The machine learning tool used: MITRE Identification Scrubber Toolkit¹
The MITRE Identification Scrubber Toolkit (MIST) is an open-source machine learning toolkit specifically designed to de-identify PII in natural language text:
Scalable and robust
User-friendly interface
Powerful commands
Well documented
¹ An earlier version of MIST was the highest-performing automated system in the Informatics for Integrating Biology and the Bedside (i2b2) de-identification challenge.

Machine model tested on corpus B
[Chart: precision and recall]

Examples of PII* in corpus B
SUBJECTIVE: Abcdef G Hill is a 44 year old male here to follow up on diabetes and pain.
Medicine regimen:
In am Mr. Hill takes glyburide 10 mg
In pm Mr. Hill takes glyburide 10mg
pain level between 5-7/10
Lives with parents (Will and Jane Miller)
Get a tdap ( tetanus shot)
call optometry at
yo recent Phd engineering grad from NYU is currently in europe
*All PII shown or redacted is fictional (names, ages, dates, etc.).

Data matching algorithm
For each note and patient ID pair in the corpus:
  Read in the output from MIST (text, PII offsets)
  Get encounter and patient information
  Attempt to match, then edit or add PII, in the following order (higher certainty first):
    1. Regular expression patterns for …, date, ZIP code, phone number, medical record ID, and Social Security number
    2. Regular expression patterns to rule out blood pressure, decimal values, and pulse
    3. Match on patient data
    4. Match on encounter data
    5. Match against a list of hospitals for organization names
    6. Match against a list of countries and states for addresses
    7. For each token:
         Match against a list of US census names for patient names
         If the token is a noun or proper noun phrase, match against a dictionary of providers by last name
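The first two matching steps above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. The span format, the pattern set and the 12-character pulse window are all assumptions. The slide's unnamed first item and medical record IDs are omitted because their formats are site-specific.

```python
import re
from typing import List, Tuple

# Hypothetical span format: (start, end, pii_type), end-exclusive offsets.
Span = Tuple[int, int, str]

# Illustrative patterns only; production patterns need much broader coverage.
PII_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b")),
    ("PHONE", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("ZIP", re.compile(r"\b\d{5}(?:-\d{4})?\b")),
]

# Clinical values that can look like PII.
RULE_OUT_PATTERNS = [
    re.compile(r"\d{2,3}/\d{2,3}"),  # blood pressure, e.g. 120/80
    re.compile(r"\d+\.\d+"),         # decimal values, e.g. 98.6
]
PULSE_CONTEXT = re.compile(r"\b(?:pulse|HR)\s*:?\s*$", re.IGNORECASE)


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def regex_pass(text: str, mist_spans: List[Span]) -> List[Span]:
    """Add regex-matched PII to MIST's spans, then drop spans that look like vitals."""
    spans = list(mist_spans)
    # A regex match is added only if it does not overlap an existing span.
    for pii_type, pattern in PII_PATTERNS:
        for m in pattern.finditer(text):
            candidate = (m.start(), m.end(), pii_type)
            if not any(overlaps(candidate, s) for s in spans):
                spans.append(candidate)
    kept = []
    for start, end, pii_type in spans:
        if any(p.fullmatch(text[start:end]) for p in RULE_OUT_PATTERNS):
            continue  # blood pressure or decimal value
        if PULSE_CONTEXT.search(text[max(0, start - 12):start]):
            continue  # number immediately preceded by "pulse" or "HR"
        kept.append((start, end, pii_type))
    return sorted(kept)


text = "Seen 03/14/2012. BP 120/80, pulse 72. Call 206-555-0100."
mist = [(20, 26, "DATE"), (34, 36, "AGE")]  # hypothetical MIST false positives
print(regex_pass(text, mist))  # [(5, 15, 'DATE'), (43, 55, 'PHONE')]
```

The blood pressure pattern would also rule out a year-less date such as "12/25". That is one reason the order of the steps, and the remaining ambiguity between dates and vitals, matter.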

Data matching algorithm summary
[Table: the matching method (discrete EMR data, regular expression, lookup list) applied to each PII type: address, age, date, doctor name, medical record number, organization name, patient name, phone, SSN]
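The "discrete EMR data" method amounts to searching a note for identifier values that are already known from structured records. Below is a minimal sketch. The `record` dictionary stands in for values the authors retrieved from a Clarity database via pyodbc; its keys and the exact, case-insensitive matching rule are assumptions. Word boundaries (`\b`) stop "Hill" from matching inside "Hillside", but they do not suit values that begin or end with punctuation.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical span format: (start, end, pii_type), end-exclusive offsets.
Span = Tuple[int, int, str]


def match_structured_data(text: str, record: Dict[str, List[str]],
                          existing: List[Span]) -> List[Span]:
    """Add spans for case-insensitive, whole-word occurrences of known identifier values.

    `record` maps a PII type to values from discrete EMR data for this patient
    or encounter, e.g. {"PATIENT_NAME": ["Hill"], "MRN": ["0012345"]}.
    """
    spans = list(existing)
    for pii_type, values in record.items():
        for value in values:
            if not value:
                continue
            pattern = re.compile(r"\b" + re.escape(value) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                candidate = (m.start(), m.end(), pii_type)
                # Keep existing spans; add only non-overlapping new matches.
                if not any(candidate[0] < s[1] and s[0] < candidate[1] for s in spans):
                    spans.append(candidate)
    return sorted(spans)


note = "Abcdef G Hill is here. Mr. Hill lives near Hillside."
print(match_structured_data(note, {"PATIENT_NAME": ["Abcdef", "Hill"]},
                            [(0, 6, "PATIENT_NAME")]))
```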

Data matching programming tools
Python
1. Developed in IronPython
2. (Optional) pyodbc to connect to a Clarity SQL database
3. Regular expressions
4. Natural Language Toolkit (NLTK)¹
5. unittest
(32-bit, on Windows 7)
¹ Bird, Steven, Edward Loper, and Ewan Klein (2009). Natural Language Processing with Python. O'Reilly Media Inc.
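The deck lists unittest among its tools. A natural use is to pin down the behavior of the rule-out patterns so that later changes do not silently reintroduce false positives. The patterns and test cases below are hypothetical examples, not the authors' test suite.

```python
import re
import unittest

# Hypothetical rule-out patterns for clinical values that resemble PII.
BLOOD_PRESSURE = re.compile(r"\d{2,3}/\d{2,3}")
DECIMAL = re.compile(r"\d+\.\d+")


def is_ruled_out(span_text: str) -> bool:
    """Return True if a candidate PII span looks like a blood pressure or decimal value."""
    candidate = span_text.strip()
    return any(p.fullmatch(candidate) for p in (BLOOD_PRESSURE, DECIMAL))


class RuleOutTests(unittest.TestCase):
    def test_blood_pressure_is_ruled_out(self):
        self.assertTrue(is_ruled_out("120/80"))

    def test_decimal_is_ruled_out(self):
        self.assertTrue(is_ruled_out("98.6"))

    def test_full_date_is_kept(self):
        self.assertFalse(is_ruled_out("03/14/2012"))

    def test_phone_number_is_kept(self):
        self.assertFalse(is_ruled_out("206-555-0100"))
```

Saved as a module, for example a hypothetical `test_rule_out.py`, the tests run with `python -m unittest test_rule_out`.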

Recall increased when using the hybrid approach in corpus B
[Chart: recall of the machine model vs. the hybrid approach]

Precision decreased when using the hybrid approach in corpus B
[Chart: precision of the machine model vs. the hybrid approach]

Net performance improved overall and for HIPAA-protected PII when using the hybrid approach in corpus B
[Chart: net performance, shown as recall change, precision change, and recall change + precision change]
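Net performance, as labeled on this slide, is the recall change plus the precision change from the machine model to the hybrid. A minimal sketch follows. The scores in the usage lines are made up for illustration and are not results from this study.

```python
from typing import Dict


def net_performance(baseline: Dict[str, float], hybrid: Dict[str, float]) -> float:
    """Recall change plus precision change when moving from the baseline to the hybrid."""
    recall_change = hybrid["recall"] - baseline["recall"]
    precision_change = hybrid["precision"] - baseline["precision"]
    return recall_change + precision_change


# Hypothetical scores for illustration only, not results from this study.
machine = {"recall": 0.90, "precision": 0.85}
hybrid = {"recall": 0.96, "precision": 0.81}
print(round(net_performance(machine, hybrid), 2))  # 0.02
```

This unweighted sum treats one point of recall gained as exactly offsetting one point of precision lost. A per-type weighting would change that trade-off.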

Challenges
Person name matching
  Patient names, as defined, include any non-provider names
  Providers are not always Group Health providers
  Incomplete sentences and a lack of grammar and formatting: is part-of-speech tagging reliable?
  Many false-positive matches, such as Will, May, Major, etc.
Organization names in the gold standard include non-medical facilities
Age appears in many forms, so matching on a number alone does not work well
  Tried a regular expression to rule out age
Dates come in unpredictable, non-distinct formats
Performance measurements did not give credit for partial span matches
The gold standard is occasionally updated
Only 36 documents were reviewed
What is a reasonable loss of precision? How should it be measured?
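One way to give credit for partial span matches is to count a gold span as found when any prediction overlaps it. The sketch below contrasts that lenient rule with exact matching, using the name from the corpus B example. The offsets and the overlap rule are illustrative assumptions, not the measurement used in this study. Lenient credit can overstate protection: if only "Hill" is caught, "Abcdef G" is still exposed.

```python
from typing import List, Tuple

# Hypothetical span format for this sketch: (start, end), end-exclusive offsets.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def strict_recall(predicted: List[Span], gold: List[Span]) -> float:
    """Credit a gold span only when a prediction has exactly the same offsets."""
    if not gold:
        return 1.0
    exact = set(predicted)
    return sum(1 for g in gold if g in exact) / len(gold)


def lenient_recall(predicted: List[Span], gold: List[Span]) -> float:
    """Credit a gold span when any prediction overlaps it at all."""
    if not gold:
        return 1.0
    return sum(1 for g in gold if any(overlaps(g, p) for p in predicted)) / len(gold)


# Gold span covers "Abcdef G Hill" (0-13); the model found only "Hill" (9-13).
print(strict_recall([(9, 13)], [(0, 13)]))   # 0.0
print(lenient_recall([(9, 13)], [(0, 13)]))  # 1.0
```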

Conclusions
With an integrated delivery system, we often have metadata about each chart note.
De-identified clinical text can still contain important clinical data useful for research if we weigh the value of high recall against precision loss differently for each PII type.
Continue efforts to increase PII …:
  More patient and encounter identifier data variables
  Clever surrogates to hide residual PII: "Hiding in plain sight"¹
¹ Carrell D, Malin B, Aberdeen J, et al. J Am Med Inform Assoc (2012). doi: /amiajnl
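The slides do not say how recall and precision would be weighed differently per PII type. One standard option, swapped in here as an assumption, is the F-beta score. With beta > 1, recall counts more than precision. In the sketch, HIPAA-protected types get a larger beta than provider names; the specific beta values are hypothetical.

```python
from typing import Dict


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical per-type weights: favour recall for HIPAA-protected types.
BETA_BY_TYPE = {"PATIENT_NAME": 3.0, "SSN": 3.0, "DATE": 2.0, "PROVIDER_NAME": 1.0}


def weighted_scores(metrics: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Score each PII type with its own beta (default 1.0, i.e. the F1 score)."""
    return {
        pii_type: f_beta(m["precision"], m["recall"], BETA_BY_TYPE.get(pii_type, 1.0))
        for pii_type, m in metrics.items()
    }
```

With beta = 3, a type scored at precision 0.5 and recall 1.0 outranks one at precision 1.0 and recall 0.5. That ordering matches the stated priority of high recall for sensitive identifiers.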

DeMISTifying Deidentification of PHI in Free-formatted Text

DeMISTifying Deidentification of PHI in Free-formatted Text DeMISTifying Deidentification of PHI in Free-formatted Text Cathy Petrozzino March 2016 Approved for Public Release; Distribution Unlimited. Case Number 16-0670 2016 The MITRE Corporation. All rights reserved.

More information

Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes

Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes Presented By: Andrew McMurry & Britt Fitch (Apache ctakes committers) Co-authors: Guergana Savova, Ben Reis,

More information

A Method for Automatic De-identification of Medical Records

A Method for Automatic De-identification of Medical Records A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract

More information

De-Identification of Clinical Free Text in Dutch with Limited Training Data: A Case Study

De-Identification of Clinical Free Text in Dutch with Limited Training Data: A Case Study De-Identification of Clinical Free Text in Dutch with Limited Training Data: A Case Study Elyne Scheurwegs Artesis Hogeschool Antwerpen elynescheurwegs@hotmail.com Kim Luyckx biomina - biomedical informatics

More information

How to De-identify Data. Xulei Shirley Liu Department of Biostatistics Vanderbilt University 03/07/2008

How to De-identify Data. Xulei Shirley Liu Department of Biostatistics Vanderbilt University 03/07/2008 How to De-identify Data Xulei Shirley Liu Department of Biostatistics Vanderbilt University 03/07/2008 1 Outline The problem Brief history The solutions Examples with SAS and R code 2 Background The adoption

More information

Annotated Corpora in the Cloud: Free Storage and Free Delivery

Annotated Corpora in the Cloud: Free Storage and Free Delivery Annotated Corpora in the Cloud: Free Storage and Free Delivery Graham Wilcock University of Helsinki graham.wilcock@helsinki.fi Abstract The paper describes a technical strategy for implementing natural

More information

An Interactive De-Identification-System

An Interactive De-Identification-System An Interactive De-Identification-System Katrin Tomanek 1, Philipp Daumke 1, Frank Enders 1, Jens Huber 1, Katharina Theres 2 and Marcel Müller 2 1 Averbis GmbH, Freiburg/Germany http://www.averbis.com

More information

Anonymizing Unstructured Data to Enable Healthcare Analytics Chris Wright, Vice President Marketing, Privacy Analytics

Anonymizing Unstructured Data to Enable Healthcare Analytics Chris Wright, Vice President Marketing, Privacy Analytics Anonymizing Unstructured Data to Enable Healthcare Analytics Chris Wright, Vice President Marketing, Privacy Analytics Privacy Analytics - Overview For organizations that want to safeguard and enable their

More information

Strategies for De-Identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies

Strategies for De-Identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies Strategies for De-Identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies Clete A. Kushida, M.D., Ph.D. Professor, Stanford University Medical Center Overview

More information

Efficient De-Identification of Electronic Patient Records for User Cognitive Testing

Efficient De-Identification of Electronic Patient Records for User Cognitive Testing 2012 45th Hawaii International Conference on System Sciences Efficient De-Identification of Electronic Patient Records for User Cognitive Testing Kenric W. Hammond Department of Veterans Affairs kenric.hammond@va.gov

More information

Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA

Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA Chunk Parsing Steven Bird Ewan Klein Edward Loper University of Melbourne, AUSTRALIA University of Edinburgh, UK University of Pennsylvania, USA March 1, 2012 chunk parsing: efficient and robust approach

More information

A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set. Arya Tafvizi

A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set. Arya Tafvizi A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set by Arya Tafvizi S.B., Physics, MIT, 2010 S.B., Computer Science and Engineering, MIT, 2011 Submitted to the Department

More information

Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project

Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Paul Bone pbone@csse.unimelb.edu.au June 2008 Contents 1 Introduction 1 2 Method 2 2.1 Hadoop and Python.........................

More information

Automated Tool for Anonymization of Patient Records

Automated Tool for Anonymization of Patient Records Automated Tool for Anonymization of Patient Records Nikita Raaj MSc Computing and Management 2011/2012 The candidate confirms that the work submitted is their own and the appropriate credit has been given

More information

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Dingcheng Li Mayo Clinic, USA 20-Dec-2015

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Dingcheng Li Mayo Clinic, USA 20-Dec-2015 PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf)

More information

Securing Big Data Learning and Differences from Cloud Security

Securing Big Data Learning and Differences from Cloud Security Securing Big Data Learning and Differences from Cloud Security Samir Saklikar RSA, The Security Division of EMC Session ID: DAS-108 Session Classification: Advanced Agenda Cloud Computing & Big Data Similarities

More information

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural

More information

Automated Problem List Generation from Electronic Medical Records in IBM Watson

Automated Problem List Generation from Electronic Medical Records in IBM Watson Proceedings of the Twenty-Seventh Conference on Innovative Applications of Artificial Intelligence Automated Problem List Generation from Electronic Medical Records in IBM Watson Murthy Devarakonda, Ching-Huei

More information

Secondary Uses of Health Data IMPAC s Oncology Data Alliance Program

Secondary Uses of Health Data IMPAC s Oncology Data Alliance Program Secondary Uses of Health Data IMPAC s Oncology Data Alliance Program NCVHS August 1, 2007 Joel Goldwein, MD Senior Vice President, Medical Affairs IMPAC Medical Systems Inc. IMPAC Medical Systems, Inc.

More information

Large-scale evaluation of automated clinical note de-identification and its impact on information extraction

Large-scale evaluation of automated clinical note de-identification and its impact on information extraction Large-scale evaluation of automated clinical note de-identification and its impact on information extraction Louise Deleger, 1 Katalin Molnar, 1 Guergana Savova, 2 Fei Xia, 3 Todd Lingren, 1 Qi Li, 1 Keith

More information

Clinical Data Services

Clinical Data Services Clinical Data Services Data Storage, Data Collection Data Management Human Research Academy October 2014 CTS Research Development Services 706.721.6247 www.ctsrds@gru.edu Objectives Participants will:

More information

i2b2 Cell Messaging Project Management (PM) Cell

i2b2 Cell Messaging Project Management (PM) Cell i2b2 Cell Messaging Project Management (PM) Cell Table of Contents 2. Document Version History... 3 3. Introduction... 4 3.1 The i2b2 Hive... 4 3.2 i2b2 Messaging Overview... 4 3.2.1 Message Header...

More information

Privacy Techniques for Big Data

Privacy Techniques for Big Data Privacy Techniques for Big Data The Pros and Cons of Syntatic and Differential Privacy Approaches Dr#Roksana#Boreli# SMU,#Singapore,#May#2015# Introductions NICTA Australia s National Centre of Excellence

More information

A Commercial Approach to De-Identification Dan Wasserstrom, Founder and Chairman De-ID Data Corp, LLC

A Commercial Approach to De-Identification Dan Wasserstrom, Founder and Chairman De-ID Data Corp, LLC A Commercial Approach to De-Identification Dan Wasserstrom, Founder and Chairman De-ID Data Corp, LLC De-ID Data Corp, LLC Founded to: ENHANCE DATA ACCESS WHILE PROTECTING PATIENT PRIVACY Founders Problem

More information

PyCantonese: Cantonese linguistic research in the age of big data

PyCantonese: Cantonese linguistic research in the age of big data PyCantonese: Cantonese linguistic research in the age of big data Jackson L. Lee University of Chicago http://jacksonllee.com Childhood Bilingualism Research Center, CUHK September 15, 2015 Grammar versus

More information

Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR

Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR Nathan Manwaring University of Utah Masters Project Presentation April 2012 Equation Consulting Who we are Equation Consulting

More information

The De-identification of Personally Identifiable Information

The De-identification of Personally Identifiable Information The De-identification of Personally Identifiable Information Khaled El Emam (PhD) www.privacyanalytics.ca 855.686.4781 info@privacyanalytics.ca 251 Laurier Avenue W, Suite 200 Ottawa, ON Canada K1P 5J6

More information

Administrative Services

Administrative Services Policy Title: Administrative Services De-identification of Client Information and Use of Limited Data Sets Policy Number: DHS-100-007 Version: 2.0 Effective Date: Upon Approval Signature on File in the

More information

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for

More information

Automatic Text Analysis Using Drupal

Automatic Text Analysis Using Drupal Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing

More information

11-792 Software Engineering EMR Project Report

11-792 Software Engineering EMR Project Report 11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

De-Identification of Clinical Data

De-Identification of Clinical Data De-Identification of Clinical Data Sepideh Khosravifar, CISSP Info Security Analyst IV Tyrone Grandison, PhD Manager, Privacy Research, IBM TEPR Conference 2008 Ft. Lauderdale, Florida May 17-21, 2008

More information

METHODS IN MEDICAL INFORMATICS

METHODS IN MEDICAL INFORMATICS Chapman & Hall/CRC Mathematical and Computational Biology Series METHODS IN MEDICAL INFORMATICS Fundamentals of Healthcare Programming in Perln Pythoni and Ruby Jules J- Berman TECHNISCHE INFORMATION SBIBLIOTHEK

More information

SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS

SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Bachelor of Science with Honors Research Distinction in Electrical

More information

Build Vs. Buy For Text Mining

Build Vs. Buy For Text Mining Build Vs. Buy For Text Mining Why use hand tools when you can get some rockin power tools? Whitepaper April 2015 INTRODUCTION We, at Lexalytics, see a significant number of people who have the same question

More information

PPInterFinder A Web Server for Mining Human Protein Protein Interaction

PPInterFinder A Web Server for Mining Human Protein Protein Interaction PPInterFinder A Web Server for Mining Human Protein Protein Interaction Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar

More information

Introduction to IE with GATE

Introduction to IE with GATE Introduction to IE with GATE based on Material from Hamish Cunningham, Kalina Bontcheva (University of Sheffield) Melikka Khosh Niat 8. Dezember 2010 1 What is IE? 2 GATE 3 ANNIE 4 Annotation and Evaluation

More information

Compliance in Office 365 What You Should Know. Don Miller Vice President of Sales Concept Searching donm@conceptsearching.com Twitter @conceptsearch

Compliance in Office 365 What You Should Know. Don Miller Vice President of Sales Concept Searching donm@conceptsearching.com Twitter @conceptsearch Compliance in Office 365 What You Should Know Don Miller Vice President of Sales Concept Searching donm@conceptsearching.com Twitter @conceptsearch Agenda Concept Searching Who we are and what we do Office

More information

i2b2 Clinical Research Chart

i2b2 Clinical Research Chart i2b2 Clinical Research Chart Shawn Murphy MD, Ph.D. Griffin Weber MD, Ph.D. Michael Mendis Vivian Gainer MS Lori Phillips MS Rajesh Kuttan Wensong Pan MS Henry Chueh MD Susanne Churchill Ph.D. John Glaser

More information

Project Management (PM) Cell

Project Management (PM) Cell Informatics for Integrating Biology and the Bedside i2b2 Design Document Project Management (PM) Cell Document Version: 1.7.1 i2b2 Software Version: 1.7.00 Table of Contents DOCUMENT MANAGEMENT... 4 1.

More information

De-identification, defined and explained. Dan Stocker, MBA, MS, QSA Professional Services, Coalfire

De-identification, defined and explained. Dan Stocker, MBA, MS, QSA Professional Services, Coalfire De-identification, defined and explained Dan Stocker, MBA, MS, QSA Professional Services, Coalfire Introduction This perspective paper helps organizations understand why de-identification of protected

More information

Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based on Machine Learning

Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based on Machine Learning 3rd International Conference on Materials Engineering, Manufacturing Technology and Control (ICMEMTC 2016) Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based

More information

Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output

Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Christian Federmann Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123 Saarbrücken,

More information

Click to edit Master title style

Click to edit Master title style Click to edit Master title style UNCLASSIFIED//FOR OFFICIAL USE ONLY Dr. Russell D. Richardson, G2/INSCOM Science Advisor UNCLASSIFIED//FOR OFFICIAL USE ONLY 1 UNCLASSIFIED Semantic Enrichment of the Data

More information

Wrestling with Python Unit testing. Warren Viant

Wrestling with Python Unit testing. Warren Viant Wrestling with Python Unit testing Warren Viant Assessment criteria OCR - 2015 Programming Techniques (12 marks) There is an attempt to solve all of the tasks using most of the techniques listed. The techniques

More information

De-Identification of health records using Anonym: Effectiveness and robustness across datasets

De-Identification of health records using Anonym: Effectiveness and robustness across datasets De-Identification of health records using Anonym: Effectiveness and robustness across datasets Guido Zuccon a,b, Daniel Kotzur a, Anthony Nguyen a, Anton Bergheim c a The Australian e-health Research Centre

More information

De-identification Koans. ICTR Data Managers Darren Lacey January 15, 2013

De-identification Koans. ICTR Data Managers Darren Lacey January 15, 2013 De-identification Koans ICTR Data Managers Darren Lacey January 15, 2013 Disclaimer There are several efforts addressing this issue in whole or part Over the next year or so, I believe that the conversation

More information

Developing VA GDx: An Informatics Platform to Capture and Integrate Genetic Diagnostic Testing Data into the VA Electronic Medical Record

Developing VA GDx: An Informatics Platform to Capture and Integrate Genetic Diagnostic Testing Data into the VA Electronic Medical Record Developing VA GDx: An Informatics Platform to Capture and Integrate Genetic Diagnostic Testing Data into the VA Electronic Medical Record Scott L. DuVall Jun 27, 2014 1 Julie Lynch Vickie Venne Dawn Provenzale

More information

ezdi: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes

ezdi: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes ezdi: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes Parth Pathak, Pinal Patel, Vishal Panchal, Narayan Choudhary, Amrish Patel, Gautam Joshi ezdi, LLC.

More information

Natural Language Processing Supporting Clinical Decision Support

Natural Language Processing Supporting Clinical Decision Support Natural Language Processing Supporting Clinical Decision Support Applications for Enhancing Clinical Decision Making NIH Worksop; Bethesda, MD, April 24, 2012 Stephane M. Meystre, MD, PhD Department of

More information

Understanding and Selecting a DLP Solution. Rich Mogull Securosis

Understanding and Selecting a DLP Solution. Rich Mogull Securosis Understanding and Selecting a DLP Solution Rich Mogull Securosis No Wonder We re Confused Data Loss Prevention Data Leak Prevention Data Loss Protection Information Leak Prevention Extrusion Prevention

More information

Natural Language Processing for Clinical Informatics and Translational Research Informatics

Natural Language Processing for Clinical Informatics and Translational Research Informatics Natural Language Processing for Clinical Informatics and Translational Research Informatics Imre Solti, M. D., Ph. D. solti@uw.edu K99 Fellow in Biomedical Informatics University of Washington Background

More information

De-Identification of Health Data under HIPAA: Regulations and Recent Guidance" " "

De-Identification of Health Data under HIPAA: Regulations and Recent Guidance  De-Identification of Health Data under HIPAA: Regulations and Recent Guidance" " " D even McGraw " Director, Health Privacy Project January 15, 201311 HIPAA Scope Does not cover all health data Applies

More information

What is Covered under the Privacy Rule? Protected Health Information (PHI)

What is Covered under the Privacy Rule? Protected Health Information (PHI) HIPAA & RESEARCH What is Covered under the Privacy Rule? Protected Health Information (PHI) Health information + Identifier = PHI Transmitted or maintained in any form (paper, electronic, forms, web-based,

More information

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences Personalized Medicine: Humanity s Ultimate Big Data Challenge Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences 2012 Oracle Corporation Proprietary and Confidential 2 3 Humanity

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services

Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services speakers: Kai Zimmer and Jörg Didakowski Clarin Workshop WP2 February 2009 BBAW/DWDS The BBAW and its 40 longterm projects

More information

Windows Installation Guide

Windows Installation Guide Informatics for Integrating Biology and the Bedside i2b2 Desktop Install: Full VM Server Windows Installation Guide Document Version: 1.6.1 i2b2 Software Version: 1.6 Table of Contents About this Guide...

More information

Putting IBM Watson to Work In Healthcare

Putting IBM Watson to Work In Healthcare Martin S. Kohn, MD, MS, FACEP, FACPE Chief Medical Scientist, Care Delivery Systems IBM Research marty.kohn@us.ibm.com Putting IBM Watson to Work In Healthcare 2 SB 1275 Medical data in an electronic or

More information

How To Edit An Absence Record On A School Website

How To Edit An Absence Record On A School Website ProgressBook GradeBook Attendance User Guide ProgressBook GradeBook Attendance User Guide (This document is current for ProgressBook v12.3 or later.) 2012 Software Answers, Inc. All Rights Reserved. All

More information

ICE Futures Europe. AFTS Technical Guide for Large Position Reporting V1.0

ICE Futures Europe. AFTS Technical Guide for Large Position Reporting V1.0 ICE Futures Europe AFTS Technical Guide for Large Position Reporting V1.0 ICE FUTURES EUROPE Page 1 of 7 Contents 1. Introduction... 3 2. Online access to Clearing Systems... 4 3. Uploading Data... 5 4.

More information

Implementing Honest Broker System(s) in Academic Medical Centers: The Pittsburgh Experience

Implementing Honest Broker System(s) in Academic Medical Centers: The Pittsburgh Experience Implementing Honest Broker System(s) in Academic Medical Centers: The Pittsburgh Experience Christopher Ryan, Ph.D., CIP IRB Director Professor of Psychiatry University of Pittsburgh ryancm@upmc.edu The

More information

DIGITECH AND HIPAA COMPLIANCE

DIGITECH AND HIPAA COMPLIANCE White Paper DIGITECH AND HIPAA COMPLIANCE April 2004 As HIPAA compliance becomes mandatory, Digitech Systems continues to proactively address the unique needs of the Health Care market. PaperVision Enterprise

More information

Collecting Polish German Parallel Corpora in the Internet
