Combining structured data with machine learning to improve clinical text de-identification


1 Combining structured data with machine learning to improve clinical text de-identification
DT Tran, Scott Halgrim, David Carrell
Group Health Research Institute

2 Clinical text contains personally identifiable information (PII)
PII: information that can identify an individual in context. Some PII is protected by the Health Insurance Portability and Accountability Act (HIPAA).
HIPAA-protected PII:
- Patient name
- Medical record number
- Age
- Social security number
- Dates (including birthday)
- Address, room
- URL, IP address
- Others
Not HIPAA-protected PII:
- Provider name, initials
- Organization name

3 Steps to de-identify PII in clinical text
Step 1: Find PII text spans
Step 2a: Replace PII spans
Step 2b: Remove PII spans
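A minimal sketch of steps 2a and 2b in Python, assuming step 1 has already produced non-overlapping character spans as (start, end, label) tuples. The function name `deidentify`, the span format, and the bracketed placeholder style are illustrative assumptions, not the presenters' implementation.

```python
def deidentify(text, spans, mode="replace"):
    """Replace or remove PII spans in text.

    spans: non-overlapping (start, end, label) tuples (assumed format).
    mode:  "replace" inserts a [LABEL] placeholder, "remove" deletes the span.
    """
    if mode not in ("replace", "remove"):
        raise ValueError("mode must be 'replace' or 'remove'")
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])      # keep text before the span
        if mode == "replace":
            pieces.append("[" + label + "]")   # hypothetical placeholder style
        cursor = end                           # skip the PII itself
    pieces.append(text[cursor:])               # keep the tail of the note
    return "".join(pieces)


note = "Mr. Hill is a 44 year old male"        # fictional example text
pii = [(4, 8, "NAME"), (14, 16, "AGE")]
replaced = deidentify(note, pii, mode="replace")
removed = deidentify(note, pii, mode="remove")
```

Replacing keeps the sentence readable ("Mr. [NAME] is a [AGE] year old male"), while removing leaves gaps where the PII used to be.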

4 Finding PII text spans
Measurement | Purpose
Recall | The number of identifiers (PII) detected divided by the total number of PII in the reference standard
Precision | The number of correctly predicted identifiers divided by the number of predictions
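The two measurements can be sketched in Python as follows. This assumes predictions and the reference standard are both sets of (start, end, label) spans and that a prediction counts as correct only on an exact match; the function name and span format are assumptions made for illustration.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for exact-match PII spans.

    predicted, reference: iterables of (start, end, label) tuples.
    """
    predicted = set(predicted)
    reference = set(reference)
    correct = len(predicted & reference)       # spans found in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Fictional spans: three PII in the reference, four predictions, two correct.
reference = {(4, 8, "NAME"), (14, 16, "AGE"), (30, 40, "DATE")}
predicted = {(4, 8, "NAME"), (14, 16, "AGE"), (50, 55, "NAME"), (60, 62, "AGE")}
precision, recall = precision_recall(predicted, reference)
```

Here precision is 2/4 and recall is 2/3: two of the four predictions are correct, and two of the three reference PII were found.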

5 Hypothesis
1. We can train a high-performing machine learning model to find most PII with reasonable precision
2. We can increase recall of sensitive, HIPAA-protected PII with a data matching algorithm

6 Experiment description
1. Chart reviewers annotate PII to create our gold standard
2. Train a machine model on corpus A
3. Apply model to corpus B
4. Review documents where recall is not 100% (corpus B subset)
5. Develop a secondary algorithm
6. Compare machine model vs. hybrid on corpus C

7 Experiment corpus
Corpus | Description
Corpus A | Training data to develop a machine learning model. 635 Family Practice, 70 Internal Medicine, 131 Oncology, 70 OBGYN notes
Corpus B | Baseline test data used to select documents for the corpus B subset. 129 Family Practice notes
Corpus B (subset) | Documents in corpus B where the machine learning model did not have 100% recall. Used to inform the data matching rules
Corpus C | Reserved test data to assess whether the hybrid approach can improve performance on unseen documents in the future

8 The machine learning tool used: MITRE Identification Scrubber Toolkit
MITRE Identification Scrubber Toolkit (MIST) [1] is an open source machine learning toolkit specifically designed to de-identify PII in natural text
- Scalable and robust
- User-friendly interface
- Powerful commands
- Well documented
[1] An earlier version of MIST was the highest-performing automated system in the Informatics for Integrating Biology and the Bedside (i2b2) de-identification challenge

9 Machine model tested on corpus B
[Figure: precision and recall of the machine model]

10 Examples of PII* in corpus B
SUBJECTIVE: Abcdef G Hill is a 44 year old male here to follow up on diabetes and pain.
Medicine regimen:
In am Mr. Hill takes glyburide 10 mg
In pm Mr. Hill takes glyburide 10mg
pain level between 5-7/10
Lives with parents (Will and Jane Miller)
Get a tdap ( tetanus shot) call optometry at
yo recent Phd engineering grad from NYU is currently in europe
*All PII shown and redacted are fictional (name, age, date, etc.)

11 Data matching algorithm
For each note and patient id pair in the corpus:
  Read in the output from MIST (text, PII offsets)
  Get encounter and patient information
  Attempt to match, then edit/add PII, in the following order (higher certainty first):
  - regular expression pattern for , date, zip, phone, medical record id, social security number
  - regular expression pattern to rule out blood pressure, decimal values, pulse
  - match on patient data
  - match on encounter data
  - match from a list of hospitals for organization name
  - match from a list of countries and states for address
  For each token:
  - match from a list of US census names for patient name
  - if token is a noun or proper noun phrase, match from a dictionary of providers by last name
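The first two matching steps can be sketched in Python as below. The patterns, labels, and the function `apply_regex_rules` are assumptions chosen for illustration (US-style dates, phone numbers, and social security numbers); the slides do not give the actual expressions. The sketch lets regular-expression hits take priority, then drops MIST spans that look like blood pressure or decimal values.

```python
import re

# Hypothetical patterns; the presenters' real expressions are not shown.
PII_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Blood-pressure-like values (120/80) and decimal values (98.6).
RULE_OUT = re.compile(r"\b\d{2,3}/\d{2,3}\b|\b\d+\.\d+\b")


def overlaps(a, b):
    """True if two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def apply_regex_rules(text, mist_spans):
    """Combine MIST spans (start, end, label) with regex matches."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            span = (m.start(), m.end(), label)
            if not any(overlaps(span, f) for f in found):
                found.append(span)
    # Keep only rule-out matches that are not part of a confirmed PII span.
    ruled_out = [m.span() for m in RULE_OUT.finditer(text)]
    ruled_out = [r for r in ruled_out
                 if not any(overlaps(r, f) for f in found)]
    for span in mist_spans:
        if any(overlaps(span, f) for f in found):
            continue        # regex result takes priority
        if any(overlaps(span, r) for r in ruled_out):
            continue        # looks like a vital sign or decimal value
        found.append(span)
    return sorted(found)


# Fictional note; MIST is assumed to have mislabelled the blood pressure.
note = "Mr. Hill: BP 120/80, temp 98.6. Seen 03/14/2012. Call 206-555-0100."
hill, bp = note.index("Hill"), note.index("120/80")
mist = [(hill, hill + 4, "NAME"), (bp, bp + 6, "DATE")]
result = [(note[s:e], label) for s, e, label in apply_regex_rules(note, mist)]
```

In this example the name from MIST is kept, the date and phone number are added by the patterns, and the blood pressure reading is dropped.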

12 Data matching algorithm summary
Columns: Discrete EMR data | Regular Expression | Lookup list
Address
Age ~ -
Date -
Doctor name - -
Medical record number -
Organization name -
Patient name -
Phone -
SSN -
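A lookup-list match can be sketched in Python as follows. The tiny `CENSUS_NAMES` set is a hypothetical stand-in for a list of US census names, and the `capitalized_only` switch is an illustrative assumption rather than something the slides describe.

```python
import re

# Hypothetical stand-in for a US census name list; the real list is far larger.
CENSUS_NAMES = {"hill", "miller", "jane", "will", "may", "major"}


def lookup_name_spans(text, names=CENSUS_NAMES, capitalized_only=False):
    """Return (start, end, "NAME") for every token found in the name list."""
    spans = []
    for m in re.finditer(r"[A-Za-z]+", text):
        token = m.group()
        if capitalized_only and not token[0].isupper():
            continue                      # skip lower-case tokens
        if token.lower() in names:
            spans.append((m.start(), m.end(), "NAME"))
    return spans


sentence = "Mr. Hill will return in May"  # fictional example text
naive = [sentence[s:e] for s, e, _ in lookup_name_spans(sentence)]
capitalized = [sentence[s:e] for s, e, _ in
               lookup_name_spans(sentence, capitalized_only=True)]
```

The naive lookup flags "Hill", "will", and "May"; requiring a capital letter removes "will" but still flags the month, which is why a lookup list on its own tends to raise recall at the cost of precision.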

13 Data matching programming tools
- Python
- Developed in IronPython (optional)
- pyodbc to connect to a Clarity SQL database
- Regular expression
- Natural Language Toolkit (NLTK)
- unittest
- (32 bit on Windows 7)
Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O'Reilly Media Inc.

14 Recall increased when using the hybrid approach in corpus B
[Figure: recall, machine model vs. hybrid]

15 Precision decreased when using the hybrid approach in corpus B
[Figure: precision, machine model vs. hybrid]

16 Net performance gained overall and in HIPAA-protected PII when using the hybrid approach in corpus B
[Figure: net performance, plotting recall change, precision change, and recall change + precision change]
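The labels on this slide suggest that net performance is the sum of the recall change and the precision change. A small Python sketch of that arithmetic follows, assuming each change is the absolute difference between the hybrid and the machine model. The numbers are invented for illustration and are not results from the study.

```python
def net_performance(machine, hybrid):
    """Return (recall_change, precision_change, net) for two systems.

    machine, hybrid: (precision, recall) tuples.
    """
    precision_change = hybrid[0] - machine[0]
    recall_change = hybrid[1] - machine[1]
    return recall_change, precision_change, recall_change + precision_change


# Invented values: the hybrid gains recall and loses some precision.
machine_model = (0.90, 0.80)   # (precision, recall)
hybrid_model = (0.85, 0.95)
recall_change, precision_change, net = net_performance(machine_model, hybrid_model)
```

A positive net value means the recall gained outweighs the precision lost under this equal weighting.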

17 Challenges
- Person name matching
  - Patient names, as defined, include any non-provider names
  - Providers are not always Group Health providers
  - Incomplete sentences and a lack of grammar and formatting mean part-of-speech tagging is not reliable?
  - Many false positive matches like Will, May, Major, etc.
- Organization names in the gold standard include non-medical facilities
- Age appears in many forms; matching on a number is not good
  - Tried regular expression to rule out age
- Dates are in unpredictable, non-distinct formats
- Performance measurements did not give credit to partial span matches
- Occasionally the gold standard gets updated
- Only reviewed 36 documents
- What is a reasonable precision loss? How to measure it?
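The point about partial span matches can be illustrated with a short Python sketch. It compares exact-match recall with a more lenient overlap-based recall; the function name, the (start, end, label) span format, and the overlap rule are assumptions for illustration, not the scoring the presenters used.

```python
def span_recall(predicted, reference, partial=False):
    """Recall over reference spans given as (start, end, label) tuples.

    partial=False: a reference span counts only if predicted exactly.
    partial=True:  any same-label prediction that overlaps it counts.
    """
    if not reference:
        return 0.0
    hits = 0
    for ref in reference:
        if partial:
            hit = any(p[2] == ref[2] and p[0] < ref[1] and ref[0] < p[1]
                      for p in predicted)
        else:
            hit = ref in predicted
        hits += 1 if hit else 0
    return hits / len(reference)


# Fictional case: the reference marks "Abcdef G Hill" (13 characters),
# but the system only tagged the last name "Hill".
reference = [(0, 13, "NAME")]
predicted = [(9, 13, "NAME")]
exact = span_recall(predicted, reference)
lenient = span_recall(predicted, reference, partial=True)
```

Under exact matching the prediction scores zero even though most readers would consider the last name caught; the lenient variant gives it full credit.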

18 Conclusions
- With an integrated delivery system, we often have metadata about each chart note
- De-identified clinical text can still contain important clinical data useful for research if we weight the value of high recall over precision loss differently for each PII type
- Continue effort to increase PII
  - More patient and encounter identifier data variables
  - Clever surrogates to hide residual PII: "Hiding in plain sight" [1]
[1] Carrell D, Malin B, Aberdeen J, et al. J Am Med Inform Assoc (2012). doi: /amiajnl

DeMISTifying Deidentification of PHI in Free-formatted Text

DeMISTifying Deidentification of PHI in Free-formatted Text DeMISTifying Deidentification of PHI in Free-formatted Text Cathy Petrozzino March 2016 Approved for Public Release; Distribution Unlimited. Case Number 16-0670 2016 The MITRE Corporation. All rights reserved.

More information

Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes

Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes Presented By: Andrew McMurry & Britt Fitch (Apache ctakes committers) Co-authors: Guergana Savova, Ben Reis,

More information

A Method for Automatic De-identification of Medical Records

A Method for Automatic De-identification of Medical Records A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract

More information

De-Identification of Clinical Free Text in Dutch with Limited Training Data: A Case Study

De-Identification of Clinical Free Text in Dutch with Limited Training Data: A Case Study De-Identification of Clinical Free Text in Dutch with Limited Training Data: A Case Study Elyne Scheurwegs Artesis Hogeschool Antwerpen elynescheurwegs@hotmail.com Kim Luyckx biomina - biomedical informatics

More information

How to De-identify Data. Xulei Shirley Liu Department of Biostatistics Vanderbilt University 03/07/2008

How to De-identify Data. Xulei Shirley Liu Department of Biostatistics Vanderbilt University 03/07/2008 How to De-identify Data Xulei Shirley Liu Department of Biostatistics Vanderbilt University 03/07/2008 1 Outline The problem Brief history The solutions Examples with SAS and R code 2 Background The adoption

More information

Clinical Data Services

Clinical Data Services Clinical Data Services Data Storage, Data Collection Data Management Human Research Academy October 2014 CTS Research Development Services 706.721.6247 www.ctsrds@gru.edu Objectives Participants will:

More information

Efficient De-Identification of Electronic Patient Records for User Cognitive Testing

Efficient De-Identification of Electronic Patient Records for User Cognitive Testing 2012 45th Hawaii International Conference on System Sciences Efficient De-Identification of Electronic Patient Records for User Cognitive Testing Kenric W. Hammond Department of Veterans Affairs kenric.hammond@va.gov

More information

Secondary Uses of Health Data IMPAC s Oncology Data Alliance Program

Secondary Uses of Health Data IMPAC s Oncology Data Alliance Program Secondary Uses of Health Data IMPAC s Oncology Data Alliance Program NCVHS August 1, 2007 Joel Goldwein, MD Senior Vice President, Medical Affairs IMPAC Medical Systems Inc. IMPAC Medical Systems, Inc.

More information

An Interactive De-Identification-System

An Interactive De-Identification-System An Interactive De-Identification-System Katrin Tomanek 1, Philipp Daumke 1, Frank Enders 1, Jens Huber 1, Katharina Theres 2 and Marcel Müller 2 1 Averbis GmbH, Freiburg/Germany http://www.averbis.com

More information

Anonymizing Unstructured Data to Enable Healthcare Analytics Chris Wright, Vice President Marketing, Privacy Analytics

Anonymizing Unstructured Data to Enable Healthcare Analytics Chris Wright, Vice President Marketing, Privacy Analytics Anonymizing Unstructured Data to Enable Healthcare Analytics Chris Wright, Vice President Marketing, Privacy Analytics Privacy Analytics - Overview For organizations that want to safeguard and enable their

More information

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Dingcheng Li Mayo Clinic, USA 20-Dec-2015

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Dingcheng Li Mayo Clinic, USA 20-Dec-2015 PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf)

More information

Annotated Corpora in the Cloud: Free Storage and Free Delivery

Annotated Corpora in the Cloud: Free Storage and Free Delivery Annotated Corpora in the Cloud: Free Storage and Free Delivery Graham Wilcock University of Helsinki graham.wilcock@helsinki.fi Abstract The paper describes a technical strategy for implementing natural

More information

Strategies for De-Identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies

Strategies for De-Identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies Strategies for De-Identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies Clete A. Kushida, M.D., Ph.D. Professor, Stanford University Medical Center Overview

More information

Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project

Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Paul Bone pbone@csse.unimelb.edu.au June 2008 Contents 1 Introduction 1 2 Method 2 2.1 Hadoop and Python.........................

More information

A Commercial Approach to De-Identification Dan Wasserstrom, Founder and Chairman De-ID Data Corp, LLC

A Commercial Approach to De-Identification Dan Wasserstrom, Founder and Chairman De-ID Data Corp, LLC A Commercial Approach to De-Identification Dan Wasserstrom, Founder and Chairman De-ID Data Corp, LLC De-ID Data Corp, LLC Founded to: ENHANCE DATA ACCESS WHILE PROTECTING PATIENT PRIVACY Founders Problem

More information

A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set. Arya Tafvizi

A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set. Arya Tafvizi A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set by Arya Tafvizi S.B., Physics, MIT, 2010 S.B., Computer Science and Engineering, MIT, 2011 Submitted to the Department

More information

Automated Tool for Anonymization of Patient Records

Automated Tool for Anonymization of Patient Records Automated Tool for Anonymization of Patient Records Nikita Raaj MSc Computing and Management 2011/2012 The candidate confirms that the work submitted is their own and the appropriate credit has been given

More information

Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA

Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA Chunk Parsing Steven Bird Ewan Klein Edward Loper University of Melbourne, AUSTRALIA University of Edinburgh, UK University of Pennsylvania, USA March 1, 2012 chunk parsing: efficient and robust approach

More information

The De-identification of Personally Identifiable Information

The De-identification of Personally Identifiable Information The De-identification of Personally Identifiable Information Khaled El Emam (PhD) www.privacyanalytics.ca 855.686.4781 info@privacyanalytics.ca 251 Laurier Avenue W, Suite 200 Ottawa, ON Canada K1P 5J6

More information

Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR

Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR Nathan Manwaring University of Utah Masters Project Presentation April 2012 Equation Consulting Who we are Equation Consulting

More information

Statistical Methodology for a Clinical Trial Protocol.

Statistical Methodology for a Clinical Trial Protocol. Statistical Methodology for a Clinical Trial Protocol. McMaster University Anesthesia Research Interest Group Dinner Meeting December, 3 rd -2015. Objectives : Talk about some key concepts for writing

More information

EPOWERdoc EMR Medical Content Building Option

EPOWERdoc EMR Medical Content Building Option Overview Hospitals planning to implement the Emergency Department module for an enterprise Health Information System, such as Meditech or CPSI, are typically required to both create the medical content

More information

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for

More information

De-Identification of Clinical Data

De-Identification of Clinical Data De-Identification of Clinical Data Sepideh Khosravifar, CISSP Info Security Analyst IV Tyrone Grandison, PhD Manager, Privacy Research, IBM TEPR Conference 2008 Ft. Lauderdale, Florida May 17-21, 2008

More information

i2b2 Cell Messaging Project Management (PM) Cell

i2b2 Cell Messaging Project Management (PM) Cell i2b2 Cell Messaging Project Management (PM) Cell Table of Contents 2. Document Version History... 3 3. Introduction... 4 3.1 The i2b2 Hive... 4 3.2 i2b2 Messaging Overview... 4 3.2.1 Message Header...

More information

Automated Problem List Generation from Electronic Medical Records in IBM Watson

Automated Problem List Generation from Electronic Medical Records in IBM Watson Proceedings of the Twenty-Seventh Conference on Innovative Applications of Artificial Intelligence Automated Problem List Generation from Electronic Medical Records in IBM Watson Murthy Devarakonda, Ching-Huei

More information

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural

More information

Securing Big Data Learning and Differences from Cloud Security

Securing Big Data Learning and Differences from Cloud Security Securing Big Data Learning and Differences from Cloud Security Samir Saklikar RSA, The Security Division of EMC Session ID: DAS-108 Session Classification: Advanced Agenda Cloud Computing & Big Data Similarities

More information

Windows Installation Guide

Windows Installation Guide Informatics for Integrating Biology and the Bedside i2b2 Desktop Install: Full VM Server Windows Installation Guide Document Version: 1.6.1 i2b2 Software Version: 1.6 Table of Contents About this Guide...

More information

Large-scale evaluation of automated clinical note de-identification and its impact on information extraction

Large-scale evaluation of automated clinical note de-identification and its impact on information extraction Large-scale evaluation of automated clinical note de-identification and its impact on information extraction Louise Deleger, 1 Katalin Molnar, 1 Guergana Savova, 2 Fei Xia, 3 Todd Lingren, 1 Qi Li, 1 Keith

More information

i2b2 Clinical Research Chart

i2b2 Clinical Research Chart i2b2 Clinical Research Chart Shawn Murphy MD, Ph.D. Griffin Weber MD, Ph.D. Michael Mendis Vivian Gainer MS Lori Phillips MS Rajesh Kuttan Wensong Pan MS Henry Chueh MD Susanne Churchill Ph.D. John Glaser

More information

SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS

SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Bachelor of Science with Honors Research Distinction in Electrical

More information

11-792 Software Engineering EMR Project Report

11-792 Software Engineering EMR Project Report 11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of

More information

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences Personalized Medicine: Humanity s Ultimate Big Data Challenge Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences 2012 Oracle Corporation Proprietary and Confidential 2 3 Humanity

More information

Natural Language Processing Supporting Clinical Decision Support

Natural Language Processing Supporting Clinical Decision Support Natural Language Processing Supporting Clinical Decision Support Applications for Enhancing Clinical Decision Making NIH Worksop; Bethesda, MD, April 24, 2012 Stephane M. Meystre, MD, PhD Department of

More information

Using Electronic Medical Records Data for Health Services Research Case Study: Development and Use of Ambulatory Adverse Event Trigger Tools

Using Electronic Medical Records Data for Health Services Research Case Study: Development and Use of Ambulatory Adverse Event Trigger Tools Using Electronic Medical Records Data for Health Services Research Case Study: Development and Use of Ambulatory Adverse Event Trigger Tools Hillary Mull VA Boston Healthcare System Boston University School

More information

Understanding Diagnosis Assignment from Billing Systems Relative to Electronic Health Records for Clinical Research Cohort Identification

Understanding Diagnosis Assignment from Billing Systems Relative to Electronic Health Records for Clinical Research Cohort Identification Understanding Diagnosis Assignment from Billing Systems Relative to Electronic Health Records for Clinical Research Cohort Identification Russ Waitman Kelly Gerard Daniel W. Connolly Gregory A. Ator Division

More information

Extracting value from HIPAA Data James Yaple Jackson-Hannah LLC

Extracting value from HIPAA Data James Yaple Jackson-Hannah LLC Extracting value from HIPAA Data James Yaple Jackson-Hannah LLC Session Objectives Examine the value of realistic information in research and software testing Explore the challenges of de-identifying health

More information

Project Management (PM) Cell

Project Management (PM) Cell Informatics for Integrating Biology and the Bedside i2b2 Design Document Project Management (PM) Cell Document Version: 1.7.1 i2b2 Software Version: 1.7.00 Table of Contents DOCUMENT MANAGEMENT... 4 1.

More information

Automatic Text Analysis Using Drupal

Automatic Text Analysis Using Drupal Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing

More information

De-Identification of Health Data under HIPAA: Regulations and Recent Guidance" " "

De-Identification of Health Data under HIPAA: Regulations and Recent Guidance  De-Identification of Health Data under HIPAA: Regulations and Recent Guidance" " " D even McGraw " Director, Health Privacy Project January 15, 201311 HIPAA Scope Does not cover all health data Applies

More information

Cerner i2b2 User s s Guide and Frequently Asked Questions. v1.3

Cerner i2b2 User s s Guide and Frequently Asked Questions. v1.3 User s s Guide and v1.3 Contents General Information... 3 Q: What is i2b2?... 3 Q: How is i2b2 populated?... 3 Q: How often is i2b2 updated?... 3 Q: What data is not in our i2b2?... 3 Q: Can individual

More information

Privacy Techniques for Big Data

Privacy Techniques for Big Data Privacy Techniques for Big Data The Pros and Cons of Syntatic and Differential Privacy Approaches Dr#Roksana#Boreli# SMU,#Singapore,#May#2015# Introductions NICTA Australia s National Centre of Excellence

More information

Flexible solution for interoperable cloud healthcare systems

Flexible solution for interoperable cloud healthcare systems University Politehnica Timişoara, ROMANIA Department of Automation and Applied Informatics Flexible solution for interoperable cloud healthcare systems Authors: Mihaela Marcella VIDA Oana Sorina LUPŞE

More information

Using EHRs to extract information, query clinicians, and insert reports

Using EHRs to extract information, query clinicians, and insert reports Using EHRs to extract information, query clinicians, and insert reports Meghan Baker, MD, ScD NIH HCS Collaboratory EHR working group webinar March 26, 2013 1 E S P V A E R S Electronic Support for Public

More information

CFAR Network of Integrated Clinical Systems(CNICS): The Use of Real-Time, Patient-Centered, Clinical Metrics

CFAR Network of Integrated Clinical Systems(CNICS): The Use of Real-Time, Patient-Centered, Clinical Metrics The Intersection of Technology, HAART Adherence, and Drug Abuse Treatment CFAR Network of Integrated Clinical Systems(CNICS): The Use of Real-Time, Patient-Centered, Clinical Metrics Stephen L. Boswell,

More information

Use of Novel Predictive Models to Improve Hospital Readmission Program. Copyright 2015

Use of Novel Predictive Models to Improve Hospital Readmission Program. Copyright 2015 Use of Novel Predictive Models to Improve Hospital Readmission Program Copyright 2015 1 Presenters Jason Burke, MA Senior Advisor & Faculty at UNC Health Care and School of Medicine Michael Cousins, PhD,

More information

Developing VA GDx: An Informatics Platform to Capture and Integrate Genetic Diagnostic Testing Data into the VA Electronic Medical Record

Developing VA GDx: An Informatics Platform to Capture and Integrate Genetic Diagnostic Testing Data into the VA Electronic Medical Record Developing VA GDx: An Informatics Platform to Capture and Integrate Genetic Diagnostic Testing Data into the VA Electronic Medical Record Scott L. DuVall Jun 27, 2014 1 Julie Lynch Vickie Venne Dawn Provenzale

More information

What is Covered under the Privacy Rule? Protected Health Information (PHI)

What is Covered under the Privacy Rule? Protected Health Information (PHI) HIPAA & RESEARCH What is Covered under the Privacy Rule? Protected Health Information (PHI) Health information + Identifier = PHI Transmitted or maintained in any form (paper, electronic, forms, web-based,

More information

Clinical Notes and Letter Templates - Advanced

Clinical Notes and Letter Templates - Advanced Clinical Notes and Letter Templates - Advanced Discussion Format Computer Classroom Format Lecture will be conducted throughout this session. We have a ton of content to share and everyone will be at a

More information

PyCantonese: Cantonese linguistic research in the age of big data

PyCantonese: Cantonese linguistic research in the age of big data PyCantonese: Cantonese linguistic research in the age of big data Jackson L. Lee University of Chicago http://jacksonllee.com Childhood Bilingualism Research Center, CUHK September 15, 2015 Grammar versus

More information

Implementing Honest Broker System(s) in Academic Medical Centers: The Pittsburgh Experience

Implementing Honest Broker System(s) in Academic Medical Centers: The Pittsburgh Experience Implementing Honest Broker System(s) in Academic Medical Centers: The Pittsburgh Experience Christopher Ryan, Ph.D., CIP IRB Director Professor of Psychiatry University of Pittsburgh ryancm@upmc.edu The

More information

De-identification, defined and explained. Dan Stocker, MBA, MS, QSA Professional Services, Coalfire

De-identification, defined and explained. Dan Stocker, MBA, MS, QSA Professional Services, Coalfire De-identification, defined and explained Dan Stocker, MBA, MS, QSA Professional Services, Coalfire Introduction This perspective paper helps organizations understand why de-identification of protected

More information

Full VM Tutorial. i2b2 Desktop Installation (Windows) Informatics for Integrating Biology and the Bedside

Full VM Tutorial. i2b2 Desktop Installation (Windows) Informatics for Integrating Biology and the Bedside Informatics for Integrating Biology and the Bedside i2b2 Desktop Installation (Windows) Full VM Tutorial Document Version: 1.4.1 i2b2 Software Version: 1.4 Table of Contents About this Guide... v 1. Prerequisites...

More information

De-Identification of Clinical Data

De-Identification of Clinical Data De-Identification of Clinical Data Sepideh Khosravifar, CISSP Info Security Analyst IV TEPR Conference 2008 Ft. Lauderdale, Florida May 17-21, 2008 1 1 Slide 1 cmw1 Craig M. Winter, 4/25/2008 Background

More information

The registry of the future: Leveraging EHR and patient data to drive better outcomes

The registry of the future: Leveraging EHR and patient data to drive better outcomes The registry of the future: Leveraging EHR and patient data to drive better outcomes Brian J. Kelly, M.D. President, Payer and Provider Solutions, Quintiles Jason Colquitt, VP, IT, Head of RWLPR IT, Global

More information

De-Identification of health records using Anonym: Effectiveness and robustness across datasets

De-Identification of health records using Anonym: Effectiveness and robustness across datasets De-Identification of health records using Anonym: Effectiveness and robustness across datasets Guido Zuccon a,b, Daniel Kotzur a, Anthony Nguyen a, Anton Bergheim c a The Australian e-health Research Centre

More information

Putting IBM Watson to Work In Healthcare

Putting IBM Watson to Work In Healthcare Martin S. Kohn, MD, MS, FACEP, FACPE Chief Medical Scientist, Care Delivery Systems IBM Research marty.kohn@us.ibm.com Putting IBM Watson to Work In Healthcare 2 SB 1275 Medical data in an electronic or

More information

De-identification Koans. ICTR Data Managers Darren Lacey January 15, 2013

De-identification Koans. ICTR Data Managers Darren Lacey January 15, 2013 De-identification Koans ICTR Data Managers Darren Lacey January 15, 2013 Disclaimer There are several efforts addressing this issue in whole or part Over the next year or so, I believe that the conversation

More information

ProgressBook GradeBook Attendance User Guide

ProgressBook GradeBook Attendance User Guide ProgressBook GradeBook Attendance User Guide ProgressBook GradeBook Attendance User Guide (This document is current for ProgressBook v12.3 or later.) 2012 Software Answers, Inc. All Rights Reserved. All

More information

W E L C O M E. Event or Meeting Title. Jiajie Zhang, PhD 2013 WISH Closing Keynote

W E L C O M E. Event or Meeting Title. Jiajie Zhang, PhD 2013 WISH Closing Keynote W E L C O M E Event or Meeting Title Jiajie Zhang, PhD 2013 WISH Closing Keynote EHR Usability: The Emotional Stages Some time in the past We are here Some time in the future http://cabarettheatreblog.files.wordpr

More information

The Future of Technology in Long Term Care

The Future of Technology in Long Term Care The Future of Technology in Long Term Care Lisa Mitchelson and Scott White, TEF Traci Jersen, 6N Systems Ellen Flaherty, VCNY David Finkelstein, VCNY In today s workshop. Introduction Overview Electronic

More information

Secondary Uses of Data for Comparative Effectiveness Research

Secondary Uses of Data for Comparative Effectiveness Research Secondary Uses of Data for Comparative Effectiveness Research Paul Wallace MD Director, Center for Comparative Effectiveness Research The Lewin Group Paul.Wallace@lewin.com Disclosure/Perspectives Training:

More information

Identity Management Framework (IM) Cell

Identity Management Framework (IM) Cell Informatics for Integrating Biology and the Bedside i2b2 Cell Messaging Identity Management Framework (IM) Cell Document Version: 1.7.0 i2b2 Software Version: 1.7.00 Table of Contents DOCUMENT MANAGEMENT...

More information
