Combining structured data with machine learning to improve clinical text de-identification
1 Combining structured data with machine learning to improve clinical text de-identification
DT Tran, Scott Halgrim, David Carrell
Group Health Research Institute
2 Clinical text contains personally identifiable information (PII): information that can identify an individual in context. Some PII is protected by the Health Insurance Portability and Accountability Act (HIPAA).
HIPAA-protected PII: patient name, medical record number, age, Social Security number, dates (including birthday), address, room, URL, IP address, others
Not HIPAA-protected PII: provider name, initials, organization name
3 Steps to de-identify PII in clinical text
Step 1: Find PII text spans.
Step 2a: Replace PII spans.
Step 2b: Remove PII spans.
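Steps 2a and 2b can be sketched as two ways of rewriting the note once Step 1 has produced character offsets. This is a minimal illustration, assuming non-overlapping (start, end, type) spans and bracketed type tags such as `[NAME]` as the replacement; the slides do not say what replacement text the group used.

```python
from typing import Callable, List, Tuple

# A PII span: (start offset, end offset, PII type), as produced by Step 1.
Span = Tuple[int, int, str]


def _rewrite(text: str, spans: List[Span], render: Callable[[str, str], str]) -> str:
    """Rebuild text, passing each PII span through `render`; spans must not overlap."""
    pieces, last = [], 0
    for start, end, pii_type in sorted(spans):
        if start < last:
            raise ValueError(f"overlapping PII spans at offset {start}")
        pieces.append(text[last:start])           # keep the non-PII text before the span
        pieces.append(render(text[start:end], pii_type))
        last = end
    pieces.append(text[last:])                    # keep the text after the last span
    return "".join(pieces)


def replace_spans(text: str, spans: List[Span]) -> str:
    """Step 2a: replace each PII span with a bracketed type tag, e.g. [NAME]."""
    return _rewrite(text, spans, lambda _original, pii_type: f"[{pii_type}]")


def remove_spans(text: str, spans: List[Span]) -> str:
    """Step 2b: delete each PII span outright."""
    return _rewrite(text, spans, lambda _original, _pii_type: "")
```

Replacing keeps a visible marker of what kind of information was there, which preserves some readability for researchers; removing leaves no trace but can make sentences hard to parse.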
4 Finding PII text spans
Recall: the number of identifiers (PII) in the reference standard that were detected, divided by the total number of PII in the reference standard.
Precision: the number of correctly predicted identifiers divided by the total number of predictions.
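The two measures can be computed directly from sets of annotated spans. A minimal sketch, assuming each PII is a (start, end, type) tuple and that a prediction counts as correct only when offsets and type exactly match a reference-standard span (the slides do not state whether the type had to match):

```python
from typing import Set, Tuple

# A PII annotation: (start offset, end offset, PII type).
Span = Tuple[int, int, str]


def recall_precision(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match scoring: a prediction is correct only if it equals a gold span."""
    correct = len(gold & predicted)
    # Assumption: an empty reference (or empty prediction set) scores 1.0 instead of dividing by zero.
    recall = correct / len(gold) if gold else 1.0
    precision = correct / len(predicted) if predicted else 1.0
    return recall, precision
```

With three reference PII and two predictions of which one is correct, recall is 1/3 and precision is 1/2.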
5 Hypotheses
1. We can train a high-performing machine learning model to find most PII with reasonable precision.
2. We can increase recall of sensitive, HIPAA-protected PII with a data matching algorithm.
6 Experiment description
1. Chart reviewers annotate PII to create our gold standard.
2. Train a machine learning model on corpus A.
3. Apply the model to corpus B.
4. Review documents where recall is not 100% (corpus B').
5. Develop a secondary algorithm.
6. Compare the machine model vs. the hybrid approach on corpus C.
7 Experiment corpora
Corpus A: training data to develop a machine learning model (635 Family Practice, 70 Internal Medicine, 131 Oncology and 70 OBGYN notes).
Corpus B: baseline test data used to obtain the documents for corpus B' (129 Family Practice notes).
Corpus B': documents in corpus B where the machine learning model did not have 100% recall; used to inform the data matching rules.
Corpus C: reserved test data to assess whether the hybrid approach can improve performance on unseen documents in the future.
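Selecting corpus B' amounts to keeping every corpus B document in which at least one reference-standard PII span was missed. A minimal sketch, assuming gold and predicted spans are available per (hypothetical) document ID as (start, end, type) tuples and that only exact matches count as detections:

```python
from typing import Dict, List, Set, Tuple

# A PII annotation: (start offset, end offset, PII type).
Span = Tuple[int, int, str]


def documents_missing_pii(gold: Dict[str, Set[Span]],
                          predicted: Dict[str, Set[Span]]) -> List[str]:
    """Return IDs of documents whose recall is below 100%, i.e. at least one gold span was missed."""
    missed = []
    for doc_id, gold_spans in gold.items():
        detected = gold_spans & predicted.get(doc_id, set())
        if len(detected) < len(gold_spans):
            missed.append(doc_id)
    return sorted(missed)
```

Documents with no PII in the reference standard have trivially complete recall and are never selected.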
8 The machine learning tool used: MITRE Identification Scrubber Toolkit (MIST) [1]
MIST is an open-source machine learning toolkit specifically designed to de-identify PII in natural-language text.
- Scalable and robust
- User-friendly interface
- Powerful commands
- Well documented
[1] An earlier version of MIST was the highest-performing automated system in the Informatics for Integrating Biology and the Bedside (i2b2) de-identification challenge.
9 Machine model tested on corpus B [chart: precision and recall]
10 Examples of PII* in corpus B
SUBJECTIVE: Abcdef G Hill is a 44 year old male here to follow up on diabetes and pain.
Medicine regimen: In am Mr. Hill takes glyburide 10 mg In pm Mr. Hill takes glyburide 10mg
pain level between 5-7/10
Lives with parents (Will and Jane Miller)
Get a tdap ( tetanus shot)
call optometry at
yo recent Phd engineering grad from NYU is currently in europe
*All PII shown and redacted are fictional (name, age, date, etc.).
11 Data matching algorithm
For each note and patient ID pair in the corpus:
- Read in the output from MIST (text, PII offsets).
- Get encounter and patient information.
- Attempt to match, then edit or add PII, in the following order (higher certainty first):
  - regular expression patterns for …, date, zip code, phone, medical record ID, Social Security number
  - regular expression patterns to rule out blood pressure, decimal values, pulse
  - match on patient data
  - match on encounter data
  - match from a list of hospitals for organization name
  - match from a list of countries and states for address
- For each token:
  - match from a list of US census names for patient name
  - if the token is a noun or proper noun phrase, match from a dictionary of providers by last name
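The regular-expression stage can be sketched as below. The patterns are hypothetical and deliberately simplified (the slides do not publish the actual expressions, and the MRN format in particular is invented). The sketch shows one possible way to apply the rule-out patterns, by discarding any PII match that overlaps a blood pressure, decimal or pulse reading, and to honor "higher certainty first" by letting earlier pattern types claim text before later ones.

```python
import re
from typing import List, Tuple

# A PII match: (start offset, end offset, PII type).
Span = Tuple[int, int, str]

# Hypothetical, simplified PII patterns, ordered from higher to lower certainty.
PII_PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("MRN", re.compile(r"\bMRN:?\s*\d{6,10}\b", re.IGNORECASE)),  # invented MRN format
    ("PHONE", re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b")),
    ("ZIP", re.compile(r"\b\d{5}(?:-\d{4})?\b")),
]

# Clinical values that can look like PII and should never be flagged.
RULE_OUT_PATTERNS = [
    re.compile(r"\b\d{2,3}/\d{2,3}\b"),                             # blood pressure, e.g. 120/80
    re.compile(r"\b\d+\.\d+\b"),                                    # decimal values, e.g. 7.5
    re.compile(r"\b(?:pulse|HR)\s*:?\s*\d{2,3}\b", re.IGNORECASE),  # pulse, e.g. pulse 72
]


def _overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def regex_pii(text: str) -> List[Span]:
    """Return regex-detected PII spans, skipping rule-out regions and already-claimed text."""
    blocked = [m.span() for pattern in RULE_OUT_PATTERNS for m in pattern.finditer(text)]
    found: List[Span] = []
    for pii_type, pattern in PII_PATTERNS:
        for m in pattern.finditer(text):
            claimed = blocked + [(s, e) for s, e, _ in found]
            if any(_overlaps(m.span(), c) for c in claimed):
                continue  # a rule-out or higher-certainty pattern already covers this text
            found.append((m.start(), m.end(), pii_type))
    return sorted(found)
```

In the actual pipeline these matches would be merged with the MIST output (editing or adding spans); that merge step is not shown here.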
12 Data matching algorithm summary [table: matching source (discrete EMR data, regular expression, lookup list) used for each PII type: address, age, date, doctor name, medical record number, organization name, patient name, phone, SSN]
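The "discrete EMR data" source can be illustrated by searching the note for the patient's own identifiers taken from structured data. A minimal sketch; the record fields (`first_name`, `last_name`, `mrn`, `birth_date`) and the list of date formats are assumptions for illustration, not the Clarity schema or the group's actual rules.

```python
import re
from datetime import date
from typing import Dict, List, Tuple

# A PII match: (start offset, end offset, PII type).
Span = Tuple[int, int, str]


def date_variants(d: date) -> List[str]:
    """A few common ways one date may be written in a note (an assumed, incomplete list)."""
    return [
        f"{d.month}/{d.day}/{d.year}",              # 3/14/1968
        f"{d.month:02d}/{d.day:02d}/{d.year}",      # 03/14/1968
        f"{d.month}/{d.day}/{d.year % 100:02d}",    # 3/14/68
        d.isoformat(),                              # 1968-03-14
        f"{d.strftime('%B')} {d.day}, {d.year}",    # March 14, 1968 (month name depends on locale)
    ]


def match_patient_data(text: str, patient: Dict[str, object]) -> List[Span]:
    """Find whole-word, case-insensitive occurrences of this patient's own identifiers."""
    targets: List[Tuple[str, str]] = []
    for field, pii_type in (("first_name", "PATIENT"), ("last_name", "PATIENT"), ("mrn", "MRN")):
        value = patient.get(field)
        if value:
            targets.append((str(value), pii_type))
    birth_date = patient.get("birth_date")
    if isinstance(birth_date, date):
        targets.extend((variant, "DATE") for variant in date_variants(birth_date))

    spans = set()
    for value, pii_type in targets:
        # (?<!\w) and (?!\w) stop "Hill" from matching inside "uphill".
        pattern = re.compile(r"(?<!\w)" + re.escape(value) + r"(?!\w)", re.IGNORECASE)
        for m in pattern.finditer(text):
            spans.add((m.start(), m.end(), pii_type))
    return sorted(spans)
```

Because only this patient's own values are searched, a common word such as "Will" is flagged only in notes for a patient who actually has that name, unlike a census-name lookup.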
13 Data matching programming tools: Python
1. Developed in IronPython
2. (Optional) pyodbc to connect to a Clarity SQL database
3. Regular expressions
4. Natural Language Toolkit (NLTK)
5. unittest
Environment: 32-bit on Windows 7
NLTK reference: Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O'Reilly Media Inc.
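Since hand-written patterns are easy to break, unittest is a natural fit for pinning down what each expression should and should not match. A small sketch; the phone pattern is hypothetical (the slides do not publish the group's patterns), and the negative cases echo clinical values from the example note. Save it in a module and run it with `python -m unittest`.

```python
import re
import unittest

# Hypothetical phone pattern used only for this illustration.
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")


class PhonePatternTest(unittest.TestCase):
    def test_matches_common_formats(self):
        for s in ("206-555-0100", "(206) 555-0100", "206.555.0100"):
            with self.subTest(s=s):
                self.assertIsNotNone(PHONE.search(s))

    def test_ignores_clinical_values(self):
        # Blood pressure, doses and lab values must not be mistaken for phone numbers.
        for s in ("BP 120/80", "glyburide 10 mg", "A1c 7.5"):
            with self.subTest(s=s):
                self.assertIsNone(PHONE.search(s))
```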
14 Recall increased when using the hybrid approach in corpus B [chart: recall, machine model vs. hybrid]
15 Precision decreased when using the hybrid approach in corpus B [chart: precision, machine model vs. hybrid]
16 Net performance gained overall and in HIPAA-protected PII when using the hybrid approach in corpus B [chart: net performance = recall change + precision change]
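The chart's definition of net performance is plain arithmetic: for each PII type, add the recall change to the precision change, both measured from the machine model to the hybrid. A sketch of that calculation; the scores are made-up illustrations, not results from this study.

```python
from typing import Dict, Tuple

# Per PII type: (recall, precision), each between 0 and 1.
Scores = Dict[str, Tuple[float, float]]


def net_performance(model: Scores, hybrid: Scores) -> Dict[str, float]:
    """Net change per PII type: (hybrid recall - model recall) + (hybrid precision - model precision)."""
    net = {}
    for pii_type, (model_recall, model_precision) in model.items():
        hybrid_recall, hybrid_precision = hybrid[pii_type]
        net[pii_type] = (hybrid_recall - model_recall) + (hybrid_precision - model_precision)
    return net
```

For example, a type whose recall rises by 0.06 while its precision falls by 0.03 has a net change of +0.03. This unweighted sum treats one point of recall as worth one point of precision.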
17 Challenges
- Person name matching
  - Patient names, as defined, include any non-provider names.
  - Providers are not always Group Health providers.
  - Incomplete sentences and a lack of grammar and formatting mean part-of-speech tagging may not be reliable.
  - Many false-positive matches, such as Will, May, Major, etc.
- Organization names in the gold standard include non-medical facilities.
- Age appears in many forms, so matching on a number alone does not work well; we tried regular expressions to rule out age.
- Dates appear in unpredictable, non-distinct formats.
- Performance measurements did not give credit for partial span matches.
- Occasionally the gold standard gets updated.
- Only 36 documents were reviewed.
- What is a reasonable loss of precision? How should it be measured?
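One common way to give credit for partial span matches is overlap-based ("lenient") scoring, in which a reference span counts as found if any prediction overlaps it. This is a swapped-in alternative for illustration, not the measure the study used; the sketch assumes character-offset spans and ignores PII type.

```python
from typing import List, Tuple

# A span as (start, end) character offsets.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def lenient_recall_precision(gold: List[Span], predicted: List[Span]) -> Tuple[float, float]:
    """A gold span is found if any prediction overlaps it; a prediction is correct if it overlaps any gold span."""
    found = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    correct = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    # Assumption: empty inputs score 1.0 rather than dividing by zero.
    recall = found / len(gold) if gold else 1.0
    precision = correct / len(predicted) if predicted else 1.0
    return recall, precision
```

Under exact matching, predicting only "Hill" for the reference span "Abcdef G Hill" scores zero; under this lenient scheme it scores full recall and precision, so the two numbers bracket how much of a name was actually covered.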
18 Conclusions
- With an integrated delivery system, we often have metadata about each chart note.
- De-identified clinical text can still contain important clinical data useful for research if we weigh the value of high recall against precision loss differently for each PII type.
- Continue efforts to increase PII:
  - more patient and encounter identifier data variables
  - clever surrogates to hide residual PII ("hiding in plain sight" [1])
[1] Carrell D, Malin B, Aberdeen J, et al. J Am Med Inform Assoc (2012). doi: /amiajnl
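The "hiding in plain sight" idea replaces detected PII with realistic surrogates instead of tags, so that any PII the detector misses is hard to tell apart from the fakes around it. A minimal sketch of consistent surrogate substitution, assuming hypothetical per-type surrogate pools; a real system would need much larger pools and format-preserving surrogates for dates, phone numbers and similar types.

```python
import random
from typing import Dict, List, Tuple

# A detected PII span: (start offset, end offset, PII type).
Span = Tuple[int, int, str]


class SurrogateGenerator:
    """Swap each distinct PII string for a realistic fake of the same type.

    The same original string (case-insensitively) always maps to the same
    surrogate, so repeated mentions of one person stay consistent.
    """

    def __init__(self, pools: Dict[str, List[str]], seed: int = 0) -> None:
        self.pools = pools                  # hypothetical surrogate lists per PII type
        self.rng = random.Random(seed)      # seeded for reproducible output
        self.mapping: Dict[Tuple[str, str], str] = {}

    def surrogate(self, original: str, pii_type: str) -> str:
        key = (pii_type, original.lower())
        if key not in self.mapping:
            # Avoid "replacing" a value with itself when the pool allows it.
            candidates = [c for c in self.pools[pii_type] if c.lower() != original.lower()]
            self.mapping[key] = self.rng.choice(candidates or self.pools[pii_type])
        return self.mapping[key]

    def apply(self, text: str, spans: List[Span]) -> str:
        pieces, last = [], 0
        for start, end, pii_type in sorted(spans):
            if start < last:
                raise ValueError(f"overlapping PII spans at offset {start}")
            pieces.append(text[last:start])
            pieces.append(self.surrogate(text[start:end], pii_type))
            last = end
        pieces.append(text[last:])
        return "".join(pieces)
```

Keeping the mapping consistent within a note matters: if "Hill" became two different surrogates, a reader could infer that the differing name was the one the system had replaced.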