Natural Language Processing in the EHR Lifecycle




Insight Driven Health
Natural Language Processing in the EHR Lifecycle
Cecil O. Lynch, MD, MS
cecil.o.lynch@accenture.com
Health & Public Service

Outline
  Medical Data Landscape
  Value Proposition of NLP
  Strategies for voice and text processing
  Tooling options
  Integration with the EMR lifecycle

Medical Data Landscape

Copyright 2010 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture.


Medical Data: Where is it?

Two Types of Content

1. Structured Content (20%) - Typically found in a database
   A. Fits a pre-defined data model
   B. Fits well into relational tables
   Examples:
     Databases
     XML data
     Data warehouses
     Enterprise systems (CRM, ERP, etc.)
     UMLS
     RxNorm

2. Unstructured Content (80%) - Can be found throughout an organization
   A. Does not fit a pre-defined data model
   B. Does not fit well into relational tables
   Examples, text-based:
     Email messages
     Office documents
     Web documents
     BLOB (Binary Large Object) field type (e.g. transcribed doctor's notes)
   Examples, non-text-based:
     Voice/audio files (e.g. dictated doctor's notes)
     Images
     Video files
     Medical charts

Slide from DataSkill

NLP Value Proposition

NLP Value Proposition: data from IBM study at Seton Healthcare

Case Study 5: BJC HealthCare
Making healthcare smarter

BJC HealthCare NLP Results
Results: Follow-up Appointments and Diagnoses

Element                   Precision   Recall
Alcohol Use               91.8%       96.2%
Alcohol Substance         95%         74%
Alcohol Volume            96.3%       100.0%
Alcohol Duration          86.7%       93.3%
Alcohol Quit Duration     100.0%      96.1%
Alcohol Family History    95.8%       83.3%
Tobacco Use               90.0%       93.0%
Medications               90.0%       92.0%
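As a reminder of what the precision and recall figures above measure, here is a minimal Python sketch. The slide does not report F1; it is derived here from the reported pairs as one common way to combine the two numbers. The function names and the REPORTED dictionary are illustrative, not part of the study.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of extracted items that were correct."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of items present in the notes that were extracted."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# (precision, recall) pairs copied from the results above, as fractions.
REPORTED = {
    "Alcohol Use": (0.918, 0.962),
    "Alcohol Substance": (0.95, 0.74),
    "Alcohol Volume": (0.963, 1.0),
    "Alcohol Duration": (0.867, 0.933),
    "Alcohol Quit Duration": (1.0, 0.961),
    "Alcohol Family History": (0.958, 0.833),
    "Tobacco Use": (0.90, 0.93),
    "Medications": (0.90, 0.92),
}

if __name__ == "__main__":
    for element, (p, r) in REPORTED.items():
        print(f"{element:<24} F1 = {f1(p, r):.3f}")
```

A low recall next to a high precision, as for Alcohol Substance, means the extractor is rarely wrong when it fires but misses a sizable share of mentions.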

Strategies for Voice and Text Analytics

Strategic Approach
  Voice recognition to standard EMR UI
  Voice recognition to a standard model
  Voice recognition to unstructured text document
  Content analytics on unstructured documents written to EMR fields
  Content analytics on unstructured documents written to a data warehouse
  Content analytics used at runtime and for predictive analytics and decision support

Is there a limit to Structured Data?

Tooling Options

NLP Pipelines - UIMA
Unstructured Information Management Architecture

4 Major Software Divisions
  It specifies component interfaces in an analytics pipeline
  It describes a set of design patterns
  It suggests two data representations: an in-memory representation of annotations for high-performance analytics and an XML representation of annotations for integration with remote web services
  It suggests development roles allowing tools to be used by users with diverse skills

Is an OASIS Standard

Reference Implementation
  Donated by IBM (SourceForge)
  Maintained by the Apache Foundation
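The pipeline idea described above (components with a common interface, annotations held in memory and also serializable to XML) can be sketched in a few lines of Python. This is a toy illustration only: it does not use the UIMA API, the class and function names are invented for this example, and the XML layout is not UIMA's own serialization format.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    type: str
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class Document:
    """Text plus stand-off annotations shared by every pipeline component."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]

    def select(self, type_name: str) -> List[Annotation]:
        return [a for a in self.annotations if a.type == type_name]


# Every component has the same interface: it reads and extends the Document.
Annotator = Callable[[Document], None]


def sentence_annotator(doc: Document) -> None:
    for m in re.finditer(r"[^.!?]+[.!?]?", doc.text):
        raw = m.group()
        stripped = raw.strip()
        if stripped:
            begin = m.start() + (len(raw) - len(raw.lstrip()))
            doc.annotations.append(Annotation("Sentence", begin, begin + len(stripped)))


def token_annotator(doc: Document) -> None:
    # Depends on the output of the sentence annotator that ran before it.
    for sent in doc.select("Sentence"):
        for m in re.finditer(r"\w+|[^\w\s]", doc.covered_text(sent)):
            doc.annotations.append(
                Annotation("Token", sent.begin + m.start(), sent.begin + m.end())
            )


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    for annotate in annotators:
        annotate(doc)
    return doc


def to_xml(doc: Document) -> str:
    """Serialize text and annotations to a simple, made-up XML layout."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = doc.text
    for a in doc.annotations:
        ET.SubElement(root, "annotation", type=a.type, begin=str(a.begin), end=str(a.end))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = run_pipeline(
        Document("Patient denies fever. Reports cough."),
        [sentence_annotator, token_annotator],
    )
    print([doc.covered_text(s) for s in doc.select("Sentence")])
    print(to_xml(doc))
```

Keeping annotations as offsets into the unchanged text is what lets later components build on earlier ones without rewriting the document.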

Tooling

Tooling - Continued

Tooling - Continued

cTAKES
Clinical Text Analysis and Knowledge Extraction System (Mayo Clinic, Children's Hospital Boston)
http://sourceforge.net/projects/ohnlp/files/ctakes/

Components
  Sentence boundary detector (OpenNLP)
  Rule-based tokenizer to separate punctuation from words
  Normalizer (NLM's NORM)
  Part-of-speech tagger (OpenNLP)
  Phrasal chunker (OpenNLP)
  Dictionary lookup annotator
  Context annotator
  Negation detector (NegEx)
  Dependency parser
  Module for the identification of patient smoking status
  Drug mention annotator
  Context dependent tokenizer
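To make three of the components listed above concrete (rule-based tokenizer, dictionary lookup, negation detection), here is a heavily simplified Python sketch. It is not cTAKES code and not the published NegEx algorithm: the dictionary, trigger words, terminator words, and window size are all hypothetical values chosen for this illustration.

```python
import re
from typing import List, Tuple

# Hypothetical toy resources, for illustration only.
DICTIONARY = {"chest pain", "fever", "cough"}
NEGATION_TRIGGERS = {"no", "denies", "without"}
TERMINATORS = {"but", "however"}  # words that end a negation scope
WINDOW = 5                        # tokens to look back from a mention


def tokenize(sentence: str) -> List[str]:
    """Lowercase and split words from punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def lookup(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int, str]]:
    """Greedy longest-match dictionary lookup over token n-grams."""
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICTIONARY:
                matches.append((i, i + n, phrase))
                i += n
                break
        else:
            i += 1
    return matches


def is_negated(tokens: List[str], start: int) -> bool:
    """Scan backwards from a mention for a trigger, stopping at a terminator."""
    for tok in reversed(tokens[max(0, start - WINDOW):start]):
        if tok in TERMINATORS:
            return False
        if tok in NEGATION_TRIGGERS:
            return True
    return False


def analyze(sentence: str) -> List[Tuple[str, bool]]:
    """Return (mention, negated) pairs for one sentence."""
    tokens = tokenize(sentence)
    return [(phrase, is_negated(tokens, start)) for start, _end, phrase in lookup(tokens)]


if __name__ == "__main__":
    print(analyze("Patient denies chest pain, but reports cough."))
    print(analyze("No fever or cough."))
```

Production systems use far larger dictionaries and trigger lists and handle scope more carefully; the point here is only how the components feed each other.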

cTAKES Derivation

Refined Lucene OWL Code Annotation

ClearTK
ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA.
http://code.google.com/p/cleartk/ (UCB)

  A common interface and wrappers for popular machine learning libraries such as SVMlight, LIBSVM, OpenNLP MaxEnt, and Mallet.
  A rich feature extraction library that can be used with any of the machine learning classifiers. Under the covers, ClearTK understands each of the native machine learning libraries and translates your features into a format appropriate to whatever model you're using.
  Infrastructure for creating NLP components for specific tasks such as part-of-speech tagging, BIO-style chunking, named entity recognition, semantic role labeling, temporal relation tagging, etc.
  Wrappers for common NLP tools such as the Snowball stemmer, the OpenNLP tools, the MaltParser dependency parser, and the Stanford CoreNLP tools.
  Corpus readers for collections like the Penn Treebank, ACE 2005, CoNLL 2003, Genia, TimeBank and TempEval.
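BIO-style chunking, mentioned above, turns span labeling into a per-token tagging task that a statistical classifier can learn: B- marks the first token of a chunk, I- a continuation, and O a token outside any chunk. The Python sketch below shows the encoding and its inverse. It is an illustration of the tagging scheme only, not ClearTK's Java API, and the function names are invented here.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (first token index, end index exclusive, label)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; a stray I- tag with no B- is ignored."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final open span
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    tokens = ["The", "patient", "denies", "chest", "pain"]
    spans = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")]
    tags = spans_to_bio(len(tokens), spans)
    for token, tag in zip(tokens, tags):
        print(f"{token:<10}{tag}")
```

The B- prefix is what keeps two adjacent chunks of the same type, such as two noun phrases in a row, from merging into one.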

EMR Integration Options

Optimal Goal
Goal is:
  Convert unstructured to structured data
  Code this data into standard Meaningful Use terminologies
  Write the data to standard information models for health care data elements in standard ISO Healthcare datatypes
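The three steps of the goal above (structure, code, write to a model) can be sketched as follows. Everything specific in this Python example is an assumption made for illustration: the record layout is not an ISO datatype or HL7 model, and the code system name and codes are placeholders, not real terminology entries.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple

# Placeholder lookup table. "EXAMPLE-TERMINOLOGY" and the EX-... codes are
# hypothetical stand-ins, NOT codes from any real terminology.
TERMINOLOGY: Dict[str, Tuple[str, str, str]] = {
    "tobacco use": ("EXAMPLE-TERMINOLOGY", "EX-0001", "Tobacco use"),
    "alcohol use": ("EXAMPLE-TERMINOLOGY", "EX-0002", "Alcohol use"),
}


@dataclass(frozen=True)
class CodedObservation:
    """Illustrative structured record for one extracted mention."""
    code_system: str
    code: str
    display: str
    negated: bool
    source_text: str  # keep the originating text for traceability


def to_structured(mention: str, negated: bool, source_text: str) -> Optional[CodedObservation]:
    """Map an extracted mention to a coded record, or None if it is unmapped."""
    entry = TERMINOLOGY.get(mention.strip().lower())
    if entry is None:
        return None  # leave unmapped mentions for review instead of guessing
    system, code, display = entry
    return CodedObservation(system, code, display, negated, source_text)


if __name__ == "__main__":
    obs = to_structured("Tobacco Use", False, "Patient reports tobacco use.")
    print(json.dumps(asdict(obs), indent=2))
```

Returning None for unmapped mentions keeps uncoded text out of the structured store, which matters when the output feeds decision support.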

City of Hope: A Proposed Architecture

Diagram labels (components are connected by ETL steps):
  Reporting and Business Intelligence
  Allscripts Database
  EMR OLTP Connection
  Content Analytics
  Natural Language Processing
  Staging - Relational
  Staging - Triplestore
  Physical Layer
  Logical Layer
  HL7 RIM V3
  EDW and Datamarts
  OLAP Analytics
  Predictive Analytics
  Statistics
  Datamining
  Allscripts Healthcare Accelerator
  RDF Triplestore Datamart
  Datamining
  High Performance Analytics

Tool Examples: SPARQL, OWL, IBM SLRP, IBM IODT, OntoBroker, Sesame, Jena

Analytics uses:
  Risk stratification
  Treatment/Protocol evaluations
  Research cohort comparisons
  Real-time clinical decision support
  Disease management
  Population health management
  Personalized medicine / genomics
  Performance assessment
  Patient profiling
  Treatment cost calculations

Glossary:
  RDF: Resource Description Framework
  OWL: Web Ontology Language
  SPARQL: SPARQL Protocol and RDF Query Language
  IBM SLRP: IBM Semantic Layer Research Platform
  IBM IODT: IBM's toolkit for ontology-driven development
  OntoBroker: Semantic web middleware
  Sesame: Framework for querying and analyzing RDF data
  Jena: Semantic Web Framework for Java

WATSON for Healthcare
  WEA Advisor Framework
  Tools
  APIs
  Methods
  Data Platform
  Massively Parallel Infrastructure
  Utilization Management Advisor
  Diagnosis and Treatment Advisor
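The triplestore staging step in the architecture above stores facts as subject-predicate-object triples, which is the RDF data model. The Python sketch below is a toy in-memory store with wildcard pattern matching, meant only to show the shape of the data and of a cohort-style query. It is not SPARQL and not the API of Jena or Sesame, and every identifier in it (patient:1, hasFinding, and so on) is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Minimal in-memory triple store with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = (subject, predicate, obj)
        if triple not in self._triples:  # a triple set has no duplicates
            self._triples.append(triple)

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return [
            t for t in self._triples
            if all(want is None or want == have for want, have in zip(pattern, t))
        ]


def cohort(store: TripleStore, finding: str) -> Set[str]:
    """Patients with a given finding, as a simple cohort-style query."""
    return {s for s, _p, _o in store.match(predicate="hasFinding", obj=finding)}


if __name__ == "__main__":
    store = TripleStore()
    # Hypothetical facts, as an NLP step might emit them.
    store.add("patient:1", "hasFinding", "finding:tobacco_use")
    store.add("patient:2", "hasFinding", "finding:tobacco_use")
    store.add("patient:2", "hasFinding", "finding:alcohol_use")
    print(sorted(cohort(store, "finding:tobacco_use")))
```

Because every fact has the same three-part shape, new kinds of findings can be added without changing a schema, which is the usual argument for staging NLP output this way before loading a warehouse.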

Wrap Up
Questions?
cecil.o.lynch@accenture.com

Thank You - Credits
  IBM jstart Team: Randall Wilcox, Kevin Conroy
  Dataskill: Victor Bagwell - CIO
  City of Hope: Naveen Raja, D.O. - CMIO; Ying Liu, Ph.D. - Bioinformatics Group
  Accenture: German Acuna, Suniti Ponkshe, Jim Traficant