Travis Goodwin & Sanda Harabagiu


1 Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records Travis Goodwin & Sanda Harabagiu Human Language Technology Research Institute The University of Texas at Dallas

2 Outline The Problem The Qualified Medical Knowledge Graph Identifying Medical Concepts Recognizing Assertions Constructing the QMKG Evaluation & Discussion Conclusions

3 The Problem: More and more clinical data is available through Electronic Medical Records (EMRs). Notes within EMRs include a variety of knowledge: medical history, physical exam findings, lab reports, radiology reports, operative reports, discharge summaries, etc. EMRs do not document the rationale for medical decisions. Patient cohort studies evaluate the progression of disease as well as the factors that influence clinical outcomes.

4 Patient Cohort Identification: TRECMed is a retrieval task from NIST offered in 2011 & 2012. Topics are queries targeting patient cohorts, combining medical concepts (e.g. acute coronary syndrome) and patient constraints (e.g. children). The collection contains 95,703 de-identified EMRs from multiple hospitals. The EMRs were grouped into hospital visits consisting of one or more medical reports from each patient's hospital stay. Thus, the EMRs were organized into 17,199 different patient hospital visits. Each visit had the patient's admission diagnoses, discharge diagnoses, and related ICD-9 codes.

5 Sample TRECMed Topics
No.  Topic
156  Patients with depression on anti-depressant medication.
160  Patients with low back pain who had imaging studies.
172  Patients with peripheral neuropathy and edema.
184  Patients with colon cancer who had chemotherapy.
The 35 topics evaluated in 2011 and the 50 topics evaluated in 2012 were characterized by (a) usage of medical concepts (e.g. acute coronary syndrome or plavix) and (b) constraints imposed on the patient population (e.g. children, female patients).

6 The Barrier: Medical science involves posing hypotheses, experimenting with treatments, and reasoning from medical evidence. Consequently, clinical writing reflects this modus operandi with a rich set of speculative statements. Barriers: Physicians use hedging, a linguistic means of expressing an opinion rather than a fact; speculative statements are abundant. Our Solution: Automatically detect medical concepts; automatically identify the medical assertions (belief values) associated with each medical concept; use these qualified concepts to build a graph of medical knowledge.

7 Cohort Retrieval System: Retrieval system designed for TRECMed 2011/2012. A brief overview: 1. A topic is analyzed for keywords and other constraints. 2. Keywords are expanded using our Qualified Medical Knowledge Graph. 3. Initial BM25 retrieval. 4. Re-ranking to ensure that assertion values agree between document and query.
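
The initial retrieval step (step 3) can be illustrated with a minimal BM25 scorer. This is only a sketch under stated assumptions: the slides do not give the BM25 variant or parameters used, so the smoothed IDF form and the defaults k1 = 1.2 and b = 0.75 are common textbook choices, and the function name bm25_scores and the token-list document representation are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with BM25.

    docs is a list of token lists. k1 and b are assumed defaults,
    not values reported in the slides.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy "visits" standing in for EMR text (invented for illustration).
visits = [
    ["patient", "depression", "sertraline"],
    ["patient", "fracture"],
    ["depression", "depression", "history"],
]
scores = bm25_scores(["depression"], visits)
ranking = sorted(range(len(visits)), key=lambda i: scores[i], reverse=True)
```

The re-ranking in step 4 would then reorder this initial ranking; its exact form is not described in the slides, so it is left out here.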

9 The Qualified Medical Knowledge Graph: Medical concepts are automatically identified in EMRs and classified as Medical Problem, Treatment, or Test. Assertions are automatically identified and assigned to each medical concept. The result is a graph in which nodes are qualified medical concepts represented as triplets: (concept text, concept type, assertion).
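
The node triplet can be sketched as a small immutable record. The class name QualifiedConcept, the helper make_concept, the upper-case type labels, and the text normalization are all hypothetical choices made for illustration; the assertion values used below are examples only.

```python
from typing import NamedTuple

# The three concept types named on the slide (labels are an assumption).
CONCEPT_TYPES = ("PROBLEM", "TREATMENT", "TEST")


class QualifiedConcept(NamedTuple):
    """A graph node: (concept text, concept type, assertion)."""
    text: str
    concept_type: str
    assertion: str


def make_concept(text, concept_type, assertion):
    """Build a node, normalizing case and whitespace (an assumed convention)."""
    if concept_type not in CONCEPT_TYPES:
        raise ValueError(f"unknown concept type: {concept_type}")
    return QualifiedConcept(text.strip().lower(), concept_type, assertion.upper())


# The same concept text with different assertions yields distinct nodes.
present = make_concept("Atrial Fibrillation", "PROBLEM", "present")
absent = make_concept(" atrial fibrillation ", "PROBLEM", "absent")
nodes = {present, absent}
```

Because the record is hashable, it can be used directly as a dictionary key or set member when building the graph.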

10 Example of assertions

12 The Qualified Medical Knowledge Graph

13 The Qualified Medical Knowledge Graph: An edge between two graph nodes exists if the corresponding medical concepts co-occur within a fixed-size window of tokens (20 tokens in our experiments) within the same EMR. This idea of generating edges between medical concepts recognized in EMRs was inspired by the SympGraph methodology reported in Sondhi et al. (KDD 2012), which models symptom relationships in clinical notes.
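
A minimal sketch of window-based edge extraction for a single EMR follows. It assumes each recognized concept mention carries the token offset at which it starts and that two mentions co-occur when their offsets differ by fewer than the window size; the slides do not say how the window is measured, and the function name cooccurrence_edges is hypothetical.

```python
from collections import Counter


def cooccurrence_edges(mentions, window=20):
    """Count co-occurring node pairs within one EMR.

    mentions is a list of (token_offset, node) pairs, where node is any
    hashable value. Returns a Counter keyed by frozenset({node_a, node_b}).
    """
    counts = Counter()
    ordered = sorted(mentions, key=lambda m: m[0])
    for i, (pos_a, node_a) in enumerate(ordered):
        for pos_b, node_b in ordered[i + 1:]:
            if pos_b - pos_a >= window:
                break  # later mentions are even farther away
            if node_a != node_b:
                counts[frozenset((node_a, node_b))] += 1
    return counts


# Toy mentions (invented): offsets 0 and 5 share a window, offset 30 does not.
edges = cooccurrence_edges(
    [(0, "chest pain"), (5, "ecg"), (30, "aspirin")], window=20
)
```

Summing these counters over all EMRs gives the co-occurrence counts from which edge weights could be computed.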

14 Automatic Medical Concept Recognition

15 Medical Concept Identification in EMRs: Medical concepts in the form of 1. medical problems, such as ATRIAL FIBRILLATION (irregular heartbeat); 2. treatments, such as ABLATION (removal of undesired tissue); and 3. tests, such as ECG (electrocardiogram), were recognized using the methods reported in (Roberts and Harabagiu, JAMIA 2011). This method recognizes medical concepts in two steps. Step 1: identification of the boundaries of the text span that refers to a medical concept. Step 2: classification of the medical concept into (a) medical problems, (b) medical treatments, or (c) medical tests.

16 Medical Concept Identification: Preprocessing: rule-based detection of measurements, dosages, & other entities. Boundary: a heuristic separates prose from non-prose text, then two Conditional Random Field (CRF) classifiers extract concepts (one for prose, one for non-prose). Type: a Support Vector Machine (SVM) classifier performs 3-way classification.

17 Training the Medical Concept Identification System: The data: 349 discharge summaries and progress notes available from the 2010 i2b2/VA challenge, with a total of 25K training instances of medical concepts. Test data: the TRECMed clinical documents. A very large set of features was extracted. Three distinct automatic feature selection methods were used: 1. Greedy forward: also known as additive feature selection, this method takes a greedy approach by always selecting the best feature to add to the feature set. 2. Greedy forward/backward: also known as floating forward feature selection, this is an extension of greedy forward selection that greedily attempts to remove features from the current feature set after a new feature is added. 3. Feature selection using a genetic algorithm.
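
Greedy forward selection, the first of the three methods, can be sketched as follows. This is not the authors' implementation: the function name greedy_forward_selection, the stopping rule (stop when no candidate improves the score), and the toy scoring function are assumptions. In practice the score would be something like cross-validated F1 of the classifier trained on the candidate feature set.

```python
def greedy_forward_selection(features, score):
    """Add the single best-scoring feature until no addition helps.

    score takes a list of feature names and returns a number
    (higher is better).
    """
    selected = []
    best_score = score(selected)
    remaining = list(features)
    while remaining:
        # Evaluate every one-feature extension of the current set.
        scored = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the current feature set
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy additive scorer with invented gains, used only to exercise the loop.
GAINS = {"unigram": 0.30, "section_header": 0.20, "noise": -0.10}


def toy_score(feature_set):
    return sum(GAINS[f] for f in feature_set)


chosen, final_score = greedy_forward_selection(list(GAINS), toy_score)
```

The forward/backward (floating) variant would add, after each accepted feature, a loop that tries removing already selected features and keeps any removal that raises the score.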

18 Results for Medical Concept Identification: Official i2b2/VA results, reported as P, R, and F1 for Exact Boundary, Exact Boundary + Type, Inexact Boundary, and Inexact Boundary + Type. System scores compared: Best i2b2 submission, Our i2b2 submission, Median i2b2 submission, Mean i2b2 submission 73.56

20 Medical Assertion Recognition

21 Assertion Classification: Determining the belief status of a medical problem is also known as medical assertion. To recognize assertions automatically, we cast the task as a classification problem, implemented as an SVM classifier that is influenced by a) the medical concepts on which the assertion is produced, b) the metadata available in the section header where the assertion is implied, and c) features available from UMLS (extracted by MetaMap) as well as features reflective of negated statements, disclosed through the NegEx negation detection package. A special class of features that provide belief values is available from the General Inquirer's category information. The SVM classifier performed 12-way classification: 6 assertion types from 2010 i2b2 and 6 new assertion types, based on 2,349 new annotations.

22 Assertion Types = new assertion type

23 Results for Medical Assertion Classification: System scores compared: GFB+GA+GFB, GFB+GA, GFB, Best i2b2 submission, Our i2b2 submission (GF), Median i2b2 submission, Mean i2b2 submission (GF: greedy forward; GFB: greedy forward/backward; GA: genetic algorithm). Source: Kirk Roberts and Sanda Harabagiu, "A flexible framework for deriving assertions from electronic medical records", JAMIA.

25 Constructing the QMKG: A weighted undirected graph encoding similarity between qualified medical concepts, G = (V, E). Vertices: triples representing qualified medical concepts (lexical concept, concept type, assertion). An edge joins two vertices if and only if they co-occur within the same context (we used a window of 20 tokens).

26 Vertex Extraction

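27 Constructing the QMKG: The QMKG is represented as an adjacency matrix A, in which entry A(i, j) is 1 if an edge joins vertices i and j and 0 otherwise. An associated weight matrix W encodes the similarity between all pairs of qualified concepts according to some similarity function S, i.e. W(i, j) = S(v_i, v_j).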

28 First-Order Similarity Functions
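
The formulas from this slide are not preserved in the transcript. As one plausible example of a first-order similarity, the sketch below computes pointwise mutual information (PMI) from co-occurrence counts. Choosing PMI here is an assumption, suggested only by the later mention of second-order PMI. The function name pmi and the count-based probability estimates are hypothetical.

```python
import math


def pmi(pair_count, count_a, count_b, total):
    """Pointwise mutual information (base 2) from raw counts.

    pair_count: number of windows containing both nodes
    count_a, count_b: number of windows containing each node
    total: total number of windows observed
    """
    if pair_count == 0:
        return float("-inf")  # the pair was never observed together
    # log2( P(a, b) / (P(a) * P(b)) ) with probabilities estimated by counts
    return math.log2((pair_count * total) / (count_a * count_b))


# Invented counts: the pair co-occurs ten times more often than chance.
associated = pmi(pair_count=10, count_a=20, count_b=50, total=1000)
# Invented counts: the pair co-occurs exactly as often as chance predicts.
independent = pmi(pair_count=1, count_a=10, count_b=100, total=1000)
```

A positive value indicates that two qualified concepts appear together more often than independence would predict; zero indicates no association.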

29 Second-Order Similarity Function: Qualified medical concepts are extremely sparse within EMRs. Many qualified medical concepts do not share the same window, but still share some degree of semantic similarity that could be of value. We generalized the notion of second-order PMI to compute the second-order similarity between two nodes using any first-order similarity measure. The function calculates the similarity of two nodes as an aggregation of the first-order similarities between them and the β highest-weighted intermediate nodes.
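
The idea can be sketched as follows. The slide does not preserve the exact aggregation, so this version makes several assumptions: intermediate nodes are the β highest-weighted neighbors of each endpoint, only positive first-order weights contribute, the two directions are summed, and the result is divided by 2β. The function name second_order_similarity and the nested-dictionary graph layout are hypothetical.

```python
def second_order_similarity(a, b, weights, beta=2):
    """Aggregate first-order similarities through shared top neighbors.

    weights maps node -> {neighbor: first-order similarity}.
    """
    def directed(src, dst):
        # The beta highest-weighted neighbors of src (excluding dst itself).
        neighbors = {n: w for n, w in weights.get(src, {}).items() if n != dst}
        top = sorted(neighbors, key=neighbors.get, reverse=True)[:beta]
        # Sum dst's positive first-order similarity to those neighbors.
        dst_weights = weights.get(dst, {})
        return sum(max(dst_weights.get(n, 0.0), 0.0) for n in top)

    return (directed(a, b) + directed(b, a)) / (2 * beta)


# Toy first-order weights (invented). The two problems never co-occur
# directly, but share the intermediate nodes "ecg" and "aspirin".
toy_weights = {
    "chest pain": {"ecg": 3.0, "aspirin": 2.0, "x-ray": 0.5},
    "angina": {"ecg": 2.5, "aspirin": 1.0},
}
similarity = second_order_similarity("chest pain", "angina", toy_weights, beta=2)
```

In this toy case the two nodes have no direct edge, yet receive a non-zero similarity through their shared neighbors, which is the sparsity problem the second-order function is meant to address.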

32 Evaluations & Discussion Precision and Recall for our assertion values evaluated against the 2010 i2b2 data, and our own annotations on EMRs.

33 Evaluation of the QMKG: Generated QMKG stats: 634 thousand nodes with 13.9 billion edges (3.45% connectivity); 53.0% of nodes are medical problems, 23.6% are medical tests, and 23.3% are medical treatments. Assertion types distributed as follows:
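
The slide does not define "connectivity". One reading that roughly reproduces the reported figure from the rounded node and edge counts is the number of edges divided by the square of the number of nodes; that definition is an assumption, checked in the short calculation below.

```python
# Rounded figures from the slide.
num_nodes = 634_000
num_edges = 13_900_000_000

# Assumed definition: edges divided by the number of ordered node pairs.
density_ordered = num_edges / num_nodes ** 2

# Alternative definition: edges divided by the number of unordered pairs.
density_unordered = num_edges / (num_nodes * (num_nodes - 1) / 2)

print(f"edges / n^2:        {density_ordered:.2%}")
print(f"edges / (n(n-1)/2): {density_unordered:.2%}")
```

With the rounded inputs, the first definition lands within about 0.01 percentage points of the reported 3.45%, while the second gives roughly twice that value.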

34 Evaluation of the QMKG: We evaluated the QMKG by testing on the TRECMed cohort retrieval task, using it as a means of query expansion: keywords are mapped to their qualified medical concepts in the QMKG, and the 20 highest-weighted neighbors of each keyword are selected as new keywords.
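
The expansion step can be sketched as a top-k neighbor lookup. The function name expand_keywords and the nested-dictionary graph layout are hypothetical, and plain strings stand in for the (concept text, concept type, assertion) nodes to keep the example short.

```python
def expand_keywords(keywords, graph, top_k=20):
    """Append each keyword's top_k highest-weighted neighbors.

    graph maps node -> {neighbor: edge weight}. Keywords missing from
    the graph are kept but contribute no expansions.
    """
    expanded = list(keywords)
    seen = set(keywords)
    for keyword in keywords:
        neighbors = graph.get(keyword, {})
        ranked = sorted(neighbors.items(), key=lambda item: item[1], reverse=True)
        for node, _weight in ranked[:top_k]:
            if node not in seen:
                seen.add(node)
                expanded.append(node)
    return expanded


# Toy graph with invented weights; top_k is lowered to 2 for readability.
toy_graph = {
    "depression": {"sertraline": 4.2, "insomnia": 3.1, "ecg": 0.2},
}
expanded_query = expand_keywords(["depression", "rash"], toy_graph, top_k=2)
```

The expanded keyword list would then be passed to the retrieval step in place of the original keywords.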

35 Query Expansion using the QMKG

36 TRECMed 2012 Scores: iAP: inferred Average Precision. iNDCG: inferred Normalized Discounted Cumulative Gain. P@10: precision within the first 10 results.
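
Of the three measures, precision within the first 10 results is simple enough to show directly. The sketch below uses a hypothetical function name precision_at_k and invented visit identifiers; the inferred measures depend on sampled relevance judgments and are not reproduced here.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the first k ranked results that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


# Invented ranking of 12 visits, of which 4 in the top 10 are relevant.
ranking_example = [f"visit{i}" for i in range(12)]
relevant_example = {"visit0", "visit3", "visit4", "visit9", "visit11"}
p_at_10 = precision_at_k(ranking_example, relevant_example, k=10)
```

Here visit11 is relevant but falls outside the first 10 results, so it does not count toward the score.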

38 Conclusion: We created a medical knowledge graph relating pairs of medical concepts qualified by the physician's belief status. By using this kind of information, we are able to make progress towards bridging the inherent knowledge gap tied to understanding EMRs. The graph provides very promising results for patient cohort identification.


Word Polarity Detection Using a Multilingual Approach Word Polarity Detection Using a Multilingual Approach Cüneyd Murad Özsert and Arzucan Özgür Department of Computer Engineering, Boğaziçi University, Bebek, 34342 İstanbul, Turkey muradozsert@gmail.com,

More information

Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

More information

Does the heart have to be stopped to do a Maze procedure?

Does the heart have to be stopped to do a Maze procedure? What are the surgical options to treat atrial fibrillation? New concepts for the surgical treatment of atrial fibrillation were introduced in recent years when the surgical device industry introduced different

More information

Internist I and II Covered 70-80% of possible diagnoses in internal medicine, also based on Jack Myers expertise Worked best on only single disease

Internist I and II Covered 70-80% of possible diagnoses in internal medicine, also based on Jack Myers expertise Worked best on only single disease 1 Mycin- Stanford Doctoral dissertation of Edward Shortliffe designed to identify bacterial etiology in patients with sepsis and meningitis and to recommend antibiotics Had simple inference engine and

More information

TITLE Dori Whittaker, Director of Solutions Management, M*Modal

TITLE Dori Whittaker, Director of Solutions Management, M*Modal TITLE Dori Whittaker, Director of Solutions Management, M*Modal Challenges Impacting Clinical Documentation HITECH Act, Meaningful Use EHR mandate and adoption Need for cost savings Migration to ICD 10

More information

Copyright 2014. This report and/or appended material may not be partly or completely published or

Copyright 2014. This report and/or appended material may not be partly or completely published or Aalborg University Copenhagen Semester: 4 th Title: Personalized Medicine based on patient journals and family medical history records Aalborg University Copenhagen A.C. Meyers Vænge 15 2450 København

More information

GENETIC DATA ANALYSIS

GENETIC DATA ANALYSIS GENETIC DATA ANALYSIS 1 Genetic Data: Future of Personalized Healthcare To achieve personalization in Healthcare, there is a need for more advancements in the field of Genomics. The human genome is made

More information

Election of Diagnosis Codes: Words as Responsible Citizens

Election of Diagnosis Codes: Words as Responsible Citizens Election of Diagnosis Codes: Words as Responsible Citizens Aron Henriksson and Martin Hassel Department of Computer & System Sciences (DSV), Stockholm University Forum 100, 164 40 Kista, Sweden {aronhen,xmartin}@dsv.su.se

More information

DRAFT. Medical School Health IT Curriculum Topics. Matthew Swedlund, John Beasley, Erkin Otles, Eneida Mendonca

DRAFT. Medical School Health IT Curriculum Topics. Matthew Swedlund, John Beasley, Erkin Otles, Eneida Mendonca Medical School Health IT Curriculum Topics DRAFT Matthew Swedlund, John Beasley, Erkin Otles, Eneida Mendonca PC3. Use information technology to optimize patient care. 1. Attributes relating to appropriate

More information

Secondary Uses of Data for Comparative Effectiveness Research

Secondary Uses of Data for Comparative Effectiveness Research Secondary Uses of Data for Comparative Effectiveness Research Paul Wallace MD Director, Center for Comparative Effectiveness Research The Lewin Group Paul.Wallace@lewin.com Disclosure/Perspectives Training:

More information

1 o Semestre 2007/2008

1 o Semestre 2007/2008 Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Find the signal in the noise

Find the signal in the noise Find the signal in the noise Electronic Health Records: The challenge The adoption of Electronic Health Records (EHRs) in the USA is rapidly increasing, due to the Health Information Technology and Clinical

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D. Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital

More information

Arrhythmia Facts. Each day the average heart beats (expands and contracts) 100,000 times and pumps about 2,000 gallons of blood.

Arrhythmia Facts. Each day the average heart beats (expands and contracts) 100,000 times and pumps about 2,000 gallons of blood. Arrhythmia Facts During a 24-hour period, about 20% of healthy adults are likely to have frequent or multiple types of premature ventricular heartbeats. Heart arrhythmias are very common and nearly everyone

More information

Big Data: Image & Video Analytics

Big Data: Image & Video Analytics Big Data: Image & Video Analytics How it could support Archiving & Indexing & Searching Dieter Haas, IBM Deutschland GmbH The Big Data Wave 60% of internet traffic is multimedia content (images and videos)

More information

Extracting Information from Social Networks

Extracting Information from Social Networks Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,

More information

Not all NLP is Created Equal:

Not all NLP is Created Equal: Not all NLP is Created Equal: CAC Technology Underpinnings that Drive Accuracy, Experience and Overall Revenue Performance Page 1 Performance Perspectives Health care financial leaders and health information

More information

Barriers to Retrieving Patient Information from Electronic Health Record Data: Failure Analysis from the TREC Medical Records Track

Barriers to Retrieving Patient Information from Electronic Health Record Data: Failure Analysis from the TREC Medical Records Track Barriers to Retrieving Patient Information from Electronic Health Record Data: Failure Analysis from the TREC Medical Records Track Tracy Edinger, ND, Aaron M. Cohen, MD, MS, Steven Bedrick, PhD, Kyle

More information

Homework 15 Solutions

Homework 15 Solutions PROBLEM ONE (Trees) Homework 15 Solutions 1. Recall the definition of a tree: a tree is a connected, undirected graph which has no cycles. Which of the following definitions are equivalent to this definition

More information

Reputation Management Algorithms & Testing. Andrew G. West November 3, 2008

Reputation Management Algorithms & Testing. Andrew G. West November 3, 2008 Reputation Management Algorithms & Testing Andrew G. West November 3, 2008 EigenTrust EigenTrust (Hector Garcia-molina, et. al) A normalized vector-matrix multiply based method to aggregate trust such

More information

Pain Quick Reference for ICD 10 CM

Pain Quick Reference for ICD 10 CM Pain Quick Reference for ICD 10 CM Coding of acute or chronic pain in ICD 10 CM are located under category G89, Pain, not elsewhere classified. The subcategories are broken down by type, temporal parameter,

More information

ESC/EASD Pocket Guidelines Diabetes, pre-diabetes and cardiovascular disease

ESC/EASD Pocket Guidelines Diabetes, pre-diabetes and cardiovascular disease Diabetes, prediabetes and cardiovascular disease Classes of recommendations Levels of evidence Recommended treatment targets for patients with diabetes and CAD Definition, classification and screening

More information

Atrial Fibrillation Overview for Staff of Community Primary and Specialty Care Providers

Atrial Fibrillation Overview for Staff of Community Primary and Specialty Care Providers Atrial Fibrillation Overview for Staff of Community Primary and Specialty Care Providers Atrial Fibrillation (AF or AFib) Atrial fibrillation (also called AFib or AF) is a quivering or irregular heartbeat

More information

New York ehealth Collaborative. Health Information Exchange and Interoperability April 2012

New York ehealth Collaborative. Health Information Exchange and Interoperability April 2012 New York ehealth Collaborative Health Information Exchange and Interoperability April 2012 1 Introductions Information exchange patient, information, care team How is Health information exchanged Value

More information

Natural Language Processing in the EHR Lifecycle

Natural Language Processing in the EHR Lifecycle Insight Driven Health Natural Language Processing in the EHR Lifecycle Cecil O. Lynch, MD, MS cecil.o.lynch@accenture.com Health & Public Service Outline Medical Data Landscape Value Proposition of NLP

More information

Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC

Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC Outline Dynamic Load Balancing framework in Charm++ Measurement Based Load Balancing Examples: Hybrid Load Balancers Topology-aware

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION MATIJA STEVANOVIC PhD Student JENS MYRUP PEDERSEN Associate Professor Department of Electronic Systems Aalborg University,

More information

In the following we will only consider undirected networks.

In the following we will only consider undirected networks. Roles in Networks Roles in Networks Motivation for work: Let topology define network roles. Work by Kleinberg on directed graphs, used topology to define two types of roles: authorities and hubs. (Each

More information

Final Exam, Spring 2007

Final Exam, Spring 2007 10-701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:

More information

EVALUATING CLASSIFICATION POWER OF LINKED ADMISSION DATA SOURCES WITH TEXT MINING

EVALUATING CLASSIFICATION POWER OF LINKED ADMISSION DATA SOURCES WITH TEXT MINING Kocbek et al. Big Data 2015, Sydney 1 EVALUATING CLASSIFICATION POWER OF LINKED ADMISSION DATA SOURCES WITH TEXT MINING Simon Kocbek, Lawrence Cavedon, David Martinez, Christopher Bain, Chris Mac Manus,

More information

DATA MINING AND REPORTING IN HEALTHCARE

DATA MINING AND REPORTING IN HEALTHCARE DATA MINING AND REPORTING IN HEALTHCARE Divya Gandhi 1, Pooja Asher 2, Harshada Chaudhari 3 1,2,3 Department of Information Technology, Sardar Patel Institute of Technology, Mumbai,(India) ABSTRACT The

More information

Investment Analysis using the Portfolio Analysis Machine (PALMA 1 ) Tool by Richard A. Moynihan 21 July 2005

Investment Analysis using the Portfolio Analysis Machine (PALMA 1 ) Tool by Richard A. Moynihan 21 July 2005 Investment Analysis using the Portfolio Analysis Machine (PALMA 1 ) Tool by Richard A. Moynihan 21 July 2005 Government Investment Analysis Guidance Current Government acquisition guidelines mandate the

More information

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This

More information