Automated assignment of ICD-9-CM codes to radiology reports
1 Automated assignment of ICD-9-CM codes to radiology reports Richárd Farkas University of Szeged Filip Ginter University of Turku
2 Overview Why clinical coding? Importance, use of automated coding Challenge description Data used Evaluation methodology Our solutions Szeged system Turku system Results and comparison Possibilities and challenges of a real-world system
3 NLP in the clinical domain Narrative texts A huge amount of information is hidden Manual processing requires expertise Time Costs Special features of medical texts Unique characteristics of the language used Smokes 2-3 cig / day, occ etoh, and no drugs except marijuana Exam
4 Clinical coding Automatic assignment of disease/symptom codes to clinical records. International Classification of Diseases (ICD-X-CM), where X is the revision (current: 10, used: 9). Used for statistics on diseases or effects of treatment, and for billing: the task has commercial relevance. Overcoding is penalised by 3x the sum; undercoding means loss of revenue. Codes are added to the text after the treatment (US). Coding costs $25 billion annually in the USA.
5 International Challenge on Classifying Clinical Free Text Using Natural Language Processing Shared task challenge to evaluate NLP systems on clinical data ICD-9-CM coding Radiology reports Organization Computational Medicine Center Cincinnati, Ohio, USA February/March, 2007 Motivation Practical importance for hospital administration and health insurance
6 120+ registered participants 44 systems submitted
7 Data Used Radiology records annotated with ICD codes: 978 documents used for training ICD-9 systems, 976 unseen documents used for evaluation. Annotation provided by 3 health institutes; majority labeling used as gold standard. 45 different ICD codes used; codes appear in various combinations (94 different sets of codes); the frequency of labels varies. The data is made available free of charge for research purposes by the challenge organizers.
8 Example
<doc id=" " type="radiology_report">
  <codes>
    <code origin="cmc_majority" type="icd-9-cm">786.2</code>
    <code origin="company3" type="icd-9-cm">518.0</code>
    <code origin="company1" type="icd-9-cm">786.2</code>
    <code origin="company2" type="icd-9-cm">786.2</code>
  </codes>
  <texts>
    <text origin="cchmc_radiology" type="clinical_history"> Cough. History of pneumonia on 1/2/01. Increased work of breathing. </text>
    <text origin="cchmc_radiology" type="impression"> No significant change to overall appearance of perihilar lung opacities and peribronchial thickening most consistent with viral illness vs reactive airways disease. Increased densities superimposed over the right middle lobe and lingular region on the lateral view may represent superimposition of shadows. However atelectasis or a small amount of parenchymal consolidation cannot be fully excluded. This patient's lung markings have appeared prominent on the four existing chest x-rays in our file. It is recommended that the child receive a well-child chest x-ray in order to evaluate lung markings when the child is not sick. </text>
  </texts>
</doc>
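A record in this format can be read with a standard XML parser. The sketch below is illustrative only: the document id "example" is a placeholder (the id is blank in the slide), the impression text is shortened to its first sentence, and the rule "a code is kept if at least two of the three annotators assigned it" is an assumption about how the majority labeling works, not something the slides spell out.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened copy of the slide's example record; the id value is a placeholder.
SAMPLE = """<doc id="example" type="radiology_report">
  <codes>
    <code origin="cmc_majority" type="icd-9-cm">786.2</code>
    <code origin="company3" type="icd-9-cm">518.0</code>
    <code origin="company1" type="icd-9-cm">786.2</code>
    <code origin="company2" type="icd-9-cm">786.2</code>
  </codes>
  <texts>
    <text origin="cchmc_radiology" type="clinical_history"> Cough. History of pneumonia on 1/2/01. Increased work of breathing. </text>
    <text origin="cchmc_radiology" type="impression"> No significant change to overall appearance of perihilar lung opacities and peribronchial thickening most consistent with viral illness vs reactive airways disease. </text>
  </texts>
</doc>"""


def parse_report(xml_string):
    """Return (gold codes, per-code annotator votes, text sections by type)."""
    doc = ET.fromstring(xml_string)
    gold, votes = [], Counter()
    for code in doc.iter("code"):
        value = code.text.strip()
        if code.get("origin") == "cmc_majority":
            gold.append(value)      # label distributed as the gold standard
        else:
            votes[value] += 1       # one vote per annotating institute
    texts = {t.get("type"): " ".join(t.text.split()) for t in doc.iter("text")}
    return gold, votes, texts


def majority(votes, min_votes=2):
    """Codes chosen by at least `min_votes` annotators (assumed majority rule)."""
    return sorted(code for code, n in votes.items() if n >= min_votes)


gold, votes, texts = parse_report(SAMPLE)
print(gold, majority(votes), texts["clinical_history"])
```

In this record the two annotators who chose 786.2 outvote the one who chose 518.0, which agrees with the `cmc_majority` entry.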
9 Distribution of labels
10 Results
11 Szeged, Hungary Richárd Farkas Research Group on Artificial Intelligence of the Hungarian Academy of Sciences, György Szarvas University of Szeged, Department of Informatics, Human Language Technology Group without physicians
12 Szeged ICD coding solutions Language processing: negation/speculation. Exploiting ICD and utilising labeled data: inter-label dependencies; synonyms and abbreviations. Challenge system: hand-crafted; reconstructed automatically (machine learning).
13 Language processing Coding guidelines state that an uncertain diagnosis should not be coded. Speculation: "Peribronchial thickening most consistent with viral illness vs reactive airways disease". Negation: "Normal slightly hypoventilatory chest x-ray, no pneumonia." Issues in the past without direct effect on current treatment should not be coded; temporal resolution is neglected due to noisy annotation of historical findings.
14 Detection of speculation/negation Simple approach, motivated by the relatively simple grammar of the text: physicians aim to briefly enumerate findings and their opinion, and rarely use very complex noun phrases or syntax. Dictionaries of keywords collected from training data. Scope identified by a naive heuristic: right scope = end of sentence; left scope = previous punctuation (or nothing, depending on the keyword). Example: "Normal slightly hypoventilatory chest x-ray, no pneumonia."
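The scope heuristic can be sketched in a few lines. The keyword dictionary below is hypothetical (the slides do not list the Szeged keywords or say which of them also scope to the left); the function works on a single sentence and returns the tokens that fall inside a negation/speculation scope.

```python
import re

# Hypothetical keyword dictionary: keyword -> does it also scope to the left?
KEYWORDS = {"no": False, "without": False, "unlikely": True}
PUNCT = {".", ",", ";", ":"}


def scoped_tokens(sentence):
    """Return tokens of one sentence that lie inside a keyword's scope."""
    tokens = re.findall(r"[a-z0-9-]+|[.,;:]", sentence.lower())
    marked = set()
    for i, tok in enumerate(tokens):
        if tok not in KEYWORDS:
            continue
        # Right scope: everything up to the end of the sentence.
        marked.update(range(i + 1, len(tokens)))
        # Left scope (only for some keywords): back to the previous punctuation.
        if KEYWORDS[tok]:
            j = i - 1
            while j >= 0 and tokens[j] not in PUNCT:
                marked.add(j)
                j -= 1
    return [t for k, t in enumerate(tokens) if k in marked and t not in PUNCT]


print(scoped_tokens("Normal slightly hypoventilatory chest x-ray, no pneumonia."))
print(scoped_tokens("Cough, pneumonia unlikely."))
```

On the slide's example sentence only "pneumonia" is marked, so the normal chest x-ray finding before the comma stays available for coding.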
15 Using the ICD
16 Exploration of inter-label dependencies Overcoding, e.g. symptoms and diseases. C4.5 classifiers trained for false positive labels. Features: base-system labels. Extracted 5 dependencies, each of the form "delete symptom if disease has textual evidence", e.g. delete Cough and Fever if Pneumonia is coded.
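A dependency of this kind amounts to a simple post-processing rule over the predicted label set. The sketch below hard-codes the one example given on the slide; the rule table and function name are illustrative, and the ICD-9-CM codes used (486 pneumonia, 786.2 cough, 780.6 fever) are filled in here rather than taken from the slides.

```python
# Illustrative rule table: disease code -> symptom codes to delete when it is coded.
# 486 = pneumonia, 786.2 = cough, 780.6 = fever (ICD-9-CM).
DELETE_IF_PRESENT = {"486": {"786.2", "780.6"}}


def apply_dependencies(codes):
    """Remove symptom codes that are implied by a co-occurring disease code."""
    codes = set(codes)
    to_delete = set()
    for disease, symptoms in DELETE_IF_PRESENT.items():
        if disease in codes:
            to_delete |= symptoms
    return sorted(codes - to_delete)


print(apply_dependencies(["486", "786.2", "780.6"]))  # symptoms dropped
print(apply_dependencies(["786.2"]))                  # cough alone is kept
```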
17 Data-driven model Vector Space Model with token n-grams as features. C4.5 classifier on 45 binary classification tasks. Expanding the dictionaries: gathering missing synonyms, abbreviations; C4.5 classifiers trained for false negative labels.
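The data preparation behind such a model can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and n-grams up to length 2 (neither is stated on the slide); the decision-tree learner itself is left out, since any C4.5 implementation could consume the resulting per-code datasets.

```python
def token_ngrams(text, max_n=2):
    """Binary bag of token n-grams (n = 1..max_n) for one document."""
    tokens = text.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def binary_tasks(documents, label_sets, codes):
    """One binary dataset per code: a list of (features, 0/1 target) pairs."""
    return {
        code: [(token_ngrams(doc), int(code in labels))
               for doc, labels in zip(documents, label_sets)]
        for code in codes
    }


tasks = binary_tasks(["cough and fever", "no pneumonia"],
                     [{"786.2", "780.6"}, set()],
                     ["786.2", "486"])
print(tasks["786.2"])
```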
18 Example of terms found Urinary Tract Infection uti Asthma reactive airways disease Laurence-Moon-Biedl syndrome Williams syndrome Beckwith-Wiedemann syndrome hemihypertrophy
19 External knowledge (ICD) vs. Data-driven models ICD: data independent; robust (information source is reliable); can cover rare codes. Data-driven: can explore individual coding style (synonyms, abbreviations); requires labeled documents; cannot handle rare codes.
20 Added values of the subphases
System                                   Train    Eval
45-class statistical system              88.20%   86.69%
ICD                                      84.07%   83.21%
+ inter-label dependencies               85.57%   84.85%
+ statistical enriching (synonyms)       90.26%   88.93%
Union of statistical and coding guide    90.53%   89.33%
Hand-crafted system                      90.02%   89.41%
- language processing                    71.46%   70.48%
21 The Turku Group in the Challenge Language processing group at the Department of IT, University of Turku and Turku Centre for Computer Science (TUCS) Antti Airola Filip Ginter Tapio Pahikkala Sampo Pyysalo Tapio Salakoski Hanna Suominen Department of nursing science, University of Turku Sanna Salanterä
22 The Turku ICD coding system Feature engineering Mapping text to UMLS concepts (MetaMap) Recognition of negation and speculation Generalization via hypernymy Machine learning Primary classifier (RLS) Secondary classifier (Ripper) corrections of known errors made by the primary classifier Additional training instances from ICD definitions
23 MetaMap MetaMap identifies instances of UMLS concepts in running text. NLM's MetaMap program divides running text into phrases; each phrase is mapped into a set of UMLS concepts from specified vocabularies. A way to abstract from text.
24 MetaMap output example
Eleven year old: Eleven, Quantitative Concept, C; Year, Temporal Concept, C; Old, Temporal Concept, C
with acute leukemia: Acute leukaemia, Neoplastic Process, C
bone marrow transplant: Bone marrow transplant, Therapeutic or Preventive Procedure, C
on Jan. 2 now
with three day history: Three, Quantitative concept, C; day, Temporal concept, C; History, Occupation or Discipline, C
of cough: Cough, Sign or Symptom, C
25 Hypernym expansion Hypernyms as additional features: generalize the identified concepts along the hierarchy. Cough -> Respiratory symptoms -> Signs and Symptoms. Fever -> Body temperature altered -> Signs and Symptoms. Atelectasis -> Diseases of the lung -> Diseases of the respiratory system. Pneumonia -> Diseases of the lung -> Diseases of the respiratory system.
26 Hypernym expansion motivation More accurate similarity information Lexically, cough and fever are different Hypernym expansion adds the information that both are symptoms The connection can also be learned given large quantities of data But rare cases can benefit here
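The effect of hypernym expansion on similarity can be shown with the four chains from the previous slide. The sketch assumes a toy single-parent hierarchy; the real UMLS hierarchy is larger and a concept may have several parents.

```python
# Toy hierarchy built from the slide's examples (child -> parent).
PARENT = {
    "Cough": "Respiratory symptoms",
    "Respiratory symptoms": "Signs and Symptoms",
    "Fever": "Body temperature altered",
    "Body temperature altered": "Signs and Symptoms",
    "Atelectasis": "Diseases of the lung",
    "Pneumonia": "Diseases of the lung",
    "Diseases of the lung": "Diseases of the respiratory system",
}


def hypernyms(concept):
    """All ancestors of a concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def expand(concepts):
    """Concepts plus all of their hypernyms, as one feature set."""
    features = set(concepts)
    for c in concepts:
        features.update(hypernyms(c))
    return features


# Lexically unrelated concepts now share a feature.
print(expand(["Cough"]) & expand(["Fever"]))
```

Without expansion the two feature sets are disjoint; with it, a linear classifier can transfer evidence between cough and fever through the shared "Signs and Symptoms" feature.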
27 Negation and speculation Negation, speculation, temporal information Recognize trigger words could, history of, likely, may, mild, minimal, no, past, possible, possibly, probable, probably, questionable, suggestive, unsure, without Scope: Everything from a trigger word up to the end of the current sentence All features extracted from a negated text span are marked ICD coding guide: speculated / unsure code is not assigned
28 Hypernym expansion & negation Hypernym expansion and negation. VALID: pneumonia -> lung disease. INVALID: not pneumonia -> not lung disease. Negated concepts are not expanded with hypernyms. Room for improvement: VALID: possible pneumonia -> possible lung disease.
29 Feature engineering Final set of features entering the classifier Text tokens No particular order: Bag-of-Words (BoW) model Marked with neg- whenever negated Set of UMLS concepts (their c-codes) extracted with MetaMap Marked with neg- whenever negated Set of hypernyms of the extracted UMLS concepts Included only for non-negated concepts
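Put together, the feature set could be produced along the following lines. This is a sketch under several assumptions: the `CONCEPTS` lookup is a hypothetical stand-in for MetaMap (concept names are used instead of real c-codes), the `HYPERNYMS` table is a toy, the feature prefixes `concept:` and `hyper:` are invented for readability, and the multi-word trigger "history of" is left out for brevity.

```python
import re

# Single-word triggers from the slide; "history of" is omitted in this sketch.
TRIGGERS = {"could", "likely", "may", "mild", "minimal", "no", "past",
            "possible", "possibly", "probable", "probably", "questionable",
            "suggestive", "unsure", "without"}

# Hypothetical stand-ins for MetaMap output and the UMLS hierarchy.
CONCEPTS = {"pneumonia": "Pneumonia", "cough": "Cough"}
HYPERNYMS = {"Pneumonia": ["Diseases of the lung"],
             "Cough": ["Respiratory symptoms"]}


def extract_features(text):
    """Bag of tokens, concepts and hypernyms with neg- marking."""
    features = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        negated = False
        for token in re.findall(r"[a-z0-9-]+", sentence.lower()):
            if token in TRIGGERS:
                negated = True  # scope runs from the trigger to the sentence end
            prefix = "neg-" if negated else ""
            features.add(prefix + token)
            concept = CONCEPTS.get(token)
            if concept:
                features.add(prefix + "concept:" + concept)
                if not negated:  # negated concepts are not expanded
                    features.update("hyper:" + h for h in HYPERNYMS[concept])
    return features


print(sorted(extract_features("Cough. No pneumonia.")))
```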
30 Classification RLS (regularized least-squares) classifier Maximal-margin, kernel-based classifier Close relative of Support Vector Machines (SVMs) Linear kernel (fast & worked well) One classifier for each code 1 versus all classification May lead to no codes assigned or an impossible combination of codes
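For reference, the textbook form of regularized least-squares with a linear kernel is given below. The notation is generic and not taken from the slides: $x_i$ are feature vectors (rows of $X$), $y_i \in \{-1, +1\}$ says whether document $i$ carries the code in question, and $\lambda > 0$ is the regularization parameter.

```latex
\hat{w} \;=\; \arg\min_{w} \; \sum_{i=1}^{n} \bigl( y_i - w^{\top} x_i \bigr)^{2} + \lambda \lVert w \rVert^{2},
\qquad
\hat{w} \;=\; \bigl( X^{\top} X + \lambda I \bigr)^{-1} X^{\top} y
```

In a one-versus-all setup, one such weight vector is learned per code, and a code is assigned when $\hat{w}^{\top} x$ exceeds a threshold. Because the 45 decisions are made independently, nothing prevents all of them from being negative or from forming a combination never seen in training.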
31 Correcting known errors Cascaded classifier attempts to correct known errors: empty or impossible combinations. RIPPER: decision rules, a very different paradigm from RLS. Trained and applied exactly as the first classifier (1 vs. all). Known errors made by the second classifier left uncorrected. Experiments show no additional improvement.
32 Using ICD-9 in training ICD-9 definitions as training instances Concatenate the textual definitions of each of the 45 codes and its parents in the ICD hierarchy Same generalization idea as previously Extract features in the standard way Pool the resulting 45 training instances with the challenge training data Provides additional positive examples
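The construction of these extra instances can be sketched as follows. The hierarchy excerpt is illustrative and abbreviated (it is not the authors' resource), and the names `ICD` and `definition_instance` are hypothetical.

```python
# Illustrative excerpt of the ICD-9-CM hierarchy: code -> (definition, parent).
ICD = {
    "786.2": ("Cough", "786"),
    "786": ("Symptoms involving respiratory system and other chest symptoms", "780-789"),
    "780-789": ("Symptoms", None),
}


def definition_instance(code):
    """Concatenate the definitions of a code and its ancestors into one
    pseudo-document labelled with that code."""
    parts = []
    node = code
    while node is not None:
        definition, parent = ICD[node]
        parts.append(definition)
        node = parent
    return " ".join(parts), {code}


text, labels = definition_instance("786.2")
print(text, labels)
```

Each pseudo-document then goes through the same feature extraction as a radiology report and is appended to the challenge training set as one more positive example for its code.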
33 Turku system: Summary FEATURE EXTRACTION Source text UMLS hierarchy CLASSIFICATION RLS classifier 1 vs. All Tokenization Negation and speculation detection MetaMap Set of UMLS concepts UMLS hypernym expansion Extended set of UMLS concepts + Source text tokens Set of ICD codes impossible combination RIPPER classifier 1 vs. All possible combination Final set of ICD codes
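The control flow of the cascade can be sketched as below. The classifiers are passed in as plain callables (hypothetical placeholders for the trained RLS and RIPPER models), and "possible combination" is assumed here to mean a code set observed in the training data; the slides do not define it more precisely.

```python
def cascade(features, primary, secondary, known_combinations):
    """Accept the primary prediction if it is a known code combination;
    otherwise fall back to the secondary classifier, whose output is
    returned without further checks."""
    codes = frozenset(primary(features))
    if codes in known_combinations:
        return codes
    return frozenset(secondary(features))


# Toy usage with stub classifiers.
known = {frozenset({"786.2"}), frozenset({"486"})}
empty_primary = lambda feats: []          # primary assigns no code at all
fallback = lambda feats: ["786.2"]
print(cascade(None, empty_primary, fallback, known))
```

Since the empty set is not among the known combinations, an empty primary prediction is handled by the same fallback as an impossible one.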
34 Turku system: Component contribution (cross-validated performance on training data) Columns: F micro, Error, Relative Gain. Rows: RLS (initial), Tokenization, UMLS mapping, UMLS hypernyms, Negation/speculation, Cascaded Ripper, ICD-9 training data.
35 Turku vs. Szeged: Crucial differences Szeged system No external resources beyond ICD-9 ICD-9 definitions and coding guidelines are the core of the system Challenge system: rule-based Replicated via machine learning Turku system Heavy reliance on UMLS MetaMap Hypernyms ICD-9 definitions used as training examples with 0.1 percentage point improvement No explicit use of ICD-9 coding guidelines Pure machine learning
36 Turku vs. Szeged: Crucial differences Szeged system allows individual ICD code deletion if code X is given, delete code Y Turku system rejects the whole code combination and applies a different classifier Paradoxically, no gain from using the Szeged finer ICD code handling on top of Turku results (0.3 percentage point F-score decrease) E.g. false positive disease code causes a true positive symptom code to be removed Use of hypernym expansion More detailed negation/speculation/temporal detection in Szeged system
37 Language specifics CMC challenge was on English text. How about other languages? Szeged system: needs translated ICD; language-adapted negation/speculation detection. Turku system: needs translated UMLS resources and MetaMap; many of the features are language-independent (UMLS c-codes); language-adapted negation/speculation detection. Both systems rely on string search in one way or another: a problem in inflective languages.
38 Crucial differences (cont.) Different approach to design. Turku system: classifier-centric. Extract all conceivable features and feed them into a state-of-the-art classifier. Szeged system: data-centric. Build from the available resources (ICD and training data) and use classifiers with interpretable models. Study the mistakes and the model, correct errors.
39 CMC challenge results: The big picture Best F-score 89.1 (Szeged system). Mean F-score 76.7 (standard deviation 13.4). Turku and Szeged baselines. Szeged: 83.2% F, bare system with just NLP and ICD but no other direct use of the training data. Turku: 80.7% F, bare machine learning system with no data preprocessing of any kind (only whitespace tokenization). About half of the challenge submissions stayed below these baseline systems!
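The F-scores above are taken here to be micro-averaged over all document/code decisions, in line with the "F micro" column on the component-contribution slide; that reading is an assumption. A minimal sketch of the measure for multi-label predictions:

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over parallel lists of gold and predicted code sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)   # codes correctly assigned
        fp += len(p - g)   # overcoding
        fn += len(g - p)   # undercoding
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{"786.2"}, {"486", "780.6"}]
pred = [{"786.2"}, {"486"}]
print(round(micro_f1(gold, pred), 3))
```

Because counts are pooled before averaging, frequent codes dominate the score, which matters for a label distribution as skewed as the one in this dataset.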
40 CMC challenge: Lessons learned General observations across all submissions Presented by Pestian et al., ACL 07 BioNLP workshop, 2007 Based on short system descriptions (not publicly available) 1. Best systems explicitly took into account negation and speculation 2. Better systems frequently worked with hypernym and synonym detection 3. Significant amount of symbolic processing 4. Two of the top three systems were ML-based
41 CMC challenge: Lessons learned 5. Careful, medically-informed feature engineering common 6. SVM and related state-of-the-art classification algorithms were strongly represented, but not reliably predictive of high ranking Turku development observation: a number of traditional classifiers matched RLS performance when used correctly
42 Beyond the ICD coding Similar NLP tasks The same architecture can be used Find the relevant parts of the documents Find relevant phrases (synonyms, abbreviations) simple string-matching with a particular dictionary Prototype tasks: The i2b2 obesity challenge Smoking status detection
43 The i2b2 obesity challenge Who's obese and what co-morbidities do they (definitely/likely) have? Informatics for Integrating Biology and the Bedside (i2b2) Febr. June 730 training and 507 evaluation documents; multi-label problem, 16 morbidities
44 Comparison Focusing on several morbidities (matchable with set of ICD) Longer documents (avg. of the lengths: 130 rows) More noise The patient has a positive family history of coronary disease Negation/speculation detection is highlighted (Y/N/Q/U F-macro)
45 Smoking status detection i2b2 challenge 2006 The patient in question is SMOKER, NON-SMOKER, PAST-SMOKER or smoker status UNKNOWN inter-annotator agreement ~85% 398 train and 104 eval documents Small dictionaries: smoke, tobacco etc. best systems 88% with external data 94%
46 Final thoughts on ICD coding Some clear advantages lower costs less error-prone processing of simpler cases Fully automatic system is impossible (nowadays) Far away from human intelligence will not solve rare, harder cases Right middle and probable right lower lobe pneumonia.
47 The place of an automatic system Pre-labeling/highlighting to speed up manual coding: prediction along with confidence measure. Validation: suggesting erroneous / missed codes; monitoring for health insurance companies. Automated coding of large datasets, mainly for statistical purposes.
48 Tasks to be solved Extending systems to thousands of codes, if a corpus of appropriate size is available. Incorporating more expert knowledge into the statistical methods: user-friendly interfaces, interactive systems. Better language processing. Corpus for developing sophisticated scope detectors: BioScope (released June 2008).
49 Open questions Every coder or institute has its own individual coding style. How to transfer among languages? Is there any drop in accuracy on other languages (free word order in Hungarian) or on other domains (nursing notes)? What is the real speed-up of an automatic pre-coding/suggestion system?
50 Open questions (cont.) More training data needed to scale the systems up Hospitals have the data but privacy concerns prevent its dissemination to companies / NLP researchers who build the system Training data generally cannot be reconstructed from trained machine-learning systems Distribute an empty system? Legal issues? Technical issues?
51 Multilingual ICD tagging: summary Basic NLP tools Tokenizer Lemmatizer Tagger, phrase parser (in some approaches) Need domain adaptation Controlled domain vocabulary resources Term variants (e.g. synonyms and abbreviations) Generally scarce Ideally within a large framework such as UMLS Allowing tool re-use
52 Basic NLP resources Tokenizer: preferably domain-adapted; very poor language standards in some clinical documents. Lemmatizer: case in point, FinTWOL and nursing narratives. Basic FinTWOL extended by Lingsoft with ~3500 domain words. Recognition rate grew from 83.1% to 90.7%. That corresponds to 42% decrease in unrecognized running words. Hungarian: lemmatizers exist but are not domain adapted due to data privacy concerns. Researchers who are able to adapt the lemmatizers do not have appropriate data access permissions.
53 References
1st place: Farkas, R., & Szarvas, G. (2008). Automatic construction of rule-based ICD-9-CM coding systems. BMC Bioinformatics, 9(Suppl 3), S10.
2nd place: Crammer, K., Dredze, M., Ganchev, K., & Talukdar, P. P. (2007). Automatic code assignment to medical text. Proceedings of ACL 07 BioNLP workshop.
3rd place: Suominen, H., Ginter, F., Pyysalo, S., Airola, A., Pahikkala, T., Salanterä, S., & Salakoski, T. (2008). Machine Learning to Automate the Assignment of Diagnosis Codes to Free-text Radiology Reports: a Method Description. Proceedings of the ICML/UAI/COLT Workshop on Machine Learning for Health-Care Applications.
Challenge description: Pestian, J. P., Brew, C., Matykiewicz, P., Hovermale, D., Johnson, N., Cohen, K. B., & Duch, W. (2007). A shared task involving multi-label classification of clinical free text. Proceedings of ACL 07 BioNLP workshop.
Predicting Chief Complaints at Triage Time in the Emergency Department
Predicting Chief Complaints at Triage Time in the Emergency Department Yacine Jernite, Yoni Halpern New York University New York, NY {jernite,halpern}@cs.nyu.edu Steven Horng Beth Israel Deaconess Medical
A Medical Decision Support System (DSS) for Ubiquitous Healthcare Diagnosis System
pp. 237-244 http://dx.doi.org/10.14257/ijseia.2014.8.10.22 A Medical Decision Support System (DSS) for Ubiquitous Healthcare Diagnosis System Regin Joy Conejar 1 and Haeng-Kon Kim 1* 1 School of Information
Health Science Career Field Allied Health and Nursing Pathway (JM)
Health Science Career Field Allied Health and Nursing Pathway (JM) ODE Courses Possible Sinclair Courses CTAG Courses for approved programs Health Science and Technology 1 st course in the Career Field
Parsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh [email protected] Ewan Klein University of Edinburgh [email protected] Abstract Software
Big Data Integration and Governance Considerations for Healthcare
White Paper Big Data Integration and Governance Considerations for Healthcare by Sunil Soares, Founder & Managing Partner, Information Asset, LLC Big Data Integration and Governance Considerations for
Collecting Polish-German Parallel Corpora in the Internet
Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 285-292, ISSN 1896-7094, 2007 PIPS Collecting Polish-German Parallel Corpora in the Internet Monika Rosińska
A.4.2. Challenges in the Deployment of Healthcare Information Systems and Technology
A.4.2. Challenges in the Deployment of Healthcare Information Systems and Technology In order to support its constituent enterprise in Latin America and the Caribbean and deliver appropriate solutions,
11-792 Software Engineering EMR Project Report
11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of
X-ray (Radiography) - Chest
X-ray (Radiography) - Chest What is a Chest X-ray (Chest Radiography)? The chest x-ray is the most commonly performed diagnostic x-ray examination. A chest x-ray produces images of
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Atika Mustafa, Ali Akbar, and Ahmer Sultan National University of Computer and Emerging
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based on Machine Learning
3rd International Conference on Materials Engineering, Manufacturing Technology and Control (ICMEMTC 2016) Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based
PoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement
Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market
Intelligent Tools For A Productive Radiologist Workflow: How Machine Learning Enriches Hanging Protocols
GE Healthcare Intelligent Tools For A Productive Radiologist Workflow: How Machine Learning Enriches Hanging Protocols Authors: Tianyi Wang Information Scientist Machine Learning Lab Software Science &
Clinical Database Information System for Gbagada General Hospital
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 2, Issue 9, September 2015, PP 29-37 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org
Protect Your Family and Friends from Tuberculosis: The TB Contact Investigation
Protect Your Family and Friends from TUBERCULOSIS The TB Contact Investigation What's Inside: Read this brochure today to learn how to protect your family and friends from TB. Then share it with people
ASTHMA IN INFANTS AND YOUNG CHILDREN
ASTHMA IN INFANTS AND YOUNG CHILDREN What is Asthma? Asthma is a chronic inflammatory disease of the airways. Symptoms of asthma are variable. That means that they can be mild to severe, intermittent to
Application of Data Mining Methods in Health Care Databases
6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Application of Data Mining Methods in Health Care Databases Ágnes Vathy-Fogarassy Department of Mathematics and
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Asthma in childhood
Asthma in childhood Asthma in childhood is common and it can be serious. About one in six children (aged less than 15 years) in Western Australia are affected by asthma.
