Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Jan Paralic, Peter Smatana
Technical University of Kosice, Slovakia
Center for Information Technologies

Abstract: One of the major problems in the e-health domain is the electronic processing of patient health records. The core of the problem is transforming original, free-text health records into structured documents using a defined structure model. We designed a system that uses a combination of two approaches to perform this transformation. Regular analysis was used to recognize typical patterns such as dates or various biophysical parameters. Linguistic analysis was used to analyze sentences and blocks in the text. The resulting structured documents can be used not only for efficient concept-based information retrieval but also, e.g., for the knowledge discovery process in collections of structured electronic patient records.

Keywords: patient records, e-health, regular analysis, linguistic analysis, structured patient records

1. INTRODUCTION

Original patient health records are often written by physicians in the form of unstructured text that is not suitable for efficient electronic processing [7]. On the other hand, their electronic version could significantly enhance the possibilities for information retrieval, as well as further analysis of patient records for various purposes [5]. Many publications have suggested positive effects of electronic messaging in health care [9]. All groups of stakeholders in the e-health domain could profit from electronic patient records, which make it possible to create various types of useful and efficient electronic services [2].

Much more efficient information retrieval by all types of stakeholders (patients asking about their health status, physicians, health insurance companies) can be used, e.g., for the following purposes.
- Sharing of all available information about patients between various departments in a hospital, as well as between various types of medical specialists, avoiding, e.g., unnecessary repetition of similar investigations.
- Generation of pre-filled forms of various types by patients, physicians, as well as insurance companies.

Efficient decision support is possible when a large number of historical electronic health records are available. They can be used, e.g., as follows.

- Decision support for physicians, who could consult similar cases from the history.
- Discovery of new, interesting patterns, which can result in new knowledge about a particular disease, effects of pharmaceuticals, etc. (for more details see Section 4).
- Efficient analysis of particular physicians' behavior by health insurance companies.

The rest of this paper is structured as follows. Section 2 presents the proposed system. First, some basic characteristics of the real data and their common features are summarized. Second, the functional architecture of the proposed and implemented system is provided and explained. Section 3 describes experiments performed with the implemented system on real data from two hospitals. Section 4 sketches some future possibilities for exploiting the available structured data using knowledge discovery techniques. Finally, Section 5 summarizes the main contributions of this paper.
2. SYSTEM DESIGN

2.1. DATA ANALYSIS

The core problem in the area of electronic health records is the transformation of original, free-text health records into structured documents using a defined structure model. Therefore, a detailed data analysis was necessary in order to design a suitable architecture for our system. We have real data available from the European Center for Medical Informatics, Statistics and Epidemiology of Charles University and the Academy of Sciences (EuroMISE Center). The data are represented by anonymous health records, which are structured into blocks (social anamnesis, family history, ECG, etc.). Within the individual blocks, the data are written as free text. The data were acquired from cardiology clinics in two different hospitals in the Czech Republic. Basic statistical characteristics of the data are provided in Table 1.

                                             training data   testing data   testing data
                                             (hospital 1)    (hospital 1)   (hospital 2)
No. of records                               …               …              …
Text size [B]                                …               …              …
Regular analysis        found patterns       …               …              …
                        patterns/record      8,90            11,81          10,90
                        patterns/kB of text  6,49            6,61           5,18
Morphological analysis  found patterns       …               …              …
                        patterns/record      7,85            10,60          17,37
                        patterns/kB of text  5,72            5,93           8,25
Syntactic analysis      found patterns       …               …              …
                        patterns/record      0,30            0,35           0,93
                        patterns/kB of text  0,22            0,20           0,44

Table 1: Statistical characteristics of the given data sets

Having analyzed the available health records, we identified some common problems, e.g.:

- All physicians who produced the health records have their own, unique writing style. Moreover, there are some small but notable differences in the terminology used in different hospitals, which result from different work habits.
- A significant number of typing errors.
- Heavy usage of different acronyms (which may also differ between particular hospitals).
- Data in individual blocks are mixed.

We also have a structured model (in the form of a taxonomy) available from the EuroMISE Center, which we need to fill in with information extracted from the text of patient health records. The model consists, e.g., of features like blood pressure, pulse, characteristics of ECG, number of cigarettes smoked per day, allergy, etc.
2.2. SYSTEM ARCHITECTURE

Figure 1 shows the functional scheme of the proposed system, which uses a combination of regular and linguistic analysis. The use of linguistic approaches is especially difficult for languages like Czech or Slovak [1], [4]. The system is being implemented using Java technology. Blocks that are already implemented are marked with a bold border in Figure 1.

Figure 1: Functional architecture of the proposed system (sequential blocks: 1. Free-text document, 2. Regular analysis, 3. Tokenization, 4. Morphological analysis, 5. Identify blocks, 6. Syntactic analysis, 7. Semantic analysis, 8. Context analysis, 9. Mapping to data model, 10. Structured document)

The functionality of the particular building blocks in the proposed sequential architecture is briefly explained in the following:

1. Free-text document: The given free-text patient health record.
2. Regular analysis: Looking for regular expressions (for example, the blood pressure value TK120/30 is described by the regular expression TK\d+/\d+), which are used as special words in the next step of analysis. This was the only type of analysis used in the system proposed in [8].
3. Tokenization: Division of the document into individual tokens (i.e. words or regular expressions identified in the previous step).
4. Morphological analysis: Assigning a word class (such as noun, verb, etc.) and grammatical categories to individual words [4], [1].
5. Identify blocks: It would be useful to identify particular text blocks, e.g. the family history block. This is not easy, however, because sentences are not exactly delimited and physicians often do not use regular sentences.
6. Syntactic analysis: We look for simple sentences (if not a whole sentence, then at least some verb phrases), because they carry high information value. We mainly search for subjects and predicates [4].
7. Semantic analysis: The next step would be to define relations between words, which for the Slovak or Czech language would require a special dictionary with semantic bindings of particular verbs [4], [1].
8. Context analysis: If a sentence does not have a direct object, it is necessary to derive it from the context [4]. This is, however, not a typical problem for patient health records.
9. Mapping to data model: This block recognizes patterns using the given data model and the results of the previous regular and linguistic analysis, and saves the recognized patterns to the data model.
10. Structured document: A structured document in XML format is the output of our system.
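The regular analysis, tokenization and XML output steps described above can be illustrated with a minimal Python sketch. It is not the authors' Java implementation. Only the expression TK\d+/\d+ is taken from the text. The capture groups, the function names, the sample sentence and the XML element and attribute names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Only the expression TK\d+/\d+ comes from the text; the capture groups,
# function names and XML element/attribute names are illustrative assumptions.
BLOOD_PRESSURE = re.compile(r"TK(\d+)/(\d+)")


def tokenize(text):
    """Blocks 2-3: keep each regular-expression match as one special token
    and split the remaining text on whitespace."""
    tokens = []
    pos = 0
    for match in BLOOD_PRESSURE.finditer(text):
        tokens += [("word", w) for w in text[pos:match.start()].split()]
        tokens.append(("pattern", match.group(0)))
        pos = match.end()
    tokens += [("word", w) for w in text[pos:].split()]
    return tokens


def to_structured_xml(text):
    """Blocks 9-10: store every recognized pattern in a hypothetical XML record."""
    record = ET.Element("record")
    for match in BLOOD_PRESSURE.finditer(text):
        element = ET.SubElement(record, "blood_pressure")
        element.set("systolic", match.group(1))
        element.set("diastolic", match.group(2))
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    sample = "pulse 72 TK120/30 no allergy"  # hypothetical record fragment
    print(tokenize(sample))
    print(to_structured_xml(sample))
```

Tagging each token as either a word or a pattern mirrors the idea that regular expressions found in block 2 are passed on as special words. A real system would need one expression per feature of the data model and would have to cope with the typing errors and acronyms mentioned in Section 2.1.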
3. EXPERIMENTS

We used the same data for training and testing as in [8], where only regular analysis was used to structure patient health records. Our goal was to increase the resulting precision of recognizing data model structures in free text. To evaluate the quality of the transformation we used the coefficients P (precision), R (recall) and F-measure, defined by equations 1, 2 and 3 respectively. To evaluate F we used β = 0,5 (harmonic mean of P and R).

Precision:
$$P = \frac{\mathit{marked\_relevant}}{\mathit{marked}} \in \langle 0; 1 \rangle \qquad (1)$$

Recall:
$$R = \frac{\mathit{marked\_relevant}}{\mathit{relevant}} \in \langle 0; 1 \rangle \qquad (2)$$

F-measure:
$$F = \frac{P R}{(1 - \beta) P + \beta R}, \quad \beta \in \langle 0; 1 \rangle \qquad (3)$$

Where:
- marked_relevant - number of all expressions correctly recognized (marked) by the system as relevant to the given data model
- marked - number of all expressions recognized (marked) by the system as relevant to the given data model
- relevant - number of all expressions in the text that are relevant to the given data model

First, the system was trained with data from patient health records from hospital 1 only. Next, we evaluated the influence of adding linguistic analysis (blocks 3, 4 and 6 in Figure 1) to the regular one (block 2 in Figure 1). The detailed results of the experiments are presented in Table 2 (regular analysis only) and in Table 3 (regular as well as linguistic analysis).

file     marked  fault  unmarked  P     R     F
….txt    …       …      …         0,80  0,80  0,80
39.txt   …       …      …         0,81  0,62  0,70
61.txt   …       …      …         1,00  0,75  0,86
75.txt   …       …      …         1,00  0,80  0,…
….txt    …       …      …         0,82  0,90  0,86
20.txt   …       …      …         0,75  0,75  0,75
65.txt   …       …      …         0,60  0,30  0,…
….txt    …       …      …         0,93  0,81  0,87
64.txt   …       …      …         1,00  0,75  0,86
98.txt   …       …      …         1,00  0,82  0,90
total    …       …      …         0,88  0,72  0,79

Table 2: Patterns found in data from hospital 1 using regular analysis only
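The evaluation measures of equations 1 to 3 can be computed with a few lines of Python. The sketch below is an illustration, not code from the described system. The function names and the handling of empty denominators are assumptions. The values P = 0,88 and R = 0,72 are the totals reported in Table 2.

```python
def precision(marked_relevant, marked):
    """Equation 1: share of marked expressions that are really relevant."""
    return marked_relevant / marked if marked else 0.0


def recall(marked_relevant, relevant):
    """Equation 2: share of relevant expressions that were marked."""
    return marked_relevant / relevant if relevant else 0.0


def f_measure(p, r, beta=0.5):
    """Equation 3: F = P*R / ((1 - beta)*P + beta*R).

    With beta = 0.5 this equals 2*P*R / (P + R), the harmonic mean of P and R.
    Returning 0.0 for a zero denominator is an assumption of this sketch.
    """
    denominator = (1 - beta) * p + beta * r
    return (p * r) / denominator if denominator else 0.0


if __name__ == "__main__":
    # Totals from Table 2 (regular analysis only): P = 0.88, R = 0.72
    print(round(f_measure(0.88, 0.72), 2))
```

For the totals of Table 2 this gives 0.6336 / 0.8 = 0.792, which rounds to the reported F of 0,79.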
file     marked  fault  unmarked  P     R     F
….txt    …       …      …         0,80  0,80  0,80
39.txt   …       …      …         0,81  0,62  0,70
61.txt   …       …      …         1,00  0,75  0,86
75.txt   …       …      …         1,00  0,80  0,…
….txt    …       …      …         0,82  0,90  0,86
20.txt   …       …      …         0,88  0,88  0,88
65.txt   …       …      …         0,80  0,40  0,…
….txt    …       …      …         0,93  0,81  0,87
64.txt   …       …      …         1,00  0,75  0,86
98.txt   …       …      …         1,00  0,82  0,90
total    …       …      …         0,90  0,74  0,81

Table 3: Patterns found in data from hospital 1 using regular and linguistic analysis

From the tables above we can see that the results improved in two of the 10 analyzed documents, namely in patient records 20.txt and 65.txt (they are marked with bold font in both Table 2 and Table 3).

In the last experiment we applied our system, trained on data from hospital 1 only, to test data from hospital 2. Detailed results of this experiment are presented in Table 4. The results are somewhat worse than in the previous experiment (see Table 3), but still acceptable. The reason is that physicians in the second hospital use slightly different terminology, different abbreviations, etc., which could not be acquired during training on patient health records from the other hospital.

file     marked  fault  unmarked  P     R     F
….txt    …       …      …         0,82  0,60  0,69
06.txt   …       …      …         1,00  0,69  0,81
09.txt   …       …      …         0,25  0,22  0,24
13.txt   …       …      …         0,86  1,00  0,92
20.txt   …       …      …         0,40  0,20  0,27
28.txt   …       …      …         0,69  0,75  0,72
24.txt   …       …      …         0,91  0,77  0,83
27.txt   …       …      …         0,82  1,00  0,90
11.txt   …       …      …         0,92  0,92  0,92
17.txt   …       …      …         0,73  0,55  0,63
total    …       …      …         0,76  0,66  0,71

Table 4: Patterns found in data from hospital 2 using regular and linguistic analysis

4. POSSIBLE USE OF TEXT MINING

There are many possibilities for applying text (data) mining approaches to discover new and potentially useful patterns in large numbers of electronic patient health records. Some of these possibilities are sketched in this section.

- Classification/prediction models - can be built on patient medical record data with a known value of the target attribute (e.g. diagnosis). For new patients, these classification models can suggest the value of the target attribute [3]. Another application of these predictive text mining approaches is to annotate patient records [5] and/or populate an existing ontology with instances [10].
- Clustering and descriptive data mining - clustering and suitable visualization of discovered clusters of patients [5]. Moreover, descriptive data mining techniques may be employed in order to concisely describe the main characteristics of patients from one cluster.
- Dialog system - the produced classification/prediction models, clusters with their descriptions, as well as discovered association rules (see e.g. [6]) may be used within a dialog system. This could be a support tool, provided e.g. in the form of an electronic service, which could help doctors when facing a new patient (by asking, e.g., for a characterization of similar patients from the same cluster, a predicted diagnosis, or known associations for the given patient data).

5. CONCLUSIONS

In this paper we presented the functional architecture of, and the results achieved in first experiments with, a system designed and implemented for the transformation of free-text patient health records into a structured XML format. By using a combination of regular and linguistic analysis we improved the quality of free-text document transformation when the training and testing sets of documents come from the same hospital. When the documents for training and testing come from different hospitals, no improvement of the efficiency measures could be observed. The presented system is not perfect, but the transformation of electronic patient health records into a structured form is a big challenge, and this system is a first important step towards this goal.

ACKNOWLEDGEMENTS

The work presented in this paper was supported by the Slovak Grant Agency of the Ministry of Education and Academy of Science of the Slovak Republic within the project Document classification and annotation for the Semantic web No. 1/1060/04 and by the German-Slovak research project DAAD No. 8/2004 Text Mining for Metadata Extraction and Semantic Retrieval.

REFERENCES

[1] Furdik, K. (2003): Information retrieval in natural language making use of hypertext structures. Technical University of Kosice. PhD thesis (in Slovak)
[2] Hanzlicek, P. (2002): Development of Universal Electronic Health Record in Cardiology. Health Data in the Information Society: Surjan G., Engelbrecht R., McNair P. (eds.) Amsterdam, IOS Press, pp
[3] Machova, K. (2002): Machine Learning Principles and Algorithms. Elfa Press (in Slovak)
[4] Pales, E. (1994): SAPFO Design of a Paraphrasing System for the Slovak Language. Bratislava, VEDA (in Slovak)
[5] Paralic, J.; Bednar, P. (2004): Text Mining for Document Annotation and Ontology Support. Intelligent Systems at the Service of Mankind, Ubooks, Germany, pp
[6] Rauch, J. (2001): Mining for Statistical Association Rules. In: Fong J., Ng M.K. (eds.): The 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, University of Hong-Kong, pp
[7] Schultz, S.; Hahn, U. (2001): Medical Knowledge Reengineering Converting Major Portions of the UMLS into Terminological Knowledge Base. Int. Journal on Medical Informatics, 64 (2-3), pp
[8] Semecky, J. (2001): Multimedia electronic patient record in cardiology. Charles University in Prague. Diploma thesis (in Czech)
[9] Van der Kam, W.; Moorman, P.W.; Koppejan-Mulder, M.J. (2000): Effects of Electronic Communication in General Practice. Int. Journal on Medical Informatics, 60 (1), pp
[10] Vargas-Vera, M.; Celjuska, D. (2004): Event Recognition on News Stories and Semi-Automatic Population of an Ontology. IEEE/WIC/ACM Int. Conference on Web Intelligence (WI 2004), Beijing, China, pp
Data Mining VII: Data, Text and Web Mining and their Business Applications 177 Using text mining to understand the call center customers claims G. M. Caputo, V. M. Bastos & N. F. F. Ebecken COPPE Federal
More informationHow To Understand And Understand A Negative In Bbg
Some Aspects of Negation Processing in Electronic Health Records Svetla Boytcheva 1, Albena Strupchanska 2, Elena Paskaleva 2 and Dimitar Tcharaktchiev 3 1 Department of Information Technologies, Faculty
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
More informationA Survey on Web Mining From Web Server Log
A Survey on Web Mining From Web Server Log Ripal Patel 1, Mr. Krunal Panchal 2, Mr. Dushyantsinh Rathod 3 1 M.E., 2,3 Assistant Professor, 1,2,3 computer Engineering Department, 1,2 L J Institute of Engineering
More informationActivity Mining for Discovering Software Process Models
Activity Mining for Discovering Software Process Models Ekkart Kindler, Vladimir Rubin, Wilhelm Schäfer Software Engineering Group, University of Paderborn, Germany [kindler, vroubine, wilhelm]@uni-paderborn.de
More informationMobile Storage and Search Engine of Information Oriented to Food Cloud
Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:
More informationAutomatic Annotation Wrapper Generation and Mining Web Database Search Result
Automatic Annotation Wrapper Generation and Mining Web Database Search Result V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil nadu, India
More informationDelivering Smart Answers!
Companion for SharePoint Topic Analyst Companion for SharePoint All Your Information Enterprise-ready Enrich SharePoint, your central place for document and workflow management, not only with an improved
More informationData Deduplication in Slovak Corpora
Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain
More informationA prototype infrastructure for D Spin Services based on a flexible multilayer architecture
A prototype infrastructure for D Spin Services based on a flexible multilayer architecture Volker Boehlke 1,, 1 NLP Group, Department of Computer Science, University of Leipzig, Johanisgasse 26, 04103
More informationONTOLOGY-BASED MULTIMEDIA AUTHORING AND INTERFACING TOOLS 3 rd Hellenic Conference on Artificial Intelligence, Samos, Greece, 5-8 May 2004
ONTOLOGY-BASED MULTIMEDIA AUTHORING AND INTERFACING TOOLS 3 rd Hellenic Conference on Artificial Intelligence, Samos, Greece, 5-8 May 2004 By Aristomenis Macris (e-mail: arism@unipi.gr), University of
More informationPresented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software
Semantic Research using Natural Language Processing at Scale; A continued look behind the scenes of Semantic Insights Research Assistant and Research Librarian Presented to The Federal Big Data Working
More informationSpecial Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
More informationOverview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
More informationThe Specific Text Analysis Tasks at the Beginning of MDA Life Cycle
SCIENTIFIC PAPERS, UNIVERSITY OF LATVIA, 2010. Vol. 757 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES 11 22 P. The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle Armands Šlihte Faculty
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationCOURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
More informationONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS
ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS Hasni Neji and Ridha Bouallegue Innov COM Lab, Higher School of Communications of Tunis, Sup Com University of Carthage, Tunis, Tunisia. Email: hasni.neji63@laposte.net;
More informationA Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired
More informationProcess Mining in Big Data Scenario
Process Mining in Big Data Scenario Antonia Azzini, Ernesto Damiani SESAR Lab - Dipartimento di Informatica Università degli Studi di Milano, Italy antonia.azzini,ernesto.damiani@unimi.it Abstract. In
More informationIEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper
IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper CAST-2015 provides an opportunity for researchers, academicians, scientists and
More informationAn Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them
An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them Vangelis Karkaletsis and Constantine D. Spyropoulos NCSR Demokritos, Institute of Informatics & Telecommunications,
More informationData Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
More informationText Mining - Scope and Applications
Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss
More informationCombining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery
Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery Dimitrios Kourtesis, Iraklis Paraskakis SEERC South East European Research Centre, Greece Research centre of the University
More informationDomain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql Xiaofeng Meng 1,2, Yong Zhou 1, and Shan Wang 1 1 College of Information, Renmin University of China, Beijing 100872
More informationIntegrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes
Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes Presented By: Andrew McMurry & Britt Fitch (Apache ctakes committers) Co-authors: Guergana Savova, Ben Reis,
More informationBOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
More informationExtraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
More informationIdentify Disorders in Health Records using Conditional Random Fields and Metamap
Identify Disorders in Health Records using Conditional Random Fields and Metamap AEHRC at ShARe/CLEF 2013 ehealth Evaluation Lab Task 1 G. Zuccon 1, A. Holloway 1,2, B. Koopman 1,2, A. Nguyen 1 1 The Australian
More informationSemantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
More informationSelbo 2 an Environment for Creating Electronic Content in Software Engineering
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 9, No 3 Sofia 2009 Selbo 2 an Environment for Creating Electronic Content in Software Engineering Damyan Mitev 1, Stanimir
More informationA Survey on Product Aspect Ranking
A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,
More informationText Mining for Health Care and Medicine. Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk
Text Mining for Health Care and Medicine Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk The Need for Text Mining MEDLINE 2005: ~14M 2009: ~18M Overwhelming information in textual,
More informationImpelling Heart Attack Prediction System using Data Mining and Artificial Neural Network
General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling
More information