Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based on Machine Learning
3rd International Conference on Materials Engineering, Manufacturing Technology and Control (ICMEMTC 2016)

Jianhong Wang 1,a, Yousong Peng *,1,b, Bin Liu 1,c, Zhiqiang Wu 1,d, Lizong Deng 2,3,e, and Taijiao Jiang *,1,2,3,f

1 College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
2 Center of System Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
3 Center of Systems Medicine, Chinese Academy of Medical Sciences, Suzhou, China
a [email protected], [email protected], [email protected], [email protected], e [email protected], [email protected]
* Corresponding author

Keywords: Chinese Electronic Medical Records, Information extraction, Named Entity Recognition, assertion classification, Machine Learning

Abstract. With the rapid growth of electronic medical records (EMRs) in China, large amounts of clinical data have accumulated. However, little work has been done on extracting information from Chinese EMRs. In this work, using a manually annotated dataset of Chinese EMRs, we investigated clinical Named Entity Recognition (NER) based on Conditional Random Fields (CRF), and further built a Support Vector Machine (SVM) classifier to determine the assertion status of the entities and to evaluate the contributions of different features to assertion classification. For Chinese clinical NER, our CRF-based classifier achieved a best F-measure of 89.07%, while the SVM-based assertion classifier achieved a maximum F-measure of 94.10%. Our work suggests that machine learning methods are helpful in NER and assertion determination for Chinese clinical records.
Introduction

Electronic Medical Records (EMRs), generated in the process of clinical treatment, are systematized collections of patients' clinical information stored in electronic medical record systems [1]. EMRs contain a range of data, including demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics such as age and weight, and billing information. A large amount of medical knowledge closely related to patients can be discovered by analyzing these records [2]. EMRs are mostly written in natural language, so mining patient-related medical knowledge from them requires information extraction techniques. NER and assertion classification are the fundamental tasks in information extraction from EMRs [2]. They serve Clinical Decision Support (CDS) research in medical informatics, and they also support modeling of a user's health state and personalized healthcare information services in Consumer Health Informatics.

A number of information extraction studies on English EMRs have been conducted. In 2010, the Center of Informatics for Integrating Biology and the Bedside (i2b2) at Partners Health Care System and the Veterans Affairs (VA) Salt Lake City Health Care System organized a challenge in natural-language processing (NLP) for clinical data. Two of the main tasks of the challenge were extracting clinical concepts from natural language text and classifying the assertions made about medical problems [3]. Most clinical NER systems achieved good performance using the CRF algorithm [4-8], while the SVM algorithm received more attention for assertion classification [4-6,9,10].

With the rapid growth of EMRs in China, large amounts of clinical data have accumulated. However, little work has been done on extracting information from Chinese EMRs. Lei et al.
[11] compared different machine learning algorithms and various types of features for NER in Chinese admission notes. Recently, Wu et al. [14] designed a deep learning based NLP system that automatically learns useful feature representations from unlabeled corpora of Chinese clinical text through unsupervised learning. Most of this work focused on clinical NER [11-14]. In addition, there is a lack of publicly available Chinese EMRs for medical information extraction research.

In this study, we manually annotated a corpus of typical Chinese EMRs and built a machine-learning based system to extract clinical entities and their assertions. Furthermore, we investigated the contributions of different types of features to assertion classification. To the best of our knowledge, this is one of the first extensive studies of assertion classification for Chinese medical entities.

Datasets and Annotation

A corpus of typical EMRs, provided by an online Chinese electronic medical records system, was collected and collated. This corpus contains 200 internal medicine records. The information included in each record is shown in Fig. 1.

Fig. 1 A sample of a typical Chinese EMR

Recently, under the guidance of several professional doctors, Yang et al. developed an annotation guideline for Chinese EMRs based on that of i2b2 [3]. The guideline defines six types of entities: disease, disease type, complaint-symptom, test-result, test and treatment. For concepts of the types disease, complaint-symptom, test-result and treatment, eight assertions were further defined: present, absent, possible, conditional, hypothetical, family, history and occasional. Following this guideline, two postgraduate students in computer science independently marked all the entities and their assertions in our collated EMRs. To calculate inter-rater agreement, 40 records were annotated by both annotators. The remaining 160 records were annotated by a single annotator only.
The initial agreement between the two annotators on these 40 records, measured by kappa [15], was 0.926, which indicates that the annotation is reliable.
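The paper does not say how kappa was computed. As a minimal sketch, assuming the common two-rater Cohen's kappa over per-item labels, agreement between two annotators could be calculated as follows. The function name cohen_kappa and the toy labels are illustrative and are not taken from the annotated corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical labels): the annotators agree on 3 of 4 items.
annotator_1 = ["present", "present", "absent", "absent"]
annotator_2 = ["present", "absent", "absent", "absent"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In the toy example the observed agreement is 0.75 and the chance agreement is 0.5, which gives a kappa of 0.5. A value such as the reported 0.926 means agreement far above chance.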
Methods

CRF-based approaches for clinical NER. The Chinese medical NER task requires determining the boundaries of clinical entities and assigning each entity one of the types mentioned above. We converted this NER task into a classification task, for which the CRF algorithm was used. First, we transformed the annotated data into the BIESO format, in which each word is assigned one of the following labels: B = beginning of an entity, I = inside an entity, E = end of an entity, S = single-word entity and O = outside of an entity. Because the entity type "disease type" occurred rarely in our collated EMRs, it was not used in our analyses. Twenty-one tags in total were generated: B-disease, B-complaintsymptom, B-testresult, B-test, B-treatment, I-disease, I-complaintsymptom, I-testresult, I-test, I-treatment, E-disease, E-complaintsymptom, E-testresult, E-test, E-treatment, S-disease, S-complaintsymptom, S-testresult, S-test, S-treatment, and O. Fig. 2 shows a sentence of annotated entities labeled with BIESO tags.

Fig. 2 A sample of Chinese medical NER represented in BIESO format.

Then, we defined the following four types of features to build the CRF classifier for the NER task:
1) Single word features: individual Chinese characters, punctuation, English letters and numbers.
2) Part of Speech (POS): we used NLPIR, developed by Dr. Zhang Huaping, to obtain POS tags.
3) Dictionary features: we integrated ICD-10 and common Chinese EMR terms, about 3000 phrases in total, into NLPIR. After analyzing the disease and symptom concepts that appear in Chinese EMRs, we found that most concepts consist of a body part or a subject, such as "体重 (weight)", and a basic disease or symptom, such as "减轻 (loss)". A dictionary containing body parts, subjects, and basic diseases and symptoms was therefore built for our CRF-based NER.
4) Section headings: an EMR is a semi-structured narrative text divided into many sections, such as chief complaint, present illness and so on.
Different sections record different clinical data. For example, the physical examination section contains concepts of tests and abnormal test results, while the preliminary diagnosis section contains disease concepts. Therefore, we manually reviewed some records and defined 12 different section headings as the section headings feature.

SVM-based approaches for assertion classifier. Assertion classification assigns one of the eight possible assertion labels mentioned above to entities of the types disease, complaint-symptom, test-result, test and treatment. The SVM algorithm was used for the classification, implemented using libsvm with a linear kernel. Four types of features were used:
1) Context feature: a ±4 word window, i.e., the words that appear within four positions of the target concept. Given a concept at the t-th position in the sentence, the ±4 word window captures the words found in the (t-1)th, (t-2)th, (t-3)th, (t-4)th, (t+1)th, (t+2)th, (t+3)th and (t+4)th positions in the sentence.
2) Section headings: the 12 section headings mentioned above.
3) Cues information feature: before or after a target concept, some cue words can indicate whether the target concept is present or absent. For example, in the sentence 双侧眼睑无浮肿 (no bilateral eyelid edema), 浮肿 (edema) is the target concept, and 无 (no) before 浮肿 (edema) indicates that edema is absent for the patient. We developed a system to extract these cues automatically.
4) POS features.

The above features were transformed into binary features. For each feature, the binary feature vector lists all possible values of that feature in the corpus as its columns; for each target medical problem to be classified (a row), the columns for the observed values are set to 1 and the rest are left at zero. If the target has no value for a feature, all columns representing this feature are set to zero.
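Two steps from the Methods can be illustrated with a short Python sketch: converting entity spans to BIESO tags, and turning a ±4 context window into binary feature columns. This is not the authors' code. The function names (to_bieso, window_features, binarize), the use of single characters as tokens, the (start, end, type) span format with an exclusive end, and the choice of complaintsymptom as the type of 浮肿 are all assumptions made for the example.

```python
def to_bieso(n_tokens, entities):
    """Return BIESO tags for a sentence of n_tokens tokens.

    entities is a list of (start, end, entity_type) spans, end exclusive.
    """
    tags = ["O"] * n_tokens
    for start, end, entity_type in entities:
        if end - start == 1:
            tags[start] = f"S-{entity_type}"
        else:
            tags[start] = f"B-{entity_type}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{entity_type}"
            tags[end - 1] = f"E-{entity_type}"
    return tags


def window_features(tokens, start, end, size=4):
    """Collect the tokens up to `size` positions before and after a concept."""
    feats = {}
    for k in range(1, size + 1):
        left, right = start - k, end - 1 + k
        if left >= 0:
            feats[f"w-{k}"] = tokens[left]
        if right < len(tokens):
            feats[f"w+{k}"] = tokens[right]
    return feats


def binarize(feature_dicts):
    """One column per (feature, value) pair seen in the data; 1 if observed."""
    columns = sorted({item for feats in feature_dicts for item in feats.items()})
    index = {column: i for i, column in enumerate(columns)}
    rows = []
    for feats in feature_dicts:
        row = [0] * len(columns)
        for item in feats.items():
            row[index[item]] = 1
        rows.append(row)
    return columns, rows


# Example sentence from the text; characters are used as tokens (assumption).
tokens = list("双侧眼睑无浮肿")
tags = to_bieso(len(tokens), [(5, 7, "complaintsymptom")])
context = window_features(tokens, 5, 7)
columns, rows = binarize([context, {}])
```

In this example the cue 无 (no) ends up in the feature w-1, directly before the target concept, and a target with no observed values gets an all-zero row, as described above. Rows of this kind could then be written in whatever input format the chosen SVM toolkit expects.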
Experiments and evaluation. The typical internal medicine records mentioned above were divided into two subsets: 120 records (60%) for training and 80 records (40%) for testing. The parameters of the CRF classifier and the SVM classifier were optimized on the training set using five-fold cross-validation. We then tested the classifiers on the independent test set. Precision, recall and F-measure were used to measure the performance of the classifiers. To evaluate the contributions of different types of features to assertion classification, we started with the context feature, then added the other types of features and reported the corresponding results.

Results and Discussion

Table 1 describes the statistics of entities in the annotated EMRs used in our study. There are entities in the training dataset and 8026 entities in the test dataset. The distribution of the different types of assertions in the training set and test set is shown in Table 2. The proportion of present, absent, possible and history assertions is far larger than that of the other assertion types. In our annotated dataset, the numbers of hypothetical, conditional and family assertions are lower than 20, so we ignored these assertion types.

Table 1 Distribution of different types of entities in training set and test set
Data set    disease    complaint-symptom    test-result    test    treatment    All
training
test
total

Table 2 Distribution of different types of assertions in training set and test set
Data set    present    absent    possible    occasional    history    All
training
test
total

The detailed results of the CRF classifier for each entity type are shown in Table 3. F-measures ranged from 78.47% to 94.58% across the five entity types. Performance was best for the type test-result and worst for the type treatment. For each type of entity, precision was higher than recall.
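The paper does not spell out its F-measure formula. Assuming the standard balanced F-measure, the harmonic mean of precision and recall, the F-measure column of Table 3 can be recomputed from the precision and recall columns. The helper name f_measure is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) for two entity types from Table 3.
disease_f = round(f_measure(90.22, 82.66), 2)    # 86.27
treatment_f = round(f_measure(92.90, 67.92), 2)  # 78.47
```

Because the harmonic mean is pulled toward the smaller of the two values, the low recall for treatment (67.92%) drags its F-measure well below its precision.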
Table 3 Results of CRF-based named entity recognition for each entity type
Entity type          Precision    Recall    F-measure
disease              90.22%       82.66%    86.27%
complaint-symptom    87.24%       78.40%    82.58%
test-result          94.65%       94.51%    94.58%
test                 84.85%       82.77%    83.80%
treatment            92.90%       67.92%    78.47%
overall              91.31%       86.94%    89.07%

Our best results are similar to those reported in [13] for discharge summaries and to those of the NER task in the i2b2 challenge [3, 8]. In this study, we found that our CRF model could correctly identify entities that did not appear in the training set. The main reason may lie in the dictionary features used in our study. In Chinese EMRs, most entities have a regular syntactic structure. For example, the concept "左侧肢体 (left limb) 无力 (weakness)" is made up of a body part, "左侧肢体 (left limb)", and a basic symptom, "无力 (weakness)". Our dictionary defines these common body parts, basic symptoms and diseases.

Table 4 shows the performance of the SVM classifier in assertion determination with different feature sets. The context feature set alone gave an F-measure of 92.07%, making it the largest contributor to assertion classification. Both the section headings feature and the cues information feature improved the assertion classifier. The section headings feature raised the F-measure from 92.07% to 93.14%, while the cues information feature raised it from 92.07% to 93.56%. Adding POS features changed the F-measure only marginally (from 92.07% to 92.18%).

Table 4 Results of SVM-based assertion classifier when different sets of features were used
Feature set                                               F-measure
context feature                                           92.07%
context feature+section headings                          93.14%
context feature+pos                                       92.18%
context feature+cues information                          93.56%
context feature+section headings+pos+cues information     94.10%

Table 5 Detailed results of the best SVM-based assertion classifier for each assertion category
Assertion category    Precision    Recall    F-measure
present               95.68%       92.95%    94.30%
absent                96.93%       93.62%    95.25%
occasional            92.65%       89.51%    91.05%
possible              78.86%       63.47%    70.33%
history               81.31%       74.37%    77.69%
overall               95.06%       93.15%    94.10%

The detailed results of the best SVM assertion classifier for each assertion type are shown in Table 5. The overall F-measure is 94.10%. The classifier performed best on the present and absent assertions, with F-measures of 94.30% and 95.25% respectively, and worst on the possible assertion (F-measure = 70.33%). This may be because human determination of possible assertions is not always straightforward, as reported by Jiang et al. [4].

Conclusion

In this study, we investigated clinical NER from Chinese EMRs based on the CRF algorithm and further built an SVM-based classifier to determine assertion status and to evaluate the effects of different features on assertion classification. For the former, our CRF classifier achieved a best F-measure of 89.07%, while for the latter our SVM assertion classifier achieved a maximum F-measure of 94.10%. Although our experiments achieved good performance, there are still some limitations. The data used in our study were limited to internal medicine records. In the future, we will cooperate with hospitals to obtain more medical records to improve our models.
Overall, our work suggests that machine-learning methods could be helpful in NER and assertion determination for Chinese EMRs.

Acknowledgements

This study was supported by the National Natural Science Foundation, the Young Teachers Development Plan of Hunan University (to YS) and the International Scientific and Technological Cooperation project (2014DFB30010). We would like to thank the members of the Jiang lab for their help and deliberations. We would also like to thank the anonymous reviewers for their valuable comments and suggestions. No competing interests exist.

References

[1] Information on
[2] Yang J, Yu Q, Guan Y, Jiang Z: An Overview of Research on Electronic Medical Record Oriented Named Entity Recognition and Entity Relation Extraction. Acta Automatica Sinica. Vol. 40 (2014), p
[3] Uzuner Ö, South B R, Shen S, DuVall S L: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc. Vol. 18 (2011), p
[4] Jiang M, Chen Y, Liu M, Rosenbloom S T, Mani S, Denny J C, Xu H: A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J Am Med Inform Assoc. Vol. 18 (2011), p
[5] de Bruijn B, Cherry C, Kiritchenko S, Martin J, Zhu X: Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc. Vol. 18 (2011), p
[6] Grouin C, Abacha A B, Bernhard D: CARAMBA: concept, assertion, and relation annotation using machine-learning based approaches. In: Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA (2010).
[7] Uzuner O, Solti I, Cadag E: Extracting medication information from clinical text. J Am Med Inform Assoc. Vol. 17 (2010), p
[8] Doan S, Collier N, Hua X, Duy P H, Tu M P: Recognition of medication information from discharge summaries using ensembles of classifiers. BMC Medical Informatics & Decision Making. Vol. 12 (2012), p
[9] Uzuner O, Zhang X, Sibanda T: Machine learning and rule based approaches to assertion classification. J Am Med Inform Assoc. Vol. 16 (2009), p
[10] Clark C, Aberdeen J, Coarr M, Tresner-Kirsch D, Wellner B, Yeh A, Hirschman L: MITRE system for clinical assertion status classification. J Am Med Inform Assoc. Vol. 18 (2011), p
[11] Lei J, Tang B, Lu X, Gao K, Min J, Hua X: A comprehensive study of named entity recognition in Chinese clinical text. J Am Med Inform Assoc. Vol. 21 (2014), p
[12] Wang S K, Li S Z, Chen T S: Recognition of Chinese Medicine Named Entity Based on Condition Random Field. Journal of Xiamen University. Vol. 48 (2009), p
[13] Yan Y, Wen D W, Wang Y J, Wang K: Named entity recognition in Chinese medical records based on cascaded conditional random field. Journal of Jilin University (Engineering and Technology Edition). Vol. 44 (2014), p
[14] Wu Y, Jiang M, Lei J, Xu H: Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network. Studies in Health Technology & Informatics. Vol. 216 (2015), p
[15] Hripcsak G, Rothschild A S: Agreement, the F-Measure, and Reliability in Information Retrieval. J Am Med Inform Assoc. Vol. 12 (2005), p
Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space Model *
Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space Model * Buzhou Tang 1,2, Yonghui Wu 1, Min Jiang 1, Joshua C. Denny 3, and Hua Xu 1,* 1 School of Biomedical
Blog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
Travis Goodwin & Sanda Harabagiu
Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records Travis Goodwin & Sanda Harabagiu Human Language Technology Research
A Method for Automatic De-identification of Medical Records
A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA [email protected] Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA [email protected] Abstract
CENG 734 Advanced Topics in Bioinformatics
CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Sentiment analysis: towards a tool for analysing real-time students feedback
Sentiment analysis: towards a tool for analysing real-time students feedback Nabeela Altrabsheh Email: [email protected] Mihaela Cocea Email: [email protected] Sanaz Fallahkhair Email:
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing
Healthcare Data Mining: Prediction Inpatient Length of Stay
3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract
Microblog Sentiment Analysis with Emoticon Space Model
Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory
Design call center management system of e-commerce based on BP neural network and multifractal
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):951-956 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Design call center management system of e-commerce
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
2015 CMB Request for Proposals (RFP) Innovations in e-learning for Health Professional Education
2015 CMB Request for Proposals (RFP) Innovations in e-learning for Health Professional Education Invitation for Proposals The China Medical Board ( 美 国 中 华 医 学 基 金 会 ) invites eligible Chinese universities
A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set. Arya Tafvizi
A De-identifier For Electronic Medical Records Based On A Heterogeneous Feature Set by Arya Tafvizi S.B., Physics, MIT, 2010 S.B., Computer Science and Engineering, MIT, 2011 Submitted to the Department
Research on the UHF RFID Channel Coding Technology based on Simulink
Vol. 6, No. 7, 015 Research on the UHF RFID Channel Coding Technology based on Simulink Changzhi Wang Shanghai 0160, China Zhicai Shi* Shanghai 0160, China Dai Jian Shanghai 0160, China Li Meng Shanghai
An Imbalanced Spam Mail Filtering Method
, pp. 119-126 http://dx.doi.org/10.14257/ijmue.2015.10.3.12 An Imbalanced Spam Mail Filtering Method Zhiqiang Ma, Rui Yan, Donghong Yuan and Limin Liu (College of Information Engineering, Inner Mongolia
Network Intrusion Detection System and Its Cognitive Ability based on Artificial Immune Model WangLinjing1, ZhangHan2
3rd International Conference on Machinery, Materials and Information Technology Applications (ICMMITA 2015) Network Intrusion Detection System and Its Cognitive Ability based on Artificial Immune Model
TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes
TMUNSW: Identification of disorders and normalization to SNOMED-CT terminology in unstructured clinical notes Jitendra Jonnagaddala a,b,c Siaw-Teng Liaw *,a Pradeep Ray b Manish Kumar c School of Public
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY
Semi-Supervised Learning for Blog Classification
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
Tibetan Number Identification Based on Classification of Number Components in Tibetan Word Segmentation
Tibetan Number Identification Based on Classification of Number Components in Tibetan Word Segmentation Huidan Liu Institute of Software, Chinese Academy of Sciences, Graduate University of the [email protected]
Micro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
Development of a Web-based Information Service Platform for Protected Crop Pests
Development of a Web-based Information Service Platform for Protected Crop Pests Chong Huang 1, Haiguang Wang 1 1 Department of Plant Pathology, China Agricultural University, Beijing, P. R. China 100193
Research of Postal Data mining system based on big data
3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication
End-to-End Sentiment Analysis of Twitter Data
End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India [email protected],
Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW
CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW 1 XINQIN GAO, 2 MINGSHUN YANG, 3 YONG LIU, 4 XIAOLI HOU School of Mechanical and Precision Instrument Engineering, Xi'an University
Protein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track
Protein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track Yung-Chun Chang 1,2, Yu-Chen Su 3, Chun-Han Chu 1, Chien Chin Chen 2 and
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
Research on Sentiment Classification of Chinese Micro Blog Based on
Research on Sentiment Classification of Chinese Micro Blog Based on Machine Learning School of Economics and Management, Shenyang Ligong University, Shenyang, 110159, China E-mail: [email protected] Abstract
A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM
Journal of Computational Information Systems 10: 17 (2014) 7629 7635 Available at http://www.jofcis.com A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Tian
Journal of Chemical and Pharmaceutical Research, 2014, 6(3):34-39. Research Article. Analysis of results of CET 4 & CET 6 Based on AHP
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(3):34-39 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Analysis of results of CET 4 & CET 6 Based on AHP
