Interactive Chinese Question Answering System in Medicine Diagnosis

Xipeng Qiu
School of Computer Science, Fudan University

Jiatuo Xu
Shanghai University of Traditional Chinese Medicine

Abstract

In this paper, we propose a general framework for an interactive question answering system for medical diagnosis, which interacts with the user to obtain a more refined question description and then returns answers. The system first gathers FAQ pairs from community question answering (cQA) websites and builds a medical ontology incrementally. It then analyzes the question and asks the user for any missing information. After receiving the user's feedback, it performs question retrieval and extracts answers. Experiments show that the system performs better with user feedback.

1. Introduction

Automatic question answering (QA) is an important research topic in information retrieval and natural language processing [39, 40], and is an alternative to keyword-based information retrieval systems such as Google and Baidu. The input of a QA system is a question, and the output is the corresponding answers extracted from a large corpus or the web [20]. Fig. 1 shows the general framework of question answering in the open domain. However, these systems cannot deal with complicated questions that depend on domain knowledge, such as questions in the medical domain.

To alleviate this problem, we can resort to large-scale online FAQ archives for specific domains. In recent years, community-based question answering (cQA) services such as Baidu Zhidao have become very popular. Instead of looking for answers through forums or search engines, users can post their questions on cQA websites and wait for other people to answer them. While forums focus on discussion and communication between users, cQA services focus on answering users' questions. Therefore, users can get a faster response on cQA websites.
These cQA websites also provide an interface for retrieving answered questions, which is almost always based on a keyword search engine. This is still not enough to give the user exact information: the user has to think of appropriate keywords to represent his needs, and good answers are often mingled with a large number of bad or wrong answers. Therefore, the major issue is to find the exact answer when answers to many complicated questions already exist. Related work includes question suggestion, answer quality, and question-answer pair extraction [14, 18, 23, 26, 32, 9, 27].

In this paper, we propose a general framework for an interactive Chinese question answering system for medical diagnosis, which interacts with the user to obtain a refined question description and returns the extracted answers. The system first gathers FAQ pairs from cQA websites and collects a medical ontology incrementally. It then analyzes the question and asks the user about the missing information. After receiving the user's feedback, it performs question retrieval and extracts answers.

In the rest of the paper, we describe our system in Section 2 and evaluate it experimentally in Section 3. Finally, we give our conclusions in Section 4.

2. System Framework

In this section, we introduce our interactive Chinese question answering system for medical diagnosis.

Topical Crawler

Topical crawlers play an important role in domain search engines. A topical crawler starts from seed keywords or URLs and gathers web pages whose content is similar to the seeds [35, 28, 5]. Context is one of the most useful features for guiding a crawler to highly relevant target pages. In our system, we collect medical web pages by analyzing the anchor text attached to hyperlinks. We first collect anchor texts with their corresponding categories from two Chinese cQA websites, which provide question categories. We then select the anchor texts whose categories are related to medical keywords, such as 医疗/疾病 (medical care/disease). Then we build a two-class classifier to classify anchor texts as medical or non-medical. The classifier is naive Bayes with a multinomial distribution [17].

Figure 1. The flowchart of the open domain question answering system.

Medical Ontology Construction

To take advantage of medical domain knowledge, we need to establish an ontology of medical terms, concepts, entities, and their relations. Because collecting this knowledge manually is difficult, we use an automatic method. There is existing work on automatically extracting information from a collected corpus [7, 37]. The objective of information extraction (IE) is to extract from text the pieces of information that relate to a prescribed set of concepts. We first manually collect some initial information, which includes the names of drugs, symptoms, and diseases and the relations between them. We then build the medical ontology with information extraction methods.

QA Pairs Extraction

There are many methods for extracting the best answers to a question on cQA websites [26, 15]. Answer quality becomes an important problem when there are many duplicated or wrong questions. The answers to these questions vary in quality, so measuring relevance alone is not enough; answer quality must be considered as well. We use the features described in [15] to predict and extract the best answer. These features include: Answerer's Acceptance Ratio, Answer Length, Questioner's Self Evaluation, Answerer's Activity Level, Answerer's Category Specialty, Users' Recommendation, and Number of Answers.

Question Analysis

In a general question analysis system, the first step is question classification [29, 24, 10, 44]. The categories are consistent with the entity extraction in later steps. However, a Chinese medical QA system faces some difficulties. First, question sentences in English and Chinese are very different. Second, most questions are not fact-based and are difficult to categorize. In our system, we build a question analysis model based on the medical ontology [13]. It first analyzes the focus words in the question and finds the related concepts in the medical ontology. It then classifies the question into a category and decides what information is missing for the question.

Interactive Feedback

A user often enters a question that mentions only the main symptom, which is often not enough to determine the cause of the disease. For example, consider 有什么方法能治疗头晕? (What methods can treat dizziness?). Dizziness has many possible causes, and the appropriate treatment varies greatly with the cause and the user's state of health. To get exact answers, the user is asked to provide extra information, such as age and other symptoms. Given the symptoms the user provides, the system first retrieves the related symptoms from the collected medical knowledge. It then interacts with the user to confirm all the signs of his disease.
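
The feedback step described above can be illustrated with a minimal sketch. It assumes a toy symptom table and a single required profile field; the names `RELATED_SYMPTOMS`, `REQUIRED_FIELDS`, and `missing_information`, and the example symptom entries, are hypothetical and are not taken from the paper, which does not specify its data structures or dialogue logic.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: symptom -> related symptoms the system may ask about.
# Entries are illustrative only, not a medical reference.
RELATED_SYMPTOMS: Dict[str, List[str]] = {
    "头晕": ["头痛", "恶心", "耳鸣"],  # dizziness -> headache, nausea, tinnitus
}

# Hypothetical auxiliary fields needed before retrieval (the paper mentions age).
REQUIRED_FIELDS: List[str] = ["age"]


def missing_information(
    symptoms: List[str], profile: Dict[str, str], answered: Set[str]
) -> List[str]:
    """Return the items the system would still ask the user about."""
    # Profile fields the user has not supplied yet come first.
    prompts: List[str] = [f for f in REQUIRED_FIELDS if f not in profile]
    # Then every related symptom that is neither reported nor already answered.
    for symptom in symptoms:
        for related in RELATED_SYMPTOMS.get(symptom, []):
            already_known = related in symptoms or related in answered
            if not already_known and related not in prompts:
                prompts.append(related)
    return prompts


# First round: the user only reports dizziness.
first_round = missing_information(["头晕"], profile={}, answered=set())
# Second round: age given, nausea confirmed, the other two symptoms answered.
second_round = missing_information(
    ["头晕", "恶心"], profile={"age": "45"}, answered={"头痛", "耳鸣"}
)
```

In this sketch the loop ends when `missing_information` returns an empty list, at which point the refined question would be passed on to retrieval.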

Figure 2. The flowchart of the interactive Chinese question answering system.

There is also existing research on interactive question answering [11, 12, 25].

FAQ Retrieval

Given an FAQ corpus, retrieving useful information for the user's question is still a problem. Much work has been done to improve the performance of FAQ retrieval [41, 22, 2, 19, 4, 3, 14, 16, 43]. An important problem is how to calculate the similarity between the user's question and an FAQ pair, which requires some semantic analysis. However, measuring semantic similarity between questions is not trivial. Sometimes two questions with the same meaning use very different wording. For example, Q1: 糖尿病患者长期服用什么药比较有效,副作用比较小? (Which drug is relatively effective, with fewer side effects, for long-term use by diabetes patients?) and Q2: 有什么能有效降低血糖并且对身体无害的药? (Is there a drug that effectively lowers blood sugar and is harmless to the body?) have almost identical meanings but are lexically very different. Similarity measures developed for document retrieval work poorly when there is little word overlap. Thus, if the FAQ corpus contains the QA pair for Q2 but the user asks Q1, a traditional information retrieval method treats the two questions as almost unrelated and the user does not get the answer.

One solution to this issue is query expansion [31, 38, 42]. In our system, we expand the query using the domain ontology. For a disease name, we add keywords for its corresponding symptoms.

Answer Extraction

On cQA websites, repliers often provide background or related information for a question, which helps the questioner work out the facts himself. But sometimes, especially for factoid and list questions, the user needs exact answers rather than related pieces of information. An example is 请问糖尿病的症状有哪些? (What are the symptoms of diabetes?). We therefore need to extract the answers from the related information [8, 33, 34, 45].
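
As a rough illustration of the ontology-based query expansion described under FAQ Retrieval, the sketch below adds related keywords to a query that mentions a disease name. The table `DISEASE_KEYWORDS`, the function `expand_query`, the pre-segmented term lists, and the keyword entries are all hypothetical; the paper does not describe its ontology format or word segmentation.

```python
from typing import Dict, List

# Hypothetical ontology slice: disease name -> related symptom keywords.
# Entries are illustrative only, not a medical reference.
DISEASE_KEYWORDS: Dict[str, List[str]] = {
    "糖尿病": ["血糖", "多饮", "多尿"],  # diabetes -> blood sugar, thirst, polyuria
}


def expand_query(
    terms: List[str], ontology: Dict[str, List[str]] = DISEASE_KEYWORDS
) -> List[str]:
    """Append ontology keywords for every disease name found in the query."""
    expanded = list(terms)
    for term in terms:
        for keyword in ontology.get(term, []):
            if keyword not in expanded:
                expanded.append(keyword)
    return expanded


# Simplified, pre-segmented versions of Q1 and Q2 from the text (assumed segmentation).
q1_terms = ["糖尿病", "患者", "药"]
q2_terms = ["降低", "血糖", "药"]

overlap_before = set(q1_terms) & set(q2_terms)
overlap_after = set(expand_query(q1_terms)) & set(q2_terms)
```

Under these assumptions, expanding Q1 adds 血糖, so the two questions share one more term than before, which is the effect query expansion is meant to have on lexically different questions.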
We first extract entities from this information and classify them into entity categories such as Person, Location, Organization, Durations, Quantities, and Dates [1]. We then score the entities and filter them with a threshold. An entity score has two components. The first is whether or not the entity's category matches the query's category. The second is based on the frequency and position of occurrences of the entity within the retrieved passages [1]. In our system, we use conditional random fields [21, 30] to label the entities and their corresponding categories.

Re-ranking

Before returning answers to the user, the system needs to re-rank them to improve performance, for example by removing redundant answers [6]. More features [36] can be used to score each answer candidate.

3. Experiments

We implemented our system and collected about 84,000 QA pairs in the medical domain from two cQA websites, Baidu Zhidao and WenWen. We evaluate our results with mean precision at rank 1 (P@1), which is the percentage of questions with the correct answer in the first position. We use keyword queries as the baseline system; the keywords are simply the terms in the question. We randomly selected 100 questions and evaluated the quality of the answers manually.

Table 1. Results of different systems with P@1

Systems        P@1
Baseline       79%
No feedback    82%
Feedback       87%

Table 1 shows the results of our system. User feedback improves P@1 from 82% to 87%, compared with 79% for the keyword baseline.

4. Conclusion

In this paper, we propose a framework for an interactive question answering system in the medical domain. It integrates question analysis, query expansion, ontology construction, answer extraction, and answer ranking. We also address the difficulties in each part and give preliminary solutions. The proposed framework can also be applied to other domains, such as music and travel.

Acknowledgement

This work was supported by the National High Technology Research and Development Program of China (863 Program) (No. 2007AA02Z429) and the Natural Science Foundation of China (No and ).

References

[1] S. Abney, M. Collins, and A. Singhal. Answer extraction. Proceedings of the sixth conference on Applied natural language processing.
[2] R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval. Addison-Wesley, Harlow, England.
[3] R. Burke, K. Hammond, and J. Kozlovsky. Knowledge-based information retrieval from semi-structured text. AAAI Fall Symposium on AI Applications in Knowledge Navigation and Retrieval, pages 19-24.
[4] R. Burke, K. Hammond, V. Kulyukin, S. Lytinen, N. Tomuro, and S. Schoenberg. Question answering from frequently asked question files: Experiences with the FAQ Finder system. AI Magazine, 18(2):57-66.
[5] S. Chakrabarti, K. Punera, and M. Subramanyam. Accelerated focused crawling through online relevance feedback. Proceedings of the 11th international conference on World Wide Web.
[6] C. Clarke, G. Cormack, and T. Lynam. Exploiting redundancy in question answering. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
[7] J. Cowie and W. Lehnert. Information extraction. Communications of the ACM, 39(1):80-91.
[8] D. Demner-Fushman and J. Lin. Knowledge extraction for clinical question answering: Preliminary results. Proceedings of the AAAI-05 Workshop on Question Answering in Restricted Domains, pages 9-13.
[9] S. Ding, G. Cong, C.-Y. Lin, and X. Zhu. Using conditional random fields to extract contexts and answers of questions from online forums. In Proceedings of ACL-08: HLT, Columbus, Ohio, June. Association for Computational Linguistics.
[10] J. Ely, J. Osheroff, P. Gorman, M. Ebell, M. Chambliss, E. Pifer, and P. Stavri. A taxonomy of generic clinical questions: classification study.
[11] T. Hao, D. Hu, L. Wenyin, and Q. Zeng. Semantic patterns for user-interactive question answering. Concurrency and Computation, 20(7):783.
[12] S. Harabagiu, A. Hickl, J. Lehmann, and D. Moldovan. Experiments with interactive question-answering. Ann Arbor, 100.
[13] U. Hermjakob. Parsing and question classification for question answering. Proceedings of the Workshop on Question Answering at the Conference ACL-2001.
[14] J. Jeon, W. Croft, and J. Lee. Finding similar questions in large question and answer archives. Proceedings of the 14th ACM international conference on Information and knowledge management, pages 84-90, 2005.

[15] J. Jeon, W. Croft, J. Lee, and S. Park. A framework to predict the quality of answers with non-textual features. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
[16] V. Jijkoun and M. de Rijke. Retrieving answers from frequently asked questions pages on the web. Proceedings of the 14th ACM international conference on Information and knowledge management, pages 76-83.
[17] M. Jordan. Learning in Graphical Models. Kluwer Academic Publishers.
[18] P. Jurczyk and E. Agichtein. Discovering authorities in question answer communities by using link analysis. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management.
[19] H. Kim and J. Seo. High-performance FAQ retrieval using an automatic clustering method of query logs. Information Processing and Management, 42(3).
[20] C. Kwok, O. Etzioni, and D. Weld. Scaling question answering to the web. Proceedings of the 10th international conference on World Wide Web.
[21] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
[22] C. Lee. Intention Extraction and Semantic Matching for Internet FAQ Retrieval. Master Thesis, Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, ROC.
[23] C. Lengeler, D. Savigny, H. Mshinda, C. Mayombana, S. Tayari, C. Hatz, A. Degrémont, and M. Tanner. Community-based questionnaires and health statistics as tools for the cost-efficient identification of communities at risk of urinary schistosomiasis. International Journal of Epidemiology, 20(3).
[24] X. Li and D. Roth. Learning question classifiers. Proceedings of the 19th International Conference on Computational Linguistics.
[25] J. Lin, D. Quan, V. Sinha, K. Bakshi, D. Huynh, B. Katz, and D. Karger. What makes a good answer? The role of context in question answering. Human-Computer Interaction.
[26] X. Liu, W. Croft, and M. Koll. Finding experts in community-based question-answering services. In Proceedings of the 14th ACM international conference on Information and knowledge management. ACM New York, NY, USA.
[27] Y. Liu and E. Agichtein. You've got answers: Towards personalized models for predicting success in community question answering. In Proceedings of ACL-08: HLT, Short Papers, Columbus, Ohio, June. Association for Computational Linguistics.
[28] F. Menczer, G. Pant, P. Srinivasan, and M. Ruiz. Evaluating topic-driven web crawlers. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
[29] D. Metzler and W. Croft. Analysis of statistical question classification for fact-based questions. Information Retrieval, 8(3).
[30] F. Peng, F. Feng, and A. McCallum. Chinese segmentation and new word detection using conditional random fields. Proceedings of the 20th international conference on Computational Linguistics.
[31] Y. Qiu and H. Frei. Concept based query expansion. Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval.
[32] B. Smyth, E. Balfe, J. Freyne, P. Briggs, M. Coyle, and O. Boydell. Exploiting query repetition and regularity in an adaptive community-based web search engine. User Modeling and User-Adapted Interaction, 14(5).
[33] R. Srihari and W. Li. A question answering system supported by information extraction. Proceedings of the sixth conference on Applied natural language processing.
[34] R. Srihari, W. Li, and N. CYMFONY. Information extraction supported question answering. NIST Special Publication SP.
[35] P. Srinivasan, F. Menczer, and G. Pant. A General Evaluation Framework for Topical Crawlers. Information Retrieval, 8(3).
[36] M. Surdeanu, M. Ciaramita, and H. Zaragoza. Learning to rank answers on large online QA collections. In Proceedings of ACL-08: HLT, Columbus, Ohio, June. Association for Computational Linguistics.
[37] J. Turmo, A. Ageno, and N. Català. Adaptive information extraction. ACM Computing Surveys (CSUR), 38(2).
[38] E. Voorhees. Query expansion using lexical-semantic relations. Springer-Verlag New York, Inc., New York, NY, USA.
[39] E. Voorhees. The TREC-8 question answering track report. NIST Special Publication SP, pages 77-82.
[40] E. Voorhees. Overview of the TREC 2003 question answering track. Proceedings of the Twelfth Text REtrieval Conference (TREC 2003), 142.
[41] C. Wu, J. Yeh, and Y. Lai. Semantic segment extraction and matching for internet FAQ retrieval. IEEE Transactions on Knowledge and Data Engineering.
[42] J. Xu and W. Croft. Query expansion using local and global document analysis. Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 4-11.
[43] S. Yang, F. Chuang, and C. Ho. Ontology-supported FAQ processing and ranking techniques. Journal of Intelligent Information Systems, 28(3).
[44] W. Zhang and T. Chen. Classification based on symmetric maximized minimal distance in subspace (SMMS). In Proc. of IEEE Conf. on Comput. Vision and Pattern Recogn. (CVPR).
[45] Z. Zheng. AnswerBus question answering system. Proceedings of the second international conference on Human Language Technology Research, 2002.


More information

Importance of Domain Knowledge in Web Recommender Systems

Importance of Domain Knowledge in Web Recommender Systems Importance of Domain Knowledge in Web Recommender Systems Saloni Aggarwal Student UIET, Panjab University Chandigarh, India Veenu Mangat Assistant Professor UIET, Panjab University Chandigarh, India ABSTRACT

More information

Incorporate Credibility into Context for the Best Social Media Answers

Incorporate Credibility into Context for the Best Social Media Answers PACLIC 24 Proceedings 535 Incorporate Credibility into Context for the Best Social Media Answers Qi Su a,b, Helen Kai-yun Chen a, and Chu-Ren Huang a a Department of Chinese & Bilingual Studies, The Hong

More information

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!

More information

The Impact of Query Suggestion in E-Commerce Websites

The Impact of Query Suggestion in E-Commerce Websites The Impact of Query Suggestion in E-Commerce Websites Alice Lee 1 and Michael Chau 1 1 School of Business, The University of Hong Kong, Pokfulam Road, Hong Kong [email protected], [email protected] Abstract.

More information

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

More information

Query term suggestion in academic search

Query term suggestion in academic search Query term suggestion in academic search Suzan Verberne 1, Maya Sappelli 1,2, and Wessel Kraaij 2,1 1. Institute for Computing and Information Sciences, Radboud University Nijmegen 2. TNO, Delft Abstract.

More information

Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives

Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives 7 XIN CAO and GAO CONG, Nanyang Technological University BIN CUI, Peking University CHRISTIAN S.

More information

A Rule-Based Short Query Intent Identification System

A Rule-Based Short Query Intent Identification System A Rule-Based Short Query Intent Identification System Arijit De 1, Sunil Kumar Kopparapu 2 TCS Innovation Labs-Mumbai Tata Consultancy Services Pokhran Road No. 2, Thane West, Maharashtra 461, India 1

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

A Comparative Approach to Search Engine Ranking Strategies

A Comparative Approach to Search Engine Ranking Strategies 26 A Comparative Approach to Search Engine Ranking Strategies Dharminder Singh 1, Ashwani Sethi 2 Guru Gobind Singh Collage of Engineering & Technology Guru Kashi University Talwandi Sabo, Bathinda, Punjab

More information

A Survey on Product Aspect Ranking

A Survey on Product Aspect Ranking A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,

More information

Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection

Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Jian Qu, Nguyen Le Minh, Akira Shimazu School of Information Science, JAIST Ishikawa, Japan 923-1292

More information

A Comparative Study on Sentiment Classification and Ranking on Product Reviews

A Comparative Study on Sentiment Classification and Ranking on Product Reviews A Comparative Study on Sentiment Classification and Ranking on Product Reviews C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan

More information

Link Analysis and Site Structure in Information Retrieval

Link Analysis and Site Structure in Information Retrieval Link Analysis and Site Structure in Information Retrieval Thomas Mandl Information Science Universität Hildesheim Marienburger Platz 22 31141 Hildesheim - Germany [email protected] Abstract: Link

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

A Survey on Product Aspect Ranking Techniques

A Survey on Product Aspect Ranking Techniques A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian

More information

The Enron Corpus: A New Dataset for Email Classification Research

The Enron Corpus: A New Dataset for Email Classification Research The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu

More information

Improving Question Retrieval in Community Question Answering Using World Knowledge

Improving Question Retrieval in Community Question Answering Using World Knowledge Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang

More information

Discovering and Querying Hybrid Linked Data

Discovering and Querying Hybrid Linked Data Discovering and Querying Hybrid Linked Data Zareen Syed 1, Tim Finin 1, Muhammad Rahman 1, James Kukla 2, Jeehye Yun 2 1 University of Maryland Baltimore County 1000 Hilltop Circle, MD, USA 21250 [email protected],

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Data-Intensive Question Answering

Data-Intensive Question Answering Data-Intensive Question Answering Eric Brill, Jimmy Lin, Michele Banko, Susan Dumais and Andrew Ng Microsoft Research One Microsoft Way Redmond, WA 98052 {brill, mbanko, sdumais}@microsoft.com [email protected];

More information

Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC [email protected] Hong Cheng CS Dept, UIUC [email protected] Abstract Most current search engines present the user a ranked

More information

ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search

ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search Project for Michael Pitts Course TCSS 702A University of Washington Tacoma Institute of Technology ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search Under supervision of : Dr. Senjuti

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of

More information

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

RETRIEVING QUESTIONS AND ANSWERS IN COMMUNITY-BASED QUESTION ANSWERING SERVICES KAI WANG

RETRIEVING QUESTIONS AND ANSWERS IN COMMUNITY-BASED QUESTION ANSWERING SERVICES KAI WANG RETRIEVING QUESTIONS AND ANSWERS IN COMMUNITY-BASED QUESTION ANSWERING SERVICES KAI WANG (B.ENG, NANYANG TECHNOLOGICAL UNIVERSITY) A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY SCHOOL OF COMPUTING

More information

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System Asanee Kawtrakul ABSTRACT In information-age society, advanced retrieval technique and the automatic

More information

Online Marketing Optimization Essentials

Online Marketing Optimization Essentials Online Marketing Optimization Essentials Bilal Saleh Principal Partner E-Nor Inc. May 20, 2014 Agenda 2 E-Nor Overview Search Engine Optimization (SEO) Paid search Web Analytics Q&A Graphics by: http://www.iconarchive.com/show/seo-icons-by-designbolts.html

More information

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

More information

NTT DOCOMO Technical Journal. Knowledge Q&A: Direct Answers to Natural Questions. 1. Introduction. 2. Overview of Knowledge Q&A Service

NTT DOCOMO Technical Journal. Knowledge Q&A: Direct Answers to Natural Questions. 1. Introduction. 2. Overview of Knowledge Q&A Service Knowledge Q&A: Direct Answers to Natural Questions Natural Language Processing Question-answering Knowledge Retrieval Knowledge Q&A: Direct Answers to Natural Questions In June, 2012, we began providing

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University [email protected] Chetan Naik Stony Brook University [email protected] ABSTRACT The majority

More information

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1

More information

SEO Techniques for various Applications - A Comparative Analyses and Evaluation

SEO Techniques for various Applications - A Comparative Analyses and Evaluation IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727 PP 20-24 www.iosrjournals.org SEO Techniques for various Applications - A Comparative Analyses and Evaluation Sandhya

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 [email protected] 2 [email protected] Abstract A vast amount of assorted

More information

Personalizing Image Search from the Photo Sharing Websites

Personalizing Image Search from the Photo Sharing Websites Personalizing Image Search from the Photo Sharing Websites Swetha.P.C, Department of CSE, Atria IT, Bangalore [email protected] Aishwarya.P Professor, Dept.of CSE, Atria IT, Bangalore [email protected]

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari [email protected]

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari [email protected] Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

Query Recommendation employing Query Logs in Search Optimization

Query Recommendation employing Query Logs in Search Optimization 1917 Query Recommendation employing Query Logs in Search Optimization Neha Singh Department of Computer Science, Shri Siddhi Vinayak Group of Institutions, Bareilly Email: [email protected] Dr Manish

More information

Framework for Intelligent Crawler Engine on IaaS Cloud Service Model

Framework for Intelligent Crawler Engine on IaaS Cloud Service Model International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 17 (2014), pp. 1783-1789 International Research Publications House http://www. irphouse.com Framework for

More information

Ming-Wei Chang. Machine learning and its applications to natural language processing, information retrieval and data mining.

Ming-Wei Chang. Machine learning and its applications to natural language processing, information retrieval and data mining. Ming-Wei Chang 201 N Goodwin Ave, Department of Computer Science University of Illinois at Urbana-Champaign, Urbana, IL 61801 +1 (917) 345-6125 [email protected] http://flake.cs.uiuc.edu/~mchang21 Research

More information

Will my Question be Answered? Predicting Question Answerability in Community Question-Answering Sites

Will my Question be Answered? Predicting Question Answerability in Community Question-Answering Sites Will my Question be Answered? Predicting Question Answerability in Community Question-Answering Sites Gideon Dror, Yoelle Maarek and Idan Szpektor Yahoo! Labs, MATAM, Haifa 31905, Israel {gideondr,yoelle,idan}@yahoo-inc.com

More information