Clinical Decision Support with the SPUD Language Model
|
|
- Clare Pope
- 8 years ago
- Views:
Transcription
1 Clinical Decision Support with the SPUD Language Model Ronan Cummins The Computer Laboratory, University of Cambridge, UK Abstract. In this paper we present the systems and techniques used by the University of Cambridge for the CDS (Clinical Decision Support) track of the 24 th Text Retrieval Conference (TREC). The system was among the best automatic approaches for both CDS tasks and is based on a new language modelling approach implemented using Lucene. 1 1 Introduction We outline the main models and techniques used to participate in the CDS track of TREC The CDS track consisted of retrieving relevant biomedical articles for answering generic clinical questions about medical records. As the documents are full scientific articles they tend to be much longer than the average general Web document. Furthermore, the types of the queries issued for this task are also much longer than those used for typical web search. Therefore, we adopted the use of our new document language model [1] that has been shown to retrieve longer documents more fairly. The recently developed SPUD language model [1] treats document generation using the Pólya process and aims to model wordburstiness directly. It has been shown to incorporate a number of theoretically interesting properties. For example, it models the scope and verbosity hypothesis [3] separately, and reintroduces a measure closely related to inverse document frequency [4]. Therefore, we hypothesised that the SPUD language model would be well-suited to the retrieval of scientific texts. We also studied the performance of a newly developed query modelling technique which re-weights salient terms in longer queries. Furthermore, we investigated query expansion using two different models. 2 Models We now outline the main models used in our approaches. 1
2 2 Ronan Cummins 2.1 Document Models We model each document as a mixture of multivariate Pólya distributions. The model captures word burstiness by modelling the dependencies between recurrences of the same word-type. Each document is modelled as follows: α d = (1 ω) α τ + ω α c (1) where α d, α τ, and α c are the document model, topic model, 2 and background model respectively. The hyper-parameter ω controls the smoothing and is stable at approximately ω = 0.8. Each of these models are multivariate Pólya distributions with parameters estimated as follows: ˆα τ = {m d c(t, d) d : t d} ˆα c = {m c df t t df t : t C} (2) where m d is the number of word-types (distinct terms) in d, c(t, d) is the count of term t in document d, d is the number of word tokens in d, df t is the document frequency of term t in the collection C, and m c is a background mass parameter that can be estimated via numerical methods (see [1] for details). The scale parameters m d and m c can be interpreted as beliefs in the parameters c(t, d)/ d and df t / t df t respectively. For the document collection in the CDS track the parameter m c is estimated to be 775. The KL-divergence approach to ranking documents can be used with these document models whereby one takes a point-estimate of the distribution (i.e. E[α d ] is a multinomial) and one can rank documents according to the negative KL-divergence between the query distribution and the expected document multinomial. 2.2 Query Models When a user formulates a short keyword query (e.g. hypotension, hypoxia), it is usually assumed that they have already distilled the topical aspect of the information need. Consequently, one may assume that the probability that a particular query token is topical is 1.0 and this can be normalised accordingly to estimate the maximum likelihood query model (i.e. { 1 2, 1 2 }). This is the standard method of estimating query models for use with KL-Divergence. However, when dealing with natural language queries (e.g. A 44-year-old man with coffee-ground emesis, tachycardia, hypoxia, hypotension and cool, clammy extremities.) it is likely that some terms are generated by a background query language model. Therefore, we assume that long natural language queries are generated by drawing terms from a query model α q which consists of both a topical language model α qτ and a background query language model α qc. The topical query model describes the topical information that the user requires, 2 For the purposes of this paper, we refer to the unsmoothed model as the topic model of the document as it explains words not explained by the general background model.
3 Clinical Decision Support with the SPUD Language Model 3 while the background query model describes the syntactic glue of the general query language. Examples of fragments that can be explained by the background query language model are tokens such as I, am, looking, for,, and A, relevant, document, may, include, (a stereotypical TREC construct). Therefore, our new query model is defined as follows: α q = (1 λ q ) α qτ + (λ q ) α qc (3) where λ q is the probability mass of the background query language model. Although the background query language model is likely to contain some structural clues regarding relevance, we simple regard this model as generating noise tokens, and therefore aim to extract the topical part of each query. This can be achieved by determining using Bayes theorem the probability that a particular query term t was generated by the topical query model as follows: p(α qτ t) = (1 λ q ) p(t α qτ ) (1 λ q ) p(t α qτ ) + (λ q ) p(t α qc ) The final step involves determining the distribution of terms in the topical query model α qτ by normalising over the tokens in q as follows: c(t, q) p(α qτ t) p(t α qτ ) = t q (c(t, q) p(α qτ t )) which we call the discriminative query model (DQM). For the specific instantiation of the model using multivariate Pòlya distributions (α qτ ), the probability that a particular term t came from the topical part of the query model, when assuming that the background query model is the general collection, is as follows: p(α qτ t) = c(t, q) c(t, q) + (1 ωq) ω q df t m c q t df t (4) (5) m d (6) which can be plugged into Eq. 5 to yield the DQM using the multivariate Pòlya. The one free parameter in this specific query model is ω q which determines the belief in the background query model. We set this value to be the same as that from the document model such that ω = ω q. This model essentially introduces a type of idf into longer queries, as common terms occurring in the query will be more likely to come from the background model. As a result, these common terms are down-weighted in the query to varying degrees. 2.3 Psuedo-Relevance Feedback Models We adopt two types of pseudo-relevance feedback models. Firstly, we use the standard RM3 model [2] with our SPUD framework, where we assume the top F documents are relevant. Secondly, we use an approach that extracts the topical terms from the feedback documents in a similar manner to that outlined in the previous section. Given a set of feedback documents F, we then rank terms as follows:
4 4 Ronan Cummins p(θ Q t) = d F p(α τ t) p(q α d ) d F p(q α d ) (7) where p(q α d ) is the document score. Given the SPUD language model (Eq. 1) and its parameters estimates (Eq. 2), the probability that the term t was generated from the topical model α τ of a feedback-document can be calculated via Bayes theorem as follows: (1 ω) α τt p(α τ t) = (8) (1 ω) α τt + ω α ct where α τt and α ct are the parameters of t for the document topic model and background model respectively. A relatively simple intuition for this formula is that topical terms are those that are more likely generated from the topical part of a document than those that are generated by the background model. For all approaches we interpolate 30 feedback terms with the original query using linear-interpolation of System and Topics Our models were all implemented in the Lucene retrieval framework. We stemmed all text using Porter s stemmer and removed a small number of stopwords (the 26 stopwords from the Lucene EnglishAnalyzer). The topics in the CDS track are of three different types (diagnosis, test, and treatment) depending on what the specific task the clinician is involved in. An example topic is as follows: <topic number="1" type="diagnosis"> <description>a 44 yo male is brought to the emergency room after multiple bouts of vomiting that has a coffee ground appearance. His heart rate is 135 bpm and blood pressure is 70/40 mmhg. Physical exam findings include decreased mental status and cool extremities. He receives a rapid infusion of crystalloid solution followed by packed red blood cell transfusion and is admitted to the ICU for further care. </description> <summary>a 44-year-old man with coffee-ground emesis, tachycardia, hypoxia, hypotension and cool, clammy extremities. </summary> </topic> 4 Results This section presents the results of the six runs submitted to the CDS track (three for task A and three for task B). Table 1 shows details of the six runs submitted.
5 Clinical Decision Support with the SPUD Language Model 5 Table 1. Details of settings used for each run System Task Doc Model Query Model Feedback Model Topic Fields CAMspud1 A SPUD ω=0.9 DQM ω=0.9 RM3 λ=0.5, F = 5 type + summary CAMspud3 A SPUD ω=0.9 DQM ω=0.9 QTM λ=0.5, F = 5 summary CAMspud5 A SPUD ω=0.9 DQM ω=0.9 RM3 λ=0.5, F = 5 description CAMspud6 B SPUD ω=0.85 DQM ω=0.85 No Feedback diagnosis + summary CAMspud7 B SPUD ω=0.85 DQM ω=0.85 RM3 λ=0.5, F = 10 diagnosis + summary CAMspud8 B SPUD ω=0.85 DQM ω=0.85 QTM λ=0.5, F = 10 diagnosis + summary 4.1 Task A Table 2 shows the MAP, InfAP, and InfNDCG of the three runs submitted for task A. The rows labelled median and best show the median and best performance per topic averaged over all 30 topics for all of the runs in the track. It is worth noting that best does not represent one system, rather it indicates an upper-bound or oracle approach. All of our approaches performed above the median which is encouraging, with CAMspud1 being our best run for task A. Our best run is very close in performance to the best single run for task A (labelled top system). This approach used the type and summary fields with RM3 relevance feedback of 30 terms. Overall, our system was the third best for task A (out of 36 automatic systems). Table 2. MAP, InfAP, and InfNDCG over 30 Topics System MAP InfAP InfNDCG CAMspud CAMspud CAMspud top system median best Task B Table 3 shows the MAP, InfAP, and InfNDCG of the three runs submitted for task B (where a diagnosis field is included in the topics of type treatment and test). For task B we used both the diagnosis and summary field in all of our runs. Again all of our runs outperform the median run. CAMspud6 does not use pseudo-relevance query expansion and is the worst of the three runs. The only
6 6 Ronan Cummins difference between CAMspud7 and CAMspud8 is that the former uses RM3 pseudo-relevance expansion, while the latter uses the new query topic modelling approach. As these results are higher than in task A, it suggests that including the diagnosis field is useful. Overall, our system was the fifth best for task B (out of 26 automatic systems). Table 3. MAP, InfAP, and InfNDCG over 30 Topics System MAP InfAP InfNDCG CAMspud CAMspud CAMspud top system median best Summary We experimented with using the new SPUD language modelling approach in the CDS track. In general the approach is quite effective and is among the top five systems on both of the CDS tasks. In summary, we found query expansion to be beneficial, the summary field to be more effective than the description field, and we found that using the diagnosis field (when available) also leads to an improvement in the task. References 1. Ronan Cummins, Jiaul H. Paik, and Yuanhua Lv. A Pólya urn document language model for improved information retrieval. ACM Transactions of Informations Systems, 33(4):21, Victor Lavrenko and W. Bruce Croft. Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 01, pages , New York, NY, USA, ACM. 3. S. E. Robertson and S. Walker. Some simple effective approximations to the 2- poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 94, pages , New York, NY, USA, Springer-Verlag New York, Inc. 4. Karen Spärck-Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28:11 21, 1972.
TEMPER : A Temporal Relevance Feedback Method
TEMPER : A Temporal Relevance Feedback Method Mostafa Keikha, Shima Gerani and Fabio Crestani {mostafa.keikha, shima.gerani, fabio.crestani}@usi.ch University of Lugano, Lugano, Switzerland Abstract. The
More informationUMass at TREC 2008 Blog Distillation Task
UMass at TREC 2008 Blog Distillation Task Jangwon Seo and W. Bruce Croft Center for Intelligent Information Retrieval University of Massachusetts, Amherst Abstract This paper presents the work done for
More informationHow To Improve A Ccr Task
The University of Illinois Graduate School of Library and Information Science at TREC 2013 Miles Efron, Craig Willis, Peter Organisciak, Brian Balsamo, Ana Lucic Graduate School of Library and Information
More informationAn Information Retrieval using weighted Index Terms in Natural Language document collections
Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia
More informationTerrier: A High Performance and Scalable Information Retrieval Platform
Terrier: A High Performance and Scalable Information Retrieval Platform Iadh Ounis, Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, Christina Lioma Department of Computing Science University
More informationAxiomatic Analysis and Optimization of Information Retrieval Models
Axiomatic Analysis and Optimization of Information Retrieval Models ChengXiang Zhai Dept. of Computer Science University of Illinois at Urbana Champaign USA http://www.cs.illinois.edu/homes/czhai Hui Fang
More informationRelevance Feedback versus Local Context Analysis as Term Suggestion Devices: Rutgers TREC 8 Interactive Track Experience
Relevance Feedback versus Local Context Analysis as Term Suggestion Devices: Rutgers TREC 8 Interactive Track Experience Abstract N.J. Belkin, C. Cool*, J. Head, J. Jeng, D. Kelly, S. Lin, L. Lobash, S.Y.
More informationA survey on the use of relevance feedback for information access systems
A survey on the use of relevance feedback for information access systems Ian Ruthven Department of Computer and Information Sciences University of Strathclyde, Glasgow, G1 1XH. Ian.Ruthven@cis.strath.ac.uk
More informationImproving Web Page Retrieval using Search Context from Clicked Domain Names
Improving Web Page Retrieval using Search Context from Clicked Domain Names Rongmei Li School of Electrical, Mathematics, and Computer Science University of Twente P.O.Box 217, 7500 AE, Enschede, the Netherlands
More informationRelevance Feedback between Web Search and the Semantic Web
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Relevance Feedback between Web Search and the Semantic Web Harry Halpin and Victor Lavrenko School of Informatics
More informationUMass at TREC WEB 2014: Entity Query Feature Expansion using Knowledge Base Links
UMass at TREC WEB 2014: Entity Query Feature Expansion using Knowledge Base Links Laura Dietz and Patrick Verga University of Massachusetts Amherst, MA, U.S.A. {dietz,pat}@cs.umass.edu Abstract Entity
More informationPredicting Query Performance in Intranet Search
Predicting Query Performance in Intranet Search Craig Macdonald University of Glasgow Glasgow, G12 8QQ, U.K. craigm@dcs.gla.ac.uk Ben He University of Glasgow Glasgow, G12 8QQ, U.K. ben@dcs.gla.ac.uk Iadh
More informationTowards a Visually Enhanced Medical Search Engine
Towards a Visually Enhanced Medical Search Engine Lavish Lalwani 1,2, Guido Zuccon 1, Mohamed Sharaf 2, Anthony Nguyen 1 1 The Australian e-health Research Centre, Brisbane, Queensland, Australia; 2 The
More informationIncorporating Window-Based Passage-Level Evidence in Document Retrieval
Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological
More informationSimple Language Models for Spam Detection
Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to
More informationThe University of Lisbon at CLEF 2006 Ad-Hoc Task
The University of Lisbon at CLEF 2006 Ad-Hoc Task Nuno Cardoso, Mário J. Silva and Bruno Martins Faculty of Sciences, University of Lisbon {ncardoso,mjs,bmartins}@xldb.di.fc.ul.pt Abstract This paper reports
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationUniversity of Glasgow at TREC 2007: Experiments in Blog and Enterprise Tracks with Terrier
University of Glasgow at TREC 2007: Experiments in Blog and Enterprise Tracks with Terrier David Hannah, Craig Macdonald, Jie Peng, Ben He, Iadh Ounis Department of Computing Science University of Glasgow
More informationBlog feed search with a post index
DOI 10.1007/s10791-011-9165-9 Blog feed search with a post index Wouter Weerkamp Krisztian Balog Maarten de Rijke Received: 18 February 2010 / Accepted: 18 February 2011 Ó The Author(s) 2011. This article
More informationCan the Content of Public News be used to Forecast Abnormal Stock Market Behaviour?
Seventh IEEE International Conference on Data Mining Can the Content of Public News be used to Forecast Abnormal Stock Market Behaviour? Calum Robertson Information Research Group Queensland University
More informationMedical Information-Retrieval Systems. Dong Peng Medical Informatics Group
Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval
More informationApproaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives
Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives 7 XIN CAO and GAO CONG, Nanyang Technological University BIN CUI, Peking University CHRISTIAN S.
More informationQuery difficulty, robustness and selective application of query expansion
Query difficulty, robustness and selective application of query expansion Giambattista Amati, Claudio Carpineto, and Giovanni Romano Fondazione Ugo Bordoni Rome Italy gba, carpinet, romano@fub.it Abstract.
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationHomework 2. Page 154: Exercise 8.10. Page 145: Exercise 8.3 Page 150: Exercise 8.9
Homework 2 Page 110: Exercise 6.10; Exercise 6.12 Page 116: Exercise 6.15; Exercise 6.17 Page 121: Exercise 6.19 Page 122: Exercise 6.20; Exercise 6.23; Exercise 6.24 Page 131: Exercise 7.3; Exercise 7.5;
More informationRetrieving Medical Literature for Clinical Decision Support
Retrieving Medical Literature for Clinical Decision Support Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, and Ophir Frieder Information Retrieval Lab, Georgetown University {luca, arman, andrew,
More informationSearch Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
More informationA Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track)
A Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track) Linh Hoang, Seung-Wook Lee, Gumwon Hong, Joo-Young Lee, Hae-Chang Rim Natural Language Processing Lab., Korea University (linh,swlee,gwhong,jylee,rim)@nlp.korea.ac.kr
More informationComparison of Standard and Zipf-Based Document Retrieval Heuristics
Comparison of Standard and Zipf-Based Document Retrieval Heuristics Benjamin Hoffmann Universität Stuttgart, Institut für Formale Methoden der Informatik Universitätsstr. 38, D-70569 Stuttgart, Germany
More informationDynamical Clustering of Personalized Web Search Results
Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked
More informationSIGIR 2004 Workshop: RIA and "Where can IR go from here?"
SIGIR 2004 Workshop: RIA and "Where can IR go from here?" Donna Harman National Institute of Standards and Technology Gaithersburg, Maryland, 20899 donna.harman@nist.gov Chris Buckley Sabir Research, Inc.
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationSupporting Keyword Search in Product Database: A Probabilistic Approach
Supporting Keyword Search in Product Database: A Probabilistic Approach Huizhong Duan 1, ChengXiang Zhai 2, Jinxing Cheng 3, Abhishek Gattani 4 University of Illinois at Urbana-Champaign 1,2 Walmart Labs
More informationEng. Mohammed Abdualal
Islamic University of Gaza Faculty of Engineering Computer Engineering Department Information Storage and Retrieval (ECOM 5124) IR HW 5+6 Scoring, term weighting and the vector space model Exercise 6.2
More informationData Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority
More informationModern Information Retrieval: A Brief Overview
Modern Information Retrieval: A Brief Overview Amit Singhal Google, Inc. singhal@google.com Abstract For thousands of years people have realized the importance of archiving and finding information. With
More informationUsing Contextual Information to Improve Search in Email Archives
Using Contextual Information to Improve Search in Email Archives Wouter Weerkamp, Krisztian Balog, and Maarten de Rijke ISLA, University of Amsterdam, Kruislaan 43, 198 SJ Amsterdam, The Netherlands w.weerkamp@uva.nl,
More informationEarly Warning Scores (EWS) Clinical Sessions 2011 By Bhavin Doshi
Early Warning Scores (EWS) Clinical Sessions 2011 By Bhavin Doshi What is EWS? After qualifying, junior doctors are expected to distinguish between the moderately sick patients who can be managed in the
More informationUsing Discharge Summaries to Improve Information Retrieval in Clinical Domain
Using Discharge Summaries to Improve Information Retrieval in Clinical Domain Dongqing Zhu 1, Wu Stephen 2, Masanz James 2, Ben Carterette 1, and Hongfang Liu 2 1 University of Delaware, 101 Smith Hall,
More informationBagged Ensemble Classifiers for Sentiment Classification of Movie Reviews
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie
More informationGender-based Models of Location from Flickr
Gender-based Models of Location from Flickr Neil O Hare Yahoo! Research, Barcelona, Spain nohare@yahoo-inc.com Vanessa Murdock Microsoft vanessa.murdock@yahoo.com ABSTRACT Geo-tagged content from social
More informationAtigeo at TREC 2014 Clinical Decision Support Task
Atigeo at TREC 2014 Clinical Decision Support Task Yishul Wei*, ChenChieh Hsu*, Alex Thomas, Joseph F. McCarthy Atigeo, LLC 800 Bellevue Way NE, Suite 600 Bellevue, WA 98004 USA joe.mccarthy@atigeo.com
More informationMining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
More informationEmoticon Smoothed Language Models for Twitter Sentiment Analysis
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of
More informationTopic models for Sentiment analysis: A Literature Survey
Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.
More informationUsing Transactional Data From ERP Systems for Expert Finding
Using Transactional Data from ERP Systems for Expert Finding Lars K. Schunk 1 and Gao Cong 2 1 Dynaway A/S, Alfred Nobels Vej 21E, 9220 Aalborg Øst, Denmark 2 School of Computer Engineering, Nanyang Technological
More informationA Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives
A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives Xin Cao 1, Gao Cong 1, 2, Bin Cui 3, Christian S. Jensen 1 1 Department of Computer
More informationPREDICTING MARKET VOLATILITY FEDERAL RESERVE BOARD MEETING MINUTES FROM
PREDICTING MARKET VOLATILITY FROM FEDERAL RESERVE BOARD MEETING MINUTES Reza Bosagh Zadeh and Andreas Zollmann Lab Advisers: Noah Smith and Bryan Routledge GOALS Make Money! Not really. Find interesting
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationInformation Retrieval Models
Information Retrieval Models Djoerd Hiemstra University of Twente 1 Introduction author version Many applications that handle information on the internet would be completely inadequate without the support
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationLatent Semantic Indexing with Selective Query Expansion Abstract Introduction
Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes
More informationAggregating Evidence from Hospital Departments to Improve Medical Records Search
Aggregating Evidence from Hospital Departments to Improve Medical Records Search NutLimsopatham 1,CraigMacdonald 2,andIadhOunis 2 School of Computing Science University of Glasgow G12 8QQ, Glasgow, UK
More informationInformation Retrieval Elasticsearch
Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationContent visualization of scientific corpora using an extensible relational database implementation
. Content visualization of scientific corpora using an extensible relational database implementation Eleftherios Stamatogiannakis, Ioannis Foufoulas, Theodoros Giannakopoulos, Harry Dimitropoulos, Natalia
More informationImproving Student Question Classification
Improving Student Question Classification Cecily Heiner and Joseph L. Zachary {cecily,zachary}@cs.utah.edu School of Computing, University of Utah Abstract. Students in introductory programming classes
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
More informationEfficient Recruitment and User-System Performance
Monitoring User-System Performance in Interactive Retrieval Tasks Liudmila Boldareva Arjen P. de Vries Djoerd Hiemstra University of Twente CWI Dept. of Computer Science INS1 PO Box 217 PO Box 94079 7500
More informationImproving Contextual Suggestions using Open Web Domain Knowledge
Improving Contextual Suggestions using Open Web Domain Knowledge Thaer Samar, 1 Alejandro Bellogín, 2 and Arjen de Vries 1 1 Centrum Wiskunde & Informatica, Amsterdam, The Netherlands 2 Universidad Autónoma
More informationOptimization of Internet Search based on Noun Phrases and Clustering Techniques
Optimization of Internet Search based on Noun Phrases and Clustering Techniques R. Subhashini Research Scholar, Sathyabama University, Chennai-119, India V. Jawahar Senthil Kumar Assistant Professor, Anna
More informationQuery term suggestion in academic search
Query term suggestion in academic search Suzan Verberne 1, Maya Sappelli 1,2, and Wessel Kraaij 2,1 1. Institute for Computing and Information Sciences, Radboud University Nijmegen 2. TNO, Delft Abstract.
More informationInformation Retrieval Systems in XML Based Database A review
Information Retrieval Systems in XML Based Database A review Preeti Pandey 1, L.S.Maurya 2 Research Scholar, IT Department, SRMSCET, Bareilly, India 1 Associate Professor, IT Department, SRMSCET, Bareilly,
More informationRetrieval and Feedback Models for Blog Feed Search
Retrieval and Feedback Models for Blog Feed Search Jonathan L. Elsas, Jaime Arguello, Jamie Callan and Jaime G. Carbonell Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationCombining Global and Personal Anti-Spam Filtering
Combining Global and Personal Anti-Spam Filtering Richard Segal IBM Research Hawthorne, NY 10532 Abstract Many of the first successful applications of statistical learning to anti-spam filtering were personalized
More informationW. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More information1 o Semestre 2007/2008
Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction
More informationHuy Nguyen. Department of Electrical Engineering and Computer Science May 22, 2009 Certified by / VI-
Improving Search Quality of the Google Search Appliance by Huy Nguyen Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree
More informationBuilding a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
More informationCINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.
CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:
More informationRanked Keyword Search in Cloud Computing: An Innovative Approach
International Journal of Computational Engineering Research Vol, 03 Issue, 6 Ranked Keyword Search in Cloud Computing: An Innovative Approach 1, Vimmi Makkar 2, Sandeep Dalal 1, (M.Tech) 2,(Assistant professor)
More informationAn Exploration of Ranking Heuristics in Mobile Local Search
An Exploration of Ranking Heuristics in Mobile Local Search ABSTRACT Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801, USA ylv2@uiuc.edu Users increasingly
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationRanking on Data Manifolds
Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname
More informationTHUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More informationA Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical
More informationIntelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives
Intelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives Search The Way You Think Copyright 2009 Coronado, Ltd. All rights reserved. All other product names and logos
More informationDublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
More informationInformation Retrieval System Assigning Context to Documents by Relevance Feedback
Information Retrieval System Assigning Context to Documents by Relevance Feedback Narina Thakur Department of CSE Bharati Vidyapeeth College Of Engineering New Delhi, India Deepti Mehrotra ASCS Amity University,
More informationFINANCIAL SERVICES BOARD INSURANCE DEPARTMENT
FINANCIAL SERVICES BOARD INSURANCE DEPARTMENT SPECIAL REPORT ON THE RESULTS OF THE LONG-TERM INSURANCE INDUSTRY FOR THE PERIOD ENDED MARCH 6 June 1 SPECIAL REPORT ON THE RESULTS OF THE LONG-TERM INSURANCE
More informationA Study of Using an Out-Of-Box Commercial MT System for Query Translation in CLIR
A Study of Using an Out-Of-Box Commercial MT System for Query Translation in CLIR Dan Wu School of Information Management Wuhan University, Hubei, China woodan@whu.edu.cn Daqing He School of Information
More informationDirect Marketing Profit Model. Bruce Lund, Marketing Associates, Detroit, Michigan and Wilmington, Delaware
Paper CI-04 Direct Marketing Profit Model Bruce Lund, Marketing Associates, Detroit, Michigan and Wilmington, Delaware ABSTRACT A net lift model gives the expected incrementality (the incremental rate)
More informationHYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...
HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men
More informationMining Semi-Structured Online Knowledge Bases to Answer Natural Language Questions on Community QA Websites
Mining Semi-Structured Online Knowledge Bases to Answer Natural Language Questions on Community QA Websites ABSTRACT Parikshit Sondhi Department of Computer Science University of Illinois at Urbana-Champaign
More informationCengage Learning at TREC 2011 Medical Track
Cengage Learning at TREC 2011 Medical Track Benjamin King University of Michigan benjaminking@umich.edu Lijun Wang Wayne State University ljwang@wayne.edu Ivan Provalov Cengage Learning ivan.provalov@cengage.com
More informationQuery Modification through External Sources to Support Clinical Decisions
Query Modification through External Sources to Support Clinical Decisions Raymond Wan 1, Jannifer Hiu-Kwan Man 2, and Ting-Fung Chan 1 1 School of Life Sciences and the State Key Laboratory of Agrobiotechnology,
More informationUniversity of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion
University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion Gina-Anne Levow University of Chicago 1100 E. 58th St, Chicago, IL 60637, USA levow@cs.uchicago.edu Abstract Pseudo-relevance feedback,
More informationProcedia Computer Science 00 (2012) 1 21. Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu.
Procedia Computer Science 00 (2012) 1 21 Procedia Computer Science Top-k best probability queries and semantics ranking properties on probabilistic databases Trieu Minh Nhut Le, Jinli Cao, and Zhen He
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationRanking using Multiple Document Types in Desktop Search
Ranking using Multiple Document Types in Desktop Search Jinyoung Kim and W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science University of Massachusetts Amherst {jykim,croft}@cs.umass.edu
More informationMIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
More informationEfficient Cluster Detection and Network Marketing in a Nautural Environment
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin
More informationLearning to Expand Queries Using Entities
This is a preprint of the article Brandão, W. C., Santos, R. L. T., Ziviani, N., de Moura, E. S. and da Silva, A. S. (2014), Learning to expand queries using entities. Journal of the Association for Information
More informationComparing Tag Clouds, Term Histograms, and Term Lists for Enhancing Personalized Web Search
Comparing Tag Clouds, Term Histograms, and Term Lists for Enhancing Personalized Web Search Orland Hoeber and Hanze Liu Department of Computer Science, Memorial University St. John s, NL, Canada A1B 3X5
More informationInternational Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
More informationChapter 2 The Information Retrieval Process
Chapter 2 The Information Retrieval Process Abstract What does an information retrieval system look like from a bird s eye perspective? How can a set of documents be processed by a system to make sense
More informationHealth Service Circular
Health Service Circular Series Number: HSC 2002/009 Issue Date: 04 July 2002 Review Date: 04 July 2005 Category: Public Health Status: Action sets out a specific action on the part of the recipient with
More informationBlocking Blog Spam with Language Model Disagreement
Blocking Blog Spam with Language Model Disagreement Gilad Mishne Informatics Institute, University of Amsterdam Kruislaan 403, 1098SJ Amsterdam The Netherlands gilad@science.uva.nl David Carmel, Ronny
More information