Exploring Adaptive Window Sizes for Entity Retrieval

Fawaz Alarfaj, Udo Kruschwitz, and Chris Fox
School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK

Abstract. As modern search engines increasingly aim to retrieve entities, and not just documents, for a given query, we introduce a new method for enhancing the entity-ranking task. An entity-ranking task is concerned with retrieving a ranked list of entities in response to a specific query. Some successful models have used the idea of association discovery in a window of text, rather than in the whole document. However, these studies considered only fixed window sizes. This work proposes a way of generating an adaptive window size for each document by utilising some of the document's features. These features include document length, average sentence length, number of entities in the document, and the readability index. Experimental results show a positive effect when these document features are taken into consideration in determining the window size.

1 Introduction

In an organisational setting, search engines have become essential for aiding knowledge workers with their day-to-day information needs. Traditionally, information retrieval systems return a list of documents in response to the user's query, although the needed information is not necessarily in the form of documents. In fact, users more often search for specific things or entities, which include people, organisations, or products. One special type of entity search is concerned with finding people who have specific knowledge; this type of entity search is called expert-finding, i.e. identifying experts who have the relevant skills and knowledge on a given topic [4].
Given a search topic, state-of-the-art expert-finding systems measure the knowledge of each candidate expert using the content of the highest-ranking documents, relying on associations based on co-occurrences between the search topic and the candidate evidence [2]. Evidence of expertise is identified through its association with the search terms, and the number and frequency of such occurrences are used to estimate the likelihood that an individual is an expert. There are two main assumptions: first, the more often a candidate appears in documents that contain the topic's terms, the more likely they are to be an expert on the subject; second, the association is stronger when the candidate identifiers are closer to the search terms. With these assumptions in mind, some research has used fixed-size windows to measure the proximity between candidate identifiers and search terms. Zhu et al. tested 31 window sizes on the W3C collection and found the best window size to be around 200 words. According to Zhu et al., small window sizes could lead to high precision but low recall; conversely, large window sizes lead to high recall but low precision [7]. Therefore, other studies consider multiple levels of association in documents by combining multiple fixed window sizes [3].

In this paper, we consider the idea of an adaptive window size, where the size of the window is a function of various document features. We argue that, in general, each document has distinct features that differ from those of other documents in the collection. The proposed idea is to use these features to set the window size in order to improve the overall ranking function. While many document features could be examined, this work focuses on four main features: document length; candidate frequency (i.e., the number of candidates that appear in a document); average sentence length; and readability index. To the best of our knowledge, no existing work has used document features to determine the optimal window size for the proximity function, apart from our earlier work [1]. It is important to note that the adaptive window size approach could be applied to any proximity search, in particular to entity-oriented search, which generalises expert search. The study is performed in the expert-search domain because expert-search benchmarks are available. The main research question is whether an adaptive window size leads to improvements over fixed window size methods.

M. de Rijke et al. (Eds.): ECIR 2014, LNCS 8416. © Springer International Publishing Switzerland 2014

2 Adaptive Window Size for Proximity Ranking

The window size for the proximity function is determined for each document based on the following features.

Document Length: according to Miao et al. [5], large documents are more likely to contain more occurrences of a query topic, but they are also more likely to contain irrelevant words (noise). Thus, to minimise the negative influence of noise, the window size should be relatively smaller as the document gets longer.

Candidate Frequency: the number of candidates found in a document. When a document has more occurrences of candidates' evidence, the window size should be relatively larger to accommodate them.

Average Sentence Length: the window size is adjusted in proportion to the average sentence length (in tokens) in the document.

Readability Index¹: the window size is adjusted using the readability index, such that the window gets bigger as the index gets smaller.

These features are combined in the following equation:

$$\mathit{WindowSize} = \frac{\sigma}{4}\left(\log\left(\frac{1}{\mathit{DocLength}}\right)\beta_1 + \mathit{CanFreq}\,\beta_2 + \mathit{AvgSentSize}\,\beta_3 + \mathit{ReadabilityIndex}\,\beta_4\right) \qquad (1)$$

¹ The Flesch-Kincaid test is used to calculate the readability index in this experiment.
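As an illustration only, Equation (1) can be sketched in a few lines of Python. The function name, the argument names, and every numeric value in the usage note are assumptions made for this sketch; the paper determines the β weights empirically and does not report them here, and this sketch reads each β as a multiplicative weight on its feature.

```python
import math


def adaptive_window_size(doc_length, candidate_freq, avg_sentence_len,
                         readability_index, betas, sigma=1.0):
    """Sketch of Equation (1): sigma/4 times the beta-weighted sum of four features.

    All names are hypothetical; `betas` is a 4-tuple (beta_1 .. beta_4).
    """
    b1, b2, b3, b4 = betas
    weighted_sum = (
        math.log(1.0 / doc_length) * b1  # negative for doc_length > 1
        + candidate_freq * b2
        + avg_sentence_len * b3
        + readability_index * b4
    )
    return sigma / 4.0 * weighted_sum
```

With positive β1, for example, calling the function with `doc_length=5000` gives a smaller value than with `doc_length=500` when the other arguments are held fixed, which matches the stated intent that longer documents get smaller windows. How the raw value is rounded or bounded before use as a token count is not specified in the text, so the sketch returns it unchanged.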

The variable σ scales the window size. The weighting factors β, which determine each feature's contribution to the equation, are determined empirically. Once the window size has been determined, it is applied to every search term found in the document, which enables extraction of the candidates' evidence surrounding the search term. Each piece of candidate evidence is given a weight in the window depending on its proximity to the search term. The proximity weight is calculated using a Gaussian kernel function which, according to previous work [6], produces the best results in this context.

3 Experiments

Improving on our earlier work [1], we have added a new feature, the readability index, to the set of features. Moreover, we have applied the method to additional test collections. In this work, two datasets are used to test the proposed approach, the W3C corpus and the CSIRO corpus, together with the four test collections of the TREC Enterprise Track, TREC 2005 to TREC 2008 (see Table 1). We used 10 training topics to train our variables, thus keeping a clear distinction between test and training data.

Table 1. TREC Expert Finding Test Collections

              W3C                       CSIRO
              TREC 2005    TREC 2006    TREC 2007    TREC 2008
Documents     331,         ,            ,            ,715
Candidates    1,092        1,092        3,000        3,000
Size          5.7 GB       5.7 GB       4.2 GB       4.2 GB
Topics
Qrels

Stopwords and HTML markup were eliminated prior to processing. Lucene 3 was used as a retrieval engine. For evaluation, we applied a range of standard IR measures, but in our discussion we focus on mean average precision (MAP). In this work, we use the two-stage model [2] for the initial candidate ranking as follows:

$$p(ca \mid q) = \sum_{i=1}^{D} p(d_i \mid q)\, p(ca \mid d_i, q) \qquad (2)$$

where $p(d \mid q)$ is the relevance of the document to the query, which is calculated by the underlying search engine. $p(ca \mid d_i, q)$ is calculated using the two assumptions mentioned earlier:

$$p(ca \mid d, q) = \frac{P_{occu}(ca \mid d) + P_{kernel}(ca \mid d)}{\zeta} \qquad (3)$$

where $P_{occu}(ca \mid d)$ represents the first assumption (i.e., the more often the candidate appears in relevant documents, the more likely he/she is an expert), and $P_{kernel}(ca \mid d)$ represents the second assumption (i.e., the closer the candidate appears to relevant terms, the more likely he/she is an expert). In this work, the two probabilities are considered independent, hence the summation. The value of the constant ζ is chosen to ensure that $p(ca \mid d, q)$ is a probability measure. It is computed as follows:

$$\zeta = \sum_{i=1}^{N} \left( P_{occu}(ca_i \mid d) + P_{kernel}(ca_i \mid d) \right)$$

where N is the total number of candidates in the document d. For the co-occurrence part (i.e., $P_{occu}(ca \mid d)$), a TF-IDF weighting scheme is applied [3]:

$$P_{occu}(ca \mid d) = \frac{n(ca, d)}{\sum_i n(ca_i, d)} \cdot \log \frac{|D|}{|\{d' : n(ca, d') > 0\}|} \qquad (4)$$

where $n(ca, d)$ is the number of times the candidate appears in the document, $\sum_i n(ca_i, d)$ is the number of times any candidate appears, $|D|$ is the number of documents in the collection, and $|\{d' : n(ca, d') > 0\}|$ is the number of documents in which the candidate appears. Finally, $P_{kernel}(ca \mid d)$ is defined as follows:

$$P_{kernel}(ca \mid d) = \frac{k(t, c)}{\sum_{i=1}^{N} k(t, ca_i)} \qquad (5)$$

As mentioned above, non-uniform Gaussian kernel functions have been used to calculate the candidate's proximity:

$$k(t, c) = \begin{cases} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(\dfrac{-u^2}{2\sigma^2}\right),\ u = c - t, & \text{if } |c - t| \le w \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

where c is the candidate position in the document, t is the topic position, and w is the window size for the current document.

For further elucidation, Figure 1 shows a simple illustrative example of how $p(ca \mid q)$ is measured. The example topic returned three relevant documents, which are used to rank three candidates. In this example, each candidate $c_i$ has n, the number of times he/she appears in the document, and k, the result of the kernel function. The two ranking models ($P_{occu}(ca \mid d)$ and $P_{kernel}(ca \mid d)$) are combined to determine the final candidate rank.
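The scoring model in Equations (2) to (6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names and the dictionary layout are hypothetical, and the caller is assumed to supply one aggregated kernel weight per candidate per document, since the text does not say how several occurrences of the same candidate are combined.

```python
import math
from collections import defaultdict


def gaussian_kernel(c_pos, t_pos, window, sigma):
    """Equation (6): Gaussian weight inside the window, 0 outside it."""
    u = c_pos - t_pos
    if abs(u) > window:
        return 0.0
    return math.exp(-u * u / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


def p_occu(counts, cand, num_docs, doc_freq):
    """Equation (4): TF-IDF style score for one candidate in one document."""
    total = sum(counts.values())
    if total == 0 or doc_freq.get(cand, 0) == 0:
        return 0.0
    return counts.get(cand, 0) / total * math.log(num_docs / doc_freq[cand])


def p_kernel(kernel_weights, cand):
    """Equation (5): kernel weight normalised over all candidates in the document."""
    total = sum(kernel_weights.values())
    return kernel_weights.get(cand, 0.0) / total if total > 0 else 0.0


def p_candidate_given_doc(counts, kernel_weights, num_docs, doc_freq):
    """Equation (3): (P_occu + P_kernel) / zeta for every candidate in the document."""
    cands = set(counts) | set(kernel_weights)
    raw = {ca: p_occu(counts, ca, num_docs, doc_freq) + p_kernel(kernel_weights, ca)
           for ca in cands}
    zeta = sum(raw.values())  # normalising constant, so the values sum to 1
    return {ca: (v / zeta if zeta > 0 else 0.0) for ca, v in raw.items()}


def rank_candidates(docs, num_docs, doc_freq):
    """Equation (2): sum over documents of p(d|q) * p(ca|d,q), sorted descending."""
    scores = defaultdict(float)
    for doc in docs:
        per_doc = p_candidate_given_doc(doc["counts"], doc["kernel"], num_docs, doc_freq)
        for ca, p in per_doc.items():
            scores[ca] += doc["p_d_q"] * p
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Each entry of `docs` is assumed to be a dictionary with the keys `p_d_q` (the document score from the search engine), `counts` (candidate occurrence counts), and `kernel` (per-candidate kernel weights); `doc_freq` maps each candidate to the number of documents in which it appears. The sketch assumes `num_docs` is at least as large as every document frequency, so the logarithm in Equation (4) is non-negative.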
To test the effect of each document feature separately, we first generate the adaptive window size from a single feature, using the training topics. Figure 2 shows the MAP for sigma values from 0 upward. An analysis of variance (ANOVA) test at p<0.05 indicated a statistically significant difference between the features; this holds for all datasets. The figure shows that the second feature (i.e., the number of candidates in the document) scores highest in all datasets.

[Fig. 1. An example for the system framework]

[Fig. 2. The results with an adaptive window using a single feature, where 1 is the document length, 2 is the number of candidates in the document, 3 is the average sentence length, and 4 is the readability index. One panel per TREC test collection; the measure plotted is MAP.]

In order to compare the proposed adaptive-window method to a strong baseline, a fixed window size of 200 words is used, as suggested by [7], with Gaussian proximity functions. For comparison, we also added the highest-scoring result from the TREC Enterprise track. The results (Table 2) show that the proposed method yields an improvement ranging from 10% to 20% over the fixed-window baseline. Using a paired t-test on average precision values, we found the difference between our best run and the corresponding baseline to be statistically significant. Significance at the p<0.01 and p<0.05 levels is marked in Table 2, and is reported for MAP only.
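For readers unfamiliar with the evaluation measures named above, the following Python sketch shows how per-topic average precision, MAP, and the statistic of a paired t-test are commonly computed. The function names are hypothetical and nothing here reproduces the paper's data or evaluation scripts.

```python
import math


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item's rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP: mean of per-topic average precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on two equal-length lists of per-topic scores.

    Assumes at least two topics and non-identical differences (non-zero variance).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The statistic is compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; in practice a library routine such as `scipy.stats.ttest_rel` performs both steps.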

Table 2. Summarised results. MAP is Mean Average Precision and MRR is Mean Reciprocal Rank.

                     W3C                         CSIRO
                     TREC 2005     TREC 2006     TREC 2007     TREC 2008
                     MAP    MRR    MAP    MRR    MAP    MRR    MAP    MRR
Fix200 Gaussian
Adaptive Gaussian
Best TREC

4 Conclusions and Future Work

We proposed an approach to adaptively select the size of the context window for boosting the retrieval scores of entities that are close to query terms. Under this approach, the window size is not fixed for all documents; rather, it depends on the features of the current document. We found that adopting this method results in significant improvements on standard metrics. Moreover, the adaptive window using all four features outperforms the adaptive window using only a single feature. Among the four features used in this study, the number of candidates appears to be the most important. Going forward, we intend to apply the adaptive window size method to other TREC benchmarks and expert-finding collections. Furthermore, we will investigate whether other document features can be effective in determining the window size.

References

1. Alarfaj, F., Kruschwitz, U., Fox, C.: An adaptive window-size approach for expert-finding. In: DIR 2013, Delft, The Netherlands (April 2013)
2. Balog, K., Fang, Y., de Rijke, M., Serdyukov, P., Si, L.: Expertise retrieval. Foundations and Trends in Information Retrieval 6(2-3) (2012)
3. Balog, K., Azzopardi, L., de Rijke, M.: A language modeling framework for expert finding. Information Processing and Management 45(1), 1-19 (2009)
4. Macdonald, C., Ounis, I.: Searching for expertise: Experiments with the voting model. The Computer Journal 52(7) (2009)
5. Miao, J., Huang, J.X., Ye, Z.: Proximity-based Rocchio's model for pseudo relevance. In: SIGIR 2012, Portland, Oregon (2012)
6. Petkova, D., Croft, W.: Proximity-based document representation for named entity retrieval. In: CIKM 2007. ACM, New York (2007)
7. Zhu, J., Song, D., Rüger, S.: Integrating multiple windows and document features for expert finding. JASIST 60(4) (2009)


PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this

More information

Effective Metadata for Social Book Search from a User Perspective

Effective Metadata for Social Book Search from a User Perspective Effective Metadata for Social Book Search from a User Perspective Hugo Huurdeman 1,2, Jaap Kamps 1,2,3, and Marijn Koolen 1 1 Institute for Logic, Language and Computation, University of Amsterdam 2 Archives

More information

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

More information

Shopping for Top Forums: Discovering Online Discussion for Product Research

Shopping for Top Forums: Discovering Online Discussion for Product Research Shopping for Top Forums: Discovering Online Discussion for Product Research ABSTRACT Jonathan L. Elsas Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213 jelsas@cs.cmu.edu

More information

Word Clouds of Multiple Search Results

Word Clouds of Multiple Search Results Word Clouds of Multiple Search Results Rianne Kaptein 1 and Jaap Kamps 1,2 1 Archives and Information Studies, University of Amsterdam, the Netherlands 2 ISLA, Informatics Institute, University of Amsterdam,

More information

IQ Functional Skills Qualification in Mathematics at. Entry Level 1 Entry Level 2 Entry Level 3 Level 1 Level 2. Qualification Guide

IQ Functional Skills Qualification in Mathematics at. Entry Level 1 Entry Level 2 Entry Level 3 Level 1 Level 2. Qualification Guide IQ Functional Skills Qualification in Mathematics at Entry Level 1 Entry Level 2 Entry Level 3 Level 1 Level 2 Qualification Guide Version 2.0 Contents Gateway Qualifications and Industry Qualifications...

More information

EUR-Lex 2012 Data Extraction using Web Services

EUR-Lex 2012 Data Extraction using Web Services DOCUMENT HISTORY DOCUMENT HISTORY Version Release Date Description 0.01 24/01/2013 Initial draft 0.02 01/02/2013 Review 1.00 07/08/2013 Version 1.00 -v1.00.doc Page 2 of 17 TABLE OF CONTENTS 1 Introduction...

More information

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

PartJoin: An Efficient Storage and Query Execution for Data Warehouses PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2

More information

American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access

More information

An Information Retrieval System for Expert and Consumer Users

An Information Retrieval System for Expert and Consumer Users An Information Retrieval System for Expert and Consumer Users Rena Peraki, Euripides G.M. Petrakis, Angelos Hliaoutakis Department of Electronic and Computer Engineering Technical University of Crete (TUC)

More information

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article

More information

Sponsored Search Ad Selection by Keyword Structure Analysis

Sponsored Search Ad Selection by Keyword Structure Analysis Sponsored Search Ad Selection by Keyword Structure Analysis Kai Hui 1, Bin Gao 2,BenHe 1,andTie-jianLuo 1 1 University of Chinese Academy of Sciences, Beijing, P.R. China huikai10@mails.ucas.ac.cn, {benhe,tjluo}@ucas.ac.cn

More information

Information Retrieval. Lecture 8 - Relevance feedback and query expansion. Introduction. Overview. About Relevance Feedback. Wintersemester 2007

Information Retrieval. Lecture 8 - Relevance feedback and query expansion. Introduction. Overview. About Relevance Feedback. Wintersemester 2007 Information Retrieval Lecture 8 - Relevance feedback and query expansion Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 32 Introduction An information

More information

SIGIR 2004 Workshop: RIA and "Where can IR go from here?"

SIGIR 2004 Workshop: RIA and Where can IR go from here? SIGIR 2004 Workshop: RIA and "Where can IR go from here?" Donna Harman National Institute of Standards and Technology Gaithersburg, Maryland, 20899 donna.harman@nist.gov Chris Buckley Sabir Research, Inc.

More information

A Survey on Product Aspect Ranking

A Survey on Product Aspect Ranking A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,

More information

Question Routing by Modeling User Expertise and Activity in cqa services

Question Routing by Modeling User Expertise and Activity in cqa services Question Routing by Modeling User Expertise and Activity in cqa services Liang-Cheng Lai and Hung-Yu Kao Department of Computer Science and Information Engineering National Cheng Kung University, Tainan,

More information

Entity Ranking as a Search Engine Front-End

Entity Ranking as a Search Engine Front-End 68 Entity Ranking as a Search Engine Front-End Alexandros Komninos Department of Computer Science University of York York, YO10 5GH, UK ak1153@york.ac.uk Avi Arampatzis Department of Electrical and Computer

More information

Property of Average Precision and its Generalization: An Examination of Evaluation Indicator for Information Retrieval Experiments

Property of Average Precision and its Generalization: An Examination of Evaluation Indicator for Information Retrieval Experiments ISSN 1346-5597 NII Technical Report Property of Average Precision and its Generalization: An Examination of Evaluation Indicator for Information Retrieval Experiments Kazuaki Kishida NII-2005-014E Oct.

More information

A survey on the use of relevance feedback for information access systems

A survey on the use of relevance feedback for information access systems A survey on the use of relevance feedback for information access systems Ian Ruthven Department of Computer and Information Sciences University of Strathclyde, Glasgow, G1 1XH. Ian.Ruthven@cis.strath.ac.uk

More information

Quality-Aware Collaborative Question Answering: Methods and Evaluation

Quality-Aware Collaborative Question Answering: Methods and Evaluation Quality-Aware Collaborative Question Answering: Methods and Evaluation ABSTRACT Maggy Anastasia Suryanto School of Computer Engineering Nanyang Technological University magg0002@ntu.edu.sg Aixin Sun School

More information

An Exploration of Ranking Heuristics in Mobile Local Search

An Exploration of Ranking Heuristics in Mobile Local Search An Exploration of Ranking Heuristics in Mobile Local Search ABSTRACT Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801, USA ylv2@uiuc.edu Users increasingly

More information

Using Wikipedia to Translate OOV Terms on MLIR

Using Wikipedia to Translate OOV Terms on MLIR Using to Translate OOV Terms on MLIR Chen-Yu Su, Tien-Chien Lin and Shih-Hung Wu* Department of Computer Science and Information Engineering Chaoyang University of Technology Taichung County 41349, TAIWAN

More information

Question Answering for Dutch: Simple does it

Question Answering for Dutch: Simple does it Question Answering for Dutch: Simple does it Arjen Hoekstra Djoerd Hiemstra Paul van der Vet Theo Huibers Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, P.O.

More information

The Distribution Rules Guide

The Distribution Rules Guide The PPL licenses recorded music played in public or broadcast and then distributes the licence fees to its performer and recording rightholder members. PPL s sister company VPL licenses music videos played

More information

Who is in charge: Corporate Communications or Corporate Marketing?

Who is in charge: Corporate Communications or Corporate Marketing? Who is in charge: Corporate or Corporate Marketing? A European survey amongst the top reputation leading companies i Markus Will, Malte Probst and Thomas Schmidt, ii Centre for Corporate, mcm institute

More information

Institute of Chartered Accountants Ghana (ICAG) Paper 2.2 Management Accounting

Institute of Chartered Accountants Ghana (ICAG) Paper 2.2 Management Accounting Institute of Chartered Accountants Ghana (ICAG) Paper. Management Accounting Final Mock Exam Marking scheme and suggested solutions DO NOT TURN THIS PAGE UNTIL YOU HAVE COMPLETED THE MOCK EXAM ii Management

More information

Incorporating Participant Reputation in Community-driven Question Answering Systems

Incorporating Participant Reputation in Community-driven Question Answering Systems Incorporating Participant Reputation in Community-driven Question Answering Systems Liangjie Hong, Zaihan Yang and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem,

More information

Anotaciones semánticas: unidades de busqueda del futuro?

Anotaciones semánticas: unidades de busqueda del futuro? Anotaciones semánticas: unidades de busqueda del futuro? Hugo Zaragoza, Yahoo! Research, Barcelona Jornadas MAVIR Madrid, Nov.07 Document Understanding Cartoon our work! Complexity of Document Understanding

More information

Named Entity Recognition in Broadcast News Using Similar Written Texts

Named Entity Recognition in Broadcast News Using Similar Written Texts Named Entity Recognition in Broadcast News Using Similar Written Texts Niraj Shrestha Ivan Vulić KU Leuven, Belgium KU Leuven, Belgium niraj.shrestha@cs.kuleuven.be ivan.vulic@@cs.kuleuven.be Abstract

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such

More information

Query Recommendation employing Query Logs in Search Optimization

Query Recommendation employing Query Logs in Search Optimization 1917 Query Recommendation employing Query Logs in Search Optimization Neha Singh Department of Computer Science, Shri Siddhi Vinayak Group of Institutions, Bareilly Email: singh26.neha@gmail.com Dr Manish

More information

Software Defect Prediction Modeling

Software Defect Prediction Modeling Software Defect Prediction Modeling Burak Turhan Department of Computer Engineering, Bogazici University turhanb@boun.edu.tr Abstract Defect predictors are helpful tools for project managers and developers.

More information

University of Lille I PC first year list of exercises n 7. Review

University of Lille I PC first year list of exercises n 7. Review University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients

More information

The University of Amsterdam s Question Answering System at QA@CLEF 2007

The University of Amsterdam s Question Answering System at QA@CLEF 2007 The University of Amsterdam s Question Answering System at QA@CLEF 2007 Valentin Jijkoun, Katja Hofmann, David Ahn, Mahboob Alam Khalid, Joris van Rantwijk, Maarten de Rijke, and Erik Tjong Kim Sang ISLA,

More information

A Pseudo Nearest-Neighbor Approach for Missing Data Recovery on Gaussian Random Data Sets

A Pseudo Nearest-Neighbor Approach for Missing Data Recovery on Gaussian Random Data Sets University of Nebraska at Omaha DigitalCommons@UNO Computer Science Faculty Publications Department of Computer Science -2002 A Pseudo Nearest-Neighbor Approach for Missing Data Recovery on Gaussian Random

More information

How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic Approach*

How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic Approach* Int. J. Engng Ed. Vol. 22, No. 6, pp. 1281±1286, 2006 0949-149X/91 $3.00+0.00 Printed in Great Britain. # 2006 TEMPUS Publications. How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic

More information

Topical Authority Identification in Community Question Answering

Topical Authority Identification in Community Question Answering Topical Authority Identification in Community Question Answering Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences 95

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Developing a Collaborative MOOC Learning Environment utilizing Video Sharing with Discussion Summarization as Added-Value

Developing a Collaborative MOOC Learning Environment utilizing Video Sharing with Discussion Summarization as Added-Value , pp. 397-408 http://dx.doi.org/10.14257/ijmue.2014.9.11.38 Developing a Collaborative MOOC Learning Environment utilizing Video Sharing with Discussion Summarization as Added-Value Mohannad Al-Mousa 1

More information