TEMPER: A Temporal Relevance Feedback Method


Mostafa Keikha, Shima Gerani and Fabio Crestani
{mostafa.keikha, shima.gerani, fabio.crestani}@usi.ch
University of Lugano, Lugano, Switzerland

Abstract. The goal of a blog distillation (blog feed search) method is to rank blogs according to their recurrent relevance to the query. An interesting property of blog distillation which differentiates it from traditional retrieval tasks is its dependency on time. In this paper we investigate the effect of time dependency in query expansion. We propose a framework, TEMPER, which selects different terms for different times and ranks blogs according to their relevance to the query over time. By generating multiple expanded queries based on time, we are able to capture the dynamics of the topic in both aspects and vocabulary usage. We show performance gains over the baseline techniques, which generate a single expanded query using the top retrieved posts or blogs irrespective of time.

1 Introduction

User-generated content is growing very fast and becoming one of the most important sources of information on the Web. Blogs are one of the main sources of information in this category. Millions of people write about their experiences and express their opinions in blogs every day. Considering this huge amount of user-generated data and its specific properties, new retrieval methods are needed to address the different types of information needs that blog users may have. Users' information needs in the blogosphere are different from those of general Web users. Mishne and de Rijke [1] analyzed a blog query log and accordingly divided blog queries into two broad categories, called context and concept queries. In context queries, users are looking for contexts of blogs in which a Named Entity occurred, to find out what bloggers say about it, whereas in concept queries they are looking for blogs which deal with one of the searcher's topics of interest.
In this paper we focus on the blog distillation task (also known as blog feed search)¹, where the goal is to answer topics from the second category [2]. Blog distillation is concerned with ranking blogs according to their recurring central interest in the topic of a user's query. In other words, our aim is to discover relevant blogs for each topic² that a user can add to his reader and read in the future [3].

¹ In this paper we use the words feed and blog interchangeably.
² In this paper we use the words topic and query interchangeably.

An important aspect of blog distillation, which differentiates it from other IR tasks, is related to the temporal properties of blogs and topics. Distillation topics are often multifaceted and can be discussed from different perspectives [4]. Vocabulary usage in the documents relevant to a topic can change over time in order to express different aspects (or sub-topics) of the query. These dynamics might create a term mismatch problem over time, such that a query term may not be a good indicator of the query topic in all time intervals. In order to address this problem, we propose a time-based query expansion method which expands queries with different terms at different times. This contrasts with other query expansion methods applied in blog search, which generate only a single query in the expansion phase [5, 4]. Our experiments on different test collections and different baseline methods indicate that time-based query expansion is effective in improving the retrieval performance and can outperform existing techniques.

The rest of the paper is organized as follows. In section 2 we review state-of-the-art methods in blog retrieval. Section 3 describes existing query expansion methods for blog retrieval in more detail. Section 4 explains our time-based query expansion approach. Experimental results over different blog data sets are discussed in section 6. Finally, we conclude the paper and describe future work in section 7.

2 Related Work

The main research on blog distillation started after 2007, when the TREC organizers proposed this task in the blog track [3]. Researchers have applied different methods from areas that are similar to blog distillation, like ad-hoc search, expert search and resource selection in distributed information retrieval. The simplest models use ad-hoc search methods for finding blogs relevant to a specific topic. They treat each blog as one long document created by concatenating all of its posts together [6, 4, 7].
These methods ignore any specific property of blogs and mostly use standard IR techniques to rank blogs. Despite their simplicity, these methods perform fairly well in blog retrieval. Other approaches adapt expert search methods to blog retrieval [8, 2]. In these models, each post in a blog is seen as evidence that the blog has an interest in the query topic. In [2], Macdonald and Ounis use data fusion models to combine this evidence and compute a final relevance score for the blog, while Balog et al. adapt two language modeling approaches from expert finding and show their effectiveness in blog distillation [8]. Resource selection methods from distributed information retrieval have also been applied to blog retrieval [4, 9, 7]. Elsas et al. treat blog distillation as a resource selection problem [4, 9]. They model each blog as a collection of posts and use a Language Modeling approach to select the best collection. A similar approach is proposed by Seo and Croft [7], which they call Pseudo Cluster Selection. They create topic-based clusters of posts in each blog and select the blogs whose clusters are most similar to the query.

Temporal properties of posts have been considered in different ways in blog retrieval. Nunes et al. define two new measures, called temporal span and temporal dispersion, to evaluate how long and how frequently a blog has been writing about a topic [10]. Similarly, Macdonald and Ounis [2] use a heuristic measure to capture the recurring interests of blogs over time. Some other approaches give higher scores to more recent posts before aggregating them [11, 12]. All these proposed methods and their improvements show the importance and usefulness of temporal information in blog retrieval. However, none of the mentioned methods investigates the effect of time on the vocabulary change for a topic. We employ the temporal information as a source to distinguish between different aspects of a topic and the terms that are used for each aspect. This leads us to a time-based query expansion method where we generate multiple expanded queries to cover multiple aspects of a topic over time. Different query expansion possibilities for blog retrieval have been explored by Elsas et al. [4] and Lee et al. [5]. Since we use these methods as our baselines, we discuss them in more detail in the next section.

3 Query Expansion in Blog Retrieval

Query expansion is known to be effective in improving the performance of retrieval systems [13-15]. In general, the idea is to add more terms to an initial query in order to disambiguate the query and solve the possible term mismatch problem between the query and the relevant documents. Automatic query expansion techniques usually assume that the top retrieved documents are relevant to the topic and use their content to generate an expanded query. In some situations, it has been shown that it is better to have multiple expanded queries as opposed to the usual single query expansion, for example in the server-based query expansion technique in distributed information retrieval [16].
An expanded query, while being relevant to the original query, should cover as many aspects of the query as possible. If the expanded query is very specific to some aspect of the original query, we will miss part of the relevant documents in the re-ranking phase. In the blog search context, where queries are more general than normal web search queries [4], the coverage of the expanded query becomes even more important. Thus, in this setting, it might be better to have multiple queries, each covering different aspects of a general query. Elsas et al. were the first to investigate query expansion techniques for blog search [4]. They show that normal feedback methods (selecting the new terms from top retrieved posts or top retrieved blogs) with the usual parameter settings are not effective in blog retrieval. However, they show that expanding the query using an external resource like Wikipedia can improve the performance of the system. In a more recent work, Lee et al. [5] propose new methods for selecting appropriate posts as the source of expansion and show that these methods can be effective in retrieval. All these proposed methods can be summarized as follows:

Top Feeds: Uses all the posts of the top retrieved feeds for the query expansion. This model has two parameters: the number of selected feeds and the number of terms in the expanded query [4].
Top Posts: Uses the top retrieved posts for the query expansion. The number of selected posts and the number of terms to use for expansion are the parameters of this model [4].
FFBS: Uses the top posts in the top retrieved feeds as the source for selecting the new terms. The number of posts selected from each feed is the same for all feeds. This model has three parameters: the number of selected feeds, the number of selected posts in each feed and the number of terms selected for the expansion [5].
WFBS: Works the same as FFBS. The only difference is that the number of posts selected for each feed depends on the feed's rank in the initial list, such that more relevant feeds contribute more to generating the new query. Like FFBS, WFBS has three parameters: the number of selected feeds, the total number of posts to be used in the expansion and the number of selected terms [5].

Among the mentioned methods, Top Feeds may expand the query with non-relevant terms, because the posts in a top retrieved feed are not all necessarily relevant to the topic. On the other hand, Top Posts might not have enough coverage of all the sub-topics of the query, because the top retrieved posts might be mainly relevant to some dominant aspect of the query. FFBS and WFBS were originally proposed in order to have more coverage than Top Posts while selecting more relevant terms than Top Feeds [5]. However, since it is difficult to summarize all the aspects of the topic in one single expanded query, these methods do not achieve the maximum possible coverage.

4 TEMPER

In this section we describe our novel framework for time-based relevance feedback in blog distillation, called TEMPER.
TEMPER assumes that posts at different times talk about different aspects (sub-topics) of a general topic. Therefore, vocabulary usage for the topic is time-dependent and this dependency can be considered in a relevance feedback method. Following this intuition, TEMPER selects time-dependent terms for query expansion and generates one query for each time point. We can summarize the TEMPER framework in the following 3 steps:
1. Time-based representation of blogs and queries
2. Time-based similarity between a blog and a query
3. Ranking blogs according to their overall similarity to the query.
In the remainder of this section, we describe our approach to each of these steps.

4.1 Time-Based Representation of Blogs and Queries

Initial Representation of Blogs and Queries. In order to consider time in the TEMPER framework, we first need to represent blogs and queries in the time space. For a blog representation, we distribute its posts based on their publish date. In order to have a daily representation of the blog, we concatenate all the posts that have the same date. For a query representation, we take advantage of the top retrieved posts for the query. As with the blog representation, we select the top K relevant posts for the query and divide them based on their publish date, concatenating posts with the same date. In order to have a more informative representation of the query, we select the top N terms for each day using the KL-divergence between the term distribution of the day and that of the whole collection [17]. Note that in the initial representation, there can be days that do not have any term distribution associated with them. However, in order to calculate the relevance of a blog to a query, TEMPER needs the representation of the blog and the query on every day. We employ the available information in the initial representation to estimate the term distributions for the rest of the days. In the rest of this section, we explain our method for estimating these representations.

Term Distributions Over Time. TEMPER generates a representation for each topic or blog for each day based on the idea that a term at each time position propagates its count to the other time positions through a proximity-based density function. By doing so, we obtain a virtual document for a blog/topic at each specific time position. The term frequencies of such a document are calculated as follows:

$$tf'(t, d, i) = \sum_{j=1}^{T} tf(t, d, j)\, k(i, j) \qquad (1)$$

where i and j indicate time positions (days) in the time space and T denotes the time span of the collection. tf'(t, d, i) is the term frequency of term t in blog/topic d at day i, calculated from the frequency of t in all days. k(i, j) decreases as the distance between i and j increases and can be calculated using the kernel functions that we describe later. The proposed representation of a document in the time space is similar to the proximity-based methods that generate a virtual document at each position of the document in order to capture the proximity of the words [18, 19]. However, here we aim to capture the temporal proximity of terms. In this paper we employ the Laplace kernel function, which has been shown to be effective in previous work [19], together with the Rectangular (square) kernel function. In the following formulas, we present the normalized kernel functions with their corresponding variance formulas.

1. Laplace Kernel

$$k(i, j) = \frac{1}{2b} \exp\left[-\frac{|i - j|}{b}\right] \qquad (2)$$

where $\sigma^2 = 2b^2$.

2. Rectangular Kernel

$$k(i, j) = \begin{cases} \frac{1}{2a} & \text{if } |i - j| \le a \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $\sigma^2 = a^2/3$.

4.2 Time-Based Similarity Measure

Given the daily representations of queries and blogs, we can calculate the daily similarity between these two representations and create a daily similarity vector for the blog and the query. The final similarity between the blog and the query is then calculated by summing over the daily similarities:

$$\mathrm{sim}_{temporal}(B, Q) = \sum_{i=1}^{T} \mathrm{sim}(B, Q, i) \qquad (4)$$

where sim(B, Q, i) is the similarity between the blog and the query representations at day i and T is the time span of the collection in days. Another popular method in time series similarity calculation is to see each time point as one dimension in the time space and use the Euclidean length of the daily similarity vector as the final similarity between the two representations [20]:

$$\mathrm{sim}_{temporal}(B, Q) = \sqrt{\sum_{i=1}^{T} \mathrm{sim}(B, Q, i)^2} \qquad (5)$$

We use the cosine similarity as a simple and effective measure of the similarity between the blog and the topic representations at a specific day i:

$$\mathrm{sim}(B, Q, i) = \frac{\sum_{w} tf'(w, B, i)\, tf'(w, Q, i)}{\sqrt{\sum_{w} tf'(w, B, i)^2}\; \sqrt{\sum_{w} tf'(w, Q, i)^2}} \qquad (6)$$

where tf' is the propagated term frequency of Eq. (1). The temporal similarity, normalized over all blogs, is then used as P_temporal:

$$P_{temporal}(B \mid Q) = \frac{\mathrm{sim}_{temporal}(B, Q)}{\sum_{B'} \mathrm{sim}_{temporal}(B', Q)} \qquad (7)$$

Finally, in order to take advantage of all the available evidence regarding the blog relevance, we interpolate the temporal score of the blog with its initial relevance score (Eq. 8).
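The kernels of Eqs. (2) and (3), the count propagation of Eq. (1) and the similarity measures of Eqs. (4) to (7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based sparse vectors and the convention that days are numbered 1..T are all choices made here for readability. The kernel widths a and b are derived from a given σ through the variance formulas above.

```python
import math
from collections import defaultdict


def laplace_kernel(i, j, sigma):
    """Normalized Laplace kernel, Eq. (2), with b obtained from sigma^2 = 2 b^2."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(i - j) / b) / (2.0 * b)


def rectangular_kernel(i, j, sigma):
    """Normalized rectangular kernel, Eq. (3), with a obtained from sigma^2 = a^2 / 3."""
    a = sigma * math.sqrt(3.0)
    return 1.0 / (2.0 * a) if abs(i - j) <= a else 0.0


def propagate(daily_tf, num_days, kernel, sigma):
    """Eq. (1): spread the term counts of each day j to every day i in 1..num_days.

    daily_tf maps a day number to {term: count}; days without posts may be absent.
    Returns a list whose element i-1 is the virtual document {term: tf'} of day i.
    """
    smoothed = [defaultdict(float) for _ in range(num_days)]
    for j, counts in daily_tf.items():
        for i in range(1, num_days + 1):
            weight = kernel(i, j, sigma)
            if weight == 0.0:
                continue
            for term, count in counts.items():
                smoothed[i - 1][term] += count * weight
    return smoothed


def cosine(u, v):
    """Eq. (6): cosine similarity between two sparse term vectors."""
    dot = sum(x * v.get(term, 0.0) for term, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0.0 and norm_v > 0.0 else 0.0


def temporal_similarity(blog_days, query_days, euclidean=False):
    """Eq. (4) (sum of daily similarities) or Eq. (5) (their Euclidean length)."""
    sims = [cosine(b, q) for b, q in zip(blog_days, query_days)]
    if euclidean:
        return math.sqrt(sum(s * s for s in sims))
    return sum(sims)


def normalize(scores):
    """Eq. (7): normalize temporal similarities over all blogs."""
    total = sum(scores.values())
    return {blog: (s / total if total > 0.0 else 0.0) for blog, s in scores.items()}


def interpolate(p_initial, p_temporal, alpha):
    """Interpolation of the initial and temporal scores with weight alpha."""
    return alpha * p_initial + (1.0 - alpha) * p_temporal
```

In this sketch a blog and a query are each passed through `propagate` with the same kernel and σ, the resulting day lists are compared with `temporal_similarity`, and the per-blog scores are normalized before being interpolated with the initial ranking score.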

$$P(B \mid Q) = \alpha\, P_{initial}(B \mid Q) + (1 - \alpha)\, P_{temporal}(B \mid Q) \qquad (8)$$

where α is a parameter that controls the amount of temporal relevance that is considered in the model. We use the Blogger Model method for the initial ranking of the blogs [8]. The only difference from the original Blogger Model is that we set the prior of a blog to be proportional to the log of the number of its posts, as opposed to the uniform prior that was used in the original Blogger Model. This log-based prior has been used and shown to be effective by Elsas et al. [4].

Table 1. Effect of cleaning the data set on Blogger Model. Statistically significant improvements at the 0.05 level are indicated by [...].
Columns: Model, Cleaned, MAP, Bpref. Rows: BloggerModel (Cleaned: No), BloggerModel (Cleaned: Yes).

5 Experimental Setup

In this section we explain our experimental setup for evaluating the effectiveness of the proposed framework.

Collection and Topics. We conduct our experiments over three years' worth of TREC blog track data from the blog distillation task, including the TREC 07, TREC 08 and TREC 09 data sets. The TREC 07 and TREC 08 data sets include 45 and 50 assessed queries respectively and use the Blog06 collection. The TREC 09 data set uses Blog08, a new collection of blogs, and has 39 new queries³. We use only the title of the topics as the queries. The Blog06 collection is a crawl of about one hundred thousand blogs over an 11-week period [22], and includes blog posts (permalinks), feed, and homepage for each blog. Blog08 is a collection of about one million blogs crawled over a year, with the same structure as the Blog06 collection [21]. In our experiments we only use the permalinks component of the collection, which consists of approximately 3.2 million documents for Blog06 and about 28.4 million documents for Blog08. We use the Terrier Information Retrieval system⁴ to index the collection with the default stemming and stopword removal. The Language Modeling approach with Dirichlet smoothing is used to score the posts and retrieve the top posts for each query.

³ Initially there were 50 queries in the TREC 2009 data set, but some of them did not have relevant blogs for the selected facets and were removed from the official query set [21]. We do not use the facets in this paper; however, we use the official query set to be able to compare with the TREC results.
⁴ [...]
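The Dirichlet-smoothed language model mentioned above for scoring posts can be sketched as the standard query-likelihood formulation below. This is an illustrative sketch only: the paper relies on Terrier for this step, the function name and argument layout are hypothetical, and the smoothing value `mu=2000.0` is an assumed placeholder, since the paper does not report the value it used.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood of a post under a Dirichlet-smoothed language model.

    collection_tf maps each term to its count in the whole collection and
    collection_len is the total number of term occurrences in the collection.
    mu is the smoothing parameter (assumed value, not taken from the paper).
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_collection = collection_tf.get(term, 0) / collection_len
        if p_collection == 0.0:
            # Terms unseen in the collection are skipped in this sketch.
            continue
        p_term = (doc_tf[term] + mu * p_collection) / (doc_len + mu)
        score += math.log(p_term)
    return score
```

Posts are then ranked by this score, and the top K of them form the query representation of Section 4.1.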

Table 2. Evaluation results for the implemented models over the TREC09 data set.
Rows: BloggerModel, TopFeeds, TopPosts, FFBS, WFBS, TEMPER-Rectangular-Sum, TEMPER-Rectangular-Euclidian, TEMPER-Laplace-Sum, TEMPER-Laplace-Euclidian.

Retrieval Baselines. We apply our feedback methods to the results of the Blogger Model method [8]. Therefore, Blogger Model is the first baseline against which we compare the performance of our proposed methods. The second set of baselines are the query expansion methods proposed in previous work [4, 5]. In order to have a fair comparison, we implemented these query expansion methods on top of Blogger Model. We tuned the parameters of these models using 10-fold cross validation in order to maximize MAP. The last set of baselines is provided by the TREC organizers as part of the blog facet distillation task. We use these baselines to see the effect of TEMPER in re-ranking the results of other retrieval systems.

Evaluation. We used the blog distillation relevance judgements provided by TREC for evaluation. We report the Mean Average Precision (MAP) as well as binary Preference (bpref) and Precision at 10 documents (P@10). Throughout our experiments we use the Wilcoxon signed-rank matched-pairs test at the 0.05 level to test for statistically significant improvements.

6 Experimental Results

In this section we explain the experiments that we conducted in order to evaluate the usefulness of the proposed method. We mainly focus on the results on the TREC09 data set, as it is the most recent data set and has enough temporal information, which is an important feature for our analysis. However, in order to see the effect of the method on the smaller collections, we briefly report the final results on the TREC07 and TREC08 data sets. Table 1 shows the evaluation results of Blogger Model on the TREC09 data set.
Because the blog data is highly noisy, we carry out a cleaning step on the collection in order to improve the overall performance of the system. We use the cleaning method proposed by Parapar et al. [23]. As we can see in Table 1, cleaning the collection is very useful and improves the MAP of the system by about 14%. The results of Blogger Model on the cleaned data are already better than the best TREC09 submission on the title-only queries.

Table 3. Evaluation results for the implemented models over the TREC08 data set.
Rows: BloggerModel, TopPosts, WFBS, TEMPER-Laplace-Euclidian.

Table 4. Evaluation results for the implemented models over the TREC07 data set.
Rows: BloggerModel, TopPosts, WFBS, TEMPER-Laplace-Euclidian.

Table 2 summarizes the retrieval performance of Blogger Model and the baseline query expansion methods along with different settings of TEMPER on the TREC 2009 data set. The best value in each column is in bold face. A dagger (†), a double dagger (‡) and a star (⋆) indicate statistically significant improvement over Blogger Model, TopPosts and WFBS respectively. As can be seen from the table, none of the query expansion baselines improves the underlying Blogger Model significantly. From Table 2 we can see that TEMPER with different settings (using the rectangular/Laplace kernel and the sum/Euclidean similarity method) improves on Blogger Model and the query expansion methods significantly. These results show the effectiveness of the time-based representation of blogs and queries and highlight the importance of time-based similarity calculation between blogs and topics. In Tables 3 and 4 we present similar results over the TREC08 and TREC07 data sets. Over the TREC08 data set, TEMPER improves on Blogger Model and the different query expansion methods significantly. Over the TREC07 data set, TEMPER improves on Blogger Model significantly. However, the performance of TEMPER is comparable with that of the other query expansion methods and the difference is not statistically significant. As mentioned in section 5, we also consider the three standard baselines provided by the TREC10 organizers in order to see the effect of our proposed feedback method on retrieval baselines other than Blogger Model. Table 8 shows the results of TEMPER over the TREC baselines. It can be seen that TEMPER improves the baselines in most cases. The only baseline that TEMPER does not improve significantly is stdbaseline1⁵.
Tables 5, 6 and 7 show the performance of TEMPER compared to the best title-only TREC runs in 2009, 2008 and 2007 respectively. It can be seen from the tables that TEMPER performs better than the best TREC runs over the TREC09 data set. The results over TREC08 and TREC07 are comparable to the best TREC runs and can be considered the third and second best reported results over the TREC08 and TREC07 data sets respectively.

⁵ Note that the stdbaselines are used as black boxes and we are not yet aware of the underlying methods.

Table 5. Comparison with the best TREC09 title-only submissions.
Rows: TEMPER-Laplace-Euclidian, TREC09-rank1 (buptpris 2009), TREC09-rank2 (ICTNET), TREC09-rank3 (USI).

Table 6. Comparison with the best TREC08 title-only submissions.
Rows: TEMPER-Laplace-Euclidian, TREC08-rank2 (CMU-LTI-DIR), TREC08-rank1 (KLE), TREC08-rank3 (UAms).

TEMPER has four parameters: the number of posts selected for expansion, the number of terms selected for each day, the standard deviation (σ) of the kernel functions, and α, the weight of the initial ranking score. Among these parameters, we fix the number of terms for each day to 50, as used in previous work [4]. The standard deviation of the kernel function is estimated using the top retrieved posts for each query. Since the goal of the kernel function is to model the distribution of the distance between two consecutive relevant posts, we treat the distances between the selected posts (the top retrieved posts) as samples of this distribution. We then use the standard deviation of the sample as an estimate of σ. The other two parameters are tuned using 10-fold cross validation. Figures 1 and 2 show the sensitivity of the system to these parameters. It can be seen that the best performance is obtained by selecting about 150 posts for expansion, while any number above 50 gives a reasonable result. The value of α depends on the underlying retrieval model. We can see that TEMPER outperforms Blogger Model for all values of α and the best value is about [...].

7 Conclusion and Future Work

In this paper we investigated blog distillation, where the goal is to rank blogs according to their recurrent relevance to the topic of the query. We focused on the temporal properties of blogs and their application in query expansion for blog retrieval.
Following the intuition that the term distribution for a topic might change over time, we proposed a time-based query expansion technique. We showed that it is effective to have multiple expanded queries for different time points and to score the posts of each time point using the corresponding expanded query. Our experiments on different blog collections and different baseline methods showed that this method can improve on state-of-the-art query expansion techniques.

Future work will involve more analysis of the temporal properties of blogs and topics. In particular, modeling the evolution of topics over time can help us to better estimate the topics' relevance models. This modeling over time can be seen as a temporal relevance model, which is an unexplored problem in blog retrieval.

Table 7. Comparison with the best TREC07 title-only submissions.
Rows: TEMPER-Laplace-Euclidian, TREC07-rank1 (CMU), TREC07-rank2 (UGlasgow), TREC07-rank3 (UMass).

Table 8. Evaluation results for the standard baselines on the TREC09 data set. Statistically significant improvements are indicated by [...].
Rows: a stdbaseline row followed by a TEMPER-stdBaseline row for each of the three standard baselines.

8 Acknowledgement

This work was supported by Swiss National Science Foundation (SNSF) as XMI project (ProjectNr /1).

References

1. Mishne, G., de Rijke, M.: A study of blog search. In: Proceedings of ECIR (2006)
2. Macdonald, C., Ounis, I.: Key blog distillation: ranking aggregates. In: Proceedings of CIKM (2008)
3. Macdonald, C., Ounis, I., Soboroff, I.: Overview of the TREC-2007 blog track. In: Proceedings of TREC (2008)
4. Elsas, J.L., Arguello, J., Callan, J., Carbonell, J.G.: Retrieval and feedback models for blog feed search. In: Proceedings of SIGIR (2008)
5. Lee, Y., Na, S.H., Lee, J.H.: An improved feedback approach using relevant local posts for blog feed retrieval. In: Proceedings of CIKM (2009)
6. Efron, M., Turnbull, D., Ovalle, C.: University of Texas School of Information at TREC. In: Proceedings of TREC (2008)
7. Seo, J., Croft, W.B.: Blog site search using resource selection. In: Proceedings of CIKM 2008, New York, NY, USA, ACM (2008)
8. Balog, K., de Rijke, M., Weerkamp, W.: Bloggers as experts: feed distillation using expert retrieval models. In: Proceedings of SIGIR (2008)
9. Arguello, J., Elsas, J., Callan, J., Carbonell, J.: Document representation and query expansion models for blog recommendation. In: Proceedings of ICWSM (2008)

10. Nunes, S., Ribeiro, C., David, G.: FEUP at TREC 2008 blog track: Using temporal evidence for ranking and feed distillation. In: Proceedings of TREC (2009)
11. Ernsting, B., Weerkamp, W., de Rijke, M.: Language modeling approaches to blog post and feed finding. In: Proceedings of TREC (2007)
12. Weerkamp, W., Balog, K., de Rijke, M.: Finding key bloggers, one post at a time. In: Proceedings of ECAI (2008)
13. Cao, G., Nie, J.Y., Gao, J., Robertson, S.: Selecting good expansion terms for pseudo-relevance feedback. In: Proceedings of SIGIR 2008, New York, NY, USA, ACM (2008)
14. Lavrenko, V., Croft, W.B.: Relevance based language models. In: Proceedings of SIGIR 2001, New York, NY, USA, ACM (2001)
15. Salton, G.: The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA (1971)
16. Shokouhi, M., Azzopardi, L., Thomas, P.: Effective query expansion for federated search. In: Proceedings of SIGIR 2009, New York, NY, USA, ACM (2009)
17. Zhai, C., Lafferty, J.D.: Model-based feedback in the language modeling approach to information retrieval. (2001)
18. Lv, Y., Zhai, C.: Positional language models for information retrieval. In: Proceedings of SIGIR 09. (2009)
19. Gerani, S., Carman, M.J., Crestani, F.: Proximity-based opinion retrieval. In: Proceedings of SIGIR 10. (2010)
20. Keogh, E.J., Pazzani, M.J.: Relevance feedback retrieval of time series data. In: Proceedings of SIGIR (1999)
21. Macdonald, C., Ounis, I., Soboroff, I.: Overview of the TREC-2009 Blog Track. In: Proceedings of TREC (2009)
22. Macdonald, C., Ounis, I.: The TREC Blogs06 collection: Creating and analysing a blog test collection. Department of Computer Science, University of Glasgow Tech Report TR (2006)
23. Parapar, J., López-Castro, J., Barreiro, Á.: Blog Posts and Comments Extraction and Impact on Retrieval Effectiveness. In: Proceedings of Spanish Conference on Information Retrieval (2010)

Fig. 1. Effect of number of the posts used for expansion on the performance of TEMPER. (Axes: number of the posts, MAP; series: TEMPER.)
Fig. 2. Effect of alpha on the performance of TEMPER. (Axes: alpha, MAP; series: TEMPER, Blogger Model.)