Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning


Xin-Jing Wang, Microsoft Research Asia, 4F Sigma, 49 Zhichun Road, Beijing, P.R. China
Xudong Tu, Dan Feng, Huazhong Univ. of Sci. & Tech., Luoyu Road, Wuhan, Hubei, P.R. China
Lei Zhang, Microsoft Research Asia, 4F Sigma, 49 Zhichun Road, Beijing, P.R. China

ABSTRACT
The method of finding high-quality answers has significant impact on user satisfaction in community question answering systems. However, due to the lexical gap between questions and answers, as well as the spam typically present in user-generated content, filtering and ranking answers is very challenging. Previous solutions mainly focus on generating redundant features, or on finding textual clues using machine learning techniques; none of them consider questions and their answers as relational data, but instead model them as independent information. Moreover, they consider only the answers to the current question, and ignore any previous knowledge that would help to bridge the lexical and semantic gap. We assume that answers are connected to their questions by various types of latent links, i.e. positive links indicating high-quality answers and negative links indicating incorrect answers or user-generated spam, and propose an analogical-reasoning-based approach which measures the analogy between new question-answer linkages and those of previous relevant knowledge containing only positive links; the candidate answer with the most analogous link is assumed to be the best answer. We conducted experiments on 29.8 million Yahoo! Answers question-answer threads and showed the effectiveness of our approach.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - information filtering, search process; G.3 [Probability and Statistics]: correlation and regression analysis; H.3.5 [Information Storage and Retrieval]: Online Information Services - web-based services

General Terms
Algorithms, Experimentation

This work was performed in MSRA. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR '09, July 19-23, 2009, Boston, Massachusetts, USA. Copyright 2009 ACM /09/07 ...$5.00.

Keywords
Community Question Answering, Analogical Reasoning, Probabilistic Relational Modeling, Ranking

1. INTRODUCTION
User-generated content (UGC) is one of the fastest-growing areas of Internet use. Such content includes social question answering, social bookmarking, social networking, social video sharing, social photo sharing, etc. UGC web sites or portals providing these services not only connect users directly to their information needs, but also turn everyday users from content consumers into content creators. One type of UGC portal that has become very popular in recent years is the community question-answering (CQA) site, which has attracted a large number of users both seeking and providing answers to a variety of questions on diverse subjects. For example, by December 2007, one popular CQA site, Yahoo! Answers, had attracted 120 million users worldwide and hosted 400 million answers to questions [21]. A typical characteristic of such sites is that they allow anyone to post or answer any question on any subject, which intuitively results in high variance in the quality of answers, e.g.
UGC data typically contains much content spam produced for fun or for profit. Thus, the ability, or inability, to obtain a high-quality answer has a significant impact on user satisfaction [28]. Distinguishing high-quality answers from others on CQA sites is not a trivial task, e.g. a lexical gap typically exists between a question and its high-quality answers. The lexical gap in community questions and answers is caused by at least two factors: (1) textual mismatch between questions and answers; and (2) user-generated spam or flippant answers. In the first case, questions and answers are generally short, and the words that appear in a question are not necessarily repeated in its high-quality answers. Moreover, a word can be ambiguous or have multiple meanings, e.g. "apple" can refer either to Apple the computer maker or to apple the fruit. Meanwhile, the same concept can be described with different words, e.g. "car" and "automobile". In the second case, user-generated spam and flippant answers usually have a negative effect and greatly increase the number of answers, thereby making it difficult to identify the high-quality ones. Figure 1 gives several examples of the lexical gap between questions and answers.

To bridge the lexical gap for better answer ranking, various techniques have been proposed. Conventional techniques for filtering answers primarily focus on generating complementary features provided by highly structured CQA sites [1, 4, 5, 14, 22], finding textual clues using machine-learning techniques [2, 3, 19, 20], or identifying user authority via graph-based link analysis, which assumes that authoritative users tend to generate high-quality answers [17]. There are two disadvantages to this work. First, only the answers to the new question are taken into consideration. Intuitively this suffers greatly from word ambiguity, since questioners and answerers may use different words to describe the same objects (e.g. "car" and "automobile"). Second, questions and answers are treated as independent information resources and their implicit correlations are ignored.

We argue that questions and answers are relational. Recall the traditional QA approaches based on natural language processing (NLP) techniques [25], which attempted to discover natural language properties such as the targeted answer format, the targeted answer part-of-speech, etc., and used them as clues to identify right answers. These are examples of implicit semantic clues, which are more valuable for correct answer detection than the lexical clues suggested by terms. We address these two problems in this study. In the offline stage, a large archive of questions and their answers is collected as a knowledge base. We then assume that there are various types of latent linkages between questions and answers, e.g. high-quality answers are connected to their questions via semantic links, spam or flippant answers via noisy links, and low-quality or incorrect answers via inferior links. We denote the links associated with high-quality answers as positive links and the rest as negative ones, and train a Bayesian logistic regression model for link prediction. By this means we are able to explicitly model the implicit qa relationships.
In the online stage, given a new question, we retrieve a set of relevant qa pairs from the knowledge base in the hope that they will cover the vocabulary of the new question and its answers. These qa pairs constitute prior knowledge on a similar topic that helps bridge the lexical gap. Moreover, to discover the semantic clues, instead of predicting directly based on the new question and its answers, we measure the analogy of a candidate linkage to the links embedded in the prior knowledge, and the most analogous link indicates the best answer. This is what we call analogical reasoning (AR). Intuitively, taking an ideal case as an example, if the crawled knowledge base contained all questions in the world, then to identify the best answer for a new question we could simply find the duplicate question in the knowledge base and use its best answer to evaluate a candidate answer's quality. However, since it is impractical to obtain all available questions, as new questions are being submitted every day, we can instead search for a set of previously answered questions that best match the new question according to some specified metrics and discover analogous relationships between them. Intuitively, the retrieved similar questions and their best answers are more likely to have words in common with the correct answers to the new question. Moreover, the retrieved set provides the knowledge of how questions similar to the new question were answered, and the way of answering can be regarded as a kind of positive linkage between a question and its best answer.

We crawled 29.8 million Yahoo! Answers questions to evaluate our proposed approach. Each question has about 15 answers on average. We used 100,000 qa pairs to train the link prediction model and tested the entire process with about 200,000 qa pairs. We compared our method with two widely adopted information retrieval metrics and one state-of-the-art method, and achieved significant performance improvement in terms of average precision and mean reciprocal rank. These experimental results suggest that taking the structure of relational data into account is very helpful in the noisy environment of a CQA site. Moreover, the idea of leveraging previous knowledge to bridge the lexical gap and ranking a document with community intelligence is more effective than traditional approaches which rely only on individual intelligence.

Q: How do you pronounce the Congolese city Kinshasa?
A: Kin as in your family, sha as in sharp, sa as in sergeant.
Q: What is the minimum positive real number in Matlab?
A: Your IQ.
Q: How will the sun enigma affect the earth?
A: Only God truly knows the mysteries of the sun, the universe and the heavens. They are His secrets!

Figure 1: Real examples from Yahoo! Answers, which suggest the lexical gap between questions and answers.

The paper is organized as follows. In Section 2, the most related work is discussed, covering the state-of-the-art approaches to community-driven answer ranking. In Section 3, we detail the proposed approach. We evaluate its performance in Section 4 and discuss several factors which affect the ranking performance. We conclude in Section 5 with a discussion of future work.

2. RELATED WORK
2.1 Community Answer Quality Ranking
Different from traditional QA systems, whose goal is to automatically generate an answer for the question of interest [25, 16], the goal of community answer ranking is to identify, from a closed list of candidate answers, one or more that semantically answer the corresponding question. Since CQA sites have rich structures, e.g. question/answer associations, user-user interactions, user votes, etc., they offer more publicly available information than traditional QA domains. Jeon et al. [14] extracted a number of non-textual features which cover the contextual information of questions and their answers, and proposed a language-modeling-based retrieval model for processing these features in order to predict the quality of answers collected from a specific CQA service.
Agichtein and his colleagues [1, 23] made great efforts at finding powerful features, including structural, textual, and community features, and proposed a general classification framework for combining these heterogeneous features. In addition to identifying answer quality, they also evaluated question quality and user satisfaction. Blooma et al. [5] proposed more features, textual and non-textual, and used regression analyzers to generate predictive features for best answer identification. Some researchers have resorted to machine learning techniques. Ko et al. [19] proposed a unified probabilistic answer ranking model to simultaneously address the answer relevance and answer similarity problems. The model used logistic regression to estimate the probability that a candidate answer is correct given its relevance to the supporting evidence. A disadvantage of this model is that it considered each candidate answer separately. To solve this problem, the authors improved their solution and proposed a probabilistic graphical model that takes into account the correlation of candidate answers [20]. It estimates the joint probability of the correctness of all answers, from which the probability of correctness of an individual answer can be inferred, and the performance was improved.

Figure 2: Sketch of the AR-based approach: 1) given a question Q (green block), some previous positive qa pairs are retrieved (red dots in the dotted blue circle); 2) each candidate qa pair is scored according to how well it fits into the previous knowledge w.r.t. their linkage properties. The magenta block highlights the highest-scored candidate qa pair, which is assumed to contain the best answer.

Bian et al. [3] presented the GBRank algorithm, which utilizes users' interactions to retrieve relevant high-quality content in social media. It explored a mechanism to integrate relevance, user interaction, and community feedback information to find the right factual, well-formed content to answer a user's question. They later improved the approach by explicitly considering the effect of malicious users' interactions, so that the ranking algorithm is more resilient to vote manipulation or shilling [2]. Instead of directly solving the answer ranking problem, some researchers have proposed to find experts in communities, with the assumption that authoritative users tend to produce high-quality content. For example, Jurczyk et al. [17] adapted the HITS [18] algorithm to a CQA portal. They ran this algorithm on the user-answer graph extracted from online forums and showed promising results. Zhang et al. [30] further proposed ExpertiseRank to identify users with high expertise.
They found that the expertise network is highly correlated with answer quality.

2.2 Probabilistic Relational Modeling
Probabilistic Relational Models [11] are Bayesian networks which simultaneously consider the concepts of objects, their properties, and relations. Getoor et al. [10] incorporated models of link prediction in relational databases, and we adopt the same idea to learn the logistic regression model for link prediction.

3. THE APPROACH
3.1 Process Overview
Figure 2 shows the entire process of our approach. The red dots in the CQA archives stand for positive qa pairs whose answers are good, while the black crosses represent negative pairs whose answers are not good (not necessarily noisy). Each qa pair is represented by a vector of textual and non-textual features, as listed in Table 2. In the offline stage, we learn a Bayesian logistic regression model based on the crawled QA archive, taking into account both positive and negative qa pairs. The task of the model is to estimate how likely it is that a qa pair contains a good answer. In the online stage, a supporting set (enclosed by the dotted blue circle in Figure 2) of positive qa pairs is first retrieved from the CQA archives using only the new question Q (the green block) as a query. The supporting set, along with the learnt link prediction model, is used for scoring and ranking each new qa pair, and the top-ranked qa pair (the magenta block) contains the best answer. Some real qa examples are given to better illustrate the idea.

3.2 Learning the Link Prediction Model
Modeling latent linkage via logistic regression
We train a Bayesian logistic regression (BLR) model with finite-dimensional parameters for latent linkage prediction, and set multivariate Gaussian priors on the parameters. The advantage of introducing a prior is that it helps to integrate over function spaces. Formally, let X_ij = [Φ_1(Q_i, A_j), Φ_2(Q_i, A_j), ..., Φ_K(Q_i, A_j)] be a K-dimensional feature vector of the pair of question Q_i and answer A_j, where Φ defines the mapping Φ : Q × A → R^K.
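To make the mapping Φ concrete, a few of the textual and statistical features of Table 2 can be computed as in the following sketch. This is a hypothetical, simplified subset of the paper's roughly 20 features; the tokenizer and stopword list are placeholders, not the paper's actual preprocessing.

```python
def tokenize(text):
    # Crude whitespace tokenizer; real systems would also strip
    # punctuation and apply stemming, as Table 2 notes.
    return text.lower().split()

def qa_features(question, answer, stopwords=frozenset({"the", "a", "is", "in"})):
    """A toy version of the mapping Phi : Q x A -> R^K, computing three
    features inspired by Table 2: #common words, question raw length,
    and Q/A raw length ratio."""
    q_tokens, a_tokens = tokenize(question), tokenize(answer)
    common = len((set(q_tokens) & set(a_tokens)) - stopwords)
    q_len, a_len = len(q_tokens), len(a_tokens)
    ratio = q_len / a_len if a_len else 0.0
    return [float(common), float(q_len), ratio]

feats = qa_features("How do I rank answers in a CQA site?",
                    "Rank answers by learning a link prediction model.")
# feats[0] counts the shared non-stopword terms ("rank", "answers").
```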
Let C_ij ∈ {0, 1} be an indicator of linkage type, where C_ij = 1 indicates a positive linkage, i.e. A_j is a high-quality answer to Q_i, and C_ij = 0 otherwise. Let Θ = [Θ_1, Θ_2, ..., Θ_K] be the parameter vector of the logistic regression model to be learnt, i.e.

    P(C_ij = 1 | X_ij, Θ) = 1 / (1 + exp(-Θ^T X_ij))    (1)
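A minimal sketch of the link predictor of Eq.(1), logistic regression over a qa feature vector. The feature values and parameters below are illustrative, not learnt from the paper's data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def link_probability(x, theta):
    """P(C = 1 | x, theta) as in Eq.(1): the probability that the qa
    pair with feature vector x carries a positive latent link."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

# Illustrative 3-dimensional feature vector (e.g. #common words,
# Q/A length ratio, asker rating) and hypothetical learnt parameters.
p = link_probability([4.0, 0.8, 1.0], [0.5, 1.2, 0.7])
```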
Figure 3: (a) The plate model of the Bayesian logistic regression model. (b) When C = 1, Q and A are positively linked and share the information embedded in Θ, represented by the undirected edges.

where X_ij ∈ X_train = {X_ij, 1 ≤ i ≤ D_q, 1 ≤ j ≤ D_a} is a training qa pair and D_q, D_a are the numbers of training questions and answers respectively. Generally we have D_a ≫ D_q. Figure 3(a) shows the plate model. The C node is introduced to explicitly model the factors that link a question Q and an answer A. Since C represents a link while Q and A represent content, this model seamlessly reinforces the content and linkage in relational data. If C = 1 is observed, as shown in Figure 3(b), it means that there exists a positive link between the corresponding Q and A, and this link is analogous to the links which construct Θ. We use undirected edges to represent these implications. To achieve a more discriminative formulation, we use both positive and negative qa pairs to train this model. Meanwhile, since noisy answers generally occupy a larger share of the population, to balance the amounts of positive and negative training data we randomly sample a similar number of negative and positive points.

Learning the Gaussian prior P(Θ)
The reason we use a Gaussian prior is threefold. Firstly, instead of directly using the BLR model's outputs, we evaluate how probable it is that the latent linkage embedded in a new qa pair belongs to the subpopulation defined by the previous knowledge, with respect to the BLR prediction (we detail this approach in Section 3.4.2); a Gaussian is therefore a good prior candidate. Secondly, the Gaussian distribution has a comparatively small parameter space (mean and variance), which makes the model easier to control. Thirdly, the Gaussian distribution is conjugate to itself, i.e. the posterior is still a Gaussian, which results in a simple solution. To ensure the preciseness of the prior, we adopt the approach suggested by Silva et al. [26].
Firstly the BLR model is fit to the training data using Maximum Likelihood Estimation (MLE), which gives an initial estimate Θ̂. Then the covariance matrix Σ̂ is defined as a smoothed version of the MLE-estimated covariance:

    (Σ̂)^(-1) = (δ / N) (X^T Ŵ X)    (2)

where δ is a scalar, N is the size of the training data set, and X = {X_ij} is the N × K feature matrix of the training qa pairs, either positive or negative. Ŵ is a diagonal matrix with Ŵ_ii = p̂(i)(1 - p̂(i)), where p̂(i) is the predicted probability of a positive link for the i-th row of X. The prior for Θ is then the Gaussian N(Θ̂, Σ̂).

3.3 Previous Knowledge Mining
To our knowledge, none of the previous CQA answer ranking approaches has leveraged a supporting set. In fact, such an idea is advantageous in at least two respects: 1) bridging the lexical gap: it enlarges the vocabulary, which is more likely to cover words that do not appear in a question but do appear in its correct answers, or vice versa. In fact, some traditional QA approaches have discovered that the amount of implicit knowledge which associates an answer to a question can be quantitatively estimated by exploiting the redundancy in a large data set [6, 24]. 2) bridging the semantic gap: its positive linkages enable the analogical reasoning approach for new link prediction. Intuitively, it is more advantageous to rank a document based on community intelligence than simply on individual intelligence [7, 12]. We use information retrieval techniques to identify such a supporting set. However, community qa pair retrieval is again not a trivial task. Jeon and his colleagues [15] proposed a translation-based retrieval model using the textual similarity between answers to estimate the relevance of two questions. Xue et al. [29], furthermore, combined the word-to-word translation probabilities of question-to-question retrieval and answer-to-answer retrieval. Lin et al. [22] and Bilotti et al. [4], on the other hand, adopted the traditional information retrieval method but utilized structural features to ensure retrieval precision.
We adopt Lin and Bilotti's approach to qa pair retrieval for simplicity. In particular, let Q_q be a new question and its answer list be {A_q^j, 1 ≤ j ≤ M}; the supporting qa pairs S = {(Q_1 : A_1), (Q_2 : A_2), ..., (Q_L : A_L)} contain those whose questions' cosine similarities to the new question are above a threshold:

    S = {(Q_i : A_i) | cos(Q_q, Q_i) > λ, i ∈ 1, ..., D}    (3)

where D is the size of our crawled Yahoo! Answers archive and λ is the threshold. Each question is represented in the bag-of-words model. The effect of λ is shown in Figure 5. As analyzed before, whether the retrieved questions are semantically similar to the new question is not critical, because the inference in our case is based on the structures of qa pairs rather than on their contents. Moreover, according to the exhaustive experiments conducted by Dumais et al. [8], when the knowledge base is large enough, the accuracy of question answering can be assured with simple document ranking and n-gram extraction techniques. Note that S contains only positive qa pairs. Therefore, if the linkage of a candidate qa pair is predicted to be analogous to the linkages in the subpopulation S, we can say that it is a positive linkage which indicates a high-quality answer.

3.4 Link Prediction
The idea of analogical reasoning
There is a large literature on analogical reasoning in artificial intelligence and psychology, which has achieved great success in many domains including clustering, prediction, and dimensionality reduction. Interested readers can refer to French's survey [9]. However, little previous work has applied analogical reasoning to the IR domain. Silva et al. [27] used it to model latent linkages among the relational objects contained in university webpages, such as students, courses, departments, staff, etc., and obtained promising results. An analogy is defined as a measure of similarity between structures of related objects (qa pairs in our case).
The key aspect is that, typically, it is not so important how similar each individual object of a candidate pair is to the individual objects of the supporting set (i.e. the relevant previous qa pairs in our case). Instead, implementations that rely on the similarity between the pairs of objects are used to predict the existence of the relationships. In other words, similarities between two questions or two answers are not as important as the similarity between two qa pairs. Silva et al. [27] proposed a Bayesian analogical reasoning (BAR) framework, which uses a Bayesian model to evaluate whether an object belongs to a concept or a cluster, given a few items from that cluster. We use an example to better illustrate the idea. Consider a social network of users; there are several diverse reasons why a user u links to a user v: they are friends, they joined the same communities, they commented on the same articles, etc. If we knew the reasons, we could group the users into subpopulations. Unfortunately, these reasons are implicit; yet we are not completely in the dark: we are already given some subgroups of users which are representative of a few of the most important subpopulations, although it is unclear what reasons underlie them. The task now becomes to identify which other users belong to these subgroups. Instead of writing some simple query rules to explain the common properties of such subgroups, BAR solves a Bayesian inference problem to determine the probability that a particular user pair should be a member of a given subgroup, or, put differently, that they are linked in an analogous way.

Answer ranking via analogical reasoning
The previous knowledge retrieved is just such a subpopulation for predicting the membership of a new qa pair. Since we keep only positive qa pairs in this supporting set, and high-quality answers generally answer a question semantically rather than lexically, it is more likely that a candidate qa pair contains a high-quality answer when it is analogous to this supporting qa set. To measure such an analogy, we adopt the scoring function proposed by Silva et al.
[27], which measures the marginal probability that a candidate qa pair (Q_q, A_q^j) belongs to the subpopulation of previous knowledge S:

    score(Q_q, A_q^j) = log P(C_q^j = 1 | X_q^j, S, C_S = 1) - log P(C_q^j = 1 | X_q^j)    (4)

where A_q^j is the j-th candidate answer to the new question, X_q^j represents the features of (Q_q, A_q^j), and C_S is the vector of link indicators for S; C_S = 1 indicates that all pairs in S are positively linked, i.e. C_1 = 1, C_2 = 1, ..., C_L = 1. The underlying idea is to measure to what extent (Q_q, A_q^j) fits into S, or to what extent S explains (Q_q, A_q^j). The more analogous the candidate linkage is to the supporting linkages, the more probable it is that the candidate linkage is positive. According to Bayes' rule, the two probabilities in Eq.(4) can be computed by Eq.(5) and Eq.(6) respectively:

    P(C_q^j = 1 | X_q^j, S, C_S = 1) = ∫ P(C_q^j = 1 | X_q^j, Θ) P(Θ | S, C_S = 1) dΘ    (5)

    P(C_q^j = 1 | X_q^j) = ∫ P(C_q^j = 1 | X_q^j, Θ) P(Θ) dΘ    (6)

where Θ is the parameter set and P(C_q^j = 1 | X_q^j, Θ) is given by the BLR model defined in Eq.(1). The solution of these two equations is given in the appendix. The entire process is shown in Table 1.

Table 1: Summary of our AR-based method
I. Training Stage:
Input: feature matrix X_train of all the training qa pairs.
Output: BLR model parameters Θ and prior P(Θ).
Algorithm:
1. Train the BLR model on X_train using Eq.(1).
2. Learn the prior P(Θ) using Eq.(2).
II. Testing Stage:
Input: a query QA thread {Q_q : A_q^i}, i = 1, ..., M.
Output: score(Q_q, A_q^i), i = 1, ..., M.
Algorithm:
1. Generate the feature matrix X_test = {X_q^j}, with X_q^j the features of the j-th qa pair (Q_q : A_q^j);
2. Retrieve a set of positive qa pairs S from the CQA archive using Eq.(3);
3. Do Bayesian inference to obtain P(Θ | S, C_S = 1);
4. For each (Q_q, A_q^j), estimate the probability of a positive linkage using Eq.(4);
5. Rank the answers {A_q^j, j = 1, ..., M} in descending order of the scores. The top-ranked answer is taken as the best answer to Q_q.
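The scoring of Eq.(4) requires the two integrals of Eqs.(5)-(6). One simple way to approximate them, sketched below, is Monte Carlo: draw parameter samples from the Gaussian prior, reweight them by the likelihood that every supporting pair is positively linked (for the posterior term), and average the BLR predictions. This is an illustrative approximation under a simplified axis-aligned prior, not the paper's appendix solution; all feature values are synthetic.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_prior(mean, std, n, rng):
    # Draws from an axis-aligned Gaussian prior over Theta; the paper's
    # prior has a full covariance matrix (Eq. 2), simplified here.
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n)]

def ar_score(x_new, support_X, thetas):
    """score(Q, A) of Eq.(4): log P(C=1 | x, S, C_S=1) - log P(C=1 | x).
    Both integrals (Eqs. 5-6) are Monte Carlo averages over prior
    samples; the posterior term reweights each sample by the likelihood
    that every supporting pair is positively linked."""
    weighted, weights, prior_sum = 0.0, 0.0, 0.0
    for theta in thetas:
        p_new = sigmoid(dot(theta, x_new))        # BLR prediction, Eq.(1)
        lik_support = math.prod(sigmoid(dot(theta, xs)) for xs in support_X)
        weighted += lik_support * p_new           # numerator of Eq.(5)
        weights += lik_support
        prior_sum += p_new                        # integrand of Eq.(6)
    return math.log(weighted / weights) - math.log(prior_sum / len(thetas))

rng = random.Random(0)
thetas = sample_prior(mean=[0.3, 0.5], std=[0.4, 0.4], n=2000, rng=rng)
support = [[2.0, 1.5], [1.8, 1.2]]   # features of retrieved positive qa pairs
# A candidate whose linkage resembles the supporting set scores higher:
scores = {"good": ar_score([2.1, 1.4], support, thetas),
          "bad": ar_score([-1.0, -0.5], support, thetas)}
```

Ranking the candidate answers of a thread then amounts to sorting them by this score in descending order, as in step 5 of Table 1.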
4. EVALUATION
We crawled 29.8 million questions from the Yahoo! Answers web site; each question has about 15 answers on average. These questions cover 1,550 leaf categories defined by experts, and all of them have user-labeled best answers, which makes them a good test bed for evaluating our approach. 100,000 randomly selected qa pairs were used to train the link prediction model, and 16,000 qa threads (about 200,000 testing qa pairs in total) were used for testing. A typical characteristic of CQA sites is their rich structure, which offers abundant metadata. Previous work has shown the effectiveness of combining structural features with textual features [1, 14, 23]. We adopted a similar approach and defined about 20 features to represent a qa pair, as listed in Table 2. Two metrics were used for the evaluation. One is Average Precision: for a given query, it is the mean fraction of relevant answers ranked in the top K results; the higher the precision, the better the performance. We use the best answer tagged by the Yahoo! Answers web site as the ground truth. Since average precision ignores the exact rank of a correct answer, we use the Mean Reciprocal Rank (MRR) metric as a complement. The MRR of an individual query is the reciprocal of the rank at which the first relevant answer is returned, or 0 if none of the top K results contains a relevant answer. The score for a set of queries is the mean of each individual query's reciprocal rank:

    MRR = (1 / |Q_r|) Σ_{q ∈ Q_r} 1 / r_q    (7)

where Q_r is the set of test queries and r_q is the rank of the first relevant answer for question q.
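Both metrics are straightforward to compute from per-query ranked lists of 0/1 relevance labels; a small sketch with illustrative data:

```python
def precision_at_k(relevance, k):
    """Fraction of relevant answers among the top-k results; `relevance`
    is the ranked list of 0/1 labels for one query."""
    top = relevance[:k]
    return sum(top) / len(top)

def mean_reciprocal_rank(ranked_lists, k):
    """Eq.(7): the mean over queries of 1/r_q, where r_q is the rank of
    the first relevant answer within the top k (0 if none appears)."""
    total = 0.0
    for relevance in ranked_lists:
        for rank, rel in enumerate(relevance[:k], start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two illustrative queries: best answer ranked 2nd and 1st respectively,
# giving MRR = (1/2 + 1/1) / 2 = 0.75.
runs = [[0, 1, 0, 0], [1, 0, 0, 0]]
```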
Table 2: Features used to represent a question-answer pair

Textual features of question-answer pairs:
- Q. (A.) TF: question (answer) term frequency, stopwords removed, stemmed
- Novel word TF: term frequency of non-dictionary words, e.g. "ms"
- #Common words: number of common words in Q and A

Statistical features of question-answer pairs:
- Q. (A.) raw length: number of words, stopwords not removed
- Q/A raw length ratio: question raw length / answer raw length
- Q/A length ratio: Q/A length ratio, stopwords removed
- Q/A anti-stop ratio: #stopwords in question / #stopwords in answer
- Common n-gram len.: length of common n-grams in Q and A
- #Answers: number of answers to a question
- Answer position: the position of an answer in the qa thread

User interaction / social elements features:
- #Interesting marks: number of votes marking a question as interesting
- Good marks by users: number of votes marking an answer as good
- Bad marks by users: number of votes marking an answer as bad
- Rating by asker: the asker-assigned score to an answer
- Thread life cycle: the time span between Q and its latest A

Three baseline methods were used: the Nearest Neighbor measure (NN), the Cosine distance metric (COS), and the Bayesian Sets metric (BSets) [12]. The first two directly measure the similarity between a question and an answer, without using a supporting set; neither do they treat questions and answers as relational data, but rather as independent information resources. The BSets model uses the scoring function in Eq.(8):

    score(x) = log p(x | S) - log p(x)    (8)

where x represents a new qa pair and S is the supporting set. It measures how probable it is that x belongs to S. The difference between BSets and our AR-based method is that the former ignores the qa correlations, and instead joins the features of a question and an answer into a single row vector.

4.1 Performance
Average Precision
Figure 4 illustrates the performance of our method and the baselines under the Average Precision metric for K = 1, 5, 10.
Since each question has fewer than 15 answers on average, Average Precision at K > 10 is not evaluated.

Figure 4: Average Precision of our method and the baselines (Our, BSets, COS, NN) at K = 1, 5, 10.

The red bars show the performance of our approach; the blue, yellow, and purple bars are those of the BSets, COS, and NN methods respectively. In all cases, our method significantly outperformed the baselines. The gap between our method and BSets shows the positive effect of modeling the relationships between questions and answers, while the gap between BSets and NN, COS shows the power of community intelligence. The superiority of our model over BSets shows that modeling content as well as data structure improves performance over modeling content alone; this is because in CQA sites questions are very diverse, and the retrieved previous knowledge is not necessarily semantically similar to the query question (i.e. it is still noisy). Moreover, from the result that the NN method performed the worst, we can tell that the lexical gap between questions and answers, questions and questions, and qa pairs cannot be ignored in the Yahoo! Answers archive.

Mean Reciprocal Rank (MRR)
Table 3 gives the MRR performance of our method and the baselines.

Table 3: MRR for our method and the baselines
NN: 0.56 | Cosine: 0.59 | BSets [12]: 0.67 | Our method: 0.78

The trend coincides with the one suggested by the Average Precision measure. Again our method significantly outperformed the baseline methods, which means that the best answers generally rank higher than other answers under our method.

4.2 Effect of Parameters
Figure 5 evaluates how our method performs with respect to the two parameters: δ, the scalar in Eq.(2), and λ, the threshold in Eq.(3). The MRR metric is used here. Figure 5(a) shows the joint effect of these two parameters on the MRR performance. The best performance was achieved at λ = 0.8, δ = 0.6. To better evaluate the individual effect of each parameter, we plot the curves of MRR vs. λ and MRR vs. δ
in Figure 5(b) and Figure 5(c) respectively, fixing the other parameter at its optimal value. As described in Section 3.3, the threshold λ controls the relevance of the supporting qa pair set. Intuitively, if λ is set too high, few qa pairs will be retrieved. This results in too small a subpopulation of supporting qa pairs, such that the linkage information becomes too sparse and thus inadequate to predict the analogy of a candidate. On the other hand, if λ is set too low, too many qa pairs will likely be retrieved, which introduces too much noise into the subpopulation, and the semantic meaning of the supporting links is obscured.

Figure 5: The effect of the prior scalar δ in Eq.(2) and the similarity threshold λ in Eq.(3), measured by Mean Reciprocal Rank: (a) the joint effect of δ and λ; (b) MRR vs. λ with δ = 0.6; (c) MRR vs. δ with λ = 0.8. The best performance was achieved at λ = 0.8, δ = 0.6.

Figure 5(b) exactly reflects this common-sense expectation. The best performance was achieved when λ was about 0.8; after that, the performance dropped quickly. However, as λ increased toward 0.8, the performance improved slowly and smoothly. This indicates that the noisy qa pairs corresponding to diverse link semantics were removed gradually and the subpopulation became more and more focused, thus strengthening the power of analogical reasoning. δ is used to scale the covariance matrix of the prior P(Θ). As shown in Figure 5(c), it is a sensitive parameter and is stable only in a comparatively narrow range. This, intuitively, coincides with the known behavior of analogical reasoning approaches [9, 27, 26], where a data-dependent prior is quite important for performance. In our approach, we used the same prior for the entire testing dataset, which contains qa pairs from diverse categories. Better performance can be foreseen if we learn different priors for different categories.

5. CONCLUSION
A typical characteristic of community question answering sites is the high variance in the quality of answers, and a mechanism to automatically detect high-quality answers has a significant impact on users' satisfaction with such sites. The typical challenge, however, lies in the lexical gap caused by textual mismatch between questions and answers as well as user-generated spam.
Previous work mainly focuses on detecting powerful features or finding textual clues using machine learning techniques, but none of it takes into account previous knowledge and the relationships between questions and their answers. In contrast, we treated questions and their answers as relational data and proposed an analogical reasoning-based method to identify correct answers. We assume that there are various types of linkages that attach answers to their questions, and used a Bayesian logistic regression model for link prediction. Moreover, in order to bridge the lexical gap, we leverage a supporting q-a set whose questions are relevant to the new question and which contains only high-quality answers. This supporting set, together with the logistic regression model, is used to evaluate 1) how probable it is that a new q-a pair has the same type of linkage as those in the supporting set, and 2) how strong that linkage is. The candidate answer that has the strongest link to the new question is taken as the best answer, i.e. the one that semantically answers the question. The evaluation on 29.8 million Yahoo!Answers q-a threads showed that our method significantly outperformed the baselines in both average precision and mean reciprocal rank, which suggests that in the noisy environment of a CQA site, leveraging community intelligence and taking the structure of relational data into account are both beneficial. The current model only uses content to reinforce the structure of relational data. In future work, we would like to investigate how latent linkages reinforce content, and vice versa, in the hope of improved performance.

6. ACKNOWLEDGEMENT

We thank Chin-Yew Lin and Xinying Song for their help with the q-a data.

7. REFERENCES

[1] E. Agichtein, C. Castillo, et al. Finding high-quality content in social media. In Proc. of WSDM.
[2] J. Bian, Y. Liu, et al. A few bad votes too many? Towards robust ranking in social media. In Proc. of AIRWeb.
[3] J. Bian, Y. Liu, et al.
Finding the right facts in the crowd: Factoid question answering over social media. In Proc. of WWW.
[4] M. Bilotti, P. Ogilvie, et al. Structured retrieval for question answering. In Proc. of SIGIR.
[5] M. Blooma, A. Chua, and D. Goh. A predictive framework for retrieving the best answer. In Proc. of SAC.
[6] E. Brill, J. Lin, M. Banko, et al. Data-intensive question answering. In TREC.
[7] J. Chu-Carroll, K. Czuba, et al. In question answering, two heads are better than one. In Proc. of HLT/NAACL.
[8] S. Dumais, M. Banko, et al. Web question answering: Is more always better? In Proc. of SIGIR.
[9] R. French. The computational modeling of analogy-making. Trends in Cognitive Sciences, 6.
[10] L. Getoor, N. Friedman, et al. Learning probabilistic models of link structure. JMLR, 3.
[11] L. Getoor, N. Friedman, et al. Probabilistic relational models. In Introduction to Statistical Relational Learning.
[12] Z. Ghahramani and K. Heller. Bayesian sets. In Proc. of NIPS.
[13] T. Jaakkola and M. Jordan. Bayesian parameter estimation via variational methods. Statistics and Computing, 10:25–37.
[14] J. Jeon, W. Croft, et al. A framework to predict the quality of answers with non-textual features. In Proc. of SIGIR.
[15] J. Jeon, W. Croft, et al. Finding similar questions in large question and answer archives. In Proc. of CIKM.
[16] V. Jijkoun and M. de Rijke. Retrieving answers from frequently asked questions pages on the web. In Proc. of CIKM, pages 76–83.
[17] P. Jurczyk and E. Agichtein. Discovering authorities in question answer communities by using link analysis. In Proc. of CIKM.
[18] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5).
[19] J. Ko, L. Si, and E. Nyberg. A probabilistic framework for answer selection in question answering. In Proc. of NAACL/HLT.
[20] J. Ko, L. Si, and E. Nyberg. A probabilistic graphical model for joint answer ranking in question answering. In Proc. of SIGIR.
[21] J. Leibenluft. Librarian's worst nightmare: Yahoo! Answers, where 120 million users can be wrong. Slate Magazine.
[22] J. Lin and B. Katz. Question answering from the web using knowledge annotation and knowledge mining techniques. In Proc. of CIKM.
[23] Y. Liu, J. Bian, and E. Agichtein. Predicting information seeker satisfaction in community question answering. In Proc. of SIGIR.
[24] B. Magnini, M. Negri, et al. Is it the right answer? Exploiting web redundancy for answer validation. In Proc. of ACL.
[25] D. Molla and J. Vicedo. Question answering in restricted domains: An overview. In Proc. of ACL.
[26] R. Silva, E. Airoldi, and K. Heller.
Small sets of interacting proteins suggest latent linkage mechanisms through analogical reasoning. Gatsby Technical Report, GCNU TR.
[27] R. Silva, K. Heller, and Z. Ghahramani. Analogical reasoning with relational Bayesian sets. In Proc. of AISTATS.
[28] Q. Su, D. Pavlov, et al. Internet-scale collection of human-reviewed data. In Proc. of WWW.
[29] X. Xue, J. Jeon, and W. Croft. Retrieval models for question and answer archives. In Proc. of SIGIR.
[30] J. Zhang, M. Ackerman, and L. Adamic. Expertise networks in online communities: Structure and algorithms. In Proc. of WWW.

APPENDIX

Jaakkola et al. [13] suggested a variational approximation solution to the BLR model. Let g(ξ) be the logistic function, g(ξ) = (1 + e^(−ξ))^(−1), and consider the case of single-data-point evaluation, Eq.(6). The method lower-bounds the integrand as follows:

P(C | X, Θ) = g(H_C) ≥ g(ξ) exp{ (H_C − ξ)/2 − λ(ξ)(H_C^2 − ξ^2) }    (9)

where H_C = (2C − 1)Θ^T X and λ(ξ) = tanh(ξ/2)/(4ξ); tanh(·) is the hyperbolic tangent function. Thus P(Θ | X, C) can be approximated by normalizing P(C | X, Θ)P(Θ) ≈ Q(C | X, Θ)P(Θ), where Q(C | X, Θ) is the right-hand side of Eq.(9). Since this bound assumes a quadratic form as a function of Θ and our priors are Gaussian, the approximate posterior will be Gaussian, which we denote by N(µ_pos, Σ_pos). However, this bound can be loose unless a suitable value of the free parameter ξ is chosen. The key step in the approximation is then to optimize the bound with respect to ξ. Let the Gaussian prior P(Θ) be denoted as N(µ, Σ). The procedure reduces to an iterative optimization algorithm in which the following updates are made at each step:

Σ_pos^(−1) = Σ^(−1) + 2λ(ξ)XX^T
µ_pos = Σ_pos [Σ^(−1)µ + (C − 1/2)X]
ξ = (X^T Σ_pos X + (X^T µ_pos)^2)^(1/2)

To approximate Eq.(5), a sequential update is performed: starting from the prior P(Θ) for the first data point (X, C) in (S, C_S = 1), the resulting posterior N(µ_pos, Σ_pos) is treated as the new prior for the next point.
The ordering is chosen from a uniform distribution in our implementation. Finally, given the optimized approximate posterior, the predictive integral Eq.(5) can be approximated as:

log Q(C_ij | X_ij, S, C_S) = log g(ξ_ij) − ξ_ij/2 + λ(ξ_ij)ξ_ij^2 − (1/2)µ_S^T Σ_S^(−1) µ_S + (1/2)µ_ij^T Σ_ij^(−1) µ_ij − (1/2) log( |Σ_ij^(−1)| / |Σ_S^(−1)| )

where the parameters (µ_S, Σ_S) are those of the approximate posterior of Θ given (S, C_S), i.e. N(µ_S, Σ_S), and (µ_ij, Σ_ij) come from the approximate posterior of Θ given (S, C_S, X_ij, C_ij). The parameters (ξ_ij, ξ_S) come from the respective approximate posteriors. Eq.(6) can be approximated in the same form, with the prior parameters (µ, Σ) and the posterior given only (X_ij, C_ij) in place of (µ_S, Σ_S) and (µ_ij, Σ_ij):

log Q(C_ij | X_ij) = log g(ξ_ij) − ξ_ij/2 + λ(ξ_ij)ξ_ij^2 − (1/2)µ^T Σ^(−1) µ + (1/2)µ_ij^T Σ_ij^(−1) µ_ij − (1/2) log( |Σ_ij^(−1)| / |Σ^(−1)| )
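The variational procedure above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation: labels C are taken in {0, 1}, the feature vectors are toy data, and the names `update_posterior` and `log_predictive` are our own.

```python
import numpy as np

def lam(xi):
    # λ(ξ) = tanh(ξ/2) / (4ξ); the ξ→0 limit is 1/8
    return 1.0 / 8.0 if abs(xi) < 1e-8 else np.tanh(xi / 2.0) / (4.0 * xi)

def g(t):
    # logistic function g(t) = (1 + e^(−t))^(−1)
    return 1.0 / (1.0 + np.exp(-t))

def update_posterior(mu, Sigma, x, c, n_iter=20):
    """Fold one labeled point (x, c) into the Gaussian posterior N(mu, Sigma)
    over Θ, iterating the Jaakkola-Jordan updates for Σ_pos, µ_pos, and ξ."""
    Sigma_inv = np.linalg.inv(Sigma)
    xi = 1.0  # free variational parameter, tightened by iteration
    for _ in range(n_iter):
        Sigma_pos = np.linalg.inv(Sigma_inv + 2.0 * lam(xi) * np.outer(x, x))
        mu_pos = Sigma_pos @ (Sigma_inv @ mu + (c - 0.5) * x)
        xi = np.sqrt(x @ Sigma_pos @ x + (x @ mu_pos) ** 2)
    return mu_pos, Sigma_pos, xi

def log_predictive(mu, Sigma, x, c=1):
    """Approximate log Q(c | x, ...) under the current posterior N(mu, Sigma):
    the closed-form Gaussian-integral terms of the variational lower bound."""
    mu_ij, Sigma_ij, xi = update_posterior(mu, Sigma, x, c)
    Si, Sij = np.linalg.inv(Sigma), np.linalg.inv(Sigma_ij)
    return (np.log(g(xi)) - xi / 2.0 + lam(xi) * xi ** 2
            + 0.5 * mu_ij @ Sij @ mu_ij - 0.5 * mu @ Si @ mu
            - 0.5 * np.log(np.linalg.det(Sij) / np.linalg.det(Si)))

# Sequential update over a toy supporting set (all labels C_S = 1),
# visiting the pairs in a uniformly random order as described above.
rng = np.random.default_rng(0)
mu, Sigma = np.zeros(2), np.eye(2)  # prior N(0, I); the paper also scales Σ by c
X_S = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.0]])
for x in rng.permutation(X_S):
    mu, Sigma, _ = update_posterior(mu, Sigma, x, c=1)

# Score two candidate links: one analogous to the supporting links, one not.
good = log_predictive(mu, Sigma, np.array([1.0, 0.1]))
bad = log_predictive(mu, Sigma, np.array([-1.0, 0.1]))
print(good > bad)  # the analogous candidate should score higher
```

In the full method, each candidate answer of a new question yields one such score, and the answer whose link is most analogous to the supporting set (highest score) is returned as the best answer.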