Subordinating to the Majority: Factoid Question Answering over CQA Sites
Journal of Computational Information Systems 9: 16 (2013)

Xin LIAN, Xiaojie YUAN, Haiwei ZHANG
Institute of Computer Science and Technology, Nankai University, Tianjin, China

Project supported by the National Natural Science Foundation of China (No ).
Corresponding author. Email address: forwarding82@gmail.com (Xiaojie YUAN)
Copyright 2013 Binary Information Press. DOI: /jcis7716. August 15, 2013

Abstract
Question answering communities such as Yahoo! Answers have emerged as a popular medium for online information seeking and knowledge sharing. When an asker does not choose a best answer, the best answer may be chosen by voters. Unfortunately, the quality of submitted questions and answers varies widely, so a large fraction of the content is not usable for answering queries. Research on best answer selection is growing; however, existing approaches require large amounts of training data or manually labeled data, which limits the applicability of supervised approaches to new sites and domains. In this paper we address this problem using the similarity between answers. The similarity between any two answers is evaluated with the VSM (vector space model). We regard the similarity between two answers as their effect on each other, and this effect is propagated by iteration. The iteration stops when the computation reaches a stable state. Finally, the rank of answers depends on the iteration result and on the votes of other users. The experimental results show that our approach performs better than the baseline approaches.

Keywords: Community Question Answering; Best Answer Selection; Factoid Question; Answers Similarity

1 Introduction
Community Question Answering (CQA) has become a popular medium for online information seeking and knowledge sharing [1]. In the last few years, many CQA systems have been launched, including Yahoo! Answers, BuyAns and Live QnA. CQA sites make their content (questions and the associated answers submitted on the site) available. Rather than browsing search engine results, users present detailed information needs and get direct responses authored by humans. Su et al. [2] analyzed the quality of answers in QA portals and found that answer quality varies significantly. In addition, the ability, or inability, to obtain a high-quality answer has a significant impact on user satisfaction.

Previous approaches can be classified into three categories.
1) Probabilistic approaches [3, 4, 5, 6]: these study the content of CQA sites, including the content itself and the quality of the questions and answers. Some methods analyze the reputation systems and social norms on these sites.
2) Link-based approaches [7, 8]: these methods assume that good users give good answers. Their target is therefore to find experts for a given question from user networks, applying link-analysis algorithms such as PageRank [14] and HITS [15] to identify users with high expertise.
3) Learning-based approaches [9, 10, 11]: these methods extract various features from questions, answers and the users who posted them, and train a number of classifiers to select the best answer using those features.

In short, existing methods either require large amounts of supervision or focus only on the network properties of the CQA site. Some methods consider the content similarity between questions and answers, but not the content similarity between the answers themselves.

In this paper we present a ranking framework that takes advantage of the similarity between answers to retrieve a high-quality answer for a factoid question. For a factoid question, the best answer is generally definite, and the majority of people give similar answers. Our goal is to find the answer best supported by the similarity and by the votes of other users. We construct a similarity matrix by computing the similarity between every pair of answers, evaluated with the VSM (vector space model). The score of an answer is the expected score of the answers it is similar to. The effect between two answers is described by their similarity and is propagated by iteration. The iteration stops when the computation reaches a stable state. The answer scores are then adjusted with the vote information. The experimental results show that our approach performs better than the baseline approaches.
To our knowledge, this is the first method that requires neither training data nor manually labeled data, which makes it suitable for the large number of question-answer pairs in CQA sites. The rest of this paper is organized as follows. Section 2 reviews prior work related to our approach. Section 3 details the proposed method, including the algorithms. Section 4 reports the performance study. Finally, we conclude the paper in Section 5.

2 Related Work
Probabilistic approaches focus on the content of CQA sites. Bian et al. [4] utilized user interactions to retrieve relevant high-quality content in social media. They explored an algorithm that integrates relevance, user interaction and community feedback information to find the right factual, well-formed content to answer a user's question. Wang et al. [5] assumed that answers are connected to their questions with various types of latent links, and proposed an analogical reasoning-based approach that measures the analogy between new question-answer linkages and those of previous relevant knowledge containing only positive links; the candidate answer with the most analogous link is assumed to be the best answer.

Link-based methods have been shown to be successful for several tasks in social media. Their target is to discover user authority from user networks, a task also called expert finding. Jurczyk et al. [7] and Zhang et al. [8] evaluated the link-analysis algorithms PageRank and HITS to rank users by their authority scores. The difference is that Zhang et al. applied them to a small data set.

Some researchers resorted to machine learning techniques. Jeon et al. [9] extracted a number of non-textual features covering the contextual information of questions and their answers, and proposed a language modeling-based retrieval model for processing these features in order
to predict the quality of answers collected from a specific CQA service. Agichtein et al. [12] introduced a general classification framework for combining evidence from different sources of information that can be tuned automatically for a given social media type and quality definition. Blooma et al. [13] proposed more features, textual and non-textual, and used regression analyzers to generate predictive features for best answer identification. Shah and Pomerantz [11] measured the quality of answers in CQA sites by extracting various features and training a number of classifiers to select the best answer using those features. Their definition of quality is akin to popularity. Bian et al. [10] developed a semi-supervised coupled mutual reinforcement framework for simultaneously calculating content quality and user reputation, which requires relatively few labeled examples to initialize the training process.

Closest to our work, Ko et al. [3] developed a unified framework that not only uses multiple resources for validating answer candidates, but also considers evidence of similarity among answer candidates in order to boost the ranking of the correct answer. In another paper [16], they applied a probabilistic graphical model to answer ranking in question answering. This model estimates the joint probability of correctness of all answer candidates, from which the probability of correctness of an individual candidate can be inferred. The joint prediction model can estimate both the correctness of individual answers and their correlations, which makes it possible to produce a list of accurate and comprehensive answers. However, both methods need training data.

3 Prediction Model
In this section we describe how to find the best answer to a factoid question over CQA sites. We start with a more precise definition of the problem of best answer retrieval.
3.1 Problem definition
In QA systems, a very large number of questions and answers are posted by a diverse community of users. One posted question can attract several answers from a number of different users. For a factoid question, the best answer is generally definite, and there is some similarity between the answers. Our goal is to find the answer most supported by this similarity; the most supported answer is regarded as the best answer.

Definition 1 (Score of answers) The score of an answer $A_i$ in an answer set $A$, denoted $\mathrm{score}(A_i)$, is the probability of $A_i$ being the best answer.

We abstract the social content in a QA system as a set of question-answer-vote triples $\langle q, A, V \rangle$, where $q$ is a factoid question in the whole archive of the QA system, $A$ is the answer set of this question, and $V$ is the set of votes corresponding to the answers. Each answer can receive positive votes (thumbs up) and negative votes (thumbs down). For $A_i$, the vote information of the $i$-th answer to question $q$ is $V_i = \langle upnum, downnum \rangle$.
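To make the triple concrete, here is a minimal Python sketch of one $\langle q, A, V \rangle$ record. The class and field names (`Votes`, `QATriple`) and the sample question are hypothetical illustrations, not part of the paper; keeping answers and votes as parallel lists is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Votes:
    """V_i = <upnum, downnum> for a single answer A_i."""
    upnum: int = 0    # thumbs-up votes
    downnum: int = 0  # thumbs-down votes


@dataclass
class QATriple:
    """One hypothetical <q, A, V> record; answers[i] and votes[i] describe A_i."""
    q: str
    answers: list[str] = field(default_factory=list)
    votes: list[Votes] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every answer A_i must have exactly one vote record V_i.
        if len(self.answers) != len(self.votes):
            raise ValueError("answers and votes must have the same length")


# Hypothetical example: a factoid question with three answers.
triple = QATriple(
    q="Who wrote Hamlet?",
    answers=["Shakespeare wrote Hamlet.", "William Shakespeare", "Christopher Marlowe"],
    votes=[Votes(3, 0), Votes(1, 0), Votes(0, 2)],
)
```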
We first discuss how to describe the similarity between answers from their text content, and then discuss a support matrix for iterative computation. Finally, we discuss how to integrate vote information for best answer retrieval.

3.2 Support matrix
For question $q$, the similarity between answers $A_i$ and $A_j$ is described by the text similarity $\mathrm{sim}(A_i, A_j)$, which is based on the VSM (vector space model) with nouns as index terms. For answer set $A$, we compute $\mathrm{sim}(A_i, A_j)$ for every pair of answers ($i \neq j$). Since $\mathrm{sim}(A_i, A_j) = \mathrm{sim}(A_j, A_i)$ and $\mathrm{sim}(A_i, A_i) = 0$, only $n(n-1)/2$ similarities need to be computed, where $n$ is the number of answers in $A$. The similarity matrix is defined as follows:

$$
M_{sim} = \begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1n} \\
s_{21} & s_{22} & \cdots & s_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
s_{n1} & s_{n2} & \cdots & s_{nn}
\end{pmatrix}, \quad s_{ij} = \mathrm{sim}(A_i, A_j) \tag{1}
$$

For $A_i$ and $A_j$, $A_i$ may be similar only to $A_j$ while $A_j$ is similar to many answers, so the support degree between them differs in the two directions. We therefore normalize the similarity matrix to describe the support degree from other answers; the result is called the support matrix $M_{sup}$:

$$
M_{sup} = \begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1n} \\
t_{21} & t_{22} & \cdots & t_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
t_{n1} & t_{n2} & \cdots & t_{nn}
\end{pmatrix}, \quad t_{ij} = \frac{s_{ij}}{\sum_{k=1}^{n} s_{kj}}, \; s_{ij} \in M_{sim} \tag{2}
$$

$M_{sim}$ focuses on the relationship between two answers, whereas $M_{sup}$ also takes the effect of the other answers into account. In general, $t_{ij} \neq t_{ji}$ in $M_{sup}$.

3.3 Iterative computation
An answer is ranked higher when more answers are similar to it, and an answer supported by many answers with high scores receives a high rank itself. If no answer is similar to an answer, that answer has no support. As in authority-hub analysis (HITS) and PageRank, BestFinder adopts an iterative method to compute the scores of answers. Initially, it has very little information about the answers. At each iteration, BestFinder updates the scores of the answers, and it stops when the computation reaches a stable state.
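Before iterating, BestFinder needs the similarity and support matrices of Section 3.2. The sketch below builds both under stated assumptions: the nouns of each answer are taken as already extracted (the paper does not say how), each answer is a raw term-frequency vector, and cosine similarity stands in for the unspecified VSM measure. The function names are hypothetical. Columns whose similarity sum is zero are left as zeros, which the paper also does not specify.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (assumed VSM measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(noun_lists: list[list[str]]) -> list[list[float]]:
    """Eq. (1): s_ij = sim(A_i, A_j), with s_ii = 0 on the diagonal."""
    vecs = [Counter(nouns) for nouns in noun_lists]
    n = len(vecs)
    m = [[0.0] * n for _ in range(n)]
    # sim is symmetric, so only the n(n-1)/2 pairs with i < j are computed.
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = cosine(vecs[i], vecs[j])
    return m


def support_matrix(m_sim: list[list[float]]) -> list[list[float]]:
    """Eq. (2): t_ij = s_ij / sum_k s_kj, i.e. each column is normalized."""
    n = len(m_sim)
    col_sums = [sum(m_sim[k][j] for k in range(n)) for j in range(n)]
    return [
        [m_sim[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical noun lists for three answers to the same question.
nouns = [["shakespeare", "hamlet"], ["shakespeare", "hamlet"], ["shakespeare"]]
m_sim = similarity_matrix(nouns)
m_sup = support_matrix(m_sim)
```

In this example $M_{sim}$ is symmetric, but $t_{02} = 0.5$ while $t_{20} = \sqrt{2} - 1 \approx 0.414$, illustrating why $t_{ij} \neq t_{ji}$ after normalization.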
The score of an answer is the expected score of the answers it is similar to. For answer $A_i$, we compute $\mathrm{score}(A_i)$ by averaging the support-weighted scores of the answers that have a nonzero support degree to $A_i$:

$$
\mathrm{score}(A_i) = \frac{\sum_{j \neq i,\; M_{sup}[j,i] > 0} \mathrm{score}(A_j) \cdot M_{sup}[j,i]}{m} \tag{3}
$$
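One application of Eq. (3) can be sketched as follows. The function name `update_scores` and the 4-answer support matrix are hypothetical; the matrix is chosen so that each nonzero column sums to 1, as Eq. (2) produces, and answer 3 has no support at all. Following Algorithm 1, an answer with no supporting answers receives a score of 0.

```python
def update_scores(m_sup: list[list[float]], old: list[float]) -> list[float]:
    """One step of Eq. (3): average old[j] * M_sup[j][i] over the answers j
    (j != i) that support A_i; unsupported answers get 0."""
    n = len(old)
    new = [0.0] * n
    for i in range(n):
        total, m = 0.0, 0  # m = number of answers similar to A_i
        for j in range(n):
            if j != i and m_sup[j][i] > 0:
                total += old[j] * m_sup[j][i]
                m += 1
        new[i] = total / m if m else 0.0
    return new


# Hypothetical support matrix: answers 0-2 support each other, answer 3 is isolated.
m_sup = [
    [0.0, 0.5, 0.4, 0.0],
    [0.6, 0.0, 0.6, 0.0],
    [0.4, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
new_scores = update_scores(m_sup, [0.4, 0.3, 0.2, 0.1])
```

For example, answer 0 is supported by answers 1 and 2, so its new score is $(0.3 \cdot 0.6 + 0.2 \cdot 0.4)/2 = 0.13$.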
where $m$ is the number of answers that are similar to $A_i$. We choose an initial state in which all answers have a uniform score $s_0 = 1/n$, where $n$ is the number of answers. In each iteration, BestFinder increases the scores of high-quality answers and reduces the scores of low-quality answers. It stops iterating when it reaches a stable state, which is measured by the change in the scores of all answers: if the scores change only slightly after an iteration, BestFinder stops.

Algorithm 1: Iterative computation function
Input: support matrix M_sup[n][n], old answer score array oldscore[n]
Output: new answer score array newscore[n]

    for i = 0; i < n; i++ do
        newscore[i] = 0;
        nonzeronum = 0;
        for j = 0; j < n; j++ do
            /* count the answers that are similar to answer A_i */
            if M_sup[j][i] > 0 then
                newscore[i] = newscore[i] + oldscore[j] * M_sup[j][i];
                nonzeronum = nonzeronum + 1;
            end
        end
        /* take the expected score */
        if nonzeronum > 0 then
            newscore[i] = newscore[i] / nonzeronum;
        end
    end

3.4 Answers score
On Yahoo! Answers, after reading the existing answers to a question, a user can give his or her judgment as an evaluation of the answers. If the user considers an answer useful, he or she can add a plus vote to it; otherwise, a minus vote may be added. If the asker does not choose a best answer after a fixed period of time, the best answer may be chosen by the voters. Vote information from other users is therefore an important factor for answer selection, and an answer's score should integrate both the answer similarity and the vote information. We introduce two effect factors, $\alpha$ and $\beta$, to describe the effect degree of answer similarity and of other users' votes, and define the final score of an answer as follows:

$$
\mathrm{score}'(A_i) = \alpha \cdot \frac{\mathrm{score}(A_i)}{\sum_{k=1}^{n} \mathrm{score}(A_k)} + \beta \cdot \frac{V_i^{up} + 1}{V_i^{up} + V_i^{down} + 2}, \quad \alpha + \beta = 1 \tag{4}
$$

In the experiment, we set $\alpha = 0.6$ and $\beta = 0.4$.
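Eq. (4) can be sketched as below, with the paper's defaults $\alpha = 0.6$ and $\beta = 0.4$. The function name `combine_scores` is hypothetical, and so is the guard that sets the similarity term to 0 when every similarity score is 0 (Eq. (4) would otherwise divide by zero). Votes are passed as `(upnum, downnum)` pairs.

```python
import math


def combine_scores(sim_scores, votes, alpha=0.6, beta=0.4):
    """Eq. (4): alpha * normalized similarity score + beta * smoothed vote ratio."""
    if not math.isclose(alpha + beta, 1.0):
        raise ValueError("alpha + beta must equal 1")
    total = sum(sim_scores)
    result = []
    for score, (up, down) in zip(sim_scores, votes):
        # Assumed guard: with no similarity anywhere, the first term is 0.
        share = score / total if total > 0 else 0.0
        # +1 / +2 smoothing: an answer with no votes gets a vote term of 0.5.
        vote_term = (up + 1) / (up + down + 2)
        result.append(alpha * share + beta * vote_term)
    return result


# Hypothetical similarity scores and (upnum, downnum) votes for three answers.
scores = combine_scores([0.25, 0.25, 0.0], [(1, 0), (0, 0), (5, 0)])
```

Here answer 0 scores $0.6 \cdot 0.5 + 0.4 \cdot 2/3 \approx 0.567$, ahead of answer 2 ($0.4 \cdot 6/7 \approx 0.343$) even though answer 2 has more up votes.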
Some answers have no votes. In this case, the vote support of the answer should be 0.5; that is, the probability of being supported equals the probability of being opposed. Therefore, we add a constant 2 to the denominator and a constant 1 to the numerator. For an answer set $A$, if there is no similarity between any two answers, the scores of the answers are decided by the vote information alone. Finally, the answer with the maximal score is selected as the best answer.
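Putting Sections 3.2 to 3.4 together, a compact end-to-end sketch follows. It starts from a precomputed similarity matrix (Eq. 1), so the VSM step is omitted. The function name `best_finder`, the threshold `eps` and the cap `max_iter` are assumptions: the paper only says iteration stops when scores change little, and Section 4.4 measures that change as the Euclidean distance between successive score vectors. The second example checks the fallback described above: with no similarity at all, the ranking is decided by votes alone.

```python
import math


def support_matrix(m_sim):
    """Eq. (2): t_ij = s_ij / sum_k s_kj; all-zero columns stay zero."""
    n = len(m_sim)
    col = [sum(m_sim[k][j] for k in range(n)) for j in range(n)]
    return [[m_sim[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]


def update(m_sup, old):
    """Eq. (3) / Algorithm 1: mean of old[j] * M_sup[j][i] over supporting answers j."""
    n = len(old)
    new = []
    for i in range(n):
        terms = [old[j] * m_sup[j][i] for j in range(n) if j != i and m_sup[j][i] > 0]
        new.append(sum(terms) / len(terms) if terms else 0.0)
    return new


def best_finder(m_sim, votes, alpha=0.6, beta=0.4, eps=1e-6, max_iter=100):
    """Return (index of best answer, final scores). eps and max_iter are assumed."""
    n = len(m_sim)
    m_sup = support_matrix(m_sim)
    scores = [1.0 / n] * n  # uniform initial score s_0 = 1/n
    for _ in range(max_iter):
        new = update(m_sup, scores)
        change = math.dist(new, scores)  # Euclidean distance between iterations
        scores = new
        if change < eps:
            break
    total = sum(scores)
    final = [
        alpha * (s / total if total > 0 else 0.0) + beta * (up + 1) / (up + down + 2)
        for s, (up, down) in zip(scores, votes)
    ]
    best = max(range(n), key=final.__getitem__)
    return best, final


# Hypothetical example 1: answers 0 and 1 agree; answers 2 and 3 are isolated.
m_sim = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
best, final = best_finder(m_sim, [(1, 0), (0, 0), (5, 0), (0, 3)])

# Hypothetical example 2: no similarity between any two answers.
zeros = [[0.0] * 3 for _ in range(3)]
best_nosim, final_nosim = best_finder(zeros, [(1, 0), (4, 1), (0, 0)])
```

In the first example the two mutually similar answers share the whole similarity term, so answer 0 outranks answer 2 despite answer 2's higher vote count. In the second, every similarity score is 0 and answer 1 wins on its vote term $5/7$.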
4 Experimental Evaluation
We now describe the measures used for the evaluation, the dataset and the experimental results. For evaluation, we consider an answer to be a high-quality answer if the asker chose it as the best answer and gave it a rating of at least 3.

4.1 Datasets
We use the same datasets as [10]. Its authors used the TREC QA benchmarks to crawl QA archives and related user information, by submitting TREC QA queries to the CQA site and retrieving the returned questions, answers and related users. The factoid questions come from seven years of the TREC QA track evaluations (years ). Each TREC query was submitted to the Yahoo! Answers web service, retrieving up to 10 top-ranked related questions according to the Yahoo! Answers ranking. Details of the data collection can be found in [10]. There are, in total, users, questions and answers. Note that, although the proportion of factoid questions in Yahoo! Answers may not be large, we use them in order to have an objective metric of correctness, and extrapolate performance to whole QA archives.

4.2 Evaluation metrics
We consider an answer to be a high-quality answer if the asker chose it as the best answer and gave it a rating of at least 3. Therefore, there is only one correct answer for each question. Two metrics were used for the evaluation. The first is Accuracy, which reports the fraction of questions for which the top-ranked answer is the one chosen as the best answer. We used the best answer tagged by the Yahoo! Answers web site as the ground truth. Since Accuracy ignores the exact rank of a correct answer, we also used the Mean Reciprocal Rank (MRR) metric. The MRR of each individual query is the reciprocal of the rank at which the first relevant answer was returned, or 0 if none of the top N results contained a relevant answer.
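Both metrics can be sketched in a few lines, relying on the fact stated above that each question has exactly one correct answer. The function names, the answer IDs and the optional `top_n` cutoff are hypothetical; the paper does not give a value for N, so by default the whole ranking is searched.

```python
def reciprocal_rank(ranked_ids, best_id, top_n=None):
    """1 / rank of the ground-truth best answer, or 0 if it is not in the top N."""
    candidates = ranked_ids if top_n is None else ranked_ids[:top_n]
    for rank, answer_id in enumerate(candidates, start=1):
        if answer_id == best_id:
            return 1.0 / rank
    return 0.0


def accuracy_and_mrr(runs, top_n=None):
    """runs: one (ranked answer ids, ground-truth best answer id) pair per question."""
    accuracy = sum(1 for ranked, best in runs if ranked and ranked[0] == best) / len(runs)
    mrr = sum(reciprocal_rank(ranked, best, top_n) for ranked, best in runs) / len(runs)
    return accuracy, mrr


# Hypothetical rankings for three questions.
runs = [
    (["a2", "a1", "a3"], "a2"),  # best answer ranked 1st -> RR = 1
    (["b1", "b3", "b2"], "b3"),  # best answer ranked 2nd -> RR = 0.5
    (["c1", "c2"], "c9"),        # best answer not returned -> RR = 0
]
acc, mrr = accuracy_and_mrr(runs)
```

Here Accuracy is $1/3$ and MRR is $(1 + 0.5 + 0)/3 = 0.5$.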
The score for a sequence of queries is the mean of the individual queries' reciprocal ranks. Thus, MRR is calculated as:

$$
MRR = \frac{1}{|Q_r|} \sum_{q \in Q_r} \frac{1}{r_q}
$$

where $Q_r$ is the set of evaluated queries and $r_q$ is the rank of the first relevant answer for query $q$ (the term $1/r_q$ is taken as 0 when no relevant answer appears in the top N).

4.3 Methods compared
To our knowledge, this is the first method that requires neither training data nor manually labeled data. To evaluate Q&A quality, we compare our method with the following baselines:
Baseline BestRatio: answers are ranked by the best answer ratio of their answerers, that is, the fraction of an answerer's answers that were chosen as the best answer. It indicates an answerer's authority.
Baseline Votes: answers are ranked by a score computed as the difference between the thumbs-up votes and the thumbs-down votes received by each answer. This ranking closely approximates
the ranking obtained when a user clicks the "order by votes" option on the Yahoo! Answers site. The details of this method and how to compute MRR under this setting are discussed in [4].

4.4 Experimental results
Figure 1 illustrates the performance of our method and the baselines with a varying number of candidate answers. BestFinder significantly outperformed the baselines with fewer than 5 candidate answers. Baseline Votes is stable with more than 8 candidate answers, whereas BestFinder is less effective with more candidate answers. This is due to the complexity of CQA sites: there may be several correct answers, but the system requires only one best answer, and the choice of the best answer depends on the asker, who may take subjective factors into account. In contrast to traditional QA, answerers give more detailed descriptions, which decreases the weight of keywords. In fact, most questions have fewer than 5 answers, so BestFinder is more effective overall.

Fig. 1: MRR and Accuracy of BestFinder and baselines for varying number of candidate answers

Figure 2 shows the change in answer scores after each iteration, defined as the Euclidean distance between the old and new scores. The BestFinder@number of answers curves converge at a steady speed, so BestFinder does not require many iterations to reach a stable state.

Fig. 2: Changes of answers' scores after each iteration

5 Conclusions
We presented a framework for unsupervised best answer selection for factoid questions in Community Question Answering. We regard the similarity between any two answers as their effect on each other, and this effect is propagated by iteration. The iteration stops when the computation reaches a stable state. Finally, the rank of answers depends on the iteration result and the votes of other users. We have demonstrated the effectiveness of BestFinder in large-scale experiments on a CQA dataset comprising over 100,000 users, 27,000 questions and 200,000 answers. In contrast to supervised methods, BestFinder requires neither training data nor manually labeled data. In addition, our experiments demonstrate significant improvements over the baselines, especially for questions with few answers.

References
[1] L. A. Adamic, J. Zhang, E. Bakshy and M. S. Ackerman. Knowledge sharing and Yahoo Answers: Everyone knows something. In: Proc. of WWW, 2008.
[2] Q. Su, D. Pavlov, J. Chow and W. Baker. Internet-scale collection of human-reviewed data. In: Proc. of WWW, 2007.
[3] J. Ko, L. Si and E. Nyberg. A probabilistic framework for answer selection in question answering. In: Proc. of NAACL HLT, 2007.
[4] J. Bian, Y. Liu, E. Agichtein and H. Zha. Finding the right facts in the crowd: Factoid question answering over social media. In: Proc. of WWW, 2008.
[5] X. Wang, X. Tu, D. Feng and L. Zhang. Ranking community answers by modeling question-answer relationships via analogical reasoning. In: Proc. of SIGIR, 2009.
[6] J. Liu, S. Wang, Y. Peng, X. Huang and W. Wang. Answer Extraction of Chinese Restricted Domain Question Answering System Based on Ontology. Journal of Computational Information Systems, 2010, 6(1).
[7] P. Jurczyk and E. Agichtein. Discovering authorities in question answer communities by using link analysis. In: Proc. of ACM CIKM, 2007.
[8] J. Zhang, M. S. Ackerman and L. Adamic. Expertise networks in online communities: structure and algorithms. In: Proc. of WWW, 2007.
[9] J. Jeon, W. Croft, J. Lee and S. Park. A framework to predict the quality of answers with non-textual features. In: Proc. of SIGIR, 2006.
[10] J. Bian, Y. Liu, D. Zhou, E. Agichtein and H. Zha. Learning to recognize reliable users and content in social media with coupled mutual reinforcement. In: Proc. of WWW, 2009.
[11] C. Shah and J. Pomerantz. Evaluating and predicting answer quality in community QA. In: Proc. of SIGIR, 2010.
[12] E. Agichtein, C. Castillo, D. Donato, A. Gionis and G. Mishne. Finding high-quality content in social media with an application to community-based question answering. In: Proc. of WSDM, 2008.
[13] M. Blooma, A. Chua and D. Goh. A predictive framework for retrieving the best answer. In: Proc. of SAC, 2008.
[14] L. Page, S. Brin, R. Motwani and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
[15] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 1999, 46(5).
[16] J. Ko, L. Si and E. Nyberg. A probabilistic graphical model for joint answer ranking in question answering. In: Proc. of SIGIR, 2007.
More informationEvolution of Experts in Question Answering Communities
Evolution of Experts in Question Answering Communities Aditya Pal, Shuo Chang and Joseph A. Konstan Department of Computer Science University of Minnesota Minneapolis, MN 55455, USA {apal,schang,konstan}@cs.umn.edu
More informationA PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH
International Journal of Computer Science and System Analysis Vol. 5, No. 1, January-June 2011, pp. 37-43 Serials Publications ISSN 0973-7448 A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED
More informationInteractive Chinese Question Answering System in Medicine Diagnosis
Interactive Chinese ing System in Medicine Diagnosis Xipeng Qiu School of Computer Science Fudan University xpqiu@fudan.edu.cn Jiatuo Xu Shanghai University of Traditional Chinese Medicine xjt@fudan.edu.cn
More informationQASM: a Q&A Social Media System Based on Social Semantics
QASM: a Q&A Social Media System Based on Social Semantics Zide Meng, Fabien Gandon, Catherine Faron-Zucker To cite this version: Zide Meng, Fabien Gandon, Catherine Faron-Zucker. QASM: a Q&A Social Media
More informationDetecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at: www.ijarcsms.com
More informationEarly Detection of Potential Experts in Question Answering Communities
Early Detection of Potential Experts in Question Answering Communities Aditya Pal 1, Rosta Farzan 2, Joseph A. Konstan 1, and Robert Kraut 2 1 Dept. of Computer Science and Engineering, University of Minnesota
More informationProbabilistic topic models for sentiment analysis on the Web
University of Exeter Department of Computer Science Probabilistic topic models for sentiment analysis on the Web Chenghua Lin September 2011 Submitted by Chenghua Lin, to the the University of Exeter as
More informationPredicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
More informationNew Metrics for Reputation Management in P2P Networks
New for Reputation in P2P Networks D. Donato, M. Paniccia 2, M. Selis 2, C. Castillo, G. Cortesi 3, S. Leonardi 2. Yahoo!Research Barcelona Catalunya, Spain 2. Università di Roma La Sapienza Rome, Italy
More informationInternational Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer.
RESEARCH ARTICLE SURVEY ON PAGERANK ALGORITHMS USING WEB-LINK STRUCTURE SOWMYA.M 1, V.S.SREELAXMI 2, MUNESHWARA M.S 3, ANIL G.N 4 Department of CSE, BMS Institute of Technology, Avalahalli, Yelahanka,
More informationII. RELATED WORK. Sentiment Mining
Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract
More informationPersonalizing Image Search from the Photo Sharing Websites
Personalizing Image Search from the Photo Sharing Websites Swetha.P.C, Department of CSE, Atria IT, Bangalore swethapc.reddy@gmail.com Aishwarya.P Professor, Dept.of CSE, Atria IT, Bangalore aishwarya_p27@yahoo.co.in
More informationRemoving Web Spam Links from Search Engine Results
Removing Web Spam Links from Search Engine Results Manuel EGELE pizzaman@iseclab.org, 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features
More informationEffective Data Retrieval Mechanism Using AML within the Web Based Join Framework
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted
More informationSIGIR 2004 Workshop: RIA and "Where can IR go from here?"
SIGIR 2004 Workshop: RIA and "Where can IR go from here?" Donna Harman National Institute of Standards and Technology Gaithersburg, Maryland, 20899 donna.harman@nist.gov Chris Buckley Sabir Research, Inc.
More informationIdentifying Best Bet Web Search Results by Mining Past User Behavior
Identifying Best Bet Web Search Results by Mining Past User Behavior Eugene Agichtein Microsoft Research Redmond, WA, USA eugeneag@microsoft.com Zijian Zheng Microsoft Corporation Redmond, WA, USA zijianz@microsoft.com
More informationIdentifying Influential Scholars in Academic Social Media Platforms
Identifying Influential Scholars in Academic Social Media Platforms Na Li, Denis Gillet École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland {na.li, denis.gillet}@epfl.ch Abstract
More informationImproving Web Page Retrieval using Search Context from Clicked Domain Names
Improving Web Page Retrieval using Search Context from Clicked Domain Names Rongmei Li School of Electrical, Mathematics, and Computer Science University of Twente P.O.Box 217, 7500 AE, Enschede, the Netherlands
More informationBuilding a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
More informationBoosting the Feature Space: Text Classification for Unstructured Data on the Web
Boosting the Feature Space: Text Classification for Unstructured Data on the Web Yang Song 1, Ding Zhou 1, Jian Huang 2, Isaac G. Councill 2, Hongyuan Zha 1,2, C. Lee Giles 1,2 1 Department of Computer
More informationResolving Common Analytical Tasks in Text Databases
Resolving Common Analytical Tasks in Text Databases The work is funded by the Federal Ministry of Economic Affairs and Energy (BMWi) under grant agreement 01MD15010B. Database Systems and Text-based Information
More informationA STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then
More information1 o Semestre 2007/2008
Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction
More informationIntelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives
Intelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives Search The Way You Think Copyright 2009 Coronado, Ltd. All rights reserved. All other product names and logos
More informationHITS vs. Non-negative Matrix Factorization
Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 HITS vs. Non-negative Matrix Factorization Yuanzhe Cai, Sharma Chakravarthy Technical Report CSE 2014
More informationSearching Questions by Identifying Question Topic and Question Focus
Searching Questions by Identifying Question Topic and Question Focus Huizhong Duan 1, Yunbo Cao 1,2, Chin-Yew Lin 2 and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China, 200240 {summer, yyu}@apex.sjtu.edu.cn
More informationCQARank: Jointly Model Topics and Expertise in Community Question Answering
CQARank: Jointly Model Topics and Expertise in Community Question Answering Liu Yang,, Minghui Qiu, Swapna Gottipati, Feida Zhu, Jing Jiang, Huiping Sun, Zhong Chen School of Software and Microelectronics,
More informationCAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance Shen Wang, Bin Wang and Hao Lang, Xueqi Cheng Institute of Computing Technology, Chinese Academy of
More informationRESEARCH ISSUES IN COMMUNITY BASED QUESTION ANSWERING
RESEARCH ISSUES IN COMMUNITY BASED QUESTION ANSWERING Mohan John Blooma, Centre of Commerce, RMIT International University, Ho Chi Minh City, Vietnam, blooma.john@rmit.edu.vn Jayan Chirayath Kurian, Centre
More informationIII. DATA SETS. Training the Matching Model
A Machine-Learning Approach to Discovering Company Home Pages Wojciech Gryc Oxford Internet Institute University of Oxford Oxford, UK OX1 3JS Email: wojciech.gryc@oii.ox.ac.uk Prem Melville IBM T.J. Watson
More informationExploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering
Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Guangyou Zhou, Kang Liu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese
More informationINFERRING ANSWER QUALITY, ANSWERER EXPERTISE, AND RANKING IN QUESTION ANSWER SOCIAL NETWORKS YUANZHE CAI
INFERRING ANSWER QUALITY, ANSWERER EXPERTISE, AND RANKING IN QUESTION ANSWER SOCIAL NETWORKS by YUANZHE CAI Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial
More informationImproving Question Retrieval in Community Question Answering Using World Knowledge
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang
More informationLow Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment
2009 10th International Conference on Document Analysis and Recognition Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment Ahmad Abdulkader Matthew R. Casey Google Inc. ahmad@abdulkader.org
More informationNetwork Big Data: Facing and Tackling the Complexities Xiaolong Jin
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10
More informationThe Social World of Content Abusers in Community Question Answering
The Social World of Content Abusers in Community Question Answering Imrul Kayes Computer Science and Engineering University of South Florida Tampa FL, USA imrul@mail.usf.edu Adriana Iamnitchi Computer
More informationTowards Inferring Web Page Relevance An Eye-Tracking Study
Towards Inferring Web Page Relevance An Eye-Tracking Study 1, iconf2015@gwizdka.com Yinglong Zhang 1, ylzhang@utexas.edu 1 The University of Texas at Austin Abstract We present initial results from a project,
More informationData Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority
More informationEvidentiality for Text Trustworthiness Detection
Evidentiality for Text Trustworthiness Detection Qi Su 1, 2, Chu-Ren Huang and Helen Kai-yun Chen 1 Depart of Chinese & Bilingual Studies, The Hong Kong Polytechnic University 2 Key Laboratory of Computational
More informationHow To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
More informationAn Evaluation of Classification Models for Question Topic Categorization
An Evaluation of Classification Models for Question Topic Categorization Bo Qu, Gao Cong, Cuiping Li, Aixin Sun, Hong Chen Renmin University, Beijing, China {qb8542,licuiping,chong}@ruc.edu.cn Nanyang
More informationMALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of
More informationHow To Cluster On A Search Engine
Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING
More informationThe multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
More informationSpam Host Detection Using Ant Colony Optimization
Spam Host Detection Using Ant Colony Optimization Arnon Rungsawang, Apichat Taweesiriwate and Bundit Manaskasemsak Abstract Inappropriate effort of web manipulation or spamming in order to boost up a web
More informationLDA Based Security in Personalized Web Search
LDA Based Security in Personalized Web Search R. Dhivya 1 / PG Scholar, B. Vinodhini 2 /Assistant Professor, S. Karthik 3 /Prof & Dean Department of Computer Science & Engineering SNS College of Technology
More informationMultimedia Answer Generation from Web Information
Multimedia Answer Generation from Web Information Avantika Singh Information Science & Engg, Abhimanyu Dua Information Science & Engg, Gourav Patidar Information Science & Engg Pushpalatha M N Information
More informationCorporate Leaders Analytics and Network System (CLANS): Constructing and Mining Social Networks among Corporations and Business Elites in China
Corporate Leaders Analytics and Network System (CLANS): Constructing and Mining Social Networks among Corporations and Business Elites in China Yuanyuan Man, Shuai Wang, Yi Li, Yong Zhang, Long Cheng,
More informationAnalyzing Download Time Performance of University Websites in India
, pp.1-6 http://dx.doi.org/10.14257/ijwse.2014.1.1.01 Analyzing Time Performance of University Websites in India G. Sreedhar Associate Professor Department of Computer Science, Rashtriya Sanskrit Vidyapeetha
More informationPSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
More informationRevenue Optimization with Relevance Constraint in Sponsored Search
Revenue Optimization with Relevance Constraint in Sponsored Search Yunzhang Zhu Gang Wang Junli Yang Dakan Wang Jun Yan Zheng Chen Microsoft Resarch Asia, Beijing, China Department of Fundamental Science,
More informationLarge Scale Learning to Rank
Large Scale Learning to Rank D. Sculley Google, Inc. dsculley@google.com Abstract Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing
More informationFacebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More information