Subordinating to the Majority: Factoid Question Answering over CQA Sites


Journal of Computational Information Systems 9: 16 (2013)

Xin LIAN, Xiaojie YUAN, Haiwei ZHANG
Institute of Computer Science and Technology, Nankai University, Tianjin, China

Project supported by the National Natural Science Foundation of China (No. ). Corresponding author. Email address: (Xiaojie YUAN). Copyright 2013 Binary Information Press. DOI: /jcis7716. August 15, 2013.

Abstract
Question answering communities such as Yahoo! Answers have emerged as a popular medium for online information seeking and knowledge sharing. When an asker doesn't choose a best answer, the best answer may be chosen by voters. Unfortunately, the quality of submitted questions and answers varies widely, so a large fraction of the content is not usable for answering queries. There is a growing body of research on best answer selection. However, these approaches require large amounts of training data or manually labeled data, which limits the applicability of supervised approaches to new sites and domains. In this paper we address this problem by exploiting the similarity between answers. The similarity between any two answers is evaluated with the vector space model (VSM). We regard the similarity as the effect the two answers have on each other, and this effect is propagated by iteration. The iteration stops when the computation reaches a stable state. Finally, the answers are ranked by combining the iteration result with the votes of other users. The experimental results show that our approach performs better than the baseline approaches.

Keywords: Community Question Answering; Best Answer Selection; Factoid Question; Answer Similarity

1 Introduction
Community Question Answering (CQA) has become a popular medium for online information seeking and knowledge sharing [1]. In the last few years, many CQA systems have been launched, including Yahoo! Answers, BuyAns and Live QnA. CQA sites host content consisting of questions and the associated answers submitted on the site. Rather than browsing search engine results, users present detailed information needs and get direct responses authored by humans. Su et al. [2] analyzed the quality of answers in QA portals and found that answer quality varies significantly. Moreover, the ability, or inability, to obtain a high-quality answer has a significant impact on user satisfaction. Previous approaches can be classified into three categories:
1) Probabilistic approaches [3, 4, 5, 6]: these methods study the content of CQA sites, including analysis of the content and of the quality of questions and answers; some also analyze the reputation systems and social norms on these sites.
2) Link-based approaches [7, 8]: these methods assume that good users give good answers. Their goal is therefore to find experts for a given question through user networks, applying link-analysis algorithms such as PageRank [14] and HITS [15] to identify users with high expertise.
3) Learning-based approaches [9, 10, 11]: these methods extract various features from questions, answers and the users who posted them, and train a number of classifiers to select the best answer using those features.
In short, existing methods either require large amounts of supervision or focus only on the network properties of CQA. Some methods consider the content similarity between questions and answers, but not the content similarity among the answers themselves. In this paper we present a ranking framework that exploits the similarity between answers to retrieve high-quality answers for factoid questions. For a factoid question, the best answer is generally definite, and most people give similar answers. Our goal is to find the answer best supported by the similarity and the votes of others. We construct a similarity matrix by computing the similarity between every pair of answers with the vector space model (VSM). The score of an answer is the expected score of the answers it is similar to. The effect factor between two answers is given by their similarity, and it is propagated by iteration until the computation reaches a stable state. The answer scores are then adjusted with the vote information. The experimental results show that our approach performs better than the baseline approaches.
To our knowledge, this is the first method that requires neither training data nor manually labeled data, which makes it suitable for the large volume of questions and answers on CQA sites. The rest of this paper is organized as follows. Section 2 reviews prior work related to our approach. Section 3 details the proposed method and its algorithms. Section 4 reports the performance study. Finally, Section 5 concludes the paper.

2 Related Work
Probabilistic approaches focus on the content of CQA sites. Bian et al. [4] exploited user interactions to retrieve relevant high-quality content in social media. They explored how to integrate relevance, user interaction and community feedback to find the right factual, well-formed content to answer a user's question. Wang et al. [5] assumed that answers are connected to their questions through various types of latent links, and proposed an analogical reasoning-based approach that measures the analogy between new question-answer linkages and those in previous relevant knowledge containing only positive links; the candidate answer with the most analogous link is taken as the best answer.

Link-based methods have been shown to be successful for several tasks in social media. Their goal is to discover user authority through user networks, a task also called expert finding. Jurczyk et al. [7] and Zhang et al. [8] evaluated the link-analysis algorithms PageRank and HITS to rank users by their authority scores; the difference is that Zhang et al. applied them to a small data set.

Some researchers resorted to machine learning techniques. Jeon et al. [9] extracted a number of non-textual features covering the contextual information of questions and their answers, and proposed a language modeling-based retrieval model that processes these features in order to predict the quality of answers collected from a specific CQA service. Agichtein et al. [12] introduced a general classification framework for combining evidence from different sources of information, which can be tuned automatically for a given social media type and quality definition. Blooma et al. [13] proposed more features, textual and non-textual, and used regression analyzers to generate predictive features for best answer identification. Shah and Pomerantz [11] measured the quality of answers in CQA sites by extracting various features and training a number of classifiers to select the best answer using those features; their definition of quality is akin to popularity. Bian et al. [10] developed a semi-supervised coupled mutual reinforcement framework for simultaneously estimating content quality and user reputation, which requires relatively few labeled examples to initialize the training process.

Closest to our work, Ko et al. [3] developed a unified framework that not only uses multiple resources to validate answer candidates, but also considers evidence of similarity among answer candidates in order to boost the ranking of the correct answer. In another paper [16], they applied a probabilistic graphical model to answer ranking in question answering. This model estimates the joint probability of correctness of all answer candidates, from which the probability of correctness of an individual candidate can be inferred. The joint prediction model estimates both the correctness of individual answers and their correlations, which yields a list of accurate and comprehensive answers. However, both methods need training data.

3 Prediction Model
In this section we describe how to find the best answer to a factoid question on CQA sites. We start with a more precise definition of the best answer retrieval problem.
3.1 Problem definition
In QA systems, a very large number of questions and answers are posted by a diverse community of users. A posted question can attract several answers from different users. For a factoid question, the best answer is generally definite, and the answers share some similarity. Our goal is to find the answer most supported by this similarity; the most supported answer is regarded as the best answer.

Definition 1 (Score of answers) The score of an answer $A_i$ in an answer set $A$, denoted $score(A_i)$, is the probability of $A_i$ being the best answer.

We abstract the social content of a QA system as a set of question-answers-votes triples $\langle q, A, V \rangle$, where $q$ is a factoid question in the archive of the QA system, $A$ is the answer set for this question, and $V$ is the vote set corresponding to the answers. Each answer can receive positive votes (thumbs up) and negative votes (thumbs down). For $A_i$, the vote information of the $i$-th answer to question $q$ is $V_i = \langle upnum, downnum \rangle$.
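As a minimal sketch of the triple $\langle q, A, V \rangle$ and the vote tuple $V_i$, the following Python classes hold one question, its answers and their vote counts. The class and field names (`QATriple`, `Vote`) are hypothetical and chosen only for illustration; the paper does not prescribe a representation.

```python
from dataclasses import dataclass, field


@dataclass
class Vote:
    """V_i = <upnum, downnum>: thumbs-up and thumbs-down counts for answer A_i."""
    upnum: int = 0
    downnum: int = 0


@dataclass
class QATriple:
    """Hypothetical container for one triple <q, A, V>."""
    q: str                                       # the factoid question
    A: list[str] = field(default_factory=list)   # answer texts A_1..A_n
    V: list[Vote] = field(default_factory=list)  # V[i] belongs to A[i]

    def __post_init__(self) -> None:
        # Each answer A_i must have exactly one vote record V_i.
        if len(self.A) != len(self.V):
            raise ValueError("A and V must have the same length")


triple = QATriple(
    q="What is the capital of France?",
    A=["Paris", "It is Paris.", "Lyon"],
    V=[Vote(3, 0), Vote(1, 0), Vote(0, 2)],
)
print(len(triple.A), triple.V[2].downnum)  # 3 2
```

Keeping `A` and `V` as parallel lists mirrors the paper's indexing, where $V_i$ is the vote information of $A_i$.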

We first discuss how to describe the similarity between answers from their text content, and then introduce a support matrix for the iterative computation. Finally, we discuss how to integrate vote information into best answer retrieval.

3.2 Support matrix
For a question $q$, the similarity between answers $A_i$ and $A_j$ is their text similarity $sim(A_i, A_j)$, computed with the vector space model (VSM) using nouns as index terms. For an answer set $A$, we compute $sim(A_i, A_j)$ for every pair of answers ($i \neq j$). Since $sim(A_i, A_j) = sim(A_j, A_i)$ and $sim(A_i, A_i) = 0$, the time complexity is $n(n-1)/2$ similarity computations, where $n$ is the number of answers in $A$. The similarity matrix is defined as follows:

$$
M_{sim} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{n1} & s_{n2} & \cdots & s_{nn} \end{pmatrix}, \quad s_{ij} = sim(A_i, A_j) \tag{1}
$$

Similarity alone does not describe how much support one answer gives another: $A_i$ may be similar only to $A_j$, while $A_j$ is similar to many answers, so the support degree differs between the two directions. We therefore normalize the similarity matrix to describe the support degree from other answers, which gives the support matrix $M_{sup}$:

$$
M_{sup} = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ t_{n1} & t_{n2} & \cdots & t_{nn} \end{pmatrix}, \quad t_{ij} = \frac{s_{ij}}{\sum_{k=1}^{n} s_{kj}}, \quad s_{ij} \in M_{sim} \tag{2}
$$

$M_{sim}$ captures the pairwise relationship between two answers, whereas $M_{sup}$ also accounts for the effect of the other answers. In $M_{sup}$, in general $t_{ij} \neq t_{ji}$.

3.3 Iterative computation
An answer is ranked higher when more answers are similar to it, and an answer supported by many high-scoring answers receives a high score itself. If no answer is similar to an answer, that answer has no support. As in authority-hub analysis and PageRank, BestFinder adopts an iterative method to compute the answer scores. Initially, it has very little information about the answers. At each iteration BestFinder updates the answer scores, and it stops when the computation reaches a stable state.
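The support matrix these iterations operate on can be sketched in Python as follows, computing equations (1) and (2). This is a minimal sketch under two assumptions: noun extraction has already been done upstream (each answer arrives as a list of noun tokens), and, since the paper does not state the VSM weighting, raw term frequencies with cosine similarity are used. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (assumed VSM measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(answer_nouns: list[list[str]]) -> list[list[float]]:
    """Equation (1): symmetric M_sim with s_ij = sim(A_i, A_j) and s_ii = 0."""
    vecs = [Counter(nouns) for nouns in answer_nouns]
    n = len(vecs)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # only n(n-1)/2 pairs, thanks to symmetry
            s[i][j] = s[j][i] = cosine(vecs[i], vecs[j])
    return s


def support_matrix(s: list[list[float]]) -> list[list[float]]:
    """Equation (2): t_ij = s_ij / sum_k s_kj, so each nonzero column sums to 1."""
    n = len(s)
    col_sums = [sum(s[k][j] for k in range(n)) for j in range(n)]
    return [[s[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical noun lists for three answers to one factoid question.
answers = [["paris"], ["paris"], ["paris", "france"]]
m_sup = support_matrix(similarity_matrix(answers))
print(round(m_sup[0][2], 4), round(m_sup[2][0], 4))  # 0.5 0.4142
```

Here $t_{02} = 0.5$ while $t_{20} \approx 0.414$, which illustrates that $M_{sup}$ is generally not symmetric even though $M_{sim}$ is. Columns with no similar answers are left as zeros rather than divided by zero.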
The score of an answer is the expected score of the answers it is similar to. For answer $A_i$, we compute $score(A_i)$ as the average score of the answers that give support to $A_i$:

$$
score(A_i) = \frac{1}{m} \sum_{j \neq i,\; M_{sup}[j,i] > 0} score(A_j) \cdot M_{sup}[j,i] \tag{3}
$$

where $m$ is the number of answers that are similar to $A_i$. We choose an initial state in which all answers have a uniform score $s_0 = 1/n$, where $n$ is the number of answers. In each iteration, BestFinder raises the scores of high-quality answers and lowers the scores of low-quality answers. It stops iterating when it reaches a stable state. Stability is measured by the change in the scores of all answers: if the scores change only slightly after an iteration, BestFinder stops.

Algorithm 1: Iterative computation function
Input: support matrix M_sup[n][n]; old answer score array oldscore[n]
Output: new answer score array newscore[n]
for i = 0; i < n; i++ do
    newscore[i] = 0;
    nonzeronum = 0;
    for j = 0; j < n; j++ do
        /* count the answers that are similar to answer A_i */
        if M_sup[j][i] > 0 then
            newscore[i] = newscore[i] + oldscore[j] * M_sup[j][i];
            nonzeronum = nonzeronum + 1;
        end
    end
    /* take the expected score */
    if nonzeronum > 0 then
        newscore[i] = newscore[i] / nonzeronum;
    end
end

3.4 Answers score
On Yahoo! Answers, after reading the existing answers to a question, a user can give his or her judgment of the answers. If the user considers an answer useful, he or she can add a plus vote to it; otherwise, a minus vote may be added. If the asker doesn't choose a best answer within a fixed period of time, the best answer may be chosen by the voters. Vote information from other users is therefore an important factor in answer selection, and an answer's score should integrate both answer similarity and vote information. We introduce two weighting factors $\alpha$ and $\beta$ to describe the influence of answer similarity and of other users' votes, respectively. The answer's score is then defined as:

$$
score(A_i) = \alpha \frac{score(A_i)}{\sum_{k=1}^{n} score(A_k)} + \beta \frac{V_i^{up} + 1}{V_i^{up} + V_i^{down} + 2}, \quad \alpha + \beta = 1 \tag{4}
$$

where $score(A_i)$ on the right-hand side is the score obtained from the iteration, and $V_i^{up}$ and $V_i^{down}$ are the numbers of up and down votes of $A_i$. In the experiment, we set $\alpha = 0.6$ and $\beta = 0.4$.
Some answers have no votes. For such answers the vote-based support should be 0.5, that is, the probability of being supported equals the probability of being opposed. We therefore add a constant 1 to the numerator and a constant 2 to the denominator of the vote term. For an answer set $A$ in which no two answers are similar, the answer scores are determined by the vote information alone. Finally, the answer with the maximal score is selected as the best answer.
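Sections 3.3 and 3.4 together can be sketched in Python as below: the update of equation (3) (Algorithm 1) runs from the uniform initial state $s_0 = 1/n$, stops when the Euclidean distance between successive score vectors (the change measure used for Fig. 2) falls below a tolerance, and equation (4) is then applied with $\alpha = 0.6$ and $\beta = 0.4$. The tolerance `eps`, the `max_iter` safeguard, lowest-index tie-breaking, and setting the similarity term to 0 when all iteration scores are 0 are assumptions not stated in the paper, and the function names are hypothetical. The support matrix is passed as a nested list laid out as in equation (2), with `m_sup[j][i]` the support of $A_j$ for $A_i$.

```python
import math


def iterate_once(m_sup: list[list[float]], oldscore: list[float]) -> list[float]:
    """One pass of Algorithm 1, i.e. equation (3) for every answer."""
    n = len(oldscore)
    newscore = [0.0] * n
    for i in range(n):
        nonzeronum = 0  # m: number of answers that support A_i
        for j in range(n):
            if j != i and m_sup[j][i] > 0:
                newscore[i] += oldscore[j] * m_sup[j][i]
                nonzeronum += 1
        if nonzeronum > 0:
            newscore[i] /= nonzeronum  # expected (average) supported score
    return newscore


def iterate_until_stable(m_sup: list[list[float]], eps: float = 1e-9,
                         max_iter: int = 100) -> list[float]:
    """Repeat Algorithm 1 from s_0 = 1/n until the scores stop changing."""
    n = len(m_sup)
    score = [1.0 / n] * n
    for _ in range(max_iter):  # max_iter is a safeguard (assumption)
        new = iterate_once(m_sup, score)
        change = math.dist(new, score)  # Euclidean distance, as in Fig. 2
        score = new
        if change < eps:
            break
    return score


def final_scores(sim_score: list[float], votes: list[tuple[int, int]],
                 alpha: float = 0.6, beta: float = 0.4) -> list[float]:
    """Equation (4): normalized iteration score blended with smoothed votes."""
    total = sum(sim_score)
    result = []
    for s, (up, down) in zip(sim_score, votes):
        sim_part = s / total if total > 0 else 0.0  # no similarity: votes decide
        vote_part = (up + 1) / (up + down + 2)      # 0.5 when there are no votes
        result.append(alpha * sim_part + beta * vote_part)
    return result


def best_answer(m_sup: list[list[float]], votes: list[tuple[int, int]]) -> int:
    """Index of the answer with the maximal final score (first index on ties)."""
    scores = final_scores(iterate_until_stable(m_sup), votes)
    return max(range(len(scores)), key=scores.__getitem__)


# Answers 0-2 agree with one another; answer 3 is similar to none of them.
m_sup = [[0.0, 0.5, 0.5, 0.0],
         [0.5, 0.0, 0.5, 0.0],
         [0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
votes = [(0, 0), (0, 0), (0, 0), (10, 0)]
print(best_answer(m_sup, votes))  # 0
```

In this toy example answer 3 has ten up-votes but no similar answers, so its final score is 0.4 × 11/12 ≈ 0.367, while each member of the agreeing group scores 0.6 × 1/3 + 0.4 × 0.5 = 0.4, so the majority wins. Because every entry of $M_{sup}$ is at most 1 and each update is an average, the largest raw score cannot increase between iterations (here all scores halve each time); equation (4) uses only their relative sizes.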

4 Experimental Evaluation
We now describe the dataset, the evaluation measures and the experimental results.

4.1 Datasets
We use the same datasets as [10]. The authors of [10] used the TREC QA benchmarks to crawl QA archives and related user information, by submitting TREC QA queries to the CQA site and retrieving the returned questions, answers and related users. The factoid questions come from seven years of the TREC QA track evaluations (years ). Each TREC query was submitted to the Yahoo! Answers web service, and up to 10 top-ranked related questions were retrieved according to the Yahoo! Answers ranking. Details of the data collection can be found in [10]. In total, there are users, questions and answers. Note that although factoid questions may make up only a small proportion of Yahoo! Answers, we use them in order to have an objective metric of correctness, and extrapolate the performance to whole QA archives.

4.2 Evaluation metrics
We consider an answer to be a high-quality answer if the asker chose it as the best answer and gave it a rating of at least 3; therefore, there is only one correct answer per question. Two metrics are used for evaluation. The first is Accuracy: the fraction of questions whose top-ranked answer is the one chosen as the best answer. We use the best answer tagged by the Yahoo! Answers web site as the ground truth. Since Accuracy ignores the exact rank of the correct answer, we also use the Mean Reciprocal Rank (MRR) metric. The MRR of an individual query is the reciprocal of the rank at which the first relevant answer is returned, or 0 if none of the top N results contains a relevant answer.
The score for a sequence of queries is the mean of the individual queries' reciprocal ranks. Thus, MRR is calculated as:

$$
MRR = \frac{1}{|Q_r|} \sum_{q \in Q_r} \frac{1}{r_q}
$$

where $Q_r$ is the set of test queries and $r_q$ is the rank of the first relevant answer for query $q$.

4.3 Methods compared
To our knowledge, ours is the first method that requires neither training data nor manually labeled data. To evaluate answer quality, we compare our method with the following baselines:
Baseline BestRatio: Answers are ranked by the best answer ratio of their answerers, i.e., the fraction of an answerer's answers that were chosen as the best answer. It indicates the answerer's authority.
Baseline Votes: Answers are ranked by the difference between the thumbs-up and thumbs-down votes received by each answer. This ranking closely approximates
the ranking obtained when a user clicks the "order by votes" option on the Yahoo! Answers site. Details of this method and of how MRR is computed under this setting are discussed in [4].

4.4 Experimental results
Figure 1 illustrates the performance of our method and the baselines with a varying number of candidate answers. BestFinder significantly outperformed the baselines when there were fewer than 5 candidate answers. Baseline Votes is stable with more than 8 candidate answers, whereas BestFinder is less effective when there are more candidate answers. This is due to the complexity of CQA sites: there may be several correct answers, but the system requires only one best answer, and the choice of the best answer depends on the asker, who may take subjective factors into account. In contrast to traditional QA, answerers give more detailed descriptions, which decreases the weight of keywords. In fact, most questions have fewer than 5 answers, so BestFinder is more effective overall.

Fig. 1: MRR and Accuracy of BestFinder and baselines for varying number of candidate answers

Figure 2 shows the change in answer scores after each iteration, measured as the Euclidean distance between the old and new scores. The scores converge at a steady rate, so BestFinder does not require many iterations to reach a stable state.

Fig. 2: Changes of answer scores after each iteration

5 Conclusions
We presented a framework for unsupervised best answer selection for factoid questions in Community Question Answering. We regard the similarity between any two answers as their effect on
each other, and this effect is propagated by iteration. The iteration stops when the computation reaches a stable state. Finally, the answers are ranked by combining the iteration result with the votes of other users. We have demonstrated the effectiveness of BestFinder in large-scale experiments on a CQA dataset comprising over 100,000 users, 27,000 questions and 200,000 answers. Unlike supervised methods, BestFinder requires neither training data nor manually labeled data. In addition, our experiments demonstrate significant improvements over the baselines, especially for questions with fewer answers.

References
[1] L. A. Adamic, J. Zhang, E. Bakshy and M. S. Ackerman. Knowledge sharing and Yahoo Answers: Everyone knows something. In: Proc. of WWW, 2008, pp.
[2] Q. Su, D. Pavlov, J. Chow and W. Baker. Internet-scale collection of human-reviewed data. In: Proc. of WWW, 2007, pp.
[3] J. Ko, L. Si and E. Nyberg. A probabilistic framework for answer selection in question answering. In: Proc. of NAACL HLT, 2007, pp.
[4] J. Bian, Y. Liu, E. Agichtein and H. Zha. Finding the right facts in the crowd: Factoid question answering over social media. In: Proc. of WWW, 2008, pp.
[5] X. Wang, X. Tu, D. Feng and L. Zhang. Ranking community answers by modeling question-answer relationships via analogical reasoning. In: Proc. of SIGIR, 2009, pp.
[6] J. Liu, S. Wang, Y. Peng, X. Huang and W. Wang. Answer extraction of Chinese restricted domain question answering system based on ontology. Journal of Computational Information Systems, 2010, 6(1),
[7] P. Jurczyk and E. Agichtein. Discovering authorities in question answer communities by using link analysis. In: Proc. of ACM CIKM, 2007, pp.
[8] J. Zhang, M. S. Ackerman and L. Adamic. Expertise networks in online communities: Structure and algorithms. In: Proc. of WWW, 2007, pp.
[9] J. Jeon, W. Croft, J. Lee and S. Park. A framework to predict the quality of answers with non-textual features. In: Proc. of SIGIR, 2006, pp.
[10] J. Bian, Y. Liu, D. Zhou, E. Agichtein and H. Zha. Learning to recognize reliable users and content in social media with coupled mutual reinforcement. In: Proc. of WWW, 2009, pp.
[11] C. Shah and J. Pomerantz. Evaluating and predicting answer quality in community QA. In: Proc. of SIGIR, 2010, pp.
[12] E. Agichtein, C. Castillo, D. Donato, A. Gionis and G. Mishne. Finding high-quality content in social media with an application to community-based question answering. In: Proc. of WSDM, 2008, pp.
[13] M. Blooma, A. Chua and D. Goh. A predictive framework for retrieving the best answer. In: Proc. of SAC, 2008, pp.
[14] L. Page, S. Brin, R. Motwani and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project,
[15] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 1999, 46(5),
[16] J. Ko, L. Si and E. Nyberg. A probabilistic graphical model for joint answer ranking in question answering. In: Proc. of SIGIR, 2007, pp.


More information

Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services

Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services

More information

Question Quality in Community Question Answering Forums: A Survey

Question Quality in Community Question Answering Forums: A Survey Question Quality in Community Question Answering Forums: A Survey ABSTRACT Antoaneta Baltadzhieva Tilburg University P.O. Box 90153 Tilburg, Netherlands a baltadzhieva@yahoo.de Community Question Answering

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

FINDING THE RIGHT EXPERT Discriminative Models for Expert Retrieval

FINDING THE RIGHT EXPERT Discriminative Models for Expert Retrieval FINDING THE RIGHT EXPERT Discriminative Models for Expert Retrieval Philipp Sorg 1 and Philipp Cimiano 2 1 AIFB, Karlsruhe Institute of Technology, Germany 2 CITEC, University of Bielefeld, Germany philipp.sorg@kit.edu,

More information

Ranking on Data Manifolds

Ranking on Data Manifolds Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname

More information

On the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services

On the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services 21st International Congress on Modelling and Simulation, Gold Coast, Australia, 29 Nov to 4 Dec 2015 www.mssanz.org.au/modsim2015 On the Feasibility of Answer Suggestion for Advice-seeking Community Questions

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Probabilistic topic models for sentiment analysis on the Web

Probabilistic topic models for sentiment analysis on the Web University of Exeter Department of Computer Science Probabilistic topic models for sentiment analysis on the Web Chenghua Lin September 2011 Submitted by Chenghua Lin, to the the University of Exeter as

More information

Early Detection of Potential Experts in Question Answering Communities

Early Detection of Potential Experts in Question Answering Communities Early Detection of Potential Experts in Question Answering Communities Aditya Pal 1, Rosta Farzan 2, Joseph A. Konstan 1, and Robert Kraut 2 1 Dept. of Computer Science and Engineering, University of Minnesota

More information

A Classification-based Approach to Question Answering in Discussion Boards

A Classification-based Approach to Question Answering in Discussion Boards A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University Bethlehem, PA 18015 USA {lih307,davison}@cse.lehigh.edu

More information

Evolution of Experts in Question Answering Communities

Evolution of Experts in Question Answering Communities Evolution of Experts in Question Answering Communities Aditya Pal, Shuo Chang and Joseph A. Konstan Department of Computer Science University of Minnesota Minneapolis, MN 55455, USA {apal,schang,konstan}@cs.umn.edu

More information

II. RELATED WORK. Sentiment Mining

II. RELATED WORK. Sentiment Mining Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

More information

QASM: a Q&A Social Media System Based on Social Semantics

QASM: a Q&A Social Media System Based on Social Semantics QASM: a Q&A Social Media System Based on Social Semantics Zide Meng, Fabien Gandon, Catherine Faron-Zucker To cite this version: Zide Meng, Fabien Gandon, Catherine Faron-Zucker. QASM: a Q&A Social Media

More information

Interactive Chinese Question Answering System in Medicine Diagnosis

Interactive Chinese Question Answering System in Medicine Diagnosis Interactive Chinese ing System in Medicine Diagnosis Xipeng Qiu School of Computer Science Fudan University xpqiu@fudan.edu.cn Jiatuo Xu Shanghai University of Traditional Chinese Medicine xjt@fudan.edu.cn

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH

A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH International Journal of Computer Science and System Analysis Vol. 5, No. 1, January-June 2011, pp. 37-43 Serials Publications ISSN 0973-7448 A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED

More information

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

New Metrics for Reputation Management in P2P Networks

New Metrics for Reputation Management in P2P Networks New for Reputation in P2P Networks D. Donato, M. Paniccia 2, M. Selis 2, C. Castillo, G. Cortesi 3, S. Leonardi 2. Yahoo!Research Barcelona Catalunya, Spain 2. Università di Roma La Sapienza Rome, Italy

More information

ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at: www.ijarcsms.com

More information

Exploring Adaptive Window Sizes for Entity Retrieval

Exploring Adaptive Window Sizes for Entity Retrieval Exploring Adaptive Window Sizes for Entity Retrieval Fawaz Alarfaj, Udo Kruschwitz, and Chris Fox School of Computer Science and Electronic Engineering University of Essex Colchester, CO4 3SQ, UK {falarf,udo,foxcj}@essex.ac.uk

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer. RESEARCH ARTICLE SURVEY ON PAGERANK ALGORITHMS USING WEB-LINK STRUCTURE SOWMYA.M 1, V.S.SREELAXMI 2, MUNESHWARA M.S 3, ANIL G.N 4 Department of CSE, BMS Institute of Technology, Avalahalli, Yelahanka,

More information

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines , 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing

More information

A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE

A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then

More information

SIGIR 2004 Workshop: RIA and "Where can IR go from here?"

SIGIR 2004 Workshop: RIA and Where can IR go from here? SIGIR 2004 Workshop: RIA and "Where can IR go from here?" Donna Harman National Institute of Standards and Technology Gaithersburg, Maryland, 20899 donna.harman@nist.gov Chris Buckley Sabir Research, Inc.

More information

Personalizing Image Search from the Photo Sharing Websites

Personalizing Image Search from the Photo Sharing Websites Personalizing Image Search from the Photo Sharing Websites Swetha.P.C, Department of CSE, Atria IT, Bangalore swethapc.reddy@gmail.com Aishwarya.P Professor, Dept.of CSE, Atria IT, Bangalore aishwarya_p27@yahoo.co.in

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

Removing Web Spam Links from Search Engine Results

Removing Web Spam Links from Search Engine Results Removing Web Spam Links from Search Engine Results Manuel EGELE pizzaman@iseclab.org, 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features

More information

Identifying Best Bet Web Search Results by Mining Past User Behavior

Identifying Best Bet Web Search Results by Mining Past User Behavior Identifying Best Bet Web Search Results by Mining Past User Behavior Eugene Agichtein Microsoft Research Redmond, WA, USA eugeneag@microsoft.com Zijian Zheng Microsoft Corporation Redmond, WA, USA zijianz@microsoft.com

More information

Identifying Influential Scholars in Academic Social Media Platforms

Identifying Influential Scholars in Academic Social Media Platforms Identifying Influential Scholars in Academic Social Media Platforms Na Li, Denis Gillet École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland {na.li, denis.gillet}@epfl.ch Abstract

More information

Improving Web Page Retrieval using Search Context from Clicked Domain Names

Improving Web Page Retrieval using Search Context from Clicked Domain Names Improving Web Page Retrieval using Search Context from Clicked Domain Names Rongmei Li School of Electrical, Mathematics, and Computer Science University of Twente P.O.Box 217, 7500 AE, Enschede, the Netherlands

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Resolving Common Analytical Tasks in Text Databases

Resolving Common Analytical Tasks in Text Databases Resolving Common Analytical Tasks in Text Databases The work is funded by the Federal Ministry of Economic Affairs and Energy (BMWi) under grant agreement 01MD15010B. Database Systems and Text-based Information

More information

1 o Semestre 2007/2008

1 o Semestre 2007/2008 Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction

More information

Boosting the Feature Space: Text Classification for Unstructured Data on the Web

Boosting the Feature Space: Text Classification for Unstructured Data on the Web Boosting the Feature Space: Text Classification for Unstructured Data on the Web Yang Song 1, Ding Zhou 1, Jian Huang 2, Isaac G. Councill 2, Hongyuan Zha 1,2, C. Lee Giles 1,2 1 Department of Computer

More information

Intelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives

Intelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives Intelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives Search The Way You Think Copyright 2009 Coronado, Ltd. All rights reserved. All other product names and logos

More information

Searching Questions by Identifying Question Topic and Question Focus

Searching Questions by Identifying Question Topic and Question Focus Searching Questions by Identifying Question Topic and Question Focus Huizhong Duan 1, Yunbo Cao 1,2, Chin-Yew Lin 2 and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China, 200240 {summer, yyu}@apex.sjtu.edu.cn

More information

HITS vs. Non-negative Matrix Factorization

HITS vs. Non-negative Matrix Factorization Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 HITS vs. Non-negative Matrix Factorization Yuanzhe Cai, Sharma Chakravarthy Technical Report CSE 2014

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

More information

CQARank: Jointly Model Topics and Expertise in Community Question Answering

CQARank: Jointly Model Topics and Expertise in Community Question Answering CQARank: Jointly Model Topics and Expertise in Community Question Answering Liu Yang,, Minghui Qiu, Swapna Gottipati, Feida Zhu, Jing Jiang, Huiping Sun, Zhong Chen School of Software and Microelectronics,

More information

RESEARCH ISSUES IN COMMUNITY BASED QUESTION ANSWERING

RESEARCH ISSUES IN COMMUNITY BASED QUESTION ANSWERING RESEARCH ISSUES IN COMMUNITY BASED QUESTION ANSWERING Mohan John Blooma, Centre of Commerce, RMIT International University, Ho Chi Minh City, Vietnam, blooma.john@rmit.edu.vn Jayan Chirayath Kurian, Centre

More information

CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance

CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance Shen Wang, Bin Wang and Hao Lang, Xueqi Cheng Institute of Computing Technology, Chinese Academy of

More information

III. DATA SETS. Training the Matching Model

III. DATA SETS. Training the Matching Model A Machine-Learning Approach to Discovering Company Home Pages Wojciech Gryc Oxford Internet Institute University of Oxford Oxford, UK OX1 3JS Email: wojciech.gryc@oii.ox.ac.uk Prem Melville IBM T.J. Watson

More information

An Evaluation of Classification Models for Question Topic Categorization

An Evaluation of Classification Models for Question Topic Categorization An Evaluation of Classification Models for Question Topic Categorization Bo Qu, Gao Cong, Cuiping Li, Aixin Sun, Hong Chen Renmin University, Beijing, China {qb8542,licuiping,chong}@ruc.edu.cn Nanyang

More information

Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering

Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Guangyou Zhou, Kang Liu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese

More information

INFERRING ANSWER QUALITY, ANSWERER EXPERTISE, AND RANKING IN QUESTION ANSWER SOCIAL NETWORKS YUANZHE CAI

INFERRING ANSWER QUALITY, ANSWERER EXPERTISE, AND RANKING IN QUESTION ANSWER SOCIAL NETWORKS YUANZHE CAI INFERRING ANSWER QUALITY, ANSWERER EXPERTISE, AND RANKING IN QUESTION ANSWER SOCIAL NETWORKS by YUANZHE CAI Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial

More information

Improving Question Retrieval in Community Question Answering Using World Knowledge

Improving Question Retrieval in Community Question Answering Using World Knowledge Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang

More information

Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment

Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment 2009 10th International Conference on Document Analysis and Recognition Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment Ahmad Abdulkader Matthew R. Casey Google Inc. ahmad@abdulkader.org

More information

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

More information

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

More information

Towards Inferring Web Page Relevance An Eye-Tracking Study

Towards Inferring Web Page Relevance An Eye-Tracking Study Towards Inferring Web Page Relevance An Eye-Tracking Study 1, iconf2015@gwizdka.com Yinglong Zhang 1, ylzhang@utexas.edu 1 The University of Texas at Austin Abstract We present initial results from a project,

More information

The Social World of Content Abusers in Community Question Answering

The Social World of Content Abusers in Community Question Answering The Social World of Content Abusers in Community Question Answering Imrul Kayes Computer Science and Engineering University of South Florida Tampa FL, USA imrul@mail.usf.edu Adriana Iamnitchi Computer

More information

A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication

A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication 2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

More information

Evidentiality for Text Trustworthiness Detection

Evidentiality for Text Trustworthiness Detection Evidentiality for Text Trustworthiness Detection Qi Su 1, 2, Chu-Ren Huang and Helen Kai-yun Chen 1 Depart of Chinese & Bilingual Studies, The Hong Kong Polytechnic University 2 Key Laboratory of Computational

More information

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

More information

REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION

REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

A Solution for Data Inconsistency in Data Integration *

A Solution for Data Inconsistency in Data Integration * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 27, 681-695 (2011) A Solution for Data Inconsistency in Data Integration * Department of Computer Science and Engineering Shanghai Jiao Tong University Shanghai,

More information

Multimedia Answer Generation from Web Information

Multimedia Answer Generation from Web Information Multimedia Answer Generation from Web Information Avantika Singh Information Science & Engg, Abhimanyu Dua Information Science & Engg, Gourav Patidar Information Science & Engg Pushpalatha M N Information

More information

Spam Host Detection Using Ant Colony Optimization

Spam Host Detection Using Ant Colony Optimization Spam Host Detection Using Ant Colony Optimization Arnon Rungsawang, Apichat Taweesiriwate and Bundit Manaskasemsak Abstract Inappropriate effort of web manipulation or spamming in order to boost up a web

More information

LDA Based Security in Personalized Web Search

LDA Based Security in Personalized Web Search LDA Based Security in Personalized Web Search R. Dhivya 1 / PG Scholar, B. Vinodhini 2 /Assistant Professor, S. Karthik 3 /Prof & Dean Department of Computer Science & Engineering SNS College of Technology

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information