Topical Authority Identification in Community Question Answering
Guangyou Zhou, Kang Liu, and Jun Zhao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, China
Abstract. In this paper, we address the problem of authority identification in community question answering (CQA). Most existing approaches identify authorities in CQA by means of link analysis techniques. However, these traditional techniques consider only the link structure and ignore the topic information about the users, which gives rise to the problem of topic drift. To solve the problem of topic drift, we propose a topical ranking method, an extension of the PageRank algorithm, to identify authorities in CQA. Compared to the traditional link analysis techniques, our proposed method is more effective because it measures the authority scores by taking into account both the link structure and the topic information. We conduct experiments on a real-world data set from Yahoo! Answers. Experimental results show that our proposed method significantly outperforms the traditional link analysis techniques and achieves state-of-the-art performance for authority identification in CQA.
Keywords: authority identification; PageRank; community question answering.
1 Introduction
Community question answering (CQA) is a particular form of online service for leveraging user-generated content, which has gained increasing popularity in recent years. These online services, such as Yahoo! Answers and Live QnA, provide a platform for users to ask and answer questions. Unfortunately, the quality of answers varies widely: it ranges from very high to low quality, and sometimes includes abusive content or even spam [1]. Therefore, it is desirable to automatically identify authorities in CQA, so that newly posted questions can be routed to the appropriate authorities, who can provide good-quality answers to them [2,3,4].
In this way, the overall quality of answers can be substantially improved.
C.-L. Liu, C. Zhang, and L. Wang (Eds.): CCPR 2012, CCIS 321, © Springer-Verlag Berlin Heidelberg 2012
Authority identification in CQA is the task of identifying users who can provide a large number of high-quality, complete, and reliable answers [5]; it has recently attracted wide interest in the NLP and IR communities [2,3,6,7]. These existing approaches measure authority scores by means of link analysis techniques such as PageRank [8] and HITS [9], or their variants. However, the traditional link analysis techniques consider only the link structure and ignore the topic information about the users, which gives rise to the problem of topic drift.
To tackle the problem of topic drift, this paper proposes a topical ranking method for authority identification in CQA. Given a set of users, we first automatically distill the topics that users are interested in by analyzing the content of their answered questions and the corresponding answers. Based on the distilled topics, topic-specific question-answer relationships between askers and answerers are constructed. Finally, we measure the authority scores by taking into account both the link structure and the topic information about users. To the best of our knowledge, this is the first extensive empirical study of identifying authoritative users in CQA that takes into account both the link structure and the topic information about users. To date, little work has considered topic information about users in authority identification in CQA, which remains an under-explored research area. This paper is designed to fill this gap. Specifically, we make the following contributions: We automatically distill the topics that users are interested in by analyzing the content of their answered questions and the corresponding answers (Section 2.1). We propose a topical ranking method that takes into account both the link structure and the topic information about users (Sections 2.2 and 2.3).
Finally, we conduct experiments on a CQA data set. The results show that our proposed method significantly outperforms the traditional link analysis techniques (Section 3).
The rest of this paper is organized as follows. Section 2 presents our proposed method. Section 3 presents the experimental results. Finally, we conclude with ideas for future work in Section 4.
2 Topical Authority Identification
2.1 Topic Distillation
Topic distillation aims to automatically identify the topics that users (askers and answerers) are interested in, based on their user profiles. Here, the user profiles refer to the questions answered by the users and the corresponding answers. In this paper, we use the widely studied topic model Latent Dirichlet Allocation (LDA) [10] to identify the latent topic information in the large-scale question-answer collection.
LDA is a Bayesian probabilistic graphical model that models each document as a mixture of underlying topics and generates each word from one topic. The generation process of a document is described in Table 1. A document $d$ is associated with a multinomial distribution over $K$ topics, denoted $\theta_d$. For each word $w_{di}$ in document $d$: (1) a topic $z_{di}$ is first sampled from the multinomial distribution $\theta_d$, which is generated from the Dirichlet prior parameterized by $\alpha$; (2) then the word $w_{di}$ is generated from the multinomial distribution $\phi_{z_{di}}$, which is generated from the Dirichlet prior parameterized by $\beta$. The two Dirichlet priors on the document-topic distributions $\theta_d$ and the topic-word distributions $\phi_z$ reduce the risk of overfitting the training documents and improve the ability to infer topic distributions for new documents [11]. We employ Gibbs sampling [12] for parameter estimation because of its fast convergence and good performance [13].
Table 1. The generation process of LDA
For each topic $z \in \{1, \dots, K\}$: sample a multinomial distribution over words, $\phi_z \sim \mathrm{Dir}(\beta)$
For each document $d$:
  1. Sample a multinomial distribution over topics, $\theta_d \sim \mathrm{Dir}(\alpha)$
  2. For each word $w_{di}$ in document $d$:
     - sample a topic $z_{di} \sim \mathrm{Multinomial}(\theta_d)$
     - sample a word $w_{di} \sim \mathrm{Multinomial}(\phi_{z_{di}})$
To distill the topics that users are interested in using LDA, documents would naturally correspond to questions and answers. However, since the goal is to distill the topics that each user is interested in, rather than the topics of each question and its answers, we aggregate the profile of each individual user into one large document. Thus, each document corresponds to a user. The results of topic distillation are represented in two matrices:
- $DK = [\theta]_{D \times K}$, a $D \times K$ matrix, where $D$ is the number of users and $K$ is the number of topics. Each entry $DK_{ij}$ is the number of times a word in $u_i$'s profile (questions and the corresponding answers) has been assigned to topic $z_j$.
- $WK = [\phi]_{W \times K}$, a $W \times K$ matrix, where $W$ is the number of unique words in the question-answer collection and $K$ is the number of topics. Each entry $WK_{ij}$ is the number of times the unique word $w_i$ has been assigned to topic $z_j$.
Matrix $DK$ thus records how often the words in each user's profile have been assigned to each topic. We row-normalize it into $\overline{DK}$ such that $\|\overline{DK}_{i\cdot}\|_1 = 1$ for each row $\overline{DK}_{i\cdot}$. Each row of $\overline{DK}$ is then the probability distribution of $u_i$'s interest over the $K$ topics; i.e., each element $\overline{DK}_{ij}$ denotes the probability that $u_i$ is interested in topic $z_j$: $P(z_j \mid u_i) = \overline{DK}_{ij}$.
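The topic distillation step can be sketched as a small collapsed Gibbs sampler run over per-user documents. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each user's aggregated profile has already been tokenized into integer word ids, it uses the standard collapsed full conditional of Griffiths and Steyvers [12], and the function names `lda_gibbs` and `user_topic_distribution` are hypothetical. It returns the count matrices $DK$ and $WK$ described above, plus the row-normalized matrix whose rows give $P(z_j \mid u_i)$.

```python
import numpy as np


def lda_gibbs(user_docs, vocab_size, num_topics, alpha, beta, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA where each document is one user.

    user_docs: list of lists of word ids (hypothetical input format); each inner
    list is the aggregated profile (questions and answers) of one user.
    Returns the count matrices DK (D x K) and WK (W x K).
    """
    rng = np.random.default_rng(seed)
    D, K, W = len(user_docs), num_topics, vocab_size
    DK = np.zeros((D, K), dtype=np.int64)  # words of user d assigned to topic k
    WK = np.zeros((W, K), dtype=np.int64)  # times word w assigned to topic k
    NK = np.zeros(K, dtype=np.int64)       # total words assigned to topic k

    # Random initial topic assignment for every token.
    Z = [rng.integers(K, size=len(doc)) for doc in user_docs]
    for d, doc in enumerate(user_docs):
        for w, k in zip(doc, Z[d]):
            DK[d, k] += 1
            WK[w, k] += 1
            NK[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(user_docs):
            for i, w in enumerate(doc):
                k = Z[d][i]
                # Remove the current token from all counts.
                DK[d, k] -= 1
                WK[w, k] -= 1
                NK[k] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant.
                p = (DK[d] + alpha) * (WK[w] + beta) / (NK + W * beta)
                k = rng.choice(K, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                Z[d][i] = k
                DK[d, k] += 1
                WK[w, k] += 1
                NK[k] += 1
    return DK, WK


def user_topic_distribution(DK):
    """Row-normalize DK so that row i is P(z | u_i) and sums to 1."""
    DK = np.asarray(DK, dtype=float)
    rows = DK.sum(axis=1, keepdims=True)
    return DK / np.where(rows == 0, 1.0, rows)
```

With the settings reported in Section 3.3, a call would look like `lda_gibbs(docs, W, num_topics=15, alpha=50 / 15, beta=0.05, iters=200)`. A pure-Python triple loop like this is only practical for small collections; it is meant to show the bookkeeping behind $DK$ and $WK$, not to scale to the full Yahoo! Answers crawl.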
2.2 PageRank for Authority Identification
Based on the topics distilled in Section 2.1, a directed graph $G = (V, E)$ is formed from the topic-specific question-answer relationships among users. $V$ is a set of nodes representing users (askers and answerers). A directed edge $e = (u_i, u_j) \in E$, where $u_i \in V$ and $u_j \in V$, indicates that user $u_j$ answers questions of user $u_i$. Each edge $e_{ij} \in E$ is associated with an affinity weight $f(i \to j)$ between $u_i$ and $u_j$, computed as follows:
$$f(i \to j) = |Q(i) \cap A(j)| \qquad (1)$$
where $Q(i)$ is the set of questions asked by $u_i$ and $A(j)$ is the set of questions answered by $u_j$. Two users are connected if their affinity weight is larger than 0, and we set $f(i \to i) = 0$ to avoid self-transitions.⁴ The transition probability from $u_i$ to $u_j$ is then defined by normalizing the corresponding affinity weight:
$$p(i \to j) = \begin{cases} \dfrac{f(i \to j)}{\sum_{k=1}^{|V|} f(i \to k)} & \text{if } \sum_{k=1}^{|V|} f(i \to k) \neq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
Note that $p(i \to j)$ is in general not equal to $p(j \to i)$. We use the row-normalized matrix $M = [M_{ij}]_{|V| \times |V|}$ to describe $G$, with each entry being the corresponding transition probability:
$$M_{ij} = p(i \to j) \qquad (3)$$
To make the graph aperiodic and $M$ a stochastic matrix, rows whose elements are all zero are replaced by a smoothing vector with all elements set to $1/|V|$. Based on $M$, the saliency score $R(u_i)$ of $u_i$ can be deduced from the scores of all other users linked to it, and it can be formulated recursively as in the PageRank algorithm:
$$R(u_i) = \lambda \sum_{j:\, u_j \to u_i} R(u_j)\, M_{ji} + (1 - \lambda)\, \frac{1}{|V|} \qquad (4)$$
where $\lambda \in [0, 1]$ is a damping factor. The damping factor means that each vertex has a probability of $(1 - \lambda)$ of performing a random jump to another vertex within the graph. The saliency scores are obtained by running equation (4) iteratively until convergence.
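The graph construction and iteration of equations (1)-(4), together with the topic-specific variant that Section 2.3 defines in equation (5), can be sketched as follows. This is a hedged NumPy sketch, not the authors' code: the helper names are hypothetical, question sets are assumed to be Python sets of question ids, the convergence threshold `tol=1e-6` is an assumed value (the source does not preserve the one actually used), and $\lambda = 0.2$ is the value reported in Section 3.3. The preference vector for topic $z$ is taken as the $z$th column of the column-normalized user-topic count matrix $DK$, as Section 2.3 describes; passing no preference vector gives the uniform random jump of equation (4).

```python
import numpy as np


def build_transition_matrix(asked, answered):
    """Row-stochastic transition matrix M following eqs. (1)-(3).

    asked[i] / answered[i]: sets of question ids asked / answered by user u_i.
    """
    n = len(asked)
    F = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:  # f(i -> i) = 0: no self-transitions
                # eq. (1): number of questions asked by u_i and answered by u_j
                F[i, j] = len(asked[i] & answered[j])
    out = F.sum(axis=1)
    has_out = out > 0
    # Rows with no outgoing weight become the smoothing vector 1/|V|.
    M = np.full((n, n), 1.0 / n)
    # eqs. (2)-(3): normalize every non-empty row by its total weight.
    M[has_out] = F[has_out] / out[has_out][:, None]
    return M


def rank(M, lam=0.2, preference=None, tol=1e-6, max_iter=1000):
    """Iterate eq. (4) (uniform jump) or eq. (5) (topic preference vector)."""
    n = M.shape[0]
    if preference is None:
        p = np.full(n, 1.0 / n)
    else:
        p = np.asarray(preference, dtype=float)
    R = np.ones(n)  # initial scores of all users set to 1
    for _ in range(max_iter):
        # R(u_i) = lam * sum_j R(u_j) * M_ji + (1 - lam) * p_i
        R_new = lam * (M.T @ R) + (1 - lam) * p
        if np.max(np.abs(R_new - R)) < tol:
            return R_new
        R = R_new
    return R


def topic_preferences(DK):
    """Column-normalize the user-topic count matrix: column z is p(u | z)."""
    DK = np.asarray(DK, dtype=float)
    col = DK.sum(axis=0, keepdims=True)
    return DK / np.where(col == 0, 1.0, col)


def topical_authorities(M, DK, top_n, lam=0.2):
    """For each topic z, return indices of the top_n users ranked by R(u | z)."""
    P = topic_preferences(DK)
    return [np.argsort(-rank(M, lam, P[:, z]))[:top_n] for z in range(P.shape[1])]
```

Because $M$ is row-stochastic and $\lambda < 1$, the update is a contraction, so it reaches the same fixed point from the all-ones start used in the paper, and the converged scores sum to 1. Running one personalized iteration per topic is what makes the ranking topic-aware: the link structure $M$ is shared, and only the random-jump vector changes.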
⁴ In CQA, users cannot answer their own questions.
2.3 Topical PageRank for Authority Identification
In equation (4), the second term is set to the same value $1/|V|$ for all vertices in the graph, which means that random jumps to all vertices are equally probable. However, Haveliwala [14] and Nie et al. [15] proposed a topical PageRank-like algorithm (TPR) and argued that the second term in equation (4) should be non-uniform. The assumption is that if we assign larger probabilities to some vertices, the final saliency scores will favor these vertices. The idea of TPR is to run PageRank for each topic separately. Each topic-specific PageRank favors users with high relevance to the corresponding topic. Formally, for a specific topic $z$, we assign a topic-specific preference value $p(u \mid z)$ to each user $u$ as its random jump probability, with $\sum_{u \in V} p(u \mid z) = 1$. The users who are interested in topic $z$ are assigned larger probabilities when performing PageRank. Given a topic $z$, the TPR saliency score is defined as follows:
$$R(u_i \mid z) = \lambda \sum_{j:\, u_j \to u_i} R(u_j \mid z)\, M_{ji} + (1 - \lambda)\, p(u_i \mid z) \qquad (5)$$
The setting of the preference value $p(u_i \mid z)$ in equation (5) has a great influence on TPR. In this paper, we set $p(\cdot \mid z) = \widehat{DK}_{\cdot z}$, where $\widehat{DK}_{\cdot z}$ is the $z$th column of matrix $\widehat{DK}$, the column-normalized form of matrix $DK$, such that $\|\widehat{DK}_{\cdot z}\|_1 = 1$. A large $R(u \mid z)$ indicates that user $u$ is a good candidate authority in topic $z$.
For implementation, the initial scores of all users are set to 1, and the iteration in equation (5) is used to compute the new scores of the users. Convergence is considered reached when the difference between the scores computed at two successive iterations falls below a given threshold for every user. After ranking the users with TPR or the other methods, we select the top $K$ users for each topic as topical candidate authorities.
3 Experiments
3.1 Data Set
The Yahoo! Answers web service supplies an API that allows web users to crawl the existing question-answer archives and the corresponding user information from the website [17]. We crawl a data set from Yahoo! Answers; it consists of 237,083 resolved questions and 593,107 answers posted by 286,053 users.
Table 2 presents the statistics of the data set. For each resolved question, the data set includes: (1) the texts of the question and the associated answers, with stop words removed and the words stemmed; (2) the user IDs of all questions and answers; (3) users' rating information (e.g., thumbs up, thumbs down, best answers, and so on).
Table 2. Yahoo! Answers data set
Number of questions: 237,083
Number of answers: 593,107
Number of best answers: 162,733
Number of total users: 286,053
Number of askers: 180,166
Number of answerers: 135,441
Number of users who are both askers and answerers: 29,554
Since there is no available benchmark for topic-specific authority identification in CQA, we manually inspect the authority identification results. For each candidate authority $u$ for topic $z$, we ask two annotators to check whether $u$ is a real authority for the given topic. In this process, the annotators are given the top topic words and the user profile. Each identified authority is labeled by the two annotators as Yes (the user is a real authority for the given topic) or No (the user is not a real authority for the given topic). If the annotators disagree, a third person makes the final judgment. Cohen's kappa coefficients over the $Z$ topics range from 0.51 to 0.77, showing fair to good agreement.
3.2 Evaluation Metrics
To evaluate the performance of authority identification, we use three widely used metrics from information retrieval:
- Mean Average Precision (MAP): the mean of the average precision scores over all topics.
- Mean Reciprocal Rank (MRR): the mean, over all topics, of the reciprocal of the rank of the first real authority retrieved for the topic.
- Average Precision@n (Avg. P@n): the average, over all topics, of the proportion of real authorities among the top $n$ identified authorities.
3.3 Parameter Setting
We have several parameters: the Dirichlet hyper-parameters $\alpha$ and $\beta$, the number of topics $Z$ (denoted $K$ in Section 2), and the damping factor $\lambda$ used in PageRank. We set the Dirichlet priors to $\alpha = 50/Z$ and $\beta = 0.05$, following Griffiths and Steyvers [12]. We run LDA with 200 iterations of Gibbs sampling. After trying a few different numbers of topics, we empirically set $Z = 15$.
We choose these parameter settings because they yield coherent and meaningful topics for our data set. For the parameter $\lambda$, we conduct an experiment on a small development set to determine the best value among $\{0.1, 0.2, \dots, 0.9\}$ in terms of MAP. This development set is also extracted from Yahoo! Answers and is not included in the evaluation set. We find that $\lambda = 0.2$ is optimal for both PR and TPR.
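The metrics of Section 3.2, including MAP as used for the $\lambda$ search above, can be sketched in plain Python together with Cohen's kappa for the annotator agreement reported in Section 3.1. This is a minimal sketch under stated assumptions: each topic's candidate list is given as Yes/No labels in rank order, and because the full set of true authorities per topic is unknown, average precision is normalized by the number of real authorities found in the list, which is one common convention and an assumption here. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def precision_at(labels: Sequence[bool], n: int) -> float:
    """Fraction of real authorities among the top n (missing ranks count as No)."""
    return sum(labels[:n]) / n


def average_precision(labels: Sequence[bool]) -> float:
    """Mean of P@k over the ranks k at which a real authority appears."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def reciprocal_rank(labels: Sequence[bool]) -> float:
    """1 / rank of the first real authority, or 0 if there is none."""
    for k, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / k
    return 0.0


def evaluate(per_topic: List[Sequence[bool]], n: int = 10) -> Dict[str, float]:
    """Average each per-topic metric over topics: MAP, MRR and Avg. P@n."""
    t = len(per_topic)
    return {
        "MAP": sum(average_precision(labels) for labels in per_topic) / t,
        "MRR": sum(reciprocal_rank(labels) for labels in per_topic) / t,
        f"Avg. P@{n}": sum(precision_at(labels, n) for labels in per_topic) / t,
    }


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Agreement of two annotators' Yes/No labels, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Selecting $\lambda$ as described above then amounts to computing `evaluate(...)["MAP"]` on the development-set labels for each candidate value in $\{0.1, \dots, 0.9\}$ and keeping the value with the highest MAP.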
3.4 Experimental Results
Table 3. Comparison of authority identification for different methods. Rows: (1) PR, (2) HITS, (3) InD, (4) ER, (5) TPR; columns: MAP, MRR, Avg. P@10.
Comparison with Different Methods
To demonstrate the effectiveness of our proposed TPR method, we also compare it against the following previous methods:
- PageRank (PR): finds authorities by taking only the link structure into account [8].
- HITS: Jurczyk and Agichtein [3] proposed finding authorities in CQA by estimating ranking scores with the HITS algorithm.
- InDegree (InD): identifies authorities based on the number of best answers, as described in Bouguessa et al. [2].
- ExpertiseRank (ER): Zhang et al. [7] proposed a PageRank-like algorithm called ExpertiseRank to rank authorities in an expertise network, taking into account how many users are involved in asking and answering questions.
Table 3 presents the comparison of authority identification for the different methods. The table shows that our proposed method significantly outperforms all previous methods (rows 1–4 vs. row 5).⁷ These results show the effectiveness of the proposed method in taking the topic information about users into account.
4 Conclusion and Future Work
In this paper, we propose a topical ranking method for authority identification in CQA. Compared to the traditional link analysis techniques, our proposed method is more effective because it finds authorities by taking into account both the link structure and the topic information about users. We conduct experiments on a real-world data set from Yahoo! Answers. Experimental results show that our proposed method significantly outperforms the traditional link analysis techniques and achieves state-of-the-art performance.
Acknowledgements. This work was supported by the National Natural Science Foundation of China (No. ), the National Basic Research Program of China (No. 2012CB316300), the Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-discipline Foundation, and the Opening Project of Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (No. ). We thank the anonymous reviewers for their insightful comments.
⁷ We perform a t-test for statistical significance; the comparisons between our method and the previous works are significant at p < 0.05.
References
1. Agichtein, E., Castillo, C., Donato, D.: Finding High-Quality Content in Social Media. In: Proceedings of WSDM
2. Bouguessa, M., Dumoulin, B., Wang, S.: Identifying authoritative actors in question-answering forums: the case of Yahoo! Answers. In: Proceedings of KDD
3. Jurczyk, P., Agichtein, E.: Discovering authorities in question answer communities by using link analysis. In: Proceedings of CIKM
4. Liu, J., Song, Y.-I., Lin, C.-Y.: Competition-based user expertise score estimation. In: Proceedings of SIGIR
5. Pal, A., Konstan, J.: Expert identification in community question answering: exploring question selection bias. In: Proceedings of CIKM
6. Kao, W., Liu, D., Wang, S.: Expert finding in question-answering websites: a novel hybrid approach. In: Proceedings of SAC
7. Zhang, J., Ackerman, M., Adamic, L.: Expertise networks in online communities: structure and algorithm. In: Proceedings of WWW
8. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Stanford Digital Library Technologies Project
9. Kleinberg, J.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5)
10. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. Journal of Machine Learning Research 3
11. Guo, J., Xu, S., Bao, S., Yu, Y.: Tapping on the potential of Q&A community by recommending answer providers. In: Proceedings of CIKM
12. Griffiths, T., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences 101
13. Porteous, I., Newman, D., Ihler, A., Asuncion, A., Smyth, P., Welling, M.: Fast collapsed Gibbs sampling for latent Dirichlet allocation. In: Proceedings of KDD
14. Haveliwala, T.H.: Topic-sensitive PageRank. In: Proceedings of WWW
15. Nie, L., Davison, B.D., Qi, X.: Topic link analysis for web search. In: Proceedings of SIGIR
16. Li, B., King, I.: Routing questions to appropriate answerers in community question answering services. In: Proceedings of CIKM
17. Zhou, G., Cai, L., Zhao, J., Liu, K.: Phrase-based translation model for question retrieval in community question answer archives. In: Proceedings of ACL
More informationDiscovering Social Media Experts by Integrating Social Networks and Contents
Proceedings of the Twenty-Third Australasian Database Conference (ADC 2012), Melbourne, Australia Discovering Social Media Experts by Integrating Social Networks and Contents Zhao Zhang Bin Zhao Weining
More informationSpam Detection with a Content-based Random-walk Algorithm
Spam Detection with a Content-based Random-walk Algorithm ABSTRACT F. Javier Ortega Departamento de Lenguajes y Sistemas Informáticos Universidad de Sevilla Av. Reina Mercedes s/n 41012, Sevilla (Spain)
More informationLearning to Rank Revisited: Our Progresses in New Algorithms and Tasks
The 4 th China-Australia Database Workshop Melbourne, Australia Oct. 19, 2015 Learning to Rank Revisited: Our Progresses in New Algorithms and Tasks Jun Xu Institute of Computing Technology, Chinese Academy
More informationDynamical Clustering of Personalized Web Search Results
Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked
More informationPersonalized Reputation Management in P2P Networks
Personalized Reputation Management in P2P Networks Paul - Alexandru Chirita 1, Wolfgang Nejdl 1, Mario Schlosser 2, and Oana Scurtu 1 1 L3S Research Center / University of Hannover Deutscher Pavillon Expo
More informationBlog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
More informationTHUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More informationFinding the Right Facts in the Crowd: Factoid Question Answering over Social Media
Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media ABSTRACT Jiang Bian College of Computing Georgia Institute of Technology Atlanta, GA 30332 jbian@cc.gatech.edu Eugene
More informationQuality-Aware Collaborative Question Answering: Methods and Evaluation
Quality-Aware Collaborative Question Answering: Methods and Evaluation ABSTRACT Maggy Anastasia Suryanto School of Computer Engineering Nanyang Technological University magg0002@ntu.edu.sg Aixin Sun School
More informationRanking on Data Manifolds
Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname
More informationImproving Web Page Retrieval using Search Context from Clicked Domain Names
Improving Web Page Retrieval using Search Context from Clicked Domain Names Rongmei Li School of Electrical, Mathematics, and Computer Science University of Twente P.O.Box 217, 7500 AE, Enschede, the Netherlands
More informationIT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
More informationData Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
More informationUnderstanding Web Hosting Utility of Chinese ISPs
Understanding Web Hosting Utility of Chinese ISPs Zhang Guanqun 1,2, Wang Hui 1,2, Yang Jiahai 1,2 1 The Network Research Center, Tsinghua University, 2 Tsinghua National Laboratory for Information Science
More informationOnline Courses Recommendation based on LDA
Online Courses Recommendation based on LDA Rel Guzman Apaza, Elizabeth Vera Cervantes, Laura Cruz Quispe, José Ochoa Luna National University of St. Agustin Arequipa - Perú {r.guzmanap,elizavvc,lvcruzq,eduardo.ol}@gmail.com
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationParallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data
Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,
More informationCrowdsourcing Fraud Detection Algorithm Based on Psychological Behavior Analysis
, pp.138-142 http://dx.doi.org/10.14257/astl.2013.31.31 Crowdsourcing Fraud Detection Algorithm Based on Psychological Behavior Analysis Li Peng 1,2, Yu Xiao-yang 1, Liu Yang 2, Bi Ting-ting 2 1 Higher
More informationOn the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services
21st International Congress on Modelling and Simulation, Gold Coast, Australia, 29 Nov to 4 Dec 2015 www.mssanz.org.au/modsim2015 On the Feasibility of Answer Suggestion for Advice-seeking Community Questions
More informationAutomatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
More informationQuality of Service Routing Network and Performance Evaluation*
Quality of Service Routing Network and Performance Evaluation* Shen Lin, Cui Yong, Xu Ming-wei, and Xu Ke Department of Computer Science, Tsinghua University, Beijing, P.R.China, 100084 {shenlin, cy, xmw,
More informationRanking User Influence in Healthcare Social Media
Ranking User Influence in Healthcare Social Media XUNING TANG College of Information Science and Technology, Drexel University, PA, U.S.A. and CHRISTOPHER C. YANG College of Information Science and Technology,
More informationAffinity Prediction in Online Social Networks
Affinity Prediction in Online Social Networks Matias Estrada and Marcelo Mendoza Skout Inc., Chile Universidad Técnica Federico Santa María, Chile Abstract Link prediction is the problem of inferring whether
More informationIdentifying Influential Scholars in Academic Social Media Platforms
Identifying Influential Scholars in Academic Social Media Platforms Na Li, Denis Gillet École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland {na.li, denis.gillet}@epfl.ch Abstract
More informationPULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL
Journal homepage: www.mjret.in ISSN:2348-6953 PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL Utkarsha Vibhute, Prof. Soumitra
More informationRecommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1
Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components
More informationWeb based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection
Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Jian Qu, Nguyen Le Minh, Akira Shimazu School of Information Science, JAIST Ishikawa, Japan 923-1292
More informationRanked Keyword Search in Cloud Computing: An Innovative Approach
International Journal of Computational Engineering Research Vol, 03 Issue, 6 Ranked Keyword Search in Cloud Computing: An Innovative Approach 1, Vimmi Makkar 2, Sandeep Dalal 1, (M.Tech) 2,(Assistant professor)
More informationEffective and Efficient Approaches to Retrieving and Using Expertise in Social Media
Effective and Efficient Approaches to Retrieving and Using Expertise in Social Media Reyyan Yeniterzi CMU-LTI-15-008 Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationThe multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
More informationQuery term suggestion in academic search
Query term suggestion in academic search Suzan Verberne 1, Maya Sappelli 1,2, and Wessel Kraaij 2,1 1. Institute for Computing and Information Sciences, Radboud University Nijmegen 2. TNO, Delft Abstract.
More informationEmoticon Smoothed Language Models for Twitter Sentiment Analysis
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of
More informationExtracting Information from Social Networks
Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction
More informationOvercoming Spammers in Twitter A Tale of Five Algorithms 1
Overcoming Spammers in Twitter A Tale of Five Algorithms 1 Daniel Gayo-Avello and David J. Brenes Dept. of Computer Science, University of Oviedo, Calvo Sotelo s/n 33007 Oviedo (SPAIN), Simplelógica, Fray
More informationSUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis Fangtao Li 1, Sheng Wang 2, Shenghua Liu 3 and Ming Zhang
More informationAnalyzing Download Time Performance of University Websites in India
, pp.1-6 http://dx.doi.org/10.14257/ijwse.2014.1.1.01 Analyzing Time Performance of University Websites in India G. Sreedhar Associate Professor Department of Computer Science, Rashtriya Sanskrit Vidyapeetha
More informationBig Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
More informationExpert Finding for Question Answering via Graph Regularized Matrix Completion
1 Expert Finding for Question Answering via Graph Regularized Matrix Completion Zhou Zhao, Lijun Zhang, Xiaofei He and Wilfred Ng Abstract Expert finding for question answering is a challenging problem
More informationA PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH
International Journal of Computer Science and System Analysis Vol. 5, No. 1, January-June 2011, pp. 37-43 Serials Publications ISSN 0973-7448 A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED
More informationSearch engines: ranking algorithms
Search engines: ranking algorithms Gianna M. Del Corso Dipartimento di Informatica, Università di Pisa, Italy ESP, 25 Marzo 2015 1 Statistics 2 Search Engines Ranking Algorithms HITS Web Analytics Estimated
More informationDocument Classification with Latent Dirichlet Allocation
Document Classification with Latent Dirichlet Allocation Ph.D. Thesis Summary István Bíró Supervisor: András Lukács Ph.D. Eötvös Loránd University Faculty of Informatics Department of Information Sciences
More informationBooming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services
Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services
More informationQASM: a Q&A Social Media System Based on Social Semantics
QASM: a Q&A Social Media System Based on Social Semantics Zide Meng, Fabien Gandon, Catherine Faron-Zucker To cite this version: Zide Meng, Fabien Gandon, Catherine Faron-Zucker. QASM: a Q&A Social Media
More informationKEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
More informationDetecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA
More information