Topical Authority Identification in Community Question Answering

Guangyou Zhou, Kang Liu, and Jun Zhao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, China

Abstract. In this paper, we address the problem of authority identification in community question answering (CQA). Most existing approaches identify authorities in CQA by means of link analysis techniques. However, these traditional techniques consider only the link structure and ignore the topic information about users, which gives rise to the problem of topic drift. To solve the problem of topic drift, we propose a topical ranking method, an extension of the PageRank algorithm, to identify authorities in CQA. Compared with traditional link analysis techniques, our method is more effective because it measures authority scores by taking into account both the link structure and the topic information. We conduct experiments on a real-world data set from Yahoo! Answers. Experimental results show that our method significantly outperforms the traditional link analysis techniques and achieves state-of-the-art performance for authority identification in CQA.

Keywords: authority identification; PageRank; community question answering.

1 Introduction

Community question answering (CQA) is a particular form of online service for leveraging user-generated content that has gained increasing popularity in recent years. Such services, including Yahoo! Answers and Live QnA, provide a platform for users to ask and answer questions. Unfortunately, answer quality varies widely, ranging from very high to low quality, and sometimes includes abusive content or even spam [1]. It is therefore desirable to identify authorities in CQA automatically, so that newly posted questions can be routed to the appropriate authorities, who can provide good-quality answers to them [2,3,4]. Ultimately, the overall quality of answers can be substantially improved.

C.-L. Liu, C. Zhang, and L. Wang (Eds.): CCPR 2012, CCIS 321, © Springer-Verlag Berlin Heidelberg 2012

Authority identification in CQA is the task of identifying users who can provide a large number of high-quality, complete, and reliable answers [5], and it has recently attracted wide interest in the NLP and IR communities [2,3,6,7]. Existing approaches measure authority scores by means of link analysis techniques such as PageRank [8] and HITS [9], or their variants. However, these traditional techniques consider only the link structure and ignore the topic information about users, which gives rise to the problem of topic drift.

To tackle the problem of topic drift, this paper proposes a topical ranking method for authority identification in CQA. Given a set of users, we first automatically distill the topics that users are interested in by analyzing the content of the questions they answered and the corresponding answers. Based on the distilled topics, we construct topic-specific question-answer relationships between askers and answerers. Finally, we measure authority scores by taking into account both the link structure and the topic information about users.

To the best of our knowledge, this is the first extensive empirical study of identifying authoritative users in CQA that takes into account both the link structure and the topic information about users. To date, little work has addressed topic information about users in studies of authority identification in CQA, which remains an under-explored research area; this paper is designed to fill that gap. Specifically, we make the following contributions. We automatically distill the topics that users are interested in by analyzing the content of the questions they answered and the corresponding answers (Section 2.1). We propose a topical ranking method that takes into account both the link structure and the topic information about users (Sections 2.2 and 2.3).
Finally, we conduct experiments on a CQA data set, and the results show that our method significantly outperforms the traditional link analysis techniques (Section 3).

The rest of this paper is organized as follows. Section 2 presents our proposed method, Section 3 presents the experimental results, and Section 4 concludes with ideas for future work.

2 Topical Authority Identification

2.1 Topic Distillation

Topic distillation aims to automatically identify the topics that users (askers and answerers) are interested in, based on their user profiles.³ In this paper, we use the widely studied topic model Latent Dirichlet Allocation (LDA) [10] to identify latent topic information in the large-scale question-answer collection.

3 Here, the user profiles refer to the questions answered by the users and the corresponding answers.

LDA is a Bayesian probabilistic graphical model that models each document as a mixture of underlying topics and generates each word from one topic. The generative process of a document is described in Table 1. A document $d$ is associated with a multinomial distribution over $K$ topics, denoted $\theta_d$. For each word $w_{di}$ in document $d$: (1) a topic $z_{di}$ is first sampled from the multinomial distribution $\theta_d$, which is drawn from a Dirichlet prior parameterized by $\alpha$; (2) the word $w_{di}$ is then generated from the multinomial distribution $\phi_{z_{di}}$, which is drawn from a Dirichlet prior parameterized by $\beta$. The two Dirichlet priors on the document-topic distributions $\theta_d$ and the topic-word distributions $\phi_z$ reduce the risk of overfitting the training documents and improve the ability to infer topic distributions for new documents [11]. We employ Gibbs sampling [12] for parameter estimation because of its faster convergence and better performance [13].

Table 1. The generation process of LDA
For each topic $z \in \{1, \dots, K\}$: sample a multinomial distribution over words, $\phi_z \sim \mathrm{Dir}(\beta)$
For each document $d$:
  1. Sample a multinomial distribution over topics, $\theta_d \sim \mathrm{Dir}(\alpha)$
  2. For each word $w_{di}$ in document $d$:
     * sample a topic $z_{di} \sim \mathrm{Multinomial}(\theta_d)$
     * sample a word $w_{di} \sim \mathrm{Multinomial}(\phi_{z_{di}})$

When LDA is used to distill the topics that users are interested in, documents would naturally correspond to questions and answers. However, since our goal is to distill the topics each user is interested in, rather than the topics each question and its answers are about, we aggregate each user's profile into one large document. Thus, each document corresponds to a user. The results of topic distillation are represented in two matrices:

$DK = [\theta]_{D \times K}$, a $D \times K$ matrix, where $D$ is the number of users and $K$ is the number of topics. Each entry $DK_{ij}$ contains the number of times a word in $u_i$'s profile (questions and the corresponding answers) has been assigned to topic $z_j$.
$WK = [\phi]_{W \times K}$, a $W \times K$ matrix, where $W$ is the number of unique words in the question-answer collection and $K$ is the number of topics. Each entry $WK_{ij}$ denotes the number of times the unique word $w_i$ has been assigned to topic $z_j$.

Matrix $DK$ thus records how often the words in each user's profile are assigned to each topic. We row-normalize it as $\overline{DK}$ such that $\|\overline{DK}_{i\cdot}\|_1 = 1$ for each row $\overline{DK}_{i\cdot}$. Each row of $\overline{DK}$ is then the probability distribution of $u_i$'s interest over the $K$ topics, i.e., each element $\overline{DK}_{ij}$ denotes the probability that $u_i$ is interested in topic $z_j$: $P(z_j \mid u_i) = \overline{DK}_{ij}$.
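The topic distillation step can be illustrated with a minimal, unoptimized collapsed Gibbs sampler for LDA. This is a sketch under stated assumptions, not the authors' implementation: the input format (a list of `(user_id, tokens)` pairs whose tokens are already stop-word filtered and stemmed) and the function names `aggregate_profiles`, `lda_gibbs`, and `row_normalize` are hypothetical. It aggregates each user's posts into one document, resamples topic assignments with the standard collapsed Gibbs update of Griffiths and Steyvers [12], and returns the user-topic counts DK, the word-topic counts WK, and the row-normalized DK whose rows give P(z_j | u_i).

```python
import random
from collections import defaultdict


def aggregate_profiles(posts):
    """Merge all posts of each user into one document (one document per user).

    `posts` is a hypothetical list of (user_id, tokens) pairs.
    """
    docs = defaultdict(list)
    for user, tokens in posts:
        docs[user].extend(tokens)
    users = sorted(docs)
    return users, [docs[u] for u in users]


def lda_gibbs(docs, K, alpha, beta, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns count matrices DK, WK and the vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    widx = {w: i for i, w in enumerate(vocab)}
    W = len(vocab)
    DK = [[0] * K for _ in docs]        # user x topic counts
    WK = [[0] * K for _ in range(W)]    # word x topic counts
    NK = [0] * K                        # total tokens assigned to each topic
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            DK[d][k] += 1
            WK[widx[w]][k] += 1
            NK[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = widx[w], z[d][i]
                # Take the token out of the counts before resampling it.
                DK[d][k] -= 1
                WK[v][k] -= 1
                NK[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(DK[d][t] + alpha) * (WK[v][t] + beta) / (NK[t] + W * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                DK[d][k] += 1
                WK[v][k] += 1
                NK[k] += 1
    return DK, WK, vocab


def row_normalize(DK):
    """Row-normalize DK so that row i is the distribution P(z | u_i)."""
    result = []
    for row in DK:
        total = sum(row)
        result.append([c / total if total else 1.0 / len(row) for c in row])
    return result


# Toy usage with made-up tokens; alpha = 50 / K mirrors the prior used in Section 3.3.
posts = [("u1", ["python", "code", "bug"]), ("u2", ["soccer", "goal"]),
         ("u1", ["code", "compile"]), ("u3", ["goal", "team", "soccer"])]
users, docs = aggregate_profiles(posts)
DK, WK, vocab = lda_gibbs(docs, K=2, alpha=50 / 2, beta=0.05, iters=200)
interest = row_normalize(DK)  # interest[i][j] = P(z_j | users[i])
```

A real system would more likely rely on an optimized sampler; the pure-Python loop only makes the two count matrices and the normalization concrete.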

2.2 PageRank for Authority Identification

Based on the topics distilled in Section 2.1, a directed graph $G = (V, E)$ is formed with the topic-specific question-answer relationships among users. $V$ is a set of nodes representing users (askers and answerers). A directed edge $e = (u_i, u_j) \in E$, where $u_i \in V$ and $u_j \in V$, indicates that user $u_j$ answers the questions of user $u_i$. Each edge $e_{ij} \in E$ is associated with an affinity weight $f(i \to j)$ between $u_i$ and $u_j$, computed as follows:

$$f(i \to j) = |Q(i) \cap A(j)| \tag{1}$$

where $Q(i)$ is the set of questions asked by $u_i$ and $A(j)$ is the set of questions answered by $u_j$. Two users are connected if their affinity weight is larger than 0, and we set $f(i \to i) = 0$ to avoid self-transitions.⁴ The transition probability from $u_i$ to $u_j$ is then defined by normalizing the corresponding affinity weight:

$$p(i \to j) = \begin{cases} \dfrac{f(i \to j)}{\sum_{k=1}^{|V|} f(i \to k)} & \text{if } \sum_{k=1}^{|V|} f(i \to k) \neq 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

Note that $p(i \to j)$ is usually not equal to $p(j \to i)$. We use the row-normalized matrix $M = [M_{ij}]_{|V| \times |V|}$ to describe $G$, with each entry corresponding to a transition probability:

$$M_{ij} = p(i \to j) \tag{3}$$

To make the graph aperiodic and $M$ a stochastic matrix, rows with all zero elements are replaced by a smoothing vector with all elements set to $1/|V|$. Based on the matrix $M$, the saliency score $R(u_i)$ of $u_i$ can be deduced from the scores of all other users linked to it, and it can be formulated recursively as in the PageRank algorithm:

$$R(u_i) = \lambda \sum_{j:\, u_j \to u_i} R(u_j)\, M_{ji} + (1 - \lambda)\,\frac{1}{|V|} \tag{4}$$

where $\lambda \in [0, 1]$ is a damping factor. The damping factor means that each vertex has a probability of $(1 - \lambda)$ of performing a random jump to another vertex in the graph. The saliency scores are obtained by running equation (4) iteratively until convergence.
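Equations (1) to (4) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the input format (one set of question ids per user for Q(i) and A(j)), the function names, and the convergence tolerance are assumptions. `transition_matrix` builds the affinity weights of equation (1), row-normalizes them as in equations (2) and (3), and replaces all-zero rows with the uniform smoothing vector; `pagerank` runs the iteration of equation (4). The random-jump vector is left as a parameter, so passing a topic preference vector instead of the uniform one gives the topic-specific iteration described in Section 2.3; the hypothetical helper `topic_preference` column-normalizes a user-topic count matrix for that purpose.

```python
def transition_matrix(asked, answered):
    """Equations (1)-(3): f(i->j) = |Q(i) & A(j)|, row-normalized into M.

    asked[i] is the set Q(i) of questions asked by user i and answered[j]
    the set A(j) of questions answered by user j (hypothetical input format).
    All-zero rows are replaced by the uniform smoothing vector 1/|V|.
    """
    n = len(asked)
    M = []
    for i in range(n):
        # f(i->i) = 0 avoids self-transitions.
        f = [0 if i == j else len(asked[i] & answered[j]) for j in range(n)]
        total = sum(f)
        M.append([x / total for x in f] if total else [1.0 / n] * n)
    return M


def pagerank(M, lam, teleport=None, tol=1e-6, max_iter=1000):
    """Iterate R(u_i) = lam * sum_j R(u_j) * M[j][i] + (1 - lam) * teleport[i].

    teleport=None gives uniform random jumps (equation (4)); a topic
    preference vector p(. | z) gives the topic-specific variant.
    """
    n = len(M)
    if teleport is None:
        teleport = [1.0 / n] * n
    R = [1.0] * n  # every user starts with score 1
    for _ in range(max_iter):
        new = [lam * sum(R[j] * M[j][i] for j in range(n)) + (1 - lam) * teleport[i]
               for i in range(n)]
        # Stop once no user's score changes by more than tol.
        if max(abs(a - b) for a, b in zip(new, R)) < tol:
            return new
        R = new
    return R


def topic_preference(DK, z):
    """p(u | z): column z of a user-topic count matrix, normalized to sum to 1."""
    col = [row[z] for row in DK]
    total = sum(col)
    return [c / total if total else 1.0 / len(col) for c in col]


# Toy graph: user 2 answers questions asked by both user 0 and user 1.
asked = [{"q1", "q2"}, {"q3"}, set()]
answered = [{"q3"}, {"q1"}, {"q1", "q2", "q3"}]
M = transition_matrix(asked, answered)
scores = pagerank(M, lam=0.2)  # uniform random jumps
topical = pagerank(M, lam=0.2, teleport=topic_preference([[1, 4], [2, 1], [1, 15]], 1))
```

In the toy graph, user 2 receives the highest score under both the uniform and the topic-specific jump vectors. Because the update is a contraction for lam < 1, the scores converge to a vector summing to 1 regardless of the initial value of 1 per user.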
4 In CQA, users cannot answer their own questions.

2.3 Topical PageRank for Authority Identification

In equation (4), the second term is set to the same value $1/|V|$ for all vertices in the graph, which means that random jumps to all vertices are equally probable. However, Haveliwala [14] and Nie et al. [15] proposed a
topical PageRank-like algorithm (TPR) and argued that the second term in equation (4) should be non-uniform. The assumption is that if we assign larger probabilities to some vertices, the final saliency scores will favor these vertices. The idea of TPR is to run PageRank for each topic separately, so that each topic-specific PageRank favors the users with high relevance to the corresponding topic. Formally, for a specific topic $z$, we assign a topic-specific preference value $p(u \mid z)$ to each user $u$ as its random jump probability, with $\sum_{u \in V} p(u \mid z) = 1$. Users who are interested in topic $z$ are assigned larger probabilities when performing PageRank. Given a topic $z$, the TPR saliency score is defined as follows:

$$R(u_i \mid z) = \lambda \sum_{j:\, u_j \to u_i} R(u_j \mid z)\, M_{ji} + (1 - \lambda)\, p(u_i \mid z) \tag{5}$$

The setting of the preference value $p(u_i \mid z)$ in equation (5) has a great influence on TPR. In this paper, we set $p(\cdot \mid z) = \widehat{DK}_{\cdot z}$, where $\widehat{DK}_{\cdot z}$ is the $z$-th column of $\widehat{DK}$, the column-normalized form of the user-topic matrix $DK$ from Section 2.1, such that $\|\widehat{DK}_{\cdot z}\|_1 = 1$. A large $R(u \mid z)$ indicates that user $u$ is a good candidate authority in topic $z$.

For implementation, the initial scores of all users are set to 1, and the iteration in equation (5) is used to compute new scores. Convergence is usually reached when, for every user, the difference between the scores computed at two successive iterations falls below a given threshold ( in this paper). After ranking the users with TPR or another method, we select the top K users for each topic as topical candidate authorities.

3 Experiments

3.1 Data Set

The Yahoo! Answers web service supplies an API that allows users to crawl the existing question-answer archives and the corresponding user information from the website [17]. We crawled a data set from Yahoo! Answers consisting of 237,083 resolved questions and 593,107 answers posted by 286,053 users.
Table 2 presents statistics on the data set. For each resolved question, the information includes: (1) the texts of the question and the associated answers, with stop words removed and words stemmed; (2) the user IDs of all questions and answers; and (3) users' rating information (e.g., thumbs up, thumbs down, best answers, and so on).

Table 2. Yahoo! Answers data set
Number of questions                      237,083
Number of answers                        593,107
Number of best answers                   162,733
Number of total users                    286,053
Number of askers                         180,166
Number of answerers                      135,441
Number of both askers and answerers       29,554

Since no benchmark is available for topic-specific authority identification in CQA, we manually inspect the authority identification results. For each candidate authority u for topic z, we ask two annotators to check whether u is a real authority for the given topic. In this process, the annotators are given the top topic words and the user profile. Each identified authority is labeled by both annotators as Yes (the user is a real authority for the given topic) or No (the user is not a real authority for the given topic). If the annotators disagree, a third person makes the final judgment. Cohen's Kappa coefficients over the Z topics range from 0.51 to 0.77, showing fair to good agreement.

3.2 Evaluation Metrics

To evaluate the performance of authority identification, we use three widely used metrics in information retrieval:
Mean Average Precision (MAP): the mean, over all topics, of the average precision scores.
Mean Reciprocal Rank (MRR): the mean, over all topics, of the multiplicative inverse of the rank of the first correctly identified authority.
Average Precision@n (Avg. P@n): the average, over all topics, of the proportion of real authorities among the top n identified authorities.

3.3 Parameter Setting

Our method has several parameters: the Dirichlet hyper-parameters α and β, the number of topics Z (denoted K in Section 2.1), and the damping factor λ used in PageRank. Following Griffiths and Steyvers [12], we set the Dirichlet priors to α = 50/Z and β = 0.05. We run LDA with 200 iterations of Gibbs sampling. After trying a few different numbers of topics, we empirically set Z = 15.
We chose these parameter settings because they give coherent and meaningful topics for our data set. For λ, we conduct an experiment on a small development set to choose the best value among 0.1, 0.2, ..., 0.9 in terms of MAP. This set is also extracted from Yahoo! Answers and is not included in the evaluation set. We find that λ = 0.2 is optimal for both PR and TPR.
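The metrics of Section 3.2 can be computed per topic and averaged, as in the following sketch. The function names and the `runs` structure (each topic mapped to a ranked candidate list and the set of users the annotators labeled Yes) are hypothetical. Because only the returned candidates are judged, average precision here is normalized by the number of relevant users found in the ranked list; that normalization is an assumption, not something the paper states.

```python
def average_precision(ranked, relevant):
    """AP of one ranked list, normalized by the relevant users found in it (assumption)."""
    hits, score = 0, 0.0
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            score += hits / rank  # precision at this relevant position
    return score / hits if hits else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first real authority, or 0 if none is retrieved."""
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / rank
    return 0.0


def precision_at(ranked, relevant, n):
    """Fraction of real authorities among the top n candidates."""
    return sum(1 for user in ranked[:n] if user in relevant) / n


def evaluate(runs, n=10):
    """runs: topic -> (ranked candidate list, set of users labeled Yes)."""
    T = len(runs)
    return {
        "MAP": sum(average_precision(r, rel) for r, rel in runs.values()) / T,
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs.values()) / T,
        f"Avg. P@{n}": sum(precision_at(r, rel, n) for r, rel in runs.values()) / T,
    }


runs = {"z1": (["a", "b", "c", "d"], {"b", "d"}), "z2": (["e", "f", "g"], {"e"})}
print(evaluate(runs, n=2))  # {'MAP': 0.75, 'MRR': 0.75, 'Avg. P@2': 0.5}
```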

Table 3. Comparison of authority identification for different methods
#   Methods   MAP   MRR   Avg. P@10
1   PR
2   HITS
3   InD
4   ER
5   TPR

3.4 Experimental Results

Comparison with different methods. To demonstrate the effectiveness of our proposed TPR method, we compare it against the following previous methods:
PageRank (PR): finds authorities by taking only the link structure into account [8].
HITS: Jurczyk and Agichtein [3] proposed finding authorities in CQA by estimating ranking scores with the HITS algorithm.
InDegree (InD): identifies authorities based on the number of best answers, as described in Bouguessa et al. [2].
ExpertiseRank (ER): Zhang et al. [7] proposed a PageRank-like algorithm called ExpertiseRank that ranks authorities in an expertise network by considering how many users are involved in asking and answering questions.

Table 3 presents the comparison of authority identification for the different methods. Our proposed method significantly outperforms all previous methods (rows 1-4 vs. row 5).⁷ These results show the effectiveness of the proposed method, which takes the topic information about users into account.

7 We perform a significance test (t-test). The differences between our method and the previous methods are significant at p < 0.05.

4 Conclusion and Future Work

In this paper, we propose a topical ranking method for authority identification in CQA. Compared with traditional link analysis techniques, our method is more effective because it finds authorities by taking into account both the link structure and the topic information about users. We conduct experiments on a real-world data set from Yahoo! Answers. Experimental results show that our method significantly outperforms the traditional link analysis techniques and achieves state-of-the-art performance.

Acknowledgements. This work was supported by the National Natural Science Foundation of China (No. ), the National Basic Research Program of China (No. 2012CB316300), the Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-discipline Foundation, and the Opening Project of Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (No. ). We thank the anonymous reviewers for their insightful comments.

References
1. Agichtein, E., Castillo, C., Donato, D.: Finding High-Quality Content in Social Media. In: Proceedings of WSDM.
2. Bouguessa, M., Dumoulin, B., Wang, S.: Identifying authoritative actors in question-answering forums: the case of Yahoo! Answers. In: Proceedings of KDD.
3. Jurczyk, P., Agichtein, E.: Discovering authorities in question answer communities by using link analysis. In: Proceedings of CIKM.
4. Liu, J., Song, Y.-I., Lin, C.-Y.: Competition-based user expertise score estimation. In: Proceedings of SIGIR.
5. Pal, A., Konstan, J.: Expert identification in community question answering: exploring question selection bias. In: Proceedings of CIKM.
6. Kao, W., Liu, D., Wang, S.: Expert finding in question-answering websites: a novel hybrid approach. In: Proceedings of SAC.
7. Zhang, J., Ackerman, M., Adamic, L.: Expertise networks in online communities: structure and algorithm. In: Proceedings of WWW.
8. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Stanford Digital Library Technologies Project.
9. Kleinberg, J.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5).
10. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. Journal of Machine Learning Research 3.
11. Guo, J., Xu, S., Bao, S., Yu, Y.: Tapping on the potential of Q&A community by recommending answer providers. In: Proceedings of CIKM.
12. Griffiths, T., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences 101.
13. Porteous, I., Newman, D., Ihler, A., Asuncion, A., Smyth, P., Welling, M.: Fast collapsed Gibbs sampling for latent Dirichlet allocation. In: Proceedings of KDD.
14. Haveliwala, T.H.: Topic-sensitive PageRank. In: Proceedings of WWW.
15. Nie, L., Davison, B.D., Qi, X.: Topic link analysis for web search. In: Proceedings of SIGIR.
16. Li, B., King, I.: Routing questions to appropriate answerers in community question answering services. In: Proceedings of CIKM.
17. Zhou, G., Cai, L., Zhao, J., Liu, K.: Phrase-based translation model for question retrieval in community question answer archives. In: Proceedings of ACL.


More information

Inference Methods for Analyzing the Hidden Semantics in Big Data. Phuong LE-HONG phuonglh@gmail.com

Inference Methods for Analyzing the Hidden Semantics in Big Data. Phuong LE-HONG phuonglh@gmail.com Inference Methods for Analyzing the Hidden Semantics in Big Data Phuong LE-HONG phuonglh@gmail.com Introduction Grant proposal for basic research project Nafosted, 2014 24 months Principal Investigator:

More information

Web Graph Analyzer Tool

Web Graph Analyzer Tool Web Graph Analyzer Tool Konstantin Avrachenkov INRIA Sophia Antipolis 2004, route des Lucioles, B.P.93 06902, France Email: K.Avrachenkov@sophia.inria.fr Danil Nemirovsky St.Petersburg State University

More information

An Improved Page Rank Algorithm based on Optimized Normalization Technique

An Improved Page Rank Algorithm based on Optimized Normalization Technique An Improved Page Rank Algorithm based on Optimized Normalization Technique Hema Dubey,Prof. B. N. Roy Department of Computer Science and Engineering Maulana Azad National Institute of technology Bhopal,

More information

The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity

The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity David Cohn Burning Glass Technologies 201 South Craig St, Suite 2W Pittsburgh, PA 15213 david.cohn@burning-glass.com

More information

How To Write A Summary Of A Review

How To Write A Summary Of A Review PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

1. Systematic literature review

1. Systematic literature review 1. Systematic literature review Details about population, intervention, outcomes, databases searched, search strings, inclusion exclusion criteria are presented here. The aim of systematic literature review

More information

HITS vs. Non-negative Matrix Factorization

HITS vs. Non-negative Matrix Factorization Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 HITS vs. Non-negative Matrix Factorization Yuanzhe Cai, Sharma Chakravarthy Technical Report CSE 2014

More information

Fraudulent Support Telephone Number Identification Based on Co-occurrence Information on the Web

Fraudulent Support Telephone Number Identification Based on Co-occurrence Information on the Web Fraudulent Support Telephone Number Identification Based on Co-occurrence Information on the Web Xin Li, Yiqun Liu, Min Zhang, Shaoping Ma State Key Laboratory of Intelligent Technology and Systems Tsinghua

More information

Question Quality in Community Question Answering Forums: A Survey

Question Quality in Community Question Answering Forums: A Survey Question Quality in Community Question Answering Forums: A Survey ABSTRACT Antoaneta Baltadzhieva Tilburg University P.O. Box 90153 Tilburg, Netherlands a baltadzhieva@yahoo.de Community Question Answering

More information

Discovering Social Media Experts by Integrating Social Networks and Contents

Discovering Social Media Experts by Integrating Social Networks and Contents Proceedings of the Twenty-Third Australasian Database Conference (ADC 2012), Melbourne, Australia Discovering Social Media Experts by Integrating Social Networks and Contents Zhao Zhang Bin Zhao Weining

More information

Spam Detection with a Content-based Random-walk Algorithm

Spam Detection with a Content-based Random-walk Algorithm Spam Detection with a Content-based Random-walk Algorithm ABSTRACT F. Javier Ortega Departamento de Lenguajes y Sistemas Informáticos Universidad de Sevilla Av. Reina Mercedes s/n 41012, Sevilla (Spain)

More information

Learning to Rank Revisited: Our Progresses in New Algorithms and Tasks

Learning to Rank Revisited: Our Progresses in New Algorithms and Tasks The 4 th China-Australia Database Workshop Melbourne, Australia Oct. 19, 2015 Learning to Rank Revisited: Our Progresses in New Algorithms and Tasks Jun Xu Institute of Computing Technology, Chinese Academy

More information

Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

More information

Personalized Reputation Management in P2P Networks

Personalized Reputation Management in P2P Networks Personalized Reputation Management in P2P Networks Paul - Alexandru Chirita 1, Wolfgang Nejdl 1, Mario Schlosser 2, and Oana Scurtu 1 1 L3S Research Center / University of Hannover Deutscher Pavillon Expo

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media

Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media ABSTRACT Jiang Bian College of Computing Georgia Institute of Technology Atlanta, GA 30332 jbian@cc.gatech.edu Eugene

More information

Quality-Aware Collaborative Question Answering: Methods and Evaluation

Quality-Aware Collaborative Question Answering: Methods and Evaluation Quality-Aware Collaborative Question Answering: Methods and Evaluation ABSTRACT Maggy Anastasia Suryanto School of Computer Engineering Nanyang Technological University magg0002@ntu.edu.sg Aixin Sun School

More information

Ranking on Data Manifolds

Ranking on Data Manifolds Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname

More information

Improving Web Page Retrieval using Search Context from Clicked Domain Names

Improving Web Page Retrieval using Search Context from Clicked Domain Names Improving Web Page Retrieval using Search Context from Clicked Domain Names Rongmei Li School of Electrical, Mathematics, and Computer Science University of Twente P.O.Box 217, 7500 AE, Enschede, the Netherlands

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

Understanding Web Hosting Utility of Chinese ISPs

Understanding Web Hosting Utility of Chinese ISPs Understanding Web Hosting Utility of Chinese ISPs Zhang Guanqun 1,2, Wang Hui 1,2, Yang Jiahai 1,2 1 The Network Research Center, Tsinghua University, 2 Tsinghua National Laboratory for Information Science

More information

Online Courses Recommendation based on LDA

Online Courses Recommendation based on LDA Online Courses Recommendation based on LDA Rel Guzman Apaza, Elizabeth Vera Cervantes, Laura Cruz Quispe, José Ochoa Luna National University of St. Agustin Arequipa - Perú {r.guzmanap,elizavvc,lvcruzq,eduardo.ol}@gmail.com

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,

More information

Crowdsourcing Fraud Detection Algorithm Based on Psychological Behavior Analysis

Crowdsourcing Fraud Detection Algorithm Based on Psychological Behavior Analysis , pp.138-142 http://dx.doi.org/10.14257/astl.2013.31.31 Crowdsourcing Fraud Detection Algorithm Based on Psychological Behavior Analysis Li Peng 1,2, Yu Xiao-yang 1, Liu Yang 2, Bi Ting-ting 2 1 Higher

More information

On the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services

On the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services 21st International Congress on Modelling and Simulation, Gold Coast, Australia, 29 Nov to 4 Dec 2015 www.mssanz.org.au/modsim2015 On the Feasibility of Answer Suggestion for Advice-seeking Community Questions

More information

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines , 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing

More information

Quality of Service Routing Network and Performance Evaluation*

Quality of Service Routing Network and Performance Evaluation* Quality of Service Routing Network and Performance Evaluation* Shen Lin, Cui Yong, Xu Ming-wei, and Xu Ke Department of Computer Science, Tsinghua University, Beijing, P.R.China, 100084 {shenlin, cy, xmw,

More information

Ranking User Influence in Healthcare Social Media

Ranking User Influence in Healthcare Social Media Ranking User Influence in Healthcare Social Media XUNING TANG College of Information Science and Technology, Drexel University, PA, U.S.A. and CHRISTOPHER C. YANG College of Information Science and Technology,

More information

Affinity Prediction in Online Social Networks

Affinity Prediction in Online Social Networks Affinity Prediction in Online Social Networks Matias Estrada and Marcelo Mendoza Skout Inc., Chile Universidad Técnica Federico Santa María, Chile Abstract Link prediction is the problem of inferring whether

More information

Identifying Influential Scholars in Academic Social Media Platforms

Identifying Influential Scholars in Academic Social Media Platforms Identifying Influential Scholars in Academic Social Media Platforms Na Li, Denis Gillet École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland {na.li, denis.gillet}@epfl.ch Abstract

More information

PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL

PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL Journal homepage: www.mjret.in ISSN:2348-6953 PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL Utkarsha Vibhute, Prof. Soumitra

More information

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

More information

Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection

Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Jian Qu, Nguyen Le Minh, Akira Shimazu School of Information Science, JAIST Ishikawa, Japan 923-1292

More information

Ranked Keyword Search in Cloud Computing: An Innovative Approach

Ranked Keyword Search in Cloud Computing: An Innovative Approach International Journal of Computational Engineering Research Vol, 03 Issue, 6 Ranked Keyword Search in Cloud Computing: An Innovative Approach 1, Vimmi Makkar 2, Sandeep Dalal 1, (M.Tech) 2,(Assistant professor)

More information

Effective and Efficient Approaches to Retrieving and Using Expertise in Social Media

Effective and Efficient Approaches to Retrieving and Using Expertise in Social Media Effective and Efficient Approaches to Retrieving and Using Expertise in Social Media Reyyan Yeniterzi CMU-LTI-15-008 Language Technologies Institute School of Computer Science Carnegie Mellon University

More information

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

More information

Query term suggestion in academic search

Query term suggestion in academic search Query term suggestion in academic search Suzan Verberne 1, Maya Sappelli 1,2, and Wessel Kraaij 2,1 1. Institute for Computing and Information Sciences, Radboud University Nijmegen 2. TNO, Delft Abstract.

More information

Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Emoticon Smoothed Language Models for Twitter Sentiment Analysis Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of

More information

Extracting Information from Social Networks

Extracting Information from Social Networks Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction

More information

Overcoming Spammers in Twitter A Tale of Five Algorithms 1

Overcoming Spammers in Twitter A Tale of Five Algorithms 1 Overcoming Spammers in Twitter A Tale of Five Algorithms 1 Daniel Gayo-Avello and David J. Brenes Dept. of Computer Science, University of Oviedo, Calvo Sotelo s/n 33007 Oviedo (SPAIN), Simplelógica, Fray

More information

SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis

SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis Fangtao Li 1, Sheng Wang 2, Shenghua Liu 3 and Ming Zhang

More information

Analyzing Download Time Performance of University Websites in India

Analyzing Download Time Performance of University Websites in India , pp.1-6 http://dx.doi.org/10.14257/ijwse.2014.1.1.01 Analyzing Time Performance of University Websites in India G. Sreedhar Associate Professor Department of Computer Science, Rashtriya Sanskrit Vidyapeetha

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

More information

Expert Finding for Question Answering via Graph Regularized Matrix Completion

Expert Finding for Question Answering via Graph Regularized Matrix Completion 1 Expert Finding for Question Answering via Graph Regularized Matrix Completion Zhou Zhao, Lijun Zhang, Xiaofei He and Wilfred Ng Abstract Expert finding for question answering is a challenging problem

More information

A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH

A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED WEB SEARCH International Journal of Computer Science and System Analysis Vol. 5, No. 1, January-June 2011, pp. 37-43 Serials Publications ISSN 0973-7448 A PREDICTIVE MODEL FOR QUERY OPTIMIZATION TECHNIQUES IN PERSONALIZED

More information

Search engines: ranking algorithms

Search engines: ranking algorithms Search engines: ranking algorithms Gianna M. Del Corso Dipartimento di Informatica, Università di Pisa, Italy ESP, 25 Marzo 2015 1 Statistics 2 Search Engines Ranking Algorithms HITS Web Analytics Estimated

More information

Document Classification with Latent Dirichlet Allocation

Document Classification with Latent Dirichlet Allocation Document Classification with Latent Dirichlet Allocation Ph.D. Thesis Summary István Bíró Supervisor: András Lukács Ph.D. Eötvös Loránd University Faculty of Informatics Department of Information Sciences

More information

Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services

Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services

More information

QASM: a Q&A Social Media System Based on Social Semantics

QASM: a Q&A Social Media System Based on Social Semantics QASM: a Q&A Social Media System Based on Social Semantics Zide Meng, Fabien Gandon, Catherine Faron-Zucker To cite this version: Zide Meng, Fabien Gandon, Catherine Faron-Zucker. QASM: a Q&A Social Media

More information

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

More information

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA

More information