# Finding Expert Users in Community Question Answering

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 In this study, our objective is to compute the probability P (q u) that captures the expertise of user u on question q. This probability model was first introduced by [3]. In the dataset used in this research, each question has three parts: question tag, question title, and question body. Question tag is the tag assigned by user who posted the question. Question title is a short description of the question. The detailed description is given in the question body. The main challenge is representing the questions. Additionally, the expertise and interest of a user should be modeled by taking advantage of the activity history of the user. 4. MODELING EXPERT SEARCH In the Community Question Answering Services, answerers usually choose a category that they are more interested in and then pick a question from that category. Therefore, user interests can be inferred from answering history. In this section, we explore different methods for ranking users based on their interests. These methods can be divided into two main categories: word-based methods and topic models. In the first category, we model user interest by using TF-IDF and language model. In general, word-based methods use a smoothed distribution to estimate the likelihood of a query in a given collection of documents. Topic models have an additional representational level. Documents in these models are a mixture of topics and topics are mixtures of words. 4.1 TF-IDF TF-IDF is a standard measure to compute importance and relevance of a word in a document based on the frequency of that word in the document and the inverse proportion of documents containing the word over the entire document corpus. Words that appear only in a small group of documents will have higher tf-idf scores than other words. Basically, TF-IDF is defined as follows: given a document collection Q, awordw, and a document d Q: tfidf = f w,d log( Q f w,q ) (2) where f w,d is the number of times w appears in d, Q is the size of the corpus and f w,q is the total number of documents that have the word w [28]. The end result is a term-by-document matrix for the entire corpus, where columns represent terms, rows represent documents and each value in the matrix represents the tfidf weight for the corresponding term and document. Thus, the TF-IDF reduces documents of different lengths to vectors with a fixed length. For the expert retrieval task, given a test question q composed of a set of words, we represent test question and each profile as vectors of their tf-idf weights and then calculate the Cosine Similarity between each user profile and question vector: w tfidf(u, w)tfidf(q, w) s(u, q) = w tfidf(u, w)2 w tfidf(q, (3) w)2 where tfidf(q, w) isthetf-idfweightofwordw in q, and tfidf(u, w) isthetf-idfweightofw in the profile of user u. 4.2 Language Model Language Model is related to traditional TF-IDF models. Similar to TF-IDF, rare terms in the corpus which occur in only a group of documents in the corpus, have a great influence on the ranking. Some research in information retrieval shows that the language model approach is more effective than TF-IDF [25]. In this model, a multinomial probability distribution over words in the vocabulary is used to represent a candidate user. A new question is represented by a set of words q= {w 1,w 2,...,w N} where w i is a non-stop word and each question is assumed to be generated independently. Therefore, the probability of a question being generated by a candidate user can be computed by taking the product of each word s probability in the question given the user profile. P (q u) = P (w θ u) n(w,q) (4) w where θ u denotes user profile for user u, P (w θ u)istheprobability of generating word w from user profile θ u and n(w, q) is the number of times word w appears in question q. Since many words in the vocabulary will not appear in a given user profile, and P (w θ u) will be zero for such w, we need to use a smoothing method on the P (w θ u)). By doing so, we can avoid zero probability for unseen words [29]. We apply Dirichlet smoothing method for P (w θ u): P LM (w θ u)=λp (w θ u)+(1 λ)p (w) (5) where P (w) denotes the background language model built on the entire collection Q and λ [0,1] is a coefficient to control the influence of the background model and is defined as: w θ λ = u tf(w, θ u) (6) w θ u tf(w, θ u)+μ where tf(w, θ u) is the frequency of w in the profile of u and parameter μ is set to 1000 in the experiments. The background model P (w) can be computed through a maximum likelihood estimation: n(w, Q) P (w) = (7) Q where n(w, Q) denotes the frequency of words w being in the collection Q and Q is the total number of words in the collection. 4.3 Latent Dirichlet Allocation While TF-IDF and language model have some interesting features, they still provide a relatively small reduction of dimensionality and do not model much of the inter or intra document structure. LDA is a three-level hierarchical Bayesian model and has been used extensively for modeling text corpora. Each document in a collection is modeled as a mixture over a set of topics where each topic is a distribution over words in a given vocabulary [5]. To apply LDA on the problem of expert finding, we have to model user profiles as a mixture of topics. A question is represented as a vector of N w words where each w i is chosen from a vocabulary of size V. A collection of user profiles is denoted as D ={(w 1, u 1),..., (w m, u n)} wheremisthetotal number of words in the vocabulary and n is the number of users in the corpus. The LDA model assumes a certain generative process for data. To generate a user profile, LDA assumes that for each user profile a distribution over topics is sampled from a Dirichlet distribution. In the next step, for each word in the 793

4 Figure 1: Generative model for documents in LDA Notation K U N q N q,w V α μ u ν u,q φ γ w u,q,v z u,q,v Table 1: list of notations Description Number of topics Number of profiles Number of questions in profile u Number of words in question q, profileu Size of vocabulary Prior distribution for profile topic distribution Profile topic probabilities for profile u Question topic probabilities for user u, question q Words probability matrix Dirichlet prior for φ Word in profile u, question q, at position v Topic for word in profile u, question q, at position v user profile, a single topic is chosen according to this topic distribution. Finally, each word is sampled from a multinomial distribution over words specific to the sampled topic. The generative process of the LDA model is shown in Fig. 1. The only observed random variable in this model is W and the rest are latent variables. A box around groups of random variables indicates replication. In this model, θ is a matrix of user-profile probabilities for K topics drawn independently from a symmetric Dirichlet α prior. φ denotes a matrix of topic probabilities for all the words in the vocabulary drawn from a symmetric Dirichlet β prior. z is the topic assigned to word w from θ distribution, where w is drawn from the topic distribution corresponding to z. The process of generating the profile of user u is as follows: 1. Choose a topic k {1,..., K} from the θ distribution. 2. Pick a word w from the multinomial distribution φ k. 3. Repeat the process for N w times where N w is the total number of words in user profile. LDA assumes that this process is repeated for generating user profiles for every user in the dataset. We use Gibbs sampling [12] for estimating parameters of the model. 4.4 Segmented Topic Model LDA is informative about the content of user profiles, but it does not take advantage of the structure of profiles. Each profile is composed of questions where each question contains sentences. The shared topics between questions can be extracted from the structure of profiles. Segmented Topic Model (STM) introduced by Lan Du et al. [10] is a topic model that discovers the hierarchical structure of topics by using the two-parameter Poisson Dirichlet process [24]. A four-level probabilistic model, STM contains two levels of topic proportions. Instead of grouping all the questions of a user under a single topic distribution, it allows each question to have a different and separate distribution over the topics. This can lead to more realistic modeling of expertise. In the STM model, words are the basic element of the data represented by 1,...,W. Each question q is considered as a segment that contains N q,w words. A user profile is considered a document that contains questions (segments). A corpus is a collection of profiles. The complete list of notations is shown in Table 1. Figure 2: Generative model for documents in STM Each profile u is a mixture of latent topics denoted by probability vector μ u; each question is also a mixture on the same space of latent topics and is drawn from a probability vector ν u,q for question q of profile u. The expertise set of a user, the main topics of each question in the profile and the correlation between each profile and its questions are modeled by these distributions over topics μ and ν. The generative process of STM for a profile u is as follows: 1. Pick μ u Dirichlet(α). 2. For each question q draw ν u,q PDP(a,b,μ u). 3. For each word w u,q,v choose a topic from z u,q,v discrete (ν u,q). 4. Select a word from w u,q,v discrete(φ zu,q,v ). The graphical representation of STM is shown in Fig.2. The number of topics is assumed to be given. The only observed random variable in this model is W. 5. EXPERIMENTAL STUDY The experimental dataset is based on a snapshot of the community based question answering site Stackoverflow 1.It features questions and answers on a wide range of topics in computer programming

8 answering from frequently-asked question files: Experiences with the faq finder system. Technical report, AI Magazine, [7] Y. Cao, H. Duan, C. yew Lin, Y. Yu, and H. Wuen Hon. Recommending questions using the MDL-based tree cut model. In Proceeding of the 17th International Conference on World Wide Web, (WWW 08, pages ACM, [8] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41: , [9] B. Dom, I. Eiron, A. Cozzi, and Y. Zhang. Graph-based ranking algorithms for expertise analysis. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages ACM, [10] L. Du, W. Buntine, and H. Jin. A segmented topic model based on the two-parameter poisson-dirichlet process. Mach. Learn., 81:5 19, [11] Y. Fu, R. Xiang, Y. Liu, M. Zhang, and S. Ma. A CDD-based formal model for expert finding. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 07, pages ACM, [12] T. L. Griffiths and M. Steyvers. Finding scientific topics. 101: , [13] J. Guo, S. Xu, S. Bao, and Y. Yu. Tapping on the potential of Q&A community by recommending answer providers. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 08, pages ACM, [14] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 99, pages ACM, [15] J. Jeon, W. B. Croft, and J. H. Lee. Finding semantically similar questions based on their answers. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages ACM, [16] J. Jeon, W. B. Croft, J. H. Lee, and S. Park. A framework to predict the quality of answers with non-textual features. In SIGIR 06: Proceedings of the 29th Annual International ACM, pages ACM Press, [17] M. Liu, Y. Liu, and Q. Yang. Predicting best answerers for new questions in community question answering. In Proceedings of the 11th International Conference on Web-age Information Management, pages Springer-Verlag, [18] X. Liu, W. B. Croft, and M. B. Koll. Finding experts in community-based question-answering services. In Proceedings of the 14th ACM Conference on Information and Knowledge Management, pages ACM, [19] A. K. McCallum. Mallet: A machine learning for language toolkit [20] D. Mimno and A. McCallum. Expertise modeling for matching papers with reviewers. In Proceedings of the 13th ACM SIGKDD, pages , [21] A. Mockus and J. D. Herbsleb. Expertise browser: A quantitative approach to identifying expertise. In Proceedings of International Conference on Software Engineering (ICSE 2002), pages , [22] M. S. Pera and Y.-K. Ng. A community question-answering refinement system. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, pages ACM, [23] D. Petkova and W. B. Croft. Hierarchical language models for expert finding in enterprise corpora. Int. J. on AI Tools, [24] J. Pitman and M. Yor. The two-parameter poisson-dirichlet distribution derived from a stable subordinator. The Annals of Probability, 25: , [25] J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In Proceedings of the 21th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 98, pages ACM, [26] M. Qu, G. Qiu, X. He, C. Zhang, H. Wu, J. Bu, and C. Chen. Probabilistic question recommendation for question answering communities. In Proceedings of the 18th International Conference on World Wide web, WWW 09, pages ACM, [27] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages AUAI Press, [28] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. In Information Processing and Management, pages , [29] C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22: ,

### FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS

FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS by Fatemeh Riahi Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie

### Topic and Trend Detection in Text Collections using Latent Dirichlet Allocation

Topic and Trend Detection in Text Collections using Latent Dirichlet Allocation Levent Bolelli 1, Şeyda Ertekin 2, and C. Lee Giles 3 1 Google Inc., 76 9 th Ave., 4 th floor, New York, NY 10011, USA 2

### Finding Similar Questions in Large Question and Answer Archives

Finding Similar Questions in Large Question and Answer Archives Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee Center for Intelligent Information Retrieval, Computer Science Department University of Massachusetts,

Routing Questions for Collaborative Answering in Community Question Answering Shuo Chang Dept. of Computer Science University of Minnesota Email: schang@cs.umn.edu Aditya Pal IBM Research Email: apal@us.ibm.com

### Topical Authority Identification in Community Question Answering

Topical Authority Identification in Community Question Answering Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences 95

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### I, ONLINE KNOWLEDGE TRANSFER FREELANCER PORTAL M. & A.

ONLINE KNOWLEDGE TRANSFER FREELANCER PORTAL M. Micheal Fernandas* & A. Anitha** * PG Scholar, Department of Master of Computer Applications, Dhanalakshmi Srinivasan Engineering College, Perambalur, Tamilnadu

### PROBABILISTIC MODELING IN COMMUNITY-BASED QUESTION ANSWERING SERVICES

PROBABILISTIC MODELING IN COMMUNITY-BASED QUESTION ANSWERING SERVICES by Zeinab Zolaktaf Zadeh Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie

### Subordinating to the Majority: Factoid Question Answering over CQA Sites

Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

### Latent Dirichlet Markov Allocation for Sentiment Analysis

Latent Dirichlet Markov Allocation for Sentiment Analysis Ayoub Bagheri Isfahan University of Technology, Isfahan, Iran Intelligent Database, Data Mining and Bioinformatics Lab, Electrical and Computer

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### Question Routing by Modeling User Expertise and Activity in cqa services

Question Routing by Modeling User Expertise and Activity in cqa services Liang-Cheng Lai and Hung-Yu Kao Department of Computer Science and Information Engineering National Cheng Kung University, Tainan,

### Joint Relevance and Answer Quality Learning for Question Routing in Community QA

Joint Relevance and Answer Quality Learning for Question Routing in Community QA Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy

### Improving Question Retrieval in Community Question Answering Using World Knowledge

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang

### IT services for analyses of various data samples

IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

### Integrated Expert Recommendation Model for Online Communities

Integrated Expert Recommendation Model for Online Communities Abeer El-korany 1 Computer Science Department, Faculty of Computers & Information, Cairo University ABSTRACT Online communities have become

### Searching Questions by Identifying Question Topic and Question Focus

Searching Questions by Identifying Question Topic and Question Focus Huizhong Duan 1, Yunbo Cao 1,2, Chin-Yew Lin 2 and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China, 200240 {summer, yyu}@apex.sjtu.edu.cn

### WHAT DEVELOPERS ARE TALKING ABOUT?

WHAT DEVELOPERS ARE TALKING ABOUT? AN ANALYSIS OF STACK OVERFLOW DATA 1. Abstract We implemented a methodology to analyze the textual content of Stack Overflow discussions. We used latent Dirichlet allocation

### Online Courses Recommendation based on LDA

Online Courses Recommendation based on LDA Rel Guzman Apaza, Elizabeth Vera Cervantes, Laura Cruz Quispe, José Ochoa Luna National University of St. Agustin Arequipa - Perú {r.guzmanap,elizavvc,lvcruzq,eduardo.ol}@gmail.com

### Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering

Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Guangyou Zhou, Kang Liu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese

### Web Document Clustering

Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

### A Tri-Role Topic Model for Domain-Specific Question Answering

A Tri-Role Topic Model for Domain-Specific Question Answering Zongyang Ma Aixin Sun Quan Yuan Gao Cong School of Computer Engineering, Nanyang Technological University, Singapore 639798 {zma4, qyuan1}@e.ntu.edu.sg

### Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University

Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University Presented by Qiang Yang, Hong Kong Univ. of Science and Technology 1 In a Search Engine Company Advertisers

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Learning to Suggest Questions in Online Forums

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence Learning to Suggest Questions in Online Forums Tom Chao Zhou 1, Chin-Yew Lin 2,IrwinKing 3, Michael R. Lyu 1, Young-In Song 2

### Interactive Chinese Question Answering System in Medicine Diagnosis

Interactive Chinese ing System in Medicine Diagnosis Xipeng Qiu School of Computer Science Fudan University xpqiu@fudan.edu.cn Jiatuo Xu Shanghai University of Traditional Chinese Medicine xjt@fudan.edu.cn

### 1 o Semestre 2007/2008

Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction

### Incorporating Participant Reputation in Community-driven Question Answering Systems

Incorporating Participant Reputation in Community-driven Question Answering Systems Liangjie Hong, Zaihan Yang and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem,

### MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

### Topic models for Sentiment analysis: A Literature Survey

Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.

### Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

### Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

### TRENDS IN AN INTERNATIONAL INDUSTRIAL ENGINEERING RESEARCH JOURNAL: A TEXTUAL INFORMATION ANALYSIS PERSPECTIVE. wernervanzyl@sun.ac.

TRENDS IN AN INTERNATIONAL INDUSTRIAL ENGINEERING RESEARCH JOURNAL: A TEXTUAL INFORMATION ANALYSIS PERSPECTIVE J.W. Uys 1 *, C.S.L. Schutte 2 and W.D. Van Zyl 3 1 Indutech (Pty) Ltd., Stellenbosch, South

### REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION

Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

### Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

### Using Biased Discriminant Analysis for Email Filtering

Using Biased Discriminant Analysis for Email Filtering Juan Carlos Gomez 1 and Marie-Francine Moens 2 1 ITESM, Eugenio Garza Sada 2501, Monterrey NL 64849, Mexico juancarlos.gomez@invitados.itesm.mx 2

### A Statistical Text Mining Method for Patent Analysis

A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical

### QASM: a Q&A Social Media System Based on Social Semantics

QASM: a Q&A Social Media System Based on Social Semantics Zide Meng, Fabien Gandon, Catherine Faron-Zucker To cite this version: Zide Meng, Fabien Gandon, Catherine Faron-Zucker. QASM: a Q&A Social Media

### Search Result Optimization using Annotators

Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

### WEB PAGE CATEGORISATION BASED ON NEURONS

WEB PAGE CATEGORISATION BASED ON NEURONS Shikha Batra Abstract: Contemporary web is comprised of trillions of pages and everyday tremendous amount of requests are made to put more web pages on the WWW.

Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Improving Web Image

### International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

### Question Quality in Community Question Answering Forums: A Survey

Question Quality in Community Question Answering Forums: A Survey ABSTRACT Antoaneta Baltadzhieva Tilburg University P.O. Box 90153 Tilburg, Netherlands a baltadzhieva@yahoo.de Community Question Answering

### The Use of Categorization Information in Language Models for Question Retrieval

The Use of Categorization Information in Language Models for Question Retrieval Xin Cao, Gao Cong, Bin Cui, Christian S. Jensen and Ce Zhang Department of Computer Science, Aalborg University, Denmark

### Using Transactional Data from ERP Systems for Expert Finding

Using Transactional Data from ERP Systems for Expert Finding Lars K. Schunk 1 and Gao Cong 2 1 Dynaway A/S, Alfred Nobels Vej 21E, 9220 Aalborg Øst, Denmark 2 School of Computer Engineering, Nanyang Technological

### Towards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management

Towards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management Enrique Amigó 1, Damiano Spina 1, Bernardino Beotas 2, and Julio Gonzalo 1 1 Departamento de Lenguajes y Sistemas

### Incorporating Window-Based Passage-Level Evidence in Document Retrieval

Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological

### Comparing IPL2 and Yahoo! Answers: A Case Study of Digital Reference and Community Based Question Answering

Comparing and : A Case Study of Digital Reference and Community Based Answering Dan Wu 1 and Daqing He 1 School of Information Management, Wuhan University School of Information Sciences, University of

### Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,

### Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

### Social Business Intelligence Text Search System

Social Business Intelligence Text Search System Sagar Ligade ME Computer Engineering. Pune Institute of Computer Technology Pune, India ABSTRACT Today the search engine plays the important role in the

### PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

### Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection

Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Jian Qu, Nguyen Le Minh, Akira Shimazu School of Information Science, JAIST Ishikawa, Japan 923-1292

### Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes

### Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement

Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement Jiang Bian College of Computing Georgia Institute of Technology jbian3@mail.gatech.edu Eugene Agichtein

### A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE

STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then

### Question Utility: A Novel Static Ranking of Question Search

Question Utility: A Novel Static Ranking of Question Search Young-In Song Korea University Seoul, Korea song@nlp.korea.ac.kr Chin-Yew Lin, Yunbo Cao Microsoft Research Asia Beijing, China {cyl, yunbo.cao}@microsoft.com

### Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

### Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

### AMiner-mini: A People Search Engine For University

AMiner-mini: A People Search Engine For University Jingyuan Liu*, Debing Liu*, Xingyu Yan*, Li Dong #, Ting Zeng #, Yutao Zhang*, and Jie Tang* *Dept. of Com. Sci. and Tech., Tsinghua University # Tsinghua

### From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance

From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance Emanuele Di Buccio Department of Information Engineering University of Padua, Italy dibuccio@dei.unipd.it Mounia Lalmas

### Folksonomies versus Automatic Keyword Extraction: An Empirical Study

Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk

### Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

### Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services Ms. M. Subha #1, Mr. K. Saravanan *2 # Student, * Assistant Professor Department of Computer Science and Engineering Regional

### Query Recommendation employing Query Logs in Search Optimization

1917 Query Recommendation employing Query Logs in Search Optimization Neha Singh Department of Computer Science, Shri Siddhi Vinayak Group of Institutions, Bareilly Email: singh26.neha@gmail.com Dr Manish

### COMMUNITY QUESTION ANSWERING (CQA) services, Improving Question Retrieval in Community Question Answering with Label Ranking

Improving Question Retrieval in Community Question Answering with Label Ranking Wei Wang, Baichuan Li Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong

### Shopping for Top Forums: Discovering Online Discussion for Product Research

Shopping for Top Forums: Discovering Online Discussion for Product Research ABSTRACT Jonathan L. Elsas Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213 jelsas@cs.cmu.edu

### arxiv: v1 [cs.ir] 20 Dec 2016

Classification and Learning-to-rank Approaches for Cross-Device Matching at CIKM Cup 2016 Nam Khanh Tran L3S Research Center - Leibniz Universität Hannover ntran@l3s.de arxiv:1612.07117v1 [cs.ir] 20 Dec

### Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

### Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

### Analysis of Web Archives. Vinay Goel Senior Data Engineer

Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner

### Semantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems

Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems 1 Ciro Cattuto, 2 Dominik Benz, 2 Andreas Hotho, 2 Gerd Stumme 1 Complex Networks Lagrange Laboratory (CNLL), ISI Foundation,

### A Study and Comparison of Sentiment Analysis Methods for Reputation Evaluation

A Study and Comparison of Sentiment Analysis Methods for Reputation Evaluation Anaïs Collomb (anais.collomb@insa-lyon.fr) Crina Costea (crina.costea@insa-lyon.fr) Damien Joyeux (damien.joyeux@insa-lyon.fr)

### Mobile Storage and Search Engine of Information Oriented to Food Cloud

Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

### Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

### Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of

### A Survey and Evaluation of State-of-the-Art. Intelligent Question Routing Systems

A Survey and Evaluation of State-of-the-Art Intelligent Routing Systems Bojan Furlan, Bosko Nikolic, and Veljko Milutinovic, Fellow of the IEEE School of Electrical Engineering, University of Belgrade,

### A Classification-based Approach to Question Answering in Discussion Boards

A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University Bethlehem, PA 18015 USA {lih307,davison}@cse.lehigh.edu

### Web Usage in Client-Server Design

Web Search Web Usage in Client-Server Design A client (e.g., a browser) communicates with a server via http Hypertext transfer protocol: a lightweight and simple protocol asynchronously carrying a variety

### ADI: Towards a Framework of App Developer Inspection

ADI: Towards a Framework of App Developer Inspection Kai Xing, Di Jiang, Wilfred Ng, Xiaotian Hao Department of Computer Science and Engineering Hong Kong University of Science and Technology Clear Water

### PRODUCT REVIEW RANKING SUMMARIZATION

PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

### ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search

Project for Michael Pitts Course TCSS 702A University of Washington Tacoma Institute of Technology ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search Under supervision of : Dr. Senjuti

### W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

### Clustering Documents with Active Learning using Wikipedia

Clustering Documents with Active Learning using Wikipedia Anna Huang David Milne Eibe Frank Ian H. Witten Department of Computer Science, University of Waikato Private Bag 3105, Hamilton, New Zealand {lh92,

### dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on

### Intinno: A Web Integrated Digital Library and Learning Content Management System

Intinno: A Web Integrated Digital Library and Learning Content Management System Synopsis of the Thesis to be submitted in Partial Fulfillment of the Requirements for the Award of the Degree of Master

### INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal

INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal Research Article ISSN 2277 9140 ABSTRACT Web page categorization based

### Semantic Concept Based Retrieval of Software Bug Report with Feedback

Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop

### How much can Behavioral Targeting Help Online Advertising? Jun Yan 1, Ning Liu 1, Gang Wang 1, Wen Zhang 2, Yun Jiang 3, Zheng Chen 1

WWW 29 MADRID! How much can Behavioral Targeting Help Online Advertising? Jun Yan, Ning Liu, Gang Wang, Wen Zhang 2, Yun Jiang 3, Zheng Chen Microsoft Research Asia Beijing, 8, China 2 Department of Automation

### Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

### RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

### Search engine ranking

Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007. Vol. 2. pp. 417 422. Search engine ranking Mária Princz Faculty of Technical Engineering, University

### Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

### A Framework to Predict the Quality of Answers with Non-Textual Features

A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon 1, W. Bruce Croft 1, Joon Ho Lee 2 and Soyeon Park 3 1 Center for Intelligent Information Retrieval, University of Massachusetts-Amherst,

### Inference Methods for Analyzing the Hidden Semantics in Big Data. Phuong LE-HONG phuonglh@gmail.com

Inference Methods for Analyzing the Hidden Semantics in Big Data Phuong LE-HONG phuonglh@gmail.com Introduction Grant proposal for basic research project Nafosted, 2014 24 months Principal Investigator:

### CatDV Pro Workgroup Serve r

Architectural Overview CatDV Pro Workgroup Server Square Box Systems Ltd May 2003 The CatDV Pro client application is a standalone desktop application, providing video logging and media cataloging capability

### Quality-Aware Collaborative Question Answering: Methods and Evaluation

Quality-Aware Collaborative Question Answering: Methods and Evaluation ABSTRACT Maggy Anastasia Suryanto School of Computer Engineering Nanyang Technological University magg0002@ntu.edu.sg Aixin Sun School

### Predicting Publication Date: a Text Analysis Exercise over 250,000 Volumes in the HTRC Secure HathiTrust Analytics Research Commons

Predicting Publication Date: a Text Analysis Exercise over 250,000 Volumes in the HTRC Secure HathiTrust Analytics Research Commons Use case: RDA Digital Humanities Workshop, May 2015 The HathiTrust digital