# Question Utility: A Novel Static Ranking of Question Search

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 surface forms. For example, What is the must-see things in Paris? and What should I see in Paris? To avoid the bias that we don t expect to have, we introduce a length normalization method to regularize the raw probabilities: [ ] log p(q) p norm (Q) exp (2) log(m + α) where α is a smoothing parameter in case that m equals 1. In our experiments, it is set 0.1. LexRank Method for Evaluating Question Utility One alternative way to evaluate question utility is to identify the most central questions in a given question collection based on the lexical centrality: If a topic is very useful to people, there can be many lexically similar questions related to the topic in the question collection. Among those similar questions, the most central questions can be regarded as the most representative (or useful) ones regarding the topic. To measure such lexical centrality of questions, we use the LexRank method which is proposed for document summarization by Erkan and Radev (2004). The LexRank estimates the centrality of a sentence (in our case, a question) in a manner similar to the PageRank (Brin and Page 1998) algorithm: First, from a question collection, lexically similar questions are selected by examining cosign similarity between questions, and on the basis of it, an undirected graph of questions is constructed. In the graph, each node represents a question and two nodes are edged such that similarity between them exceeds a certain threshold value. Second, for a question Q, the centrality c(q) is calculated based on the Random Walk algorithm with the following weighting scheme: c i (Q) = d + (1 d) N v adj[q] c i 1 (v) deg(v) where N is the total number of nodes in the graph, d is a dampening factor, adj[q] is the set of nodes adjacent to Q, and deg(v) denotes the degree of node v (the number of its adjacent nodes). For the detailed description, see (Erkan and Radev 2004). Also, we can combine the language modeling approach described in the previous section with the LexRank method by using utility score from a language model as an initial value in the LexRank weighting scheme (Otterbacher, Erkan, and Radev 2005): c i (Q) = d p(q) + (1 d) v adj[q] c i 1 (v) deg(v) where p(q) is the likelihood of the question Q estimated by language model. We will investigate the effectiveness of the both method in our experiments. Using the Utility as Static Ranking of Search In this paper, we confine ourselves to use question utility as static ranking of question search. The use of question utility, (3) (4) however, can be extended to other applications of reusing questions, too. In terms of question retrieval, the query likelihood retrieval model is defined as the probabilistic function of generating a user s query Q from the question language model Q as follows: p(q Q ) = p(q Q)p(Q) p(q ) p(q Q)p(Q) (5) where p(q ), the likelihood of a query, does not affect the ranking of questions so it can be ignored by the rank preserving principle (Jones, Walker, and Robertson 2000). Then, the generation probability p(q Q) is decomposed into a unigram model by using zero order Markov assumption: p(q Q ) p(q) w Q p(w Q) (6) In this equation, p(q) is the prior probability of question Q reflecting a static rank of the question, independent on a query Q. Since our utility of a question is defined as the likelihood of a question regardless of a specific query, it can be naturally integrated in this retrieval framework. Therefore, in our approach, we use a value calculated by the proposed methods for a question Q as a value of the probabilistic term p(q) in the equation (6). To add flexibility to control the importance of each factor in the retrieval model, we modify the equation (6) into a log linear form: p(q Q 1 ) λ 1 log p(q) + λ 2 log p(w Q) Z(λ 1, λ 2 ) w Q (7) where λ 1 and λ 2 are interpolation parameters, and Z(λ 1, λ 2 ) is a normalization factor. Because the normalization factor Z(λ 1, λ 2 ) also does not affect the ranks of question, we can modify the equation (7) by removing Z(λ 1, λ 2 ) and introducing a new constant α (α = λ 1 /λ 2 ): p(q Q ) α log p(q) + log p(w Q) (8) w Q To estimate the unigram probability p(w Q), we use the linear interpolated smoothing (Zhai and Lafferty 2001): p(w Q) = λ d p mle (w Q) + (1 λ d ) p mle (w C) (9) where p mle ( ) indicates a probability estimated using maximum likelihood estimator, C denotes a question collection and λ d is the smoothing parameter. In our experiments, the optimal value for λ d has been empirically determined by an exhausted search of the parameter space. Empirical Evaluations Our empirical evaluations consist of two experiments. One is to evaluate the proposed approach to assessing question utility. The other is to evaluate the use of question utility as static ranking of question search.

5 Table 4: Assessing Question Utility ( +NORM indicates that the length normalization is used and a number in each cell indicate MAP score for each method.) Method LA Paris Beijing Inverse Length Unigram Unigram + Norm Trigram Trigram + Norm Seoul Tokyo Average Inverse Length Unigram Unigram + Norm Trigram Trigram + Norm Table 5: Comparison of Top 3 Ranked Results for L.A. topic Inverse Length Trigram Trigram+Norm 1. Downtown 1. Go to LA? 1. What are some fun LA? things to do in LA? 2. Go to LA? 2. Live in L.A.? 2. Where is the best place to live in LA? 3. NYC vs. L.A? 3. Hotel in LA? 3. What to do in L.A? length of question can be a competitive baseline method. In the experiments, for each topic (city name), we used both our method and the baseline method to rank all the questions from SRC-DAT regarding the topic. Then, we made use of SET-B to evaluate the results. From Table 4, we see that our method based on either unigram language model or trigram language model outperforms the baseline method significantly. Furthermore, the trigram language model performs better than unigram language model. This might be because the former can take into consideration rich and meaningful word groups, e.g., best hotel or place to stay. These results indicate that our language model-based method has ability to measure utility of questions. Also, the length normalization has proven to be very effective to meliorate the bias that our language modeling based method prefers to short texts (questions). For both the unigram model and the trigram model, the length normalization boosts the performance by about 20%. Table 5 shows top 3 results from three different methods: the baseline method, the trigram method and the trigram method with the normalization. In the table, relevant questions are highlighted in bold face. Using Question Utility as Static Ranking of Question Search. In the experiment, we evaluate the performance of question search with four different configurations: (a) the query likelihood model without static ranking (We used it as our baseline method and refer to it as QM); (b) QM incorporating (question) utility scores estimated by the language modeling method (denoted as QM+ULM); (c) QM incorporating (question) utility scores estimated by the LexRank method (denoted as QM+LEX); (d) QM incor- Table 6: Comparison of Retrieval Performances MAP ( %) R-prec ( %) ( %) QM (-) (-) (-) QM+LEX 0.494* (+1.02) 0.443*(3.50) (2.55) QM+ULM 0.509* (4.09) 0.462* (7.94) (4.25) QM+BOTH 0.512* (4.70) 0.469* (9.58) (8.93) porating (question) utility scores estimated by the combination of the language model method and the LexRank method (denoted as QM+BOTH). For the QM+ULM and the QM+BOTH, we used the trigram language model with length normalization. Table 6 provides the experimental results. When compared with the baseline method, our methods incorporating question utility consistently show better performance in terms of all the evaluation measures. It can support our hypothesis that question utility can be important factor in question search. All the performance improvements obtained by the use of question utility are statistically significant (t-test, p-value < 0.05). As shown in Table 6, although both QM+LEX and QM+ULM can improve the performance of question search compared to QM (not using question utility), QM+ULM outperformed QM+LEX (the LexRank method). To our observation, one reason for this can be related to the ability of the language model to capture meaningful word sequences. N-gram language model can naturally reflect important word sequences carrying on askers intention, for example where to stay or how far from. While the LexRank method cannot model such word sequences because it assumes that words in questions are statistically independent from each other. However, the LexRank method is able to reflect similarity between questions into the estimation of question utility, which cannot be achieved by the language model method. Thus, in our experiments, we use the combination (QM+BOTH) of both methods for further improvement of the question utility estimation. To our observation, our method is especially effective when a query is short and ambiguous. For example, best of boston, italy travel package, and boston to London. This is because these queries usually have many related results, which is very similar to what happens with web search. In a contrast, our method fails to improve (or even drop) the performance when a query is very specific. For example, miles from Los Angeles to New York. Table 7 provides the results that are rendered by the baseline method (QM) and our method (QM+ULM). In the table, relevant questions are highlighted in bold face. Conclusion In this paper, we studied usefulness of questions within the setting of question search. The contribution of this paper can be summarized in four-fold: (a) we proposed the notion of question utility to study usefulness of questions. To the best of our knowledge, this is the first trial of studying usefulness of questions. (b) We proposed to use question utility as static ranking to boost the performance of question search. (c) We

### Searching Questions by Identifying Question Topic and Question Focus

Searching Questions by Identifying Question Topic and Question Focus Huizhong Duan 1, Yunbo Cao 1,2, Chin-Yew Lin 2 and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China, 200240 {summer, yyu}@apex.sjtu.edu.cn

### Probability Estimation

Probability Estimation D. De Cao R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 April 23, 2009 Outline Laplace Estimator Good-Turing Backoff The Sparse Data Problem There is a major problem with

### Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering

Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Guangyou Zhou, Kang Liu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese

### A Framework to Predict the Quality of Answers with Non-Textual Features

A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon 1, W. Bruce Croft 1, Joon Ho Lee 2 and Soyeon Park 3 1 Center for Intelligent Information Retrieval, University of Massachusetts-Amherst,

### Subordinating to the Majority: Factoid Question Answering over CQA Sites

Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

### Incorporating Participant Reputation in Community-driven Question Answering Systems

Incorporating Participant Reputation in Community-driven Question Answering Systems Liangjie Hong, Zaihan Yang and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem,

### Joint Relevance and Answer Quality Learning for Question Routing in Community QA

Joint Relevance and Answer Quality Learning for Question Routing in Community QA Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy

### Finding Similar Questions in Large Question and Answer Archives

Finding Similar Questions in Large Question and Answer Archives Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee Center for Intelligent Information Retrieval, Computer Science Department University of Massachusetts,

### The Use of Categorization Information in Language Models for Question Retrieval

The Use of Categorization Information in Language Models for Question Retrieval Xin Cao, Gao Cong, Bin Cui, Christian S. Jensen and Ce Zhang Department of Computer Science, Aalborg University, Denmark

### Question Routing by Modeling User Expertise and Activity in cqa services

Question Routing by Modeling User Expertise and Activity in cqa services Liang-Cheng Lai and Hung-Yu Kao Department of Computer Science and Information Engineering National Cheng Kung University, Tainan,

### Improving Question Retrieval in Community Question Answering Using World Knowledge

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang

### Topical Authority Identification in Community Question Answering

Topical Authority Identification in Community Question Answering Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences 95

### Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,

### Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

### A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives

A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives Xin Cao 1, Gao Cong 1, 2, Bin Cui 3, Christian S. Jensen 1 1 Department of Computer

### Statistical Natural Language Processing

Statistical Natural Language Processing Prasad Tadepalli CS430 lecture Natural Language Processing Some subproblems are partially solved Spelling correction, grammar checking Information retrieval with

### Improving Web Page Retrieval using Search Context from Clicked Domain Names

Improving Web Page Retrieval using Search Context from Clicked Domain Names Rongmei Li School of Electrical, Mathematics, and Computer Science University of Twente P.O.Box 217, 7500 AE, Enschede, the Netherlands

### HMM Speech Recognition. Words: Pronunciations and Language Models. Pronunciation dictionary. Out-of-vocabulary (OOV) rate.

HMM Speech Recognition ords: Pronunciations and Language Models Recorded Speech Decoded Text (Transcription) Steve Renals Acoustic Features Acoustic Model Automatic Speech Recognition ASR Lecture 9 January-March

### Incorporate Credibility into Context for the Best Social Media Answers

PACLIC 24 Proceedings 535 Incorporate Credibility into Context for the Best Social Media Answers Qi Su a,b, Helen Kai-yun Chen a, and Chu-Ren Huang a a Department of Chinese & Bilingual Studies, The Hong

### RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

### An Information Retrieval using weighted Index Terms in Natural Language document collections

Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia

### 1 Contact:

1 Contact: fowler@panam.edu Visualizing the Web as Hubs and Authorities Richard H. Fowler 1 and Tarkan Karadayi Technical Report CS-02-27 Department of Computer Science University of Texas Pan American

### Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

### Learning to Suggest Questions in Online Forums

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence Learning to Suggest Questions in Online Forums Tom Chao Zhou 1, Chin-Yew Lin 2,IrwinKing 3, Michael R. Lyu 1, Young-In Song 2

### CS 533: Natural Language. Word Prediction

CS 533: Natural Language Processing Lecture 03 N-Gram Models and Algorithms CS 533: Natural Language Processing Lecture 01 1 Word Prediction Suppose you read the following sequence of words: Sue swallowed

### Research on News Video Multi-topic Extraction and Summarization

International Journal of New Technology and Research (IJNTR) ISSN:2454-4116, Volume-2, Issue-3, March 2016 Pages 37-39 Research on News Video Multi-topic Extraction and Summarization Di Li, Hua Huo Abstract

### Incorporating Window-Based Passage-Level Evidence in Document Retrieval

Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological

### Quality-Aware Collaborative Question Answering: Methods and Evaluation

Quality-Aware Collaborative Question Answering: Methods and Evaluation ABSTRACT Maggy Anastasia Suryanto School of Computer Engineering Nanyang Technological University magg0002@ntu.edu.sg Aixin Sun School

### International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer.

RESEARCH ARTICLE SURVEY ON PAGERANK ALGORITHMS USING WEB-LINK STRUCTURE SOWMYA.M 1, V.S.SREELAXMI 2, MUNESHWARA M.S 3, ANIL G.N 4 Department of CSE, BMS Institute of Technology, Avalahalli, Yelahanka,

### Mining Navigation Histories for User Need Recognition

Mining Navigation Histories for User Need Recognition Fabio Gasparetti and Alessandro Micarelli and Giuseppe Sansonetti Roma Tre University, Via della Vasca Navale 79, Rome, 00146 Italy {gaspare,micarel,gsansone}@dia.uniroma3.it

### Ranking on Data Manifolds

Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname

### Distinguishing Opinion from News Katherine Busch

Distinguishing Opinion from News Katherine Busch Abstract Newspapers have separate sections for opinion articles and news articles. The goal of this project is to classify articles as opinion versus news

### II. RELATED WORK. Sentiment Mining

Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

### Building Enriched Document Representations using Aggregated Anchor Text

Building Enriched Document Representations using Aggregated Anchor Text Donald Metzler, Jasmine Novak, Hang Cui, and Srihari Reddy Yahoo! Labs 1 http://www.search-engines-book.com/ http://research.microsoft.com/

### Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

### Query Performance Prediction

Query Performance Prediction Ben He a Iadh Ounis a a Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom ben, ounis@dcs.gla.ac.uk Abstract The prediction of query performance

### Citation Context Sentiment Analysis for Structured Summarization of Research Papers

Citation Context Sentiment Analysis for Structured Summarization of Research Papers Niket Tandon 1,3 and Ashish Jain 2,3 ntandon@mpi-inf.mpg.de, ashish.iiith@gmail.com 1 Max Planck Institute for Informatics,

### Clinical Decision Support with the SPUD Language Model

Clinical Decision Support with the SPUD Language Model Ronan Cummins The Computer Laboratory, University of Cambridge, UK ronan.cummins@cl.cam.ac.uk Abstract. In this paper we present the systems and techniques

### SEARCHING QUESTION AND ANSWER ARCHIVES

SEARCHING QUESTION AND ANSWER ARCHIVES A Dissertation Presented by JIWOON JEON Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for

### A Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track)

A Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track) Linh Hoang, Seung-Wook Lee, Gumwon Hong, Joo-Young Lee, Hae-Chang Rim Natural Language Processing Lab., Korea University (linh,swlee,gwhong,jylee,rim)@nlp.korea.ac.kr

### Information Retrieval Statistics of Text

Information Retrieval Statistics of Text James Allan University of Massachusetts, Amherst (NTU ST770-A) Fall 2002 All slides copyright Bruce Croft and/or James Allan Outline Zipf distribution Vocabulary

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

### THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

### Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives

Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives 7 XIN CAO and GAO CONG, Nanyang Technological University BIN CUI, Peking University CHRISTIAN S.

### Similarity Measures for Short Segments of Text

Similarity Measures for Short Segments of Text Donald Metzler 1, Susan Dumais 2, Christopher Meek 2 1 University of Massachusetts 2 Microsoft Research Amherst, MA Redmond, WA Abstract. Measuring the similarity

### Language Modeling. Chapter 1. 1.1 Introduction

Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

### Web Page Scoring Based on Link Analysis of Web Page Sets

Web Page Scoring Based on Link Analysis of Web Page Sets Hitoshi Nakakubo, Shinsuke Nakajima, Kenji Hatano, Jun Miyazaki, and Shunsuke Uemura Innovative Technology Development Center, U-TEC Corporation

### Tetris: Experiments with the LP Approach to Approximate DP

Tetris: Experiments with the LP Approach to Approximate DP Vivek F. Farias Electrical Engineering Stanford University Stanford, CA 94403 vivekf@stanford.edu Benjamin Van Roy Management Science and Engineering

### Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

### Identifying Best Bet Web Search Results by Mining Past User Behavior

Identifying Best Bet Web Search Results by Mining Past User Behavior Eugene Agichtein Microsoft Research Redmond, WA, USA eugeneag@microsoft.com Zijian Zheng Microsoft Corporation Redmond, WA, USA zijianz@microsoft.com

### Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of

### Phrases. Topics for Today. Phrases. POS Tagging. ! Text transformation. ! Text processing issues

Topics for Today! Text transformation Word occurrence statistics Tokenizing Stopping and stemming Phrases Document structure Link analysis Information extraction Internationalization Phrases! Many queries

### A Classification-based Approach to Question Answering in Discussion Boards

A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University Bethlehem, PA 18015 USA {lih307,davison}@cse.lehigh.edu

### Interactive Chinese Question Answering System in Medicine Diagnosis

Interactive Chinese ing System in Medicine Diagnosis Xipeng Qiu School of Computer Science Fudan University xpqiu@fudan.edu.cn Jiatuo Xu Shanghai University of Traditional Chinese Medicine xjt@fudan.edu.cn

### From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance

From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance Emanuele Di Buccio Department of Information Engineering University of Padua, Italy dibuccio@dei.unipd.it Mounia Lalmas

### Online edition (c)2009 Cambridge UP

DRAFT! April 1, 2009 Cambridge University Press. Feedback welcome. 237 12 Language models for information retrieval A common suggestion to users for coming up with good queries is to think of words that

### The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

### UMass at TREC 2008 Blog Distillation Task

UMass at TREC 2008 Blog Distillation Task Jangwon Seo and W. Bruce Croft Center for Intelligent Information Retrieval University of Massachusetts, Amherst Abstract This paper presents the work done for

### Using Transactional Data from ERP Systems for Expert Finding

Using Transactional Data from ERP Systems for Expert Finding Lars K. Schunk 1 and Gao Cong 2 1 Dynaway A/S, Alfred Nobels Vej 21E, 9220 Aalborg Øst, Denmark 2 School of Computer Engineering, Nanyang Technological

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### An Exploration of Ranking Heuristics in Mobile Local Search

An Exploration of Ranking Heuristics in Mobile Local Search ABSTRACT Yuanhua Lv Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801, USA ylv2@uiuc.edu Users increasingly

### LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

### Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

### Risk Knowledge Capture in the Riskit Method

Risk Knowledge Capture in the Riskit Method Jyrki Kontio and Victor R. Basili jyrki.kontio@ntc.nokia.com / basili@cs.umd.edu University of Maryland Department of Computer Science A.V.Williams Building

### Simple Language Models for Spam Detection

Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to

### Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment

2009 10th International Conference on Document Analysis and Recognition Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment Ahmad Abdulkader Matthew R. Casey Google Inc. ahmad@abdulkader.org

### Department of Cognitive Sciences University of California, Irvine 1

Mark Steyvers Department of Cognitive Sciences University of California, Irvine 1 Network structure of word associations Decentralized search in information networks Analogy between Google and word retrieval

American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access

### Modeling Contextual Factors of Click Rates

Modeling Contextual Factors of Click Rates Hila Becker Columbia University 500 W. 120th Street New York, NY 10027 Christopher Meek Microsoft Research 1 Microsoft Way Redmond, WA 98052 David Maxwell Chickering

### Automated Collage Generation With Intent

Automated Collage Generation With Intent Anna Krzeczkowska, Jad El-Hage, Simon Colton and Stephen Clark Department of Computing, Imperial College, London, UK. University of Cambridge Computer Laboratory,

### Probabilistic topic models for sentiment analysis on the Web

University of Exeter Department of Computer Science Probabilistic topic models for sentiment analysis on the Web Chenghua Lin September 2011 Submitted by Chenghua Lin, to the the University of Exeter as

### Recommending Citations for Academic Papers

Recommending Citations for Academic Papers Trevor Strohman, W. Bruce Croft, and David Jensen University of Massachusetts Amherst Abstract. Substantial effort is wasted in scientific circles by researchers

### REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION

Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

### Significance of Sentence Ordering in Multi Document Summarization

Computing For Nation Development, March 10 11, 2011 Bharati Vidyapeeth s Institute of Computer Applications and Management, New Delhi Significance of Sentence Ordering in Multi Document Summarization Rashid

### Ranking Biomedical Passages for Relevance and Diversity

Ranking Biomedical Passages for Relevance and Diversity University of Wisconsin-Madison at TREC Genomics 2006 Andrew B. Goldberg,, David Andrzejewski, Jurgen Van Gael, Burr Settles, Xiaojin Zhu, Mark Craven

### TEMPER : A Temporal Relevance Feedback Method

TEMPER : A Temporal Relevance Feedback Method Mostafa Keikha, Shima Gerani and Fabio Crestani {mostafa.keikha, shima.gerani, fabio.crestani}@usi.ch University of Lugano, Lugano, Switzerland Abstract. The

### Web Search Engines: Solutions

Web Search Engines: Solutions Problem 1: A. How can the owner of a web site design a spider trap? Answer: He can set up his web server so that, whenever a client requests a URL in a particular directory

### Evaluation of Features for Sentence Extraction on Different Types of Corpora

Evaluation of Features for Sentence Extraction on Different Types of Corpora Chikashi Nobata, Satoshi Sekine and Hitoshi Isahara Communications Research Laboratory 3-5 Hikaridai, Seika-cho, Soraku-gun,

### Latent Dirichlet Markov Allocation for Sentiment Analysis

Latent Dirichlet Markov Allocation for Sentiment Analysis Ayoub Bagheri Isfahan University of Technology, Isfahan, Iran Intelligent Database, Data Mining and Bioinformatics Lab, Electrical and Computer

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University

### Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering

Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering Guangyou Zhou 1, Tingting He 1, Jun Zhao 2, and Po Hu 1 1 School of Computer, Central China Normal

### Semantic Analysis of Song Lyrics

Semantic Analysis of Song Lyrics Beth Logan, Andrew Kositsky 1, Pedro Moreno Cambridge Research Laboratory HP Laboratories Cambridge HPL-2004-66 April 14, 2004* E-mail: Beth.Logan@hp.com, Pedro.Moreno@hp.com

### Web Content Summarization Using Social Bookmarking Service

ISSN 1346-5597 NII Technical Report Web Content Summarization Using Social Bookmarking Service Jaehui Park, Tomohiro Fukuhara, Ikki Ohmukai, and Hideaki Takeda NII-2008-006E Apr. 2008 Jaehui Park School

21st International Congress on Modelling and Simulation, Gold Coast, Australia, 29 Nov to 4 Dec 2015 www.mssanz.org.au/modsim2015 On the Feasibility of Answer Suggestion for Advice-seeking Community Questions

### Ordering Sentences According to Topicality

Ordering Sentences According to Topicality Ilana Bromberg The Ohio State University bromberg@ling.ohio-state.edu Abstract This paper addresses the problem of finding or producing the best ordering of the

### Automated FAQ Answering with Question-Specific Knowledge Representation for Web Self-Service

HSI 2009 Catania, Italy, May 21-23, 2009 Automated FAQ Answering with Question-Specific Knowledge Representation for Web Self-Service Eriks Sneiders Dept. of Computer and Systems Sciences, Stockholm University

### A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

### Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement

Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement Jiang Bian College of Computing Georgia Institute of Technology jbian3@mail.gatech.edu Eugene Agichtein

### International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

### Chapter 1: A Theory of the Language Learning Mechanism

Chapter 1: A Theory of the Language Learning Mechanism 1.1 The Mechanism of Language Learning Language learning is a curious enterprise, effortless for children while often effortful for adults. This intriguing

### Science Curriculum Review Worksheets

Science Curriculum Review Worksheets Table 1. ACT for Score Range 13-15 IOD 201 IOD 202 IOD 203 SIN 201 SIN 202 EMI 201 Select one piece of data from a simple data presentation (e.g., a simple food web

### Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,

### MATHEMATICS FOR ENGINEERS STATISTICS TUTORIAL 4 PROBABILITY DISTRIBUTIONS

MATHEMATICS FOR ENGINEERS STATISTICS TUTORIAL 4 PROBABILITY DISTRIBUTIONS CONTENTS Sample Space Accumulative Probability Probability Distributions Binomial Distribution Normal Distribution Poisson Distribution

### Markov chains and Markov Random Fields (MRFs)

Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume

### Web Usage in Client-Server Design

Web Search Web Usage in Client-Server Design A client (e.g., a browser) communicates with a server via http Hypertext transfer protocol: a lightweight and simple protocol asynchronously carrying a variety

### The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of