# Community Answer Summarization for Multi-Sentence Question with Group L 1 Regularization

Save this PDF as:

Size: px
Start display at page:

## Transcription

5 Figure 1: Four kinds of the contextual factors are considered for answer summarization in our general CRF based models. tions as: sim(x, y) = 2 (w 1,w 2 ) M(x,y) sim(w 1, w 2 ) x + y (2) where M(x, y) denotes synset pairs matched in sentences x and y; and the similarity between the two synsets w 1 and w 2 is computed to be inversely proportional to the length of the path in Wordnet. One answer sentence may related to more than one sub questions to some extent. Thus, we define the replied question Qr i as the sub question with the maximal similarity to sentence x i : Qr i = argmax Qj sim(x i, Q j ). It is intuitive that different summary sentences aim at answering different sub questions. Therefore, we design the following two contextual factors based on the similarity of replied questions. Dissimilar Replied Question Factor: Given two answer sentences x i, x j and their corresponding replied questions Qr i, Qr j. If the similarity 2 of Qr i and Qr j is below some threshold τ lq, it means that x i and x j will present different viewpoints to answer different sub questions. In this case, it is likely that x i and x j are both summary sentences; we ensure this by setting the contextual factor cf 1 with a large value of exp ν, where ν is a positive real constant often assigned to value 1; otherwise we set cf 1 to exp ν for penalization. { exp ν, y i = y j = 1 cf 1 = exp ν, otherwise Similar Replied Question Factor: Given two an- 2 We use the semantic similarity of Equation 2 for all our similarity measurement in this paper. swer sentences x i, x j and their corresponding replied questions Qr i, Qr j. If the similarity of Qr i and Qr j is above some upper threshold τ uq, this means that x i and x j are very similar and likely to provide similar viewpoint to answer similar questions. In this case, we want to select either x i or x j as answer. This is done by setting the contextual factor cf 2 such that x i and x j have opposite labels, { exp ν, y i y j = 1 cf 2 = exp ν, otherwise Assuming that sentence x i is selected as a summary sentence, and its next local neighborhood sentence x i+1 by the same author is dissimilar to it but it is relevant to the original multi-sentence question, then it is reasonable to also pick x i+1 as a summary sentence because it may offer new viewpoints by the author. Meanwhile, other local and non-local sentences which are similar to it at above the upper threshold will probably not be selected as summary sentences as they offer similar viewpoint as discussed above. Therefore, we propose the following two kinds of contextual factors for selecting the answer sentences in the CRF model. Local Novelty Factor: If the similarity of answer sentence x i and x i+1 given by the same author is below a lower threshold τ ls, but their respective similarities to the sub questions both exceed an upper threshold τ us, then we will boost the probability of selecting both as summary sentences by setting: { exp ν, y i = y i+1 = 1 cf 3 = exp ν, otherwise Redundance Factor: If the similarity of answer 586

6 sentence x i and x j is greater than the upper threshold τ us, then they are likely to be redundant and hence should be given opposite labels. This is done by setting: { exp ν, y i y j = 1 cf 4 = exp ν, otherwise Figure 1 gives an illustration of these four contextual factors in our proposed general CRF based model. The parameter estimation and model inference are discussed in the following subsection. 3.3 Group L 1 Regularization for Feature Learning In the context of cqa summarization task, some features are intuitively to be more important than others. As a result, we group the parameters in our CRF model with their related features 3 and introduce a group L 1 -regularization term for selecting the most useful features from the least important ones, where the regularization term becomes, R(θ) = C G θ g 2, (3) g=1 where C controls the penalty magnitude of the parameters, G is the number of feature groups and θ g denotes the parameters corresponding to the particular group g. Notice that this penalty term is indeed a L(1, 2) regularization because in every particular group we normalize the parameters in L 2 norm while the weight of a whole group is summed in L 1 form. Given a set of training data D = (x (i), y (i) ), i = 1,..., N, the parameters θ = (µ l, λ k ) of the general CRF with the group L 1 -regularization are estimated in using a maximum log likelihood function L as: L = N log(p θ (y (i) x (i) )) C i=1 G θ g 2, (4) g=1 3 We note that every sentence-level feature discussed in Section 3.2 presents a variety of instances (e.g., the sentence with longer or shorter length is the different instance), and we may call it sub-feature of the original sentence-level feature in the micro view. Every sub-feature has its corresponding weight in our CRF model. Whereas in a macro view, those related subfeatures can be considered as a group. where N denotes the total number of training samples. we compute the log-likelihood gradient component of θ in the first term of Equation 4 as in usual CRFs. However, the second term of Equation 4 is non-differentiable when some special θ g 2 becomes exactly zero. To tackle this problem, an additional variable is added for each group (Schmidt, 2010); that is, by replacing each norm θ g 2 with the variable α g, subject to the constraint α g θ g 2, i.e., L = N log(p θ (y (i) x (i) )) C i=1 G α g, g=1 subject to α g θ g 2, g. (5) This formulation transforms the non-differentiable regularizer to a simple linear function and maximizing Equation 5 will lead to a solution to Equation 4 because it is a lower bound of the latter. Then, we add a sufficient small positive constant ε when computing the L 2 norm (Lee et al., 2006), i.e., θ g 2 = g j=1 θ2 gj + ε, where g denotes the number of features in group g. To obtain the optimal value of parameter θ from the training data, we use an efficient L-BFGS solver to solve the problem, and the first derivative of every feature j in group g is, δl δθ gj = N N C gj (y (i), x (i) ) i=1 p(y x (i) )C gj (y, x (i) ) 2C θ gj i=1 y g l=1 θ2 gl + ε (6) where C gj (y, x) denotes the count of feature j in group g of observation-label pair (x, y). The first two terms of Equation 6 measure the difference between the empirical and the model expected values of feature j in group g, while the third term is the derivative of group L 1 priors. For inference, the labeling sequence can be obtained by maximizing the probability of y conditioned on x, y = argmax y p θ (y x). (7) We use a modification of the Viterbi algorithm to perform inference of the CRF with non-local edges 587

8 Model R1 P R1 R R1 F1 R2 P R2 R R2 F1 RL P RL R RL F1 SVM 79.2% 52.5% 63.1% 71.9% 41.3% 52.4% 67.1% 36.7% 47.4% LR 75.2% 57.4% 65.1% 66.1% 48.5% 56.0% 61.6% 43.2% 50.8% LCRF 78.7%- 61.8% 69.3%- 71.4%- 54.1% 61.6% 67.1%- 49.6% 57.0% gcrf 81.9% 65.2% 72.6% 76.8% 57.3% 65.7% 73.9% 53.5% 62.1% gcrf-qs 81.4%- 70.0% 75.3% 76.2%- 62.4% 68.6% 73.3%- 58.6% 65.1% gcrf-qs-l1 86.6% 68.3%- 76.4% 82.6% 61.5%- 70.5% 80.4% 58.2%- 67.5% Table 3: The Precision, Recall and F1 of ROUGE-1, ROUGE-2, ROUGE-L in the baselines SVM,LR, LCRF and our general CRF based models (gcrf, gcrf-qs, gcrf-qs-l1). The down-arrow means performance degradation with statistical significance. Model Precision Recall F1 SVM 65.93% 61.96% 63.88% LR 66.92% % %- LCRF 69.80% 63.91% % gcrf 73.77% 69.43% 71.53% gcrf-qs 74.78% 72.51% 73.63% gcrf-qs-l % 71.73% % Table 2: The Precision, Recall and F1 measures of the baselines SVM,LR, LCRF and our general CRF based models (gcrf, gcrf-qs, gcrf-qs-l1). The up-arrow denotes the performance improvement compared to the precious method (above) with statistical significance under p value of 0.05, the short line - denotes there is no difference in statistical significance. which just utilize the independent sentence-level features, behave not vary well here, and there is no statistically significant performance difference between them. We also find that LCRF which utilizes the local context information between sentences perform better than the LR method in precision and F1 with statistical significance. While we consider the general local and non-local contextual factor cf 3 and cf 4 for novelty and non-redundancy constraints, the gcrf performs much better than LCRF in all three measures; and we obtain further performance improvement by adding the contextual factors based on QS, especially in the recall measurement. This is mainly because we have divided the question into several sub questions, and the system is able to select more novel sentences than just treating the original multi-sentence as a whole. In addition, when we replace the default L 2 regularization by the group L 1 regularization for more efficient feature weight learning, we obtain a much better performance in precision while not sacrificing the recall measurement statistically. We also compute the Precision, Recall and F1 in ROUGE-1, ROUGE-2 and ROUGE-L measurements, which are widely used to measure the quality of automatic text summarization. The experimental results are listed in Table 3. All results in the Table are the average of the ten-fold cross validation experiments on our dataset. It is observed that our gcrf-qs-l1 model improves the performance in terms of precision, recall and F1 score on all three measurements of ROUGE- 1, ROUGE-2 and ROUGE-L by a significant margin compared to other baselines due to the use of local and non-local contextual factors and factors based on QS with group L 1 regularization. Since the ROUGE measures care more about the recall and precision of N-grams as well as common substrings to the reference summary rather than the whole sentence, they offer a better measurement in modeling the user s information needs. Therefore, the improvements in these measures are more encouraging than those of the average classification accuracy for answer summarization. From the viewpoint of ROUGE measures we observe that our question segmentation method can enhance the recall of the summaries significantly due to the more fine-grained modeling of sub questions. We also find that the precision of the group L 1 regularization is much better than that of the default L 2 regularization while not hurting the recall significantly. In general, the experimental results show that our proposed method is more effective than other baselines in answer summarization for addressing the incomplete answer problem in cqas. 589

### Subordinating to the Majority: Factoid Question Answering over CQA Sites

Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

### A Classification-based Approach to Question Answering in Discussion Boards

A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University Bethlehem, PA 18015 USA {lih307,davison}@cse.lehigh.edu

### Joint Relevance and Answer Quality Learning for Question Routing in Community QA

Joint Relevance and Answer Quality Learning for Question Routing in Community QA Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy

### Question Routing by Modeling User Expertise and Activity in cqa services

Question Routing by Modeling User Expertise and Activity in cqa services Liang-Cheng Lai and Hung-Yu Kao Department of Computer Science and Information Engineering National Cheng Kung University, Tainan,

### Improving Question Retrieval in Community Question Answering Using World Knowledge

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang

### Comparing IPL2 and Yahoo! Answers: A Case Study of Digital Reference and Community Based Question Answering

Comparing and : A Case Study of Digital Reference and Community Based Answering Dan Wu 1 and Daqing He 1 School of Information Management, Wuhan University School of Information Sciences, University of

### Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering

Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Guangyou Zhou, Kang Liu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese

### Question Utility: A Novel Static Ranking of Question Search

Question Utility: A Novel Static Ranking of Question Search Young-In Song Korea University Seoul, Korea song@nlp.korea.ac.kr Chin-Yew Lin, Yunbo Cao Microsoft Research Asia Beijing, China {cyl, yunbo.cao}@microsoft.com

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement

Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement Jiang Bian College of Computing Georgia Institute of Technology jbian3@mail.gatech.edu Eugene Agichtein

### The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of

### RETRIEVING QUESTIONS AND ANSWERS IN COMMUNITY-BASED QUESTION ANSWERING SERVICES KAI WANG

RETRIEVING QUESTIONS AND ANSWERS IN COMMUNITY-BASED QUESTION ANSWERING SERVICES KAI WANG (B.ENG, NANYANG TECHNOLOGICAL UNIVERSITY) A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY SCHOOL OF COMPUTING

### MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

### Incorporating Participant Reputation in Community-driven Question Answering Systems

Incorporating Participant Reputation in Community-driven Question Answering Systems Liangjie Hong, Zaihan Yang and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem,

### Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

### Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

### Topical Authority Identification in Community Question Answering

Topical Authority Identification in Community Question Answering Guangyou Zhou, Kang Liu, and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences 95

### Detecting Promotion Campaigns in Community Question Answering

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) Detecting Promotion Campaigns in Community Question Answering Xin Li, Yiqun Liu, Min Zhang, Shaoping

### Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media

Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media ABSTRACT Jiang Bian College of Computing Georgia Institute of Technology Atlanta, GA 30332 jbian@cc.gatech.edu Eugene

### Ming-Wei Chang. Machine learning and its applications to natural language processing, information retrieval and data mining.

Ming-Wei Chang 201 N Goodwin Ave, Department of Computer Science University of Illinois at Urbana-Champaign, Urbana, IL 61801 +1 (917) 345-6125 mchang21@uiuc.edu http://flake.cs.uiuc.edu/~mchang21 Research

### Searching Questions by Identifying Question Topic and Question Focus

Searching Questions by Identifying Question Topic and Question Focus Huizhong Duan 1, Yunbo Cao 1,2, Chin-Yew Lin 2 and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China, 200240 {summer, yyu}@apex.sjtu.edu.cn

Routing Questions for Collaborative Answering in Community Question Answering Shuo Chang Dept. of Computer Science University of Minnesota Email: schang@cs.umn.edu Aditya Pal IBM Research Email: apal@us.ibm.com

### Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

### SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis Fangtao Li 1, Sheng Wang 2, Shenghua Liu 3 and Ming Zhang

### Incorporate Credibility into Context for the Best Social Media Answers

PACLIC 24 Proceedings 535 Incorporate Credibility into Context for the Best Social Media Answers Qi Su a,b, Helen Kai-yun Chen a, and Chu-Ren Huang a a Department of Chinese & Bilingual Studies, The Hong

### Summarizing Online Forum Discussions Can Dialog Acts of Individual Messages Help?

Summarizing Online Forum Discussions Can Dialog Acts of Individual Messages Help? Sumit Bhatia 1, Prakhar Biyani 2 and Prasenjit Mitra 2 1 IBM Almaden Research Centre, 650 Harry Road, San Jose, CA 95123,

21st International Congress on Modelling and Simulation, Gold Coast, Australia, 29 Nov to 4 Dec 2015 www.mssanz.org.au/modsim2015 On the Feasibility of Answer Suggestion for Advice-seeking Community Questions

### Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of

### PRODUCT REVIEW RANKING SUMMARIZATION

PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

### Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

### Guest Editors Introduction: Machine Learning in Speech and Language Technologies

Guest Editors Introduction: Machine Learning in Speech and Language Technologies Pascale Fung (pascale@ee.ust.hk) Department of Electrical and Electronic Engineering Hong Kong University of Science and

### Learning to Suggest Questions in Online Forums

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence Learning to Suggest Questions in Online Forums Tom Chao Zhou 1, Chin-Yew Lin 2,IrwinKing 3, Michael R. Lyu 1, Young-In Song 2

### A survey on click modeling in web search

A survey on click modeling in web search Lianghao Li Hong Kong University of Science and Technology Outline 1 An overview of web search marketing 2 An overview of click modeling 3 A survey on click models

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### Analyzing the Dynamics of Research by Extracting Key Aspects of Scientific Papers

Analyzing the Dynamics of Research by Extracting Key Aspects of Scientific Papers Sonal Gupta Christopher Manning Natural Language Processing Group Department of Computer Science Stanford University Columbia

### COMMUNITY QUESTION ANSWERING (CQA) services, Improving Question Retrieval in Community Question Answering with Label Ranking

Improving Question Retrieval in Community Question Answering with Label Ranking Wei Wang, Baichuan Li Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong

### A Framework to Predict the Quality of Answers with Non-Textual Features

A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon 1, W. Bruce Croft 1, Joon Ho Lee 2 and Soyeon Park 3 1 Center for Intelligent Information Retrieval, University of Massachusetts-Amherst,

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### End-to-End Sentiment Analysis of Twitter Data

End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India apoorv@cs.columbia.edu,

### Semi-Supervised Learning for Blog Classification

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,

### CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,

### Research on News Video Multi-topic Extraction and Summarization

International Journal of New Technology and Research (IJNTR) ISSN:2454-4116, Volume-2, Issue-3, March 2016 Pages 37-39 Research on News Video Multi-topic Extraction and Summarization Di Li, Hua Huo Abstract

Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Improving Web Image

### The Enron Corpus: A New Dataset for Email Classification Research

The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu

### Early Detection of Potential Experts in Question Answering Communities

Early Detection of Potential Experts in Question Answering Communities Aditya Pal 1, Rosta Farzan 2, Joseph A. Konstan 1, and Robert Kraut 2 1 Dept. of Computer Science and Engineering, University of Minnesota

### Evaluating and Predicting Answer Quality in Community QA

Evaluating and Predicting Answer Quality in Community QA Chirag Shah Jefferey Pomerantz School of Communication & Information (SC&I) School of Information & Library Science (SILS) Rutgers, The State University

### Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

### SEARCHING QUESTION AND ANSWER ARCHIVES

SEARCHING QUESTION AND ANSWER ARCHIVES A Dissertation Presented by JIWOON JEON Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for

### PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL

Journal homepage: www.mjret.in ISSN:2348-6953 PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL Utkarsha Vibhute, Prof. Soumitra

### Search Result Optimization using Annotators

Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

### Personalizing Image Search from the Photo Sharing Websites

Personalizing Image Search from the Photo Sharing Websites Swetha.P.C, Department of CSE, Atria IT, Bangalore swethapc.reddy@gmail.com Aishwarya.P Professor, Dept.of CSE, Atria IT, Bangalore aishwarya_p27@yahoo.co.in

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Keyphrase Extraction for Scholarly Big Data

Keyphrase Extraction for Scholarly Big Data Cornelia Caragea Computer Science and Engineering University of North Texas July 10, 2015 Scholarly Big Data Large number of scholarly documents on the Web PubMed

### SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute

### Multimedia Answer Generation from Web Information

Multimedia Answer Generation from Web Information Avantika Singh Information Science & Engg, Abhimanyu Dua Information Science & Engg, Gourav Patidar Information Science & Engg Pushpalatha M N Information

### Significance of Sentence Ordering in Multi Document Summarization

Computing For Nation Development, March 10 11, 2011 Bharati Vidyapeeth s Institute of Computer Applications and Management, New Delhi Significance of Sentence Ordering in Multi Document Summarization Rashid

### Evaluation of Features for Sentence Extraction on Different Types of Corpora

Evaluation of Features for Sentence Extraction on Different Types of Corpora Chikashi Nobata, Satoshi Sekine and Hitoshi Isahara Communications Research Laboratory 3-5 Hikaridai, Seika-cho, Soraku-gun,

### I, ONLINE KNOWLEDGE TRANSFER FREELANCER PORTAL M. & A.

ONLINE KNOWLEDGE TRANSFER FREELANCER PORTAL M. Micheal Fernandas* & A. Anitha** * PG Scholar, Department of Master of Computer Applications, Dhanalakshmi Srinivasan Engineering College, Perambalur, Tamilnadu

### Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

### REVIEW ON QUERY CLUSTERING ALGORITHMS FOR SEARCH ENGINE OPTIMIZATION

Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING

### Interactive Chinese Question Answering System in Medicine Diagnosis

Interactive Chinese ing System in Medicine Diagnosis Xipeng Qiu School of Computer Science Fudan University xpqiu@fudan.edu.cn Jiatuo Xu Shanghai University of Traditional Chinese Medicine xjt@fudan.edu.cn

### Learning to Rank Revisited: Our Progresses in New Algorithms and Tasks

The 4 th China-Australia Database Workshop Melbourne, Australia Oct. 19, 2015 Learning to Rank Revisited: Our Progresses in New Algorithms and Tasks Jun Xu Institute of Computing Technology, Chinese Academy

### Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

### Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

### Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression

Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka log-linear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15

### Japanese Opinion Extraction System for Japanese Newspapers Using Machine-Learning Method

Japanese Opinion Extraction System for Japanese Newspapers Using Machine-Learning Method Toshiyuki Kanamaru, Masaki Murata, and Hitoshi Isahara National Institute of Information and Communications Technology

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums

Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums Shilin Ding Gao Cong Chin-Yew Lin Xiaoyan Zhu Department of Computer Science and Technology, Tsinghua University,

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Conditional Random Fields: An Introduction

Conditional Random Fields: An Introduction Hanna M. Wallach February 24, 2004 1 Labeling Sequential Data The task of assigning label sequences to a set of observation sequences arises in many fields, including

### Dissecting the Learning Behaviors in Hacker Forums

Dissecting the Learning Behaviors in Hacker Forums Alex Tsang Xiong Zhang Wei Thoo Yue Department of Information Systems, City University of Hong Kong, Hong Kong inuki.zx@gmail.com, xionzhang3@student.cityu.edu.hk,

### Quality-Aware Collaborative Question Answering: Methods and Evaluation

Quality-Aware Collaborative Question Answering: Methods and Evaluation ABSTRACT Maggy Anastasia Suryanto School of Computer Engineering Nanyang Technological University magg0002@ntu.edu.sg Aixin Sun School

### Web Content Summarization Using Social Bookmarking Service

ISSN 1346-5597 NII Technical Report Web Content Summarization Using Social Bookmarking Service Jaehui Park, Tomohiro Fukuhara, Ikki Ohmukai, and Hideaki Takeda NII-2008-006E Apr. 2008 Jaehui Park School

### Shape Feature Extraction of Wheat Leaf Disease based on Invariant Moment Theory

Shape Feature Extraction of Wheat Leaf Disease based on Invariant Moment Theory Zhihua Diao 1,1, Anping Zheng 1, Yuanyuan Wu 1 1 Zhengzhou University of Light Industry, College of Electric and Information

### Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

### A Systematic Cross-Comparison of Sequence Classifiers

A Systematic Cross-Comparison of Sequence Classifiers Binyamin Rozenfeld, Ronen Feldman, Moshe Fresko Bar-Ilan University, Computer Science Department, Israel grurgrur@gmail.com, feldman@cs.biu.ac.il,

### Web Database Integration

Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,

### The Use of Categorization Information in Language Models for Question Retrieval

The Use of Categorization Information in Language Models for Question Retrieval Xin Cao, Gao Cong, Bin Cui, Christian S. Jensen and Ce Zhang Department of Computer Science, Aalborg University, Denmark

### Programming Tools based on Big Data and Conditional Random Fields

Programming Tools based on Big Data and Conditional Random Fields Veselin Raychev Martin Vechev Andreas Krause Department of Computer Science ETH Zurich Zurich Machine Learning and Data Science Meet-up,

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives

Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives 7 XIN CAO and GAO CONG, Nanyang Technological University BIN CUI, Peking University CHRISTIAN S.

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering

Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering Guangyou Zhou 1, Tingting He 1, Jun Zhao 2, and Po Hu 1 1 School of Computer, Central China Normal

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

### Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach -

Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach - Philipp Sorg and Philipp Cimiano Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe, Germany {sorg,cimiano}@aifb.uni-karlsruhe.de

### THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital

### Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

### Identifying Best Bet Web Search Results by Mining Past User Behavior

Identifying Best Bet Web Search Results by Mining Past User Behavior Eugene Agichtein Microsoft Research Redmond, WA, USA eugeneag@microsoft.com Zijian Zheng Microsoft Corporation Redmond, WA, USA zijianz@microsoft.com

### RNN for Sentiment Analysis: Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

RNN for Sentiment Analysis: Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank Borui(Athena) Ye University of Waterloo borui.ye@uwaterloo.ca July 15, 2015 1 / 26 Overview 1 Introduction

### Chapter 8. Final Results on Dutch Senseval-2 Test Data

Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised

### Challenges of Cloud Scale Natural Language Processing

Challenges of Cloud Scale Natural Language Processing Mark Dredze Johns Hopkins University My Interests? Information Expressed in Human Language Machine Learning Natural Language Processing Intelligent

### A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives

A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives Xin Cao 1, Gao Cong 1, 2, Bin Cui 3, Christian S. Jensen 1 1 Department of Computer