Word Embedding-based Antonym Detection using Thesauri and Distributional Information
Masataka Ono, Makoto Miwa, Yutaka Sasaki
Department of Advanced Science and Technology, Toyota Technological Institute
Hisakata, Tempaku-ku, Nagoya, Japan
{sd12412, makoto-miwa,

Abstract

This paper proposes a novel approach to train word embeddings that capture antonyms. Word embeddings have been shown to capture synonyms and analogies. Such word embeddings, however, cannot capture antonyms, since they depend on the distributional hypothesis. Our approach utilizes supervised synonym and antonym information from thesauri, as well as distributional information from large-scale unlabelled text data. Evaluation results on the GRE antonym question task show that our model outperforms the state-of-the-art systems and answers the antonym questions with an F-score of 89%.

1 Introduction

Word embeddings have been shown to capture synonyms and analogies (Mikolov et al., 2013b; Mnih and Kavukcuoglu, 2013; Pennington et al., 2014). They have also been effectively employed in several tasks such as named entity recognition (Turian et al., 2010; Guo et al., 2014), adjectival scales (Kim and de Marneffe, 2013), and text classification (Le and Mikolov, 2014). Such embeddings, trained on the basis of the distributional hypothesis (Harris, 1954), however, often fail to recognize antonyms, since antonymous words, e.g., strong and weak, occur in similar contexts. Recent studies focus on learning word embeddings for specific tasks, such as sentiment analysis (Tang et al., 2014) and dependency parsing (Bansal et al., 2014; Chen et al., 2014). These motivate a new approach that learns word embeddings to capture antonyms.

Recent studies on antonym detection have shown that thesaurus information is useful in distinguishing antonyms from synonyms. The state-of-the-art systems achieve over 80% in F-score on GRE antonym tests. Yih et al. (2012) proposed Polarity Inducing Latent Semantic Analysis (PILSA), which incorporates polarity information from two thesauri in constructing a matrix for latent semantic analysis. They additionally used context vectors to cover out-of-vocabulary words; however, they did not use word embeddings. Recently, Zhang et al. (2014) proposed a Bayesian Probabilistic Tensor Factorization (BPTF) model to combine thesaurus information and existing word embeddings. They showed the usefulness of word embeddings, but they used pre-trained word embeddings.

In this paper, we propose a novel approach to construct word embeddings that can capture antonyms. Unlike the previous approaches, our approach directly trains word embeddings to represent antonyms. We propose two models: a Word Embedding on Thesauri information (WE-T) model and a Word Embeddings on Thesauri and Distributional information (WE-TD) model. The WE-T model receives supervised information from synonym and antonym pairs in thesauri and infers the relations of the other word pairs in the thesauri from the supervised information. The WE-TD model incorporates corpus-based contextual information (distributional information) into the WE-T model, which enables the calculation of similarities among in-vocabulary and out-of-vocabulary words.
2 Word embeddings for antonyms

This section explains how we train word embeddings from synonym and antonym pairs in thesauri. We then explain how to incorporate distributional information to cover out-of-vocabulary words. Figure 1 illustrates the overview of our approach.

Figure 1: Overview of our approach. When we use the thesauri directly, some word pairs are known to be antonymous and others synonymous, but the remaining relations are unknown. WE-T ($2.1) infers indirect relations among words in thesauri. Furthermore, WE-TD ($2.2) incorporates distributional information (co-occurrence), so the relatedness among in-vocabulary and out-of-vocabulary words (nucleate here) is also obtained.

2.1 Word embeddings using thesauri information

We first introduce a model that trains word embeddings using thesauri information alone, which we call the WE-T model. We assign vectors to the words in thesauri and train the vectors to represent the synonym and antonym pairs in the thesauri. More concretely, we train the vectors by maximizing the following objective function:

$$\sum_{w \in V} \sum_{s \in S_w} \log \sigma(\mathrm{sim}(w, s)) + \alpha \sum_{w \in V} \sum_{a \in A_w} \log \sigma(-\mathrm{sim}(w, a)) \qquad (1)$$

Here, $V$ is the vocabulary in the thesauri, $S_w$ is the set of synonyms of a word $w$, and $A_w$ is the set of antonyms of $w$. $\sigma(x)$ is the sigmoid function $\frac{1}{1+e^{-x}}$, and $\alpha$ is a parameter that balances the effects of synonyms and antonyms. $\mathrm{sim}(w_1, w_2)$ is a scoring function that measures the similarity between the two vectors embedded for the corresponding words $w_1$ and $w_2$. We use the following asymmetric function as the scoring function:

$$\mathrm{sim}(w_1, w_2) = \mathbf{v}_{w_1} \cdot \mathbf{v}_{w_2} + b_{w_1} \qquad (2)$$

where $\mathbf{v}_w$ is the vector embedded for a word $w$ and $b_w$ is a scalar bias term corresponding to $w$. This similarity score ranges from minus infinity to plus infinity, and the sigmoid function in Equation (1) scales it into the $[0, 1]$ range.

The first term of Equation (1) is the sum of the similarities between synonym pairs; the second term is the sum of the dissimilarities between antonym pairs. By maximizing this objective, synonym and antonym pairs are tuned to have high and low similarity scores respectively, and indirect antonym pairs, e.g., a synonym of an antonym, will also have low similarity scores, since the embeddings of the words in such pairs will be dissimilar. We use AdaGrad (Duchi et al., 2011) to maximize this objective function. AdaGrad is an online learning method that uses gradient-based updates with an automatically determined learning rate.
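To make the WE-T training concrete, the following is a minimal sketch of Equations (1) and (2) with a per-pair AdaGrad ascent step. The toy thesaurus, the learning rate, and all variable names are illustrative assumptions, not the authors' released implementation; only the dimensionality (300), the number of iterations (20), and alpha = 3.2 follow the settings reported in Section 3.1.

```python
# A minimal sketch of WE-T training (Equations 1 and 2) on a toy thesaurus.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["strong", "powerful", "weak", "feeble"]
synonyms = {"strong": ["powerful"], "weak": ["feeble"]}   # S_w
antonyms = {"strong": ["weak"], "weak": ["strong"]}       # A_w

dim, alpha, lr = 300, 3.2, 0.1        # dim and alpha from the paper; lr assumed
idx = {w: i for i, w in enumerate(vocab)}
vecs = rng.normal(scale=0.1, size=(len(vocab), dim))  # vectors v_w
bias = np.zeros(len(vocab))                           # biases b_w
g2v = np.zeros_like(vecs)                             # AdaGrad accumulators
g2b = np.zeros(len(vocab))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sim(i, j):
    # Asymmetric score of Equation 2: v_{w1} . v_{w2} + b_{w1}.
    return vecs[i] @ vecs[j] + bias[i]

def adagrad_pair(w1, w2, sign, weight=1.0):
    # Ascend weight * log sigma(sign * sim(w1, w2)):
    # sign = +1 for a synonym pair, -1 for an antonym pair (weighted by alpha).
    i, j = idx[w1], idx[w2]
    coeff = weight * sign * (1.0 - sigmoid(sign * sim(i, j)))  # d/dz log sigma(sign*z)
    gi, gj = coeff * vecs[j], coeff * vecs[i]
    g2v[i] += gi ** 2; g2v[j] += gj ** 2; g2b[i] += coeff ** 2
    vecs[i] += lr * gi / (np.sqrt(g2v[i]) + 1e-8)
    vecs[j] += lr * gj / (np.sqrt(g2v[j]) + 1e-8)
    bias[i] += lr * coeff / (np.sqrt(g2b[i]) + 1e-8)

for _ in range(20):                   # 20 AdaGrad iterations, as in Section 3.1
    for w, syns in synonyms.items():
        for s in syns:
            adagrad_pair(w, s, +1)
    for w, ants in antonyms.items():
        for a in ants:
            adagrad_pair(w, a, -1, weight=alpha)
```

Because the score is asymmetric, updating a pair (w, s) does not automatically update (s, w); the evaluation in Section 3.1 accordingly averages the two directions.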
2.2 Word embeddings using thesauri and distributional information

We now explain a model that incorporates corpus-based distributional information into the WE-T model, which we call the WE-TD model. We first introduce Skip-Gram with Negative Sampling (SGNS) (Mikolov et al., 2013a), on which the WE-TD model is based. Levy and Goldberg (2014) show that the objective function of SGNS can be rewritten as follows:

$$\sum_{w \in V} \sum_{c \in V} \left\{ \#(w, c) \log \sigma(\mathrm{sim}(w, c)) + k\,\#(w)\,P_0(c) \log \sigma(-\mathrm{sim}(w, c)) \right\} \qquad (3)$$

The first term represents the co-occurrence pairs within a context window of $C$ words preceding and following target words; $\#(w, c)$ stands for the number of appearances of a target word $w$ with its context $c$. The second term represents the negative sampling: $k$ is the number of negatively sampled words for each target word, $\#(w)$ is the number of appearances of $w$ as a target word, and its negative context $c$ is sampled from a modified unigram distribution $P_0$ (Mikolov et al., 2013a).

We employ subsampling (Mikolov et al., 2013a), which discards words according to the probability $P(w) = 1 - \sqrt{t / p(w)}$, where $p(w)$ is the proportion of occurrences of the word $w$ in the corpus and $t$ is a threshold that controls the discarding. When we use a large-scale corpus directly, the effects of rare words are dominated by those of frequent words; subsampling alleviates this problem by discarding frequent words more often than rare words.

To incorporate the distributional information into the WE-T model, we propose the following objective function, which simply adds the SGNS objective to Equation (1) with a weight $\beta$:

$$\beta \left\{ \sum_{w \in V} \sum_{s \in S_w} \log \sigma(\mathrm{sim}(w, s)) + \alpha \sum_{w \in V} \sum_{a \in A_w} \log \sigma(-\mathrm{sim}(w, a)) \right\} + \sum_{w \in V} \sum_{c \in V} \left\{ \#(w, c) \log \sigma(\mathrm{sim}(w, c)) + k\,\#(w)\,P_0(c) \log \sigma(-\mathrm{sim}(w, c)) \right\} \qquad (4)$$

This function can be further rearranged as

$$\sum_{w \in V} \sum_{c \in V} \left\{ A_{w,c} \log \sigma(\mathrm{sim}(w, c)) + B_{w,c} \log \sigma(-\mathrm{sim}(w, c)) \right\} \qquad (5)$$

Here, the coefficients $A_{w,c}$ and $B_{w,c}$ are sums of the corresponding coefficients in Equation (4). These coefficients can be pre-calculated from the numbers of appearances of contextual word pairs, the unigram distribution, and the synonym and antonym pairs in the thesauri. The objective is maximized using AdaGrad. We skip some updates according to the coefficients $A_{w,c}$ and $B_{w,c}$ to speed up the computation: we ignore terms with extremely small coefficients ($< 10^{-5}$), and we sample the terms according to their coefficients when the coefficients are less than 1.
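As a concrete reading of Equations (3) to (5), the sketch below pre-computes the merged coefficients A_{w,c} and B_{w,c} from toy co-occurrence counts and thesaurus pairs, together with the subsampling discard probability. The counts, the threshold value, and the 3/4-power form of the modified unigram distribution (standard for SGNS, but an assumption here) are all illustrative.

```python
# A sketch of pre-computing the WE-TD coefficients A_{w,c} and B_{w,c}
# (Equation 5) by summing the beta-weighted thesaurus terms of Equation 1
# and the SGNS terms of Equation 3. Toy counts; illustrative only.
from collections import Counter

beta, alpha, k, t = 100.0, 3.2, 5, 1e-5   # t is an assumed small threshold

cooc = Counter({("strong", "coffee"): 3, ("weak", "coffee"): 2})  # #(w, c)
freq = Counter({"strong": 5, "weak": 4, "coffee": 5})             # #(w)
synonyms = {"strong": ["powerful"], "weak": ["feeble"]}
antonyms = {"strong": ["weak"], "weak": ["strong"]}
total = sum(freq.values())

def p0(c):
    # Modified unigram distribution of Mikolov et al. (2013a):
    # unigram counts raised to the 3/4 power, then normalized (assumed form).
    z = sum(n ** 0.75 for n in freq.values())
    return freq.get(c, 0) ** 0.75 / z

def discard_prob(w):
    # Subsampling: P(w) = 1 - sqrt(t / p(w)), applied when counting #(w, c).
    return max(0.0, 1.0 - (t / (freq[w] / total)) ** 0.5)

A, B = Counter(), Counter()
for (w, c), n in cooc.items():            # positive SGNS terms
    A[(w, c)] += n
for w in freq:                            # expected negative-sampling terms
    for c in freq:
        B[(w, c)] += k * freq[w] * p0(c)
for w, syns in synonyms.items():          # thesaurus synonym terms, weight beta
    for s in syns:
        A[(w, s)] += beta
for w, ants in antonyms.items():          # antonym terms, weight beta * alpha
    for a in ants:
        B[(w, a)] += beta * alpha

# Speed-up from the paper: terms with coefficients below 1e-5 are ignored;
# terms with coefficients below 1 are sampled according to the coefficient.
A = Counter({p: v for p, v in A.items() if v >= 1e-5})
B = Counter({p: v for p, v in B.items() if v >= 1e-5})
```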
3 Experiments

3.1 Evaluation settings

This section explains the task setting, the resources for training, the parameter settings, and the evaluation metrics.

GRE antonym question task. We evaluate our models and compare them with other existing models using the GRE antonym question dataset originally provided by Mohammad et al. (2008). This dataset is widely used to evaluate the performance of antonym detection. Each question has a target word and five candidate words, and the system has to choose the most contrasting word to the target word from among the candidates (Mohammad et al., 2013). All the words in the questions are single-token words. The dataset consists of two parts, development and test, with 162 and 950 questions, respectively. Since the test part contains 160 questions from the development set, we also report results on the remaining 790 (950 − 160) questions, following Mohammad et al. (2013).

In evaluating our models on the questions, we first calculated the similarities between a target word and its candidate words. The similarities were calculated by averaging the asymmetric similarity scores obtained with the similarity function in Equation (2). We then chose the word with the lowest similarity. When the model did not contain any of the words in a question, the question was left unanswered.

Resource for training. As the supervised dataset, we used synonym and antonym pairs from two thesauri: WordNet (Miller, 1995) and Roget (Kipfer, 2009). These pairs were provided by Zhang et al. (2014).¹ There were 52,760 entries (words), each of which had 11.7 synonyms on average, and 21,319 entries, each of which had 6.5 antonyms on average. For the unsupervised dataset, we obtained raw texts from Wikipedia in November 2013 and lowercased all words in the texts.

¹ word-representations-bptf
Table 1: Results on the GRE antonym question task, reporting precision, recall, and F-score on the development set, the 950-question test set, and the 790-question test set for the following systems: Encarta lookup; WordNet & Roget lookup; WE-T; WordNet + affix heuristics + adjacent category annotation (from Mohammad et al., 2013); WE-D; Encarta PILSA + S2Net + embedding (from Yih et al., 2012); WordNet & Roget BPTF (from Zhang et al., 2014); and WE-TD. One lookup result differs slightly from that reported in Zhang et al. (2014), since the thesauri can contain multiple candidates as antonyms and the answer is randomly selected from the candidates.

Table 2: Error types by WE-TD on the development set.

| Error type | Description | # | Target | Gold | Predicted |
|---|---|---|---|---|---|
| Contrasting | Predicted answer is contrasting, but not an antonym. | 7 | reticence | loquaciousness | storm |
| | | | dissuade | exhort | extol |
| Degree | Both answers are antonyms, but the gold answer has a higher degree of contrast. | 3 | postulate | verify | reject |
| Incorrect gold | Gold answer is incorrect. | 2 | flinch | extol | advance |
| Wrong expansion | Gold and predicted answers are both in the expanded thesauri. | 1 | hapless | fortunate | happy |
| Incorrect | Predicted answer is not contrasting. | 1 | sessile | mobile | ceasing |
| Total | | 14 | | | |

Parameter settings. The parameters were tuned using the development part of the dataset. In training the WE-T model, the dimension of the embeddings was set to 300, the number of AdaGrad iterations was set to 20, and the initial learning rate of AdaGrad was tuned on the development part. α in Equation (1) was set to 3.2, according to the proportion of the numbers of synonym and antonym pairs in the thesauri. In addition to these parameters, when training the WE-TD model, we added the top 100,000 most frequent words appearing in Wikipedia to the vocabulary. The parameter β was set to 100, the number of negative samples k was set to 5, the context window size C was set to 5, and the threshold t for subsampling was set to a small value.²

Evaluation metrics. We used the F-score as the primary evaluation metric, following Zhang et al. (2014). The F-score is the harmonic mean of precision and recall. Precision is the proportion of correctly answered questions among answered questions; recall is the proportion of correctly answered questions among all questions.

² This small threshold was used to balance the effects of supervised and unsupervised information.
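The evaluation protocol above (averaged asymmetric similarities, lowest-similarity candidate, unanswered questions when words are missing, and precision/recall/F) can be summarized in a short sketch. The tiny question, the vocabulary set, and the model_sim stand-in are assumptions; the handling of partially out-of-vocabulary questions is one plausible reading of the description.

```python
# A sketch of the GRE-question evaluation protocol of Section 3.1;
# `model_sim` stands in for the trained similarity of Equation 2.
def avg_sim(model_sim, w1, w2):
    # Equation 2 is asymmetric, so both directions are averaged.
    return 0.5 * (model_sim(w1, w2) + model_sim(w2, w1))

def answer(model_sim, vocab, target, candidates):
    # Leave the question unanswered when the model lacks the needed words
    # (one reading of "did not contain any words in a question").
    if target not in vocab or all(c not in vocab for c in candidates):
        return None
    known = [c for c in candidates if c in vocab]
    return min(known, key=lambda c: avg_sim(model_sim, target, c))

def evaluate(model_sim, vocab, questions):
    answered = correct = 0
    for target, candidates, gold in questions:
        pred = answer(model_sim, vocab, target, candidates)
        if pred is None:
            continue
        answered += 1
        correct += pred == gold
    prec = correct / answered if answered else 0.0
    rec = correct / len(questions)
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

# Toy usage: one question whose gold antonym has the lowest similarity.
questions = [("strong", ["weak", "big", "fast", "tall", "new"], "weak")]
vocab = {"strong", "weak", "big", "fast", "tall", "new"}
toy_sim = lambda a, b: -1.0 if {a, b} == {"strong", "weak"} else 1.0
print(evaluate(toy_sim, vocab, questions))  # (1.0, 1.0, 1.0)
```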
3.2 Results

Table 1 shows the results of our models on the GRE antonym question task. The table also shows the results of previous systems (Yih et al., 2012; Zhang et al., 2014; Mohammad et al., 2013) and of a model trained on Wikipedia without thesauri (WE-D) for comparison.

The low performance of WE-D illustrates the problem with the distributional hypothesis: word embeddings trained using distributional information alone could not distinguish antonyms from synonyms. Our WE-T model achieved higher performance than the baselines that only look up thesauri. In the thesauri information we used, the synonyms and antonyms had already been extended from the original thesauri by rules such as ignoring part of speech (Zhang et al., 2014); this extension yields larger coverage than the original synonym and antonym pairs. The improvement shows that our model not only captures the synonym and antonym information provided by the supervision but also infers the relations of other word pairs more effectively than the rule-based extension. Our WE-TD model achieved the highest score among the models that use both thesauri and distributional information. Furthermore, our model shows smaller differences between the results on the development and test parts than the other models.

3.3 Error analysis

We analyzed the 14 errors on the development set and summarized the results in Table 2. Half of the errors (seven) occurred when the predicted word is contrasting to some extent but is not an antonym ("Contrasting"). This might be caused by some kind of semantic drift. To predict these gold answers correctly, constraints on the words, such as part of speech and selectional preferences, would need to be used; for example, venerate usually takes a person as its object, while magnify takes a god. Three of the errors were caused by the degree of contrast between the gold and predicted answers ("Degree"): the predicted word can be regarded as an antonym, but the gold answer is more appropriate. This is because our model does not consider the degree of antonymy, which is outside our focus. One of the questions had an incorrect gold answer ("Incorrect gold"). In one case, both the gold and predicted answers appeared in the expanded antonym dictionary ("Wrong expansion"): in expanding the dictionary entries, the gold and predicted answers were both included in the word list of an antonym entry. In one case, the predicted answer was simply wrong ("Incorrect").

4 Conclusions

This paper proposed a novel approach that trains word embeddings to capture antonyms. We proposed two models: the WE-T and WE-TD models. WE-T trains word embeddings on thesauri information, and WE-TD incorporates distributional information into the WE-T model. The evaluation on the GRE antonym question task shows that WE-T achieves higher performance than the thesauri-lookup baselines and that, by incorporating distributional information, WE-TD reaches 89% in F-score, outperforming the previous state-of-the-art results. As future work, we plan to extend our approach to obtain word embeddings for other semantic relations (Gao et al., 2014).

References

Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2014. Tailoring continuous word representations for dependency parsing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland, June. Association for Computational Linguistics.
Wenliang Chen, Yue Zhang, and Min Zhang. 2014. Feature embedding for dependency parsing. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, July.

Bin Gao, Jiang Bian, and Tie-Yan Liu. 2014. WordRep: A benchmark for research on learning word representations. In ICML 2014 Workshop on Knowledge-Powered Deep Learning for Text Mining.
Jiang Guo, Wanxiang Che, Haifeng Wang, and Ting Liu. 2014. Revisiting embedding features for simple semi-supervised learning. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, October. Association for Computational Linguistics.

Zellig S. Harris. 1954. Distributional structure. Word, 10(2-3).

Joo-Kyung Kim and Marie-Catherine de Marneffe. 2013. Deriving adjectival scales from continuous space word representations. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, October. Association for Computational Linguistics.

Barbara Ann Kipfer. 2009. Roget's 21st Century Thesaurus. Philip Lief Group, third edition.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Tony Jebara and Eric P. Xing, editors, Proceedings of the 31st International Conference on Machine Learning (ICML-14). JMLR Workshop and Conference Proceedings.

Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27. Curran Associates, Inc.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013a. Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26. Curran Associates, Inc.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013b. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, June. Association for Computational Linguistics.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, November.

Andriy Mnih and Koray Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26. Curran Associates, Inc.

Saif Mohammad, Bonnie Dorr, and Graeme Hirst. 2008. Computing word-pair antonymy. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii, October. Association for Computational Linguistics.

Saif M. Mohammad, Bonnie J. Dorr, Graeme Hirst, and Peter D. Turney. 2013. Computing lexical contrast. Computational Linguistics, 39(3), September.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, October. Association for Computational Linguistics.

Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, Maryland, June. Association for Computational Linguistics.

Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, Stroudsburg, PA, USA. Association for Computational Linguistics.
Wen-tau Yih, Geoffrey Zweig, and John Platt. 2012. Polarity inducing latent semantic analysis. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, July. Association for Computational Linguistics.

Jingwei Zhang, Jeremy Salwen, Michael Glass, and Alfio Gliozzo. 2014. Word semantic representations using Bayesian probabilistic tensor factorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.