TUGAS: Exploiting Unlabelled Data for Twitter Sentiment Analysis
|
|
|
- Mitchell Charles
- 10 years ago
- Views:
Transcription
1 TUGAS: Exploiting Unlabelled Data for Twitter Sentiment Analysis Silvio Amir +, Miguel Almeida, Bruno Martins +, João Filgueiras +, and Mário J. Silva + + INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal Priberam Labs, Alameda D. Afonso Henriques, 41, 2 o, Lisboa, Portugal Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Portugal [email protected], [email protected], [email protected] [email protected], [email protected] Abstract This paper describes our participation in the message polarity classification task of SemEval We focused on exploiting unlabeled data to improve accuracy, combining features leveraging word representations with other, more common features, based on word tokens or lexicons. We analyse the contribution of the different features, concluding that unlabeled data yields significant improvements. 1 Introduction Research in exploiting social media for measuring public opinion, evaluating popularity of products and brands, anticipating stock-market trends, or predicting elections showed promising results (O Connor et al., 2010; Mitchell et al., 2013). However, this type of content poses a particularly challenging problem for text analysis systems. Typical messages show heavy use of Internet slang, emoticons and other abbreviations and discourse conventions. The lexical variation introduced by this creative use of language, together with the unconventional spelling and occasional typos, leads to very large vocabularies. On the other hand, messages are very short, and therefore word feature representations tend to become very sparse, degrading the performance of machine learned classifiers. The growing interest in this problem motivated the creation of a shared task for Twitter Sentiment Analysis in the 2013 edition of SemEval. The Message Polarity Classification task was formalized as follows: Given a message, decide whether the message is of positive, negative, or neutral sentiment. For messages conveying both a positive This work is licensed under a Creative Commons Attribution 4.0 International Licence. Page numbers and proceedings footer are added by the organisers. Licence details: http: //creativecommons.org/licenses/by/4.0/ Positive Neutral Negative Train Tweets Tweets SMS Tweets Sarcasm LiveJournal Table 1: Number of examples per class in each SemEval dataset. The first row represents all training data; the other rows are sets used for testing. and negative sentiment, whichever is the stronger sentiment should be chosen (Nakov et al., 2013). We describe our participation on the 2014 edition of this task, for which a set of manually labelled messages was created. Complying with the Twitter policies for data access, the corpus was distributed as a list of message IDs and each participant was responsible for downloading the actual tweets. Using the provided script, we collected a training set with 8604 tweets. After submission, the 2014 test sets were also made available. Along with the Tweets 2014 test set, evaluation was also performed on a set of tweets with sarcasm, on a set of LiveJournal blog entries, and on sets of tweets and SMS messages from the 2013 edition of the task. Table 1 shows the class distribution for each of these datasets. In the 2013 edition (task 2B), the NRC-Canada system (Mohammad et al., 2013) earned first place by scoring 69.02% on the Official SemEval metric (see Section 4) with a significant margin with respect to the other systems: the second (Günther and Furrer, 2013) and third (Reckman et al., 2013) best systems scored 65.27% and 64.86%, respectively. The main novelty in the NRC-Canada system was the use of sentiment lexicons, specific for the Twitter domain, generated from unlabeled tweets using emoticons and hashtags as indicators of sentiment. They found that these lexicons had a strong impact on the results more than word and 673 Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages , Dublin, Ireland, August 23-24, 2014.
2 character n-grams. The automatically induced lexicons are a way to use information from unlabeled data to aid in the classification task. In our approach, we take this reasoning further, and focus on the impact of various ways to incorporate knowledge from unlabeled data. This allows us to mimic many realworld scenarios where labelled data is scarce but unlabeled data is plentiful. 2 Word Representations In text classification it is common to represent documents as bags-of-words, i.e., as unordered collections of words. However, in the case of very short social media texts, these representations become less effective, as they lead to increased data sparseness. We focused our experiments in comparing and complementing these approaches with denser representations, which we now describe. 2.1 Bag-Of-Words and BM25 In a representation based on bags-of-words, each message is represented as a vector m = {w 1, w 2,..., w n } R V, where V is the size of the vocabulary. In order to have weights that reflect how relevant a word is to each of the classes, we weighted the individual terms according to the BM25 heuristic (Paltoglou and Thelwall, 2010): ( (Np df BM25(w i ) = tf i log i,p +s) df i,n +s (N n df i,n +s) df i,p +s), (1) where tf i represents the frequency of term i in the message, N a is the size of corpus a, df i,a is the document frequency of term i in the corpus a (i.e., in one of two subsets for the training data, corresponding to either positive or negative messages), and s is a smoothing constant, which we set to 0.5. This term weighting function was previously shown to be effective for sentiment analysis. 2.2 Brown Clusters Brown et al. (1992) proposed a greedy agglomerative hierarchical clustering procedure that groups words to maximize the mutual information of bigrams. Clusters are initialized as consisting of a single word each, and are then greedily merged according to a mutual information criterion, to form a lower-dimensional representation of a vocabulary. The hierarchical nature of the clustering allows words to be represented at different levels in the hierarchy. This approach provides a denser representation of the messages, mitigating the feature sparseness problem. We used a publicly available 1 set of 1000 Brown clusters induced from a corpus of 56 million Twitter messages. We leveraged the word clusters by mapping each word to the corresponding cluster, and we then represented each message as a bag-of-clusters vector in R K, where K = 1000 is the number of clusters. These word cluster features were also weighted with the BM25 scheme. 2.3 Concise Semantic Analysis Concise Semantic Analysis is a form of term and document representation that assigns, to each term, its weight on each of the classes (Li et al., 2011). These weights, computed from the frequencies of the term on the training data, reflect how associated the term is to each class. The weight of term j in class c is given by (Lopez- Monroy et al., 2013): w cj = ( log tf ) kj, (2) len(k) k P c where P c is the set of documents with label c and tf kj is the term frequency of term j in document k. To prevent labels with a higher number of examples, or terms with higher frequencies, to have stronger weights, an additional normalization step is performed to obtain nw cj, the normalized weight of term j in class c: nw cj = w cj w lj. (3) w ct l L t T In the formula, L is the set of class labels and T is the set of terms, making w lj the weight of term j for a class l, and w ct the weight of a term t in class c. After defining every term as a vector t j = {nw 1j,..., nw Cj } R C, where C is the number of classes, each message m is represented by summing each of its terms weight vectors: m csa = j m tf j len(m) t j. (4) In the formula, tf j is the frequency of term j in m. 2.4 Dense Word Vectors Efficient approaches have recently been introduced to train neural networks capable of producing continuous representations of words (Mikolov
3 Lexicon #1-grams #2-grams #pairs Bing Liu MPQA SentiStrength NRC EmoLex Sentiment NRC HashSent Table 2: Number of unigrams, bigrams, and collocation pairs, in the lexicons used in our system. et al., 2013). These approaches allow fast training of projections from a representation based on bags-of-words, where vectors have very high dimension (of the order of 10 4 ), but are also very sparse and integer-valued, to vectors of much lower dimensions (of the order of 10 2 ), with full density and continuous values. To induce word embeddings, a corpus of 17 million Twitter messages was collected with the Twitter crawler of Boanjak et al. (2012). Then, using word2vec 2, we induced representations for the word tokens occurring in the messages. All the tokens were represented as vectors w j R n, with n = 100. A message was modeled as the sum of the vector representations of the individual words: m vec = j m w j. (5) We also created a polarity class vector p c for each class c, defined as: p c = 1 m vec, (6) N c m c where m is a message of class c and N c is the total number of instances in class c. These vectors can be interpreted as prototypes of their classes, and are used in the classvec features described below. 3 The TUGAS System We now describe the TUGAS approach, detailing the considered features and our modeling choices. 3.1 Word Features To reduce the feature space of the model, messages were lower-cased, Twitter user mentions (@username) were replaced with the token <USER> and URLs were replaced with the <URL> token. We also normalized words to include at most 3 repeated characters (e.g., 2 helloooooo! to hellooo! ). Following Pang et al. (2002), negation was directly integrated into the word representations. All the tokens occurring between a negation word and the next punctuation mark, were suffixed with the NEG annotation. We used the following groups of features: bow-uni: vector of word unigrams bow-bc: vector of Brown word clusters csa: Concise Semantic Analysis vector m csa wordvec: word2vec message vector m vec classvec: Euclidean distance between message vector m vec and each class vector p c 3.2 Lexicon Features The document model was enriched with features that take into account the presence of words with a known prior polarity, such as happy or sad. We included words from manually annotated sentiment lexicons: Bing Liu Opinion Lexicon (Hu and Liu, 2004), MPQA (Wilson et al., 2005) and the NRC Emotion Lexicon (Mohammad and Turney, 2013). We also used the two automatically generated lexicons from Mohammad et al. (2013): the NRC Hashtag Sentiment Lexicon and the Sentiment140 Lexicon. Table 2 summarizes the number of terms of each lexicon. As Mohammad et al. (2013), we added the following set of lexicon features, for each lexicon, and for each combination of negated/non-negated words and positive/negative polarity. The sum of the sentiment scores of all (negated/non-negated) terms with (positive/negative) sentiment The largest of those scores The sentiment score of the last word in the message that is also present in the lexicon The number of terms within the lexicon Notice that terms can be unigrams, bigrams, and collocations pairs. A group of these features was computed for each of the sentiment lexicons. 3.3 Syntactic Features We extracted syntactic features aimed at the Twitter domain, such as the use of heavy punctuation, emoticons and character repetition. Concretely, the following features were computed from the original Twitter messages: Number of words originally with more than 3 repeated characters Number of sequences of exclamation marks and/or question marks 675
4 Tweets Test 2013 Tweets Test 2014 SMS 2013 Live Journal 2014 Tweets Sarcasm 2014 Features Acc F1 Official Acc F1 Official Acc F1 Official Acc F1 Official Acc F1 Official bow-uni submitted lexicons classvec wordvec bow-bc syntactic csa bow-uni Table 3: Impact of removing or adding groups of features. The row marked as submitted, in bold, is the one that we submitted to the shared task. The bold column is the official score used to rank participants. Number of positive/negative emoticons, detected with a pre-existing regular expression 3 Number of capitalized words 3.4 Model Training We used the L2-regularized logistic regression implementation from scikit-learn 4. Given a set of m instance-label pairs (x i, y i ), with i = 1,..., m, x i R n, and y i { 1, +1}, learning the classifier involves solving the following optimization problem, where C > 0 is a penalty parameter. m min w 1 2 w w + C i=1 log(1 + e y iw x i ). (7) In scikit-learn, the problem is solved through a trust region Newton method, using a wrapper over the implementation available in the liblinear 5 package. For multi-class problems, scikitlearn uses the one-vs-the-rest strategy. This particular implementation also suports the introduction of class weights, which we set to be inversely proportional to the class frequency in the training data, thus making each class equally important. The selection of groups of features to be included in the submitted run, as well as the tuning of the regularization constant, were obtained by cross-validation on the training dataset. 4 Results We report results using the following metrics: Accuracy, defined as the percentage of tweets correctly classified. Overall F1, computed by averaging the F1 score of all three classes. The Official SemEval score, computed by averaging the F1 scores of the positive and negative classes (Nakov et al., 2013) cjlin/liblinear/ Feature group Acc F1 Official bow-bc wordvec bow-uni csa Table 4: Performance comparison using different word representations in isolation. We tried including or excluding various groups of features, and obtained the best results on the training set using Brown clusters (bow-bc), lexicon features (lexicon), word2vec word representations (wordvec), and the Euclidean distance between the word2vec representation and each class vector (classvec). These features were the ones used in our submission. Inclusion of syntactic features (syntactic), Concise Semantic Analysis (csa), and word unigrams (bow-uni) was found to decrease performance during cross-validation, and thus these features were not included. Table 4 shows the results on the Twitter 2014 test set using only a single group of word representation features to train the model, from each of the techniques introduced in Section 2. This table suggests that exploiting unlabeled data is beneficial, as representing words through their Brown clusters (bow-bc) or through word2vec (wordvec) yields better results than unigrams or CSA. Table 3 shows results on five different test sets, including two from the 2013 challenge (Nakov et al., 2013), when features are added or removed from the official submission, one group at a time. Adding representations like bow-uni or csa actually hurts the performance, suggesting that, given the relatively small set of training instances, using coarse-level features in isolation, such as Brown clusters, can yield better results. More importantly, we verify that lexicon-based and Brown cluster features have the largest impact 676
5 (2.6% and 3.7%, respectively, in the official metric). These results indicate that leveraging unlabeled data yields significant improvements. 5 Conclusions This paper describes the participation of the TUGAS team in the message polarity classification task of SemEval We showed that there are significant gains in leveraging unlabeled data for the task of classifying the sentiment of Twitter texts. Our score of 69% ranks at fifth place in 42 submissions, roughly 2% points below the top score of 70.96%. We believe that the direction of leveraging unlabeled data is still vastly unexplored and, for future work, we intend to: (a) experiment with semi-supervised learning approaches, further exploiting unlabeled tweets; and (b) make use of domain adaptation strategies to leverage on labelled non-twitter data. Acknowledgements This work was partially supported by the EU/FEDER programme, QREN/POR Lisboa (Portugal), under the Intelligo project (contract 2012/24803). The researchers from INESC-ID were supported by Fundação para a Ciência e Tecnologia (FCT), through contracts Pest-OE/EEI/LA0021/2013, EXCL/EEI- ESS/0257/2012 (DataStorm), project grants PTDC/CPJ-CPO/116888/2010 (POPSTAR), and EXPL/EEI-ESS/0427/2013 (KD-LBSN), and Ph.D. scholarship SFRH/BD/89020/2012. References Matko Boanjak, Eduardo Oliveira, José Martins, Eduarda Mendes Rodrigues, and Luís Sarmento Twitterecho: a distributed focused crawler to support open research with twitter data. In 21st International Conference Companion on World Wide Web, pages Peter F. Brown, Peter V. Desouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai Class-based n-gram models of natural language. Computational Linguistics, 18(4): Tobias Günther and Lenz Furrer GU-MLT- LT: Sentiment analysis of short messages using linguistic features and stochastic gradient descent. In 7th International Workshop on Semantic Evaluation, pages Minqing Hu and Bing Liu Mining and summarizing customer reviews. In 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages Zhixing Li, Zhongyang Xiong, Yufang Zhang, Chunyong Liu, and Kuan Li Fast text categorization using concise semantic analysis. Pattern Recognition Letters, 32(3): Ádrian Pastor Lopez-Monroy, Manuel Montes-y Gomez, Hugo Jair Escalante, Luis Villasenor- Pineda, and Esaú Villatoro-Tello INAOE s participation at PAN 13: Author profiling task. In CLEF 2013 Evaluation Labs and Workshop. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean Distributed representations of words and phrases and their compositionality. In 27th Annual Conference on Neural Information Processing Systems, pages Lewis Mitchell, Kameron Decker Harris, Morgan R Frank, Peter Sheridan Dodds, and Christopher M Danforth The geography of happiness: connecting twitter sentiment and expression, demographics, and objective characteristics of place. PLoS ONE, 8(5):e Saif M Mohammad and Peter D Turney Crowdsourcing a word emotion association lexicon. Computational Intelligence, 29(3): Saif Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu NRC-Canada: building the state-ofthe-art in sentiment analysis of tweets. In 7th International Workshop on Semantic Evaluation, pages Preslav Nakov, Sara Rosenthal, Zornitsa Kozareva, Veselin Stoyanov, Alan Ritter, and Theresa Wilson SemEval-2013 task 2: Sentiment analysis in Twitter. In 7th International Workshop on Semantic Evaluation, pages Brendan O Connor, Ramnath Balasubramanyan, Bryan R Routledge, and Noah A Smith From tweets to polls: Linking text sentiment to public opinion time series. In 4th International AAAI Conference on Weblogs and Social Media, pages Georgios Paltoglou and Mike Thelwall A study of information retrieval weighting schemes for sentiment analysis. In 48th Annual Meeting of the Association for Computational Linguistics, pages Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan Thumbs up?: sentiment classification using machine learning techniques. In ACL-02 Conference on Empirical Methods in Natural Language Processing, pages Hilke Reckman, Baird Cheyanne, Jean Crawford, Richard Crowell, Linnea Micciulla, Saratendu Sethi, and Fruzsina Veress teragram: Rule-based detection of sentiment phrases using SAS sentiment analysis. In 7th International Workshop on Semantic Evaluation, pages Theresa Wilson, Janyce Wiebe, and Paul Hoffmann Recognizing contextual polarity in phraselevel sentiment analysis. In Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages
Improving Twitter Sentiment Analysis with Topic-Based Mixture Modeling and Semi-Supervised Training
Improving Twitter Sentiment Analysis with Topic-Based Mixture Modeling and Semi-Supervised Training Bing Xiang * IBM Watson 1101 Kitchawan Rd Yorktown Heights, NY 10598, USA [email protected] Liang Zhou
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
IIIT-H at SemEval 2015: Twitter Sentiment Analysis The good, the bad and the neutral!
IIIT-H at SemEval 2015: Twitter Sentiment Analysis The good, the bad and the neutral! Ayushi Dalmia, Manish Gupta, Vasudeva Varma Search and Information Extraction Lab International Institute of Information
Kea: Expression-level Sentiment Analysis from Twitter Data
Kea: Expression-level Sentiment Analysis from Twitter Data Ameeta Agrawal Computer Science and Engineering York University Toronto, Canada [email protected] Aijun An Computer Science and Engineering
NILC USP: A Hybrid System for Sentiment Analysis in Twitter Messages
NILC USP: A Hybrid System for Sentiment Analysis in Twitter Messages Pedro P. Balage Filho and Thiago A. S. Pardo Interinstitutional Center for Computational Linguistics (NILC) Institute of Mathematical
Sentiment Lexicons for Arabic Social Media
Sentiment Lexicons for Arabic Social Media Saif M. Mohammad 1, Mohammad Salameh 2, Svetlana Kiritchenko 1 1 National Research Council Canada, 2 University of Alberta [email protected], [email protected],
SU-FMI: System Description for SemEval-2014 Task 9 on Sentiment Analysis in Twitter
SU-FMI: System Description for SemEval-2014 Task 9 on Sentiment Analysis in Twitter Boris Velichkov, Borislav Kapukaranov, Ivan Grozev, Jeni Karanesheva, Todor Mihaylov, Yasen Kiprov, Georgi Georgiev,
CMUQ@Qatar:Using Rich Lexical Features for Sentiment Analysis on Twitter
CMUQ@Qatar:Using Rich Lexical Features for Sentiment Analysis on Twitter Sabih Bin Wasi, Rukhsar Neyaz, Houda Bouamor, Behrang Mohit Carnegie Mellon University in Qatar {sabih, rukhsar, hbouamor, behrang}@cmu.edu
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Microblog Sentiment Analysis with Emoticon Space Model
Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory
Sentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
Robust Sentiment Detection on Twitter from Biased and Noisy Data
Robust Sentiment Detection on Twitter from Biased and Noisy Data Luciano Barbosa AT&T Labs - Research [email protected] Junlan Feng AT&T Labs - Research [email protected] Abstract In this
S-Sense: A Sentiment Analysis Framework for Social Media Sensing
S-Sense: A Sentiment Analysis Framework for Social Media Sensing Choochart Haruechaiyasak, Alisa Kongthon, Pornpimon Palingoon and Kanokorn Trakultaweekoon Speech and Audio Technology Laboratory (SPT)
Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies
Sentiment analysis of Twitter microblogging posts Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies Introduction Popularity of microblogging services Twitter microblogging posts
Sentiment Analysis and Topic Classification: Case study over Spanish tweets
Sentiment Analysis and Topic Classification: Case study over Spanish tweets Fernando Batista, Ricardo Ribeiro Laboratório de Sistemas de Língua Falada, INESC- ID Lisboa R. Alves Redol, 9, 1000-029 Lisboa,
DiegoLab16 at SemEval-2016 Task 4: Sentiment Analysis in Twitter using Centroids, Clusters, and Sentiment Lexicons
DiegoLab16 at SemEval-016 Task 4: Sentiment Analysis in Twitter using Centroids, Clusters, and Sentiment Lexicons Abeed Sarker and Graciela Gonzalez Department of Biomedical Informatics Arizona State University
Sentiment Analysis for Movie Reviews
Sentiment Analysis for Movie Reviews Ankit Goyal, [email protected] Amey Parulekar, [email protected] Introduction: Movie reviews are an important way to gauge the performance of a movie. While providing
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts Cataldo Musto, Giovanni Semeraro, Marco Polignano Department of Computer Science University of Bari Aldo Moro, Italy {cataldo.musto,giovanni.semeraro,marco.polignano}@uniba.it
Sentiment Classification on Polarity Reviews: An Empirical Study Using Rating-based Features
Sentiment Classification on Polarity Reviews: An Empirical Study Using Rating-based Features Dai Quoc Nguyen and Dat Quoc Nguyen and Thanh Vu and Son Bao Pham Faculty of Information Technology University
Micro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
Data Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University [email protected] Chetan Naik Stony Brook University [email protected] ABSTRACT The majority
Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
Fine-Grained Sentiment Analysis for Movie Reviews in Bulgarian
Fine-Grained Sentiment Analysis for Movie Reviews in Bulgarian Borislav Kapukaranov Faculty of Mathematics and Informatics Sofia University Bulgaria [email protected] Preslav Nakov Qatar Computing
Twitter sentiment vs. Stock price!
Twitter sentiment vs. Stock price! Background! On April 24 th 2013, the Twitter account belonging to Associated Press was hacked. Fake posts about the Whitehouse being bombed and the President being injured
Sentiment analysis: towards a tool for analysing real-time students feedback
Sentiment analysis: towards a tool for analysing real-time students feedback Nabeela Altrabsheh Email: [email protected] Mihaela Cocea Email: [email protected] Sanaz Fallahkhair Email:
CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis
CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction
Emoticon Smoothed Language Models for Twitter Sentiment Analysis
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of
Sentiment Analysis: a case study. Giuseppe Castellucci [email protected]
Sentiment Analysis: a case study Giuseppe Castellucci [email protected] Web Mining & Retrieval a.a. 2013/2014 Outline Sentiment Analysis overview Brand Reputation Sentiment Analysis in Twitter
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,
W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5
Sentiment Analysis of Microblogs
Thesis for the degree of Master in Language Technology Sentiment Analysis of Microblogs Tobias Günther Supervisor: Richard Johansson Examiner: Prof. Torbjörn Lager June 2013 Contents 1 Introduction 3 1.1
Twitter Stock Bot. John Matthew Fong The University of Texas at Austin [email protected]
Twitter Stock Bot John Matthew Fong The University of Texas at Austin [email protected] Hassaan Markhiani The University of Texas at Austin [email protected] Abstract The stock market is influenced
Package syuzhet. February 22, 2015
Type Package Package syuzhet February 22, 2015 Title Extracts Sentiment and Sentiment-Derived Plot Arcs from Text Version 0.2.0 Date 2015-01-20 Maintainer Matthew Jockers Extracts
University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task
University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task Graham McDonald, Romain Deveaud, Richard McCreadie, Timothy Gollins, Craig Macdonald and Iadh Ounis School
Fine-grained German Sentiment Analysis on Social Media
Fine-grained German Sentiment Analysis on Social Media Saeedeh Momtazi Information Systems Hasso-Plattner-Institut Potsdam University, Germany [email protected] Abstract Expressing opinions
A Comparative Study on Sentiment Classification and Ranking on Product Reviews
A Comparative Study on Sentiment Classification and Ranking on Product Reviews C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
End-to-End Sentiment Analysis of Twitter Data
End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India [email protected],
Neuro-Fuzzy Classification Techniques for Sentiment Analysis using Intelligent Agents on Twitter Data
International Journal of Innovation and Scientific Research ISSN 2351-8014 Vol. 23 No. 2 May 2016, pp. 356-360 2015 Innovative Space of Scientific Research Journals http://www.ijisr.issr-journals.org/
CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques
CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques Chris MacLellan [email protected] May 3, 2012 Abstract Different methods for aggregating twitter sentiment data are proposed and three
e-commerce product classification: our participation at cdiscount 2015 challenge
e-commerce product classification: our participation at cdiscount 2015 challenge Ioannis Partalas Viseo R&D, France [email protected] Georgios Balikas University of Grenoble Alpes, France [email protected]
SENTIMENT ANALYSIS: A STUDY ON PRODUCT FEATURES
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Dissertations and Theses from the College of Business Administration Business Administration, College of 4-1-2012 SENTIMENT
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data Apoorv Agarwal Boyi Xie Ilia Vovsha Owen Rambow Rebecca Passonneau Department of Computer Science Columbia University New York, NY 10027 USA {apoorv@cs, xie@cs, iv2121@,
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Combining Social Data and Semantic Content Analysis for L Aquila Social Urban Network
I-CiTies 2015 2015 CINI Annual Workshop on ICT for Smart Cities and Communities Palermo (Italy) - October 29-30, 2015 Combining Social Data and Semantic Content Analysis for L Aquila Social Urban Network
Impact of Financial News Headline and Content to Market Sentiment
International Journal of Machine Learning and Computing, Vol. 4, No. 3, June 2014 Impact of Financial News Headline and Content to Market Sentiment Tan Li Im, Phang Wai San, Chin Kim On, Rayner Alfred,
REACTION Workshop 2013.07.31 Overview Porto, FEUP. Mário J. Silva IST/INESC-ID, Portugal REACTION
Workshop 2013.07.31 Overview Porto, FEUP Mário J. Silva IST/INESC-ID, Portugal Agenda 11:30 Welcome + Quick progress report and status summary 11:45 Task leaders summarize ongoing activities (10 min each
Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians
Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians Lucas Brönnimann University of Applied Science Northwestern Switzerland, CH-5210 Windisch, Switzerland Email: [email protected]
Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams
2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment
Technical Presentations. Arian Pasquali, FEUP, REACTION Data Collection Plataform David Batista, INESC-ID, Sematic Relations Extraction REACTION
Agenda 11:30 Welcome + Quick progress report and status summary 11:45 Task leaders summarize ongoing activities (10 min each max) 12:30 Break. 14:00 Technical Presentations 15:00 Break 16:00 Short Technical
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSIS
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSIS Gautami Tripathi 1 and Naganna S. 2 1 PG Scholar, School of Computing Science and Engineering, Galgotias University, Greater Noida,
Predicting Movie Revenue from IMDb Data
Predicting Movie Revenue from IMDb Data Steven Yoo, Robert Kanter, David Cummings TA: Andrew Maas 1. Introduction Given the information known about a movie in the week of its release, can we predict the
A Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi [email protected] Satakshi Rana [email protected] Aswin Rajkumar [email protected] Sai Kaushik Ponnekanti [email protected] Vinit Parakh
Gated Neural Networks for Targeted Sentiment Analysis
Gated Neural Networks for Targeted Sentiment Analysis Meishan Zhang 1,2 and Yue Zhang 2 and Duy-Tin Vo 2 1. School of Computer Science and Technology, Heilongjiang University, Harbin, China 2. Singapore
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
Sentiment Analysis of Short Informal Texts
Journal of Artificial Intelligence Research 50 (2014) 723 762 Submitted 12/13; published 08/14 Sentiment Analysis of Short Informal Texts Svetlana Kiritchenko Xiaodan Zhu Saif M. Mohammad National Research
Sentiment analysis for news articles
Prashant Raina Sentiment analysis for news articles Wide range of applications in business and public policy Especially relevant given the popularity of online media Previous work Machine learning based
Semantic Sentiment Analysis of Twitter
Semantic Sentiment Analysis of Twitter Hassan Saif, Yulan He & Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom The 11 th International Semantic Web Conference
Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016
Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00
Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques.
Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques. Akshay Amolik, Niketan Jivane, Mahavir Bhandari, Dr.M.Venkatesan School of Computer Science and Engineering, VIT University,
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns Stamatina Thomaidou 1,2, Konstantinos Leymonis 1,2, Michalis Vazirgiannis 1,2,3 Presented by: Fragkiskos Malliaros 2 1 : Athens
Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie
Search Engines. Stephen Shaw <[email protected]> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014
Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview
Improving Sentiment Analysis in Twitter Using Multilingual Machine Translated Data
Improving Sentiment Analysis in Twitter Using Multilingual Machine Translated Data Alexandra Balahur European Commission Joint Research Centre Via E. Fermi 2749 21027 Ispra (VA), Italy [email protected]
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality
Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality Anindya Ghose, Panagiotis G. Ipeirotis {aghose, panos}@stern.nyu.edu Department of
Identifying SPAM with Predictive Models
Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to
Sentiment Analysis Tool using Machine Learning Algorithms
Sentiment Analysis Tool using Machine Learning Algorithms I.Hemalatha 1, Dr. G. P Saradhi Varma 2, Dr. A.Govardhan 3 1 Research Scholar JNT University Kakinada, Kakinada, A.P., INDIA 2 Professor & Head,
Research on Sentiment Classification of Chinese Micro Blog Based on
Research on Sentiment Classification of Chinese Micro Blog Based on Machine Learning School of Economics and Management, Shenyang Ligong University, Shenyang, 110159, China E-mail: [email protected] Abstract
Content vs. Context for Sentiment Analysis: a Comparative Analysis over Microblogs
Content vs. Context for Sentiment Analysis: a Comparative Analysis over Microblogs Fotis Aisopos $, George Papadakis $,, Konstantinos Tserpes $, Theodora Varvarigou $ L3S Research Center, Germany [email protected]
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Prediction of Stock Market Shift using Sentiment Analysis of Twitter Feeds, Clustering and Ranking
382 Prediction of Stock Market Shift using Sentiment Analysis of Twitter Feeds, Clustering and Ranking 1 Tejas Sathe, 2 Siddhartha Gupta, 3 Shreya Nair, 4 Sukhada Bhingarkar 1,2,3,4 Dept. of Computer Engineering
Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement
Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter
Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter Hassan Saif, 1 Yulan He, 2 Miriam Fernandez 1 and Harith Alani 1 1 Knowledge Media Institute, The Open University,
Sentiment Analysis of Movie Reviews and Twitter Statuses. Introduction
Sentiment Analysis of Movie Reviews and Twitter Statuses Introduction Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general, or about
