Predicting Short Term Company Performance by Applying Sentiment Analysis and Machine Learning Algorithms on Social Media


Niels ten Boom
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands

ABSTRACT
This paper reports on research into the use of sentiment analysis on social media to predict a company's short term performance. The stock price of a company is used as the measure of short term performance. The sentiment is extracted from a large corpus of tweets mentioning twenty large companies, and a few techniques for extracting sentiment are reviewed. We find that, for sentiment analysis, a Naive Bayes classifier trained with data very similar to the corpus performs best. We use this classifier to extract the sentiment from the tweets and, together with the stock prices of the twenty companies, train various supervised machine learning models. We find one combination of data for which the accuracy of a classifier is 65,5%, but most other cases appear to be no better than an algorithm that classifies randomly.

Keywords
Sentiment analysis, Twitter, Naive Bayes classification, Machine Learning, Stock price.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
23rd Twente Student Conference on IT, June 22, 2015, Enschede, The Netherlands. Copyright 2015, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

1. INTRODUCTION
Social media plays a big role in society nowadays. Many people use it to share photos, stories and their activities. Some use it to express opinions or feelings about various topics, and these are posted on the Internet for everyone to see. This research is interested in the opinions and sentiment expressed towards companies, and in whether these opinions are correlated with the stock price of a company. Twitter provides the social media data on which the sentiment analysis is performed, because it is a platform from which large amounts of posts can easily be filtered and extracted. Take for example the following three tweets mentioning Apple:

#AppleWatch launched and #Apple team. Looks cool as expected
#appstore down for all users :( #Apple #wtf
Apple Store app updated with support for AppleWatch. #Apple #iphone #ipad

The first tweet clearly has a positive sentiment, whereas the second tweet has a negative sentiment. The third tweet does not seem to carry sentiment at all and can therefore be flagged as neutral. Performing sentiment analysis means that a computer algorithm is used to extract the sentiment of a text.

The stock price of a company can go up, go down or stay roughly the same. This can only happen within the window in which the stock is traded (between 9AM and 5PM). An example of the price direction of Apple's stock during one day can be seen in Figure 1. It is clear that the stock price went down that day. Perhaps this could have been predicted by examining the sentiment of the public, because research suggests that public sentiment has an influence on the financial market [7].

Figure 1. Sample of daily stock price

We will experiment with predicting whether the stock price will go up or down at the end of the day. This will be done by performing sentiment analysis on tweets posted in the same period and by using the results of this analysis with several machine learning algorithms. There is a lot of controversy regarding the prediction of stock price directions (up or down).
Some theories suggest that it cannot be done [14] [13], whereas other research reports very positive results [2] [5]. This is discussed further in Section 2. In this paper we hope to reach a clear conclusion regarding this controversy. If the results of this research are positive, a system could be created that monitors Twitter and predicts whether the stock price of a company is likely to go up or down. Such a system could be of use in the financial domain as an analysis tool for making investment decisions.

In this paper we first discuss the related work in Section 2. The research questions are then formulated in Section 3. The methodology is described in Section 4, and the results of the experiments described there are presented in Section 5. From these results a conclusion is drawn in Section 6.

2. RELATED WORK
This research uses different methods for sentiment analysis, an area in which much research has already been done [9]. Sentiment analysis of tweets has also been carried out successfully [8]. Research that accurately predicted box office numbers for movies using social media sentiment is related [1], because the success of a movie also contributes to the performance of the company that published it.

The main inspiration for this research comes from the study by J. Bollen et al., who used sentiment analysis on a large number of tweets to predict the direction of a big American market index using machine learning [2]. The results of that research were very positive: they reported an accuracy of 86,7%. But because they tried to predict one single market index, this could have been a matter of favorable circumstances. Their test set consisted of 9 days and thus 9 instances were tested. This research does not try to predict the direction of a market index but the direction of the stock prices of multiple companies, which results in a larger train and test set.

There is more research on predicting the market using large amounts of data. Some research had less optimistic results [16]. In contrast, other research claims to outperform the market as a whole using large amounts of social media data [5].

The controversy mentioned earlier is that these papers report having predicted the directions of stock prices with high accuracy, which contradicts the widely accepted random walk hypothesis [14]. The random walk hypothesis states that one cannot predict whether a stock price goes up or down with an accuracy greater than 50%.
This is in line with the Efficient-market Hypothesis (EMH) [13], which states that if there were a way to predict the stock market, everybody would be using it, which would influence the market in such a way that it would no longer be predictable.

3. RESEARCH QUESTIONS
The main focus of this research is on the correlation between the sentiment towards companies and the short term stock price directions of these companies. The main research question can therefore be formulated as:

Is it possible to predict the daily stock price direction by performing sentiment analysis on a large amount of social media messages mentioning a company?

This main question is answered with the help of the following subquestions:

1. Which method of sentiment analysis is most effective for analyzing short text messages?
There are several methods and tools for sentiment analysis. A few different tools will be used to perform sentiment analysis on the tweets, and these tools will be evaluated.

2. Which combination of machine learning algorithm and data features yields the best result?
After all of the tweets have been processed by the sentiment analysis, machine learning algorithms will be implemented and evaluated. Experiments will be done with different combinations of sentiment, for instance by using only the amount of positivity, only the amount of negativity, or a combination of both.

3. Does taking the sentiment of earlier days into account improve the accuracy of the prediction?
It could be that the sentiment of a specific day is only reflected in the stock price later. Therefore, we will experiment with using the sentiment of up to four days earlier.

4. METHODOLOGY
This section describes the methodology used in this research, which can be split into three parts. The first part is the collection and preprocessing of the data, discussed in Section 4.1. Section 4.2 describes the sentiment analysis, the tools used for it and the evaluation of those tools.
Section 4.3 describes how all of the data was processed with the machine learning algorithms.

4.1 Dataset
The dataset for the analysis was acquired through the Application Programming Interface (API) of the short message platform Twitter. The API was used to scrape tweets related to the twenty companies in America with the largest market capitalization [12]. The reasoning behind this was that larger companies are mentioned more often on social media, which should result in enough data to work with. The API was queried for English tweets containing the hashtag name of each company and its stock market ticker. For instance, Apple tweets were saved when they contained the strings: #Apple, #AAPL, AAPL. This process ran on a server from March 7, 2015 until May 2015. In total about.5 million tweets were saved. The tweets came with UTC timestamps; these were converted to CST timestamps to match the American stock trading window.

The stock price data of the twenty companies was extracted from Yahoo Finance [15]. This data was then simplified by assigning each day the class UP or DOWN. Table 1 shows an example in which the Apple stock is assigned the DOWN class because the stock went down that day (Open > Close). The situation where a stock value stays the same was disregarded, because it did not occur in the dataset.

Table 1. Example of the stock price data preprocessing
Company   Date   Open   Close   class
Apple                           DOWN

This classification of the stock price data was done to reduce the complexity of the experiments. Exact statistics of all the data can be found in Table 7. From this table it becomes clear that the tweets are not evenly distributed: some companies are discussed on Twitter more often than others. The company Berkshire Hathaway received too few tweets and was discarded.

4.2 Implementing Sentiment Analysis
After the collection of data was finished, sentiment had to be extracted from the tweets.
As described before, open-source sentiment analysis tools were used to perform the sentiment analysis. The tools take text as input and output the sentiment as a string. Before these tools were used, an evaluation was performed to make sure that the classification of the tweets was reliable enough. Using more than three tools was originally planned, but some turned out to be too complex to adapt to tweets and were therefore left out of the evaluation.

The first tool was a sentiment analysis tool trained with a dataset of IMDB movie reviews [3]. The second tool was the sentiment analysis module of the Stanford NLP toolkit [6], which was also trained with a dataset based on sentences from movie reviews [11]. The last tool that was evaluated was a self-programmed Multinomial Naive Bayes classifier trained with a corpus of 4597 hand-classified tweets [10] from which punctuation and uppercase characters had been removed. The tweets were tokenized on spaces. The classifier was programmed using the Weka [4] API. Subsection 4.2.1 elaborates on this classifier.

The tools were programmed to determine the polarity of a tweet by tagging it as Positive, Negative or Neutral. This was tested on a sample of 20 hand-labeled tweets randomly selected from the total set of tweets. The distributions of the training set for the Naive Bayes classifier and of the test set can be viewed in Table 2.

Table 2. Distributions of the percentages of positive, negative and neutral tweets in the training and test set.
                             % pos   % neg   % neu
Training set (4597 tweets)
Test set (20 tweets)

The results of this evaluation are presented in Section 5. This evaluation led to the decision to exclusively use the self-programmed Naive Bayes classifier.

4.2.1 Classification with Naive Bayes
This subsection elaborates on the Naive Bayes classifier that was programmed. The training set had to be modified before the classifier could be trained with it. This was done by converting the tweets to word vectors, with a feature created for each word, by applying the StringToWordVector filter in Weka. Naive Bayes makes use of Bayes' theorem:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]

This equation is the foundation for the classifier, because ultimately we would like to compute, given a tweet $t$, which of $P(c_{pos} \mid t)$, $P(c_{neg} \mid t)$ and $P(c_{neu} \mid t)$ has the highest probability, where $c$ represents the sentiment class positive, negative or neutral. This results in the following equation:

\[ P(c_{class} \mid t) = \frac{P(t \mid c_{class})\, P(c_{class})}{P(t)} \]

$P(c_{class})$ can be computed by dividing the number of tweets of that class by the total number of tweets in the training set. $P(t \mid c_{class})$ can in turn be computed by splitting the tweet into words, computing $P(w \mid c_{class})$ for each word $w$ in $t$, and multiplying these probabilities. $P(w \mid c_{class})$ is the number of times word $w$ occurs in $c_{class}$ divided by the total number of times that word occurs in the training set. Training the classifier means computing $P(c_{class})$ and $P(w \mid c_{class})$ for each word $w$ in the vocabulary and storing these values, so that they can be used to classify by finding the maximum value of $\{P(c_{pos} \mid t), P(c_{neg} \mid t), P(c_{neu} \mid t)\}$ for a tweet $t$. The class for which the value is highest is the class assigned to the tweet.

This algorithm was implemented using the Weka API in Java. The program was designed to convert an input arff file¹ containing tweets into the counts of positive, negative and neutral tweets. Such a file was converted for each day in the dataset, per company. This data was saved so that it could later be passed on to the machine learning algorithms.

Figures 2, 3 and 4 visualize some of the data on positive sentiment and stock price. For each figure the correlation coefficient ρ is given. As can be seen, the correlation coefficient is decent for the two single companies. However, when the data of all the companies is plotted together, the correlation coefficient is lower. But this concerns only the positive sentiment; it could be that the machine learning algorithms are able to discover a pattern when the rest of the sentiment data is included.

Figure 2. Apple's positive sentiment plotted together with its stock price. (ρ = 0.672)
Figure 3. Disney's positive sentiment plotted together with its stock price. (ρ = 0.592)
Figure 4. Positive sentiment towards all companies plotted together with their stock prices. (ρ = )

¹ The file format that Weka uses; it is comparable to a Comma Separated Values (CSV) file.
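The classification rule of this subsection can be illustrated with a small sketch. It is not the Weka/Java program used in this research: it is a minimal Python illustration that assumes the usual multinomial Naive Bayes estimate with add-one (Laplace) smoothing, where word counts are divided by the total word count of the class, and it works with log-probabilities to avoid numerical underflow. $P(t)$ is left out because it is the same for every class. The toy training tweets and the names `tokenize`, `train`, `classify` and `count_sentiment` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(tweet):
    """Lowercase, drop punctuation and split on whitespace."""
    cleaned = "".join(ch for ch in tweet.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()


def train(labelled_tweets):
    """Count tweets per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_tweets:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(tweet, model):
    """Return the class with the highest log P(c) + sum of log P(w | c)."""
    class_counts, word_counts, vocabulary = model
    total_tweets = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_tweets)
        # Assumption: add-one smoothing over the vocabulary size.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(tweet):
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def count_sentiment(tweets, model):
    """Counts of pos/neg/neu tweets, e.g. for one company on one day."""
    return Counter(classify(tweet, model) for tweet in tweets)


# Hypothetical toy training data, only to make the sketch runnable.
TOY_TRAINING = [
    ("looks cool as expected", "pos"),
    ("love the new watch", "pos"),
    ("appstore down for all users", "neg"),
    ("wtf the app is broken", "neg"),
    ("app updated with support for watch", "neu"),
    ("store opens at nine", "neu"),
]
model = train(TOY_TRAINING)
daily_counts = count_sentiment(["the watch looks cool", "appstore down again"], model)
```

The per-day counts returned by `count_sentiment` correspond to the positive, negative and neutral counts that are passed on to the machine learning algorithms.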

4.3 Predicting Stock Price Directions
Four machine learning algorithms were evaluated for the prediction of the stock price direction: random forests, neural networks, support vector machines and logistic regression. Weka was used for the implementations of these algorithms, and all of them were applied with their default settings in Weka.

For each company the sentiment data was available in the form of the counts of positive, negative and neutral tweets per day. These counts were normalized separately for each company to the range [0,1] with the feature scaling formula:

\[ x' = \frac{x - \min(x)}{\max(x) - \min(x)} \]

where $x$ stands for the number of tweets. When, for instance, the counts of positive tweets for a company were normalized, $\max(x)$ would be the highest count of positive tweets for that company in the dataset and $\min(x)$ the lowest. This was done because some companies received significantly more tweets than others. With the sentiment data for each company on the same scale, better results are expected. An example of a single data instance containing all sentiment features can be viewed in Table 3.

The algorithms were also evaluated using the ratios of the sentiment as features, as opposed to the normalized counts. The ratios of the positive, negative and neutral sentiment were computed by dividing the number of tweets of a specific class on a day by the total number of tweets of that day. This also resulted in values in the range [0,1].

Table 3. Example of a single data instance containing all of the sentiment features with normalized data.
positive   negative   neutral   class
                                UP

The machine learning algorithms were then evaluated by trying out different combinations of features related to the sentiment. The main features were the positive, negative and neutral sentiment, and the algorithms were tested with several combinations of these features. Experiments using the sentiment of earlier days were also conducted.
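The feature preparation described above can be sketched as follows. This is a minimal Python illustration, not the code used in this research: the function names, the example counts and the way a shift in days is implemented (pairing the sentiment of day i with the stock direction of day i + lag) are assumptions made for the sake of the example.

```python
def min_max_scale(values):
    """Feature scaling x' = (x - min) / (max - min) for one company's daily counts."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant series is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def sentiment_ratios(pos, neg, neu):
    """Share of positive, negative and neutral tweets on one day."""
    total = pos + neg + neu
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (pos / total, neg / total, neu / total)


def lagged_instances(features, directions, lag):
    """Pair the features of day i with the stock direction of day i + lag."""
    return [(features[i], directions[i + lag]) for i in range(len(features) - lag)]


# Hypothetical positive-tweet counts of one company on four consecutive trading days.
positive_counts = [120, 80, 200, 160]
scaled = min_max_scale(positive_counts)

# Hypothetical stock directions on the same four days.
directions = ["UP", "DOWN", "UP", "UP"]
two_days_earlier = lagged_instances(scaled, directions, lag=2)
```

Scaling per company keeps a heavily discussed company from dominating the feature range, while the ratios remove the effect of the daily tweet volume altogether.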
These experiments were extended by adding extra features: the sentiment of the previous day, the sentiment of two days before, the number of tweets and the stock direction of the previous day. Two combinations of these features were experimented with. The results were documented as percentages of correctly classified instances and can be viewed in Section 5.

The algorithms were trained with the data gathered until April 20, 2015 and evaluated with the data from April 21, 2015 until April 30, 2015, so the split was roughly 75% training data and 25% test data. Because the data covered 33 trading days and 19 companies, the total number of data instances was 19 × 33 = 627, of which roughly 75% was used to train the algorithms. The distributions of the UP and DOWN classes can be viewed in Table 4.

Table 4. Distributions of the directions UP and DOWN in the training and test set.
                               % UP   % DOWN
Training set (456 instances)
Test set (171 instances)

5. RESULTS
This section presents the results of the experiments described in Section 4. Section 5.1 presents the results of the evaluation of the sentiment analysis tools. Section 5.2 presents the results of the experiments in which machine learning algorithms were used to try to predict stock price movements.

5.1 Sentiment Analysis
This section presents the results of the evaluation of the three proposed ways to classify tweets by their sentiment. This was done by letting the tools classify 20 hand-labeled tweets. The results of this evaluation are presented in Table 5.

Table 5. Percentage of correctly classified instances per analysis method.
Method                    Correctly classified
Python tool               5.2%
StanfordNLP               9.9%
Naive Bayes classifier    77.%

From Table 5 it becomes clear that the tools that were not exclusively trained with sentiment-labeled tweets do not perform very well in classifying them.
An explanation for this could be that the use of language in tweets differs from the use of language in the data these tools were trained with. That is why the Naive Bayes classifier was chosen for the sentiment analysis in this research.

Table 6. Confusion matrix of the evaluation of the Naive Bayes classifier.
classified as   pos neg neu
class pos       3 6
class neg
class neu

Table 6 shows the confusion matrix of the Naive Bayes classifier. What stands out is that many of the tweets are incorrectly classified as neutral. Still, it is safe to state that the algorithm recognizes the difference between positivity and negativity: only one positive tweet was incorrectly classified as negative. However, a portion of the neutral tweets is classified as either positive or negative. Only 5% of the positive tweets were classified correctly and 2.7% of the negative tweets. An explanation for this is that the tweets in the training dataset were not labeled by the same person who labeled the test set, which means that sentiment could have been interpreted differently in the two sets. Better results could probably be expected if we had labeled our own training set for this research, but due to the limited time frame this was not done.

5.2 Prediction of Stock Price Directions
The results of applying the machine learning algorithms to the data can be viewed in Table 8, Table 9 and Table 10.

Table 7. Statistics of the gathered data of the twenty companies
Company   # of tweets   % of total   # of DOWN   # of UP
Apple ,
Google ,
Exxon Mobil ,
Microsoft ,
Berkshire Hathaway 9 0,0 - -
Wal-Mart ,4 2 2
Johnson&Johnson ,
Wells Fargo 557 0,
General Electric ,7 7 6
Procter&Gamble ,
Coca-cola 082 0,
JPMorgan ,
Chevron ,
Verizon 6386,06 22
Facebook ,
Pfizer ,8 5 8
AT&T ,
Oracle 6646,
Bank of America 2308,
Disney ,
Total

Table 8. Percentages of correctly classified instances per machine learning algorithm with normalized data, for combinations of sentiment and a shift in days. The columns marked p stand for the cases in which only the positive sentiment was used as a feature, pn stands for positivity and negativity, and pnn stands for the cases in which positivity, negativity and neutrality were used.
Columns, in order: same day (p pn pnn), -1 day (p pn pnn), -2 days (p pn pnn), -3 days (p pn pnn), -4 days (p pn pnn)
Logistic Regression      54,6 54,6 5,5 42, 43,4 46,7 65,5 64,9 62,6 53,3 53,3 53,3 54,9 54,9 50,4
Support Vector Machine   52,6 52,6 53,5 49,3 48,7 48,7 52,6 60, ,2 52,6 5,3 54,9 54,9 55,6
Random Forest            53,2 44,7 45,8 46,7 55,2 56,6 50,0 50,3 56, 46,7 47,4 50,7 48,9 39, 43,6
Neural Network           52,6 53,2 53,5 5, ,6 52,6 52,6 50,9 42,8 42,8 46, 45, 42, 46,6

Table 9. Percentages of correctly classified instances per machine learning algorithm where the sentiment data were ratios, computed by dividing the number of tweets of a specific sentiment by the total number of tweets on that day. The meaning of p, pn and pnn is the same as in Table 8.
                        same day           -1 day             -2 days            -3 days            -4 days
                        p     pn    pnn    p     pn    pnn    p     pn    pnn    p     pn    pnn    p     pn    pnn
Logistic Regression     47,8  47,8  47,8   40,7  46,7  44,7   53,3  56,6  59,9   57,2  53,3  49,3   48,0  5,3   43,4
Support Vector Machine  47,4  47,4  50,7   53,3  42,8  40,8   49,3  5,3   49,3   57,2  57,2  57,2   52,6  53,3  42,
Random Forest           47,4  48,8  49,3   45,4  55,9  44,    5,3   52,0  50,7   49,3  50,7  48,0   44,   38,8  40,8
Neural Network          55,0  56,0  5,2    53,3  54,6  54,6   57,9  57,9  59,2   44,   48,7  48,0   40,8  39,5  39,5

Table 10. This table contains the results of extending the previous experiments by adding the sentiment (positive, negative and neutral) of one day earlier, or of one and two days earlier, as extra features (S1 and S2 resp.). The stock direction (UP or DOWN) of the previous day and the number of tweets were also added as features in these experiments.
Columns, in order: same day (S2 S1), -1 day (S2 S1), -2 days (S2 S1)
Logistic Regression      63,2 48, 57,9 6,4 62,3 59,6
Support Vector Machine   62,4 48, 59, ,3 53,5
Random Forest            54,9 42, 50 50,9 54,4 54,4
Neural Network           49,6 38,4 54,4 45,6 5,8 56,
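When reading the accuracies in the tables above, it can help to check how likely a given score would be for a classifier that guesses UP or DOWN with probability 50%. The sketch below computes the exact upper tail of a binomial distribution in Python. It assumes a test set of 171 instances (627 instances in total minus the 456 training instances) and 112 correct predictions, which corresponds to roughly 65,5%; both numbers are assumptions made for this illustration, and the function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_TEST = 171      # assumption: 627 instances in total minus 456 training instances
K_CORRECT = 112   # assumption: 112 / 171 is roughly 65,5% correct

tail = binomial_upper_tail(N_TEST, K_CORRECT)
print(f"{K_CORRECT}/{N_TEST} = {K_CORRECT / N_TEST:.3f}, tail probability = {tail:.2e}")
```

Because many combinations of algorithms and features were tried, a single low tail probability should be read with caution: the more combinations are tested, the more likely it is that one of them scores well by chance.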

Table 8 contains the results where the sentiment data was normalized. Table 9 contains the results where the ratios of the sentiment were used, obtained by dividing the number of tweets of a sentiment by the number of all tweets on that day. Table 10 contains the results of the experiments in which the number of tweets and the stock direction of the previous day were added as extra features. These experiments were split into a case where the sentiment of the previous day was added as features (results in column S1) and a case where the sentiment of the previous day and of the day before that were added as features (results in column S2). For each algorithm the combination of data that resulted in the highest accuracy was highlighted.

Considering that a random algorithm should have 50%² accuracy in predicting the classes UP or DOWN, the results are not much better than that. Quite a few results are worse than random and most of them are around 50% accuracy. However, when predicting the classes with the Logistic Regression algorithm using only the positive sentiment of two days earlier, 65,5% of the test instances were predicted correctly, which is clearly higher than all of the other results.

6. CONCLUSIONS AND FUTURE WORK
Looking at the results, research question 1 has a clear answer: the most effective way to classify tweets is to use a classifier that was trained with similar data, tweets in this case. A Naive Bayes classifier proved to be most effective. However, when the confusion matrix in Table 6 is taken into account, the results were not that great either. Future research should use a classifier that performs better at distinguishing positive or negative tweets from neutral tweets.

Research question 2 asks whether a specific combination of features gives better results for the machine learning algorithms.
From the results presented in this paper the conclusion can be drawn that adding the negative and neutral sentiment on top of the positive sentiment only provides better results for some machine learning algorithms. Adding even more features did seem to improve the overall accuracies, as can be seen in Table 10.

Research question 3 can be answered with a maybe: there are some results that seem significantly higher than 50%. However, it is not certain whether this was due to luck, as the random walk hypothesis [14] suggests, or whether there really is some predictive power in the results. Future research should identify which of these explanations is most likely.

With all these research questions combined, the answer to the main research question is that one cannot simply predict the stock price direction based on sentiment analysis of a large amount of social media posts. There were a few cases that yielded favorable results, but because these were not verified further, a clear conclusion cannot be drawn yet. Future research should gather more data on more companies over a longer period and analyze it with a more reliable sentiment analysis technique. If that leads to similar or better results, better conclusions can be drawn.

Another interesting approach would be to focus on a single company. Figures 2 and 3 in Section 4.2.1 show more structure than Figure 4. It could be that repeating the experiments described in this research leads to better results when only a single company is taken into account.

² Not exact, as can be seen in Table 4, but we assume 50% for the sake of simplicity.

7. REFERENCES
[1] S. Asur and B. A. Huberman. Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, volume 1, pages 492-499. IEEE, 2010.
[2] J. Bollen, H. Mao, and X. Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1-8, 2011.
[3] Github. Sentiment analysis in python. Accessed:
[4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The weka data mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10-18, 2009.
[5] M. Makrehchi, S. Shah, and W. Liao. Stock prediction using event-based sentiment analysis. In Proceedings IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013, volume 1, 2013.
[6] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55-60, 2014.
[7] J. R. Nofsinger. Social mood and financial economics. The Journal of Behavioral Finance, 6(3):144-160, 2005.
[8] A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In LREC, volume 10, pages 1320-1326, 2010.
[9] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1-2):1-135, 2008.
[10] Sananalytics. Twitter sentiment corpus. Accessed:
[11] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP), volume 1631, page 1642. Citeseer, 2013.
[12] Theonlineinvestor. 20 largest u.s. companies by market capitalization. Accessed:
[13] Wikipedia. Efficient-market hypothesis. Efficient-market_hypothesis. Accessed:
[14] Wikipedia. Random walk hypothesis. wikipedia.org/wiki/random_walk_hypothesis. Accessed:
[15] Yahoo. Yahoo finance. Accessed:
[16] Y. Yu, W. Duan, and Q. Cao. The impact of social and conventional media on firm equity value: A sentiment analysis approach. Decision Support Systems, 55(4):919-926, 2013.

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

Keywords social media, internet, data, sentiment analysis, opinion mining, business

Keywords social media, internet, data, sentiment analysis, opinion mining, business Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real time Extraction

More information

Package syuzhet. February 22, 2015

Package syuzhet. February 22, 2015 Type Package Package syuzhet February 22, 2015 Title Extracts Sentiment and Sentiment-Derived Plot Arcs from Text Version 0.2.0 Date 2015-01-20 Maintainer Matthew Jockers Extracts

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Forecasting stock markets with Twitter

Forecasting stock markets with Twitter Forecasting stock markets with Twitter Argimiro Arratia argimiro@lsi.upc.edu Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013,

More information

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction

More information

Using Tweets to Predict the Stock Market

Using Tweets to Predict the Stock Market 1. Abstract Using Tweets to Predict the Stock Market Zhiang Hu, Jian Jiao, Jialu Zhu In this project we would like to find the relationship between tweets of one important Twitter user and the corresponding

More information

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015 Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

More information

An Introduction to Data Mining

Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch

Spring 2013 Introduction Investors are always looking to gain an edge on the rest of the market.

Financial Trading System using Combination of Textual and Numerical Data

Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

Sentiment analysis: towards a tool for analysing real-time students feedback

Nabeela Altrabsheh Email: nabeela.altrabsheh@port.ac.uk Mihaela Cocea Email: mihaela.cocea@port.ac.uk Sanaz Fallahkhair Email:

Predicting sports events from past results

Towards effective betting on football Douwe Buursma University of Twente P.O. Box 217, 7500AE Enschede The Netherlands d.l.buursma@student.utwente.nl ABSTRACT

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

Data Mining Yelp Data - Predicting rating stars from review text

Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

Text Opinion Mining to Analyze News for Stock Market Prediction

Int. J. Advance. Soft Comput. Appl., Vol. 6, No. 1, March 2014 ISSN 2074-8523; Copyright SCRG Publication, 2014 Yoosin Kim 1, Seung Ryul

Sentiment Analysis of Movie Reviews and Twitter Statuses. Introduction

Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general, or about

Twitter Stock Bot. John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu

John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu Hassaan Markhiani The University of Texas at Austin hassaan@cs.utexas.edu Abstract The stock market is influenced

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement

Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market

A Look Into the World of Reddit with Neural Networks

Jason Ting Institute of Computational and Mathematical Engineering Stanford University Stanford, CA 9435 jmting@stanford.edu Abstract Creating, placing,

Using Social Media for Continuous Monitoring and Mining of Consumer Behaviour

Michail Salampasis 1, Giorgos Paltoglou 2, Anastasia Giahanou 1 1 Department of Informatics, Alexander Technological Educational

Analysis of Tweets for Prediction of Indian Stock Markets

Phillip Tichaona Sumbureru Department of Computer Science and Engineering, JNTU College of Engineering Hyderabad, Kukatpally, Hyderabad-500 085,

Stock Prediction Using Twitter Sentiment Analysis

Anshul Mittal Stanford University anmittal@stanford.edu Arpit Goel Stanford University argoel@stanford.edu ABSTRACT In this paper, we apply sentiment analysis

Robust Sentiment Detection on Twitter from Biased and Noisy Data

Luciano Barbosa AT&T Labs - Research lbarbosa@research.att.com Junlan Feng AT&T Labs - Research junlan@research.att.com Abstract In this

Can Twitter provide enough information for predicting the stock market?

Maria Dolores Priego Porcuna Introduction Nowadays a huge percentage of financial companies are investing a lot of money on Social

Tweets Miner for Stock Market Analysis

Bohdan Pavlyshenko Electronics department, Ivan Franko Lviv National University, Ukraine, Drahomanov Str. 50, Lviv, 79005, Ukraine, e-mail: b.pavlyshenko@gmail.com

Applying Machine Learning to Stock Market Trading Bryce Taylor

Abstract: In an effort to emulate human investors who read publicly available materials in order to make decisions about their investments,

Sentiment Analysis of Twitter Data within Big Data Distributed Environment for Stock Prediction

Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 1349-1354, DOI: 10.15439/2015F230, ACSIS, Vol. 5

Using Twitter as a source of information for stock market prediction

Ramon Xuriguera (rxuriguera@lsi.upc.edu) Joint work with Marta Arias and Argimiro Arratia ERCIM 2011, 17-19 Dec. 2011, University of

Sentiment analysis using emoticons

Royden Lewis, Kayhan Moharreri, Steven Ware Department of Computer Science, Ohio State University Problem definition Our aim was

Crowdfunding Support Tools: Predicting Success & Failure

Michael D. Greenberg Bryan Pardo mdgreenb@u.northwestern.edu pardo@northwestern.edu Karthic Hariharan karthichariharan2012@u.northwestern.edu Elizabeth

Pentaho Data Mining Last Modified on January 22, 2007

Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

Predict the Popularity of YouTube Videos Using Early View Data

Machine Learning Techniques for Stock Prediction. Vatsal H. Shah

1. Introduction 1.1 An informal Introduction to Stock Market Prediction Recently, a lot of interesting work has been done in the area of

WILL TWITTER MAKE YOU A BETTER INVESTOR? A LOOK AT SENTIMENT, USER REPUTATION AND THEIR EFFECT ON THE STOCK MARKET

Eric D. Brown Dakota State University edbrown@dsu.edu ABSTRACT The use of social networks

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

Random Forest Based Imbalanced Data Cleaning and Classification

Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem

Twitter sentiment vs. Stock price!

Background: On April 24th 2013, the Twitter account belonging to Associated Press was hacked. Fake posts about the White House being bombed and the President being injured

Predicting IMDB Movie Ratings Using Social Media

Andrei Oghina, Mathias Breuss, Manos Tsagkias, and Maarten de Rijke ISLA, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

Equity forecast: Predicting long term stock price movement using machine learning

Nikola Milosevic School of Computer Science, University of Manchester, UK Nikola.milosevic@manchester.ac.uk Abstract Long

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore

End-to-End Sentiment Analysis of Twitter Data

Apoorv Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India apoorv@cs.columbia.edu,

University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task

Graham McDonald, Romain Deveaud, Richard McCreadie, Timothy Gollins, Craig Macdonald and Iadh Ounis School

Chapter 6. The stacking ensemble approach

This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

Comparing Methods to Identify Defect Reports in a Change Management Database

Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques

Chris MacLellan cjmaclel@asu.edu May 3, 2012 Abstract Different methods for aggregating twitter sentiment data are proposed and three

Semantic Sentiment Analysis of Twitter

Hassan Saif, Yulan He & Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom The 11th International Semantic Web Conference

Author Gender Identification of English Novels

Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in

Data quality in Accounting Information Systems

Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

SENTIMENT ANALYZER. Manual. Tel & Fax: +39 0984 494277 E-mail: info@altiliagroup.com Web: www.altilagroup.com

Operating office: Piazza Vermicelli 87036 Rende (CS), Italy. TABLE OF CONTENTS: 1 APP documentation, 1.1 HOW IT WORKS, 1.2 Input data, 1.3 Output data...

IMPACT OF SOCIAL MEDIA ON THE STOCK MARKET: EVIDENCE FROM TWEETS

Vojtěch Fiala 1, Svatopluk Kapounek 1, Ondřej Veselý 1 1 Mendel University in Brno Volume 1 Issue 1 ISSN 2336-6494 www.ejobsat.com ABSTRACT

Data Mining Algorithms Part 1. Dejan Sarka

Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

Experiments in Web Page Classification for Semantic Web

Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

A Distributed Sentiment Analysis Development Environment

Christopher Burdorf NBCUniversal 5750 Wilshire Blvd Los Angeles, CA, USA Christopher.Burdorf@nbcuni.com Abstract This document describes a work-in-progress

Sentiment Analysis Tool using Machine Learning Algorithms

I.Hemalatha 1, Dr. G. P Saradhi Varma 2, Dr. A.Govardhan 3 1 Research Scholar JNT University Kakinada, Kakinada, A.P., INDIA 2 Professor & Head,

Active Learning SVM for Blogs recommendation

Xin Guan Computer Science, George Mason University I. Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

Neural Networks for Sentiment Detection in Financial Text

Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.

Predicting the Stock Market with News Articles

Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

Maximize Revenues on your Customer Loyalty Program using Predictive Analytics

27th Feb 14 Free Webinar by Before we begin... www Q & A? Your Speakers @parikh_shachi Technical Analyst @tatvic Loves js

Sentiment Analysis and Topic Classification: Case study over Spanish tweets

Fernando Batista, Ricardo Ribeiro Laboratório de Sistemas de Língua Falada, INESC-ID Lisboa R. Alves Redol, 9, 1000-029 Lisboa,

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

International Journal of Emerging Research in Management & Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence. Dr. Sulkhan Metreveli Leo Keller

The greed https://www.youtube.com/watch?v=r8y6djaeolo The money https://www.youtube.com/watch?v=x_6oogojnaw

Sentiment Analysis on Big Data

SPAN White Paper. Machine Learning Approach. Several sources on the web provide deep insight about people's opinions on the products and services of various companies. Social

Music Mood Classification

CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS

Huina Mao School of Informatics and Computing Indiana University, Bloomington, USA ECB Workshop on Using Big Data for Forecasting

Sentiment Analysis on Twitter with Stock Price and Significant Keyword Correlation. Abstract

Linhao Zhang Department of Computer Science, The University of Texas at Austin (Dated: April 16, 2013) Abstract Though

A Logistic Regression Approach to Ad Click Prediction

Gouthami Kondakindi kondakin@usc.edu Satakshi Rana satakshr@usc.edu Aswin Rajkumar aswinraj@usc.edu Sai Kaushik Ponnekanti ponnekan@usc.edu Vinit Parakh

Sentiment Analysis of Investor Opinions on Twitter

Social Networking, 2015, 4, 62-71 Published Online July 2015 in SciRes. http://www.scirp.org/journal/sn http://dx.doi.org/10.4236/sn.2015.43008 Brian

Email Spam Detection A Machine Learning Approach

Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

Getting Even More Out of Ensemble Selection

Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

III. DATA SETS. Training the Matching Model

A Machine-Learning Approach to Discovering Company Home Pages Wojciech Gryc Oxford Internet Institute University of Oxford Oxford, UK OX1 3JS Email: wojciech.gryc@oii.ox.ac.uk Prem Melville IBM T.J. Watson

Exploiting Social Media Data for Traffic Monitoring Using the Techniques of Data Mining

Shaikh Kamran, Musaib Shaikh, Alefiya Naseem, Priyanka Kamble B. E Student, Dept. of Computer Engineering, Trinity

Classification of Learners Using Linear Regression

Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 717-721, ISBN 978-83-60810-22-4 Marian Cristian Mihăescu Software

Sentiment Analysis for Movie Reviews

Ankit Goyal, a3goyal@ucsd.edu Amey Parulekar, aparulek@ucsd.edu Introduction: Movie reviews are an important way to gauge the performance of a movie. While providing

Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources

Investigating Automated Sentiment Analysis of Feedback Tags in a Programming Course Stephen Cummins, Liz Burd, Andrew

Twitter Volume Spikes: Analysis and Application in Stock Trading

Yuexin Mao University of Connecticut yuexin.mao@uconn.edu Wei Wei FinStats.com weiwei@finstats.com Bing Wang University of Connecticut bing@engr.uconn.edu

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5

8. Machine Learning Applied Artificial Intelligence

Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

SI485i : NLP. Set 6 Sentiment and Opinions

It's about finding out what people think... Can be big business Someone who wants to buy a camera Looks for reviews online Someone who just bought a camera Writes

The Use of Twitter Activity as a Stock Market Predictor

National College of Ireland Higher Diploma in Science in Data Analytics 2013/2014 Robert Coyle X13109278 robert.coyle@student.ncirl.ie Table of Contents

Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques.

Akshay Amolik, Niketan Jivane, Mahavir Bhandari, Dr.M.Venkatesan School of Computer Science and Engineering, VIT University,

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia SNJEŽANA MILINKOVIĆ University

Automatic measurement of Social Media Use

Iwan Timmer University of Twente P.O. Box 217, 7500AE Enschede The Netherlands i.r.timmer@student.utwente.nl ABSTRACT Today Social Media is not only used for personal

Microblog Sentiment Analysis with Emoticon Space Model

Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory

A Decision Support Approach based on Sentiment Analysis Combined with Data Mining for Customer Satisfaction Research

Nafissa Yussupova, Maxim Boyko, and Diana Bogdanova Faculty of informatics and robotics

Exploring the use of Big Data techniques for simulating Algorithmic Trading Strategies

Nishith Tirpankar, Jiten Thakkar tirpankar.n@gmail.com, jitenmt@gmail.com December 20, 2015 Abstract In the world

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

Approaches for Sentiment Analysis on Twitter: A State-of-Art study

Harsh Thakkar and Dhiren Patel Department of Computer Engineering, National Institute of Technology, Surat-395007, India {harsh9t,dhiren29p}@gmail.com

Using News Articles to Predict Stock Price Movements

Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,

The process of gathering and analyzing Twitter data to predict stock returns EC115. Economics

Purpose Many Americans save for retirement through plans such as 401ks and IRAs and these retirement plans

Rabobank: Incident and change process analysis

Michael Arias 1, Mauricio Arriagada 1, Eric Rojas 1, Cecilia Sant-Pierre 1, Marcos Sepúlveda 1 1 Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna

Educational Social Network Group Profiling: An Analysis of Differentiation-Based Methods

João Emanoel Ambrósio Gomes 1, Ricardo Bastos Cavalcante Prudêncio 1 1 Centro de Informática Universidade Federal

Introducing diversity among the models of multi-label classification ensemble

Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

Initial Report. Predicting association football match outcomes using social media and existing knowledge.

Student Number: C1148334 Author: Kiran Smith Supervisor: Dr. Steven Schockaert Module Title: One

Concurrent Validity and Consistency of Social Media Sentiment Analysis Tools

Joost Martijn van Aggelen ABSTRACT Users of social media increasingly express their opinions and as a result, a large number