Twitter sentiment vs. Stock price

2 Background
On April 23rd 2013, the Twitter account belonging to the Associated Press was hacked. Fake posts about the White House being bombed and the President being injured were published. This led to a 1% loss on the Dow Jones. On May 6th 2010, a poorly written algorithm triggered a selling spree that caused a 9.2% drop of the Dow Jones. Using text mining as part of trading algorithms is common, and more incidents similar to these have happened (e.g. fake news about American Airlines going bankrupt once made the stock price fall quickly).

3 Aim
Inspired by this, I wanted to look into the following: is it possible to collect posts from Twitter (known as tweets) that mention a specified stock ticker (Apple Inc. uses AAPL), calculate a sentiment score for these tweets, and find a visual relationship between this score and the stock's current price? By a visual relationship I mean plotting the score and the price side by side and being able to see a relationship between them by eye. More on this later.

4 Method - High level perspective
The general idea is to get all tweets for a specific hour, calculate the average sentiment score of these tweets, and plot it next to the closing price of the stock for that hour. But what is a sentiment score? Two options were considered:
1. Find (or create) a corpus of tweets that are classified as positive or negative, create features and use them in a naïve Bayes classifier (using the probability distribution rather than the label as the score).
2. Use a lexicon of sentiment-tagged words (e.g. bad could be negative and super could be positive). For each tweet, count the number of positive and negative words and create a score from these counts.
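To make "use the distribution rather than the label" concrete, here is a minimal sketch of a naïve Bayes classifier that returns P(positive | tweet) as the score. It is written from scratch with add-one smoothing, which is an assumption: the slides do not say which implementation or smoothing was used. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(labelled_tweets):
    """labelled_tweets: list of (tokens, label) pairs, label is 'pos' or 'neg'.
    Assumes both labels occur at least once in the training data."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for tokens, label in labelled_tweets:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def positive_probability(tokens, model):
    """Return P(pos | tokens); tokens outside the vocabulary are ignored."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_post = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokens:
            if tok in vocab:
                # Add-one (Laplace) smoothing, an assumption of this sketch
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        log_post[label] = score
    # Normalise in log space to avoid underflow on long inputs
    top = max(log_post.values())
    weights = {label: math.exp(s - top) for label, s in log_post.items()}
    return weights["pos"] / (weights["pos"] + weights["neg"])

model = train_naive_bayes([(["good", "buy"], "pos"), (["bad", "sell"], "neg")])
print(positive_probability(["good", "stock"], model))
```

The returned probability lies between 0 and 1, so it can be averaged per hour in the same way as the lexicon-based scores.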

5 Approach 1
The first approach built upon what we have seen in the labs: creating features and using a naïve Bayes classifier. I found a corpus of tweets that were labelled as positive or negative. From these I created unigram, bigram and trigram features. Furthermore, I created a TF-IDF index over these tweets and used it as a feature. I also partially used the second approach (a lexicon of sentiment words, more on this later).

6 Approach 1
However, after a few days of trying to coerce my code into running in reasonable time, I failed. Since each run took very long, I decided to save the tokenized and cleaned tweets, along with their features (and the TF-IDF index), to disk. When I tried to serialize the class structure I had created, the pickle module included in Python used more than 5 GB of RAM to check for cycles in the objects being saved, and it blew up every time with a MemoryError. So I had a choice between fixing this (and saving to an SQL database rather than to a flat file) or finding another approach. I decided to use another approach.

7 Approach 2
I found three lexicons, each consisting of words with a positive or negative label attached. One of them also included the part of speech (POS) of each word.
Example of the first lexicon (8221 words):
word1=agony pos1=noun priorpolarity=negative
word1=agree pos1=verb priorpolarity=positive
The second lexicon (3642 words) consisted of two files, one with negative words (worst, wreck, ...) and one with positive words (shield, shiny, ...).
The third lexicon (6787 words) consisted of two files, one with positive words (fine, flashy, ...) and one with negative words (spooky, sporadic, ...).
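Entries in the first lexicon are whitespace-separated key=value pairs, so they can be read into a dictionary with a few lines. This is only a sketch that assumes every line follows the pattern of the two examples shown; the function name is hypothetical.

```python
def parse_lexicon_line(line):
    """Parse e.g. 'word1=agony pos1=noun priorpolarity=negative' into a dict."""
    fields = {}
    for part in line.split():
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that are not key=value
            fields[key] = value
    return fields

entry = parse_lexicon_line("word1=agony pos1=noun priorpolarity=negative")
print(entry["word1"], entry["pos1"], entry["priorpolarity"])
```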

8 Approach 2
These lexicons were parsed and merged into one large lexicon (duplicates were allowed and not removed). I then downloaded 7945 tweets that contained the word AAPL (the stock ticker for Apple Inc.). Each tweet was processed as follows: lowercased the text, removed all URLs and other URL-like structures, removed all usernames, collapsed runs of whitespace into a single space, replaced #word with word, reduced repeated letters to only two (e.g. yeeeeeehaaaaa became yeehaa), removed all words that did not start with a letter (e.g. 3am was removed), and stripped punctuation (! ? . ,).
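The cleaning steps can be sketched with the standard `re` module. The regular expressions below are assumptions, since the slides do not show the patterns that were actually used, and the function name and sample tweet (including its URL) are made up for illustration.

```python
import re

def clean_tweet(text):
    """Return the cleaned tokens of one tweet."""
    text = text.lower()
    text = re.sub(r"(?:https?://|www\.)\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                     # usernames
    text = re.sub(r"#(\w+)", r"\1", text)                 # #word -> word
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)        # 3+ repeated letters -> 2
    text = re.sub(r"[!?.,]", " ", text)                   # punctuation
    # split() also collapses runs of whitespace; tokens such as 3am
    # that do not start with a letter are dropped
    return [tok for tok in text.split() if tok[0].isalpha()]

print(clean_tweet("Yeeeeeehaaaaa @bob   #AAPL is up at 3am!! http://t.co/xyz"))
```

URLs are removed before punctuation is stripped, otherwise the dots inside a URL would split it into fragments that survive the cleaning.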

9 Approach 2
The next step was to create the actual sentiment score. For each tweet I wanted to look up the tokens in my lexicon to decide whether each token was positive or negative. Since one of my lexicons also contained the POS of each word, every tweet was subjected to POS tagging. Each token of a tweet was sent to the lexicon (along with its POS tag) and a sentiment was returned. I then did a simple count of the positive and negative words.

10 Approach 2
Since multiple lexicons were included in my larger lexicon, I needed a way of deciding which lexicon to trust for a given word (there was some overlap between the lexicons). The following algorithm was created to resolve conflicts:
1. If only one lexicon contains the word, that lexicon wins.
2. If the token and POS match the first lexicon, that lexicon wins.
3. If all lexicons agree on the sentiment, all win.
4. If the lexicons disagree, count votes (i.e. if one lexicon says positive and the other two say negative, negative wins).
5. If it is still a tie, return neutral.
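The conflict-resolution rules can be sketched as a POS lookup followed by a majority vote. This is a simplification under stated assumptions: the POS-tagged lexicon is modelled as a dict keyed by (token, POS) that only takes part when the POS matches, the other lexicons are plain token-to-polarity dicts, and a word found in no lexicon is treated as neutral. The function and parameter names are hypothetical.

```python
from collections import Counter

def resolve_sentiment(token, pos_tag, pos_lexicon, plain_lexicons):
    """Return 'positive', 'negative' or 'neutral' for one token."""
    # Rule 2 (and rule 1 when only the POS lexicon knows the word)
    hit = pos_lexicon.get((token, pos_tag))
    if hit is not None:
        return hit
    # Rules 1, 3 and 4: one vote per remaining lexicon that contains the token
    votes = Counter(lex[token] for lex in plain_lexicons if token in lex)
    if not votes:
        return "neutral"  # assumption: unknown words carry no sentiment
    ranked = votes.most_common()
    # Rule 5: still a tie after counting
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]

pos_lexicon = {("agony", "noun"): "negative"}
plain_lexicons = [{"fine": "positive", "odd": "negative"},
                  {"fine": "positive", "odd": "positive"}]
print(resolve_sentiment("odd", "adj", pos_lexicon, plain_lexicons))
```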

11 Approach 2
So for each tweet there now exists a positive count (p), a negative count (n), and the total number of tokens (N). The following two scores were then associated with each tweet:
Sentiment diff: p - n
Positive score: p / N
But I was not satisfied with this, because I felt that some words must be more negative than others, and some words must be more positive than others.
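As a small worked example of the two count-based scores, the sketch below takes the per-token sentiment labels of one tweet. Returning 0.0 for an empty tweet is an assumption made here to avoid a division by zero; the function name is hypothetical.

```python
def count_scores(sentiments):
    """sentiments: one label per token ('positive', 'negative' or 'neutral')."""
    N = len(sentiments)
    p = sentiments.count("positive")
    n = sentiments.count("negative")
    return {"diff": p - n, "positive": p / N if N else 0.0}

# Four tokens, two positive and one negative
print(count_scores(["positive", "neutral", "negative", "positive"]))
```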

12 Approach 2
The idea was then to create a TF-IDF index using the unique tokens in the lexicon and 2000 of the downloaded AAPL tweets. This TF-IDF index was created (and since it was of reasonable size, it could be serialized to disk). The issue then arose that it was only really useful for the 2000 tweets used to create it: incoming tweets to be processed did not belong to the index.

13 Approach 2
So, since ignorance is bliss, I invented the average TF-IDF weight: I calculated the average TF-IDF for each token in the index, saved this value and threw away all the other values in the index, creating a very compact index of average TF-IDF values. For any token (regardless of which tweet it came from) I could then get an average weight. For example, good and awesome would each have their own average weight.
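A minimal sketch of the average TF-IDF index follows. The slides do not state which TF-IDF variant was used or over which tweets the average was taken, so this assumes tf = count / tweet length, idf = log(number of tweets / tweets containing the token), and an average over only the tweets that contain the token. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict

def average_tfidf(docs):
    """docs: list of token lists. Returns {token: average TF-IDF weight}."""
    n_docs = len(docs)
    # Document frequency: in how many tweets each token occurs
    df = Counter(tok for doc in docs for tok in set(doc))
    totals = defaultdict(float)
    for doc in docs:
        for tok, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[tok])
            totals[tok] += tf * idf
    # Keep one number per token and discard the per-tweet values
    return {tok: totals[tok] / df[tok] for tok in totals}

weights = average_tfidf([["good", "stock"], ["bad", "stock"], ["good", "good", "day"]])
print(weights["good"], weights["bad"])
```

Under these assumptions a token that occurs in every tweet gets weight 0, which is one reason the choice of variant matters for the weighted scores.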

14 Approach 2
Armed with the average TF-IDF index, I continued my sentiment scoring. Instead of counting the positive and negative words, I looked them up in the average TF-IDF index and summed the weights. A weighted positive count (wp) and a weighted negative count (wn) gave the following scores:
Weighted sentiment diff: wp - wn
Weighted positive score: wp / N
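The weighted scores differ from the raw counts only in that each sentiment-bearing token contributes its average TF-IDF weight instead of 1. In the sketch below, tokens missing from the index get weight 0.0, which is an assumption; the function name and the example weights are made up for illustration.

```python
def weighted_scores(tagged_tokens, avg_weights):
    """tagged_tokens: list of (token, sentiment) pairs for one tweet."""
    N = len(tagged_tokens)
    wp = sum(avg_weights.get(tok, 0.0)
             for tok, sentiment in tagged_tokens if sentiment == "positive")
    wn = sum(avg_weights.get(tok, 0.0)
             for tok, sentiment in tagged_tokens if sentiment == "negative")
    return {"weighted_diff": wp - wn,
            "weighted_positive": wp / N if N else 0.0}

avg_weights = {"good": 0.5, "awesome": 1.0, "bad": 0.25}  # illustrative values
tweet = [("good", "positive"), ("awesome", "positive"),
         ("bad", "negative"), ("stock", "neutral")]
print(weighted_scores(tweet, avg_weights))
```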

15 Plotting
The 7945 downloaded tweets were grouped by hour, so all tweets posted between 11:01 AM and 12:00 PM were considered to belong to 12:00 PM. For each group, the sentiment score of each individual tweet was calculated (using all four sentiment scores discussed). The total sentiment score for the group was simply the average score. Hourly closing prices for AAPL were downloaded from Google Finance (this means that the latest price AAPL sold for at 11:00 AM is the closing price for that hour).
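The hourly grouping amounts to rounding each timestamp up to the next full hour and averaging the scores in each bucket. A minimal sketch, assuming each tweet is a (datetime, score) pair and that a tweet posted exactly on the hour belongs to that hour; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hour_bucket(ts):
    """Round a timestamp up to the next full hour (11:01 -> 12:00, 12:00 -> 12:00)."""
    floor = ts.replace(minute=0, second=0, microsecond=0)
    return floor if ts == floor else floor + timedelta(hours=1)

def hourly_average(scored_tweets):
    """scored_tweets: iterable of (datetime, score). Returns {hour: mean score}."""
    buckets = defaultdict(list)
    for ts, score in scored_tweets:
        buckets[hour_bucket(ts)].append(score)
    return {hour: sum(scores) / len(scores)
            for hour, scores in sorted(buckets.items())}

tweets = [(datetime(2013, 5, 21, 11, 1), 1.0),
          (datetime(2013, 5, 21, 11, 30), 2.0),
          (datetime(2013, 5, 21, 12, 0), 3.0),
          (datetime(2013, 5, 21, 12, 1), 5.0)]
for hour, mean in hourly_average(tweets).items():
    print(hour, mean)
```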

16 Plots
Sentiment difference (raw counts)
Figure: Hourly price and sentiment score between the 21st of May and the 27th of May.
At first glance visually useless; however, it is worth noting that the maximum of each oscillation increases.
Note: the flat horizontal lines are created while the stock market is closed.

17 Plots
p / N (raw counts)
Figure: Hourly price and positive score between the 21st of May and the 27th of May.
Difficult to find anything visually appealing about this.

18 Plots
wp / N (weighted sum)
Figure: Hourly price and weighted sum between the 21st of May and the 27th of May.
Just as bad as the positive score without the TF-IDF weighting.

19 Plots
wp - wn (weighted difference)
Chartists are investors who mainly look at charts of price and volume rather than the fundamental data about a company. They look for trends in the charts. One of the classical ways of finding a trend is to find higher lows. The support lines drawn in the charts show that both the price and the sentiment are creating higher lows, indicating that the stock and the sentiment are entering (or are already in) a period of upward trend.
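The support lines in the slides were drawn by eye. As a rough programmatic stand-in, and not the method used for the plots, one could take the strict local minima of a series and check whether each is higher than the one before. The function names and sample series are hypothetical.

```python
def local_lows(series):
    """Values that are strictly lower than both of their neighbours."""
    return [series[i] for i in range(1, len(series) - 1)
            if series[i] < series[i - 1] and series[i] < series[i + 1]]

def has_higher_lows(series):
    """True if there are at least two lows and each is above the previous one."""
    lows = local_lows(series)
    return len(lows) >= 2 and all(a < b for a, b in zip(lows, lows[1:]))

print(has_higher_lows([5, 3, 6, 4, 7, 5, 8]))  # lows are 3, 4, 5
print(has_higher_lows([5, 4, 6, 3, 7]))        # lows are 4, 3
```

Applying the same check to both the hourly price and the weighted difference would show whether the two series agree on the trend, though noisy hourly data would likely need smoothing first.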

20 Results
It is easy to conclude that most results were useless; however, it is interesting to see some similarity in trend (in the chartist's sense of the word) between the price and the weighted diff.
One obvious flaw in the process could be that I averaged the sentiment score of each hour. If the score had been kept raw, hours where there were a lot of positive tweets would possibly outweigh other hours more clearly, and possibly remove some of the oscillation.
When comparing the sentiment scores against the already labelled tweets from approach 1, the accuracy of the scores was low (it would have been almost as good to guess the sentiment at random).
An attractive feature of the sentiment scoring approach is that it does not need a labelled corpus (the lexicons can be reused).

21 Method Average TF-IDF

1.6 Task 1(c)
Using the selected processors (Table 4), the naive Bayes classifier was run again, this time with some added feature generators. We included the 1000 most frequent bigrams (creating has bigram(word1, word2) features for each document). A feature was added that tells the classifier whether the average document word length is greater than, less than or equal to the corpus average word length. Furthermore, a 10-bin feature with cutoffs based on the lexical diversity of the document was created.

Table 5: Results from Task 1(c), without average TF-IDF features. Columns: Processors, Method, Features, Accuracy, Pre(P), Rec(P), F-M(P), Pre(N), Rec(N), F-M(N). Rows combine the processors LowerProcessor, PunctuationProcessor, StopWordProcessor, StemmingProcessor and LemmatizerProcessor with the feature generators HWFG, BFG, LDG and AWLG. (Table values not recoverable.)

1.6.3 Conclusions
Adding the feature generators does change the column values more than trying different combinations of processors. However, there is no difference between the choice of processors ...

1.7 Task 1(d)
The idea was to include tf-idf as a binary feature. This has been done by calculating the average tf-idf weight for each term in the entire corpus, and then setting the feature [tfidf(word) > Avg] or [tfidf(word) ≤ Avg] for each frequent term in each document. As before, only the 1000 most frequent terms have been used. The TFIDF feature generator was added to the generators in Sec 1.6, using only a selection of the processors.

Table 6: Results from Task 1(d), with average TF-IDF features. Columns: Processors, Features, Accuracy, Pre(P), Rec(P), F-M(P), Pre(N), Rec(N), F-M(N). Rows use the same processors with the feature generators HWFG, BFG, LDG, AWLG and TFIDF. (Table values not recoverable.)
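The binary tf-idf feature can be sketched as a dictionary of boolean features, one per frequent term that occurs in the document. The feature-name format, the function name, and the choice to skip terms absent from the document are assumptions of this sketch, not details given in the report excerpt.

```python
def tfidf_binary_features(doc_tfidf, corpus_avg, frequent_terms):
    """doc_tfidf: {term: tf-idf in this document};
    corpus_avg: {term: average tf-idf over the corpus}.
    Returns {feature name: True if above the corpus average, else False}."""
    features = {}
    for term in frequent_terms:
        if term in doc_tfidf:
            features["tfidf(%s) > Avg" % term] = doc_tfidf[term] > corpus_avg[term]
    return features

doc = {"stock": 0.9, "good": 0.1}                      # illustrative values
avg = {"stock": 0.4, "good": 0.3, "apple": 0.2}
print(tfidf_binary_features(doc, avg, ["stock", "good", "apple"]))
```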

22 ... give a different result. It would be interesting to expand this lexicon further to include more words, and also to try it on text that is not as random as tweets are. The average TF-IDF index used here is not very large (it only used 1000 tweets); expanding it could possibly also increase its usefulness.
Sources
Lexicons:
MPQA Subjectivity lexicon: lexicons/
tm.plugin.tags: an R package that contains positive and negative words
Opinion Mining, Sentiment Analysis and Opinion Spam Detection: sentiment-analysis.html

23

News Sentiment Analysis Using R to Predict Stock Market Trends

News Sentiment Analysis Using R to Predict Stock Market Trends News Sentiment Analysis Using R to Predict Stock Market Trends Anurag Nagar and Michael Hahsler Computer Science Southern Methodist University Dallas, TX Topics Motivation Gathering News Creating News

More information

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams 2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

Twitter Stock Bot. John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu

Twitter Stock Bot. John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu Twitter Stock Bot John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu Hassaan Markhiani The University of Texas at Austin hassaan@cs.utexas.edu Abstract The stock market is influenced

More information

Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies

Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies Sentiment analysis of Twitter microblogging posts Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies Introduction Popularity of microblogging services Twitter microblogging posts

More information

Social Market Analytics, Inc.

Social Market Analytics, Inc. S-Factors : Definition, Use, and Significance Social Market Analytics, Inc. Harness the Power of Social Media Intelligence January 2014 P a g e 2 Introduction Social Market Analytics, Inc., (SMA) produces

More information

Sentiment Analysis on Hadoop with Hadoop Streaming

Sentiment Analysis on Hadoop with Hadoop Streaming Sentiment Analysis on Hadoop with Hadoop Streaming Piyush Gupta Research Scholar Pardeep Kumar Assistant Professor Girdhar Gopal Assistant Professor ABSTRACT Ideas and opinions of peoples are influenced

More information

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5

More information

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction

More information

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015 Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

More information

Tweets Miner for Stock Market Analysis

Tweets Miner for Stock Market Analysis Tweets Miner for Stock Market Analysis Bohdan Pavlyshenko Electronics department, Ivan Franko Lviv National University,Ukraine, Drahomanov Str. 50, Lviv, 79005, Ukraine, e-mail: b.pavlyshenko@gmail.com

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques Chris MacLellan cjmaclel@asu.edu May 3, 2012 Abstract Different methods for aggregating twitter sentiment data are proposed and three

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

A Comparative Study on Sentiment Classification and Ranking on Product Reviews

A Comparative Study on Sentiment Classification and Ranking on Product Reviews A Comparative Study on Sentiment Classification and Ranking on Product Reviews C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan

More information

Analysis of Tweets for Prediction of Indian Stock Markets

Analysis of Tweets for Prediction of Indian Stock Markets Analysis of Tweets for Prediction of Indian Stock Markets Phillip Tichaona Sumbureru Department of Computer Science and Engineering, JNTU College of Engineering Hyderabad, Kukatpally, Hyderabad-500 085,

More information

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market

More information

Applying Machine Learning to Stock Market Trading Bryce Taylor

Applying Machine Learning to Stock Market Trading Bryce Taylor Applying Machine Learning to Stock Market Trading Bryce Taylor Abstract: In an effort to emulate human investors who read publicly available materials in order to make decisions about their investments,

More information

SOPS: Stock Prediction using Web Sentiment

SOPS: Stock Prediction using Web Sentiment SOPS: Stock Prediction using Web Sentiment Vivek Sehgal and Charles Song Department of Computer Science University of Maryland College Park, Maryland, USA {viveks, csfalcon}@cs.umd.edu Abstract Recently,

More information

Can Twitter provide enough information for predicting the stock market?

Can Twitter provide enough information for predicting the stock market? Can Twitter provide enough information for predicting the stock market? Maria Dolores Priego Porcuna Introduction Nowadays a huge percentage of financial companies are investing a lot of money on Social

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

Using News Articles to Predict Stock Price Movements

Using News Articles to Predict Stock Price Movements Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,

More information

Text Opinion Mining to Analyze News for Stock Market Prediction

Text Opinion Mining to Analyze News for Stock Market Prediction Int. J. Advance. Soft Comput. Appl., Vol. 6, No. 1, March 2014 ISSN 2074-8523; Copyright SCRG Publication, 2014 Text Opinion Mining to Analyze News for Stock Market Prediction Yoosin Kim 1, Seung Ryul

More information

Prediction of Stock Market Shift using Sentiment Analysis of Twitter Feeds, Clustering and Ranking

Prediction of Stock Market Shift using Sentiment Analysis of Twitter Feeds, Clustering and Ranking 382 Prediction of Stock Market Shift using Sentiment Analysis of Twitter Feeds, Clustering and Ranking 1 Tejas Sathe, 2 Siddhartha Gupta, 3 Shreya Nair, 4 Sukhada Bhingarkar 1,2,3,4 Dept. of Computer Engineering

More information

Semantic Sentiment Analysis of Twitter

Semantic Sentiment Analysis of Twitter Semantic Sentiment Analysis of Twitter Hassan Saif, Yulan He & Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom The 11 th International Semantic Web Conference

More information

The Bayesian Spam Filter Project

The Bayesian Spam Filter Project The Bayesian Spam Filter Project March 24, 2004 1 Testing Methodology The following section describes the testing methodology used for the Spam- BGon suite of products: In setting up the tests for each

More information

Keywords social media, internet, data, sentiment analysis, opinion mining, business

Keywords social media, internet, data, sentiment analysis, opinion mining, business Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real time Extraction

More information

A CRF-based approach to find stock price correlation with company-related Twitter sentiment

A CRF-based approach to find stock price correlation with company-related Twitter sentiment POLITECNICO DI MILANO Scuola di Ingegneria dell Informazione POLO TERRITORIALE DI COMO Master of Science in Computer Engineering A CRF-based approach to find stock price correlation with company-related

More information

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

More information

Can Twitter Predict Royal Baby's Name?

Can Twitter Predict Royal Baby's Name? Summary Can Twitter Predict Royal Baby's Name? Bohdan Pavlyshenko Ivan Franko Lviv National University,Ukraine, b.pavlyshenko@gmail.com In this paper, we analyze the existence of possible correlation between

More information

Forecasting stock markets with Twitter

Forecasting stock markets with Twitter Forecasting stock markets with Twitter Argimiro Arratia argimiro@lsi.upc.edu Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013,

More information

EXPLOITING TWITTER IN MARKET RESEARCH FOR UNIVERSITY DEGREE COURSES

EXPLOITING TWITTER IN MARKET RESEARCH FOR UNIVERSITY DEGREE COURSES EXPLOITING TWITTER IN MARKET RESEARCH FOR UNIVERSITY DEGREE COURSES Zhenar Shaho Faeq 1,Kayhan Ghafoor 2, Bawar Abdalla 3 and Omar Al-rassam 4 1 Department of Software Engineering, Koya University, Koya,

More information

Robust Sentiment Detection on Twitter from Biased and Noisy Data

Robust Sentiment Detection on Twitter from Biased and Noisy Data Robust Sentiment Detection on Twitter from Biased and Noisy Data Luciano Barbosa AT&T Labs - Research lbarbosa@research.att.com Junlan Feng AT&T Labs - Research junlan@research.att.com Abstract In this

More information

Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians

Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians Lucas Brönnimann University of Applied Science Northwestern Switzerland, CH-5210 Windisch, Switzerland Email: lucas.broennimann@students.fhnw.ch

More information

Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Emoticon Smoothed Language Models for Twitter Sentiment Analysis Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

More information

Sentiment Analysis and Topic Classification: Case study over Spanish tweets

Sentiment Analysis and Topic Classification: Case study over Spanish tweets Sentiment Analysis and Topic Classification: Case study over Spanish tweets Fernando Batista, Ricardo Ribeiro Laboratório de Sistemas de Língua Falada, INESC- ID Lisboa R. Alves Redol, 9, 1000-029 Lisboa,

More information

Sentiment analysis: towards a tool for analysing real-time students feedback

Sentiment analysis: towards a tool for analysing real-time students feedback Sentiment analysis: towards a tool for analysing real-time students feedback Nabeela Altrabsheh Email: nabeela.altrabsheh@port.ac.uk Mihaela Cocea Email: mihaela.cocea@port.ac.uk Sanaz Fallahkhair Email:

More information

Data Deduplication in Slovak Corpora

Data Deduplication in Slovak Corpora Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain

More information

The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch

The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch The Viability of StockTwits and Google Trends to Predict the Stock Market By Chris Loughlin and Erik Harnisch Spring 2013 Introduction Investors are always looking to gain an edge on the rest of the market.

More information

Sentiment Analysis for Movie Reviews

Sentiment Analysis for Movie Reviews Sentiment Analysis for Movie Reviews Ankit Goyal, a3goyal@ucsd.edu Amey Parulekar, aparulek@ucsd.edu Introduction: Movie reviews are an important way to gauge the performance of a movie. While providing

More information

Impact of Financial News Headline and Content to Market Sentiment

Impact of Financial News Headline and Content to Market Sentiment International Journal of Machine Learning and Computing, Vol. 4, No. 3, June 2014 Impact of Financial News Headline and Content to Market Sentiment Tan Li Im, Phang Wai San, Chin Kim On, Rayner Alfred,

More information

Issues in Information Systems Volume 15, Issue II, pp. 350-358, 2014

Issues in Information Systems Volume 15, Issue II, pp. 350-358, 2014 AUTOMATED PLATFORM FOR AGGREGATION AND TOPICAL SENTIMENT ANALYSIS OF NEWS ARTICLES, BLOGS, AND OTHER ONLINE PUBLICATIONS Michael R. Grayson, Mercer University, michael.richard.grayson@live.mercer.edu Myungjae

More information

Reputation Management System

Reputation Management System Reputation Management System Mihai Damaschin Matthijs Dorst Maria Gerontini Cihat Imamoglu Caroline Queva May, 2012 A brief introduction to TEX and L A TEX Abstract Chapter 1 Introduction Word-of-mouth

More information

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Mohammad Farahmand, Abu Bakar MD Sultan, Masrah Azrifah Azmi Murad, Fatimah Sidi me@shahroozfarahmand.com

More information

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,

More information

Sentiment analysis using emoticons

Sentiment analysis using emoticons Sentiment analysis using emoticons Royden Kayhan Lewis Moharreri Steven Royden Ware Lewis Kayhan Steven Moharreri Ware Department of Computer Science, Ohio State University Problem definition Our aim was

More information

Whitepaper. Leveraging Social Media Analytics for Competitive Advantage

Whitepaper. Leveraging Social Media Analytics for Competitive Advantage Whitepaper Leveraging Social Media Analytics for Competitive Advantage May 2012 Overview - Social Media and Vertica From the Internet s earliest days computer scientists and programmers have worked to

More information
