Using Twitter as a source of information for stock market prediction

Similar documents

Forecasting stock markets with Twitter

Forecasting financial time series with machine learning models and Twitter data

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Analysis of Tweets for Prediction of Indian Stock Markets

Sentiment analysis on tweets in a financial domain

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

Initial Report. Predicting association football match outcomes using social media and existing knowledge.

IMPACT OF SOCIAL MEDIA ON THE STOCK MARKET: EVIDENCE FROM TWEETS

Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies

How To Predict Stock Price With Mood Based Models

Using Tweets to Predict the Stock Market

On the Predictability of Stock Market Behavior using StockTwits Sentiment and Posting Volume

SI485i : NLP. Set 6 Sentiment and Opinions

Tweets Miner for Stock Market Analysis

Semantic Sentiment Analysis of Twitter

Data Mining Yelp Data - Predicting rating stars from review text

Sentiment Analysis on Twitter with Stock Price and Significant Keyword Correlation. Abstract

Applying Machine Learning to Stock Market Trading Bryce Taylor

Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Twitter Analytics for Insider Trading Fraud Detection

A CRF-based approach to find stock price correlation with company-related Twitter sentiment

Stock Prediction Using Twitter Sentiment Analysis

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Sentiment Analysis of Twitter Data within Big Data Distributed Environment for Stock Prediction

Equity forecast: Predicting long term stock price movement using machine learning

The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques

It has often been said that stock

An Introduction to Data Mining

Can Twitter provide enough information for predicting the stock market?

Prediction of Stock Performance Using Analytical Techniques

Microblog Sentiment Analysis with Emoticon Space Model

Predict Influencers in the Social Network

Mammoth Scale Machine Learning!

ISSN: (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies

WILL TWITTER MAKE YOU A BETTER INVESTOR? A LOOK AT SENTIMENT, USER REPUTATION AND THEIR EFFECT ON THE STOCK MARKET

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

Active Learning SVM for Blogs recommendation

Machine Learning for Data Science (CS4786) Lecture 1

The process of gathering and analyzing Twitter data to predict stock returns EC115. Economics

Text Mining for Sentiment Analysis of Twitter Data

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

Predicting Stock Market Fluctuations. from Twitter

Constructing Social Intentional Corpora to Predict Click-Through Rate for Search Advertising

Using Twitter to Analyze Stock Market and Assist Stock and Options Trading

Twitter Stock Bot. John Matthew Fong The University of Texas at Austin

ALGORITHMIC TRADING USING MACHINE LEARNING TECH-

Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians

Do Tweets Matter for Shareholders? An Empirical Analysis

Feifei Xu. Submitted in partial fulfilment of the requirements for the degree of Master of Electronic Commerce

Sentiment analysis: towards a tool for analysing real-time students feedback

Supervised Learning (Big Data Analytics)

Introduction. A. Bellaachia Page: 1

Chapter 6. The stacking ensemble approach

Keywords social media, internet, data, sentiment analysis, opinion mining, business

Programming Exercise 3: Multi-class Classification and Neural Networks

Spam Detection on Twitter Using Traditional Classifiers M. McCord CSE Dept Lehigh University 19 Memorial Drive West Bethlehem, PA 18015, USA

The relation between news events and stock price jump: an analysis based on neural network

The Operational Value of Social Media Information. Social Media and Customer Interaction

Predicting stocks returns correlations based on unstructured data sources

Italian Journal of Accounting and Economia Aziendale. International Area. Year CXIV n. 1, 2 e 3

Machine learning for algo trading

Sentiment Analysis and Topic Classification: Case study over Spanish tweets

How can we discover stocks that will

The Influence of Sentimental Analysis on Corporate Event Study

Machine Learning in FX Carry Basket Prediction

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

Exploiting Topic based Twitter Sentiment for Stock Prediction

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Quantifying Political Legitimacy from Twitter

How To Use Neural Networks In Data Mining

How Does Social Sentiment Affect the Stock Market?

A short guide to Twitter

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

How To Identify Software Related Tweets On Twitter

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Neural Networks for Sentiment Detection in Financial Text

STATISTICAL ANALYSIS OF SOCIAL NETWORKS. Agostino Di Ciaccio, Giovanni Maria Giorgi

Data Mining. Nonlinear Classification

Sentiment Analysis Tool using Machine Learning Algorithms

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network

A Content based Spam Filtering Using Optical Back Propagation Technique

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement

Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence. Dr. Sulkhan Metreveli Leo Keller

Challenges of Cloud Scale Natural Language Processing

Exploiting Social Relations and Sentiment for Stock Prediction

Correlation between Stock Prices and polarity of companies performance in Tweets: a CRF-based Approach

Using Social Media for Continuous Monitoring and Mining of Consumer Behaviour

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Decision Trees from large Databases: SLIQ

Stock Market Forecasting Using Machine Learning Algorithms

Big Data analytics of Social Media

The Use of Twitter Activity as a Stock Market Predictor

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

Machine Learning Techniques for Stock Prediction. Vatsal H. Shah

Transcription:

Using Twitter as a source of information for stock market prediction Ramon Xuriguera (rxuriguera@lsi.upc.edu) Joint work with Marta Arias and Argimiro Arratia ERCIM 2011, 17-19 Dec. 2011, University of London, UK UPC BarcelonaTech

Motivation and Goals Vast amounts of new information every day second available in social networking platforms. Can some of this information help improve time series predictions for certain stocks? Hypotheses Volume More messages More variance (Volatility) Sentiment More positive/negative Increase/Decrease benefits (Returns)

Motivation and Goals Related Work J. Bollen, et al. Twitter mood predicts the stock market [5] Assesses general mood by checking whether messages contain certain words (OpinionFinder and Google Profile of Mood States) Predict time series direction with a self-organizing Fuzzy Neural Network Twitter dataset from Feb. to Dec. 2008. Our approach Different sentiment classifier Target different companies and indexes by only looking at tweets related to them. Test with other models

Motivation and Goals The big picture 22/07 16/08 Target time series 17/08? 22/07 Time series from Twitter data 16/08 My right sock has THor. My left has Iron-Man. I'm missing Captain America pants however. @Stormylyric I wnt to see tht Super 8 and I ain't sure bout Green Lantern. F*** Captain America I need a Marvel obsessed friend to see Thor and Captain America with. Captain America movie looks way better than the Thor movie.

Twitter @TEDNews TED News RT @TEDchris: Mind-shifting #TED talk on the evolution of language from Mark Pagel http://on.ted.com/pagel 3 Aug via TweetDeck Social networking and microblogging service Messages or tweets up to 140 characters Hashtags, retweets, mentions, URLs

Twitter Data Retrieval Lack of standard public datasets. It is not allowed for third parties to redistribute Twitter Content. We were forced to create our own dataset. Twitter Streaming API An HTTP connection is kept alive to retrieve tweets as they are posted. Tweets are filtered by keyword or author. Limited amount of data. Can t go back in time. Began listening to Twitter s stream in March 2011. Buying data from official tweet resellers Extremely expensive

Twitter Relevance Filter Latent Dirichlet Allocation Topics politician 0.05 corruption 0.04 election 0.02... human 0.03 relationship 0.03 violence 0.02 peace 0.01... sound wave harmony... 0.03 0.01 0.01 Documents @TEDNews TED News Jarreth Merz at #TEDGlobal: Corruption + violence erupted during Ghana's election. Sounds of harmony erupted from the crowd: "We want peace" 14 Jul via TweetDeck Topic assignments Topic proportions Text as mixtures of (two) topics (relevant/not relevant) 300K tweet collection. Tested with 20K tweets High precision (83.3%), low recall (65.4%)

Sentiment Analysis Determine whether a message contains positive or negative opinions on a given subject A sentiment classifier is trained with an automatically labelled dataset where smileys are used as labels: [1, 2, 3] :-) :-D positive [:=8][ -]?[)D] :-( negative [:=8][ -]?( How do we go about providing text to a probabilistic classifier? Bag of Words Vector of occurrences/frequencies of the words in a text.

Sentiment Analysis Classifier Multinomial Naïve Bayes Will assign a tweet, the sentiment with the highest conditional probability given that tweet Binary Classification (positive/negative) 82.5% for twittersentiment set Sentiment Index: Time series of the daily percentage of positive tweets concerning a company

Sentiment Analysis The big picture 22/07 16/08 Target time series 17/08? 22/07 Time series from Twitter data 16/08 My right sock has THor. My left has Iron-Man. I'm missing Captain America pants however. @Stormylyric I wnt to see tht Super 8 and I ain't sure bout Green Lantern. F*** Captain America I need a Marvel obsessed friend to see Thor and Captain America with. Captain America movie looks way better than the Thor movie.

Forecasting Financial Time Series Goal Focus on companies: AAPL, GOOG, MSFT, YHOO Two indices: OEX (S&P100) and GSPC (S&P500) And their implicit volatility: VXO and VIX Price Returns: Adjusted Close Log-normally distributed Logarithmic Returns Volatility: Computed from price returns Exponetially Weighted Moving Average

Forecasting Financial Time Series Index Adj. Close Log Ret. 22/03 05/04 19/04 04/05 18/05 02/06 16/06 30/06 15/07 29/07 12/08 26/08 12/09 26/09 10/10 24/10 07/11 Date 2 0 2 4 2 0 2 4 Lags: 3 3 2 1 0 1 2 3 2 1 0 1 2 3 Adj. Close Log Ret. Lags: 4 3 2 1 0 1 2 Lags: 3 3 2 1 0 1 2 Lags: 2 3 2 1 0 1 2 Lags: 1 3 2 1 0 1 2 Lags: 0 3 2 1 0 1 2 3 Index Adjusted Close Log Returns Index (aapl)

Forecasting Financial Time Series Model Adequacy Nonlinearity Test for nonlinear relationship between two series. [6] Causality tests Granger test of causality (parametric and non-parametric [7]) for the two time series and different lags Prequential evaluation Too few data to split into sets. Use all past data to re-train, then predict the direction of the series for the following day. Models for prediction Linear Regression Feed-forward Neural Networks Support Vector Machines

Results Compared Accuracy and Directional Measure for predictions using and not using Twitter data. Large DM Model outperforms chance of random choice Predicted R.Filter Model Lags... Acc. w/o Acc. w/ DM w/o DM w/ AAPL TRUE SVM(sigmoid) 2... 0.560 0.628 2.398 10.739 AAPL TRUE SVM(sigmoid) 3... 0.521 0.601 0.296 6.629 AAPL TRUE SVM(sigmoid) 4... 0.537 0.635 0.835 11.932........................... Tested many parameter configurations and we end up with thousands of rows of results: Need to summarise

Results Decision tree. Improved Accuracy (Weka s REPTree) Model Family Predicted Series Adj. Close Add. Series Sentiment Predicted Symbol AAPL LDA Filter True Improved! (14/7) SVM (sigmoid) Predicted Series Add. Series Predicted Symbol Improved! (10/2) NN (5 hu) Volatility Volume GOOG, MSFT...

Results Decision tree. Improved DM (Weka s REPTree) Model Family SVM (sigmoid) Predicted Symbol... AAPL Lags 2 < lag < 6 Add. Series Predicted Index Adj. Close Improved! (14/5) Both Predicted Index Adj. Close Sentiment Improved! (10/3)

Results Some of the machine learning models we have evaluated, especially Support Vector Machines, present improvements in their predictive power when reinforced with our Twitter Sentiment Index

References I [1] A. Go, R. Bhayani, L. Huang Twitter Sentiment Classification using Distant Supervision, 2009. [2] A. Bifet, E. Frank Sentiment Knowledge Discovery in Twitter Streaming Data, Discovery Science, 2010. [3] J.K. Ahkter, S. Soria Sentiment Analysis: Facebook Statuses Messages, [4] R.S. Tsay Analysis of Financial Time Series (pp. 216) 2002

References II [5] J. Bollen, H. Mao and Xiao-Jun Zeng Twitter mood predicts the stock market, Journal of Computational Science, Vol. 2, Iss. 1, 2011. [6] T. H. Lee, H. White, C. W. J. Granger Testing for neglected nonlinearity in time series models Journal of Econometrics, Vol. 56, Iss. 3, 1993. [7] C. Diks, V. Panchenko A new statistic and practical guidelines for nonparametric Granger causality testing Journal of Economic Dynamics and Control, Vol. 30, Iss. 9-10, 2006.

Directional Measure Given a contingency table with predicted results, Actual Predicted Up Down Up m 11 m 12 m 10 Down m 21 m 22 m 20 m 01 m 02 m The directional measure can be computed as: χ 2 = 2 2 i=1 j=1 ( mij m i0 m 0j /m ) 2 m i0 m 0j /m Under some circumstances, it behaves like a χ 2 distribution with 1 degree of freedom [4]. Assuming a 5% error, we need to have χ 2 > 3.84.

A bit more about LDA α θ d z d,n w d,n β k η Generative model N D Words are only observed variables Topics are distribution over words Dirichlet: distribution over distributions K