Tweets Miner for Stock Market Analysis



Similar documents
Sentiment analysis on tweets in a financial domain

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

News Sentiment Analysis Using R to Predict Stock Market Trends

The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch

Analysis of Tweets for Prediction of Indian Stock Markets

Predicting Stock Market Fluctuations. from Twitter

Twitter sentiment vs. Stock price!

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

On the Predictability of Stock Market Behavior using StockTwits Sentiment and Posting Volume

TRCM: A Methodology for Temporal Analysis of Evolving Concepts in Twitter

How Does Social Sentiment Affect the Stock Market?

The Use of Twitter Activity as a Stock Market Predictor

Using Twitter as a source of information for stock market prediction

Twitter Analytics: Architecture, Tools and Analysis

Reconstruction and Analysis of Twitter Conversation Graphs

Predicting Stock Market Indicators Through Twitter I hope it is not as bad as I fear

Community-Aware Prediction of Virality Timing Using Big Data of Social Cascades

Forecasting stock markets with Twitter

Predicting stocks returns correlations based on unstructured data sources

Twitter Volume Spikes: Analysis and Application in Stock Trading

WILL TWITTER MAKE YOU A BETTER INVESTOR? A LOOK AT SENTIMENT, USER REPUTATION AND THEIR EFFECT ON THE STOCK MARKET

Text Opinion Mining to Analyze News for Stock Market Prediction

Twitter Analytics for Insider Trading Fraud Detection

How To Predict Stock Price With Mood Based Models

USING TWITTER TO PREDICT SALES: A CASE STUDY

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

Microblog Sentiment Analysis with Emoticon Space Model

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-4, Issue-4) Abstract-

Can Twitter provide enough information for predicting the stock market?

Initial Report. Predicting association football match outcomes using social media and existing knowledge.

Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence. Dr. Sulkhan Metreveli Leo Keller

Twitter Data Mining for Events Classification and Analysis

A Big Data Analytical Framework For Portfolio Optimization Abstract. Keywords. 1. Introduction

Financial Text Mining

Sentiment Analysis and Time Series with Twitter Introduction

TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING

Semantic Sentiment Analysis of Twitter

Social-Sensed Multimedia Computing

A Statistical Text Mining Method for Patent Analysis

Enhanced Information Access to Social Streams. Enhanced Word Clouds with Entity Grouping

An Analysis of Verifications in Microblogging Social Networks - Sina Weibo

Part 2: Community Detection

Exploring Big Data in Social Networks

IoT Solutions for 3-D Visualization of Twitter Data

Instagram Post Data Analysis

Pearson's Correlation Tests

Italian Journal of Accounting and Economia Aziendale. International Area. Year CXIV n. 1, 2 e 3

Software Review: ITSM 2000 Professional Version 6.0.

Do All Birds Tweet the Same? Characterizing Twitter Around the World

traders jlbollen #twitter mood predicts the #DJIA FTW!

Prediction of Stock Market Shift using Sentiment Analysis of Twitter Feeds, Clustering and Ranking

Sentiment Analysis and Topic Classification: Case study over Spanish tweets

Online Marketing Module COMP. Certified Online Marketing Professional. v2.0

Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement

Measure Social Media like a Pro: Social Media Analytics Uncovered SOCIAL MEDIA LIKE SHARE. Powered by

Stock Price Forecasting Using Information from Yahoo Finance and Google Trend

Using Social Media for Continuous Monitoring and Mining of Consumer Behaviour

EXPLOITING TWITTER IN MARKET RESEARCH FOR UNIVERSITY DEGREE COURSES

Sentiment Analysis Tool using Machine Learning Algorithms

Visualizing Big Data. Activity 1: Volume, Variety, Velocity

ConTag: Conceptual Tag Clouds Video Browsing in e-learning

Understanding Graph Sampling Algorithms for Social Network Analysis

Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques.

Salesforce ExactTarget Marketing Cloud Radian6 Mobile User Guide

Sentiment Analysis of Twitter Data within Big Data Distributed Environment for Stock Prediction

Twitter Stock Bot. John Matthew Fong The University of Texas at Austin

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: X DATA MINING TECHNIQUES AND STOCK MARKET

Multiple Linear Regression

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

Chapter 4: Vector Autoregressive Models

A Study of YouTube recommendation graph based on measurements and stochastic tools

Specific Usage of Visual Data Analysis Techniques

Text and data analytics for social network mining

Transcription:

Tweets Miner for Stock Market Analysis Bohdan Pavlyshenko Electronics department, Ivan Franko Lviv National University,Ukraine, Drahomanov Str. 50, Lviv, 79005, Ukraine, e-mail: b.pavlyshenko@gmail.com In this paper, we present a software package for the data mining of Twitter microblogs for the purpose of using them for the stock market analysis. The package is written in R langauge using apropriate R packages. The model of tweets has been considered. We have also compared stock market charts with frequent sets of keywords in Twitter microblogs messages. Key words: Twitter, tweets, data mining, frequent sets, stock market. Introduction The system of microblogs Twitter is one of popular means of interaction among users via short messages (up to 140 characters). Twitter messages are characterized by high density of contextually meaningful keywords. This feature conditions the availability of the study of microblogs by using data mining in order to detect semantic relationships between the main concepts and discussion subjects in microblogs. Very promising is the analysis of predictive ability of time dependences of key quantitative characteristics of thematic concepts in the messages of twitter microblogs. The peculiarities of social networks and users' behavior are researched in many studies. In [1] the microblogging phenomena were investigated. Users' influence in Twitter was studied in [2]. Users' behavior in social networks is analyzed in [3]. In [4] the methods of opinion mining of Twitter corpus were analyzed. Several papers are devoted to the analysis of possible forecasting of events by analyzing messages in microblogs. In [5] it was studied whether public mood, as measured from large-scale collection of tweets, posted on twitter.com, is correlated or predictive for stock markets. In this paper, we construct a set-theoretic model of key tags of twitter messages. Then we consider our developed software package for data mining of twitter s microblogs. Then we compare the stock market charts with the frequent sets of keywords in Twitter microblogs messages. We analyzed the graphs for frequent itemsets and association rules for specified keywords. We also conducted the Granger test to determine the causation between the time dynamics of frequent sets and the stock price. Theoretical Model Let us consider a model that describes microblogs messages. We have chosen a set of keywords which specifiy the themes of messages and are present in all messages, e.g, kw Keywords, Keywords = { stock, market } (1) Then we define a set of microblogs messages for the analysis: TW kw ( kw) { tw i kw tw kw Keywords } =, Our next step is to consider the basic elements of the theory of frequent sets. Each tweet will be considered as a basket of key terms Such a set is called a transaction. We label some set of terms as j i j (2) tw { tw i = w ij }. (3) 1

The set of tweets, which includes the set F looks like TW F = { w j }. (4) = { tw F tw ; r 1,... m}. (5) kw F r r = The ratio of the number of transactions, which include the set F, to the total number of transactions is called a support of F basket and it is marked as Supp (F) : kw TWF Supp ( F) =. (6) tw TW A set is called a frequent itemset, if its support value is more than the minimum support that is specified by a user Given the condition (7), we find a set of frequent itemsets Supp ( F) > Supp. (7) min L = F Supp( F ) Supp }. (8) { j j > min Based on frequent itemsets, we can build association rules, which are considered as X Y, (9) where X is called antecedent and Y is called consequent. The objects of antecedent and consequent are the subsets of the frequent set F of considered keywords X Y = F. (10) Software package for tweets mining Let us consider the software package for tweets mining, we developed it using R language and appropriate R packages. R GUI environment with working software is presented in the Fig.1. This software allows users to load tweets from a specified list of twitter users, such as: CNN, WSJ, Reuters, Bloomberg, etc., and perform data mining analysis, comparing results with the stock market chart. This package is described and can be loaded in our analytics blog at http://bpavlyshenko.blogspot.com. Fig.1 Tweets Miner in the R GUI 2

The user s interface is followed in the Fig.2. To form frequent sets of keywords for the analysis, one can compose a list of some specific frequent terms, and using this list, one can find the terms which are associated with these frequent terms. E.g. a term "apple" is associated with "aapl", "ipad", "iphone". Then the following frequent sets can be investigated: "apple", "apple aapl", "apple aapl ipad", etc. Before the analysis, one should click on the buttons "Load new users' tweets" and "Load financial time series". Each time the latest tweets only will be loaded, previous tweets are saved in the file and can be used for the next analysis. Fig.2 User s interface for Tweets Miner The analysis can also be performed without loading new tweets and financial time series. In this case the analysis will be carried out for previously loaded tweets and financial time series, which are saved in the data files. When one tries to find frequent terms or associations for the first time per session or after loading new tweets, the program requires several minutes for documents-terms matrix creation. It happens only once per session. A user can browse frequent keywords using a keywords cloud for visualization (Fig.3), and find the associations in the tweets with specified keywords. (Fig.4). To plot the time dynamics of frequent sets of keywords, a users need to specify a frequent set in the lower case, e.g. "apple", "apple aapl", "apple iphone", etc.; then users can choose a stock symbol for comparing time series and choose the time windows for two moving averages, then click on the button "Plot Dynamics of Frequent Sets". The time dynamics of frequent sets of keywords includes two moving averages, which can be used for the trading strategy with the intersection of two moving averages (Fig.5). 3

Fig.3 The search of frequent terms list and the terms cloud. Fig.4 The cloud of keywords, which are associated with The keyword market" in the loaded tweets. Fig.5 The time dynamics of frequent sets of keywords {apple,stock} and the stock chart for AAPL. 4

To compare the time dynamics of frequent sets of keywords, they can be plotted as a standard stocks chart with candles and volumes (Fig.6). Fig.6 The stock chart with candles and volumes and moving averages of keywords frequent sets. The developed package can also plot a crosscorrelation function, which shows the correlation between the moving averages of frequent sets and the stock price Fig.7). It allows to find predictive frequent sets. Fig.7 The crosscorellation between the stock chart AAPL and the moving averages of keywords frequent sets {apple,stock}. 5

This software also allows to perform the forecasting for the dynamics of stock charts; it is based on the ARIMA model using R package 'forecast' (Fig.8). Fig.8 The forecasting of the stock chart AAPL using ARIMA model. Consideration and conclusions Comparing the moving averages of some found frequent sets of keywords and the stock market, we can notice that these curves have similar trends regions, which are also proved by the calculated crosscorelation function. We have also performed an additional analysis, which is not included into the package under consideration. In the Fig.9, we showed the graph for the frequent itemsets, which is formed in the analyzed tweets array with some specified keywords. In the Fig.10 we showed the graph of association rules which are formed with the specified keywords. Such graphs can map the semantic structure of analyzed tweets array. Fig.9 The graph for the frequent sets of specified keywords. 6

Fig.10 The graph for the association rules of specified keywords. We conducted the Granger test to determine the causation between the time dynamics of frequent itemsets and the stock price. In the first test, we considered the null hypothesis about lack of causality between the dynamics of the frequent itemset {apple, stock} and AAPL stock price; in the second test, we examined the null hypothesis about lack of the causality between AAPL stock prices and the dynamics of the frequent itemset {apple, stock}. The calculations were performed using R packages. We have got the following results: test 1 Granger causality test Model 1: V3 ~ Lags(V3, 1:1) + Lags(V2, 1:1) Model 2: V3 ~ Lags(V3, 1:1) Res.Df Df F Pr(>F) 1 87 2 88-1 10.05 0.002103 ** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05. 0.1 1 test 2 Granger causality test Model 1: V2 ~ Lags(V2, 1:1) + Lags(V3, 1:1) Model 2: V2 ~ Lags(V2, 1:1) Res.Df Df F Pr(>F) 1 87 2 88-1 0.3261 0.5694 p-value in the first test is equal to 0.002103, this is significantly less than the standard significance level of 0.05. The P-value in the second test is equal to 0.5694, this is substantially more than the standard significance level of 0.05. It means that the dynamics of the frequent 7

itemsets of keywords {apple, stock} in users' tweets under analysis determines the dynamics of Apple stock prices. Thus, the dynamics of support of some of the identified frequent itemsets of keywords reflects the trends of the stock market. Such frequent itemsets can be considered as potential predictive markers. Our next step is going to be the use of multivariate forecasting algorithms, based on the vector autoregressive model (VAR model). These algorithms can include many time series into analyses, the time series describe both stock prices and quantitative characteristics of tweets. We suppose such an approach will give the narrower and more precise forecasting. We are also planning to use the theory of semantic fields, frequent sets, association rules, Galois lattice, and the formal concepts analysis. References [1] Kwak, H., Lee, C., Park, H., & Moon, S. (2010, April). What is Twitter, a social network or a news media?. In Proceedings of the 19th international conference on World wide web (pp. 591-600). ACM. [2] Cha, M., Haddadi, H., Benevenuto, F., & Gummadi, K. P. (2010, May). Measuring user influence in twitter: The million follower fallacy. In 4th international aaai conference on weblogs and social media (icwsm) (Vol. 14, No. 1, p. 8). [3] Benevenuto, F., Rodrigues, T., Cha, M., & Almeida, V. (2009, November). Characterizing user behavior in online social networks. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference (pp. 49-62). ACM. [4] Pak, A., & Paroubek, P. (2010, May). Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of LREC, pp. 1320 1326 (Vol. 2010). [5] Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8. 8