The Use of Twitter Activity as a Stock Market Predictor




National College of Ireland
Higher Diploma in Science in Data Analytics 2013/2014
Robert Coyle
X13109278
robert.coyle@student.ncirl.ie

Table of Contents
Abstract
Definitions, Acronyms, and Abbreviations
Introduction
Related Work
Systems and Datasets
  Design and Architecture
    Brief description of work carried out
  Datasets
    Gathering of Twitter Data
    Gathering of Stock Price Data
    Data Preparation
Requirements
  Data requirements
  User requirements
  Usability requirements
  Functional Requirements
Testing and Evaluation
  Systems Testing
    Apple Stock
    Microsoft Stock
    Tesla Stock
Formula for Predicting Stock Movement
  Formula Used
  Apple Stock Prediction
  Microsoft Stock Prediction
  Tesla Stock Prediction
Conclusion
Further Development
Bibliography
Appendix
  Project Materials
Project Proposal
  Introduction; Background; Technical Approach; Special Resources Required; Project Plan; Technical Details; Systems/Datasets; Evaluation/Test and Analysis; Consultation with Specialization Persons
Requirements Specification
  Document Control; Revision History; Distribution List; Related Documents
  1 Introduction
    1.1 Purpose
    1.2 Project Scope (1.2.1 In Scope; 1.2.2 Out of Scope)
    1.3 Document Scope
    1.4 Definitions, Acronyms, and Abbreviations
  2 User Requirements Definition
    2.1 User Characteristics
  3 Requirements Specification
    3.1 Functional Requirements
      3.1.1 Use Case Diagram: Overall Functional Requirements
      3.1.2 Requirement 1: Acquire Data 1 and 2
      3.1.3 Requirement 2: Clean Data 1 and 2
      3.1.4 Requirement 2: Analyze Data
      3.1.5 Requirement 2: Publish Data
      (3.1.2 to 3.1.5 each contain: Description & Priority; Use Case: Scope, Description, Use Case Diagram, Flow Description)
    3.2 Non-Functional Requirements (all Must Have)
      3.2.1 Availability; 3.2.2 Storage Requirements; 3.2.3 Connection Reliability; 3.2.4 Connection Speed; 3.2.5 Backup and Recovery; 3.2.6 Program to Clean Data; 3.2.7 Software Analysis Tools; 3.2.8 Communication Requirements; 3.2.9 Security; 3.2.9 Data Validation
  5 Interface Requirements
    5.1 GUI: An example of an analysis of tweets; Examples of tweets analyzed in Microsoft Excel and Geo Flow; Analysis of tweets using the R language; Example of Excel data for intro to regression; Example of analysis completed in RStudio
  6 Analysis Evolution
Progress Management Report 1
  Document Location; Revision History; Approvals; Distribution; Purpose of Document; Date of Report; Period Covered; Schedule Status (Updated Gantt chart); Definitions, Acronyms, and Abbreviations; Products Completed During This Period; Problems (Actual, Potential); RAID Log (Risks, Assumptions, Issues, Dependency); Products Due for Completion; Project Issues Status; Conclusion
Progress Management Report 2
  Document Location; Revision History; Approvals; Distribution; Purpose of Document; Date of Report; Period Covered; Schedule Status (Updated Gantt chart); Definitions, Acronyms, and Abbreviations; Products Completed During This Period; Problems (Actual, Potential); RAID Log (Risks, Assumptions, Issues, Dependency); Products Due for Completion; Conclusion
Progress Management Report 3
  Document Location; Revision History; Approvals; Distribution; Purpose of Document; Date of Report; Period Covered; Schedule Status (Updated Gantt chart); Definitions, Acronyms, and Abbreviations; Products Completed During This Period; Problems (Actual, Potential); RAID Log (Risks, Assumptions, Issues, Dependency); Products Due for Completion; Conclusion
References

Abstract
This thesis investigates whether stock market movement can be predicted from Twitter activity. The analysis uses data mining applications, data analysis techniques, and correlation and regression modelling. Twitter feeds were mined using the Twitter API and Java code to search for and download tweets containing the words Apple, Microsoft and Tesla. These files were then processed using Amazon Web Services and Text Wrangler. The analysis was carried out in RStudio and Microsoft Excel: correlation and regression models were built, and the Granger causality test was run in RStudio. Visualisations produced in Excel and RStudio show some trends in the data. A formula for stock market prediction for commercial use was created. Because the data set gathered from Twitter was not large enough, and the tweets were not specifically about the companies' stock, noisy data may have corrupted the analysis. No sentiment analysis was carried out on the tweets.

Definitions, Acronyms, and Abbreviations
API: Application programming interface.
AWS: Amazon Web Services.
Causative: A form indicating that a subject causes something else to do something, or causes a change in state of a non-volitional event.
GPOMS: Google Profile of Mood States, an algorithm that classifies public sentiment into six categories {Calm, Alert, Sure, Vital, Kind, Happy}.
Granger causality test: A statistical hypothesis test for determining whether one time series is useful in forecasting another.
NASDAQ: National Association of Securities Dealers Automated Quotations.
Noisy data: Meaningless data.
POMS: Profile of Mood States.
Sentiment analysis: The use of natural language processing, text analysis and computational linguistics to identify and extract subjective information from source material.
Text Wrangler: A text editor for Mac OS X.
Tweet: A message posted on the Twitter website.

Introduction
The stock market is an essential way for companies to raise money: by becoming publicly traded and selling shares of ownership, companies can raise additional financial capital to expand their business. Historically, share prices have had a major influence on economic activity and can be an indicator of social mood. Stock market movement has always been a rich and interesting subject, with so many factors to analyse that for a long time it was considered unpredictable. Over the past few decades, new computerised mathematical methods developed by companies such as Merrill Lynch and other financial management firms have produced models that maximise returns while minimising risk. Stock market prediction has existed for years, but the rise of social media has given it a new method of prediction.
The objective of this project is to analyse Twitter feeds for activity and trends associated with a brand, and to see whether the brand's share price is related to, or affected by, that Twitter activity. The analysis looks at the relationship between share prices and the number of tweets for three brands listed on the NASDAQ: Apple, Microsoft and Tesla. As an additional exploration of stock conversation on Twitter, each company's NASDAQ symbol was also searched for within the returned tweets. These brands were chosen because they are innovative technology companies listed on the same stock exchange, so the gathering of the Twitter data did not depend on time zone.
Stock market data was collected from the Yahoo Finance website, which provides historical data for the NASDAQ. Java scripts were used to acquire the tweets through Twitter's API, and the tweets for each brand were then counted using Amazon Web Services and Text Wrangler. The counted tweets were analysed in RStudio, where correlation and regression models were built and the Granger causality test was performed. The data was then visualised in Excel and RStudio, and the creation of a formula for commercial use was attempted.

Related Work
In my previous study, Stock Market Prediction Using Twitter, I reviewed papers on sentiment analysis of social media for predicting stock market movement. The social media platform in question was Twitter. The investigation looked at the correlation between public mood and stock market movement, and at how that correlation can be used to predict stock prices. Sentiment analysis was used to translate the tweets into moods using algorithms such as Google Profile of Mood States, and applying sentiment analysis to the tweets proved to be an accurate way of analysing the data. Twitter activity alone does not reveal enough about investors' behavioural attitudes for stock movement to be predicted accurately, whereas sentiment analysis gives the investigation an insight into public attitude. The more detailed the sentiment analysis of the Twitter data, and the more reliable the stock data, the more accurate the results. Twitter activity alone might not give a stockbroker the insight needed to make difficult decisions about buying or selling shares.

Systems and Datasets
Design and Architecture
Brief description of work carried out
The system was designed to acquire Twitter and stock market data and to compare the two data sets for a relationship. The Twitter data was acquired with a Java script and cleaned using an AWS script and Text Wrangler. The financial data was acquired from the Yahoo Finance website, downloaded in Excel format and saved as a CSV file. The results from the cleaned Twitter data were then combined with the cleaned financial data in Excel. The Granger causality test was implemented in RStudio to determine whether the Twitter time series was useful for forecasting the stock price time series. A correlation model was built to test the relationship between the two data types, and Excel was then used to visualise and confirm the relationship.

Datasets
There were two datasets. The first dataset acquired was the Twitter feeds. Historical tweets proved difficult to obtain because Twitter had sold access to its data to external parties. These companies, such as DataSift, offer analysis of historical data. While this would have benefited the original project proposal, the project had no budget. Twitter launched a Historical Data Grant scheme, which allowed academics to submit proposals for access to Twitter's historical data.

A proposal on behalf of this project was submitted to the Data Grant scheme, but Twitter's reply arrived too late in the project to be used. Historical stock market data for the dates on which tweets were collected was subsequently gathered from Yahoo Finance.

Gathering of Twitter Data
The Java script was obtained with the approval of Dr. Brian Mac Namee, a Principal Investigator with CeADAR and a lecturer in the School of Computing at the Dublin Institute of Technology. The script was used in conjunction with the Twitter API. To use the Twitter API, the user must first sign up for a developer account and create an application, which provides the API keys needed to run the script. The script was run on my behalf at a friend's home, because my own Internet connection was unsuitable and a disconnection would have produced an unreliable time series.
Figure 1.1: Example of the application used in Twitter. (Dev.twitter.com, 2014)

Figure 1.2: Example of the Java code used for downloading the Twitter feeds.
Figure 1.3: Demonstrates where the unique keys were entered into the Java script.
Figure 1.4: Demonstrates where the keywords were entered into the Java script.

Java Script Issues
Because the script returned data so frequently, and to guard against a system crash, the data was saved into text files daily. The data sets retrieved from Twitter ranged from 60 MB to 100 MB, with over 400,000 lines of tweets per day. Five sets of text files were obtained, covering the NASDAQ trading days from Monday to Friday.
Figure 1.5: Example of the Twitter feeds acquired by the Java script, in a text file.
Because the script stopped running on one of the days, there was a gap with no tweets from 3am until 8am that day. For this reason, only tweets published during NASDAQ trading hours were used. NASDAQ trading hours are 09:30 to 16:00 US Eastern Time, Monday to Friday. During the collection week, with daylight saving time in effect in both the US and Ireland, that corresponds to 14:30 to 21:00 Irish time (13:30 to 20:00 GMT).

Counting the Tweets
Next, the tweets had to be counted. Because of the size of the data sets, I initially proposed using Amazon Web Services. A word count script from the AWS website was used to count the occurrences of the specific words in each tweet.
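The restriction to NASDAQ trading hours can be expressed as a small filter. This is a minimal sketch, not the project's Java code: it assumes each tweet is held as a (created_at, text) pair with created_at in the Twitter API v1.1 timestamp format, and it hard-codes the US Eastern daylight-time offset (UTC-4) that applied during the April 2014 collection week. The names `parse_created_at`, `in_trading_hours` and `filter_trading_hours` are illustrative.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: 7-11 April 2014 fell within US daylight saving time, so NASDAQ's
# local clock was UTC-4 (EDT). A fixed offset keeps the sketch self-contained;
# a real pipeline should use a time zone database (e.g. "America/New_York").
EDT = timezone(timedelta(hours=-4))
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def parse_created_at(value):
    """Parse a Twitter API v1.1 style timestamp such as 'Mon Apr 07 13:45:12 +0000 2014'."""
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


def in_trading_hours(ts):
    """True if a timezone-aware timestamp falls in the NASDAQ session, Monday to Friday."""
    local = ts.astimezone(EDT)
    if local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return MARKET_OPEN <= local.time() < MARKET_CLOSE


def filter_trading_hours(tweets):
    """Keep only (created_at, text) pairs posted while the market was open."""
    return [(ts, text) for ts, text in tweets if in_trading_hours(parse_created_at(ts))]


tweets = [
    ("Mon Apr 07 13:29:59 +0000 2014", "before the open"),
    ("Mon Apr 07 13:30:00 +0000 2014", "at the open"),
    ("Mon Apr 07 19:59:59 +0000 2014", "just before the close"),
    ("Mon Apr 07 20:00:00 +0000 2014", "at the close"),
    ("Sat Apr 12 15:00:00 +0000 2014", "weekend"),
]
print([text for _, text in filter_trading_hours(tweets)])
# -> ['at the open', 'just before the close']
```

Filtering on UTC timestamps converted to the exchange's own clock avoids having to remember which local time zone the collecting machine was in.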

Figure 1.6: Example of the Python script file acquired from the AWS website. (Aws.amazon.com, 2014)
A folder named project 2014 was created in the S3 bucket, and all necessary files, such as the Python scripts and tweet files, were uploaded to it. An Elastic MapReduce cluster was then created.
Figure 1.7: Example of a successful cluster from the AWS website. (Aws.amazon.com, 2014)
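The AWS word count script appears only as Figure 1.6, so the following is a minimal local sketch of the map, shuffle/sort and reduce pattern that such a script follows on an Elastic MapReduce cluster; it is not the AWS sample itself. The keyword set and punctuation handling are assumptions. Because the mapper emits a pair for every occurrence, a tweet that mentions Apple twice contributes 2 to the Apple total.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical keyword set: the three brand names the project searched for.
KEYWORDS = {"apple", "microsoft", "tesla"}
PUNCTUATION = ".,!?:;\"'#@()"


def mapper(line):
    """Map step: emit (keyword, 1) for EVERY keyword occurrence in a line of tweet text."""
    for token in line.lower().split():
        word = token.strip(PUNCTUATION)
        if word in KEYWORDS:
            yield word, 1


def reducer(sorted_pairs):
    """Reduce step: sum the 1s per key. Input must be grouped by key,
    which the shuffle/sort phase guarantees on a real cluster."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce on one machine."""
    pairs = sorted((pair for line in lines for pair in mapper(line)), key=itemgetter(0))
    return dict(reducer(pairs))


print(run_locally(["Apple vs #apple: which Apple?", "Tesla and Microsoft"]))
# -> {'apple': 3, 'microsoft': 1, 'tesla': 1}
```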

Figure 1.8: Example of a text file returned from AWS.
Word Counting Issues
The drawback of this script is that it counted every occurrence of a specific word in a tweet, so a tweet that mentioned a keyword twice was counted twice, which made the results inaccurate.
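Counting tweets rather than word occurrences only requires testing each tweet once per keyword. The sketch below is a hedged illustration of that idea, not a tool used in the project: it assumes each tweet is one string, matches keywords as whole words ignoring case, and also performs the nested search for each company's NASDAQ symbol within the keyword tweets. The `SYMBOLS` mapping and function names are illustrative.

```python
import re

KEYWORDS = ("apple", "microsoft", "tesla")
# Hypothetical mapping used for the nested search of NASDAQ symbols.
SYMBOLS = {"apple": "aapl", "microsoft": "msft", "tesla": "tsla"}


def mentions(text, word):
    """True if `word` occurs as a whole word, ignoring case.
    '#Apple' and '$AAPL' match; 'pineapple' and 'Apples' do not."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def count_tweets(tweets):
    """Count each tweet at most once per keyword, and count how many of those
    keyword tweets also mention the company's stock symbol."""
    keyword_counts = dict.fromkeys(KEYWORDS, 0)
    symbol_counts = dict.fromkeys(KEYWORDS, 0)
    for text in tweets:
        for keyword in KEYWORDS:
            if mentions(text, keyword):
                keyword_counts[keyword] += 1  # once per tweet, however many mentions
                if mentions(text, SYMBOLS[keyword]):
                    symbol_counts[keyword] += 1
    return keyword_counts, symbol_counts


tweets = [
    "Apple, apple, APPLE!",
    "Buying $AAPL before Apple earnings",
    "Microsoft and Tesla both up today",
    "pineapple juice",
]
print(count_tweets(tweets))
# -> ({'apple': 2, 'microsoft': 1, 'tesla': 1}, {'apple': 1, 'microsoft': 0, 'tesla': 0})
```

Whole-word matching is a design choice: it avoids counting words such as "pineapple", at the cost of missing plurals such as "Apples".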

Figure 1.9: Example of a tweet with Apple mentioned twice, in Text Wrangler. (Mac App Store, 2014)
What was needed was a way to count the number of tweets that mentioned each keyword. These tweets could contain all three keywords (Apple, Microsoft and Tesla), so the tweets for each keyword had to be counted separately. Text Wrangler was used to search the individual text files for tweets containing each keyword, but it had the same problem: it counted the number of times the word occurred rather than the number of tweets.
Figure 1.10: Example of tweets from Monday with Tesla mentioned, 3866 occurrences in Text Wrangler. (Mac App Store, 2014)
For this reason, there will be some inaccuracies in my analysis results, caused by extra counts from tweets that mention a keyword twice.

Date | Apple | AAPL | Microsoft | MSFT | Tesla | TSLA
07/04/2014 | 71913 | 1001 | 36417 | 521 | 3866 | 281
08/04/2014 | 118077 | 950 | 47925 | 613 | 4600 | 395
09/04/2014 | 81840 | 1100 | 24084 | 437 | 3113 | 301
10/04/2014 | 63983 | 1483 | 19521 | 435 | 3204 | 447
11/04/2014 | 62755 | 1145 | 18146 | 343 | 2140 | 347
Figure 1.11: Displays the keywords and their occurrences per day.

The original keywords were Apple, Microsoft and Tesla. I decided to also search for their NASDAQ symbols. In previous research into Twitter mining and stock prediction, researchers searched for the company codes, as this returns a more accurate count of tweets in which people are discussing the company's stock.

Gathering of Stock Price Data
Once the Twitter feeds had been gathered, the financial data could be downloaded. The historical stock prices had to cover the same dates as the Twitter feeds. The data was downloaded in Excel format and then saved as a CSV file for analysis in R. Yahoo Finance provides historical stock prices at daily resolution at best; finer-grained prices would have had to be streamed directly from the NASDAQ website, to which I did not have access. Ideally, hourly stock prices would have been used, so that their time series matched the Twitter feeds.
Data sets of stock prices were collected from the Yahoo Finance website for all three companies. Each set had seven columns:
Date: the day of trading.
Open: the price of the stock at the start of the day's trading.
High: the highest price of the stock that day.
Low: the lowest price of the stock that day.
Close: the price of the stock at the end of the day's trading.
Volume: the number of shares traded that day.
Adjusted Close: the closing price adjusted for corporate actions such as dividends and stock splits.
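Once saved as CSV, the daily price file can be loaded with a few lines of standard-library Python. This is a hedged sketch rather than the project's R import: it assumes a header row with the seven column names listed above (written "Adj Close"), and it uses the five Apple rows of Figure 4.2, which Yahoo Finance listed newest first. Sorting oldest first matters because any lag or Granger analysis assumes chronological order.

```python
import csv
import io
from datetime import date

# Apple rows from Figure 4.2, in the newest-first order in which they appear there.
# The header row is an assumption based on the seven columns described above.
YAHOO_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2014-04-11,519,522.83,517.14,519.61,9704200,516.72
2014-04-10,530.68,532.24,523.17,523.48,8559000,520.57
2014-04-09,522.64,530.49,522.02,530.32,7363200,527.37
2014-04-08,525.19,526.12,518.7,523.44,8710300,520.53
2014-04-07,528.02,530.9,521.89,523.47,10351800,520.56
"""


def load_daily_prices(text):
    """Parse a daily price CSV and return records sorted oldest first."""
    records = [
        {
            "date": date.fromisoformat(row["Date"]),
            "close": float(row["Close"]),
            "adj_close": float(row["Adj Close"]),
            "volume": int(row["Volume"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]
    records.sort(key=lambda r: r["date"])
    return records


prices = load_daily_prices(YAHOO_CSV)
print([(p["date"].isoformat(), p["close"]) for p in prices][:2])
# -> [('2014-04-07', 523.47), ('2014-04-08', 523.44)]
```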

Figure 1.6: Demonstrates the acquired historical Apple stock prices for April 2014 from the Yahoo Finance website. (Finance.yahoo.com, 2014)
The closing price is the data on which this analysis focused.

Data Preparation
The results from the cleaned Twitter data were combined with the cleaned financial data in Excel.

Date | Open | High | Low | Close | Volume | Adj Close | Apple | AAPL
2014-04-11 | 519 | 522.83 | 517.14 | 519.61 | 9704200 | 516.72 | 62755 | 1145
2014-04-10 | 530.68 | 532.24 | 523.17 | 523.48 | 8559000 | 520.57 | 63983 | 1483
2014-04-09 | 522.64 | 530.49 | 522.02 | 530.32 | 7363200 | 527.37 | 81840 | 1100
2014-04-08 | 525.19 | 526.12 | 518.7 | 523.44 | 8710300 | 520.53 | 118077 | 950
2014-04-07 | 528.02 | 530.9 | 521.89 | 523.47 | 10351800 | 520.56 | 71913 | 1001
Figure 4.2: Displays the keywords and their occurrences per day with the stock prices for Apple.

This was repeated for all three companies.
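The date-matching step was done in Excel. As a hedged alternative, the same join can be sketched in Python; the dictionary layout and function name are assumptions, while the values and the two date formats (DD/MM/YYYY and YYYY-MM-DD) are those that appear in Figures 1.11 and 4.2. Parsing both formats into real dates before joining avoids silently mismatching days.

```python
from datetime import datetime

# Apple keyword counts from Figure 1.11 (dates written DD/MM/YYYY).
TWEET_COUNTS = {
    "07/04/2014": {"Apple": 71913, "AAPL": 1001},
    "08/04/2014": {"Apple": 118077, "AAPL": 950},
    "09/04/2014": {"Apple": 81840, "AAPL": 1100},
    "10/04/2014": {"Apple": 63983, "AAPL": 1483},
    "11/04/2014": {"Apple": 62755, "AAPL": 1145},
}
# Apple closing prices from Figure 4.2 (dates written YYYY-MM-DD).
CLOSES = {
    "2014-04-11": 519.61,
    "2014-04-10": 523.48,
    "2014-04-09": 530.32,
    "2014-04-08": 523.44,
    "2014-04-07": 523.47,
}


def merge_by_date(counts, closes):
    """Inner-join tweet counts and closing prices on the calendar date.
    Dates present in only one source are dropped; rows come back oldest first."""
    counts_by_day = {datetime.strptime(d, "%d/%m/%Y").date(): c for d, c in counts.items()}
    merged = []
    for d, close in closes.items():
        day = datetime.strptime(d, "%Y-%m-%d").date()
        if day in counts_by_day:
            merged.append({"date": day, "Close": close, **counts_by_day[day]})
    merged.sort(key=lambda row: row["date"])
    return merged


for row in merge_by_date(TWEET_COUNTS, CLOSES):
    print(row["date"], row["Close"], row["Apple"], row["AAPL"])
```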

Requirements
The requirements have remained mostly the same as in the original Requirements Specification, except that live Twitter data was used rather than historical Twitter data. Historical Twitter data proved impractical, as it had to be purchased and the project had no budget.

Data requirements
DR# | Category | Description | MoSCoW | Status
DR1 | Use of Information | The information produced must be of use to the user. | S | M
DR2 | Availability | Information generated must not be previously available to the user. | S | L
DR3 | Access | The user must have access to this information. | M | H

User requirements
UR# | Category | Description | MoSCoW | Status
UR1 | Analysis outcome | The analysis will give Apple, Microsoft and Tesla a better insight into the effectiveness of their advertising campaign strategy, from data acquired from the Twitter feeds and the stock market. | S | M
UR2 | User outcome | This information must be of assistance to these companies. | M | M

Usability requirements

Functional Requirements
FR# | Category | Description | MoSCoW | Status
FR1 | Acquire Data 1 | The project will gather and store all necessary data from live Twitter feeds using Java scripts in conjunction with the Twitter API. | M | H
FR2 | Acquire Data 2 | The project will gather and store all necessary historical stock market data for the brand, corresponding to the dates of the acquired Twitter data, from the Yahoo Finance website. | M | H
FR3 | Clean Data 1 | The correct programs will be acquired and used to clean and retrieve Twitter data matching the brand's keywords and hashtags on certain dates. | M | H
FR4 | Clean Data 2 | The correct programs will be acquired and used to clean and retrieve historical share prices for the brand over the same time series as the Twitter data. | M | H
FR5 | Analyse 1 | The cleaned Twitter data is then analysed and compared. | M | H
FR6 | Analyse 2 | The cleaned stock market data is then analysed and compared. | M | H
FR7 | Publish Data | The analysis will then be published and made available to the customer. | M | H

Testing and Evaluation
Systems Testing
Correlation
The correlation coefficient, also known as the Pearson product-moment correlation coefficient, measures the strength of the linear relationship between two variables. Its value ranges from -1 to +1: +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.
Regression
Regression is used to estimate the relationship between one quantitative variable and another, so that one can be predicted from the other.
Granger Causality
Granger causality is a statistical hypothesis test for determining whether one time series is useful in forecasting another.
Steps in the testing stage
1. Check for correlation in RStudio.
2. Build a regression model.
3. Use the Granger causality test to check whether one time series is useful in forecasting another.
4. Shift the time series to adjust for lag.
5. Use Excel and RStudio to visualise and confirm any relationship.
Data sets
The data sets used are the keyword counts returned by AWS: Apple, Microsoft and Tesla. As an additional investigation, the counts of each company's NASDAQ symbol (AAPL, MSFT and TSLA) within those initial results are also used.

Apple Stock
1. Check for correlation
First, the data is imported into RStudio.
Figure 4.3: Displays the file AprilAAPL imported into RStudio.
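The correlation check can be reproduced outside R as a cross-check. The sketch below is a minimal standard-library implementation of the Pearson coefficient, similar in intent to R's `cor()`, applied to the five Apple rows of Figure 4.2 re-ordered oldest first; the variable names are illustrative. Its first result matches the 0.223 reported from R.

```python
import math

# Apple rows from Figure 4.2, re-ordered oldest first (7 to 11 April 2014).
CLOSE = [523.47, 523.44, 530.32, 523.48, 519.61]
APPLE = [71913, 118077, 81840, 63983, 62755]
AAPL = [1001, 950, 1100, 1483, 1145]


def pearson(x, y):
    """Pearson product-moment correlation: covariance divided by the product of
    the standard deviations (the n - 1 factors cancel, so plain sums are used)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson(APPLE, CLOSE), 3))  # -> 0.223
print(round(pearson(AAPL, CLOSE), 3))   # -> -0.084
```

With only five observations, a coefficient of this size is well within what chance alone can produce, so it should be read as descriptive rather than as evidence of a relationship.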

Figure 4.4: Displays the correlation output in R.
The correlation between Close and the count of the keyword Apple is 0.223, a weak positive relationship.
2. Regression Model
Figure 4.5: Displays the regression model output in R: lm(formula = Apple ~ Close, data = AprilAAPL).
Does the Apple tweet count have an effect on the close price? The Multiple R-squared shows that the regression model returned a poor result, explaining only about 5% of the variation in Close price. Strictly, this formula regresses the Apple count on Close; with a single predictor, R-squared equals the squared correlation (0.223² ≈ 0.05) and is the same whichever variable is treated as dependent.
The process was repeated for the AAPL count.
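The point about direction can be checked with a small least-squares sketch. This is an illustrative stand-in for R's `lm()` with one predictor, not the R code used in the project; `simple_ols` is a hypothetical helper, and the data are the Apple rows of Figure 4.2, oldest first. Fitting Apple on Close (as the reported formula does) and Close on Apple gives the same R-squared.

```python
# Apple rows from Figure 4.2, oldest first (7 to 11 April 2014).
CLOSE = [523.47, 523.44, 530.32, 523.48, 519.61]
APPLE = [71913, 118077, 81840, 63983, 62755]


def simple_ols(x, y):
    """Least-squares fit of y = a + b*x. Returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - rss / tss


# Same direction as lm(formula = Apple ~ Close): Close is the predictor.
_, _, r2_as_fitted = simple_ols(CLOSE, APPLE)
# Direction the research question asks about: the Apple count predicting Close.
_, _, r2_as_asked = simple_ols(APPLE, CLOSE)
print(round(r2_as_fitted, 4), round(r2_as_asked, 4))  # -> 0.0498 0.0498
```

The slope and intercept do change with direction, so for prediction the model should be fitted with Close as the dependent variable.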

Figure 4.6: Displays the regression model output in R: lm(formula = AAPL ~ Close, data = AprilAAPL).
Does the AAPL tweet count have an effect on the close price? The regression model returned a similarly poor result, explaining less than 1% of the variation in Close price.
3. Granger Causality Test
Close is the dependent variable and the Apple count is the independent variable: does the Apple count Granger-cause Close?
Figure 4.7: Displays the Granger causality test output in R for the closing price and Apple word count.
With a one-day lag, the p-value is 0.7057. This is above the 5% significance level, so the null hypothesis cannot be rejected: the Apple word count does not Granger-cause the closing price one day later.
Figure 4.8: Displays the Granger causality test output in R for the closing price and AAPL word count.
A similar test was performed using the keyword AAPL as the independent variable and Close as the dependent variable. The result was slightly better but still showed no Granger causality (p-value of 0.24, above 0.05). Because the data set was small, a lag of two days could not be tested.
Figure 4.9: Displays the unsuccessful outputs of the Granger causality test.
The figure shows the unsuccessful outputs of the Granger causality test when a lag of more than one day is used. The error occurs because the data set was too small: five daily observations leave too few data points to fit the lagged regression models.
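Step 3 can be sketched without R as the standard lag-p Granger F-test: regress y on its own p lags (restricted model) and on its own lags plus p lags of x (unrestricted model), then compare the residual sums of squares. This is a hedged illustration, not the RStudio code used in the project, and its output need not match Figure 4.7, which depends on how the data was ordered and passed to R. It returns the F statistic and its degrees of freedom rather than a p-value; in R the p-value would be `pf(F, df1, df2, lower.tail = FALSE)`. Two points follow from the code: the series must be oldest first (Figure 4.2 lists the newest day first), and with five observations a lag of 2 leaves three usable rows for five parameters, which is why the longer lags failed.

```python
# Apple rows from Figure 4.2, re-ordered oldest first (7 to 11 April 2014).
CLOSE = [523.47, 523.44, 530.32, 523.48, 519.61]
APPLE = [71913, 118077, 81840, 63983, 62755]


def _solve(matrix, rhs):
    """Solve matrix @ beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        beta[i] = (aug[i][n] - sum(aug[i][j] * beta[j] for j in range(i + 1, n))) / aug[i][i]
    return beta


def _rss(y, regressors):
    """Residual sum of squares of an OLS fit of y on the regressors plus an intercept.
    Centering y and every regressor removes the need for an intercept column."""
    def centre(v):
        m = sum(v) / len(v)
        return [e - m for e in v]

    yc = centre(y)
    xs = [centre(col) for col in regressors]
    xtx = [[sum(p * q for p, q in zip(a, b)) for b in xs] for a in xs]
    xty = [sum(p * q for p, q in zip(a, yc)) for a in xs]
    beta = _solve(xtx, xty)
    fitted = [sum(b * col[t] for b, col in zip(beta, xs)) for t in range(len(yc))]
    return sum((o - f) ** 2 for o, f in zip(yc, fitted))


def granger_f(y, x, lag=1):
    """F statistic for H0: `lag` past values of x add nothing to predicting y
    beyond y's own past values. Series must be oldest first. Returns (F, df1, df2)."""
    n = len(y)
    if len(x) != n:
        raise ValueError("series must have the same length")
    rows = n - lag
    df2 = rows - (1 + 2 * lag)  # intercept + lag terms for y and for x
    if lag < 1 or df2 < 1:
        raise ValueError(f"{n} observations are too few for lag {lag}")
    target = y[lag:]
    y_lags = [y[lag - k:n - k] for k in range(1, lag + 1)]
    x_lags = [x[lag - k:n - k] for k in range(1, lag + 1)]
    rss_restricted = _rss(target, y_lags)
    rss_unrestricted = _rss(target, y_lags + x_lags)
    f_stat = ((rss_restricted - rss_unrestricted) / lag) / (rss_unrestricted / df2)
    return f_stat, lag, df2


print(granger_f(CLOSE, APPLE, lag=1))
try:
    granger_f(CLOSE, APPLE, lag=2)
except ValueError as err:
    print(err)  # -> 5 observations are too few for lag 2
```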

4. Visualisation
Figure 4.1.1 demonstrates the relationship between the Apple count and Close price.
The graph shows a positive relationship between the Apple keyword count and the Close price of Apple stock: as the Apple count rises, the closing stock price rises.
Figure 4.1.2 demonstrates the relationship between the AAPL count and Close price.
This graph shows a negative relationship between the AAPL keyword count and the Close price of Apple stock: as the AAPL count rises, the closing stock price declines. This is consistent with the poor results from the correlation and regression models. Note that AAPL was not a keyword in the Java script; it was searched for within the tweets returned for the keyword Apple.
(Line chart "Apple count and Close Price", 7 to 11 April 2014: Close price and Apple count plotted on separate axes.)
Figure 4.1.3 demonstrates the relationship between the Apple count and Close price.
The chart shows the Close price line following a similar trend to the Apple count line about a day later.
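Step 4 (shifting the series to adjust for lag) and the observation that Close follows the Apple count about a day later can be checked numerically by pairing the count on day t with the price on day t + 1. The sketch below is a minimal illustration under the same assumptions as the earlier examples (Apple rows of Figure 4.2, oldest first); `align_with_lag` is a hypothetical helper. With only four pairs left after shifting, any coefficient it prints is very fragile and should not be read as evidence of prediction.

```python
import math

# Apple rows from Figure 4.2, oldest first (7 to 11 April 2014).
CLOSE = [523.47, 523.44, 530.32, 523.48, 519.61]
APPLE = [71913, 118077, 81840, 63983, 62755]


def align_with_lag(counts, prices, lag=1):
    """Pair the tweet count on day t with the price on day t + lag.
    Each extra day of lag costs one observation."""
    if len(counts) != len(prices) or not 0 <= lag < len(counts):
        raise ValueError("need equal-length series and 0 <= lag < number of days")
    return counts[: len(counts) - lag], prices[lag:]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


for lag in (0, 1):
    counts, closes = align_with_lag(APPLE, CLOSE, lag)
    print(f"lag {lag}: {len(counts)} pairs, r = {pearson(counts, closes):.3f}")
```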

(Line chart "AAPL count and Close Price", 7 to 11 April 2014: Close price and AAPL count plotted on separate axes.)
Figure 4.1.4 demonstrates the relationship between the AAPL count and Close price.
Unfortunately, the chart shows that the Close price did not follow a similar trend to the AAPL count; instead, the AAPL count appears to follow the Close price. This is probably why the correlation between the two was so low. It may also indicate that the investor community using the keyword AAPL (Apple's stock symbol) is discussing rises in Apple stock after they happen.

Microsoft Stock
The process was repeated using the Microsoft data set.
1. Check for correlation
Figure 4.1.5 demonstrates the correlation between the Microsoft and MSFT word counts and the Close price.
The correlation is stronger this time, with both keywords returning a moderate correlation with the Close price.

2. Regression Model. Figure 4.1.6 displays the regression model with the Microsoft word count as the independent variable. Figure 4.1.7 displays the regression model with the MSFT word count as the independent variable. Both figures show regression outputs from R, with the Close stock price as the dependent variable. The model in Figure 4.1.6 explains 0.96% of the variation in the Close price (Multiple R-squared), and the model in Figure 4.1.7 explains 12.6%.
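For a single predictor, the Multiple R-squared that R's summary(lm(...)) reports is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below fits Close = intercept + slope × count by ordinary least squares; it is an illustration with a hypothetical function name, not the R script used here.

```python
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)  # total sum of squares
    if sxx == 0 or ss_tot == 0:
        raise ValueError("a constant series cannot be used in the regression")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared
```

With one predictor, this R-squared equals the square of Pearson's r between the word count and the Close price.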

The normality plot. If the residuals fall approximately on a straight line in the normal Q-Q plot, the normality condition is met. Figure 4.1.8 shows the normality plot for the Microsoft count and Close price model; the normality condition is met. Figure 4.1.9 shows the normality plot for the MSFT count and Close price model; the normality condition is met.
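The straight-line check can also be quantified. The sketch below pairs the sorted residuals with standard-normal quantiles, using the plotting positions we understand R's ppoints() to use (a = 3/8 for ten or fewer points, otherwise 1/2), and reports their correlation; values close to 1 mean the points lie near a straight line. This numeric summary (a probability plot correlation coefficient) is an illustrative addition, not part of the original analysis, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence, Tuple


def plotting_positions(n: int) -> List[float]:
    """p_i = (i - a) / (n + 1 - 2a), with a = 3/8 for n <= 10, else 1/2."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def qq_points(residuals: Sequence[float]) -> List[Tuple[float, float]]:
    """(theoretical normal quantile, sorted residual) pairs for a Q-Q plot."""
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf(p) for p in plotting_positions(len(ordered))]
    return list(zip(quantiles, ordered))


def qq_correlation(residuals: Sequence[float]) -> float:
    """Correlation between theoretical quantiles and sorted residuals."""
    pts = qq_points(residuals)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 residuals")
    mx = sum(q for q, _ in pts) / n
    my = sum(r for _, r in pts) / n
    sxy = sum((q - mx) * (r - my) for q, r in pts)
    sxx = sum((q - mx) ** 2 for q, _ in pts)
    syy = sum((r - my) ** 2 for _, r in pts)
    if syy == 0:
        raise ValueError("all residuals are equal")
    return sxy / sqrt(sxx * syy)
```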

3. Granger Causality Test. Figure 4.2.1 displays the Granger causality results. As before, the test could not use a lag greater than one day. Both keywords returned p-values greater than the 5% significance level.

4. Visualization. Figure 4.2.2 shows the relationship between the Microsoft count and the Close price.

Figure 4.2.3 shows the relationship between the MSFT count and the Close price.

[Chart: Microsoft and Close Price, 4/7/14 to 4/11/14; Close price (approx. $38.4 to $40.6) and Microsoft tweet count (0 to 60,000)]

Figure 4.2.4 shows the relationship between the Microsoft count and the Close price on a line chart. As the chart shows, the Close price line follows a similar trend to the Microsoft count line about one day later.

[Chart: MSFT and Close Price, 4/7/14 to 4/11/14; Close price (approx. $38.5 to $41) and MSFT tweet count (0 to 700)]

Figure 4.2.5 shows the relationship between the MSFT count and the Close price on a line chart.

Previous results with a one-day lag

[Chart: Microsoft and Close Price with 1 day lag, 4/8/14 to 4/11/14; Close price (approx. $38.4 to $40.6) and Microsoft tweet count (0 to 60,000)]

Figure 4.2.6 shows the relationship between the Microsoft count and the Close price on a line chart with a one-day lag.

[Chart: MSFT and Close Price with 1 day lag, 4/8/14 to 4/11/14; Close price (approx. $38.4 to $40.6) and MSFT tweet count (0 to 700)]

Figure 4.2.7 shows the relationship between the MSFT count and the Close price on a line chart with a one-day lag. A manual lag was applied in Excel by moving the dates of the Microsoft counts forward one day, to see whether the lines in the chart would match up. With this lag, each day's tweet count is compared with the following trading day's closing price. The two graphs suggest, visually, a relationship between the word counts and the Close stock price, so a correlation and regression model was built again using the lagged data.

1. Correlation. Figure 4.2.8 shows the correlation between the Microsoft and MSFT word counts and the Close price with a one-day lag. The correlation in Figure 4.2.8 is strong for both word counts, so a regression model was produced.
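Moving the tweet counts forward by one day in Excel amounts to pairing each day's count with the next trading day's close. A minimal Python sketch of that alignment, assuming counts and closes are keyed by date (the function name and the values in the usage example are hypothetical):

```python
from datetime import date
from typing import Dict, List, Tuple


def lag_pairs(counts: Dict[date, float], closes: Dict[date, float],
              lag: int = 1) -> List[Tuple[date, float, float]]:
    """Pair each day's tweet count with the close `lag` trading days later.

    Trading days are taken from `closes`, so a Friday count is paired with
    the following Monday's close rather than a weekend date.
    Returns (count_date, count, later_close) triples.
    """
    if lag < 1:
        raise ValueError("lag must be at least one trading day")
    days = sorted(closes)
    pairs = []
    for i in range(len(days) - lag):
        day = days[i]
        if day in counts:  # skip days with no tweet count
            pairs.append((day, counts[day], closes[days[i + lag]]))
    return pairs


if __name__ == "__main__":
    week = [date(2014, 4, d) for d in range(7, 12)]  # Mon 7 to Fri 11 April 2014
    counts = dict(zip(week, [100, 140, 90, 160, 120]))        # hypothetical
    closes = dict(zip(week, [39.0, 39.5, 40.2, 39.7, 39.2]))  # hypothetical
    for row in lag_pairs(counts, closes):
        print(row)
```

With one week of data this leaves four pairs, which is the sample behind the lagged correlation and regression results.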

2. Regression Model Figure 4.2.9 displays the regression model with Microsoft word count as the independent variable using data with a one-day lag.

Figure 4.3.1 displays the regression model with the MSFT word count as the independent variable, using data with a one-day lag. Both regression models returned a high Multiple R-squared of 98% for the Close price. The high correlation and R-squared values suggest a relationship between the tweet counts and the closing stock price. However, the values are most likely this high because of the very small data set used: only four daily observations remain after the one-day lag.

Tesla Stock

The process was repeated, this time using the Tesla data set. Correlation and regression gave results similar to those of the previous data sets. Figure 4.3.2 shows the correlation between the Tesla and TSLA word counts and the Close price. Figure 4.3.2 shows the correlation between the Tesla and TSLA word counts and the Close price with a one-day lag. The keyword Tesla showed a strong correlation with the Tesla closing stock price in the lagged data set, while TSLA still displayed a moderate correlation.

Figure 4.3.3 displays the regression model with the Tesla word count as the independent variable, using data with a one-day lag. Again, the regression on the lagged data set showed a large improvement over the non-lagged Tesla data.

[Chart: Tesla Count and Close Price, 4/7/14 to 4/11/14; Close price (approx. $195 to $220) and Tesla tweet count (0 to 5,000)]

Figure 4.3.4 shows the relationship between the Tesla word count and the Close price on a line chart.

[Chart: Tesla Count and Close Price with one day lag, 4/8/14 to 4/11/14; Close price (approx. $195 to $220) and Tesla tweet count (0 to 5,000)]

Figure 4.3.5 shows the relationship between the Tesla word count and the Close price on a line chart with a one-day lag. Figures 4.3.4 and 4.3.5 show the difference between the non-lagged and the lagged data sets. Figure 4.3.5 shows that the one-day lag does make a difference to the results: it reveals a close relationship between the Tesla count and the Close price.

Formula for Predicting Stock Movement

An attempt was made to create a formula for commercial use. The small data set limited this work, since a lag of two to three days was desired. Previous research, "Stock Market Prediction using Twitter", found that tweets predicted stock movement two to three days after the messages were posted. If the tweet volumes of a company are known for two consecutive days, the percentage movement in tweets between those two days should in turn allow the movement in the company's share price to be predicted with a two- or three-day lag.

Formula Used

The percentage difference between two numbers:

$$\text{Percentage difference} = \frac{V_1 - V_2}{(V_1 + V_2)/2} \times 100$$

where $V_1$ = total company tweets on day one and $V_2$ = total company tweets on day two. The formula was used to compare the stock movement with the tweet movement.

Apple Stock Prediction

To save time, the focus is only on the keyword count for Apple.

Calculate the percentage difference of Apple Tweets and Closing Price

Interval | Difference in Apple stock | Difference in Apple stock (%, magnitude) | Difference in tweet activity | Difference in tweet activity (%, magnitude)
Day One | -5.73099E-05 | 0.005% | 0.019568162 | 1.96%
Day Two | 0.013143818 | 1.31% | 0.279089758 | 27.91%
Day Three | -0.012897873 | 1.29% | 0.442778592 | 44.28%
Day Four | -0.007392833 | 0.73% | -0.390965218 | 39.09%

Figure 4.3.6 shows the difference in Close stock price and tweet activity between days.
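The two ways of measuring day-to-day movement can be compared in a short Python sketch (function names are hypothetical). percentage_difference implements the formula above; percentage_change divides by the first day's value instead. The Apple close-price column of the table appears to follow the second: starting from the $523.47 Day 1 close and the -0.0057% Day One change (a Day 2 close of about $523.44), the Day Two value of 0.013143818 corresponds to (530.32 - 523.44) / 523.44, whereas the symmetric formula would give about 0.01306.

```python
def percentage_difference(v1: float, v2: float) -> float:
    """Percentage difference as in the formula above: (v1 - v2) / mean * 100."""
    mean = (v1 + v2) / 2
    if mean == 0:
        raise ValueError("mean of the two values is zero")
    return (v1 - v2) / mean * 100


def percentage_change(v1: float, v2: float) -> float:
    """Percentage change from day one (v1) to day two (v2): (v2 - v1) / v1 * 100."""
    if v1 == 0:
        raise ValueError("day-one value is zero")
    return (v2 - v1) / v1 * 100


if __name__ == "__main__":
    # Apple closes from this section: Day 2 (derived, approximate) and Day 3.
    print(round(percentage_change(523.44, 530.32), 4))
    print(round(percentage_difference(530.32, 523.44), 4))
```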

Because the stock and the tweet activity did not move by identical percentages, the formula needed to be adjusted: the movement in tweet activity was not proportionate (pro rata) to the movement in the share price. Figure 4.3.7 shows the formula for predicting the third day using Close stock values.

Example of the formula process:
1. Subtract the tweets of Day 1 from Day 2. Tweet volume increased by 1228 tweets, which represents a 1.9568% increase.
2. The Apple closing price on Day 1 is $523.47. Multiplying it by 1.9568% projects an increase of $10.24.
3. Add this to the Day 1 share price: 523.47 + 10.24 = $533.71.
4. The actual closing price on Day 3 was $530.32.

The formula projects a closing price of $533.71 against an actual closing price of $530.32. The difference between the projected and actual price is $3.39, which represents a variance of 0.639%.
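The worked example above can be reproduced with a short sketch that assumes the 1:1 straight-line rule: the Day 3 price is the Day 1 price scaled by the tweet movement between Day 1 and Day 2. The error is expressed relative to the actual price; the function names are hypothetical.

```python
def project_price(price_day1: float, tweet_change: float) -> float:
    """Projected day-3 price under the 1:1 rule.

    tweet_change is the fractional tweet movement from day 1 to day 2,
    e.g. 0.019568162 for a 1.9568% increase.
    """
    return price_day1 * (1 + tweet_change)


def variance_pct(projected: float, actual: float) -> float:
    """Absolute projection error as a percentage of the actual value."""
    return abs(projected - actual) / actual * 100


if __name__ == "__main__":
    # Apple figures from the worked example above.
    projected = project_price(523.47, 0.019568162)
    print(f"projected close: ${projected:.2f}")
    print(f"variance: {variance_pct(projected, 530.32):.3f}%")
```

Run as a script, this prints a projected close of $533.71 and a variance of about 0.64%, in line with the 0.639% reported above. The same two functions apply unchanged to the Low price and Volume projections later in this section.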

The formula used here assumes a straight-line (1:1) relationship: the Apple share price is assumed to change at the same rate as the tweet activity. For the third day this held within an error of just 0.639%. Figure 4.3.8 shows the formula for predicting the fourth day using Close stock values. When the process was repeated to predict the fourth day, an error of 27.904% was returned. Figure 4.3.9 shows the formula for predicting the fifth day using Close stock values. When the process was repeated to predict the fifth day, an error of 47.25% was returned. The formula therefore did not hold for the days after the third.

Calculate the percentage difference of Apple Tweets and Low Price

The formula was also applied to the Low stock price to see whether there was a relationship. Figure 4.4.1 shows the formula for predicting the third, fourth and fifth days using Low stock values. The best result was the prediction for the third day, with an error of 1.89%.

Calculate the percentage difference of Apple Tweets and Volume

The use of Volume in the formula was also measured. Figure 4.4.2 shows the formula for predicting the third day using the Volume values. However, this too had a high error rate, of 30.23%.

Microsoft Stock Prediction

Calculate the percentage difference of Microsoft Tweets and Closing Price

Interval | Difference in stock | Difference in stock (%, magnitude) | Difference in tweet activity | Difference in tweet activity (%, magnitude)
Day One | 0.000502513 | 0.05% | 0.316006261 | 31.60%
Day Two | 0.016323456 | 1.63% | -0.497464789 | 49.74%
Day Three | -0.027427724 | 2.74% | -0.189461883 | 18.94%
Day Four | -0.003810976 | 0.38% | -0.070436965 | 7.04%

Figure 4.4.3 shows the difference in Close stock price and tweet activity between days.

Projecting closing stock price Day 3

Figure 4.4.4 shows the formula for predicting the third, fourth and fifth days using the Close stock values. The formula returned a high variance for all projected days, indicating that it does not hold for any of these days using the Close price.

Calculate the percentage difference of Microsoft Tweets and Low Price

The formula was also applied to the Low stock price to see whether there was a relationship.

Step | Value
Tweets, day 1 - day 2 | 11508
Low price day 1 × tweet difference (day 1 to day 2) | 12.5580888
Projected low price day 3 = low price day 1 + (low price day 1 × tweet difference) | 52.2980888
Low price day 3 - projected low price day 3 | -12.5580888
Difference between projected and actual low price day 3, as a variance | 0.237448234 (23.74%)

Figure 4.4.7 shows the formula for predicting the third day using the Low stock values. Again, the formula did not hold for the Low stock price.

Calculate the percentage difference of Microsoft Tweets and Volume

Figure 4.4.7 shows the formula for predicting the third day using the Volume values. When the Volume data was placed into the formula, the result had a high error rate of 44.5%.

Tesla Stock Prediction

Calculate the percentage difference of Tesla Tweets and Closing Price

Interval | Difference in stock | Difference in stock (%, magnitude) | Difference in tweet activity | Difference in tweet activity (%, magnitude)
Day One | 0.002007934 | 0.20% | 0.189860321 | 18.99%
Day Two | 0.027922269 | 2.79% | -0.32326087 | 32.33%
Day Three | -0.02110152 | 2.11% | 0.029232252 | 2.92%
Day Four | 0.026816564 | 2.68% | 0.332084894 | 33.21%

Figure 4.4.8 demonstrates the difference in Stock Close price and Tweet activity between days.

Figure 4.4.9 demonstrates the formula for predicting the third, fourth and fifth days using the Close stock values. The formula had high percentage errors, except for the prediction for the fifth day, which had an error of 2.33%.

Step | Value
Tweets, day 1 - day 2 |
Low price day 1 × tweet difference (day 1 to day 2) | 38.69163476
Projected low price day 3 = low price day 1 + (low price day 1 × tweet difference) | 242.4816348
Low price day 3 - projected low price day 3 | -48.07163476
Difference between projected and actual low price day 3, as a variance | -0.198248559 (-19.82%)

Figure 4.5.1 demonstrates the formula for predicting the third day using the Low stock values.

Step | Value
Tweets, day 1 - day 2 | -734
Volume day 1 × tweet difference (day 1 to day 2) | 1369177.703
Projected volume day 3 = volume day 1 + (volume day 1 × tweet difference) | 8580677.703
Volume day 3 - projected volume day 3 | -877677.7031
Difference between projected and actual volume day 3, as a variance | -0.102285359 (-10.23%)

Figure 4.4.9 demonstrates the formula for predicting the third day using the Volume values.

When the Low stock and Volume values were placed into the formula, they also produced high errors: over 19% for the Low stock price and over 10% for Volume.

Conclusion

This analysis investigated the relationship between Twitter activity and the share prices of three NASDAQ-listed companies over a period of one week. A Java script using Twitter's API collected the tweets that mentioned the keywords Apple, Microsoft and Tesla. Once the tweets were collected, a Python script run on Amazon Web Services (AWS) counted the word frequencies. AWS was used because of the size of the tweet files, which were text files of 60 to 130 megabytes. TextWrangler was also used to count the frequency of tweets containing the keywords. Because one of the data sets was missing more than five hours of data after a program failure, only tweets posted during NASDAQ trading hours were used. Stock data for the three companies was obtained from the Yahoo Finance website. The number of times each company's NASDAQ symbol was mentioned was also counted and used as an additional analysis, giving the opportunity to investigate conversations directed at the company's stock on the NASDAQ. Analysis was performed in RStudio, first using a correlation model to see how strongly the tweet data related to the stock data of each company. A linear regression model was then used to measure the effect of the Twitter data on the stock data. Granger causality tests were performed to determine whether one time series helped predict the other at a daily lag. Because the data set was so small, only a one-day lag could be tested, and the resulting p-values were above 5%, so the alternative hypothesis could not be accepted. When the data was visualised with line graphs, the stock data appeared to follow a similar trend one day after the tweet data. A manual lag was applied in Excel by moving the tweet time series forward by one day, which suggested that such a trend did exist.
A correlation model was then created in RStudio on the lagged data, and the results showed strong correlations of 0.9 and above. An attempt was also made to create a formula for commercial use. The first formula was used to find the percentage difference between the stock movement and the tweet movement; on average, the stock price and the tweet activity did not move by the same percentage. Another formula was created to predict the closing share price: if the tweet volumes of a company are known for two consecutive days, the percentage movement in tweets between those two days should allow the movement in the company's share price on the third day to be predicted. The formula assumes a straight-line (1:1) relationship. When predicting the third day for Apple, an error of just 0.639% was returned, meaning that the closing share price moved at the same rate as the tweet activity for the keyword Apple within an error level of 0.639%. Disappointingly, the predictions for the other days of the Apple closing price were much less accurate, with error rates of 27.9% and 47.25%. The same pattern held throughout the analysis of the closing prices of Microsoft and Tesla stock. The formula was slightly altered to accommodate other variables, such as the Low stock price and Volume, but again the errors were high for each.
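Restricting the tweets to NASDAQ trading hours, as described in the conclusion, can be sketched as below. This assumes tweets carry Twitter's classic created_at timestamp (for example 'Mon Apr 07 14:35:10 +0000 2014') and that the study week fell in US Eastern Daylight Time (UTC-4), as April 2014 did; market holidays are ignored, and the function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: the study week (7-11 April 2014) fell in US Eastern Daylight
# Time, UTC-4. A fixed offset avoids needing a time-zone database.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def in_trading_hours(created_at: str) -> bool:
    """True if a tweet's created_at timestamp falls in NASDAQ regular hours.

    Expects Twitter's classic format, e.g. 'Mon Apr 07 14:35:10 +0000 2014'.
    Market holidays are not handled.
    """
    stamp = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    local = stamp.astimezone(EASTERN_DAYLIGHT)
    if local.weekday() >= 5:  # Saturday or Sunday
        return False
    return MARKET_OPEN <= local.time() < MARKET_CLOSE
```

The closing time is treated as exclusive, so a tweet posted exactly at 16:00 Eastern falls outside the window.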

The main issue is that the data set is not developed enough for this form of analysis. When acquiring the data, only tweets specifically about the company's stock should have been collected. A company on Twitter is competing for public interest, whereas on the stock exchange it is competing for capital, so some of the tweets gathered in this analysis are noisy data.

Further Development

Further development of the project would include extracting tweets and stock data over a longer period of time, which would give the Granger causality test more observations and allow longer lags to be tested. The tweets need to be selected from a niche community, preferably the investor community that discusses company stocks on Twitter. Tweets that mention the company symbols and the word stock should be gathered using those keywords. Narrowing the selection to a single company would help reduce discrepancies in the tweet count. A program script could be developed to count the tweets (lines) in which a word appears, without counting the word again if it is mentioned more than once in the same tweet. A formula could also be developed that takes account of other variables that cause stock movement, such as the release of company financial reports, takeover rumours, mergers or bad publicity. Applying sentiment analysis to the tweets would provide a more accurate result: analysing Twitter activity alone does not reveal anything about investors' behavioural attitudes, and sentiment analysis would also give better insight into public attitudes.
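The suggested change to the counting script, counting each tweet once no matter how often it repeats the keyword, can be sketched as follows, assuming one tweet per line of the text file. The whole-word, case-insensitive matching rule and the function name are illustrative choices, not the project's script.

```python
import re
from typing import Iterable


def count_tweets_mentioning(lines: Iterable[str], keyword: str) -> int:
    """Number of tweets (one per line) that mention `keyword` at least once.

    A tweet that repeats the keyword is counted once. Matching is
    case-insensitive and on word boundaries, so 'apple' does not match
    'pineapple', while '$AAPL' and '#aapl' still match 'AAPL'.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sum(1 for line in lines if pattern.search(line))
```

Counting matching lines in this way is similar to what grep -c -i -w reports, and differs from a word-frequency count, which would count "Apple apple" twice.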

Bibliography

Aws.amazon.com (2014). Word Count Example: Articles & Tutorials: Amazon Web Services. [online] Available at: http://aws.amazon.com/articles/2273 (Accessed 22 May 2014).
Bollen, J. and Mao, H. (2011) 'Twitter mood as a stock market predictor', Computer.
Datasift.com (2014). Power Decisions With Social Data: DataSift. [online] Available at: http://datasift.com (Accessed 24 May 2014).
Dev.twitter.com (2014). Twitter Developers. [online] Available at: https://dev.twitter.com (Accessed 22 May 2014).
Finance.yahoo.com (2014). AAPL Historical Prices: Apple Inc. Stock - Yahoo! Finance. [online] Available at: http://finance.yahoo.com/q/hp?s=aapl&a=03&b=01&c=2014&d=03&e=30&f=2014&g=d (Accessed 22 May 2014).
Mac App Store (2014). TextWrangler. [online] Available at: https://itunes.apple.com/ie/app/textwrangler/id404010395?mt=12 (Accessed 22 May 2014).
Mittal, A. and Goel, A. (2012) 'Stock prediction using Twitter sentiment analysis', Stanford University, CS229 (2011). Available at: http://cs229.stanford.edu/proj2011/goelmittal-stockmarketpredictionusingtwittersentimentanalysis.pdf
Simsek, M. and Ozdemir, S. (2012) 'Analysis of the relation between Turkish twitter messages and stock market index'.
Ucd.ie (2014). CeADAR. [online] Available at: http://www.ucd.ie/ceadar/ (Accessed 26 May 2014).
Ucd.ie (2014). Brian Mac Namee: CeADAR. [online] Available at: http://www.ucd.ie/ceadar/people/principalinvestigators/brianmacnamee/ (Accessed 26 May 2014).

Appendix

Project Materials: https://drive.google.com/folderview?id=0b4pkbial1w7cqzvvakgwq3psnfk&usp=sharing

Project Proposal

Introduction

The purpose of this project is to study and analyse the activities and trends associated with the Mobile World Congress 2014, which is being held from the 24th to the 27th of February 2014. The Mobile World Congress is the world's largest exhibition for the mobile industry; mobile operators, device manufacturers and technology providers are all represented. With a large number of manufacturers attending and launching products, the subject can be quite broad. The objective of this project is to analyse Twitter feeds for activity and trends associated with the top mobile manufacturers before, during and after the event, and to see how their stock market share prices are connected to and affected by the Twitter feeds.

Background

As Twitter matures, top brands have realized just how relevant Twitter can be as a marketing and engagement platform. According to Useful Social Media, 98% of top brands are on Twitter and 92% of top brands tweet daily. There are 230 million active users on Twitter, which gives brands a global presence. (USM)

"92% of top brands Tweet at least once daily as audiences grow. Study shows Twitter's maturity as a marketing and engagement platform. 98% of all top brands are active on Twitter. The social network has matured into a valuable and necessary channel for marketing organizations." (Usefulsocialmedia.com, 2014) i

Releases such as the Samsung Galaxy S5 will hopefully bring a surge of Twitter activity relating to Samsung during the event. According to Trusted Reviews, the Samsung Galaxy S5 will be released during the event. (Trusted Reviews)

"The Samsung Galaxy S5 release date looks set to be held in a matter of days as the Korean manufacturer issues invites to a February 24 launch event, kicking Samsung Galaxy S5 rumours into overdrive." (Trusted Reviews, 2014) ii

Using the data from the Twitter feeds, I can then analyse it against stock market share prices.
According to Mac Rumours, Samsung has the largest phone market share, with Apple in second place. (Mac Rumours)

"Apple Continues to Lose Smartphone Share, Gain Mobile Phone Share in 4Q 2013" (Mac Rumours, 2014) iii

Similar research has been done on Twitter feeds influencing share prices, but this project will focus mainly on the Mobile World Congress in relation to the share prices of the top five mobile device manufacturers.

Technical Approach

This objective will be achieved by:
1. Creating the necessary Python code to use with the Twitter API for retrieving the data.
2. Gathering all data created on Twitter related to the mobile device brands before, during and after the event.
3. Gathering stock market share prices of the mobile device brands before, during and after the event.
4. Cleaning all data gathered for analysis.
5. Analysing the gathered Twitter activity against the stock market share prices.
6. Returning the results of the analysis.

Special Resources Required

Books to be used:
1. Python for Data Analysis, McKinney, W. (2013)
2. Twitter API: Up and Running: Learn How to Build Applications with the Twitter API, Makice, K. (2009)
3. Writing Your Dissertation, Swetnam, D. & Swetnam, R. (2000)

Software to be used: Python, RStudio, MySQL, Microsoft Excel, Microsoft Project, Twitter API.

System storage to be used: Twitter API. At this stage of the project I am unaware of the amount of data that I will accumulate from Twitter.

Project Plan

Technical Details

The data will be retrieved using Python code. R and Microsoft Excel will then be used to analyse the data.

Systems/Datasets

All datasets will be collected by myself using the online Twitter API, with Python code to collect specific words and hashtags from tweets over the duration of the event's daily operating hours.

Evaluation/Test and Analysis

I cannot yet state how I will test the data, since we have had only one Data and Web Mining class so far, but I can list the types of analysis that we will be learning:
1. Classification
2. Regression (value estimation)
3. Similarity matching
4. Clustering
5. Co-occurrence grouping (frequent itemset mining)
6. Profiling (behaviour description)
7. Link prediction
8. Data reduction
9. Causal modelling

Consultation with Specialists

John O'Connor, CEO of Wellclever. Wellclever is a start-up company that provides media groups and content producers with keyword contextual online advertising solutions. I consulted John for project ideas; he has over 20 years of experience in the advertising industry. (Wellclever, 2014) iv

Oisin Creaner, project coordinator at NCI. I spoke to Oisin about project ideas using the Twitter API.

Requirements Specification

Document Control

Revision History

Date | Version | Scope of Activity | Prepared | Reviewed | Approved
20/02/2014 | 1 | Create | RC | X | X
23/02/2014 | 2 | Update | RC | X | X
24/02/2014 | 3 | Update | RC | X | X

Distribution List

Name Title Version Oisin Creaner Lecturer Samsung Customer Robert Coyle BA Robert Coyle System Developer Robert Coyle Statistician Robert Coyle Tester Robert Coyle Advertising and Marketing Division

Related Documents

Title | Comments
Proposal Document |

1 Introduction

1.1 Purpose

The purpose of this project is to study and analyze the activities and trends associated with a brand's advertising campaign. The objective is to analyze Twitter feeds for activities and trends associated with the brand before, during and after its advertising campaign, and to see how its stock market share prices are connected to and affected by the Twitter feeds. The intended customers are the brands themselves and their marketing and PR teams. As Twitter matures, top brands have realized just how relevant Twitter can be as a marketing and engagement platform. According to Useful Social Media, 98% of top brands are on Twitter and 92% of top brands tweet daily. There are 230 million active users on Twitter, which gives brands a global presence. (USM)

"92% of top brands Tweet at least once daily as audiences grow. Study shows Twitter's maturity as a marketing and engagement platform. 98% of all top brands are active on Twitter. The social network has matured into a valuable and necessary channel for marketing organizations." (Usefulsocialmedia.com, 2014) v

1.2 Project Scope

This analysis will compare different advertising campaigns run by a brand on the release of a new or updated product, and how they differ from one another. It will also look at how a brand's advertising campaign affects its stock market share prices, using historical Twitter feeds and historical stock market share prices. The project will look at an individual brand such as Samsung and acquire the necessary Twitter feeds associated with it. Using suitable programs and scripts, it should gather every mention of Samsung in the tweets, including hashtags. The data will include the time series of the tweets, which can then be matched to the time series of the stock market data.
With a budget of zero, acquiring historical Twitter feeds could be difficult, since my research has shown that Twitter has given or sold its data to outside companies, which now resell it.

1.2.1 In Scope
1. The analysis of an advertising campaign using data gathered from Twitter and stock market share prices.
2. The development of Python programs for cleaning data.
3. The development of an R program, and the use of Microsoft Excel, for the analysis of the data.

1.2.2 Out of Scope
1. The project will not provide Samsung with outside analysis of other brands' data.

1.3 Document Scope

The goal of this document is to describe the functional and non-functional requirements of the Samsung advertising campaign analysis. The stakeholder analysis was carried out before the requirements elicitation process.

1.4 Definitions, Acronyms, and Abbreviations

Term | Definition
Advertising campaign | A series of messages to promote a product.
BA | Business Analyst.
Backed-up | The process of storing information (hardware or software based).
Cloud | An Internet-based service through which an organization accesses storage, applications and servers.
Data | Information.
Excel | Microsoft Excel, a spreadsheet application used here for analyzing data.
GUI | Graphical user interface.
MoSCoW | A prioritization technique used for the functional requirements: Must, Should, Could, Won't. See Functional Requirements.
Python | A programming language.
R | A programming language.

2 User Requirements Definition

2.1 User Characteristics

As part of Samsung's $14 billion advertising and marketing campaign last year (2013), the company requires an analysis of the effectiveness of the campaign and of how its Twitter activity and stock market prices were affected. According to ibtimes.co.uk, Samsung was expected to spend $14 billion on its marketing campaign. (ibtimes.co.uk)

"The South Korean company is expected to spend around $14 billion (£8.5bn, €10.3bn) on marketing and promotion of its products in 2013, which is the biggest (as a percentage of its total revenue) advertising budget of any company ever" (ibtimes 2013) vi

Samsung has not yet released its annual report for 2014. The analysis will give Samsung better insight into the effectiveness of its advertising campaign strategy from data acquired from Twitter feeds and the stock market. This information will help Samsung manage its advertising campaigns more effectively and efficiently by directing the style and approach of each campaign towards specific products.

3 Requirements Specification

3.1 Functional Requirements

FR# | Category | Description | MoSCoW | Status
FR1 | Acquire Data 1 | The project will gather and store all necessary data from historical Twitter feeds. | M | H
FR2 | Acquire Data 2 | The project will gather and store all necessary historical stock market data for the brand, corresponding to the dates of the Twitter data acquired. | M | H
FR3 | Clean Data 1 | The correct programs will be acquired and used to clean and retrieve historical Twitter data for the brand's keywords and hashtags on certain dates. | M | H
FR4 | Clean Data 2 | The correct programs will be acquired and used to clean and retrieve historical stock market share prices for the brand at the same times and dates as the historical Twitter feed data. | M | H
FR5 | Analyse 1 | The cleaned Twitter data is then analysed and compared. | M | H
FR6 | Analyse 2 | The cleaned stock market data is then analysed and compared. | M | H
FR7 | Publish Data | The analysis will then be published and made available to the customer. | M | H

3.1.1 Use Case Diagram

[Figure: Use case diagram, Overall Functional Requirements]

3.1.2 Requirement 1: Acquire Data 1 and 2

3.1.2.1 Description & Priority

The scope of this use case is to gather all the data necessary to carry out the analysis and continue to the next stage of the project. This requirement has very high priority and is essential to progressing to the next stage of the analysis.

3.1.2.2 Use Case Scope

The system shall source the historical Twitter and stock market data from online data resources. It shall define all access points, access the data, notify its availability and then download the data.

Description

This use case describes the process by which the data for analysis is acquired.

Use Case Diagram

Flow Description

Precondition

The data must be online. The data system must be operational at all times.

Activation: The use case is activated when the programmer connects to the system online.

Main Flow
1. Step: 1A. Programmer and System Developer source data.
2. Step: 2A. Programmer and Business Analyst validate data with the Customer.
3. Step: 3A. Programmer accesses the data.
4. Step: 4A. Programmer notifies the System Developer that the data is available.
5. Step: 5A. Programmer downloads data for cleaning.

Alternate Flow
1. Step: 1A. Programmer and System Developer source data.
2. Step: 2A. Programmer and Business Analyst validate data with the Customer.
3. Step: 2A. Customer does not validate data. Step 1A is restarted.
4. Step: 1A. Programmer and System Developer source data.
5. Step: 2A. Programmer and Business Analyst validate data with the Customer.
6. Step: 3A. Programmer accesses the data.
7. Step: 4A. Programmer notifies the System Developer that the data is available.
8. Step: 5A. Programmer downloads data for cleaning.

Exceptional Flow
1. Step: 1A. Programmer and System Developer source data.
2. Step: 2A. Programmer and Business Analyst validate data with the Customer.
3. Step: 2A. Customer does not validate data. Data is unavailable.
4. Use case ends.

Termination: The system has gathered all necessary data. The data is then exported to the cloud storage system. This process has now been terminated.

Post Condition: All data gathered; move on to the next step.

3.1.3 Requirement 2: Clean Data 1 and 2

3.1.3.1 Description & Priority

The scope of this use case is to clean all the data gathered in the previous requirement. A programmer and a tester investigate the data for errors, such as missing data, and fix them. This requirement has a very high priority and is essential to progressing to the next stage of the analysis.

3.1.3.2 Use Case Scope

The system shall clean all data sets gathered in the previous requirement. Define all error points, get recommendations for fixing the errors, fix the errors and then export the data for analysis.

Description

This use case describes the process by which the data is cleaned for analysis.

Use Case Diagram

Flow Description

Precondition: The data must be stored and available for cleaning at all times.

Activation: The use case is activated when the programmer connects to the cloud storage system and retrieves the data.

Main Flow
1. Step: 1B. Programmer and System Developer retrieve data from the cloud storage system.
2. Step: 2B. Programmer and Tester identify errors in the data set.
3. Step: 3B. Programmer receives recommendations from the System Developer.
4. Step: 4B. Programmer, with the help of the Tester, fixes the errors and notifies the System Developer.
5. Step: 5B. Programmer exports the data for analysis.

Alternate Flow
1. Step: 1B. Programmer and System Developer retrieve data from the cloud storage system.
2. Step: 2B. Programmer and Tester identify errors in the data set.
3. Step: 3B. Programmer receives recommendations from the System Developer.
4. Step: 4B. Programmer, with the help of the Tester, fixes the errors and notifies the System Developer.
5. Step: 2B. Programmer and Tester test the data set again and identify more errors.
6. Step: 3B. Programmer receives recommendations from the System Developer.
7. Step: 4B. Programmer, with the help of the Tester, fixes the errors and notifies the System Developer.
8. Step: 5B. Programmer exports the data for analysis.

Exceptional Flow
1. Step: 1B. Programmer and System Developer retrieve data from the cloud storage system.
2. Step: 2B. Programmer and Tester identify errors in the data set.
3. Step: 3B. Programmer receives recommendations from the System Developer.
4. Step: 4B. Programmer, with the help of the Tester, cannot fix the errors. Data is corrupt.
5. Use case ends.

Termination: The system has cleaned all acquired data. The data is then saved to the cloud storage system and exported for analysis. This process has now been terminated.

Post Condition: All data cleaned; move on to the next step.
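The cleaning use case can be illustrated with the daily price series. The Python sketch below is one possible approach, not the project's own program. It treats a missing or non-positive closing price as an error and fixes it by carrying forward the previous day's value. It ends in the exceptional flow (raising `CorruptDataError`) when there is no earlier value to carry forward or when too many values are bad. The 20% threshold and all names are assumptions.

```python
from datetime import date
from typing import Optional


class CorruptDataError(Exception):
    """Raised when errors cannot be fixed (the exceptional flow)."""


def is_error(value: Optional[float]) -> bool:
    """A closing price is treated as an error if it is missing or not positive."""
    return value is None or value <= 0


def clean_closes(raw: dict[date, Optional[float]],
                 max_error_ratio: float = 0.2) -> dict[date, float]:
    """Fix errors in a daily closing-price series by carrying values forward.

    Each erroneous day takes the previous day's cleaned value. If the share
    of erroneous days exceeds max_error_ratio (an assumed threshold), or the
    first day is already an error, the data set is declared corrupt.
    """
    days = sorted(raw)
    n_errors = sum(is_error(raw[d]) for d in days)
    if days and n_errors / len(days) > max_error_ratio:
        raise CorruptDataError(f"{n_errors} of {len(days)} values are invalid")
    cleaned: dict[date, float] = {}
    previous: Optional[float] = None
    for d in days:
        value = raw[d]
        if is_error(value):
            if previous is None:
                raise CorruptDataError(f"no earlier value to carry into {d}")
            value = previous
        cleaned[d] = value
        previous = value
    return cleaned
```

Carrying the last good value forward keeps one row per day, which later steps need for date alignment. The threshold decides when fixing turns into guessing.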

3.1.4 Requirement 3: Analyze Data

3.1.4.1 Description & Priority

The scope of this use case is to analyze all the data gathered and cleaned in the previous requirements. A Business Analyst and a Statistician examine and study the data for analysis. This requirement has a very high priority and is essential to progressing to the next stage of the analysis.

3.1.4.2 Use Case Scope

This process relies on the skills and management of the Statistician and the Business Analyst to compare and analyze all data. The process shall calculate and prove or predict outcomes from the data, using graphs for visualization. All proven data is then backed up and stored.

Description

This use case describes the process by which the data is analyzed.
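The use case does not specify the Statistician's calculations. One common starting point, assumed here rather than taken from the project, is the Pearson correlation coefficient between daily tweet volume and the daily percentage change in the closing price. A dependency-free Python sketch:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy) of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def pct_changes(closes: list[float]) -> list[float]:
    """Percentage change from each closing price to the next one."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


if __name__ == "__main__":
    closes = [100.0, 102.0, 101.0, 104.0, 103.0]  # illustrative values, 5 days
    tweet_counts = [120, 80, 150, 60]             # tweets on days 2..5 (illustrative)
    print(round(pearson(tweet_counts, pct_changes(closes)), 3))
```

The change for day t uses the closes of days t-1 and t, so n closes yield n-1 changes. Pair those changes with the tweet counts for days 2 to n.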

Use Case Diagram

Flow Description

Precondition: The data must be available for analysis at all times.

Activation: The use case is activated when the BA and the Statistician connect to the cloud storage system and retrieve the data.

Main Flow
1. Step: 1C. BA and Statistician retrieve data from the cloud storage system.
2. Step: 2C. The Statistician and BA explore and understand the data set.
3. Step: 3C. Statistician begins the calculations.
4. Step: 4C. Statistician and BA begin to visualize the data.
5. Step: 5C. Programmer backs up and stores the findings with the approval of the BA.

Alternate Flow
1. Step: 1C. BA and Statistician retrieve data from the cloud storage system.
2. Step: 2C. The Statistician and BA explore and understand the data set.
3. Step: 3C. Statistician begins the calculations.
4. Step: 4C. Statistician and BA begin to visualize the data. BA requests that the data be recalculated with a different approach.
5. Step: 3C. Statistician begins the new calculations.
6. Step: 4C. Statistician and BA begin to visualize the data.
7. Step: 5C. Programmer backs up and stores the findings with the approval of the BA.

Exceptional Flow
1. Step: 1C. BA and Statistician retrieve data from the cloud storage system.
2. Step: 2C. The Statistician and BA explore the data set but are unable to understand it. BA requests a new data set.
3. Use case ends.

Termination: The analysis is completed. The data is then saved to the cloud storage system and exported for publishing. This process has now been terminated.

Post Condition: All data analyzed; move on to the next step.

3.1.5 Requirement 4: Publish Data

3.1.5.1 Description & Priority

The scope of this use case is to publish the findings of the analysis approved in the previous requirements. A Business Analyst consults the Customer on topics such as the proprietor of the data, the goal of the publication, the target audience/data consumer (whether the data is confidential and for internal use only), the media in which it is published and the release date. This requirement has a very high priority.

3.1.5.2 Use Case Scope

This process relies on the communication and business skills of the BA in handling the customer's requirements and outcomes. The process involves the Customer, the BA and the Advertising/Publications division. The process shall publicize the findings to the desired audience with the approval of the customer and the recommendations of the BA.

Description

This use case describes the process by which the data is publicized.

Use Case Diagram

Flow Description

Precondition: The data must be available for analysis at all times. The Customer/Client must also be available at all times.

Activation: The use case is activated when the findings are presented to the BA, the Customer and the Advertising/Publication Division, and all three are engaged in communication.

Main Flow
1. Step: 1D. BA, Customer and Advertising/Publication Division retrieve the analysis findings. The findings have the owner's approval.
2. Step: 2D. BA and Customer discuss the objective of the findings release.
3. Step: 3D. BA and Customer agree on the target audience/data consumer.
4. Step: 4D. Customer decides the medium type, i.e. the style and method of publicizing the data (e.g. websites, newspaper), with the BA's approval and the assistance of the Advertising/Publication Division.
5. Step: 5D. BA notifies the Advertising/Publication Division to publish the data.

Alternate Flow
1. Step: 1D. BA, Customer and Advertising/Publication Division retrieve the analysis findings. The findings have the owner's approval.
2. Step: 2D. BA and Customer discuss the objective of the findings release.
3. Step: 3D. BA and Customer agree on the target audience/data consumer.
4. Step: 4D. Customer decides the medium type, i.e. the style and method of publicizing the data (e.g. websites, newspaper), with the BA's approval and the assistance of the Advertising/Publication Division. Customer decides to return to Step 3D to change the publication approach.
5. Step: 3D. BA and Customer agree on a new target audience/data consumer.
6. Step: 4D. Customer decides the medium type, i.e. the style and method of publicizing the data (e.g. websites, newspaper), with the BA's approval and the assistance of the Advertising/Publication Division.
7. Step: 5D. BA notifies the Advertising/Publication Division to publish the data.

Exceptional Flow
1. Step: 1D. BA, Customer and Advertising/Publication Division retrieve the analysis findings. The findings do not have the owner's approval. The Customer decides not to publicize the findings because of their high importance and confidentiality.
2. Use case ends.

Termination: The publication of the data is completed. This process has now been terminated.

Post Condition: All data publicized; all steps completed.

3.2 Non-Functional Requirements

3.2.1 Availability: Must Have
The information must be available at all times for analysis.

3.2.2 Storage Requirements: Must Have
The data kept during and after the analysis should be stored in a secure facility. Cloud storage security protocols must be assessed. There must be enough capacity in the cloud to hold the large amount of data.

3.2.3 Connection Reliability: Must Have
There must be a reliable connection at all times when retrieving, uploading and updating the data. A lost connection could result in lost data.

3.2.4 Connection Speed: Must Have
There must be a fast online connection when retrieving, uploading and updating the data, since a large data set could take some time to upload.

3.2.5 Backup and Recovery: Must Have
The data must be easy to access, back up and update. There must be a system recovery procedure in case of a system failure.

3.2.6 Program to clean data: Must Have
The analysis must have the correct programs to clean and fix any errors in the data.

3.2.7 Software Analysis tools: Must Have
The analysis must have the correct software analysis tools, which all divisions of the analysis can use.
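Requirement 3.2.5 can be made testable by verifying each backup against a checksum. The following Python sketch uses only the standard library. It copies a data file to a backup directory, confirms the copy with a SHA-256 digest, and restores the file only if the digest still matches. The local directory layout and the function names are assumptions, and the cloud storage mentioned in 3.2.2 is not modelled.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 64 KiB chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(src: Path, backup_dir: Path) -> Path:
    """Copy src into backup_dir, then check that the copy is byte-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / src.name
    shutil.copy2(src, dest)
    if sha256_of(dest) != sha256_of(src):
        raise OSError(f"backup of {src} failed verification")
    return dest


def restore(backup_file: Path, target: Path, expected_sha256: str) -> None:
    """Recover a file from backup, refusing if the backup has been damaged."""
    if sha256_of(backup_file) != expected_sha256:
        raise OSError(f"{backup_file} does not match the recorded digest")
    shutil.copy2(backup_file, target)
```

Recording the digest when the data is first stored gives recovery a fixed reference point. A damaged backup is then rejected instead of silently restored.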

3.2.8 Communication Requirements: Must Have
The analysis must have constant communication between all divisions/parties in the decision-making process.

3.2.9 Security: Must Have
The analysis must have high security measures, as it is operating with highly confidential data. Only key divisions involved in the analysis may have access to the data.

3.2.10 Data Validation: Must Have
This process requires the use of external services in order to download the data. Once the data is gathered from the services (Twitter, NASDAQ), it should be validated.

5 Interface Requirements

5.1 GUI

Figure: An example of an analysis of tweets. (vii) comprendia, 2014

Figure: Examples of tweets analyzed in Microsoft Excel and GeoFlow

(viii) powerpivotblog, 2013

Figure: Analysis of tweets using the R language. (ix) evolutionanalytics, 2013

Figure: Example of Excel data for an introduction to regression, using stock market data. (x) skilledup, 2013

Figure: Example of an analysis completed in RStudio. (xi) datamachines, 2012

6 Analysis Evolution

The analysis will evolve over time to produce a much more focused outcome, differentiating itself by analyzing a specific product in the Samsung product range. This can be done by changing the key words mined from the Twitter data to focus on a product line such as Samsung's Galaxy products, which include the smartphone, tablet and watch. Suppose the customer, Samsung, required an analysis focused on the release of a specific product such as the Galaxy S4, which was released in April 2013. This could be done by narrowing the search to key words and hash tags such as #samsungs4, #SamsungGalaxyS4, #GalaxyS4 and #S4, and narrowing the time line to the release date of the phone.
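The narrowing described above, by hash tag and by date window, can be sketched in Python as shown below. The hash tags are the ones listed in the text. The seven-day window on each side of the release and the function names are assumptions. Because the text gives only the month of the Galaxy S4 release, the caller supplies the exact release day.

```python
from datetime import date, datetime, timedelta

# Hash tags listed in the text for the Galaxy S4 example, stored in lower case.
S4_TAGS = frozenset({"#samsungs4", "#samsunggalaxys4", "#galaxys4", "#s4"})


def mentions_product(text: str, tags: frozenset[str] = S4_TAGS) -> bool:
    """True if any word of the tweet, ignoring case and surrounding punctuation, is a tag."""
    words = {word.strip(".,!?:;").lower() for word in text.split()}
    return not words.isdisjoint(tags)


def in_window(posted: datetime, release: date,
              days_before: int = 7, days_after: int = 7) -> bool:
    """True if the tweet was posted within the window around the release date."""
    start = release - timedelta(days=days_before)
    end = release + timedelta(days=days_after)
    return start <= posted.date() <= end


def focus(tweets: list[tuple[datetime, str]], release: date) -> list[tuple[datetime, str]]:
    """Keep only tweets about the product that were posted near its release."""
    return [(posted, text) for posted, text in tweets
            if mentions_product(text) and in_window(posted, release)]
```

For example, `focus(tweets, date(2013, 4, 26))` would keep Galaxy S4 tweets posted from 19 April to 3 May 2013. The 26 April day is an illustrative assumption.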

Progress Management Report 1

1 Document Location
This document will be uploaded through Turnitin.

Revision History
Date of this revision: 9/03/14

Revision date | Previous revision date | Summary of changes | Changes marked
9/03/14 | | First Issue |

Approvals
This project requires the following approvals.

Name | Signature | Title | Date of issue | Version
Robert Coyle | | Project Manager | 10/03/14 | 1

Distribution

Name | Title | Date of issue | Version
Oisin Creaner | Project Lecturer | 10/03/14 | 1

Purpose of Document
The purpose of this document is to provide the project lecturer, Oisin Creaner, with a summary of the status of the project.

Date of report: 09/03/14
Period covered: 10/02/14 to 9/03/14

Schedule Status
This project is still on schedule at this interval.

Figure: Updated Gantt chart (tasks: Project Proposal; Create Python codes; Data retrieval from Twitter API and …; Data retrieval from Twitter API and …; Management Progress Report 1; Management Progress Report 2; timeline 03-Feb to 24-Apr)

Definitions, Acronyms, and Abbreviations

Term | Definition
API | Application programming interface
JSON | JavaScript Object Notation
NASDAQ | National Association of Securities Dealers Automated Quotations (US stock exchange)
RSS | Rich Site Summary

Products completed during this period

Product | Status
Project proposal | The project proposal was completed on time. See (Coyle, 2014)
Requirements specification | The requirements specification was completed on time, with changes to the project scope. See (Coyle, 2014)

Problems

Actual
Accessing Twitter API: Access to the Twitter API has been more difficult than first anticipated because of changed regulations and an updated version of the Twitter API. The API only supports JSON.
Acquiring free historical data: Historical feeds are proving difficult to obtain, as Twitter has sold its data to approved sites for resale. As this project has no budget, this has had a high impact on the plan. Twitter has released an online grant application form for access to its historical data.

Potential
The quality and quantity of the Twitter data: As I do not yet have the JSON data, I am not sure what data will be returned. Using a site called Twilert, I acquired some data, but the site will not gather more than the first 100 RSS feeds, which renders the service useless.
Gathering the data in the required time: Once I have a response from the Twitter developers' grant, I can determine whether the historical data can be acquired and progress to the next stage of the project.
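Because the Twitter API returns JSON, each tweet arrives as an object rather than as a line of text. The Python sketch below reduces tweets to the fields this project needs, using the `created_at`, `text` and `entities.hashtags` fields of version 1.1 tweet objects. It assumes the tweets are already in a JSON array, whereas the v1.1 search endpoint wraps them in a `statuses` list. Authentication is not shown, and the function name is hypothetical.

```python
import json
from datetime import datetime

# Layout of "created_at" in Twitter API v1.1 tweet objects,
# e.g. "Mon Apr 14 09:30:00 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweets(payload: str) -> list[dict]:
    """Reduce a JSON array of tweet objects to time, text and hash tags."""
    rows = []
    for tweet in json.loads(payload):
        hashtags = tweet.get("entities", {}).get("hashtags", [])
        rows.append({
            "created_at": datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT),
            "text": tweet["text"],
            # Hash tags are listed without the leading '#'; lower-case them for matching.
            "hashtags": [tag["text"].lower() for tag in hashtags],
        })
    return rows
```

Keeping the parsed timestamp as a timezone-aware `datetime` lets tweets be grouped by trading day later without guessing their time zone.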

RAID Log: Risks

Assumptions

Issues

Dependency

Products due for completion
By the next period the following should be accomplished.

Product | Description
Gathering of Twitter feeds | Should have gathered all Twitter data, either historical or real time, in relation to Samsung.
Gathering of stock market data | Should have gathered all NASDAQ data in relation to Samsung in the same time series as the Twitter data.
Analysis of data | Once all data has been gathered, analysis can take place.
Preliminary presentation | Should have the preliminary presentation completed.
Projects write up | Commenced first draft.
Management Progress Report 2 | This report will be the end of this period.

Project Issues Status
We currently have two issues on the project issue log. These have not been resolved and are currently outstanding. Both are awaiting a response from an external client.

Conclusion
Even with the setbacks, this project is still capable of finishing within the original target dates. Gathering all the data in the next week is paramount to the success of the project, and any further delays will compromise its quality. I am currently waiting on a response from Twitter about its Developer grant scheme. If this is approved, all the historic data from January 2013 to March 2014 will be available and can be gathered as JSON through the Twitter API (see Dependencies Ref: D02). All necessary information, such as dates, key words and hash tags, has been submitted to the Twitter Developer Grant scheme.

Alternatives: If this grant is not approved, or if approval takes too long, the project can revert to streaming the data live from Twitter, again as JSON.

Progress Management Report 2

Document Location
This document will be uploaded through Turnitin.

Revision History
Date of this revision: 30/03/14

Revision date | Previous revision date | Summary of changes | Changes marked
30/03/14 | | First Issue |

Approvals
This project requires the following approvals.

Name | Signature | Title | Date of issue | Version
Robert Coyle | | Project Manager | 30/03/14 | 1

Distribution

Name | Title | Date of issue | Version
Oisin Creaner | Project Lecturer | 30/03/14 | 1

Purpose of Document
The purpose of this document is to provide the project lecturer, Oisin Creaner, with a summary of the status of the project.

Date of report: 30/03/14
Period covered: 10/03/14 to 30/03/14

Schedule Status
This project is still on schedule at this interval.

Figure: Updated Gantt chart (tasks: Project Proposal; Create Python codes; Data retrieval from Twitter API and …; Data retrieval from Twitter API and …; Management Progress Report 1; Management Progress Report 3; timeline 03-Feb to 14-May)

Definitions, Acronyms, and Abbreviations

Term | Definition
API | Application programming interface
JSON | JavaScript Object Notation
NASDAQ | National Association of Securities Dealers Automated Quotations (US stock exchange)
RSS | Rich Site Summary

Products completed during this period

Product | Status
Progress Management Report 1 | Progress Management Report 1 was completed on time. See (Coyle, 2014)

Problems

Actual
Accessing Twitter API: On the advice of the project lecturers, the decision has been made to duplicate the Twitter feeds gathered with the Twilert application. Twilert provides a free service for accessing live Twitter feeds, but it only delivers 100 RSS feed items (tweets) per day. The trial lasts for 15 days, so it will provide the project with up to 1,500 tweets (100 × 15). These tweets will then be duplicated to match the historic stock market prices, which are daily end-of-day prices.

Potential
The quality and quantity of the Twitter data provided by Twilert: The Twitter data provided by Twilert must be of good quality, and having enough data is essential. Otherwise, data will be duplicated.
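The Twilert feed can be turned into the daily counts needed to line up with end-of-day prices. The Python sketch below assumes the feed is standard RSS 2.0, with one `<item>` per tweet and an RFC 822 `<pubDate>`. Twilert's exact feed layout is not documented here, so this is an assumption. The sketch also records that 100 items per day over a 15-day trial gives at most 100 × 15 = 1,500 tweets.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import date
from email.utils import parsedate_to_datetime

DAILY_LIMIT = 100                      # items per day on the free service (from the report)
TRIAL_DAYS = 15                        # length of the trial (from the report)
MAX_TWEETS = DAILY_LIMIT * TRIAL_DAYS  # 1500: a ceiling, not a guaranteed total


def tweets_per_day(rss_xml: str) -> dict[date, int]:
    """Count RSS 2.0 <item> elements per calendar day of their <pubDate>."""
    root = ET.fromstring(rss_xml)
    counts: Counter = Counter()
    for item in root.iter("item"):
        published = item.findtext("pubDate")
        if published:  # items without a date are skipped rather than guessed
            counts[parsedate_to_datetime(published).date()] += 1
    return dict(counts)
```

Any day that reaches `DAILY_LIMIT` is likely to have been truncated by the service. Such days deserve a flag before the counts are compared with prices.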

RAID Log: Risks
Date last reviewed: 30/03/2014

Open Risks

Risk Ref | Risk Category | Risk Description | Raised by | Date Identified | Priority | Impact | Probability | Mitigation Category | Mitigation | Owner | Date updated
R01 | technology | No data backup available | R.Coyle | 10-Feb-14 | H | H | L | prevention | Source online storage for data. | RC | 10-Feb-14
R02 | cost | Acquiring data for free. | R.Coyle | 10-Feb-14 | M | M | L | acceptance | Source free historic Twitter feeds. | RC | 10-Feb-14
R03 | time | Acquiring data on time. | R.Coyle | 10-Feb-14 | M | H | H | prevention | Source the data on time. | RC | 10-Feb-14

Closed Risks

Risk Ref | Risk Category | Risk Description | Raised by | Date Identified | Priority | Impact | Probability | Mitigation Category | Mitigation | Owner | Date updated
R01 | technology | No data backup available | R.Coyle | 17-Feb-14 | H | H | L | prevention | Source hard drive for storage. | RC | 10-Jun-14
R02 | cost | No costs needed for use of data | R.Coyle | 24-Mar-14 | L | L | L | acceptance | Using different data. | RC | 24-Mar-14
R03 | time | Data will be acquired on time. | R.Coyle | 24-Mar-14 | M | H | H | contingency | Source the data on time. | RC | 24-Mar-14

Assumptions
The purpose of this document is to surface, document, analyse and monitor the key assumptions upon which the plan is based. Planning parameters, design parameters, issues and risks will be generated from these assumptions.

Ref # | Assumption | Importance | Certainty | Influence | Test | Test Date
A01 | Lecturers will provide prompt feedback and guidance. | 4 - critical | 3 - Probable | H | Send request to test level of response. | 10-Feb-14
A02 | Twitter will reply to my grant request for the use of their historic data. | 2 - somewhat | 1 - unknown | L | Wait for reply. | 03-Mar-14
A03 | RSS feeds gathered from Twitter are not missing data. | 3 - important | 4 - Fact | H | Unknown as of yet. | 30-Mar-14
A04 | Skills developed for analysis of data. | 4 - critical | 4 - Fact | H | Continue attending lectures. | 03-Mar-14

Issues
Issues are unexpected incidents or events.

Issue Ref | Issue Description | Raised by | Date Raised | Impact | Priority | Action Plan | Status | Owner | Target Resolution Date | Actual Resolution Date
I01 | Unexpected issue in accessing Twitter feeds. | RC | 17-Feb-14 | H | H | Identify different means of accessing the Twitter feeds. | open | RC | 10-Feb-14 |
I02 | Twitter API access more complex than anticipated. | RC | 03-Mar-14 | H | H | This issue has been raised with the project lecturers. Awaiting response. | closed | RC | 03-Mar-14 | 24-Mar-14
I03 | No response from Twitter developer data grant scheme. | RC | 24-Mar-14 | H | M | This issue has been raised with the project lecturers. An alternative solution has been provided. | closed | RC | 24-Mar-14 | 30-Mar-14

Dependency

Dependency Ref | Project | Dependency Description | Raised by | Date Raised | Impact | Priority | Period Affected | Action Plan | Owner | Target Resolution Date | Actual Resolution Date
D01 | NCI Facilities | IT facilities available for running Twitter API | RC | 10-Feb-14 | H | H | Feb - Mar | Confirm availability with IT. | RC | Mar-14 | Mar-14
D02 | External Expert | Twitter historical data grant approval. | RC | 03-Mar-14 | L | L | Mar - Apr | Awaiting response from Twitter for historical data grant approval. | RC | Mar-14 | Mar-14
D03 | External Expert | Acquire Twitter data from Twilert. | RC | 30-Mar-14 | M | H | Mar - Apr | Awaiting response from external client. | RC | Apr-14 |

Products due for completion
By the next period the following should be accomplished.

Product | Description
Gathering of Twitter feeds | Should have gathered all Twitter data in relation to Samsung.
Gathering of stock market data | Should have gathered all NASDAQ data in relation to Samsung.
Analysis of data | Once all data has been gathered, analysis can take place.
Projects write up | Commenced first draft.
Management Progress Report 3 | This report will be the end of this period.
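The project names Yahoo Finance as the source for the stock market data. A minimal Python sketch for loading such a download follows. It assumes the CSV layout Yahoo Finance offered for historical prices at the time: columns Date, Open, High, Low, Close, Volume and Adj Close, with dates written as YYYY-MM-DD and the newest day first. That layout should be checked against the actual file. The sketch also applies two simple validation checks in the spirit of the data validation requirement: Low must not exceed High, and no date may appear twice.

```python
import csv
import io
from datetime import date


def load_closes(csv_text: str, column: str = "Close") -> dict[date, float]:
    """Read one price per day from a historical-prices CSV, oldest day first.

    Assumed layout: a header row with Date, Open, High, Low, Close, Volume and
    Adj Close, and dates written as YYYY-MM-DD. Each row is sanity-checked
    before use, as a minimal form of data validation.
    """
    prices: dict[date, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["Date"])
        if float(row["Low"]) > float(row["High"]):
            raise ValueError(f"{day}: Low is above High")
        if day in prices:
            raise ValueError(f"{day}: duplicate row")
        prices[day] = float(row[column])
    # Sort by date so later steps can rely on chronological order.
    return dict(sorted(prices.items()))
```

Passing `column="Adj Close"` would use prices adjusted for splits and dividends instead of raw closes. That choice matters when a series spans a corporate action.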

Conclusion
This project is still on course for completion within the requested timeline. The project data source has changed, since there has been no reply from the Twitter research data grant scheme about access to its historical data. Twilert will now provide the data for the project. It has proven to be a reliable source but can only provide access to 100 RSS feeds per day. This data will, however, be duplicated to provide enough data to complete the project. Yahoo Finance will provide the historical stock market prices.

Alternatives: If the Twitter developer grant is approved within the next two weeks, the project can revert to using the correct historical data.

Progress Management Report 3

Document Location
This document will be uploaded through Turnitin.

Revision History
Date of this revision: 20/04/14

Revision date | Previous revision date | Summary of changes | Changes marked
20/04/14 | | First Issue |

Approvals
This project requires the following approvals.

Name | Signature | Title | Date of issue | Version
Robert Coyle | | Project Manager | 20/04/14 | 1

Distribution

Name | Title | Date of issue | Version
Oisin Creaner | Project Lecturer | 20/04/14 | 1

Purpose of Document
The purpose of this document is to provide the project lecturer, Oisin Creaner, with a summary of the status of the project.

Date of report: 20/04/14
Period covered: 1/04/14 to 20/04/14

Schedule Status
This project is still on schedule at this interval.

Figure: Updated Gantt chart (tasks: Project Proposal; Create Python codes; Data retrieval from Twitter API and …; Data retrieval from Twitter API and …; Management Progress Report 1; Management Progress Report 3; timeline 03-Feb to 03-Jun)

Definitions, Acronyms, and Abbreviations

Term | Definition
API | Application programming interface
JSON | JavaScript Object Notation
NASDAQ | National Association of Securities Dealers Automated Quotations (US stock exchange)
RSS | Rich Site Summary

Products completed during this period

Product | Status
Acquired Stock Data | This was completed on 20-04-14.
Acquired Twitter Data | This was completed on 20-04-14.

Problems

Actual
Analysis of data: The decision has been made to use companies listed on the same stock market. The three brands I have chosen are on the NASDAQ stock exchange. This avoids the problems of differing currencies and trading time frames associated with foreign stock exchanges.

Potential
Cleaning Twitter data: It is uncertain whether the Twitter data acquired using JavaScript can be cleaned in the short time frame that is left.

RAID Log: Risks
Date last reviewed: 20/04/2014

Open Risks

Risk Ref | Risk Category | Risk Description | Raised by | Date Identified | Priority | Impact | Probability | Mitigation Category | Mitigation | Owner | Date updated
R01 | technology | No data backup available | R.Coyle | 10-Feb-14 | H | H | L | prevention | Source online storage for data. | RC | 10-Feb-14
R02 | cost | Acquiring data for free. | R.Coyle | 10-Feb-14 | M | M | L | acceptance | Source free historic Twitter feeds. | RC | 10-Feb-14
R03 | time | Acquiring data on time. | R.Coyle | 10-Feb-14 | M | H | H | prevention | Source the data on time. | RC | 10-Feb-14
R04 | time | Data analysis. | R.Coyle | 20-Apr-14 | H | H | M | prevention | Prepare and analyze data. | RC | 21-Apr-14

Closed Risks

Risk Ref | Risk Category | Risk Description | Raised by | Date Identified | Priority | Impact | Probability | Mitigation Category | Mitigation | Owner | Date updated | End Date
R01 | technology | No data backup available | R.Coyle | 17-Feb-14 | H | H | L | prevention | Source hard drive for storage. | RC | 10-Jun-14 |
R02 | cost | No costs needed for use of data | R.Coyle | 24-Mar-14 | L | L | L | acceptance | Using different data. | RC | 24-Mar-14 |
R03 | time | Data is acquired. | R.Coyle | 24-Mar-14 | M | H | H | contingency | Source the data on time. | RC | 20-Apr-14 | 20-Apr-14

Assumptions
The purpose of this document is to surface, document, analyze and monitor the key assumptions upon which the plan is based. Planning parameters, design parameters, issues and risks will be generated from these assumptions.

Ref # | Assumption | Importance | Certainty | Influence | Test | Test Date
A01 | Lecturers will provide prompt feedback and guidance. | 3 - important | 3 - Probable | M | Send request to test level of response. | 10-Feb-14
A04 | Skills developed for analysis of data. | 4 - critical | 4 - Fact | H | Continue attending lectures. | 03-Mar-14
A05 | Data can be cleaned and prepared for analysis. | 4 - critical | 4 - Fact | H | Project lecturers can assist during lecture hours. | 20-Apr-14
A05 | Cleaned data is adequate and can be analyzed. | 4 - critical | 4 - Fact | H | Project lecturers can assist during lecture hours. | 20-Apr-14

Issues

Issue Ref | Issue Description | Raised by | Date Raised | Impact | Priority | Action Plan | Status | Owner | Target Resolution Date | Actual Resolution Date
I01 | Unexpected issue in accessing Twitter feeds. | RC | 17-Feb-14 | H | H | Data was acquired. | closed | RC | 10-Feb-14 | 20-Apr-14
I02 | Twitter API access more complex than anticipated. | RC | 03-Mar-14 | H | H | This issue has been raised with the project lecturers. Awaiting response. | closed | RC | 03-Mar-14 | 24-Mar-14
I03 | The response from the Twitter developer data grant scheme came back rejected. | RC | 24-Mar-14 | L | L | This issue has been raised with the project lecturers. An alternative solution has been provided. | closed | RC | 24-Mar-14 | 20-Apr-14

Dependency

Dependency Ref | Project | Dependency Description | Raised by | Date Raised | Impact | Priority | Period Affected | Action Plan | Owner | Target Resolution Date | Actual Resolution Date
D01 | NCI Facilities | IT facilities available for running Twitter API | RC | 10-Feb-14 | H | H | Feb - Mar | Confirm availability with IT. | RC | Mar-14 | Mar-14

Products due for completion
By the next period the following should be accomplished.

Product | Description
Cleaning of Twitter data | Twitter data will be cleaned and a time series prepared for analysis.
Cleaning of stock market data | Stock data will be cleaned and a time series prepared for analysis; the stock market time series is per day.
Analysis of data | Once all data has been gathered and cleaned, analysis will begin.
Projects write up | Commenced first draft.

Conclusion
This project is still on course for completion within the requested timeline. The project data source has changed since the Twitter Historical Data grant was denied. I have now gathered a week's worth of Twitter data associated with three companies on the same stock exchange. I will now focus on Apple Inc., Tesla Motors, Inc. and Microsoft Corporation. Because these technology companies are all listed on the same stock exchange (NASDAQ), the analysis will be more straightforward. Samsung Electronics, the company I originally selected as the basis of the analysis, is listed on the Korean stock market. Its time series would differ, and I would also have to adjust for the currency difference. Yahoo Finance will provide the historical stock market prices. I am hoping to find a correlation between the Twitter activity and the stock market prices of the three brands with a lag of around three to four days.

Alternatives: If I can gather the stock market prices in hourly format, the analysis would be more detailed.
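The hoped-for lag of three to four days can be tested by shifting one series against the other and recomputing the correlation at each lag. The Python sketch below pairs the tweet count on day t with the price on day t + k, for k = 0 to 4 trading days. The names and the minimum of three pairs per lag are assumptions. A week of data covers roughly five trading days, so a lag of three or four days leaves only two pairs or one pair. This is why such lags are skipped.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # ZeroDivisionError if a series is constant


def lagged_correlations(tweet_counts: list[float], prices: list[float],
                        max_lag: int = 4, min_pairs: int = 3) -> dict[int, float]:
    """Correlate tweet volume on day t with the price on day t + lag.

    Both lists hold one value per trading day, oldest first. For lag k the
    last k counts and the first k prices are dropped, leaving n - k pairs;
    lags with fewer than min_pairs pairs are skipped.
    """
    if len(tweet_counts) != len(prices):
        raise ValueError("the two series must cover the same days")
    n = len(prices)
    results: dict[int, float] = {}
    for k in range(max_lag + 1):
        if n - k >= min_pairs:
            results[k] = pearson(tweet_counts[: n - k], prices[k:])
    return results
```

The conclusion describes correlating price levels. Because prices tend to trend, correlating daily price changes instead is a common alternative.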

References

Usefulsocialmedia.com. 2014. Twitter Evolves: Becoming more brand friendly. Useful Social Media. [Online] Available at: http://www.usefulsocialmedia.com/measurement/twitter-evolves- -becomingmore-brand-friendly [Accessed 9 February 2014].
Johnson, L. 2014. Samsung Galaxy S5 release date, news, rumours, specs and price. Trusted Reviews. [Online] Available at: http://www.trustedreviews.com/news/samsung-galaxy-s5-release-date-newsrumours-specs-and-price [Accessed 9 February 2014].
Macrumors.com. 2014. Apple Continues to Lose Smartphone Share, Gain Mobile Phone Share in 4Q 2013. [Online] Available at: http://www.macrumors.com/2014/01/28/apple-phone-share-4q-2013/ [Accessed 9 February 2014].
Wellclever.com. 2014. Well Clever - Publisher Centric Platforms. [Online] Available at: http://wellclever.com [Accessed 9 February 2014].
usefulsocialmedia. 2014. Twitter Evolves: Becoming more brand friendly. [Online] Available at: http://www.usefulsocialmedia.com/measurement/twitter-evolves- -becomingmore-brand-friendly [Accessed 23 February 2014].
btimes.co.uk. 2013. Samsung's $14bn is 'Biggest Marketing Budget in History'. [Online] Available at: http://www.ibtimes.co.uk/samsung-14bn-marketingbudget-biggest-history-525979 [Accessed 28 February 2014].
comprendia. 2014. If A Tweet Falls In The Forest? Maximizing Twitter Engagement Through Time Of Day Analysis. [Online] Available at: http://comprendia.com/2012/07/17/if-a-tweet-falls-in-the-forest-maximizingtwitter-engagement-and-exposure-through-time-of-day-analysis/ [Accessed 24 February 2014].
powerpivotblog. 2013. Analyze a Twitter feed with Excel 2013, DataExplorer and GeoFlow. [Online] Available at: http://www.powerpivotblog.nl/analyze-atwitter-feed-with-excel-2013-dataexplorer-and-geoflow/ [Accessed 24 February 2014].
evolutionanalytics. 2013. What does Barack Obama tweet about most? [Online] Available at: http://blog.revolutionanalytics.com/2013/11/what-does-barackobama-tweet-about-most.html [Accessed 24 February 2014].
skilledup. 2013. 50+ (Mostly) Free Excel Add-Ins For Any Task. [Online] Available at: http://www.skilledup.com/learn/businessentrepreneurship/mostly-free-excel-add-ins/ [Accessed 24 February 2014].
datamachines. 2012. Decomposing North Carolina Amendment 1 with R and Tableau (part 1). [Online] Available at: http://datamachines.blogspot.ie/2012/05/decomposing-north-carolinaamendment.html [Accessed 24 February 2014].
Twilert. 2014. Twitter search alerts. [Online] Available at: http://www.twilert.com [Accessed 10 March 2014].
Twitter. 2014. Overview: Version 1.1 of the Twitter API. [Online] Available at: https://dev.twitter.com/docs/api/1.1/overview [Accessed 10 March 2014].
Twitter. 2014. Data Grants. [Online] Available at: https://engineering.twitter.com/research/data-grants [Accessed 10 March 2014].
Yahoo Finance. 2014. Samsung Electronics Co. Ltd. [Online] Available at: http://finance.yahoo.com/q/hp?s=005930.ks+historical+prices [Accessed 30 March 2014].
Yahoo Finance. 2014. Yahoo Finance - Business Finance, Stock Market, Quotes, News. [Online] Available at: http://finance.yahoo.com [Accessed 20 April 2014].