Twitter Analytics for Insider Trading Fraud Detection
W-J Ketty Gann, John Day, Shujia Zhou
Information Systems, Northrop Grumman Corporation, Annapolis Junction, MD, USA
{wan-ju.gann, john.day2,

Abstract

Twitter analytics have been developed to process Twitter data at a macro level for use in an insider trading detection system. This system establishes normal trading patterns between daily stock price change and public sentiment. Two machine learning models, Support Vector Machine (SVM) and Decision Tree, are built based on annotated historical Twitter data and the Stanford Sentiment140 Tweet corpus, respectively. This paper discusses polarized sentiment (positive and negative), a comparison of the SVM and Decision Tree models, the Sentiment Key Performance Index (SKPI), the Daily Sentiment Index (DSI) and mood analysis. The results illustrate that Twitter SKPI and DSI are useful indexes for predicting future stock price movement in regular stock trading.

Keywords: sentiment analysis; mood analysis; Twitter analytics; machine learning; Support Vector Machine; Decision Tree

1. Introduction

It has been reported that daily stock price changes and public sentiment are correlated [1]. This paper focuses on the detection of insider trading fraud based on Twitter analytics. Insider trading fraud is considered one of the major types of financial fraud. Such frauds often deploy sophisticated schemes that current approaches cannot systematically detect in a timely manner. The proposed approach is to establish normal trading patterns between daily stock price and public sentiment at a macro level. When sentiment and mood are positive, the stock price rises and the majority of investors are buying; when sentiment and mood are negative, the stock price falls and the majority of investors are selling.
When insider trading data, such as US Securities and Exchange Commission (SEC) Form 4 filings, is compared with normal trading patterns at a micro level, the system is able to detect abnormal timing in the execution of an insider's stock trades and issue a warning on those trades for further investigation.

The remainder of this paper focuses on the Twitter analytics for sentiment and mood analysis used within the insider trading detection system described above. Twitter is the world's largest micro-blogging platform. English is the most frequently used language, with 39% of total Twitter messages [2]. Two supervised machine learning models, Support Vector Machine (SVM) and Decision Tree, are built to classify the Twitter data. They are discussed along with the training data and feature selection. Two indexes, the Sentiment Key Performance Index (SKPI) and the Daily Sentiment Index (DSI), are implemented for tweet volume calculations. The results for these indexes show that daily tweets appear to form a basis for predicting the future stock trend.

2. Data Collection

Supervised machine learning models require annotated training and testing sets. The following are descriptions of data collection and preparation for each model.

2.1 Historical Twitter Data

We established a Twitter data set pertaining to Apple Inc. from November 2012 to February 2013. Each tweet is recorded with user ID (identifier), date/time of posting, source (where the tweet was published) and tweet content. After cleaning the data and removing duplicates from the collection, we retrieve tweets written in English that contain Apple-related Twitter hashtags and/or keywords. The resulting tweets are then grouped by posting day, for a total of 167,345 tweets.

2.2 Stanford Sentiment140 Tweet Corpus

The Sentiment140 data set is publicly available [3].
To facilitate the labeling of such a large corpus (1.6 million tweets), Stanford used only Twitter messages (i.e., tweets) containing emoticons to determine positive or negative sentiment. Positive or negative emoticons are used to classify each tweet. Before training, the emoticons were removed in order to force the modeling software to build a sentiment model exclusively from the context of the words surrounding the emoticons. Stanford used this data to build their Sentiment140 maximum entropy classifier [4].

3. Training Data Preparation

For the historical Twitter data, we focus on opinionated tweets by filtering out the information-bearing ones, such as tweets containing http and www strings. Strong sentiment words are selected from the filtered tweets to establish positive and negative sentiment training sets with 1,500 examples in each. The sentiment words (including their variations) are defined in the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary. LIWC [5] is a popular text analysis software program. For example, it has been used to analyze people's mental and physical health in correlation with the words they use in speech and written samples. The LIWC 2007 dictionary is the core of this text analysis software and is composed of 4,500 words and word stems in various hierarchical cognitive and emotion categories. For instance, positive sentiment words include love, thrill, wonderful; negative words consist of negation and blame words, e.g., hate, frustrate, damn. When preparing the training data, we primarily include short tweets (under a fixed word limit) for training purposes, since in longer tweets the sentiment tends to be balanced out or with more than one
sentiment presented. Currently, neutral tweets are not categorized in the training data.

For the Stanford Sentiment140 Tweet corpus, the process is more complicated than that for the AAPL Twitter data. The following steps are used to prepare it for training:

Tag each tweet using the Carnegie Mellon University (CMU) ARK part-of-speech tagger [6]. Each tokenized word within a given tweet is concatenated with an ARK tag, for example, run`v or dog`n. The CMU ARK tagger defines 25 tags, specially customized for tagging tweets. A subset of 13 tags is used (V, N, A, R, !, &, G, E, #, T, Z, X, S).

Select tagged tokens that occur 50 times or more in the overall data set. A total of 15,000 tokens are selected.

Segregate these high-frequency tagged tokens into positive and negative groups according to the POS (positive) or NEG (negative) label on each tweet.

Calculate the Total Sentiment Index (TSI) for each selected tagged token. This represents the relative sentiment of a token based on the number of times p it occurred in positive-labeled tweets and the number of times n it occurred in negative-labeled tweets. Thus, an index of -1 would occur if the token was always seen in a negative tweet, +1 if always seen in a positive tweet, and 0 if seen equally often in positive and negative tweets. To calibrate training sets with unequal numbers of positive and negative tweets, the total-positive over total-negative ratio is used to rebalance the set so that TSI=0 represents neutral. (Note: the Stanford set was already balanced, with 800,000 positive and 800,000 negative tweets, so this ratio was 1 for our experiments.)

Select feature tokens that appear in the training corpus more than 50 times, resulting in 15,000 tokens. For each of these tokens, a TSI value was computed. Tokens with a TSI value near zero were discarded, leaving 6,799 tokens with TSI values clearly greater than or less than zero.
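The TSI equation is not reproduced in the text above. One form consistent with the description, and with the welcome`a example given for the selected tokens (p = 2,161, n = 137, TSI = 0.881), is TSI = (p - r*n) / (p + r*n), where r is the total-positive over total-negative ratio. The sketch below assumes that form; the function name `tsi` and its parameter names are illustrative and do not come from the paper.

```python
def tsi(p: int, n: int, pos_total: int = 1, neg_total: int = 1) -> float:
    """Total Sentiment Index of a token (assumed form, not confirmed by the paper).

    p, n: occurrences of the token in positive / negative tweets.
    pos_total, neg_total: number of positive / negative tweets in the corpus;
    their ratio rebalances an uneven training set (it is 1 for Sentiment140).
    """
    if p + n == 0:
        return 0.0  # tokens never seen are treated as neutral
    ratio = pos_total / neg_total
    return (p - ratio * n) / (p + ratio * n)


# Counts taken from the text: welcome`a (2,161 positive, 137 negative)
# and cavity`n (3 positive, 54 negative).
print(round(tsi(2161, 137), 3))  # 0.881
print(round(tsi(3, 54), 3))      # -0.895
```

Under this form a token that occurs in positive and negative tweets in the same proportion as the corpus as a whole scores exactly 0, which matches the stated purpose of the rebalancing ratio.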
The hypothesis is that the sum of these features tends to classify the tweets they occur in as positive or negative. All other tokens in each tweet are considered to have TSI=0. The following are examples of selected tokens: welcome`a as an adjective occurs 2,298 times in the whole data set, of which 137 occurrences are negative and 2,161 are positive. Both the TSI and the absolute TSI of welcome`a are 0.881; cavity`n as a noun has a high negative count of 54 and a low positive count of 3, resulting in a strongly negative TSI.

Create a feature vector for each tweet in the data set with the following elements:

- ID: a unique identifier.
- Ground Truth (GT): Pos (Positive Sentiment), Neg (Negative Sentiment), or Neu (Neutral Sentiment). [Note: no Neu records were included in this training set; they were reserved for future test purposes.]
- TSI: the sum of the TSI values of the tokens in the tweet. Words not among the top 6,799 tokens described above are assigned the default value 0.
- A Boolean array indicating the presence or absence of each TSI-selected token in the tweet.

Thus, a feature vector contains 6,802 elements consisting of an ID, a GT (ground truth tag), a TSI sum and 6,799 Boolean features denoting the presence or absence of each key sentiment token.

4. Building Training Models

Probabilistic models using supervised machine learning algorithms are built using open source text/data mining tools.

4.1 SVM (Support Vector Machine) Model

SVM is a well-known classifier for text categorization. Given a set of training samples, each marked with one of two categories (positive and negative in this case), an SVM training algorithm builds a model that assigns a new instance to the positive or negative category. We used the historical Twitter data set described above to build an SVM model in RapidMiner [7], an open source text/data mining tool. A 10-fold cross validation is used to estimate the accuracy of the model.
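As an illustration of the evaluation procedure, the sketch below shows how k-fold cross validation partitions a labeled set and how precision and recall are computed for the positive class. It is not the RapidMiner workflow used by the authors: `k_fold_indices` and `precision_recall` are hypothetical helper names, and the fold assignment (round-robin after a seeded shuffle) is an assumption.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def precision_recall(y_true, y_pred, positive="Pos"):
    """Precision and recall for the `positive` label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# With 3,000 samples and k = 10, each round trains on 2,700 tweets
# and holds out 300 for testing.
splits = list(k_fold_indices(3000, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 2700 300
```

Each sample is held out exactly once, so per-fold precision and recall can be averaged into a single estimate for the model.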
Based on the total of 3,000 sample tweets in our training data (1,500 positive and 1,500 negative), the model reaches a precision of 90.25% and a recall of 74.27%.

4.2 Decision Tree Model

A Decision Tree model is built as a general-purpose sentiment classifier using the open source data mining tool RuleQuest C5 [8]. The model (an unboosted decision tree) was trained on the 1.6 million machine-labeled tweets from the Stanford Sentiment140 training set, which included 359 additional labeled tweets set aside for blind testing of the models. Machine labeling was facilitated by using only tweets containing emoticons, which were removed after labeling to force the model to learn the matching sentiment from the surrounding text. Experiments are run with random samples of 10K tweets and 100K tweets, and with the full 1.6M tweet data set. The 10K random sample provides the best result, with 83.3% accuracy. This exceeds the accuracies reported by Stanford of 79.9% (Naïve Bayes), 79.9% (Maximum Entropy) and 81.9% (SVM) when unigrams and POS are used as features [4].

5. Discussions of SVM and Decision Tree Models

The SVM and Decision Tree models were trained with different training data; therefore, the results are not directly comparable. However, it seems that the SVM model tends to have higher precision when using hand-labeled training data. On the other hand, the Decision Tree model was trained on a much larger corpus of 1.6 million machine-labeled examples, exposing it to many more expressions of positive and negative sentiment from real-world Twitter messages. Thus, one would expect such models to be more robust and to generalize better than models trained on a smaller set of sentiment concepts.

The rationale for using qualified CMU ARK tags in the 1.6M tweet Sentiment140 data is to allow the same word to support different sentiment labels, depending on the part-of-speech context.
For example, the word wish has a positive weight when used as a noun (wish`n), while it is negative when used as a verb (wish`v). The intent of using this qualification is to improve the accuracy of sentiment predictions. To test the effectiveness of ARK-tagged features vs. untagged ones, we built two models on a randomly selected subset of 11,000 tweets from the Stanford training corpus. One model used bare words as features and the other used the same words tagged with the CMU ARK part-of-speech tags. The tagged model achieved 71.6% accuracy compared to 67.4% for the bare-word model, which supports the use of features tagged by part of speech. It should be noted that Stanford also tried using part-of-speech tags [4] and reported little success with this approach. However, they did not use the CMU ARK tagger, which has been optimized for recognizing unconventional tokens such as hashtags, emoticons and abbreviations as tokens in a specialized language for writing tweets.

It is also worth noting that our approach for building the Decision Tree model used only word tokens that occurred more than 50 times as modeling features. This was to mitigate the effect of misspellings and nonsense words, which tend to occur frequently in tweets. However, there were over 800,000 distinct tokens in the raw training corpus (including punctuation, but no emoticons). This is surprising because there are only about 50,000 words in standard English, suggesting that the language used to express Twitter messages is an order of magnitude larger than English. But these additional tokens come from proper names, misspelled and abbreviated words and many other specialized tokens (e.g., emoticons) that have become part of the unique tweet vocabulary and are not normally considered standard parts of natural languages. Filtering reduced the token set size to approximately 15,000. These tokens were then further processed (as described in Section 3 above) to select the key sentiment tokens which tend to classify tweets as positive or negative.
These are precisely the tokens with relatively high positive or negative TSI values. Neutral words (and all other unknown or unrecognized words) are effectively ignored by setting their TSI values to zero. It was observed that the TSI sum feature (i.e., the sum of the individual TSI values for each word in a tweet) is a very strong feature by itself, capable of classifying the evaluation set with 75% accuracy. The additional 6,799 key token features are much weaker in classification strength, but collectively they raise the accuracy by an additional 8 percentage points, to 83%.

6. Granger Causality and Durbin-Watson Test

The tweets for days on which the stock market is open from November 13, 2012 to February 5, 2013 (167,345 in total) are processed by the trained SVM model. Instead of using a daily count of positive and negative tweets as the metric, a Sentiment Key Performance Index (SKPI) time series, together with the stock market value time series, is used as an indicator of sentiment. Granger Causality Analysis (GCA) is applied to the daily SKPI time series and the AAPL stock market value time series. GCA is a standard test in finance and economics for discovering causal links between independently generated time series. GCA is based on the assumption that if a variable X causes Y, then changes in X will systematically occur before changes in Y, and the lagged values of X will show a statistically significant correlation with Y. SKPI shows a Granger causality relation with AAPL stock price movement for lags ranging from 1 to 5 days (p<0.1), where a 3-day lag has the smallest p value (p<0.05). In other words, SKPI is shown to predict AAPL daily stock price movement with a 3-day lag (i.e., 3 days prior), as shown in Table 1.

Table 1.
GCA Results for AAPL price/SKPI

Lag (Day)   Y=f(X)/X=f(Y)
1           /0.045*
2           /0.027*
3           /0.018**
4           /0.070*
5           /0.034*
6           /
7           /0.152

Legend: **p < 0.05, *p < 0.1
Format: Y=f(X)/X=f(Y); X Granger causes Y/Y Granger causes X; X = AAPL daily stock value changes, Y = SKPI

However, correlation in GCA does not prove causation. GCA results are therefore validated with a Durbin-Watson (DW) test to filter out any spurious results. The DW test yields a high value (DW = 2.77), which implies a valid test.

7. Daily Sentiment Index (DSI) and Stock Trend

DSI is computed from the daily positive and negative sentiment counts returned by the model. DSI is scaled between -1 and +1 to normalize daily fluctuations in tweet volume, where tp = total sum of daily positive counts; tn = total sum of daily negative counts; n = daily negative tweet count; and p = daily positive tweet count. DSI behaves like a time derivative and spikes up or down when sentiment changes.

Three regression models are produced for comparison purposes. Model #3 shows the best results for predicting future stock trends given the past stock trend and multiple DSI values (Figure 1).

Model #1 => Predicted future AAPL Trend from Past Price
Model #2 => Predicted future AAPL Trend from Past Price + AAPL DSI
Model #3 => Predicted future AAPL Trend from Past Price + Multiple Source-DSI (Note: computed from 23 Tweet sources)

To maximize the persistence and effectiveness of using tweet source as a model feature, we used the 21 most frequently occurring sources. The remaining 2,386 sources were aggregated into a 22nd virtual source called OTHER. The 23rd source was the aggregate of all sources, called ALL. See the appendix for the source list.
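The DSI equation is not reproduced in Section 7. Given that the index is bounded by -1 and +1 and is defined in terms of p, n, tp and tn, one plausible form is DSI = (p - r*n) / (p + r*n) with r = tp/tn. The sketch below assumes that form; the function name and the sample counts are made up for illustration and are not data from the paper.

```python
def daily_sentiment_index(daily_counts):
    """Return one DSI value per day from (positive, negative) tweet counts.

    Assumed form: DSI = (p - r*n) / (p + r*n) with r = tp / tn, where tp and tn
    are the sums of the daily positive and negative counts over the period.
    """
    tp = sum(p for p, _ in daily_counts)
    tn = sum(n for _, n in daily_counts)
    if tn == 0:
        raise ValueError("no negative tweets in the period; ratio undefined")
    ratio = tp / tn
    series = []
    for p, n in daily_counts:
        denom = p + ratio * n
        series.append((p - ratio * n) / denom if denom else 0.0)
    return series


# Hypothetical counts: day 2 matches the period-wide mix, day 3 turns negative.
counts = [(300, 100), (200, 100), (100, 100)]
print([round(v, 3) for v in daily_sentiment_index(counts)])  # [0.2, 0.0, -0.333]
```

Because each day is measured against the period-wide positive/negative ratio, a day with an ordinary sentiment mix scores near 0 regardless of tweet volume, and only a shift in the mix moves the index.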
Figure 1. Plots for three models: left panel = Model #1; middle panel = Model #2; right panel = Model #3

Table 2. GCA Results for AAPL price/negative Mood Analysis

Lag (Day)  Positive Emotion  Swear     Anxiety  Anger     Sadness  Tentative  Certain
1          /                 /         /        /         /        /          **/
2          /                 /         /        /         /        /          */
3          /                 /         /        /         /        /          */
4          /                 /         /        /         /        /          **/
5          /                 /0.002**  0.40/    /0.003**  0.59/    /          */
6          /                 /0.005**  0.53/    /0.008**  0.72/    /          /
7          /                 /0.01**   0.47/    /0.009**  0.69/    **/0.03**  0.03**/0.22

Legend: **p < 0.05, *p < 0.1
Format: Y=f(X)/X=f(Y); X Granger causes Y/Y Granger causes X; X = AAPL daily stock value changes, Y = word ratio

Table 3. DW Validations of AAPL price/negative Mood Analysis GCA Results

Durbin-Watson Test columns: Positive Emotion, Swear, Anxiety, Anger, Sadness, Tentative, Certain
Legend: 2 = no correlation; 0 = positive correlation; 4 = negative correlation

8. Mood Analysis and Stock Trend

We also investigated whether mood might form a better basis for modeling stock trends by analyzing the negative sentiment tweets using LIWC. The negative sentiment is subcategorized into Swear, Anxiety, Anger, and Sadness based on the LIWC 2007 dictionary. In addition, two cognitive categories, Tentative and Certain, are also processed. The output is the word ratio of each category in the daily tweets. GCA is performed on the word ratios and the AAPL daily changes in stock value. The results show that only Swear and Anger correlate with the AAPL stock price change at a 5-day lag (Table 2). On the other hand, the daily AAPL stock value correlates with Tentative at a 7-day lag, and with Certain at 1-, 4-, and 7-day lags. However, DW test validation indicates that Swear and Anger have only a weak positive correlation with AAPL stock changes (the DW value for Anger is 1.989; Table 3). This differs from the result of Bollen et al. [1], who concluded that mood is a better predictor than sentiment for the stock market. We believe that differences in data cleansing may be the cause.

9. Conclusion

Two supervised models, SVM and Decision Tree, have been built for Twitter analytics as part of an insider trading detection system. SVM achieved high precision and recall on the historical AAPL tweet data, while the Decision Tree with positive features reached 83.3% accuracy on a 10K random sample from the Stanford Sentiment140 Tweet corpus. Two major indexes, SKPI and DSI, are discussed. They appear to be capable of predicting stock price movement for the data used in this project. DSI, when combined with Twitter sources, shows the best prediction results.

Acknowledgment

This project is funded by the Northrop Grumman Corporation 2013 Internal Research and Development program. Comments or opinions expressed in this paper do not necessarily represent the position of the company. We would like to thank Jim Sowder for helpful discussions.

References

[1] Johan Bollen, Huina Mao, Xiao-Jun Zeng. Twitter mood predicts the stock market. IEEE Computer, 44(10): 91-94, October
[2] Arabic highest growth on Twitter, Semiocast, November 24, URL: on_twitter
[3] Sentiment140 Tweet Corpus. URL:
[4] Alec Go, Richa Bhayani, Lei Huang. Twitter sentiment classification using distant supervision. Technical Report, Stanford University,
[5] James W. Pennebaker, Cindy K. Chung, Molly Ireland, Amy Gonzales, Roger J. Booth. The development and psychometric properties of LIWC2007. LIWC2007 Manual.
[6] Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, et al. Part-of-speech tagging for Twitter: annotation, features, and experiments. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, companion volume, Portland, OR, June
[7] RapidMiner. Rapid-I GmbH, Stockumer Str. 475, Dortmund, Germany. URL:
[8] RuleQuest C5. URL:
Appendix

List of Tweet Sources From Historical Tweet Data

The metadata for the November 2012 to February 2013 data that we analyzed included the source for each tweet, an identifier representing the particular stream feed that each tweet was sampled from. There were 2,400 unique sources in the entire corpus, which appeared to be distributed according to a power law. The most frequent source, web, accounted for about 20% of the corpus, whereas over 1,000 sources occurred only once in the corpus.

Sources, in the order listed:

web
Twitter for iPhone
twitterfeed
dlvr.it
Twitter for Android
Instagram
TweetDeck
IFTTT
HootSuite
Tweet Button
Round Team
iOS
Twitter for BlackBerry
Mobile Web
Twitter for iPad
Google
Tweetbot for iOS
Sylvester Trends
StockTwits Web
Echophone
Twitter for Mac
Other (2,386 other sources)
All (all sources)
CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques
CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques Chris MacLellan [email protected] May 3, 2012 Abstract Different methods for aggregating twitter sentiment data are proposed and three
On the Predictability of Stock Market Behavior using StockTwits Sentiment and Posting Volume
On the Predictability of Stock Market Behavior using StockTwits Sentiment and Posting Volume Abstract. In this study, we explored data from StockTwits, a microblogging platform exclusively dedicated to
Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence. Dr. Sulkhan Metreveli Leo Keller
Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence Dr. Sulkhan Metreveli Leo Keller The greed https://www.youtube.com/watch?v=r8y6djaeolo The money https://www.youtube.com/watch?v=x_6oogojnaw
Text Opinion Mining to Analyze News for Stock Market Prediction
Int. J. Advance. Soft Comput. Appl., Vol. 6, No. 1, March 2014 ISSN 2074-8523; Copyright SCRG Publication, 2014 Text Opinion Mining to Analyze News for Stock Market Prediction Yoosin Kim 1, Seung Ryul
Sentiment Analysis on Twitter with Stock Price and Significant Keyword Correlation. Abstract
Sentiment Analysis on Twitter with Stock Price and Significant Keyword Correlation Linhao Zhang Department of Computer Science, The University of Texas at Austin (Dated: April 16, 2013) Abstract Though
C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER
INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process
Statistical Feature Selection Techniques for Arabic Text Categorization
Statistical Feature Selection Techniques for Arabic Text Categorization Rehab M. Duwairi Department of Computer Information Systems Jordan University of Science and Technology Irbid 22110 Jordan Tel. +962-2-7201000
SENTIMENT ANALYSIS: TEXT PRE-PROCESSING, READER VIEWS AND CROSS DOMAINS EMMA HADDI BRUNEL UNIVERSITY LONDON
BRUNEL UNIVERSITY LONDON COLLEGE OF ENGINEERING, DESIGN AND PHYSICAL SCIENCES DEPARTMENT OF COMPUTER SCIENCE DOCTOR OF PHILOSOPHY DISSERTATION SENTIMENT ANALYSIS: TEXT PRE-PROCESSING, READER VIEWS AND
IIIT-H at SemEval 2015: Twitter Sentiment Analysis The good, the bad and the neutral!
IIIT-H at SemEval 2015: Twitter Sentiment Analysis The good, the bad and the neutral! Ayushi Dalmia, Manish Gupta, Vasudeva Varma Search and Information Extraction Lab International Institute of Information
Stock Prediction Using Twitter Sentiment Analysis
Stock Prediction Using Twitter Sentiment Analysis Anshul Mittal Stanford University [email protected] Arpit Goel Stanford University [email protected] ABSTRACT In this paper, we apply sentiment analysis
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab
Initial Report. Predicting association football match outcomes using social media and existing knowledge.
Initial Report Predicting association football match outcomes using social media and existing knowledge. Student Number: C1148334 Author: Kiran Smith Supervisor: Dr. Steven Schockaert Module Title: One
Mining a Corpus of Job Ads
Mining a Corpus of Job Ads Workshop Strings and Structures Computational Biology & Linguistics Jürgen Jürgen Hermes Hermes Sprachliche Linguistic Data Informationsverarbeitung Processing Institut Department
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
Content vs. Context for Sentiment Analysis: a Comparative Analysis over Microblogs
Content vs. Context for Sentiment Analysis: a Comparative Analysis over Microblogs Fotis Aisopos $, George Papadakis $,, Konstantinos Tserpes $, Theodora Varvarigou $ L3S Research Center, Germany [email protected]
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
Bug Report, Feature Request, or Simply Praise? On Automatically Classifying App Reviews
Bug Report, Feature Request, or Simply Praise? On Automatically Classifying App Reviews Walid Maalej University of Hamburg Hamburg, Germany [email protected] Hadeer Nabil University of Hamburg
End-to-End Sentiment Analysis of Twitter Data
End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India [email protected],
University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task
University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task Graham McDonald, Romain Deveaud, Richard McCreadie, Timothy Gollins, Craig Macdonald and Iadh Ounis School
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
Effect of Using Regression on Class Confidence Scores in Sentiment Analysis of Twitter Data
Effect of Using Regression on Class Confidence Scores in Sentiment Analysis of Twitter Data Itir Onal *, Ali Mert Ertugrul, Ruken Cakici * * Department of Computer Engineering, Middle East Technical University,
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
IT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
Beating the NCAA Football Point Spread
Beating the NCAA Football Point Spread Brian Liu Mathematical & Computational Sciences Stanford University Patrick Lai Computer Science Department Stanford University December 10, 2010 1 Introduction Over
Leveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
Machine Learning in Stock Price Trend Forecasting
Machine Learning in Stock Price Trend Forecasting Yuqing Dai, Yuning Zhang [email protected], [email protected] I. INTRODUCTION Predicting the stock price trend by interpreting the seemly chaotic market
New Developments in the Automatic Classification of Email Records. Inge Alberts, André Vellino, Craig Eby, Yves Marleau
New Developments in the Automatic Classification of Email Records Inge Alberts, André Vellino, Craig Eby, Yves Marleau ARMA Canada 2014 INTRODUCTION 2014 2 OUTLINE 1. Research team 2. Research context
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data Apoorv Agarwal Boyi Xie Ilia Vovsha Owen Rambow Rebecca Passonneau Department of Computer Science Columbia University New York, NY 10027 USA {apoorv@cs, xie@cs, iv2121@,
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
Sentiment Analysis for Movie Reviews
Sentiment Analysis for Movie Reviews Ankit Goyal, [email protected] Amey Parulekar, [email protected] Introduction: Movie reviews are an important way to gauge the performance of a movie. While providing
QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS
QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS Huina Mao School of Informatics and Computing Indiana University, Bloomington, USA ECB Workshop on Using Big Data for Forecasting
The Enron Corpus: A New Dataset for Email Classification Research
The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu
Content-Based Discovery of Twitter Influencers
Content-Based Discovery of Twitter Influencers Chiara Francalanci, Irma Metra Department of Electronics, Information and Bioengineering Polytechnic of Milan, Italy [email protected] [email protected]
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
CS224N Final Project: Sentiment analysis of news articles for financial signal prediction
1 CS224N Final Project: Sentiment analysis of news articles for financial signal prediction Jinjian (James) Zhai ([email protected]) Nicholas (Nick) Cohen ([email protected]) Anand Atreya ([email protected])
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Multiple Kernel Learning on the Limit Order Book
JMLR: Workshop and Conference Proceedings 11 (2010) 167 174 Workshop on Applications of Pattern Analysis Multiple Kernel Learning on the Limit Order Book Tristan Fletcher Zakria Hussain John Shawe-Taylor
Data Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University [email protected] Chetan Naik Stony Brook University [email protected] ABSTRACT The majority
Marketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
