Forecasting stock markets with Twitter
|
|
|
- Prosper Dominic Boyd
- 10 years ago
- Views:
Transcription
1 Forecasting stock markets with Twitter Argimiro Arratia Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013, as part of larger study by the title Forecasting with Twitter data.
2 Motivations and Goals Does a public Sentiment Indicator extracted from daily Twitter messages can indeed improve the forecasting of social, economic, or commercial indicators, based on time series models? Answer: Experiment with all possible machine learning models reinforced with a Sentiment Index built from Twitter data, and compare their performance when trained with and without the Twitter index.
3
4 Some technical difficulties There are no public data sets available and Twitter have imposed several restrictions on retrieving on-line posted tweets. As of April 2010 Twitter forbids 3rd parties to redistribute tweets. Hence we have to create our own data set, with limitations. Begun on 22 March Use a Streaming API: One HTTP connection is kept alive to retrieve tweets as they are posted Filter the stream by keyword or user
5 The Twitter Gold Mine You can BUY data from official Twitter reseller. Very, very expensive! The Twitter-Hedge fund July 2011: Derwent Capital, a UK based hedge fund, in partnership with Bollen et al. began trading a $40 Million Hedge Fund using the Twitter Predictor.
6 How we did it Target Dataset R.Filter Model Lag... Acc. w/o Acc. w/... GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) GOOG GOOG TRUE SVM(sigmoid) VIX YHOO TRUE SVM(sigmoid) VIX YHOO TRUE SVM(sigmoid) Messages Text Processing Time Series Model Fitting and Evaluation Results table Summary 20:33 You can choose exactly who you want to be a good person. I know because I'm human.... Relevance Filter Data Cleaning Bag of Words Sentiment Index Twitter Series lags lags L.Regression N. Network SVM Target Series Figure: Overview of data collection, preprocessing, forecasting and final analysis processes.
7 Twitter Online service that allows to build social networks based on microblogging Messages or tweets up to 140 characters Special or reserved for replying; # (hashtags) for topics assignment or keywords; RT (retweet) for sharing with followers; URLs.
8 Twitter social indicators implemented Our Twitter based indicators: Volume and Sentiment Hypothesis: (Volume) More messages more variance (Volatility)? Hypothesis: (Sentiment index) negative/positive decrease/increase benefits (returns)?
9 Sentiment Analysis Goal: Determine whether a message contains positive or negative impressions on a given subject Subjectivity Recognition Polarity Detection Begin with creating a labelled corpora for supervised training a sentiment classifier We apply a recent pictorial tagging idea [Bifet et al, 2010]: Create a dataset of tweets that are automatically labelled positive if contains a smiley of the form: :-),;-D or negative if contains :-(. (Formally, consider all regular expressions from [:=8][ -]?[)D] for positive, and from [:=8][ -]?( for negative.)
10 Sentiment Analysis Expand the training datasets by Feature Lists of words: classify by frequent words with at least 5 occurrences, and topic (e.g. a stock s ticker). Clean the data previously: Remove duplicates (mostly in retweets); Remove stopwords (e.g. pronouns, prepositions, many verbs); Stemming (reduce words to their roots by removing suffixes); Negation handling (use tagging: not good NOT good); Relevance filtering: text containing some of the keywords is no guarantee of its relevance for the subject we want to classify. E.g. I really love eating an apple Apple stock soared above $404 today
11 Sentiment Classifiers (SC) Multinomial Naïve Bayes: Given D the set of all docs. and C set of class labels, a Naive Bayes classifier will assign a doc. d the class with the highest conditional probability given that document. (d is represented as n-dim. vector (w 1,..., w n ) of Bernoulli-distributed variables indicating whether word w i occurs in d). The naive assumption is that the presence of a particular feature of a class is unrelated to the presence of any other feature. We only focus on binary classification (positive/negative) 3 classes of data sets to train different SCs: English, Multi-language, Stock. Variations of SC obtained by training on these data sets with different preprocessing, feature words to represent docs, etc. Best scoring SC extracted: C-En, C-Ml, C-Stk. Accuracy: 76.49% for C-En, 79.5% for C-Ml, 76% for C-Stk
12 Sentiment Index (or Twitter series) Sentiment: A time series consisting of the daily percentage of positive tweets (over the total number of tweets posted) for each top-scoring SC. Volume: time series of daily number of tweets concerning a subject.
13 Financial Time Series Goal To predict stock s return or volatility Focus on following companies and indices: AAPL, MSFT, GOOG, YHOO S&P100 (OEX), S&P100 s implicit volatility (VXO), S&P500 (GSPC), S&P500 implicit volatility (VIX). Returns Computed from Adjusted Close Log normally distributed Log returns Volatility Computed from log returns Exponential Weighted Moving Average m V (t n ) = (1 λ) λ i 1 Rn i 2 i=1
14 Forecasting time series Models for prediction Linear Regression y = n i=0 w ix i Neural Networks (feedforward) y j = ϕ ( n i=0 w ijx ij ) Support Vector Machines (with kernel polynomial, radial or sigmoid) Model Assessment Nonlinearity: we use White neural network test for neglected nonlinearity to assess the possible nonlinear relationship between two time series Causality: apply Granger test of causality, both parametric and non-parametric, to the two time series and for different lags. Prequential evaluation: Our data is from short period, we cannot spare an independent set for training. Use past data to update/retrain the classifier and predict the direction of the time series for the following day.
15 Experimental set up Combining all different parameters give us approx experiments. Summary trees Decision trees built with REPTree (Weka), by greedy selection of attributes that give a higher performance gain. The tree give an indication of the attributes that most influent the increase on accuracy.
16 Experimental Results (General) Big failure of linear models (only improve 2.5% of the time: 64 improved/ 2418 not improved) Nonlinear models with either Twitter series improve as predictors of the trend of volatility indices (VXO, VIX) and historic volatilities of stocks. Predicting the trend of benefits is more dependent on the parameters, the input data and Twitter classifier. Best case is SVM with poly-2 kernel for predicting OEX with sentiment index obtained from C-Stk classifier.
17 Experimental Results (Specific) Success rates by model family for predicting the VIX index using Tweet Volume and lags = 3. Model family Successful Unsuccessful Success rate Neural Networks % Support Vector Machines % Success rates of SVM by kernel type for predicting the VXO index when lags = 2, plus either Twitter series. Kernel Type Successful Unsuccessful Success rate Polynomial Kernels % Radial % Sigmoid %
18 WARNING! Don t play with your home account The Twitter-hedge fund crash October 2011: Derwent Capital closes shop with reported returns of 1.86% after a month of trading with Bollen et al Twitter predictor, and is for sale today: The only machine model used in Bollen et al is a Fuzzy Logic Neural Network and they only care about price direction disregarding volatility. We were not able to test such machine (the program for it was of course not available), but in general Neural Network score low in our tests for predicting direction of returns for stocks (and the results of such experiments are very parameter-dependent). This is the END
Using Twitter as a source of information for stock market prediction
Using Twitter as a source of information for stock market prediction Ramon Xuriguera ([email protected]) Joint work with Marta Arias and Argimiro Arratia ERCIM 2011, 17-19 Dec. 2011, University of
Sentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies
Sentiment analysis of Twitter microblogging posts Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies Introduction Popularity of microblogging services Twitter microblogging posts
Analysis of Tweets for Prediction of Indian Stock Markets
Analysis of Tweets for Prediction of Indian Stock Markets Phillip Tichaona Sumbureru Department of Computer Science and Engineering, JNTU College of Engineering Hyderabad, Kukatpally, Hyderabad-500 085,
Can Twitter provide enough information for predicting the stock market?
Can Twitter provide enough information for predicting the stock market? Maria Dolores Priego Porcuna Introduction Nowadays a huge percentage of financial companies are investing a lot of money on Social
CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis
CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction
Sentiment Analysis on Twitter with Stock Price and Significant Keyword Correlation. Abstract
Sentiment Analysis on Twitter with Stock Price and Significant Keyword Correlation Linhao Zhang Department of Computer Science, The University of Texas at Austin (Dated: April 16, 2013) Abstract Though
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Using News Articles to Predict Stock Price Movements
Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 [email protected] 21, June 15,
Stock Prediction Using Twitter Sentiment Analysis
Stock Prediction Using Twitter Sentiment Analysis Anshul Mittal Stanford University [email protected] Arpit Goel Stanford University [email protected] ABSTRACT In this paper, we apply sentiment analysis
Sentiment Analysis of Twitter Data within Big Data Distributed Environment for Stock Prediction
Proceedings of the Federated Conference on Computer Science and Information Systems pp. 1349 1354 DOI: 10.15439/2015F230 ACSIS, Vol. 5 Sentiment Analysis of Twitter Data within Big Data Distributed Environment
Semantic Sentiment Analysis of Twitter
Semantic Sentiment Analysis of Twitter Hassan Saif, Yulan He & Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom The 11 th International Semantic Web Conference
On the Predictability of Stock Market Behavior using StockTwits Sentiment and Posting Volume
On the Predictability of Stock Market Behavior using StockTwits Sentiment and Posting Volume Abstract. In this study, we explored data from StockTwits, a microblogging platform exclusively dedicated to
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams
2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment
Sentiment Analysis of Movie Reviews and Twitter Statuses. Introduction
Sentiment Analysis of Movie Reviews and Twitter Statuses Introduction Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general, or about
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
How To Predict Web Site Visits
Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many
Using Tweets to Predict the Stock Market
1. Abstract Using Tweets to Predict the Stock Market Zhiang Hu, Jian Jiao, Jialu Zhu In this project we would like to find the relationship between tweets of one important Twitter user and the corresponding
Applying Machine Learning to Stock Market Trading Bryce Taylor
Applying Machine Learning to Stock Market Trading Bryce Taylor Abstract: In an effort to emulate human investors who read publicly available materials in order to make decisions about their investments,
Neural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Emoticon Smoothed Language Models for Twitter Sentiment Analysis
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques
CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques Chris MacLellan [email protected] May 3, 2012 Abstract Different methods for aggregating twitter sentiment data are proposed and three
The Use of Twitter Activity as a Stock Market Predictor
National College of Ireland Higher Diploma in Science in Data Analytics 2013/2014 Robert Coyle X13109278 [email protected] The Use of Twitter Activity as a Stock Market Predictor Table of Contents
NAIVE BAYES ALGORITHM FOR TWITTER SENTIMENT ANALYSIS AND ITS IMPLEMENTATION IN MAPREDUCE. A Thesis. Presented to. The Faculty of the Graduate School
NAIVE BAYES ALGORITHM FOR TWITTER SENTIMENT ANALYSIS AND ITS IMPLEMENTATION IN MAPREDUCE A Thesis Presented to The Faculty of the Graduate School At the University of Missouri In Partial Fulfillment Of
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
A CRF-based approach to find stock price correlation with company-related Twitter sentiment
POLITECNICO DI MILANO Scuola di Ingegneria dell Informazione POLO TERRITORIALE DI COMO Master of Science in Computer Engineering A CRF-based approach to find stock price correlation with company-related
Hong Kong Stock Index Forecasting
Hong Kong Stock Index Forecasting Tong Fu Shuo Chen Chuanqi Wei [email protected] [email protected] [email protected] Abstract Prediction of the movement of stock market is a long-time attractive topic
Spam Detection on Twitter Using Traditional Classifiers M. McCord CSE Dept Lehigh University 19 Memorial Drive West Bethlehem, PA 18015, USA
Spam Detection on Twitter Using Traditional Classifiers M. McCord CSE Dept Lehigh University 19 Memorial Drive West Bethlehem, PA 18015, USA [email protected] M. Chuah CSE Dept Lehigh University 19 Memorial
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Tweets Miner for Stock Market Analysis
Tweets Miner for Stock Market Analysis Bohdan Pavlyshenko Electronics department, Ivan Franko Lviv National University,Ukraine, Drahomanov Str. 50, Lviv, 79005, Ukraine, e-mail: [email protected]
The relation between news events and stock price jump: an analysis based on neural network
20th International Congress on Modelling and Simulation, Adelaide, Australia, 1 6 December 2013 www.mssanz.org.au/modsim2013 The relation between news events and stock price jump: an analysis based on
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
Keywords social media, internet, data, sentiment analysis, opinion mining, business
Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real time Extraction
The Viability of StockTwits and Google Trends to Predict the Stock Market. By Chris Loughlin and Erik Harnisch
The Viability of StockTwits and Google Trends to Predict the Stock Market By Chris Loughlin and Erik Harnisch Spring 2013 Introduction Investors are always looking to gain an edge on the rest of the market.
Sentiment Analysis Tool using Machine Learning Algorithms
Sentiment Analysis Tool using Machine Learning Algorithms I.Hemalatha 1, Dr. G. P Saradhi Varma 2, Dr. A.Govardhan 3 1 Research Scholar JNT University Kakinada, Kakinada, A.P., INDIA 2 Professor & Head,
Machine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5
How can we discover stocks that will
Algorithmic Trading Strategy Based On Massive Data Mining Haoming Li, Zhijun Yang and Tianlun Li Stanford University Abstract We believe that there is useful information hiding behind the noisy and massive
ALGORITHMIC TRADING USING MACHINE LEARNING TECH-
ALGORITHMIC TRADING USING MACHINE LEARNING TECH- NIQUES: FINAL REPORT Chenxu Shao, Zheming Zheng Department of Management Science and Engineering December 12, 2013 ABSTRACT In this report, we present an
Initial Report. Predicting association football match outcomes using social media and existing knowledge.
Initial Report Predicting association football match outcomes using social media and existing knowledge. Student Number: C1148334 Author: Kiran Smith Supervisor: Dr. Steven Schockaert Module Title: One
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Machine Learning Techniques for Stock Prediction. Vatsal H. Shah
Machine Learning Techniques for Stock Prediction Vatsal H. Shah 1 1. Introduction 1.1 An informal Introduction to Stock Market Prediction Recently, a lot of interesting work has been done in the area of
SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING
I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and
Content-Based Recommendation
Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches
Predicting stocks returns correlations based on unstructured data sources
Predicting stocks returns correlations based on unstructured data sources Mateusz Radzimski, José Luis Sánchez-Cervantes, José Luis López Cuadrado, Ángel García-Crespo Departamento de Informática Universidad
A Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi [email protected] Satakshi Rana [email protected] Aswin Rajkumar [email protected] Sai Kaushik Ponnekanti [email protected] Vinit Parakh
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Crowdfunding Support Tools: Predicting Success & Failure
Crowdfunding Support Tools: Predicting Success & Failure Michael D. Greenberg Bryan Pardo [email protected] [email protected] Karthic Hariharan [email protected] tern.edu Elizabeth
Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques.
Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques. Akshay Amolik, Niketan Jivane, Mahavir Bhandari, Dr.M.Venkatesan School of Computer Science and Engineering, VIT University,
Data Pre-Processing in Spam Detection
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 11 May 2015 ISSN (online): 2349-784X Data Pre-Processing in Spam Detection Anjali Sharma Dr. Manisha Manisha Dr. Rekha Jain
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Machine Learning in FX Carry Basket Prediction
Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines
Microblog Sentiment Analysis with Emoticon Space Model
Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Why is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Spatio-Temporal Patterns of Passengers Interests at London Tube Stations
Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Juntao Lai *1, Tao Cheng 1, Guy Lansley 2 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental &Geomatic Engineering,
Challenges of Cloud Scale Natural Language Processing
Challenges of Cloud Scale Natural Language Processing Mark Dredze Johns Hopkins University My Interests? Information Expressed in Human Language Machine Learning Natural Language Processing Intelligent
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING
A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India [email protected] 2 School of Computer
Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)
Machine Learning Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos) What Is Machine Learning? A computer program is said to learn from experience E with respect to some class of
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Machine Learning Algorithms and Predictive Models for Undergraduate Student Retention
, 225 October, 2013, San Francisco, USA Machine Learning Algorithms and Predictive Models for Undergraduate Student Retention Ji-Wu Jia, Member IAENG, Manohar Mareboyana Abstract---In this paper, we have
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research [email protected]
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research [email protected] Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
Twitter Stock Bot. John Matthew Fong The University of Texas at Austin [email protected]
Twitter Stock Bot John Matthew Fong The University of Texas at Austin [email protected] Hassaan Markhiani The University of Texas at Austin [email protected] Abstract The stock market is influenced
Twitter sentiment vs. Stock price!
Twitter sentiment vs. Stock price! Background! On April 24 th 2013, the Twitter account belonging to Associated Press was hacked. Fake posts about the Whitehouse being bombed and the President being injured
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
Customer Relationship Management using Adaptive Resonance Theory
Customer Relationship Management using Adaptive Resonance Theory Manjari Anand M.Tech.Scholar Zubair Khan Associate Professor Ravi S. Shukla Associate Professor ABSTRACT CRM is a kind of implemented model
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Italian Journal of Accounting and Economia Aziendale. International Area. Year CXIV - 2014 - n. 1, 2 e 3
Italian Journal of Accounting and Economia Aziendale International Area Year CXIV - 2014 - n. 1, 2 e 3 Could we make better prediction of stock market indicators through Twitter sentiment analysis? ALEXANDER
Projektgruppe. Categorization of text documents via classification
Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction
Bug Report, Feature Request, or Simply Praise? On Automatically Classifying App Reviews
Bug Report, Feature Request, or Simply Praise? On Automatically Classifying App Reviews Walid Maalej University of Hamburg Hamburg, Germany [email protected] Hadeer Nabil University of Hamburg
LCs for Binary Classification
Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product beteen the to vectors representing the document and the category, respectively. Therefore it
Final Project Report. Twitter Sentiment Analysis
Final Project Report Twitter Sentiment Analysis John Dodd Student number: x13117815 [email protected] Higher Diploma in Science in Data Analytics 28/05/2014 Declaration SECTION 1 Student to complete
Financial Trading System using Combination of Textual and Numerical Data
Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
Machine Learning in Stock Price Trend Forecasting
Machine Learning in Stock Price Trend Forecasting Yuqing Dai, Yuning Zhang [email protected], [email protected] I. INTRODUCTION Predicting the stock price trend by interpreting the seemly chaotic market
