# Applying Machine Learning to Stock Market Trading Bryce Taylor

Save this PDF as:

Size: px
Start display at page:

## Transcription

2 sorted by the time at which the articles were published so that all testing data occurred chronologically after all training data. I initially used randomized selection of the testing / training data, but found that this was an unrealistic model because any algorithm would then know how well each company did on average over the testing period since it was the same as the training period, giving it unreasonable hindsight. Research Questions: For each of the algorithms, the goal was to answer a question of the form: Given a headline released today about some company X, will the stock price of X rise by more than P percent over the next time period T? As baselines, we compared our algorithm to two simple algorithms: always responding yes and always responding no. A third algorithm was also used for comparison that took the best of these two results, retrospectively choosing the better of those two algorithms as its algorithm (note that this algorithm can never have an error rate above 50%). This algorithm represents following the general market trend, buying in an up time and shorting in a down time. Based on research by Eugene F. Fama, Lars Peter Hansen and Robert J. Shiller for their Nobel prize-winning paper in Economics, I chose T to be 3 months, in particular much longer than a week (information taken from Nobel Prize press release: onomic-sciences/laureates/2013/press.html). This is because their paper indicates it is impossible to predict prices based on public information in the short term. Longer time periods were considered, but due to only having access to 7 years of data and making predictions based on a single headline at a time, I decided to use T=3months. P was varied over different trials; a large value of P indicates looking for the most extreme increases in price (i.e. a chance to make the most money). Bayesian Classifier: For my initial tests, I chose to use a simple multinomial Bayesian classifier that analyzed Figure 1

4 Support Vector Machines: After converting the data to Matlab format, I trained Support Vector Machines (SVMs) on the data for all of the same trials as the Bayesian classifier. Unfortunately, Matlab was unable to process the full data set (12K headlines with 50K features each), so I only tested it on the reduced feature data set and the minimal feature data set. The results were almost identical to the Bayesian classifier regardless as to which type of SVM was used (polynomial, linear, etc) and roughly approximated the best baseline solution, but did not match or beat it. Principal Component Analysis: In order to determine how useful the features I had actually were, I ran principal component analysis on the data and then tested linear SVMs on several of the top principal components. Matlab's built-in principal component function was used to identify the principal components and the data with all 693 features was then projected onto this space to create new training and testing data for the SVM. Table 2 on this page shows the error rates as a function of how many of the top principal components were used. There are several things to take away from these results: first, using a small number of the top principal components performs nearly as well as the baseline and slightly better than Naive Baye's (given a testing size of ~3500, these are likely a statistically significant differences). Thus, some portion of the data does hold information significant to predicting stock prices. Furthermore, adding more features does not reduce the error rate further and in fact increases the error rate by a significant amount. This means that many of the features (those that contribute to the non-top principal components) are effectively noise for the purposes of satisfying our hypotheses. Number Components Error (MAX) BAYES BASELINE Table 2 Manual Keyword Selection Another method I tried for feature selection was to manually select features human insight indicates should be indicative. The key words corresponding to these tokens are shown below in Table 3 with their frequency in positively and negatively classified headlines; few of these stand out as strong indicators (not stronger than for example the average number of times stocks rose on the training period). Training on these symbols alone meant that few of the headlines could be processed total (~1300) and resulted in an error of 0.38 when classifying with P=0, notably worse than the baseline error of 0.33 on the same set.

### Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

Recognizing Informed Option Trading Alex Bain, Prabal Tiwaree, Kari Okamoto 1 Abstract While equity (stock) markets are generally efficient in discounting public information into stock prices, we believe

### Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market

### SOPS: Stock Prediction using Web Sentiment

SOPS: Stock Prediction using Web Sentiment Vivek Sehgal and Charles Song Department of Computer Science University of Maryland College Park, Maryland, USA {viveks, csfalcon}@cs.umd.edu Abstract Recently,

### Author Gender Identification of English Novels

Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in

### Stock Market Forecasting Using Machine Learning Algorithms

Stock Market Forecasting Using Machine Learning Algorithms Shunrong Shen, Haomiao Jiang Department of Electrical Engineering Stanford University {conank,hjiang36}@stanford.edu Tongda Zhang Department of

### Money Math for Teens. Dividend-Paying Stocks

Money Math for Teens Dividend-Paying Stocks This Money Math for Teens lesson is part of a series created by Generation Money, a multimedia financial literacy initiative of the FINRA Investor Education

### Stock Market Efficiency and Insider Trading Kris McKinley, Elon College

Stock Market Efficiency and Insider Trading Kris McKinley, Elon College Since the U.S. stock market (NASDAQ, American Stock Exchange, and New York Stock Exchange) is an efficient market, it is impossible

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

### Forecasting stock markets with Twitter

Forecasting stock markets with Twitter Argimiro Arratia argimiro@lsi.upc.edu Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013,

### Spam Filtering based on Naive Bayes Classification. Tianhao Sun

Spam Filtering based on Naive Bayes Classification Tianhao Sun May 1, 2009 Abstract This project discusses about the popular statistical spam filtering process: naive Bayes classification. A fairly famous

### II. RELATED WORK. Sentiment Mining

Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

### RapidMiner Sentiment Analysis Tutorial. Some Orientation

RapidMiner Sentiment Analysis Tutorial Some Orientation Set up Training First make sure, that the TextProcessing Extensionis installed. Retrieve labelled data: http://www.cs.cornell.edu/people/pabo/movie-review-data

### Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

### ALGORITHMIC TRADING USING MACHINE LEARNING TECH-

ALGORITHMIC TRADING USING MACHINE LEARNING TECH- NIQUES: FINAL REPORT Chenxu Shao, Zheming Zheng Department of Management Science and Engineering December 12, 2013 ABSTRACT In this report, we present an

### Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different

### Social Market Analytics, Inc.

S-Factors : Definition, Use, and Significance Social Market Analytics, Inc. Harness the Power of Social Media Intelligence January 2014 P a g e 2 Introduction Social Market Analytics, Inc., (SMA) produces

### Using News Articles to Predict Stock Price Movements

Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### Equity forecast: Predicting long term stock price movement using machine learning

Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK Nikola.milosevic@manchester.ac.uk Abstract Long

### Beating the NCAA Football Point Spread

Beating the NCAA Football Point Spread Brian Liu Mathematical & Computational Sciences Stanford University Patrick Lai Computer Science Department Stanford University December 10, 2010 1 Introduction Over

### Detecting Corporate Fraud: An Application of Machine Learning

Detecting Corporate Fraud: An Application of Machine Learning Ophir Gottlieb, Curt Salisbury, Howard Shek, Vishal Vaidyanathan December 15, 2006 ABSTRACT This paper explores the application of several

### Predicting Soccer Match Results in the English Premier League

Predicting Soccer Match Results in the English Premier League Ben Ulmer School of Computer Science Stanford University Email: ulmerb@stanford.edu Matthew Fernandez School of Computer Science Stanford University

### Discussions of Monte Carlo Simulation in Option Pricing TIANYI SHI, Y LAURENT LIU PROF. RENATO FERES MATH 350 RESEARCH PAPER

Discussions of Monte Carlo Simulation in Option Pricing TIANYI SHI, Y LAURENT LIU PROF. RENATO FERES MATH 350 RESEARCH PAPER INTRODUCTION Having been exposed to a variety of applications of Monte Carlo

### Betting with the Kelly Criterion

Betting with the Kelly Criterion Jane June 2, 2010 Contents 1 Introduction 2 2 Kelly Criterion 2 3 The Stock Market 3 4 Simulations 5 5 Conclusion 8 1 Page 2 of 9 1 Introduction Gambling in all forms,

### Using Twitter as a source of information for stock market prediction

Using Twitter as a source of information for stock market prediction Ramon Xuriguera (rxuriguera@lsi.upc.edu) Joint work with Marta Arias and Argimiro Arratia ERCIM 2011, 17-19 Dec. 2011, University of

### BizPro: Extracting and Categorizing Business Intelligence Factors from News

BizPro: Extracting and Categorizing Business Intelligence Factors from News Wingyan Chung, Ph.D. Institute for Simulation and Training wchung@ucf.edu Definitions and Research Highlights BI Factor: qualitative

### Twitter sentiment vs. Stock price!

Twitter sentiment vs. Stock price! Background! On April 24 th 2013, the Twitter account belonging to Associated Press was hacked. Fake posts about the Whitehouse being bombed and the President being injured

### BONUS REPORT#5. The Sell-Write Strategy

BONUS REPORT#5 The Sell-Write Strategy 1 The Sell-Write or Covered Put Strategy Many investors and traders would assume that the covered put or sellwrite strategy is the opposite strategy of the covered

### Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

### Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

### CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Automatic Inventory Control: A Neural Network Approach. Nicholas Hall

Automatic Inventory Control: A Neural Network Approach Nicholas Hall ECE 539 12/18/2003 TABLE OF CONTENTS INTRODUCTION...3 CHALLENGES...4 APPROACH...6 EXAMPLES...11 EXPERIMENTS... 13 RESULTS... 15 CONCLUSION...

### ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational

### Centralized versus Decentralized Business Strategy: Which is better for growth?

Centralized versus Decentralized Business Strategy: Which is better for growth? McDonalds Corporation (MCD) is the world s largest fast food chain. MCD s business strategy is centralized, resulting in

### Content-Based Recommendation

Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches

### Measure Social Media like a Pro: Social Media Analytics Uncovered SOCIAL MEDIA LIKE SHARE. Powered by

1 Measure Social Media like a Pro: Social Media Analytics Uncovered # SOCIAL MEDIA LIKE # SHARE Powered by 2 Social media analytics were a big deal in 2013, but this year they are set to be even more crucial.

### Can Twitter provide enough information for predicting the stock market?

Can Twitter provide enough information for predicting the stock market? Maria Dolores Priego Porcuna Introduction Nowadays a huge percentage of financial companies are investing a lot of money on Social

### How to Win at the Track

How to Win at the Track Cary Kempston cdjk@cs.stanford.edu Friday, December 14, 2007 1 Introduction Gambling on horse races is done according to a pari-mutuel betting system. All of the money is pooled,

Lead Conversion win rates Introduction The Lead Conversion team has expressed concerns about the low number of wins from this project. Based on a comprehensive database of thousands of leads over 6 years,

### T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

### Distinguishing Opinion from News Katherine Busch

Distinguishing Opinion from News Katherine Busch Abstract Newspapers have separate sections for opinion articles and news articles. The goal of this project is to classify articles as opinion versus news

### Crowdfunding Support Tools: Predicting Success & Failure

Crowdfunding Support Tools: Predicting Success & Failure Michael D. Greenberg Bryan Pardo mdgreenb@u.northwestern.edu pardo@northwestern.edu Karthic Hariharan karthichariharan2012@u.northwes tern.edu Elizabeth

### Benchmarking Real Estate Performance Considerations and Implications

Benchmarking Real Estate Performance Considerations and Implications By Frank L. Blaschka Principal, The Townsend Group The real estate asset class has difficulties in developing and applying benchmarks

### Employer Health Insurance Premium Prediction Elliott Lui

Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Company Fundamentals. THE CMC Markets Trading Smart Series

Company Fundamentals THE CMC Markets Trading Smart Series How to evaluate company growth potential At any given point in time, share prices tend to represent the sum of expectations about its value from

### Pattern Recognition and Prediction in Equity Market

Pattern Recognition and Prediction in Equity Market Lang Lang, Kai Wang 1. Introduction In finance, technical analysis is a security analysis discipline used for forecasting the direction of prices through

### Increasing Marketing ROI with Optimized Prediction

Increasing Marketing ROI with Optimized Prediction Yottamine s Unique and Powerful Solution Smart marketers are using predictive analytics to make the best offer to the best customer for the least cost.

### Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence. Dr. Sulkhan Metreveli Leo Keller

Big Data and High Quality Sentiment Analysis for Stock Trading and Business Intelligence Dr. Sulkhan Metreveli Leo Keller The greed https://www.youtube.com/watch?v=r8y6djaeolo The money https://www.youtube.com/watch?v=x_6oogojnaw

### Analysis of Tweets for Prediction of Indian Stock Markets

Analysis of Tweets for Prediction of Indian Stock Markets Phillip Tichaona Sumbureru Department of Computer Science and Engineering, JNTU College of Engineering Hyderabad, Kukatpally, Hyderabad-500 085,

### MACHINE LEARNING BASICS WITH R

MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically

### A Sentiment Detection Engine for Internet Stock Message Boards

A Sentiment Detection Engine for Internet Stock Message Boards Christopher C. Chua Maria Milosavljevic James R. Curran School of Computer Science Capital Markets CRC Ltd School of Information and Engineering

### T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN

T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN Goal is to process 360 degree images and detect two object categories 1. Pedestrians,

### How can we discover stocks that will

Algorithmic Trading Strategy Based On Massive Data Mining Haoming Li, Zhijun Yang and Tianlun Li Stanford University Abstract We believe that there is useful information hiding behind the noisy and massive

### Relational Learning for Football-Related Predictions

Relational Learning for Football-Related Predictions Jan Van Haaren and Guy Van den Broeck jan.vanhaaren@student.kuleuven.be, guy.vandenbroeck@cs.kuleuven.be Department of Computer Science Katholieke Universiteit

### Football Match Winner Prediction

Football Match Winner Prediction Kushal Gevaria 1, Harshal Sanghavi 2, Saurabh Vaidya 3, Prof. Khushali Deulkar 4 Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai,

### A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

### Segmentation and Classification of Online Chats

Segmentation and Classification of Online Chats Justin Weisz Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 jweisz@cs.cmu.edu Abstract One method for analyzing textual chat

### Forex Trading Strategies: One way to trade the Non Farm Payroll report.

Forex Trading Strategies: One way to trade the Non Farm Payroll report. June 3 2011 1 Upshot Trade Signals disclaimer The information provided in this report is for educational purposes only. It is not

### Stock Market. Software User Guide

Stock Market Software User Guide Contents 1. Welcome... 3 2. GoVenture Stock Market... 4 What is the GoVenture Stock Market Simulation... 4 What Makes GoVenture Stock Market Unique... 4 How GoVenture Differs

### Online Appendix for. On the determinants of pairs trading profitability

Online Appendix for On the determinants of pairs trading profitability October 2014 Table 1 gives an overview of selected data sets used in the study. The appendix then shows that the future earnings surprises

### Rafael Witten Yuze Huang Haithem Turki. Playing Strong Poker. 1. Why Poker?

Rafael Witten Yuze Huang Haithem Turki Playing Strong Poker 1. Why Poker? Chess, checkers and Othello have been conquered by machine learning - chess computers are vastly superior to humans and checkers

### Best and Worst Dividend Paying Stocks

1 Getting risk-free U.S. Treasury securities and bank securities doesn t seem so great now, especially with the rock-bottom returns that are being offered. Instead, investors like you are looking for other

### How to Write an Effective Social Media Report

How to Write an Effective Social Media Report A GUIDE FROM BRANDWATCH S REPORT WRITERS Introduction Social media monitoring has become an important part of understanding consumers behaviour, thoughts and

### How to get profit-creating information from your accountant

How to get profit-creating information from your accountant What a tailored accounting service can do for you How could you get much more out of the accounting service you re already paying for? Possibly

### Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct

### quotemedia XML and JSON Delivery Services On Demand, Low Latency, Cloud-based Market and Research Data Services

quotemedia XML and JSON Delivery Services On Demand, Low Latency, Cloud-based Market and Research Data Services QuoteMedia s Cloud Data API provides intraday on demand snap data and end of day bulk information

### Probability and Expected Value

Probability and Expected Value This handout provides an introduction to probability and expected value. Some of you may already be familiar with some of these topics. Probability and expected value are

### TRADING GAPS TAIL STRATEGY SPECIAL REPORT #37

TRADING GAPS TAIL STRATEGY SPECIAL REPORT #37 Welcome to Market Geeks special report. Today I m going to teach you a little bit about gaps, how to identify different gaps and most importantly how to put

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### Twitter Stock Bot. John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu

Twitter Stock Bot John Matthew Fong The University of Texas at Austin jmfong@cs.utexas.edu Hassaan Markhiani The University of Texas at Austin hassaan@cs.utexas.edu Abstract The stock market is influenced

### Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Analyzing Customer Churn in the Software as a Service (SaaS) Industry

Analyzing Customer Churn in the Software as a Service (SaaS) Industry Ben Frank, Radford University Jeff Pittges, Radford University Abstract Predicting customer churn is a classic data mining problem.

### Predicting Market-Volatility from Federal Reserve Board Meeting Minutes NLP for Finance

Predicting Market-Volatility from Federal Reserve Board Meeting Minutes NLP for Finance Reza Bosagh Zadeh, Andreas Zollmann 1 Introduction Predicting markets has always had a certain appeal to researchers,

### Advisor Perspectives welcomes guest contributions. The views presented here do not necessarily represent those of Advisor Perspectives.

What Matters More When Investing: A Good Company or Good Price? What to Expect When P/E Multiples Compress By John Alberg and Michael Seckler December 3, 2013 Advisor Perspectives welcomes guest contributions.

### STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

### High-frequency trading: towards capital market efficiency, or a step too far?

Agenda Advancing economics in business High-frequency trading High-frequency trading: towards capital market efficiency, or a step too far? The growth in high-frequency trading has been a significant development

### Predicting the Betting Line in NBA Games

Predicting the Betting Line in NBA Games Bryan Cheng Computer Science Kevin Dade Electrical Engineering Michael Lipman Symbolic Systems Cody Mills Chemical Engineering Abstract This report seeks to expand

### Lecture 6: Logistic Regression

Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

### Financial Text Mining

Enabling Sophisticated Financial Text Mining Calum Robertson Research Analyst, Sirca Background Data Research Strategies Obstacles Conclusions Overview Background Efficient Market Hypothesis Asset Price

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

### Social Media Monitoring in Fifteen Minutes

Social Media Monitoring in Fifteen Minutes By Murray Newlands Murray Newlands 1 Table of Contents Social Media monitoring Guides your Business Introduction: Social Media Monitoring How Social Media monitoring

### Music Classification by Composer

Music Classification by Composer Janice Lan janlan@stanford.edu CS 229, Andrew Ng December 14, 2012 Armon Saied armons@stanford.edu Abstract Music classification by a computer has been an interesting subject

### Random forest algorithm in big data environment

Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

### Analysis of Stock Symbol Co occurrences in Financial Articles

Analysis of Stock Symbol Co occurrences in Financial Articles Gregory Kramida (gkramida@cs.umd.edu) Introduction Stock market prices are influenced by a wide variety of factors. Undoubtedly, market news

### Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.

Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada

### E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

### Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies

Sentiment analysis of Twitter microblogging posts Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies Introduction Popularity of microblogging services Twitter microblogging posts

### Why is Internal Audit so Hard?

Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

### 8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

### Professionally Managed Portfolios of Exchange-Traded Funds

ETF Portfolio Partners C o n f i d e n t i a l I n v e s t m e n t Q u e s t i o n n a i r e Professionally Managed Portfolios of Exchange-Traded Funds P a r t I : I n v e s t o r P r o f i l e Account

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

### Do broker/analyst conflicts matter? Detecting evidence from internet trading platforms

1 Introduction Do broker/analyst conflicts matter? Detecting evidence from internet trading platforms Jan Hanousek 1, František Kopřiva 2 Abstract. We analyze the potential conflict of interest between

### Discussion of Momentum and Autocorrelation in Stock Returns

Discussion of Momentum and Autocorrelation in Stock Returns Joseph Chen University of Southern California Harrison Hong Stanford University Jegadeesh and Titman (1993) document individual stock momentum:

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail