FORECASTING STOCK MARKET USING BAYESIAN NETWORKS: A TEXT MINING APPROACH

Junwei Ma, Yanping Fu, Tiejun Wang (SWUFE)

How Is the Stock Price Formed?
[Diagram: Stock Market, Information, Traders]

Classification of Information
1. Digital information: Fama & French (1970), Stiglitz (1980), Kyle (1985)
2. Text information: Mitchell and Mulherin (1994), Tetlock (2007, 2008)
3. Video information?

Stock Market Forecasting Methods
1. Traditional methods: AR, MA, ARMA
2. Intelligent methods: neural networks, SVM, Bayesian networks

Bayesian Network Forecast
What is the Bayesian network method?
A Bayesian network (also called a Bayes network, belief network, or Bayesian model) is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph.
Why do we use the Bayesian network method?
1. Many factors, such as events and weather conditions, can influence investors and thereby affect the stock price indirectly. We therefore try to include more of these factors in the Bayesian network to improve prediction accuracy.
2. Bayesian networks can be used even when some inputs are missing.
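The point about missing inputs can be illustrated with a minimal sketch. The network below is a hypothetical toy (two root nodes, News and Weather, feeding one node Up), and every probability is an invented illustration value, not an estimate from the authors' data or their actual model structure. An unobserved input is simply summed out using its prior.

```python
# Hypothetical toy network: News -> Up <- Weather.
# All probabilities are made-up illustration values.
P_NEWS = {"good": 0.5, "bad": 0.5}
P_WEATHER = {"sunny": 0.7, "rainy": 0.3}

# Conditional probability table: P(Up = True | News, Weather)
P_UP = {
    ("good", "sunny"): 0.8,
    ("good", "rainy"): 0.6,
    ("bad", "sunny"): 0.4,
    ("bad", "rainy"): 0.2,
}


def prob_up(news=None, weather=None):
    """Return P(Up = True | observed inputs).

    An input left as None is treated as missing and is marginalized
    out using its prior distribution.
    """
    news_values = [news] if news is not None else list(P_NEWS)
    weather_values = [weather] if weather is not None else list(P_WEATHER)
    numerator = 0.0
    denominator = 0.0
    for n in news_values:
        for w in weather_values:
            weight = P_NEWS[n] * P_WEATHER[w]
            numerator += weight * P_UP[(n, w)]
            denominator += weight
    return numerator / denominator


p_full = prob_up(news="good", weather="sunny")  # both inputs observed
p_partial = prob_up(news="good")                # weather is missing
p_prior = prob_up()                             # nothing observed
```

With both inputs observed the function reads the table directly; with Weather missing it averages the two "good news" rows by the weather prior, so a prediction is still available.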

Bayesian Network with Financial News Information
Comparison with traditional time series models.
The structure of our Bayesian network model.

Text Information on the Gu Ba BBS
Why do we choose Gu Ba rather than another BBS?
1. According to a survey by the leading consulting group iResearch, EastMoney.com is the leader and the largest provider of financial information services in China.
2. We obtained 2,854,061 postings on 58 representative listed firms, posted from late 2007 to 30 March 2012. These postings were collected from the largest financial discussion board in China (EastMoney.com).

Quantifying Messages about Firms on the Discussion Board
[Figure: a firm's daily number of posts from Jan. 2009 to Mar. 2012]

Data Description
Re-examining all postings by text length (word count): 99% of postings have no more than 86 words.

Data Description
We divide all postings into five word-count intervals and count the postings in each interval.

Group        QUA1     QUA2      QUA3      QUA4      QUA5
Length       [0,23]   [24,43]   [44,63]   [64,85]   [86,max)
Percentage   0-30%    30%-50%   50%-70%   70%-99%   99%-100%
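A minimal sketch of the grouping step, assuming each posting has already been reduced to a word count (how words are counted is not specified in the slides). The function names are hypothetical; the interval bounds are those given in the table.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of QUA2..QUA5, taken from the intervals
# [0,23], [24,43], [44,63], [64,85], [86,max).
LOWER_BOUNDS = [24, 44, 64, 86]
GROUPS = ["QUA1", "QUA2", "QUA3", "QUA4", "QUA5"]


def length_group(word_count):
    """Map a posting's word count to its length group."""
    return GROUPS[bisect_right(LOWER_BOUNDS, word_count)]


def count_by_group(word_counts):
    """Count postings per length group, keeping empty groups as 0."""
    counts = Counter(length_group(n) for n in word_counts)
    return {group: counts.get(group, 0) for group in GROUPS}


# Hypothetical word counts for one firm on one day.
daily_counts = count_by_group([5, 23, 24, 50, 85, 86, 300])
```

Applying count_by_group to each firm's postings per day would give the five daily series (QUA1 to QUA5) used as regressors in place of the single total posting count.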

Related Work
Tetlock (2007, 2008), Antweiler (2004), Mitchell (1994): multiple data sources, multiple statistical methods
1. Text messages are statistically significant for stock returns.
2. The relationship is economically small.
In our work:
1. Is the relationship really so slight?
2. Should our Bayesian network model include this factor to forecast the stock market?

Why Returns and Not Prices?
1. Is the stock price what we should care about?
Situation 1. Stock A: $6, Stock B: $7
Situation 2. Period 1: Stock A: $5, Stock B: $6. Period 2: Stock A: $6, Stock B: $7
Which stock is dominating?
2. Compared with price series, return series are usually stable, which reduces the risk of spurious regression.
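A short worked example of the comparison, assuming the two-period situation is read as Stock A moving from $5 to $6 and Stock B from $6 to $7, and assuming simple (not log) returns. The function name is hypothetical.

```python
def simple_return(price_prev, price_now):
    """One-period simple return: (P_t - P_{t-1}) / P_{t-1}."""
    return (price_now - price_prev) / price_prev


# Period 1 -> Period 2 prices under the assumed reading of the slide.
ret_a = simple_return(5.0, 6.0)  # Stock A: $5 -> $6, a 20% return
ret_b = simple_return(6.0, 7.0)  # Stock B: $6 -> $7, about 16.7%

# B has the higher price in both periods, but A has the higher return.
a_dominates_by_return = ret_a > ret_b
```

Looking only at the price levels, B appears to be ahead; in return terms A performs better, which is why the regressions use returns.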

Calculation of Idiosyncratic Return
What is idiosyncratic return?
The part of an asset's excess return that is not explainable by common factors and is independent of the specific returns of other assets.
Why idiosyncratic returns?
Each text message on the BBS concerns only one specific stock.
How do we calculate idiosyncratic returns? (Fama-French three-factor model)
$$ r_{i,t} = \alpha_i + \beta_i^{MKT} MKT_t + \beta_i^{SMB} SMB_t + \beta_i^{HML} HML_t + \varepsilon_{i,t} $$
where r_{i,t} is the stock return, \alpha_i the intercept term, MKT_t the market excess return, SMB_t the firm size factor, HML_t the book-to-market factor, and \varepsilon_{i,t} the idiosyncratic return.
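One way to obtain the idiosyncratic return is to regress a stock's returns on the three factors by ordinary least squares and keep the residuals. The sketch below assumes plain OLS per stock over the whole sample; the slides do not state the estimation window or method, so this is an assumption. The factor series are synthetic random numbers, and all function and variable names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def idiosyncratic_returns(r, mkt, smb, hml):
    """OLS of r on [1, MKT, SMB, HML] via the normal equations.

    Returns (coefficients, residuals); the residuals are the
    idiosyncratic returns.
    """
    rows = [[1.0, a, b, c] for a, b, c in zip(mkt, smb, hml)]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, r)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [y - sum(c * v for c, v in zip(coef, row))
             for row, y in zip(rows, r)]
    return coef, resid


# Synthetic daily data for one stock (illustration only).
rng = random.Random(0)
n_days = 250
mkt = [rng.gauss(0, 0.01) for _ in range(n_days)]
smb = [rng.gauss(0, 0.005) for _ in range(n_days)]
hml = [rng.gauss(0, 0.005) for _ in range(n_days)]
r = [0.0002 + 1.1 * m + 0.4 * s - 0.3 * h + rng.gauss(0, 0.02)
     for m, s, h in zip(mkt, smb, hml)]

coef, resid = idiosyncratic_returns(r, mkt, smb, hml)
```

By construction the residuals sum to zero and are uncorrelated with each factor in the sample, so they carry only the firm-specific part of the return that a stock-specific posting could plausibly relate to.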

Regression with Total Number of New Postings

Dependent variable     Number of daily postings   Leverage effect   C             Trading volume   R2
Idiosyncratic return   -1.21E-05                  --                -1.385E-03    5.93E-05         0.0209
  (Std. Error)         (3.69E-07)                 --                (3.36E-05)    (5.33E-07)
  (t Value)            (-32.72)                   --                (-41.19)      (-111.41)
Volatility             2.09E-06                   -2.66E-04         6.14E-05      1.98E-06         0.0767
  (Std. Error)         (1.80E-08)                 (2.46E-06)        (1.96E-06)    (2.61E-08)
  (t Value)            (116.40)                   (108.15)          (31.37)       (76.11)

The results are similar, so the BBS is a reliable data source for representing public opinion.

Regressions with New Postings in Different Word-Count Intervals
Dependent variable: Idiosyncratic return (R2 = 0.077)

               C            QUA1         QUA2         QUA3         QUA4         QUA5         VOL
Coefficient    -1.40E-03    6.94E-05     -2.00E-04    -7.10E-05    7.83E-05     1.21E-04     6.03E-05
(Std. Error)   (3.36E-05)   (1.68E-06)   (3.79E-06)   (7.68E-06)   (1.40E-05)   (3.18E-06)   (5.32E-07)
t-value        -41.6358     41.37513     -52.82047    -9.248943    5.604786     38.11191     113.4495

Regressions with New Postings in Different Word-Count Intervals
Dependent variable: Volatility (R2 = 0.088)

               C            QUA1         QUA2         QUA3         QUA4         QUA5         VOL          LEVER
Coefficient    7.41E-05     8.57E-06     -3.60E-06    -4.99E-07    -7.14E-06    -1.79E-06    1.91E-06     2.62E-04
(Std. Error)   (1.95E-06)   (8.16E-08)   (1.85E-07)   (3.74E-07)   (6.80E-07)   (1.55E-07)   (2.60E-08)   (2.45E-06)
t-value        28.01478     104.9874     -19.46463    -1.335186    -10.50374    -11.54368    73.50909     107.1097

Conclusion of this regression:
The correlations differ across the word-count interval groups, and some even have opposite signs. This can explain why the results in other papers are statistically significant but economically slight. The word count of new posts is therefore an important factor and can be used in the subsequent Bayesian network forecasting.
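The cancellation argument can be sketched with a tiny deterministic example. The data below are invented for illustration (they are not the paper's data): short posts are given a positive effect on the return, long posts a negative one, and the two counts are uncorrelated. Regressing on the total count then yields a much smaller slope than regressing on either group alone.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


# Invented daily post counts; the two series are uncorrelated by design.
short_posts = [1, 1, 3, 3]
long_posts = [1, 3, 1, 3]

# Invented returns: short posts push the return up, long posts push it down.
returns = [0.002 * s - 0.0015 * l for s, l in zip(short_posts, long_posts)]

total_posts = [s + l for s, l in zip(short_posts, long_posts)]

slope_short = ols_slope(short_posts, returns)   # about  0.0020
slope_long = ols_slope(long_posts, returns)     # about -0.0015
slope_total = ols_slope(total_posts, returns)   # about  0.00025
```

The pooled slope is roughly the average of the two opposing group effects, so it stays nonzero but small, which matches the "significant but slight" pattern described in the conclusion.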

Future Work
We will quantify other text information on the discussion board that might be related to stock prices.
We will introduce more factors (variables) into the Bayesian network and check whether this improves prediction accuracy.

Thanks