Stock market movement direction prediction using tree algorithms. Gunter Senyurt, Abdulhamit Subasi
|
|
|
- Milo Hunt
- 9 years ago
- Views:
Transcription
1 Stock market movement direction prediction using tree algorithms Gunter Senyurt, Abdulhamit Subasi [email protected], [email protected] Abstract One of the highly challenging businesses today is the task of forecasting the market movements by examining the financial time series data as correctly as possible in order to hedge against the almost incalculable risk involved and to yield better profits for investors. If there was a highly credible estimation technique available giving better results than the traditional statistical tools for financial markets, it would be a great asset for trading decision makers of all kinds such as speculators, arbitrageurs, portfolio fund managers and even individual investors. In this study CART, C4.5 and Random Forest algorithms were used to predict the movement direction of a 10 year Istanbul Stock Exchange index (XU-100). Ten technical market indicators such as momentum, MACD and RSI were used in this study as the feature set. Keywords: Price movement direction, CART, C4.5, Random Forest, forecasting, stock market. 1. INTRODUCTION The complex dynamism of the markets is characterized by the nonlinearity and nonparametric nature of the variables influencing the index movement directions including human psychology and political events. The unpredictable volatility of the market index makes it a highly challenging task to accurately forecast its path of movement. On the other hand, it is crucial for investors to estimate the trend of the markets as precisely as possible in order to reach the best trading decisions for their investments, so in this context it is in the investor's best interest to use the most accurate time series forecasting model to maximize the profit or to minimize the risk. By means of this study, it is aimed at contributing to the demonstration and verification of the XU-100 index movement path predictability through some tree algorithms. The stochastic performance parameter is accuracy and it is defined as the ratio of the correctly classified instances divided by the number of all instances. The remaining part of this study is organized into four sections. The next section presents an overview of the theoretical literature while in section 3 the research data and the structures of tree algorithms CART, C4.5, Random Forest is described. In section 4, the reports and results of empirical findings from the comparative WEKA analysis are given. Finally, the last section contains the concluding remarks. 374
2 2. Literature Review CART review The classification tree analysis CART (classification and regression trees) is suggested first by Breiman (1984) and uses the predictor variables splitting rule to build a binary decision tree (Denison, Mallick and Smith, 1998). The CART method is experimented in the credit scoring area, retail lending and evaluation of insurance risks in workers compensation showing better results than logistic regression and discriminant analysis (Friedman 1991, Devaney 1994, Lee 2006, Kolyshkina 2002). C4.5 review The C4.5 method is high in efficiency when used for inductive inference. Recent research has shown that this algorithm produces high accuracy in image segmentation (Polat and Gunes, 2009; Mazid, Ali and Tickle, 2010). In another work a hybrid approach including C4.5 is suggested with potentially high outcomes (Jiang and Yu, 2009; Mazid, Ali and Tickle, 2010). It is also used for classification of remote sensing data (Yu and Ai, 2009; Mazid, Ali and Tickle, 2010). Another variant of C4.5 successfully trimmed down the leaf node number and improved accuracy (Yang, 2009; Mazid, Ali and Tickle, 2010). Random Forest review High-dimensional classification and regression problems can be approached by using random forest algorithm that is extensively researched by Breiman (2001). Among the machine learning techniques used to predict markets random forest is quite successful (Dietterich, 2000). Though the practicality of random forest is excellent it is hard to interpret and clarify mathematically (Breiman, 2002; Lin and Jeon, 2006; Biau et al., 2008, Biau and Devroye, 2008). 3. Materials and Methods CART method CART constructs a tree where the data is separated into two parts by binary variable splits. The best divider variable and the best point to split is determined by variance minimization. The CART algorithm can be viewed as a classification procedure consisting of four distinct parts: Part 1: a variance criterion, Part 2: the criterion how good it is split, Part 3: the terminal node class assignments and estimates of resubstitution, Part 4: determining the right tree complexity (Buyukbebeci, 2009). 375
3 The root node, internal nodes and leaf (terminal) nodes constitute the CART tree. Two child nodes follow each root and internal node. Each node contains and is defined by the subset of the original learning sample. The splitting of each node into child nodes is characterized by a certain rule depending on the chosen feature. The child nodes inherit subsamples with minimum variance that measures their heterogeneity from parent nodes (Iscanoglu, 2005). The goodness of the splitting procedure is defined by an impurity function that is derived from a a variance function which is applied to each split point indicating the best point for splitting (Iscanoglu, 2005). Gini, Entropy and Twoing are the main rules for binary recursive splitting that are derived from the impurity function (Breiman, Frydman, Olshen and Stone, 1984) C4.5 Method In doing classification with C4.5, the concepts of entropy and correlation coefficient need to be explained in brief. Entropy is a measure of uncertainty among random variables in a collection of data or in other words entropy provides information about the behavior of random processes used in data analysis. Correlation coefficient has its uses as a chief statistical tool in data analysis finding the relationship between variable sets. Different ways of calculations have been introduced to boost the efficiency of the correlation coefficient among which are Kendall, Pearson s and Spearman s correlation coefficients.there are several test options with WEKA providing data classification such as training set, supplied test set, percentage split and cross validation. In this paper, cross validation is chosen as the test option (Mazid M., Ali S. and Tickle K. (2010). Random Forest method Random forests are based on conjoining lots of binary regression trees. In the process of growing these large number of regression trees independent subsets of variables are used. Random forests randomly choose variables to split and a bootstrapped sample of the dataset builds the decision trees (Efron and Tibshirani, 1993). When K trees are aggregated the predicted decision is gained as the average value over these K trees. Marking each single tree predictors by, the final outcome is: Research Data 376 (x) In this study, all experiments were conducted on WEKA software using its tree classifiers built-in tool to make comparisons of prediction performances based on the chosen dataset. The dataset is comprised of 10 input variables with 2733 instances in total. These 10 input attributes are technical market indicators as used by Kara, Boyacioglu and Baykan (2010) which are 10-day moving average, 10-day weighted moving average, momentum, stochastic %K, stochastic %D, RSI (Relative Strength Index), MACD (moving average convergence divergence), Larry William's %R, A/D (Accumulation/Distribution) Oscillator and CCI
4 (Commodity Channel Index). The total number of cases or 2733 trading days have 1440 days with increasing direction (advances), while 1293 days show decreasing direction (declines). In the analysis, 10-fold cross-validation was used as the test option in WEKA. 4. Results and Discussion The relevance and quality of the data, usually, has a big impact on the performance of the model used. Thus, the choice of data becomes the most important part in forecasting the markets. In this study, all series are real-valued and the input data spans from 02/01/1997 to 31/12/2007. For WEKA testing, the accuracy or correctly classified instances metric is utilized, showing the ability of the model to capture the data. The dataset with 10 features is tested using CART, C4.5 and Random Forest classifiers in order to see which tree algorithm has better predictive power over the others. The results of the tests can be seen in the Table 1 where CART and Random Forest classifiers have almost identical prediction power, whereas C4.5 has a little less prediction power compared to the other two tree algorithms. Table 1. Tree Classifiers Test Results CART C Random Forest % Accuracy (correctly classified instances) 5. CONCLUSION The issue of accurately predicting the stock market price movement direction is highly important for formulating the best market trading solutions. It is fundamentally affecting buy and sell decisions of an instrument that can be lucrative for investors. This study focuses on predicting the ISE National 100 closing price movement directions using tree algorithms based on the daily data from 1997 to Even though the prediction performance of tree classifiers such as CART, random forest and C4.5 do not really outperform studies alike in literature, it is still likely that the forecasting performance of the models can still be improved by doing the followings: Either the model parameters should be adjusted by thorough experimentation or the input variable sets need to be modified by selecting those input attributes that are more realistic in reflecting the market workings. (Kara, Boyacioglu, and Baykan, 2010) had already proved the significance of using ten particular technical market indicators which gave also about %78 accuracy in this study, as well. More appropriate 377
5 variables has to be found that may improve the forecasting performance of the models employed that can be a further subject of study for interested readers. Acknowledgement : We sincerely deliver our special thanks to Assist. Prof. Melek Acar Boyacioglu for her graciousness in sharing her knowledge with us. REFERENCES Biau G., Devroye L., and Lugosi G. (2008), Consistency of random forests and other averaging classifiers, Journal of Machine Learning Research, 9, Biau G., Devroye L., and Lugosi G. (2008), On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification, Technical report, Universite Paris 6. Breiman, L., Frydman, H., Olshen, R.A., and Stone, C.J. (1984), Classification and Regression Trees, Chapman and Hall, New York, London. Breiman, L. (2001), Random forests, Machine Learning, Kluwer Academic Publishers, 45, Breiman, L. (2002), Manual on setting up, using, and understanding Random Forests v3.1, Technical Report, Buyukbebeci, E. (2009), Comparison of MARS, CMARS and CART in predicting default probabilities for emerging markets, Master Thesis, METU, Ankara. David G.T. Denison, Bani K. Mallick, Adrian F.M. Smith, Biometrika, Vol.85, No.2 (1998), Devaney, S. (1994), The Usefulness of Financial Ratios as Predictors of Household Insolvency: Two Perspectives, Financial Counseling and Planning, 5, Dietterich T.G. (2000), Ensemble methods in machine learning, Lecture Notes in Computer Science, Springer-Verlag, 1-15, Efron B. and Tibshirani R. J. (1993), An Introduction to the Bootstrap, New York, Chapman and Hall. Friedman, J.H. (1991), Multivariate adaptive regression splines, The Annals of Statistics, 19, 1, Frydman, H., Olshen, R.A., and Stone, C.J., Classification and Regression Trees, Chapman and Hall, New York, London, Iscanoglu, A. (2005), Credit Scoring Methods and Accuracy Ratio, Master Thesis, METU, Ankara. Jiang S. and Yu W. (2009), A Combination Classification Algorithm Based on Outlier Detection and C4.5, Springer Publications. 378
6 Kara Y., Boyacioglu M.A., Baykan O.K., (2010). Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange. Expert Systems with Applications 38, Kolyshkina I. and Brookes R. (2002), Data Mining Approaches to Modeling Insurance Risk, Report, Price Waterhouse Coopers. Lee T., Chiu C., Chou Y., and Lu C. (2006), Mining the customer credit using classification and regression tree and multivariate adaptive regression splines, Computational Statistics & Data Analysis, 50, Lin Y. and Jeon Y. (2006), Random forests and adaptive nearest neighbours, Journal of American Statistical Association, 101, Mazid M., Ali S. and Tickle K. (2010), Improved C4.5 Algorithm for rule based classification, Recent Advances in Artificial Intelligence, Knowledge Engineering and Data Bases, Australia. Polat K. and Gune S. (2009), A novel hybrid intelligent method based on C4.5 decision tree classifier and one against-all approach for multi-class classification problems, Expert Systems with Applications, vol.36, Yang X.Y. (2009), Decision tree induction with constrained number of leaf node, Master Thesis, National Central University, Taiwan. Yu M. and Ai T.H. (2009), Study of RS data classification based on rough sets and C4.5 algorithm, In Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. Weka ( ), Waikato Environment for Knowledge Analysis, Version 3.7.3, The University of Waikato Hamilton, New Zealand. 379
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan
Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
Studying Auto Insurance Data
Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya
Stock Portfolio Selection using Data Mining Approach
IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719 Vol. 3, Issue 11 (November. 2013), V1 PP 42-48 Stock Portfolio Selection using Data Mining Approach Carol Anne Hargreaves, Prateek
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing
REPORT DOCUMENTATION PAGE
REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 Public Reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
A Decision Theoretic Approach to Targeted Advertising
82 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000 A Decision Theoretic Approach to Targeted Advertising David Maxwell Chickering and David Heckerman Microsoft Research Redmond WA, 98052-6399 [email protected]
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Getting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand [email protected] ABSTRACT Ensemble Selection uses forward stepwise
Ensemble Data Mining Methods
Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods
Benchmarking of different classes of models used for credit scoring
Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want
Predictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble
Predictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble Dr. Hongwei Patrick Yang Educational Policy Studies & Evaluation College of Education University of Kentucky
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES
The International Arab Conference on Information Technology (ACIT 2013) PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES 1 QASEM A. AL-RADAIDEH, 2 ADEL ABU ASSAF 3 EMAN ALNAGI 1 Department of Computer
Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel
Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Copyright 2008 All rights reserved. Random Forests Forest of decision
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
On the effect of data set size on bias and variance in classification learning
On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
A Prediction Model for Taiwan Tourism Industry Stock Index
A Prediction Model for Taiwan Tourism Industry Stock Index ABSTRACT Han-Chen Huang and Fang-Wei Chang Yu Da University of Science and Technology, Taiwan Investors and scholars pay continuous attention
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
The Operational Value of Social Media Information. Social Media and Customer Interaction
The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio Moreno-Garcia
Weather forecast prediction: a Data Mining application
Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,[email protected],8407974457 Abstract
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
Decision Tree Learning on Very Large Data Sets
Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
arxiv:1301.4944v1 [stat.ml] 21 Jan 2013
Evaluation of a Supervised Learning Approach for Stock Market Operations Marcelo S. Lauretto 1, Bárbara B. C. Silva 1 and Pablo M. Andrade 2 1 EACH USP, 2 IME USP. 1 Introduction arxiv:1301.4944v1 [stat.ml]
Ira J. Haimowitz Henry Schwarz
From: AAAI Technical Report WS-97-07. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Clustering and Prediction for Credit Line Optimization Ira J. Haimowitz Henry Schwarz General
New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Tree Ensembles: The Power of Post- Processing. December 2012 Dan Steinberg Mikhail Golovnya Salford Systems
Tree Ensembles: The Power of Post- Processing December 2012 Dan Steinberg Mikhail Golovnya Salford Systems Course Outline Salford Systems quick overview Treenet an ensemble of boosted trees GPS modern
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
A Meta Forecasting Methodology for Large Scale Inventory Systems with Intermittent Demand
Proceedings of the 2009 Industrial Engineering Research Conference A Meta Forecasting Methodology for Large Scale Inventory Systems with Intermittent Demand Vijith Varghese, Manuel Rossetti Department
Solving Regression Problems Using Competitive Ensemble Models
Solving Regression Problems Using Competitive Ensemble Models Yakov Frayman, Bernard F. Rolfe, and Geoffrey I. Webb School of Information Technology Deakin University Geelong, VIC, Australia {yfraym,brolfe,webb}@deakin.edu.au
Ensemble Learning Better Predictions Through Diversity. Todd Holloway ETech 2008
Ensemble Learning Better Predictions Through Diversity Todd Holloway ETech 2008 Outline Building a classifier (a tutorial example) Neighbor method Major ideas and challenges in classification Ensembles
MSCA 31000 Introduction to Statistical Concepts
MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
Beating the MLB Moneyline
Beating the MLB Moneyline Leland Chen [email protected] Andrew He [email protected] 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series
A Learning Algorithm For Neural Network Ensembles
A Learning Algorithm For Neural Network Ensembles H. D. Navone, P. M. Granitto, P. F. Verdes and H. A. Ceccatto Instituto de Física Rosario (CONICET-UNR) Blvd. 27 de Febrero 210 Bis, 2000 Rosario. República
Introduction To Ensemble Learning
Educational Series Introduction To Ensemble Learning Dr. Oliver Steinki, CFA, FRM Ziad Mohammad July 2015 What Is Ensemble Learning? In broad terms, ensemble learning is a procedure where multiple learner
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,
Package acrm. R topics documented: February 19, 2015
Package acrm February 19, 2015 Type Package Title Convenience functions for analytical Customer Relationship Management Version 0.1.1 Date 2014-03-28 Imports dummies, randomforest, kernelfactory, ada Author
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Model Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
Predictive time series analysis of stock prices using neural network classifier
Predictive time series analysis of stock prices using neural network classifier Abhinav Pathak, National Institute of Technology, Karnataka, Surathkal, India [email protected] Abstract The work pertains
Up/Down Analysis of Stock Index by Using Bayesian Network
Engineering Management Research; Vol. 1, No. 2; 2012 ISSN 1927-7318 E-ISSN 1927-7326 Published by Canadian Center of Science and Education Up/Down Analysis of Stock Index by Using Bayesian Network Yi Zuo
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)
260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate
NEURAL networks [5] are universal approximators [6]. It
Proceedings of the 2013 Federated Conference on Computer Science and Information Systems pp. 183 190 An Investment Strategy for the Stock Exchange Using Neural Networks Antoni Wysocki and Maciej Ławryńczuk
JetBlue Airways Stock Price Analysis and Prediction
JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant
Forecasting Of Indian Stock Market Index Using Artificial Neural Network
Forecasting Of Indian Stock Market Index Using Artificial Neural Network Proposal Page 1 of 8 ABSTRACT The objective of the study is to present the use of artificial neural network as a forecasting tool
Machine Learning Algorithms and Predictive Models for Undergraduate Student Retention
, 225 October, 2013, San Francisco, USA Machine Learning Algorithms and Predictive Models for Undergraduate Student Retention Ji-Wu Jia, Member IAENG, Manohar Mareboyana Abstract---In this paper, we have
Sales Forecasting for Retail Chains
1 Sales Forecasting for Retail Chains Ankur Jain 1, Manghat Nitish Menon 2, Saurabh Chandra 3 A53097130 1, A53097652 2, A53104614 3 {anj022 1, mnmenon 2, sbipinch 3 }@eng.ucsd.edu Abstract This paper presents
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Better credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS
ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS Michael Affenzeller (a), Stephan M. Winkler (b), Stefan Forstenlechner (c), Gabriel Kronberger (d), Michael Kommenda (e), Stefan
Event driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016
Event driven trading new studies on innovative way of trading in Forex market Michał Osmoła INIME live 23 February 2016 Forex market From Wikipedia: The foreign exchange market (Forex, FX, or currency
Selecting Data Mining Model for Web Advertising in Virtual Communities
Selecting Data Mining for Web Advertising in Virtual Communities Jerzy Surma Faculty of Business Administration Warsaw School of Economics Warsaw, Poland e-mail: [email protected] Mariusz Łapczyński
New Ensemble Combination Scheme
New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,
Efficiency in Software Development Projects
Efficiency in Software Development Projects Aneesh Chinubhai Dharmsinh Desai University [email protected] Abstract A number of different factors are thought to influence the efficiency of the software
