A Big Data Analytical Framework For Portfolio Optimization Abstract. Keywords. 1. Introduction



Similar documents
International Journal of Innovative Research in Computer and Communication Engineering

Data Mining Solutions for the Business Environment

Prediction of Stock Performance Using Analytical Techniques

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

COMPUTATIONIMPROVEMENTOFSTOCKMARKETDECISIONMAKING MODELTHROUGHTHEAPPLICATIONOFGRID. Jovita Nenortaitė

Use of Big Data Technologies in Capital Markets

A Prediction Model for Taiwan Tourism Industry Stock Index

Advanced Big Data Analytics with R and Hadoop

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

How To Handle Big Data With A Data Scientist

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

PDF PREVIEW EMERGING TECHNOLOGIES. Applying Technologies for Social Media Data Analysis

Research on the Performance Optimization of Hadoop in Big Data Environment

A QoS-Aware Web Service Selection Based on Clustering

UPS battery remote monitoring system in cloud computing

Research Article EFFICIENT TECHNIQUES TO DEAL WITH BIG DATA CLASSIFICATION PROBLEMS G.Somasekhar 1 *, Dr. K.

Hybrid Data Envelopment Analysis and Neural Networks for Suppliers Efficiency Prediction and Ranking

Neural Network Applications in Stock Market Predictions - A Methodology Analysis

Neural Network and Genetic Algorithm Based Trading Systems. Donn S. Fishbein, MD, PhD Neuroquant.com

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD

Big Data Mining: Challenges and Opportunities to Forecast Future Scenario

Business Analytics and the Nexus of Information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

How To Understand And Understand The Theory Of Computational Finance

Hadoop. Sunday, November 25, 12

Data Mining System, Functionalities and Applications: A Radical Review

IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource Virtualization

Big Data on Cloud Computing- Security Issues

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method

Neural Networks for Sentiment Detection in Financial Text

International Journal of Engineering Research ISSN: & Management Technology November-2015 Volume 2, Issue-6

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

AN EFFICIENT SELECTIVE DATA MINING ALGORITHM FOR BIG DATA ANALYTICS THROUGH HADOOP

Data Refinery with Big Data Aspects

Study of Lightning Damage Risk Assessment Method for Power Grid

Prediction Model for Crude Oil Price Using Artificial Neural Networks

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Hexaware E-book on Predictive Analytics

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

Aggregation Methodology on Map Reduce for Big Data Applications by using Traffic-Aware Partition Algorithm

marlabs driving digital agility WHITEPAPER Big Data and Hadoop

TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Study on Cloud Computing Resource Scheduling Strategy Based on the Ant Colony Optimization Algorithm

Big Data: Study in Structured and Unstructured Data

The ABC Wind Power Station Construction Project Management Performance Study Project Performance Improvement

Turning Big Data into Big Decisions Delivering on the High Demand for Data

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

Sentiment Analysis on Big Data

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

BIG DATA CHALLENGES AND PERSPECTIVES

Big Data Storage Architecture Design in Cloud Computing

Machine Learning using MapReduce

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Flexible Neural Trees Ensemble for Stock Index Modeling

Tweets Miner for Stock Market Analysis

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Why is Internal Audit so Hard?

Using Data Mining for Mobile Communication Clustering and Characterization

QOS Based Web Service Ranking Using Fuzzy C-means Clusters

Keywords Big Data Analytic Tools, Data Mining, Hadoop and MapReduce, HBase and Hive tools, User-Friendly tools.

Advances in Natural and Applied Sciences

Available online at Available online at Advanced in Control Engineering and Information Science

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May ISSN BIG DATA: A New Technology

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Students will become familiar with the Brandeis Datastream installation as the primary source of pricing, financial and economic data.

NEURAL NETWORKS IN DATA MINING

Genetic algorithms for solving portfolio allocation models based on relative-entropy, mean and variance

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

Big Data in the Nordics 2012

A hybrid Approach of Genetic Algorithm and Particle Swarm Technique to Software Test Case Generation

ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps

Credit Card Fraud Detection Using Self Organised Map

Exploring Big Data in Social Networks

An Approach to Implement Map Reduce with NoSQL Databases

White Paper. Big Data and Hadoop. Abhishek S, Java COE. Cloud Computing Mobile DW-BI-Analytics Microsoft Oracle ERP Java SAP ERP

Predictive time series analysis of stock prices using neural network classifier

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Transcription:

A Big Data Analytical Framework For Portfolio Optimization Dhanya Jothimani, Ravi Shankar and Surendra S. Yadav Department of Management Studies, Indian Institute of Technology Delhi {dhanya.jothimani, ravi1, ssyadav}@dms.iitd.ac.in Abstract. With the advent of Web 2.0, various types of data are being produced every day. This has led to the revolution of big data. Huge amount of structured and unstructured data are produced in financial markets. Processing these data could help an investor to make an informed investment decision. In this paper, a framework has been developed to incorporate both structured and unstructured data for portfolio optimization. Portfolio optimization consists of three processes: Asset selection, Asset weighting and Asset management. This framework proposes to achieve the first two processes using a 5-stage methodology. The stages include shortlisting stocks using Data Envelopment Analysis (DEA), incorporation of the qualitative factors using text mining, stock clustering, stock ranking and optimizing the portfolio using optimization heuristics. This framework would help the investors to select appropriate assets to make portfolio, invest in them to minimize the risk and maximize the return and monitor their performance. Keywords. Portfolio Optimization, Big Data, Hadoop, DEA, Stock Selection 1. Introduction Big data has gained popularity in the recent years and has become a buzz word. However, it is time to understand what Big Data is. Big Data refers to large data being generated continuously in the form of unstructured data produced by heterogeneous group of applications from social network to scientific computing applications, and so forth. The dataset ranges from a few hundred gigabytes to terabytes that is beyond the capacity of existing data management tools that can capture, store, manage and analyze [1]. Big data is characterized by the following dimensions: Volume, Velocity, Variety, Variability, Complexity and Low Value Density [2]. Various types of data, namely, structured, semi-structured and unstructured data are produced with the advent of Web 2.0 technology [3]. In the era of big data, one of the major challenges for the capital market firms is to handle the velocity at which data is being generated, considering the production of unstructured data in financial services like investment banking. New York Stock Exchange (NYSE) and NASDAQ have employed big data applications like IBM Netezza to store and process big data from various sources. Capital market firms use big data technologies to mitigate risks (fraud mitigation, ondemand enterprise management), regulation, trading analytics (High Frequency Trading, Pretrading decision), and data tagging [4]. The firms adopting big data technologies and predictive analytics have an edge over other firms in the uncertain market conditions [5]. In finance, portfolio is defined as the collection of assets. Assets range from stocks and bonds to real estate. With the seminal work of Markowitz [6], portfolio optimization has been a topic of research. Portfolio optimization is the investment decision-making process to hold a set of financial assets to meet various criteria of the investors. In general terms, the criteria are maximizing return and minimizing risk. In this paper, the scope of the work is limited to stock analysis. Portfolio optimization consists of three major steps: asset 1

selection, asset weighting, and asset management. In this paper, a framework is proposed to integrate unstructured and structured data to make an informed investment decision. The design of the paper is as follows. Section 2 discusses the proposed framework for portfolio optimization, followed by the expected outcomes in Section 3. Conclusions are presented in Section 4. 2. Proposed Framework The proposed framework for portfolio optimization can be explained using 5-step process: (a) Data Envelopment Analysis (DEA), (b) Validation of selected stocks, (c) Stock clustering, (d) Stock ranking, and (e) Optimization. All listed firms at a particular stock exchange are considered as the initial input to the framework and the output would be a set of stocks that would maximize the return and minimize the risk. The abstract framework for portfolio optimization is shown in figure 1. DEA is used to narrow the sample space of firms by identifying the efficient firms. In order to validate these firms as potential candidates for portfolio optimization, the latest information about the company is retrieved and processed from online news articles and tweets using text mining to the sentiments about the company in current context. The validated efficient firms are clustered into different groups to aid the diversification of portfolio. This is further followed by ranking of the stocks within each cluster and followed by asset weighting using optimization algorithms. Each process is explained from section 2.1 to section 2.5. 2.1 Data Envelopment Analysis Figure 1: Framework for Portfolio Optimization Data Envelopment Analysis (DEA) is a non-parametric linear programming that calculates the efficiency score of a Decision-Making Unit (DMU) based on a given set of inputs and outputs. 2

The DMUs with score 1 are considered to be efficient [7]. Apart from its applicability in the discipline of manufacturing, DEA can be used for stock selection. In this case, stocks form the DMU. Based on the previous studies [8, 9], four input parameters, namely, total assets, total equity, cost of sales and operating expenses and two output parameters, namely, net sales and net income, can be considered. These data can be obtained from standard databases like Bloomberg. The stocks with efficiency score 1 are considered for next stage. 2.2 Hadoop Framework for Sentiment Analysis This stage involves processing of frequently generated unstructured data using Hadoop MapReduce. This step complements the previous stage. Events like election, change of management and announcement of dividends have an effect on the market sentiment, which is not captured using the quantitative analysis. As first step, online news articles and tweets of the efficient firms are retrieved. Tweets can be obtained through Twitter API but it is limited to 1500 tweets. The ease-of-use, scalability, and failover properties make Hadoop MapReduce a popular choice for processing big data efficiently [10]. Tweets and news articles are processed using text mining to obtain the positive and negative sentiments about the firm. Hadoop MapRaduce infrastructure quickens the distributed text mining process. Figure 2 shows the MapReduce framework for distributed text mining. As shown in the figure, the company tweets and news articles are distributed among different Map processors to produce intermediate data. This intermediate data is processed by the Reduce processors to give the aggregated result. The firms with positive sentiments are chosen for the next stage. 2.3 Stock Clustering Figure 2: Hadoop framework for Distributed Text Mining In this stage, the correlation coefficients of the returns of the stocks are calculated. The stocks are assigned to different clusters based on these correlation coefficients. The greater the number of 3

clusters more is the diversification. The objective for number of clusters and quality of clusters is to maximize similarity within the cluster and to minimize the similarity between the clusters. Various clustering algorithms like k-means clustering or Louvian clustering [11] can be used. This process reduces the portfolio risk through diversification of stocks [12]. These resulting clusters can consist of firms with similar business activity or size in real sense. 2.4 Stock Ranking The appropriate stocks from each cluster should be selected. The stocks in each cluster can be ranked using Artificial Neural Network (ANN) [13]. Till the previous stage, only the internal factors of the firms were considered. At this stage, external factors like Gross Domestic Product (GDP) growth rate and interest rate can be considered [12]. ANN is a model for information processing that consists of three layers: input, hidden and output layer respectively. The inputs for ANN can be GDP growth rate and interest rate and the outputs can be future return on investments. This results in ranking the stocks within a cluster. 2.5 Optimization Previous stage leads to an assumption that the investor might choose the top stocks from each cluster. But the question that still remains is: How much to invest in each stock? Previous study [12] considered simple (equal) stock weighting method, a primitive method, to allocate the resources among the stocks. Hence the ranked stocks should be optimized to maximize returns and to minimize risk. Markowitz s mean-variance model can be used at this stage [6]. Various optimization heuristics like Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and Genetic Algorithm (GA) can be used. The distribution of the stocks in a portfolio will be formed at the end of this stage. Top 3 performing portfolio will be suggested to the investor. 3. Expected Outcomes Apart from letting the investors make an informed investment decision, this framework would be useful to the naive investors as well. The expected outcome of the framework is to generate portfolio in conformance with the criteria of the investors: minimize the risk and maximize the return. The first stage of DEA would result in a group of the potential stocks for portfolio optimization. The current information related to those resulting companies is analyzed using text mining to obtain the sentiment about the company. Sentiment analysis gives the qualitative aspect of the firms. The stocks resulting from the previous stage are grouped into different categories using the correlation co-efficient of the returns. This leads to the diversification of the stocks. The stocks are ranked to select appropriate stocks from each group. At the end of the entire process, top three portfolio suggestions would be provided to the investors so that they can select one of those three. 4. Conclusion The proposed framework tries to integrate both structured data from database (stock price, balance sheet data etc) and unstructured data from online news articles and tweets. Consideration of qualitative factors (Management of firms, etc.) along with quantitative factors (financial ratios) provides better alternatives for formation of portfolio. The assets are well-diversified using k-means clustering and ANN. Top three portfolio suggestions obtained using optimization 4

heuristics gives flexibility to the investors to choose the appropriate assets suiting their risk profile. The proposed model can be applied to any stock market data. The utility of the framework is limited only to stock analysis and investments. The parameters suggested for DEA is based on the previous studies. Instead of variance as risk measure, other risk measures like Value-at-risk and downside risk measures can be used in the final stage of the framework. References [1] X. Qin, H. Wang, F. Li, B. Zhou, Y. Cao, C. Li, H. Chen, X. Zhou, X. Du, and S. Wang, Beyond simple integration of RDBMS and MapReduce Paving the way toward a unified system for big data analytics: Vision and progress, in Proceedings of the 2012 Second International Conference on Cloud and Green Computing, CGC 12, (Washington, DC, USA), pp. 716 725, IEEE Computer Society, 2012. [2] N. S. Sachchidanand Singh, Big data analytics, in International Conference on Communication, Information & Computing Technology (ICCICT), pp. 1-4, 2012. [3] M. Minelli, M. Chambers, and A. Dhiraj, Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today s Businesses Wiley CIO series). Wiley Publishing, Hoboken, New Jersey, 1st ed., 2013. [4] R. Verma and S. R. Mani, Use of big data technologies in capital markets. Available online at: http://www.infosys.com/industries/financialservices/whitepapers/documents/big-dataanalytics.pdf, 2012, Accessed on: April 2, 2014. [5] M. Peat, Big data in finance, InFinance: The Magazine for Finsia Members, vol. 127, no. 1, pp. 34 36, 2013. [6] H. Markowitz, Portfolio selection, The Journal of Finance, vol. 7, no. 1, pp. 77 91, 1952. [7] W. Cooper, L. Seiford, and J. Zhu, Data envelopment analysis: History, models and interpretations, in Handbook on Data Envelopment Analysis (W. W. Cooper, L. M. Seiford, and J. Zhu, eds.), vol. 71 of International Series in Operations Research & Management Science, pp. 1 39, Springer US, 2004. [8] H.-H. Chen, Stock selection using data envelopment analysis., Industrial Management and Data Systems, vol. 108, no. 9, pp. 1255 1268, 2008. [9] Y. S. Chen and B. S. Chen, Applying DEA, MPI and Grey model to explore the operation performance of the Taiwanese wafer fabrication industry, Technological forecasting and social change, vol. 78, no. 3, pp. 536 546, 2011. [10] J. Dittrich and J.-A. Quian -Ruiz, Efficient big data processing in hadoop mapreduce, eproc. VLDB Endow., vol. 5, pp. 2014 2015, Aug. 2012. [11] V. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment, p. 10008, 2008. [12] N. Koochakzadeh, A heuristic stock portfolio optimization approach based on data mining techniques. PhD thesis, Department of Computer Science, University of Calgary, March 2013. [13] T. Ware, Adaptive statistical evaluation tools for equity ranking models. Submitted to: Canadian Industrial Problem Solving Workshops (Calgary, Canada, May 15-19, 2005), 2005. 5