Social Media Aided Stock Market Predictions by Sparsity Induced Regression


Delft Center for Systems and Control

For the degree of Master of Science in Systems and Control at Delft University of Technology

October 26, 2014

Faculty of Mechanical, Maritime and Materials Engineering (3mE)
Delft University of Technology

Copyright. All rights reserved.

Abstract

Prediction of the stock market has been a research topic for decades. Recently, data from social media like Google and Twitter have been included in prediction models. These data serve as indicators of sentiments that are potentially useful for prediction. Current prediction methods, however, are cumbersome to interpret: it is not known beforehand which data are relevant for the prediction and hence which data should be added to the model. To improve the interpretability, and thereby the credibility, of the results, this thesis uses sparse regression methods that automatically discard data that are not useful for the prediction. Current methods induce sparsity via $\ell_1$-regularization, as in the LASSO. In contrast to traditional applications, this thesis assumes that a sparse, time-varying regression vector is estimated from time series data that arrive sequentially over time. The data can thus not be treated in batch form, where constant behavior over a window is assumed, and the performance of current sparse regression methods is therefore limited. A new Weighted Sparse Kalman Filter (Weighted-SKF) is therefore proposed that induces sparsity in the Kalman filter equations. The Kalman filter is able to track time-varying behavior, while the sparsity ensures that interpretable results are obtained. Simulations demonstrate that the Weighted-SKF outperforms current regression methods in identifying the time-varying support and regression vector. Moreover, the time-varying usefulness of social media data is demonstrated: the Weighted-SKF includes social media data in its prediction model only during large declines in the stock market.


Table of Contents

Abstract
Acknowledgements

1 Introduction
1-1 Stock Markets
1-2 Social Media Aided Stock Market Prediction
1-3 Sparse Models
1-4 Approach and Goals of the Thesis
1-5 Outline of the Thesis

2 The LASSO and its Application to the Stock Market
2-1 Introduction
2-2 Mathematical Framework and Notations
2-3 The LASSO and its Properties
2-3-1 Background and Origin of the LASSO
2-3-2 Sparsity via $\ell_1$-Regularization
2-3-3 The Oracle Properties and the Adaptive LASSO
2-3-4 Tuning Parameter $\lambda$
2-4 Application of the LASSO to the Stock Market
2-4-1 Classical LASSO Applications
2-4-2 Stock Market Application of the LASSO
2-4-3 Challenges with the Stock Market Application of the LASSO
2-5 Summary

3 Sparse Time-Varying Regression
3-1 Introduction
3-2 Windowed LASSO
    Working Principle
    Limitations
    Summary
3-3 The Dynamic LASSO
    Working Principle
    Limitations
    Summary
3-4 Kalman Filtered LASSO
    The Kalman Filter Equations
    Working Principle
    Limitations
    Summary
3-5 The Weighted Sparse Kalman Filter
    Inducing Sparsity in the Kalman Filter
    Working Principle
    Limitations
    Summary
3-6 Summary

4 Simulation Results
4-1 Introduction
4-2 Estimation Performance for $P > N > S$
    Dataset
    Discussion Simulation Results
4-3 Estimation Performance for $P > N \leq S$
    Dataset
    Discussion Simulation Results
4-4 Weighted-SKF and Time-Varying Support
    Dataset
    Discussion Simulation Results
4-5 Simulations with Real Social Media Data
    Dataset
    Discussion Simulation Results
4-6 Summary

5 Conclusions and Recommendations
5-1 Conclusions
5-2 Recommendations

A Proofs
A-1 The Soft Thresholding Function
A-2 Calculating $\lambda_{\max}$

Glossary
List of Symbols
List of Acronyms


List of Figures

1-1 (a) Google search volume for DJIA and (b) the actual DJIA from Yahoo!
2-1 Illustration of the minimization problem (2-2) and the definition of the vectors
2-2 From left to right: graphical representation of the $\ell_2$-, $\ell_1$- and $\ell_0$-norm in the $(x_1, x_2)$ plane
2-3 Graphical representation of the soft thresholding function (2-5). Small values of $x$ are set exactly to zero, while for the nonzero coefficients a bias is introduced via $\lambda$
2-4 Graphical representation of the shrinkage effects of (a) the $\ell_0$-norm, (b) the $\ell_2$-norm and (c) the $\ell_1$-norm for the orthonormal case $A^T A = I$. The 45-degree dotted line serves as a reference and represents the values before shrinkage
2-5 Graphical representation of the constraint $\ell_1$-region (left) and $\ell_2$-region (right), with the contour lines of the objective function $\|Ax - y\|_2^2$ in the $(x_1, x_2)$ plane. A feasible solution is obtained where the constraint region is entered. For the LASSO, this is likely to happen at a vertex where either $x_1$ or $x_2$ is zero and hence a sparse solution is obtained
2-6 Graphical representation of the shrinkage of coefficients in $x$; in all situations $\lambda = 4$. Left: the LASSO with the biased nonzero coefficients. Middle: Adaptive LASSO with $\gamma = 0.5$. Right: Adaptive LASSO with $\gamma = 2$. It is seen that the bias for larger coefficients is eliminated. Figures borrowed from (Zou, 2006)
2-7 Illustration of the $A$-matrix of Scenario 1: $N > P > S$
2-8 Illustration of the $A$-matrix of Scenario 2: $P > N > S$
2-9 Illustration of the data vector $a_t$ of Scenario 3: $P > N \leq S$
3-1 Plots of four coefficients and their estimates. The most accurate constant representation of $x_t$ in a window is $\hat{x}_t^{\mathrm{mean}}$. The $\hat{x}_t^{\mathrm{LASSO}}$ is only close to $\hat{x}_t^{\mathrm{mean}}$ at some instances. Moreover, applying a window obstructs accurate tracking of time-varying behavior
3-2 True and estimated coefficient for increasing $\lambda_2$ in D-LASSO. The values $x_t$ are those calculated at $T = 50$ to illustrate the effect of $\lambda_2$

3-3 $x_t$ and its estimates with $\lambda = 0.2$. The red, circled line labeled $\hat{x}_t^D$ gives the estimates of D-LASSO based on data that is available until time step $t$. The black, squared line labeled $\hat{x}_t^D$ at $t = 50$ gives the estimates calculated at $t = T$
3-4 (a) A nonzero coefficient and its KF estimate and (b) a zero coefficient and its KF estimate. In contrast to Windowed LASSO, the mean of the KF estimates over $W$ time steps accurately approximates the true mean
4-1 Generated nonzero coefficients in $x_t$ with constant support
4-2 At each time step, the MSE($x_t$) calculated over 15 runs is shown for $P = 25$, $N = 8$ and $S = 3$. KF-LASSO converges to a lower MSE($x_t$) than LASSO by using dynamic information of $x_t$. Weighted-SKF outperforms all other methods since it also uses dynamic information in support estimation
4-3 MSE($x_t$) of Weighted-SKF for several choices of the tuning parameters. When the parameters are not set exactly, but close to the true value, the Weighted-SKF still performs well
4-4 (a) The average MSE($x_t$) and (b) the percentage of how many times (out of 75 in total) the support is estimated correctly, both as a function of $N$
4-5 Nonzero coefficient and its estimate by (a) KF, (c) SKF and (e) Weighted-SKF. Zero coefficient and its estimates by (b) KF, (d) SKF and (f) Weighted-SKF. The SKF nonzero coefficient is heavily biased, while this is compensated in the Weighted-SKF
4-6 Generated nonzero coefficients in $x_t$ with time-varying support
4-7 (a) IEN and the estimated size of the support by the Weighted-SKF of Algorithms 3 and 4; (b) tracking of a coefficient added to the support
4-8 Additions to and deletions from the support, tracked by the Weighted-SKF. Detection of additions via the IEN is fast. Deletion is somewhat slower since the mean of the estimate needs to be below the threshold for $W$ time steps
4-9 Normalized DJIA closing price and the number of Google searches
4-10 Normalized DJIA closing price and periods where the Weighted-SKF includes Google data
4-11 DJIA closing price and buy and sell moments of the Weighted-SKF with Model

List of Tables

3-1 Performance of the various algorithms
4-1 Speed of detecting a change in the support with the IEN and finding the correct support with Weighted-SKF of Algorithm 4, for 30 runs
4-2 Financial Google search terms
4-3 Top 5 financial search terms with correlation coefficient
4-4 MAPE and DA of the Weighted-SKF and (Mao et al., 2011) for 3 Models
4-5 Returns of the Weighted-SKF and (Mao et al., 2011) for 3 Models


Acknowledgements

This thesis is the result of a year of research. The main motivation for this research is that I greatly enjoy exploring whether methods and knowledge developed in technical research areas are also applicable outside these original areas. As such, I was curious whether knowledge, insights and methods of Systems & Control could be applied to economic problems such as stock market prediction. I believe that by combining knowledge and applications of several research areas, valuable contributions and original solutions can be obtained for the encountered problems. I would like to thank my supervisor, Prof. dr. ir. Michel Verhaegen, for his enthusiasm and support for this thesis proposal and for the discussions during the past year. Moreover, I would like to thank Joep Kooijman, with whom I collaborated for a large part of last year, for his feedback on my report and the fruitful discussions we had. Finally, I would like to thank my family, especially my parents, who have always supported me in every way they can. They are the ones who made it possible for me to get this far and I am deeply grateful to them.

Delft University of Technology
October 26, 2014


Does it mean this, does it mean that, that's all anybody wants to know. I'd say what any decent poet would say if anyone dared ask him to analyze his work: if you see it, then it's there!

Freddie Mercury


Chapter 1

Introduction

Prediction of the stock market has been a research topic for decades. Recently, attempts have been made to improve the accuracy of the predictions by including data from social media like Google and Twitter. Data from social media are regarded as indicators of sentiments that potentially carry useful information in addition to financial data. Current prediction methods, however, do not give results that are easily interpretable. Beforehand it is often not known which data are relevant for the prediction and hence which data should be added to the model. To obtain interpretable results, regression methods that induce sparsity are required: data that are not useful for the prediction are automatically discarded from the model.

The goal of this chapter is to introduce the various aspects of social media aided stock market prediction. Section 1-1 discusses stock markets and the influence of sentiments on investors. Section 1-2 discusses recent studies that use data from social media to improve the predictions, with examples of studies that use Google and Twitter data. The principle of sparse regression methods is introduced in Section 1-3, which discusses how sparsity can improve interpretability and gives two examples of current applications. The goals of this thesis are presented in Section 1-4 and an outline is given in Section 1-5.

1-1 Stock Markets

The market in which stocks of publicly held companies are traded is called the stock market. In exchange for capital, the one who invests in stocks receives part of the ownership of a company. When a company is profitable, the investor makes money by receiving dividends. Moreover, the stock price increases when demand is high and the investor makes money by selling his stocks at a higher price. On the other hand, the investor can lose money when the company is not profitable and when stocks are sold at a lower price.

The stock market is thus a network of sellers and buyers, and the stock price is determined by supply and demand. Behavioral economics argues that decisions made by investors on the stock market are influenced by social and emotional factors. People therefore make irrational decisions and their behavior does not follow economic models (Smailovic et al., 2013). If these emotions could be captured and used for stock market predictions, more reliable predictions might be obtained. For example, during the financial crisis of 2008 nobody seemed able to predict what would happen on the stock market, since the models used for prediction are based on fundamental price movements and not on sentiments. So even though the models are quite involved and the trading strategies carefully chosen, the returns were negative when the stock market was dominated by emotions (Anderluh, 2011). This emphasizes that the performance of prediction models can be improved by capturing sentiments and emotions in the models.

1-2 Social Media Aided Stock Market Prediction

Recently, data from social media has been used to account for the emotions and sentiments that influence investors' behavior. The availability of vast amounts of social media data makes it possible to capture part of these emotions and sentiments. Nowadays, a huge amount of information is shared on sites such as Twitter, Google and Facebook, which makes it possible to use the sentiments of large groups of people for prediction by extracting this information from these networks. One of the first papers to address this phenomenon, which also received a lot of attention from the media, was that of (Bollen et al., 2010). This paper used several public mood indicators derived from Twitter messages (e.g. the Calm index) and it showed that these indicators sometimes predict the Dow Jones Industrial Average (DJIA) three days ahead. Similar examples that use social media to enhance predictions are the work of (Asur and Huberman, 2010) and (Jiang et al., 2013).

In the past five years, social media aided predictions have primarily focused on data from Twitter and Google. First, three illustrative examples of Twitter based predictions are briefly discussed. (Oliveira et al., 2013) performed sentiment analysis on the content of tweets and investigated the posting volume of tweets. It was found that the Twitter posting volume is relevant for modeling the trading volume of the next day. The second study relating to Twitter is that of (Mao et al., 2012). They investigated whether Twitter posting volume is correlated with financial time series at three different levels: the stock market as a whole, the industry sector and individual company stocks. It was found that Twitter posting volume was primarily helpful for prediction at the level of the stock market as a whole. The third publication is that of (Rao and Srivastava, 2012). Twitter data was used to model the movements in oil, gold and forex prices, and it was found that including social media data reduced the prediction error for forecasting the DJIA.

[Figure 1-1: (a) Google search volume for the term DJIA and (b) the actual DJIA closing price from Yahoo!]

Two Google based prediction methods are briefly discussed next. The work of (Beer et al., 2013) proposed a novel investor sentiment indicator based on search volumes on Google. It was shown that this sentiment indicator contributes to short-term market returns. The second publication, that of (Preis et al., 2013), investigated search volumes and their relation to future trends. It was found that the search volumes of certain search terms are "early warning signs", especially during stock market falls and financial crises.

These studies illustrate the potential of using Google and Twitter data for improving stock market predictions. This thesis focuses on data from Google, since this data gives promising results while being more easily available than Twitter data. Figure 1-1a shows the search volume for the search term DJIA and hence depicts the interest in this term over time. Figure 1-1b illustrates the DJIA closing price. A visual examination of Figure 1-1 already shows the resemblance between the social media data and the stock market. It is noted that the improvements in prediction accuracy achieved by the previously mentioned studies are difficult to quantify, since the results of adding social media to the prediction differ per scenario. Although a conclusive answer is yet to be found to the question whether data from Google and Twitter can predict the stock market, the previously mentioned studies showed promising results.

1-3 Sparse Models

Stock markets are complex, dynamic, time-varying systems of supply and demand. The majority of the aforementioned studies employed neural networks or a Support Vector Machine (SVM) for prediction. These methods can be regarded as a black box: many types of information serve as inputs to the algorithm and the prediction is obtained via an unknown, nonlinear mapping. In terms of prediction accuracy these nonlinear models often perform well; however, the currently used models have some drawbacks.

First, inputs to these nonlinear models have to be selected carefully. When additional inputs, such as social media data, are included in the model, the performance may deteriorate when the added inputs contain no useful information. Currently, inputs are selected during a pre-processing step, before they are added to the model. Due to the time-varying character of the stock market, it may well be that inputs that are at first not useful become useful later, as time progresses. However, these inputs cannot be added to the model halfway through the prediction, since they have already been discarded during pre-processing.

Moreover, black box models lack interpretability. Because of the complexity of the system, the understandability of predictions is important. Human traders are more likely to trust a prediction if it comes with valid reasoning. Therefore, the system of stock market predictions should be understandable and interpretable. Interpretability, and thereby credibility, of the forecasts is increased when it is clearly understood which inputs are selected and to what extent the prediction is based on which inputs.

This thesis therefore proposes to use sparse methods that automatically discard data that is not useful. Currently, such methods are primarily used in system identification problems. For example, in image reconstruction with MRI scans, it is known that a subset of the measurements is already sufficient to reconstruct the image. Sparse methods are able to exploit this knowledge and to identify this subset, thereby accelerating the imaging (Lustig et al., 2007). Another illustrative example is the application of sparsity to climate prediction (Chatterjee et al., 2012). This study investigated which variables are most relevant for the prediction of land climate, in order to gain better insight and interpretation. The goal is to obtain the most relevant variables for prediction from a set of ocean climate variables, such as temperature, sea level pressure and wind speed. It was shown that off-coast temperature and precipitation are most relevant for predicting the land climate.

However, current applications of sparse regression methods, such as the aforementioned two examples, assume that multiple measurements of the signal of interest can be obtained. For applications to the stock market, on the other hand, it is argued that it is not possible to take sufficient measurements at one time step. Instead, a sparse signal is estimated from observations that are acquired sequentially over time, and it is assumed that this sparse signal is time-varying. The performance of current sparse regression methods is thus expected to be limited when these are applied to time series for stock market prediction. Therefore, this thesis proposes a new sparse regression method that is suitable for stock market predictions.

1-4 Approach and Goals of the Thesis

Based on the discussion of stock market predictions and sparse regression methods in the previous sections, the goal of this thesis is formulated: to propose a sparse regression method that is able to retrieve a sparse, time-varying regression vector from time series data that is acquired sequentially over time. The first research question therefore is:

1. How is sparsity induced to obtain interpretable results from a regression model?

To understand how interpretability is translated into the mathematical notion of sparsity, it is discussed how sparsity is induced and what effects this has on the regression model. This question is the subject of Chapter 2.

2. What challenges arise when such interpretable regression models are applied to time series? How do current methods meet these challenges?

In contrast to current applications, this thesis assumes that a sparse, time-varying regression vector is estimated from time series data that arrives sequentially over time. Challenges will thus arise when current methods are applied to time series for stock market prediction. Section 2-4 puts these challenges in a mathematical framework and discusses them in detail. Chapter 3, Sections 3-2 to 3-4, discusses the most common and best performing sparse regression methods proposed in current literature, to what extent these methods are applicable to time series, and what causes their limited performance in this new application.

3. Is it possible to propose a sparse, time-varying regression method that performs well with time series data for stock market predictions?

The performance of current methods is limited when these are applied to time series data that is acquired sequentially over time. Therefore, a new sparse regression method is proposed. The proposed method should be able to (i) accurately identify the important data (the nonzero coefficients of the regression vector, called the support) and (ii) accurately estimate the values of the nonzero entries in the coefficient vector. This is the subject of Chapter 3, Section 3-5. Moreover, Chapter 4, Sections 4-2 to 4-4, discusses the results of numerical simulations of the current methods and the proposed method.

4. Can the proposed sparse regression method for time series be employed to improve stock market predictions with social media?

Finally, when the proposed method has been designed, it is investigated whether this method can be employed to predict stock markets with the aid of social media data. This is the subject of Chapter 4, Section 4-5.

1-5 Outline of the Thesis

The outline of the remainder of the thesis is summarized in this section. Chapter 2 discusses how sparsity is induced by $\ell_1$-regularization and how the Least Absolute Shrinkage and Selection Operator (LASSO) can be employed to obtain interpretable results. The mathematical framework and the properties of the LASSO are discussed in that chapter. Moreover, the challenges that arise when the LASSO is applied to social media aided stock market prediction are discussed in detail.

Chapter 3 discusses three sparse regression methods that have been proposed in current literature for time-varying systems. The working principle of these methods is discussed, as well as their limitations when they are applied to stock market predictions. The second part of Chapter 3 introduces the newly proposed Weighted Sparse Kalman Filter (Weighted-SKF) that accurately estimates the time-varying support and the time-varying coefficients in the regression vector for stock market predictions.

Chapter 4 discusses the simulation results of the most promising method in current literature, the Kalman Filtered LASSO (KF-LASSO), and compares its performance to the newly proposed Weighted-SKF for various datasets. Moreover, simulation results on real social media data and stock market data are discussed for the Weighted-SKF.

Conclusions are summarized by answering the research questions of Section 1-4 in Chapter 5. Moreover, this chapter gives some recommendations for future research.

Chapter 2

The LASSO and its Application to the Stock Market

2-1 Introduction

In Chapter 1 it was stated that sparse regression methods are employed in this thesis to obtain interpretable results for stock market predictions. This chapter discusses how sparsity is induced by $\ell_1$-regularization and how the Least Absolute Shrinkage and Selection Operator (LASSO) can be employed to obtain interpretable results. Section 2-2 discusses the mathematical framework and the notations used throughout this report. Then, in Section 2-3, the LASSO is introduced, which induces sparsity via $\ell_1$-regularization in a least squares problem. The properties of the LASSO are discussed and the Adaptive LASSO is introduced as an extension to the LASSO for unbiased estimates. Section 2-4 discusses how the framework of the LASSO can be applied to the stock market in order to obtain interpretable results, and what challenges arise in that setting. A summary of this chapter is given in Section 2-5.

2-2 Mathematical Framework and Notations

Data is collected in the data matrix $A \in \mathbb{R}^{N \times P}$ that contains $N$ observations and $P$ variables, or features. Each column of the $A$-matrix is called a feature vector, $f_p \in \mathbb{R}^N$. Data of the output is collected in $y$ and the regression, or coefficient, vector is $x$. A linear relation between $A$, $y$ and $x$ is described via (2-1), where $\epsilon$ is zero-mean white noise, $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2 I)$:

$$y = Ax + \epsilon \tag{2-1}$$
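To fix ideas, the following Python sketch (synthetic data; the dimensions and names are illustrative, not taken from this thesis) instantiates model (2-1):

    import numpy as np

    rng = np.random.default_rng(0)
    N, P = 100, 8                                # N observations, P features
    A = rng.normal(size=(N, P))                  # columns are the feature vectors f_p
    x_true = np.array([1.5, 0., 0., -2., 0., 0., 0.5, 0.])   # sparse coefficient vector
    eps = 0.1 * rng.normal(size=N)               # zero-mean white noise
    y = A @ x_true + eps                         # Eq. (2-1)

Only three of the eight coefficients are nonzero here, which is exactly the kind of structure the sparse methods of this chapter are designed to recover.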

The goal is to estimate $x$ and thus to obtain the relation between the data in $A$ and the phenomenon in $y$. Hence, the following Ordinary Least Squares (OLS) minimization problem is of interest:

$$\min_x \; \|Ax - y\|_2^2 \tag{2-2}$$

Figure 2-1 illustrates problem (2-2) with all defined vectors.

[Figure 2-1: Illustration of the minimization problem (2-2) and the definition of the vectors.]

Without loss of generality, it is assumed throughout this report that the output $y$ is zero-mean and that the feature vectors $f_p$ are standardized to have zero mean and unit variance (Zou and Hastie, 2005):

$$\sum_{i=1}^{N} y_i = 0, \qquad \sum_{i=1}^{N} f_{i,p} = 0 \qquad \text{and} \qquad \sum_{i=1}^{N} f_{i,p}^2 = 1 \qquad \text{for } p = 1, 2, \ldots, P$$

By making $f_p$ and $y$ zero-mean, no intercept term is needed in objective function (2-2). Moreover, standardizing the feature vectors assures that all features are approximately on the same scale and that useful solutions for $x$ are obtained. This is particularly useful when regularization is applied in Section 2-3.

In Chapter 1 it was already discussed that beforehand it is unknown which features contribute to the prediction of $y$. Hence, a practical solution would be to add all features to the $A$-matrix and have the useless features automatically discarded from the regression. For problem (2-2) this means that certain coefficients in $x$ are set to zero, in order to exclude irrelevant features from the regression. The LASSO is such a method and it is the subject of Section 2-3.

2-3 The LASSO and its Properties

This section discusses the LASSO, which uses $\ell_1$-regularization, and how this encourages sparsity. Section 2-3-1 discusses the origin of the LASSO and what distinguishes $\ell_1$-regularization from other regularization methods. In Section 2-3-2 it is discussed and illustrated how $\ell_1$-regularization encourages sparse solutions. Section 2-3-3 discusses two desirable properties that the LASSO should satisfy and introduces the Adaptive LASSO, which satisfies both properties. Finally, Section 2-3-4 discusses tuning of the LASSO parameter $\lambda$.

2-3-1 Background and Origin of the LASSO

This section discusses the origin of the LASSO and what distinguishes $\ell_1$-regularization from other regularization methods such as $\ell_0$- and $\ell_2$-regularization. The $\ell_p$-norm of $x$ is defined as

$$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$$

With this $\ell_p$-norm, an extended version of the OLS problem in (2-2) is introduced as in (2-3):

$$\min_x \; \|Ax - y\|_2^2 + \lambda \|x\|_p \tag{2-3}$$

The tuning parameter $\lambda$ controls the trade-off between the two terms and thus the amount of regularization applied by the second term. The three most commonly used norms are the $\ell_0$-, $\ell_1$- and $\ell_2$-norm. A graphical representation for the two-dimensional case $P = 2$ is shown in Figure 2-2.

[Figure 2-2: From left to right: graphical representation of the $\ell_2$-, $\ell_1$- and $\ell_0$-norm in the $(x_1, x_2)$ plane.]

The effects of these three norms in problem (2-3) are briefly discussed below.

The $\ell_0$-norm. Regularization with the $\ell_0$-norm, $\|x\|_0$, is also known as subset selection. The $\ell_0$-norm counts the nonzero entries of $x$ and it leads to interpretable models, since some coefficients in $x$ become exactly zero. However, a drawback of subset selection is that coefficients are retained in the model whenever their value becomes nonzero. A slight change in the dataset can therefore result in a completely different set of coefficients remaining nonzero, which limits the prediction accuracy (Tibshirani, 1996).

The $\ell_2$-norm. Regularization with the $\ell_2$-norm, $\|x\|_2$, is also known as ridge regression. In ridge regression, all variables are continuously shrunk and hence it is more stable than subset selection. A drawback of ridge regression, however, is that variables are not set exactly to 0, so that the resulting model is less interpretable (Tibshirani, 1996).

The $\ell_1$-norm. To incorporate the desired behavior of both $\ell_0$- and $\ell_2$-regularization, (Tibshirani, 1996) proposed to apply an $\ell_1$-norm, $\|x\|_1$, in (2-3). This method is known as the LASSO. The LASSO shrinks some variables towards 0 and sets others exactly to 0, and thus obtains sparse solutions. This ensures that interpretable models are obtained, as in $\ell_0$-regularization, while the results are more stable, as in $\ell_2$-regularization.

Next, Section 2-3-2 discusses in more detail how application of the $\ell_1$-norm ensures that some coefficients are shrunk exactly to zero and hence why the LASSO is able to encourage sparsity.
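Before moving on, the contrast between the $\ell_1$ and $\ell_2$ penalties of (2-3) is easy to reproduce numerically. The following is a minimal sketch assuming scikit-learn is available; its alpha parameter plays the role of $\lambda$ up to a scaling:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(1)
    N, P = 50, 10
    A = rng.normal(size=(N, P))
    x_true = np.zeros(P)
    x_true[[0, 3]] = [2.0, -1.5]                 # sparse truth, S = 2
    y = A @ x_true + 0.1 * rng.normal(size=N)

    # l1-regularization (LASSO): some coefficients become exactly zero
    lasso = Lasso(alpha=0.1, fit_intercept=False).fit(A, y)
    # l2-regularization (ridge): all coefficients shrink, none exactly zero
    ridge = Ridge(alpha=0.1, fit_intercept=False).fit(A, y)

    print("LASSO zero coefficients:", int(np.sum(lasso.coef_ == 0)))   # typically P - S
    print("ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))   # typically 0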

2-3-2 Sparsity via $\ell_1$-Regularization

This section discusses how the $\ell_1$-norm in the LASSO of (2-4) encourages sparsity. First, this is illustrated for the orthonormal case, $A^T A = I$, and then for the general case without the assumption of orthonormality.

Sparsity via $\ell_1$-Regularization for the Orthonormal Case

$$\min_x \; \|Ax - y\|_2^2 + \lambda \|x\|_1 \tag{2-4}$$

For the orthonormal case, $A^T A = I$, the solutions of (2-4) can be computed as in (2-5):

$$\hat{x}_i = \operatorname{sign}\!\left(\hat{x}_i^{\mathrm{OLS}}\right) \left( \left|\hat{x}_i^{\mathrm{OLS}}\right| - \frac{\lambda}{2} \right)_+ \tag{2-5}$$

In (2-5), $\operatorname{sign}(\cdot)$ denotes the sign and $(\cdot)_+$ is $\max(\cdot, 0)$, so only positive values are retained. The proof is given in Appendix A-1. Equation (2-5) is called the soft thresholding function (Fan and Li, 2001). A graphical representation of the soft thresholding function (2-5) is shown in Figure 2-3. The horizontal axis shows the coefficient values before shrinkage and the vertical axis the values after shrinkage. Without shrinkage, the diagonal line is obtained, while for the soft thresholding function it can be seen that, when $\lambda$ is sufficiently large, the small coefficients are set exactly to zero. This comes at the cost of a biased solution for the nonzero coefficients.

[Figure 2-3: Graphical representation of the soft thresholding function (2-5). Small values of $x$ are set exactly to zero, while for the nonzero coefficients a bias is introduced via $\lambda$.]
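Equation (2-5) is straightforward to implement. A minimal Python sketch (illustrative names; valid as the LASSO solution only under the orthonormal assumption $A^T A = I$):

    import numpy as np

    def soft_threshold(x_ols, lam):
        # Elementwise soft thresholding, Eq. (2-5): sign(x) * max(|x| - lam/2, 0)
        return np.sign(x_ols) * np.maximum(np.abs(x_ols) - lam / 2.0, 0.0)

    x_ols = np.array([3.0, -0.4, 0.9, -2.5])
    print(soft_threshold(x_ols, lam=2.0))        # [ 2.  -0.   0.  -1.5]

The small entries -0.4 and 0.9 are set exactly to zero, while the surviving entries are shrunk by $\lambda/2$, illustrating both the selection and the bias discussed above.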

For comparison purposes, the soft thresholding function of $\ell_1$-regularization is compared with the shrinkage effects of the $\ell_0$- and $\ell_2$-norm in Figure 2-4, for the orthonormal case. This figure is borrowed from (Tibshirani, 1996). It shows that the $\ell_0$-norm indeed sets coefficients exactly to zero, but that this is a discrete process. The $\ell_2$-norm is shown to continuously shrink all coefficients, but none exactly to zero.

[Figure 2-4: Graphical representation of the shrinkage effects of (a) the $\ell_0$-norm, (b) the $\ell_2$-norm and (c) the $\ell_1$-norm for the orthonormal case $A^T A = I$. The 45-degree dotted line serves as a reference and represents the values before shrinkage.]

Sparsity via $\ell_1$-Regularization for the General Case

(Tibshirani, 1996) showed that the various shrinkage effects of the $\ell_p$-norms discussed in the previous sections also hold for the general, non-orthonormal case. This is illustrated in Figure 2-5, which is borrowed from (Tibshirani, 1996). The figure shows the contour lines of the objective function $\|Ax - y\|_2^2$, and the black area is the constraint region of the $\ell_1$- and $\ell_2$-norm, respectively. The OLS objective $\|Ax - y\|_2^2$ can be rewritten, up to a constant, as $(x - \hat{x}^{\mathrm{OLS}})^T A^T A (x - \hat{x}^{\mathrm{OLS}})$, where $\hat{x}^{\mathrm{OLS}}$ is the OLS estimate. The minimum of this function is obtained when $x = \hat{x}^{\mathrm{OLS}}$ and hence the contour lines are centered at the OLS estimate. However, due to the regularization term, this solution is infeasible and the solution to (2-4) is obtained where the contour lines hit the constraint region for the first time. Due to the shape of the $\ell_1$ constraint region, (Tibshirani, 1996) stated that the contour lines are likely to hit the constraint region at a vertex, where either $x_1$ or $x_2$ is zero. This yields a sparse solution. The constraint region of the $\ell_2$-norm, on the other hand, has no vertices, and solutions at zero will rarely occur.

[Figure 2-5: Graphical representation of the constraint $\ell_1$-region (left) and $\ell_2$-region (right), with the contour lines of the objective function $\|Ax - y\|_2^2$ in the $(x_1, x_2)$ plane. A feasible solution is obtained where the constraint region is entered. For the LASSO, this is likely to happen at a vertex where either $x_1$ or $x_2$ is zero, and hence a sparse solution is obtained.]

2-3-3 The Oracle Properties and the Adaptive LASSO

This section discusses two important properties that the LASSO should satisfy, and an extension of the LASSO, the Adaptive LASSO, is introduced. The Adaptive LASSO prevents the bias that occurs in the LASSO for nonzero regression coefficients.

The Oracle Properties

The LASSO of (2-4) continuously shrinks all coefficients and sets small coefficients exactly to zero. It was already noted that the nonzero estimates of the LASSO are biased. This means that the LASSO is unable to accurately retrieve the values of the true coefficient vector $x$. More formally stated, the LASSO does not satisfy the desirable Oracle Properties. When an algorithm exhibits the Oracle Properties, it behaves as if it knew the true subset of nonzero coefficients in advance. In order to discuss the two Oracle Properties, the definition of the support is introduced.

Definition 1 (Support). The support, $\mathcal{S}$, consists of the nonzero coefficients in $x$:

$$\mathcal{S} \triangleq \left\{ i \in \{1, \ldots, P\} : x_i \neq 0 \right\}$$

Moreover, $x^{\mathcal{S}}$ denotes the subvector of $x$ belonging to $\mathcal{S}$, i.e. consisting of only the nonzero entries of $x$. Furthermore, the size of the support is defined as $S \triangleq |\mathcal{S}|$.

Without loss of generality it is assumed that the first $q$ coefficients of $x$ are nonzero and hence form $x^{\mathcal{S}}$, and that coefficients $q+1, \ldots, P$ of $x$ are zero and hence form $x^{\mathcal{S}^c}$. Here, $(\cdot)^c$ denotes the complement of a set. Then the first $q$ columns of $A$ form $A_{\mathcal{S}}$ and columns $q+1, \ldots, P$ form $A_{\mathcal{S}^c}$. If now $\Sigma_{11} = \frac{1}{N} A_{\mathcal{S}}^T A_{\mathcal{S}}$ and $\Sigma_{21} = \frac{1}{N} A_{\mathcal{S}^c}^T A_{\mathcal{S}}$, the two Oracle Properties can be formulated as follows (Zou, 2006):

1. $\lim_{t \to \infty} \mathrm{Prob}[\hat{\mathcal{S}}_t = \mathcal{S}] = 1$
2. $\sqrt{t}\,(\hat{x}_t^{\mathcal{S}} - x^{\mathcal{S}}) \to \mathcal{N}(0, \sigma_\epsilon^2 \Sigma_{11}^{-1})$

Property 1 means that the support is consistently estimated. Property 2 means that the nonzero values are estimated consistently. (Zhao and Yu, 2006) showed that whether Property 1 holds can be checked by testing whether the Irrepresentable Condition (IC) holds, since it is an if and (almost) only if condition. The IC is shown in (2-6), where the inequality holds element-wise, it is assumed that $\Sigma_{11}$ is invertible, and $\eta$ is any constant larger than zero:

$$\left| \Sigma_{21} \Sigma_{11}^{-1} \operatorname{sign}(x^{\mathcal{S}}) \right| \leq \mathbf{1} - \eta \tag{2-6}$$
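The IC of (2-6) can be checked numerically for a given design matrix and candidate support. A minimal sketch (the function and variable names are illustrative, not from this thesis):

    import numpy as np

    def irrepresentable_condition(A, support, sign_xs, eta=0.05):
        # Check (2-6): |Sigma21 @ inv(Sigma11) @ sign(x_S)| <= 1 - eta elementwise
        N, P = A.shape
        mask = np.zeros(P, dtype=bool)
        mask[support] = True
        A_S, A_Sc = A[:, mask], A[:, ~mask]
        Sigma11 = A_S.T @ A_S / N
        Sigma21 = A_Sc.T @ A_S / N
        lhs = np.abs(Sigma21 @ np.linalg.solve(Sigma11, sign_xs))
        return bool(np.all(lhs <= 1.0 - eta))

    rng = np.random.default_rng(2)
    A = rng.normal(size=(200, 6))                # near-orthogonal columns: IC tends to hold
    print(irrepresentable_condition(A, support=[0, 1], sign_xs=np.array([1.0, -1.0])))

Strong correlation between the columns in $A_{\mathcal{S}}$ and those in $A_{\mathcal{S}^c}$ makes the left-hand side large, which is exactly the situation in which support recovery by the LASSO fails.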

The LASSO is able to satisfy Oracle Property 1; however, since it gives biased estimates of the nonzero coefficients, Oracle Property 2 cannot be satisfied. In order to satisfy both Oracle Properties, the Adaptive LASSO is discussed next.

The Adaptive LASSO

A straightforward method to get unbiased estimates of $x$ is to apply a two-step method consisting of a LASSO and an OLS. The LASSO is applied for estimation of the support. The OLS is then run over the reduced set consisting of only $x^{\mathcal{S}}$. The OLS gives unbiased estimates and hence both Oracle Properties are satisfied. However, (Zhao and Yu, 2006) concluded that when the IC of (2-6) fails, the amount of shrinkage applied to the nonzero coefficients is too large. There is then no guarantee that the correct support is obtained, and hence the two-step method fails.

(Zou, 2006) therefore advised to reduce the amount of shrinkage on the nonzero coefficients in the LASSO directly and proposed the Adaptive LASSO: a method that gives unbiased estimates, while retaining the convex computational advantage of the LASSO. The Adaptive LASSO is given in (2-7):

$$\min_x \; \|Ax - y\|_2^2 + \lambda \sum_{i=1}^{P} w_i |x_i| \tag{2-7}$$

The coefficients $w_i$ are stacked in the weight vector $w$. (Zou, 2006) proposes to approximate this vector as $w = 1 / |\hat{x}^{\mathrm{OLS}}|^{\gamma}$, with $\gamma$ an extra tuning parameter. Their study showed that the estimates of the Adaptive LASSO satisfy both Oracle Properties. A graphical representation of the LASSO versus the Adaptive LASSO is shown in Figure 2-6 for the orthonormal case, $A^T A = I$. It can be seen in Figure 2-6 that the Adaptive LASSO has no bias for the larger coefficients and it thus satisfies not only Property 1, but also Property 2.

[Figure 2-6: Graphical representation of the shrinkage of coefficients in $x$; in all situations $\lambda = 4$. Left: the LASSO with the biased nonzero coefficients. Middle: Adaptive LASSO with $\gamma = 0.5$. Right: Adaptive LASSO with $\gamma = 2$. It is seen that the bias for larger coefficients is eliminated. Figures borrowed from (Zou, 2006).]

It is noted that when (2-6) is violated in practice, the results of the LASSO are not necessarily useless. It means that no formal proof can be given that the support is always correctly estimated. In practice, however, these results can still be useful, and therefore the evaluation of support estimation will be performed based on numerical simulations in Chapter 4.
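Problem (2-7) reduces to a standard LASSO after rescaling the columns of $A$ by $1/w_i$, which gives a simple way to sketch the Adaptive LASSO (again a hypothetical illustration assuming scikit-learn; alpha corresponds to $\lambda$ up to a scaling):

    import numpy as np
    from sklearn.linear_model import Lasso

    def adaptive_lasso(A, y, lam, gamma=1.0):
        # Sketch of (2-7) with weights w_i = 1/|x_ols_i|^gamma.
        # Substituting z_i = w_i * x_i turns (2-7) into a standard LASSO in z.
        x_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
        w = 1.0 / (np.abs(x_ols) ** gamma + 1e-12)   # small constant avoids division by zero
        model = Lasso(alpha=lam, fit_intercept=False).fit(A / w, y)
        return model.coef_ / w                        # map z back to x

    rng = np.random.default_rng(3)
    A = rng.normal(size=(100, 10))
    x_true = np.zeros(10)
    x_true[[1, 4]] = [3.0, -2.0]
    y = A @ x_true + 0.1 * rng.normal(size=100)
    print(np.round(adaptive_lasso(A, y, lam=0.05), 2))

Because large OLS estimates receive small weights, the corresponding coefficients are barely shrunk, which removes the bias shown in Figure 2-6.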

2-3-4 Tuning Parameter $\lambda$

It has already been noted that $\lambda$ in the LASSO problem of (2-4) controls the amount of shrinkage applied to $x$. When $\lambda = 0$, the LASSO reduces to the OLS: on average the correct solution for $x$ is found, but the solution differs between datasets. In other words, the OLS estimates are unbiased with high variance. When $\lambda$ is increased, biased estimates are obtained, but the variance is decreased. In the extreme case, when $\lambda \to \infty$, $x$ will contain only zeros.

The goal is to find the optimal parameter value. In this context, optimal can be defined in two ways. The first is that $\lambda$ is optimal when the prediction error is minimized. The second is that $\lambda$ is regarded as optimal when interpretable results are obtained (e.g. discarding extra coefficients to increase interpretability at the cost of an increased prediction error). The three most common approaches to select $\lambda$ are discussed below.

The first choice of $\lambda$ was proposed by (Donoho et al., 1993). They proved that, theoretically, the smallest error $E[\|x - \hat{x}\|_2^2]$ is obtained for $\lambda = \sigma_\epsilon \sqrt{2 \log(N)}$. This is consistent with the later findings of (Chen et al., 1998), who proposed to use this $\lambda$ when the variance of the error is known. In practice, however, this value of $\lambda$ often does not lead to minimal error.

Therefore, the second method that is often employed to find $\lambda$ is cross-validation. Among others, (Zou and Hastie, 2005) and (Angelosante et al., 2009) used cross-validation to find $\lambda$. The main idea of cross-validation is that the dataset is split into $K$ parts and that the algorithm is trained over all parts except the $k$-th part. The training is then validated on this $k$-th part. This procedure is iterated over $k = 1, \ldots, K$ for many different values of $\lambda$. It is noted that this may be a computationally expensive procedure and that it may yield unstable estimates, i.e. the optimal value found for $\lambda$ changes suddenly when the dataset changes slightly (Hirose et al., 2013). Furthermore, when using cross-validation it is implicitly assumed that the goal is to minimize the prediction error, since cross-validation aims at minimizing this error. When the emphasis is more on retrieving an interpretable model, cross-validation may not be the most appropriate method to apply.

The third method that can be employed to choose $\lambda$ was used by (Farahmand et al., 2011a), among others. They set $\lambda = \rho \lambda_{\max}$, with $\rho \in [0, 1]$ and where $\lambda_{\max}$ is the smallest value of $\lambda$ for which (2-4) gives $x = 0$. By changing $\rho$, the sparsity of the solution is controlled. Furthermore, $\lambda_{\max}$ is given by (2-8), where $\|\cdot\|_\infty$ is the infinity norm, i.e. the maximum absolute entry of a vector. A proof is given in Appendix A-2.

$$\lambda_{\max} = 2 \|A^T y\|_\infty \tag{2-8}$$

The third tuning method is preferred in this thesis. A set of various values of $\rho$ is chosen to perform simulations, and the value that yields the smallest prediction error is then selected. This is computationally less expensive than cross-validation. Moreover, in the situations considered in this report, an extensive cross-validation does not guarantee better tuning of $\lambda$, since the performance of the LASSO is inherently limited regardless of the tuning of $\lambda$. This originates from the challenges that arise when the LASSO is applied to the stock market, which is discussed in Section 2-4.
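Before turning to that application, a minimal sketch of this third tuning method (assuming scikit-learn, whose Lasso scales the penalty by 1/(2N), hence the conversion below) computes $\lambda_{\max}$ from (2-8) and sweeps $\rho$:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(4)
    A = rng.normal(size=(80, 12))
    x_true = np.zeros(12)
    x_true[[2, 7]] = [1.0, -1.0]
    y = A @ x_true + 0.1 * rng.normal(size=80)

    lam_max = 2 * np.max(np.abs(A.T @ y))        # Eq. (2-8)
    for rho in [0.001, 0.01, 0.1, 0.5, 1.0]:
        alpha = rho * lam_max / (2 * A.shape[0]) # match sklearn's 1/(2N) scaling
        coef = Lasso(alpha=alpha, fit_intercept=False).fit(A, y).coef_
        print(f"rho = {rho:5.3f}   nonzero coefficients: {int(np.sum(coef != 0))}")

As $\rho$ increases toward 1 the solution becomes sparser, reaching $x = 0$ at $\rho = 1$, so $\rho$ directly controls the sparsity of the solution.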

2-4 Application of the LASSO to the Stock Market

The LASSO introduced in Section 2-3 will be employed to obtain interpretable results for stock market predictions. However, current applications of the LASSO differ from applications to time series in the stock market. The framework of current LASSO applications is discussed in Section 2-4-1. Section 2-4-2 discusses the framework of LASSO applications in the stock market and how this framework differs from current literature. The challenges that arise with this new framework are discussed in Section 2-4-3.

2-4-1 Classical LASSO Applications

In current literature, two scenarios can be distinguished in which a LASSO is commonly applied. These two scenarios are discussed in this section.

Scenario 1: The LASSO for $N > P > S$

The first scenario considered is that of $N > P > S$. Hence, more measurements ($N$) are available than the number of variables, or features ($P$). The $A$-matrix is thus tall, as shown in Figure 2-7. This scenario is often of interest in current LASSO literature. A well-known example is that of the prostate cancer data of (Stamey et al., 1989). This dataset contains measurements of the prostate-specific antigen (PSA) on $N = 97$ men with prostate cancer, and $P = 8$ features are collected, for example the volume of the cancer, the weight of the prostate and the age of the men. The goal was to find a linear relation between the PSA and a subset of the 8 features. It was found that the cancer volume and the prostate weight are sufficient to predict the level of PSA (Tibshirani, 1996).

[Figure 2-7: Illustration of the $A$-matrix of Scenario 1: $N > P > S$.]

Scenario 2: The LASSO for $P > N > S$

The second scenario to which the LASSO can be applied is $P > N > S$. In this situation, more features than measurements are available. The data matrix $A$ is thus fat, as illustrated in Figure 2-8, and an underdetermined problem needs to be solved. However, since $S$ features are sufficient to predict $y$, the problem can be reduced to an overdetermined $N \times S$ problem by applying the LASSO. One such example is given in the work of (Zou and Hastie, 2005). They considered the problem of microarray classification and gene selection for datasets with thousands of genes, $P > 1000$, and fewer than a hundred samples, $N < 100$. Also for this ($P > N$) situation the LASSO is able to discard irrelevant features.

[Figure 2-8: Illustration of the $A$-matrix of Scenario 2: $P > N > S$.]

Time-Varying Behavior for Scenarios 1 and 2

Scenarios 1 and 2 are the classical scenarios that are focused on in current literature. Model (2-9) describes the time-varying behavior of $x_t$ for Scenarios 1 and 2:

$$x^{\mathcal{S}}_{t+1} = C_t x^{\mathcal{S}}_t + \zeta_t, \qquad \zeta_t \sim \mathcal{N}(0, \sigma_{\mathrm{sys}}^2 I)$$
$$y_t = A_t x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma_\epsilon^2 I) \tag{2-9}$$

The LASSO problem that relates to (2-9) is given in (2-10):

$$\min_{x_t} \; \|A_t x_t - y_t\|_2^2 + \lambda \|x_t\|_1 \tag{2-10}$$

Here $A_t \in \mathbb{R}^{N \times P}$, $y_t \in \mathbb{R}^N$, $C_t \in \mathbb{R}^{S \times S}$ and $x_t^{\mathcal{S}} \in \mathbb{R}^S$. Hence, the dynamic model (2-9) is only defined for the nonzero coefficients. In practice, it may occur that zero coefficients are added to the support and hence become nonzero. It is assumed that these additions to the support do not originate from the dynamics in (2-9).

The LASSO of (2-10) shows that these scenarios assume that multiple measurements are available at each time step, so that the matrix $A_t$ is obtained with $N$ rows (measurements). Data is thus treated in batch form, since it is assumed that the system is constant over the $N$ time steps. For applications to the stock market, however, it is argued that a sparse signal is estimated from data that is acquired sequentially over time. Treating data in batch form is then inappropriate. This is discussed next.
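To make the batch assumption explicit, the following sketch (synthetic data; the parameter values are illustrative) solves (2-10) at each time step from a fresh batch of $N$ measurements, while the nonzero coefficients drift slowly in the spirit of (2-9):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(5)
    P, N, T = 20, 8, 5                       # Scenario 2: P > N > S
    support = [0, 5, 9]                      # S = 3
    x_t = np.zeros(P)
    x_t[support] = [1.0, -0.5, 2.0]

    for t in range(T):
        x_t[support] += 0.01 * rng.normal(size=3)        # slow drift, cf. (2-9)
        A_t = rng.normal(size=(N, P))                    # batch of N measurements
        y_t = A_t @ x_t + 0.05 * rng.normal(size=N)
        x_hat = Lasso(alpha=0.05, fit_intercept=False).fit(A_t, y_t).coef_
        print(f"t = {t}   estimated support: {np.flatnonzero(x_hat)}")

The batch of $N$ rows per step is exactly what is unavailable in the stock market setting discussed next, where only a single row arrives per time step.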

2-4-2 Stock Market Application of the LASSO

The two scenarios of Section 2-4-1 are not suitable for stock market predictions and time series. Therefore, this thesis introduces a third scenario that relates to stock market predictions. So far, this scenario has received little attention in current literature.

Scenario 3: The LASSO for $P > N \leq S$

The two scenarios encountered in LASSO applications so far assume that sufficient measurements, $N$, are available to solve the LASSO problem. In the first scenario this is obvious, since $N > P$, and in the second scenario the problem is solvable by discarding features and hence transforming the problem from $N < P$ to $N > S$. For applications to the stock market and time series, however, it is argued that $N = 1$ and the linear Autoregressive model with exogenous inputs (ARX) of (2-11) is obtained. The stock price in $y_t$ can thus only be formed from the features that are available at that time (previous stock prices, financial indicators or social media data):

$$y_t = \underbrace{\begin{bmatrix} y_{t-1} & y_{t-2} & \cdots & y_{t-d} & f_{t,1} & f_{t,2} & f_{t,3} & \cdots & f_{t,P} \end{bmatrix}}_{a_t} x_t + \epsilon_t \tag{2-11}$$

This amounts to the data vector $a_t$, as illustrated in Figure 2-9, instead of a data matrix $A_t$.

[Figure 2-9: Illustration of the data vector $a_t$ of Scenario 3: $P > N \leq S$.]

For this scenario, the dynamic model of (2-12) is obtained, where $a_t \in \mathbb{R}^P$, $y_t$ is a scalar, $C_t \in \mathbb{R}^{S \times S}$ and $x_t^{\mathcal{S}} \in \mathbb{R}^S$:

$$x^{\mathcal{S}}_{t+1} = C_t x^{\mathcal{S}}_t + \zeta_t, \qquad \zeta_t \sim \mathcal{N}(0, \sigma_{\mathrm{sys}}^2 I) \tag{2-12a}$$
$$y_t = a_t x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma_\epsilon^2) \tag{2-12b}$$

To represent the stock market, this thesis assumes model (2-12) with slowly time-varying nonzero coefficients and a slowly time-varying support set. The (non-classical) LASSO problem that relates to (2-12) is given in (2-13):

$$\min_{x_t} \; (a_t x_t - y_t)^2 + \lambda \|x_t\|_1 \tag{2-13}$$

Problem (2-13) is the problem of interest in this thesis. The challenges that arise in solving problem (2-13) are discussed in Section 2-4-3.
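The sequential arrival of data in this scenario can be sketched as follows (a hypothetical illustration; the names and dimensions are chosen for clarity, with the lagged outputs counted separately from the exogenous features):

    import numpy as np

    def build_a_t(y_hist, features_t, d):
        # Row vector a_t of (2-11): d most recent outputs, newest first, then features
        lags = np.asarray(y_hist)[-1:-d-1:-1]    # y_{t-1}, y_{t-2}, ..., y_{t-d}
        return np.concatenate([lags, features_t])

    rng = np.random.default_rng(6)
    d, P_exo, T = 3, 4, 10
    y_hist = list(rng.normal(size=d))            # initial output history
    x_t = rng.normal(size=d + P_exo)             # regression vector (kept fixed here)

    for t in range(T):
        a_t = build_a_t(y_hist, rng.normal(size=P_exo), d)
        y_t = float(a_t @ x_t) + 0.05 * rng.normal()   # one scalar measurement: N = 1
        y_hist.append(y_t)

At every step exactly one new equation $y_t = a_t x_t + \epsilon_t$ becomes available, which is why the batch LASSO of (2-10) cannot be applied directly in this scenario.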

2-4-3 Challenges with the Stock Market Application of the LASSO

This section discusses two challenges that arise when the LASSO is applied to the stock market, as in Scenario 3 of Section 2-4-2.

The first challenge is that only $N = 1$ measurement of each feature is available at every time step. Since it is assumed that more than one feature is of interest in stock market predictions, $S > 1$, the new scenario of $P > N \leq S$ is obtained. It is known that when $P > N$, the LASSO selects at most $N$ variables (Zou and Hastie, 2005). This limits the performance of the LASSO, since at least $S$ variables should be selected.

The second challenge is that the relevance of the various features is assumed to vary over time. Social media data, for example, may be more useful during declines in the stock market, when sentiments play an important role. In contrast to most applications in Scenarios 1 and 2, this means that the coefficient vector $x_t$ is also time-varying. The LASSO in (2-13), however, does not take the dynamics of $x_t$ into account, and its ability to track this time-varying behavior is thus limited.

Therefore, Chapter 3 discusses extensions of the LASSO that have been proposed for time-varying problems. The performance and limitations of the most promising methods presented in current literature are discussed. Moreover, a new sparse regression method is introduced for $N = 1$.

2-5 Summary

Section 2-3 introduced the Least Absolute Shrinkage and Selection Operator (LASSO). The LASSO extends the Least Squares (LS) problem with an $\ell_1$-norm over the coefficient vector $x$. This ensures that the LS problem is solved while sparsity in $x$ is induced. The $\ell_1$-norm shrinks all coefficients and sets some exactly to zero. This results in interpretable models, as in subset selection ($\ell_0$), while the prediction accuracy is close to that of ridge regression ($\ell_2$). Moreover, the LASSO should have two properties: (i) it should correctly identify the nonzero coefficients that form the support, $\mathcal{S}_t$, and (ii) the values of the coefficients in the support should be accurately estimated. The LASSO gives biased nonzero estimates and the second property is therefore not satisfied. The Adaptive LASSO is proposed to compensate for this.

Furthermore, Section 2-4 discussed the challenges that arise when the LASSO is applied to stock market prediction. In current literature it is assumed that sufficient measurements, $N$, are available at each time step to solve the LASSO. However, it is argued that time series used for stock market prediction contain $N = 1$ measurement per time step, and the LASSO is therefore not directly applicable to stock markets. Moreover, the coefficient vector $x_t$ is time-varying and the LASSO cannot accurately track this time-varying behavior. Therefore, Chapter 3 will discuss extensions of the LASSO that have been proposed for time-varying problems, and a new sparse regression method will be introduced for $N = 1$.


More information

Big Data: a new era for Statistics

Big Data: a new era for Statistics Big Data: a new era for Statistics Richard J. Samworth Abstract Richard Samworth (1996) is a Professor of Statistics in the University s Statistical Laboratory, and has been a Fellow of St John s since

More information

A New Interpretation of Information Rate

A New Interpretation of Information Rate A New Interpretation of Information Rate reproduced with permission of AT&T By J. L. Kelly, jr. (Manuscript received March 2, 956) If the input symbols to a communication channel represent the outcomes

More information

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error

More information

A New Quantitative Behavioral Model for Financial Prediction

A New Quantitative Behavioral Model for Financial Prediction 2011 3rd International Conference on Information and Financial Engineering IPEDR vol.12 (2011) (2011) IACSIT Press, Singapore A New Quantitative Behavioral Model for Financial Prediction Thimmaraya Ramesh

More information

Big Data - Lecture 1 Optimization reminders

Big Data - Lecture 1 Optimization reminders Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics

More information

ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING

ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANI-KALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of

More information

Degrees of Freedom and Model Search

Degrees of Freedom and Model Search Degrees of Freedom and Model Search Ryan J. Tibshirani Abstract Degrees of freedom is a fundamental concept in statistical modeling, as it provides a quantitative description of the amount of fitting performed

More information

Lasso on Categorical Data

Lasso on Categorical Data Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

A semi-supervised Spam mail detector

A semi-supervised Spam mail detector A semi-supervised Spam mail detector Bernhard Pfahringer Department of Computer Science, University of Waikato, Hamilton, New Zealand Abstract. This document describes a novel semi-supervised approach

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

8. Linear least-squares

8. Linear least-squares 8. Linear least-squares EE13 (Fall 211-12) definition examples and applications solution of a least-squares problem, normal equations 8-1 Definition overdetermined linear equations if b range(a), cannot

More information

Ridge Regression. Patrick Breheny. September 1. Ridge regression Selection of λ Ridge regression in R/SAS

Ridge Regression. Patrick Breheny. September 1. Ridge regression Selection of λ Ridge regression in R/SAS Ridge Regression Patrick Breheny September 1 Patrick Breheny BST 764: Applied Statistical Modeling 1/22 Ridge regression: Definition Definition and solution Properties As mentioned in the previous lecture,

More information

24. The Branch and Bound Method

24. The Branch and Bound Method 24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no

More information

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.

More information

Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

Master s Thesis. A Study on Active Queue Management Mechanisms for. Internet Routers: Design, Performance Analysis, and.

Master s Thesis. A Study on Active Queue Management Mechanisms for. Internet Routers: Design, Performance Analysis, and. Master s Thesis Title A Study on Active Queue Management Mechanisms for Internet Routers: Design, Performance Analysis, and Parameter Tuning Supervisor Prof. Masayuki Murata Author Tomoya Eguchi February

More information

Data analysis in supersaturated designs

Data analysis in supersaturated designs Statistics & Probability Letters 59 (2002) 35 44 Data analysis in supersaturated designs Runze Li a;b;, Dennis K.J. Lin a;b a Department of Statistics, The Pennsylvania State University, University Park,

More information

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

In this chapter, you will learn improvement curve concepts and their application to cost and price analysis.

In this chapter, you will learn improvement curve concepts and their application to cost and price analysis. 7.0 - Chapter Introduction In this chapter, you will learn improvement curve concepts and their application to cost and price analysis. Basic Improvement Curve Concept. You may have learned about improvement

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Recognizing Informed Option Trading

Recognizing Informed Option Trading Recognizing Informed Option Trading Alex Bain, Prabal Tiwaree, Kari Okamoto 1 Abstract While equity (stock) markets are generally efficient in discounting public information into stock prices, we believe

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

From Sparse Approximation to Forecast of Intraday Load Curves

From Sparse Approximation to Forecast of Intraday Load Curves From Sparse Approximation to Forecast of Intraday Load Curves Mathilde Mougeot Joint work with D. Picard, K. Tribouley (P7)& V. Lefieux, L. Teyssier-Maillard (RTE) 1/43 Electrical Consumption Time series

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Hong Kong Stock Index Forecasting

Hong Kong Stock Index Forecasting Hong Kong Stock Index Forecasting Tong Fu Shuo Chen Chuanqi Wei tfu1@stanford.edu cslcb@stanford.edu chuanqi@stanford.edu Abstract Prediction of the movement of stock market is a long-time attractive topic

More information

Nonlinear Iterative Partial Least Squares Method

Nonlinear Iterative Partial Least Squares Method Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

More information

discuss how to describe points, lines and planes in 3 space.

discuss how to describe points, lines and planes in 3 space. Chapter 2 3 Space: lines and planes In this chapter we discuss how to describe points, lines and planes in 3 space. introduce the language of vectors. discuss various matters concerning the relative position

More information

Special Situations in the Simplex Algorithm

Special Situations in the Simplex Algorithm Special Situations in the Simplex Algorithm Degeneracy Consider the linear program: Maximize 2x 1 +x 2 Subject to: 4x 1 +3x 2 12 (1) 4x 1 +x 2 8 (2) 4x 1 +2x 2 8 (3) x 1, x 2 0. We will first apply the

More information

Numerical methods for American options

Numerical methods for American options Lecture 9 Numerical methods for American options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 American options The holder of an American option has the right to exercise it at any moment

More information

A Simple Model of Price Dispersion *

A Simple Model of Price Dispersion * Federal Reserve Bank of Dallas Globalization and Monetary Policy Institute Working Paper No. 112 http://www.dallasfed.org/assets/documents/institute/wpapers/2012/0112.pdf A Simple Model of Price Dispersion

More information

THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING

THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING 1. Introduction The Black-Scholes theory, which is the main subject of this course and its sequel, is based on the Efficient Market Hypothesis, that arbitrages

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS Huina Mao School of Informatics and Computing Indiana University, Bloomington, USA ECB Workshop on Using Big Data for Forecasting

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Forecasting of Economic Quantities using Fuzzy Autoregressive Model and Fuzzy Neural Network

Forecasting of Economic Quantities using Fuzzy Autoregressive Model and Fuzzy Neural Network Forecasting of Economic Quantities using Fuzzy Autoregressive Model and Fuzzy Neural Network Dušan Marček 1 Abstract Most models for the time series of stock prices have centered on autoregressive (AR)

More information

Evaluating the Lead Time Demand Distribution for (r, Q) Policies Under Intermittent Demand

Evaluating the Lead Time Demand Distribution for (r, Q) Policies Under Intermittent Demand Proceedings of the 2009 Industrial Engineering Research Conference Evaluating the Lead Time Demand Distribution for (r, Q) Policies Under Intermittent Demand Yasin Unlu, Manuel D. Rossetti Department of

More information

Mathematical finance and linear programming (optimization)

Mathematical finance and linear programming (optimization) Mathematical finance and linear programming (optimization) Geir Dahl September 15, 2009 1 Introduction The purpose of this short note is to explain how linear programming (LP) (=linear optimization) may

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Applications to Data Smoothing and Image Processing I

Applications to Data Smoothing and Image Processing I Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

More information

A Description of Consumer Activity in Twitter

A Description of Consumer Activity in Twitter Justin Stewart A Description of Consumer Activity in Twitter At least for the astute economist, the introduction of techniques from computational science into economics has and is continuing to change

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

How To Find Local Affinity Patterns In Big Data

How To Find Local Affinity Patterns In Big Data Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class

More information

Practical Guide to the Simplex Method of Linear Programming

Practical Guide to the Simplex Method of Linear Programming Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear

More information

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network Anthony Lai (aslai), MK Li (lilemon), Foon Wang Pong (ppong) Abstract Algorithmic trading, high frequency trading (HFT)

More information

The degrees of freedom of the Lasso in underdetermined linear regression models

The degrees of freedom of the Lasso in underdetermined linear regression models The degrees of freedom of the Lasso in underdetermined linear regression models C. Dossal (1), M. Kachour (2), J. Fadili (2), G. Peyré (3), C. Chesneau (4) (1) IMB, Université Bordeaux 1 (2) GREYC, ENSICAEN

More information

JetBlue Airways Stock Price Analysis and Prediction

JetBlue Airways Stock Price Analysis and Prediction JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.

Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1. Introduction Linear Programming for Optimization Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1.1 Definition Linear programming is the name of a branch of applied mathematics that

More information

1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where.

1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where. Introduction Linear Programming Neil Laws TT 00 A general optimization problem is of the form: choose x to maximise f(x) subject to x S where x = (x,..., x n ) T, f : R n R is the objective function, S

More information

Option Portfolio Modeling

Option Portfolio Modeling Value of Option (Total=Intrinsic+Time Euro) Option Portfolio Modeling Harry van Breen www.besttheindex.com E-mail: h.j.vanbreen@besttheindex.com Introduction The goal of this white paper is to provide

More information

6. Cholesky factorization

6. Cholesky factorization 6. Cholesky factorization EE103 (Fall 2011-12) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix

More information

Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

More information

Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization

Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization 2.1. Introduction Suppose that an economic relationship can be described by a real-valued

More information

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

More information

A CRF-based approach to find stock price correlation with company-related Twitter sentiment

A CRF-based approach to find stock price correlation with company-related Twitter sentiment POLITECNICO DI MILANO Scuola di Ingegneria dell Informazione POLO TERRITORIALE DI COMO Master of Science in Computer Engineering A CRF-based approach to find stock price correlation with company-related

More information

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important

More information

Linear Codes. Chapter 3. 3.1 Basics

Linear Codes. Chapter 3. 3.1 Basics Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

Big Data Techniques Applied to Very Short-term Wind Power Forecasting

Big Data Techniques Applied to Very Short-term Wind Power Forecasting Big Data Techniques Applied to Very Short-term Wind Power Forecasting Ricardo Bessa Senior Researcher (ricardo.j.bessa@inesctec.pt) Center for Power and Energy Systems, INESC TEC, Portugal Joint work with

More information