
Inverse covariance matrix regularization for minimum variance portfolio

Auberson Maxime

June 2015

Abstract

In the mean-variance optimization framework, the inverse of the covariance matrix of the assets considered, Σ^{-1} (the precision matrix), is of primary importance. It is central in determining the optimized weights to allocate to each asset, especially when using the global minimum variance portfolio w_GMV. Unfortunately, in practice, the estimation of the covariance matrix and its inverse is associated with several serious problems, especially when using the sample estimates. The gains from the optimization are often more than offset by the errors in the estimation. One way to deal with this problem is to pull the off-diagonal elements of the precision matrix estimate towards zero, i.e. to shrink the precision matrix. Extreme coefficients are reduced, which improves the prediction accuracy of the estimation, as extreme coefficients are likely to be due to estimation errors. A way to achieve that shrinkage effect is to impose a penalty on the size of the off-diagonal elements in the estimation of Σ^{-1}. In this paper, I focus on two main types of regularization, the l1-penalty and the l2-penalty, which impose a penalty on the l1 norm and the l2 norm of the precision matrix coefficients respectively. They result in different shrinkage profiles, and this is what I aim to study in this paper. As for the organization, the first part of the paper covers the theory, i.e. how to estimate the precision matrix according to these two methods and what the theoretical implications and characteristics of both estimates are. Then, in the second part, I implement both methods on several sets of financial data in order to study and compare their out-of-sample performance in an asset allocation context, using the minimum variance portfolio w_GMV. Moreover, to provide a better global view of the results and some type of benchmark, other well-known methods are considered, such as the equally-weighted portfolio or the sample-based global minimum variance portfolio.

Contents

1 Introduction
2 The precision matrix
  2.1 Theory
  2.2 Estimation: problems and potential solutions
3 The precision matrix estimation
  3.1 The unpenalized version
  3.2 The L1-regularized estimation: the graphical lasso estimator
    3.2.1 Description
    3.2.2 The L1-penalty
    3.2.3 Graphical lasso: how it works
    3.2.4 Graphical lasso: the algorithm
  3.3 The L2-regularized estimation: the ridge precision estimator
    3.3.1 Description
    3.3.2 The L2-penalty
    3.3.3 The Alternative Type I ridge precision estimator
4 Analysis of the two estimators
  4.1 The graphical lasso - the shrinkage parameter λ_L
  4.2 The graphical lasso - the convergence condition t
  4.3 The ridge estimator - the shrinkage parameter λ_R
  4.4 The ridge estimator vs the graphical lasso - the timing
  4.5 The ridge estimator vs the graphical lasso - sparsity
5 Out-of-sample evaluation
  5.1 Setup
    5.1.1 The databases
    5.1.2 The different portfolio strategies considered
    5.1.3 The approach
    5.1.4 The choice of the shrinkage parameters λ_L and λ_R
  5.2 The adaptive ridge strategy - ADR
  5.3 Results
    5.3.1 The performance measures considered
    5.3.2 #1 dataset - the 96 portfolios based on size and BtM ratio
    5.3.3 #2 dataset - the 48 industry portfolios
    5.3.4 #3 dataset - the 133 individual stocks
    5.3.5 #4 dataset - combination 2 (96SBtMport + 133Indiv)
6 Conclusion

1 Introduction

How should portfolio managers optimally allocate wealth across all available assets? What are the criteria to consider in determining whether an additional asset should be included in a portfolio? How can we judge the usefulness of an asset at the portfolio level? The economist Harry Markowitz was the first to really bring answers to these questions in his 1952 paper: basically, one should always maximize the portfolio expected return with respect to a given amount of portfolio risk, represented by its variance. Only these two statistical moments are considered in judging the performance of assets and portfolios. This is the mean-variance framework, which serves as the foundation for many theories that are still relevant today, such as the Capital Asset Pricing Model (CAPM).

At the portfolio level, when multiple assets are considered, not only the variance of each asset matters but also the covariances (or correlations) between the assets. Indeed, some assets in the portfolio may tend to move in an offsetting way, and therefore reduce the variance (or risk) of the overall portfolio. This is the notion of diversification, which means that it is actually possible to decrease the (non-systematic) risk of the portfolio by investing in additional assets. For the diversification effect to be efficient, the assets should, as much as possible, not move in the same way, i.e. be imperfectly correlated. Hence, in mean-variance portfolio optimization, the covariance matrix Σ of asset returns is crucial in determining the optimal allocations.

Unfortunately, in practice, it must be estimated, and the estimation process brings along serious problems. The real expected returns, variances and covariances are not available; they must be estimated using data. That means that errors can be made through the estimation, and this is usually what happens in practice. In some cases, the gains from portfolio optimization even disappear, offset completely by these estimation errors. So, even though the mean-variance optimization framework is intuitive, easy to understand and still used in practice, it is widely criticized in the literature. The results from the optimization can be sub-optimal and even outperformed by naïve diversification strategies such as the 1/N strategy. DeMiguel, Garlappi and Uppal (2007) find that no optimizing model consistently delivers a better performance than the equally-weighted strategy, as the latter does not need any estimation procedure and has a very low turnover. The conditions needed for the sample-based mean-variance optimization framework to actually outperform the 1/N strategy out-of-sample are almost impossible to achieve in practice.

In an effort to decrease the estimation errors, it is possible to focus only on the risk, i.e. the variance, rather than on both the risk and the return. In that case, all the potential errors from estimating the expected returns of the assets are avoided. This is the basis for the risk budgeting or risk parity approaches: only the risk of the portfolio is controlled. Therefore, only the covariance matrix has to be estimated. However, problems still persist; portfolios based on the sample estimate of the covariance matrix tend to perform poorly out-of-sample. First, the sample estimate usually gives extreme weights which are not desirable or maybe not even applicable in practice. Second, it is also hypersensitive to new data, resulting in a huge turnover over multiple trading periods.
The reason for that is that the number of observations used for the estimation (T) has to be very high, relative to the number of assets considered (N), for the estimate to be trustworthy. In other words, due to its nature, the sample estimator needs to collect a lot of information in the data to be reliable. The closer the T/N ratio gets to one, the less reliable the results are.

It is therefore not surprising that, using the sample covariance matrix, the gains from the optimization are completely offset by the costs, especially against the 1/N portfolio, which has almost no turnover costs. Actually, DeMiguel, Garlappi and Uppal (2007) find that the estimation window needed for the sample-based optimization strategies to consistently outperform the 1/N strategy is of around 3000 months for 25 assets and around 6000 months for 50 assets. This is almost impossible to achieve in practice, and they therefore find that no optimizing model consistently outperforms the equally-weighted strategy. Related to that, another main drawback is that when the number of observations used for estimation (T) is lower than the number of assets considered (N), the sample covariance matrix is singular. It is then impossible to invert, and it cannot be used for the optimization process, as the inverse covariance matrix is computationally necessary.

The main objective of this paper is to find better ways than the sample-based method to estimate the covariance matrix and its inverse in order to obtain more adequate results in an asset allocation context. The weights resulting from the optimization should be less extreme and more stable across the trading periods to avoid the turnover trap. To be more precise, I test two different methods to estimate the precision matrix which are based on two different types of penalization: the graphical lasso algorithm of Friedman, Hastie and Tibshirani (2007) and the alternative Type I ridge precision estimator of Van Wieringen and Peeters (2014). The first one is based on the l1-penalty, whereas the second is based on the l2-penalty. I apply both methods to financial data in order to compare their out-of-sample performance. For the optimization, I only focus on risk, i.e. the covariance matrix, without considering the expected returns. Throughout the paper, I use the global minimum variance portfolio w_GMV for the evaluations, which does not need any expected return estimation.

2 The precision matrix

2.1 Theory

The covariance matrix and especially its inverse are of critical importance for an efficient asset allocation in a mean-variance framework. The covariance matrix is intuitively useful to better understand the statistical relations in the data. However, computationally, its inverse is more relevant, as the covariance matrix has to be inverted to transform the return data into weights. When using a strategy which focuses only on the risk, like the global minimum variance portfolio, the precision matrix (throughout the paper, I use Σ^{-1} and Θ interchangeably to designate the precision matrix, depending on the situation) is actually the only parameter to compute to find the optimized weights. It is the only element which depends on the data, other than the number of assets, as we can see in the GMV portfolio formula:

w_{GMV} = \frac{\Sigma^{-1}\mathbf{1}_N}{\mathbf{1}_N^{\top}\Sigma^{-1}\mathbf{1}_N}

Of course, when the inverse has to be estimated, as in practice, Σ^{-1} must be replaced by its estimate Σ̂^{-1}.
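To make the role of the precision matrix concrete, here is a minimal NumPy sketch (illustrative only, not the implementation used in the paper; the function name is mine) that turns a given precision matrix estimate into GMV weights:

```python
import numpy as np

def gmv_weights(Theta):
    """Global minimum variance weights w = Theta 1 / (1' Theta 1) for a precision matrix estimate Theta."""
    ones = np.ones(Theta.shape[0])
    w = Theta @ ones
    return w / (ones @ w)   # normalize so that the weights sum to one
```

Whatever estimator produces Θ̂ (sample, graphical lasso or ridge), the allocation step itself reduces to this single formula.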

Even though the precision matrix in itself is complicated to interpret, Stevens (1995) revealed an interesting fact about it. He highlighted that the inverse of the covariance matrix gives us the hedging trades among the assets. More precisely, each line i of the precision matrix is a hedge portfolio for asset i. Given N assets, each line i of the precision matrix can be seen as a long position in the i-th asset and short positions in the N − 1 other assets. Basically, the long position in the i-th stock is then hedged by the N − 1 short positions. It is useful to write the precision matrix with these notations in order to better understand the principle:

\Sigma^{-1} =
\begin{bmatrix}
\frac{1}{\sigma_{11}(1-R_1^2)} & \frac{-\beta_{12}}{\sigma_{11}(1-R_1^2)} & \cdots & \frac{-\beta_{1N}}{\sigma_{11}(1-R_1^2)} \\
\frac{-\beta_{21}}{\sigma_{22}(1-R_2^2)} & \frac{1}{\sigma_{22}(1-R_2^2)} & \cdots & \frac{-\beta_{2N}}{\sigma_{22}(1-R_2^2)} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{-\beta_{N1}}{\sigma_{NN}(1-R_N^2)} & \frac{-\beta_{N2}}{\sigma_{NN}(1-R_N^2)} & \cdots & \frac{1}{\sigma_{NN}(1-R_N^2)}
\end{bmatrix}

The inverse covariance matrix is expressed here in a multiple regression way. The returns of the i-th stock are regressed on the N − 1 other stocks, which gives the N − 1 coefficients β_ij, i.e. the vector β_i. Since the inverse matrix is symmetric, the off-diagonal entries must match pairwise. Their signs are negative, as they represent short positions, in line with the hedge-portfolio view of the precision matrix. The coefficients β_ij represent the part of the i-th stock's return that can be explained by the regression, i.e. by the variations in the N − 1 other stocks. The elements are normalized by the part of asset i's variance that cannot be explained by the regression, i.e. the unhedgeable risk of asset i (R_i^2 being the R-squared of the i-th regression).
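As a quick numerical illustration of this regression reading of Σ^{-1} (a sketch with simulated data, not part of the original paper), one can regress one asset on the others and compare the rescaled coefficients with the corresponding row of the inverse sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 2000
returns = rng.multivariate_normal(np.zeros(N), 0.5 * np.eye(N) + 0.5, size=T)

S = np.cov(returns, rowvar=False)       # sample covariance (divisor T - 1)
Theta = np.linalg.inv(S)                # sample precision matrix

i = 0
X = np.column_stack([np.ones(T), np.delete(returns, i, axis=1)])
y = returns[:, i]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta = coef[1:]                         # regression of asset i on the other N - 1 assets
resid = y - X @ coef
s2_unhedged = resid @ resid / (T - 1)   # unexplained variance, i.e. sigma_ii (1 - R_i^2)

print(Theta[i, i], 1.0 / s2_unhedged)                # diagonal entry of the precision matrix
print(np.delete(Theta[i], i), -beta / s2_unhedged)   # off-diagonal entries: -beta_ij / (sigma_ii (1 - R_i^2))
```

The two printed pairs match up to numerical error, which is exactly Stevens' hedge-portfolio interpretation.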

2.2 Estimation: problems and potential solutions

Hence, if the precision matrix estimation is seen as a multiple linear regression problem, as Stevens suggested, another problem presents itself: multicollinearity in the data. Basically, it means that if the independent variables in a multiple regression are highly correlated, the regression process can be disrupted and the estimated vector of coefficients β can be discredited. Indeed, a regression works well when the independent variables are really, as the name suggests, independent of each other. They should have separate and independent impacts on the dependent variable. Therefore, when the independent variables are in some way dependent on each other, the results from the linear regression can be imprecise and unreliable. That can be especially problematic in large databases like groups of stocks, as correlations often exist among them. It can then generate large errors in the estimation process of the precision matrix if the regular least squares estimation is used.

One way to limit these estimation errors and avoid the multicollinearity problem is to simplify the covariance structure of the data. Essentially, it means that it can be beneficial in a model to select only some relevant parameters from the full set, and thereby decrease the estimation errors, rather than to consider the whole set of parameters and increase the errors. In the precision matrix context, it means that setting some redundant off-diagonal elements (β_ij) to zero can reduce the amount of noise in the model due to estimation errors. It also makes the models easier to interpret, as they focus only on the strongest relations. This phenomenon is called subset selection, as we force some independence in the data in order to focus only on the most pertinent relations, i.e. the most pertinent subset of parameters. Another way to improve the estimation process is to reduce the most extreme coefficients, as they are likely to be due to estimation errors. This phenomenon is called shrinkage, as we shrink the coefficients towards zero so that they become more conservative and less subject to estimation errors.

Achieving subset selection and/or shrinkage requires a type of penalization in the estimation process, to pull the most extreme coefficients of the precision matrix towards or to zero. An important concept to understand is that, with the penalizations, the estimation is biased. It means that the penalized estimates do not asymptotically tend to the true parameter values and are statistically incorrect. The model is statistically wrong, as it does not represent the exact correct relations due to the structure simplification. However, it allows to reduce overfitting, i.e. to reduce the variance and increase the prediction accuracy of the estimates. The least squares estimates of the coefficients have a very low bias, but they also have a high variance, which explains why they are subject to large estimation errors. They need a very large number of observations to really capture enough of the information in the data and achieve sufficient prediction power. Hence, there is actually a trade-off between the benefits (higher prediction accuracy) and costs (bias) of the penalization. By shrinking, we accept some bias in exchange for a better prediction power of the estimation.

3 The precision matrix estimation

3.1 The unpenalized version

Let Y_i = (Y_1, ..., Y_T) be one of the N T-dimensional random variables drawn from a multivariate normal distribution N(µ, Σ). There are N variables and T observations. If Σ^{-1} = Θ and S is the sample covariance matrix, the problem is to maximize the likelihood function:

\hat{\Theta} = \arg\max_{\Theta} \{\ln(\det\Theta) - \mathrm{trace}(S\Theta)\}

If N < T, there are more observations than variables, and Θ̂_ML = Ŝ^{-1}, meaning that the inverse of the sample covariance matrix is the maximum likelihood estimator of the precision matrix. However, when N > T, as has already been mentioned, the sample covariance matrix is singular. It is therefore not invertible, and the precision matrix is undefined. Moreover, even if N < T, as was also mentioned earlier, the sample covariance matrix and its inverse achieve low prediction power and produce unstable weights. The results of the optimization are therefore unreliable in practice. In order to avoid these problems, it is necessary to regularize the precision matrix estimation.
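The two regimes can be checked directly with a small simulation (a sketch assuming nothing beyond NumPy): when N < T the sample covariance matrix can be inverted and serves as the unpenalized precision estimate, whereas when N > T it is rank deficient and no inverse exists.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 120

# N < T: the sample covariance is invertible (the MLE up to the T versus T - 1 divisor)
X_small = rng.standard_normal((T, 10))
S_small = np.cov(X_small, rowvar=False)
Theta_ml = np.linalg.inv(S_small)            # well defined, but noisy when T/N is small

# N > T: the sample covariance is singular, so the sample precision matrix is undefined
X_big = rng.standard_normal((T, 150))
S_big = np.cov(X_big, rowvar=False)
print(np.linalg.matrix_rank(S_big))          # at most T - 1 = 119 < 150
```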

3.2 The L1-regularized estimation: the graphical lasso estimator

3.2.1 Description

The graphical lasso is an algorithm which shrinks the elements of the precision matrix towards zero compared to the maximum likelihood estimates. Due to the nature of its penalty, it provides sparsity, meaning that it shrinks some irrelevant precision matrix coefficients directly to zero. Therefore, aside from shrinkage, it also promotes subset selection. If β_ij (the precision matrix coefficient) is different from zero according to this method, it means that the j-th stock provides a sufficient contribution to the hedge of the i-th stock relative to the other N − 2 stocks. Otherwise, it is set to zero by the l1-penalty. The advantage is that it limits the trades to the assets which are really relevant in a risk reduction context. Another nice property is that the graphical lasso keeps the precision matrix positive definite even if N > T. The graphical lasso has been shown by Goto and Xu (2013) to bring substantial gains in terms of risk reduction in an asset allocation context, and my work is based on their paper. It is important to note that the sparsity of the precision matrix does not imply the sparsity of the covariance matrix. Even though the precision matrix is sparse, the covariance matrix often still has positive covariances.

3.2.2 The L1-penalty

Basically, the l1-penalty imposes a penalty on the overall size of the regression coefficients, i.e. on the sum of the absolute values of the vector β. It is used in least squares regression, and in Lagrangian form it can be expressed as:

\hat{\beta}^{lasso} = \arg\min_{\beta} \left\{ \frac{1}{2}\sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda_L \sum_{j=1}^{p} |\beta_j| \right\}

with λ_L being the penalty parameter. The higher it is, the stronger the shrinkage. Due to the absolute value nature of the penalty on the coefficients, it achieves absolute shrinkage and sets some coefficients exactly to zero. An equivalent way to state the problem which makes the size penalty clearer is:

\hat{\beta}^{lasso} = \arg\min_{\beta} \sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t

When applied to the precision matrix estimation and its likelihood function, according to Friedman, Hastie and Tibshirani (2007), it can be expressed in this way:

\hat{\Theta} = \arg\max_{\Theta} \{\ln(\det\Theta) - \mathrm{trace}(S\Theta) - \lambda_L \|\Theta\|_1\}

with ‖Θ‖_1 being the l1 norm, i.e. the sum of the absolute values of the elements of Σ^{-1}. In this context, the penalty is on the precision matrix coefficients. A larger value of λ_L promotes more sparsity, whereas a value of zero for λ_L gives the same solution as the unconstrained maximum likelihood solution. The crucial choice of the value of the regularization parameter will be discussed later in the implementation.
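The subset-selection effect of the l1-penalty is easy to see on a toy regression (a sketch using scikit-learn; the alpha parameter plays the role of λ_L, up to scikit-learn's 1/(2N) scaling of the squared-error term):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(2)
T, p = 200, 10
X = rng.standard_normal((T, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.5]                 # only two relevant regressors
y = X @ beta_true + rng.standard_normal(T)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.2).fit(X, y)

print(ols.coef_.round(2))                   # every coefficient is non-zero
print(lasso.coef_.round(2))                 # the irrelevant coefficients are shrunk exactly to zero
```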

3.2.3 Graphical lasso: how it works

Unfortunately, the computations for estimating the precision matrix with the lasso penalization are complicated, and a closed-form solution does not exist. This is why it must be solved with the help of an algorithm. Several have been elaborated in recent years, but I use the graphical lasso algorithm that Friedman, Hastie and Tibshirani presented in their 2007 paper. It actually estimates the covariance matrix Σ rather than the precision matrix Σ^{-1} (the explanations are below), but the latter can easily be retrieved from the former. The graphical lasso is a block coordinate descent algorithm, meaning that it estimates one line and column of the covariance matrix at a time rather than estimating the whole covariance matrix at once. In substance, it means that at each line of the matrix, the covariance coefficients for this line are estimated through the lasso l1-penalized regression. Then, using these new coefficients, the algorithm updates the current covariance matrix estimate. This new covariance matrix estimate is used as the basis for the next optimization at the next line. Therefore, it does not consist of N separate lasso problems, but rather of a single N-coupled lasso problem. This is what makes the graphical lasso relevant, as the use of the current estimate at each lasso problem shares the information between the problems in an appropriate fashion.

The graphical lasso algorithm is based on the sub-gradient of the likelihood function above. Using the fact that the derivative of ln(det Θ) is Θ^{-1} = W (the covariance matrix), the sub-gradient equation is:

W - S - \lambda_L \Gamma = 0

with Γ ∈ sign(Θ), i.e. a matrix of the component-wise signs of Θ. Basically, the graphical lasso solves this sub-gradient equation one row/column at a time, while holding the rest fixed. Intuitively, it regresses the variance coefficient w_22 on the other coefficients in order to find the covariance coefficients w_12 and, by symmetry, w_12'. For each i, the algorithm partitions the covariance matrix estimate W in this way:

W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^{\top} & w_{22} \end{pmatrix}

with:
- W_11 being a (N − 1) × (N − 1) matrix, corresponding to the original matrix without the line and column i
- w_12 and w_12' being (N − 1) × 1 vectors, corresponding to the line and column i, i.e. the covariances ij and ji by symmetry
- w_22 being a scalar, corresponding to the diagonal element ii (the variance of i)

Θ is partitioned in the same way. Therefore, using the partitioning above, for each i the sub-gradient equation to solve is:

w_{12} - s_{12} - \lambda_L \gamma_{12} = 0

with γ_jk ∈ sign(θ_jk) now that we are taking it line by line, θ_jk being the element jk of the precision matrix Θ. Using w_12 = W_11 β, the sub-gradient for each row/column can be rewritten as:

W_{11}\beta - s_{12} + \lambda_L v = 0

where v ∈ sign(β); the sign change comes from the fact that θ_22 (the diagonal of the precision matrix) is always positive, so that sign(θ_12) = −sign(β). This corresponds to the sub-gradient of the l1-regularized quadratic program:

\min_{\beta} \left\{ \frac{1}{2}\beta^{\top}W_{11}\beta - \beta^{\top}s_{12} + \lambda_L \|\beta\|_1 \right\}

for β a (N − 1) × 1 vector. The algorithm uses the partitioning and finds the vectors w_12 and w_12' (by symmetry), i.e. the shrunk covariances, through the l1-regularized quadratic program of the variable i on the other variables j (i ≠ j).

The sub-gradient is the link between the l1-regularized quadratic program (hence, the graphical lasso algorithm) and the solution of the likelihood maximization shown at the beginning of the section.

3.2.4 Graphical lasso: the algorithm

The objective for each line i is to estimate the covariances w_12 (and w_21, i.e. w_12', by symmetry) through the lasso regression. The diagonal coefficients of the covariance matrix must not be changed during the algorithm, as it only shrinks the covariances and not the variances. It cycles through the lines i = 1, 2, ..., N, 1, 2, ..., N, ..., and each time updates the current estimate of the covariance matrix W with the N − 1 coefficients (w_12 = W_11 β) corresponding to the covariances of asset i with the other assets j. The algorithm continues until it decides it has converged. In my implementation, following Friedman, Hastie and Tibshirani (2007), the convergence condition is achieved when the average absolute change in W is less than t · mean(|S_offdiag|), where S_offdiag are the off-diagonal elements of the sample covariance matrix and t is a fixed threshold. Once convergence is achieved for the covariance matrix, it is easy to convert it into the precision matrix. The stages of the algorithm can be summarized as follows (a NumPy sketch of these steps is given below):

1. Start with Ŵ_0 = S + λ_L I_N, where I_N is an identity matrix of dimension N (for more details, see Friedman, Hastie and Tibshirani (2007), Banerjee et al. (2008) and Mazumder and Hastie (2012)). For each i from 1 to N:
   (a) Rearrange the row/column i in the matrix so that it is the last one, corresponding to the partitioning of the matrix above
   (b) Solve the l1-regularized quadratic program above to find the vector of covariances ŵ_12, using as warm start the previous vector β for this line
   (c) Update the row/column of covariances ŵ_12 in the current estimate of the covariance matrix Ŵ (transforming Ŵ_0 line by line into Ŵ)
   (d) Save the β for this row/column in a matrix
   (e) Check the convergence condition:
       i. If it is satisfied, stop the algorithm
       ii. If not, start again at (a) with the current estimate Ŵ as Ŵ_0
2. Finally, once it has converged, sweep through all the lines and convert Ŵ into Θ̂ in the same partitioned way, using first θ̂_22 = 1/(ŵ_22 − β̂'ŵ_12) for the diagonal elements and then θ̂_12 = −β̂ θ̂_22 for the off-diagonal elements (using the matrix of β saved through the algorithm)

It is important to note that after each i, the algorithm updates Ŵ and uses this update of Ŵ for the next iteration. Therefore, Ŵ is derived directly from the sample covariance matrix only at the first iteration, as the algorithm updates (or shrinks) Ŵ each time. It thereby shares the information between the problems in an appropriate block coordinate fashion, and this is why it amounts to the approximate solution of the penalized likelihood function. It is also important to note that if λ_L = 0, the algorithm does not penalize the coefficients and simply computes the sample inverse covariance matrix Ŝ^{-1} using a regression at each stage.
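The steps above can be turned into a compact NumPy sketch (my own illustrative implementation of the described procedure, with a plain coordinate-descent solver for the inner l1-regularized quadratic program; scikit-learn's graphical_lasso provides a production-grade version of the same estimator):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def glasso(S, lam, t=0.01, max_cycles=100, inner_iters=50):
    """Block coordinate descent sketch of the graphical lasso of Friedman, Hastie and Tibshirani (2007)."""
    N = S.shape[0]
    W = S + lam * np.eye(N)                       # step 1: W_0 = S + lambda * I_N
    B = np.zeros((N, N))                          # stored beta vectors (one column per row/column i)
    off_mask = ~np.eye(N, dtype=bool)
    off_mean = np.mean(np.abs(S[off_mask]))       # mean absolute off-diagonal element of S
    for _ in range(max_cycles):
        W_old = W.copy()
        for i in range(N):                        # cycle through the rows/columns
            idx = np.arange(N) != i
            W11, s12 = W[np.ix_(idx, idx)], S[idx, i]
            beta = B[idx, i].copy()               # warm start from the previous sweep
            for _ in range(inner_iters):          # coordinate descent on 1/2 b'W11 b - b's12 + lam ||b||_1
                for j in range(N - 1):
                    r = s12[j] - W11[j] @ beta + W11[j, j] * beta[j]
                    beta[j] = soft_threshold(r, lam) / W11[j, j]
            B[idx, i] = beta
            W[idx, i] = W[i, idx] = W11 @ beta    # update the shrunk covariances w_12 = W11 beta
        if np.mean(np.abs(W - W_old)) < t * off_mean:   # convergence: average absolute change in W
            break
    Theta = np.zeros((N, N))                      # step 2: convert W back into the precision matrix
    for i in range(N):
        idx = np.arange(N) != i
        Theta[i, i] = 1.0 / (W[i, i] - W[idx, i] @ B[idx, i])
        Theta[idx, i] = -B[idx, i] * Theta[i, i]
    return W, 0.5 * (Theta + Theta.T)             # symmetrize small numerical asymmetries
```

With λ_L = 0 the soft-threshold never clips anything, and the sweeps simply reproduce the regression-based inversion of the sample covariance matrix mentioned above.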

Of course, the smaller t is, the longer it takes to converge, the more the covariance matrix is shrunk and the sparser the precision matrix estimate is. Indeed, the average absolute change in W must be smaller to satisfy the convergence condition, and it only gets smaller the more the algorithm cycles through the lines and shrinks the matrix. The sparsity is also a function of the penalty parameter λ_L: the higher it is, the more the covariance and precision matrices are shrunk through each optimization. If λ_L is larger than all the covariance coefficients in S, the result is a covariance matrix filled with zero non-diagonal elements. The characteristics of the graphical lasso are studied in more detail later.

3.3 The L2-regularized estimation: the ridge precision estimator

3.3.1 Description

The ridge regression is another way to shrink and estimate the precision matrix. Whereas the graphical lasso uses the l1-penalty, the ridge estimator uses the l2-penalty. It is similar in that it also imposes a penalty on the coefficients, but the latter penalizes the sum of the squared coefficients (instead of the sum of the absolute values of the coefficients). While the l1-penalty shrinks the coefficients unevenly and sets some to zero, the l2-penalty shrinks all the coefficients in a proportional way. Therefore, the result of the estimation is not sparse. In some situations, it can be better not to have a sparse solution, as the true model may not be sparse. Hence, the ridge estimation promotes shrinkage, but not subset selection. Like the lasso penalization, the ridge penalization also ensures that the covariance matrix is non-singular, i.e. invertible, and that the precision matrix exists. In my paper, I use the work done on the ridge estimation of the precision matrix by Van Wieringen and Peeters (2014), as they have shown that the alternative ridge estimators perform well, and even better than the corresponding graphical lasso estimators in terms of loss.

3.3.2 The L2-penalty

The l2-penalty is also primarily used in regressions, and written in Lagrangian form, it can be expressed as:

\hat{\beta}^{ridge} = \arg\min_{\beta} \left\{ \frac{1}{2}\sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda_R \sum_{j=1}^{p} \beta_j^2 \right\}

or

\hat{\beta}^{ridge} = \arg\min_{\beta} \sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t

with λ_R being the penalty parameter. We see that the only difference between this formula and the formula of the l1-penalty is the penalty itself, i.e. Σ_j β_j^2 instead of Σ_j |β_j|. Like λ_L for the l1-penalty, λ_R also controls the strength of the shrinkage. Whereas there is no closed-form solution for the lasso penalty, there is one for the ridge penalty, which makes the computations much easier.
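Unlike the lasso, the ridge-penalized regression coefficients can be written down directly (a minimal sketch; the intercept is assumed to be handled by centering the data, and the exact scaling of λ_R depends on whether the 1/2 factor is kept in front of the squared-error term):

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution beta = (X'X + lam I)^{-1} X'y for centered data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```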

When applied to the precision matrix estimation and its likelihood function, according to Van Wieringen and Peeters (2014) and their alternative ridge precision estimator, the penalized likelihood function can be expressed in this way:

\hat{\Theta} = \arg\max_{\Theta} \left\{ \ln(\det\Theta) - \mathrm{trace}(S\Theta) - \frac{1}{2}\mathrm{trace}\big[(\Theta - T)^{\top}\Lambda(\Theta - T)\big] \right\}

where Λ is a positive definite symmetric matrix of penalty parameters and T is a positive definite symmetric target matrix. Essentially, it means that the shrinkage parameter will shrink the precision matrix towards the target matrix (the strength of that shrinkage depending on Λ). Solving this results in the following generic penalized ML ridge estimator of the precision matrix:

\hat{\Theta}(\Lambda) = \left\{ \left[\Lambda + \tfrac{1}{4}(S - \Lambda T)^2\right]^{1/2} + \tfrac{1}{2}(S - \Lambda T) \right\}^{-1}

Equivalently, the covariance matrix can be estimated in this way:

\hat{\Sigma}(\Lambda) = \left[\Lambda + \tfrac{1}{4}(S - \Lambda T)^2\right]^{1/2} + \tfrac{1}{2}(S - \Lambda T)

3.3.3 The Alternative Type I ridge precision estimator

The alternative Type I ridge precision estimator is a special case of the generic penalized ML ridge estimator above, with Λ = λ_R I_N, λ_R being a scalar penalty parameter between zero and infinity. We can then rewrite the alternative Type I ridge precision estimator in this way:

\hat{\Theta}(\lambda_R) = \left\{ \left[\lambda_R I_N + \tfrac{1}{4}(S - \lambda_R T)^2\right]^{1/2} + \tfrac{1}{2}(S - \lambda_R T) \right\}^{-1}

and the alternative Type I ridge covariance matrix estimator:

\hat{\Sigma}(\lambda_R) = \left[\lambda_R I_N + \tfrac{1}{4}(S - \lambda_R T)^2\right]^{1/2} + \tfrac{1}{2}(S - \lambda_R T)

It is then also possible to shrink the covariance matrix towards a target covariance matrix C; one just has to specify T = C^{-1}. In order to understand the estimator better, it is necessary to state some of its main properties. First, for any λ_R, the precision matrix coefficients are never exactly equal to zero, as it is a proportional shrinkage method. In other words, it does not promote sparsity and does not achieve subset selection. Second, the closer the shrinkage parameter λ_R gets to zero, the more the precision matrix looks like the inverse of the sample covariance matrix (with the advantage of always being definite). Finally, the closer the shrinkage parameter λ_R gets to infinity, the more the precision matrix looks like the target matrix T. The choice of the shrinkage parameter is discussed in the out-of-sample evaluation section.
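The closed form above translates almost line by line into code. The sketch below (my own, not the authors' implementation; the symmetric square root is taken through an eigendecomposition) uses the default target with diag[T] = 1/diag[S] and zero off-diagonal elements:

```python
import numpy as np

def alt_ridge_precision(S, lam, T=None):
    """Alternative Type I ridge covariance and precision estimates of Van Wieringen and Peeters (2014)."""
    N = S.shape[0]
    if T is None:
        T = np.diag(1.0 / np.diag(S))            # target matrix with diag[T] = 1/diag[S]
    D = S - lam * T
    M = lam * np.eye(N) + 0.25 * (D @ D)          # lambda * I_N + 1/4 (S - lambda T)^2
    vals, vecs = np.linalg.eigh(M)
    M_sqrt = (vecs * np.sqrt(vals)) @ vecs.T      # symmetric matrix square root
    Sigma_hat = M_sqrt + 0.5 * D                  # ridge covariance estimate
    return Sigma_hat, np.linalg.inv(Sigma_hat)    # ridge precision estimate
```

Because the whole estimate reduces to a few dense matrix operations, it is dramatically cheaper than the iterative graphical lasso, which is consistent with the timing comparison in Section 4.4.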

We see that this ridge estimator is much easier to compute than the graphical lasso, which is a significant advantage. It also offers much more flexibility, due to the multiple possible values for the shrinkage parameter and the possibility to shrink towards a target matrix. I chose T such that diag[T] = 1/diag[S], as Van Wieringen and Peeters (2014) suggest. This way, the estimator shrinks the off-diagonal elements of the precision matrix towards zero, as desired. Of course, multiple different choices would be possible, and this deserves to be studied in more detail. As a potential disadvantage of the estimator, it does not achieve subset selection, which could perhaps be problematic in an asset allocation context, especially compared to the graphical lasso. This is what I aim to test in the second part of my paper with an application to financial data.

4 Analysis of the two estimators

In this section, the main characteristics of both estimators are studied. Of course, all the results are sample specific, meaning that, depending on the data and the shrinkage parameters used, the results can differ. However, the objective here is to capture the main relationships between the parameters. All the timing results are based on an Intel Core i GHz processor.

4.1 The graphical lasso - the shrinkage parameter λ_L

The effect of the shrinkage parameter for the graphical lasso can be seen in Table 1. These are data for the same 10 variables, with t = 0.01 as convergence parameter. |Σ̂_L| and |Θ̂_L| denote the absolute values of the off-diagonal elements of the corresponding matrices. As a general trend, the higher the shrinkage parameter, the more shrunk the precision and covariance matrices are. It also takes less time and fewer iterations to reach the convergence condition (one iteration meaning N optimizations, or one cycle through the matrix). Therefore, it converges faster with a higher shrinkage parameter. If the penalty parameter λ_L is very close to or even higher than the covariance matrix entries, it simply shrinks the entries directly to zero in one iteration, and this is what we can see with λ_L = 70 in Table 1.

Table 1: The effects of the shrinkage parameter λ_L (columns: λ_L, mean(|Σ̂_L|), mean(|Θ̂_L|), number of iterations, seconds)

4.2 The graphical lasso - the convergence condition t

The convergence condition is also a parameter which has an effect on the shrinkage of the covariance and precision matrices. In Table 2, t varies while the other elements are kept fixed. There are also 10 variables, with a shrinkage parameter λ_L = 2.

As anticipated, the smaller the convergence condition, the longer the algorithm takes to shrink the matrices (especially between t = 0.05 and t = 0.01). However, there is not much difference in terms of shrinkage once t is below 0.01. That means that the shrinkage parameter λ_L really determines the shrinkage; the convergence condition only determines the precision of the shrinkage.

Table 2: The effects of the convergence condition t (columns: t, mean(|Σ̂_L|), mean(|Θ̂_L|), number of iterations, seconds)

4.3 The ridge estimator - the shrinkage parameter λ_R

The shrinkage parameter of the ridge estimator is on a different scale than that of the graphical lasso, as it can go from zero to infinity. It is interesting to see the effect of different values of λ_R on the covariance and precision matrices. These results are also for the same 10 variables, and they show that it can make a lot of difference. The column mean(|Ŝ^{-1} − Θ̂_R|) shows the difference between the sample inverse covariance matrix and the inverse covariance matrix estimated through the alternative Type I ridge estimator. In agreement with the theory, when λ_R is close to zero, the precision matrix looks a lot like the sample one. As λ_R gets higher and goes closer to infinity, the precision matrix estimated through the ridge estimator becomes more and more different from the sample one and looks more like the target matrix. In the asset allocation context, the lower the shrinkage parameter, the more extreme the weights given by the strategy. We can see that the shrinkage is proportional to the value of λ_R, in agreement with the theory. As for the time taken, we can see that there is a big difference between the graphical lasso and the ridge estimator. The timing is studied in more detail in the next section.

Table 3: The effects of the shrinkage parameter λ_R (columns: λ_R, mean(|Σ̂_R|), mean(|Θ̂_R|), mean(|Ŝ^{-1} − Θ̂_R|), seconds)

4.4 The ridge estimator vs the graphical lasso - the timing

This section shows the differences in timing between the two estimators. Basically, the time needed to estimate the covariance and precision matrices is displayed as a function of the number of variables. The variables here are series of random numbers from a normal distribution with µ = 0 and σ = 4. For a number of variables N going from 5 to 100, N + 1 observations are generated for each variable. For the graphical lasso parameters, λ_L = 1 and t = 0.01, whereas for the ridge estimator λ_R is held fixed. The element which really determines the speed of the graphical lasso estimator is the size of the penalty compared to the size of the covariances. As the covariances of these random normal observations are rather low compared to the covariances of stock return data, λ_L is set rather low to represent the shrinkage process more accurately. It can actually take more or less time than what is shown here, depending on the data. The results could not be represented together, as the ridge estimator would not even appear on the graphical lasso graph. The ridge estimator takes at most a negligible amount of time to estimate the covariance and precision matrices, as we see in Figure 3, whereas the graphical lasso can take up to almost 200 seconds, as we see in Figure 1. Especially for the graphical lasso, there seems to be an upward trend in the time taken as variables are added. This is not surprising, as more variables mean more optimizations and more coefficients to estimate in each optimization. However, this is not always exactly the case, and the behaviour can be quite erratic. Moreover, in Figure 2, the time needed for the graphical lasso estimation is represented not only as a function of the number of variables, but also as a function of the convergence condition, with t = 0.5, 0.1 and below. We see that the smaller the convergence condition, the longer it takes for the same number of assets.

Figure 1: Time in seconds as a function of the number of variables - graphical lasso

Figure 2: Time in seconds (y-axis) as a function of the number of variables (x-axis) and the convergence condition t (z-axis) - graphical lasso

Figure 3: Time in seconds as a function of the number of variables - ridge estimator

4.5 The ridge estimator vs the graphical lasso - sparsity

In theory, the ridge estimator never sets coefficients to zero, as it only shrinks the coefficients proportionally. The graphical lasso, meanwhile, shrinks some coefficients more than others and can set some coefficients to zero. We can see the different effects of both estimators on the sparsity of the precision matrices in the tables below. The sparsity is defined as the number of zero non-diagonal elements (as the diagonal elements are not changed by the graphical lasso, they are always positive) over the total number of elements in the precision matrix estimate. (The precision matrix elements are actually rounded for both estimators here: coefficients below a small threshold are considered to be zero.) The theory seems to be confirmed. Even though the mean of the off-diagonal elements of Θ̂_R can be smaller than that of Θ̂_L, there are still no zero elements with the ridge estimator and therefore no sparsity. This demonstrates its proportional shrinkage, as opposed to the graphical lasso. Of course, as anticipated, the higher the shrinkage parameter, the higher the sparsity, reaching 100% for λ_L = 70, meaning that all the non-diagonal coefficients are equal to zero. Here, the convergence condition t for the graphical lasso is kept fixed.

Figure 4: The sparsity of both precision matrix estimators (columns: λ_L, sparsity of Θ̂_L, mean(|Θ̂_L|); λ_R, sparsity of Θ̂_R, mean(|Θ̂_R|))

5 Out-of-sample evaluation

Now that the theoretical characteristics and the implications of both estimators have been reviewed, I come to the second part of the paper: the implementation of both methods on financial data to test their out-of-sample performance.

5.1 Setup

5.1.1 The databases

Table 4: Description of the datasets considered (original columns: dataset, data description, N, T, T/N, time period; all datasets cover 07/1969 - 12/2012)

#1  Portfolios based on size and BtM ratio (96SBtMport), N = 96
#2  Industry portfolios (48IndPort), N = 48
#3  Individual stocks (133Indiv), N = 133
#4  Combination 2 (96SBtMport + 133Indiv), N = 114

For the data, I follow the example of Goto and Xu (2013), as I chose four different databases of different sizes in order to cover several sample characteristics. The period for all the databases goes from 07/1969 to 12/2012, and these are all monthly return data. Of course, I also dispose of the data on the risk-free rate for the same period. The first dataset is composed of 96 portfolios formed on size and book-to-market ratio, available on the Kenneth R. French website (there are actually 100 such portfolios, but I removed four of them because of missing data). The second dataset is formed of 48 industry portfolios, also available courtesy of the Kenneth R. French website. The third one is a sample of 133 individual stocks chosen randomly. Finally, the last one is a combination of the 96 portfolios and the 133 individual stocks, totalling 114 assets (48 from the first and 66 from the third). We can see that three of the four datasets contain return data on large and diversified portfolios, the exceptions being the #3 dataset (and, of course, half of the last dataset). For all datasets, the covariance matrices and their inverses are computed using a period of 120 months (T), or 10 years. Therefore, for the third dataset, it is impossible to use the sample covariance matrix for the optimization: as T/N < 1, it is not invertible and the precision matrix is undefined.

5.1.2 The different portfolio strategies considered

To provide some type of benchmark for the empirical evaluation, I also consider several popular methods in addition to the graphical lasso and the ridge portfolios. Before going into more detail, it is necessary to clarify one notion: except for the equally-weighted portfolio, which is free of any estimation, the methods I consider are all based on the global minimum variance (GMV) portfolio formula:

w_{GMV} = \frac{\Sigma^{-1}\mathbf{1}_N}{\mathbf{1}_N^{\top}\Sigma^{-1}\mathbf{1}_N}

It only depends on the inverse covariance matrix Σ^{-1}, and the difference between the methods is the way this inverse covariance matrix is estimated (Σ̂^{-1}) or the way this formula is applied. But the point is that they are all based on it, as I only focus on the risk in my paper and not on the return of the strategies. Or, to be more precise, I only focus on risk minimization, as these are minimum variance portfolios. (A sketch of how these weights can be computed for the different estimators is given after the list.)

1. The equally-weighted portfolio (1/N):

\hat{w}_{EW} = \frac{1}{N}\mathbf{1}_N

It is the simplest possible strategy, as it is only an equal repartition across all assets in the portfolio. However, it has been shown to perform surprisingly well, as it is free of any estimation errors and has low turnover costs.

2. The sample-based minimum variance portfolio (S):

\hat{w}_{S} = \frac{\hat{S}^{-1}\mathbf{1}_N}{\mathbf{1}_N^{\top}\hat{S}^{-1}\mathbf{1}_N}

It is the GMV portfolio, but based on the sample covariance matrix, with all its known disadvantages.

3. The Jagannathan and Ma (2003) portfolio (JM):

\hat{w}_{JM} = \arg\min_{w}\ w^{\top}\hat{S}w \quad \text{s.t.} \quad \sum_{i=1}^{N} w_i = 1 \ \text{and} \ w_i \ge 0 \ \text{for all} \ i = 1, \ldots, N

Basically, it consists of the sample minimum variance portfolio but with a no-short-sale constraint. It has been shown to perform well and to limit the sample covariance problems. Moreover, it does not depend on the inverse covariance matrix, which means that it can be computed for every T/N profile.

4. The Ledoit and Wolf portfolio (LW):

\hat{w}_{LW} = \frac{\hat{\Sigma}_{LW}^{-1}\mathbf{1}_N}{\mathbf{1}_N^{\top}\hat{\Sigma}_{LW}^{-1}\mathbf{1}_N}

It is also the GMV portfolio, but with the precision matrix estimated with Ledoit and Wolf's shrinkage estimator Σ̂_{LW}^{-1} (the function is publicly available courtesy of Michael Wolf's page: publications.html).

5. The graphical lasso portfolio (L):

\hat{w}_{L} = \frac{\hat{\Sigma}_{L}^{-1}\mathbf{1}_N}{\mathbf{1}_N^{\top}\hat{\Sigma}_{L}^{-1}\mathbf{1}_N}

The GMV portfolio with the inverse covariance matrix estimated through the graphical lasso algorithm.

6. The ridge portfolio (R):

\hat{w}_{R} = \frac{\hat{\Sigma}_{R}^{-1}\mathbf{1}_N}{\mathbf{1}_N^{\top}\hat{\Sigma}_{R}^{-1}\mathbf{1}_N}

The GMV portfolio estimated through the alternative Type I ridge precision estimator.

Of course, as mentioned earlier, for two datasets the sample minimum variance strategy is not applicable.
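One possible way to assemble these strategies for a single estimation window is sketched below (illustrative only: the shrinkage parameter values are placeholders, the Jagannathan-Ma portfolio is omitted because it needs a quadratic-programming solver, and alt_ridge_precision refers to the ridge sketch of Section 3.3; scikit-learn supplies the Ledoit-Wolf and graphical lasso estimators used here):

```python
import numpy as np
from sklearn.covariance import LedoitWolf, graphical_lasso

def gmv_from_precision(Theta):
    ones = np.ones(Theta.shape[0])
    w = Theta @ ones
    return w / (ones @ w)

def strategy_weights(R, lam_L=1.0, lam_R=500.0):
    """R is a T x N array of returns over the 120-month estimation window (hypothetical input)."""
    T, N = R.shape
    S = np.cov(R, rowvar=False)

    weights = {"EW": np.ones(N) / N}                          # 1/N portfolio, no estimation needed
    if T > N:                                                 # sample GMV only when S is invertible
        weights["S"] = gmv_from_precision(np.linalg.inv(S))
    weights["LW"] = gmv_from_precision(np.linalg.inv(LedoitWolf().fit(R).covariance_))
    _, Theta_L = graphical_lasso(S, alpha=lam_L)              # l1-regularized precision matrix
    weights["L"] = gmv_from_precision(Theta_L)
    _, Theta_R = alt_ridge_precision(S, lam_R)                # l2-regularized precision matrix (Section 3.3 sketch)
    weights["R"] = gmv_from_precision(Theta_R)
    return weights
```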

5.1.3 The approach

The approach I consider is out-of-sample, meaning that the period used to estimate the matrices and optimize the weights is not the same period used to test the performance of the strategies. It is actually a rolling-window approach: for each month t, the covariance and precision matrices are estimated using the 120 preceding months (from t − 120 to t). The optimized portfolio weights ŵ_{j,t} to invest in each asset j are computed according to the strategy, and these weights are held for one month, during which the strategy achieves a certain level of out-of-sample return

R_{t+1} = \sum_{j=1}^{N} \hat{w}_{j,t}\, r_{j,t+1}

Then, the following month t + 1, the matrices are re-estimated using the 120 months from t − 119 to t + 1, which gives new optimized weights ŵ_{j,t+1}, which are again held until t + 2, to achieve a strategy return R_{t+2} = \sum_{j=1}^{N} \hat{w}_{j,t+1}\, r_{j,t+2}. This process goes on over the whole sample period. Given that 120 initial months are needed to start trading, it results in 402 covariance and precision matrices and 402 vectors of N optimized weights, as there are 402 trading periods (denoted as P in the rest of the paper) over the whole sample (the 522 months of data minus the 120 initial months). This process ensures the potential investability of the strategies and the pertinence of the backtesting.
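The rolling-window evaluation itself is only a short loop (a sketch, assuming a T x N array of monthly returns and any of the weight rules above):

```python
import numpy as np

def backtest(returns, weight_rule, window=120):
    """Out-of-sample rolling-window backtest: weights are estimated on the preceding `window`
    months and held for one month."""
    T, _ = returns.shape
    oos = []
    for t in range(window, T):
        est = returns[t - window:t]           # the 120 preceding months
        w = weight_rule(est)                  # weights formed at the end of month t - 1 ...
        oos.append(w @ returns[t])            # ... and held during month t
    return np.array(oos)                      # 522 - 120 = 402 out-of-sample returns here
```

For instance, weight_rule could be lambda est: gmv_from_precision(alt_ridge_precision(np.cov(est, rowvar=False), 500.0)[1]), using the earlier sketches.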
5.1.4 The choice of the shrinkage parameters λ_L and λ_R

For the graphical lasso and the ridge strategies, values for λ_L and λ_R have to be set in order to apply the strategies. The optimal shrinkage parameter values are unknown, and a way to estimate them must be found. Different methods exist, like the leave-one-out cross-validation (LOOCV) score, but given that performance is the main objective in this paper, I rather use a performance-based method. In order to be able to compare both strategies on the same scale, the method should be the same for the two different shrinkage parameters. I estimate the precision matrix with different shrinkage parameter values for both strategies during the first 10 years of each sample, and I use the shrinkage parameters which result in the best performance over this in-sample period. The shrinkage parameters are then kept the same over the whole period. The performance is judged from a mean point of view; to be more precise, I choose the shrinkage parameter which maximizes the mean of the returns over the in-sample period. Of course, given that only one precision matrix is estimated for the whole in-sample period, the weights for each month during that period are the same for a given shrinkage parameter value. For the out-of-sample period, the weights change each month, and performance must not be judged in the same way. Other measures than the mean could be used, like the variance or the Sharpe ratio, but they usually result in bad performance out-of-sample. Indeed, the lowest shrinkage parameter tends to give the least variable returns in-sample, whereas this is not the case out-of-sample. Therefore, using any measure based on the variance or standard deviation of the in-sample returns is incorrect. A possible explanation is that, as the same period is used to estimate the matrices and to test the performance, the most extreme weights (i.e. the lowest shrinkage parameter) correctly give the least variable returns. The mean, even though it is rather intuitive and simple, tends to give better results out-of-sample.

For the ridge strategy, I test 200 different values for λ_R, starting from 150, for all samples. Unfortunately, for the graphical lasso, due to the intensive computations, I cannot be as thorough, and I test 10 different values of λ_L per sample. The range of potential values depends on the sample; for smaller samples, the range covers lower values than for samples with more assets. For example, the sample with the individual stocks is the one for which the range is the highest. Indeed, this is probably where the strongest shrinkage must be achieved, as there are probably fewer correlations among individual stocks than among large and diversified portfolios. As for the convergence condition, I chose t to be equal to 0.1 for all the samples, due to the time taken by the graphical lasso algorithm. Hence, it is important to notice that a smaller t might achieve a different shrinkage and maybe a different out-of-sample performance than what is shown here. It is necessary to specify that the computations for the graphical lasso are intensive, and this is the reason why the shrinkage parameters are only estimated once per sample for both regularization strategies.

They could be re-estimated during the period, but that is more difficult to implement, and under the hypothesis that the optimal shrinkage parameter stays stable it should not be a problem. For the ridge estimator, the computations are much easier and it takes less time to estimate the precision matrices. Moreover, there is an infinity of shrinkage parameters to choose from, and it is more likely that the optimal shrinkage parameter changes during the period. Therefore, it makes sense in that case to re-estimate several times over the period, and this is what I do with the adaptive ridge strategy in the next section.

5.2 The adaptive ridge strategy - ADR

In order to fully exploit the advantages of the ridge estimator, I also integrate a strategy which re-estimates the optimal shrinkage parameter λ_R along the sample period. Indeed, it is likely that keeping the same shrinkage parameter for 402 months (more than 30 years) may not really be optimal. It can give good results in some situations, while performing poorly in others. This can be especially problematic for the ridge strategy, which has an infinity of possible shrinkage parameter values, unlike the graphical lasso, whose optimal shrinkage parameter is more likely to be stable. Furthermore, the speed of estimation and the flexibility of the ridge estimator are characteristics that should be taken advantage of.

The most important feature is the re-estimation period, i.e. the frequency at which the shrinkage parameter is reset. There is a trade-off: it must not be too short, in order not to be reset when not necessary, but it must also be short enough for the strategy to adapt itself sufficiently to potential new economic and statistical conditions. Moreover, too many shrinkage parameter changes may not be desirable in terms of portfolio turnover, as they have a direct effect on the weights. To take that into account, I implement a flexible re-estimation period in the algorithm. The re-estimation is done in principle every 24 months, but depending on the value found for λ_R, the period can be shortened or extended. If the value found is too different from the last ones, meaning that the current context may be unstable, the period until the next re-estimation is reduced to 6 months (instead of 24). Hence, it allows the strategy to adapt itself more quickly to potentially erratic conditions. Conversely, if the shrinkage parameter found is very close to the previous one, the next re-estimation period is extended to 36 months, on the basis that the situation may be more stable and that the current λ_R may be well suited to the sample.

Another important characteristic is the range of shrinkage parameters possible at each re-estimation. The minimum value for λ_R across all samples is fixed at 150, as in the regular ridge strategy. There is also a maximum value, set because, for unexplainable reasons, the estimation process is disrupted by too high shrinkage parameter values. Moreover, there is no loss from imposing this limit, given that, beyond a certain level, it does not make any real difference in terms of shrinkage. At each re-estimation, the goal is to have a range wide enough to capture all potential optimal values while not being too wide, in order to be sufficiently precise. It is a function of the last estimated optimal shrinkage parameter, or to be more precise, λ_{R,i} ∈ [0.4 λ_{R,i−1}; 1.6 λ_{R,i−1}]. 200 equally spaced values are then chosen as potential candidates within this range.
However, when the last optimal shrinkage parameter value is at the top of the range, the potential range is increased for the next estimation, in order to capture a potential upward or downward trend. This results in a time series of estimated optimal shrinkage parameters for each sample.
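The adaptive schedule can be summarized in a small helper (a sketch under stated assumptions: the exact thresholds deciding what counts as "too different" or "very close" are not given in the text, so the ones below are purely illustrative):

```python
import numpy as np

def next_reestimation_gap(lam_new, lam_prev, unstable_tol=0.25, stable_tol=0.05):
    """Flexible re-estimation period: 6 months after a large jump in lambda_R,
    36 months when the new value is very close to the old one, 24 months otherwise.
    The tolerance values are illustrative assumptions, not the author's exact rule."""
    rel_change = abs(lam_new - lam_prev) / lam_prev
    if rel_change > unstable_tol:
        return 6
    if rel_change < stable_tol:
        return 36
    return 24

def candidate_grid(lam_prev, lam_min=150.0, n=200):
    """200 equally spaced candidates in [0.4 * lam_prev, 1.6 * lam_prev], floored at 150."""
    return np.linspace(max(lam_min, 0.4 * lam_prev), 1.6 * lam_prev, n)
```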


Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

More information

Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

More information

Active Versus Passive Low-Volatility Investing

Active Versus Passive Low-Volatility Investing Active Versus Passive Low-Volatility Investing Introduction ISSUE 3 October 013 Danny Meidan, Ph.D. (561) 775.1100 Low-volatility equity investing has gained quite a lot of interest and assets over the

More information

Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Elasticity Theory Basics

Elasticity Theory Basics G22.3033-002: Topics in Computer Graphics: Lecture #7 Geometric Modeling New York University Elasticity Theory Basics Lecture #7: 20 October 2003 Lecturer: Denis Zorin Scribe: Adrian Secord, Yotam Gingold

More information

Investment Statistics: Definitions & Formulas

Investment Statistics: Definitions & Formulas Investment Statistics: Definitions & Formulas The following are brief descriptions and formulas for the various statistics and calculations available within the ease Analytics system. Unless stated otherwise,

More information

Goodness of fit assessment of item response theory models

Goodness of fit assessment of item response theory models Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

Degrees of Freedom and Model Search

Degrees of Freedom and Model Search Degrees of Freedom and Model Search Ryan J. Tibshirani Abstract Degrees of freedom is a fundamental concept in statistical modeling, as it provides a quantitative description of the amount of fitting performed

More information

7 Gaussian Elimination and LU Factorization

7 Gaussian Elimination and LU Factorization 7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method

More information

CAPM, Arbitrage, and Linear Factor Models

CAPM, Arbitrage, and Linear Factor Models CAPM, Arbitrage, and Linear Factor Models CAPM, Arbitrage, Linear Factor Models 1/ 41 Introduction We now assume all investors actually choose mean-variance e cient portfolios. By equating these investors

More information

Moving Least Squares Approximation

Moving Least Squares Approximation Chapter 7 Moving Least Squares Approimation An alternative to radial basis function interpolation and approimation is the so-called moving least squares method. As we will see below, in this method the

More information

7 Time series analysis

7 Time series analysis 7 Time series analysis In Chapters 16, 17, 33 36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are

More information

An introduction to Value-at-Risk Learning Curve September 2003

An introduction to Value-at-Risk Learning Curve September 2003 An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

The Dangers of Using Correlation to Measure Dependence

The Dangers of Using Correlation to Measure Dependence ALTERNATIVE INVESTMENT RESEARCH CENTRE WORKING PAPER SERIES Working Paper # 0010 The Dangers of Using Correlation to Measure Dependence Harry M. Kat Professor of Risk Management, Cass Business School,

More information

Multiple regression - Matrices

Multiple regression - Matrices Multiple regression - Matrices This handout will present various matrices which are substantively interesting and/or provide useful means of summarizing the data for analytical purposes. As we will see,

More information

Chapter 2 Portfolio Management and the Capital Asset Pricing Model

Chapter 2 Portfolio Management and the Capital Asset Pricing Model Chapter 2 Portfolio Management and the Capital Asset Pricing Model In this chapter, we explore the issue of risk management in a portfolio of assets. The main issue is how to balance a portfolio, that

More information

Principal components analysis

Principal components analysis CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k

More information

Using Microsoft Excel to build Efficient Frontiers via the Mean Variance Optimization Method

Using Microsoft Excel to build Efficient Frontiers via the Mean Variance Optimization Method Using Microsoft Excel to build Efficient Frontiers via the Mean Variance Optimization Method Submitted by John Alexander McNair ID #: 0061216 Date: April 14, 2003 The Optimal Portfolio Problem Consider

More information

Introduction to Principal Component Analysis: Stock Market Values

Introduction to Principal Component Analysis: Stock Market Values Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Factor analysis. Angela Montanari

Factor analysis. Angela Montanari Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

More information

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

More information

The Characteristic Polynomial

The Characteristic Polynomial Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

More information

Risk Reduction in Style Rotation

Risk Reduction in Style Rotation EDHEC-Risk Institute 393-400 promenade des Anglais 06202 Nice Cedex 3 Tel.: +33 (0)4 93 18 32 53 E-mail: research@edhec-risk.com Web: www.edhec-risk.com Risk Reduction in Style Rotation October 2010 Rodrigo

More information

Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

More information

Solution: The optimal position for an investor with a coefficient of risk aversion A = 5 in the risky asset is y*:

Solution: The optimal position for an investor with a coefficient of risk aversion A = 5 in the risky asset is y*: Problem 1. Consider a risky asset. Suppose the expected rate of return on the risky asset is 15%, the standard deviation of the asset return is 22%, and the risk-free rate is 6%. What is your optimal position

More information

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

More information

Computer Handholders Investment Software Research Paper Series TAILORING ASSET ALLOCATION TO THE INDIVIDUAL INVESTOR

Computer Handholders Investment Software Research Paper Series TAILORING ASSET ALLOCATION TO THE INDIVIDUAL INVESTOR Computer Handholders Investment Software Research Paper Series TAILORING ASSET ALLOCATION TO THE INDIVIDUAL INVESTOR David N. Nawrocki -- Villanova University ABSTRACT Asset allocation has typically used

More information

Community Mining from Multi-relational Networks

Community Mining from Multi-relational Networks Community Mining from Multi-relational Networks Deng Cai 1, Zheng Shao 1, Xiaofei He 2, Xifeng Yan 1, and Jiawei Han 1 1 Computer Science Department, University of Illinois at Urbana Champaign (dengcai2,

More information

Epipolar Geometry. Readings: See Sections 10.1 and 15.6 of Forsyth and Ponce. Right Image. Left Image. e(p ) Epipolar Lines. e(q ) q R.

Epipolar Geometry. Readings: See Sections 10.1 and 15.6 of Forsyth and Ponce. Right Image. Left Image. e(p ) Epipolar Lines. e(q ) q R. Epipolar Geometry We consider two perspective images of a scene as taken from a stereo pair of cameras (or equivalently, assume the scene is rigid and imaged with a single camera from two different locations).

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Chapter 5. Conditional CAPM. 5.1 Conditional CAPM: Theory. 5.1.1 Risk According to the CAPM. The CAPM is not a perfect model of expected returns.

Chapter 5. Conditional CAPM. 5.1 Conditional CAPM: Theory. 5.1.1 Risk According to the CAPM. The CAPM is not a perfect model of expected returns. Chapter 5 Conditional CAPM 5.1 Conditional CAPM: Theory 5.1.1 Risk According to the CAPM The CAPM is not a perfect model of expected returns. In the 40+ years of its history, many systematic deviations

More information

Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

More information

Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Solution to Homework 2

Solution to Homework 2 Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

More information

Appendices with Supplementary Materials for CAPM for Estimating Cost of Equity Capital: Interpreting the Empirical Evidence

Appendices with Supplementary Materials for CAPM for Estimating Cost of Equity Capital: Interpreting the Empirical Evidence Appendices with Supplementary Materials for CAPM for Estimating Cost of Equity Capital: Interpreting the Empirical Evidence This document contains supplementary material to the paper titled CAPM for estimating

More information

L13: cross-validation

L13: cross-validation Resampling methods Cross validation Bootstrap L13: cross-validation Bias and variance estimation with the Bootstrap Three-way data partitioning CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU

More information

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Several Views of Support Vector Machines

Several Views of Support Vector Machines Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min

More information

Multivariate Analysis (Slides 13)

Multivariate Analysis (Slides 13) Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables

More information

Risk Decomposition of Investment Portfolios. Dan dibartolomeo Northfield Webinar January 2014

Risk Decomposition of Investment Portfolios. Dan dibartolomeo Northfield Webinar January 2014 Risk Decomposition of Investment Portfolios Dan dibartolomeo Northfield Webinar January 2014 Main Concepts for Today Investment practitioners rely on a decomposition of portfolio risk into factors to guide

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances?

Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances? 1 Extending the debate between Spearman and Wilson 1929: When do single variables optimally reproduce the common part of the observed covariances? André Beauducel 1 & Norbert Hilger University of Bonn,

More information

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

6. Cholesky factorization

6. Cholesky factorization 6. Cholesky factorization EE103 (Fall 2011-12) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix

More information

Monotonicity Hints. Abstract

Monotonicity Hints. Abstract Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

More information

Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

More information

Operation Count; Numerical Linear Algebra

Operation Count; Numerical Linear Algebra 10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point

More information

Design of an FX trading system using Adaptive Reinforcement Learning

Design of an FX trading system using Adaptive Reinforcement Learning University Finance Seminar 17 March 2006 Design of an FX trading system using Adaptive Reinforcement Learning M A H Dempster Centre for Financial Research Judge Institute of Management University of &

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Lecture 1: Asset Allocation

Lecture 1: Asset Allocation Lecture 1: Asset Allocation Investments FIN460-Papanikolaou Asset Allocation I 1/ 62 Overview 1. Introduction 2. Investor s Risk Tolerance 3. Allocating Capital Between a Risky and riskless asset 4. Allocating

More information

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions. Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

More information

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Drazen Pesjak Supervised by A.A. Tsvetkov 1, D. Posthuma 2 and S.A. Borovkova 3 MSc. Thesis Finance HONOURS TRACK Quantitative

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

A Mean-Variance Framework for Tests of Asset Pricing Models

A Mean-Variance Framework for Tests of Asset Pricing Models A Mean-Variance Framework for Tests of Asset Pricing Models Shmuel Kandel University of Chicago Tel-Aviv, University Robert F. Stambaugh University of Pennsylvania This article presents a mean-variance

More information