

Multi Factors Model
Daniel Herlemont
March 31, 2009

Contents
1 Introduction
2 Estimating using Ordinary Least Square regression
3 Multicollinearity
4 Estimating Fundamental Factor Models by Orthogonal Regression
5 References

1 Introduction

The objective of this practical work is to provide an empirical case study of factor decomposition using historical prices of two stocks (Nokia and Vodafone) and four fundamental factors:
• a broad market index, the New York Stock Exchange (NYSE) composite index,
• an industry factor, a Mutual Communication fund,
• a growth style factor, the Riverside growth fund, and
• a large caps factor, the AFBA Five Star Large Cap fund.

Source: Carol Alexander, see [1], case study II.1.4.


Download the data at /downloads/alexander-case-study-ii-1-4.csv, save the file to your working directory and read it with the command

> quotes = read.csv("alexander-case-study-ii-1-4.csv")

This work can also be performed under Excel (download the package /downloads/matrix.zip).

Use the following code to read the data and plot the prices

> dates = as.Date(quotes[, 1], "%d/%m/%y")
> prices = quotes[, -1]
> # rebase every series to 1 at the first date so that they share one scale
> prices = apply(prices, 2, function(p) p/p[1])
> n = ncol(prices)
> matplot(dates, prices, type = "l", col = 1:n, lty = 1:n, xaxt = "n")
> axis.Date(1, dates)
> legend(min(dates), max(prices), colnames(prices), col = 1:n,
+     lty = 1:n, cex = 0.7)


[Figure: rebased prices of Vodafone, Nokia, NYSE.Index, Communications, Growth and Large.Cap, plotted against dates.]

Using regression to build a multi factor model with these factors gives rise to some econometric problems. The main problem is multicollinearity. The proposed solution is to use orthogonal regression.

2 Estimating using Ordinary Least Square regression

The following commands compute the returns and convert them to a data frame to facilitate regression using R.

> r = apply(prices, 2, function(p) diff(p)/p[-length(p)])
> r = data.frame(r)
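The return formula above can be checked on a small made-up price matrix. This is a minimal sketch, not part of the case study: prices_toy and simple_returns are hypothetical names. It also shows that rebasing each price series to 1 at the first date leaves the returns unchanged.

```r
# Toy prices (made-up numbers, not the case-study data)
prices_toy <- cbind(A = c(100, 110, 99), B = c(50, 50, 55))

# Same expression as in the text: (p[t+1] - p[t]) / p[t], column by column
simple_returns <- function(prices) {
  apply(prices, 2, function(p) diff(p) / p[-length(p)])
}

r_toy <- data.frame(simple_returns(prices_toy))

# Rebasing each series to 1 at the first date does not change the returns
rebased <- apply(prices_toy, 2, function(p) p / p[1])
r_rebased <- data.frame(simple_returns(rebased))

print(r_toy)
print(all.equal(r_toy, r_rebased))
```

Each column loses one observation, because a return needs two consecutive prices.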


Then we can perform a regression of the stocks against the risk factors

> reg.vodafone = lm(Vodafone ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.vodafone)

Call:
lm(formula = Vodafone ~ NYSE.Index + Communications + Growth + Large.Cap, data = r)

Residuals:
Min  1Q  Median  3Q  Max
…

Coefficients:
                Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)     -7.16e-05  5.32e-04    -0.13    0.8930
NYSE.Index       8.69e-01  1.47e-01     5.91    4.4e-09 ***
Communications   1.44e-01  5.14e-02     2.81    0.0051 **
Growth           2.04e-01  1.19e-01     1.71    0.0869 .
Large.Cap        1.01e…    …            …       …

Residual standard error: … on 1326 degrees of freedom
Multiple R-squared: 0.348, Adjusted R-squared: …
F-statistic: 177 on 4 and 1326 DF, p-value: <2e-16

> reg.nokia = lm(Nokia ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.nokia)

Call:
lm(formula = Nokia ~ NYSE.Index + Communications + Growth + Large.Cap, data = r)

Residuals:
Min  1Q  Median  3Q  Max
…

Coefficients:
                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     …
NYSE.Index      …
Communications  …                              …e-06 ***
Growth          …
Large.Cap       …                              …e-13 ***

Residual standard error: … on 1326 degrees of freedom
Multiple R-squared: 0.468, Adjusted R-squared: 0.467
F-statistic: 292 on 4 and 1326 DF, p-value: <2e-16

todo: comments on the results...

Suppose we build a portfolio with $3 million of Nokia and $1 million of Vodafone.

todo: compute the following:
• the volatility of the portfolio,
• the betas of the portfolio with respect to the factors,
• the variance explained by the factors.

Expected results:

> w = c(0.25, 0.75)  # weights of Vodafone and Nokia
> rptf = 0.75 * r[, "Nokia"] + 0.25 * r[, "Vodafone"]
> # the factors must appear in the same order as the regression coefficients
> covfactors = cov(r[, c("NYSE.Index", "Communications", "Growth",
+     "Large.Cap")])
> beta = 0.75 * reg.nokia$coef[-1] + 0.25 * reg.vodafone$coef[-1]
> var.explained = t(beta) %*% covfactors %*% beta
> var.total = sd(rptf)^2
> sigma.total = sd(rptf) * sqrt(252) * 100
> sigma.explained = sqrt(var.explained) * sqrt(252) * 100
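The portfolio calculation can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the case study: the two factors F1 and F2, the assets A and B, their loadings and the weights are all made up. It shows that the weighted sum of the asset betas equals the betas from regressing the portfolio return directly on the factors, and that the explained variance t(beta) %*% covfactors %*% beta divided by the total variance equals the R-squared of that direct regression.

```r
set.seed(1)  # synthetic returns, for illustration only
n <- 500
d <- data.frame(F1 = rnorm(n, sd = 0.01), F2 = rnorm(n, sd = 0.02))
d$A <- 0.8 * d$F1 + 0.3 * d$F2 + rnorm(n, sd = 0.010)
d$B <- 1.2 * d$F1 - 0.1 * d$F2 + rnorm(n, sd = 0.015)

w <- c(0.75, 0.25)               # hypothetical weights of A and B
d$P <- w[1] * d$A + w[2] * d$B   # portfolio return

reg.a <- lm(A ~ F1 + F2, data = d)
reg.b <- lm(B ~ F1 + F2, data = d)
reg.p <- lm(P ~ F1 + F2, data = d)

# Portfolio betas as the weighted sum of the asset betas (intercepts dropped)
beta <- w[1] * coef(reg.a)[-1] + w[2] * coef(reg.b)[-1]

covfactors <- cov(d[, c("F1", "F2")])   # same factor order as beta
var.explained <- as.numeric(t(beta) %*% covfactors %*% beta)
var.total <- var(d$P)
share.explained <- var.explained / var.total

# Annualized volatilities in percent, assuming 252 trading days
sigma.total <- sqrt(var.total) * sqrt(252) * 100
sigma.explained <- sqrt(var.explained) * sqrt(252) * 100

print(rbind(weighted = beta, direct = coef(reg.p)[-1]))
print(c(share.explained = share.explained,
        r.squared = summary(reg.p)$r.squared))
```

Both identities follow from the linearity of least squares when every regression uses the same factors and the same sample.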

The expected values are:
• The total variance of the portfolio is … and the total volatility (yearly) is 42.6%.
• beta
  NYSE.Index  Communications  Growth  Large.Cap
  …
• The variance explained by the factors is … and the explained volatility (yearly) is 30.7%.

Comments?

3 Multicollinearity

Multicollinearity refers to the correlation between the explanatory variables in a regression model: if two or more explanatory variables are highly correlated then it is difficult to estimate their regression coefficients. The multicollinearity problem becomes apparent when the estimated coefficients change considerably when another (collinear) variable is added to the regression.

When high multicollinearity is present, confidence intervals for coefficients tend to be very wide and t-statistics tend to be very small. Coefficients will have to be larger in order to be statistically significant, i.e. it will be harder to reject the null hypothesis when multicollinearity is present.

There is no statistical test for multicollinearity, but a useful rule of thumb is that a model will suffer from it if the square of the pairwise correlation between explanatory variables is greater than the multiple R² of the regression.

Todo: perform the regression of Nokia and Vodafone using
• one factor: NYSE.Index
• 2 factors: NYSE.Index and Communications
• 3 factors: NYSE.Index, Communications and Growth
• 4 factors: NYSE.Index, Communications, Growth and Large.Cap

Explain the results, using the correlation matrix of the factors

> r.factors = r[, c("NYSE.Index", "Communications", "Growth", "Large.Cap")]
> cor.factors = cor(r.factors)
> cor.factors
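The rule of thumb stated in this section can be tried on synthetic data. The following is only a sketch under made-up assumptions: x1, x2 and y are simulated, and x2 is built to be nearly collinear with x1. It compares the squared pairwise correlation with the multiple R-squared and shows how the standard error of the x1 coefficient grows once the collinear variable is added.

```r
set.seed(42)  # synthetic data, for illustration only
n <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)      # nearly collinear with x1
y  <- x1 + x2 + rnorm(n, sd = 3)   # noisy response

fit1 <- lm(y ~ x1)        # one explanatory variable
fit2 <- lm(y ~ x1 + x2)   # adding the collinear variable

r2 <- summary(fit2)$r.squared
sq.cor <- cor(x1, x2)^2

# Rule of thumb: suspect multicollinearity when squared correlation > R-squared
flag <- sq.cor > r2

# Standard error of the x1 coefficient before and after adding x2
se1 <- coef(summary(fit1))["x1", "Std. Error"]
se2 <- coef(summary(fit2))["x1", "Std. Error"]

print(c(sq.cor = sq.cor, r.squared = r2))
print(flag)
print(c(se.one.factor = se1, se.two.factors = se2))
```

The same comparison can be made for the case-study factors by taking the squared entries of cor.factors and the R-squared reported by summary() for each regression.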


Correlation matrix of the factors (cor.factors):

                NYSE.Index  Communications  Growth  Large.Cap
NYSE.Index      …
Communications  …
Growth          …
Large.Cap       …

4 Estimating Fundamental Factor Models by Orthogonal Regression

The best solution to a multicollinearity problem is to apply principal component analysis and then use the principal components as explanatory variables. We apply principal component analysis to the covariance matrix of the factors:

> pca = prcomp(r.factors)
> pca
Standard deviations:
[1] …

Rotation:
                PC1  PC2  PC3  PC4
NYSE.Index      …
Communications  …
Growth          …
Large.Cap       …

> summary(pca)
Importance of components:
                        PC1  PC2  PC3  PC4
Standard deviation      …
Proportion of Variance  …
Cumulative Proportion   …

> plot(pca)


[Figure: plot of pca, showing the variance of each principal component (y axis: Variances).]

Alternatively we can use eigen(cov(r.factors)).

todo: using the first component (maybe the 2 main components), compute the variance explained by the components. Conclusions?
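The link between prcomp, eigen and the component series can be checked on synthetic factors. This is a minimal sketch under stated assumptions: the three factors F1 to F3 are simulated around one common driver and are not the case-study data. It computes the share of variance per component, confirms that eigen(cov(X)) gives the same variances, and verifies that the projected series are uncorrelated with each other, which is the point of using them as regressors.

```r
set.seed(7)  # synthetic factor returns, for illustration only
n <- 400
common <- rnorm(n, sd = 0.01)
X <- data.frame(F1 = common + rnorm(n, sd = 0.002),
                F2 = common + rnorm(n, sd = 0.003),
                F3 = common + rnorm(n, sd = 0.004))

pca <- prcomp(X)   # centered, unscaled: PCA of the covariance matrix

# Project the raw (uncentered) returns on the eigenvectors
scores <- as.matrix(X) %*% pca$rotation

# Share of the total factor variance carried by each component
prop.var <- pca$sdev^2 / sum(pca$sdev^2)

# eigen() on the covariance matrix yields the same component variances
ev <- eigen(cov(X))$values

# Largest absolute correlation between two different components
max.offdiag <- max(abs(cor(scores)[upper.tri(diag(3))]))

print(round(prop.var, 3))
print(max.offdiag)
```

The uncentered projection differs from pca$x only by a constant per column, so regression slopes and R-squared are the same with either version; only the intercept changes.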

Solutions:

> pc1 = pca$rotation[, 1]
> pc2 = pca$rotation[, 2]
> pc3 = pca$rotation[, 3]
> pc4 = pca$rotation[, 4]
> # project each day's factor returns on the eigenvectors
> pc1r = apply(r.factors, 1, function(x) sum(x * pc1))
> pc2r = apply(r.factors, 1, function(x) sum(x * pc2))
> pc3r = apply(r.factors, 1, function(x) sum(x * pc3))
> pc4r = apply(r.factors, 1, function(x) sum(x * pc4))
> summary(lm(r[, "Nokia"] ~ pc1r))

Call:
lm(formula = r[, "Nokia"] ~ pc1r)

Residuals:
      Min         1Q     Median         3Q        Max
-0.182175  -0.009307  -0.000295   0.008892   0.201183

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.000275  …
pc1r         …                              <2e-16 ***

Residual standard error: … on 1329 degrees of freedom
Multiple R-squared: 0.451, Adjusted R-squared: …
F-statistic: 1.09e+03 on 1 and 1329 DF, p-value: <2e-16

> summary(lm(r[, "Nokia"] ~ pc1r + pc2r))

Call:
lm(formula = r[, "Nokia"] ~ pc1r + pc2r)

Residuals:
Min  1Q  Median  3Q  Max
…

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  …
pc1r         …                              < 2e-16 ***
pc2r         …                              …e-05 ***

Residual standard error: … on 1328 degrees of freedom
Multiple R-squared: 0.459, Adjusted R-squared: 0.458
F-statistic: 563 on 2 and 1328 DF, p-value: <2e-16

> summary(lm(r[, "Vodafone"] ~ pc1r))

Call:
lm(formula = r[, "Vodafone"] ~ pc1r)

Residuals:
      Min         1Q     Median         3Q        Max
-0.112669  -0.010215  -0.000164   0.009569          …

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  …
pc1r         …                              <2e-16 ***

Residual standard error: 0.02 on 1329 degrees of freedom
Multiple R-squared: 0.307, Adjusted R-squared: …
F-statistic: 587 on 1 and 1329 DF, p-value: <2e-16

> summary(lm(r[, "Vodafone"] ~ pc1r + pc2r))

Call:
lm(formula = r[, "Vodafone"] ~ pc1r + pc2r)

Residuals:
Min  1Q  Median  3Q  Max
…

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  …
pc1r         …                              <2e-16 ***
pc2r         …                              <2e-16 ***

Residual standard error: 0.0195 on 1328 degrees of freedom
Multiple R-squared: 0.343, Adjusted R-squared: …
F-statistic: 346 on 2 and 1328 DF, p-value: <2e-16

5 References

[1] ALEXANDER, C. Market Risk Analysis: Practical Financial Econometrics. Wiley.
