Multi Factors Model

Daniel Herlemont

March 31, 2009

Contents

1 Introduction
2 Estimating using Ordinary Least Square regression
3 Multicollinearity
4 Estimating Fundamental Factor Models by Orthogonal Regression
5 References

1 Introduction

The objective of this practical work is to provide an empirical case study of factor decomposition using historical prices of two stocks (Nokia and Vodafone) and four fundamental factors:

- a broad market index, the New York Stock Exchange (NYSE) composite index,
- an industry factor, a Mutual Communication fund,
- a growth style factor, the Riverside growth fund, and
- a large caps factor, the AFBA Five Star Large Cap fund.

Source: Carol Alexander, see [1], case study II.1.4.

Download the data at /downloads/alexander-case-study-ii-1-4.csv, save it to your working directory, and read it with the command

quotes = read.csv("alexander-case-study-ii-1-4.csv")

This work can also be performed under Excel (download the package /downloads/matrix.zip).

Use the following code to read the data and plot the prices:

> dates = as.Date(quotes[, 1], "%d/%m/%y")
> prices = quotes[, -1]
> prices = apply(prices, 2, function(p) p/p[1])
> n = ncol(prices)
> matplot(dates, prices, type = "l", col = 1:n, lty = 1:n, xaxt = "n")
> axis.Date(1, dates)
> legend(min(dates), max(prices), colnames(prices), col = 1:n,
+     lty = 1:n, cex = 0.7)
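The normalization step above rebases every price series so that it starts at 1, which makes the paths comparable on a single plot. A minimal sketch with made-up prices (the column names are illustrative only, not the case-study columns):

```r
# Two toy price series, rebased so that both start at 1
prices = cbind(a = c(100, 110, 121), b = c(50, 55, 60))
norm = apply(prices, 2, function(p) p / p[1])
stopifnot(all.equal(norm[1, ], c(a = 1, b = 1)))   # first row is all ones
stopifnot(all.equal(norm[, "a"], c(1, 1.10, 1.21)))
```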

[Figure: normalized prices of Vodafone, Nokia, the NYSE index and the Communications, Growth and Large Cap funds, 2001 to 2006.]

Using regression to build a multi factor model with these factors gives rise to some econometric problems. The main problem is multicollinearity. The proposed solution is to use orthogonal regression.

2 Estimating using Ordinary Least Square regression

The following commands compute the returns and transform them into a data frame to facilitate regression using R:

> r = apply(prices, 2, function(p) diff(p)/p[-length(p)])
> r = data.frame(r)
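The diff(p)/p[-length(p)] idiom computes one-period arithmetic returns, (P_t - P_{t-1})/P_{t-1}. A quick check on made-up prices:

```r
p = c(100, 110, 99)
ret = diff(p) / p[-length(p)]               # (P_t - P_{t-1}) / P_{t-1}
stopifnot(all.equal(ret, c(0.10, -0.10)))   # +10% then -10%
```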

Then we can perform a regression of each stock against the risk factors:

> reg.vodafone = lm(Vodafone ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.vodafone)

Call:
lm(formula = Vodafone ~ NYSE.Index + Communications + Growth +
    Large.Cap, data = r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.110331 -0.009820 -0.000308  0.009155  0.131810

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    -7.16e-05   5.32e-04   -0.13   0.8930
NYSE.Index      8.69e-01   1.47e-01    5.91  4.4e-09 ***
Communications  1.44e-01   5.14e-02    2.81   0.0051 **
Growth          2.04e-01   1.19e-01    1.71   0.0869 .
Large.Cap       1.01e-02   1.35e-01    0.07   0.9403

Residual standard error: 0.0194 on 1326 degrees of freedom
Multiple R-squared: 0.348,  Adjusted R-squared: 0.346
F-statistic: 177 on 4 and 1326 DF,  p-value: <2e-16

> reg.nokia = lm(Nokia ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.nokia)

Call:
lm(formula = Nokia ~ NYSE.Index + Communications + Growth + Large.Cap,
    data = r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.175062 -0.009665 -0.000142  0.008843  0.217256

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.000217   0.000620    0.35     0.73
NYSE.Index     -0.260330   0.171240   -1.52     0.13
Communications  0.265789   0.059836    4.44  9.7e-06 ***
Growth          0.209248   0.138489    1.51     0.13
Large.Cap       1.142582   0.157037    7.28  5.9e-13 ***

Residual standard error: 0.0226 on 1326 degrees of freedom
Multiple R-squared: 0.468,  Adjusted R-squared: 0.467
F-statistic: 292 on 4 and 1326 DF,  p-value: <2e-16

Todo: comment on the results.

Suppose we build a portfolio with $3 million of Nokia and $1 million of Vodafone. Compute the following:

- the volatility of the portfolio,
- the betas of the portfolio with respect to the factors,
- the variance explained by the factors.

Expected results:

> w = c(0.25, 0.75)
> rptf = 0.75 * r[, "Nokia"] + 0.25 * r[, "Vodafone"]
> covfactors = cov(r[, c("NYSE.Index", "Communications", "Growth",
+     "Large.Cap")])
> beta = 0.75 * reg.nokia$coef[-1] + 0.25 * reg.vodafone$coef[-1]
> var.explained = t(beta) %*% covfactors %*% beta
> var.total = sd(rptf)^2
> sigma.total = sd(rptf) * sqrt(252) * 100
> sigma.explained = sqrt(var.explained) * sqrt(252) * 100
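The expected-results code is the standard factor-model variance decomposition; as a sketch in the document's notation, writing Sigma_F for the factor covariance matrix estimated by covfactors:

```latex
\beta_p = 0.75\,\beta_{\mathrm{Nokia}} + 0.25\,\beta_{\mathrm{Vodafone}},
\qquad
\sigma^2_{\mathrm{explained}} = \beta_p^{\top}\,\Sigma_F\,\beta_p,
\qquad
\sigma_{\mathrm{yearly}} = 100\,\sqrt{252}\,\sigma_{\mathrm{daily}}\;(\text{in }\%).
```

The difference var.total - var.explained is the part of the portfolio variance left to the stock-specific (residual) components.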

- the total variance of the portfolio is 0.00072 and the total volatility (yearly) is 42.6%,
- the betas:

  NYSE.Index Communications Growth Large.Cap
      0.0220         0.2354 0.2079    0.8595

- the variance explained by the factors is 0.000375 and the explained volatility (yearly) is 30.7%.

Comments?

3 Multicollinearity

Multicollinearity refers to correlation between the explanatory variables in a regression model: if one or more explanatory variables are highly correlated, then it is difficult to estimate their regression coefficients. The problem becomes apparent when the estimated coefficients change considerably when another (collinear) variable is added to the regression. When high multicollinearity is present, confidence intervals for coefficients tend to be very wide and t-statistics tend to be very small. Coefficients have to be larger in order to be statistically significant, i.e. it is harder to reject the null when multicollinearity is present.

There is no statistical test for multicollinearity, but a useful rule of thumb is that a model will suffer from it if the square of the pairwise correlation between two explanatory variables is greater than the multiple R^2 of the regression.

Todo: perform regressions of Nokia and Vodafone using

- one factor: NYSE.Index,
- 2 factors: NYSE.Index and Communications,
- 3 factors: NYSE.Index, Communications and Growth,
- 4 factors: NYSE.Index, Communications, Growth and Large.Cap.

Explain the results, using the correlation matrix of the factors:

> r.factors = r[, c("NYSE.Index", "Communications", "Growth", "Large.Cap")]
> cor.factors = cor(r.factors)
> cor.factors
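The rule of thumb can be illustrated on simulated data (not the case-study series; all numbers below are made up):

```r
set.seed(1)
n  = 500
x1 = rnorm(n)
x2 = x1 + 0.1 * rnorm(n)        # x2 is almost a copy of x1
y  = x1 + x2 + rnorm(n)
fit = lm(y ~ x1 + x2)
# The squared pairwise correlation (close to 1) exceeds the regression's
# multiple R-squared, so the individual coefficient estimates are fragile
stopifnot(cor(x1, x2)^2 > summary(fit)$r.squared)
```

Refitting with x1 alone versus x1 and x2 together shows large swings in the individual coefficients even though the overall fit barely changes, which is exactly the symptom described above.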

               NYSE.Index Communications Growth Large.Cap
NYSE.Index          1.000          0.689  0.844     0.909
Communications      0.689          1.000  0.880     0.834
Growth              0.844          0.880  1.000     0.892
Large.Cap           0.909          0.834  0.892     1.000

4 Estimating Fundamental Factor Models by Orthogonal Regression

The best solution to a multicollinearity problem is to apply principal component analysis and then use the principal components as explanatory variables. We apply principal component analysis to the covariance matrix of the factors:

> pca = prcomp(r.factors)
> pca
Standard deviations:
[1] 0.031355 0.008992 0.004167 0.002782

Rotation:
                  PC1     PC2     PC3     PC4
NYSE.Index     0.2588 -0.6099 -0.0966  0.7427
Communications 0.7963  0.5640 -0.1407  0.1674
Growth         0.3915 -0.2687  0.8447 -0.2472
Large.Cap      0.3817 -0.4875 -0.5074 -0.5993

> summary(pca)
Importance of components:
                        PC1   PC2   PC3   PC4
Standard deviation     0.03 0.009 0.004 0.003
Proportion of Variance 0.90 0.074 0.016 0.007
Cumulative Proportion  0.90 0.977 0.993 1.000

> plot(pca)
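As a sanity check that prcomp and an eigendecomposition of the covariance matrix agree, a sketch on simulated data (not the case-study factors):

```r
set.seed(42)
X   = matrix(rnorm(200), ncol = 4)   # 50 observations of 4 variables
pca = prcomp(X)                      # centers the columns by default
ev  = eigen(cov(X))
# prcomp's squared standard deviations are the eigenvalues of cov(X)
stopifnot(all.equal(pca$sdev^2, ev$values))
# loadings agree column by column, up to an arbitrary sign
stopifnot(all.equal(abs(pca$rotation), abs(ev$vectors),
                    check.attributes = FALSE))
```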

[Figure: screeplot of the principal component variances produced by plot(pca).]

Alternatively we can use eigen(cov(r.factors)).

Todo: using the first component (and possibly the first two components), compute the variance explained by the components. Conclusions?

Solutions:

> pc1 = pca$rotation[, 1]
> pc2 = pca$rotation[, 2]
> pc3 = pca$rotation[, 3]
> pc4 = pca$rotation[, 4]
> pc1r = apply(r.factors, 1, function(x) sum(x * pc1))
> pc2r = apply(r.factors, 1, function(x) sum(x * pc2))
> pc3r = apply(r.factors, 1, function(x) sum(x * pc3))
> pc4r = apply(r.factors, 1, function(x) sum(x * pc4))
> summary(lm(r[, "Nokia"] ~ pc1r))

Call:
lm(formula = r[, "Nokia"] ~ pc1r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.182175 -0.009307 -0.000295  0.008892  0.201183

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000275   0.000628    0.44     0.66
pc1r        0.662287   0.020043   33.04   <2e-16 ***

Residual standard error: 0.0229 on 1329 degrees of freedom
Multiple R-squared: 0.451,  Adjusted R-squared: 0.451
F-statistic: 1.09e+03 on 1 and 1329 DF,  p-value: <2e-16

> summary(lm(r[, "Nokia"] ~ pc1r + pc2r))

Call:
lm(formula = r[, "Nokia"] ~ pc1r + pc2r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.181130 -0.009391 -0.000152  0.008528  0.212437

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.000178   0.000624    0.28     0.78
pc1r         0.662287   0.019907   33.27  < 2e-16 ***
pc2r        -0.304551   0.069417   -4.39  1.2e-05 ***

Residual standard error: 0.0228 on 1328 degrees of freedom
Multiple R-squared: 0.459,  Adjusted R-squared: 0.458
F-statistic: 563 on 2 and 1328 DF,  p-value: <2e-16

> summary(lm(r[, "Vodafone"] ~ pc1r))

Call:
lm(formula = r[, "Vodafone"] ~ pc1r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.112669 -0.010215 -0.000164  0.009569  0.126809

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000140   0.000548    0.26      0.8
pc1r        0.423424   0.017470   24.24   <2e-16 ***

Residual standard error: 0.02 on 1329 degrees of freedom
Multiple R-squared: 0.307,  Adjusted R-squared: 0.306
F-statistic: 587 on 1 and 1329 DF,  p-value: <2e-16

> summary(lm(r[, "Vodafone"] ~ pc1r + pc2r))

Call:
lm(formula = r[, "Vodafone"] ~ pc1r + pc2r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.111048 -0.009771 -0.000363  0.009244  0.132099

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.000022   0.000534   -0.04     0.97
pc1r         0.423424   0.017013   24.89   <2e-16 ***
pc2r        -0.508226   0.059325   -8.57   <2e-16 ***

Residual standard error: 0.0195 on 1328 degrees of freedom
Multiple R-squared: 0.343,  Adjusted R-squared: 0.342
F-statistic: 346 on 2 and 1328 DF,  p-value: <2e-16

5 References

[1] Alexander, C. Market Risk Analysis: Practical Financial Econometrics. Wiley, 2008.