Regression Analysis Pekka Tolonen
Outline of Topics
Simple linear regression: the form and estimation
Hypothesis testing and statistical significance
Empirical application: the capital asset pricing model
Procedures provided by Excel and SAS
Why Regression Analysis?
In statistics, regression analysis includes any technique for modelling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.
Uses of Regressions
Regression analysis is widely used for prediction and forecasting.
Regression analysis is also used to understand which of the independent variables are related to the dependent variable, and to explore the forms of these relationships.
In (very) restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. Note that a causal relationship is a very strong claim.
Techniques
A large body of techniques for carrying out regression analysis has been developed.
Familiar methods such as linear regression and ordinary least squares (OLS) regression are parametric; that is, the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data.
Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
Simple Linear Regression
The simple linear regression model is:
$y_i = \beta_1 + \beta_2 x_i + e_i$
$y_i$: Dependent variable
$x_i$: Regressor (independent variable)
$e_i$: A random error term (residual)
Regression parameters: $\beta_1$ is the intercept and $\beta_2$ is the slope coefficient.
In principle, the residual should account for all the movements in $y$ that cannot be explained by $x$.
Simple Linear Regression
The essence of regression analysis is that any observation on the dependent variable $y$ can be decomposed into two parts: (1) a systematic component and (2) a random component.
The dependent variable $y$ is explained by a component that varies systematically with the independent variable and by the error term $e$.
Assumptions of the Linear Regression Model
1. The $e_i$ are statistically independent of each other.
2. The $e_i$ have a constant variance, $\sigma^2$, for all values of $x_i$.
3. The $e_i$ are normally distributed with mean 0.
4. The means of the dependent variable $Y$ fall on a straight line for all values of the independent variable $X$.
5. The variable $X$ must take at least two different values.
Model Estimation: The Least Squares Principle
We estimate the parameters $\beta_1$ and $\beta_2$ using the method based on the least squares principle.
This principle asserts that, to fit a line to the data values, we should choose the line for which the sum of the squares of the vertical distances from each point to the line is as small as possible.
The distances are squared to prevent large positive distances from being canceled by large negative distances.
Model Estimation: The Least Squares Principle
The fitted line is then:
$\hat{y}_i = b_1 + b_2 x_i$
The vertical distances from each point to the fitted line are the least squares residuals:
$\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i, \quad i = 1, 2, \ldots, n$
The Least Squares Estimators
The sum of squares function is:
$S(\beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2$
The values of the unknown parameters $\beta_1$ and $\beta_2$ that minimize the sum of squares function are given by:
$b_2 = \dfrac{\sum (y_i - \bar{y})(x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$
$b_1 = \bar{y} - b_2 \bar{x}$
where $\bar{x}$ and $\bar{y}$ are the sample means of the observations on $x$ and $y$.
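The estimator formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `ols_fit` and the data are hypothetical and chosen so that the points lie exactly on the line y = 1 + 2x.

```python
def ols_fit(x, y):
    """Return (b1, b2), the least squares intercept and slope of y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Hypothetical data lying exactly on y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b1, b2 = ols_fit(x, y)
print(b1, b2)
```

Because the illustrative points are exactly linear, the estimates recover the intercept and slope of the generating line and all residuals are zero.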
Empirical Example
The regression equation is a linear equation of the form $\hat{y}_i = b_1 + b_2 x_i$.
Consider an example where the returns of an investment portfolio are regressed against the returns of the market index:
$R_i = \beta_1 + \beta_2 R_{m,i} + e_i$
The least squares estimates are $b_1 = 0.37$ and $b_2 = 1.002$.
Because $b_2$ is close to 1, the portfolio return changes by approximately 1 percentage point when the market return changes by 1 percentage point.
Empirical Example
[Figure: scatter plot of portfolio return (vertical axis) against market index return (horizontal axis), showing the data points and the regression line of fitted values $\hat{y}_i = 0.37 + 1.002 x_i$.]
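To make the interpretation of the slope concrete, the fitted line from the example can be evaluated at two market returns. This is a small sketch; the function name `fitted_return` and the two market returns are hypothetical, and the returns are assumed to be in percent.

```python
# Least squares estimates quoted in the empirical example
b1 = 0.37
b2 = 1.002

def fitted_return(market_return):
    """Fitted portfolio return for a given market return (same units as the data)."""
    return b1 + b2 * market_return

# Hypothetical market returns, assumed to be in percent
low = fitted_return(1.0)
high = fitted_return(2.0)
change = high - low  # a one-unit change in the market return moves the fit by b2
print(low, high, change)
```

The difference between the two fitted values equals the slope estimate, which is why a slope near 1 implies a nearly one-for-one response to the market.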
Goodness of Fit: R²
The quality of a regression model is often measured in terms of its ability to explain the movements of the dependent variable.
Measures of goodness of fit typically summarize the deviation between the observed values and the values expected under the model.
R²
The variability of the data set is measured through different sums of squares:
$SST = \sum (y_i - \bar{y})^2$; $SSR = \sum (\hat{y}_i - \bar{y})^2$; $SSE = \sum \hat{e}_i^2$
SST = Total sum of squares
SSR = Explained sum of squares ($\hat{y}_i$ refers to the fitted values)
SSE = Error sum of squares
$SST = SSR + SSE$
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
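The decomposition of the sums of squares can be checked numerically. The sketch below fits a line by least squares and then computes SST, SSR, SSE and R²; the data are hypothetical and only the standard library is used.

```python
# Hypothetical observations
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of slope (b2) and intercept (b1)
b2 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b1 = y_bar - b2 * x_bar

fitted = [b1 + b2 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares

r_squared = 1 - sse / sst
print(sst, ssr, sse, r_squared)
```

For least squares fitted values with an intercept, SST equals SSR + SSE up to floating-point rounding, so the two expressions for R² agree.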
Variance of the Error Term
The estimated variance of the error term can be measured as:
$\hat{\sigma}^2 = \dfrac{\sum \hat{e}_i^2}{T - 2}$
That is, sum the squared estimated residuals and divide by $T - 2$, where $T$ is the number of observations.
Hypothesis Testing and Statistical Significance
The test statistic is:
$t = \dfrac{b_2 - \beta_2}{\mathrm{se}(b_2)} \sim t_{(T-2)}$
where
$\mathrm{se}(b_2) = \sqrt{\dfrac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2}}$
and $\hat{\sigma}^2 = \dfrac{\sum \hat{e}_i^2}{T - 2}$ is the estimated variance of the error term.
The random variable $t$ has a t-distribution with $T - 2$ degrees of freedom, where $T$ is the number of observations.
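The chain from residuals to the t-statistic can be sketched as follows. The data are hypothetical and far too few for a real test; the point is only to show each formula in order, testing the null hypothesis that the slope is zero.

```python
import math

# Hypothetical observations (T = 4 is only for illustration)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

T = len(x)
x_bar = sum(x) / T
y_bar = sum(y) / T

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b2 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
b1 = y_bar - b2 * x_bar

# Least squares residuals
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# Estimated error variance with T - 2 degrees of freedom
sigma2_hat = sum(e ** 2 for e in residuals) / (T - 2)

# Standard error of the slope and the t-statistic for H0: beta_2 = 0
se_b2 = math.sqrt(sigma2_hat / s_xx)
t_stat = (b2 - 0.0) / se_b2
print(sigma2_hat, se_b2, t_stat)
```

The resulting statistic would then be compared with a critical value from the t-distribution with T - 2 degrees of freedom.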
Statistical Significance
In the t-test statistic, the denominator $\mathrm{se}(b_2)$ is the standard error of $b_2$.
The hypothesis test is usually carried out by determining a critical t-value $t_c$, which corresponds to the chosen confidence level (typically 95% or 99%). We reject the null hypothesis if $|t| > t_c$.
Statistical Significance: Example
Assume the following beta of an investment fund: $b_2 = 0.08$
The number of observations: $T = 72$
The degrees of freedom are therefore: $T - 2 = 70$
The standard error is: $\mathrm{se}(b_2) = 3.36$
Confidence level: 95%
The critical t-value for 95% confidence with 70 degrees of freedom is 1.994 ($t_c = 1.994$).
Test $H_0: \beta_2 = 0$ against $H_a: \beta_2 \neq 0$.
The test statistic is:
$t = \dfrac{b_2 - \beta_2}{\mathrm{se}(b_2)} = \dfrac{0.08 - 0}{3.36} \approx 0.024$
Critical Values for the t-distribution
Degrees of      Significance level α
freedom         0.1      0.05     0.02     0.01
1               6.314    12.706   31.821   63.657
2               2.920    4.303    6.965    9.925
3               2.353    3.182    4.541    5.841
4               2.132    2.776    3.747    4.604
5               2.015    2.571    3.365    4.032
6               1.943    2.447    3.143    3.707
7               1.895    2.365    2.998    3.499
8               1.860    2.306    2.896    3.355
9               1.833    2.262    2.821    3.250
10              1.812    2.228    2.764    3.169
...             ...      ...      ...      ...
70              1.667    1.994    2.381    2.648
80              1.664    1.990    2.374    2.639
90              1.662    1.987    2.368    2.632
The critical t-value for 95% confidence (5% significance level) with 70 degrees of freedom is $t_c = 1.994$.
The test statistic is:
$t = \dfrac{b_2 - \beta_2}{\mathrm{se}(b_2)} = \dfrac{0.08 - 0}{3.36} \approx 0.024$
Since $|t| < t_c$, we cannot conclude at the 95% confidence level that the fund's beta is different from zero. In other words, we cannot reject the null hypothesis.
Excel function: =TINV(probability, degrees of freedom), for example =TINV(0.05,70)
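The decision rule in this example can be reproduced with a few lines of arithmetic. The sketch below takes the estimate, standard error and critical value directly from the example; the variable names are illustrative.

```python
# Values taken from the example
b2 = 0.08         # estimated beta
beta_null = 0.0   # value of beta under the null hypothesis
se_b2 = 3.36      # standard error of the estimate
t_c = 1.994       # two-sided critical value, 5% significance, 70 degrees of freedom

t_stat = (b2 - beta_null) / se_b2

# Reject H0 only if the statistic exceeds the critical value in absolute terms
reject_null = abs(t_stat) > t_c
print(round(t_stat, 3), reject_null)
```

The statistic is far below the critical value, so the null hypothesis of a zero beta is not rejected.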
Application: Capital Asset Pricing Model
In finance, the CAPM is used to determine a theoretically appropriate required rate of return of an asset.
Based on the CAPM, the expected rate of return of any security equals the risk-free rate plus the security's beta times the market risk premium:
$E(R_i) = R_f + \beta_i \left( E(R_m) - R_f \right)$
$E(R_i)$ is the expected return of a security, $R_f$ is the risk-free rate (e.g. the interest rate of a government bond), and $E(R_m) - R_f$ is the market premium.
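As a quick numerical illustration of the CAPM formula, the sketch below computes a required return. The function name `capm_expected_return` and all input values are hypothetical and are not taken from the slides.

```python
def capm_expected_return(risk_free, beta, expected_market):
    """Expected return under the CAPM: R_f + beta * (E(R_m) - R_f)."""
    market_premium = expected_market - risk_free
    return risk_free + beta * market_premium

# Hypothetical annual inputs: 2% risk-free rate, beta of 1.2, 8% expected market return
expected = capm_expected_return(risk_free=0.02, beta=1.2, expected_market=0.08)
print(expected)
```

With these assumed inputs the market premium is 6%, so a beta of 1.2 adds 7.2% to the risk-free rate.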
Empirical Setup of the CAPM
In the empirical CAPM, we subtract the risk-free rate from both the asset return and the market return and estimate beta from the resulting regression in excess returns:
$r_i - r_{F,i} = \alpha + \beta \left( r_{M,i} - r_{F,i} \right) + e_i$
$r_i$: Return of an asset or an investment portfolio
$r_{F,i}$: Risk-free rate
$r_{M,i}$: Return of the market index (benchmark)
$\alpha$: Intercept of the model
$\beta$: Beta, which measures the sensitivity of the return to the variation in the market return
$e_i$: The error term
The CAPM
The CAPM provides an estimate of the asset's expected return. If the model is correct, the intercept should be zero.
The model decomposes the return of the security into (1) the systematic component and (2) the specific component, which is not related to movements in the market index.
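CAPM Beta
The estimate of the beta is:
$\hat{\beta} = \dfrac{\sum \left( r_i - r_{F,i} - (\bar{r} - \bar{r}_F) \right) \left( r_{M,i} - r_{F,i} - (\bar{r}_M - \bar{r}_F) \right)}{\sum \left( r_{M,i} - r_{F,i} - (\bar{r}_M - \bar{r}_F) \right)^2}$
The systematic component of the return at point $i$ is $\beta \left( r_{M,i} - r_{F,i} \right)$.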
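The beta formula is the ordinary least squares slope applied to excess returns. The sketch below computes it, together with the regression intercept, from hypothetical return series in which the asset earns the market return plus a constant 1%.

```python
# Hypothetical per-period returns (decimal form)
r_asset = [0.05, 0.02, -0.01, 0.04]
r_market = [0.04, 0.01, -0.02, 0.03]
r_free = [0.01, 0.01, 0.01, 0.01]

# Excess returns over the risk-free rate
ex_asset = [ra - rf for ra, rf in zip(r_asset, r_free)]
ex_market = [rm - rf for rm, rf in zip(r_market, r_free)]

n = len(ex_asset)
mean_asset = sum(ex_asset) / n
mean_market = sum(ex_market) / n

# Least squares slope of asset excess returns on market excess returns
beta = sum(
    (a - mean_asset) * (m - mean_market) for a, m in zip(ex_asset, ex_market)
) / sum((m - mean_market) ** 2 for m in ex_market)

# Intercept of the excess-return regression
alpha = mean_asset - beta * mean_market

# Systematic component of the excess return in each period
systematic = [beta * m for m in ex_market]
print(beta, alpha)
```

Because the illustrative asset moves one-for-one with the market and adds a constant 1%, the estimated beta is 1 and the intercept is 0.01.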
Jensen's Alpha (Abnormal Return)
Alpha is the part of the return that is not explained by the model.
At point $i$ the alpha is $\alpha_i = \alpha + e_i$.
The average alpha equals $\alpha$ (the model intercept), since the mean of the error term is zero.
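Correlation Is Closely Connected to Beta
Correlation:
$\rho = \dfrac{\text{Covariance}}{\sigma \, \sigma_M} = \dfrac{\text{Systematic risk}}{\text{Total risk}}$
or equivalently:
$\rho = \dfrac{\beta \, \sigma_M}{\sigma}$
where $\sigma$ is the standard deviation of the dependent variable and $\sigma_M$ is the standard deviation of the market return.
Therefore, beta and correlation are linked by the formula:
$\beta = \rho \, \dfrac{\sigma}{\sigma_M}$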
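The link between beta and correlation can be verified numerically. The sketch below uses hypothetical return series and population moments (dividing by the number of observations); the same divisor must be used for the covariance and both standard deviations.

```python
import math

# Hypothetical returns: market (x) and asset (y)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Population covariance and standard deviations
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
sigma_m = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
sigma = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)

rho = cov_xy / (sigma * sigma_m)

# Beta computed directly and recovered from the correlation
beta_direct = cov_xy / sigma_m ** 2
beta_from_rho = rho * sigma / sigma_m
print(rho, beta_direct, beta_from_rho)
```

Both routes give the same beta, which confirms that correlation is beta rescaled by the ratio of the two standard deviations.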
Systematic and Specific Risk
The estimates of the regression can be used to measure how large a proportion of the total risk comes from the systematic component.
Systematic risk: $\sigma_S = \beta \, \sigma_M$
Specific risk: $\mathrm{std}(e)$, that is, the standard deviation of the model's error term
Decomposition of Total Risk
The residual or specific risk is not attributed to general market movements but is unique to the particular security.
The total risk can be decomposed into systematic and specific components as follows:
Total risk² = systematic risk² + specific risk²
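The decomposition can be checked on fitted regression output. The sketch below uses hypothetical returns and the population standard deviation from the standard library for all three risk measures, so that the identity holds exactly up to rounding.

```python
import statistics

# Hypothetical returns: market (x) and asset (y)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of beta and the intercept
beta = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
alpha = y_bar - beta * x_bar
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

total_risk = statistics.pstdev(y)                 # standard deviation of the asset return
systematic_risk = beta * statistics.pstdev(x)     # beta times market standard deviation
specific_risk = statistics.pstdev(residuals)      # standard deviation of the residuals

share_systematic = systematic_risk ** 2 / total_risk ** 2
print(total_risk, systematic_risk, specific_risk, share_systematic)
```

The share of variance attributed to the systematic component equals the R² of the regression, which ties the risk decomposition back to the goodness-of-fit measure.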
Other Applications in Finance
Regression analysis is widely used in the performance evaluation of investment portfolios.
In the CAPM framework, one may examine the alpha of a stock portfolio with respect to the market benchmark.
The idea is to decompose the portfolio return into alpha (stock-picking skill) and the systematic component, which is the part of the return explained by the movements in a market benchmark.
Alpha is important: the objective in investing is to generate alpha, that is, to beat the market.
Extensions to the Simple Linear Regression Model
Models with more than one regressor are called multivariate models.
In finance, they are called multifactor models: more than one factor explains the returns of assets (e.g. equities and bonds).
Model diagnostics are important: normality or serial correlation of the error term, etc.
Alternative estimation methodologies
In Excel
The Excel function SLOPE generates the least squares estimate of the beta.
For instance, if stock excess returns are in column A (A1:A100) and market index excess returns are in column B (B1:B100), the SLOPE function gives the estimate of the beta:
=SLOPE(A1:A100,B1:B100)
In SAS
In SAS, the proc reg procedure estimates the parameters of a regression model:
proc reg data=aaa outest=bbb;
  model y = x;
run;
The input data are in the dataset aaa, and the estimates are saved to the output dataset bbb.