Regression Analysis. Pekka Tolonen



Outline of Topics
- Simple linear regression: the form and estimation
- Hypothesis testing and statistical significance
- Empirical application: the capital asset pricing model
- Procedures provided by Excel and SAS

Why Regression Analysis? In statistics, regression analysis comprises the techniques for modelling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.

Uses of Regressions Regression analysis is widely used for prediction and forecasting. It is also used to understand which of the independent variables are related to the dependent variable, and to explore the forms of these relationships. In (very) restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables; note that a causal relationship is a very strong claim.

Techniques A large body of techniques for carrying out regression analysis has been developed. Familiar methods such as linear regression and ordinary least squares (OLS) regression are parametric; that is, the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite dimensional.

Simple Linear Regression The simple linear regression model is: y_i = β1 + β2 x_i + e_i, where y_i is the dependent variable, x_i is the regressor (independent variable), and e_i is a random error term. Regression parameters: β1 is the intercept; β2 is the slope coefficient. In principle, the error term should account for all the movements in y that cannot be explained by x.

Simple linear regression The essence of regression analysis is that any observation on the dependent variable y can be decomposed into two parts: (1) a systematic component and (2) a random component. The dependent variable y is explained by a component that varies systematically with the independent variable and by the error term e.

Assumptions of the Linear Regression Model
1. The e_i are statistically independent of each other.
2. The e_i have a constant variance, σ², for all values of x_i.
3. The e_i are normally distributed with mean 0.
4. The means of the dependent variable Y fall on a straight line for all values of the independent variable X.
5. The variable X must take at least two different values.

Model Estimation The Least Squares Principle We estimate the parameters β1 and β2 using the method based on the least squares principle. This principle asserts that to fit a line to the data values we should fit the line so that the sum of the squares of the vertical distances from each point to the line is as small as possible. The distances are squared to prevent large positive distances from being canceled by large negative distances.

Model Estimation The Least Squares Principle The fitted line is then: ŷ_i = b1 + b2 x_i. The vertical distances from each point to the fitted line are the least squares residuals: ê_i = y_i − ŷ_i = y_i − b1 − b2 x_i, i = 1, 2, …, n.

The Least Squares Estimators The sum of squares function is: S(β1, β2) = Σ_{i=1}^{n} (y_i − β1 − β2 x_i)². The values of the unknown parameters β1 and β2 that minimize the sum of squares function are b2 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and b1 = ȳ − b2 x̄, where x̄ and ȳ are the sample means of the observations on x and y.
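The estimators above can be sketched in a few lines of Python (the function name and the toy data are illustrative, not from the slides):

```python
# Minimal sketch of the least squares estimators b1 and b2.

def ols_simple(x, y):
    """Return (b1, b2) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b2 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b1 = y_bar - b2 * x_bar
    return b1, b2

# On data that lies exactly on the line y = 2 + 3x, the estimates recover it:
b1, b2 = ols_simple([1, 2, 3, 4, 5], [5, 8, 11, 14, 17])
```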

Empirical example The regression equation is a linear equation of the form: ŷ_i = b1 + b2 x_i. Consider an example where the returns of an investment portfolio are regressed against the returns of the market index: R_i = β1 + β2 R_m,i + e_i. The least squares estimates are b1 = 0.37 and b2 = 1.002. Since b2 is close to 1, the portfolio return changes by approximately 1% when the market return changes by 1%.

Empirical Example [Figure: scatter plot of portfolio return against the market index, with the fitted regression line ŷ_i = 0.37 + 1.002 x_i drawn through the data points.]

Goodness of Fit: R² The quality of a regression model is often measured in terms of its ability to explain the movements of the dependent variable. Measures of goodness of fit typically summarize the deviation between the observed values and the values predicted by the model.

R² The variability of the data set is measured through different sums of squares: SST = Σ(y_i − ȳ)² is the total sum of squares; SSR = Σ(ŷ_i − ȳ)² is the explained sum of squares (ŷ_i are the fitted values); SSE = Σ ê_i² is the error sum of squares. These satisfy SST = SSR + SSE, and R² = SSR/SST = 1 − SSE/SST.
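The decomposition can be checked numerically; a Python sketch with made-up, roughly linear data (all names are illustrative):

```python
# Sums-of-squares decomposition and R² for a simple least squares fit.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly linear data with noise

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * xi for xi in x]            # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error
r2 = ssr / sst

# SST = SSR + SSE holds (up to rounding), and R² = 1 - SSE/SST.
```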

Variance of the Error Term The estimated variance of the error term is σ̂² = Σ ê_i² / (T − 2): sum the squared residuals and divide by T − 2, where T is the number of observations.

Hypothesis Testing and Statistical Significance The test statistic is t = (b2 − β2) / se(b2) ~ t(T−2), where se(b2) = √(σ̂² / Σ(x_i − x̄)²) and σ̂² = Σ ê_i² / (T − 2) is the estimated variance of the error term. The random variable t has a t-distribution with T − 2 degrees of freedom, where T is the number of observations.
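Putting the pieces together, a Python sketch of σ̂², se(b2) and the t-statistic for H0: β2 = 0 (the data are illustrative):

```python
import math

# Standard error of b2 and the t-statistic for H0: beta2 = 0.

x = [1, 2, 3, 4, 5, 6]
y = [1.2, 2.1, 2.8, 4.2, 4.9, 6.1]

T = len(x)
x_bar, y_bar = sum(x) / T, sum(y) / T
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b1 = y_bar - b2 * x_bar
resid = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

sigma2_hat = sum(e ** 2 for e in resid) / (T - 2)   # estimated error variance
se_b2 = math.sqrt(sigma2_hat / sum((xi - x_bar) ** 2 for xi in x))
t_stat = (b2 - 0) / se_b2                           # test H0: beta2 = 0
```

With an intercept in the model the residuals sum to zero, which is a quick sanity check on the fit.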

Statistical Significance In the t-test statistic, the denominator se(b2) is the standard error of b2. The hypothesis test is usually carried out by determining a critical value t_c, which corresponds to the chosen confidence level (typically 95% or 99%); we reject the null hypothesis if |t| ≥ t_c.

Statistical Significance: Example Assume the following estimated beta of an investment fund: b2 = 0.08. Number of observations: T = 72, so the degrees of freedom are T − 2 = 70. Standard error: se(b2) = 3.36. Confidence level: 95%. The critical t-value for 95% confidence with 70 degrees of freedom is t_c = 1.994. Test H0: β2 = 0 against Ha: β2 ≠ 0. The test statistic is t = (b2 − β2) / se(b2) = (0.08 − 0) / 3.36 ≈ 0.024.

Critical Values for the t-distribution

Degrees of   Significance level α
Freedom      0.1      0.05     0.02     0.01
1            6.314    12.706   31.821   63.657
2            2.920    4.303    6.965    9.925
3            2.353    3.182    4.541    5.841
4            2.132    2.776    3.747    4.604
5            2.015    2.571    3.365    4.032
6            1.943    2.447    3.143    3.707
7            1.895    2.365    2.998    3.499
8            1.860    2.306    2.896    3.355
9            1.833    2.262    2.821    3.250
10           1.812    2.228    2.764    3.169
...          ...      ...      ...      ...
70           1.667    1.994    2.381    2.648
80           1.664    1.990    2.374    2.639
90           1.662    1.987    2.368    2.632

The critical t-value for 95% confidence (5% significance level) with 70 degrees of freedom is t_c = 1.994. The test statistic is t = (b2 − β2) / se(b2) = (0.08 − 0) / 3.36 ≈ 0.024. Since |t| < t_c, we cannot reject the null hypothesis: at the 95% confidence level the data do not show that the fund's beta differs from zero. Excel function: =TINV(probability, degrees of freedom), e.g. =TINV(0.05,70).

Application: Capital Asset Pricing Model In finance, the CAPM is used to determine a theoretically appropriate required rate of return for an asset. According to the CAPM, the expected rate of return of any security is proportional, through its beta, to the market risk premium: E(R) = R_f + β_i (E(R_m) − R_f), where E(R) is the expected return of the security, R_f is the risk-free rate (e.g. the interest on a government bond), and E(R_m) − R_f is the market risk premium.
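The CAPM formula is a one-line computation; a small sketch with made-up numbers (all values are illustrative):

```python
# Expected return under the CAPM: E(R) = R_f + beta * (E(R_m) - R_f).

r_f = 0.03      # risk-free rate (illustrative)
beta_i = 1.2    # security beta (illustrative)
e_rm = 0.08     # expected market return (illustrative)

e_r = r_f + beta_i * (e_rm - r_f)   # 0.03 + 1.2 * 0.05 = 0.09
```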

Empirical Setup of the CAPM In the CAPM framework we can factor in the risk-free rate and use the revised regression equation to estimate the beta: r_i − r_F,i = α + β (r_M,i − r_F,i) + e_i, where r_i is the return of an asset or an investment portfolio, r_F,i is the risk-free rate, r_M,i is the return of the market index (benchmark), α is the intercept of the model, β measures the sensitivity of the return to variation in the market return, and e_i is the error term.

The CAPM The CAPM provides an estimate of the asset's expected return. If the model is correct, the intercept should be zero. The model decomposes the return of the security into (1) the systematic component and (2) the specific component, which is not related to movements in the market index.

CAPM Beta The estimate of the beta is β̂ = Σ[(r_i − r_F,i) − (r̄ − r̄_F)][(r_M,i − r_F,i) − (r̄_M − r̄_F)] / Σ[(r_M,i − r_F,i) − (r̄_M − r̄_F)]². The systematic component of the return at point i is β̂ (r_M,i − r_F,i).
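This beta estimate is just OLS on excess returns; a Python sketch with made-up return series (all numbers illustrative, not from the slides):

```python
# CAPM beta and Jensen's alpha from excess returns via least squares.

r_p = [1.0, -0.5, 2.0, 0.8, -1.2, 1.5]   # portfolio returns, % (illustrative)
r_m = [0.8, -0.4, 1.6, 0.7, -1.0, 1.2]   # market returns, % (illustrative)
r_f = 0.1                                # flat risk-free rate, % (illustrative)

y = [r - r_f for r in r_p]               # portfolio excess returns
x = [r - r_f for r in r_m]               # market excess returns

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
       / sum((xi - x_bar) ** 2 for xi in x)
alpha = y_bar - beta * x_bar             # Jensen's alpha = model intercept
```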

Jensen's Alpha (Abnormal Return) Alpha is the part of the return which is not explained by the model. At point i the alpha is α_i = α + e_i. The average alpha equals α (the model intercept), since the mean of the error term is zero.

Correlation Is Closely Connected to Beta Correlation: ρ = Cov(r, r_M) / (σ σ_M) = Systematic risk / Total risk, or equivalently: ρ = β σ_M / σ, where σ is the standard deviation of the dependent variable. Therefore, beta and correlation are linked by the formula β = ρ σ / σ_M.
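The equivalence of the two expressions can be verified numerically; a Python sketch using population (divide-by-n) moments on illustrative series:

```python
import math

# Check the link rho = beta * sigma_M / sigma numerically.

x = [0.7, -0.5, 1.5, 0.6, -1.1, 1.1]   # market excess returns (illustrative)
y = [0.9, -0.6, 1.9, 0.7, -1.3, 1.4]   # portfolio excess returns (illustrative)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / n
var_y = sum((yi - y_bar) ** 2 for yi in y) / n
sigma_m, sigma = math.sqrt(var_x), math.sqrt(var_y)

beta = cov / var_x                 # regression slope
rho = cov / (sigma_m * sigma)      # correlation

# The two routes agree: rho == beta * sigma_m / sigma.
```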

Systematic and Specific Risk The regression estimates can be used to measure how large a proportion of the total risk comes from the systematic component: σ_S = β σ_M. Specific risk: std(e), that is, the standard deviation of the model's error term.

Decomposition of Total Risk The residual or specific risk is not attributed to general market movements but is unique to the particular security. The total risk can be decomposed into a systematic and a specific component as follows: Total risk² = systematic risk² + specific risk².
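With OLS estimates and population (divide-by-n) variances this decomposition holds exactly, because the residuals are uncorrelated with the regressor. A Python sketch on illustrative data:

```python
# Decompose total variance into systematic (beta^2 * sigma_M^2)
# and specific (residual) variance.

x = [0.7, -0.5, 1.5, 0.6, -1.1, 1.1]   # market excess returns (illustrative)
y = [0.9, -0.6, 1.9, 0.7, -1.3, 1.4]   # portfolio excess returns (illustrative)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
       / sum((xi - x_bar) ** 2 for xi in x)
alpha = y_bar - beta * x_bar
resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

var_y = sum((yi - y_bar) ** 2 for yi in y) / n   # total variance
var_m = sum((xi - x_bar) ** 2 for xi in x) / n   # market variance
var_e = sum(e ** 2 for e in resid) / n           # residuals have mean 0

# Total risk^2 = systematic risk^2 + specific risk^2
```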

Other Applications in Finance Regression analysis is widely used in performance evaluation of investment portfolios. In the CAPM framework one may examine the alpha of a stock portfolio with respect to the market benchmark. The idea is to decompose the portfolio return into alpha (stock-picking skill) and a systematic component, the part of the return explained by movements in a market benchmark. Alpha is important: the objective in investing is to generate alpha, that is, to beat the market.

Extensions to the Simple Linear Regression Model Models with more than one regressor are called multiple regression (multivariate) models. In finance they are called multifactor models: more than one factor explains the returns of assets (e.g. equities and bonds). Model diagnostics are important: normality and serial correlation of the error term, etc. Alternative estimation methodologies also exist.

In Excel The Excel function SLOPE generates the least squares estimate of the beta. For instance, if stock excess returns are in column A (A1:A100) and market index excess returns are in column B (B1:B100), the formula =SLOPE(A1:A100,B1:B100) gives the estimate of the beta (the known y-values come first, the known x-values second).

In SAS The PROC REG procedure estimates the parameters of a regression model:

proc reg data=aaa outest=bbb;
  model y = x;
run;

The input data are in the dataset aaa and the parameter estimates are saved to the output dataset bbb.