TESTING IN MULTIPLE REGRESSION ANALYSIS UDC 519.233.5. Vera Djordjević




FACTA UNIVERSITATIS
Series: Economics and Organization Vol. 1, No 10, 2002, pp. 5-9

TESTING IN MULTIPLE REGRESSION ANALYSIS

UDC 519.233.5

Vera Djordjević
Faculty of Economics, University of Niš, 18000 Niš, Serbia and Montenegro

Abstract. Multiple regression analysis holds an important place in statistical science and practice. F. Galton, the English statistician and geneticist, wrote that the aim of statistical science is "to find out methods in collecting large groups of data in short, condensed terms convenient for discussion". Although expressed more than a century ago, this thought applies to the subject and method of multiple linear regression. The significance of the method lies in its use for predicting and estimating the value of a phenomenon identified as the dependent variable. Before the estimated model is used, the significance of the results has to be tested, for which the F-test statistic and the Student's t-test statistic are used.

Key words: testing, multiple linear regression, regression model, residual, regression coefficient, standard error of regression, coefficient of multiple determination, null hypothesis, alternative hypothesis, analysis of variance, F-test, t-test.

INTRODUCTION

In simple regression analysis we examine the dependence between the variations of two phenomena. Because many factors influence a given phenomenon, especially a social one, it is necessary to quantify the interaction among several phenomena. Multiple regression methods serve this purpose. Multiple regression has taken a very significant place in statistical science. Without going into the reasons for this interest, from the analyst's point of view the regression method is appropriate in one of two cases: a) to control the dependent variable by means of the independent variables, and b) to predict the dependent variable. In both cases research is required.

Received February 15, 2003

MATHEMATICAL MODEL

When investigating the relationship between phenomena and predicting the value of the dependent variable, we first identify the variables and then draw a random sample of size $n$ for the chosen values of the independent variables. Suppose that $k$ phenomena are identified as independent variables (predictors) $X_i$, $i = 1, 2, \ldots, k$, and $Y$ as the dependent random variable. The complete multiple linear model can be written as a single equation for an arbitrary value $Y_i$ of the dependent variable:

\[ Y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon_i \tag{1} \]

where:
$Y_i$ is the dependent random variable,
$x_1, x_2, \ldots, x_k$ are the values of the independent variables,
$\beta_0, \beta_1, \ldots, \beta_k$ are the model parameters (regression coefficients),
$\varepsilon_i$ is the random error, which has a normal distribution with zero mean and constant variance.

The multiple linear regression model (1) consists of two parts: a deterministic part ($Y_i'$),

\[ Y_i' = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \tag{2} \]

and a stochastic part ($\varepsilon_i$), so that from (1) we get:

\[ \varepsilon_i = Y_i - Y_i' \tag{3} \]

The deterministic part of the linear regression model is the mean value of the dependent variable ($Y_i$) for the given values of the independent variables:

\[ Y_i' = E(Y_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \tag{4} \]

and the individual values of $Y_i$ deviate from the mean values $E(Y_i)$ by the random error.

The complete regression model (1) is estimated by the sample regression model:

\[ \hat{y}_i = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \tag{5} \]

where:
$\hat{y}_i$ is the fitted (predicted) value of the dependent variable $Y_i$,
$x_1, x_2, \ldots, x_k$ are the values of the independent variables,
$b_0, b_1, \ldots, b_k$ are the estimates of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$.

We should choose the multiple linear regression model that best represents the relationship between the observed phenomena. This is achieved by minimizing the sum of squared deviations of the empirical points from the regression model (for example, from the regression plane when $k = 2$):

\[ \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 \to \min \tag{6} \]

where $e_i$ is the random error in the sample (the residual).
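The minimization in (6) can be illustrated with a short sketch. It assumes the standard normal-equations route to the least-squares estimates, which the paper does not spell out; the function names and the four data points are hypothetical and chosen only for illustration.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        # Move the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(xs, ys):
    """Return [b0, b1, ..., bk] minimizing sum((y_i - y_hat_i)**2).

    xs is a list of rows [x1, ..., xk]; ys is the list of observed y values.
    """
    design = [[1.0] + [float(v) for v in row] for row in xs]  # intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical sample with k = 2 predictors and n = 4 observations.
xs = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
ys = [1.0, 3.0, 2.0, 6.0]
b = fit_least_squares(xs, ys)
y_hat = [b[0] + b[1] * x1 + b[2] * x2 for x1, x2 in xs]
print("coefficients:", b)
print("fitted values:", y_hat)
```

For larger problems a statistical package would normally be used instead of hand-written elimination; the sketch only makes the criterion in (6) concrete.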
As a statistical model, the multiple linear regression model comprises not only the mathematical expression but also the assumptions that ensure optimal estimation of the parameters $\beta_0, \beta_1, \ldots, \beta_k$. These assumptions usually concern the random error:

the random error has a normal distribution,
its mean is equal to zero,
the random errors have equal variances.

When $k = 2$, the sample multiple linear regression model is the equation of a regression plane (the simplest case of multiple linear regression):

\[ \hat{y}_i = b_0 + b_1 x_1 + b_2 x_2 \tag{7} \]

To assess how well the estimated regression model fits the empirical data, we use the standard error of the sample regression, which is an estimate of the standard deviation of the random error $\sigma_\varepsilon$. It is denoted by $s_\varepsilon$ and is the square root of the residual mean square:

\[ s_\varepsilon = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - k - 1}} = \sqrt{\frac{SSE}{n - k - 1}} \tag{8} \]

where $SSE$ is the sum of squared deviations of the empirical points from the regression model (Error Sum of Squares).

As an absolute measure of unexplained variability, the standard error of regression is not convenient for comparison. For that reason we use a relative indicator, the coefficient of multiple determination $R^2$. It measures the explained variability and is calculated as:

\[ R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{SSR}{SS_y} \tag{9} \]

where $SSR$ is the Regression Sum of Squares (explained variability) and $SS_y$ is the total Sum of Squares (total variability).

The coefficient of multiple determination shows the proportion of the variation of the dependent variable $Y$ that is explained by the joint influence of the independent variables included in the model. Its calculation should take into account the number of independent variables and the sample size. This is done by calculating the adjusted coefficient of multiple determination:

\[ R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \tag{10} \]

where $n$ is the sample size and $k$ is the number of independent variables.

MODEL USAGE TESTING

Before using the estimated regression equation we first have to test the significance of the obtained estimates. The null and alternative hypotheses are:

\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \]
\[ H_A: \text{at least one } \beta_i \neq 0 \]
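As a numerical illustration of the fit measures (8) to (10), on which the test builds, the following sketch computes them from observed and fitted values. The function name, the dictionary keys and the data are hypothetical; the divisor $n - k - 1$ in (8) is taken to be the error degrees of freedom, and the fitted values are assumed to come from a least-squares fit with an intercept.

```python
import math


def fit_measures(y, y_hat, k):
    """Return the sums of squares and fit measures for a fitted model.

    y      observed values of the dependent variable
    y_hat  fitted values from a least-squares fit with an intercept
    k      number of independent variables
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variability
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variability
    ss_y = sum((yi - y_bar) ** 2 for yi in y)              # total variability
    r2 = ssr / ss_y                                        # equation (9)
    return {
        "sse": sse,
        "ssr": ssr,
        "ss_y": ss_y,
        "s": math.sqrt(sse / (n - k - 1)),                 # equation (8)
        "r2": r2,
        "r2_adj": 1 - (1 - r2) * (n - 1) / (n - k - 1),    # equation (10)
    }


# Hypothetical sample with k = 2: x1 = -1, 1, -1, 1 and x2 = -1, -1, 1, 1.
# For these data the least-squares fit is y_hat = 3 + 1.5 * x1 + 1.0 * x2.
y = [1.0, 3.0, 2.0, 6.0]
y_hat = [0.5, 3.5, 2.5, 5.5]
measures = fit_measures(y, y_hat, k=2)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With only one error degree of freedom in this toy sample, the adjusted coefficient falls well below $R^2$, which is the correction for sample size and number of predictors that (10) is meant to provide.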

The null hypothesis thus states that no linear relationship exists between the variations of the observed phenomena, i.e. that $x_1, x_2, \ldots, x_k$ have no influence on $Y$. If we start from the assumption that the total variability of the dependent variable is determined by the variability of the independent variables included in the model and by the unexplained variability, we can write:

\[ SS_y = SSR + SSE \tag{11} \]

where $SS_y$ is the total Sum of Squares (total variability), $SSR$ is the Regression Sum of Squares (explained variability), and $SSE$ is the Error Sum of Squares (unexplained variability).

We apply the F-test and examine the usability of the regression model by analysis of variance. The table of this analysis is:

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F-ratio |
|---|---|---|---|---|
| Regression | $k$ | $SSR$ | $MSR = SSR / k$ | $F = MSR / MSE$ |
| Error | $n - k - 1$ | $SSE$ | $MSE = SSE / (n - k - 1)$ | |
| Total | $n - 1$ | $SS_y$ | | |

The decision rules:
if $F \geq F_{\alpha;\,k;\,n-k-1}$, we reject the null hypothesis,
if $F < F_{\alpha;\,k;\,n-k-1}$, we accept the null hypothesis.

Accordingly, if the realized value of the F statistic is less than the theoretical value, we accept the null hypothesis and conclude that no significant linear influence of the independent variables on the dependent variable has been established.

REGRESSION COEFFICIENT TESTING

If we use the estimated regression model to estimate and predict the values of the dependent variable $Y$, we have to test the significance of the estimate of each parameter separately ($\beta_i$, $i = 1, 2, \ldots, k$). In the simplest case of multiple linear regression, $k = 2$, we test the significance of the estimates of the two parameters $\beta_1$ and $\beta_2$. The null and alternative hypotheses are:

I: $H_0: \beta_1 = 0$, $H_A: \beta_1 \neq 0$
II: $H_0: \beta_2 = 0$, $H_A: \beta_2 \neq 0$

Test statistics:

\[ t_1 = \frac{b_1}{s_{b_1}}, \qquad t_2 = \frac{b_2}{s_{b_2}} \]

Each has the Student's t-distribution with $n - k - 1$ degrees of freedom.
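The F-ratio from the analysis-of-variance table and the t-ratios of the individual coefficients reduce to a few lines of arithmetic. The sketch below is only an illustration: the sums of squares, coefficient estimates and standard errors are hypothetical numbers, and the critical values are read from standard F and t tables for $\alpha = 0.05$ rather than computed.

```python
def f_ratio(ssr, sse, n, k):
    """F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


def t_ratio(b, s_b):
    """t = b / s_b for a single regression coefficient."""
    return b / s_b


# Hypothetical results of a fit with n = 23 observations and k = 2 predictors.
n, k = 23, 2
ssr, sse = 120.0, 80.0

# Tabulated critical values for alpha = 0.05 (assumed, taken from standard tables).
f_crit = 3.49    # F(0.05; 2; 20)
t_crit = 2.086   # t(0.025; 20), two-sided test

F = f_ratio(ssr, sse, n, k)
reject_overall = F >= f_crit
print(f"F = {F:.2f}, reject H0 for the whole model: {reject_overall}")

# Hypothetical coefficient estimates and their standard errors.
coefficients = {"b1": (1.5, 0.5), "b2": (0.4, 0.5)}
decisions = {}
for name, (b, s_b) in coefficients.items():
    t = t_ratio(b, s_b)
    decisions[name] = abs(t) >= t_crit  # True means H0: beta_i = 0 is rejected
    print(f"{name}: t = {t:.2f}, reject H0: {decisions[name]}")
```

In this made-up case the model as a whole is significant while only the first coefficient is, which shows why the overall F-test and the individual t-tests are carried out separately.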

If $|t_i| < t_{\alpha/2}$, $i = 1, 2$, the value of the test statistic falls in the acceptance region of the null hypothesis. In that case we accept the null hypothesis that the independent variable ($X_1$ or $X_2$, respectively) does not influence the dependent variable $Y$.

In general, in the multiple regression model we test:

\[ H_0: \beta_i = 0, \qquad H_A: \beta_i \neq 0 \qquad (i = 1, 2, \ldots, k); \]

the test statistic

\[ t_i = \frac{b_i}{s_{b_i}} \]

has the Student's t-distribution with $n - k - 1$ degrees of freedom. We accept the null hypothesis if $|t_i| < t_{\alpha/2}$.

Note that a greater realized value of the t statistic does not mean that the corresponding variable has a greater relative influence on the dependent variable.

Determining the mathematical model of multiple regression and testing it requires many calculations, so a dedicated computer program is needed. The use of a current statistical software package is important not only for the analyst but also for statistical practice.

TESTING THE MULTIPLE LINEAR REGRESSION MODEL

Vera Djordjević

The usability of the multiple linear regression model for predicting and estimating the values of the dependent variable is checked by testing. The F-test and the t-test can be used for this purpose. Both tests check whether the form of dependence established from the available data (the sample data) also holds for the population. Only in that case can the regression model serve the analyst for further statistical analysis and use.