# The Simple Linear Regression Model: Specification and Estimation

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Chapter 3 The Simple Linear Regression Model: Specification and Estimation 3.1 An Economic Model Suppose that we are interested in studying the relationship between household income and expenditure on food. We ask the question, How much did your household spend on food last week? Suppose that the continuous random variable y has a probability density function, f(y), that describes the probabilities of obtaining various food expenditure values. The probability density function in this case describes how expenditures are distributed over the population, and it might look like one of those in Figure 3.1. Slide 3.1

2 The probability distribution in Figure 3.1a is actually a conditional probability density function, since it is conditional upon household income. If x = weekly household income, then the conditional probability density function is f(y x = \$480). The conditional mean, or expected value, of y is E(y x = \$480) = µ y x and is our population s average weekly food expenditure. The conditional variance of y is var(y x = \$480) = σ 2, which measures the dispersion of household expenditures y about their mean µ y x. The parameters µ y x and σ 2, if they were known, would give us some valuable information about the population we are considering. If we knew these parameters, and if we knew that the conditional distribution f(y x = \$480) was normal, N(µ y x, σ 2 ), then we could calculate probabilities that y falls in specific intervals using properties of the normal distribution. In order to investigate the relationship between expenditure and income, we must build an economic model and then an econometric model that forms the basis for a Slide 3.2

3 quantitative economic analysis. If we consider households with different levels of income, we expect the average expenditure on food to change. In Figure 3.1b we show the probability density functions of food expenditure for two different levels of weekly income, \$480 and \$800. Each density function f(y x) shows that expenditures will be distributed about the mean value µ y x, but the mean expenditure by households with higher income is larger than the mean expenditure by lower income household. Our economic model of household food expenditure, depicted in Figure 3.2, is described as the simple regression function E(y x) = µ y x = β 1 + β 2 x (3.1.1) The conditional mean E(y x) in Equation (3.1.1) is called a simple regression function. The unknown regression parameters β 1 and β 2 are the intercept and slope of the regression function, respectively. Slide 3.3

4 In our food expenditure example, the intercept, β 1, represents the average weekly household expenditure on food by a household with no income, x = 0. The slope β 2 represents the change in E(y x) given a \$1 change in weekly income; it could be called the marginal propensity to spend on food. Algebraically, E( y x) de( y x) β 2 = = x dx (3.1.2) where denotes change in and de(y x)/dx denotes the derivative of E(y x) with respect to x. Slide 3.4

5 3.2 An Econometric Model The model E(y x) = µ y x = β 1 + β 2 x that we specified in Section 3.1 describes economic behavior, but it is an abstraction from reality. If we were to sample household expenditures at other levels of income, we would expect the sample values to be symmetrically scattered around their mean value E(y x) = µ y x = β 1 + β 2 x. In Figure 3.3 we arrange bell-shaped figures like Figure 3.1, depicting f(y x), along the regression line for each level of income. This figure shows that at each level of income the average value of household expenditure is given by the regression function, E(y x) = µ y x = β 1 + β 2 x. It also shows that we assume values of household expenditure on food will be distributed around the mean value E(y x) = µ y x = β 1 + β 2 x at each level of income. This regression function is the foundation of an econometric model for household food expenditure. Slide 3.5

6 In order to make the econometric model complete, we have to make a few more assumptions. The standard assumption about the dispersion of values y about their mean is the same for all levels of income, x. That is, var(y x) = σ 2 for all values of x. The constant variance assumption, var(y x) = σ 2 implies that at each level of income x we are equally uncertain about how far values of food expenditure, y, may fall from their mean value, E(y x) = µ y x = β 1 + β 2 x, and the uncertainty does not depend on income or anything else. Data satisfying this condition are said to be homoskedastic. If this assumption is violated, so that var(y x) σ 2 for all values of income, x, the data are said to be heteroskedastic. Next, we have described the sample as random. By this we mean that when data are collected, they are statistically independent. If y i and y j denote the expenditures of two randomly selected households, then knowing the value of one of these (random) variables tells us nothing about what value the other might take. If y i and y j are the expenditures of two randomly selected households, then we will assume that their covariance is zero, or Slide 3.6

7 cov(y i,y j ) = 0. This is a weaker assumption than statistical independence (since independence implies zero covariance, but not vice versa); it implies only that there is no systematic linear association between y i and y j. Refer to Chapter 2.5 for a discussion of this difference. In order to carry out a regression analysis we must make an assumption about the values of the variable x. The idea of regression analysis is to measure the effect of changes in one variable, x, on another, y. In order to do that, x must take several different values, at least two in this case, within the sample of data. For now we simply state the additional assumption that x must take at least two different values. Furthermore we assume that x is not random, which means that we can control the values of x at which we observe y. Finally, it is sometimes assumed that the distribution of y, f(y x), is known to be normal. It is reasonable, sometimes, to assume that an economic variable is normally distributed about its mean. Slide 3.7

8 Assumptions of the Simple Linear Regression Model I The average value of y, for each value of x, is given by the linear regression E(y) = β 1 + β 2 x For each value of x, the values of y are distributed about their mean value, following probability distributions that all have the same variance, i.e., var(y) = σ 2 The values of y are all uncorrelated, and have zero covariance, implying that there is no linear association among them. Slide 3.8

9 cov(y i,y j ) = 0 for all i and j This assumption can be made stronger by assuming that the values of y are all statistically independent. The variable x is not random and must take at least two different values. (optional) The values of y are normally distributed about their mean for each value of x, y ~ N[(β 1 + β 2 x), σ 2 ] Slide 3.9

10 3.2.1 Introducing the Error Term The essence of regression analysis is that any observation on the dependent variable y can be decomposed into two parts: a systematic component and a random component. The systematic component of y is its mean E(y), which itself is not random, since it is a mathematical expectation. The random component of y is the difference between y and its mean value E(y). This is called a random error term, and it is defined as e = y E(y) = y β 1 β 2 x (3.2.1) If we rearrange Equation (3.2.1) we obtain the simple linear regression model y = β 1 + β 2 x + e (3.2.2) Slide 3.10

11 The dependent variable y is explained by a component that varies systematically with the independent variable x and by the random error term e. Equation (3.2.1) shows that y and error term e differ only by constant E(y), and since y is random, so is the error term e. Given what we have already assumed about y, the properties of the random error, e, can be derived directly from Equation (3.2.1). Since y and e differ only by a constant, β 1 + β 2 x, the probability density functions for y and e are identical except for their location, as shown in Figure 3.4. Slide 3.11

12 Assumptions of the Simple Linear Regression Model II It is customary in econometrics to state the assumptions of the regression model in terms of the random error e. For future reference the assumptions are named SR1-SR6, SR denoting simple regression. SR1. The value of y, for each value of x, is y = β 1 + β 2 x + e SR2. The average value of the random error e is E(e) = 0 since we assume that y = β 1 + β 2 x SR3. The variance of the random error e is var(e) = σ 2 = var(y) since y and e differ only by a constant, which doesn t change the variance. SR4. The covariance between any pair of random errors, e i and e j is Slide 3.12

13 SR5. SR6. cov(e i,e j ) = cov(y i,y j ) = 0 If the values of y are statistically independent, then so are the random errors e, and vice versa. The variable x is not random and must take at least two different values. (optional) The values of e are normally distributed about their mean e ~ N(0, σ 2 ) if the values of y are normally distributed, and vice versa. The random error e and the dependent variable y are both random variables, and the properties of one can be determined from the properties of the other. There is one interesting difference between these random variables, however, and that is that y is observable and e is unobservable. If the regression parameters β 1 and β 2 were known, then for any value of y we could calculate e = y (β 1 + β 2 x). This is illustrated in Figure 3.5. Knowing the regression function E(y x) = β 1 + β 2 x, we could separate y into its fixed Slide 3.13

14 and random parts. Unfortunately, since β 1 and β 2 are never known, it is impossible to calculate e, and no bets on its true value can ever be collected. The random error e represents all factors affecting y rather than x. Moreover, the random error e also captures any approximation error that arises, because the linear functional form we have assumed may be only an approximation to reality. Slide 3.14

15 3.3 Estimating the Parameters for the Expenditure Relationship The economic and statistical models we developed in the previous section are the basis for using a sample of data to estimate the intercept and slope parameters, β 1 and β The Least Squares Principle The least squares principle asserts that to fit a line to the data values we should fit the line so that the sum of the squares of the vertical distances from each point to the line is as small as possible. The distances are squared to prevent large positive distances from being canceled by large negative distances. The intercept and slope of this line, the line that best fits the data using the least squares principle, are b 1 and b 2, the least squares estimates of β 1 and β 2. The fitted regression line is then Slide 3.15

16 y = b + b x (3.3.1) ˆt 1 2 t The vertical distances from each point to the fitted line are the least squares residual. They are given by e? t y t y t y t b1 b2x t = = (3.3.2) These residuals are depicted in Figure 3.7a. Now suppose we fit another line, any other fitted line, to the data, say y = b + b x (3.3.3) * * * ˆt 1 2 t Slide 3.16

17 where b * 1 and b * 2 are any other intercept and slope values. The residuals for this line, e? * * t y t y t =, are shown in Figure 3.7b. The least squares estimates b 1 and b 2 have the property that the sum of their squared residuals is less than the sum of squared residuals for any other line, e? = ( y y ) e? = ( y y ) 2 2 *2 * 2 t t t t t t no matter how the other line might be drawn through the data. The least squares principle says that the estimates b 1 and b 2 of β 1 and β 2 are the ones to use, since the line using them as intercept and slope fits are the data best. Given the sample observations on y and x, least squares estimates of the unknown parameters β 1 and β 2 are obtained by minimizing the sum of squares function Slide 3.17

18 T t 1 2 t (3.3.4) t= 1 S( β, β ) = ( y β β x ) Since the points (y t, x t ) have been observed, the sum of squares function S is a function of the unknown parameters β 1 and β 2. This function, which is a quadratic in terms of the unknown parameters β 1 and β 2, is a bowl-shaped surface like the one depicted in Figure 3.8. Given the data, our task is to find, out of all the possible values that β 1 and β 2 can take, the point (b 1, b 2 ) at which the sum of squares function S is minimum. The partial derivatives of S with respect to β 1 and β 2 are Slide 3.18

19 S β 1 S β 2 = 2Tβ 2 y + 2 xβ 1 t t xt 2 2 xtyt 2 xt 1 = β + β (3.3.5) Algebraically, to obtain the point (b 1, b 2 ) we set the derivatives of Equation (3.35) to zero, and replaceβ 1 and β 2 by b 1 and b 2, respectively, to obtain 2( y Tb xb ) = 0 t 1 t 2 2( xy xb xb) = 0 2 t t t 1 t 2 (3.3.6) Rearranging Equation (3.3.6) leads to two equations usually known as the normal equations, Slide 3.19

20 Tb + x b = y 1 t 2 t (3.3.7a) 2 xb t 1+ x b xy t 2 = t t (3.3.7b) These two equations comprise a set of two linear equations in two unknowns b 1 and b 2. To solve for b 2, multiply the first Equation (3.3.7a) by xt ; multiply the second Equation (3.3.7b) by T; subtract the second equation from the first and then isolate b 2 on the left-hand side. To solve for b 1, divide both sides of Equation (3.3.7a) by T. The formulas for the least squares estimates β 1 and β 2 are b ( t) T x y x y = t t t t T xt x (3.3.8a) b1 = y b2x (3.3.8b) Slide 3.20

21 where y= y / T and x / T t = x t are the sample means of the observations on y and x. If we plug the sample values y t and x t into Equation (3.3.8), then we obtain the least squares estimates of the intercept and slope parameters β 1 and β 2. When the formulas for b 1 and b 2 are taken to be rules that are used whatever the sample data turn out to be, then b 1 and b 2 are random variables. When actual sample values are substituted into the formulas, we obtain numbers that are the observed values of random variables. To distinguish these two cases we call the rules or general formulas for b 1 and b 2 the least squares estimators. We call the numbers obtained when the formulas are used with a particular sample least squares estimates. Slide 3.21

22 3.3.2 Estimates for the Food Expenditure Function We have used the least squares principle to derive Equation (3.3.8), which can be used to obtain the least squares estimates for the intercept and slope parameters β 1 and β 2. To illustrate the use of these formulas, we will use them to calculate the values of b 1 and b 2 for the household expenditure data given in Table 3.1. From Equation (3.3.8a) we have b t t t t 2 T xt ( xt) (3.3.9a) T x y x y = 2 2 (40)( ) (27920)( ) = = (40)( ) (27920) and from Equation (3.3.8b) Slide 3.22

23 b = y b x 1 2 = ( )(698.0) = (3.3.9b) A convenient way to report the values for b 1 and b 2 is to write out the estimated or fitted regression line: (3.3.10) This line is graphed in Figure 3.9. The line s slope is and its intercept, where it crosses the vertical axis, is The least squares fitted line passes through the middle of the data in a very precise way, since one of the characteristics of the fitted line based on the least squares parameter estimates is that it passes through the point defined by the sample means, ( x, y ) = (698.00, ). Slide 3.23

24 3.3.3 Interpreting the Estimates The value b 2 = is an estimate of β 2, the amount by which weekly expenditure on food increases when weekly income increases by \$1. Thus, we estimate that if income goes up by \$100, weekly expenditure on food will increase by approximately \$ Strictly speaking, the intercept estimate b 1 = is an estimate of the weekly amount spent on food for a family with zero income. In most economic models we must be very careful when interpreting the estimated intercept. The problem is that we usually do not have any data points near x = 0, which is certainty true for the food expenditure data shown in Figure 3.9. If we have no observations in the region where income is near zero, then our estimated relationship may not be a good approximation to reality in that region. So, although our estimated model suggests that a household with zero income will spend \$ per week on food, it might be risky to take this Slide 3.24

25 estimate literally. You should consider this issue in each economic model that you estimate a Elasticities The income elasticity of demand is a useful way to characterize the responsiveness of consumer expenditure to changes in income. From microeconomic principles the elasticity of any variable y with respect to another variable x is percentage change in y y/ y y x η = = = percentage change in x x/ x x y (3.3.11) In the linear economic model given by Equation (3.1.1) we have shown that Slide 3.25

26 E( y) β 2 = (3.3.12) x so the elasticity of average expenditure with respect to income is E( y)/ E( y) E( y) x x η= = =β2 x/ x x E( y) E( y) (3.3.13) To estimate this elasticity we replace β 2 by b 2 = We must also replace x and E(y) by something, since in a linear model the elasticity is different on each point upon the regression line. One possibility is to choose a value of x and replace E(y) by its fitted value. Another frequently used alternative is to report the elasticity at the point of the means ( x, y ) = (698.00,130.31) since that is a representative point on the regression line. If we calculate the income elasticity at the point of the means, we obtain Slide 3.26

27 x η= ˆ b2 = = (3.3.14) y We estimate that a 1% change in weekly household income will lead, on average, to approximately a 0.7% increase in weekly household expenditure on food when ( x, y) = ( x, y) = (698.00,130.31). Since the estimated income elasticity is less than one, we would classify food as a necessity rather than a luxury, which is consistent with what we would expect for an average household b Prediction Suppose that we wanted to predict weekly food expenditure for a household with a weekly income of \$750. This prediction is carried out by substituting x = 750 into our estimated equation to obtain Slide 3.27

28 yˆ = x = (750) = \$ (3.3.15) t t We predict that a household with a weekly income of \$750 will spend \$ per week on food. Slide 3.28

29 3.3.3c Examining Computer Output Dependent Variable: FOODEXP Method: Least Squares Sample: 1 40 Included observations: 40 Variable Coefficient Std. Error t-statistic Prob. C INCOME R-squared Mean dependent var Adjusted R-squared S.D. dependent var S.E. of regression Akaike info criterion Sum squared resid Schwarz criterion Log likelihood F-statistic Durbin-Watson stat Prob(F-statistic) Figure 3.10 EViews Regression Output Slide 3.29

30 Dependent Variable: FOODEXP Analysis of Variance Sum of Mean Source DF Squares Square F Value Prob>F Model Error C Total Root MSE R-square Dep Mean Adj R-sq C.V Parameter Estimates Parameter Standard T for H0: Variable DF Estimate Error Parameter=0 Prob > T INTERCEP INCOME Figure 3.11 SAS Regression Output Slide 3.30

31 3.3.4 Other Economic Models If y and x are transformed in some way, then the economic interpretation of the parameters can change. A popular transformation in economics is the natural logarithm. Economic model like ln(y) = β 1 + β 2 ln(x) are common. A nice feature of this model, if the assumptions of the regression model hold, is that the parameter β 2 is the elasticity of y with respect to x. The derivative of ln(y) with respect to x is d[ln( y)] 1 dy = dx y dx The derivative of β 1 + β 2 ln(x) with respect to x is d[ln( y)] d[ β +β ln( x)] 1 dx dx x 1 2 = = β 2 Slide 3.31

32 Setting these two pieces equal to one another, and solving for β 2 gives β = dy x 2 dx y =η (3.3.16) Equation (3.3.16) shows that in an economic model in which ln(y) and ln(x) are linearly related, the parameters β 2 is the point elasticity of y with respect to x. Slide 3.32

33 Exercise Slide 3.33

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### Forecasting the US Dollar / Euro Exchange rate Using ARMA Models

Forecasting the US Dollar / Euro Exchange rate Using ARMA Models LIUWEI (9906360) - 1 - ABSTRACT...3 1. INTRODUCTION...4 2. DATA ANALYSIS...5 2.1 Stationary estimation...5 2.2 Dickey-Fuller Test...6 3.

### Regression Analysis. Data Calculations Output

Regression Analysis In an attempt to find answers to questions such as those posed above, empirical labour economists use a useful tool called regression analysis. Regression analysis is essentially a

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Eviews Tutorial. File New Workfile. Start observation End observation Annual

APS 425 Professor G. William Schwert Advanced Managerial Data Analysis CS3-110L, 585-275-2470 Fax: 585-461-5475 email: schwert@schwert.ssb.rochester.edu Eviews Tutorial 1. Creating a Workfile: First you

### Microeconomic Theory: Basic Math Concepts

Microeconomic Theory: Basic Math Concepts Matt Van Essen University of Alabama Van Essen (U of A) Basic Math Concepts 1 / 66 Basic Math Concepts In this lecture we will review some basic mathematical concepts

### Simple Regression Theory I 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY I 1 Simple Regression Theory I 2010 Samuel L. Baker Regression analysis lets you use data to explain and predict. A simple regression line drawn through data points In Assignment

### Time Series Analysis

Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization

Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization 2.1. Introduction Suppose that an economic relationship can be described by a real-valued

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### 1 Lecture: Integration of rational functions by decomposition

Lecture: Integration of rational functions by decomposition into partial fractions Recognize and integrate basic rational functions, except when the denominator is a power of an irreducible quadratic.

### Chapter 5: Basic Statistics and Hypothesis Testing

Chapter 5: Basic Statistics and Hypothesis Testing In this chapter: 1. Viewing the t-value from an OLS regression (UE 5.2.1) 2. Calculating critical t-values and applying the decision rule (UE 5.2.2) 3.

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

### SIMPLE REGRESSION ANALYSIS

SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two

### Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

### Chapter 12: Time Series Models

Chapter 12: Time Series Models In this chapter: 1. Estimating ad hoc distributed lag & Koyck distributed lag models (UE 12.1.3) 2. Testing for serial correlation in Koyck distributed lag models (UE 12.2.2)

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Microeconomics Sept. 16, 2010 NOTES ON CALCULUS AND UTILITY FUNCTIONS

DUSP 11.203 Frank Levy Microeconomics Sept. 16, 2010 NOTES ON CALCULUS AND UTILITY FUNCTIONS These notes have three purposes: 1) To explain why some simple calculus formulae are useful in understanding

### Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

### Bivariate Regression Analysis. The beginning of many types of regression

Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression

### Multiple Regression - Selecting the Best Equation An Example Techniques for Selecting the "Best" Regression Equation

Multiple Regression - Selecting the Best Equation When fitting a multiple linear regression model, a researcher will likely include independent variables that are not important in predicting the dependent

### 4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### 17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

### STA 4163 Lecture 10: Practice Problems

STA 463 Lecture 0: Practice Problems Problem.0: A study was conducted to determine whether a student's final grade in STA406 is linearly related to his or her performance on the MATH ability test before

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

### ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Chapter 4 and 5 solutions

Chapter 4 and 5 solutions 4.4. Three different washing solutions are being compared to study their effectiveness in retarding bacteria growth in five gallon milk containers. The analysis is done in a laboratory,

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### Review of Fundamental Mathematics

Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

### Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates 1. (a) (i) µ µ (ii) σ σ n is exactly Normally distributed. (c) (i) is approximately Normally

Section 5.4 The Quadratic Formula 481 5.4 The Quadratic Formula Consider the general quadratic function f(x) = ax + bx + c. In the previous section, we learned that we can find the zeros of this function

### Integrals of Rational Functions

Integrals of Rational Functions Scott R. Fulton Overview A rational function has the form where p and q are polynomials. For example, r(x) = p(x) q(x) f(x) = x2 3 x 4 + 3, g(t) = t6 + 4t 2 3, 7t 5 + 3t

### CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies Lecture 6. Portfolio Optimization: Basic Theory and Practice Steve Yang Stevens Institute of Technology 10/03/2013 Outline 1 Mean-Variance Analysis: Overview 2 Classical

### We have discussed the notion of probabilistic dependence above and indicated that dependence is

1 CHAPTER 7 Online Supplement Covariance and Correlation for Measuring Dependence We have discussed the notion of probabilistic dependence above and indicated that dependence is defined in terms of conditional

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

### Basic numerical skills: EQUATIONS AND HOW TO SOLVE THEM. x + 5 = 7 2 + 5-2 = 7-2 5 + (2-2) = 7-2 5 = 5. x + 5-5 = 7-5. x + 0 = 20.

Basic numerical skills: EQUATIONS AND HOW TO SOLVE THEM 1. Introduction (really easy) An equation represents the equivalence between two quantities. The two sides of the equation are in balance, and solving

### Partial Fractions Examples

Partial Fractions Examples Partial fractions is the name given to a technique of integration that may be used to integrate any ratio of polynomials. A ratio of polynomials is called a rational function.

### Chapter 8 Graphs and Functions:

Chapter 8 Graphs and Functions: Cartesian axes, coordinates and points 8.1 Pictorially we plot points and graphs in a plane (flat space) using a set of Cartesian axes traditionally called the x and y axes

### 3 Random vectors and multivariate normal distribution

3 Random vectors and multivariate normal distribution As we saw in Chapter 1, a natural way to think about repeated measurement data is as a series of random vectors, one vector corresponding to each unit.

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### Time Series Analysis

Time Series Analysis Descriptive analysis of a time series Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos

### Statistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!

Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare

### CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

### Air passenger departures forecast models A technical note

Ministry of Transport Air passenger departures forecast models A technical note By Haobo Wang Financial, Economic and Statistical Analysis Page 1 of 15 1. Introduction Sine 1999, the Ministry of Business,

### UK GDP is the best predictor of UK GDP, literally.

UK GDP IS THE BEST PREDICTOR OF UK GDP, LITERALLY ERIK BRITTON AND DANNY GABAY 6 NOVEMBER 2009 UK GDP is the best predictor of UK GDP, literally. The ONS s preliminary estimate of UK GDP for the third

### 3.3. Solving Polynomial Equations. Introduction. Prerequisites. Learning Outcomes

Solving Polynomial Equations 3.3 Introduction Linear and quadratic equations, dealt within Sections 3.1 and 3.2, are members of a class of equations, called polynomial equations. These have the general

### Statistiek (WISB361)

Statistiek (WISB361) Final exam June 29, 2015 Schrijf uw naam op elk in te leveren vel. Schrijf ook uw studentnummer op blad 1. The maximum number of points is 100. Points distribution: 23 20 20 20 17

### 1 Cobb-Douglas Functions

1 Cobb-Douglas Functions Cobb-Douglas functions are used for both production functions Q = K β L (1 β) where Q is output, and K is capital and L is labor. The same functional form is also used for the

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### Variances and covariances

Chapter 4 Variances and covariances 4.1 Overview The expected value of a random variable gives a crude measure for the center of location of the distribution of that random variable. For instance, if the

### The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information

Chapter 8 The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information An important new development that we encounter in this chapter is using the F- distribution to simultaneously

### Sampling Distribution of a Normal Variable

Ismor Fischer, 5/9/01 5.-1 5. Formal Statement and Examples Comments: Sampling Distribution of a Normal Variable Given a random variable. Suppose that the population distribution of is known to be normal,

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Non-Linear Regression 2006-2008 Samuel L. Baker

NON-LINEAR REGRESSION 1 Non-Linear Regression 2006-2008 Samuel L. Baker The linear least squares method that you have een using fits a straight line or a flat plane to a unch of data points. Sometimes

### MACRO ECONOMIC PATTERNS AND STORIES. Is Your Job Cyclical?

Is Your Job at Risk? Page 1 of 8 Is Your Job Cyclical? Accessing the website of the Bureau of Labor Statistics Finding out about the ups and downs of your job Total Nonfarm Employment is illustrated in

### On the Degree of Openness of an Open Economy Carlos Alfredo Rodriguez, Universidad del CEMA Buenos Aires, Argentina

On the Degree of Openness of an Open Economy Carlos Alfredo Rodriguez, Universidad del CEMA Buenos Aires, Argentina car@cema.edu.ar www.cema.edu.ar\~car Version1-February 14,2000 All data can be consulted

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4.

UCLA STAT 11 A Applied Probability & Statistics for Engineers Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology Teaching Assistant: Neda Farzinnia, UCLA Statistics University of California,

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Joint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage

5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single

### Regression analysis in practice with GRETL

Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that

### Testing for serial correlation in linear panel-data models

The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear panel-data models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear panel-data

### Competition as an Effective Tool in Developing Social Marketing Programs: Driving Behavior Change through Online Activities

Competition as an Effective Tool in Developing Social Marketing Programs: Driving Behavior Change through Online Activities Corina ŞERBAN 1 ABSTRACT Nowadays, social marketing practices represent an important

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### THE CORRELATION BETWEEN UNEMPLOYMENT AND REAL GDP GROWTH. A STUDY CASE ON ROMANIA

THE CORRELATION BETWEEN UNEMPLOYMENT AND REAL GDP GROWTH. A STUDY CASE ON ROMANIA Dumitrescu Bogdan Andrei The Academy of Economic Studies Faculty of Finance, Insurance, Banking and Stock Exchange 6, Romana

### Bivariate Distributions

Chapter 4 Bivariate Distributions 4.1 Distributions of Two Random Variables In many practical cases it is desirable to take more than one measurement of a random observation: (brief examples) 1. What is

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Simple Regression and Correlation

Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas

### MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

### 2. Simplify. College Algebra Student Self-Assessment of Mathematics (SSAM) Answer Key. Use the distributive property to remove the parentheses

College Algebra Student Self-Assessment of Mathematics (SSAM) Answer Key 1. Multiply 2 3 5 1 Use the distributive property to remove the parentheses 2 3 5 1 2 25 21 3 35 31 2 10 2 3 15 3 2 13 2 15 3 2

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### RELATIONSHIP BETWEEN STOCK MARKET VOLATILITY AND EXCHANGE RATE: A STUDY OF KSE

RELATIONSHIP BETWEEN STOCK MARKET VOLATILITY AND EXCHANGE RATE: A STUDY OF KSE Waseem ASLAM Department of Finance and Economics, Foundation University Rawalpindi, Pakistan seem_aslam@yahoo.com Abstract:

### Chapter 5 Estimating Demand Functions

Chapter 5 Estimating Demand Functions 1 Why do you need statistics and regression analysis? Ability to read market research papers Analyze your own data in a simple way Assist you in pricing and marketing

### 3.2. Solving quadratic equations. Introduction. Prerequisites. Learning Outcomes. Learning Style

Solving quadratic equations 3.2 Introduction A quadratic equation is one which can be written in the form ax 2 + bx + c = 0 where a, b and c are numbers and x is the unknown whose value(s) we wish to find.

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and