ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley
Prof. Ken Chay
Department of Economics
Fall Semester, 2005

Question 1:

a. Below are the scatter plots of hourly wages and log-wages on the y-axes and education on the x-axis, respectively. In both figures, there appears to be evidence of heteroskedasticity in the variance of the outcome with respect to education. Notice how the dispersion of points at a given level of education varies across the education levels. In addition, it appears that there may be more heteroskedasticity in the level of wages than in log-wages.

  [Figures: scatter plots of (mean) hrwage and (mean) lwage against Twin 1's education]
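For reference, a minimal Stata sketch of how scatter plots like these can be produced, assuming the data are already in memory with the variable names hrwage, lwage, and educ suggested by the figure labels and the regression output below:

  * plot the level of wages and log-wages against education
  scatter hrwage educ, ytitle("Hourly wage") xtitle("Education (Twin 1)")
  scatter lwage educ, ytitle("Log hourly wage") xtitle("Education (Twin 1)")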
b. If there is heteroskedasticity in the residuals of the log-wage regression, the OLS estimator of the returns to education is still unbiased and consistent but will no longer be the efficient estimator (not BLUE). In addition, the conventional estimator of the variance of the estimated return to education will be biased and inconsistent.

c. Below are the Stata regression results uncorrected and corrected for heteroskedasticity, respectively.

. reg lwage educ age age2 female white

  [Stata output: Number of obs = 680, F(5, 674); coefficient table (educ, age, age2, female, white, _cons) with conventional standard errors; numerical entries not preserved]

. reg lwage educ age age2 female white, robust

  Regression with robust standard errors        Number of obs = 680
                                                F(5, 674)     = 59.6
  [Coefficient table (educ, age, age2, female, white, _cons) with robust standard errors; numerical entries not preserved]

When the robust option is used, the estimates of the standard errors are corrected for heteroskedasticity by allowing the variance of the residuals to vary across individuals in an unspecified way when the variance-covariance matrix of the slope coefficients is estimated. In effect, the squares of the estimated residuals for each individual are used in the calculation of the standard errors (see class notes for the Eicker-White formula). The corrected standard errors on education and age are about 10-15% greater than the uncorrected standard errors, while the standard errors on the gender and racial indicators are not very different. Consequently, based on this comparison, there is only slight evidence of heteroskedasticity.

d. The reason why the R-squared and estimated coefficients from the regression of the estimated residuals on the regressors are virtually zero is that these first-order conditions (a.k.a. normal equations or moment/orthogonality conditions) are how the original least-squares estimates were derived in the first place. In other words, least squares estimation calculates the slope coefficients by imposing the restriction, implied by the model, that the covariance between the residuals and the regressors is zero (look at your notes and the last problem set for the equations).
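A minimal sketch of the check described in part d, using the regressor names from the output above; the residual variable name uhat is illustrative:

  * save the OLS residuals from the log-wage regression
  reg lwage educ age age2 female white
  predict uhat, residuals

  * by the OLS normal equations, these coefficients and this R-squared are numerically zero
  reg uhat educ age age2 female white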
e. The following are the Stata regression results from the regressions of the squared residuals from the level-of-wages and log-wages regressions on the covariates, respectively:

. reg err educ age age2 female white

  [Stata output: Number of obs = 680, F(5, 674) = 7.03; coefficient table (educ, age, age2, female, white, _cons); numerical entries not preserved]

. reg lerr educ age age2 female white

  [Stata output: Number of obs = 680, F(5, 674) = 3.36; coefficient table (educ, age, age2, female, white, _cons); numerical entries not preserved]

The number of observations times the R-squared from these regressions (N*R-squared) is a test statistic for heteroskedasticity that has an asymptotic chi-squared distribution under the null hypothesis of no heteroskedasticity, with degrees of freedom equal to the number of regressors in the auxiliary squared-residuals regression. In the above cases, 680*R-squared is asymptotically χ²(5). This test statistic is 33.7 and 16.5 for the wage and log-wage regressions, respectively. The 5% critical value for a χ²(5) is 11.07. Consequently, we would reject the null hypothesis of homoskedasticity for both models at conventional levels of significance. It does appear, however, that the residuals from the wage regression exhibit more heteroskedasticity than the residuals from the log-wage regression.

When the squared residuals are regressed on the original regressors, their squares, and their interactions, one is performing the White test for heteroskedasticity. The R-squareds from the regressions of the squared wage-regression residuals and the squared log-wage-regression residuals on all of these variables are 0.092 and 0.050, respectively. The resulting test statistics are 62.6 and 34.0 and have asymptotic χ²(11) distributions under the null. Since the 5% critical value for a χ²(11) is 19.7, we again reject the null of no heteroskedasticity in both cases.
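A sketch of how the N*R-squared statistic for the log-wage residuals can be computed directly, under the same setup as above; the variable names ehat and lerr are illustrative (lerr mirrors the squared-residual variable in the output):

  * squared residuals from the log-wage regression
  reg lwage educ age age2 female white
  predict ehat, residuals
  gen lerr = ehat^2

  * auxiliary regression; N*R^2 is asymptotically chi-squared(5) under homoskedasticity
  reg lerr educ age age2 female white
  display "N*R2 = " e(N)*e(r2) "   5% critical value = " invchi2tail(5, 0.05)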
f. Since identical twins are not just from the same family, but also from the same egg, there is a very good chance that there will be unobservable characteristics that are common among twins (e.g., personality, family background, schooling environment, etc.). Consequently, treating twins as being independent of each other is probably inappropriate, leading to a violation of the pairwise-uncorrelated assumption on their residuals. Although the OLS estimator will be unbiased and consistent, it will be inefficient, and the conventional estimates of the variances of the slope coefficients will be inconsistent. In many cases, this bias in the estimated standard errors can be quite substantial. Since the residuals of twins are likely to be positively correlated with each other, the conventional standard error estimates will be biased down, overstating the precision of the coefficient estimates. Note that we may also worry that these twin random effects (unobservables) may be correlated with both education and wages (e.g., differences across families in ability or motivation). In this case, there may be omitted variables bias in the OLS estimate of the returns to education. This is the subject of parts c and d of question 2.

g. The Stata results are:

. reg lwage educ age age2 female white, cluster(id)

  Regression with robust standard errors        Number of obs = 680
                                                F(5, 339)
  Number of clusters (id) = 340
  [Coefficient table (educ, age, age2, female, white, _cons) with cluster-robust standard errors; numerical entries not preserved]

So the standard error on the estimated return to education increases by over 30% when compared to the conventional standard error that is not corrected for clustering (the standard error on the age coefficient also increases substantially). When clustering is corrected for, we are now appropriately recognizing that the observations should not be treated as being independent of each other. When clustering is not corrected for, the conventional OLS estimates of the standard errors inappropriately treat the data as independent and overstate the amount of information contained in the data.
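A sketch that lines up the three sets of standard errors (conventional, heteroskedasticity-robust, and cluster-robust) side by side; id is the twin-pair identifier used in the output above, and the stored-estimate names are illustrative:

  * same point estimates, three different variance estimators
  reg lwage educ age age2 female white
  estimates store conventional
  reg lwage educ age age2 female white, robust
  estimates store hrobust
  reg lwage educ age age2 female white, cluster(id)
  estimates store clustered

  * coefficients are identical across columns; only the standard errors change
  estimates table conventional hrobust clustered, b(%9.4f) se(%9.4f)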
Question 2:

a. If the self-reports of education are measured with error, and the measurement error is classical, then the estimated return to education from the multivariate log-wage regression will be biased down. This is also known as attenuation bias. The size of this bias depends on the relative amount of the variation in the self-report of schooling that is attributable to the measurement error, also known as the noise-to-total variance ratio. In the bivariate regression model with no additional regressors, the bias has the form:

  Bias(β̂_OLS) = -β σ²_u / σ²_S = -β σ²_u / (σ²_S* + σ²_u) = -λβ,

where σ²_u is the measurement error variance, σ²_S is the total variance in the self-report of schooling, σ²_S* is the variance in true schooling (a.k.a. the signal), and λ is the noise-to-total variance ratio. If several regressors, X, are included in the regression as controls, then the bias has the form:

  Bias(β̂_OLS) = -λβ / (1 - R²_S,X),

where R²_S,X is the R-squared from the regression of schooling on the X's. As R²_S,X goes up, the size of the attenuation bias increases for a fixed noise-signal ratio. Intuitively, if the measurement error is classical, and therefore independent of the additional regressors X, then including X in the regression will soak up some of the signal in schooling since it is related to true schooling. However, it will soak up none of the noise variance in self-reported schooling since it is independent of it. Consequently, while including additional regressors may alleviate the potential omitted variables problem, it could exacerbate attenuation bias arising from measurement error.

b. The following are the results from using the other twin's report of an individual's education as an instrument for the self-reported education in two-stage least squares estimation:

. ivreg lwage age age2 female white (educ = educt_t)

  Instrumental variables (2SLS) regression      Number of obs = 680
  Instrumented: educ
  Instruments:  educt_t age age2 female white
  [Coefficient table (educ, age, age2, female, white, _cons); numerical entries not preserved]

Here, the estimated return to education is about 6% higher than the estimate without instrumenting. Under the assumption that the self-reports of education are measured with classical measurement error, and that the measurement error of the other twin's report of the individual's education is not correlated with this measurement error, the instrumental variable will purge the attenuation bias in the conventional estimated return to education induced by the measurement error. This explains why the estimated return to education increases slightly.
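The ivreg estimate in part b can also be reproduced in two explicit stages, which makes the mechanics of 2SLS transparent; the fitted-value name educ_hat is illustrative, and the second-stage standard errors printed this way are not valid (which is why ivreg is used in practice):

  * first stage: self-reported education on the instrument and the exogenous controls
  reg educ educt_t age age2 female white
  predict educ_hat, xb

  * second stage: replace educ with its first-stage fitted values
  * the coefficient on educ_hat matches the 2SLS estimate; its printed standard error does not
  reg lwage educ_hat age age2 female white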
c. If there are unobservable, confounding variables that are correlated with both educational attainment and earnings, then there will be omitted variables bias in the least squares estimated return to education when these confounding variables are not controlled for. Intuitively, since education is not randomly assigned, if there are variables that drive both education and earnings (e.g., ability, motivation, etc.), we will inappropriately assign wage effects to education that are actually attributable to the other factors we have not controlled for. Suppose that the omitted variable is Z and that the correct regression model is:

  y_i = α + βS_i + θZ_i + ε_i,

where y_i is log-wages and S_i is schooling. Further suppose that Z_i is related to S_i in the following way:

  Z_i = λS_i + u_i.

Then the omitted variables bias from regressing y on S without controlling for Z is:

  Bias(β̃) = θλ.

For omitted variables bias, the omitted factors must be both correlated with schooling and correlated with earnings. If the omitted variables are positively correlated with both schooling and earnings, then the bias in the misspecified regression estimate is positive.

d. If the omitted factors are identical between identical twins, then one can first-difference the data and compare differences in earnings and education across twin pairs to get an unbiased estimate of the return to education. The following are the Stata regression results from the first-differenced data:

. reg dlwage deduc if first==1, noconstant

  [Stata output: Number of obs = 340, F(1, 339), Adj R-squared = 0.08; coefficient on deduc; numerical entries otherwise not preserved]

The estimated return to education is much lower (by over 40%) than the one based on the conventional log-wage regression. This may imply that the conventional cross-sectional estimate of the return to education is biased up since it does not control for ability differences across individuals with different levels of schooling (a.k.a. ability bias). We didn't include controls for age, gender, and race, since these are identical between identical twins and get differenced out when comparing differences between twins. This highlights that many factors may get differenced out when comparing twins.
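Part d's regression presupposes that the within-pair differences have already been built. A sketch of one way this could be done, assuming (hypothetically) one row per twin with the family identifier id and a twin-order variable twinno equal to 1 or 2:

  * within-pair differences (twin 1 minus twin 2), stored on twin 1's row
  sort id twinno
  by id: gen dlwage = lwage[1] - lwage[2] if _n == 1
  by id: gen deduc  = educ[1]  - educ[2]  if _n == 1

  * pair-difference regression with no constant, as in part d
  reg dlwage deduc, noconstant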
e. The following is the scatter plot of the difference in twins' log-wages against the difference in twins' self-reported education.

  [Figure: 1st difference in log wage plotted against the own-report 1st difference in education]

Most of the observations are clustered around zero, implying that most twins have identical education levels. This could imply that some of the observed differences in self-reported education between twins are attributable to measurement error. This could be another explanation of the result found in part d, since it could be that the estimated return to education based on first differences is attenuated down due to the measurement error. In particular, if the measurement error is classical and the measurement errors in the twins' self-reports are independent of each other, then first-differencing the data will absorb much of the true signal in education while it absorbs none of the measurement error variance. This is because the education levels of twins are highly correlated (a correlation of about 0.7). This could exacerbate the attenuation bias attributable to measurement error quite a bit, since the bias has the form:

  Bias(β̂_D) = -λβ / (1 - r_S),

where r_S is the correlation between the two twins' education levels. Notice the similarity of this bias to the one shown above when additional regressors are included as controls. One way to think about this is that differencing the data is like including dummy variables for every twin pair (family) in the regression. Consequently, a lot of the signal may be absorbed relative to the amount of noise variance that is absorbed, and some of the observed differences in education between twins may be attributable to either or both twins misreporting.
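The role of r_S in the formula above can be gauged directly; a sketch, again assuming the hypothetical one-row-per-twin layout with id and twinno, where educ_other is an illustrative name for the co-twin's education:

  * correlation between the two twins' education levels (r_S in the formula above)
  sort id twinno
  by id: gen educ_other = cond(_n == 1, educ[2], educ[1])
  corr educ educ_other

  * first-differencing scales the noise share lambda up by the factor 1/(1 - r_S)
  display "1/(1 - r_S) = " 1/(1 - r(rho))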
f. The following are the results from instrumenting the difference between the twins' self-reports with the difference between the twins' reports of the other twin's education using two-stage least squares:

. ivreg dlwage (deduc = deduct) if first==1, noconstant

  Instrumental variables (2SLS) regression      Number of obs = 340
  Instrumented: deduc
  Instruments:  deduct
  [Coefficient table for deduc; numerical entries not preserved]

The estimated return to education has increased because, under the assumption that the instrument is valid, the potentially large attenuation bias due to measurement error has now been purged. This conclusion depends crucially on the assumption that all of the potential reporting errors (associated with both the self-reports and the other twins' reports of each twin's education) are independent of each other. If the measurement errors between an individual's self-report and his twin's report of the individual's education are correlated, then this instrumental variables estimate will be biased.

In class, we discussed that under one scenario for correlated measurement errors, one twin's report of the difference in education is a valid instrument for the other twin's report of the education difference. Although this was not required to receive full credit, the two-stage least squares regression below uses twin 2's report of the education difference (dceduct) as an instrument for twin 1's report of the education difference (dceduc).

. ivreg dlwage (dceduc = dceduct) if first==1, noconstant

  Instrumental variables (2SLS) regression      Number of obs = 340
  Instrumented: dceduc
  Instruments:  dceduct
  [Coefficient table for dceduc; numerical entries not preserved]

The resulting instrumental variables (IV) estimate of the return to education is lower than the IV estimate based on the assumption that the measurement errors are independent. Thus, there appears to be some evidence of correlated measurement errors. See Ashenfelter and Krueger (1994) for more details on potential solutions to the problem of correlated reporting errors. Also, look at your class notes.

g. If the reporting errors are uncorrelated with each other, then the estimate in the first part of f. is purged of attenuation bias and is also more likely to be purged of ability bias than the conventional estimate of the return to education based on the regression of log-wages on educ, age, age2, female, and white. Since this estimate is only slightly lower than the conventional estimate, we might conclude that the size of the omitted variables bias in the conventional estimate is not very big. However, this is one of the more controversial ongoing debates in labor economics.

Consider the following model for log-wages (y_ij):

  y_ij = γs_ij + ε_ij,    ε_ij = α_i + u_ij,

where i indexes the family, j indexes the twin (1 or 2), and α_i represents the unobservable factors shared by twins in the same family (i.e., the family effect). Suppose we compare twin differences in log-wages and education in order to difference out α_i:

  (y_i1 - y_i2) = γ(s_i1 - s_i2) + (u_i1 - u_i2).

Comparing twin differences will reduce the omitted variables problem if the following holds:

  Cov(s_i1 - s_i2, ε_i1 - ε_i2) / Var(s_i1 - s_i2) < Cov(s_ij, ε_ij) / Var(s_ij)

Although it is believable that the numerator on the left-hand side is less than the numerator on the right-hand side, we also know that the denominator on the left-hand side is much lower. That is, the variation in twin differences in education across families is much lower than the overall variation in education across individuals. Consequently, the variation in the treatment is greatly reduced by focusing on twin differences, and even if the correlation between this treatment and the unobservables is smaller, the omitted variables bias may not fall. On the other hand, it is informative that the estimated return to education based on twin differences is similar to the conventional OLS estimate.
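The claim about the reduced variation in the treatment can be checked directly, using the deduc and first variables from part d; a sketch:

  * overall variation in schooling across individuals
  summarize educ
  display "Var(s_ij) = " r(Var)

  * variation in the within-pair difference in schooling
  summarize deduc if first == 1
  display "Var(s_i1 - s_i2) = " r(Var)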