Introduction to Panel Data Analysis


1 Introduction to Panel Data Analysis
Oliver Lipps / Ursina Kuhn
Swiss Centre of Expertise in the Social Sciences (FORS), c/o University of Lausanne
Lugano Summer School, August
2 Introduction: panel data, data management
  1 Introducing panel data (OL)
  2 The SHP (OL)
  3 Data management with Stata (UK)
Regressions with panel data: basics
  4 Regression refresher (UK)
  5 Fixed effects (FE) models (OL)
  6 Introducing random effects (RE) models (OL)
  7 Nonlinear regression (UK)
Additional topics
  8 Level 1 and 2 growth models (OL)
  9 Missing data (OL)
  10 Dynamic models (UK)
Start: 8.30; end: 17.30
3 1 Introducing panel data
4 Surveys over time: repeated cross-sections vs. panels
- Cross-section: survey conducted at several points in time ("rounds") using different sample members in each round
- Panel: survey conducted at several points in time ("waves") using the same individuals across waves
  -> panel data come mostly from panel surveys
  -> if taken from cross-sectional surveys: retrospective ("biographical") questionnaire
5 Panel surveys: distinctions
Length and sample size:
- Time series: N small (mostly = 1), T large (T -> ∞): time series models
- Panel surveys: N large, T small (N -> ∞): social science panel surveys
Sample:
- General population:
  - rotating: only a few (a predefined number of) waves per individual (in CH: SILC, LFS)
  - indefinitely long (in CH: SHP)
- Special population:
  - e.g., age/birth cohorts (in CH e.g.: TREE, SHARE, COCON), representative of the population of a specific age group / specific birth years
6 Panel surveys increasingly important
- Changing focus in the social sciences
- Life course research: effects of events within individuals
- Large investments in social science household panel surveys, high data quality!
- Concern about causality in cross-sectional studies
- Analysis potential of panel data:
  - close to an experimental design: before-and-after studies
  - control for unobserved individual characteristics (exogenous independent variables; implicit in regression analyses)
7 Identification of individual dynamics (poverty in SHP)
[Figure: shares of poor and not poor respondents over time in the SHP]
-> Individual dynamics can only be measured with panel data!
8 Identification of age, time, and (birth) cohort effects
Fundamental relationship: $a_{it} = t - c_i$
- Effects from formative years (childhood, youth) -> cohort effect (e.g. taste in music)
- Time may affect behavior -> time effect (e.g. computer performance)
- Behavior may change with age -> age effect (e.g. health)
- In a cross-section, $t$ is constant: age and cohort are collinear (only the joint effect is estimable)
- In a cohort study, the cohort is constant: age and time are collinear (only the joint effect is estimable)
- In a panel, $t$ varies, but $a_{it}$, $t$, and $c_i$ are collinear: only two of the three effects can be estimated. We can use $(t, c_i)$, $(a_{it}, c_i)$, or $(a_{it}, t)$, but not all three.
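The collinearity of age, time and cohort can be checked mechanically. The following is a minimal sketch in Python (not Stata), using made-up person-wave records; the function names are hypothetical and only illustrate the identity $a_{it} = t - c_i$.

```python
# Hypothetical person-wave records: (birth cohort c_i, survey year t).
records = [(1950, 1999), (1950, 2004), (1970, 1999), (1970, 2004), (1985, 2004)]


def age(cohort, year):
    """Age follows mechanically from survey year and birth year: a_it = t - c_i."""
    return year - cohort


def is_exact_linear_combination(rows):
    """True if a_it - t + c_i == 0 in every row, i.e. the three regressors are perfectly collinear."""
    return all(a - t + c == 0 for (a, t, c) in rows)


# Design rows (age, time, cohort) as they would enter a regression.
design = [(age(c, t), t, c) for (c, t) in records]
collinear = is_exact_linear_combination(design)
```

Because the identity holds in every row, a regression that includes all three variables has no unique solution; dropping any one of them restores identification of the other two.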
9 Age, time, cohort effects: interpretation
[Figure: panels labelled "no cohort effect", "aging effect", "age effect", "no aging effect", "no age effect", "aging and cohort effect"]
10 Problems of panels
Fieldwork / data quality related:
- High costs (panel care, tracking households, incentives)
- Initial nonresponse (wave 1) and attrition (= dropout from the panel after wave 1)
- Finally: you design a panel for the next generation
Modeling related:
- Applying the appropriate models sometimes requires strong assumptions (see later)
11 Advantages of panel data
+ Allow the study of individual dynamics
+ Control for unobserved characteristics between individuals through repeated observation
+ Higher precision when measuring change
+ Data can be pooled
12 2 Introducing the Swiss Household Panel (SHP)
13 Swiss Household Panel: history
- Primary goal: observe social change and changing living conditions in Switzerland
- Started as a joint project of the Swiss National Science Foundation, the Swiss Federal Statistical Office, and the University of Neuchâtel
- First wave in 1999: more than 5,000 households
- Refreshment sample in 2004: more than 2,500 households, several new questions
- Since 2008 integrated into FORS (Swiss Centre of Expertise in the Social Sciences), c/o University of Lausanne
14 Disciplines working with SHP data
15 SHP sample and methods
- Representative of the Swiss resident population
- Each individual surveyed every year (Sept.-Jan.)
- All household members aged 14 and over surveyed (proxy questionnaire for children and for persons unable to respond)
- Telephone interviews (central CATI), languages D/F/I
- Metadata: biographic, interviewers
- Paradata: call data (from address management)
- Following rules: original sample members (OSM) followed if they move; from 2007 on, all individuals
- All new household entrants surveyed
16 SHP sample size (households) and attrition
[Figure: households interviewed per wave, SHP I and SHP II]
Cross-sectional and longitudinal weights (SHP I, and SHP I+II combined) account for nonresponse and attrition.
17 SHP: survey process and questionnaires
- Grid questionnaire: inventory and characteristics of household members (answered by the «reference person», aged 18+)
- Household questionnaire: housing, finances, family roles, ... (answered by the «reference person», aged 18+)
- Individual questionnaires: work, income, health, politics, leisure, satisfaction with life, ... (persons aged 14+)
- Individual proxy questionnaires: school, work, income, health, ... (persons up to 13 years and persons «unable to respond»)
18 SHP: questionnaire content
- Social structure: sociodemography, socioeconomy, work, education -> social stratification and social mobility
- Life events: marriages, births, deaths, diseases, accidents, conflicts with close persons, etc. -> life course
- Social participation: politics (attitudes, elections, party preferences and choice), culture, social network, leisure -> social integration, political attitudes and behavior
- Perception and values: trust, confidence, gender -> values and social capital
- Satisfaction and health: self-evaluation of physical and mental health, chronic problems, different satisfaction issues -> quality of life
19 SHP household: composition and housing
«Objective» elements:
- Characteristics of household members (sex, age, civil status, education, occupation)
- Relationships between all household members
- Since when at this place
- Type of house
- Size, number of rooms of house
- State of house (heating, noise, pollution, etc.)
- Costs and subsidy, etc.
- External help for domestic work
- Child care
- Division of labor
- Who takes decisions, etc.
«Subjective» elements:
- Satisfaction with house, noise, pollution, etc.
- Assessment of state of house
20 SHP household: standard of living
«Objective» elements:
- Activities and (durable) goods
- Reason for absence of goods (financial, other)
- Financial difficulties
- Debts (+ reason)
- Total household income
- Taxes
- Social and private financial transfers
«Subjective» elements:
- Satisfaction with financial situation
21 SHP individual level: family
«Objective» elements:
- Children out of the house
- Division of housework, care for dependents
- Disagreement about family problems, etc.
«Subjective» elements:
- Satisfaction with private situation
- Satisfaction with living alone or together
- Satisfaction with division of housework
SHP individual level: health, well-being
«Objective» elements:
- Health problems
- Physical activities
- Doctor visits, hospitalization
- Improvement of health
- Long- and short-term handicaps
«Subjective» elements:
- Subjective health state
- Satisfaction with health
- Satisfaction with life in general
22 Social origin and education
- Profession of parents
- Level of education of parents
- Nationality of parents
- Financial problems in childhood
- Education level
- Current training
- Language skills
Leisure
«Objective» elements:
- Activities: holidays, inviting and meeting friends, reading, Internet use, restaurant, etc.
«Subjective» elements:
- Satisfaction with leisure time and leisure activities
- Satisfaction with work-life balance
23 Individual level: work
«Objective» elements:
- Job sector
- Social stratification
- Private or public
- Position
- Working time, commuting time
- Size of company
«Subjective» elements:
- Satisfaction with work (general, income, interests, working conditions, amount, atmosphere)
- Risk of unemployment
- Job security
- Chances of promotion
Income
«Objective» elements:
- Total personal income
- Total personal income from work
- Social transfers received
- Private transfers received
- Other income
«Subjective» elements:
- Satisfaction with financial situation
- Assessment of whether the financial situation has improved or not
- Possible reactions to financial problems
24 Values and politics
«Objective» elements:
- Right to vote
- Political activities
- Membership in a political party
«Subjective» elements:
- Satisfaction with democracy
- Confidence in the federal government
- Political interest
- Left-right political positioning
- Opinions on political questions
Participation, integration, social network
«Objective» elements:
- Frequency of contacts
- Voluntary work outside the household
- Participation and membership in associations
- Belief and religious participation
«Subjective» elements:
- Satisfaction with personal relationships
- Assessment of the amount of practical help received from partner, parents, friends, etc.
- Assessment of the amount of emotional help received from partner, parents, friends, etc.
- General trust in people
25 Biographical (retrospective) questionnaire
- N = 5,560
- Written questionnaire in 2001/2002, sent to all individuals surveyed in 2000, aged 14 or over
- Questions covering the time since birth about family, education, and professional biography:
  - with whom the respondent lived
  - periods outside Switzerland
  - changes of civil status
  - learned professions
  - education
  - professional and non-professional biography
  - family life events (divorce, remarriage of parents)
26 International context
SHP is part of the Cross-National Equivalent File (CNEF), which combines general population panel surveys from:
- USA (PSID, since 1980)
- D (SOEP, since 1984)
- UK (BHPS, since 1991)
- Canada (SLID, since 1993)
- CH (SHP, since 1999)
- Australia (HILDA, since 2001)
- Korea (KLIPS, since 2007)
- More countries will join (Russia, South Africa, ...)
Each panel includes a subset of all variables (variables from the original files can be merged).
Variables are harmonized ex post (names, categories).
Missing income variables are imputed.
Frick, Jenkins, Lillard, Lipps and Wooden (2007): The Cross-National Equivalent File (CNEF) and its member country household panel studies. Journal of Applied Social Science Studies (Schmollers Jahrbuch)
27 SHP questionnaire: rotation
Module                  Waves in which the module is fielded
Social network          X X X X
Religion                X X X
Social participation    X X X X
Politics                X X X X
Leisure                 X X X X
Psychological scales    X X X
28 Outlook: new sample (LIVES)
- SHP III (2013, based on an individual register)
- Biographic questionnaire in the 1st wave of SHP III (with NCCR LIVES)
- NCCR LIVES:
  - (precarious) life course
  - Universities of Lausanne and Geneva
  - 15 research projects, 12 years
  - use of SHP
29 SHP structure of the data
- 2 yearly files (currently available: ... (+ beta 2011)): household, individual
- 5 unique files: master person (mp), master household (mh), social origin (so), last job (lj), activity (employment) calendar (ca)
- Complementary files: biographical questionnaire; interviewer data (2000, and yearly since 2003); call data (since 2005); CNEF SHP data variables
30 Documentation (website: D/E/F)
E.g.:
- User guide
- Questionnaires
- Variable search (by variable name and topic)
- Construction of variables
- Syntax examples
  - Merging data files with SPSS, SAS, Stata
  - Documentation: data management with SHP
31 SHP data delivery
- Data are ready about 1 year after the end of fieldwork and can be downloaded from the SHP server
- A signed contract with FORS is required; upon receipt of the contract, login and password are sent
- Data are free of charge
- Users become members of the SHP scientific network and document all publications based on SHP data
- Data on request: imputed income, call data, interviewer matching ID, context data (special contract; data are matched at FORS)
32 3 Stata and panel data
33 Why Stata?
Capabilities:
- Data management
- Broad range of statistics
- Powerful for panel data!
- Many commands ready for analysis
- User-written extensions
Beginners and experienced users:
- For beginners: analysis through menus (point and click)
- For advanced users: good programming capabilities
34 Starting with Stata
Basics (handout "Stata basics"):
- Look at the data and variables
- Descriptive statistics
- Regression analysis
Working with panel data (handout "Data management of SHP with Stata", includes syntax examples and exercises):
- Merge
- Creating «long files»
- Working with the long file
- Adding information from other household members
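35 1. Merge: the _merge variable
[Table: a master file (idpers plus a 2007 variable) and a using file (idpers plus a 2008 variable) are combined into one file containing idpers, both variables and _merge]
Values of the _merge variable:
  1 = observation only in master file
  2 = observation only in using file
  3 = observation in both files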
36 Merge: identifier
[Table: master file, using file and merged result, linked by the identifier idpers]
37 Merge files: identifiers
File                                           Filename                        Identifiers
Individual master file                         shp_mp                          idpers, idhous$$, idfath, idmoth
Individual annual files                        shp$$_p_user                    idpers, idint, idhous$$, idspou, refper$$
Additional individual files (social origin,    shp_so, shp_lj, shp_ca, shp_*   idpers
  last job, calendar, biographic)
Interviewer data                               shp$$_v_user                    idint
Household annual files                         shp$$_h_user                    idhous$$, refpers, idint, canton$$, (gdenr)
Biographic files                                                               idpers
CNEF files                                     shpequiv_$$$$                   x11101ll (= idpers)
38 Stata merge command
The merge command:
    merge [type] [varlist] using filename [filename ...] [, options]
- type:
  - 1:1: each observation has a unique identifier in both data sets
  - 1:m, m:1: some observations share the same identifier in one of the data sets
- varlist: identifier(s), e.g. idpers
- filename: data set to be merged
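The logic of a 1:1 merge and of the resulting _merge codes can be sketched outside Stata. The Python snippet below is only an illustration under stated assumptions: `merge_1to1`, the variable names `x07` and `x08` and the records are hypothetical, and the sketch mimics the default behaviour in which master values are kept when both files contain the same variable.

```python
def merge_1to1(master, using, key):
    """Full 1:1 merge of two lists of dicts on `key`, adding a Stata-like _merge code.

    1 = only in master, 2 = only in using, 3 = in both.
    """
    master_by_key = {row[key]: row for row in master}
    using_by_key = {row[key]: row for row in using}
    # A 1:1 merge requires the key to be unique in both files.
    if len(master_by_key) != len(master) or len(using_by_key) != len(using):
        raise ValueError("key does not uniquely identify observations")
    merged = []
    for k in sorted(set(master_by_key) | set(using_by_key)):
        row = {key: k}
        row.update(using_by_key.get(k, {}))
        row.update(master_by_key.get(k, {}))  # master values win on name clashes
        in_master, in_using = k in master_by_key, k in using_by_key
        row["_merge"] = 3 if (in_master and in_using) else (1 if in_master else 2)
        merged.append(row)
    return merged


# Hypothetical records for two waves (variable names are made up).
wave07 = [{"idpers": 101, "x07": 8}, {"idpers": 102, "x07": 6}]
wave08 = [{"idpers": 102, "x08": 7}, {"idpers": 103, "x08": 9}]
result = merge_1to1(wave07, wave08, "idpers")
```

Person 101 appears only in the master file, 103 only in the using file, and 102 in both, so the three _merge codes each occur once.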
39 Basic merge example I: two annual individual files
    use shp08_p_user, clear
    merge 1:1 idpers using shp00_p_user
[Output: frequency table of _merge with Freq., Percent and Cum. columns; the counts are not preserved]
40 Basic merge example II: annual individual file and individual master file
    use shp08_p_user, clear          // opens the file (master)
    count                            // number of cases in the master file
    merge 1:1 idpers using shp_mp    // identifier and using file
    tab _merge
    drop if _merge==2                // if only individuals from 2008 are wanted
    drop _merge
[Output: frequency table of _merge; the counts are not preserved]
41 Basic merge example III: annual individual file and annual household file
    use shp08_p_user, clear                  // master file
    merge m:1 idhous08 using shp08_h_user    // identifier and using file
[Output: frequency table of _merge; the counts are not preserved]
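An m:1 merge attaches one household record to every person living in that household. The following Python sketch illustrates the idea with hypothetical names (`merge_m_to_1`, `idhous`, `hhsize`); unlike Stata's default, it only keeps the persons from the master file and does not add unmatched households from the using file.

```python
def merge_m_to_1(persons, households, key):
    """Attach household-level variables to each person (many persons : one household)."""
    hh_by_key = {}
    for hh in households:
        if hh[key] in hh_by_key:
            raise ValueError("household key must be unique in the using data")
        hh_by_key[hh[key]] = hh
    out = []
    for person in persons:
        row = dict(person)
        match = hh_by_key.get(person[key])
        if match is not None:
            for name, value in match.items():
                row.setdefault(name, value)  # keep master values on name clashes
        row["_merge"] = 3 if match is not None else 1
        out.append(row)
    return out


# Hypothetical data: three persons in two households, one household file.
persons = [
    {"idpers": 1, "idhous": 10},
    {"idpers": 2, "idhous": 10},
    {"idpers": 3, "idhous": 20},
]
households = [{"idhous": 10, "hhsize": 2}]
linked = merge_m_to_1(persons, households, "idhous")
```

Both members of household 10 receive the same household size, while person 3 has no household record and is flagged with _merge equal to 1.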
42 More on merge
Options of the merge command:
- keepusing(varlist): selection of variables from the using file
- keep(): selection of observations from the master and/or using file
- for more options: type help merge
Merge many files: loops (see handout)
Create partner files (see handout)
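43 2. Wide and long format
- Wide format: one row per person, one column per wave (idpers, i04empyn, i05empyn, i06empyn, i07empyn)
- Long format (person-period file): one row per person and year (idpers, year, iempyn)
[Tables with example values not preserved]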
44 Use of the long data format in Stata
- All panel applications: xt commands
  - descriptives
  - panel data models: fixed effects models, random effects models, multilevel models
  - discrete-time event-history analysis
- Declare the panel structure (panel identifier, time identifier):
    xtset idpers wave
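45 Convert wide form to long form
- reshape long command in Stata:
    reshape long varlist, i(idpers) j(wave)
- But: Stata does not automatically detect the year inside the variable name. The position of the year is marked with @ in each stub (e.g. i@empyn) and the wave values are listed explicitly:
    reshape long varlist, ///
        i(idpers) ///
        j(wave "99" "00" "01" "02" "03" "04" ///
               "05" "06" "07" "08") atwl()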
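What the wide-to-long conversion does to the data can be illustrated with a short Python sketch. It is not the Stata implementation: `reshape_long`, the regular expression and the example values are assumptions, chosen so that a wave-specific name such as i99empyn becomes the variable iempyn in the row for 1999.

```python
import re

# One lowercase prefix letter, a two-digit year, then the variable stem,
# e.g. i04empyn -> ("i", "04", "empyn"). The pattern is an assumption for this sketch.
WAVE_VAR = re.compile(r"^([a-z])(\d{2})(\w+)$")


def reshape_long(wide_rows, id_var="idpers"):
    """Turn one row per person into one row per person and year."""
    long_rows = {}
    for row in wide_rows:
        for name, value in row.items():
            match = WAVE_VAR.match(name)
            if match is None:
                continue  # identifiers and time-constant variables are skipped here
            prefix, yy, stem = match.groups()
            year = 1900 + int(yy) if int(yy) >= 90 else 2000 + int(yy)
            key = (row[id_var], year)
            long_rows.setdefault(key, {id_var: row[id_var], "year": year})
            long_rows[key][prefix + stem] = value
    return [long_rows[k] for k in sorted(long_rows)]


# Hypothetical wide data with two waves (1999 and 2000).
wide = [
    {"idpers": 1, "i99empyn": 48000, "i00empyn": 50000},
    {"idpers": 2, "i99empyn": 30000, "i00empyn": 31000},
]
long_data = reshape_long(wide)
```

Two persons observed in two waves yield four person-year rows, sorted by person and year.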
46 Create a long file with append
1. Modify the data set of each wave: remove the year from the variable names (e.g. i99wyn -> iwyn), add a wave variable, and save the result as temp1.dta, temp2.dta, ...
2. Stack the data sets:
    use temp1, clear
    forval y = 2/10 {
        append using temp`y'
    }
47 Work with time lags
- Requires data in long format that are declared as panel data (xtset)
- The prefix l. indicates a time lag
- Example: social class of last job (see handout)
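The behaviour of a lag in panel data can be sketched as follows. This Python illustration uses hypothetical names (`add_lag`, `wage`) and assumes that a lag refers to the same person one wave earlier, so that a gap in participation produces a missing lag rather than the value from the previous available row.

```python
def add_lag(long_rows, var, id_var="idpers", time_var="wave"):
    """Add `L_<var>`: the value of `var` for the same person one wave earlier.

    The lag is None when the previous wave is not observed for that person.
    """
    lookup = {(r[id_var], r[time_var]): r[var] for r in long_rows}
    out = []
    for r in long_rows:
        new = dict(r)
        new["L_" + var] = lookup.get((r[id_var], r[time_var] - 1))
        out.append(new)
    return out


# Hypothetical long data: person 1 misses wave 3.
rows = [
    {"idpers": 1, "wave": 1, "wage": 30},
    {"idpers": 1, "wave": 2, "wage": 32},
    {"idpers": 1, "wave": 4, "wage": 35},
    {"idpers": 2, "wave": 1, "wage": 50},
]
lagged = add_lag(rows, "wage")
```

In wave 4, person 1 has no lagged wage because wave 3 is missing; taking the previous row instead would silently mix a two-wave difference into the analysis.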
48 Missing data in the SHP
Missing data in the SHP: negative values
  -1     does not know
  -2     no answer
  -3     inapplicable (question has not been asked)
  -8/-4  other missings
Missing data in Stata: . .a .b .c .d etc.
- Negative values are treated as real values
- Missing data (. .a .b etc.) are defined as the highest possible values: . < .a < .b < .c < .d
- Recode negative values to missing, or analyze only non-negative values, e.g. sum i08empyn if i08empyn>=0
- Care with the operator >: e.g. count if i08empyn>1 also counts missing values
- Write <. instead of !=. to exclude all missing values
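Both pitfalls, negative missing codes treated as real values and missing values counted by a greater-than condition, can be reproduced in a few lines. The Python sketch below is only an analogy: the income values are invented, and `float("inf")` stands in for Stata's system missing value, which is larger than any number.

```python
def to_missing(value):
    """Map SHP missing codes (negative values) to None; keep valid answers."""
    return None if value is not None and value < 0 else value


def mean_valid(values):
    """Mean over valid (non-missing) answers only."""
    valid = [v for v in (to_missing(v) for v in values) if v is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical incomes with SHP-style codes -1, -2, -3 mixed in.
income = [52000, -1, 48000, -3, -2]
naive_mean = sum(income) / len(income)   # treats the codes as real values
clean_mean = mean_valid(income)          # uses the two valid answers only

# Stand-in for Stata's "." (larger than every number) to show the > pitfall.
STATA_MISSING = float("inf")
values = [0, 2, STATA_MISSING]
count_gt = sum(v > 1 for v in values)                       # also counts the missing value
count_gt_valid = sum(1 < v < STATA_MISSING for v in values)  # excludes it
```

The naive mean is pulled far below the mean of the valid answers, and the unrestricted greater-than count includes the missing observation.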
49 Longitudinal data analysis with Stata
xt commands:
- descriptive statistics: xtdescribe, xtsum, xttab, xttrans
- regression analysis: xtreg, xtgls, xtlogit, xtpoisson, xtcloglog, xtmixed, xtmelogit, ...
- diagrams: xtline
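A transition table of the kind produced by a command such as xttrans reports, for each state, the share of observations that are found in each state one wave later. The following Python sketch shows the computation on invented data; `transition_shares` and the variable names are hypothetical, and only consecutive waves are used.

```python
from collections import Counter, defaultdict


def transition_shares(long_rows, var, id_var="idpers", time_var="wave"):
    """Row shares of moving from state s in wave t to state s' in wave t+1."""
    state = {(r[id_var], r[time_var]): r[var] for r in long_rows}
    counts = defaultdict(Counter)
    for (pid, t), s in state.items():
        nxt = state.get((pid, t + 1))
        if nxt is not None:
            counts[s][nxt] += 1
    return {
        s: {s2: n / sum(c.values()) for s2, n in c.items()}
        for s, c in counts.items()
    }


# Hypothetical poverty histories for three persons.
histories = {
    1: ["poor", "poor", "not poor"],
    2: ["not poor", "not poor", "not poor"],
    3: ["poor", "not poor"],
}
rows = [
    {"idpers": pid, "wave": w, "status": s}
    for pid, states in histories.items()
    for w, s in enumerate(states, start=1)
]
shares = transition_shares(rows, "status")
```

In this example two of the three transitions that start in poverty end outside poverty, information that repeated cross-sections could not provide.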
50 Descriptive analysis
- Get to know the data
- Usually: findings similar to those of complicated models
- Visualisation
- Results accessible to a wider public
- Assumptions more explicit than in complicated models
51 Example: variability of party preferences
[Figure not preserved]
Kuhn (2009), Swiss Political Science Review 15(3)
52 Example: becoming unemployed
[Figure: happiness with life while employed, in the year before unemployment, and in the 1st and 2nd year of unemployment; Germany (West, East) and Switzerland (German-speaking, French-speaking)]
Oesch and Lipps (2012), European Sociological Review (online first)
53 Example: income mobility
Switzerland
                  Low income 2009   Middle income 2009   High income 2009   Total
Low income                  ... %                4.8 %              3.1 %   100 %
Middle income               ... %               75.8 %              1.8 %   100 %
High income                 ... %               34.4 %             61.1 %   100 %
Germany
                  Low income 2009   Middle income 2009   High income 2009   Total
Low income                  ... %               36.4 %              1.9 %   100 %
Middle income               ... %               78.4 %              9.2 %   100 %
High income                 ... %               29.6 %             67.8 %   100 %
Grabka and Kuhn (2012), Swiss Journal of Sociology 38(2)
54 4 Linear regression (Refresher course)
55 Aim and content
Refresher course on linear regression:
- What is a regression?
- How do we obtain regression coefficients?
- How to interpret regression coefficients?
- Inference from the sample to the population of interest (significance tests)
- Assumptions of linear regression
- Consequences when assumptions are violated
56 What is a regression?
A statistical method for studying the relationship between a single dependent variable and one or more independent variables.
- Y: dependent variable
- X: independent variable(s)
Simplest form: bivariate linear regression, i.e. a linear relationship between a dependent and one independent variable for a given set of observations.
Examples:
- Does the wage level affect the number of hours worked?
- Gender discrimination in wages?
- Do children increase happiness?
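57 Linear regression: fitting a line
[Figure: scatter plot with a fitted line; for an observation $x_i$ it marks the observed value $y_i$, the predicted value $\hat{y}_i$ and the residual $e_i$; the slope is the change in Y per 1-unit change in X]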
58 [Figure: scatter plot of observations; x-axis: number of years spent in paid work; y-axis: yearly income from employment]
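59 [Figure: the same scatter plot with the regression line, intercept a and slope b per 1 unit of x; x-axis: number of years spent in paid work; y-axis: yearly income from employment]
Regression line: $\hat{y}_i = a + b x_i$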
60 [Figure: scatter plot with regression line; x-axis: number of years spent in paid work; y-axis: yearly income from employment]
Regression line: $\hat{y}_i = a + b x_i$
Estimated regression equation: $y_i = a + b x_i + e_i$
61 Components of the (linear) regression equation
Estimated regression equation: $y_i = a + b x_i + e_i$
- y: dependent variable
- x: independent variable(s) (predictor(s), regressor(s))
- a: intercept (predicted value of y if x = 0)
- b: regression coefficient (slope), a measure of the effect of x on y
- e: part of y not explained by x (residual), due to
  - omitted variables
  - measurement errors
  - stochastic shocks, disturbances
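How the intercept, slope and residuals of a bivariate regression are obtained can be shown with the standard least-squares formulas. The Python sketch below uses invented data (years in paid work and income in thousand CHF); the function name `ols_bivariate` is hypothetical.

```python
def ols_bivariate(x, y):
    """Return intercept a, slope b and residuals e of y = a + b*x + e (least squares)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope: covariation of x and y relative to variation of x
    a = mean_y - b * mean_x       # intercept: the line passes through the means
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, e


# Hypothetical data: years spent in paid work and yearly income (thousand CHF).
years = [0, 5, 10, 15, 20]
income = [40, 46, 49, 56, 59]
a, b, residuals = ols_bivariate(years, income)
```

With these numbers each additional year in paid work is associated with 0.96 thousand CHF more income, and the residuals sum to zero, as they always do when the model contains an intercept.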
62 Scales of independent variables
- Continuous variables: linear
- Binary variables (dummy variables) (0, 1), e.g. female = 1, male = 0
- Ordinal or categorical variables (n categories): create n-1 dummy variables (the omitted category is the base category)
  Example: educational levels
    1 low educational level
    2 intermediate educational level
    3 high educational level
  -> include 2 dummy variables in the regression model
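Creating n-1 dummy variables from a variable with n categories can be sketched in a few lines of Python. The function `make_dummies` and the education codes are illustrative assumptions; the base category is the one that receives no dummy.

```python
def make_dummies(values, base):
    """Create n-1 indicator variables for a categorical variable, omitting `base`."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError("base category does not occur in the data")
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
        if category != base
    }


# Hypothetical education codes: 1 = low, 2 = intermediate, 3 = high.
education = [1, 2, 3, 2, 1]
dummies = make_dummies(education, base=1)
```

Each coefficient on a dummy is then interpreted relative to the base category, here the low educational level.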
63 Example: multivariate regression
Including other covariates: regression coefficients represent the portion of y explained by x that is not explained by the other x's.
Example: gender wage gap (sample: full-time employed, yearly salary between 2 and 2 CHF)
Bivariate model: $y_i = a + b x_i + e_i$, i.e. salary = a + b * female + e_i
Multivariate model: $y_i = a + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i} + \dots + e_i$
  Variable                           b
  constant                           45'369
  female                             9'9
  education (ref.: compulsory)
    secondary education              9'197
    tertiary education               3'786
  supervision                        17'128
  financial sector                   15'592
  number of years in paid work       729
64 Assumptions for OLS estimation: coefficients
Assumptions for OLS estimation (necessary to calculate the slope coefficients):
1) No perfect multicollinearity (none of the regressors can be written as a linear function of the other regressors)
2) $E(e) = 0$
3) None of the x is correlated with e: $Cov(x, e) = 0$ (all x's are exogenous)
If assumptions 1-3 hold, OLS is consistent (regression coefficients asymptotically unbiased).
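65 Inference from linear regression I
- Inference from OLS estimation is possible if the sample is random
- But: OLS coefficients are estimates
  - Estimated regression equation: $y_i = a + b x_i + e_i$
  - True regression equation: $y_i = \alpha + \beta x_i + \varepsilon_i$
  - The true coefficients $(\alpha, \beta)$ and the true «error term» are unknown
- Distribution of the coefficients (a, b): $E(b) = \beta$, $Var(b) = \hat{\sigma}_\beta^2$
[Figure: sampling distribution of b, centred on $\beta$ with standard deviation $\sigma_\beta$]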
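66 Inference from linear regression II
$Var(b) = \dfrac{\hat{\sigma}_e^2}{\sum_i (x_i - \bar{x})^2}$, where $\hat{\sigma}_e^2 = \dfrac{\sum_i e_i^2}{n - p}$ (n observations, p estimated parameters)
The variance of b ($\sigma_\beta^2$) decreases if
- n increases
- the x are more spread out
- the squared residuals decrease
Distribution of b:
- Student t-distribution
- Depends on n and on the number of x's
- Approximately normal if n is large
[Figure: sampling distribution of b, centred on $\beta$ with standard deviation $\sigma_\beta$]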
67 Inference from linear regression: testing whether b differs from 0
- If $\beta = 0$ (in the population), there is no relationship between x and y -> test how likely it is that $\beta = 0$
- $H_0$: $\beta = 0$; the distribution of b under $\beta = 0$ gives critical values for the coefficient
- Compare the estimated coefficient with the critical value: if abs(b) > abs(critical value), b is significant
- Standardised coefficient (t-value): $t = b / \hat{\sigma}_b$
- Critical value for the standard normal distribution and a 95% confidence level: 1.96
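The test can be traced step by step: estimate the slope, estimate its standard error from the residuals, and compare the ratio with the critical value. The Python sketch below uses invented data and a hypothetical function name; the large-sample critical value 1.96 is used for illustration, although with so few observations the t-distribution would give a larger critical value.

```python
import math


def slope_inference(x, y, n_params=2):
    """Slope b, its standard error and t-value for y = a + b*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    # Residual variance with n - p degrees of freedom, divided by the spread of x.
    se_b = math.sqrt(ssr / (n - n_params) / sxx)
    return b, se_b, b / se_b


# Hypothetical data: years in paid work and yearly income (thousand CHF).
years = [0, 5, 10, 15, 20]
income = [40, 46, 49, 56, 59]
b, se_b, t_value = slope_inference(years, income)
significant = abs(t_value) > 1.96  # large-sample critical value, 95% level
```

Here the slope is many standard errors away from zero, so the null hypothesis of no relationship would be rejected under either critical value.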
68 Inference from linear regression: example
[Figure: scatter plot with regression line; x-axis: number of years spent in paid work; y-axis: yearly income from employment]
Regression line: $\hat{y}_i = a + b x_i$
69 Inference from linear regression: example
Sample n=53: [Stata output with Coef., Std. Err., t, P>|t| and 95% Conf. Interval for "years work" and _cons; values not preserved]; R²: .11
Sample n=1787: [Stata output with the same columns; values not preserved]; R²: .159
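70 Inference: assumptions
Assumptions on the error terms:
- Independence of error terms, no autocorrelation: $Cov(\varepsilon_i, \varepsilon_k) = 0$ for all $i, k$ with $i \neq k$
- Constant error variance: $Var(\varepsilon_i) = \sigma_\varepsilon^2$ for all $i$ (homoscedasticity)
- Preferably: e is normally distributed
Matrix of error terms: the $n \times n$ matrix with elements $Cov(\varepsilon_i, \varepsilon_k)$, $i, k = 1, \dots, n$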
71 Autocorrelation
- Reason: nested observations (e.g. households, schools, time, communities)
- Consequence: standard errors are underestimated
- Remedy: OLS with adjusted standard errors
[Matrices: without autocorrelation the error covariance matrix has $\sigma^2$ on the diagonal and zeros elsewhere; with autocorrelation off-diagonal elements are non-zero as well]
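72 Heteroskedasticity
- The error variance is not constant
- Consequence: standard errors are overestimated or underestimated
- Remedies: OLS with adjusted standard errors (White standard errors); weighted least squares (WLS)
[Matrices: under homoskedasticity every diagonal element of the error covariance matrix equals $\sigma^2$; under heteroskedasticity the diagonal elements $\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2$ differ]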
73 Summary: assumptions of OLS regression
General:
- Continuous dependent variable
- Random sample
Coefficient estimation (if violated: coefficients biased (inconsistent)):
- No perfect multicollinearity
- E(e) = 0
- No endogeneity, Cov(x, e) = 0; sources of endogeneity:
  - omitted variables
  - measurement error in independent variables
  - simultaneity
  - nonlinearity in parameters
Inference (if violated: standard errors of coefficients biased):
- No autocorrelation: Cov(e_i, e_k) = 0
- Constant variance (no heteroskedasticity)
- Preferably: residuals normally distributed
74 Endogeneity
- Traditional meaning: a variable is determined within a model
- Econometrics: any situation where an explanatory variable is correlated with the residual
- If a variable is endogenous, take care with interpretation: the model cannot be interpreted as causal
75 Endogeneity: reasons
- Omitted variables
- Measurement error (in explanatory variables)
- Simultaneity
- Nonlinearity in parameters
- x contains lagged values of y (see later, dynamic models)
76 Endogeneity: consequences and detection
Consequence of endogeneity:
- ALL estimators may be biased! (Exception: a variable that is completely exogenous when controlling for the endogenous variable)
Detection of endogeneity:
- Difficult to detect and to correct!
- Caution with causal interpretation
- Theory, literature (variable selection and interpretation)!
77 Endogeneity: correction
- Test for nonlinear relationships and interactions
- Omitted variables:
  - proxy variables, instrumental variables (2SLS estimation)
  - panels: FE models (within estimators), difference-in-differences models
  - propensity score analysis, regression discontinuity, Heckman selection models, ...
- Simultaneity:
  - structural equation modelling
  - panel data for time ordering
- Theory, literature (variable selection and interpretation)!
78 Example: control of unobserved heterogeneity
Example: effect of partnership on happiness, happiness = e(partnership)
- We know (e.g. from the literature): happiness = f(attractiveness, leisure activities, health, ...) [not measured!]
- Similarly: partnership = g(attractiveness, leisure activities, health, ...)
Cross-sectional data:
- happiness = e(partnership), but we know that income, fitness, attractiveness, health, ... have positive effects on partnership
  -> which part of the effect is due to attractiveness, leisure activities, health, ...?
Panel data: compare, for the same individuals,
1. happiness at times with a partner
2. happiness at times without a partner
(Care: reversed causality, time-dependent unmeasured effects!)
79 Example: regression analysis using Stata
- Sample: individuals in employment, aged 20 to 60
- Dependent variable: working hours per week (paid work)
- Independent variables: hourly wage, number of children, married, sex, age
Summary statistics:
    sum workhours wage age08 married_nokid onekid twokids threepkids ///
        if workhours>0 & age08>=20 & age08<=60
[Output: Obs, Mean, Std. Dev., Min and Max for workhours, wage, age, married_nokid, onekid, twokids, threepkids; values not preserved]
80 Example: check hourly wages
    graph box wage
[Figures: box plot and histogram (frequency) of gross hourly wage]
81 Example: transform hourly wages
    histogram wage, freq
    histogram lnwage, freq
[Figures: histograms (frequency) of gross hourly wage and of lnwage]
82 Regression output
    reg workhours wagesd wagesq agerec agerecsq married_nokid ///
        onekid twokids threepkids if workhours>0 & age08>20 & age08<=60
[Stata output: ANOVA table and fit statistics (Adj R-squared = .3598); coefficient table with Coef., Std. Err., t, P>|t| and 95% Conf. Interval for female, lnwage, lnwagesq, onekid, twokids, threepkids, married_nokid, agerec and _cons; values not preserved]
83 Diagnostic plot: heteroskedasticity
[Figure: standardised residuals plotted against fitted values]
84 Diagnostic plot: normal distribution of residuals
[Figures: density of the standardised residuals; standardised residuals plotted against normal scores]
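85 Regression with panel data: data structure
- Wide data format: one row per person (idpers, wage04, wage05, wage06, ...)
- Long data format (person-period file, pooled data): one row per person and year (idpers, year, wage)
[Tables with example values not preserved]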
86 OLS with pooled panel data: problems I
- OLS for cross-sectional analysis (one wave): no particular problem!
- OLS for pooled data (different years in one file):
  - Problem: the assumption of independent observations is violated (autocorrelation)
  - Possible correction: correct for clustering in the error terms (coefficients unaffected)
  - But: OLS is not the best estimator for pooled data (not efficient)
[Table: number of working hours per week regressed on lnwage, lnwage squared, female, 1 child, 2 children, 3+ children, married without child, the interactions of female with these family variables, and age; columns b and t for OLS and for OLS with clustered standard errors. The coefficients are identical in both columns, the t-values are smaller with clustered standard errors; most values not preserved]
87 OLS with panel data: problems II
- OLS does not take advantage of the panel structure
- Two different types of variation in panel data:
  - variation within individuals
  - variation between individuals
- Control for unobservable variables (stable personal characteristics):
  - fixed effects models (only within variation)
  - random effects models (multilevel / random intercept / frailty for event history)
88 Comparison of different regression models
[Table: number of working hours per week; columns b and t for OLS, OLS with clustered standard errors, random effects and fixed effects; rows lnwage, lnwage squared, female, 1 child, female*1 child, 2 children, female*2 children, 3+ children, female*3+ children, married without child, female*married without child, age, constant, R squared. The time-constant variable female is dropped in the fixed effects model; most values not preserved]
89 5 Introducing Fixed Effects Models ( within effects)
90 Example: BMI after stopping smoking
Hypothesis: BMI increases after stopping smoking
Hypothetical data: random sample of former smokers, with the years since stopping and the BMI at that time, for 3 individuals
[Data table with columns time, bmi1, bmi2, bmi3; values not preserved]
91 BMI after stopping smoking: pooled OLS
[Figure: BMI plotted against years after stopping smoking, with the fitted values of the pooled regression]
92 Pooled regression
    * pooled regression:
    reg bmi time
[Stata output: F(1, 16) = .26, Prob > F = .6149; coefficient table for time and _cons not preserved]
-> No BMI increase over time
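93 BMI after stopping smoking: individual data
[Figure: BMI plotted against years after stopping smoking for persons P1, P2 and P3, with the individual OLS lines (OLS P1, OLS P2, OLS P3) and the pooled OLS line]
Autocorrelation in the pooled regression -> unobserved individual heterogeneity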
94 Individual regressions
    forval j = 1/3 {        /* loop over each individual */
        reg bmi`j' time
    }
[Stata output: coefficient tables for bmi1, bmi2 and bmi3 with time and _cons; values not preserved]
-> All individuals have a significant BMI increase over time
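The contrast between the pooled regression and the individual regressions can be reproduced with invented numbers. In the Python sketch below (not the slide's data), every person gains 0.3 BMI points per year, but persons observed later start from a lower level; `ols_slope` and the data layout are assumptions made for this illustration.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


# Hypothetical data: (first observed year after stopping, BMI in that year).
first_year_and_bmi = {"P1": (1, 28.0), "P2": (4, 27.25), "P3": (7, 26.5)}
panel = {
    person: [(start + k, bmi0 + 0.3 * k) for k in range(6)]
    for person, (start, bmi0) in first_year_and_bmi.items()
}

# One regression per person, as in the loop over bmi1, bmi2, bmi3.
individual_slopes = {
    person: ols_slope([t for t, _ in obs], [bmi for _, bmi in obs])
    for person, obs in panel.items()
}

# One regression on all 18 observations, ignoring who they belong to.
pooled = [pair for obs in panel.values() for pair in obs]
pooled_slope = ols_slope([t for t, _ in pooled], [bmi for _, bmi in pooled])
```

Each individual slope equals 0.3, yet the pooled slope is slightly negative: the differences in level between persons are correlated with time and mask the increase within persons.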
95 Excursus: unobserved heterogeneity
Omitted variable bias:
- Many individual characteristics are not observed, e.g. enthusiasm, ability, willingness to take risks
- In our example: physical activities, calorie intake, muscle mass, genes, cohort
- These generally have an effect on the dependent variable and are correlated with the independent variables. The regression coefficients will then be biased!
Note: these (formerly) unobserved measures are increasingly included in surveys.
96 What about the between-effect?
[Figure: BMI plotted against years after stopping smoking for persons P1, P2 and P3, with the individual OLS lines and the pooled OLS line]
97 Which models are appropriate to analyze the effects of time?
-> Data transformation necessary
98 Panel data and within regression (FE)
99 Error components in panel data models
We separate the error components: $e_{it} = u_i + \varepsilon_{it}$
- $u_i$ = person-specific unobserved heterogeneity (level) = fixed effects (e.g., physical activities, calorie intake, genetics, cohort)
- $\varepsilon_{it}$ = residual
Model: $bmi_{it} = \beta_1 x_{it} + u_i + \varepsilon_{it}$
Remember: pooled OLS assumes that x is uncorrelated with both error components $u_i$ and $\varepsilon_{it}$ (otherwise: omitted variable bias)
100 Fixed effects regression
We can eliminate the fixed effects $u_i$ by estimating them as person-specific dummies -> only the within variation remains.
This corresponds to demeaning for each individual:
(1) $bmi_{it} = \beta_1 x_{it} + u_i + \varepsilon_{it}$
(2) individual mean: $\overline{bmi}_i = \beta_1 \bar{x}_i + u_i + \bar{\varepsilon}_i$
Subtract (2) from (1): $bmi_{it} - \overline{bmi}_i = \beta_1 (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$
-> The fixed (all time-invariant) effects $u_i$ disappear, i.e. time-constant unobserved heterogeneity is eliminated.
101 Demeaned values with OLS regression
[Figure: demeaned BMI plotted against demeaned years after stopping smoking for persons P1, P2 and P3, with the individual OLS lines]
102 OLS of individually demeaned data
We demean the data and run the regression:
    bysort id: egen bmi_m = mean(bmi)
    gen bmi_dem = bmi - bmi_m
    bysort id: egen time_m = mean(time)
    gen time_dem = time - time_m
    reg bmi_dem time_dem
[Stata output: F(1, 16), Adj R-squared = .8440; coefficient table for time_dem and _cons not preserved; the constant is numerically zero]
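The demeaning steps translate directly into code. The following Python sketch uses invented BMI values and hypothetical helper names; it also shows why OLS on demeaned data reports standard errors that are too small, because it assumes n - 2 degrees of freedom instead of also accounting for the estimated person means.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


def demean(values):
    """Subtract the person-specific mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical long-format data: person -> list of (time, bmi).
panel = {
    "P1": list(zip(range(1, 7), [28.0, 28.4, 28.5, 29.0, 29.1, 29.6])),
    "P2": list(zip(range(4, 10), [27.3, 27.5, 27.9, 28.1, 28.6, 28.7])),
    "P3": list(zip(range(7, 13), [26.5, 26.9, 27.0, 27.5, 27.6, 28.1])),
}

time_dem, bmi_dem = [], []
for obs in panel.values():
    time_dem += demean([t for t, _ in obs])
    bmi_dem += demean([bmi for _, bmi in obs])

fe_slope = ols_slope(time_dem, bmi_dem)  # within (fixed effects) estimate

n_obs, n_groups = len(time_dem), len(panel)
df_naive = n_obs - 2            # what OLS on the demeaned data assumes
df_fe = n_obs - n_groups - 1    # also counts the estimated person means
se_inflation = (df_naive / df_fe) ** 0.5  # factor by which the naive Std. Err. is too small
```

With 18 observations and 3 persons the degrees of freedom drop from 16 to 14, which matches the difference between the demeaned regression and a dedicated fixed effects routine.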
103 Direct modeling of fixed effects in Stata
xtreg with the fe option calculates the correct degrees of freedom; this causes higher standard errors:
    xtreg bmi time, fe
[Stata output, fixed-effects (within) regression: Number of obs = 18; group variable id; Number of groups = 3; obs per group: min = 6, avg = 6, max = 6; R-sq: within = .8532, between = .992, overall = .0162; F(1, 14); F test that all u_i = 0: F(2, 14); coefficient table for time and _cons and the values of sigma_u, sigma_e and rho (fraction of variance due to u_i) not preserved]
Notes on the output:
- between R-sq: time since stopping smoking explains part of the individual heterogeneity!
- overall R-sq: as in the pooled OLS
104 Alternative: OLS with individual dummies controlled
. xi i.id, noomit
. reg bmi time _I*, noconst
[Stata regression output; surviving values: F(4, 14); R-squared = .9996 (!); Adj R-squared = .9994; coefficient rows time and three _Iid_ person dummies without values]
Useful for small N; the $u_i$ are estimated (only approximately)
105 Summary: Fixed Effects Estimation
FE estimation can solve the problem of unobserved heterogeneity
But:
 - If the number of groups is large, there are many extra parameters
 - Enough variance is needed in the data
 - With FE regressions, effects of time-constant covariates cannot be estimated; they are dropped from the model. But: interactions can be used (like male*nrchildren)
What about comparing with people who never stopped smoking or who never smoked? (later)
106 No identification of time-invariant covariates $z_i$
Consider the model: $y_{it} = a z_i + b x_{it} + u_i + \varepsilon_{it}$ (1)
Let $\lambda$ be an arbitrary number; add and subtract $\lambda z_i$ on the rhs:
$y_{it} = (a z_i + \lambda z_i) + b x_{it} + (u_i - \lambda z_i) + \varepsilon_{it}$
and rewrite this as: $y_{it} = a^* z_i + b x_{it} + u_i^* + \varepsilon_{it}$ with $a^* = a + \lambda$ and $u_i^* = u_i - \lambda z_i$ (2)
But (1) and (2) have exactly the same form, so it is not clear whether $a$ or $a^* = a + \lambda$ is estimated
> the separate effects of $a z_i$ and $u_i$ cannot be distinguished without further assumptions (e.g., no correlation between $z_i$ and $u_i$)
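A small numeric sketch of the same point, with invented values: shifting the coefficient of a time-constant covariate by an arbitrary amount (called `lam` here, a hypothetical name) and absorbing the shift into the person effect leaves every observation unchanged, and demeaning such a covariate leaves no within variation to estimate from.

```python
# Hypothetical time-constant covariate z_i (e.g., sex) and person effects u_i.
z = {1: 1, 2: 0, 3: 1}
u = {1: 0.3, 2: -0.1, 3: 0.5}
a, lam = 2.0, 0.7   # coefficient of z and an arbitrary shift (both invented)

# The two parameterizations produce the same person-level contribution:
#   a * z_i + u_i   versus   (a + lam) * z_i + (u_i - lam * z_i)
same_fit = all(
    abs((a * z[i] + u[i]) - ((a + lam) * z[i] + (u[i] - lam * z[i]))) < 1e-12
    for i in z
)

# Demeaning a time-constant covariate over 3 waves removes all its variation.
def demean(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

z_dem = [d for i in z for d in demean([z[i]] * 3)]
within_variation = sum(d ** 2 for d in z_dem)

print(same_fit, within_variation)
```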
107 Example: DiD (control group comparison)
Hypothesis: financial support increases the BMI of low-income women (Schmeisser 2009)*
(Hypothetical) experiment:
Survey: sample of low-income female patients of doctors' surgeries, randomized into the program "social aid" (50% in the program and 50% not)
4 measurements of BMI (2 x before program start, 2 x after)
*Expanding wallets and waistlines: the impact of family income on the BMI of women and men eligible for the earned income tax credit. Health Econ. 18:
108 Data: BMI increase through social aid?
[Figure: BMI of low-income women over time for the "aid" and "no aid" groups, with the start of the social aid program marked]
Effects:
 - causal effect: BMI increase due to more fast food (more money available)
 - time (age) effect: BMI increases with age
109 Pooled regression
. reg bmi aid
[Stata regression output; surviving values: F(1, 14) = .89; Prob > F = .3627; Adj R-squared = -0.0077; coefficient rows aid and _cons without values]
110 Fixed effects: xtreg bmi aid, fe
. xtreg bmi aid, fe
Fixed-effects (within) regression: Number of obs = 16; Group variable: id; Number of groups = 4; Obs per group: min = 4, avg = 4.0, max = 4
R-sq: within = .3596; between = .54; overall = .0595
F(1,11) = 6.18; corr(u_i, Xb) = .74
[Remaining output without surviving values: Prob > F, coefficient rows aid and _cons, sigma_u, sigma_e, rho (fraction of variance due to u_i), F test that all u_i=0: F(3, 11)]
111 One step back: a causal model
(T = treatment, C = control)
With cross-sectional data: only between-estimation: $Y^{T}_{i,t_0} - Y^{C}_{j,t_0}$
Crucial assumption: random sample (no unobserved heterogeneity)
With panel data I: within-estimation (before and after): $Y^{T}_{i,t_1} - Y^{C}_{i,t_0}$
Problem: time effects, panel conditioning
With panel data II: difference-in-difference (DID): $(Y^{T}_{i,t_1} - Y^{C}_{i,t_0}) - (Y^{C}_{j,t_1} - Y^{C}_{j,t_0})$
112 Now: causal effect of aid
We have
 - a before-after comparison (within)
 - treatment and control groups (between)
We compare the within-effect of aid ("treatment") with that without aid ("control"), i.e., we calculate the treatment effect and control for time
DID estimator: $(\text{after}_{\text{aid}} - \text{before}_{\text{aid}}) - (\text{after}_{\text{no aid}} - \text{before}_{\text{no aid}})$
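As a minimal sketch of the DID arithmetic: the group means below are invented for illustration (they are not the values from the BMI example), and the function name `did` is hypothetical.

```python
# Hypothetical mean BMI by group and period (invented values).
mean_bmi = {
    ("aid", "before"): 24.0,
    ("aid", "after"): 26.0,
    ("no_aid", "before"): 23.0,
    ("no_aid", "after"): 23.5,
}

def did(means):
    """Difference-in-differences: change in the treated group minus
    change in the control group."""
    change_treated = means[("aid", "after")] - means[("aid", "before")]
    change_control = means[("no_aid", "after")] - means[("no_aid", "before")]
    return change_treated - change_control

effect = did(mean_bmi)
print("DID estimate:", effect)
```

The change in the control group (here 0.5) stands in for the pure time effect; subtracting it from the change in the treated group (here 2.0) leaves the part attributed to the treatment, provided both groups would otherwise have followed the same trend.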
113 DiD effects
. xi i.time, noomit
. xtreg bmi aid _I*, fe
note: _Itime_4 omitted because of collinearity
Fixed-effects (within) regression: Number of obs = 16; Group variable: id; Number of groups = 4; Obs per group: min = 4, avg = 4.0, max = 4
R-sq: within = .8876; between = .54; overall = .1528
F(4,8) = 15.8; corr(u_i, Xb) = .55
[Remaining output without surviving values: Prob > F, coefficient rows aid, the _Itime_ dummies (_Itime_4 omitted) and _cons, sigma_u, sigma_e, rho (fraction of variance due to u_i), F test that all u_i=0: F(3, 8)]
114 Summary: within estimators
FE: easier, no control group
DID: the control group can control for simultaneous effects (like time): find a statistical twin, such that the research variable is the only variable that differs
> DID is useful for estimating causal effects from non-experimental data, especially for small samples
Excursus: first-difference (FD) estimators:
 - Stable similarities of adjacent observations are eliminated
 - Problem: level differences are not taken into account (a change from 6 to 5 children counts like a change from 1 to 0)
 - For lasting effects (like children), FD is not useful because only immediate changes are taken into account
115 6 Introducing Random Effects Models
116 Towards RE: error components in panel models
We have both within and between variance:
$y_{it} = a + e_{it} = a + u_i + \varepsilon_{it}$
fixed effects = within: $u_i$ for each person (ANOVA)
Two residuals:
 - $\varepsilon_{it}$ on the lowest (1st) level: time points
 - $u_i$ on the highest (2nd) level: individuals
117 Motivation: multilevel models
Necessary if the data have different levels with
 - observations that are not independent of the levels
 - true social interactions
Examples:
 - Schools, classes, students: first applications
 - Networks: people are influenced by their peers
 - Spatial context: influence from the environment (e.g., poor people are less happy if they live in a rich environment); US: neighborhood effects
 - Interviewer effects: respondents clustered in interviewers
 - Panel surveys: waves clustered in respondents (households)
118 Levels in clustered data
Hierarchical
 - Households in neighborhoods
 - Students in classes in schools (three levels)
 - Respondents in interviewers
 - Panel surveys: waves in respondents (crossed?)
Crossed
 - Questions in respondents (longitudinal?)
Attention: do not confuse variables and levels: (total) variance can be attributed to levels, not to variables!
E.g., 100 hospitals are probably a level, 7 nations are probably dummy variables. Think of the population the sample represents!
Note: "30 rule of thumb" in contexts: 30 second-level units, randomly chosen
119 Multilevel models: analytic advantages
Improved regression models
 - unbiased estimators for regression coefficients
 - unbiased estimators for standard errors (usually higher std. err. than OLS)
 - model the true covariance structure (autocorrelation, heteroscedasticity, ...)
Decomposition of the total variance into the variances at the different levels (within-between)
Similar to analysis of variance (ANOVA), but parsimonious (the number of estimated parameters is independent of the number of contexts)
 - can handle a large number of contexts
 - (in parts) modeling of unobserved heterogeneity / self-selection
120 Typical results: one-level vs. multilevel model
[Table: dependent variable by school type (mixed school, boy school, girl school), one-level vs. multilevel estimates]
Underestimated variance: Kish design effect "deff": larger N necessary
121 Illustration: variance decomposition between levels
Example: 2 individuals, each asked 2x about their happiness (continuous); here the measurements are not time-ordered!
Which variance is due to individuals, which to observations?
[Figure: happiness values of Indiv. 1 (3, 2) and Indiv. 2 (-1, -4) around the total ("grand") mean (= 0)]
Total mean = (3 + 2 + (-1) + (-4)) / 4 = 0
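122 Illustration: calculation of the total variance
$T_{yy} = \sum_{i=1}^{n} \sum_{t=1}^{T_i} (y_{it} - \bar{y})^2$ (total)
$W_{yy} = \sum_{i=1}^{n} \sum_{t=1}^{T_i} (y_{it} - \bar{y}_i)^2$ (within)
$B_{yy} = \sum_{i=1}^{n} \sum_{t=1}^{T_i} (\bar{y}_i - \bar{y})^2$ (between)
[Figure: happiness of Indiv. 1 and Indiv. 2 around the total mean (= 0)]
The total variance is the sum of the squared differences of all observations from the total mean, divided by the sample size (4):
$\{3^2 + 2^2 + (-1)^2 + (-4)^2\} / 4 = (9 + 4 + 1 + 16) / 4 = 7.5$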
123 Illustration: individual-specific variance ("between")
$B_{yy} = \sum_{i=1}^{n} \sum_{t=1}^{T_i} (\bar{y}_i - \bar{y})^2$
[Figure: individual means of Indiv. 1 (2.5) and Indiv. 2 (-2.5) around the total mean (= 0)]
Variance of the individual means = between variance ($\sigma^2_{between} = \sigma^2_u$):
$(2.5^2 + (-2.5)^2) / 2 = 6.25$ (remember: total variance = 7.5)
124 Illustration: measurement-specific variance ("within")
$W_{yy} = \sum_{i=1}^{n} \sum_{t=1}^{T_i} (y_{it} - \bar{y}_i)^2$
[Figure: happiness measurements of Indiv. 1 and Indiv. 2 around their individual means (2.5 and -2.5)]
Variance of the measurements within individuals = within variance (later: $\sigma^2_{within} = \sigma^2_\varepsilon$):
$((3 - 2.5)^2 + (2 - 2.5)^2 + (-1 - (-2.5))^2 + (-4 - (-2.5))^2) / 4 = 1.25$ (= 17% of the total variance)
Variance of the individual means = $(2.5^2 + (-2.5)^2) / 2 = 6.25$ (= 83% of the total variance = $\rho$ = ICC (intra-class correlation))
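The decomposition in this illustration can be checked with a short sketch. It uses the four happiness values of the example (3 and 2 for individual 1, -1 and -4 for individual 2); the variable names are chosen here for illustration only.

```python
# Happiness measurements from the illustration: 2 individuals, 2 measurements each.
happiness = {1: [3, 2], 2: [-1, -4]}

all_obs = [v for values in happiness.values() for v in values]
n_obs = len(all_obs)
grand_mean = sum(all_obs) / n_obs
person_mean = {i: sum(values) / len(values) for i, values in happiness.items()}

# Total variance: squared deviations from the grand mean, divided by n_obs.
total = sum((v - grand_mean) ** 2 for v in all_obs) / n_obs

# Within variance: squared deviations from each person's own mean.
within = sum(
    (v - person_mean[i]) ** 2 for i, values in happiness.items() for v in values
) / n_obs

# Between variance: squared deviations of the person means from the grand mean,
# counted once per observation.
between = sum(
    len(values) * (person_mean[i] - grand_mean) ** 2
    for i, values in happiness.items()
) / n_obs

icc = between / total  # share of the total variance located between persons

print(total, within, between, icc)
```

Total variance splits exactly into the within and the between part (7.5 = 1.25 + 6.25), and the between share 6.25 / 7.5 is the intra-class correlation.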
125 Examples of ICC values
[Figure with four panels: ICC = 0 (zero clustering > use OLS); ICC .8; ICC .2; ICC = 1 (maximum clustering)]
126 Starting point: null ("Variance Components" (VC)) model
$y_{it} = u_i + \varepsilon_{it}$
where:
 - $u_i$: individual-specific random variable ($N(0, \sigma_u^2)$ assumed)
 - $\varepsilon_{it}$: deviation from the individual-specific mean ($N(0, \sigma_\varepsilon^2)$ assumed)
(note: no intercept a in the VC model)
The VC model allows for variance decomposition:
$\rho$ = correlation between different time points t within an individual i:
$\rho = \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}$ (= ICC = intra-class correlation = autocorrelation in panels)
(note: $\rho$ significant > multilevel model necessary)
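Why the variance share equals the correlation between two time points of the same individual can be written out in one step. The sketch assumes, beyond the normality stated for the VC model, that $u_i$ and $\varepsilon_{it}$ are independent and that the $\varepsilon_{it}$ are uncorrelated over time; $s \neq t$ denotes a second time point.

```latex
\begin{aligned}
\operatorname{Cov}(y_{it}, y_{is})
  &= \operatorname{Cov}(u_i + \varepsilon_{it},\; u_i + \varepsilon_{is})
   = \operatorname{Var}(u_i) = \sigma_u^2, \qquad s \neq t \\
\operatorname{Var}(y_{it})
  &= \sigma_u^2 + \sigma_\varepsilon^2 \\
\rho = \operatorname{Corr}(y_{it}, y_{is})
  &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
\end{aligned}
```

With the values of the happiness illustration (between variance 6.25, within variance 1.25), this gives $6.25 / 7.5 \approx 0.83$.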
127 Idea RE: better estimate of $u_i$ (modeling the intercept)
Random intercept ("borrowing strength" from others)
[Figure: random intercepts u around a for a high-education and a low-education group]
To estimate the mean of individual i (= $u_i$), using only within information (FE) is suboptimal if
 - the sample is small ($T_i$ small)
 - the variance is high
 - n is large (inefficient)
Idea: use information $u_j$ from other sample members (between) in the same population group (e.g., education)
128 Idea RE: weighted within and between
RE regression is equivalent to pooled OLS after the transformation:
$(y_{it} - \theta \bar{y}_i) = \beta_0 (1 - \theta) + \beta_1 (x_{it} - \theta \bar{x}_i) + (u_i (1 - \theta) + (\varepsilon_{it} - \theta \bar{\varepsilon}_i))$
with $\theta = 1 - \sqrt{\dfrac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_u^2}}$, $0 \le \theta \le 1$
 - RE uses the optimal combination of within and between variation
 - RE allows estimation of time-invariant variables
 - RE is biased because $u_i$ remains in the error term (if $cov(x_i, u_i) \neq 0$)
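A minimal sketch of how the weight theta behaves, using the standard random-effects formula theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_u^2)). The function name `re_theta` and all variance values are assumptions chosen for illustration.

```python
import math

def re_theta(sigma_e2, sigma_u2, t):
    """Quasi-demeaning weight of the random-effects transformation.

    sigma_e2: residual (within) variance, sigma_u2: variance of the
    person effects, t: number of waves per person.
    """
    return 1.0 - math.sqrt(sigma_e2 / (sigma_e2 + t * sigma_u2))

# No unobserved heterogeneity: theta = 0, the transformation changes nothing
# and RE collapses to pooled OLS.
theta_no_heterogeneity = re_theta(1.0, 0.0, 6)

# Invented example: within variance 1.25, between variance 6.25, 2 waves.
theta_example = re_theta(1.25, 6.25, 2)

# Very long panel: theta approaches 1, i.e. full demeaning as in FE.
theta_long_panel = re_theta(1.0, 1.0, 10_000)

def quasi_demean(values, theta):
    """Subtract theta times the person mean from each observation."""
    m = sum(values) / len(values)
    return [v - theta * m for v in values]

print(theta_no_heterogeneity, theta_example, theta_long_panel)
print(quasi_demean([3, 2], theta_example))
```

The two limits show why RE sits between the other estimators: theta = 0 reproduces pooled OLS, theta = 1 reproduces the within (FE) transformation, and intermediate values remove only part of each person's mean.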
129 Example ($\rho$ based on SHP)
. xtreg bmi children, re theta
Random-effects GLS regression: Number of obs = 18; Group variable: id; Number of groups = 3; Obs per group: min = 6, avg = 6.0, max = 6
R-sq: within = .3921; between = .1125; overall = .24
Wald chi2(1) = 9.76; corr(u_i, X) = 0 (assumed); Prob > chi2 = 0.0018
[Remaining output without surviving values: theta, coefficient rows children and _cons, sigma_u, sigma_e, rho (fraction of variance due to u_i)]
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3 Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationPanel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS15/16
1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS15/16 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationLecture 13. Use and Interpretation of Dummy Variables. Stop worrying for 1 lecture and learn to appreciate the uses that dummy variables can be put to
Lecture 13. Use and Interpretation of Dummy Variables Stop worrying for 1 lecture and learn to appreciate the uses that dummy variables can be put to Using dummy variables to measure average differences
More informationMarginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015
Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,
More informationSimple Linear Regression One Binary Categorical Independent Variable
Simple Linear Regression Does sex influence mean GCSE score? In order to answer the question posed above, we want to run a linear regression of sgcseptsnew against sgender, which is a binary categorical
More informationFailure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.
Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares
Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit
More informationMultiple Regression. Motivations. Multiple Regression
Motivations Multiple Regression to make the predictions of the model more precise by adding other factors believed to affect the dependent variable to reduce the proportion of error variance associated
More informationWhen to Use Which Statistical Test
When to Use Which Statistical Test Rachel Lovell, Ph.D., Senior Research Associate Begun Center for Violence Prevention Research and Education Jack, Joseph, and Morton Mandel School of Applied Social Sciences
More informationChapter 4: Vector Autoregressive Models
Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...
More informationwhere b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.
Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes
More informationEcon 371 Problem Set #4 Answer Sheet. P rice = (0.485)BDR + (23.4)Bath + (0.156)Hsize + (0.002)LSize + (0.090)Age (48.
Econ 371 Problem Set #4 Answer Sheet 6.5 This question focuses on what s called a hedonic regression model; i.e., where the sales price of the home is regressed on the various attributes of the home. The
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationModerator and Mediator Analysis
Moderator and Mediator Analysis Seminar General Statistics Marijtje van Duijn October 8, Overview What is moderation and mediation? What is their relation to statistical concepts? Example(s) October 8,
More informationModule 3: Multiple Regression Concepts
Contents Module 3: Multiple Regression Concepts Fiona Steele 1 Centre for Multilevel Modelling...4 What is Multiple Regression?... 4 Motivation... 4 Conditioning... 4 Data for multiple regression analysis...
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More information