Introduction to Panel Data Analysis

Size: px
Start display at page:

Download "Introduction to Panel Data Analysis"

Transcription

1 Introduction to Panel Data Analysis Oliver Lipps / Ursina Kuhn Swiss Centre of Expertise in the Social Sciences (FORS) c/o University of Lausanne Lugano Summer School, August

2 Introduction panel data, data management 1 Introducing panel data (OL) 2 The SHP (OL) 3 Data Management with Stata (UK) Regressions with panel data: basic 4 Regression refresher (UK) 5 Fixed effects (FE) models (OL) 6 Introducing random effects (RE) models (OL) 7 Nonlinear regression (UK) Start: 8.3 Breaks: Lunch Breaks: End: 17.3 Additional topics 8 Level 1 and 2 growth models (OL) 9 Missing data (OL) 1 Dynamic models (UK)

3 1 Introducing panel data

4 Surveys over time: repeated cross-sections vs. panels Cross-Section: Survey conducted at several points in time ( rounds ) using different sample members in each round Panel: Survey conducted at several points in time ( waves ) using the same individuals over waves -> panel data mostly from panel surveys -> If from cross-sectional surveys: retrospective ( biographical ) questionnaire 1-4

5 Panel Surveys: to distinguish Length and sample size: Time Series: N small (mostly=1), T large (T ) time series models Panel Surveys: N large, T small (N ) social science panel surveys Sample General population: - rotating: only few (pre-defined number) waves per individual (in CH: SILC, LFS) - indefinitely long (in CH: SHP) Special population: - e.g., age/birth cohorts (in CH e.g.: TREE, SHARE, COCON) representative for population of special agegroup / birthyears 1-5

6 Panel surveys increasingly important Changing focus in social sciences Life course research: effects of events within individuals Large investments in social science household panels surveys, high data quality! Concern about causality in cross-sectional studies Analysis potential of panel data - close to experimental design: before and after studies - control unobserved individual characteristics (exogenous independent variables; implicit in regression analyses) 1-6

7 Identification of individual dynamics (poverty in SHP) 1% 96% 4% 89.5% 91.% 92.3% 91.2% 9.6% 5% 52% 49% 46% % 5% 48% 51% 54% 1.5% 9.% 7.7% 8.8% 9.4% poor not poor -> individual dynamics can only be measured with panel data! 1-7

8 Identification of age, time, and (birth) cohort effects Fundamental relationship: a it = t - c i Effects from formative years (childhood, youth) -> cohort effect (eg taste in music ) Time may affect behavior -> time effect (eg computer performance) Behavior may change with age -> age effect (eg health) In a cross-section, t is constant age and cohort collinear (only joint effect estimable) In a cohort study, cohort is constant age and time collinear (only joint effect estimable) In a panel, t varies, but A it, t, and c i collinear. only two of the three effects can be estimated we can use (t,c i ), (A it,c i ), or (A it,t), but not all three 1-8

9 Age, time, cohort effects: interpretation (no cohort effect) (aging effect) (age effect) (no aging effect) (no age effect) (aging and cohort effect) 1-9

10 Problems of panels Fieldwork / data quality related High costs (panel care, tracking households, incentives) Initial nonresponse (wave 1) and attrition (=drop-out of panel after wave 1) Finally: you design a panel for the next generation Modeling related Sometimes strong assumptions for applicability of appropriate models necessary (later) 1-1

11 Advantages of panel data Allow to study individual dynamics + Control for unobserved characteristics between individuals by repeated observation + Higher precision of change + Data can be pooled 1-11

12 2 Introducing the Swiss Household Panel (SHP)

13 Swiss Household Panel: history Primary goal: observe social change and changing life conditions in Switzerland Started as a common project of the Swiss National Science Foundation, Swiss Federal Statistical Office, University of Neuchâtel First wave in 1999, more than 5, households Refreshment sample in 24, more than 2,5 households, several new questions Since 28, integrated into FORS (Swiss Centre of Expertise in the Social Sciences ), c/o University of Lausanne 2-2

14 Disciplines working with SHP data 2-3

15 SHP sample and methods Representative of the Swiss residential population Each individual surveyed every year (Sept.-Jan.) All household members from 14 years on surveyed (proxy questionnaire if child or unable) Telephone interviews (central CATI), languages D/F/I Metadata: biographic, interviewers Paradata: call data (from address management) Following rules: OSM followed if moving, from 27 on all individuals All new household entrants surveyed 2-4

16 SHP sample size (households) and attrition Befragte Haushalte SHPI SHP II Cross-sectional and longitudinal (SHPI and SHP I+II combined) weights to account for nonresponse and attrition 2-5

17 SHP: Survey process and questionnaires Grid Questionnaire: Inventory and characteristics of hh-members Persons 18+ years «reference person» Persons 14+ years Persons 13- years + «unable to respond» Household Questionnaire: housing, finances, family roles, Individual Questionnaires: work, income, health, politics, leisure, satisfaction of life Individual Proxy Questionnaires: school, work, income, health, 2-6

18 SHP: Questionnaire Content Social structure: socio-demography, socio-ecomomy, work, education social stratification and social mobility Life events: marriages, births, deaths, deceases, accidents, conflicts with close persons, etc. life course Social participation: politics (attitudes, elections, party preferences and -choice), culture, social network, leisure social integration, political attitude and behavior Perception and values: trust, confidence, gender values and social capital Satisfaction and health: physical and mental health selfevaluation, chronic problems, different satisfaction issues quality of life 2-7

19 SHP Household: composition and housing «objective» elements Characteristics of household members (sex, age, civil status, education, occupation) Relationships between all household members «subjective» elements Satisfaction with house, noise, pollution, etc. Assessment of state of house Since when at this place Type of house Size, number of rooms of house State of house (heating, noise, pollution, etc.) Costs and subsidy, etc. External help for domestic work Child care Division of labor Who takes decisions, etc. 2-8

20 SHP Household: standard of living «objective» elements Activities and (durable) goods Reason why absence of goods (financial, other) «subjective» elements Satisfaction with financial situation Financial difficulties Debts (+reason) Total household income Taxes Social and private financial transfers 2-9

21 SHP Individual level: family «objective» elements Children out of the house Division of housework, care for dependents Disagreement about family problems etc. «subjective» elements Satisfaction with private situation Satisfaction with living alone or together Satisfaction with division of housework SHP Individual level: health, well-being «objective» elements Health problems Physical activities Doctor visits, hospitalization Improvement of health Long and short term handicaps «subjective» elements Subjective health state Satisfaction with health Satisfaction with life in general 2-1

22 Profession of parents Level of education of parents Nationality of parents Financial problems in childhood Social origin and education Education level Current training Language capabilities Leisure «objective» elements Activities: holidays, invitation of and meeting friends, reading, Internet use, restaurant, etc. «subjective» elements Satisfaction with leisure time and leisure activities Satisfaction with work-life balance 2-11

23 Individual level work «objective» elements Job sector Social stratification Private or public Position Working time, commuting time Size of company «subjective» elements Satisfaction with work (general, income, interests, working conditions, amount, atmosphere) Risk of unemployment Job security Chances to get promoted Income «objective» elements Total personal income Total personal income from work Social transfers received Private transfers received Other income «subjective» elements Satisfaction with financial situation Assessment whether financial situation improved or not Possible reactions on financial problems 2-12

24 Values and politics «objective» elements Right to vote Political activities Member of a political party «subjective» elements Satisfaction with democracy Confidence in federal government Political interest Left-right political positioning Opinions on political questions Participation, integration, social network «objective» elements Frequency of contacts Voluntary work outside the household Participation and membership in associations Belief and religious participation «subjective» elements Satisfaction with personal relationships Assessment of amount of practical help received from partner, parents, friends, etc. Assessment of amount of emotional help received from partner, parents, friends, etc. General trust in people 2-13

25 Biographical (retrospective) questionnaire N = 5 56 Written questionnaire in 21/22, sent to all individuals surveyed in 2, aged 14 or over Questions since birth about family, education, and professional biography: - with whom lived together - periods out of Switzerland - changes of civil status - learned professions - education - professional and non-professional biography - family life events (divorce-re-marriage of parents) 2-14

26 International Context SHP is part of the Cross National Equivalent File (CNEF): General population panel surveys from: USA (PSID since 198) D (SOEP, since 1984) UK (BHPS since 1991) Canada (SLID since 1993) CH (SHP since 1999) Australia (HILDA since 21) Korea (KLIPS since 27) More countries will join (Russia, South Africa, ) Each panel includes subset of all variables (variables from original files can be merged) Variables ex-post harmonized, names, categories Missing income variables are imputed Frick, Jenkins, Lillard, Lipps and Wooden (27): The Cross-National Equivalent File (CNEF) and its member country household panel studies. Journal of Applied Social Science Studies (Schmollers Jahrbuch) 2-15

27 SHP Questionnaire: Rotation Module Social network X X X X Religion X X X Social participation X X X X Politics X X X X Leisure X X X X Psychological Scales X X X 2-16

28 Outlook: new sample (LIVES) SHP III (213, based on individual register) Biographic questionnaire in 1. Wave SHP III (with NCCR LIVES) NCCR LIVES (Precarious) life course University of Lausanne and Geneva 15 research projects, 12 years Use of SHP 2-17

29 SHP structure of the data 2 yearly files (currently available: (+beta 211)) household Individual 5 unique files master person (mp) master household (mh) social origin (so) last job (lj) activity (employment) calendar (ca) Complementary files biographical questionnaire Interviewer data (2, and yearly since 23) Call data (since 25) CNEF SHP data variables 2-18

30 Documentation (Website: D/E/F) e.g.,: User Guide Questionnaires Variable Search (by variable name and topic) Construction of variables Syntax examples - Merge data files with SPSS, SAS, Stata - Documentation Data Management with SHP 2-19

31 SHP data delivery Data ready about 1 year after end of fieldwork downloadable from SHP-server: Signed contract with FORS Upon contract receipt, login and password sent by Data free of charge Users become member of SHP scientific network and document all publications based on SHP data Data on request: Imputed income Call data Interviewer matching ID Context data (special contract); data is matched at FORS More info: 2-2

32 3 Stata and panel data

33 Why Stata? Capabilities Data management Broad range of statistics Powerful for panel data! Many commands ready for analysis User-written extensions Beginners and experienced users For beginners: analysis through menus (point and click) Advanced users: good programmable capacities 3_2

34 Starting with Stata Basics Look at the data, variables Descriptive statistics Regression analysis Handout Stata basics Working with panel data Merge Creating «long files» Working with the long file Add information from other household members Handout data management of SHP with Stata (includes Syntax examples, exercises) 3_3

35 1. Merge: _merge variable Master file idpers p7c idpers p8c using file idpers p7c44 p8c _merge Merge variable 1 only in master file 2 only in user file 3 in both files 3_4

36 Merge: identifier Master file idpers p7c idpers p8c using file idpers p7c44 p8c _merge _5

37 Merge files: identifiers filename identifiers Individual master file shp_mp idpers, idhous$$, idfath, idmoth Individual annual files shp$$_p_user idpers, idint, idhous$$, idspou, refper$$ Additional ind. files (Social origin, last job, calendar, biographic) shp_so, shp_lj shp_ca, shp_* idpers Interviewer data shp$$_v_user idint Household annual files shp$$_h_user Biographic files idhous$$, refpers, idint, canton$$, (gdenr) idpers CNEF files shpequiv_$$$$ x1111ll (=idpers) 3_6

38 Stata merge command The merge command merge [type] [varlist] using filename [filename...] [, options] varlist filename identifier(s), e.g. idpers data set to be merged type 1:1 each observation has a unique identifier in both data sets 1:m, m:1 some observations have the same identifier in one data set 3_7

39 2 annual individual files Basic merge example I use shp8_p_user, clear merge 1:1 idpers using shp_p_user _merge Freq. Percent Cum , , , Total 16, _8

40 Basic merge example II annual individual file and individual master file use shp8_p_user, clear // opens the file (master) count //there are cases merge 1:1 idpers using shp_mp //identif. & using file tab _merge _merge Freq. Percent Cum , , Total 22, 1. drop if _merge==2 //if only ind. from 28 wanted drop _merge 3_9

41 Basic merge example III annual individual file and annual household file use shp8_p_user, clear //master file merge m:1 idhous8 using shp8_h_user /*identifier & using file */ _merge Freq. Percent Cum , Total 1, _1

42 More on merge Options of merge command keepusing (varlist): selection of variables from using file keep: selection of observations from master and/or using file for more options: type help merge Merge many files loops (see handout) Create partner files (see handout) 3_11

43 2. Wide and long format Wide format idpers i4empyn i5empyn i6empyn i7empyn Long format (person-period-file) idpers year iempyn _12

44 Use of long data format in stata All panel applications: xt commands descriptives panel data models fixed effects models, random effects, multilevel discrete time event-history analysis declare panel structure panel identifier, time identifier xtset idpers wave 3_13

45 Convert wide form to long form reshape long command in stata reshape long varlist, i(idpers) j(wave) But: stata does not automatically detect years in varname reshape long /// i (idpers) /// j(wave "99" "" "1" "2" "3" "4" /// "5" "6" "7" "8" ),atwl () 3_14

46 Create a long file with append 1. Modify datasets for each wave idpers i99wyn idpers wave iwyn temp1.dta 2. Stack data sets use temp1, clear forval y = 2/1 { append using temp`y' } idpers i99wyn idpers wave iwyn temp2.dta 3_15

47 Work with time lags If data in long format and defined as panel data (xtset) l. indicates time lag Example: social class of last job (see handout) 3_16

48 Missing data in the SHP Missing data in the SHP: negative values -1 does not know -2 no answer -3 inapplicable (question has not been asked) -8/-4 other missings Missing data in Stata:..a.b.c.d etc negative values are treated as real values missing data (..a.b etc) are defined as the highest possible values;. <.a <.b <.c <.d recode to missing or analyses only positive values e.g. sum i8empyn if i8empyn>= care with operator > e.g. count if i8empyn>1 counts also missing values write <. instead of!=. 3_17

49 Longitudinal data analysis with Stata xt commands descriptive statistics xtdescribe xtsum, xttab, xttrans regression analysis xtreg, xtgls, xtlogit, xtpoisson, xtcloglog xtmixed, xtmelogit, diagrams: xtline 3_18

50 Descriptive analysis Get to know the data Usually: similar findings to complicated models Visualisation Accessible results to a wider public Assumptions more explicit than in complicated models 3_19

51 Example: variability of party preferences Kuhn (29), Swiss Political Science Review 15(3): _2

52 Happiness with life 8.5 Example: becoming unemployed West East German-speaking French-speaking Employed Year before unemployed 1st year unemployed 2nd year unemployed 5. Employed Year before unemployed 1st year unemployed Germany, Switzerland, nd year unemployed Oesch and Lipps (212), European Sociological Review (online first) 3_21

53 Example: Income mobility Switzerland Low income 29 Middle income 29 High income 29 Total Low income % 4.8 % 3.1 % 1 % Middle income % 75.8 % 1.8 % 1 % High income % 34.4 % 61.1 % 1 % Germany Low income 29 Middle income 29 High income 29 Total Low income % 36.4 % 1.9 % 1 % Middle income % 78.4 % 9.2 % 1 % High income % 29.6 % 67.8 % 1 % Grabka and Kuhn (212), Swiss Journal of Sociology 38(2), _22

54 4 Linear regression (Refresher course) 4_1

55 Aim and content Refresher course on linear regression What is a regression? How do we obtain regression coefficients? How to interpret regression coefficients? Inference from sample to population of interest (significance tests) Assumptions of linear regression Consequences when assumptions are violated 4_2

56 What is a regression? A statistical method for studying the relationship between a single dependent variable and one or more independent variables. Y: dependent variable X: independent variable(s) Simplest form: bivariate linear regression linear relationship between a dependent and one independent variable for a given set of observations Example Does the wage level affect the number of hours worked? Gender discrimination in wages? Do children increase happiness? 4_3

57 Y yi ŷi Linear regression: fitting a line Y = x 1 unit X slope xi ei 4_4

58 yearly income from employment number of years spent in paid work scatter plot of observations 4_5

59 yearly income from employment a b 1 unit x number of years spent in paid work Regression line: ŷ i = a + bx i = *x i 4_6

60 yearly income from employment number of years spent in paid work Regression line: ŷ i = a + bx i = *xi Estimated regression equation: y i = a + bx i + e i 4_7

61 Components (linear) regression equation Estimated regression equation: y i = a + bx i + e i y dependent variable x independent variable(s) (predictor(s), regressor(s)) a intercept (predicted value of Y if x =) b regression coefficients (slope) measure of the effect of X on Y e part of y not explained by x (residual), due to -omitted variables - measurement errors - stochastic shock -disturbance 4_8

62 Scales of independent variables Independent variables Continuous variables: linear Binary variables (Dummy variables) (, 1) (e.g. female=1, male=) Ordinal or multivariate variables (n categories) Create n-1 dummy variables (base category) Examples: educational levels 1 low educational level 2 intermediate educational level 3 high educational level Include 2 dummy variables in regression model 4_9

63 Example: multivariate regression Including other covariates: Regression coefficients represent the portion of y explained by x that is not explained by the other x s Example: gender wage gap (sample: full-time employed, yearly salary between 2 and 2 CHF) Bivariate model y = a + b x + e i salary = female + e i Multivariate model b constant 45'369 y = a + b 1 x 1 + b 2 x 2 + b 3 x e i female -9'9 education (Ref: compulsory) secondary education 9'197 tertiary education 3'786 supervision 17'128 financial sector 15'592 number of years in paid work 729 4_1

64 Assumptions for OLS-estimations: coefficients Assumptions for OLS-estimation (necessary to calculate slope coefficients) 1) No perfect multicollinearity (None of the regressors can be written as a linear function of the other regressors) 2) E(e) = 3) None of the x is correlated with e; Cov(x,e) = ; (all x s are exogenous) If assumptions 1-3 hold: OLS is consistent (regression coefficients asymptotically unbiased) 4_11

65 Inference from linear regression I Inference from OLS-estimations if random sample But: OLS coefficients are estimations Estimated regression equation: y i = a + bx i + e i True regression equation: y i = α + βx i + ε i True coefficients (α, β) unknown, true «error term» unknown Distribution of coefficients (a, b) E( b) Var( b) ˆ 2 σ β E(β) 4_12

66 Inference from linear regression II Var ( b) ( i ) where 2 ( xi x) n p Variation of b (σ β2 ): decreases if n increases x are more spread out squared residuals decrease Distribution of b Student t-distribution Depends on n and number of x s = normal distribution if n large σ β E(β) 4_13

67 Inference from linear regression: testing whether b If β = (in population), there is no relationship between x and y test how likely it is, that β = H : Distribution if β = critical values for coefficients compare estimated coefficient with critical value if abs(b) >abs(critical value), b significant b stand b t value b b Critical value for standardized normal distribution and 95% confidence level: _14

68 Inference from linear regression: example yearly income from employment number of years spent in paid work Regression line: ŷ i = a + b *x i example: ŷ i = *x i 4_15

69 Inference from linear regression: example Sample n=53 Coef. st.e. t P> t [95% Conf. Interval] years work _cons R 2 :.11 Sample n=1787 Coef. St.e. t P> t [95% Conf. Interval] years work _cons R 2 :.159 4_16

70 4_17 Inference : assumptions Assumptions on error terms Independence of error terms, no autocorrelation: Cov (ε i, ε k ) = for all i,k, i k Constant error variance : Var(ε i )=σ 2 ε for all i; (Homoscedasticity) Preferentially: e is normally distributed Matrix of error terms ; n n n n k i

71 4_18 Autocorrelation Reason: Nested observations (e.g. households, schools, time, communities) standard errors underestimated OLS, adjust standard errors 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ 2 σ ; n n n n k i ; n n n n k i autocorrelation no autocorrelation

72 4_19 Heteroskedasticity Variance is not consistent standard errors overestimated or underestimated OLS, adjust standard errors (White standard errors) Weighted least squares (WLS) σ σ 1 σ 5 σ 4 σ 3 σ 2 σ ; n n n n k i ; n n n n k i Homoskedasticity Heteroskedasticity

73 Summary: assumptions of OLS regression General Continuous dependent variable Random sample Coefficient estimation No perfect multicollinearity E(e) = No endogeneity Cov(x,e) = Omitted variables Measurement error in indep. variables Simultaneity Nonlinearity in parameters Inference No autocorrelation Cov (ei, ek)= Constant variance (no heterogeneity) Preferentially: residuals normally distributed Coefficients biased (inconsistent) Standard errors of coefficients biased 4_2

74 Endogeneity Traditional meaning Variable is determined within a model Econometrics Any situation where an explanatory variable is correlated with the residual If a variable is endogenous Care with interpretation: model cannot be interpreted as causal 4_21

75 Endogeneity: reasons Omitted variables Measurement error (in explanatory variables) Simultaneity Nonlinearity in parameters x contains lagged values of y (see later, dynamic models) 4_22

76 Endogeneity: consequences and detection Consequence of endogeneity ALL estimators may be biased! (exception: if a variable is completely exogenous controlling for the endogenous variable) Detection of endogeneity Difficult to detect and correct! Caution for causal interpretation Theory, literature (variable selection and interpretation)!!!! 4_23

77 Endogeneity: correction Test for nonlinear relationship and interactions Omitted variables Proxy variables, instrumental variables (2sls estimation) Panels: FE-models (within estimators), Difference-in-Difference models Propensity score analysis, Regression discontinuity, Heckman selection models, Simultaneity Structural equations modelling, Panel data for time ordering Theory, literature (variable selection and interpretation)!!!! 4_24

78 Example: Control of unobserved heterogeneity Example: effect of partnership on happiness happiness = e (partnership) We know (from literature, e.g.): happiness = f (attractiveness, leisure activities, health,...) [not measured!] Similarly: partnership = g (attractiveness, leisure activities, health,...) Cross-sectional data happiness = e(partnership) but we know that there are positive effects on partnership from income, fitness, attractiveness, health,... -> which part of effects are due to attractiveness, leisure activities, health,...? Panel data 1. Happiness (at times with a partner) 2. Happiness (at times without partner) of the same individuals Care: reversed causality, time-dependent unmeasured effects!) 4_25

79 Example: Regression analysis using Stata Sample: individuals in employment, 2 to 6 years Dependent variable: Working hours per week (paid work) Independent variables: hourly wage, number of children, married, sex, age Summary statistics sum workhours wage age8 married_nokid onekid twokids threepkids /// if workhours> & age8>=2 & age8<=6 Variable Obs Mean Std. Dev. Min Max workhours wage age married_no kids onekid twokids three+ kids _26

80 Example: check hourly wages graph box wage 4_ gross hourly wage Frequency gross hourly wage

81 Example: transform hourly wages Frequency Frequency gross hourly wage histogram wage, freq histogram lnwage, freq lnwage 4_28

82 reg workhours wagesd wagesq agerec agerecsq married_nokid /// onekid twokids threepkids if workhours> & age8>2 & age8<=6 Source SS df MS Number of obs = F( 8, 36) = Model Prob > F =. Residual R-squared = Adj R-squared =.3598 Total Root MSE = workhours Coef. Std. Err. t P> t [95% Conf. Interval] female lnwage lnwagesq onekid twokids threepkids married_no~d agerec _cons _29

83 Diagnostic plot: heteroskedasticity standardised residuals Fitted values 4_3

84 Diagnostic plot: normal distribution of residuals Density standardised residuals Standardized residuals normal scores 4_31

85 Regression with panel data: Data structure Wide data format Long data format (person-period-file, pooled data) idpers wage4 wave5 wave6 wage idpers year wage _32

86 OLS with pooled panel data: problems I OLS for cross-sectional analysis (one wave) no particular problem! OLS for pooled data (different years in one file) Problem: assumption of independent observations violated (autocorrelation) Possible correction: Correct for clustering in error terms (coefficients unaffected) But: OLS is not the best estimator for pooled data (not efficient) number of working hours OLS OLS, cluster in se per week b t b t lnwage (13.76) lnwage squared (-9.57) (-5.55) female (-15.34) (-11.4) 1 child.78 (2.5).78 (1.53) female*1 child (-2.43) (-14.) 2 children 1.74 (4.91) 1.74 (3.84) female*2 children (-33.4) (-16.36) 3+ children 2.85 (5.88) 2.85 (2.85) female*3+ children (-26.73) (-18.84) married, no child 2.4 (5.63) 2.4 (4.53) female*married, no child (-19.95) (-13.63) age -.6 (-6.89) -.6 (-4.39) 4_33

87 OLS with panel data: problems II OLS does not take advantage of panel structure Two different types of variation in panel data Variation within individuals Variation between individuals Control for unobservable variables (stable personal characteristics) Fixed Effects Models (only within variation) Random Effect Models (multilevel /random intercept / frailty for event history) 4_34

88 Comparisons of different regression models number of working OLS OLS, cluster Random effects Fixed effects hours per week b t b t b t b t lnwage (13.76) (14.2) 13.3 (1.26) lnwage squared (-9.57) (-5.55) (-12.3) (-9.91) female (-15.34) (-11.4) (-16.19) dropped 1 child.78 (2.5).78 (1.53) 1.33 (3.31).22 (.39) female*1 child (-2.43) (-14.) (-19.) -6.8 (-7.76) 2 children 1.74 (4.91) 1.74 (3.84) 1.84 (4.23).19 (.3) female*2 children (-33.4) (-16.36) (-23.3) (-8.17) 3+ children 2.85 (5.88) 2.85 (2.85) 2.55 (4.23) -.23 (.26) female*3+ children (-26.73) (-18.84) (-19.28) (-6.73) married, no child 2.4 (5.63) 2.4 (4.53) 1.79 (4.23).21 (.36) female*married, no chi (-19.95) (-13.63) (-11.71) -.54 (-.63) age -.6 (-6.89) -.6 (-4.39) -.3 (-2.42) constant R squared _35

89 5 Introducing Fixed Effects Models ( within effects)

90 Hypothesis: Example: BMI after stopping smoking BMI increases after stopping smoking Hypothetical data: Random sample of former smokers with year after stopping and BMI at that time for 3 individuals time bmi1 bmi2 bmi

91 BMI after stopping smoking: pooled OLS BMI years after stop smoke bmi Fitted values 5-3

92 Pooled regression. * pooled regression:. reg bmi time Source SS df MS Number of obs = F( 1, 16) =.26 Model Prob > F =.6149 Residual R-squared = Adj R-squared = Total Root MSE = bmi Coef. Std. Err. t P> t [95% Conf. Interval] time _cons No BMI increase over time 5-4

93 BMI after stopping smoking: individual data BMI years after stop smoke P1 P2 P3 OLS P1 OLS P2 OLS P3 OLSpooled Autocorrelat. with pooled regression-unobserved individual heterogeneity 5-5

94 Individual regressions. forval j=1/3 { /* loop over each individual*/ 2. reg bmi`j' time 3. } bmi1 Coef. Std. Err. t P> t [95% Conf. Interval] time _cons bmi2 Coef. Std. Err. t P> t [95% Conf. Interval] time _cons bmi3 Coef. Std. Err. t P> t [95% Conf. Interval] time _cons All individuals have significant BMI increase over time 5-6

95 Excursus: unobserved heterogeneity Omitted variables bias: Many individual characteristics are not observed e.g. enthusiasm, ability, willingness to take risks, our example: physical activities, calories intake, muscle mass, genes, cohort These have generally an effect on dependent variable, and are correlated with independent variables. Then regression coefficients will be biased! Note: these (formerly) unobserved measures are increasingly included in surveys 5-7

96 What about the between-effect? BMI years after stop smoke P1 P1 P2 P2 P3 P3 OLS P1 OLS P1 OLS P2 OLS OLS P2 P3 OLSpooled P3 5-8

97 Which models are appropriate to analyze the effects of time? Data transformation necessary 5-9

98 Panel Data and within- Regression (FE) 5-1

99 Error components in panel data models We separate the error components: e it = u i + ε u it, i = person-specific unobserved heterogeneity (level) = fixed effects (e.g., physical activities, calories intake, genetics, cohort) ε it = residual Model: bmi it 1 x it u i it Remember: Pooled OLS assumes that x is not correlated with both error components u i and ε it (omitted variable bias) 5-11

100 Fixed effects regression We can eliminated the fixed effects u i by estimating them as person specific dummies -> remains only within-variation Corresponds to de-meaning for each individual: bmi it 1 x it u i it (1) bmi i individual mean: (2) i 1 x i u i subtract (2) from (1): bmi it bmi i ( x x ) ( ) 1 it i it i -> Fixed (all time invariant) effects u i disappear, i.e. timeconstant unobserved heterogeneity is eliminated 5-12

101 De-meaned values with OLS regression de-meaned 25 BMI BMI de-meaned years after years stop after smoke stop smoke P1 P1 P2 P2 P3 P3 OLS P1 OLS OLS P2 P2 OLS OLS P3 P3 5-13

102 OLS of individually de-meaned Data We de-mean and regress the Data:. bysort id: egen bmi_m=mean(bmi). gen bmi_dem=bmi-bmi_m. bysort id: egen time_m=mean(time). gen time_dem=time-time_m. reg bmi_dem time_dem Source SS df MS Number of obs = F( 1, 16) = Model Prob > F =. Residual R-squared = Adj R-squared =.844 Total Root MSE = bmi_dem Coef. Std. Err. t P> t [95% Conf. Interval] time_dem _cons -2.12e

103 Direct modeling of fixed Effects in Stata xtreg bmi time, fe (calculates correct df; this causes higher Std. Err.). xtreg bmi time, fe Fixed-effects (within) regression Number of obs = 18 Group variable: id Number of groups = 3 R-sq: within =.8532 Obs per group: min = 6 between =.992 avg = 6 time since stop smoking explains parts of individual heterogeneity! max = 6 overall =.162 (from pooled OLS) F(1,14) = corr(u_i, Xb) = Prob > F = bmi Coef. Std. Err. t P> t [95% Conf. Interval] time _cons sigma_u sigma_e rho (fraction of variance due to u_i) F test that all u_i=: F(2, 14) = Prob > F =. 5-15

104 Alternative: OLS with individual dummies controlled. xi i.id, noomit. reg bmi time _I*, noconst Source SS df MS Number of obs = F( 4, 14) = Model Prob > F =. Residual R-squared =.9996! Adj R-squared =.9994 Total Root MSE = bmi Coef. Std. Err. t P> t [95% Conf. Interval] time _Iid_ _Iid_ _Iid_ useful for small N, the u i are estimated (only approximate) 5-16

105 FE estimation can solve the problem of unobserved heterogeneity But: Summary: Fixed Effects Estimation If number of groups large, many extra parameters Enough variance needed in data With FE-Regressions, estimation of time-constant covariates not possible. Are dropped from the model. But: possibility to use interactions (like male*nrchildren) What about comparing with people who never stopped smoking or who never smoked? (later) 5-17

106 No identification of time-invariant covariates z i Consider the model: y it = az i + bx it + u i + ε it (1) let be an arbitrary number; add and subtract z i on the rhs: y it = (az i + z i ) + bx it + (u i - z i ) + ε it and rewrite this as: y it = a*z i + bx it + u i *+ ε it with a* = a + and u i * = u i - z i (2) But (1) and (2) have exactly the same form so it is not clear if a or a* = a + is estimated -> separate effects of az i and u i cannot be distinguished without further assumptions (e.g., no correlation between z i and u i ) 5-18

107 Example: DiD (control group comparison) Hypothesis: financial support increases BMI of low-income women (Schmeisser 29)* (hypothetical) experiment: Survey: Sample of low income female patients of doctoral surgeries randomized into program social aid (5% in program and 5% not) 4 measurements of bmi (2 x before program start, 2 x after) *Expanding wallets and waistlines: the impact of family income on the bmi of women and men eligible for the earned income tax credit. Health Econ. 18:

108 Data: BMI-increase through social aid? bmi BMI of low income Women time Effects: causal effect: BMI increase due to more fast food (more money available) Time (age-) effect: BMI increases with age NO aid aid Start social aid Programm 5-2

109 Pooled Regression. reg bmi aid Source SS df MS Number of obs = F( 1, 14) =.89 Model Prob > F =.3627 Residual R-squared = Adj R-squared = -.77 Total Root MSE = bmi Coef. Std. Err. t P> t [95% Conf. Interval] aid _cons

110 Fixed effects xtreg bmi aid, fe. xtreg bmi aid, fe Fixed-effects (within) regression Number of obs = 16 Group variable: id Number of groups = 4 R-sq: within =.3596 Obs per group: min = 4 between =.54 avg = 4. overall =.595 max = 4 F(1,11) = 6.18 corr(u_i, Xb) = -.74 Prob > F = bmi Coef. Std. Err. t P> t [95% Conf. Interval] aid _cons sigma_u sigma_e rho (fraction of variance due to u_i) F test that all u_i=: F(3, 11) = Prob > F =. 5-22

111 One step back: a causal model With cross-sectional data: only between-estimation: Crucial assumption: Y i, Tt ( reatment) C( ontrol) Yi,t Random sample (no unobserved heterogeneity) With Panel data I: within-estimation (before and after) T C Y i, t 1 Yi,t problem: time effects, panel conditioning With Panel data II: difference-in-difference (DID): ( T C C C Y i, t 1Yi,t j,t 1 Yj, t ) ( Y ) 5-23

112 Now: causal effect of aid We have - Before-after comparison (within) - Treatment- and control groups (between) We compare the within-effect of aid ( treatment ) with that without aid ( control ) i.e., we calculate treatment effect and control for time DID estimator: =(after aid -before aid ) (after noaid before noaid )

113 DiD effects. xi i.time, noomit. xtreg bmi aid _I*, fe note: _Itime_4 omitted because of collinearity Fixed-effects (within) regression Number of obs = 16 Group variable: id Number of groups = 4 R-sq: within =.8876 Obs per group: min = 4 between =.54 avg = 4. overall =.1528 max = 4 F(4,8) = 15.8 corr(u_i, Xb) = -.55 Prob > F = bmi Coef. Std. Err. t P> t [95% Conf. Interval] aid _Itime_ _Itime_ _Itime_ _Itime_4 (omitted) _cons sigma_u sigma_e rho (fraction of variance due to u_i) F test that all u_i=: F(3, 8) = Prob > F =. 5-25

114 FE easier, no control group Summary: within estimators DID: control group can control simultaneous effects (like time): find statistical twin, such that research variables is only variable -> DID useful for estimating causal effects from nonexperimental data. Especially for small samples Excursus: First-Difference (FD) estimators: Stable similarities of adjacent observations eliminated. Problem: level differences not taken into account (6-5 children = 1- ch) for lasting effects (like children): FD not useful because only immediate changes taken into account 5-26

115 6 Introducing Random Effects Models 6-1

116 Towards RE: error components in panel models We have both -within and -between variance: y it = a + e it = a + u i + ε it fixed effects = within: u i for each person ( ANOVA) two residuals: - it on lowest (1.) level: time point a u u i on highest (2.) level: individuals 6-2

117 Necessary, if data have different levels with - observations are not independent of levels - true social interactions Examples: Motivation: multilevel models Schools classes students: first applications Networks: people are influenced by their peers Spatial context: from environment (e.g., poor people are less happy if they live in a rich environment) US: neighborhood-effects Interviewer - effects: respondents clustered in interviewers Panel-surveys: waves clustered in respondents (households) 6-3

118 Levels in clustered data Hierarchical - Households in neighborhoods - Students in schools in classes (three levels) - Respondents in interviewers - Panel Surveys: Waves in respondents (crossed?) Crossed - Questions in respondents longitudinal? Attention: Do not confuse variables and levels: (total) variance can be attributed to levels, not to variables! E.g., 1 hospitals are probably a level, 7 nations are probably dummy variables. Think of a population the sample represents! Note: 3 rule of thumb in contexts: 3 second level units, randomly chosen 6-4

119 Multilevel models: analytic advantages Improved regression models - unbiased estimators for regression coefficients - unbiased estimators for standard errors (usually higher std.err. than OLS) - model true covariance structure (autocorrelation, heteroscedasticity, ) Decomposition of total variance into those in different levels (within-between) Similar to Analysis of variance (ANOVA), but parsimonious (number of estimation parameter independent on number of contexts) can handle large number of contexts (in parts) modeling of unobserved heterogeneity / self selection 6-5

120 Typical results: one-level vs. multilevel model Dependent Variable mixed school boy school girl school Underestimated variance: Kish design effect deff : larger N necessary 6-6

121 Illustration: variance decomposition between levels Example: 2 individuals each asked 2x about their happiness (continuous), here measurements not time ordered! Which variance is due to individuals, which to observations? 3 Happiness Total ( Grand ) Mean (=) Indiv. 1 Indiv. 2 Total Mean= (3+2+(-1)+(-4)) / 4 = 6-7

122 Illustration: calculation of the total variance Happiness n i 1 T i ( t 1 y it T yy y ) 2 n T i i1 t1 ( y it W y yy Total Mean (=) i ) 2 n T i i1 t1 B yy (y i y) 2 Indiv. 1 Indiv. 2 Total Variance is equal to the Square of the Differences of all Observations from the Total Mean divided by the Sample Size (4) = { (3-) 2 + (2-) 2 + (-1-) 2 +(-4-) 2 } / 4 = ( )/4 =

123 Happiness Illustration: individual specific variance ( between ) n T i i1 t1 ( y T it yy y) 2 n i i1 t1 ( y W y yy Total Mean (=) -2.5 T it i ) 2 i 1t n Ti ( 1 B yy yi y ) 2 Indiv. 1 Indiv. 2 variance of the individual means = between -variance ( between = u ) ( (-2.5) 2 )/2 = 6.25 (remember: total variance = 7.5) 6-9

124 Happiness Illustration: measurement specific variance ( within ) n T i i1 t1 ( y T it yy -2.5 y) 2 i 1t n Ti ( 1 yit W yy yi ) 2 n T i i1 t1 B yy (y i y) 2 Indiv. 1 Indiv. 2 variance of measurements within individuals= within -Variance (later: within = ) ((3-2.5) 2 + (2-2.5) 2 + (-1-(-2.5)) 2 + (-4-(-2.5)) 2 )/4 =1.25 (=18% of total variance) variance of individual means = ( (-2.5) 2 )/2 = 6.25 (=82% of total variance = ρ = ICC (intra-class-correlation) 6-1

125 ICC = (zero clustering use OLS) Examples of Θ ICC.8 ICC.2 ICC = 1 (maximum clustering) 6-11

126 y where : u ε it i it u Starting point: null ( Variance Components (VC)) model i it individual specific random variable (N(,σ deviation from individual (note : no intercept a in VC model) u specific mean (N(,σ ) assumed) ε ) assumed) the VC model allows for variance decomposition : ρ correlation between different time points t within an individual i: 2 σu ρ ( ICC intra - class - correlation autocorrelation in Panels) 2 2 σ σ u ε (note : ρ significant multilevel model necessary) 6-12

127 Idea RE: better estimate of u i (modeling intercept) random intercept ( borrowing strength from others): u high edu To estimate mean of individual i (=u i ) only within (FE) suboptimal if - sample small (T i small) - variance high - n large (inefficient) a low edu idea: use information u j from other sample members (between) in the same population group (e.g., education) 6-13

128 Idea RE: weighted within and between RE - Regression is equivalent to pooled OLS after the Transformation : (y it with θ y ) β i (1 θ) β (x θ 1, and σ 1 it θ x ) (u (1 θ) (ε 2 ε i 2 σε Tσ 2 u i, θ 1 it θ ε i )) RE uses optimal combination of within and between variation RE allows estimation of time invariant RE biased because u i variables u remains in error term (if cov(x,u ) ) i i 6-14

129 . xtreg bmi children, re theta Example ρ based on SHP Random-effects GLS regression Number of obs = 18 Group variable: id Number of groups = 3 R-sq: within =.3921 Obs per group: min = 6 between =.1125 avg = 6. overall =.24 max = 6 Wald chi2(1) = 9.76 corr(u_i, X) = (assumed) Prob > chi2 =.18 theta = bmi Coef. Std. Err. z P> z [95% Conf. Interval] children _cons sigma_u sigma_e rho (fraction of variance due to u_i) 6-15

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2 University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005

Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005 Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005 This is an introduction to panel data analysis on an applied level using Stata. The focus will be on showing the "mechanics" of these

More information

Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052)

Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052) Department of Economics Session 2012/2013 University of Essex Spring Term Dr Gordon Kemp EC352 Econometric Methods Solutions to Exercises from Week 10 1 Problem 13.7 This exercise refers back to Equation

More information

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS Nađa DRECA International University of Sarajevo nadja.dreca@students.ius.edu.ba Abstract The analysis of a data set of observation for 10

More information

Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format: Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random

More information

Correlated Random Effects Panel Data Models

Correlated Random Effects Panel Data Models INTRODUCTION AND LINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. The Linear

More information

Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2)

Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2) Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2) Oscar Torres-Reyna otorres@princeton.edu December 2007 http://dss.princeton.edu/training/ Intro Panel data (also known as longitudinal

More information

Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

More information

Sample Size Calculation for Longitudinal Studies

Sample Size Calculation for Longitudinal Studies Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction

More information

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models: Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable

More information

Introduction to Regression Models for Panel Data Analysis. Indiana University Workshop in Methods October 7, 2011. Professor Patricia A.

Introduction to Regression Models for Panel Data Analysis. Indiana University Workshop in Methods October 7, 2011. Professor Patricia A. Introduction to Regression Models for Panel Data Analysis Indiana University Workshop in Methods October 7, 2011 Professor Patricia A. McManus Panel Data Analysis October 2011 What are Panel Data? Panel

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

More information

From the help desk: Swamy s random-coefficients model

From the help desk: Swamy s random-coefficients model The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

More information

Milk Data Analysis. 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED

Milk Data Analysis. 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED 2. Introduction to SAS PROC MIXED The MIXED procedure provides you with flexibility

More information

Stata Walkthrough 4: Regression, Prediction, and Forecasting

Stata Walkthrough 4: Regression, Prediction, and Forecasting Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. God, I can t see them getting

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Addressing Alternative. Multiple Regression. 17.871 Spring 2012

Addressing Alternative. Multiple Regression. 17.871 Spring 2012 Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate

More information

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

More information

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

More information

Clustering in the Linear Model

Clustering in the Linear Model Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

25 Working with categorical data and factor variables

25 Working with categorical data and factor variables 25 Working with categorical data and factor variables Contents 25.1 Continuous, categorical, and indicator variables 25.1.1 Converting continuous variables to indicator variables 25.1.2 Converting continuous

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s

I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s Linear Regression Models for Panel Data Using SAS, Stata, LIMDEP, and SPSS * Hun Myoung Park,

More information

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved 4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

More information

Regression Analysis (Spring, 2000)

Regression Analysis (Spring, 2000) Regression Analysis (Spring, 2000) By Wonjae Purposes: a. Explaining the relationship between Y and X variables with a model (Explain a variable Y in terms of Xs) b. Estimating and testing the intensity

More information

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

More information

Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions. SW Ch 8 1/54/ Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

More information

Models for Longitudinal and Clustered Data

Models for Longitudinal and Clustered Data Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations

More information

Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS

Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS Philip Merrigan ESG-UQAM, CIRPÉE Using Big Data to Study Development and Social Change, Concordia University, November 2103 Intro Longitudinal

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Multilevel Models for Longitudinal Data. Fiona Steele

Multilevel Models for Longitudinal Data. Fiona Steele Multilevel Models for Longitudinal Data Fiona Steele Aims of Talk Overview of the application of multilevel (random effects) models in longitudinal research, with examples from social research Particular

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Panel Data Analysis in Stata

Panel Data Analysis in Stata Panel Data Analysis in Stata Anton Parlow Lab session Econ710 UWM Econ Department??/??/2010 or in a S-Bahn in Berlin, you never know.. Our plan Introduction to Panel data Fixed vs. Random effects Testing

More information

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

A Simple Feasible Alternative Procedure to Estimate Models with High-Dimensional Fixed Effects

A Simple Feasible Alternative Procedure to Estimate Models with High-Dimensional Fixed Effects DISCUSSION PAPER SERIES IZA DP No. 3935 A Simple Feasible Alternative Procedure to Estimate Models with High-Dimensional Fixed Effects Paulo Guimarães Pedro Portugal January 2009 Forschungsinstitut zur

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

More information

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

More information

Rethinking the Cultural Context of Schooling Decisions in Disadvantaged Neighborhoods: From Deviant Subculture to Cultural Heterogeneity

Rethinking the Cultural Context of Schooling Decisions in Disadvantaged Neighborhoods: From Deviant Subculture to Cultural Heterogeneity Rethinking the Cultural Context of Schooling Decisions in Disadvantaged Neighborhoods: From Deviant Subculture to Cultural Heterogeneity Sociology of Education David J. Harding, University of Michigan

More information

MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal

MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information

Moderation. Moderation

Moderation. Moderation Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors. Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

MODELING AUTO INSURANCE PREMIUMS

MODELING AUTO INSURANCE PREMIUMS MODELING AUTO INSURANCE PREMIUMS Brittany Parahus, Siena College INTRODUCTION The findings in this paper will provide the reader with a basic knowledge and understanding of how Auto Insurance Companies

More information

Note 2 to Computer class: Standard mis-specification tests

Note 2 to Computer class: Standard mis-specification tests Note 2 to Computer class: Standard mis-specification tests Ragnar Nymoen September 2, 2013 1 Why mis-specification testing of econometric models? As econometricians we must relate to the fact that the

More information

Moderator and Mediator Analysis

Moderator and Mediator Analysis Moderator and Mediator Analysis Seminar General Statistics Marijtje van Duijn October 8, Overview What is moderation and mediation? What is their relation to statistical concepts? Example(s) October 8,

More information

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

More information

Discussion Section 4 ECON 139/239 2010 Summer Term II

Discussion Section 4 ECON 139/239 2010 Summer Term II Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase

More information

Interaction effects between continuous variables (Optional)

Interaction effects between continuous variables (Optional) Interaction effects between continuous variables (Optional) Richard Williams, University of Notre Dame, http://www.nd.edu/~rwilliam/ Last revised February 0, 05 This is a very brief overview of this somewhat

More information

Applied Panel Data Analysis

Applied Panel Data Analysis Applied Panel Data Analysis Using Stata Prof. Dr. Josef Brüderl LMU München April 2015 Contents I I) Panel Data 06 II) The Basic Idea of Panel Data Analysis 17 III) An Intuitive Introduction to Linear

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

is paramount in advancing any economy. For developed countries such as

is paramount in advancing any economy. For developed countries such as Introduction The provision of appropriate incentives to attract workers to the health industry is paramount in advancing any economy. For developed countries such as Australia, the increasing demand for

More information

10. Analysis of Longitudinal Studies Repeat-measures analysis

10. Analysis of Longitudinal Studies Repeat-measures analysis Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Forecasting in STATA: Tools and Tricks

Forecasting in STATA: Tools and Tricks Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be

More information

Data Analysis Methodology 1

Data Analysis Methodology 1 Data Analysis Methodology 1 Suppose you inherited the database in Table 1.1 and needed to find out what could be learned from it fast. Say your boss entered your office and said, Here s some software project

More information

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group Introduction to Multilevel Modeling Using HLM 6 By ATS Statistical Consulting Group Multilevel data structure Students nested within schools Children nested within families Respondents nested within interviewers

More information

Longitudinal Data Analysis: Stata Tutorial

Longitudinal Data Analysis: Stata Tutorial Part A: Overview of Stata I. Reading Data: Longitudinal Data Analysis: Stata Tutorial use Read data that have been saved in Stata format. infile Read raw data and dictionary files. insheet Read spreadsheets

More information

IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD REPUBLIC OF SOUTH AFRICA GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD SHAHID KHANDKER World Bank June 2006 ORGANIZED BY THE WORLD BANK AFRICA IMPACT

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?*

Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?* Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?* Jennjou Chen and Tsui-Fang Lin Abstract With the increasing popularity of information technology in higher education, it has

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

More information

Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

STATA Commands for Unobserved Effects Panel Data

STATA Commands for Unobserved Effects Panel Data STATA Commands for Unobserved Effects Panel Data John C Frain November 24, 2008 Contents 1 Introduction 1 2 Examining Panel Data 4 3 Estimation using xtreg 8 3.1 Introduction..........................................

More information

A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing Sector

A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing Sector Journal of Modern Accounting and Auditing, ISSN 1548-6583 November 2013, Vol. 9, No. 11, 1519-1525 D DAVID PUBLISHING A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Overview of Factor Analysis

Overview of Factor Analysis Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

More information