International University of Japan Public Management & Policy Analysis Program


International University of Japan
Public Management & Policy Analysis Program

Practical Guides To Panel Data Modeling: A Step by Step Analysis Using Stata *

Hun Myoung Park, Ph.D.
[email protected]

1. Introduction
2. Preparing Panel Data
3. Basics of Panel Data Models
4. Pooled OLS and LSDV
5. Fixed Effect Model
6. Random Effect Model
7. Hausman Test and Chow Test
8. Presenting Panel Data Models
9. Conclusion
References

2011
Last modified on October 2011

Public Management and Policy Analysis Program
Graduate School of International Relations
International University of Japan
777 Kokusai-cho, Minami Uonuma-shi, Niigata, Japan

* The citation of this document should read: Park, Hun Myoung. Practical Guides To Panel Data Modeling: A Step-by-step Analysis Using Stata. Tutorial Working Paper. Graduate School of International Relations, International University of Japan. This document is based on Park, Hun Myoung. Linear Regression Models for Panel Data Using SAS, Stata, LIMDEP, and SPSS. The University Information Technology Services (UITS) Center for Statistical and Mathematical Computing, Indiana University.

2011 Hun Myoung Park (10/19/2011) Regression Models for Panel Data Using Stata

1. Introduction

Panel data are also called longitudinal data or cross-sectional time-series data. These longitudinal data have observations on the same units in several different time periods (Kennedy, 2008: 281); a panel data set has multiple entities, each of which has repeated measurements at different time periods. Panel data may have individual (group) effects, time effects, or both, which are analyzed by fixed effect and/or random effect models.

The U.S. Census Bureau's Census 2000 data at the state or county level are cross-sectional but not time-series, while annual sales figures of Apple Computer Inc. for the past 20 years are time series but not cross-sectional. The cumulative Census data at the state level for the past 20 years are longitudinal. If annual sales data of Apple, IBM, LG, Siemens, Microsoft, Sony, and AT&T for the past 10 years are available, they are panel data. The National Longitudinal Survey of Labor Market Experience (NLS) and the Michigan Panel Study of Income Dynamics (PSID) data are cross-sectional and time-series, while the cumulative General Social Survey (GSS) and American National Election Studies (ANES) data are not, in the sense that individual respondents vary across survey years.

As more and more panel data become available, many scholars, practitioners, and students have been interested in panel data modeling because these longitudinal data have more variability and allow researchers to explore more issues than do cross-sectional or time-series data alone (Kennedy, 2008: 282). Baltagi (2001) puts it, "Panel data give more informative data, more variability, less collinearity among the variables, more degrees of freedom and more efficiency" (p.6). Given well-organized panel data, panel data models are definitely attractive and appealing since they provide ways of dealing with heterogeneity and examining fixed and/or random effects in the longitudinal data.

However, panel data modeling is not as easy as it sounds. A common misunderstanding is that fixed and/or random effect models should always be employed whenever your data are arranged in the panel data format. The problems of panel data modeling, by and large, come from 1) panel data themselves, 2) the modeling process, and 3) interpretation and presentation of the result. Some studies analyze poorly organized panel data (in fact, they are not longitudinal in a strong econometric sense) and some others mechanically apply fixed and/or random effect models in haste without considering the relevance of such models. Careless researchers often fail to interpret the results correctly and to present them appropriately.

The motivation of this document is several IUJ master's theses that, I think, applied panel data models inappropriately and failed to interpret the results correctly. This document is intended to provide practical guides to panel data modeling, in particular, for writing a master's thesis. Students can learn how to 1) organize panel data, 2) recognize and handle ill-organized data, 3) choose a proper panel data model, 4) read and report Stata output correctly, 5) interpret the result substantively, and 6) present the result in a professional manner.

In order to avoid unnecessary complication, this document mainly focuses on linear regression models rather than nonlinear models (e.g., binary response and event count data models) and balanced data rather than unbalanced ones. Hopefully this document will be a good companion for those who want to analyze panel data for their master's theses at IUJ. Let us begin with preparing and evaluating panel data.

2. Preparing Panel Data

This section describes how to prepare panel data sets using Stata (release 11) and then discusses types and qualities of panel data.

2.1 Sample Panel Data Set

The sample panel data used here are total cost data for U.S. airlines, distributed online as the data set airline.dta. The sample data set includes total cost, output index, fuel price, and loading factor of six U.S. airlines measured at 15 different time points. Let us type in the following command at Stata's dot prompt.

. use .../airline.dta, clear

The .use command reads the data set airline.dta through the Internet, and the clear option removes data in current memory and then loads the new data set into main memory. The .keep command below drops (deletes) all variables other than those listed in the command.

. keep airline year cost output fuel load
. describe airline year cost output fuel load

variable name   storage type   display format   variable label
airline         int            %8.0g            Airline name
year            int            %8.0g            Year
cost            float          %9.0g            Total cost in $1,000
output          float          %9.0g            Output in revenue passenger miles, index number
fuel            float          %9.0g            Fuel price
load            float          %9.0g            Load factor

The above .describe command displays basic information about the variables listed after the command. The .summary command below provides descriptive statistics (e.g., mean, standard deviation, minimum, and maximum) of the variables listed. [1] From the output below, we know that six airlines were coded from 1 to 6 and time periods were set from 1 through 15.

. sum airline year cost output fuel load

Variable   Obs   Mean   Std. Dev.   Min   Max
airline
year
cost
output
fuel
load

In order to use panel data commands in Stata, we need to declare cross-sectional (airline) and time-series (year) variables to tell Stata which variable is cross-sectional and which one is time-series. The .tsset command is followed by the cross-sectional and time-series variables, in that order.

. tsset airline year
panel variable: airline (strongly balanced)
time variable: year, 1 to 15
delta: 1 unit

Let us first explore descriptive statistics of panel data. Run .xtsum to obtain summary statistics. The total number of observations is 90 because there are 6 units (entities) and 15 time periods. The overall mean and standard deviation (1.130) of total cost below are the same as those in the .sum output above.

. xtsum cost output fuel load

Variable            Mean   Std. Dev.   Min   Max   Observations
cost     overall                                   N = 90
         between                                   n = 6
         within                                    T = 15
output   overall                                   N = 90
         between                                   n = 6
         within                                    T = 15
fuel     overall                                   N = 90
         between                                   n = 6
         within                                    T = 15
load     overall                                   N = 90
         between                                   n = 6
         within                                    T = 15

Note that Stata lists three different types of statistics: overall, between, and within. Overall statistics are ordinary statistics that are based on 90 observations. Between statistics are calculated on the basis of summary statistics of six airlines (entities) regardless of time period, while within statistics are calculated on the basis of summary statistics of 15 time periods regardless of airline.

[1] You may use short versions of these commands; Stata knows that .des and .sum are equivalent to .describe and .summary, respectively.

2.2 Type of Panel Data

A panel data set contains n entities or subjects, each of which includes T observations measured at time periods 1 through T. Thus, the total number of observations in the panel data is nT. Ideally, panel data are measured at regular time intervals (e.g., year, quarter, and month). Otherwise, panel data should be analyzed with caution. A panel may be long or short, balanced or unbalanced, and fixed or rotating.

2.2.1 Long versus Short Panel Data

A short panel has many entities (large n) but few time periods (small T), while a long panel has many time periods (large T) but few entities (Cameron and Trivedi, 2009: 230). Accordingly, a short panel data set is wide in width (cross-sectional) and short in length (time-series), whereas a long panel is narrow in width. Both too small N (Type I error) and too large N (Type II error) problems matter. Researchers should be very careful especially when examining either a short or a long panel.

2.2.2 Balanced versus Unbalanced Panel Data

In a balanced panel, all entities have measurements in all time periods. In a contingency table (or cross-table) of cross-sectional and time-series variables, each cell should have only one frequency. Therefore, the total number of observations is nT. This tutorial document assumes that we have a well-organized balanced panel data set.
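Whether a panel is balanced can be checked before any modeling. The following is only a minimal sketch: it assumes an Internet connection and uses Stata's downloadable Grunfeld example data (webuse grunfeld, with the identifiers company and year) as a stand-in, not the airline data discussed in this document. The variable nobs is a hypothetical helper created for illustration.

```stata
* Stand-in data (an assumption, not the airline data): Grunfeld investment panel
webuse grunfeld, clear

* Declare the panel; Stata reports "strongly balanced" when every
* entity is observed in every time period
xtset company year

* Participation patterns: a single pattern of all 1s indicates a balanced panel
xtdescribe

* Count observations per entity; in a balanced panel every count equals T
bysort company: generate nobs = _N
tabulate nobs
```

If tabulate shows more than one distinct value of nobs, some entities are missing time periods and the panel is unbalanced.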

When each entity in a data set has a different number of observations, the panel data are not balanced. Some cells in the contingency table have zero frequency. Accordingly, the total number of observations is not nT in an unbalanced panel. Unbalanced panel data entail some computation and estimation issues although most software packages are able to handle both balanced and unbalanced data.

2.2.3 Fixed versus Rotating Panel Data

If the same individuals (or entities) are observed for each period, the panel data set is called a fixed panel (Greene 2008: 184). If a set of individuals changes from one period to the next, the data set is a rotating panel. This document assumes a fixed panel.

2.3 Data Arrangement: Long versus Wide Form in Stata

A typical panel data set has a cross-section (entity or subject) variable and a time-series variable. In Stata, this arrangement is called the long form (as opposed to the wide form). While the long form has both individual (e.g., entity and group) and time variables, the wide form includes either an individual or a time variable. Most statistical software packages assume that panel data are arranged in the long form.

The following data set shows a typical panel data arrangement. Yes, this is a long form. There are 6 entities (airline) and 15 time periods (year). [2]

. list airline year load cost output fuel in 1/20, sep(20)

airline   year   load   cost   output   fuel

If data are structured in a wide form, you need to rearrange the data first. Stata has the .reshape command to rearrange a data set back and forth between long and wide forms. The following .reshape with wide changes from the long form to the wide one so that the resulting data set in a wide form has only six observations but in turn includes an identification (entity) variable airline and as many variables as the time periods (4 × 15), dropping the time variable year.

. reshape wide cost output fuel load, i(airline) j(year)
(note: j = ...)

Data                       long   ->   wide
Number of obs.               90   ->   6
Number of variables           6   ->   61
j variable (15 values)     year   ->   (dropped)
xij variables:
                           cost   ->   cost1 cost2 ... cost15
                         output   ->   output1 output2 ... output15
                           fuel   ->   fuel1 fuel2 ... fuel15
                           load   ->   load1 load2 ... load15

The i() above specifies identification variables to be used as identification of observations. If you wish to rearrange the data set back to the long counterpart, run the following .reshape command with long.

. reshape long cost output fuel load, i(airline) j(year)

[2] The .list command lists data items of individual observations. The in 1/20 of this command displays data of the first 20 observations, and the sep(20) option inserts a horizontal separator line every 20 observations rather than the default every 5 lines.

2.4 Evaluating the Qualities of Your Panel Data

The first task that a researcher has to do after cleaning data is to check the quality of the panel data in hand. When saying panel data, you are implicitly arguing that the data are well arranged by both cross-sectional and time-series variables and that you get a strong impression of the presence of fixed and/or random effects. Otherwise, the data are simply (or physically) arranged in the panel data format but are no longer panel data in an econometric sense.

The most important issue is consistency in the unit of analysis (or measurement), which says that each observation in a data set deserves being treated and weighted equally. This requirement seems self-evident but is often overlooked by careless researchers. If each observation is not equivalent in many senses, any analysis based on such data may not be reliable. Here are some checkpoints that researchers should examine carefully.

- Make sure that your data are really longitudinal and that there are some fixed and/or random effects.
- Check if individuals (e.g., entities and subjects) are not consistent but changing. For instance, a company might be split or merged during the research period to become a completely new one.
- Similarly, check if time periods are not consistent but changing. A time period under some circumstances may not be fixed but almost random (e.g., the second period is two days later than the first period, the third period is 100 days later than the second period, the fourth period is one and a half years later than the third period, etc.). In some data sets, the time period is fixed but multiple time periods are used; both yearly and weekly data coexist in a data set.
- Check if an entity has more than one observation in a particular time period. For example, Apple has four observations for quarterly sales data in 2011, while each of the other firms has one yearly sales observation in that year. In this simple case, you may aggregate quarterly data to obtain yearly figures.

- Check if the measurement methods employed are not consistent. Measurements are not commensurable if 1) some entities were measured in method A and other entities in method B, 2) some time periods were measured in method C and other periods in method D, and/or 3) both 1) and 2) are mixed. [3]
- Be careful when you patch your data set together by combining data sets measured and built by different institutions that employed different methods. This circumstance is quite understandable because a perfect data set is rarely ready for you; in many cases, you need to combine several sources of information to build a new data set for your research.

Another issue is whether the number of entities and/or time periods is too small or too large. It is less valuable to contrast one group (or time period) with another in the panel data framework (e.g., n=2 or T=3). By contrast, comparing millions of individuals or time periods is almost useless because of the high likelihood of Type II error. This task is similar to arguing that at least one company out of 1 million firms in the world has a different productivity. Is this argument interesting to you? We already know that! In case of too large N (specifically n or T), you might try to reclassify individuals or time periods into several meaningful categories; for example, classify millions of individuals by their citizenships or ethnic groups (e.g., white, black, Asian, and Spanish).

Finally, many missing values are likely to lower the quality of panel data. So-called listwise deletion (an entire record is excluded from analysis if any single value of a variable is missing) tends to reduce the number of observations used in a model and thus weaken the statistical power of a test. This issue is also related to the discussion on balanced versus unbalanced panel data.

Once a well-organized panel data set is prepared, we can move forward to discuss panel data models that are used to analyze fixed and/or random effects embedded in the longitudinal data.

[3] Assume that methods A and B, and methods C and D, are not comparable with each other in terms of scale and unit of measurement.
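Some of the checkpoints in 2.4 can be screened mechanically. The sketch below is illustrative only: the firms, years, and sales figures are made up, and the variable names (firm, year, quarter, sales) are hypothetical. It shows one firm reporting quarterly figures in a year when the other firm reports yearly figures, how to detect that, and how to aggregate to one observation per firm-year.

```stata
clear
* Hypothetical data: firm 2 reports four quarterly records in 2011
input firm year quarter sales
1 2010 . 100
1 2011 . 110
2 2010 .  80
2 2011 1  20
2 2011 2  25
2 2011 3  22
2 2011 4  28
end

* More than one record per firm-year signals an inconsistent unit of analysis
duplicates report firm year

* Aggregate quarterly records to yearly totals so each firm-year has one row
collapse (sum) sales, by(firm year)

* isid exits with an error if firm and year do not uniquely identify observations
isid firm year

* Declare the panel and count missing values by variable
xtset firm year
misstable summarize
```

Summing is appropriate here only because sales are a flow; a stock or a rate would need a different aggregation (for example, a mean or an end-of-year value).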

3. Basics of Panel Data Models

Panel data models examine group (individual-specific) effects, time effects, or both in order to deal with heterogeneity or individual effects that may or may not be observed. [4] These effects are either fixed or random. A fixed effect model examines if intercepts vary across group or time period, whereas a random effect model explores differences in error variance components across individual or time period. A one-way model includes only one set of dummy variables (e.g., firm1, firm2, ...), while a two-way model considers two sets of dummy variables (e.g., city1, city2, ... and year1, year2, ...).

This section follows Greene's (2008) notation with some modifications, such as lower-case k (the number of regressors excluding the intercept term; he uses K instead), $w_{it}$ (the composite error term), and $v_{it}$ (the traditional error term; he uses $\varepsilon_{it}$).

3.1 Pooled OLS

If the individual effect $u_i$ (cross-sectional or time specific effect) does not exist ($u_i = 0$), ordinary least squares (OLS) produces efficient and consistent parameter estimates.

$$y_{it} = \alpha + X_{it}'\beta + \varepsilon_{it} \quad (u_i = 0)$$

OLS rests on five core assumptions (Greene, 2008: 11-19; Kennedy, 2008: 41-42).

1. Linearity says that the dependent variable is formulated as a linear function of a set of independent variables and the error (disturbance) term.
2. Exogeneity says that the expected value of disturbances is zero or disturbances are not correlated with any regressors.
3. Disturbances have the same variance (3.a homoskedasticity) and are not related with one another (3.b nonautocorrelation).
4. The observations on the independent variables are not stochastic but fixed in repeated samples without measurement errors.
5. The full rank assumption says that there is no exact linear relationship among independent variables (no multicollinearity).

If the individual effect $u_i$ is not zero in longitudinal data, heterogeneity (individual specific characteristics like intelligence and personality that are not captured in regressors) may influence assumptions 2 and 3.
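A pooled OLS fit, and a rough look at whether its residuals differ systematically by entity, can be sketched as follows. This is a hedged illustration that assumes an Internet connection and uses Stata's Grunfeld example data (variables invest, mvalue, kstock, company) as a stand-in for the airline data; ehat is a hypothetical variable name.

```stata
* Stand-in data (an assumption, not the airline data)
webuse grunfeld, clear

* Pooled OLS: ignores the panel structure, i.e., assumes u_i = 0
regress invest mvalue kstock

* Residuals from the pooled model
predict ehat, residuals

* Mean and standard deviation of residuals by firm; group means far from zero
* or very different spreads hint at individual effects or heteroskedasticity
tabstat ehat, by(company) statistics(mean sd)
```

This inspection is informal; the formal tests for fixed and random effects are the subject of 3.5.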
When heterogeneity is present, disturbances may not have the same variance but vary across individuals (heteroskedasticity, violation of assumption 3.a) and/or are related with each other (autocorrelation, violation of assumption 3.b). This is an issue of a nonspherical variance-covariance matrix of disturbances. The violation of assumption 2 renders random effect estimators biased. Hence, the OLS estimator is no longer the best linear unbiased estimator. Panel data models provide a way to deal with these problems.

[4] Country, state, agency, firm, respondent, employee, and student are examples of a unit (individual or entity), whereas year, quarter, month, week, day, and hour can be examples of a time period.

3.2 Fixed versus Random Effects

Panel data models examine fixed and/or random effects of individual or time. The core difference between fixed and random effect models lies in the role of dummy variables (Table 3.1). A parameter estimate of a dummy variable is a part of the intercept in a fixed effect model and an error component in a random effect model. Slopes remain the same across group or time period in either fixed or random effect model. The functional forms of one-way fixed and random effect models are, [5]

Fixed effect model: $y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}$
Random effect model: $y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})$,

where $u_i$ is a fixed or random effect specific to individual (group) or time period that is not included in the regression, and errors are independent identically distributed, $v_{it} \sim IID(0, \sigma_v^2)$.

A fixed group effect model examines individual differences in intercepts, assuming the same slopes and constant variance across individual (group and entity). Since an individual specific effect is time invariant and considered a part of the intercept, $u_i$ is allowed to be correlated with other regressors; that is, OLS assumption 2 is not violated. This fixed effect model is estimated by least squares dummy variable (LSDV) regression (OLS with a set of dummies) and within effect estimation methods.

Table 3.1 Fixed Effect and Random Effect Models

|                 | Fixed Effect Model                            | Random Effect Model                                   |
|-----------------|-----------------------------------------------|-------------------------------------------------------|
| Functional form | $y_{it} = (\alpha + u_i) + X_{it}'\beta + v_{it}$ | $y_{it} = \alpha + X_{it}'\beta + (u_i + v_{it})$     |
| Assumption      | -                                             | Individual effects are not correlated with regressors |
| Intercepts      | Varying across group and/or time              | Constant                                              |
| Error variances | Constant                                      | Randomly distributed across group and/or time         |
| Slopes          | Constant                                      | Constant                                              |
| Estimation      | LSDV, within effect estimation                | GLS, FGLS (EGLS)                                      |
| Hypothesis test | F test                                        | Breusch-Pagan LM test                                 |

A random effect model assumes that the individual effect (heterogeneity) is not correlated with any regressor and then estimates error variance specific to groups (or times). Hence, $u_i$ is an individual specific random heterogeneity or a component of the composite error term. This is why a random effect model is also called an error component model. The intercept and slopes of regressors are the same across individuals. The difference among individuals (or time periods) lies in their individual specific errors, not in their intercepts.

A random effect model is estimated by generalized least squares (GLS) when a covariance structure of an individual i, $\Sigma$ (sigma), is known. The feasible generalized least squares (FGLS) or estimated generalized least squares (EGLS) method is used to estimate the entire variance-covariance matrix V ($\Sigma$ in all diagonal elements and 0 in all off-diagonal elements) when $\Sigma$ is not known. There are various estimation methods for FGLS including the maximum likelihood method and simulation (Baltagi and Cheng, 1994).

A random effect model reduces the number of parameters to be estimated but will produce inconsistent estimates when the individual specific random effect is correlated with regressors (Greene, 2008: 200-201).

Fixed effects are tested by the F test, while random effects are examined by the Lagrange multiplier (LM) test (Breusch and Pagan, 1980). If the null hypothesis is not rejected in either test, the pooled OLS regression is favored. The Hausman specification test (Hausman, 1978) compares a random effect model to its fixed counterpart. If the null hypothesis that the individual effects are uncorrelated with the other regressors is not rejected, a random effect model is favored over its fixed counterpart.

If one cross-sectional or time-series variable is considered (e.g., country, firm, and race), this is called a one-way fixed or random effect model. Two-way effect models have two sets of dummy variables for individual and/or time variables (e.g., state and year) and thus entail some issues in estimation and interpretation.

[5] Let us focus here on cross-sectional (group) effects. For time effects, switch i with t in the formula.

3.3 Estimating Fixed Effect Models

There are several strategies for estimating a fixed effect model. The least squares dummy variable model (LSDV) uses dummy variables, whereas the within estimation does not. These strategies, of course, produce identical parameter estimates of regressors (nondummy independent variables). The between estimation fits a model using individual or time means of dependent and independent variables without dummies.

LSDV with a dummy dropped out of a set of dummies is widely used because it is relatively easy to estimate and interpret substantively. This LSDV, however, becomes problematic when there are many individuals (or groups) in panel data. If T is fixed and $n \to \infty$ (n is the number of groups or firms and T is the number of time periods), parameter estimates of regressors are consistent but the coefficients of individual effects, $\alpha + u_i$, are not (Baltagi, 2001: 14). In this short panel, LSDV includes a large number of dummy variables; the number of these parameters to be estimated increases as n increases (incidental parameter problem); therefore, LSDV loses n degrees of freedom but returns less efficient estimators (p.14). Under this circumstance, LSDV is useless and thus calls for another strategy, the within effect estimation.

Unlike LSDV, the within estimation does not need dummy variables, but it uses deviations from group (or time period) means.
That is, within estimation uses variation within each individual or entity instead of a large number of dummies. The within estimation is, [6]

$$(y_{it} - \bar{y}_{i\bullet}) = (x_{it} - \bar{x}_{i\bullet})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\bullet}),$$

where $\bar{y}_{i\bullet}$ is the mean of the dependent variable (DV) of individual (group) i, $\bar{x}_{i\bullet}$ represents the means of independent variables (IVs) of group i, and $\bar{\varepsilon}_{i\bullet}$ is the mean of errors of group i.

In this within estimation, the incidental parameter problem is no longer an issue. The parameter estimates of regressors in the within estimation are identical to those of LSDV. The within estimation reports the correct sum of squared errors (SSE).

The within estimation, however, has several disadvantages. First, data transformation for within estimation wipes out all time-invariant variables (e.g., gender, citizenship, and ethnic group) that do not vary within an entity (Kennedy, 2008: 284). Since deviations of time-invariant variables from their average are all zero, it is not possible to estimate coefficients of such variables in within estimation. As a consequence, we have to fit LSDV when a model has time-invariant independent variables.

Second, within estimation produces incorrect statistics. Since no dummy is used, the within effect model has larger degrees of freedom for errors, accordingly reporting small mean squared errors (MSE), standard errors of the estimates (SEE) or square root of mean squared errors (SRMSE), and incorrect (smaller) standard errors of parameter estimates. Hence, we have to adjust incorrect standard errors using the following formula. [7]

$$se_k^* = se_k \sqrt{\frac{df_{error}^{within}}{df_{error}^{LSDV}}} = se_k \sqrt{\frac{nT - k}{nT - n - k}}$$

Third, $R^2$ of the within estimation is not correct because the intercept term is suppressed. Finally, the within estimation does not report dummy coefficients. We have to compute them, if really needed, using the formula $d_g^* = \bar{y}_{g\bullet} - \bar{x}_{g\bullet}'\beta$.

[6] This within estimation needs three steps: 1) compute group means of the dependent and independent variables; 2) transform dependent and independent variables to get deviations from their group means; 3) run OLS on the transformed variables without the intercept term.

Table 3.2 Comparison of Three Estimation Methods

|                          | LSDV                                                          | Within Estimation                                                                                                      | Between Estimation                                                         |
|--------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| Functional form          | $y_{it} = (\alpha + u_i) + X_{it}'\beta + \varepsilon_{it}$   | $y_{it} - \bar{y}_{i\bullet} = (x_{it} - \bar{x}_{i\bullet})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\bullet})$ | $\bar{y}_{i\bullet} = \alpha + \bar{x}_{i\bullet}'\beta + \varepsilon_i$   |
| Time invariant variables | Yes                                                           | No                                                                                                                     | No                                                                         |
| Dummy variables          | Yes                                                           | No                                                                                                                     | No                                                                         |
| Dummy coefficients       | Presented                                                     | Need to be computed                                                                                                    | N/A                                                                        |
| Transformation           | No                                                            | Deviation from the group means                                                                                         | Group means                                                                |
| Intercept estimated      | Yes                                                           | No                                                                                                                     | Yes                                                                        |
| $R^2$                    | Correct                                                       | Incorrect                                                                                                              |                                                                            |
| SSE                      | Correct                                                       | Correct                                                                                                                |                                                                            |
| MSE/SEE (SRMSE)          | Correct                                                       | Incorrect (smaller)                                                                                                    |                                                                            |
| Standard errors          | Correct                                                       | Incorrect (smaller)                                                                                                    |                                                                            |
| DF error                 | nT-n-k *                                                      | nT-k (n larger)                                                                                                        | n-k-1                                                                      |
| Observations             | nT                                                            | nT                                                                                                                     | n                                                                          |

* It means that the LSDV estimation loses n degrees of freedom because of the dummy variables included.

The between group estimation, the so-called group mean regression, uses variation between individual entities (groups). Specifically, this estimation calculates group means of the dependent and independent variables and thus reduces the number of observations down to n. Then, run OLS on these transformed, aggregated data: $\bar{y}_{i\bullet} = \alpha + \bar{x}_{i\bullet}'\beta + \varepsilon_i$. Table 3.2 contrasts LSDV, within group estimation, and between group estimation.
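The three estimation strategies of 3.3 can be compared side by side. The sketch below is an illustration under stated assumptions: it needs an Internet connection and uses Stata's Grunfeld example data (n = 10 firms, T = 20 years, k = 2 regressors) instead of the airline data. The m_ and d_ variables and the scalar adj are hypothetical helper names.

```stata
* Stand-in data (an assumption, not the airline data)
webuse grunfeld, clear
xtset company year

* LSDV: OLS with a set of firm dummies (one dummy is dropped automatically)
regress invest mvalue kstock i.company

* Within estimation by xtreg: same slopes as LSDV, standard errors already adjusted
xtreg invest mvalue kstock, fe

* Manual within transformation: deviations from firm means, no intercept
foreach v of varlist invest mvalue kstock {
    bysort company: egen double m_`v' = mean(`v')
    generate double d_`v' = `v' - m_`v'
}
regress d_invest d_mvalue d_kstock, noconstant

* The manual standard errors are too small; rescale by
* sqrt(df_within / df_LSDV) = sqrt((nT - k) / (nT - n - k))
scalar adj = sqrt((200 - 2) / (200 - 10 - 2))
display _se[d_mvalue] * adj

* Between estimation: OLS on the 10 firm means
xtreg invest mvalue kstock, be
```

The slopes from the first three regressions should agree, and the rescaled standard error displayed for d_mvalue should match the one xtreg, fe reports for mvalue.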
3.4 Estimating Random Effect Models

The one-way random effect model incorporates a composite error term, $w_{it} = u_i + v_{it}$. The $u_i$ are assumed independent of the traditional error term $v_{it}$ and regressors $X_{it}$, which are also independent of each other for all i and t. Remember that this assumption is not necessary in a fixed effect model. This model is,

$$y_{it} = \alpha + X_{it}'\beta + u_i + v_{it}, \quad \text{where } u_i \sim IID(0, \sigma_u^2) \text{ and } v_{it} \sim IID(0, \sigma_v^2).$$

[7] Fortunately, Stata and other software packages report adjusted standard errors for us.

The covariance elements of $Cov(w_{it}, w_{js}) = E(w_{it}w_{js}')$ are $\sigma_u^2 + \sigma_v^2$ if i=j and t=s, and $\sigma_u^2$ if i=j and t≠s. Therefore, the covariance structure of composite errors $\Sigma = E(w_i w_i')$ for individual i and the variance-covariance matrix of entire disturbances (errors) V are,

$$\Sigma_{T \times T} = \begin{bmatrix} \sigma_u^2 + \sigma_v^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_v^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_v^2 \end{bmatrix} \quad \text{and} \quad V_{nT \times nT} = I_n \otimes \Sigma = \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{bmatrix}$$

A random effect model is estimated by generalized least squares (GLS) when the covariance structure is known, and by feasible generalized least squares (FGLS) or estimated generalized least squares (EGLS) when the covariance structure of composite errors is unknown. Since $\Sigma$ is often unknown, FGLS/EGLS is more frequently used than GLS. Compared to a fixed effect counterpart, a random effect model is relatively difficult to estimate.

In FGLS, you first have to estimate $\theta$ using $\hat{\sigma}_u^2$ and $\hat{\sigma}_v^2$. The $\hat{\sigma}_u^2$ comes from the between effect estimation (group mean regression) and $\hat{\sigma}_v^2$ is derived from the SSE (sum of squared errors) of the within effect estimation or the deviations of residuals from group means of residuals.

$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}},$$

where $\hat{\sigma}_u^2 = \hat{\sigma}_{between}^2 - \dfrac{\hat{\sigma}_v^2}{T}$, where $\hat{\sigma}_{between}^2 = \dfrac{SSE_{between}}{n - k - 1}$,

$$\hat{\sigma}_v^2 = \frac{SSE_{within}}{nT - n - k} = \frac{e'e_{within}}{nT - n - k} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(v_{it} - \bar{v}_{i\bullet})^2}{nT - n - k},$$

where $v_{it}$ are the residuals of the LSDV.

Then, the dependent variable, independent variables, and the intercept term need to be transformed as follows,

$$y_{it}^* = y_{it} - \hat{\theta}\bar{y}_{i\bullet}$$
$$x_{it}^* = x_{it} - \hat{\theta}\bar{x}_{i\bullet} \quad \text{for all } x_k$$
$$\alpha^* = 1 - \hat{\theta}$$

Finally, run OLS on those transformed variables with the traditional intercept suppressed.

$$y_{it}^* = \alpha^* + x_{it}^{*\prime}\beta^* + \varepsilon_{it}^*.$$

3.5 Testing Fixed and Random Effects

How do we know if fixed and/or random effects exist in the panel data in hand? A fixed effect is tested by the F-test, while a random effect is examined by Breusch and Pagan's (1980) Lagrange multiplier (LM) test. The former compares a fixed effect model and OLS to see how much the fixed effect model can improve the goodness-of-fit, whereas the latter contrasts a random effect model with OLS. The similarity between random and fixed effect estimators is tested by a Hausman test.

3.5.1 F-test for Fixed Effects

In a regression of $y_{it} = \alpha + \mu_i + X_{it}'\beta + \varepsilon_{it}$, the null hypothesis is that all dummy parameters except for the one dropped are zero, $H_0: \mu_1 = \dots = \mu_{n-1} = 0$. The alternative hypothesis is that at least one dummy parameter is not zero. This hypothesis is tested by an F test, which is based on the loss of goodness-of-fit. This test contrasts LSDV (robust model) with the pooled OLS (efficient model) and examines the extent to which the goodness-of-fit measures (SSE or $R^2$) changed.

$$F(n-1,\, nT-n-k) = \frac{(e'e_{pooled} - e'e_{LSDV})/(n-1)}{(e'e_{LSDV})/(nT-n-k)} = \frac{(R_{LSDV}^2 - R_{pooled}^2)/(n-1)}{(1 - R_{LSDV}^2)/(nT-n-k)}$$

If the null hypothesis is rejected (at least one group/time specific intercept $u_i$ is not zero), you may conclude that there is a significant fixed effect or a significant increase in goodness-of-fit in the fixed effect model; therefore, the fixed effect model is better than the pooled OLS.

3.5.2 Breusch-Pagan LM Test for Random Effects

Breusch and Pagan's (1980) Lagrange multiplier (LM) test examines if individual (or time) specific variance components are zero, $H_0: \sigma_u^2 = 0$. The LM statistic follows the chi-squared distribution with one degree of freedom.

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{T^2\,\bar{e}'\bar{e}}{e'e} - 1\right]^2 \sim \chi^2(1),$$

where $\bar{e}$ is the n × 1 vector of the group means of pooled regression residuals, and $e'e$ is the SSE of the pooled OLS regression.

Baltagi (2001) presents the same LM test in a different way.

$$LM_u = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(\sum_t e_{it}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{\sum_i\left(T\bar{e}_{i\bullet}\right)^2}{\sum_i\sum_t e_{it}^2} - 1\right]^2 \sim \chi^2(1).$$

If the null hypothesis is rejected, you can conclude that there is a significant random effect in the panel data, and that the random effect model is able to deal with heterogeneity better than does the pooled OLS.

3.5.3 Hausman Test for Comparing Fixed and Random Effects

How do we know which effect (fixed effect or random effect) is more relevant and significant in the panel data? The Hausman specification test compares fixed and random effect models under the null hypothesis that individual effects are uncorrelated with any regressor in the model (Hausman, 1978). If the null hypothesis of no correlation is not violated, LSDV and GLS are consistent, but LSDV is inefficient; otherwise, LSDV is consistent but GLS is inconsistent and biased (Greene, 2008: 208). The estimates of LSDV and GLS should not differ systematically under the null hypothesis. The Hausman test uses the fact that the covariance of an efficient estimator with its difference from an inefficient estimator is zero (Greene, 2008: 208).

$$LM = (b_{LSDV} - b_{random})'\,\hat{W}^{-1}\,(b_{LSDV} - b_{random}) \sim \chi^2(k),$$

where $\hat{W} = Var[b_{LSDV} - b_{random}] = Var(b_{LSDV}) - Var(b_{random})$ is the difference in the estimated covariance matrices of LSDV (robust model) and GLS (efficient model). Keep in mind that an intercept and dummy variables SHOULD be excluded in computation. This test statistic follows the chi-squared distribution with k degrees of freedom.

The formula says that a Hausman test examines if the random effects estimate is insignificantly different from the unbiased fixed effect estimate (Kennedy, 2008: 286). If the null hypothesis of no correlation is rejected, you may conclude that individual effects $u_i$ are significantly correlated with at least one regressor in the model and thus the random effect model is problematic. Therefore, you need to go for a fixed effect model rather than the random effect counterpart. A drawback of this Hausman test is, however, that the difference of covariance matrices W may not be positive definite; then, we may conclude that the null is not rejected, assuming similarity of the covariance matrices renders such a problem (Greene, 2008: 209).

3.5.4 Chow Test for Poolability

What is poolability? Poolability asks if slopes are the same across groups or over time (Baltagi 2001: 51-57). One simple version of the poolability test is an extension of the Chow test (Chow, 1960). The null hypothesis of this Chow test is that the slope of a regressor is the same regardless of individual for all k regressors, $H_0: \beta_{ik} = \beta_k$.
Remember that slopes remain constant in fixed and random effect models; only intercepts and error variances matter.

$$F[(n-1)(k+1),\ n(T-k-1)] = \frac{(e'e - \sum_i e_i'e_i) / [(n-1)(k+1)]}{\sum_i e_i'e_i / [n(T-k-1)]},$$

where $e'e$ is the SSE of the pooled OLS and $e_i'e_i$ is the SSE of the OLS regression for group i. If the null hypothesis is rejected, the panel data are not poolable; each individual has its own slopes for all regressors. Under this circumstance, you may try the random coefficient model or hierarchical regression model.

The Chow test assumes that individual error variance components follow the normal distribution, $\mu \sim N(0, s^2 I_{nT})$. If this assumption does not hold, the Chow test may not properly examine the null hypothesis (Baltagi, 2001: 53). Kennedy (2008) notes, "if there is reason to believe that errors in different equations have different variances, or that there is contemporaneous correlation between the equations' errors, such testing should be undertaken by using the SURE estimator, not OLS; inference with OLS is unreliable if the variance-covariance matrix of the error is nonspherical" (p.9).
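The poolability F statistic above is simple arithmetic once the sums of squared errors are available. The following Python sketch (not Stata) shows only that arithmetic; the function name `chow_poolability_f` and all SSE values are hypothetical and chosen for illustration.

```python
def chow_poolability_f(sse_pooled, sse_by_group, n, T, k):
    """Return (F, df1, df2) for the Chow-type poolability test.

    sse_pooled   : SSE of the pooled OLS (restricted model)
    sse_by_group : SSEs of the n separate group regressions (unrestricted)
    n, T, k      : number of groups, time periods, and regressors
    """
    sse_unrestricted = sum(sse_by_group)
    df1 = (n - 1) * (k + 1)          # number of restrictions
    df2 = n * (T - k - 1)            # error df of the unrestricted model
    f_stat = ((sse_pooled - sse_unrestricted) / df1) / (sse_unrestricted / df2)
    return f_stat, df1, df2


# Hypothetical SSEs for 4 groups, 10 periods, and 2 regressors.
f_stat, df1, df2 = chow_poolability_f(10.0, [1.0, 1.5, 0.5, 1.0], n=4, T=10, k=2)
print(f"F({df1}, {df2}) = {f_stat:.4f}")
```

A large F relative to the critical value of F(df1, df2) would suggest that the data are not poolable; the normality caveat stated above still applies.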

Model Selection: Fixed or Random Effect?

When combining fixed vs. random effects, group vs. time effects, and one-way vs. two-way effects, we get 12 possible panel data models as shown in Table 3.3. In general, one-way models are often used mainly due to their parsimony, and a fixed effect model is easier to estimate and interpret than its random counterpart. It is not, however, easy to sort out the best one out of the following 12 models.

Table 3.3 Classification of Panel Data Analysis
Type                  Fixed Effect                          Random Effect
One-way  Group        One-way fixed group effect            One-way random group effect
One-way  Time         One-way fixed time effect             One-way random time effect
Two-way  Two groups*  Two-way fixed group effect            Two-way random group effect
Two-way  Two times*   Two-way fixed time effect             Two-way random time effect
Two-way  Mixed        Two-way fixed group & time effect     Two-way random group & time effect
Two-way  Mixed        Two-way fixed time and random group effect
Two-way  Mixed        Two-way fixed group and random time effect
* These models need two group (or time) variables (e.g., country and airline).

Substantive Meanings of Fixed and Random Effects

The formal tests discussed in 3.5 examine the presence of fixed and/or random effects. Specifically, the F-test compares a fixed effect model and (pooled) OLS, whereas the LM test contrasts a random effect model with OLS. The Hausman specification test compares fixed and random effect models. However, these tests do not provide substantive meanings of fixed and random effects. What does a fixed effect mean? How do we interpret a random effect substantively?

Here is a simple and rough answer. Suppose we are regressing the production of firms such as Apple, IBM, LG, and Sony on their R&D investment. A fixed effect might be interpreted as the initial production capacities of these companies when no R&D investment is made; each firm has its own initial production capacity. A random effect might be viewed as a kind of consistency or stability of production.
If the production of a company fluctuates up and down significantly, for example, its production is not stable (or its variance component is larger than those of other firms) even when its productivity (slope of R&D) remains the same across company.8

8 Like dummy coefficients in a fixed effect model, parameter estimates of error components of individual companies can be calculated in a random effect model. The SAS MIXED procedure reports such error component estimators.

Kennedy (2008: 8-86) provides a theoretical and insightful explanation of fixed and random effects. Either fixed or random effect is an issue of unmeasured variables or omitted relevant variables, which renders the pooled OLS biased. This heterogeneity is handled by either putting in dummy variables to estimate individual intercepts of groups (entities) or viewing "the different intercepts as having been drawn from a bowl of possible intercepts, so they may be interpreted as random and treated as though they were a part of the error term" (p. 84); they are the fixed effect model and the random effect model, respectively. A random effect model has a composite error term that consists of the traditional random error and a random intercept measuring the extent to which an individual's intercept differs from the

overall intercept (p. 84). He argues that the key difference between fixed and random effects is not whether unobserved heterogeneity is attributed to the intercept or variance components, but whether the individual specific error component is related to regressors.

Figure 3.1 Scatter Plots of Total Cost versus Output Index and Loading Factor (left panel: total cost versus output index; right panel: total cost versus loading factor; each panel shows individual airlines and the overall regression line)

It is good practice to draw plots of the dependent and independent variables before modeling panel data. For instance, Figure 3.1 illustrates two scatter plots with linear regression lines of four airlines only. The left plot is of total cost versus output index, and the right one is of total cost versus loading factor (compare them with Kennedy's Figure 18.1 and 18.2). Assume that the thick black lines represent linear regression lines of all observations. The key difference is that slopes of individual airlines are very similar to the overall regression line in the left plot, but different in the right plot.

As Kennedy (2008: 86) explains, OLS, fixed effect, and random effect estimators in the left plot are all unbiased, but random effect estimators are most efficient; a random effect model is better. In the right plot, however, OLS and random effects estimators are biased because the composite error term seems to be correlated with a regressor, loading factor, but the fixed effects estimator is not biased; accordingly, a fixed effect model might be better.

Two Recommendations for Panel Data Modeling

The first recommendation, as in other data analysis processes, is to describe the data of interest carefully before analysis. Although often ignored in many data analyses, this data description is very important and useful for researchers to get ideas about data and analysis strategies. In panel data analysis, properties and quality of panel data influence model selection significantly.
Clean the data by examining if they were measured in reliable and consistent manners. If different time periods were used in a long panel, for example, try to rearrange (aggregate) data to improve consistency. If there are many missing values, decide whether you go for a balanced panel by throwing away some pieces of usable information or keep all usable observations in an unbalanced panel at the expense of methodological and computational complication. Examine the properties of the panel data including the number of entities (individuals), the number of time periods, balanced versus unbalanced panel, and fixed versus rotating panel. Then, try to find models appropriate for those properties. Be careful if you have long or short panel data. Imagine a long panel that has 10 thousand time periods but 3 individuals, or a short panel that has very few years but 9,000 firms.

If n and/or T are too large, try to reclassify individuals and/or time periods and get some manageable n and T. The null hypothesis of $u_1 = u_2 = \dots = u_{999,999} = 0$ in a fixed effect model, for instance, is almost useless. This is just as if you were seriously arguing that at least one citizen looks different from the other 999,999 people! Didn't you know that before? Try to use yearly data rather than weekly data, or monthly data rather than daily data.

The second recommendation is to begin with a simpler model. Try a pooled OLS rather than a fixed or random effect model; a one-way effect model rather than a two-way model; a fixed or random effect model rather than a hierarchical linear model; and so on. Do not try a fancy and, of course, complicated model that your panel data do not support well (e.g., poorly organized panel and long/short panel).

Guidelines of Model Selection

At the modeling stage, let us begin with pooled OLS and then think critically about its potential problems if observed and unobserved heterogeneity (a set of missing relevant variables) is not taken into account. Also think about the source of heterogeneity (i.e., cross-sectional or time series variables) to determine individual (entity or group) effect or time effect.9 Figure 3.2 provides a big picture of the panel data modeling process.

Figure 3.2 Panel Data Modeling Process

If you think that the individual heterogeneity is captured in the disturbance term and the individual (group or time) effect is not correlated with any regressors, try a random effect model. If the heterogeneity can be dealt with by individual specific intercepts and the individual effect may possibly be correlated with any regressors, try a fixed effect model.
9 Kennedy (2008: 86) suggests that you first examine if individual specific intercepts are equal; if yes, the panel data are poolable and OLS will do; if not, conduct the Hausman test; use random effect estimators if the group effect is not correlated with the error term; otherwise, use the fixed effect estimator.

If each individual (group) has its own initial capacity and shares the same disturbance variance with

other individuals, a fixed effect model is favored. If each individual has its own disturbance, a random effect model will be better at figuring out heteroskedastic disturbances.

Next, conduct appropriate formal tests to examine individual group and/or time effects. If the null hypothesis of the LM test is rejected, a random effect model is better than the pooled OLS. If the null hypothesis of the F-test is rejected, a fixed effect model is favored over OLS. If neither hypothesis is rejected, fit the pooled OLS.

Conduct the Hausman test when the null hypotheses of both the F-test and the LM test are rejected. If the null hypothesis of no correlation between an individual effect and regressors is rejected, go for the robust fixed effect model; otherwise, stick to the efficient random effect model.

If you have a strong belief that the heterogeneity involves two cross-sectional, two time series, or one cross-section and one time series variables, try two-way effect models. Double-check if your panel data are well-organized, and n and T are large enough; do not try a two-way model for a poorly organized, badly unbalanced, and/or too long/short panel. Conduct appropriate F-test and LM test to examine the presence of two-way effects. Stata does not provide direct ways to fit two-way panel data models, but it is not impossible. In Stata, two-way fixed effect models seem easier than two-way random effect models (see 3.7 below).

Finally, if you think that the heterogeneity entails slopes (parameter estimates of regressors) varying across individual and/or time, conduct a Chow test or equivalent to examine the poolability of the panel data. If the null hypothesis of poolable data is rejected, try a random coefficient model or hierarchical linear model.

3.7 Estimation Strategies in Stata

The least squares dummy variable (LSDV) regression, "within" estimation, "between" estimation (group or time mean model), GLS, and FGLS/EGLS are fundamentally based on ordinary least squares (OLS).
Therefore, Stata .regress can fit all of these linear models.

Table 3.4 Stata Commands Used for Panel Data Analysis
Model or test                                Commands                           Options
Regression (OLS)                             .regress
LSDV1 without a dummy                        .regress; .xi: regress
LSDV2 without the intercept                  .regress                           noconstant
LSDV3 with a restriction                     .cnsreg and .constraint
One-way fixed effect ("within" estimation)   .xtreg; .areg                      fe; abs
Two-way fixed ("within" estimation)          .xtreg with a set of dummies       fe
"Between" estimation                         .xtreg                             be
One-way random effect                        .xtreg; .xtgls; .xtmixed           re
Two-way random effect                        .xtmixed
Hierarchical linear model                    .xtmixed
Random coefficient model                     .xtrc                              betas
Testing fixed effect (F-test)                .test (included in .xtreg)
Testing random effect (LM test)              .xttest0
Comparing fixed and random effect            .hausman

You can also use .regress with the .xi prefix command to fit LSDV1 without creating dummy variables (see 4.4.1). The .cnsreg command is used for LSDV3 with restrictions defined in .constraint (see 4.4.3). The .areg command with the absorb option, equivalent to .xtreg with the fe option below, supports the one-way "within" estimation that involves a large number of individuals or time periods.

Stata has more convenient commands and options for panel data analysis. First, .xtreg estimates a fixed effect model with the fe option ("within" estimation), "between" estimators with be, and a random effect model with re. This command, however, does not directly fit two-way fixed and random effect models.10 Table 3.4 summarizes related Stata commands.

A random effect model can also be estimated using .xtmixed and .xtgls. The .xtgls command fits panel data models with heteroscedasticity across group (time) and/or autocorrelation within a group (time). .xtmixed and .xtrc are used to fit hierarchical linear models and random coefficient models. In fact, a random effect model is a simple hierarchical linear model with a random intercept. .logit and .probit fit nonlinear regression models and examine fixed effects in logit and probit models.

.xtreg with fe by default conducts the F-test for fixed effects. Of course, you can also use .test to conduct a classical Wald test to examine the fixed effects. Since .xtreg does not report the Breusch-Pagan LM statistic for a random effect model, you need to run .xttest0 after fitting a random effect model. Use .hausman to conduct the Hausman test to compare fixed and random effect models.

10 You may fit a two-way fixed effect model by including a set of dummies and using the fe option. For the two-way random effect model, you need to use the .xtmixed command instead of .xtreg.
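To make concrete what a Hausman comparison of fixed and random effect estimates computes, here is a minimal Python sketch (not Stata) of the quadratic form from 3.5 for two slope coefficients. The function name `hausman_2x2` and every number are hypothetical; the sketch also shows the drawback noted earlier, since it refuses to proceed when the difference of covariance matrices is not positive definite.

```python
def hausman_2x2(b_fe, b_re, v_fe, v_re):
    """Hausman statistic for two slopes (intercept and dummies excluded).

    b_fe, b_re : fixed and random effect slope estimates (length 2)
    v_fe, v_re : their 2x2 covariance matrices as nested lists
    """
    d = [b_fe[i] - b_re[i] for i in range(2)]
    w = [[v_fe[i][j] - v_re[i][j] for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    if w[0][0] <= 0 or det <= 0:
        raise ValueError("Var(b_fe) - Var(b_re) is not positive definite")
    inv = [[w[1][1] / det, -w[0][1] / det],
           [-w[1][0] / det, w[0][0] / det]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


# Hypothetical estimates and covariance matrices.
h = hausman_2x2(
    b_fe=[1.0, 0.5], b_re=[0.9, 0.45],
    v_fe=[[0.02, 0.0], [0.0, 0.01]],
    v_re=[[0.01, 0.0], [0.0, 0.005]],
)
CHI2_CRIT_05_DF2 = 5.991  # chi-squared critical value, 2 df, .05 level
print(f"Hausman chi2(2) = {h:.3f}, reject H0: {h > CHI2_CRIT_05_DF2}")
```

With these made-up inputs the statistic is well below the critical value, so the null hypothesis of no correlation would not be rejected and the random effect model would be retained.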

4. Pooled OLS and LSDV

This section begins with the classical least squares method called ordinary least squares (OLS) and explains how OLS can deal with unobserved heterogeneity using dummy variables. A dummy variable is a binary variable that is coded to either one or zero. OLS using dummy variables is called a least squares dummy variable (LSDV) model. The sample model used here regresses total cost of airline companies on output in revenue passenger miles (output index), fuel price, and loading factor (the average capacity utilization of the fleet).11

4.1 Pooled OLS

The (pooled) OLS is a pooled linear regression without fixed and/or random effects. It assumes a constant intercept and slopes regardless of group and time period. In the sample panel data with six airlines and 15 time periods, the basic scheme is that total cost is determined by output, fuel price, and loading factor. The pooled OLS posits no difference in intercept and slopes across airline and time period.

OLS: $cost_{it} = \beta_0 + \beta_1 output_{it} + \beta_2 fuel_{it} + \beta_3 loading_{it} + \varepsilon_{it}$

Note that $\beta_0$ is the intercept; $\beta_1$ is the slope (coefficient or parameter estimate) of output; $\beta_2$ is the slope of fuel price; $\beta_3$ is the slope of loading factor; and $\varepsilon_{it}$ is the error term.

Now, let us load the data and fit the pooled regression model.

. use clear
(Cost of U.S. Airlines (Greene 2003))
. regress cost output fuel load
Source SS df MS   Number of obs =   F( 3, 86) =   Model   Prob > F =   Residual   R-squared =   Adj R-squared =   Total   Root MSE = .1461
cost   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]   output   fuel   load   _cons

This pooled OLS model fits the data well at the .05 significance level (F= and p<.0000). R² of .9883 says that this model accounts for 99 percent of the total variance in the total cost of airline companies. The regression equation is,

cost = *output *fuel *load

You may interpret these slopes in several ways. The ceteris paribus assumption, holding all other variables constant, is important but often skipped in presentation.
The p-values in parentheses below are the results of t-tests for individual parameters.

11 For details on the data, see

Even in case of zero output index, zero fuel price, and zero loading factor, each airline company is expected to have units of total cost (p<.0000). For one unit increase in output index, the total cost of airlines is expected to increase by .887 units, holding all other variables constant (p<.0000). Whenever fuel price increases by ten units, the total cost will increase by units, holding all other variables constant (p<.0000). If the loading factor increases by one unit, an airline company can save total cost on average by units (p<.0000).

Although this model fits the data well, you may suspect that each airline or year has a different initial total cost. That is, each airline may have its own initial total cost, its Y-intercept, that is significantly different from those of other airline companies. What if you believe that error terms vary across airline and/or year? The former question concerns fixed effects, whereas the latter asks if there is any random effect.

4.2 LSDV with a Set of Dummy Variables

Let us here examine fixed group effects by introducing group (airline) dummy variables. The dummy variable g1 is set to 1 for airline 1 and zero for other airline companies; similarly, the variable g2 is coded as 1 for airline 2 and zero for other airline companies; and so on. See the following for the coding scheme of dummy variables.12

. generate g1=(airline==1)
. gen g2=(airline==2)
. list airline year g1-g6
airline year g1 g2 g3 g4 g5 g6

12 The first .generate command creates a dummy variable and then assigns 1 if the condition (airline==1) provided is satisfied and 0 otherwise.
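The coding scheme can be mimicked outside Stata. This Python sketch uses a short, made-up `airline` column (not the actual data set) to build the six dummies and to show why one of them must later be dropped: in every row the dummies sum to one, which is exactly the column of ones that represents the intercept.

```python
# Hypothetical airline identifiers for eight observations.
airline = [1, 1, 2, 3, 4, 5, 6, 6]
groups = sorted(set(airline))

# g_j equals 1 when the observation belongs to airline j and 0 otherwise,
# mirroring "generate g1=(airline==1)".
dummies = {f"g{j}": [int(a == j) for a in airline] for j in groups}

# Row-wise sum of all six dummies: always 1, i.e. perfectly collinear
# with the intercept if all dummies and the intercept are included.
row_sums = [sum(dummies[f"g{j}"][r] for j in groups) for r in range(len(airline))]

print(dummies["g1"])
print(row_sums)
```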

This LSDV model is,

LSDV: $cost_{it} = \beta_0 + \beta_1 output_{it} + \beta_2 fuel_{it} + \beta_3 loading_{it} + u_1 g_1 + u_2 g_2 + u_3 g_3 + u_4 g_4 + u_5 g_5 + \varepsilon_{it}$

You should find that five group dummies, g1-g5, are added to the pooled OLS equation. Notice that one of the six dummies, g6 in this case, was excluded from the regression equation in order to avoid perfect multicollinearity.13 The dummy variables and regressors are allowed to be correlated in a fixed effect model. u1-u5 are respectively parameter estimates of group dummy variables g1-g5.

Let us fit this linear regression with dummies. In the following command, I intentionally added g1-g5 right after the dependent variable cost in order to emphasize that their coefficients are part of intercepts (as opposed to error terms).

. regress cost g1-g5 output fuel load
Source SS df MS   Number of obs =   F( 8, 81) =   Model   Prob > F =   Residual   R-squared =   Adj R-squared =   Total   Root MSE =
cost   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]   g1   g2   g3   g4   g5   output   fuel   load   _cons

This LSDV fits the data better than does the pooled OLS in 4.1. The F statistic increased from to (p<.0000); SSE (sum of squares due to error or residual) decreased from to .96; and R² increased from .9883 to . Due to the dummies included, this model loses five degrees of freedom (from 86 to 81). Parameter estimates of individual regressors are slightly different from those in the pooled OLS. For instance, the coefficient of fuel price decreased from .4540 to .4175 but its statistical significance remained almost unchanged (p<.0000).

This fixed effect model posits that each airline has its own intercept but shares the same slopes of regressors (i.e., output index, fuel price, and loading factor). Then, how do we get airline specific intercepts? How do we interpret the dummy coefficients u1-u5? How do we report regression equations in LSDV?

The parameter estimate of g6 (dropped dummy) is presented in the LSDV intercept (9.7930), which is the baseline intercept (reference point).
13 The last dummy g6 (airline 6) was dropped and used as the reference group. Of course, you may drop any other dummy to get the equivalent result.

Each of u1-u5 represents the deviation of its group specific intercept from the baseline intercept (intercept of airline 6). For instance, u1 = means that the intercept of airline 1 is .0871 smaller than the reference

point . Accordingly, the intercept of airline 1 is = (-.0871).14 More formal computation is = (-.0871)*1 + (-.183)*0 + (-.960)*0 + (.0975)*0 + (-.0630)*0. Note that all group dummies other than g1 are zero in case of airline 1. Similarly, we can compute other intercepts for airlines 2-5 and eventually get the following six regression equations.

Airline 1: cost = *output *fuel *load
Airline 2: cost = *output *fuel *load
Airline 3: cost = *output *fuel *load
Airline 4: cost = *output *fuel *load
Airline 5: cost = *output *fuel *load
Airline 6: cost = *output *fuel *load

Notice that all parameter estimates of regressors are the same regardless of airline. The coefficients of g1-g5 are interpreted as follows. The intercept of airline 2 is .184 smaller than the baseline intercept (airline 6) , but this deviation is not statistically significant at the .05 significance level (p<.094). The intercept of airline 3 is .960 smaller than the baseline intercept and this deviation is statistically discernible from zero at the .05 level (p<.000). The intercept of airline 4 is , .0975 larger than the baseline intercept (p<.004).

The question here is which model is better than the other: the pooled OLS or LSDV? And why? What are the costs and benefits of adding group dummies and getting different group intercepts? Is the addition of group dummies valuable?

4.3 Comparing Pooled OLS and LSDV (Fixed Effect Model)

There are some significant differences between the pooled OLS and LSDV (Table 4.1). LSDV improved all goodness-of-fit measures like F-test, SSE, root MSE, and (adjusted) R² significantly but lost 5 degrees of freedom by adding five group dummies. LSDV seems better than the pooled OLS.
Table 4.1 Comparing Pooled OLS and LSDV
                                         Pooled OLS        LSDV
Output index                             .887 (p<.000)     .9193 (p<.000)
Fuel price                               .4540 (p<.000)    .4175 (p<.000)
Loading factor                           (p<.000)          (p<.000)
Overall intercept (baseline intercept)   (p<.000)          (p<.000)
Airline 1 (deviation from the baseline)                    (p<.304)
Airline 2 (deviation from the baseline)                    (p<.094)
Airline 3 (deviation from the baseline)                    (p<.000)
Airline 4 (deviation from the baseline)                    .0975 (p<.004)
Airline 5 (deviation from the baseline)                    (p<.010)
F-test                                   (p<.0000)         (p<.0000)
Degrees of freedom (error)

14 However, the coefficient of g1 is not statistically discernible from zero at the .05 level (t=-1.03, p<.304).

SSE (Sum of squares error)
Root MSE
R²
Adjusted R²
N
Source:

Parameter estimates of regressors show some differences between the pooled OLS and LSDV, but all of them are statistically significant at the .01 level. The pooled OLS reports the overall intercept, while LSDV presents the intercept of the dropped (baseline) group and deviations of the other five intercepts from the baseline. Large p-values of airline 1 and 2 suggest that the intercepts of airline 1 and 2 do not deviate significantly from the baseline intercept (intercept of airline 6).

Figure 4.1 highlights differences in intercepts between the pooled OLS (left) and LSDV (right). The red line on the left plot is the OLS regression line with the overall intercept of . The red line on the right plot is the regression line of airline 6, whose dummy variable was excluded from the model. Other thin lines respectively represent regression lines of airline 1 through 5. For example, the top yellow line has the largest intercept of airline 4, while the bottom green line has the smallest intercept of airline 3.

Figure 4.1 Comparing Pooled OLS and LSDV (left panel: Total Cost of U.S. Airlines (OLS); right panel: Total Cost of U.S. Airlines (LSDV); both plot total cost against output index. Source: William Greene)

Note that the slopes of regression lines are similar in both plots because the coefficient of output index is similar in OLS and LSDV. If loading factor were used, the slopes of these lines would be different. This eyeballing gives us subjective evidence of a fixed group effect, but this evidence is not sufficient in a strong econometric sense. Section 5 will discuss a formal test to examine the presence of the fixed effect.

4.4 Estimation Strategies: LSDV1, LSDV2, and LSDV3

The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with dummy variables. The key issue in LSDV is how to avoid perfect multicollinearity, the so-called "dummy variable trap."
Each approach has a constraint (restriction) that reduces the number of parameters to be estimated by one and thus makes the model identified. LSDV1 drops a dummy variable; LSDV2 suppresses the intercept; and LSDV3 imposes a restriction. These approaches are different from each other with respect to model estimation and

interpretation of dummy variable parameters (Suits 1984: 177). They produce different dummy parameter estimates, but their results are equivalent. You have to know the pros and cons of these three approaches.

4.4.1 Estimating LSDV1

The first approach, LSDV1, drops a dummy variable as shown in 4.2. That is, the parameter of the eliminated dummy variable is set to zero and is used as a baseline. You should be careful when selecting a variable to be dropped, $d^{LSDV1}_{dropped}$ (g6 in 4.2), so that it can play the role of the reference group effectively. The functional form of LSDV1 is,

$cost_{it} = \beta_0 + \beta_1 output_{it} + \beta_2 fuel_{it} + \beta_3 loading_{it} + u_1 g_1 + u_2 g_2 + u_3 g_3 + u_4 g_4 + u_5 g_5 + \varepsilon_{it}$

Use the .regress command followed by a dependent variable and independent variables including a set of dummies (excluding one of the dummies). The coefficient of an included dummy shows how far its parameter estimate is away from the reference point or baseline (i.e., the overall intercept).

. regress cost g1-g5 output fuel load

What if we drop a different dummy variable, say g1, instead of g6? Since a different reference point is applied, we will get different dummy coefficients. But other statistics such as parameter estimates of regressors and goodness-of-fit measures remain unchanged. That is, the choice of a dummy variable to be dropped does not change the model at all.

. regress cost g2-g6 output fuel load
Source SS df MS   Number of obs =   F( 8, 81) =   Model   Prob > F =   Residual   R-squared =   Adj R-squared =   Total   Root MSE =
cost   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]   g2   g3   g4   g5   g6   output   fuel   load   _cons

The intercept in this model is the parameter estimate (Y-intercept) of airline 1, whose dummy variable g1 was excluded from the model. The coefficient indicates the deviation of the intercept of airline 2 from the baseline . That is, the intercept of airline 2 is .041 smaller than the reference point of . Therefore, the intercept of airline 2 is computed as = . Similarly, the intercept of airline 3 is computed as =
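The claim that the choice of the dropped dummy does not change the model can be checked with simple arithmetic. In the Python sketch below, the baseline intercept and the deviations are made-up round numbers (not the estimates from the airline data), and the helper names `group_intercepts` and `rebase` are hypothetical. Switching the reference group from airline 6 to airline 1 changes the reported intercept and dummy coefficients but leaves every group intercept as it was.

```python
def group_intercepts(alpha, dev, baseline):
    """Actual intercept of each group: baseline gets alpha, others alpha + deviation."""
    out = {baseline: alpha}
    out.update({g: alpha + d for g, d in dev.items()})
    return out


def rebase(alpha, dev, old_base, new_base):
    """Re-express LSDV1 results with new_base as the dropped (reference) group."""
    new_alpha = alpha + dev[new_base]
    new_dev = {g: d - dev[new_base] for g, d in dev.items() if g != new_base}
    new_dev[old_base] = -dev[new_base]
    return new_alpha, new_dev


# Hypothetical LSDV1 output with airline 6 dropped.
alpha6 = 10.0
dev6 = {1: -0.10, 2: -0.15, 3: -0.30, 4: 0.10, 5: -0.05}

alpha1, dev1 = rebase(alpha6, dev6, old_base=6, new_base=1)
before = group_intercepts(alpha6, dev6, baseline=6)
after = group_intercepts(alpha1, dev1, baseline=1)
print(alpha1, dev1)
```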

When you have not created dummy variables, you may use the .xi prefix command (interaction expansion) to obtain the identical result.15

. xi: regress cost i.airline output fuel load
i.airline   _Iairline_1-6   (naturally coded; _Iairline_1 omitted)
Source SS df MS   Number of obs =   F( 8, 81) =   Model   Prob > F =   Residual   R-squared =   Adj R-squared =   Total   Root MSE =
cost   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]   _Iairline_2   _Iairline_3   _Iairline_4   _Iairline_5   _Iairline_6   output   fuel   load   _cons

4.4.2 Estimating LSDV2

LSDV2 includes all dummies and, in turn, suppresses the intercept (i.e., sets the intercept to zero). Its functional form is,

$cost_{it} = \beta_1 output_{it} + \beta_2 fuel_{it} + \beta_3 loading_{it} + u_1 g_1 + u_2 g_2 + u_3 g_3 + u_4 g_4 + u_5 g_5 + u_6 g_6 + \varepsilon_{it}$

You can fit LSDV2 using .regress with the noconstant option, which suppresses the intercept in the model. Notice that all group dummies g1-g6 are included in the model.

. regress cost g1-g6 output fuel load, noconstant
Source SS df MS   Number of obs =   F( 9, 81) = .   Model   Prob > F =   Residual   R-squared =   Adj R-squared =   Total   Root MSE =
cost   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]   g1   g2   g3   g4   g5   g6   output   fuel   load

15 The Stata .xi is used either as an ordinary command or a prefix command. .xi creates dummies from a categorical variable specified in the term i. and then runs the command following the colon. Stata by default drops the first dummy variable.

Notice that all parameter estimates of regressors are the same as those in LSDV1. Also, the coefficients of the six dummies represent their group intercepts; that is, you do not need to compute individual group intercepts. This is the beauty of LSDV2. LSDV2, however, reports incorrect (inflated) R² (1. > .9974) and F (very large > ). Obviously, an R² of 1 is not likely. This is because the X matrix does not, due to the suppressed intercept, have a column vector of 1 and produces incorrect sums of squares of model and total (Uyar and Erdem, 1990: 98). However, the sum of squares of errors (SSE) and the standard errors of parameter estimates are correct in any LSDV.

4.4.3 Estimating LSDV3

LSDV3 includes the intercept and all dummies, and then imposes a restriction that the sum of parameters of all dummies is zero. The functional form of LSDV3 is,

$cost_{it} = \beta_0 + \beta_1 output_{it} + \beta_2 fuel_{it} + \beta_3 loading_{it} + u_1 g_1 + u_2 g_2 + u_3 g_3 + u_4 g_4 + u_5 g_5 + u_6 g_6 + \varepsilon_{it}$, subject to $u_1 + u_2 + u_3 + u_4 + u_5 + u_6 = 0$

In Stata, you need to use both .constraint and .cnsreg commands to fit LSDV3. .constraint defines a constraint, while .cnsreg fits a constrained OLS using the constraint() option. The number in the parentheses, 1 in the following example, indicates the constraint number defined in .constraint.

. constraint define 1 g1 + g2 + g3 + g4 + g5 + g6 = 0
. cnsreg cost g1-g6 output fuel load, constraint(1)
Constrained linear regression   Number of obs = 90   F( 8, 81) =   Prob > F =   Root MSE =
( 1) g1 + g2 + g3 + g4 + g5 + g6 = 0
cost   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]   g1   g2   g3   g4   g5   g6   output   fuel   load   _cons

LSDV3 returns the same parameter estimates of regressors and their standard errors as do LSDV1 and LSDV2. The Stata .cnsreg command does not provide an ANOVA table and goodness-of-fit statistics other than F and the square root of MSE. Unlike LSDV1 and LSDV2, LSDV3 produces the intercept and six dummy coefficients, but these coefficients have different meanings.
The LSDV3 intercept is the average of individual group intercepts, while a dummy coefficient is the deviation of the group intercept from the averaged intercept. For example, = ( )/6. The coefficient .0165 of airline 5 is the deviation from the averaged intercept ; that is, .0165=
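The relationship between LSDV2 group intercepts and the LSDV3 intercept and dummy coefficients can be sketched as follows. The six group intercepts in this Python example are made up for illustration (they are not the airline estimates), and the function name `lsdv3_from_intercepts` is hypothetical.

```python
def lsdv3_from_intercepts(intercepts):
    """Return (averaged intercept, deviations) from actual group intercepts."""
    avg = sum(intercepts.values()) / len(intercepts)
    dev = {g: a - avg for g, a in intercepts.items()}
    return avg, dev


# Hypothetical LSDV2 dummy coefficients, i.e. actual group intercepts.
lsdv2 = {1: 9.90, 2: 9.85, 3: 9.70, 4: 10.10, 5: 9.95, 6: 10.00}

avg, dev = lsdv3_from_intercepts(lsdv2)
print(round(avg, 4), {g: round(d, 4) for g, d in dev.items()})
```

By construction the deviations sum to zero, which is the restriction imposed through .constraint, and adding the averaged intercept back to a deviation recovers the group intercept.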

4.4.4 Comparing LSDV1, LSDV2, and LSDV3

The three approaches end up fitting the same model and report the same parameter estimates of regressors and their standard errors (Table 4.2). LSDV1 and LSDV3 report correct goodness-of-fit measures (Stata .cnsreg displays F-test and root MSE only), while LSDV2 reports correct SSE and root MSE but returns inflated (incorrect) F-test and R². The three LSDV approaches return different, but equivalent (representing the same group intercepts in different manners), dummy coefficients.

The key difference of the three approaches lies in the meanings of the intercept and dummy coefficients (Table 4.3). A parameter estimate in LSDV2, $\delta^*_d$, is the actual intercept (Y-intercept) of group d. It is easy to interpret substantively. The t-test examines if $\delta^*_d$ is zero.

Table 4.2 Comparing Results of LSDV1, LSDV2, and LSDV3
                      LSDV1              LSDV2              LSDV3
Output index          .9193 (.099) **    .9193 (.099) **    .9193 (.099) **
Fuel price            .4175 (.015) **    .4175 (.015) **    .4175 (.015) **
Loading factor        (.017) **          (.017) **          (.017) **
Intercept (baseline)  (.637) **                             (.96) **
Airline 1 (dummy)     (.084)             (.1931) **         (.0456)
Airline 2 (dummy)     (.0757)            (.1990) **         (.0380)
Airline 3 (dummy)     (.0500) **         (.50) **           (.0161) **
Airline 4 (dummy)     .0975 (.0330) **   (.418) **          .1770 (.0194) **
Airline 5 (dummy)     (.039) **          (.609) **          .0165 (.0367)
Airline 6 (dummy)                        (.637) **          .0795 (.0405)
F-test                **                 Large **           **
Degrees of freedom
SSE
Root MSE
R²
Adjusted R²
N
Source:
* Standard errors in parentheses; Statistical significance: * <.05, ** <.01

In LSDV1, a dummy coefficient shows the extent to which the actual intercept of group d deviates from the reference point (the parameter of the dropped dummy variable), which is the intercept of LSDV1, $\delta^*_{dropped} = \alpha^{LSDV1}$. The null hypothesis of the t-test is that the deviation from the reference group is zero.

In LSDV3, a dummy coefficient means how far its actual parameter is away from the average group effect (Suits 1984: 178). The LSDV3 intercept is the averaged effect: $\alpha^{LSDV3} = \frac{1}{d} \sum_i \delta^*_i$.
Therefore, the null hypothesis is that the deviation of a group intercept from the averaged intercept is zero.

In short, each approach has a different baseline and restriction (the parameter of the dropped dummy is zero in LSDV1, e.g., u6 = 0 when g6 is dropped; the regression intercept is zero in LSDV2; and the dummy parameters sum to zero in LSDV3) and thus tests a different hypothesis. But all approaches produce equivalent dummy coefficients and exactly the same parameter estimates of regressors. In other words, they all fit the same model; given one fitted LSDV, we can replicate the other two LSDVs.

Table 4.3 Summary of Dummy Coefficients in LSDV1, LSDV2, and LSDV3
                                 LSDV1                                                  LSDV2                                    LSDV3
Dummies included                 $d_1^{LSDV1}, \dots, d_d^{LSDV1}$ except $d_{dropped}^{LSDV1}$   $d_1^*, \dots, d_d^*$                    $d_1^{LSDV3}, \dots, d_d^{LSDV3}$
Intercept?                       $\alpha^{LSDV1}$                                       No                                       $\alpha^{LSDV3}$
All dummies?                     No (d-1)                                               Yes (d)                                  Yes (d)
Constraint (restriction)?        $\delta_{dropped}^{LSDV1} = 0$ (Drop one dummy)        $\alpha^{LSDV2} = 0$ (Suppress the intercept)   $\sum_i \delta_i^{LSDV3} = 0$ (Impose a restriction)
Actual dummy parameters          $\delta_i^* = \alpha^{LSDV1} + \delta_i^{LSDV1}$, $\delta_{dropped}^* = \alpha^{LSDV1}$   $\delta_1^*, \delta_2^*, \dots, \delta_d^*$   $\delta_i^* = \alpha^{LSDV3} + \delta_i^{LSDV3}$, $\alpha^{LSDV3} = \frac{1}{d} \sum_i \delta_i^*$
Meaning of a dummy coefficient   How far away from the reference group (dropped)?       Actual individual intercept              How far away from the averaged group effect?
H0 of the t-test                 $\delta_i^* - \delta_{dropped}^* = 0$                  $\delta_i^* = 0$                         $\delta_i^* - \frac{1}{d} \sum_i \delta_i^* = 0$
Source: Constructed from Suits (1984) and David Good's lecture (2004)

Which approach is better than the others? You need to consider both estimation and interpretation issues carefully. In general, LSDV1 is often preferred because of easy estimation in statistical software packages. Oftentimes researchers want to see how far dummy parameters deviate from the reference group rather than the actual group intercepts. If you have to report individual group intercepts, LSDV2 gives the answer directly. Finally, LSDV2 and LSDV3 involve some estimation problems; for example, LSDV2 reports an incorrect R².

5. Fixed Effect Model

A fixed group model examines group differences in intercepts. The LSDV for this fixed model needs to create as many dummy variables as the number of entities or subjects. When many dummies are needed, the within effect model is useful since it uses transformed variables without creating dummies. Because the within estimation does not use dummy variables, it has larger degrees of freedom, a smaller MSE, and smaller standard errors of parameters than LSDV; therefore, we need to adjust these statistics. Because this estimation does not report individual dummy coefficients either, you need to compute them if they are really needed. Notice that the R² reported in the within effect model is incorrect.

5.1 Estimating Within Estimators Manually

In order to estimate within group estimators manually, you need to compute group means of the dependent variable and all regressors. The quietly prefix below suppresses the output of the .egen command, which produces group means in this case.

. quietly egen gm_cost=mean(cost), by(airline)
. quietly egen gm_output=mean(output), by(airline)
. quietly egen gm_fuel=mean(fuel), by(airline)
. quietly egen gm_load=mean(load), by(airline)

You will get one set of group means per airline in the variables gm_cost, gm_output, gm_fuel, and gm_load. For instance, gm_cost for airline 1 is the mean of its total costs from period 1 through period 15.

Then, transform the dependent and independent variables to compute their deviations from group means.

. quietly gen gw_cost = cost - gm_cost
. quietly gen gw_output = output - gm_output
. quietly gen gw_fuel = fuel - gm_fuel
. quietly gen gw_load = load - gm_load

This transformation results in the new variables gw_cost, gw_output, gw_fuel, and gw_load, which can be listed by airline and year as follows.

. list airline year gw_cost-gw_load
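The group-mean and deviation steps above amount to the following computation, sketched here in plain Python rather than Stata. The helper names and the six-observation data set are hypothetical and serve only to show what .egen and .gen produce.

```python
from collections import defaultdict


def group_means(groups, values):
    """Mean of `values` within each group, like egen ... = mean(), by()."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def within_transform(groups, values):
    """Deviation of each observation from its own group mean."""
    means = group_means(groups, values)
    return [v - means[g] for g, v in zip(groups, values)]


# Hypothetical toy panel: two airlines observed over three periods each.
airline = [1, 1, 1, 2, 2, 2]
cost = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]

gm_cost = group_means(airline, cost)       # one mean per airline
gw_cost = within_transform(airline, cost)  # deviations from group means

print(gm_cost)
print(gw_cost)
```

Every transformed variable has a zero mean within each group, which is why the within regression is run without an intercept.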

Now, we are ready to run the within effect model with the intercept suppressed. The noconstant (or noc) option suppresses the intercept.

. regress gw_cost gw_output gw_fuel gw_load, noc

[Output: ANOVA table (Source, SS, df, MS) with F(3, 87) and Root MSE = .058, followed by coefficient rows for gw_output, gw_fuel, and gw_load.]

Compare this output with the LSDV output in 4.2. The within effect model reports correct SSE and parameter estimates of regressors but produces incorrect R² and standard errors of parameter estimates. Notice that the degrees of freedom increase from 81 (LSDV) to 87 since six dummy variables are not used.

You may compute group intercepts using $d_g^* = \bar{y}_{g\cdot} - \beta'\bar{x}_{g\cdot}$. For example, the intercept of airline 5 is its group mean of cost minus the sum of each slope (for instance, .9193 for the output index) times the corresponding group mean of the regressor. In order to get the correct standard errors, you need to adjust them using the ratio of the degrees of freedom of the within effect model and LSDV. For example, the standard error of the output index is computed as .0299 = .0288*sqrt(87/81).

5.2 Within Estimation Using .xtreg

The Stata .xtreg command estimates within group estimators without creating dummy variables. Let us first run the .tsset command to specify the cross-sectional and time-series variables. Note that both variables should be numeric in .tsset.

. quietly tsset airline year

The .xtreg command is followed by a dependent variable, regressors, and options. The fe option tells Stata to fit the within effect model (see footnote 16).

. xtreg cost output fuel load, fe i(airline)

Footnote 16: i(airline) specifies airline as the independent unit, but this option is redundant because group and time variables are already defined in .tsset.

[Output: Fixed-effects (within) regression; Number of obs = 90; Group variable: airline; Number of groups = 6; Obs per group: min = 15, avg = 15.0, max = 15; R-sq within, between, and overall; F(3,81); corr(u_i, Xb); coefficient rows for output, fuel, load, and _cons; sigma_u, sigma_e, and rho (fraction of variance due to u_i); F test that all u_i=0: F(5, 81).]

Compare this within effect model with the LSDV output in 4.2. This command reports correct parameter estimates and standard errors of regressors but returns an incorrect model F statistic and R². The F-test in the last line of the output examines the null hypothesis that the five dummy parameters in LSDV1 are zero (e.g., μ1=0, μ2=0, μ3=0, μ4=0, and μ5=0). The large F statistic rejects the null hypothesis in favor of the fixed group effect (p<.0000). Recall that the intercept reported here is the averaged intercept in LSDV3.

By default, .xtreg does not display an analysis of variance (ANOVA) table including SSE. Since many related statistics are stored in macros, you need to run .display (or .di) to get them (see footnote 17). The following commands return SSM, SSE, SEE (the square root of MSE=e(rss)/e(df_r)), R², and adjusted R², respectively. Notice that SEE is reported under the label sigma_e.

. display e(mss) e(rss) sqrt(e(rss)/e(df_r))
. di e(r2) e(r2_a)

Alternatively, you may use .areg to get the same result, except that .areg reports the correct R². Like .xtreg, the .areg command returns the same intercept, the averaged intercept in LSDV3.

. areg cost output fuel load, absorb(airline)

[Output: Linear regression, absorbing indicators; Number of obs = 90; F(3, 81); R-squared, Adj R-squared, and Root MSE; coefficient rows for output and fuel, continued below.]

Footnote 17: In order to view the list of macros available in .xtreg, run .help xtreg.

[Output, continued: coefficient rows for load and _cons; airline F(5, 81) (6 categories).]

Let us get SSE from the macro variable e(rss).

. di e(rss) e(tss)-e(rss)

Table 5.1 Comparison of OLS, LSDV, and Within Effect Models
[Columns: OLS, LSDV, Within, .xtreg, .areg. Rows: Output index, Fuel price, Loading factor, Intercept (baseline), Airline 1 through Airline 5 (dummies), F-test (model), Degrees of freedom, SSM (model), SSE (error/residual), Root MSE (SEE), R², Adjusted R², F-test (fixed effect), N. The four fixed effect columns share the output index estimate of .9193 ** and the fuel price estimate of .4175 **.]
Source:
* Standard errors in parentheses; statistics hidden in macros are italicized; statistical significance: * <.05, ** <.01

Table 5.1 contrasts the output of the pooled OLS and four fixed effect estimations (i.e., LSDV, the within effect model, .xtreg, and .areg). All four fixed effect estimations produce the same SSE and parameter estimates but report somewhat different standard errors and goodness-of-fit measures. The original within effect model reports incorrect standard errors, F statistics, SEE, and R² (see the numbers in red). The estimations using .xtreg and .areg return adjusted (corrected) standard errors and SEE, conduct the F-test for the fixed effect, and report the correct averaged intercept and its standard error. However, they report the same, wrong model F statistic and do not, by default, display SSE. While .areg reports the correct (adjusted) R², .xtreg holds the wrong (adjusted) R² in macro variables.

So which estimation is best for you? LSDV is generally preferred because of correct estimation, goodness-of-fit, and group/time specific intercepts (in particular LSDV2). If the number of entities and/or time periods is large enough, say 100 time periods, .xtreg and .areg will provide less painful and more elegant solutions including the F-test for fixed effects. However, keep in mind that they produce an incorrect F score for the model test and an incorrect (adjusted) R² (in .xtreg). Again, DO NOT read the F score and R² from the .xtreg output; get the correct ones from LSDV.
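The two manual corrections discussed in 5.1, the degrees-of-freedom adjustment of standard errors and the recovery of a group intercept, can be sketched as follows. The function names and the inputs in the example calls are hypothetical; only the degrees of freedom (87 for the within model and 81 for LSDV, with n=6, T=15, and k=3) come from the text.

```python
import math


def adjust_se(se_within, df_within, df_lsdv):
    """Rescale a within-model standard error to its LSDV counterpart.

    The within regression ignores the n group dummies, so its error degrees
    of freedom (nT - k) are too large compared with LSDV (nT - n - k).
    """
    return se_within * math.sqrt(df_within / df_lsdv)


def group_intercept(y_mean, x_means, betas):
    """Group intercept: group mean of y minus the fitted part at x means."""
    return y_mean - sum(b * x for b, x in zip(betas, x_means))


n, T, k = 6, 15, 3
df_within = n * T - k    # 87
df_lsdv = n * T - n - k  # 81

# Hypothetical within-model standard error of 0.03.
print(adjust_se(0.03, df_within, df_lsdv))

# Hypothetical group means and slopes for a two-regressor model.
print(group_intercept(5.0, [1.0, 2.0], [0.5, 1.5]))
```

The adjustment factor sqrt(87/81) is larger than one, so the corrected standard errors are always slightly larger than those printed by the within regression.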

If you want to try a fixed time effect model, add i(year) to .xtreg. Or switch the cross-sectional and time-series variables using .tsset and then run .xtreg again.

. quietly xtreg cost output fuel load, fe i(year)
. tsset year airline
       panel variable:  year (strongly balanced)
        time variable:  airline, 1 to 6
                delta:  1 unit
. quietly xtreg cost output fuel load, fe

5.3 Testing a Fixed Effect (F-test)

How do we know if there is a significant fixed group effect? The F-test based on the loss of goodness-of-fit answers this question. The null hypothesis of this F-test is that all dummy parameters except for one are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$.

For the F-test, let us obtain the SSE (e'e) from the pooled OLS regression and the SSE of .2926 from the LSDVs (LSDV1 through LSDV3). Alternatively, you may draw the R² of .9974 from LSDV1 or LSDV3 and .9883 from the pooled OLS. DO NOT, however, read R² from LSDV2, the original within effect model, or the .xtreg output. The F statistic is computed as

$$F = \frac{(e'e_{pooled} - .2926)/(6-1)}{(.2926)/(90-6-3)} = \frac{(.9974 - .9883)/(6-1)}{(1 - .9974)/(90-6-3)} \sim F[5, 81].$$

The F statistic seems large enough to reject the null hypothesis. We already know that .xtreg and .areg by default conduct the F-test for fixed effects. Alternatively, we can run the .test command, a follow-up command for the Wald test, right after fitting LSDV.

. quietly regress cost g1-g5 output fuel load
. test g1 g2 g3 g4 g5

 ( 1)  g1 = 0
 ( 2)  g2 = 0
 ( 3)  g3 = 0
 ( 4)  g4 = 0
 ( 5)  g5 = 0

[Output: F(5, 81) and Prob > F.]
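The loss-of-fit F-test can be sketched in Python as below. The function names are hypothetical. The R² inputs (.9974 and .9883) are the rounded values quoted in the text, so the result is only an approximation of the statistic Stata reports from unrounded quantities; the SSE inputs in the second call are made up for illustration.

```python
def fixed_effect_f(sse_pooled, sse_lsdv, n, T, k):
    """F statistic for H0: all group dummies are zero, from two SSEs."""
    df1 = n - 1          # number of restrictions (dummies dropped)
    df2 = n * T - n - k  # error degrees of freedom in LSDV
    f = ((sse_pooled - sse_lsdv) / df1) / (sse_lsdv / df2)
    return f, df1, df2


def fixed_effect_f_from_r2(r2_lsdv, r2_pooled, n, T, k):
    """Same test written in terms of R-squared values."""
    df1 = n - 1
    df2 = n * T - n - k
    f = ((r2_lsdv - r2_pooled) / df1) / ((1 - r2_lsdv) / df2)
    return f, df1, df2


# Rounded R-squared values quoted in the text (n=6 airlines, T=15, k=3).
f_r2, df1, df2 = fixed_effect_f_from_r2(0.9974, 0.9883, n=6, T=15, k=3)
print(f_r2, df1, df2)

# Hypothetical SSEs, only to show the first form of the formula.
f_sse, _, _ = fixed_effect_f(sse_pooled=2.0, sse_lsdv=1.0, n=6, T=15, k=3)
print(f_sse)
```

Both forms are algebraically identical when the two models share the same total sum of squares, which is the case for pooled OLS and LSDV1 or LSDV3.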

6. Random Effect Model

A random effect model examines how group and/or time influence error variances. This section discusses the feasible generalized least squares (FGLS) and various estimation methods available in Stata (see footnote 18). In order to get θ for FGLS, we need the between estimation first.

6.1 Between Estimation: Group Mean Regression

In a between group effect model, the unit of analysis is not an individual observation but an entity. Accordingly, the number of observations drops from nT to n. Since this model uses aggregate group means of variables, it is often called group mean regression.

Let us compute group means using the .collapse command. This command computes aggregate information, group means in this case, and stores it in a new data set in memory. Note that /// links two command lines.

. collapse (mean) gm_cost=cost (mean) gm_output=output (mean) gm_fuel=fuel (mean) ///
      gm_load=load, by(airline)

Individual group means are listed below.

. list airline gm_cost-gm_load

Now run OLS on these new variables in order to get SSE.

. regress gm_cost gm_output gm_fuel gm_load

[Output: ANOVA table (Source, SS, df, MS) with F(3, 2) and Root MSE = .12585, followed by coefficient rows for gm_output, gm_fuel, gm_load, and _cons.]

Footnote 18: Baltagi and Cheng (1994) introduce various ANOVA estimation methods, such as a modified Wallace and Hussain method, the Wansbeek and Kapteyn method, the Swamy and Arora method, and Henderson's method III. They also discuss maximum likelihood (ML) estimators, restricted ML estimators, minimum norm quadratic unbiased estimators (MINQUE), and minimum variance quadratic unbiased estimators (MIVQUE). Based on a Monte Carlo simulation, they argue that ANOVA estimators are Best Quadratic Unbiased estimators of the variance components for the balanced model, whereas ML, restricted ML, MINQUE, and MIVQUE are recommended for unbalanced models.

You can also use the be option in .xtreg to fit the same between effect model, but it does not report an ANOVA table. In the following output, the R² of .9936 is reported under the label between = and the SEE of .1258 under sd(u_i + avg(e_i.))=.

. xtreg cost output fuel load, be i(airline)

[Output: Between regression (regression on group means); Number of obs = 90; Group variable: airline; Number of groups = 6; Obs per group: min = 15, avg = 15.0, max = 15; R-sq within, between, and overall; F(3,2); sd(u_i + avg(e_i.)); coefficient rows for output, fuel, load, and _cons.]

6.2 Estimating a Random Effect Model Manually

Since the covariance structure of individual i, Σ, is not known, we have to estimate θ using the SSEs of the between group effect model (.0317) and the fixed group effect model (.2926). See the formula in 3.4 and the computation below.

The variance component of error is
$$\hat{\sigma}_v^2 = \frac{SSE_{within}}{nT - n - k} = \frac{.2926}{6 \times 15 - 6 - 3} = .0036.$$

The variance component of group is
$$\hat{\sigma}_u^2 = \hat{\sigma}_{between}^2 - \frac{\hat{\sigma}_v^2}{T} = \frac{.0317}{6 - 4} - \frac{\hat{\sigma}_v^2}{15} = .0156.$$

Thus, $\hat{\theta}$ is
$$\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_u^2 + \hat{\sigma}_v^2}} = 1 - \sqrt{\frac{\hat{\sigma}_v^2}{T\hat{\sigma}_{between}^2}} = .8767, \quad \text{where } \hat{\sigma}_{between}^2 = \frac{SSE_{between}}{n - k - 1} = \frac{.0317}{6 - 3 - 1}.$$

Next, transform the dependent and independent variables, including the intercept, using $\hat{\theta}$.

. gen rg_cost = cost - .8767*gm_cost
. gen rg_output = output - .8767*gm_output
. gen rg_fuel = fuel - .8767*gm_fuel
. gen rg_load = load - .8767*gm_load
. gen rg_int = 1 - .8767 // for the intercept
. list airline year rg_cost-rg_int
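The computation of the variance components and θ can be retraced with a short Python sketch. It assumes the within (fixed effect) SSE is .2926 and the between SSE is .0317, with n=6, T=15, and k=3; the function names are hypothetical, and the rounded inputs reproduce the reported values only to about four decimal places.

```python
import math


def variance_components(sse_within, sse_between, n, T, k):
    """Estimate the error and group variance components for FGLS."""
    sigma_v2 = sse_within / (n * T - n - k)       # error component
    sigma_between2 = sse_between / (n - k - 1)    # between-regression MSE
    sigma_u2 = sigma_between2 - sigma_v2 / T      # group component
    return sigma_v2, sigma_u2


def fgls_theta(sigma_v2, sigma_u2, T):
    """Weight used to quasi-demean the data in the random effect model."""
    return 1 - math.sqrt(sigma_v2 / (T * sigma_u2 + sigma_v2))


def quasi_demean(value, group_mean, theta):
    """Transform one observation: value minus theta times its group mean."""
    return value - theta * group_mean


sigma_v2, sigma_u2 = variance_components(0.2926, 0.0317, n=6, T=15, k=3)
theta = fgls_theta(sigma_v2, sigma_u2, T=15)

print(round(sigma_v2, 4), round(sigma_u2, 4), round(theta, 4))
```

With θ equal to 0 the transformation leaves the data untouched (pooled OLS), and with θ equal to 1 it becomes the within transformation, so an estimate near .88 places this model much closer to the fixed effect model than to pooled OLS.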

Finally, run OLS with these transformed variables. Do not forget to add noconstant to suppress the OLS intercept.

. regress rg_cost rg_output rg_fuel rg_load rg_int, noc

[Output: ANOVA table (Source, SS, df, MS) with F(4, 86), followed by coefficient rows for rg_output, rg_fuel, rg_load, and rg_int.]

Parameter estimates are similar to those in the fixed effect model in 4.2 and 5.2. SSE and SEE are .3116 and .0602, respectively. The (adjusted) R² reported is .9989, which is not the correct R² because the intercept is suppressed.

6.3 Random Effect Model Using .xtreg

We need to use the re option in .xtreg to produce FGLS estimates. The theta option reports an estimated theta (.8767). The parameter estimates and their standard errors are the same as those in 6.2. The R² reported under the label within = is similar to the correct R².

. xtreg cost output fuel load, re theta

[Output: Random-effects GLS regression; Number of obs = 90; Group variable: airline; Number of groups = 6; Obs per group: min = 15, avg = 15.0, max = 15; R-sq within, between, and overall; Random effects u_i ~ Gaussian; Wald chi2(3); corr(u_i, X) = 0 (assumed); theta; coefficient rows for output, fuel, load, and _cons.]

[Output, continued: sigma_u, sigma_e, and rho (fraction of variance due to u_i).]

The sigma_u (σ_u) and sigma_e (σ_v) are the square roots of the variance components for groups and errors, respectively (.0156=.1249^2, .0036=.0601^2). Note that SEE is displayed under sigma_e. The label rho represents the ratio of the individual specific error variance to the composite (entire) error variance; that is, .8119=.1249^2/(.1249^2+.0601^2). A large ratio means that individual specific errors account for a large proportion of the composite error variance. In this random effect model, for instance, the individual specific error explains 81 percent of the entire composite error variance. Accordingly, this ratio may be interpreted as a goodness-of-fit measure of the random effect model.

The .xtmixed command also provides estimation methods for random effects. The || airline:, part (the comma should not be omitted) tells Stata to use the subject variable airline. Parameter estimates and their standard errors are slightly different from those in 6.2. Variance components for groups and errors are reported under the labels sd(_cons) and sd(Residual).

. xtmixed cost output fuel load || airline:,

[Output: EM and gradient-based optimization log (log restricted-likelihood at iterations 0 and 1); Mixed-effects REML regression; Number of obs = 90; Group variable: airline; Number of groups = 6; Obs per group: min = 15, avg = 15.0, max = 15; Wald chi2(3); coefficient rows for output, fuel, load, and _cons; random-effects parameters sd(_cons) and sd(Residual) for airline: Identity; LR test vs. linear regression: chibar2(01).]
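The rho statistic discussed above is a simple variance ratio, sketched below. The standard deviations .1249 and .0601 are taken from the text as rounded values, so the result matches the reported ratio only to about two decimal places; the function name is hypothetical.

```python
def rho(sd_u, sd_e):
    """Share of the composite error variance due to the group effect."""
    var_u = sd_u ** 2  # variance component for groups
    var_e = sd_e ** 2  # variance component for idiosyncratic errors
    return var_u / (var_u + var_e)


share = rho(0.1249, 0.0601)
print(round(share, 2))  # about 81 percent of the composite error variance
```

A value of rho near zero would indicate that the group-specific component contributes almost nothing, in which case pooled OLS loses little.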

Both .xtreg and .xtmixed commands with the mle option support maximum likelihood estimation. The following two commands produce the same result. Notice that error variance components are computed as .0130=.1141^2 and .0035=.0591^2.

. xtreg cost output fuel load, re mle

[Output: Random-effects ML regression; Number of obs = 90; Group variable: airline; Number of groups = 6; Random effects u_i ~ Gaussian; Obs per group: min = 15, avg = 15.0, max = 15; LR chi2(3); Log likelihood; coefficient rows for output, fuel, load, and _cons; /sigma_u, /sigma_e, and rho; Likelihood-ratio test of sigma_u=0: chibar2(01).]

. xtmixed cost output fuel load || airline:, mle

[Output: EM and gradient-based optimization log (log likelihood at iterations 0 and 1); Mixed-effects ML regression; Number of obs = 90; Group variable: airline; Number of groups = 6; Obs per group: min = 15, avg = 15.0, max = 15; Wald chi2(3); Log likelihood; coefficient rows for output, fuel, load, and _cons; random-effects parameters sd(_cons) and sd(Residual) for airline: Identity; LR test vs. linear regression: chibar2(01).]

If you want to try a random time effect model, add i(year) to .xtreg in the same way as shown for the fixed effect model in Section 5.

. xtreg cost output fuel load, re i(year) theta

Table 6.1 Comparison of OLS and Various Random Effect Estimations
[Columns: OLS, Random Effect (manual FGLS), .xtreg, .xtmixed, .xtreg mle. Rows: Output index, Fuel price, Loading factor, Intercept, F/Wald/LR test, SSM (model), SSE, SEE or $\hat{\sigma}_v$, $\hat{\sigma}_u$, θ, R², Adjusted R², LR Test, N. The manual FGLS and .xtreg columns share the output index estimate of .9067 **; .xtmixed reports .9073 ** and the mle estimations report .9053 **.]
Source:
* Standard errors in parentheses; statistical significance: * <.05, ** <.01

The .xtreg command without mle produces correct parameter estimates and standard errors of the random effect model but an incorrect R². The .xtmixed command without mle employs restricted maximum likelihood (REML) estimation and reports slightly different parameter estimates and standard errors. The .xtreg and .xtmixed commands with mle return the same full information maximum likelihood (FIML) estimates, which are slightly different from those of the first two methods. The maximum likelihood estimation conducts the likelihood ratio (LR) test to examine random effects.

6.4 Testing a Random Effect: LM test

The Breusch-Pagan Lagrange multiplier (LM) test examines if any random effect exists. The null hypothesis is that individual-specific or time-specific error variance components are zero: $H_0: \sigma_u^2 = 0$. If the null hypothesis is not rejected, the pooled OLS is preferred; otherwise, the random effect model is better. See the formula in Section 3.

For the LM test, we need to know $e'e$, the SSE of the pooled OLS, and $\bar{e}'\bar{e}$, the sum of squared group means of the pooled OLS residuals. The $e'e$ is read from the pooled OLS output, and $\bar{e}'\bar{e}$ is computed as follows.

. quietly regress cost output fuel load // run pooled OLS
. quietly predict r, resid // calculate residuals and save into r
. collapse (mean) gm_r=r, by(airline) // calculate group means of r
. quietly gen gm_r2=gm_r^2 // calculate squared group means of r
. list airline gm_r gm_r2

. tabstat gm_r2, stat(sum) // obtain the sum of squared group means of r

[Output: the sum of gm_r2.]

Finally, the LM test is

$$LM = \frac{nT}{2(T-1)}\left[\frac{T^2\,\bar{e}'\bar{e}}{e'e} - 1\right]^2 = \frac{6 \times 15}{2(15-1)}\left[\frac{15^2\,\bar{e}'\bar{e}}{e'e} - 1\right]^2 \sim \chi^2(1).$$

With the large chi-squared statistic, we reject the null hypothesis in favor of the random group effect model (p<.0000).

In Stata, run the .xttest0 command right after estimating the one-way random effect model in order to get the same result.

. quietly xtreg cost output fuel load, re i(airline)
. xttest0

[Output: Breusch and Pagan Lagrangian multiplier test for random effects; cost[airline,t] = Xb + u[airline] + e[airline,t]; estimated Var and sd = sqrt(Var) for cost, e, and u; Test: Var(u) = 0, chi2(1) and Prob > chi2.]
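The LM statistic can be computed directly from pooled OLS residuals, as in the following sketch. It assumes a balanced panel; the function names and the six residuals are hypothetical. The p-value uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def lm_statistic(residuals_by_group):
    """Breusch-Pagan LM statistic from pooled OLS residuals.

    `residuals_by_group` holds one list of T residuals per group; the panel
    must be balanced (every group has the same T).
    """
    n = len(residuals_by_group)
    T = len(residuals_by_group[0])
    if any(len(group) != T for group in residuals_by_group):
        raise ValueError("balanced panel required")
    sse_pooled = sum(e * e for group in residuals_by_group for e in group)
    sum_sq_means = sum((sum(group) / T) ** 2 for group in residuals_by_group)
    return n * T / (2 * (T - 1)) * (T * T * sum_sq_means / sse_pooled - 1) ** 2


def chi2_1_pvalue(x):
    """Upper-tail probability of a chi-squared(1) variate."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical residuals: group 1 always above the line, group 2 below it.
residuals = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
lm = lm_statistic(residuals)
print(lm, chi2_1_pvalue(lm))
```

In the toy data every residual equals its group mean, which is the pattern a group-specific error component produces, so the statistic is large relative to the chi-squared(1) critical value of 3.84.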

7. Hausman Test and Chow Test

If you find both significant fixed and random effects, which effect is more significant and which model is better than the other? The Hausman specification test can answer this question by comparing fixed and random effects. What if you come to think that individual slopes of regressors are not constant but vary across airline or time? A poolability test will give you an answer.

Table 6.1 summarizes the results of the pooled OLS, fixed effect, and random effect models. We may ask, "Which model is better than the others?"

7.1 Hausman Test To Choose Fixed or Random Effect

How do we compare a fixed effect model and its random counterpart? The Hausman specification test examines if the individual effects are uncorrelated with other regressors in the model. If individual effects are correlated with any other regressor, the random effect model violates a Gauss-Markov assumption and is no longer the Best Linear Unbiased Estimator (BLUE), because individual effects are parts of the error term in a random effect model. Therefore, if the null hypothesis is rejected, a fixed effect model is favored over the random counterpart. In a fixed effect model, individual effects are parts of the intercept, and the correlation between the intercept and regressors does not violate any Gauss-Markov assumption; a fixed effect model is still BLUE.

Let us first check the cross-sectional and time-series variables to make sure we are doing fine.

. tsset airline year
       panel variable:  airline (strongly balanced)
        time variable:  year, 1 to 15
                delta:  1 unit

The Hausman test requires that both fixed and random effect models are fitted. The .estimates store command saves the result of the random effect model into random_group and that of the fixed effect model into fixed_group.

. quietly xtreg cost output fuel load, re
. quietly estimates store random_group
. quietly xtreg cost output fuel load, fe
. quietly estimates store fixed_group

Run the .hausman command followed by the random and fixed effect results in order.

. hausman random_group fixed_group

[Output: coefficients table with columns (b) random_group, (B) fixed_group, (b-B) Difference, and sqrt(diag(V_b-V_B)) S.E., and rows for output, fuel, and load; b = consistent under Ho and Ha, obtained from xtreg; B = inconsistent under Ha, efficient under Ho, obtained from xtreg; Test: Ho: difference in coefficients not systematic.]

[Output, continued: chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B) is negative; chi2<0 ==> model fitted on these data fails to meet the asymptotic assumptions of the Hausman test; see suest for a generalized test.]

Alternatively, you may replace fixed_group with a period (.) indicating the last fitted model, the fixed effect model in this case.

. hausman random_group .

The Hausman test returns a negative chi-squared statistic and warns that the data fail to meet the asymptotic assumptions of the test. Although the chi-squared score is small enough not to reject the null hypothesis, we may not conclude that the random effect model is better than its fixed counterpart; the test is not conclusive.

7.2 Chow Test for Poolability

The poolability test, here the Chow test, examines if panel data are poolable so that the slopes of regressors are the same across individual entities or time periods (Baltagi, 2001: 51-55). If the null hypothesis of poolability is rejected, individual airlines may have their own slopes of regressors, and then fixed and/or random effects are no longer appealing. Instead, you may try a random coefficient model or hierarchical regression model, which is skipped in this working paper.

For the poolability test, we need to run group by group (or time by time) OLS regressions. In Stata, the forvalues loop and the if qualifier make it easy to run group by group regressions. Open the Stata do-file editor by running .doedit at the dot prompt, type in the following commands, and then execute them.

forvalues i = 1(1)6 {
    display "OLS regression for group " `i'
    regress cost output fuel load if airline==`i'
}

[Output: one OLS table per group, starting with "OLS regression for group 1"; each table reports the ANOVA table with F(3, 11), R-squared, Adj R-squared, and Root MSE, followed by coefficient rows for output, fuel, load, and _cons.]

[Output, continued: OLS regressions for groups 2 through 6 in the same layout, each with F(3, 11), R-squared, Adj R-squared, Root MSE, and coefficient rows for output, fuel, load, and _cons.]

The null hypothesis of the poolability test across groups is that all slopes of regressors are the same across groups: $H_0: \beta_{ik} = \beta_k$ for the i-th group and the k-th regressor. The SSE of the pooled OLS regression is represented by $e'e$, and the sum of the SSEs of the group by group regressions, $\sum e_i'e_i$, is .1007. The F statistic is

$$F = \frac{(e'e - \sum e_i'e_i)/((n-1)(k+1))}{\sum e_i'e_i/(n(T-k-1))} = \frac{(e'e - .1007)/((6-1)(3+1))}{.1007/(6(15-3-1))} \sim F[20, 66].$$

The large F statistic rejects the null hypothesis of poolability (p<.0000). We conclude that the panel data are not poolable with respect to airline. Both fixed and random effect models may be misleading, and we need to try a random coefficient model or hierarchical linear regression model (see footnote 19).

The following .xtrc command estimates Swamy's (1970) random-coefficients linear regression model, and the betas option presents group specific slopes. Theoretical discussion and interpretation of the result are skipped.

. xtrc cost output fuel load, betas

[Output: Random-coefficients regression; Number of obs = 90; Group variable: airline; Number of groups = 6; Obs per group: min = 15, avg = 15.0, max = 15; Wald chi2(3); coefficient rows for output, fuel, load, and _cons; Test of parameter constancy: chi2(20); group-specific coefficients for Group 1 and Group 2 (output, fuel, load, _cons), continued below.]

Footnote 19: However, this Chow test may be problematic when the errors are not spherical, that is, ε ~ N(0, Ω) instead of ε ~ N(0, σ²I). See Baltagi (2001: 52-55) for an extensive discussion of this issue.

[Output, continued: group-specific coefficients for Group 3 through Group 6, each with rows for output, fuel, load, and _cons.]
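The poolability F statistic in 7.2 can be sketched as follows. The function name is hypothetical and the SSE inputs are made-up numbers, chosen only so that the arithmetic is easy to follow; the degrees of freedom use n=6 groups, T=15 periods, and k=3 regressors as in the text.

```python
def chow_poolability_f(sse_pooled, sse_by_group, T, k):
    """F statistic for H0: all groups share the same intercept and slopes.

    `sse_pooled` is the SSE of the pooled OLS; `sse_by_group` holds the SSE
    of each group-by-group OLS regression (balanced panel with T periods).
    """
    n = len(sse_by_group)
    sse_unrestricted = sum(sse_by_group)
    df1 = (n - 1) * (k + 1)  # restrictions: k slopes and 1 intercept per extra group
    df2 = n * (T - k - 1)    # error degrees of freedom of the separate regressions
    f = ((sse_pooled - sse_unrestricted) / df1) / (sse_unrestricted / df2)
    return f, df1, df2


# Hypothetical SSEs for six groups and a hypothetical pooled SSE.
group_sse = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]
f_stat, df1, df2 = chow_poolability_f(2.0, group_sse, T=15, k=3)
print(f_stat, df1, df2)
```

The degrees of freedom returned for these settings are 20 and 66, the same as in the F[20, 66] reference distribution used above.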

8. Presenting Panel Data Models

The key question now is, "Which information do we have to report? And how?" Some studies report parameter estimates and their statistical significance only; others include standard errors but exclude goodness-of-fit measures; and researchers often fail to interpret the results substantively for readers. This section discusses general guidelines for presenting panel data models. However, the specific pieces of information to be presented and their styles depend on the research questions and purposes of studies.

8.1 Presenting All Possible Models? No!

Some studies present all possible models including the pooled OLS, fixed effect model, random effect model, and two-way effect model. Is this practice reasonable? No. Strictly speaking, if one model is right, the other models are wrong. It is absurd to present wrong models together unless comparison of models is the goal of the study. If a fixed effect turns out insignificant, why would you present the wrong model? In short, you just need to report the right model, that is, your final model only.

8.2 Which Information Should Be Reported?

You MUST report goodness-of-fit measures, parameter estimates with their standard errors, and test results (see Table 8.1).

8.2.1 Goodness-of-fit Measures

Goodness-of-fit examines the extent to which the model fits the data. In case of poor goodness-of-fit, you need to try another model. The essential goodness-of-fit measures that you need to report are:

- F-test (or likelihood ratio test) to test the model and its significance (p-value).
- Sum of squared errors (residual), degrees of freedom for errors, and N (nT).
- R² in OLS and fixed effect models.
- Theta $\hat{\theta}$ and variance components $\hat{\sigma}_u$ estimated in a random effect model.

Keep in mind that some estimation methods report incorrect statistics and standard errors. For example, .xtreg returns an incorrect R² in a fixed effect model because the command fits the within estimator (running OLS on transformed data with the intercept suppressed). Both the between R² and the overall R² displayed in Stata output are almost meaningless. In order to get the correct R² for a fixed effect model, use LSDV1 or .areg. Use macro variables, if needed, to obtain various goodness-of-fit measures that are not displayed in Stata output.

8.2.2 Parameter Estimates of Regressors

Obviously, you must report parameter estimates and their standard errors. Fortunately, .regress and .xtreg produce correct parameter estimates and their adjusted standard errors. But the within estimation itself produces incorrect standard errors due to incorrect (larger) degrees of freedom (see Table 3.2).

8.2.3 Parameter Estimates of Dummy Variables

In a fixed effect model, a question is whether individual intercepts need to be reported. In general, parameter estimates of regressors are of primary interest, and accordingly individual intercepts are not needed. However, you have to report them if the audience wants to know them or if individual effects are of main research interest. The combination of LSDV1 or LSDV2 will give you easy solutions for this case (see 4.2; see also footnote 20). Do not forget that LSDV1, LSDV2, and LSDV3 have different meanings of dummy parameters and that the null hypotheses of their t-tests differ from one another (see Table 4.3).

8.2.4 Test Results

Finally, you should report whether a fixed and/or random effect exists, because the purpose of panel data modeling is to examine fixed and/or random effects. Report and interpret the results of the F-test for a fixed effect model and/or the Breusch-Pagan LM test for a random effect model. When both fixed and random effects are statistically significant, you need to conduct a Hausman test and report its result (see footnote 21). If you doubt that slopes are constant across group and/or time, conduct a Chow test to examine the poolability of the data.

Table 8.1 Examples of Presenting Analysis Results
[Columns: Pooled OLS, Fixed Effect Model, Random Effect Model. Rows: Output index, Fuel price, Loading factor, Intercept (baseline), Airline 1 through Airline 5 (dummies), F-test (model), DF, R², SSE (SRMSE), SEE or $\hat{\sigma}_v$, $\hat{\sigma}_u$, θ, Effect Test, N. The output index estimates are .9193 ** in the fixed effect model and .9067 ** in the random effect model; θ is .8767 in the random effect model.]
Source:
* Standard errors in parentheses; statistical significance: * <.05, ** <.01

Footnote 20: In some uncommon cases, you need to report variance components estimated in a random effect model. Unlike the SAS MIXED procedure, .xtreg does not report these statistics.

Footnote 21: The null hypothesis is that group/time specific effects are not correlated with any regressors. Neither "A random effect model is better than the fixed effect model" nor "A random effect model is consistent" is a correct null hypothesis. If the null hypothesis is rejected, a random effect model violates a key OLS assumption and ends up with biased and inconsistent estimates; however, a fixed effect model still remains unbiased and consistent.

8.3 Interpreting Results Substantively

If your model fits the data well and individual regressors turn out statistically significant, you have to interpret the parameter estimates in a sensible way. You may not simply report signs and magnitudes of coefficients. Do not simply say, for example, that an independent variable is "significant," "negatively (or positively) related to...," or "insignificantly related..."

A standard form of interpretation is, "For one unit increase in an IV, DV is expected to increase by OO units, holding all other variables constant." You may omit the ceteris paribus assumption (holding all other variables constant). However, try to make the interpretation more sensible to an audience that does not know much about econometrics. See 4.1 and 4.2 for examples of substantive interpretation. Provide statistical significance in a table and the p-value in parentheses at the end of the interpretation sentence.

8.4 Presenting Results Professionally

Many studies present results in tables, but some of them fail to construct professional tables. Common bad table examples include 1) large and various fonts, 2) too small and/or too large numbers, 3) colorful and stylish border lines, 4) badly aligned numbers, and 5) nonsystematic order. The following is the list of checkpoints to be considered when constructing a professional table (see Table 8.1 for an example).

- The title should describe the contents of a table appropriately. Provide the unit of measurement (e.g., Million Dollars) and period (e.g., Year 2010) if needed.
- Organize a table systematically and compactly.
- Provide parameter estimates and their standard errors in parentheses.
- Do not use variable names used in computer software as labels. Use "loading factor" instead of "load."
- Use 10 point Times New Roman for labels and 10 point Courier New for numbers. Do not use stylish fonts (e.g., Cooperplate) or sizes that are too big or too small.
- Rescale numbers appropriately in order to avoid extremely small numbers or extremely large ones such as 75,845,341,697,785.
- Report up to three or four digits below the decimal point. Do not round numbers arbitrarily.
- Do not use stylish border lines (e.g., their colors, thickness, and type of lines).
- Minimize the use of vertical and horizontal lines. Use no vertical line in general.
- Align numbers to the right and consider the location of the decimal point carefully.
- Use "Standardized coefficients," if needed, rather than "Beta," "β," or "beta coefficients." Nobody knows the true value of β.
- Provide the source of data, if applicable, at the bottom of the table.
- Indicate statistical significance as * <.05, ** <.01.

8.5 Common Mistakes and Awkward Expressions

It is not difficult to find awkward expressions even in academic papers. Consider the following suggestions for common mistakes in presentation.

8.5.1 Statistical Significance

Do not say, "significant level," "at 5% level," or "at the level of significance α=5%," and the like. These expressions should be "significance level," "at the .05 level," and "at the .05 (significance) level," respectively. Use a specific significance level (e.g., "at the .01 significance level") rather than "at the conventional level."

8.5.2 Hypothesis

A hypothesis is a conjecture about the unknown (e.g., α, β, δ, and σ). Therefore, "b1 = 0" is not a valid hypothesis, but "β1 = 0" is. Because b1 is already known (estimated from the sample), you do not need to test b1 = 0.

8.5.3 Parameter Estimates

Say, "parameter estimates of β1" or "the coefficient of an independent variable 1" instead of "The coefficient of β1." Also say, "standardized coefficients" instead of "Beta," "β," or "beta coefficient."

8.5.4 P-values

Do not say, "The p-value is significant." A p-value itself is neither significant nor insignificant. You may say, "The p-value is small enough to reject H0" or "A small p-value suggests rejection of H0."

8.5.5 Reject or Do Not Reject the Null Hypothesis

Say, "reject" or "do not reject" the null hypothesis rather than "accept" (or "confirm") the null hypothesis. Also say "reject the H0 at the .01 level" instead of "I do not believe that the H0 is true" or "The test provides decisive evidence that the H0 is wrong" (no one knows if a H0 is really true or wrong). Always be simple and clear.

9. Conclusion

Panel data are analyzed to investigate individual (group) and/or time effects using fixed effect and random effect models. A fixed effect model asks how heterogeneity from group and/or time affects individual intercepts, while a random effect model hypothesizes error variance structures affected by group and/or time. Disturbances in a random effect model are assumed to be randomly distributed across group or time. But the key difference between fixed and random effect models is that the individual effect u_i in a random effect model should not be correlated with any regressor. Slopes are assumed unchanged in both fixed effect and random effect models.

A panel data set needs to be arranged in the long form as shown in 2.3. Longitudinal data are balanced or unbalanced, fixed or rotating, and long or short. If data are severely unbalanced, too long, or too short, read output with caution and, in case of an unbalanced panel, consider dropping subjects with many missing data points. If the number of groups (subjects) or time periods is extremely large, you may consider categorizing subjects to reduce the number of groups or time periods.

A fixed effect model is estimated by the least squares dummy variable (LSDV) regression and within estimation. LSDV has three approaches to avoid perfect multicollinearity. LSDV1 drops a dummy; LSDV2 suppresses the intercept; and LSDV3 includes all dummies and imposes a restriction instead. LSDV1 is commonly used since it produces correct statistics. LSDV2 provides actual individual intercepts, but reports incorrect R² and F score. Remember that the dummy parameters of the three LSDV approaches have different meanings and thus involve different t-tests.

The within estimation does not use dummy variables but deviations from group means. Thus, this estimation is useful when there are many groups and/or time periods in the panel data set since it is able to avoid the incidental parameter problem. In turn, time-invariant independent variables are wiped out in the data transformation process and the dummy parameter estimates need to be computed afterward. Because of its larger degrees of freedom, the within estimation produces incorrect R² and standard errors of parameters although Stata reports adjusted standard errors.

Fixed effect (F test)                  Random effect (B-P LM test)             Your Selection
H0 is not rejected (No fixed effect)   H0 is not rejected (No random effect)   Pooled OLS
H0 is rejected (fixed effect)          H0 is not rejected (No random effect)   Fixed effect model
H0 is not rejected (No fixed effect)   H0 is rejected (random effect)          Random effect model
H0 is rejected (fixed effect)          H0 is rejected (random effect)          Choose a fixed effect model if the null hypothesis of a Hausman test is rejected; otherwise, fit a random effect model.

In order to determine an appropriate model for a panel, first describe data carefully by producing summary statistics and drawing plots. Then begin with a simple model like the pooled OLS.

Imagine four possible outcomes of hypothesis testing shown in the table above. If both null hypotheses of the F-test and LM test are not rejected, your best model is the pooled OLS. If the null hypothesis of an F-test in a fixed effect model is rejected and the null of a Breusch-Pagan LM test in a random effect model is not, a fixed effect model is the case. If you find both significant fixed and random effects in your panel data, conduct a Hausman specification test to compare a fixed effect model and a random effect model. If the null hypothesis of no correlation between individual effects and regressors is rejected, fit a fixed effect model; otherwise, a random effect model is preferred. If you think that your data are not poolable and each entity has different slopes of regressors, conduct a Chow test and then, if its null hypothesis is rejected, try to fit a random coefficient model or hierarchical linear model. For details about model selection, see 3.6.

It is important to present the result correctly. The essential information includes goodness-of-fit measures (e.g., F score and likelihood ratio, SSE, and R²), parameter estimates with their standard errors, and test results (i.e., F-test, LM test, Hausman test, and Chow test). These pieces of information should be presented in a professional table. Researchers should interpret the results substantively so that an audience without sophisticated econometric knowledge can understand.
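The selection steps summarized above might be run in Stata roughly as follows. This is a hedged sketch rather than the procedure applied in this document: the Grunfeld data (`webuse grunfeld`), the variables `invest`, `mvalue`, and `kstock`, and the stored names `ols`, `fixed`, and `random` are all assumptions made for illustration.

```stata
* Illustrative sketch only; the data set and variable names are assumptions
webuse grunfeld, clear
xtset company year

* Step 1: describe the data, then start from the pooled OLS
summarize invest mvalue kstock
regress invest mvalue kstock
estimates store ols

* Step 2: fixed effect by LSDV1 (one dummy dropped) and F test of the dummies
regress invest mvalue kstock i.company
testparm i.company

* Same slopes from the within estimation; its output also reports the F test
xtreg invest mvalue kstock, fe
estimates store fixed

* Step 3: random effect model and the Breusch-Pagan LM test
xtreg invest mvalue kstock, re
estimates store random
xttest0

* Step 4: put estimates and standard errors side by side for a table
estimates table ols fixed random, b(%9.3f) se(%9.3f) stats(N)
```

If both the F test and the LM test reject their null hypotheses, the Hausman test then decides between the stored `fixed` and `random` models. The side-by-side output is only raw material; it still needs relabeling and formatting before it qualifies as a professional table.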

References

Baltagi, Badi H. Econometric Analysis of Panel Data. Wiley, John & Sons.
Baltagi, Badi H., and Young-Jae Chang. "Incomplete Panels: A Comparative Study of Alternative Estimators for the Unbalanced One-way Error Component Regression Model." Journal of Econometrics, 62(2).
Breusch, T. S., and A. R. Pagan. "The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics." Review of Economic Studies, 47(1).
Cameron, A. Colin, and Pravin K. Trivedi. Microeconometrics: Methods and Applications. New York: Cambridge University Press.
Cameron, A. Colin, and Pravin K. Trivedi. Microeconometrics Using Stata. TX: Stata Press.
Chow, Gregory C. "Tests of Equality Between Sets of Coefficients in Two Linear Regressions." Econometrica, 28(3).
Freund, Rudolf J., and Ramon C. Littell. SAS System for Regression, 3rd ed. Cary, NC: SAS Institute.
Fuller, Wayne A., and George E. Battese. "Transformations for Estimation of Linear Models with Nested-Error Structure." Journal of the American Statistical Association, 68(343) (September).
Fuller, Wayne A., and George E. Battese. "Estimation of Linear Models with Crossed-Error Structure." Journal of Econometrics, 2.
Greene, William H. LIMDEP Version 9.0 Econometric Modeling Guide 1. Plainview, New York: Econometric Software.
Greene, William H. Econometric Analysis, 6th ed. Upper Saddle River, NJ: Prentice Hall.
Hausman, J. A. "Specification Tests in Econometrics." Econometrica, 46(6).
Kennedy, Peter. A Guide to Econometrics, 6th ed. Malden, MA: Blackwell Publishing.
SAS Institute. SAS/ETS 9.1 User's Guide. Cary, NC: SAS Institute.
SAS Institute. SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute.
Stata Press. Stata Base Reference Manual, Release 11. College Station, TX: Stata Press.
Stata Press. Stata Longitudinal/Panel Data Reference Manual, Release 11. College Station, TX: Stata Press.
Suits, Daniel B. "Dummy Variables: Mechanics V. Interpretation." Review of Economics & Statistics, 66(1).
Swamy, P. A. V. B. "Efficient Inference in a Random Coefficient Regression Model." Econometrica, 38.
Uyar, Bulent, and Orhan Erdem. "Regression Procedures in SAS: Problems?" American Statistician, 44(4).
Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.