2 Generalized Gini and Concentration coefficients. Keywords sgini ; Stata ; generalized Gini ; Concentration
|
|
|
- Gervais Woods
- 9 years ago
- Views:
Transcription
1 sgini Generalized Gini and Concentration coefficients (with factor decomposition) in Stata Philippe Van Kerm CEPS/INSTEAD, Luxembourg September 2009 (revised February 2010) Abstract sgini is a user-written Stata package to compute generalized Gini and concentration coefficients. This note describes syntax, formulas and usage examples. Keywords sgini ; Stata ; generalized Gini ; Concentration coefficient JEL Classification C88; D31 1 Introduction This document describes sgini, a lightweight command for calculations of generalized Gini (a.k.a. S-Gini) and Concentration coefficients from unit-record data (not grouped data) in Stata. sgini computes relative (scale invariant) Gini indices of inequality by default but can be requested to produce absolute (translation invariant) indices or aggregate welfare S-Gini indices. The command can also optionally report decomposition of these indices by factor components (income sources). As this document shows, while the scope of the command itself is seemingly relatively narrow, sgini can serve as a flexible building block in a wide array of economic applications. The command is available online for installation in net-aware Stata. 1 At the command prompt, type net install sgini, from( 2 Generalized Gini and Concentration coefficients 2.1 Covariance-based formulations The Gini coefficient is one of the most popular measure of inequality. One of its many formulations (see Yitzhaki, 1998) is based on a covariance expression: ( ) X GINI(X) = 2 Cov, (1 F(X)) µ(x) where X is a random variable of interest with mean µ(x), and F(X) is its cumulative distribution function (see, e.g., Lerman & Yitzhaki, 1984, Jenkins, 1988). 2 Closely related to the Gini coefficient is the Concentration coefficient. The Concentration coefficient measures the association between two random variables and can be expressed as ( ) X CONC(X, Y) = 2 Cov, (1 G(Y)) µ(x) Centre d Etudes de Populations, de Pauvreté et de Politiques Socio-Economiques/International Networks for Studies in Technology, Environment, Alternatives, Development. B.P. 48, L-4501 Differdange, Luxembourg. lu. [email protected]. 1 The latest version of the sgini package is (of ). Stata 8.2 or later is required. 2 The Gini coefficient is perhaps more generally known as one minus twice the area under the Lorenz curve of X which, if X is income, plots the share of total income held by the poorest 100 p percent of the population against p.
2 where G(Y) is the cumulative distribution function of Y. CONC(X, Y) reflects how much X is concentrated on observations with high ranks in Y (see, e.g., Kakwani, 1977a). A single-parameter generalization of the Gini coefficient has been proposed by Donaldson & Weymark (1980, 1983) and Yitzhaki (1983). The generalized Gini coefficient (a.k.a. the S-Gini, or extended Gini coefficient) can also be expressed as a covariance: ( ) X GINI(X; υ) = υ Cov, (1 F(X))υ 1 µ(x) where υ is a parameter tuning the degree of aversion to inequality. The standard Gini corresponds to υ = 2. See Yitzhaki & Schechtman (2005) for a recent review. A generalized Concentration coefficient can be similarly defined as ( ) X CONC(X, Y; υ) = υ Cov, (1 G(Y))υ 1. µ(x) Note finally that while measures of inequality are typically taken as relative one considers the distribution of, say, income shares, there are situations in which there is either interest in taking levels into account too, a.k.a. aggregate welfare indices (Sen, 1976), or in considering deviations from the mean, a.k.a. absolute inequality (Blackorby & Donaldson, 1980). In the first case, Concentration coefficients (and Gini coefficients with X = Y) are simply redefined as while the second case is obtained as 2.2 Calculation TOTCONC(X, Y; υ) = µ(x) (1 CONC(X, Y; υ)) = υ µ(x) Cov ( X, (1 G(Y)) υ 1) ABSCONC(X, Y; υ) = µ(x) TOTCONC(X, Y; υ) = µ(x) CONC(X, Y; υ). Covariance-based expressions for the generalized Gini and Concentration coefficients are particularly convenient for calculations from unit-record data. Estimation simply involves estimating a sample covariance between the observations from variable X (divided by their sample mean) and the (fractional) ranks of observations from variable X or Y. Attention needs to be paid to handle carefully tied values in the computation of the fractional ranks. Consider a sample of N observations on a variable Y with associated sample weights: {(y i, w i )} N i=1. Let K be the number of distinct values observed on Y, denoted y 1 < y 2 <... < y K, and denote by π k the corresponding weighted sample proportions: π k = N i=1 w i1(y i = y k ) N i=1 w i (1(condition) is equal to 1 if condition is true and 0 otherwise). The fractional rank attached to each y k is then given by k 1 F k = π j + 0.5π k j=0 where π 0 = 0 (Lerman & Yitzhaki, 1989, Chotikapanich & Griffiths, 2001). Each observation in the sample is then associated with the fractional rank F i = K F k1(y i = y k). k=1 2
3 This procedure ensures that tied observations are associated with identical fractional ranks and that the sample mean of the fractional ranks is equal to 0.5. {(F i, y i, w i )} N i=1 can then be plugged in a standard sample covariance formula. This makes the resulting Gini coefficient estimate independent on the sample/population size. 3 3 Applications Gini coefficients are popular measures of inequality by themselves. Similarly, Concentration coefficients are often used to measure income-related inequalities in other socially important variables. van Doorslaer et al. (1997), for example, compared income-related inequalities in health across a number of countries using the Concentration coefficient of a self-reported health measure against income. But both measures are also used as building blocks for a number of related applications. Four cases are illustrated here: (i) the decomposition of income inequality by sources, (ii) the measurement of tax progressivity and horizontal equity, (iii) the measurement of income mobility and of the pro-poorness of growth, and (iv) the measurement of income polarization. 3.1 Decomposition of income inequality by source Total family income can be seen as the sum of a number of components: earnings, capital income, transfer income, etc. There might be interest in identifying the contribution of each of these sources to inequality in total income. The natural decomposition of Generalized Gini coefficients is GINI(Y; υ) = K k=1 µ(y k ) µ(y) CONC(Yk, Y; υ) where CONC(Y k, Y; υ) is the generalized Concentration coefficient of incomes from source k with respect to total income and µ(y k ) and µ(y) denote means of source k and total income respectively (Fei et al., 1978, Lerman & Yitzhaki, 1985). Lerman & Yitzhaki (1985) also noted that CONC(Y k, Y; υ) can further be expressed as CONC(Y k, Y; υ) = GINI(Y k ; υ) R(Y k, Y; υ) where R(Y k, Y; υ) is the (generalized) Gini correlation R(Y k, Y; υ) = Cov ( Y k, (1 F(Y)) υ 1) Cov (Y k, (1 F k (Y k )) υ 1 ). Finally, Lerman & Yitzhaki (1985) derive an expression for the relative impact on the Gini coefficient of a marginal increase in the size of source k: ( 1 GINI(Y; υ) GINI(Y; υ) e k = µ(yk ) CONC(Y k ), Y; υ) 1 µ(y) GINI(Y; υ) = µ(yk ) µ(y) ( GINI(Y k ; υ)r(y k, Y; υ) GINI(Y; υ) ) 1 Note how these components are in effect combinations of estimates of Gini and Concentration coefficients, along with sample means. 3 See Yitzhaki & Schechtman (2005), Berger (2008), and Davidson (2009) for recent discussions of this issue. Also see Cox (2002) for a more general discussion of (fractional) ranks. In Stata, cumul or glcurve s pvar() option are sometimes used to estimate ranks that are plugged in formula for the Gini or Concentration index. Using these estimates of the empirical distribution function is not advisable here because ranks estimated in this way are not fractional and the resulting estimate of the Gini is therefore not population invariant (and so depend on the sample size). 3
4 López-Feldman (2006) discusses this decomposition in greater details and describes descogini, a Stata command for calculating the components of the decomposition Tax progressivity and horizontal equity Much of the analyses on taxation schemes attempt to measure how progressive is a tax schedule, that is, how much inequality is reduced after application of the tax. A popular measure is the Reynolds- Smolensky index of redistributive effect defined as the difference between the Gini coefficient of pre-tax income and the Gini coefficient of post-tax income (Reynolds & Smolensky, 1977): Π RS = GINI(X pre ) GINI(X post ) where X pre and X post are pre- and post-tax income, respectively. The Kakwani measure of progressivity is similarly defined (Kakwani, 1977b): Π K = CONC(T, X pre ) GINI(X pre ) where T is the tax paid: T = X pre X post. Combining the progressivity measure with a component capturing the re-ranking induced by the tax schedule leads to a decomposition of Π RS as Π RS = g 1 g ΠK R where R = (CONC(X post, X pre ) GINI(X post )) captures the effect of re-ranking on the net reduction in the Gini coefficient, and g is the average tax rate. See Lambert (2001) for a textbook exposition. Again, all that is required to compute these measures is estimation of a number of Gini and Concentration coefficients. These (and other) measures are computed by the Stata command progres available on SSC (Peichl & Van Kerm, 2007). 3.3 Income mobility and pro-poor growth Transposing concepts from the progressivity measurement, Jenkins & Van Kerm (2006) relate the change in income inequality over time to the progressivity of individual income growth a measure of the pro-poorness of economic growth and mobility in the form of re-ranking: where and (υ) = R(υ) P(υ) P(υ) = GINI(X 0 ; υ) CONC(X 0, X 1 ; υ) R(υ) = GINI(X 1 ; υ) CONC(X 0, X 1 ; υ) P(υ) can be interpreted as an indicator of how much growth has benefited disproportionately to individuals at the bottom of the distribution in the initial time period. R(υ) captures how much a progressive income growth has lead to re-ranking between individuals, so that the net reduction in inequality is the difference between P(υ) and R(υ). Note that R(υ) can also be interpreted as a measure of mobility (in the form of re-ranking) in its own right (Yitzhaki & Wodon, 2004). In an analysis of cross-country convergence in GDP, O Neill & Van Kerm (2008) have interpreted (υ) as a measure of σ-convergence and P(υ) as a measure of β-convergence, thereby reconciling the two concepts in a single framework. This decomposition of changes over time in the Gini coefficient is implemented in Stata in the dsginideco package available from the SSC archive (Jenkins & Van Kerm, 2009b). 4 descogini does not currently handle sample weights (as of the January 2008 version) and is limited to υ = 2. sgini can optionally be used to report similar decomposition coefficients without these constraints (see supra). 4
5 3.4 Income polarization The Gini coefficient has also been used as a building block of several measures of income bipolarization. Arrange a population in increasing order of income and divide it in two equal-sized groups: the poor are individuals with an income below the median and the rich are those with income above the median. Measures of bipolarization capture the distance between these two groups. Denote the median M(Y), mean income µ(y), mean income among the poor µ(y P ), mean income among the rich µ(y R ), the Gini coefficient among the poor GINI(Y P ), the Gini coefficient among the rich GINI(Y R ) and the overall Gini, GINI(Y). Silber et al. (2007) show that several measures of bipolarization can be expressed in terms of within-group Gini and between-group Gini which can be written in this situation as: Within(Y P, Y R ) = 1 ( µ(y P ) ) 4 µ(y) GINI(YP ) + µ(yr ) µ(y) GINI(YR ) and Between(Y P, Y R ) = 1 4 ( µ(y R ) ) µ(y) µ(yp ). µ(y) Note that the between-group Gini is equivalent to estimating the Gini coefficient of mean income in the two groups, that is GINI((µ(Y P ), µ(y R )) ). The bipolarization index suggested by Silber et al. (2007) is defined as P 1 = Between(YP, Y R ) Within(Y P, Y R ), GINI(Y) the index of Wolfson (1994) as P 2 = ( Between(Y P, Y R ) Within(Y P, Y R ) ) and the index proposed by Zhang & Kanbur (2001) as P 3 = Between(YP, Y R ) Within(Y P, Y R ). µ(y) Med(Y), See Silber et al. (2007). sgini makes estimation of these bipolarization measures straightforward. It also opens possibilities for extensions of these measures by using variations of the inequality aversion parameter υ. 4 The sgini command sgini is a lightweight command to compute generalized Gini and concentration coefficents from unitrecord data in Stata using the formulas given in Section 2. sgini can also report the decomposition by income source (a.k.a. a factor decomposition) if requested. sgini comes with a companion command fracrank for generating fractional ranks as infra. 4.1 Syntax The syntax of sgini is as follows: sgini varlist [ if ] [ in ] [ weight ] [, param(numlist) sortvar(varname) fracrankvar(varname) sourcedecomposition aggregate absolute format(%fmt) ] fweight, aweight, pweight and iweight are allowed; see [U] weight Weights. by, bootstrap, jackknife are allowed; see [U] Prefix commands. Time-series operators are accepted; see [U] Time-series varlists. 5
6 sgini computes the generalized Gini or concentration coefficients for each variable in varlist according to the parameters passed in option param and with ordering variable of option sortvar. Decomposition by income source is optionally computed. See detailed option descriptions below. Multiple variables and multiple inequality aversion parameters can be passed to sgini. Beware that if multiple variables are input, sgini will discard observations with missing data on any of the input variables and compute all coefficients on the resulting sample. The syntax of the accompanying fracrank is fracrank varname [ if ] [ in ] [ weight ], generate(newvarname) fweight and aweight are allowed. Time-series operators are accepted; see [U] Time-series varlists. fracrank takes one numeric variable as input and creates a new variable filled with the corresponding fractional rank for each observation. 4.2 Options sgini options param(numlist) specifies generalized Gini parameters. Default is 2 leading to the standard Gini or Concentration coefficient. Multiple parameters can be specified. sortvar(varname) requests the computation of a Concentration coefficient by cumulating the variable(s) of interest in increasing order of varname. Default is to cumulate the variable(s) of interest against themselves, leading to Gini coefficients. fracrankvar(varname) is a rarely used option that passes the name of an existing variable varname containing pre-specified fractional ranks based on which the Gini and Concentration coefficients can be computed. This is a potentially dangerous option, but it may lead to considerable speed gains under certain circumstances. It is essential that the fractional rank variable be computed correctly in the first place (using e.g., fracrank) and on the adequate sample (think missing data, if clauses, ordering). sourcedecomposition requests factor decomposition of indices. It is relevant when more than one variable is passed in varlist. It requests that a variable be created by taking the row sum of all elements in varlist computes the Gini (or Concentration) coefficient for this new variable, and estimates the contribution of each element of varlist to the latter by applying the natural decomposition rule for Gini coefficients (as in Lerman & Yitzhaki, 1985). aggregate and absolute request, respectively, computation of aggregate S-Gini welfare measures or computation of absolute Gini and Concentration coefficients, instead of the relative inequality measures. They are mutually exclusive and incompatible with sourcedecomposition. format(%fmt) controls the display format; default is %4.3f fracrank options generate(newvarname) specifies the name for the created variable. 6
7 4.3 Saved results Scalars r(n) r(sum w) r(coeff) Matrices r(coeffs) r(parameters) r(r) r(c) r(elasticity) r(s) Macros r(varlist) r(paramlist) r(sortvar) number of observations sum of weights estimated coefficient for first variable, first parameter all estimated coefficients vector inequality aversion parameters vector Gini correlations between source and total income (if requested) concentration coefficients of each source (if requested) elasticities between source and total Gini (if requested) factor shares (if requested) varlist list of parameters from option param varname if sortvar(varname) specified 4.4 Dependencies on user-written packages sgini does not require other user-written packages. 5 Example This example illustrates usage of sgini using data from the National Longitudinal Survey of Youth, available from the Stata Press website. Take this as merely illustrative of syntax using data easily available from within Stata, not as of any substantive interest! First open the data, generate a wage variable and tsset the data as appropriate.. cap use clear. tsset idcode year panel variable: time variable: delta:. gen w = exp(ln_wage) idcode (unbalanced) year, 68 to 88, but with gaps 1 unit sgini can then be used to estimate Gini coefficients on the wage variable (across all waves of data). The second syntax produces absolute Gini coefficients for multiple inequality aversion parameters 7
8 . sgini w Gini coefficient for w Variable v=2 w sgini w, param(1.5(.5)4) absolute Generalized Gini coefficient for w Variable v=1.5 v=2 v=2.5 v=3 v=3.5 v=4 w The third example illustrates usage of multiple variables in varlist (and of time-series operators) and estimation of concentration coefficients with the sortvar(varname) option. Note how results for variable w differ from the first example because of the case-wise deletion of observations with missing data on L.w and L2.w.. sgini w L.w L2.w Gini coefficient for w, L.w, L2.w Variable v=2 w L.w L2.w sgini w L.w L2.w, sortvar(w) param(2 3) Generalized Concentration coefficient for w, L.w, L2.w against w Variable v=2 v=3 w L.w L2.w return list scalars: macros: r(sum_w) = 3481 r(n) = 3481 r(coeff) = r(sortvar) : "w" r(paramlist) : "2 3" r(varlist) : "w L.w L2.w" matrices: r(coeffs) : 1 x 6 r(parameters) : 1 x 2. matrix list r(coeffs) r(coeffs)[1,6] param1: param1: param1: param2: L. L2. w w w w Coeff param2: param2: L. L2. w w 8
9 Coeff Specifying option sourcedecomposition produces the factor decomposition.. sgini w L.w L2.w, source Gini coefficient for w, L.w, L2.w Variable v=2 w L.w L2.w Decomposition by source: TOTAL = w + L.w + L2.w Parameter: v=2 Variable s g r g*r s*g*r s*g*r/g s*g*r/g-s w L.w L2.w TOTAL return list scalars: macros: r(sum_w) = 3481 r(n) = 3481 r(coeff) = r(paramlist) : "2" r(varlist) : "w L.w L2.w" matrices: r(s) : 1 x 3 r(elasticity) : 1 x 3 r(c) : 1 x 3 r(r) : 1 x 3 r(coeffs) : 1 x 3 r(parameters) : 1 x 1 The final examples illustrates how Stata s built-in bootstrap or jackknife prefix commands can be used with sgini to produce standard errors and confidence intervals.. gen newid = idcode. tsset newid year panel variable: time variable: delta: newid (unbalanced) year, 68 to 88, but with gaps 1 unit. bootstrap G=r(coeff) /// >, cluster(idcode) idcluster(newid) reps(250) nodots: /// > sgini w if!mi(w) Warning: Because sgini is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means that no observations will be excluded from the resampling because of missing values or other reasons. If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset 9
10 in memory contains only the relevant data. Bootstrap results Number of obs = Replications = 250 command: sgini w G: r(coeff) (Replications based on 4711 clusters in idcode) Observed Bootstrap Normal-based Coef. Std. Err. z P> z [95% Conf. Interval] G jackknife G=r(coeff) /// >, cluster(idcode) idcluster(newid) rclass nodots: /// > sgini w if!mi(w) Jackknife results Number of obs = Replications = 4711 command: sgini w if!mi(w) G: r(coeff) n(): r(n) (Replications based on 4711 clusters in idcode) Jackknife Coef. Std. Err. t P> t [95% Conf. Interval] G Citation, liability, conditions of use sgini is not an official Stata command. It is a free contribution to the research community. The program should work as described, but it is freely offered as-is. Use at your own risk! Bug reports as well as comments and suggestions can be sent to [email protected]. Please cite as: Van Kerm, P. (2009), sgini Generalized Gini and Concentration coefficients (with factor decomposition) in Stata, v1.1 (revised February 2010), CEPS/INSTEAD, Differdange, Luxembourg. References Berger, Y. G. (2008), A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini coefficient, Journal of Official Statistics, 24(4): Blackorby, C. & Donaldson, D. (1980), A theoretical treatment of indices of absolute inequality, International Economic Review, 21(1): Chotikapanich, D. C. & Griffiths, W. (2001), On calculation of the extended Gini coefficient, Review of Income and Wealth, 47: Cox, N. J. (2002), Calculating percentile ranks or plotting positions, in Stata FAQs, StataCorp LP, College Station, TX, available at Davidson, R. (2009), Reliable inference for the Gini index, Journal of Econometrics, available online doi: /j.jeconom
11 Donaldson, D. & Weymark, J. A. (1980), A single-parameter generalization of the Gini indices of inequality, Journal of Economic Theory, 22: (1983), Ethically flexible Gini indices for income distributions in the continuum, Journal of Economic Theory, 29: Fei, J. C. H., Ranis, G. & Kuo, S. W. Y. (1978), Growth and the family distribution of income by factor components, Quarterly Journal of Economics, 92(1): Jenkins, S. P. (1988), Calculating income distribution indices from micro-data, National Tax Journal, 41(1): Jenkins, S. P. & Van Kerm, P. (2006), Trends in income inequality, pro-poor income growth and income mobility, Oxford Economic Papers, 58(3): (2009), dsginideco: Decomposition of inequality change into pro-poor growth and mobility components, Statistical Software Components S457009, Boston College Department of Economics, Kakwani, N. C. (1977a), Applications of Lorenz curves in economic analysis, Econometrica, 45(3): (1977b), Measurement of tax progressivity: an international comparison, The Economic Journal, 87(345): Lambert, P. J. (2001), The Distribution and Redistribution of Income, a Mathematical Analysis, Manchester University Press, Manchester, UK, 3rd edn. Lerman, R. I. & Yitzhaki, S. (1984), A note on the calculation and interpretation of the Gini index, Economics Letters, 15: (1985), Income inequality effects by income source: A new approach and applications to the United States, Review of Economics and Statistics, 67(1): (1989), Improving the accuracy of estimates of Gini coefficients, Journal of Econometrics, 42: López-Feldman, A. (2006), Decomposing inequality and obtaining marginal effects, The Stata Journal, 6(1): O Neill, D. & Van Kerm, P. (2008), An integrated framework for analysing income convergence, The Manchester School, 76(1):1 20. Peichl, A. & Van Kerm, P. (2007), progres: Module to measure distributive effects of an income tax, Statistical Software Components S456867, Boston College Department of Economics, http: //ideas.repec.org/c/boc/bocode/s html. Reynolds, M. & Smolensky, E. (1977), Public Expenditures, Taxes, and the Distribution of Income: The United States, 1950, 1961, 1970, Academic Press,, New York. Sen, A. K. (1976), Real national income, Review of Economic Studies, 43(1): Silber, J., Hanoka, M. & Deutsch, J. (2007), On the link between the concepts of kurtosis and bipolarization, Economics Bulletin, 4(36):1 6. van Doorslaer, E., Wagstaff, A., Bleichrodt, H., Calonge, S., Gerdtham, U.-G., Gerfin, M., Geurts, J., Gross, L., Häkinnen, U., Leu, R. E., O Donnell, O., Propper, C., Puffer, F., Rodriguez, M., Sundberg, G. & Winkelhake, O. (1997), Income-related inequalities in health: Some international comparisons, Journal of Health Economics, 16:
12 Wolfson, M. C. (1994), When inequalities diverge, American Economic Review, 84(2): Yitzhaki, S. (1983), On an extension of the Gini inequality index, International Economic Review, 24: (1998), More than a dozen alternative ways of spelling Gini, in D. J. Slottje (ed.), Research on Economic Inequality, vol. 8, JAI Press, Elsevier Science, Yitzhaki, S. & Schechtman, E. (2005), The properties of the extended Gini measures of variability and inequality, METRON International Journal of Statistics, 63(3): Yitzhaki, S. & Wodon, Q. (2004), Inequality, mobility, and horizontal equity, in Y. Amiel & J. A. Bishop (eds.), Research on Economic Inequality (Studies on Economic Well-Being: Essays in Honor of John P. Formby), vol. 12, JAI Press, Elsevier Science, Zhang, X. & Kanbur, R. (2001), What difference do polarisation measures make? An application to China, Journal of Development Studies, 37(3): Acknowledgements This work is part of the MeDIM project (Advances in the Measurement of Discrimination, Inequality and Mobility) supported by the Luxembourg Fonds National de la Recherche (contract FNR/06/15/08) and by core funding for CEPS/INSTEAD by the Ministry of Culture, Higher Education and Research of Luxembourg. 12
lerman: A Stata module to decompose inequality using sampling weights
The program : A Stata module to decompose inequality using sampling weights Portuguese Stata Users Group meeting 2015 The program Outline 1 Introduction 2 3 The program 4 5 The program Motivation The impact
Concentration Curves. The concentration curve defined
7 Concentration Curves In previous chapters, we assessed health inequality through variation in mean health across quintiles of some measure of living standards. Although convenient, such a grouped analysis
From the help desk: Swamy s random-coefficients model
The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients
Income and Mobility in Germany
TRENDS IN INCOME INEQUALITY, PRO-POOR INCOME GROWTH AND INCOME MOBILITY Stephen P. Jenkins (ISER, University of Essex) Philippe Van Kerm (CEPS/INSTEAD, Luxembourg) ISER Working Papers Number 2003-27 Institute
Module 5: Measuring (step 3) Inequality Measures
Module 5: Measuring (step 3) Inequality Measures Topics 1. Why measure inequality? 2. Basic dispersion measures 1. Charting inequality for basic dispersion measures 2. Basic dispersion measures (dispersion
Chapter 6. Inequality Measures
Chapter 6. Inequality Measures Summary Inequality is a broader concept than poverty in that it is defined over the entire population, and does not only focus on the poor. The simplest measurement of inequality
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
The Concentration Index
8 The Concentration Index Concentration curves can be used to identify whether socioeconomic inequality in some health sector variable exists and whether it is more pronounced at one point in time than
Inequalities in Self-Reported Health: A Meta-Regression Analysis
WP 13/05 Inequalities in Self-Reported Health: A Meta-Regression Analysis Joan Costa-Font and Cristina Hernandez-Quevedo February 2013 york.ac.uk/res/herc/hedgwp Inequalities in Self-Reported Health: A
Survey Data Analysis in Stata
Survey Data Analysis in Stata Jeff Pitblado Associate Director, Statistical Software StataCorp LP Stata Conference DC 2009 J. Pitblado (StataCorp) Survey Data Analysis DC 2009 1 / 44 Outline 1 Types of
Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see
Title stata.com summarize Summary statistics Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see Syntax summarize [ varlist ] [ if ] [ in ] [ weight
Measuring pro-poor growth
Economics Letters 78 (2003) 93 99 www.elsevier.com/ locate/ econbase q Measuring pro-poor growth * Martin Ravallion, Shaohua Chen World Bank, MSN MC 3-306 Development Research Group, 1818 H Street NW,
The Wage Return to Education: What Hides Behind the Least Squares Bias?
DISCUSSION PAPER SERIES IZA DP No. 8855 The Wage Return to Education: What Hides Behind the Least Squares Bias? Corrado Andini February 2015 Forschungsinstitut zur Zukunft der Arbeit Institute for the
The Stata Journal. Editor Nicholas J. Cox Geography Department Durham University South Road Durham City DH1 3LE UK n.j.cox@stata-journal.
The Stata Journal Editor H. Joseph Newton Department of Statistics Texas A & M University College Station, Texas 77843 979-845-3142; FAX 979-845-3144 [email protected] Editor Nicholas J. Cox Geography
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
An elementary characterization of the Gini index
MPRA Munich Personal RePEc Archive An elementary characterization of the Gini index Joss Sanchez-Perez and Leobardo Plata-Perez and Francisco Sanchez-Sanchez 31. January 2012 Online at https://mpra.ub.uni-muenchen.de/36328/
Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem
Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become
A GENERALIZATION OF THE PFÄHLER-LAMBERT DECOMPOSITION. Jorge Onrubia Universidad Complutense. Fidel Picos REDE-Universidade de Vigo
A GENERALIZATION OF THE PFÄHLER-LAMBERT DECOMPOSITION Jorge Onrubia Universidad Complutense Fidel Picos REDE-Universidade de Vigo María del Carmen Rodado Universidad Rey Juan Carlos XX Encuentro de Economía
Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:
Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Poverty Indices: Checking for Robustness
Chapter 5. Poverty Indices: Checking for Robustness Summary There are four main reasons why measures of poverty may not be robust. Sampling error occurs because measures of poverty are based on sample
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Survey Data Analysis in Stata
Survey Data Analysis in Stata Jeff Pitblado Associate Director, Statistical Software StataCorp LP 2009 Canadian Stata Users Group Meeting Outline 1 Types of data 2 2 Survey data characteristics 4 2.1 Single
From the help desk: hurdle models
The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for
Income inequality: Trends and Measures
Key Points Income inequality: Trends and Measures UK income inequality increased by 32% between 1960 and 2005. During the same period, it increased by 23% in the USA, and in Sweden decreased by 12%. In
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
Estimating the random coefficients logit model of demand using aggregate data
Estimating the random coefficients logit model of demand using aggregate data David Vincent Deloitte Economic Consulting London, UK [email protected] September 14, 2012 Introduction Estimation
Article: Main results from the Wealth and Assets Survey: July 2012 to June 2014
Article: Main results from the Wealth and Assets Survey: July 2012 to June 2014 Coverage: GB Date: 18 December 2015 Geographical Area: Region Theme: Economy Main points In July 2012 to June 2014: aggregate
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
Using Stata for Categorical Data Analysis
Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see. level(#) , options2
Title stata.com ttest t tests (mean-comparison tests) Syntax Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see One-sample t test ttest varname
Charting Income Inequality
Charting Inequality Resources for policy making Module 000 Charting Inequality Resources for policy making Charting Inequality by Lorenzo Giovanni Bellù, Agricultural Policy Support Service, Policy Assistance
Working Paper Series 2013-03
Working Paper Series 2013-03 Dos and Don ts of Gini Decompositions November 2013 Simon Jurkatis, Humboldt-Universität zu Berlin Wolfgang Strehl, Freie Universität Berlin Sponsored by BDPEMS Spandauer Straße
From the help desk: Demand system estimation
The Stata Journal (2002) 2, Number 4, pp. 403 410 From the help desk: Demand system estimation Brian P. Poi Stata Corporation Abstract. This article provides an example illustrating how to use Stata to
Measuring and Explaining Inequity in Health Service Delivery
15 Measuring and Explaining Inequity in Health Service Delivery Equitable distribution of health care is a principle subscribed to in many countries, often explicitly in legislation or official policy
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
Syntax Menu Description Options Remarks and examples Stored results References Also see
Title stata.com xi Interaction expansion Syntax Syntax Menu Description Options Remarks and examples Stored results References Also see xi [, prefix(string) noomit ] term(s) xi [, prefix(string) noomit
Simple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
EXCEL SOLVER TUTORIAL
ENGR62/MS&E111 Autumn 2003 2004 Prof. Ben Van Roy October 1, 2003 EXCEL SOLVER TUTORIAL This tutorial will introduce you to some essential features of Excel and its plug-in, Solver, that we will be using
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
Competing-risks regression
Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview
Forecasting in STATA: Tools and Tricks
Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be
Income Distribution Database (http://oe.cd/idd)
Income Distribution Database (http://oe.cd/idd) TERMS OF REFERENCE OECD PROJECT ON THE DISTRIBUTION OF HOUSEHOLD INCOMES 2014/15 COLLECTION October 2014 The OECD income distribution questionnaire aims
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
Economic Growth, Poverty and Inequality in South Africa: The First Decade of Democracy Haroon Bhorat & Carlene van der Westhuizen 1
Economic Growth, Poverty and Inequality in South Africa: The First Decade of Democracy Haroon Bhorat & Carlene van der Westhuizen 1 1 Professor, Development Policy Research Unit, School of Economics, University
Clustering in the Linear Model
Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple
Stata Walkthrough 4: Regression, Prediction, and Forecasting
Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. God, I can t see them getting
2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
The following postestimation commands for time series are available for regress:
Title stata.com regress postestimation time series Postestimation tools for regress with time series Description Syntax for estat archlm Options for estat archlm Syntax for estat bgodfrey Options for estat
Quantitative Methods for Finance
Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain
An introduction to Value-at-Risk Learning Curve September 2003
An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk
Statistics: Descriptive Statistics & Probability
Statistics: Descriptive Statistics & Mediterranean Agronomic Institute of Chania & University of Crete MS.c Program Business Economics and Management 7 October 2013, Vol. 1.1 Statistics: Descriptive Statistics
Poverty and income growth: Measuring pro-poor growth in the case of Romania
Poverty and income growth: Measuring pro-poor growth in the case of EVA MILITARU, CRISTINA STROE Social Indicators and Standard of Living Department National Scientific Research Institute for Labour and
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
Nonlinear Regression Functions. SW Ch 8 1/54/
Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General
Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
Module 14: Missing Data Stata Practical
Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724
Permutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
Sampling Error Estimation in Design-Based Analysis of the PSID Data
Technical Series Paper #11-05 Sampling Error Estimation in Design-Based Analysis of the PSID Data Steven G. Heeringa, Patricia A. Berglund, Azam Khan Survey Research Center, Institute for Social Research
The Regression Calibration Method for Fitting Generalized Linear Models with Additive Measurement Error
The Stata Journal (), Number, pp. 1 11 The Regression Calibration Method for Fitting Generalized Linear Models with Additive Measurement Error James W. Hardin Norman J. Arnold School of Public Health University
MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!
MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics
CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction
CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous
4. Continuous Random Variables, the Pareto and Normal Distributions
4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random
2. Descriptive statistics in EViews
2. Descriptive statistics in EViews Features of EViews: Data processing (importing, editing, handling, exporting data) Basic statistical tools (descriptive statistics, inference, graphical tools) Regression
Catastrophic Payments for Health Care
18 Catastrophic Payments for Health Care Health care finance in low-income countries is still characterized by the dominance of out-of-pocket payments and the relative lack of prepayment mechanisms, such
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:
Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
Marketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
Correlated Random Effects Panel Data Models
INTRODUCTION AND LINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. The Linear
Comparison of resampling method applied to censored data
International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison
A Comparison of Correlation Coefficients via a Three-Step Bootstrap Approach
ISSN: 1916-9795 Journal of Mathematics Research A Comparison of Correlation Coefficients via a Three-Step Bootstrap Approach Tahani A. Maturi (Corresponding author) Department of Mathematical Sciences,
INCOME INEQUALITY AND INCOME STRATIFICATION IN POLAND
STATISTICS IN TRANSITION new series, Spring 204 269 STATISTICS IN TRANSITION new series, Spring 204 Vol. 5, No. 2, pp. 269 282 INCOME INEQUALITY AND INCOME STRATIFICATION IN POLAND Alina Jędrzeczak ABSTRACT
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
Statistical Bulletin. The Effects of Taxes and Benefits on Household Income, 2011/12. Key points
Statistical Bulletin The Effects of Taxes and Benefits on Household Income, 2011/12 Coverage: UK Date: 10 July 2013 Geographical Area: UK and GB Theme: Economy Theme: People and Places Key points There
Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
