Computing Poverty Measures with R vs. Stata
Rosendo Ramirez and Darryl McLeod
Professor Vinod R-Group presentation, May 1, 2014
Fordham University, E-530 Dealy, 12 noon

Outline of Presentation

1. Accessing survey data in R and Stata. Peru has a survey of about 25,000 households, a longitudinal panel, 2007 to 2011. We use the 2011 survey data, reading it first into Stata (it is published in Stata format by the Peruvian ...).

2. To make the survey sample representative of the 30 million people in Peru, we have to weight each family by its relative prevalence in the national population. This weighting is done by svyset in Stata and, more or less, by a function called svydesign in R.

3. We also use a program called sepov to compute p(0), p(1) and p(2), three standard poverty measures derived from the Foster-Greer-Thorbecke (FGT) poverty index.

4. We find that the Stata and R routines are equally capable of computing basic poverty rates, but so far we have not been able to implement in R the survey design or weighting scheme Stata uses to make a household (HH) survey representative of the entire population.

5. On the other hand, R is free and constantly being updated, and its present capacity to handle large data sets such as the Peru survey of 25,000 households is impressive.

6. As of this writing, Stata's panel data routines (not shown here) are a bit easier to use than those in R. In fact, we have not figured out how to load the entire 5-year Peruvian survey into R (suggestions welcome).

Resources/Files

Camtasia Tutorial for R-Studio: early version, needs editing (you can download this mp4 video)
How do I use the Stata survey (svy) commands?

The Nuevo Sol is the currency of Peru. Its currency code is PEN and its currency symbol is S/.
Data: 2011 HH Survey data for Peru
Stata Do file for tutorial
Sample Stata output with notes
All files on http://www.gdsnet.org/

R files:
R file for reading Stata survey data
R inflation VAR data
Prueba.R (not sure what this file is)
Background note on the FGT poverty and severity measures: the headcount (H, or p(0)), the poverty gap (p(1) = H*I, where I is the distance of the average poor person below the poverty line, as a share of the line), and the severity measure p(2), the squared gap.

A useful, encompassing measure of poverty is the Foster, Greer, Thorbecke (FGT) index:

$$FGT_\alpha = \frac{1}{n} \sum_{i=1}^{q} v_i^{\alpha}, \qquad v_i = \frac{y_p - y_i}{y_p}$$

where y_p is the poverty line, y_i is the income of poor household i, v_i is the income gap or shortfall of each poor household as a share of the poverty line, q is the number of poor households, and n is the number of households in the entire population.

Suppose the poverty line is $400 and there are four poor people out of a total population (n) of 10. The two rural poor people have $200 annual income and the two urban poor have $300.

When α = 0, the FGT index p(0) equals the basic headcount measure of poverty (H); in the example, H = 4/10 = 0.4.

When α = 1, the FGT index p(1) is H*I, where I is the average income shortfall (y_p - ȳ)/y_p, ȳ is the average income of the poor, and again y_p is the official poverty line. In the example, ȳ = 250, I = 150/400 = 0.375 and p(1) = 0.4 × 0.375 = 0.15.

When α = 2, the FGT poverty index p(2) is the population average of the squared income gaps v_i; in the example, p(2) = (2 × 0.5² + 2 × 0.25²)/10 = 0.0625. Squaring gives the poorest more weight in the poverty index, so that if the government redistributes income to the poorest of the poor, the index p(2) falls most ("remember the neediest" is the NY Times motto).

The global standard for severe poverty is $38 per month, or $1.25 a day PPP, in low income countries. Middle income countries like Peru use $2.50 per day, or $76 per month, as their severe poverty line, and $4-$5 per day as the everyday or moderate poverty line. Note that the Peruvian currency, the Nuevo Sol, trades at about 2.8 per U.S. dollar. The PPP conversion factor for Peru is about 1.66; in other words, a dollar in Peru (rural and urban) buys what $1.66 would buy in the United States.
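The $400 poverty line example can be checked with a few lines of base R. This is only a minimal sketch: the function name fgt and its arguments are illustrative and do not come from any package, and the six non-poor incomes of $500 are an assumption, since the text gives only the incomes of the four poor people. The optional weights argument w anticipates the survey weights used later.

```r
# FGT index: (1/n) * sum over the poor of ((z - y) / z)^alpha.
# `fgt`, `y`, `z`, `alpha` and `w` are illustrative names (not from a package).
fgt <- function(y, z, alpha = 0, w = rep(1, length(y))) {
  poor <- y < z                          # poor if income is below the line
  gap  <- ifelse(poor, (z - y) / z, 0)   # shortfall as a share of the line
  # Multiplying by `poor` keeps non-poor households at zero even when alpha = 0
  sum(w * poor * gap^alpha) / sum(w)
}

# Four poor people (two at $200, two at $300) out of n = 10.
# The non-poor incomes (500) are assumed for illustration only.
income <- c(200, 200, 300, 300, rep(500, 6))

p0 <- fgt(income, z = 400, alpha = 0)   # headcount
p1 <- fgt(income, z = 400, alpha = 1)   # poverty gap, H * I
p2 <- fgt(income, z = 400, alpha = 2)   # severity
print(c(p0 = p0, p1 = p1, p2 = p2))
```

Because z is used elementwise, a vector of household-specific poverty lines can be passed in place of the single value 400.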
Files:
This Stata file contains the 24,809 HHs in the 2011 survey: sumaria2011.dta
Do file program: sumaria.do

Stata code

clear
* open the data
use "D:\economic_research\r-software\fordham\sumaria2011", clear
* set the survey design
svyset conglome [pw=facpob], strata(estrato)

* monthly per capita expenditure
* National
tabstat gpcm [aw=facpob], stats(mean semean sd n)

* mean monthly per capita extreme poverty line (linpe) in local currency (soles), exchange rate = 2.8 Soles/US$
* National
tabstat linpe if (estrato>=1) [aw=facpob], stats(mean p50)
* Urban
tabstat linpe if (estrato<6) [aw=facpob], stats(mean p50)
* Rural
tabstat linpe if (estrato>=6) [aw=facpob], stats(mean p50)

* mean monthly per capita poverty line (linea) in local currency (soles), exchange rate = 2.8 Soles/US$
* National
tabstat linea if (estrato>=1) [aw=facpob], stats(mean p50)
* Urban
tabstat linea if (estrato<6) [aw=facpob], stats(mean p50)
* Rural
tabstat linea if (estrato>=6) [aw=facpob], stats(mean p50)

* Poverty headcount (poverty line: linea)
* National
sepov gpcm [w=facpob], povline(linea)
* Urban
sepov gpcm [w=facpob] if (estrato<6), povline(linea)
* Rural
sepov gpcm [w=facpob] if (estrato>=6), povline(linea)

* Extreme poverty headcount (extreme poverty line: linpe)
* National
sepov gpcm [w=facpob], povline(linpe)
* Urban
sepov gpcm [w=facpob] if (estrato<6), povline(linpe)
* Rural
sepov gpcm [w=facpob] if (estrato>=6), povline(linpe)
Stata Results

1. Monthly per capita expenditure, National

. tabstat gpcm [aw=facpob], stats(mean semean sd n)

    variable |      mean  se(mean)        sd         N
-------------+----------------------------------------
        gpcm |  484.6624  2.556388  402.6534     24809
------------------------------------------------------

2. Mean of monthly per capita expenditure, extreme poverty line, in local currency (soles), exchange rate = 2.8 Soles/US$

. * National
. tabstat linpe if (estrato>=1) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linpe |  143.0299  .1328722  137.7326
--------------------------------------------

. * Urban
. tabstat linpe if (estrato<6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linpe |  150.6009  .1561769  143.5867
--------------------------------------------

. * Rural
. tabstat linpe if (estrato>=6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linpe |  121.2698  .0161088  121.4675
--------------------------------------------

3. Mean of monthly per capita expenditure, poverty line, in local currency (soles), exchange rate = 2.8 Soles/US$

. * National
. tabstat linea if (estrato>=1) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linea |  272.2597  .3591983  275.7272
--------------------------------------------

. * Urban
. tabstat linea if (estrato<6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linea |  296.3015  .3693753  277.5714
--------------------------------------------

. * Rural
. tabstat linea if (estrato>=6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linea |  203.1609  .0766447  200.8827
--------------------------------------------
4. Poverty headcount

. * National
. sepov gpcm [w=facpob], povline(linea)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                  Number of obs    =     24809
Strata:   <one>                   Number of strata =         1
PSU:      <observations>          Number of PSUs   =     24809
                                  Population size  =  29943619

    Mean |  Estimate   Std. Err.   [95% Conf. Interval]       Deff
---------+---------------------------------------------------------
      p0 |  .2782429      .00415   .2701086    .2863772   2.127523
      p1 |  .0780467    .0014051   .0752928    .0808007   1.902044
      p2 |  .0318401    .0007396   .0303904    .0332898   1.827785

. * Urban
. sepov gpcm [w=facpob] if (estrato<6), povline(linea)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                  Number of obs    =     15065
Strata:   <one>                   Number of strata =         1
PSU:      <observations>          Number of PSUs   =     15065
                                  Population size  =  22214450

    Mean |  Estimate   Std. Err.   [95% Conf. Interval]       Deff
---------+---------------------------------------------------------
      p0 |  .1799882    .0048984   .1703869    .1895896   2.448933
      p1 |  .0400419    .0014403   .0372188     .042865   2.561502
      p2 |  .0138027    .0006963   .0124379    .0151675   2.724085

. * Rural
. sepov gpcm [w=facpob] if (estrato>=6), povline(linea)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                  Number of obs    =      9744
Strata:   <one>                   Number of strata =         1
PSU:      <observations>          Number of PSUs   =      9744
                                  Population size  = 7729168.5

    Mean |  Estimate   Std. Err.   [95% Conf. Interval]       Deff
---------+---------------------------------------------------------
      p0 |  .5606372    .0062727   .5483413    .5729331   1.556334
      p1 |  .1872767     .002944   .1815059    .1930476   1.737225
      p2 |  .0836816    .0018121   .0801295    .0872337   1.834893
5. Extreme poverty headcount

. * National
. sepov gpcm [w=facpob], povline(linpe)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                  Number of obs    =     24809
Strata:   <one>                   Number of strata =         1
PSU:      <observations>          Number of PSUs   =     24809
                                  Population size  =  29943619

    Mean |  Estimate   Std. Err.   [95% Conf. Interval]       Deff
---------+---------------------------------------------------------
      p0 |  .0634228    .0019537   .0595934    .0672523   1.594156
      p1 |  .0149874    .0005739   .0138625    .0161122   1.588561
      p2 |  .0053678    .0002667   .0048451    .0058906   1.497365

. * Urban
. sepov gpcm [w=facpob] if (estrato<6), povline(linpe)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                  Number of obs    =     15065
Strata:   <one>                   Number of strata =         1
PSU:      <observations>          Number of PSUs   =     15065
                                  Population size  =  22214450

    Mean |  Estimate   Std. Err.   [95% Conf. Interval]       Deff
---------+---------------------------------------------------------
      p0 |  .0141625    .0015339   .0111558    .0171693   2.538724
      p1 |  .0027967    .0004132   .0019867    .0036066   2.922182
      p2 |  .0008881    .0001642   .0005662    .0012099   2.583605

. * Rural
. sepov gpcm [w=facpob] if (estrato>=6), povline(linpe)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                  Number of obs    =      9744
Strata:   <one>                   Number of strata =         1
PSU:      <observations>          Number of PSUs   =      9744
                                  Population size  = 7729168.5

    Mean |  Estimate   Std. Err.   [95% Conf. Interval]       Deff
---------+---------------------------------------------------------
      p0 |  .2050021    .0055135   .1941946    .2158097   1.817265
      p1 |  .0500248     .001753   .0465885    .0534611   1.902171
      p2 |  .0182432    .0008841   .0165102    .0199762   1.957318
Poverty measures with R Software

# how to set a directory?
setwd("d:/economic_research/r-software/fordham")
# how to get a directory?
getwd()

# how to read a Stata file?
# foreign package - read Stata files in R
# (install once with install.packages("foreign"))
# for example Stata files: sumaria2011.dta, mus08psidextract.dta, etc.
library(foreign)
c<-read.dta("d:/economic_research/r-software/fordham/sumaria2011.dta")
# summary of monthly per capita expenditure (unweighted)
summary(c$gpcm)

# survey package - survey design (clusters, strata and weights)
library(survey)
poverty<-svydesign(id=~conglome, strata=~estrato, weights=~facpob, data=c)
monthly_percapita_expenditure<-svymean(~gpcm, design=poverty)
monthly_percapita_expenditure

# weighted means of the poverty line (linea) and extreme poverty line (linpe)
linea<-svymean(~linea, design=poverty)
linea
linpe<-svymean(~linpe, design=poverty)
linpe

# ineq package - poverty measures (unweighted)
library(ineq)
# National extreme poverty headcount
pov(c$gpcm, 143.03, parameter=1, type ="Foster")
# National poverty headcount
pov(c$gpcm, 272.26, parameter=1, type ="Foster")
R Software Results

1. Mean monthly per capita expenditure, National

> monthly_percapita_expenditure<-svymean(~gpcm, design=poverty)
> monthly_percapita_expenditure
       mean     SE
gpcm 484.66 5.3645

Comparison: we have the same mean monthly per capita expenditure but a different standard error of the mean.

                   Stata       R Software
Mean gpcm          484.6624    484.66
SE(mean gpcm)      2.556388    5.3645

The Stata figure equals sd/sqrt(N) = 402.6534/sqrt(24809) ≈ 2.556, so tabstat with aweights treats the data as a simple random sample. svymean uses the clusters (conglome) and strata (estrato) declared in svydesign, which probably explains the larger R standard errors here and below. The comparable Stata command would be svy: mean gpcm.

2. Mean monthly per capita expenditure, National poverty line

> linea<-svymean(~linea, design=poverty)
> linea
        mean   SE
linea 272.26 0.83

3. Mean monthly per capita expenditure, National extreme poverty line

> linpe<-svymean(~linpe, design=poverty)
> linpe
        mean     SE
linpe 143.03 0.3305

Comparison: we have the same mean extreme poverty line but a different standard error of the mean.

National           Stata       R Software
Mean linpe         143.0299    143.03
SE Mean linpe      .1328722    0.3305
Comparison: we have the same mean poverty line but a different standard error of the mean.

National           Stata       R Software
Mean linea         272.2597    272.26
SE Mean linea      .3591983    0.83

4. Poverty headcounts, National

> # National extreme poverty headcount
> pov(c$gpcm, 143.03, parameter=1, type ="Foster")
[1] 0.1050022
> # National poverty headcount
> pov(c$gpcm, 272.26, parameter=1, type ="Foster")
[1] 0.3374179

Comparison: Stata applies the survey weights (facpob), while pov() in R uses only the unweighted sample. Note also that pov() is given a single national line (the mean of linpe or linea), whereas sepov uses each household's own line through povline(linpe) or povline(linea), which may also contribute to the difference.

National                      Stata (weighted sample)    R Software (unweighted data)
Headcount, extreme poverty    .0634228                   0.1050022
Headcount, poverty            .2782429                   0.3374179

I am trying to find other packages that compute poverty measures using the survey design. So far I have found the ineq package, which works with the sample but not with the survey design (weights).
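One possible way around this limitation, offered only as a sketch and not verified against the Peruvian data, is to build the FGT components as ordinary variables and average them with svymean from the survey package, so that the weights (and the design, for standard errors) are used. The toy data frame below is made up for illustration; only the column names (conglome, estrato, facpob, gpcm, linea) follow sumaria2011.dta. Each household is compared with its own poverty line, as povline(linea) does in Stata.

```r
library(survey)  # svydesign(), svymean(), and update()/subset() for designs

# Made-up stand-in for the survey file; replace with the data frame
# returned by read.dta() to try this on the real data.
toy <- data.frame(
  conglome = c(1, 1, 2, 2, 3, 3, 4, 4),              # cluster (PSU)
  estrato  = c(1, 1, 1, 1, 6, 6, 6, 6),              # stratum
  facpob   = c(100, 100, 150, 150, 50, 50, 80, 80),  # population weight
  gpcm     = c(500, 250, 600, 300, 100, 220, 150, 400),
  linea    = c(296, 296, 296, 296, 203, 203, 203, 203)
)

des <- svydesign(id = ~conglome, strata = ~estrato,
                 weights = ~facpob, data = toy)

# FGT components per household; pmax() sets the gap of the non-poor to zero
des <- update(des,
  p0 = as.numeric(gpcm < linea),
  p1 = pmax(linea - gpcm, 0) / linea,
  p2 = (pmax(linea - gpcm, 0) / linea)^2
)

# Weighted p(0), p(1), p(2) with design-based standard errors
fgt_national <- svymean(~p0 + p1 + p2, design = des)
# Rural subpopulation, mirroring "if (estrato>=6)" in the do file
fgt_rural <- svymean(~p0 + p1 + p2, design = subset(des, estrato >= 6))

print(fgt_national)
print(fgt_rural)
```

On the real data the point estimates should, in principle, be comparable with the sepov estimates, since both are weighted means of the same household-level quantities. The standard errors need not agree, because the sepov output above reports one stratum and treats every observation as its own PSU.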