Computing Poverty measures with R vs. Stata. Rosendo Ramirez and Darryl McLeod. Professor Vinod R-Group presentation, May 1, 2014




Fordham University, E-530 Dealy, 12 noon

Outline of Presentation

1. Accessing survey data in R and Stata. Peru has a household survey of about 25,000 households, a longitudinal panel running from 2007 to 2011. We use the 2011 survey data, reading it first into Stata (it is published in Stata format by the Peruvian..???).
2. To make the survey sample representative of the roughly 30 million people in Peru, each household is weighted by its relative prevalence in the national population. In Stata this weighting scheme is declared with svyset; in R it is handled, more or less, by the svydesign() function of the survey package.
3. We also use a program called sepov to compute p(0), p(1) and p(2), three standard poverty measures derived from the Foster-Greer-Thorbecke (FGT) poverty index.
4. We find that the Stata and R routines are equally capable of computing basic poverty rates. However, we have not yet been able to make the R poverty routine use the survey design (the weighting scheme) that Stata uses to make a household survey representative of the entire population.
5. On the other hand, R is free and constantly updated, and its present capacity to handle large data sets, such as the Peru survey of 25,000 households, is impressive.
6. As of this writing, Stata's panel data routines (not shown here) are a bit easier to use than those in R. In fact, we have not yet figured out how to load the entire five-year Peruvian survey into R (suggestions welcome).

Resources/Files
Camtasia tutorial for RStudio, early version (needs editing); the mp4 video can be downloaded.
How do I use the Stata survey (svy) commands?
The Peruvian Nuevo Sol is the currency of Peru. Its currency code is PEN, its symbol is S/., and its most popular exchange rate is PEN to USD.
Data: 2011 household (HH) survey data for Peru; Stata do-file for the tutorial; sample Stata output with notes. All files are on http://www.gdsnet.org/
R files: R file for reading Stata survey data; R inflation VAR data; Prueba.R (not sure what this file is).

Background note on the FGT poverty and severity measures. The three measures used here are the headcount ratio H, or p(0); the poverty gap p(1) = H*I, where I is the average poor person's distance below the poverty line as a share of the line; and the severity measure p(2), based on the squared gap. A useful, encompassing measure of poverty is the Foster, Greer and Thorbecke (FGT) index:

$$FGT_\alpha = \frac{1}{n}\sum_{i=1}^{q} v_i^{\alpha}, \qquad v_i = \frac{y_p - y_i}{y_p},$$

where $y_p$ is the poverty line, $y_i$ is the income of household $i$, $v_i$ is the income gap (shortfall) of poor household $i$ as a share of the line, $q$ is the number of poor households (those with $y_i < y_p$), and $n$ is the number of households in the entire population.

Suppose the poverty line is $400 and there are four poor people out of a total population (n) of 10. The two rural poor have $200 of annual income and the two urban poor have $300. When α = 0, the FGT index p(0) equals the basic headcount measure of poverty (H). When α = 1, p(1) is H*I, where $I = (y_p - \bar{y})/y_p$ is the average income shortfall, $\bar{y}$ is the average income of the poor, and $y_p$ is again the official poverty line. When α = 2, p(2) is the population average of the squared income gaps. Squaring gives the poorest more weight in the index, so if the government redistributes income to the poorest of the poor, p(2) falls the most ("remember the neediest" is the NY Times motto).

The global standard for severe poverty in low-income countries is $1.25 a day (PPP), or about $38 per month. Middle-income countries like Peru use $2.50 per day, or about $76 per month, as their severe poverty line, and $4 to $5 per day as their everyday, or moderate, poverty line. Note that the Peruvian currency, the Nuevo Sol, trades at about 2.8 per U.S. dollar, while the PPP conversion factor for Peru is about 1.66 soles per dollar. In other words, a dollar converted to soles buys in Peru (rural and urban) roughly what $1.69 (2.8/1.66) would buy in the United States.
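The four-poor-people example can be checked numerically. The sketch below is a minimal base-R implementation of the FGT formula; `fgt()` is a hypothetical helper written for this note, not a function from any package. The incomes of the six non-poor people are not given in the example, so they are set to an assumed 500 (any value at or above the $400 line gives the same result).

```r
# FGT index: p(alpha) = (1/n) * sum over poor households of ((yp - yi) / yp)^alpha
fgt <- function(y, yp, alpha) {
  poor <- y < yp                          # households strictly below the line
  gap  <- ifelse(poor, (yp - y) / yp, 0)  # normalized shortfall, zero for the non-poor
  if (alpha == 0) return(mean(poor))      # headcount ratio H (avoids 0^0 = 1 for the non-poor)
  mean(gap^alpha)
}

# Worked example: n = 10, poverty line 400, two rural poor at 200, two urban poor at 300;
# the six non-poor are given an assumed income of 500
y  <- c(200, 200, 300, 300, rep(500, 6))
yp <- 400

sapply(0:2, function(a) fgt(y, yp, a))   # returns 0.4, 0.15, 0.0625

# p(1) as H * I, with I = (yp - mean income of the poor) / yp
H <- fgt(y, yp, 0)
I <- (yp - mean(y[y < yp])) / yp          # (400 - 250) / 400 = 0.375
H * I                                     # 0.15, the same as p(1)
```

So H = 0.4, p(1) = 0.4 × 0.375 = 0.15, and p(2) = (2 × 0.5² + 2 × 0.25²)/10 = 0.0625. The rural poor, whose gap is twice as large, contribute four times as much as the urban poor to p(2).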
Files: sumaria2011.dta is the Stata file with the 24,809 households in the 2011 survey; sumaria.do is the do-file program.

Stata code

    clear
    * open the data
    use "D:\economic_research\r-software\fordham\sumaria2011", clear

    * set the survey design: PSUs = conglome, strata = estrato, population weights = facpob
    svyset conglome [pw=facpob], strata(estrato)

    * monthly per capita expenditure (gpcm), national
    tabstat gpcm [aw=facpob], stats(mean semean sd n)

    * mean extreme poverty line (linpe), soles per person per month
    * (exchange rate = 2.8 soles/US$)
    * National
    tabstat linpe if (estrato>=1) [aw=facpob], stats(mean semean p50)
    * Urban
    tabstat linpe if (estrato<6) [aw=facpob], stats(mean semean p50)
    * Rural
    tabstat linpe if (estrato>=6) [aw=facpob], stats(mean semean p50)

    * mean poverty line (linea), soles per person per month
    * National
    tabstat linea if (estrato>=1) [aw=facpob], stats(mean semean p50)
    * Urban
    tabstat linea if (estrato<6) [aw=facpob], stats(mean semean p50)
    * Rural
    tabstat linea if (estrato>=6) [aw=facpob], stats(mean semean p50)

    * sepov is a user-written command (not part of official Stata)
    * Poverty headcount and FGT measures, poverty line linea
    * National
    sepov gpcm [w=facpob], povline(linea)
    * Urban
    sepov gpcm [w=facpob] if (estrato<6), povline(linea)
    * Rural
    sepov gpcm [w=facpob] if (estrato>=6), povline(linea)

    * Extreme poverty headcount and FGT measures, extreme poverty line linpe
    * National
    sepov gpcm [w=facpob], povline(linpe)
    * Urban
    sepov gpcm [w=facpob] if (estrato<6), povline(linpe)
    * Rural
    sepov gpcm [w=facpob] if (estrato>=6), povline(linpe)
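The do-file splits the sample into domains with `estrato<6` (urban) and `estrato>=6` (rural), and `tabstat ... [aw=facpob]` then reports a weighted mean, sum(w*x)/sum(w), within each domain. A rough base-R equivalent is sketched below. The data frame `hh` and all of its values are hypothetical toy numbers, not survey data; only the variable names and the urban/rural cutoff come from the do-file.

```r
# Hypothetical toy households; estrato 1-5 = urban, 6 and above = rural (as in the do-file)
hh <- data.frame(
  estrato = c(1, 3, 5, 6, 7, 8),
  facpob  = c(1000, 1500, 800, 600, 900, 700),   # hypothetical expansion weights
  linpe   = c(150, 152, 149, 121, 122, 120)      # hypothetical extreme poverty lines
)
hh$area <- ifelse(hh$estrato < 6, "Urban", "Rural")

# National weighted mean: analogue of tabstat linpe if (estrato>=1) [aw=facpob]
national <- weighted.mean(hh$linpe, hh$facpob)

# Weighted mean within each domain: analogue of the "if (estrato<6)" / "if (estrato>=6)" runs
by_area <- sapply(split(hh, hh$area),
                  function(d) weighted.mean(d$linpe, d$facpob))

national
by_area
```

This reproduces only the weighted means. Like tabstat, it says nothing about design-based standard errors.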

Stata Results

1. Monthly per capita expenditure (gpcm), national

    . tabstat gpcm [aw=facpob], stats(mean semean sd n)

        variable |      mean   se(mean)        sd         N
    -------------+----------------------------------------
            gpcm |  484.6624   2.556388  402.6534     24809
    ------------------------------------------------------

2. Mean extreme poverty line (linpe), soles per person per month (exchange rate = 2.8 soles/US$)

    . tabstat linpe if (estrato>=1) [aw=facpob], stats(mean semean p50)   (National)
    . tabstat linpe if (estrato<6) [aw=facpob], stats(mean semean p50)    (Urban)
    . tabstat linpe if (estrato>=6) [aw=facpob], stats(mean semean p50)   (Rural)

       linpe     |      mean   se(mean)       p50
    -------------+------------------------------
       National  |  143.0299   .1328722  137.7326
       Urban     |  150.6009   .1561769  143.5867
       Rural     |  121.2698   .0161088  121.4675
    --------------------------------------------

3. Mean poverty line (linea), soles per person per month (exchange rate = 2.8 soles/US$)

    . tabstat linea if (estrato>=1) [aw=facpob], stats(mean semean p50)   (National)
    . tabstat linea if (estrato<6) [aw=facpob], stats(mean semean p50)    (Urban)
    . tabstat linea if (estrato>=6) [aw=facpob], stats(mean semean p50)   (Rural)

       linea     |      mean   se(mean)       p50
    -------------+------------------------------
       National  |  272.2597   .3591983  275.7272
       Urban     |  296.3015   .3693753  277.5714
       Rural     |  203.1609   .0766447  200.8827
    --------------------------------------------
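To relate these soles-per-month figures to the dollar-a-day standards in the background note, the national mean lines can be converted at the market rate (2.8 soles/US$) and at the PPP factor (1.66) quoted in the text. The sketch below assumes 365/12 days per month. It converts only the national means, not each household's line.

```r
# National mean lines from the tabstat output (soles per person per month)
lines_soles    <- c(extreme = 143.0299, poverty = 272.2597)
market_rate    <- 2.8       # soles per US dollar (from the text)
ppp_factor     <- 1.66      # PPP conversion factor for Peru (from the text)
days_per_month <- 365 / 12  # assumption: average month length

usd_per_day <- lines_soles / market_rate / days_per_month
ppp_per_day <- lines_soles / ppp_factor / days_per_month

round(rbind(usd_per_day, ppp_per_day), 2)
```

On these assumptions, the mean extreme poverty line of about 143 soles works out to roughly $1.68 a day at the market rate and $2.83 a day at PPP, a little above the $2.50 severe line. The mean poverty line of about 272 soles is roughly $3.20 a day at the market rate and $5.39 a day at PPP.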

4. Poverty headcount and FGT measures, poverty line linea

    . sepov gpcm [w=facpob], povline(linea)                    (National)
    . sepov gpcm [w=facpob] if (estrato<6), povline(linea)     (Urban)
    . sepov gpcm [w=facpob] if (estrato>=6), povline(linea)    (Rural)

Each run prints "(sampling weights assumed)" and "Poverty measures for the variable gpcm: (unlabeled)", then reports a survey mean estimation with pweight: facpob, Strata: <one> (Number of strata = 1) and PSU: <observations> (Number of PSUs = Number of obs).

                  Number of obs   Population size
       National           24809          29943619
       Urban              15065          22214450
       Rural               9744         7729168.5

       Domain    Mean   Estimate  Std. Err.   [95% Conf. Interval]       Deff
       ----------------------------------------------------------------------
       National  p0     .2782429     .00415    .2701086   .2863772   2.127523
                 p1     .0780467   .0014051    .0752928   .0808007   1.902044
                 p2     .0318401   .0007396    .0303904   .0332898   1.827785
       Urban     p0     .1799882   .0048984    .1703869   .1895896   2.448933
                 p1     .0400419   .0014403    .0372188    .042865   2.561502
                 p2     .0138027   .0006963    .0124379   .0151675   2.724085
       Rural     p0     .5606372   .0062727    .5483413   .5729331   1.556334
                 p1     .1872767    .002944    .1815059   .1930476   1.737225
                 p2     .0836816   .0018121    .0801295   .0872337   1.834893
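The Deff column is the design effect. Under its usual definition, it is the variance of the estimate under the actual sampling scheme divided by the variance under simple random sampling (SRS) of the same size. Assuming sepov uses that definition, the reported standard error of the national headcount can be recovered from p0, N and Deff alone, as in this sketch:

```r
# National poverty headcount, from the sepov output (poverty line linea)
p0   <- 0.2782429   # estimate
se   <- 0.00415     # reported standard error
deff <- 2.127523    # reported design effect
n    <- 24809       # number of observations

se_srs <- sqrt(p0 * (1 - p0) / n)  # SE of a proportion under simple random sampling

se_srs * sqrt(deff)                # about 0.00415, the reported SE
(se / se_srs)^2                    # implied design effect, about 2.13
```

The SRS standard error is about 0.0028. Multiplying it by sqrt(2.13), about 1.46, gives the reported 0.00415. In other words, the unequal weights inflate the standard error by roughly 46% relative to a simple random sample of 24,809 households.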

5. Extreme poverty headcount and FGT measures, extreme poverty line linpe

    . sepov gpcm [w=facpob], povline(linpe)                    (National)
    . sepov gpcm [w=facpob] if (estrato<6), povline(linpe)     (Urban)
    . sepov gpcm [w=facpob] if (estrato>=6), povline(linpe)    (Rural)

Each run prints "(sampling weights assumed)" and "Poverty measures for the variable gpcm: (unlabeled)", then reports a survey mean estimation with pweight: facpob, Strata: <one> (Number of strata = 1) and PSU: <observations> (Number of PSUs = Number of obs).

                  Number of obs   Population size
       National           24809          29943619
       Urban              15065          22214450
       Rural               9744         7729168.5

       Domain    Mean   Estimate  Std. Err.   [95% Conf. Interval]       Deff
       ----------------------------------------------------------------------
       National  p0     .0634228   .0019537    .0595934   .0672523   1.594156
                 p1     .0149874   .0005739    .0138625   .0161122   1.588561
                 p2     .0053678   .0002667    .0048451   .0058906   1.497365
       Urban     p0     .0141625   .0015339    .0111558   .0171693   2.538724
                 p1     .0027967   .0004132    .0019867   .0036066   2.922182
                 p2     .0008881   .0001642    .0005662   .0012099   2.583605
       Rural     p0     .2050021   .0055135    .1941946   .2158097   1.817265
                 p1     .0500248    .001753    .0465885   .0534611   1.902171
                 p2     .0182432   .0008841    .0165102   .0199762   1.957318
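The confidence intervals in these tables are the estimate plus or minus about 1.96 standard errors. The sketch below rebuilds the extreme-poverty headcount intervals using a normal critical value. Stata may instead use a t critical value with very many degrees of freedom, which changes the bounds only around the sixth decimal.

```r
# Extreme-poverty headcount p0 by domain, from the sepov output
est <- c(National = 0.0634228, Urban = 0.0141625, Rural = 0.2050021)
se  <- c(National = 0.0019537, Urban = 0.0015339, Rural = 0.0055135)
z   <- qnorm(0.975)   # normal critical value, about 1.96

ci <- cbind(lower = est - z * se, upper = est + z * se)
round(ci, 7)
```

For example, the rural extreme-poverty headcount of about 20.5% has a 95% interval of roughly 19.4% to 21.6%.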

Poverty measures with R Software

    # install.packages(c("foreign", "survey", "ineq"))   # once, if not yet installed
    library(foreign)   # read.dta(): read Stata files
    library(survey)    # svydesign(), svymean(): survey design
    library(ineq)      # pov(): poverty measures

    # set and check the working directory
    setwd("d:/economic_research/r-software/fordham")
    getwd()

    # read a Stata file (other files, e.g. mus08psidextract.dta, are read the same way)
    c <- read.dta("d:/economic_research/r-software/fordham/sumaria2011.dta")
    summary(c$gpcm)

    # survey design: PSUs = conglome, strata = estrato, weights = facpob
    poverty <- svydesign(id = ~conglome, strata = ~estrato, weights = ~facpob, data = c)
    monthly_percapita_expenditure <- svymean(~gpcm, design = poverty)
    monthly_percapita_expenditure

    # design-based means of the poverty lines
    linea <- svymean(~linea, design = poverty)
    linea
    linpe <- svymean(~linpe, design = poverty)
    linpe

    # ineq::pov() poverty measures: unweighted, with single national lines
    # (143.03 and 272.26 are the weighted means of linpe and linea)
    pov(c$gpcm, 143.03, parameter = 1, type = "Foster")
    pov(c$gpcm, 272.26, parameter = 1, type = "Foster")

R Software Results

1.
    > monthly_percapita_expenditure <- svymean(~gpcm, design=poverty)
    > monthly_percapita_expenditure
           mean     SE
    gpcm 484.66 5.3645

Comparison: the mean monthly per capita expenditure is the same, but the standard error of the mean differs.

                          Stata        R
    Mean gpcm          484.6624   484.66
    SE (mean gpcm)     2.556388   5.3645

The Stata figure comes from tabstat, whose se(mean) is simply sd/sqrt(N) (402.6534/sqrt(24809) ≈ 2.556) and ignores the clusters and strata. By contrast, svymean() computes a design-based standard error from the svydesign() specification. A design-based Stata counterpart would be svy: mean gpcm after svyset (not run here).

2.
    > # download ineq package - Poverty package
    > # mean monthly per capita expenditure, national poverty line
    > linea <- svymean(~linea, design=poverty)
    > linea
            mean   SE
    linea 272.26 0.83

3.
    > # mean monthly per capita expenditure, national extreme poverty line
    > linpe <- svymean(~linpe, design=poverty)
    > linpe
            mean     SE
    linpe 143.03 0.3305

Comparison (national): the mean extreme poverty line is the same, but the standard error of the mean differs.

                          Stata        R
    Mean linpe         143.0299   143.03
    SE (mean linpe)    .1328722   0.3305

Comparison (national): the mean poverty line is the same, but the standard error of the mean differs.

                          Stata        R
    Mean linea         272.2597   272.26
    SE (mean linea)    .3591983     0.83

4.
    > # national extreme poverty headcount
    > pov(c$gpcm, 143.03, parameter=1, type="Foster")
    [1] 0.1050022
    > # national poverty headcount
    > pov(c$gpcm, 272.26, parameter=1, type="Foster")
    [1] 0.3374179

Comparison: Stata takes the survey design (weights) into account, while the R pov() calls use only the unweighted sample.

    National                      Stata (weighted sample)   R (unweighted data)
    Headcount, extreme poverty                   .0634228             0.1050022
    Headcount, poverty                           .2782429             0.3374179

Two further differences also contribute to the gap. First, the R calls use a single national line (143.03 and 272.26, the weighted means of linpe and linea), whereas sepov uses each household's own line (the variables linpe and linea), which differs between urban and rural areas. Second, the sepov output reports one stratum and one PSU per observation, so even the Stata poverty estimates use only the pweights, not the clusters and strata declared with svyset.

I am trying to find other packages that compute poverty measures using the survey design. So far I have found only the ineq package, which works with the raw sample, not with the survey design (weights).
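Until a suitable package turns up, one package-free option is to compute the weighted FGT measures directly. The function below is a minimal base-R sketch. `fgt_weighted()` is a hypothetical helper, not part of ineq or survey. It applies the expansion weights and allows a different poverty line for each household, as sepov does with povline(linea). On the real data it would be called as, for example, `fgt_weighted(c$gpcm, c$linea, c$facpob, 0)`. Here it runs on made-up toy data in which rural households are over-sampled and so carry smaller weights. It gives weighted point estimates only. Design-based standard errors would still need the survey package, for example by adding the poverty indicator to the design with update() and passing it to svymean().

```r
# Weighted FGT measure with a household-specific poverty line.
# y: per capita expenditure (gpcm), z: household poverty line (linea or linpe),
# w: expansion weights (facpob), alpha: 0 = headcount, 1 = gap, 2 = severity
fgt_weighted <- function(y, z, w, alpha) {
  poor    <- y < z
  gap     <- ifelse(poor, (z - y) / z, 0)          # normalized shortfall, zero if not poor
  contrib <- if (alpha == 0) as.numeric(poor) else gap^alpha
  sum(w * contrib) / sum(w)                        # weighted population average
}

# Hypothetical toy data: four urban households (line 300, large weights)
# and four over-sampled rural households (line 200, small weights)
toy <- data.frame(
  gpcm   = c(100, 250, 400, 600, 90, 150, 180, 300),
  linea  = c(300, 300, 300, 300, 200, 200, 200, 200),
  facpob = c(3000, 3000, 3000, 3000, 1000, 1000, 1000, 1000)
)

weighted_fgt <- sapply(0:2, function(a) fgt_weighted(toy$gpcm, toy$linea, toy$facpob, a))
unweighted_p0 <- fgt_weighted(toy$gpcm, toy$linea, rep(1, nrow(toy)), 0)

weighted_fgt    # p0, p1, p2 with weights
unweighted_p0   # headcount ignoring the weights
```

In the toy data the unweighted headcount (0.625) exceeds the weighted one (0.5625) because the poorer, over-sampled rural households count too heavily without weights. This is the same direction as the gap between the R and Stata headcounts above.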