Crop Yield Estimation at District Level by Combining Improvement of Crop Statistics Scheme Data and Census Data




U C Sud
Indian Agricultural Statistics Research Institute, New Delhi, India
Email: cs@iasri.res.in

V K Bhatia
Indian Agricultural Statistics Research Institute, New Delhi, India
Email: vkbhatia@iasri.res.in

Hukum Chandra
Indian Agricultural Statistics Research Institute, New Delhi, India
Email: hchandra@iasri.res.in

A K Srivastava
Field Operations Division, National Sample Survey Office, Faridabad, India
Email: aswing@rediffmail.com

ABSTRACT
In this article we describe an application of small area estimation techniques to derive district level estimates of crop yield for paddy in the State of Uttar Pradesh using the data on crop cutting experiments supervised under the Improvement of Crop Statistics (ICS) scheme and the secondary data from the Population Census. The results show considerable improvement in the estimates generated by using the small area estimation method.

Keywords: Crop cutting experiments, General crop estimation surveys, Improvement of Crop Statistics, district level estimates, small area estimation, Census

1. INTRODUCTION
Crop area and crop production form the backbone of any agricultural statistics system. In India, crop area figures are, by and large, compiled on the basis of complete enumeration while the crop yield is estimated on the basis of a sample survey approach. The yield rate estimates are developed on the basis of scientifically designed crop cutting experiments (CCEs) conducted under the scheme of General Crop Estimation Surveys (GCES). A crop cutting experiment consists of randomly locating a field growing a specific crop, locating and marking, as per specified instructions, a plot of given size and shape in the selected field, harvesting, threshing and winnowing the produce within the plot and weighing the grains obtained. Since the grain on the harvest day contains moisture, it is stored and reweighed after driage to determine the marketable form of produce. The GCES covers 68 crops (52 food and 16 non-food) in 25 States and 4 Union Territories. More than 500,000 CCEs are conducted annually for this purpose. This sample size is sufficient to provide precise estimates of crop yield (i.e., production per hectare of land) at the district level.

Although the CCE technique is an objective method of assessment of crop yield, the procedure for conducting a CCE is tedious and time consuming. Due to this and some other factors, enumerators tend not to follow the prescribed procedure for the conduct of CCEs in a number of cases. As a result, the data quality under the GCES is observed to be below the desirable limit. To improve the quality of data collected under the GCES, a scheme titled Improvement of Crop Statistics (ICS) has been introduced by the Directorate of Economics and Statistics, Ministry of Agriculture, Government of India and implemented jointly by the National Sample Survey Office (NSSO) and the State Agricultural Statistics Authority (SASA). Under this scheme, a quality check on the field operations of GCES is carried out by supervising around 30,000 CCEs by NSSO and State Government supervisory officers. The findings of the ICS reveal that the crop cutting experiments are generally not carried out properly, resulting in data which lack the desired quality.
In view of limitations of infrastructure and constraints of resources, there is a felt need to reduce the sample size under GCES drastically so that the volume of work of the enumerator is reduced and better supervision of the CCE operation becomes possible, leading to improvement in data quality. However, a reduction in sample size will have a direct bearing on the standard error of the estimator. The reduced sample size is more alarming when used for producing estimates at district level, since estimators based on the sample data from any particular district can be unstable. This small sample size problem can be resolved provided auxiliary information is available to strengthen the limited sample data from the district. The underlying theory is referred to as small area estimation (SAE). The SAE techniques aim at producing reliable estimates for such districts/areas with small (or even no) sample sizes by borrowing strength from data of other areas.

The SAE techniques are generally based on model-based methods; see, for example, Pfeffermann (2002) and Rao (2003). The idea is to use statistical models to link the variable of interest with auxiliary information, e.g. Census and Administrative data, for the small areas to define model-based estimators for these areas. Such small area models can be classified into two broad types: (i) area level random effect models, which are used when auxiliary information is available only at area level and which relate small area direct estimates to area-specific covariates (Fay and Herriot, 1979), and (ii) unit level random effect models, proposed originally by Battese, Harter and Fuller (1988), which relate the unit values of a study variable to unit-specific covariates.

In this article we explore an application of SAE techniques to derive model-based estimates of average yield for paddy crop at small area levels in the State of Uttar Pradesh in India by linking data generated under the ICS scheme by NSSO (data collected with a much reduced sample size, but of very high quality) and the Population Census 2001. Small areas are defined as the districts of the State of Uttar Pradesh. It is noteworthy that we adopt the area level model since covariates for our study are available only at the area level. The paper illustrates how the ICS data and Census data can be combined to derive reliable district level estimates of crop yield.

The rest of the paper is organised as follows. Section 2 introduces the data used for the analysis and Section 3 describes the methodology applied for the analysis. In Section 4 we present the diagnostic procedures for examining the model assumptions and validating the small area estimates and discuss the results. Section 5 finally sets out the main conclusions.

2. DATA DESCRIPTION
In this study we use data pertaining to supervised CCEs on paddy crop under the ICS scheme for the kharif season for the State of Uttar Pradesh in India, collected during the year 2009-10. The variable of interest for which small area estimates are required is the yield of the paddy crop. We are interested in estimating the average yield at the district level. In the State of Uttar Pradesh there are 70 districts; however, supervision, on a sub-sample, of crop cutting experiment work under the ICS scheme is carried out in 58 districts only and there is no sample data for the remaining 12 districts. In what follows, we refer to these 12 districts as the out of sample districts. These 70 (58 in sample and 12 out of sample) districts are the small areas for which we are interested in producing the estimates. The area specific sample sizes for these 58 sample districts range from a minimum of 4 to a maximum of 28 CCEs with an average of 11 (see Figure 1). A total of 655 CCEs were supervised for recording yield data in the State of Uttar Pradesh for paddy crop for the year 2009-10. We see that in a few districts the sample size is small, so the traditional sample survey estimation approaches lead to unstable estimates. In addition, in 12 districts, due to non-availability of sample under ICS, we cannot estimate paddy yield. Indeed, there is no design based solution to provide estimates for these 12 out of sample districts (Pfeffermann, 2002). SAE is an obvious choice for such cases.

The covariates (auxiliary variables) known for the population are drawn from the Population Census 2001. There were 11 covariates available from this source to consider for modelling. We did some exploratory data analysis: first we segregated the group of covariates with significant correlation with the target variable and subsequently we implemented stepwise regression analysis. Finally we chose a model with two significant variables, average household size (HH_SIZE) and female population of marginal household (MARG_HH_F), with 6 per cent $R^2$.
The residual diagnostic plots in Figure 2 indicate that the fitted model is reasonable. For SAE analysis we therefore use these two covariates. Note that for SAE of the 12 out of sample districts we use the same two covariates since we assume that the underlying model for sample areas also holds for out of sample districts.

3. SMALL AREA ESTIMATION METHODOLOGY
In this Section we describe the underlying theory of SAE used in the paper. In particular, we elaborate on SAE based on the area level model (Fay and Herriot, 1979). It was proposed to estimate the per-capita income of small places with population size less than 1000. This model relates small area direct survey estimates to area-specific covariates. SAE under this model is one of the most popular methods used by private and public agencies because of its flexibility in combining different sources of information and explaining different sources of errors.

To start with, we first fix our notation. Throughout, we use a subscript $d$ to index the quantities belonging to small area or district $d$ $(d = 1, \ldots, D)$, where $D$ is the number of small areas (or districts) in the population. Let $\hat{\theta}_d$ denote the direct survey estimate of the unobservable population value $\theta_d$ for area $d$ $(d = 1, \ldots, D)$. Let $x_d$ be the $p$-vector of known auxiliary variables, often obtained from various administrative and census records, related to the population mean $\theta_d$. The simple area specific two stage model suggested by Fay and Herriot (1979) has the form

$$\hat{\theta}_d = \theta_d + e_d \quad \text{and} \quad \theta_d = x_d^T \beta + u_d, \qquad d = 1, \ldots, D. \tag{1}$$

We can express model (1) as an area level linear mixed model given by

$$\hat{\theta}_d = x_d^T \beta + u_d + e_d; \qquad d = 1, \ldots, D. \tag{2}$$

Here $\beta$ is a $p$-vector of unknown fixed effect parameters, the $u_d$'s are independent and identically distributed normal random errors with $E(u_d) = 0$ and $Var(u_d) = \sigma_u^2$, and the $e_d$'s are independent sampling errors normally distributed with $E(e_d \mid \theta_d) = 0$ and $Var(e_d \mid \theta_d) = \sigma_d^2$. The two errors are independent of each other within and across areas. Usually, $\sigma_d^2$ is known while $\sigma_u^2$ is unknown and has to be estimated from the data. Methods of estimating $\sigma_u^2$ include maximum likelihood (ML) and restricted maximum likelihood (REML) under normality, and the method of fitting constants without the normality assumption; see Rao (2003, Chapter 5). Let $\hat{\sigma}_u^2$ denote the estimate of $\sigma_u^2$. Then under model (2), the Empirical Best Linear Unbiased Predictor (EBLUP) of $\theta_d$ is given by

$$\hat{\theta}_d^{EBLUP} = x_d^T \hat{\beta} + \hat{\gamma}_d (\hat{\theta}_d - x_d^T \hat{\beta}) = \hat{\gamma}_d \hat{\theta}_d + (1 - \hat{\gamma}_d)\, x_d^T \hat{\beta} \tag{3}$$

where $\hat{\gamma}_d = \hat{\sigma}_u^2 / (\sigma_d^2 + \hat{\sigma}_u^2)$ and $\hat{\beta}$ is the generalized least squares estimate of $\beta$. It may be noted that $\hat{\theta}_d^{EBLUP}$ is a linear combination of the direct estimate $\hat{\theta}_d$ and the model based regression synthetic estimate $x_d^T \hat{\beta}$, with weight $\hat{\gamma}_d$. Here $\hat{\gamma}_d$ is called the shrinkage factor since it shrinks the direct estimator $\hat{\theta}_d$ towards the synthetic estimator $x_d^T \hat{\beta}$. For out of sample areas (i.e. areas with $n_d = 0$), the EBLUP predictor (3) leads to the synthetic predictor of the form $\hat{\theta}_d^{SYN} = x_d^T \hat{\beta}$.

Prasad and Rao (1990) proposed an approximately model unbiased (i.e. with bias of order $o(1/D)$) estimate of the mean squared error (MSE) of the EBLUP (3) given by

$$MSE(\hat{\theta}_d^{EBLUP}) = g_{1d}(\hat{\sigma}_u^2) + g_{2d}(\hat{\sigma}_u^2) + 2\, g_{3d}(\hat{\sigma}_u^2)\, \widehat{Var}(\hat{\sigma}_u^2), \tag{4}$$

where

$$g_{1d}(\hat{\sigma}_u^2) = \hat{\gamma}_d\, \sigma_d^2, \qquad g_{2d}(\hat{\sigma}_u^2) = (1 - \hat{\gamma}_d)^2\, x_d^T\, Var(\hat{\beta})\, x_d, \qquad g_{3d}(\hat{\sigma}_u^2) = \sigma_d^4 / (\sigma_d^2 + \hat{\sigma}_u^2)^3,$$

with $\widehat{Var}(\hat{\sigma}_u^2) = 2 D^{-2} \sum_{d=1}^{D} (\sigma_d^2 + \hat{\sigma}_u^2)^2$ when estimating $\sigma_u^2$ by the method of fitting constants. See Rao (2003, Chapter 5) for details about various theoretical developments. Under model (2), the MSE estimate for the synthetic predictor $\hat{\theta}_d^{SYN}$ is given by

$$MSE(\hat{\theta}_d^{SYN}) = x_d^T\, Var(\hat{\beta})\, x_d + \hat{\sigma}_u^2.$$

4. EMPIRICAL RESULTS
This Section presents the results from the data and theory described in previous Sections. We carry out some diagnostics to examine the reliability of small area estimates. We use the bias diagnostics and coefficient of variation to validate the reliability of the model-based small area estimates. We also compute the 95 percent confidence intervals (CI) for both direct and model-based estimates. The bias diagnostics are used to investigate if the model-based estimates are less extreme when compared to the direct survey estimates. In addition, if direct estimates are unbiased, their regression on the true values should be linear and correspond to the identity line.
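Before the diagnostics are examined in detail, the estimator of Section 3 can be made concrete with a small numerical sketch of equations (3) and (4). This is an illustration under stated assumptions, not the authors' implementation. It assumes NumPy, runs on simulated inputs (every number below is invented), and estimates $\sigma_u^2$ with a simple moment estimator truncated at zero, which is one common choice; the paper does not say which estimator was used for its results. The function name `fay_herriot_eblup` and its argument names are hypothetical.

```python
import numpy as np


def fay_herriot_eblup(direct, X, sampling_var):
    """Area-level EBLUP (eq. 3) with a Prasad-Rao-type MSE estimate (eq. 4).

    direct       : (D,) direct survey estimates
    X            : (D, p) area-level covariates, including a column of ones
    sampling_var : (D,) sampling variances sigma_d^2, treated as known
    """
    direct = np.asarray(direct, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(sampling_var, dtype=float)
    D, p = X.shape

    # Assumption: moment estimator of sigma_u^2 from OLS residuals,
    # truncated at zero so the variance estimate is never negative.
    xtx_inv = np.linalg.inv(X.T @ X)
    ols_resid = direct - X @ (xtx_inv @ (X.T @ direct))
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    moment = (np.sum(ols_resid**2) - np.sum(psi * (1.0 - leverage))) / (D - p)
    sigma_u2 = max(0.0, moment)

    # Generalized least squares estimate of beta and its variance.
    w = 1.0 / (psi + sigma_u2)
    var_beta = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = var_beta @ (X.T @ (w * direct))

    # Shrinkage factor, synthetic estimate and EBLUP, as in eq. (3).
    gamma = sigma_u2 / (psi + sigma_u2)
    synthetic = X @ beta
    eblup = gamma * direct + (1.0 - gamma) * synthetic

    # The three components of the MSE estimate in eq. (4).
    g1 = gamma * psi
    g2 = (1.0 - gamma) ** 2 * np.einsum("ij,jk,ik->i", X, var_beta, X)
    g3 = psi**2 / (psi + sigma_u2) ** 3
    var_sigma_u2 = 2.0 * np.sum((psi + sigma_u2) ** 2) / D**2
    mse = g1 + g2 + 2.0 * g3 * var_sigma_u2

    return {"eblup": eblup, "synthetic": synthetic, "gamma": gamma,
            "mse": mse, "beta": beta, "var_beta": var_beta,
            "sigma_u2": sigma_u2}


# Simulated example: 58 sampled areas and two covariates (hypothetical values).
rng = np.random.default_rng(0)
D = 58
X = np.column_stack([np.ones(D), rng.normal(size=D), rng.normal(size=D)])
psi = rng.uniform(0.5, 4.0, size=D)            # "known" sampling variances
theta = X @ np.array([15.0, 1.0, -0.5]) + rng.normal(scale=1.5, size=D)
direct = theta + rng.normal(scale=np.sqrt(psi))

fit = fay_herriot_eblup(direct, X, psi)

# An out-of-sample area has no direct estimate, so only the synthetic
# predictor and its MSE estimate are available.
x_new = np.array([1.0, 0.2, -0.1])
syn_new = x_new @ fit["beta"]
mse_new = x_new @ fit["var_beta"] @ x_new + fit["sigma_u2"]

cv_percent = 100.0 * np.sqrt(fit["mse"]) / fit["eblup"]
print("first three EBLUPs:", np.round(fit["eblup"][:3], 2))
print("first three CVs (%):", np.round(cv_percent[:3], 2))
print("out-of-sample prediction and MSE:", round(syn_new, 2), round(mse_new, 3))
```

Because the shrinkage factor lies between 0 and 1, each EBLUP falls between the direct estimate and the synthetic estimate for that area; areas with a large sampling variance are pulled more strongly towards the regression prediction.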
If model-based estimates are close to the true values, the regression of the direct estimates on the model-based estimates should be similar (Ambler et al., 2001 and Chandra et al., 2011). We plot direct estimates on the Y-axis and model-based estimates on the X-axis, look for divergence of the regression line from Y = X, and test for intercept = 0 and slope = 1. The bias scatter plots of the direct estimates against the model-based estimates are given in Figure 3. From the bias diagnostic we find that the intercept fails this diagnostic. The plots show that the model-based estimates are less extreme when compared to the direct estimates, demonstrating the typical SAE outcome of shrinking more extreme values towards the average.

We computed the coefficient of variation (CV) to assess the improved precision of the model-based estimates compared to the direct estimates. The CVs show the sampling variability as a percentage of the estimate. Although there are no internationally accepted tables for judging what CV is too high, estimates with large CVs are considered unreliable. Figure 4 shows the CVs for the direct survey estimates and the model-based estimates. The estimated CVs show that the model-based estimates have a higher degree of reliability than the direct survey estimates. Table 1 presents the district-wise model-based estimates, 95 percent confidence interval (CI) limits and percentage coefficient of variation for paddy crop yield for all 70 (i.e. both 58 sample and 12 out of sample) districts. In the right hand side part of Table 1, results for the last 12 districts correspond to out of sample districts. The CV results in Table 1 reveal that the average CV of these out of sample districts is 20.10 per cent. Figure 5 shows the 95% CI of the model-based and the direct survey estimates. It is apparent that the standard errors of the direct estimates are large and therefore the estimates are unreliable.

5. CONCLUSIONS
This paper illustrates that the small area estimation technique can be satisfactorily applied to produce reliable district level estimates of crop yield using CCEs supervised under the ICS scheme. Although the ICS supervised crop cutting experiments number only 30,000 in the entire country, i.e. the sample size is very low, the collected data is of very high quality. The estimates generated using this data are expected to be relatively free from various sources of non-sampling errors. Further, the small area estimation technique provides estimates for those districts where there is no sample information under ICS and so direct estimates cannot be computed. It is, therefore, recommended that wherever it is not possible to conduct an adequate number of crop cutting experiments due to constraints of cost or infrastructure or both, the small area estimation technique can be gainfully used to generate reliable estimates of crop yield based on a smaller sample.

REFERENCES
Ambler, R., Caplan, D., Chambers, R., Kovacevic, M. and Wang, S. (2001). Combining Unemployment Benefits Data and LFS Data to Estimate ILO Unemployment for Small Areas: An Application of a Modified Fay-Herriot Method. Proceedings of the International Association of Survey Statisticians, Meeting of the ISI, Seoul, August 2001.
Battese, G. E., Harter, R. M. and Fuller, W. A. (1988). An Error Component Model for Prediction of County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical Association, 83, 28-36.
Census of India (2001). Registrar General and Census Commissioner, New Delhi, India.
Chandra, H., Salvati, N. and Sud, U.C. (2011). Disaggregate-level Estimates of Indebtedness in the State of Uttar Pradesh in India: An Application of Small Area Estimation Technique. Journal of Applied Statistics, forthcoming issue.
Fay, R. E. and Herriot, R. A. (1979). Estimation of Income from Small Places: An Application of James-Stein Procedures to Census Data. Journal of the American Statistical Association, 74, 269-277.
Pfeffermann, D. (2002). Small Area Estimation: New Developments and Directions. International Statistical Review, 70, 125-143.
Rao, J.N.K. (2003). Small Area Estimation. Wiley, New York.

Table 1. District-wise values of model-based estimate, 95 percent confidence interval limits and coefficient of variation (CV) for paddy (green) yield (gm/25 sq. m).

| Districts | Estimate | Lower | Upper | CV, % | Districts | Estimate | Lower | Upper | CV, % |
|---|---|---|---|---|---|---|---|---|---|
| Saharanpur | 17759 | 13667 | 21851 | 11.52 | Ambedkar Nagar | 16667 | 13652 | 19681 | 9.04 |
| Muzaffarnagar | 17208 | 11735 | 22681 | 15.90 | Sultanpur | 16793 | 13899 | 19688 | 8.62 |
| Bijnor | 18927 | 16306 | 21547 | 6.92 | Bahraich | 14735 | 13606 | 15865 | 3.83 |
| Moradabad | 16781 | 12329 | 21232 | 13.26 | Shrawasti | 15168 | 10783 | 19553 | 14.46 |
| Rampur | 17174 | 16148 | 18200 | 2.99 | Balrampur | 12338 | 9206 | 15470 | 12.69 |
| Jyotiba Phule Nagar | 11622 | 8894 | 14351 | 11.74 | Gonda | 16708 | 14611 | 18805 | 6.28 |
| Ghaziabad | 16726 | 11101 | 22351 | 16.82 | Siddharthnagar | 12921 | 9808 | 16033 | 12.05 |
| Bulandshahar | 18116 | 14555 | 21677 | 9.83 | Basti | 14165 | 10331 | 17999 | 13.53 |
| Aligarh | 14278 | 10277 | 18280 | 14.01 | Sant Kabir Nagar | 13273 | 11626 | 14920 | 6.20 |
| Mathura | 12688 | 8322 | 17054 | 17.20 | Mahrajganj | 18640 | 14465 | 22815 | 11.20 |
| Etah | 12508 | 10274 | 14742 | 8.93 | Gorakhpur | 12437 | 9608 | 15266 | 11.37 |
| Mainpuri | 13711 | 9065 | 18357 | 16.94 | Kushinagar | 16699 | 12301 | 21096 | 13.17 |
| Budaun | 13307 | 9961 | 16652 | 12.57 | Deoria | 8866 | 6143 | 11588 | 15.35 |
| Bareilly | 14140 | 10976 | 17305 | 11.19 | Azamgarh | 12033 | 10073 | 13993 | 8.14 |
| Pilibhit | 14687 | 10207 | 19166 | 15.25 | Mau | 10489 | 7090 | 13888 | 16.20 |
| Shahjahanpur | 18411 | 16184 | 20638 | 6.05 | Ballia | 7763 | 5056 | 10470 | 17.44 |
| Kheri | 15079 | 12023 | 18135 | 10.13 | Jaunpur | 16418 | 13286 | 19549 | 9.54 |
| Sitapur | 16422 | 12836 | 20007 | 10.92 | Ghazipur | 11279 | 8606 | 13953 | 11.85 |
| Hardoi | 19315 | 16665 | 21965 | 6.86 | Chandauli | 12229 | 8333 | 16125 | 15.93 |
| Unnao | 14005 | 11188 | 16821 | 10.05 | Varanasi | 17063 | 12659 | 21468 | 12.91 |
| Lucknow | 18242 | 13196 | 23289 | 13.83 | Sant Ravidas Nagar | 7133 | 2939 | 11327 | 29.40 |
| Rae Bareli | 19287 | 16128 | 22446 | 8.19 | Mirzapur | 15052 | 11815 | 18290 | 10.76 |
| Farrukhabad | 10446 | 7420 | 13471 | 14.48 | Sonbhadra | 16328 | 11079 | 21578 | 16.08 |
| Kannauj | 30450 | 27119 | 33782 | 5.47 | Meerut # | 14984 | 8898 | 21069 | 20.31 |
| Etawah | 15431 | 13899 | 16964 | 4.97 | Baghpat # | 12442 | 6182 | 18702 | 25.16 |
| Auraiya | 21021 | 17121 | 24922 | 9.28 | Gautam Buddha Nr # | 16704 | 10436 | 22973 | 18.76 |
| Kanpur Dehat | 19547 | 15717 | 23378 | 9.80 | Hathras # | 15258 | 9158 | 21357 | 19.99 |
| Kanpur Nr | 16315 | 12090 | 20539 | 12.95 | Agra # | 14803 | 8716 | 20890 | 20.56 |
| Banda | 13375 | 8039 | 18711 | 19.95 | Firozabad # | 14391 | 8289 | 20492 | 21.20 |
| Fatehpur | 15881 | 11406 | 20355 | 14.09 | Jalaun # | 15186 | 9048 | 21325 | 20.21 |
| Pratapgarh | 16437 | 12543 | 20331 | 11.84 | Jhansi # | 17378 | 11209 | 23547 | 17.75 |
| Kaushambi | 16624 | 11363 | 21884 | 15.82 | Lalitpur # | 16928 | 10684 | 23172 | 18.44 |
| Allahabad | 20218 | 16164 | 24272 | 10.03 | Hamirpur # | 16520 | 10273 | 22767 | 18.91 |
| Barabanki | 18756 | 15176 | 22336 | 9.54 | Mahoba # | 16285 | 10030 | 22540 | 19.21 |
| Faizabad | 16556 | 12690 | 20422 | 11.68 | Chitrakoot # | 14948 | 8773 | 21122 | 20.65 |

# Districts with no sample information under ICS. Nr denotes Nagar.

Figure 1. Distribution of district-specific sample sizes in sample districts.
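The columns of Table 1 are linked by simple arithmetic, which the short Python sketch below checks on three rows copied from the table. Treating the CV as 100 × (standard error) / estimate follows the description in Section 4. The conclusion that the limits behave like estimate ± 2 × standard error, rather than using the 1.96 normal quantile, is an inference from the printed figures and is not stated in the paper. The helper name `implied_multiplier` is hypothetical.

```python
# Three rows copied from Table 1: (estimate, lower, upper, CV in per cent).
rows = {
    "Bahraich": (14735, 13606, 15865, 3.83),
    "Etawah": (15431, 13899, 16964, 4.97),
    "Deoria": (8866, 6143, 11588, 15.35),
}


def implied_multiplier(estimate, lower, upper, cv_percent):
    """Ratio of the interval half-width to the standard error implied by the CV."""
    standard_error = cv_percent / 100.0 * estimate
    half_width = (upper - lower) / 2.0
    return half_width / standard_error


multipliers = {name: implied_multiplier(*row) for name, row in rows.items()}
for name, value in multipliers.items():
    print(f"{name}: half-width / SE = {value:.3f}")
```

For all three rows the ratio comes out very close to 2, up to the rounding of the printed CV.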

Figure 2. Histogram and normal P-P plot of regression standardized residual.

Figure 3. Bias diagnostic plots for sample districts. Direct estimates versus model based estimates, y = x line (solid) and linear regression fit line (dash).

Figure 4. Coefficients of variation of direct estimates (solid line) and model based estimates (dash line) for sample districts.

Figure 5. 95 per cent confidence interval (CI) for direct estimates (Δ) and model based estimates (О) for sample districts. CI for direct estimates (solid line) and CI for model based estimates (dash line).
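The bias diagnostic behind Figure 3 (regress the direct estimates on the model-based estimates, then test intercept = 0 and slope = 1) can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy and ordinary least squares with the usual homoscedastic standard errors; the paper does not describe the exact test it used. The function name `bias_diagnostic` and the simulated district values are hypothetical.

```python
import numpy as np


def bias_diagnostic(direct, model_based):
    """OLS of direct on model-based estimates, with t-statistics for
    H0: intercept = 0 and H0: slope = 1."""
    y = np.asarray(direct, dtype=float)
    x = np.asarray(model_based, dtype=float)
    n = y.size
    A = np.column_stack([np.ones(n), x])           # design matrix [1, x]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # (intercept, slope)
    resid = y - A @ coef
    s2 = resid @ resid / (n - 2)                   # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
    return {
        "intercept": coef[0],
        "slope": coef[1],
        "t_intercept": coef[0] / se[0],
        "t_slope": (coef[1] - 1.0) / se[1],
    }


# Simulated yields for 58 sample districts, in thousand gm per plot
# (invented values, used only to exercise the function).
rng = np.random.default_rng(1)
model_based = rng.uniform(8.0, 20.0, size=58)
direct = model_based + rng.normal(scale=1.5, size=58)

result = bias_diagnostic(direct, model_based)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

As a rough guide, an absolute t-statistic well above 2 for either hypothesis would indicate that the fitted line departs from y = x.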